3/20 (Mon)
- https://ds-fusion.github.io/
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
An automatic artistic-typography generator built on a diffusion-model backbone.
- https://github.com/THUDM/ChatGLM-6B/blob/main/README_en.md
GitHub - THUDM/ChatGLM-6B: An Open Bilingual Dialogue Language Model
- Open source
- Bilingual: Chinese + English
- Trained on 1T tokens
- Easily deployed on consumer GPUs (e.g., a single 2080 Ti)
- Runs with the Hugging Face transformers library (see the loading sketch below)
- Demo available
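A minimal loading sketch following the README's transformers usage (`model.chat` is ChatGLM's own chat helper, pulled in via `trust_remote_code`):

```python
from transformers import AutoTokenizer, AutoModel

# trust_remote_code loads ChatGLM's custom model class and chat() helper.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# For smaller GPUs, the README describes INT4 quantization:
# .half().quantize(4).cuda() instead of .half().cuda().
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```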
- https://arxiv.org/abs/2303.05759
An Overview on Language Models: Recent Developments and Outlook
Language modeling studies the probability distributions over strings of texts. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine translation, etc.
A nice overview of language models covering recent developments and future directions: linguistic units, structures, training methods, evaluation, and applications.
(from https://twitter.com/omarsar0/status/1635273656858460162)
- https://arxiv.org/abs/2303.07295
Meet in the Middle: A New Pre-training Paradigm
Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, assuming that the next token only depends on the preceding ones. However, this assumption ignores the potential benefits of using the full sequence information during training.
Meet in the Middle (MIM): a new pre-training paradigm. MIM (2.7B) outperforms CodeGen 16B, InCoder 6.7B, PaLM 540B, LLaMA 65B, and FIM 2.7B on code generation tasks. Read arxiv.org/abs/2303.07295 to see why MIM could be a new pre-training paradigm for left-to-right and infilling LMs.
(from https://twitter.com/WeizhuChen/status/1635498612938670080)
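As I read it, MIM co-trains a left-to-right and a right-to-left LM with shared weights and adds an agreement regularizer pulling their per-token predictions together. A minimal PyTorch sketch of that combined objective; the `model(ids, reverse=...)` interface and the `alpha` weight are hypothetical stand-ins, and the paper's exact agreement term may differ from the total-variation form used here:

```python
import torch
import torch.nn.functional as F

def mim_loss(model, tokens, alpha=0.1):
    # tokens: (B, L) token ids. model(ids, reverse=...) -> (B, L-1, V) logits
    # is a hypothetical interface; both directions share the same weights.
    inp, tgt = tokens[:, :-1], tokens[:, 1:]

    # Forward LM: position i predicts token i+1 from tokens <= i.
    fwd_logits = model(inp, reverse=False)
    fwd_nll = F.cross_entropy(fwd_logits.transpose(1, 2), tgt)

    # Backward LM: same weights, sequence processed right-to-left.
    rev = torch.flip(tokens, dims=[1])
    bwd_logits = model(rev[:, :-1], reverse=True)
    bwd_nll = F.cross_entropy(bwd_logits.transpose(1, 2), rev[:, 1:])

    # Agreement regularizer: flip backward logits back into forward order
    # (position m then predicts token m) and compare both models'
    # distributions over the tokens both directions predict (1 .. L-2).
    bwd_fwd_order = torch.flip(bwd_logits, dims=[1])
    p = F.softmax(fwd_logits[:, :-1], dim=-1)       # predicts tokens 1..L-2
    q = F.softmax(bwd_fwd_order[:, 1:], dim=-1)     # predicts tokens 1..L-2
    agreement = 0.5 * (p - q).abs().sum(-1).mean()  # total-variation distance

    return fwd_nll + bwd_nll + alpha * agreement
```

Per the paper, only the forward model is kept for plain left-to-right generation; for infilling, the two directions generate toward each other until they meet in the middle.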
- https://arxiv.org/abs/2303.06349
Resurrecting Recurrent Neural Networks for Long Sequences
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference.
Shows that carefully designed deep RNNs perform on par with SSMs on long-range reasoning tasks, with comparable speed.
(from https://twitter.com/arankomatsuzaki/status/1635453248252391427)
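The paper's main component is the Linear Recurrent Unit (LRU): a linear recurrence with a diagonal complex state matrix, a stability-preserving exponential parameterization, and an input normalization. A minimal NumPy sketch of one layer built on those ideas; the shapes, parameter names (`nu_log`, `theta_log`), and random initialization are mine rather than the paper's, and the sequential loop stands in for the parallel scan used for fast training:

```python
import numpy as np

def lru_layer(u, nu_log, theta_log, B, C, D):
    """One LRU-style layer: x_t = lam * x_{t-1} + gamma * (B @ u_t),
    y_t = Re(C @ x_t) + D * u_t, with diagonal complex lam."""
    # Exponential parameterization keeps |lam| < 1 (stable) for any nu_log.
    lam = np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))  # (N,)
    # Normalization so state variance stays bounded as |lam| -> 1; in the
    # paper this is a learned parameter initialized to this value.
    gamma = np.sqrt(1.0 - np.abs(lam) ** 2)                 # (N,)

    x = np.zeros_like(lam)
    ys = []
    for u_t in u:                          # sequential for clarity; the
        x = lam * x + gamma * (B @ u_t)    # diagonal linear recurrence can
        ys.append((C @ x).real + D * u_t)  # be computed as a parallel scan
    return np.stack(ys)

# Toy usage with random parameters: L steps, H channels, N hidden modes.
L, H, N = 64, 4, 8
rng = np.random.default_rng(0)
y = lru_layer(
    rng.standard_normal((L, H)),
    nu_log=rng.standard_normal(N),
    theta_log=rng.standard_normal(N),
    B=rng.standard_normal((N, H)) + 1j * rng.standard_normal((N, H)),
    C=rng.standard_normal((H, N)) + 1j * rng.standard_normal((H, N)),
    D=rng.standard_normal(H),
)
```

Because the recurrence is linear and diagonal, training can replace the loop with a parallel prefix scan, which is where the SSM-like training speed comes from.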