3/20 (Mon)
- https://ds-fusion.github.io/
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion
An automatic artistic-typography generator built on a diffusion-model backbone.
- https://github.com/THUDM/ChatGLM-6B/blob/main/README_en.md
GitHub - THUDM/ChatGLM-6B: An Open Bilingual Dialogue Language Model
- Open source
- Bilingual: Chinese + English
- Trained on 1T tokens
- Easily deployed on consumer GPUs (e.g., a single 2080 Ti)
- Runs with the Hugging Face transformers library (see the loading sketch below)
- Demo available
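A minimal loading sketch following the README's transformers usage (`model.chat` is ChatGLM's own chat helper, pulled in via `trust_remote_code`):

```python
from transformers import AutoTokenizer, AutoModel

# trust_remote_code loads ChatGLM's custom model class and chat() helper.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# For smaller GPUs, the README describes INT4 quantization:
# .half().quantize(4).cuda() instead of .half().cuda().
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```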
- https://arxiv.org/abs/2303.05759
An Overview on Language Models: Recent Developments and Outlook
Language modeling studies the probability distributions over strings of texts. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine translation, etc.
A nice overview of language models covering recent developments and future directions: linguistic units, structures, training methods, evaluation, and applications.
(from https://twitter.com/omarsar0/status/1635273656858460162)
- https://arxiv.org/abs/2303.07295
Meet in the Middle: A New Pre-training Paradigm
Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, assuming that the next token only depends on the preceding ones. However, this assumption ignores the potential benefits of using the full sequence information during training.
Meet in the Middle (MIM): a new pre-training paradigm. MIM (2.7B) outperforms CodeGen 16B, InCoder 6.7B, PaLM 540B, LLaMA 65B, and FIM 2.7B on code generation tasks. Read arxiv.org/abs/2303.07295 to see why MIM could be a new pre-training paradigm for left-to-right and infilling LMs.
(from https://twitter.com/WeizhuChen/status/1635498612938670080)
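As I read it, MIM co-trains a left-to-right and a right-to-left LM with shared weights and adds an agreement regularizer pulling their per-token predictions together. A minimal PyTorch sketch of that combined objective; the `model(ids, reverse=...)` interface and the `alpha` weight are hypothetical stand-ins, and the paper's exact agreement term may differ from the total-variation form used here:

```python
import torch
import torch.nn.functional as F

def mim_loss(model, tokens, alpha=0.1):
    # tokens: (B, L) token ids. model(ids, reverse=...) -> (B, L-1, V) logits
    # is a hypothetical interface; both directions share the same weights.
    inp, tgt = tokens[:, :-1], tokens[:, 1:]

    # Forward LM: position i predicts token i+1 from tokens <= i.
    fwd_logits = model(inp, reverse=False)
    fwd_nll = F.cross_entropy(fwd_logits.transpose(1, 2), tgt)

    # Backward LM: same weights, sequence processed right-to-left.
    rev = torch.flip(tokens, dims=[1])
    bwd_logits = model(rev[:, :-1], reverse=True)
    bwd_nll = F.cross_entropy(bwd_logits.transpose(1, 2), rev[:, 1:])

    # Agreement regularizer: flip backward logits back into forward order
    # (position m then predicts token m) and compare both models'
    # distributions over the tokens both directions predict (1 .. L-2).
    bwd_fwd_order = torch.flip(bwd_logits, dims=[1])
    p = F.softmax(fwd_logits[:, :-1], dim=-1)       # predicts tokens 1..L-2
    q = F.softmax(bwd_fwd_order[:, 1:], dim=-1)     # predicts tokens 1..L-2
    agreement = 0.5 * (p - q).abs().sum(-1).mean()  # total-variation distance

    return fwd_nll + bwd_nll + alpha * agreement
```

Per the paper, only the forward model is kept for plain left-to-right generation; for infilling, the two directions generate toward each other until they meet in the middle.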
- https://arxiv.org/abs/2303.06349
Resurrecting Recurrent Neural Networks for Long Sequences
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference.
Shows that carefully designed deep RNNs perform on par with SSMs on long-range reasoning tasks, with comparable speed.
(from https://twitter.com/arankomatsuzaki/status/1635453248252391427)
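The paper's main component is the Linear Recurrent Unit (LRU): a linear recurrence with a diagonal complex state matrix, a stability-preserving exponential parameterization, and an input normalization. A minimal NumPy sketch of one layer built on those ideas; the shapes, parameter names (`nu_log`, `theta_log`), and random initialization are mine rather than the paper's, and the sequential loop stands in for the parallel scan used for fast training:

```python
import numpy as np

def lru_layer(u, nu_log, theta_log, B, C, D):
    """One LRU-style layer: x_t = lam * x_{t-1} + gamma * (B @ u_t),
    y_t = Re(C @ x_t) + D * u_t, with diagonal complex lam."""
    # Exponential parameterization keeps |lam| < 1 (stable) for any nu_log.
    lam = np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))  # (N,)
    # Normalization so state variance stays bounded as |lam| -> 1; in the
    # paper this is a learned parameter initialized to this value.
    gamma = np.sqrt(1.0 - np.abs(lam) ** 2)                 # (N,)

    x = np.zeros_like(lam)
    ys = []
    for u_t in u:                          # sequential for clarity; the
        x = lam * x + gamma * (B @ u_t)    # diagonal linear recurrence can
        ys.append((C @ x).real + D * u_t)  # be computed as a parallel scan
    return np.stack(ys)

# Toy usage with random parameters: L steps, H channels, N hidden modes.
L, H, N = 64, 4, 8
rng = np.random.default_rng(0)
y = lru_layer(
    rng.standard_normal((L, H)),
    nu_log=rng.standard_normal(N),
    theta_log=rng.standard_normal(N),
    B=rng.standard_normal((N, H)) + 1j * rng.standard_normal((N, H)),
    C=rng.standard_normal((H, N)) + 1j * rng.standard_normal((H, N)),
    D=rng.standard_normal(H),
)
```

Because the recurrence is linear and diagonal, training can replace the loop with a parallel prefix scan, which is where the SSM-like training speed comes from.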