Schedule
Course Overview & Language Modeling Basics (August 26)
Pretraining - Architectures and Methods (September 9)
Slides:
Transformers / Pretraining / Finetuning
Reading Material:
Attention Is All You Need (2017) [link]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [link]
Optional readings:
The Illustrated Transformer [link]
The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) [link]
T5 [link]
The Illustrated GPT2 [link]
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [link]
BART [link]
RoBERTa: A Robustly Optimized BERT Pretraining Approach [link]
Efficiency - Training (LoRA) and Inference (Quantization) (September 16)
Slides:
Reading Material:
LoRA: Low-Rank Adaptation of Large Language Models (2021) [link]
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [link]
Optional readings:
Inference Algorithms (In-Context Learning and Chain-of-Thought) (September 23)
Slides:
Reading Material:
Language Models are Few-Shot Learners [link]
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models [link]
Optional readings:
Making Pre-trained Language Models Better Few-shot Learners (2021) [link]
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? (2022) [link]
Data Distributional Properties Drive Emergent In-Context Learning in Transformers (2022) [link]
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters (2022) [link]
List of recent CoT papers (2024) [link]
Instruction Following (September 30)
Slides:
Reading Material:
Finetuned Language Models Are Zero-Shot Learners [link]
Training language models to follow instructions with human feedback [link]
Optional readings:
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 [link]
The Llama 3 Herd of Models (Sec 4 and the relevant portion of Sec 5) [link]
Fundamental Limitations of Alignment in Large Language Models [link]
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback [link]
Scaling (October 7)
Slides:
Reading Material:
Training Compute-Optimal Large Language Models [link]
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters [link]
Optional readings:
Beyond RLHF (October 14)
Ethics and Safety (October 21)
Slides:
Reading Material:
Taxonomy of Risks posed by Language Models [link]
Jailbroken: How Does LLM Safety Training Fail? [link]
Optional readings:
Ethics in AI (UW Course reading list) [link]
Retrieval / Long Context (October 28)
Slides:
Reading Material:
Reliable, Adaptable, and Attributable Language Models with Retrieval [link]
How to Train Long-Context Language Models (Effectively) [link]
Optional readings:
Tokenization (November 4)
Slides:
Reading Material:
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP [link]
Optional readings:
The Foundations of Tokenization: Statistical and Computational Concerns [link]