Lectures
Course Overview / Introduction (Jan 14)
Content
- Course logistics
- What is natural language processing?
- What are the features of natural language?
- What do we want to do with NLP?
- What makes it hard?
Slides
Reading Material
Text Classification (Jan 16)
Content
- Defining features
- Building a rule-based classifier
- Training a logistic regression based classifier
- Evaluating classification
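The evaluation bullet above can be made concrete with a small sketch. The gold and predicted labels below are made up purely for illustration:

```python
# Precision / recall / F1 for binary classification, computed from scratch.
def evaluate(gold, pred, positive=1):
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [1, 1, 0, 0, 1]  # hypothetical gold labels
pred = [1, 0, 0, 1, 1]  # hypothetical classifier output
p, r, f = evaluate(gold, pred)
```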
Slides
Reading Material
Neural Network Basics (Jan 21)
Content
- Cross Entropy Loss
- Gradient Descent
- Components of a feedforward neural network
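The first two bullets can be illustrated together in a minimal sketch: a single sigmoid unit trained by stochastic gradient descent to minimize cross-entropy loss. The toy data and learning rate are made up:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# label equals the first feature, so the data is linearly separable
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [1, 0, 1, 0]
w, b, lr = [0.0, 0.0], 0.0, 0.5

for _ in range(200):
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        g = p - yi  # gradient of cross-entropy w.r.t. the pre-activation
        w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
        b -= lr * g

def loss(X, y):
    # average cross-entropy under the trained weights
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)
```

A feedforward network stacks several such units with a nonlinearity between layers; the same gradient-descent loop applies via backpropagation.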
Slides
Reading Material
Neural Nets: [Deep Averaging Networks]
Neural Nets: [Deep Learning with Pytorch: A 60 minute blitz]
Word Vectors (Jan 23)
Content
- Deep Averaging Network for Text Classification

- Lexical Semantics
- Distributional Semantics
- Evaluating Word Vectors
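A common intrinsic evaluation of word vectors is cosine similarity between word pairs. The 3-dimensional vectors below are made up for illustration; real embeddings (e.g. word2vec, GloVe) have hundreds of dimensions learned from corpora:

```python
import math

# hypothetical embeddings, chosen so that "king" and "queen" point the same way
vectors = {
    "king":  [0.8, 0.65, 0.1],
    "queen": [0.75, 0.7, 0.12],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

sim_royal = cosine(vectors["king"], vectors["queen"])
sim_fruit = cosine(vectors["king"], vectors["apple"])
```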
Slides
Reading Material
Neural Nets: [Deep Averaging Networks]
Neural Nets: [Deep Learning with Pytorch: A 60 minute blitz]
Word Vectors: [Eisenstein 3.3.4, 14.5-14.6]
Word Vectors: [Goldberg 5]
Word Vectors: [Mikolov+13 word2vec]
Word Vectors: [Pennington+14 GloVe]
Word Vectors: [Grave+17 fastText]
Word Vectors: [Bolukbasi+16 Gender]
Word Vectors / Language Modeling I (Jan 28)
Content
- Distributional Semantics
- Evaluating Word Vectors
- What is a language model
- How to evaluate a language model
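The standard evaluation metric is perplexity. A minimal sketch, using an add-one-smoothed unigram model on a made-up corpus (lower perplexity means the model is less surprised by the test text):

```python
import math
from collections import Counter

train = "the cat sat on the mat".split()
test = "the cat sat".split()

counts = Counter(train)
vocab = set(train)
N = len(train)

def prob(w):
    # add-one (Laplace) smoothing over the training vocabulary
    return (counts[w] + 1) / (N + len(vocab))

log_prob = sum(math.log2(prob(w)) for w in test)
perplexity = 2 ** (-log_prob / len(test))
```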
Slides
Reading Material
Language Modeling (Jan 30)
Content
- Feedforward Language Model
- Recurrent Neural LM, Attention
- Building blocks of a transformer
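The recurrent LM bullet can be sketched as a single recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t). The 2-dimensional weights below are made up; a real RNN LM learns them by backpropagation through time:

```python
import math

def rnn_step(h_prev, x, W_hh, W_xh):
    # one recurrent update: new hidden state from old state and current input
    return [math.tanh(sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
                      + sum(W_xh[i][j] * x[j] for j in range(len(x))))
            for i in range(len(h_prev))]

W_hh = [[0.5, -0.3], [0.2, 0.1]]   # hypothetical recurrent weights
W_xh = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical input weights
h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:  # a two-token "sentence" as one-hot inputs
    h = rnn_step(h, x, W_hh, W_xh)
```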
Slides
Reading Material
[Luong15]
Transformers (Feb 4)
Content
- Self attention
- Transformer Encoder
- Transformer Decoder (Cross Attention, Masked Self Attention)
- Impact of transformers
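The core operation of this lecture, scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, can be sketched on tiny made-up matrices:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # attention distribution over the keys
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# one query attending over two key/value pairs (values chosen for clarity)
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
result = attention(Q, K, V)
```

The query matches the first key more strongly, so the output is a weighted average dominated by the first value vector.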
Slides
Reading Material
Transformer LMs continued
Content
- Transformer Decoder (Cross Attention, Masked Self Attention)
- Training and inference from a decoder-only autoregressive LM
- Impact of transformers
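Inference from an autoregressive LM can be sketched as greedy decoding. The "model" here is a hypothetical bigram lookup table; a real decoder-only LM scores the whole prefix at each step:

```python
# toy next-token table standing in for an LM's argmax prediction
bigram = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "</s>"}

def generate(max_len=10):
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        tokens.append(bigram[tokens[-1]])  # greedy: always take the top token
    return tokens[1:-1]  # strip the begin/end markers

sentence = generate()
```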
Slides
Reading Material
Tokenization (Feb 11)
Content
- Word and character tokenization
- Byte pair encoding / WordPiece
- Unigram tokenizer
- Masked LMs (if time permits)
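The BPE bullet can be sketched as its training loop: repeatedly merge the most frequent adjacent symbol pair. The three-word corpus and merge count are made up; real BPE trains on large corpora and starts from bytes or characters:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    seqs = [tuple(w) for w in words]  # each word as a sequence of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            for a, b in zip(s, s[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]  # most frequent adjacent pair
        merges.append(a + b)
        new_seqs = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and s[i] == a and s[i + 1] == b:
                    out.append(a + b)   # apply the merge
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            new_seqs.append(tuple(out))
        seqs = new_seqs
    return merges, seqs

merges, segmented = bpe_merges(["lower", "lowest", "low"], num_merges=2)
```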
Slides
Reading Material
[J&M 2.5]
[“Let’s build the GPT Tokenizer” by Andrej Karpathy (practical tour of BPE with a focus on LLMs)]
Tokenization Contd. / Pretraining I (Feb 13)
Content
- Unigram tokenizer
- Pretraining / finetuning paradigm
- Masked LMs - BERT, RoBERTa, NeoBERT
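The masked-LM objective can be sketched as its input corruption step: replace roughly 15% of tokens with a [MASK] symbol, then train the model to recover the originals. The sentence and seed below are made up (BERT additionally sometimes substitutes random tokens or keeps the original):

```python
import random

random.seed(1)
tokens = "the quick brown fox jumps over the lazy dog".split()
# mask each token independently with probability 0.15
masked = [("[MASK]" if random.random() < 0.15 else t) for t in tokens]
```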
Slides
Reading Material
[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]
[RoBERTa]
Pretraining II (Feb 18)
Pretraining II (Feb 20)
Content
- Scaling
- Prompting
- In-context learning
- CoT
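In-context learning amounts to assembling labeled examples into the prompt itself. A minimal sketch with a made-up sentiment task and format (real prompt formats vary by task and model):

```python
# few-shot demonstrations followed by the query, as one prompt string
examples = [("great movie!", "positive"), ("terrible plot.", "negative")]
query = "what a fantastic film"

prompt = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in examples)
prompt += f"\nReview: {query}\nSentiment:"
```

The LM then continues the string, ideally completing the pattern with the correct label; chain-of-thought prompting extends each demonstration with intermediate reasoning steps before the answer.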
Slides
Reading Material
Section 7.3 in Jurafsky & Martin
Instruction Following (Feb 25)
Content
- Prompting (continued)
- Instruction Tuning (T0, FLAN)
- Evaluating Instruction Tuned LMs
Slides
Reading Material
Section 9.2-9.3 (before 9.3.2) in Jurafsky & Martin
Instruction Following / Preference Optimization (Feb 27)
Content
- Supervised Finetuning
- Reward Modeling
- Basics of RLHF / RLVR
- Direct Preference Optimization
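The DPO objective for a single preference pair can be sketched directly from its definition, L = -log sigma(beta * [(log p_theta(y_w|x) - log p_ref(y_w|x)) - (log p_theta(y_l|x) - log p_ref(y_l|x))]), where y_w is the chosen and y_l the rejected response. The log-probabilities below are made-up numbers:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # implicit reward margin between chosen and rejected, relative to the reference
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# policy prefers the chosen response more than the reference does,
# so the loss falls below log(2) (the value at zero margin)
loss = dpo_loss(logp_w=-5.0, logp_l=-9.0, ref_logp_w=-6.0, ref_logp_l=-8.0)
```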
Slides
Reading Material
Reinforcement Learning + LMs (Mar 4)
Content
- RLHF
- DPO
Slides
Reading Material
[Illustrating Reinforcement Learning from Human Feedback (RLHF)]
[DPO]
Parameter Efficient Finetuning (Mar 6)
Evaluation (Mar 11)
Content
- What is Benchmarking
- Open- and closed-ended evaluation
- LLM Evaluation Challenges
Slides
Reading Material
Efficiency II (Mar 13)
Spring Break (No class)
Multilinguality (Mar 25)
Content
- Linguistic Diversity
- Cross Lingual Transfer
- Multilingual Pretraining and Alignment
Slides
Reading Material
TBD
Multimodality (Mar 27)
Multimodality II (Apr 1)
Ethics (Apr 3)
Content
- Background on Ethics in AI/NLP
- Bias and Fairness
- Toxicity and Other Harmful Content
Slides
Reading Material
See Teams/Canvas
Ethics II (Apr 8)
Ethics Contd. / QA / Retrieval (Apr 10)
Content
- How to safeguard LMs from generating harmful content
- What is QA
- Retrieval
Slides
Reading Material
See Teams/Canvas
Retrieval Augmented LMs (Apr 15)
Content
- Question Answering
- Sparse and Dense Retrieval
- How to train retrieval augmented LMs
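Dense retrieval can be sketched as ranking documents by the inner product between a query embedding and document embeddings. The 2-dimensional vectors below are made up; real systems use learned encoders and approximate nearest-neighbor search:

```python
# hypothetical document embeddings
docs = {
    "d1": [0.9, 0.1],
    "d2": [0.2, 0.8],
}
query = [1.0, 0.0]  # hypothetical query embedding

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# rank documents by inner-product relevance score, highest first
ranked = sorted(docs, key=lambda d: dot(query, docs[d]), reverse=True)
```

Sparse retrieval (e.g. TF-IDF/BM25) follows the same ranking template, but with high-dimensional term-count vectors instead of learned embeddings.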
Slides
Reading Material
See Teams/Canvas