Lectures
Course Overview (Jan 8)
Content
- Course logistics
- What is natural language processing?
- What are the features of natural language?
- What do we want to do with NLP?
- What makes it hard?
Slides
Reading Material
Text Classification (Aug 29)
Content
- Defining features
- Building a rule-based classifier
- Training a logistic-regression-based classifier (see the sketch below)
- Evaluating classification
Slides
Reading Material
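To make the classification pipeline above concrete, here is a minimal sketch of a bag-of-words logistic regression classifier using scikit-learn; the toy texts, labels, and unigram features are illustrative assumptions, not the course's assignment setup.

```python
# Illustrative sketch (not the course's reference implementation):
# a bag-of-words logistic regression text classifier with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy data; in practice this would be a labeled sentiment/topic dataset.
train_texts = ["great movie", "terrible plot", "loved the acting", "boring and slow"]
train_labels = [1, 0, 1, 0]
test_texts = ["great acting", "slow and boring"]
test_labels = [1, 0]

# Defining features: simple unigram counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Training the classifier and evaluating with accuracy.
clf = LogisticRegression()
clf.fit(X_train, train_labels)
print(accuracy_score(test_labels, clf.predict(X_test)))
```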
Neural Network Basics (Sept 3)
Content
- Cross Entropy Loss
- Gradient Descent
- Components of a feedforward neural network (see the training sketch below)
Slides
Reading Material
Neural Nets: [Deep Averaging Networks]
Neural Nets: [Deep Learning with Pytorch: A 60 minute blitz]
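As a companion to the topics above, a minimal PyTorch sketch of a small feedforward network trained with cross-entropy loss and gradient descent; the dimensions, data, and hyperparameters are toy assumptions.

```python
# Illustrative sketch of the lecture's building blocks: a small feedforward
# network trained with cross-entropy loss and (stochastic) gradient descent.
# Shapes and hyperparameters are assumptions for the toy example.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32),   # input features -> hidden layer
    nn.ReLU(),           # nonlinearity
    nn.Linear(32, 3),    # hidden layer -> 3 output classes (logits)
)
loss_fn = nn.CrossEntropyLoss()              # softmax + negative log-likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 10)                      # a toy batch of 16 feature vectors
y = torch.randint(0, 3, (16,))               # toy gold labels

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)              # forward pass + loss
    loss.backward()                          # backprop computes gradients
    optimizer.step()                         # gradient descent update
```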
Word Vectors (Sept 5)
Content
- Deep Averaging Network for Text Classification
- Lexical Semantics
- Distributional Semantics
- Evaluating Word Vectors (see the cosine-similarity sketch below)
Slides
Reading Material
Neural Nets: [Deep Averaging Networks]
Neural Nets: [Deep Learning with Pytorch: A 60 minute blitz]
Word Vectors: [Eisenstein 3.3.4, 14.5-14.6]
Word Vectors: [Goldberg 5]
Word Vectors: [Mikolov+13 word2vec]
Word Vectors: [Pennington+14 GloVe]
Word Vectors: [Grave+17 fastText]
Word Vectors: [Bolukbasi+16 Gender]
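A minimal sketch of the similarity computation behind word-vector evaluation; the vectors below are made-up toy values, not real word2vec or GloVe embeddings.

```python
# Illustrative sketch: comparing word vectors with cosine similarity, the core
# operation behind similarity-based evaluation of embeddings. The vectors here
# are made-up toy values, not trained word2vec/GloVe vectors.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vectors = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.7, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

print(cosine(vectors["king"], vectors["queen"]))  # high: related words
print(cosine(vectors["king"], vectors["apple"]))  # low: unrelated words

# Word-analogy evaluation uses the same machinery: argmax over
# cosine(v, king - man + woman), excluding the query words themselves.
```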
Language Modeling (Sept 10)
Content
- What is a language model
- How to evaluate a language model
- How to build a language model: n-gram language models and a simple feedforward neural LM (see the sketch below)
Slides
Reading Material
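A minimal sketch of a bigram language model with add-one smoothing and a perplexity evaluation; the corpus and smoothing choice are toy assumptions, not the lecture's exact formulation.

```python
# Illustrative sketch: a bigram language model with add-one (Laplace) smoothing
# and perplexity evaluation. The corpus and smoothing choice are toy assumptions.
import math
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = set(corpus)

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    # P(word | prev) with add-one smoothing over the toy vocabulary.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def perplexity(tokens):
    # Average negative log-likelihood per predicted token, exponentiated.
    log_prob = sum(math.log(bigram_prob(p, w)) for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))

print(perplexity("the cat sat on the rug .".split()))
```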
Language Modeling (Sept 12)
Content
- Feedforward Language Model
- Recurrent Neural LM, Attention (see the RNN LM sketch below)
- Building blocks of a transformer
Slides
Reading Material
[Luong15]
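A minimal PyTorch sketch of a recurrent (LSTM) language model forward pass and loss, as a companion to the recurrent LM topic above; vocabulary size, dimensions, and data are toy assumptions.

```python
# Illustrative sketch: forward pass and loss of a small recurrent (LSTM)
# language model. Vocabulary size, dimensions, and the batch are toy assumptions.
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> logits over the next token at each position
        hidden_states, _ = self.rnn(self.embed(token_ids))
        return self.out(hidden_states)

model = RNNLM()
tokens = torch.randint(0, 1000, (2, 8))          # toy batch of token ids
logits = model(tokens[:, :-1])                   # predict each next token
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1))
```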
Transformers (Sept 17)
Content
- Self-attention (see the sketch below)
- Transformer Encoder
- Transformer Decoder (Cross Attention, Masked Self Attention)
- Impact of transformers
Slides
Reading Material
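A minimal sketch of single-head scaled dot-product self-attention, the core operation of the transformer encoder discussed above; the dimensions are toy assumptions, and real transformers add multiple heads, residual connections, and layer normalization (plus a causal mask for the decoder's masked self-attention).

```python
# Illustrative sketch: single-head scaled dot-product self-attention.
# Dimensions are toy assumptions; multi-head attention, residual connections,
# layer norm, and causal masking are omitted.
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); project tokens to queries, keys, and values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.size(-1))     # (seq_len, seq_len) similarities
    weights = F.softmax(scores, dim=-1)          # attention distribution per position
    return weights @ v                           # weighted sum of value vectors

d_model, d_head = 16, 8
x = torch.randn(5, d_model)                      # 5 toy token representations
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # torch.Size([5, 8])
```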
Tokenization (Sept 19)
Content
- Word and character tokenization
- Byte pair encoding / WordPiece (see the BPE sketch below)
- Unigram tokenizer
Slides
Reading Material
[J&M 2.5]
[“Let’s build the GPT Tokenizer” by Andrej Karpathy (practical tour of BPE with a focus on LLMs)]
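A minimal sketch of the byte pair encoding training loop on the classic toy word-frequency dictionary; real BPE/WordPiece tokenizers operate over bytes or characters on a large corpus, and the data and number of merges here are assumptions.

```python
# Illustrative sketch: a few BPE merge steps on a toy word-frequency dictionary.
# Words are space-separated symbol sequences with an end-of-word marker.
from collections import Counter

vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}

def get_pair_counts(vocab):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # Naive string replace; fine for this toy data, not a production tokenizer.
    merged = " ".join(pair)
    return {word.replace(merged, "".join(pair)): freq for word, freq in vocab.items()}

for _ in range(5):                               # learn 5 merges
    best = get_pair_counts(vocab).most_common(1)[0][0]
    vocab = merge_pair(best, vocab)
    print("merged:", best)
```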
Tokenization Contd. / Pretraining I (Sept 24)
Content
- Unigram tokenizer
- Pretraining / finetuning paradigm
- Masked LMs: BERT, RoBERTa, ELECTRA (see the masking sketch below)
Slides
Reading Material
[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]
[RoBERTa]
[ELECTRA]
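A minimal sketch of BERT-style masked-LM input corruption; the 15% masking rate and 80/10/10 split follow the BERT paper, while the token ids, mask id, and vocabulary size are toy assumptions.

```python
# Illustrative sketch: BERT-style masked-language-model input corruption.
# The 15% rate and 80/10/10 split follow the BERT paper; the tokenizer,
# tensors, and ids here are toy assumptions.
import torch

vocab_size, mask_id = 1000, 103                       # assumed toy-vocabulary ids
token_ids = torch.randint(5, vocab_size, (2, 12))     # pretend these are real tokens
labels = token_ids.clone()

# Choose ~15% of positions to predict; all other positions are ignored by the loss.
is_masked = torch.rand(token_ids.shape) < 0.15
labels[~is_masked] = -100                             # -100 = CrossEntropyLoss ignore_index

# Of the chosen positions: 80% -> [MASK], 10% -> random token, 10% -> unchanged.
rand = torch.rand(token_ids.shape)
token_ids[is_masked & (rand < 0.8)] = mask_id
random_tokens = torch.randint(5, vocab_size, token_ids.shape)
replace = is_masked & (rand >= 0.8) & (rand < 0.9)
token_ids[replace] = random_tokens[replace]
# The model is then trained with cross-entropy on the masked positions only.
```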