Lectures
Course Overview / Introduction (Jan 14)
Content
- Course logistics
- What is natural language processing?
- What are the features of natural language?
- What do we want to do with NLP?
- What makes it hard?
Slides
Reading Material
Text Classification (Jan 16)
Content
- Defining features
- Building a rule-based classifier
- Training a logistic-regression-based classifier (see the sketch below)
- Evaluating classification
Slides
Reading Material
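A minimal sketch of the classifier built in this lecture, assuming scikit-learn's CountVectorizer and LogisticRegression; the toy reviews and labels are invented for illustration.
```python
# Bag-of-words features + logistic regression for text classification.
# Toy data; real use needs a proper train/test split and tuning.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great movie, loved it", "terrible plot, boring",
         "what a fantastic film", "awful acting"]
labels = [1, 0, 1, 0]                      # 1 = positive, 0 = negative

vectorizer = CountVectorizer()             # define features: word counts
X = vectorizer.fit_transform(texts)

clf = LogisticRegression().fit(X, labels)  # train on the features
print(clf.predict(vectorizer.transform(["fantastic, loved it"])))
```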
Neural Network Basics (Jan 21)
Content
- Cross Entropy Loss
- Gradient Descent
- Components of a feedforward neural network (see the training sketch below)
Slides
Reading Material
Neural Nets: [Deep Averaging Networks]
Neural Nets: [Deep Learning with PyTorch: A 60 Minute Blitz]
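A compact PyTorch sketch tying the bullets above together: a small feedforward network trained with cross-entropy loss by gradient descent. Sizes, learning rate, and the random data are placeholders.
```python
# A feedforward network trained with cross-entropy loss by gradient descent.
import torch
import torch.nn as nn

model = nn.Sequential(           # components of a feedforward network:
    nn.Linear(10, 32),           #   linear layer
    nn.ReLU(),                   #   nonlinearity
    nn.Linear(32, 3),            #   output layer giving 3-class logits
)
loss_fn = nn.CrossEntropyLoss()  # softmax + negative log-likelihood
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 10)           # a batch of 8 feature vectors
y = torch.randint(0, 3, (8,))    # gold class labels

for step in range(100):          # gradient descent loop
    opt.zero_grad()
    loss = loss_fn(model(x), y)  # cross-entropy on the batch
    loss.backward()              # backprop computes gradients
    opt.step()                   # update parameters downhill
```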
Word Vectors (Jan 23)
Content
- Deep Averaging Network for Text Classification (see the sketch below)
- Lexical Semantics
- Distributional Semantics
- Evaluating Word Vectors
Slides
Reading Material
Neural Nets: [Deep Averaging Networks]
Neural Nets: [Deep Learning with PyTorch: A 60 Minute Blitz]
Word Vectors: [Eisenstein 3.3.4, 14.5-14.6]
Word Vectors: [Goldberg 5]
Word Vectors: [Mikolov+13 word2vec]
Word Vectors: [Pennington+14 GloVe]
Word Vectors: [Grave+17 fastText]
Word Vectors: [Bolukbasi+16 Gender]
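A minimal version of the Deep Averaging Network from the lecture and the reading above: average a sentence's word vectors, then classify the average with a feedforward net. Vocabulary size and dimensions are placeholders.
```python
# Deep Averaging Network: average word embeddings, then feedforward layers.
import torch
import torch.nn as nn

class DAN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.ff = nn.Sequential(
            nn.Linear(emb_dim, 100), nn.ReLU(),
            nn.Linear(100, n_classes),
        )

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        avg = self.emb(token_ids).mean(dim=1)  # average the word vectors
        return self.ff(avg)                    # class logits

logits = DAN()(torch.randint(0, 10000, (4, 12)))  # 4 sentences, 12 tokens each
```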
Word Vectors / Language Modeling I (Jan 28)
Content
- Distributional Semantics
- Evaluating Word Vectors
- What is a language model?
- How to evaluate a language model (see the perplexity sketch below)
Slides
Reading Material
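On evaluating a language model: the standard intrinsic metric is perplexity, the exponentiated average negative log-likelihood per token. A toy computation; the probabilities are invented, and in practice they come from the model on held-out text.
```python
# Perplexity: exp of the average negative log-likelihood per token.
import math

# p(w_i | w_<i) assigned by some LM to each token of a test sentence;
# these numbers are made up for illustration
token_probs = [0.2, 0.05, 0.1, 0.4, 0.3]

avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(f"perplexity = {math.exp(avg_nll):.2f}")   # lower is better
```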
Language Modeling (Jan 30)
Content
- Feedforward Language Model (see the sketch below)
- Recurrent Neural LM, Attention
- Building blocks of a transformer
Slides
Reading Material
[Luong15]
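A sketch of the fixed-window feedforward language model from this lecture: embed the previous few tokens, concatenate the embeddings, and predict a distribution over the next token. All sizes are illustrative.
```python
# Fixed-window feedforward LM: previous `window` tokens predict the next one.
import torch
import torch.nn as nn

class FFLM(nn.Module):
    def __init__(self, vocab=5000, emb=64, window=3, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.net = nn.Sequential(
            nn.Linear(window * emb, hidden), nn.Tanh(),
            nn.Linear(hidden, vocab),         # logits over the next token
        )

    def forward(self, context):               # context: (batch, window)
        e = self.emb(context).flatten(1)      # concatenate window embeddings
        return self.net(e)

next_token_logits = FFLM()(torch.randint(0, 5000, (2, 3)))
```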
Transformers (Feb 4)
Content
- Self-attention (see the sketch below)
- Transformer Encoder
- Transformer Decoder (Cross-Attention, Masked Self-Attention)
- Impact of transformers
Slides
Reading Material
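The core operation of this lecture in a few lines: single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d)) V, without learned projections, multiple heads, or masking, purely to show the shapes.
```python
# Single-head scaled dot-product self-attention.
import math
import torch

def self_attention(x):                    # x: (batch, seq_len, d)
    d = x.size(-1)
    q = k = v = x                         # real layers learn Q/K/V projections
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (batch, seq, seq)
    weights = scores.softmax(dim=-1)      # each position attends to all others
    return weights @ v                    # weighted sum of value vectors

out = self_attention(torch.randn(2, 5, 16))
```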
Transformer LMs Continued (Feb 6)
Content
- Transformer Decoder (Cross-Attention, Masked Self-Attention)
- Training and inference from a decoder-only autoregressive LM (see the decoding sketch below)
- Impact of transformers
Slides
Reading Material
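A sketch of inference from a decoder-only autoregressive LM: feed the growing prefix back in and pick the next token greedily. Here `lm` is a hypothetical stand-in for any module mapping token ids to next-token logits.
```python
# Greedy decoding from an autoregressive LM (`lm` is hypothetical).
import torch

def greedy_generate(lm, prefix_ids, max_new_tokens=20, eos_id=None):
    ids = list(prefix_ids)
    for _ in range(max_new_tokens):
        logits = lm(torch.tensor([ids]))        # (1, len(ids), vocab)
        next_id = int(logits[0, -1].argmax())   # most probable next token
        ids.append(next_id)                     # grow the prefix and repeat
        if next_id == eos_id:
            break
    return ids
```
Training uses the same model with teacher forcing: the targets are the inputs shifted one position, and masked self-attention keeps each position from attending to later ones.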
Tokenization (Feb 11)
Content
- Word and character tokenization
- Byte pair encoding / WordPiece (see the BPE sketch below)
- Unigram tokenizer
- Masked LMs (if time permits)
Slides
Reading Material
[J&M 2.5]
[“Let’s build the GPT Tokenizer” by Andrej Karpathy (practical tour of BPE with a focus on LLMs)]
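A bare-bones version of the BPE training loop covered above: repeatedly count adjacent symbol pairs and merge the most frequent one. Real tokenizers add byte-level handling, special tokens, and efficiency tricks.
```python
# Minimal BPE training: merge the most frequent adjacent symbol pair.
from collections import Counter

def train_bpe(words, num_merges):
    corpus = Counter(tuple(w) for w in words)   # words as symbol tuples
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in corpus.items():       # count adjacent pairs
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)        # most frequent pair
        merges.append(best)
        new_corpus = Counter()
        for word, freq in corpus.items():       # apply the merge everywhere
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1]); i += 2
                else:
                    out.append(word[i]); i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return merges

print(train_bpe(["low", "lower", "lowest", "newer", "wider"], 5))
```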
Tokenization Contd. / Pretraining I (Feb 13)
Content
- Unigram tokenizer
- Pretraining / finetuning paradigm
- Masked LMs: BERT, RoBERTa, NeoBERT (see the masking sketch below)
Slides
Reading Material
[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]
[RoBERTa]
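The data side of the masked-LM objective from the BERT/RoBERTa readings, as a sketch: hide roughly 15% of tokens and train the model to recover them. The [MASK] id and token ids are placeholders, and BERT's 80/10/10 mask/random/keep refinement is omitted for brevity.
```python
# Masked-LM data preparation: hide ~15% of tokens, predict the originals.
import random

MASK_ID = 103                        # placeholder id for [MASK]

def mask_tokens(token_ids, mask_prob=0.15):
    inputs, labels = [], []
    for tok in token_ids:
        if random.random() < mask_prob:
            inputs.append(MASK_ID)   # model sees [MASK] ...
            labels.append(tok)       # ... and must predict the original
        else:
            inputs.append(tok)
            labels.append(-100)      # conventional "ignore" label
    return inputs, labels

print(mask_tokens([7, 42, 19, 8, 77, 5, 23]))
```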
Pretraining II (Feb 18)
Pretraining II (Feb 20)
Content
- Scaling
- Prompting
- In-context learning (see the prompt sketch below)
- Chain-of-thought (CoT) prompting
Slides
Reading Material
[J&M 7.3]
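Prompting, in-context learning, and CoT require no gradient updates; they work by constructing the right input string. A toy few-shot prompt, with the template and examples invented for illustration:
```python
# In-context learning: labeled demonstrations prepended to the query,
# so the pretrained LM completes the pattern without any finetuning.
demos = [
    ("The movie was wonderful.", "positive"),
    ("A total waste of time.", "negative"),
]
query = "Surprisingly touching and well acted."

prompt = "".join(f"Review: {t}\nSentiment: {l}\n\n" for t, l in demos)
prompt += f"Review: {query}\nSentiment:"
print(prompt)   # send this string to an LM and read off the completion
```
A chain-of-thought variant would insert worked reasoning steps before each label so the model imitates the reasoning as well as the answer.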