Lectures

Course Overview / Introduction (Jan 14)

Content

  • Course logistics
  • What is natural language processing?
  • What are the features of natural language?
  • What do we want to do with NLP?
  • What makes it hard?

Slides

Course Overview

Reading Material

Text Classification (Jan 16)

Content

  • Defining features
  • Building a rule-based classifier
  • Training a logistic regression based classifier
  • Evaluating classification

Slides

Text Classification

Reading Material

Neural Network Basics (Jan 21)

Content

  • Cross Entropy Loss
  • Gradient Descent
  • Components of a feedforward neural network

Slides

Neural Network Basics

Reading Material

Word Vectors (Jan 23)

Content

  • Deep Averaging Network for Text Classification
  • Lexical Semantics
  • Distributional Semantics
  • Evaluating Word Vectors

Slides

Word Vectors

Reading Material

Word Vectors / Language Modeling I (Jan 28)

Content

  • Distributional Semantics
  • Evaluating Word Vectors
  • What is a language model
  • How to evaluate a language model

Slides

Language Modeling

Reading Material

[Eisenstein 6.1-6.2, 6.4]

Language Modeling (Jan 30)

Content

  • Feedforward Language Model
  • Recurrent Neural LM, Attention
  • Building blocks of a transformer

Slides

Neural LM

Reading Material

[J&M Chapter 8, 9]

[Eisenstein 6.3]

[Luong15]

[Illustrated Transformer]

Transformers (Feb 4)

Content

  • Self attention
  • Transformer Encoder
  • Transformer Decoder (Cross Attention, Masked Self Attention)
  • Impact of transformers

Slides

Transformers

Reading Material

[Illustrated Transformer]

[J&M Chapter 9]

[Attention is all you need]

Transformer LMs continued

Content

  • Transformer Decoder (Cross Attention, Masked Self Attention)
  • Training and inference from a decoder-only autoregressive LM
  • Impact of transformers

Slides

Tokenization

Reading Material

[Illustrated Transformer]

[J&M Chapter 9]

[Attention is all you need]

Tokenization (Feb 11)

Tokenization Contd. / Pretraining I (Feb 13)

Content

  • Unigram tokenizer
  • Pretraining / finetuning paradigm
  • Masked LMs - BERT, RoBERTa, NeoBERT

Slides

Masked LMs

Reading Material

[Illustrated BERT]

[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]

[RoBERTa]

Pretraining II (Feb 18)

Content

  • T5 / BART / UL2 / GPT2
  • Decoding strategies

Slides

Pretraining II

Reading Material

[What happened to BERT/T5]

[Decoding strategies]

Pretraining II (Feb 20)

Content

  • Scaling
  • Prompting
  • In-context learning
  • CoT

Slides

Pretraining II

Reading Material

Section 7.3 in Jurafsky & Martin

Instruction Following (Feb 25)

Content

  • Prompting (continued)
  • Instruction Tuning (T0, FLAN)
  • Evaluating Instruction Tuned LMs

Slides

Instruction Following

Reading Material

Section 9.2-9.3 (before 9.3.2) in Jurafsky & Martin

Instruction Following / Preference Optimization (Feb 27)

Content

  • Supervised Finetuning
  • Reward Modeling
  • Basics of RLHF / RLVR
  • Direct Preference Optimization

Slides

Learning from Preferences

Reading Material

J&M 9.3

RLHF Illustrated

RLHF Book

Reinforcement Learning + LMs (March 4)

Parameter Efficient Finetuning (Mar 6)

Content

  • LoRA
  • QLoRA

Slides

(Q)LoRA

Reading Material

[LoRA]

[QLoRA]

Evaluation (March 11)

Content

  • What is Benchmarking
  • Open- and closed-ended evaluation
  • LLM Evaluation Challenges

Slides

Benchmarking

Reading Material

[The Evolving Landscape of LLM Evaluation (for Quiz)]

Efficiency II (March 13)

Content

  • Speculative Decoding
  • Quantization, Distillation

Slides

Efficiency

Reading Material

TBA

Spring Break (No class)

Content

Enjoy!

Multilinguality (Oct 31)

Content

  • Linguistic Diversity
  • Cross Lingual Transfer
  • Multilingual Pretraining and Alignment

Slides

Multilingual

Reading Material

TBD

Multimodality (March 27)

Content

  • ViT
  • CLIP
  • Image + Text -> Text

Slides

Multimodal

Reading Material

Multimodality II (April 1)

Content

  • ViT
  • CLIP
  • Image + Text -> Text

Slides

Multimodal

Reading Material

Ethics (April 3)

Content

  • Background on Ethics in AI/NLP
  • Bias and Fairness
  • Toxicity and Other Harmful Content

Slides

Ethics

Reading Material

See Teams/Canvas

Ethics II (April 8)

Content

  • Bias and Fairness
  • Toxicity and Other Harmful Content

Slides

Ethics

Reading Material

See Teams/Canvas

Ethics contd / QA / Retrieval (April 10)

Content

  • How to safeguard LMs from generating harmful content
  • What is QA
  • Retrieval

Slides

Ethics

Reading Material

See Teams/Canvas

Retrieval Augmented LMs (April 15)

Content

  • Question Answering
  • Sparse and Dense Retrieval
  • How to train retrieval augmented LMs

Slides

Ethics

Reading Material

See Teams/Canvas

Language Agents (April 17)

Content

  • Agent conceptual framework

Slides

Agents

Reading Material

TBA