Lectures

Course Overview (Jan 8)

Content

  • Course logistics
  • What is natural language processing?
  • What are the features of natural language?
  • What do we want to do with NLP?
  • What makes it hard?

Slides

Course Overview

Reading Material

Text Classification (Jan 10)

Content

  • Defining features
  • Building a rule-based classifier
  • Training a logistic regression based classifier
  • Evaluating classification

Slides

Text Classification

Reading Material

Neural Network Basics (Jan 15)

Content

  • Cross Entropy Loss
  • Gradient Descent
  • Components of a feedforward neural network

Slides

Neural Network Basics

Reading Material

Word Vectors (Jan 17)

Content

  • Deep Averaging Network for Text Classification
  • Lexical Semantics
  • Distributional Semantics
  • Evaluating Word Vectors

Slides

Word Vectors

Reading Material

Language Modeling (Jan 22)

Content

  • What is a language model
  • How to evaluate a language model
  • How to build a language model: N-gram language model, a simple feedforward neural LM

Slides

Language Modeling

Reading Material

[Eisenstein 6.1-6.2, 6.4]

Language Modeling (Jan 24)

Content

  • Feedforward Language Model
  • Recurrent Neural LM, Attention
  • Building blocks of a transformer

Slides

Neural LM

Reading Material

[J&M Chapter 8, 9]

[Eisenstein 6.3]

[Luong15]

[Illustrated Transformer]

Transformers (Jan 29)

Content

  • Self attention
  • Transformer Encoder
  • Transformer Decoder (Cross Attention, Masked Self Attention)
  • Impact of transformers

Slides

Transformers

Reading Material

[Illustrated Transformer]

[J&M Chapter 9]

[Attention is all you need]

Tokenization (Jan 31)

Tokenization Contd. / Masked LMs (February 7)

Content

  • Unigram tokenizer
  • Pretraining / finetuning paradigm
  • Masked LMs - BERT, RoBERTa, ELECTRA

Slides

Masked LMs

Reading Material

[Illustrated BERT]

[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]

[RoBERTa]

[ELECTRA]

Pretraining II (February 12)

Content

  • T5 / BART / UL2 / GPT2
  • Decoding strategies

Slides

Pretraining II

Reading Material

[What happened to BERT/T5]

[Decoding strategies]

Pretraining III (February 14)

Content

  • Scaling
  • Prompting
  • In-context learning
  • CoT

Slides

Pretraining III

Reading Material

Instruction Following (February 19)

Content

  • Instruction Tuning (T0, FLAN)
  • Evaluating Instruction Tuned LMs
  • Basics of RLHF

Slides

Instruction Following

Reading Material

Preference Optimization (February 21)

Content

  • Reward Modeling
  • Basics of RLHF
  • Direct Preference Optimization

Slides

Learning from Preferences

Reading Material

[Illustrating Reinforcement Learning from Human Feedback (RLHF)]

[DPO]

[PPO vs DPO]

[Other resources]

Parameter Efficient Finetuning (February 26)

Content

  • LoRA
  • QLoRA

Slides

(q)lora

Reading Material

[LoRA]

[QLoRA]

Evaluation (February 28)

Content

  • What is Benchmarking
  • Open-ended and closed-ended evaluation
  • LLM Evaluation Challenges

Slides

benchmarking

Reading Material

[The Evolving Landscape of LLM Evaluation (for Quiz)]

Sequence Tagging (March 5)

Content

  • Why sequence tagging
  • HMMs
  • Viterbi

Slides

sequence tagging

Reading Material

TBA

Parsing (March 7)

Content

  • Constituency Parsing
  • CKY Algorithm
  • Dependency Parsing (Intro)
  • Semantic Parsing (Intro)

Slides

Parsing

Reading Material

TBA

Spring Break (March 12 & 14)

No Class

Interpretability (March 19)

Content

  • Global vs Local Explanation
  • Post hoc explanations (LIME, Gradient-based)
  • Probing

Slides

Interpret

Reading Material

TBA

Efficiency (March 21)

Content

  • Speculative Decoding, Flash Attention
  • Quantization, Pruning, Distillation

Slides

Efficiency

Reading Material

TBA

Multimodality (March 26)

Content

  • ViT
  • CLIP
  • Image + Text -> Text

Slides

Multimodal

Reading Material

Multimodality II (March 28)

Content

  • Multimodality continued (CLIP, Video, Audio)

Slides

Multimodal II

Reading Material

TBD

Retrieval (April 2)

Content

TBD

Slides

Retrieval

Agents (April 4)

Content

TBD

Slides

Agents

Multilinguality (April 9)

Content

  • Linguistic Diversity
  • Cross Lingual Transfer
  • Multilingual Pretraining and Alignment

Slides

Multilingual

Reading Material

TBD

Ethics (April 11)

Content

  • Background on Ethics in AI/NLP
  • Bias and Fairness
  • Toxicity and Other Harmful Content

Slides

Ethics

Reading Material

TBA