Topics to Look For

  • Word2Vec
  • GloVe
  • Sentiment classification
  • Debiasing
  • Attention; Self-Attention (a short self-attention sketch follows this list)
  • Transformers
  • Large Language Models (LLMs)
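
As a concrete reference point for the attention topics above, here is a minimal NumPy sketch of scaled dot-product self-attention. The sequence length, dimensions, and random projection matrices are illustrative assumptions only; in a real model these weights are learned.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    X is (seq_len, d_model); each W_* is (d_model, d_k).
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v            # queries, keys, values
    scores = (Q @ K.T) / np.sqrt(K.shape[-1])      # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # context-weighted mix of values

# Toy shapes and random weights (illustrative only; real models learn these).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)      # (4, 8): one vector per token
```

Each output row is a mixture of all the value vectors, weighted by how strongly that token attends to every other token; multi-head attention in the resources below simply runs several of these in parallel.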

Resources

Supplemental

  • The Illustrated Transformer by Jay Alammar — in particular, this covers self-attention and positional encoding with visual illustrations (a sketch of the sinusoidal positional encoding appears after this list). Note that it covers the original transformer, an encoder-decoder architecture, while LLMs are typically decoder-only, using only the “second half” of the original architecture. So it is a good resource for several details inside LLMs, but other resources will present a more accurate picture of an LLM as a whole.
  • Transformers and Large Language Models (from Speech and Language Processing) presents detailed but relatively clear explanations of attention, embeddings, transformers, and how they all fit together into large language models. Includes many good diagrams that aid in understanding most topics.
  • Decoder-Only Transformers: The Workhorse of Generative LLMs is another well-presented, detailed article explaining transformers, focusing on the decoder-only transformers used in large language models.
  • Attention Mechanisms and Transformers (from Dive into Deep Learning).
  • Intro to Large Language Models (YouTube, 1 hr) — by a very well-regarded expert and instructor. Focuses more on the high-level ideas, capabilities, and uses of LLMs than on the technical details.
  • LLM Visualization — an interesting interactive visualization and explanation of how large language models work. The visual comparison of the sizes of a few different LLMs alone is worth seeing, and the Table of Contents lets you jump directly to the explanation of any particular LLM component.
  • DeepLearning.ai: Sequence Models, Week 3 covers attention models (though not transformers) and some additional topics.
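
The positional encoding covered in The Illustrated Transformer (referenced above) can be sketched in a few lines. This is the fixed sinusoidal variant from the original transformer paper; many modern LLMs use learned or rotary position embeddings instead, so treat this as one concrete instance rather than the universal scheme.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encodings ("Attention Is All You Need").

    Assumes an even d_model.
    """
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # even embedding dimensions
    angles = pos / 10000 ** (i / d_model)          # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)                   # cosine on odd dimensions
    return pe

# These vectors are added to the token embeddings so that attention,
# which is otherwise order-blind, can distinguish token positions.
print(sinusoidal_positional_encoding(seq_len=16, d_model=8).shape)  # (16, 8)
```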