Resources: Sequence Models (2)

Topics to Look For

Word2Vec
GloVe
Sentiment classification
Debiasing
Attention; Self-Attention
Transformers
Large Language Models (LLMs)

Resources

DeepLearning.ai: Sequence Models, Week 2, Learning Word Embeddings and Applications using Word Embeddings (6 videos)
3Blue1Brown’s video series on LLMs:
- Large Language Models explained briefly (8min) — Very high-level. You’ll probably find that you already know the proper terminology for, and have a deeper understanding of, many of the topics he mentions, but this will also introduce some new ideas about how LLMs work and are trained.
- Visual intro to Transformers (27min) — This will likely repeat and reinforce concepts you’ve seen before. It’s important to make sure you have the right foundation for the following two videos.
- Visualizing Attention (26min)
- How might LLMs store facts (23min)
A jargon-free explanation of how AI large language models work (from Ars Technica) — reinforces some things we’ve already covered and then does an excellent job explaining new concepts from there (covering topics similar to those in the 3Blue1Brown videos).

Supplemental

The Illustrated Transformer by Jay Alammar — in particular, this covers self-attention and positional encoding with visual illustrations. Note that it covers the original transformer model, which is an encoder-decoder architecture, while LLMs are typically decoder-only architectures, using only the “second half” of the original transformer architecture. So this is a good resource for several details inside LLMs, but other resources will present a more accurate picture of an LLM as a whole.
Transformers and Large Language Models (from Speech and Language Processing) presents detailed but relatively clear explanations of attention, embeddings, transformers, and how they all fit together into large language models. Includes many good diagrams that aid in understanding most topics.
Decoder-Only Transformers: The Workhorse of Generative LLMs is another well-presented, detailed article explaining transformers, focusing on the decoder-only transformers used in large language models.
Attention Mechanisms and Transformers (from Dive into Deep Learning).
Intro to Large Language Models (YouTube, 1hr) — by a very well-regarded expert and instructor. Focuses more on the high-level ideas, capabilities, and uses of LLMs than on the technical details.
LLM Visualization — An interesting interactive visualization and explanation of how large language models function. It’s worth a look just for the visual comparison of the sizes of a few different LLMs, and it may be useful for exploring a particular LLM component (the Table of Contents lets you go directly to the explanation of each component).
DeepLearning.ai: Sequence Models, Week 3 covers attention models (though not transformers) and some additional topics.
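The note on The Illustrated Transformer contrasts the original encoder-decoder transformer with the decoder-only models behind most LLMs. One concrete place the difference shows up is the self-attention mask: decoder blocks use a causal mask so each token can only attend to itself and earlier tokens, while the original encoder lets every token attend to the whole sequence. Below is a minimal, pure-Python sketch of single-head scaled dot-product self-attention to make that concrete. It assumes toy list-of-lists matrices and leaves out batching, multiple heads, positional encoding, and the output projection; the function names are hypothetical and are not taken from any of the resources above.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(X, Wq, Wk, Wv, causal=True):
    """Single-head scaled dot-product self-attention (toy, unbatched).

    X: list of n token vectors, each of length d_model.
    Wq, Wk, Wv: projection matrices as lists of rows (d_k rows of length d_model).
    causal=True applies a decoder-style mask, so position i attends only to
    positions 0..i; causal=False lets every position attend to the whole sequence.
    """
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d_k = len(Q[0])
    outputs = []
    for i, q in enumerate(Q):
        # Which positions this token is allowed to look at.
        visible = range(i + 1) if causal else range(len(X))
        # Score each visible key against this query, scaled by sqrt(d_k).
        scores = [dot(q, K[j]) / math.sqrt(d_k) for j in visible]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out = [sum(w * V[j][c] for w, j in zip(weights, visible))
               for c in range(len(V[0]))]
        outputs.append(out)
    return outputs


# Usage: three 2-d "token embeddings" with identity projections (toy values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
causal_out = self_attention(X, I, I, I, causal=True)
full_out = self_attention(X, I, I, I, causal=False)
print(causal_out[0])  # first token can only attend to itself: [1.0, 0.0]
```

With the causal mask, the first token’s output is just its own value vector, and only the last token’s output matches the unmasked version, since it is the only position that can already see the whole sequence. In a real LLM the projection matrices are learned, and many such heads run in parallel inside each transformer block.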

Tags:

Courses:

cs387

modified: 2025-03-26