Transformers and Large Language Models (from Speech and Language Processing) presents detailed but relatively clear explanations of attention, embeddings, transformers, and how they all fit together into large language models. Includes many good diagrams that aid in understanding most topics.
Intro to Large Language Models (YouTube, 1 hr), by a very well-regarded expert and instructor, covers the high-level ideas, capabilities, and uses of LLMs more than the technical details.
LLM Visualization is an interesting interactive visualization and explanation of how large language models function. The visual comparison of the sizes of a few different LLMs is worth seeing on its own, and the Table of Contents lets you jump directly to the explanation of any particular LLM component you want to explore.
DeepLearning.ai: Sequence Models, Week 3 covers attention models (though not transformers) and some additional topics.