It doesn't cover RL-trained reasoning models like DeepSeek’s DeepSeek-R1 yet (to try R1, check out https://ollama.com/library/deepseek-r1), but it is a pretty good visualisation of how LLMs generally work:
“Operating System in 1,000 Lines”
A fun little tutorial that builds a RISC-V OS in about 1,000 lines of code.