Make a diff!
An interesting interview with one of the most intelligent researchers in AI:
Ilya Sutskever – We’re moving from the age of scaling to the age of research
(/ht HackerNews)
Home page of the AI 2027 scenario: https://ai-2027.com/
The authors address some of the critique, explaining “Why America Wins” (Ed.: would that even matter much in an endgame between AI and humans? I doubt it; it seems more like a bias.)
A YouTube video summary: AI 2027: A Realistic Scenario of AI Takeover
My take:
At the NVIDIA developer workshop days I attended last week, the following paper was highly recommended:
“s1: Simple test-time scaling” by Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto
(PDF)
GitHub project: https://github.com/simplescaling/s1
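The core trick in s1 is “budget forcing”: capping or extending the model’s chain of thought at inference time, e.g. by appending “Wait” when it tries to stop too early. A minimal sketch of that control loop, assuming a hypothetical `generate_until` decoding helper and approximating token counts with word counts (not the repo’s actual implementation):

```python
# Sketch of s1-style budget forcing: control test-time compute by forcing the
# model to keep thinking ("Wait") or to stop and answer. `generate_until` is a
# hypothetical stand-in for a real decoding call; token counts are approximated
# by word counts for brevity.

def budget_forced_answer(prompt, generate_until,
                         min_thinking_tokens=256, max_thinking_tokens=2048,
                         end_think="</think>"):
    thinking = ""
    while len(thinking.split()) < max_thinking_tokens:
        chunk = generate_until(prompt + thinking, stop=end_think)
        thinking += chunk.removesuffix(end_think)
        if len(thinking.split()) >= min_thinking_tokens:
            break                       # enough reasoning: let the model answer
        thinking += " Wait,"            # suppress the early stop, nudge more reasoning
    # force the end-of-thinking delimiter and ask for the final answer
    return generate_until(prompt + thinking + f" {end_think} Final answer:", stop=None)


if __name__ == "__main__":
    # toy generator so the sketch runs without a real model
    def fake_generate(text, stop):
        return " some reasoning steps </think>" if stop else " 3"
    print(budget_forced_answer("How many r's are in 'strawberry'?", fake_generate,
                               min_thinking_tokens=8))
```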
Not directly related, but an interesting application of the NVIDIA RAPIDS Accelerator that was also presented: https://aws.amazon.com/blogs/industries/accelerating-fraud-detection-in-financial-services-with-rapids-accelerator-for-apache-spark-on-aws/
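For context, enabling the RAPIDS Accelerator from PySpark mostly comes down to loading the plugin and a few spark.rapids.* settings; a rough sketch (the plugin jar must already be on the classpath, and the resource amounts and data path are placeholders, not taken from the AWS post):

```python
# Rough sketch: turning on the RAPIDS Accelerator for Apache Spark from PySpark.
# The plugin class and spark.rapids.* keys are the documented ones; jar location,
# resource amounts and the data path are placeholders that depend on your cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("fraud-detection-gpu")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")   # load the RAPIDS plugin
    .config("spark.rapids.sql.enabled", "true")              # run supported SQL ops on GPU
    .config("spark.executor.resource.gpu.amount", "1")       # placeholder: 1 GPU per executor
    .config("spark.task.resource.gpu.amount", "0.25")        # placeholder: 4 tasks share a GPU
    .getOrCreate()
)

# Regular DataFrame code is unchanged; supported operators are executed on the GPU.
tx = spark.read.parquet("s3://your-bucket/transactions/")    # placeholder path
tx.groupBy("card_id").count().orderBy("count", ascending=False).show(10)
```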
“The AI Hedge Fund is an educational model using AI agents to simulate trading decisions, emphasizing various investment strategies and analyses.”
Looks very interesting and inspiring, and @virattt seems to be a cool guy.
https://github.com/virattt/ai-hedge-fund/tree/main
Caveat/Disclaimer: “Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?” by Weixian Waylon Li, Hyeonjun Kim, Mihai Cucuringu, Tiejun Ma – 11 May 2025 (PDF)
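To make the agent idea a bit more concrete, here is a toy illustration (deliberately not the repo’s actual API): several hypothetical analyst agents each emit a buy/sell/hold signal with a confidence, and a simple confidence-weighted vote turns them into one decision.

```python
# Toy illustration of the multi-agent idea, NOT the ai-hedge-fund repo's API:
# hypothetical analyst agents emit signals, a weighted vote picks the action.
from dataclasses import dataclass

@dataclass
class Signal:
    agent: str
    action: str        # "buy" | "sell" | "hold"
    confidence: float  # 0..1

def aggregate(signals: list[Signal]) -> str:
    """Confidence-weighted vote across agents; weak consensus falls back to 'hold'."""
    score = sum(s.confidence if s.action == "buy" else -s.confidence
                for s in signals if s.action != "hold")
    if score > 0.25:
        return "buy"
    if score < -0.25:
        return "sell"
    return "hold"

signals = [
    Signal("valuation_agent", "buy", 0.7),    # hypothetical agent names and numbers
    Signal("sentiment_agent", "hold", 0.5),
    Signal("risk_agent", "sell", 0.3),
]
print(aggregate(signals))   # -> buy
```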
“This paper presents an in-depth analysis of the DeepSeek-V3/R1 model architecture and its AI infrastructure, highlighting key innovations such as Multi-head Latent Attention (MLA) for enhanced memory efficiency, Mixture of Experts (MoE) architectures for optimized computation-communication trade-offs, FP8 mixed-precision training to unlock the full potential of hardware capabilities, and a Multi-Plane Network Topology to minimize cluster-level network overhead.”
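To illustrate just one of those pieces, below is a toy top-k Mixture-of-Experts layer in PyTorch: a router scores the experts per token and only the top-k experts run for each token (sizes, k and the routing details are illustrative, not DeepSeek-V3’s actual configuration).

```python
# Toy top-k MoE routing sketch (illustrative sizes, not DeepSeek-V3's config):
# a router scores experts per token, and only the k best experts process it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                              # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)       # (tokens, n_experts)
        weights, idx = gate.topk(self.k, dim=-1)       # route each token to k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TinyMoE()(tokens).shape)   # torch.Size([10, 64])
```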
An interesting YT video by Andrej Karpathy
It doesn’t yet cover DeepSeek’s DeepSeek-R1 model(s) trained with RL (e.g. check out https://ollama.com/library/deepseek-r1 to try them), but it’s a pretty good visualisation of how LLMs generally work:
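To try the R1 model(s) locally as mentioned above, something like this should work via Ollama’s Python client (assuming the Ollama server is running and the model has been pulled first, e.g. with `ollama pull deepseek-r1`):

```python
# Minimal way to try DeepSeek-R1 locally through Ollama's Python client.
# Assumes the Ollama server is running and the model was pulled first,
# e.g. `ollama pull deepseek-r1`.
import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Why is the sky blue? Think step by step."}],
)
print(response["message"]["content"])
```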