NVIDIA developer workshop: “s1: Simple test-time scaling” paper

At the NVIDIA developer workshop days I attended last week, the following paper was highly recommended:

“s1: Simple test-time scaling” by Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto
(PDF)

GitHub project: https://github.com/simplescaling/s1

Not directly related, but an interesting application of the NVIDIA RAPIDS Accelerator that was also presented: https://aws.amazon.com/blogs/industries/accelerating-fraud-detection-in-financial-services-with-rapids-accelerator-for-apache-spark-on-aws/

An open-source AI hedge fund team by @virattt

“The AI Hedge Fund is an educational model using AI agents to simulate trading decisions, emphasizing various investment strategies and analyses.”

Looks very interesting and inspiring, and @virattt seems to be a cool guy.

https://github.com/virattt/ai-hedge-fund/tree/main

Caveat/Disclaimer: “Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?” by Weixian Waylon Li, Hyeonjun Kim, Mihai Cucuringu, Tiejun Ma – 11 May 2025 (PDF)

 

Paper by DeepSeek: “Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures”

“This paper presents an in-depth analysis of the DeepSeek-V3/R1 model architecture and its AI infrastructure, highlighting key innovations such as Multi-head Latent Attention (MLA) for enhanced memory efficiency, Mixture of Experts (MoE) architectures for optimized computation-communication trade-offs, FP8 mixed-precision training to unlock the full potential of hardware capabilities, and a Multi-Plane Network Topology to minimize cluster-level network overhead.”

Paper: https://www.alphaxiv.org/abs/2505.09343