We have recently trained our first 100M token context model: LTM-2-mini. 100M tokens equals ~10 million lines of code or ~750 novels.
For each decoded token, LTM-2-mini's sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for a 100M token context window.
The contrast in memory requirements is even larger -- running Llama 3.1 405B with a 100M token context requires 638 H100s per user just to store a single 100M token KV cache. In contrast, LTM requires a small fraction of a single H100's HBM per user for the same context.
— Magic AI
Recent articles
- Qwen3-4B-Thinking: "This is art - pelicans don't ride bikes!" - 10th August 2025
- My Lethal Trifecta talk at the Bay Area AI Security Meetup - 9th August 2025
- The surprise deprecation of GPT-4o for ChatGPT consumers - 8th August 2025