
Introducing Odyssey-2 Max: Scaled World Simulation
Today we’re introducing Odyssey-2 Max, our largest, most powerful general world model yet.

Oliver Cameron
April 21st, 2026
Today we’re introducing Odyssey-2 Max, our largest, most powerful general-purpose world model yet. Odyssey-2 Max materially advances the state of the art in the physical accuracy of world models. Next-token prediction unlocked symbolic intelligence in language models; Odyssey-2 Max demonstrates that next-state prediction at scale leads to high-fidelity world simulation.
Simulating Open-Ended Futures and Physical Dynamics
World models are a new form of multimodal intelligence, distinct from language, image, and video models. They learn to reason about how the world evolves by training directly on visual observations of real-world action, rather than on its compressed reflection in text. We believe this is true multimodal intelligence.
The defining capability of world models is their ability to simulate open-ended futures via continuous, interactive rollouts that evolve with actions in real time. Bidirectional video models like Sora, Veo, Kling, and Runway cannot do this. They generate past, present, and future jointly from a prompt fixed in advance—a structure that rules out real-time interaction, since future frames would have to condition on actions the user has not yet taken. A world model must instead be causal, predicting each state from prior states and actions. This autoregressive formulation is the foundation of the Odyssey-2 series.
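In code, this causal formulation reduces to a simple rollout loop: each new state is sampled conditioned only on past states and the action just taken. The sketch below is our own minimal illustration (the model, names, and shapes are hypothetical, not Odyssey's), with a toy stand-in for the learned predictor.

```python
import numpy as np

def rollout(model, initial_state, get_action, num_steps):
    """Autoregressive world-model rollout: each state depends only on the
    history of states and actions, never on inputs the user hasn't given yet."""
    states, actions = [initial_state], []
    for _ in range(num_steps):
        action = get_action(states[-1])                  # user reacts to the latest frame
        next_state = model(states, actions + [action])   # causal next-state prediction
        states.append(next_state)
        actions.append(action)
    return states

# Toy stand-in model: next state = previous state + action (illustrative only).
toy_model = lambda states, actions: states[-1] + actions[-1]
traj = rollout(toy_model, np.zeros(3), lambda s: np.ones(3), num_steps=4)
print(traj[-1])  # → [4. 4. 4.]
```

The contrast with bidirectional video models is that nothing inside the loop ever conditions on a future action, which is what makes real-time interaction possible.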
Rollout coherence requires world models to learn physics: the model must remain stable as it rolls forward step by step, or its predictions drift and collapse. This pressure forces the model to internalize how objects move, interact, and change, yielding an implicit simulation of physical processes as a consequence of next-state prediction. As these models scale, the quality of this simulation increases, enabling applications in science, robotics, gaming, defense, and healthcare.
Odyssey-2 Max achieves the highest physics score among evaluated world models—all while running in real time. To evaluate the physical accuracy of world models, we follow common practice and use VBench 2, a benchmark designed to assess the faithfulness of generated video. Its physics sub-score measures the accurate modelling of mechanics, thermotics, materials, and multi-view consistency. Additionally, we evaluate on the physics-modelling subset of the commonly used Physical AI benchmark (PAI-Bench).
Performance on VBench 2's Physics Benchmark

Physics performance improves with scale: Odyssey-2 Max, at roughly 3× the size of Odyssey-2 Pro, lifted VBench 2 physics performance from 49.67 to 58.52 and PAI-Bench physics performance from 91.67 to 93.02. This suggests that increasingly consistent dynamics emerge from next-state prediction under a causal training regime as model size and compute scale. Similar improvements can be observed on other PAI-Bench dimensions, such as motion smoothness, subject consistency, and background consistency.
| Model | Domain | Generation Time | VBench 2 Physics | PAI-Bench Physics | PAI-Bench Subject Consistency | PAI-Bench Background Consistency | PAI-Bench Motion Smoothness | PAI-Bench Image Quality |
|---|---|---|---|---|---|---|---|---|
| Odyssey-2 Max | General World Model | 120+ seconds of generation | 58.52 | 93.02 | 94.15 | 94.08 | 99.10 | 71.17 |
| Odyssey-2 Pro | General World Model | 120+ seconds of generation | 49.67 | 91.67 | 94.67 | 95.84 | 99.50 | 67.46 |
| Odyssey-2 | General World Model | 120+ seconds of generation | 48.58 | 89.50 | 93.40 | 92.30 | 99.40 | 68.20 |
| Cosmos-Predict2.5-14B | Physical AI World Model | 30 seconds of generation | 44.92 | 93.50 | 94.80 | 99.10 | 70.00 | 70.00 |
| Cosmos-Predict2-14B | Physical AI World Model | 30 seconds of generation | 39.22 | 89.20 | 89.60 | 92.80 | 98.00 | 67.50 |
| Cosmos-Predict2.5-2B | Physical AI World Model | 30 seconds of generation | 35.61 | 91.70 | 92.30 | 94.20 | 99.10 | 70.70 |
| LingBot-World-Fast | Gaming World Model | 120+ seconds of generation | N/A | 92.19 | 91.42 | 93.88 | 98.14 | 68.05 |
Training Our Largest World Model Yet
We adapt an autoregressive diffusion transformer (AR DiT) architecture for Odyssey-2 Max. Notable design decisions and innovations include:
Proprietary KV cache
We enable real-time inference, and training with full backpropagation, on sequences up to 20x longer than prior work. Many real-world dynamics only become predictable over longer horizons, so extending context is necessary to capture them.
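The details of our cache are proprietary, but the general idea behind any fixed-budget KV cache can be sketched as follows (our own simplification, not the Odyssey-2 Max implementation): new key/value pairs are appended each step, the oldest are evicted past a budget, and per-step attention cost therefore stays constant as the rollout grows.

```python
from collections import deque
import numpy as np

class RollingKVCache:
    """Fixed-budget key/value cache: per-step attention cost stays constant
    because entries older than max_len are evicted on append."""
    def __init__(self, max_len):
        self.keys = deque(maxlen=max_len)
        self.values = deque(maxlen=max_len)

    def append(self, k, v):
        self.keys.append(k)      # deque(maxlen=...) evicts the oldest entry
        self.values.append(v)

    def attend(self, query):
        K = np.stack(self.keys)                    # (T, d), T <= max_len
        V = np.stack(self.values)                  # (T, d)
        scores = K @ query / np.sqrt(len(query))   # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                   # softmax over cached steps
        return weights @ V                         # attention over context only

cache = RollingKVCache(max_len=4)
for t in range(6):                                 # 6 steps, only last 4 survive
    cache.append(np.full(8, float(t)), np.full(8, float(t)))
out = cache.attend(np.ones(8))
print(len(cache.keys))  # → 4
```

A real implementation would cache per-layer, per-head tensors on the accelerator; the eviction policy is what bounds memory and keeps inference real-time.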
Causal attention with local and global context
We preserve fine-grained detail while maintaining temporal coherence over extended horizons.
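One common way to realize this combination (a sketch under our own assumptions; Odyssey's exact attention pattern is not published, and the window size here is hypothetical) is a causal mask where each position attends to a recent local window plus a few designated global positions:

```python
import numpy as np

def local_global_causal_mask(seq_len, window, global_idx):
    """True where position i may attend to position j: j must be in the past
    (causal), and either within the local window or a designated global token."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        for j in range(i + 1):              # causal: only j <= i is reachable
            local = i - j < window          # fine-grained recent context
            is_global = j in global_idx     # long-range coherence anchors
            mask[i, j] = local or is_global
    return mask

m = local_global_causal_mask(seq_len=8, window=3, global_idx={0})
# Position 7 sees its 3 most recent predecessors plus global position 0.
print(m[7])  # → [ True False False False False  True  True  True]
```

Local attention preserves detail cheaply; the sparse global positions carry scene-level state across the extended horizon.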
Open-ended interaction
We condition on latent-space embeddings, which allows us to handle arbitrary forms of input action. Applying conditioning per generated latent enables precise, real-time control.
Inference-aware modelling
We incorporate roofline estimates from the very beginning, ensuring our final model can be served in real time on the target inference hardware.
Flow matching in continuous latent space
We utilize continuous flow matching, yielding high-fidelity simulations at scale while avoiding the quality ceiling of discrete tokenization.
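The core of a flow-matching objective can be sketched in a few lines (our own minimal, rectified-flow-style version; Odyssey's exact parameterization is not public): interpolate along a straight path between noise and data, and regress the model's predicted velocity against the path's constant velocity.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x1):
    """Rectified-flow-style objective: sample x_t = (1-t)*x0 + t*x1 between
    noise x0 and data x1, and regress the target velocity (x1 - x0) at (x_t, t)."""
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # per-sample time in [0, 1]
    xt = (1 - t) * x0 + t * x1               # point on the straight path
    target = x1 - x0                         # constant velocity of that path
    pred = model(xt, t)
    return np.mean((pred - target) ** 2)

# An untrained model (zero velocity everywhere) has positive loss; training
# drives its prediction toward the target velocity field.
x1 = rng.standard_normal((4, 2))
loss = flow_matching_loss(lambda xt, t: np.zeros_like(xt), x1)
print(loss > 0)  # → True
```

Because the latents stay continuous, the regression target is smooth and there is no discrete codebook to cap visual fidelity.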
Few-step denoising
We distill our model to generate high-quality visuals in a small number of denoising steps, making real-time autoregressive rollout tractable.
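Once a velocity field is learned, few-step generation amounts to integrating the flow from noise to data with a handful of Euler steps. The sketch below uses an illustrative hand-written field, not our distilled sampler:

```python
import numpy as np

def sample(velocity, x0, num_steps=4):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (sample)
    with a small, fixed number of Euler steps."""
    x, dt = x0.copy(), 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)   # one denoising step
    return x

# Illustrative constant field pointing straight from the start at a target:
start = np.zeros(2)
target = np.array([1.0, 2.0])
out = sample(lambda x, t: target - start, start, num_steps=4)
print(out)  # → [1. 2.]: a straight flow is integrated exactly, even in few steps
```

Distillation straightens the learned flow so that a small step count stays accurate, which is what makes real-time autoregressive rollout tractable.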
We trained on several hundred NVIDIA Blackwell (B200) GPUs, using a highly optimized orchestration pipeline that combines advanced model-parallelism techniques across multiple stages to maximize hardware utilization and throughput. We expect this training paradigm to evolve rapidly over future model generations, given the current pace of research in world models.
Stage 1 training
General visual dynamics. Large-scale video pretraining establishes a base understanding of how the world evolves.
Stage 2 training
Interaction and task conditioning. The model learns to respond to actions and task-specific signals.
Stage 3 training
Long-horizon stability. A final phase trains the system for stable autoregressive operation under extended rollout, where small per-step errors would otherwise accumulate.
Observations From Scaling World Simulation
Odyssey-2 Max is the third model in the Odyssey-2 family, scaling model size, data, and training, with 3x the parameter count and 10x training compute compared to Odyssey-2 Pro. This increase in scale results in the emergence of behaviors not observed in smaller models.
Pre-scale vs. post-scale video comparisons demonstrate these emergent capabilities: learning physical processes, biomechanics, physical dynamics, human behaviors, and interaction.
Odyssey-2 Max, Available Now in Private Beta
Odyssey-2 Max is an early but substantial step toward general world models that can simulate and interact with the world in real time. It demonstrates strong performance across physics, interaction, and long-horizon stability, but meaningful work remains.
We see Odyssey-2 Max as a form of pretrained physical intelligence: the equivalent of a human who has spent years observing and interacting with the world, just before learning to drive. Or, in language-model terms, GPT-2 just before the transition to ChatGPT.
We’re excited to make Odyssey-2 Max available in private beta to partners working on frontier applications, particularly in robotics, gaming, simulation, defense, and interactive systems. If you’re interested in exploring applications for world models, please get in touch.
The Team That Brought This to Life
Odyssey-2 Max was made possible by the incredible Odyssey team: Ahmad Nazeri, Ahmet Hamdi Guzel, Alexandra Chan, Amogh Adishesha, Andrew Trout, Andy Kolkhorst, Aravind Kaimal, Ben Graham, Derek Sarshad, Fabian Güra, Finley Code, James Grieve, Jeff Hawke, Jenny Seidenschwarz, Jesse Allardice, Jessica Inman, Jonathan Sadeghi, Kaiwen Guo, Kristy McDonough, Nicolas Griffiths, Nima Rezaeian, Oliver Cameron, Renee Huang, Richard Shen, Robin Tweedie, Sarah King, Sirish Srinivasan, Tobiah Rex, Vighnesh Birodkar, Vinh-Dieu Lam, Zygmunt Łenyk. Join us!
