
Introducing Odyssey-2 Max: Scaled World Simulation
Today we’re introducing Odyssey-2 Max, our largest, most powerful general world model yet.

Oliver Cameron
April 21st, 2026
Today we’re introducing Odyssey-2 Max, our largest, most powerful general-purpose world model yet. Odyssey-2 Max materially advances the state of the art in the physical accuracy of world models. Next-token prediction unlocked symbolic intelligence in language models; Odyssey-2 Max demonstrates that next-state prediction at scale leads to high-fidelity world simulation.
Simulating Open-Ended Futures and Physical Dynamics
World models are a new form of multimodal intelligence, distinct from language, image, and video models. They learn to reason about how the world evolves by training directly on visual observations of real-world action, rather than on its compressed reflection in text. We believe this is true multimodal intelligence.
The defining capability of world models is their ability to simulate open-ended futures via continuous, interactive rollouts that evolve with actions in real time. Bidirectional video models like Sora, Veo, Kling, and Runway cannot do this. They generate past, present, and future jointly from a prompt fixed in advance—a structure that rules out real-time interaction, since future frames would have to condition on actions the user has not yet taken. A world model must instead be causal, predicting each state from prior states and actions. This autoregressive formulation is the foundation of the Odyssey-2 series.
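In code, this causal formulation reduces to a simple rollout loop: each new state is sampled conditioned only on past states and the action just taken. The sketch below is our own minimal illustration (the model, names, and shapes are hypothetical, not Odyssey's), with a toy stand-in for the learned predictor.

```python
import numpy as np

def rollout(model, initial_state, get_action, num_steps):
    """Autoregressive world-model rollout: each state depends only on the
    history of states and actions, never on inputs the user hasn't given yet."""
    states, actions = [initial_state], []
    for _ in range(num_steps):
        action = get_action(states[-1])                  # user reacts to the latest frame
        next_state = model(states, actions + [action])   # causal next-state prediction
        states.append(next_state)
        actions.append(action)
    return states

# Toy stand-in model: next state = previous state + action (illustrative only).
toy_model = lambda states, actions: states[-1] + actions[-1]
traj = rollout(toy_model, np.zeros(3), lambda s: np.ones(3), num_steps=4)
print(traj[-1])  # → [4. 4. 4.]
```

The contrast with bidirectional video models is that nothing inside the loop ever conditions on a future action, which is what makes real-time interaction possible.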
Rollout coherence requires world models to learn physics: the model must remain stable as it rolls forward step by step, or its predictions drift and collapse. This pressure forces the model to internalize how objects move, interact, and change, yielding an implicit simulation of physical processes as a consequence of next-state prediction. As these models scale, the quality of this simulation increases, enabling applications in science, robotics, gaming, defense, and healthcare.
Odyssey-2 Max achieves the highest physics score among evaluated world models—all while running in real time. To evaluate the physical accuracy of world models, we follow common practice and use VBench 2, a benchmark designed to assess the faithfulness of generated video. Its physics sub-score measures the accurate modelling of mechanics, thermotics, materials, and multi-view consistency. Additionally, we evaluate on the physics-modelling subset of the commonly used Physical AI benchmark (PAI-Bench).
Performance on VBench 2's Physics Benchmark

Physics performance improves with scale: Odyssey-2 Max, at roughly 3× the size of Odyssey-2 Pro, lifted VBench 2 physics performance from 49.67 to 58.52 and PAI-Bench physics performance from 91.67 to 93.02. This suggests that increasingly consistent dynamics emerge from next-state prediction under a causal training regime as model size and compute scale. Similar improvements can be observed on other PAI-Bench dimensions, such as motion smoothness, subject consistency, and background consistency.
| Model | Domain | Generation Time | VBench 2 Physics | PAI-Bench Physics | PAI-Bench Subject Consistency | PAI-Bench Background Consistency | PAI-Bench Motion Smoothness | PAI-Bench Image Quality |
|---|---|---|---|---|---|---|---|---|
| Odyssey-2 Max | General World Model | 120+ seconds of generation | 58.52 | 93.02 | 94.15 | 94.08 | 99.10 | 71.17 |
| Odyssey-2 Pro | General World Model | 120+ seconds of generation | 49.67 | 91.67 | 94.67 | 95.84 | 99.50 | 67.46 |
| Odyssey-2 | General World Model | 120+ seconds of generation | 48.58 | 89.50 | 93.40 | 92.30 | 99.40 | 68.20 |
| Cosmos-Predict2.5-14B | Physical AI World Model | 30 seconds of generation | 44.92 | 93.50 | 94.80 | 99.10 | 70.00 | 70.00 |
| Cosmos-Predict2-14B | Physical AI World Model | 30 seconds of generation | 39.22 | 89.20 | 89.60 | 92.80 | 98.00 | 67.50 |
| Cosmos-Predict2.5-2B | Physical AI World Model | 30 seconds of generation | 35.61 | 91.70 | 92.30 | 94.20 | 99.10 | 70.70 |
| LingBot-World-Fast | Gaming World Model | 120+ seconds of generation | N/A | 92.19 | 91.42 | 93.88 | 98.14 | 68.05 |
Training Our Largest World Model Yet
We adapt an autoregressive diffusion transformer (AR DiT) architecture for Odyssey-2 Max. Notable design decisions and innovations include:
Proprietary KV cache
We enable real-time inference, and training with full backpropagation, on sequences up to 20x longer than prior work. Many real-world dynamics only become predictable over longer horizons, so extending context is necessary to capture them.
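The details of our cache are proprietary, but the general idea behind any fixed-budget KV cache can be sketched as follows (our own simplification, not the Odyssey-2 Max implementation): new key/value pairs are appended each step, the oldest are evicted past a budget, and per-step attention cost therefore stays constant as the rollout grows.

```python
from collections import deque
import numpy as np

class RollingKVCache:
    """Fixed-budget key/value cache: per-step attention cost stays constant
    because entries older than max_len are evicted on append."""
    def __init__(self, max_len):
        self.keys = deque(maxlen=max_len)
        self.values = deque(maxlen=max_len)

    def append(self, k, v):
        self.keys.append(k)      # deque(maxlen=...) evicts the oldest entry
        self.values.append(v)

    def attend(self, query):
        K = np.stack(self.keys)                    # (T, d), T <= max_len
        V = np.stack(self.values)                  # (T, d)
        scores = K @ query / np.sqrt(len(query))   # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                   # softmax over cached steps
        return weights @ V                         # attention over context only

cache = RollingKVCache(max_len=4)
for t in range(6):                                 # 6 steps, only last 4 survive
    cache.append(np.full(8, float(t)), np.full(8, float(t)))
out = cache.attend(np.ones(8))
print(len(cache.keys))  # → 4
```

A real implementation would cache per-layer, per-head tensors on the accelerator; the eviction policy is what bounds memory and keeps inference real-time.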
Causal attention with local and global context
We preserve fine-grained detail while maintaining temporal coherence over extended horizons.
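One common way to realize this combination (a sketch under our own assumptions; Odyssey's exact attention pattern is not published, and the window size here is hypothetical) is a causal mask where each position attends to a recent local window plus a few designated global positions:

```python
import numpy as np

def local_global_causal_mask(seq_len, window, global_idx):
    """True where position i may attend to position j: j must be in the past
    (causal), and either within the local window or a designated global token."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        for j in range(i + 1):              # causal: only j <= i is reachable
            local = i - j < window          # fine-grained recent context
            is_global = j in global_idx     # long-range coherence anchors
            mask[i, j] = local or is_global
    return mask

m = local_global_causal_mask(seq_len=8, window=3, global_idx={0})
# Position 7 sees its 3 most recent predecessors plus global position 0.
print(m[7])  # → [ True False False False False  True  True  True]
```

Local attention preserves detail cheaply; the sparse global positions carry scene-level state across the extended horizon.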
Open-ended interaction
We condition on latent-space embeddings, which allows us to handle arbitrary forms of input action. Applying conditioning per generated latent enables precise, real-time control.
Inference-aware modelling
We incorporate roofline estimates from the very beginning, ensuring our final model can be served in real time on the target inference hardware.
Flow matching in continuous latent space
We utilize continuous flow matching, yielding high-fidelity simulations at scale while avoiding the quality ceiling of discrete tokenization.
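The core of a flow-matching objective can be sketched in a few lines (our own minimal, rectified-flow-style version; Odyssey's exact parameterization is not public): interpolate along a straight path between noise and data, and regress the model's predicted velocity against the path's constant velocity.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x1):
    """Rectified-flow-style objective: sample x_t = (1-t)*x0 + t*x1 between
    noise x0 and data x1, and regress the target velocity (x1 - x0) at (x_t, t)."""
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # per-sample time in [0, 1]
    xt = (1 - t) * x0 + t * x1               # point on the straight path
    target = x1 - x0                         # constant velocity of that path
    pred = model(xt, t)
    return np.mean((pred - target) ** 2)

# An untrained model (zero velocity everywhere) has positive loss; training
# drives its prediction toward the target velocity field.
x1 = rng.standard_normal((4, 2))
loss = flow_matching_loss(lambda xt, t: np.zeros_like(xt), x1)
print(loss > 0)  # → True
```

Because the latents stay continuous, the regression target is smooth and there is no discrete codebook to cap visual fidelity.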
Few-step denoising
We distill our model to generate high-quality visuals in a small number of denoising steps, making real-time autoregressive rollout tractable.
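Once a velocity field is learned, few-step generation amounts to integrating the flow from noise to data with a handful of Euler steps. The sketch below uses an illustrative hand-written field, not our distilled sampler:

```python
import numpy as np

def sample(velocity, x0, num_steps=4):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (sample)
    with a small, fixed number of Euler steps."""
    x, dt = x0.copy(), 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)   # one denoising step
    return x

# Illustrative constant field pointing straight from the start at a target:
start = np.zeros(2)
target = np.array([1.0, 2.0])
out = sample(lambda x, t: target - start, start, num_steps=4)
print(out)  # → [1. 2.]: a straight flow is integrated exactly, even in few steps
```

Distillation straightens the learned flow so that a small step count stays accurate, which is what makes real-time autoregressive rollout tractable.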
We trained on several hundred NVIDIA Blackwell (B200) GPUs, using a highly optimized orchestration pipeline that combines advanced model-parallelism techniques across multiple stages to maximize hardware utilization and throughput. We expect this training paradigm to evolve rapidly over future model generations, given the current pace of research in world models.
Stage 1 training
General visual dynamics. Large-scale video pretraining establishes a base understanding of how the world evolves.
Stage 2 training
Interaction and task conditioning. The model learns to respond to actions and task-specific signals.
Stage 3 training
Long-horizon stability. A final phase trains the system for stable autoregressive operation under extended rollout, where small per-step errors would otherwise accumulate.
Observations From Scaling World Simulation
Odyssey-2 Max is the third model in the Odyssey-2 family, scaling model size, data, and training, with 3x the parameter count and 10x training compute compared to Odyssey-2 Pro. This increase in scale results in the emergence of behaviors not observed in smaller models.
Pre-scale vs. post-scale video comparisons demonstrate these emergent capabilities: learning physical processes, biomechanics, physical dynamics, human behaviors, and interaction.
Odyssey-2 Max, Available Now in Private Beta
Odyssey-2 Max is an early but substantial step toward general world models that can simulate and interact with the world in real time. It demonstrates strong performance across physics, interaction, and long-horizon stability, but meaningful work remains.
We see Odyssey-2 Max as a form of pretrained physical intelligence: the equivalent of a human who has spent years observing and interacting with the world, just before learning to drive. Or, in language-model terms, GPT-2 just before the transition to ChatGPT.
We’re excited to make Odyssey-2 Max available in private beta to partners working on frontier applications, particularly in robotics, gaming, simulation, defense, and interactive systems. If you’re interested in exploring applications for world models, please get in touch.
The Team That Brought This to Life
Odyssey-2 Max was made possible by the incredible Odyssey team: Ahmad Nazeri, Ahmet Hamdi Guzel, Alexandra Chan, Amogh Adishesha, Andrew Trout, Andy Kolkhorst, Aravind Kaimal, Ben Graham, Derek Sarshad, Fabian Güra, Finley Code, James Grieve, Jeff Hawke, Jenny Seidenschwarz, Jesse Allardice, Jessica Inman, Jonathan Sadeghi, Kaiwen Guo, Kristy McDonough, Nicolas Griffiths, Nima Rezaeian, Oliver Cameron, Renee Huang, Richard Shen, Robin Tweedie, Sarah King, Sirish Srinivasan, Tobiah Rex, Vighnesh Birodkar, Vinh-Dieu Lam, Zygmunt Łenyk. Join us!
