Introducing PROWL: Learning Through Discovery

PROWL is a novel RL-driven adversarial framework in which an RL agent explores game environments with the objective of improving world model performance.

Jeff Hawke

May 12th, 2026

Odyssey is pioneering foundation world models. To power reliable applications in robotics, science, healthcare, education, and gaming, world models must learn to predict high-quality visuals with realistic physics while accurately responding to actions. Today’s best world models are still not pixel- or physics-perfect, and they do not always follow the specified action.

To address this, we’ve developed PROWL (Prioritized Regret-Driven Optimization for World Model Learning), a novel RL-driven adversarial framework where an RL agent explores game environments to discover failures in world models. As the agent interacts with the environment, it is rewarded for uncovering breakdowns in geometry, motion, visual consistency, and action-conditioned dynamics, automatically building an adversarial curriculum that improves the model over time. PROWL also surfaces cases where the world model captures geometry and physics correctly, but fails to follow actions precisely.
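One way to picture the discovery reward described above is as a weighted sum of per-category model-error terms. The sketch below is illustrative only: the category names, weights, and class are our assumptions, not details from the paper.

```python
# Hypothetical sketch of PROWL's failure-discovery reward: the agent is
# paid in proportion to how badly the world model predicts what the real
# environment actually did. Error categories and weights are illustrative.
from dataclasses import dataclass

@dataclass
class FailureReward:
    w_geometry: float = 1.0
    w_motion: float = 1.0
    w_visual: float = 1.0
    w_action: float = 2.0  # assume action-following failures are weighted higher

    def __call__(self, errors: dict) -> float:
        """errors: per-category discrepancies between the world model's
        prediction and the ground-truth environment rollout."""
        return (self.w_geometry * errors["geometry"]
                + self.w_motion * errors["motion"]
                + self.w_visual * errors["visual"]
                + self.w_action * errors["action"])

reward = FailureReward()({"geometry": 0.1, "motion": 0.2,
                          "visual": 0.0, "action": 0.5})
```

A trajectory that breaks action-following scores higher than one that only smears a texture, steering the agent toward the failures that matter most.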

Each time PROWL iterates, the world model gets better by learning from the curriculum discovered by the RL agent. With an improved world model, the RL agent then becomes increasingly effective at discovery. This creates a closed-loop cycle of computational feedback, turning discovered failures into targeted training trajectories.
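A toy, runnable sketch of this closed loop (the environment, "world model", and selection rule below are invented stand-ins, not Odyssey's code):

```python
import random

# Toy stand-in for the PROWL loop: the "world model" is a lookup table,
# "trajectories" are single actions, and the "agent" simply selects the
# actions where the model's prediction error is largest. Illustrative only.

def true_env(action: float) -> float:
    return action ** 2                      # ground-truth dynamics

class ToyWorldModel:
    def __init__(self):
        self.table = {}                     # memorised action -> outcome
    def predict(self, action: float) -> float:
        return self.table.get(action, 0.0)  # default guess: nothing happens
    def finetune(self, trajectories):
        for action, outcome in trajectories:
            self.table[action] = outcome

def prowl_iteration(model, actions):
    # 1. "Agent" explores: its reward is the model's prediction error.
    scored = sorted(((abs(model.predict(a) - true_env(a)), a) for a in actions),
                    reverse=True)
    # 2. The highest-regret actions become adversarial training trajectories.
    failures = [(a, true_env(a)) for _, a in scored[: len(actions) // 2]]
    # 3. The world model fine-tunes on the discovered failures.
    model.finetune(failures)

model = ToyWorldModel()
rng = random.Random(0)
actions = [rng.uniform(-1, 1) for _ in range(8)]
for _ in range(3):
    prowl_iteration(model, actions)
```

Even in this toy, the loop's dynamics are visible: each iteration the agent targets whatever the model still gets wrong, so the model's errors shrink fastest exactly where they were largest.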

Language models showed the value of scalable feedback loops. It’s time for world models to do the same.

PROWL: RL-Driven Adversarial World Model Training

PROWL provides a framework for improving world models with verifiable rewards. Game environments, physics simulations, and even real-world robot testing offer routes to improving world models as reinforcement learning environments.

First, we initialise the world model with parameters pre-trained using Diffusion Forcing. In our experiments, this pre-training used a human-collected dataset from Minecraft.

The agent is encouraged to find difficult cases, but constrained to stay close to realistic exploration behavior so that it does not exploit the model with unnatural actions. The world model is continuously fine-tuned on these adversarially discovered trajectories, yielding an adversarial training loop that converts rare failures into a stable, near-distribution training signal without drifting into out-of-distribution exploitation.
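One simple way to realise this "stay near realistic behavior" constraint is to subtract a penalty on the agent's divergence from a behavior prior. The log-ratio form and the λ value below are our assumptions, not details from the paper.

```python
import math

# Hedged sketch of a constrained discovery reward: the agent earns the
# world model's prediction error, minus a penalty for assigning much more
# probability to an action than a realistic behaviour prior would.
# The per-step log-ratio penalty and lambda value are assumptions.

def constrained_reward(model_error: float,
                       agent_action_prob: float,
                       prior_action_prob: float,
                       lam: float = 0.5) -> float:
    deviation = math.log(agent_action_prob / prior_action_prob)
    # Only penalise over-weighting unnatural actions, not caution.
    return model_error - lam * max(deviation, 0.0)
```

With a penalty like this, an agent that finds a genuine model failure via plausible actions keeps its full reward, while one that spams unnatural inputs pays for the distribution shift.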

To ensure the algorithm continues to improve the world model during training, we developed a novel Prioritized Adversarial Trajectory (PAT) buffer. This is what turns PROWL’s discovered failures into a progressively harder curriculum. As the world model learns to solve easier failure cases, PAT deprioritizes them and brings harder unresolved trajectories into training. This keeps the curriculum adaptive and focused on the most informative weaknesses, rather than wasting compute on cases the model already handles.
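A minimal sketch of how such a prioritized buffer could behave, assuming the priority is simply the world model's current loss on a trajectory (the data layout, eviction rule, and re-scoring policy are our assumptions, not the paper's):

```python
import heapq

# Illustrative prioritized-trajectory buffer: entries are keyed by the
# world model's current loss, so sampling favours unresolved failures and
# low-loss (solved) cases are evicted or deprioritized over time.

class PATBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []          # list of (loss, trajectory) pairs

    def add(self, loss: float, trajectory):
        self.items.append((loss, trajectory))
        # When full, evict the easiest (lowest-loss) trajectories.
        self.items = heapq.nlargest(self.capacity, self.items,
                                    key=lambda x: x[0])

    def sample_hardest(self, k: int):
        return heapq.nlargest(k, self.items, key=lambda x: x[0])

    def rescore(self, loss_fn):
        # As the world model improves, re-evaluate each trajectory's loss:
        # solved cases sink in priority without being retrained on.
        self.items = [(loss_fn(t), t) for _, t in self.items]
```

The key property is the `rescore` step: a trajectory's priority is not fixed at insertion time, so the curriculum automatically shifts toward whatever the current model still fails on.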

Our results indicate that scalable world models benefit not only from larger datasets, but also from selectively generating informative training data.

Discovering New Capabilities in Minecraft

We evaluated PROWL in the MineRL research environment. We’re sharing a sample of qualitative capabilities we observed PROWL discover. For full quantitative results please see the paper preprint.

Action-Following

PROWL follows the intended action more faithfully than the baseline, which often predicts the wrong direction or ignores controls entirely.

Visual Quality

PROWL removes persistent artifacts such as rotation seams, colour banding, popping geometry, and unstable textures.

Crosshair Persistence

PROWL keeps 3D-anchored UI elements—like the block-placement crosshair and held items—stable in world space as the camera moves, while the baseline lets them drift, warp, or vanish.

Emergent Behavior

The agent's RL-discovered 180° pivot and dash maneuver lies outside the human-demonstration distribution that trained the static baseline; PROWL still renders it faithfully, while the baseline—having never seen such motion—collapses mid-turn. 

Temporal Coherence

PROWL handles hard scene transitions, such as inventory close → return-to-game, with less ghosting, flicker, or incorrect re-rendering.

Out-of-Distribution Robustness

PROWL generalizes better beyond its training environments, maintaining higher visual and dynamical quality in held-out Minecraft scenes while reducing drift, artifacts, and incorrect scene evolution relative to the baseline.

Environmental Dynamics

PROWL accurately captures non-rigid environmental motion such as flowing waterfalls, while the baseline freezes, distorts, or hallucinates the scene dynamics.

Object Placement & Persistence

PROWL preserves object position, scale, and identity under camera motion. After the agent places a block, PROWL keeps it solid and anchored where it was put.

Discovery to Enlightenment

PROWL points toward a new way of improving world models: where we build systems that generate their own useful experience. It creates a productive tug-of-war between a failure-discovery agent and a world-model trainer: the agent searches for surprising, environment-grounded failures, while the trainer turns those failures into curriculum data that makes the model stronger. As the world model improves, the agent must discover increasingly harder and more informative failures, creating an open-ended loop where better models enable better discovery, and better discovery enables better models.

We’re excited by what closed-loop computational feedback offers to world model development. This lets us improve the physics modelling of a foundation world model by learning from game engines, physics and robotics simulators, and even real-world interaction.

The Team That Brought This to Life

PROWL was made possible by the incredible Odyssey team—Ahmet H. Güzel, Jenny Seidenschwarz, Benjamin Graham, Jonathan Sadeghi, and Jeffrey Hawke—with valuable feedback from University College London advisors Jack Parker-Holder and Ilia Bogunovic.

API

Build with general-purpose world models

Integrate Odyssey-2 with our API

APP

Simulate your dreams in real-time

Experience Odyssey-2 for free
