Sparse Imagination
for Efficient Visual World Model Planning

1Department of Electrical and Computer Engineering, Seoul National University
2Graduate School of Data Science, Seoul National University
*Equal contribution, †Corresponding author

Abstract

World model based planning has significantly improved decision-making in complex environments by enabling agents to simulate future states and make informed choices. However, simulating the future with a visual world model is computationally expensive: every imagined step must process the full set of visual tokens, repeated over many candidate rollouts. This computational burden is particularly restrictive in robotics, where resources are severely constrained.

We propose Sparse Imagination, which improves planning efficiency by using only a subset of visual tokens during latent rollouts. Our method trains a transformer world model with randomized grouped attention so it can predict robustly under dynamic token sparsity. Across simulation and real-world tasks, Sparse Imagination achieves substantial speedups while preserving control performance.

Sparse Imagination Overview

World Model Planning with Sparse Imagination.

Method

The world model predicts future DINO patch tokens conditioned on past observations and actions. During training, randomized grouped attention partitions the tokens into random groups and applies structured attention masks, so the model learns to predict from arbitrary token subsets. During planning with the world model, we apply random token dropout with drop ratio p. This reduces attention cost and allows a direct trade-off between speed and accuracy.
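The dropout step can be sketched as follows; the function name, shapes, and per-sample resampling are illustrative assumptions, not the paper's code. Because self-attention cost grows quadratically with token count, keeping half the tokens cuts attention FLOPs by roughly four times.

```python
import numpy as np

def drop_tokens(tokens: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Randomly keep a (1 - p) fraction of the N visual tokens per sample.

    tokens: (B, N, D) array of patch-token features; p is the drop ratio.
    A hypothetical helper for illustration only.
    """
    B, N, D = tokens.shape
    n_keep = max(1, int(round(N * (1.0 - p))))
    kept = np.empty((B, n_keep, D), dtype=tokens.dtype)
    for b in range(B):
        # Fresh random subset per sample: unbiased spatial coverage.
        idx = rng.choice(N, size=n_keep, replace=False)
        kept[b] = tokens[b, idx]
    return kept

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 196, 8))      # e.g. a 14x14 grid of patch tokens
x_sparse = drop_tokens(x, p=0.5, rng=rng) # -> shape (2, 98, 8)
```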

Randomized Grouped Attention

Training with Randomized Grouped Attention Strategy.
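A minimal sketch of the grouped-mask idea described above, under our own simplifying assumptions (a real mask would also need to handle action and history tokens): tokens are partitioned into random, roughly equal groups, and attention is allowed only within a group. Resampling the partition each training step exposes the model to many different token subsets.

```python
import numpy as np

def grouped_attention_mask(n_tokens: int, n_groups: int,
                           rng: np.random.Generator) -> np.ndarray:
    """Boolean (N, N) mask: True where attention is allowed.

    A random balanced partition assigns each token to one of n_groups
    groups; a token may only attend to tokens in its own group.
    Illustrative sketch, not the paper's exact mask.
    """
    group = rng.permutation(n_tokens) % n_groups   # balanced random partition
    return group[:, None] == group[None, :]        # block structure after sorting

rng = np.random.default_rng(0)
mask = grouped_attention_mask(12, 3, rng)          # 3 groups of 4 tokens each
```

At inference, running the model on one kept group of size (1 - p)N then corresponds to attending only over the sampled token subset.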

Main Results

Test-Time Trajectory Optimization

Evaluated on simulation environments: PointMaze, Wall, PushT, and Block Pushing with MPC-CEM, and Granular and Rope with CEM. Moderate drop ratios (10-50%) preserve task performance while substantially reducing planning time.

Method     Avg. Success (%)   Avg. Planning Time (s/iter)   Avg. Change (%)
Full       69.8               183.3                         -
CLS        50.7               71.0                          -61.3%
Drop 10%   74.1               166.3                         -9.3%
Drop 20%   69.6               150.0                         -18.1%
Drop 30%   71.0               136.3                         -25.6%
Drop 40%   68.6               119.8                         -34.7%
Drop 50%   69.7               109.0                         -40.5%
Drop 60%   63.6               99.8                          -45.6%
Drop 70%   53.3               89.5                          -51.2%
Drop 80%   54.3               81.5                          -55.5%
Drop 90%   46.7               73.8                          -59.8%
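For reference, the MPC-CEM optimization used above can be sketched as a generic cross-entropy-method loop; here `rollout` stands in for the world model running on a sparse token subset, and all names and hyperparameters are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def cem_plan(rollout, score, act_dim, horizon,
             n_samples=64, n_elite=8, iters=5, rng=None):
    """Cross-entropy-method trajectory optimizer (minimal sketch).

    rollout(actions) -> predicted outcome (here any simulator; in the
    paper this would be the world model on dropped tokens), and
    score(outcome) -> scalar reward. Hypothetical signatures.
    """
    rng = rng or np.random.default_rng(0)
    mu = np.zeros((horizon, act_dim))
    sigma = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current mean.
        candidates = mu + sigma * rng.standard_normal((n_samples, horizon, act_dim))
        scores = np.array([score(rollout(a)) for a in candidates])
        # Refit the sampling distribution to the elite set.
        elite = candidates[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu  # in MPC, execute the first action and replan
```

Dropping tokens shrinks the cost of each `rollout` call, which is why planning time in the table falls almost linearly with the drop ratio.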

VLA-Guided Planning

For policy-guided planning in long-horizon tasks, including real-world tasks (Pick-and-Place and Close-Drawer) and LIBERO-10, Sparse Imagination with 50% token drop consistently matches full-patch performance at much lower planning overhead.

Drop 50%

Full patch

LeRobot and LIBERO tradeoff

Performance vs. inference time in real-world tasks and LIBERO-10.

Why Random Sampling Works

Across token reduction baselines, random sampling remains highly competitive and often best on average. We attribute this to unbiased spatial coverage and reduced blind-spot risk in dynamic planning.

Token sampling patterns: Random, LHS, LTRP, Attention-Encoder, STAR, Attention-WM.

Method Family             Representative      Avg. Success (%)
Random Sampling           Random              66.7
                          Fixed               64.3
                          LHS                 65.7
Learning-Based Pruning    LTRP                59.5
Attention-Based Pruning   Attention-Encoder   63.0
                          STAR                61.4
                          Attention-WM        64.0
Cluster and Merging       ATC                 41.7

Information Sufficiency (nHSIC and Attentive Probing)

We further verify that sparse token subsets retain useful state information. nHSIC remains high even under substantial dropout, and attentive probing shows that random token subsets keep strong predictive signal. Notably, even a single random token can be comparable to the CLS token in probing performance.
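As a concrete reference for the nHSIC measure, normalized HSIC between token features and environment states can be estimated as below. The RBF kernels with a median-distance bandwidth are our assumption for illustration; the paper's estimator may differ.

```python
import numpy as np

def rbf_kernel(X: np.ndarray) -> np.ndarray:
    """RBF kernel matrix with the median-distance bandwidth heuristic."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    med = np.median(d2[d2 > 0])  # median squared pairwise distance
    return np.exp(-d2 / (med + 1e-12))

def nhsic(X: np.ndarray, Y: np.ndarray) -> float:
    """Normalized HSIC between two (n, d) sample matrices, in [0, 1]."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K = H @ rbf_kernel(X) @ H
    L = H @ rbf_kernel(Y) @ H
    hxy = np.sum(K * L)                          # trace(K L) for symmetric K, L
    return hxy / (np.sqrt(np.sum(K * K) * np.sum(L * L)) + 1e-12)
```

nhsic of a variable with itself is 1, and it stays near 0 for independent samples, which is what makes it a usable sufficiency score for sparse token subsets.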

nHSIC under token dropout

nHSIC between visual tokens and environment states.

Attentive probing results

Attentive probing validation loss under token dropout.

Conclusion

Sparse imagination reduces world-model planning cost by dropping visual tokens at inference while preserving performance across simulation and real-world tasks. The method is simple, robust, and broadly compatible with transformer-based visual planners.

BibTeX

@inproceedings{chun2026sparseimagination,
  title     = {Sparse Imagination for Efficient Visual World Model Planning},
  author    = {Junha Chun and Youngjoon Jeong and Taesup Kim},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026}
}