Grid with multiple puzzle examples from PHYRE and Virtual Tools.

Abstract

Tasks that involve complex interactions between objects with unknown dynamics make planning before execution difficult. These tasks require agents to iteratively improve their actions after actively exploring causes and effects in the environment. For these types of tasks, we propose Causal-PIK, a method that leverages Bayesian Optimization to reason about causal interactions via a Physics-Informed Kernel to help guide efficient search for the best next action. Experimental results on the Virtual Tools and PHYRE physical reasoning benchmarks show that Causal-PIK outperforms state-of-the-art results, requiring fewer actions to reach the goal. We also compare Causal-PIK to human studies, including results from a new user study we conducted on the PHYRE benchmark. We find that Causal-PIK remains competitive on tasks that are very challenging, even for human problem-solvers.

Causal-PIK

Method overview.
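
As a rough illustration of the search loop summarized in the abstract, the sketch below runs Bayesian Optimization with a Gaussian-process surrogate over a toy action space. It is not the authors' implementation: the form of the Physics-Informed Kernel is not given on this page, so the sketch uses a standard RBF kernel (the variant used in the ablation). The 3-dimensional unit-cube action space, the upper-confidence-bound acquisition rule, and the names `rbf_kernel`, `gp_posterior`, `toy_score`, and `bayes_opt` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=0.2):
    """Standard RBF kernel between rows of A and rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(X, y, X_cand, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at candidate actions."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance of the RBF kernel is 1
    return mean, np.sqrt(np.maximum(var, 1e-12))


def toy_score(action):
    """Hypothetical stand-in for a simulator rollout: higher is closer to the goal."""
    target = np.array([0.7, 0.3, 0.5])
    return float(np.exp(-np.sum((action - target) ** 2) / 0.05))


def bayes_opt(score_fn, dim=3, n_init=5, n_iter=25, n_cand=2000, beta=2.0, seed=0):
    """Try actions one at a time, using all past outcomes to pick the next one."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_init, dim))        # initial random attempts
    y = np.array([score_fn(x) for x in X])
    for _ in range(n_iter):
        cand = rng.uniform(size=(n_cand, dim))  # random candidate actions
        mean, std = gp_posterior(X, y, cand)
        nxt = cand[np.argmax(mean + beta * std)]  # upper confidence bound
        X = np.vstack([X, nxt])
        y = np.append(y, score_fn(nxt))
    best = int(np.argmax(y))
    return X[best], float(y[best]), X, y


if __name__ == "__main__":
    best_action, best_score, X, y = bayes_opt(toy_score)
    print("best action:", best_action, "score:", best_score)
```

In this sketch the kernel is the only place where prior knowledge about the task enters the search, which is why swapping the kernel changes how quickly informative actions are found.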

Results

We evaluate Causal-PIK on two physical reasoning benchmarks: Virtual Tools and PHYRE. We compare our method against state-of-the-art methods on both benchmarks. We also conduct an ablation study by replacing the Physics-Informed Kernel with a standard RBF kernel. Finally, we carry out a human study to assess the difficulty of the tasks and compare the performance of our method against human players.

Virtual Tools

| Model | AUCCESS $\uparrow$ |
| --- | --- |
| RAND | 16.0 ± 20 |
| DQN | 25.0 ± 24.0 |
| SSUP (Allen et al., 2020) | 58.0 ± 27.0 |
| Ours RBF | 42.0 ± 33.0 |
| Ours Causal-PIK | 65.0 ± 25.0 |
| Humans (Allen et al., 2020) | 53.25 ± 23 |

PHYRE-1B Cross

| Model | AUCCESS $\uparrow$ |
| --- | --- |
| Dec [Joint] (Girdhar et al., 2020)$^*$ | 40.3 ± 8 |
| MEM$^\dagger$ | 18.5 ± 5.1 |
| DQN$^\dagger$ | 36.8 ± 9.7 |
| Ahmed et al. (2021)$^\dagger$ | 41.9 ± 8.8 |
| RPIN (Qi et al., 2021)$^\dagger$ | 42.2 ± 7.1 |
| RAND | 13.0 ± 5.0 |
| Harter et al. (2020) | 30.24 ± 8.9 |
| Ours RBF | 27.70 ± 9.68 |
| Ours Causal-PIK | 41.6 ± 9.33 |
| Ours Causal-PIK @10$^+$ | 24.8 ± 9.22 |
| Humans @10$^+$ | 36.6 ± 10.2 |

$^*$1K reduced action space. $^\dagger$10K reduced action space. $^+$max of 10 attempts per task.

As shown in the tables above, Causal-PIK outperforms state-of-the-art methods on Virtual Tools and is the strongest method on PHYRE among those not marked as using a reduced action space. On Virtual Tools, Causal-PIK achieves an AUCCESS score of 65.0, surpassing the second-best method (SSUP) at 58.0 and human performance at 53.25. On PHYRE, Causal-PIK obtains an AUCCESS score of 41.6, outperforming the next best method without a reduced action space (Harter et al., 2020), which scores 30.24. Notably, Causal-PIK achieves these results while operating over the full 2M action space, whereas several baselines rely on action spaces reduced to 1K or 10K actions. Despite searching this much larger space, Causal-PIK performs comparably to the strongest of those baselines (40.3 to 42.2), which tackle a significantly simpler problem. When the maximum number of attempts is restricted to 10, Causal-PIK achieves an AUCCESS score of 24.8, compared to 36.6 for humans.
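
The AUCCESS metric reported above is not defined on this page. Assuming the definition from the PHYRE benchmark, where attempt $k$ has weight $w_k = \log(k+1) - \log(k)$, $s_k$ is the fraction of tasks solved within $k$ attempts, and the score is $\sum_k w_k s_k / \sum_k w_k$ over $k = 1, \dots, 100$, a minimal sketch of the computation might look as follows. The function name `auccess`, its input format, and the scaling to a 0 to 100 range are illustrative choices, not taken from the authors' code.

```python
import math


def auccess(attempts_to_solve, max_attempts=100):
    """Weighted success rate that rewards solving tasks in fewer attempts.

    attempts_to_solve: one entry per task, giving the 1-based attempt at which
    the task was first solved, or None if it was never solved.
    Returns a score between 0 and 100.
    """
    if not attempts_to_solve:
        raise ValueError("at least one task is required")
    num_tasks = len(attempts_to_solve)
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(1, max_attempts + 1):
        w_k = math.log(k + 1) - math.log(k)  # early attempts weigh more
        solved = sum(1 for a in attempts_to_solve if a is not None and a <= k)
        s_k = solved / num_tasks             # fraction solved within k attempts
        weighted_sum += w_k * s_k
        weight_total += w_k
    return 100.0 * weighted_sum / weight_total


if __name__ == "__main__":
    # Three tasks: solved on attempt 1, solved on attempt 20, never solved.
    print(auccess([1, 20, None]))
```

Because the weights decay logarithmically, a task solved on the first attempt contributes far more than one solved late, which is why the metric reflects the number of attempts and not only the final success rate.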

Conclusion

We introduce Causal-PIK, a novel approach that integrates a Physics-Informed Kernel with Bayesian Optimization (BO) to reason about causality in single-intervention physical reasoning tasks. By leveraging information from past failed attempts, our method enables agents to efficiently search for optimal actions, reducing the number of trials needed to solve tasks with complex dynamics. Our experimental results demonstrate that Causal-PIK outperforms state-of-the-art baselines, requiring fewer attempts on average to solve the puzzles from the Virtual Tools and PHYRE benchmarks.

BibTeX

@inproceedings{morlanscausal,
  title={Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel},
  author={Morlans, Carlota Par{\'e}s and Yi, Michelle and Chen, Claire and Wu, Sarah A and Antonova, Rika and Gerstenberg, Tobias and Bohg, Jeannette},
  booktitle={Forty-second International Conference on Machine Learning}
}