INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations
ICML 2024 (Spotlight)
- Lirui Luo
- Guoxi Zhang
- Hongming Xu
- Yaodong Yang
- Cong Fang
- Qing Li
Abstract
Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. For tasks with visual observations, NS-RL entails structured representations for states, but previous algorithms are unable to refine the structured states with reward signals due to a lack of efficiency. Accessibility is also an issue, as extensive domain knowledge is required to interpret current symbolic policies. In this paper, we present a framework that is capable of learning structured states and symbolic policies simultaneously, whose key idea is to overcome the efficiency bottleneck by distilling vision foundation models into a scalable perception module. Moreover, we design a pipeline that uses large language models to generate concise and readable language explanations for policies and decisions. In experiments on nine Atari tasks, our approach demonstrates substantial performance gains over existing NS-RL methods. We also showcase explanations for policies and decisions.
Approach Overview
INSIGHT consists of three components: a perception module, a policy learning module, and a policy explanation module. The perception module learns to predict object coordinates using a frame-symbol dataset distilled from vision foundation models. The policy learning module is responsible for learning coordinate-based symbolic policies. In particular, to address the limited expressiveness of object coordinates, it uses a neural actor to interact with the environment. The policy explanation module generates policy interpretations and decision explanations from the task description, the policy description, and the values of object coordinates.
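The distillation step above can be sketched as follows. This is a minimal illustration, not the INSIGHT implementation: the segmenter interface and function names are hypothetical, and the actual system trains a neural perception network on the resulting frame-symbol pairs.

```python
# Sketch: distill a (slow) foundation-model segmenter into a frame-symbol
# dataset of object coordinates. All names here are illustrative.

def mask_to_centroid(mask):
    """Reduce a binary object mask (list of rows of 0/1) to an (x, y)
    centroid normalized to [0, 1] -- the symbolic coordinate label."""
    h, w = len(mask), len(mask[0])
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None  # object absent from this frame
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    return (cx / (w - 1), cy / (h - 1))

def build_frame_symbol_dataset(frames, segment):
    """Pair each frame with per-object centroids produced by `segment`
    (a stand-in for the vision foundation model). A fast perception
    module is then trained on these pairs, so the slow segmenter is
    never called during policy learning."""
    dataset = []
    for frame in frames:
        masks = segment(frame)  # {object_name: binary mask}
        coords = {name: mask_to_centroid(m) for name, m in masks.items()}
        dataset.append((frame, coords))
    return dataset
```

Training the perception module on this dataset, rather than querying the foundation model per step, is what removes the efficiency bottleneck and lets reward signals refine the structured states end to end.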
In Tables 1, 2, and 4 of our paper, we quantitatively demonstrate INSIGHT's improvements in return and in predicting object coordinates. Demos of segmentation and interpretation are included here.
The Impact of Policy Learning on Segmentation
Here are the segmentation results for nine Atari games, before and after policy learning. We observe that the accuracy for policy-irrelevant objects decreases, whereas the accuracy for policy-relevant objects increases.
Videos before and after policy learning.
Freeway
Seaquest
BeamRider
Breakout
Enduro
MsPacman
Pong
Qbert
SpaceInvaders
Policy illustration
Here is an example of language explanations for Pong. Left: interpretations of a learned policy, which identify influential input variables and summarize the patterns that trigger each action. Right: explanations for an action taken at a state. The four images at the bottom illustrate the state. The motion of the ball and of the opponent's paddle is deduced from the input variables and used to support the explanations of actions.
You can view all prompts and LLM responses via the two buttons below.
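The explanation pipeline conditions the LLM on the three inputs named earlier: the task description, the policy description, and the current coordinate values. A minimal sketch of assembling such a query follows; the field names and wording are hypothetical, not the paper's actual prompts.

```python
# Sketch: assemble an LLM explanation prompt from a task description,
# a symbolic policy description, and current object coordinates.
# The prompt template here is illustrative only.

def build_explanation_prompt(task_desc, policy_desc, coords, action):
    """Format the three explanation inputs plus the chosen action into
    a single prompt string for the LLM."""
    coord_lines = "\n".join(
        f"- {name}: ({x:.3f}, {y:.3f})" for name, (x, y) in sorted(coords.items())
    )
    return (
        f"Task: {task_desc}\n"
        f"Learned policy (symbolic form): {policy_desc}\n"
        f"Current object coordinates:\n{coord_lines}\n"
        f"Chosen action: {action}\n"
        "Explain in plain language why this action follows from the policy."
    )
```

Because the policy is expressed over named coordinates, the prompt stays short and the LLM's explanation can reference concrete quantities (e.g. the ball's position) rather than raw pixels.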
BibTeX
@inproceedings{luo2024insight,
title={INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations},
author={Luo, Lirui and Zhang, Guoxi and Xu, Hongming and Yang, Yaodong and Fang, Cong and Li, Qing},
booktitle={International Conference on Machine Learning (ICML)},
year={2024}
}