INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations

ICML 2024 (Spotlight)

Abstract

Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. For tasks with visual observations, NS-RL entails structured representations for states, but previous algorithms are unable to refine the structured states with reward signals due to a lack of efficiency. Accessibility is also an issue, as extensive domain knowledge is required to interpret current symbolic policies. In this paper, we present a framework that is capable of learning structured states and symbolic policies simultaneously, whose key idea is to overcome the efficiency bottleneck by distilling vision foundation models into a scalable perception module. Moreover, we design a pipeline that uses large language models to generate concise and readable language explanations for policies and decisions. In experiments on nine Atari tasks, our approach demonstrates substantial performance gains over existing NS-RL methods. We also showcase explanations for policies and decisions.

Approach Overview

INSIGHT consists of three components: a perception module, a policy learning module, and a policy explanation module. The perception module learns to predict object coordinates using a frame-symbol dataset distilled from vision foundation models. The policy learning module is responsible for learning coordinate-based symbolic policies. In particular, to address the limited expressiveness of object coordinates, it uses a neural actor to interact with the environment. The policy explanation module generates policy interpretations and decision explanations from the task description, the policy description, and the values of object coordinates.
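To make the phrase "coordinate-based symbolic policy" concrete, here is a minimal sketch that assumes a policy can be written as one closed-form logit expression per action over named object coordinates. The class `SymbolicPolicy`, the variable names `ball_y` and `paddle_y`, and the hand-written expressions are all hypothetical illustrations and are not taken from the INSIGHT implementation; the neural actor and the learning procedure are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Object coordinates keyed by a hypothetical variable name, e.g. "ball_y".
Coordinates = Dict[str, float]


@dataclass
class SymbolicPolicy:
    """Toy coordinate-based policy: one closed-form logit expression per action."""

    action_names: List[str]
    logit_fns: List[Callable[[Coordinates], float]]

    def logits(self, coords: Coordinates) -> List[float]:
        return [fn(coords) for fn in self.logit_fns]

    def act(self, coords: Coordinates) -> str:
        values = self.logits(coords)
        # Greedy choice; ties resolve to the earliest action in the list.
        return self.action_names[values.index(max(values))]


# Hand-written expressions for a Pong-like task (illustrative, not learned).
# Assumes normalized image coordinates in [0, 1] with y growing downward.
toy_policy = SymbolicPolicy(
    action_names=["NOOP", "UP", "DOWN"],
    logit_fns=[
        lambda c: 0.05,                         # small constant: dead zone
        lambda c: c["paddle_y"] - c["ball_y"],  # ball above paddle -> move up
        lambda c: c["ball_y"] - c["paddle_y"],  # ball below paddle -> move down
    ],
)

# In the full pipeline these values would come from the perception module.
coords = {"ball_y": 0.2, "paddle_y": 0.6}
action = toy_policy.act(coords)
```

Because each logit is an explicit expression of a few coordinates, a reader (or a language model) can inspect which variables drive each action, which is the property the explanation module relies on.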

Tables 1, 2, and 4 of our paper quantitatively demonstrate INSIGHT's improvements in return and in predicting object-related coordinates. We include demos for segmentation and interpretation here.

The Impact of Policy Learning on Segmentation

Here are the segmentation results for nine Atari games, before and after policy learning. After policy learning, segmentation accuracy decreases for policy-irrelevant objects and increases for policy-related objects.

Videos before and after policy learning.

Freeway

Seaquest

BeamRider

Breakout

Enduro

MsPacman

Pong

Qbert

SpaceInvaders

Policy Illustration

Here is an example of language explanations for Pong. Left: interpretations of a learned policy. The interpretations identify influential input variables and summarize the triggering patterns of actions. Right: explanations for an action taken at a state. The four images at the bottom illustrate the state. The motions of the ball and the opponent's paddle are deduced from input variables and are used to support the explanations of actions.

You can view all prompts and LLM responses using the two buttons below.
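As a rough illustration of how the three inputs named in the overview (task description, policy description, and values of object coordinates) might be combined into a single LLM query, consider the sketch below. The function `build_explanation_prompt`, the prompt layout, and the example strings are assumptions made for illustration only; the prompts actually used by INSIGHT are the ones published on this page.

```python
from typing import Dict


def build_explanation_prompt(
    task_description: str,
    policy_description: str,
    coordinates: Dict[str, float],
) -> str:
    """Assemble one text prompt from the three inputs named in the overview."""
    # Sort keys so the same state always yields the same prompt text.
    coord_lines = "\n".join(
        f"- {name} = {value:.3f}" for name, value in sorted(coordinates.items())
    )
    return (
        f"Task description:\n{task_description}\n\n"
        f"Policy description:\n{policy_description}\n\n"
        f"Object coordinates at the current state:\n{coord_lines}\n\n"
        "Explain concisely why the policy selects its action at this state."
    )


prompt = build_explanation_prompt(
    task_description="Pong: move the paddle to return the ball.",
    policy_description="logit(UP) = paddle_y - ball_y",  # placeholder expression
    coordinates={"paddle_y": 0.6, "ball_y": 0.2},
)
print(prompt)
```

The resulting string would then be sent to whichever language model is used for explanation; that call is omitted here because it depends on the provider.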

BibTeX


@article{luo2024insight,
    title={INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations},
    author={Luo, Lirui and Zhang, Guoxi and Xu, Hongming and Yang, Yaodong and Fang, Cong and Li, Qing},
    journal={ICML},
    year={2024}
    }