Dataset Download
Please refer to the Dataset page for details.
Submission
The challenge is hosted on EvalAI. Please prepare your results as described here, then go to the challenge page to submit.
Evaluation Metrics
The primary evaluation metric for REVERIE is Remote Grounding Success rate weighted by Path Length (RGSPL). We also adopt four auxiliary metrics to evaluate navigation performance, which help diagnose whether a performance bottleneck lies in navigation or in visual grounding. Please note that these navigation metrics differ slightly from those used in VLN. We reserve the right to use additional metrics to choose winners in case of statistically insignificant SPL differences. A minimal sketch of how the grounding metrics can be computed is given after the metric list.
Remote Grounding Success rate (RGS): The number of successful tasks divided by the total number of tasks. A task is considered successful if the predicted object ID matches the ground truth.
Remote Grounding Success rate weighted by navigation Path Length (RGSPL): RGS weighted by navigation path length; it trades off grounding success against path efficiency.
Navigation Length (Nav-Length): Navigation path length in meters.
Navigation Success rate (Nav-Succ): A navigation is considered successful only if the target object can be observed at the stop viewpoint.
Navigation Oracle Success rate (Nav-OSucc): A navigation is considered oracle successful if the target object can be observed at any viewpoint passed along the path.
Navigation Success rate weighted by Path Length (Nav-SPL): Navigation success weighted by the length of the navigation path (see the mathematical definition here).
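For reference, the path-length-weighted metrics follow the standard SPL formulation: the per-episode success indicator is weighted by the ratio of the shortest-path length to the larger of the shortest-path length and the agent's actual path length, then averaged over all episodes. The snippet below is a minimal sketch of how RGS and RGSPL could be computed from per-episode results under that assumption; the Episode fields and function names are illustrative and not part of the official evaluation code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Episode:
    grounding_success: bool       # predicted object ID matches the ground truth
    path_length: float            # length of the agent's navigation path (meters)
    shortest_path_length: float   # shortest-path distance from start to goal (meters)

def rgs(episodes: List[Episode]) -> float:
    """Remote Grounding Success rate: fraction of episodes with a correct object prediction."""
    return sum(ep.grounding_success for ep in episodes) / len(episodes)

def rgspl(episodes: List[Episode]) -> float:
    """RGS weighted by path length, using the standard SPL-style weighting."""
    total = 0.0
    for ep in episodes:
        if ep.grounding_success:
            total += ep.shortest_path_length / max(ep.path_length, ep.shortest_path_length)
    return total / len(episodes)

# Example: one successful episode with an optimal path, one failed episode.
episodes = [
    Episode(grounding_success=True, path_length=10.0, shortest_path_length=10.0),
    Episode(grounding_success=False, path_length=12.0, shortest_path_length=8.0),
]
print(rgs(episodes), rgspl(episodes))  # 0.5 0.5
```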
Requirements
1. Participants should stick to the defined training, validation, and test partitions so that different approaches can be compared fairly. Note that additional datasets can be used to train your model as long as they have no overlap with our test split.
2. The Challenge is a team-based contest. Each team can have one or more members, and an individual cannot be a member of multiple teams.
3. Each team can make at most five submissions on the test partition, and the highest score is adopted as the final result. You can use the val seen or val unseen partitions to test your submission format (10 trials per day). Our code also includes evaluation for these two splits.
4. At the end of the Challenge, all teams will be ranked based on the evaluation described above. The top teams will receive award certificates.
Baseline and Code
The baseline code and models are released here.