Task Description


The REVERIE task requires an intelligent agent to correctly localise a remote target object (one that cannot be observed from the starting location) specified by a concise, high-level natural language instruction. Because the target object is in a different location from the starting one, the agent must first navigate to the goal location. When the agent decides to stop, it should select one object from a list of candidates provided by the simulator. The agent may attempt to localise the target at any step; when to do so is entirely up to the algorithm design. However, the agent may output an answer only once per episode, so it gets a single guess in each run. Please note that interaction with the target object, such as 'check', is not required.

Challenge Guidelines


  • Dataset Download
  • Download the new version of data from here.

  • Channels
  • Channel 1: using our referring expression grounding model. See here for more details.

    Channel 2: using your own referring expression grounding model. See the baseline example.



  • Submission
  • Send your results on the test split to reverie.challenge@gmail.com, and we will evaluate them for you. You can submit at most 5 times in total for the challenge (your best result will be considered for the competition). We provide an evaluation script here so that you can evaluate on the val_seen and val_unseen splits yourself. When sending us the result file, please include the following information in the email body: team name and team members. Use the suffix "_ch1" in the file name to indicate the channel using our referring expression grounding model, and "_ch2" for your own grounding model. The technical report should be sent before the submission deadline.
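
The file-naming rule above can be sketched as a small helper. This is only an illustration: the function name `submission_filename`, the base name, and the `.json` extension are assumptions, since the guidelines specify only the "_ch1" and "_ch2" suffixes.

```python
# Hypothetical helper: only the "_ch1"/"_ch2" suffixes come from the guidelines.
CHANNEL_SUFFIX = {1: "_ch1", 2: "_ch2"}


def submission_filename(base_name: str, channel: int, extension: str = ".json") -> str:
    """Build a result file name carrying the channel suffix.

    channel 1: the organisers' referring expression grounding model
    channel 2: your own grounding model
    The default extension is an assumption, not part of the guidelines.
    """
    if channel not in CHANNEL_SUFFIX:
        raise ValueError(f"channel must be 1 or 2, got {channel!r}")
    return f"{base_name}{CHANNEL_SUFFIX[channel]}{extension}"
```

For example, `submission_filename("myteam_test", 2)` yields a name ending in `_ch2.json`, marking a submission that uses the team's own grounding model.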

  • Evaluation Metrics
  • The primary evaluation metric for REVERIE is Remote Grounding Success rate weighted by Path Length (RGSPL). We also adopt four auxiliary metrics to evaluate navigation performance, which help diagnose performance bottlenecks in navigation and visual grounding. Please note that these navigation metrics are slightly different from those in VLN. We reserve the right to use additional metrics to choose winners in cases such as the use of different input data or statistically insignificant RGSPL differences.

    Remote Grounding Success rate (RGS): The number of successful tasks divided by the total number of tasks. A task is considered successful if the predicted object ID is the same as the ground truth.


    Remote Grounding Success rate weighted by navigation Path Length (RGSPL): It trades off RGS against navigation path length.

    Navigation Length (Nav-Length): Navigation path length in meters.


    Navigation Success rate (Nav-Succ): A navigation is considered successful only if the target object can be observed at the stop viewpoint.

    Navigation Oracle Success rate (Nav-OSucc): A navigation is considered oracle successful if the target object can be observed from any of the viewpoints visited along the path.


    Navigation Success rate weighted by Path Length (Nav-SPL): The navigation success weighted by the length of the navigation path (see the mathematical definition here).
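
The metrics above can be illustrated with a minimal sketch. It is not the official evaluation script. It assumes the standard SPL weighting (shortest-path length divided by the larger of the actual and shortest path lengths) for Nav-SPL, and assumes RGSPL applies the same weight to grounding success. The `Episode` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    path_length: float              # metres travelled by the agent
    shortest_length: float          # shortest start-to-goal path length in metres
    visible_at_stop: bool           # target observable at the stop viewpoint
    visible_on_path: bool           # target observable at any visited viewpoint
    predicted_obj_id: Optional[int]  # None if the agent gave no answer
    target_obj_id: int


def path_weight(ep: Episode) -> float:
    """SPL-style weight: shortest / max(actual, shortest). Assumed form."""
    denom = max(ep.path_length, ep.shortest_length)
    return ep.shortest_length / denom if denom > 0 else 0.0


def evaluate(episodes: List[Episode]) -> Dict[str, float]:
    """Average each metric over all episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to evaluate")
    # Grounding succeeds only when the predicted object ID matches the ground truth.
    grounded = [float(ep.predicted_obj_id == ep.target_obj_id) for ep in episodes]
    nav_ok = [float(ep.visible_at_stop) for ep in episodes]
    weights = [path_weight(ep) for ep in episodes]
    return {
        "RGS": sum(grounded) / n,
        "RGSPL": sum(g * w for g, w in zip(grounded, weights)) / n,
        "Nav-Length": sum(ep.path_length for ep in episodes) / n,
        "Nav-Succ": sum(nav_ok) / n,
        "Nav-OSucc": sum(float(ep.visible_on_path) for ep in episodes) / n,
        "Nav-SPL": sum(s * w for s, w in zip(nav_ok, weights)) / n,
    }
```

Under these assumptions, an episode that grounds the correct object after walking twice the shortest distance contributes 1 to RGS but only 0.5 to RGSPL, which is how the primary metric penalises long detours.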

  • Requirements
  • Besides the rules stated on the home page, below are some additional rules:

    1. Participants should stick to the defined training, validation and test partitions so that different approaches can be compared fairly. Note that additional datasets can be used to train your model, but we may take this into consideration when choosing winners.


    2. Each team can make at most five submissions on the test partition, and the highest score is the one adopted. The demo code also includes evaluation on the validation splits.

    3. At the end of the Challenge, all teams will be ranked based on the evaluation described above. The top teams will receive award certificates.



  • How to start?
  • If this task is new to you, we recommend first reading this paper for details. You can then start from several existing works, such as Baseline (HOP), RecurrentBERT, ORIST, CKR, AirBERT, etc. Note that all these works (except the baseline) use the original data instead of the new data released in this year's challenge.

    Finally, the REVERIE task is based on the Matterport 3D dataset and its simulator; you can download and build the running environment as described here.