Dataset Download
Please refer to the Dataset page for details.
Submission
The challenge is hosted on EvalAI. Please prepare your results as described here, then go to the challenge page to submit.
Evaluation Metrics
The primary evaluation metric for REVERIE is Remote Grounding Success rate weighted by Path Length (RGSPL). We also adopt four auxiliary metrics to evaluate navigation performance, which help diagnose whether a performance bottleneck lies in navigation or in visual grounding. Please note that these navigation metrics differ slightly from those used in VLN. We reserve the right to use additional metrics to choose winners in case of statistically insignificant SPL differences. A minimal sketch of how the grounding metrics can be computed is given after the metric list.
Remote Grounding Success rate (RGS): The number of successful tasks divided by the total number of tasks. A task is considered successful if the predicted object ID matches the ground truth.
Remote Grounding Success rate weighted by navigation Path Length (RGSPL): RGS weighted by navigation path length; it trades off grounding success against path efficiency.
Navigation Length (Nav-Length): Navigation path length in meters.
Navigation Success rate (Nav-Succ): A navigation is considered successful only if the target object can be observed at the stop viewpoint.
Navigation Oracle Success rate (Nav-OSucc): A navigation is considered oracle successful if the target object can be observed at any viewpoint passed along the path.
Navigation Success rate weighted by Path Length (Nav-SPL): Navigation success weighted by the length of the navigation path (see the mathematical definition here).
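For reference, the path-length-weighted metrics follow the standard SPL formulation: the per-episode success indicator is weighted by the ratio of the shortest-path length to the larger of the shortest-path length and the agent's actual path length, then averaged over all episodes. The snippet below is a minimal sketch of how RGS and RGSPL could be computed from per-episode results under that assumption; the Episode fields and function names are illustrative and not part of the official evaluation code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Episode:
    grounding_success: bool       # predicted object ID matches the ground truth
    path_length: float            # length of the agent's navigation path (meters)
    shortest_path_length: float   # shortest-path distance from start to goal (meters)

def rgs(episodes: List[Episode]) -> float:
    """Remote Grounding Success rate: fraction of episodes with a correct object prediction."""
    return sum(ep.grounding_success for ep in episodes) / len(episodes)

def rgspl(episodes: List[Episode]) -> float:
    """RGS weighted by path length, using the standard SPL-style weighting."""
    total = 0.0
    for ep in episodes:
        if ep.grounding_success:
            total += ep.shortest_path_length / max(ep.path_length, ep.shortest_path_length)
    return total / len(episodes)

# Example: one successful episode with an optimal path, one failed episode.
episodes = [
    Episode(grounding_success=True, path_length=10.0, shortest_path_length=10.0),
    Episode(grounding_success=False, path_length=12.0, shortest_path_length=8.0),
]
print(rgs(episodes), rgspl(episodes))  # 0.5 0.5
```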
Requirements
1. Participants should stick to the defined training, validation, and test partitions so that different approaches can be compared fairly. Note that additional datasets can be used to train your model as long as they have no overlap with our test split.
2. The Challenge is a team-based contest. Each team can have one or more members, and an individual cannot be a member of multiple teams.
3. Each team can make at most five submissions on the test partition, and the highest score is adopted as the final result. You can use the val seen or val unseen partitions to test your submission format (10 trials per day). Our code also includes evaluation for these two splits.
4. At the end of the Challenge, all teams will be ranked based on the evaluation described above. The top teams will receive award certificates.
Baseline and Code
The baseline code and models are released here.