Queries may be processed using text only, math only, or both text and math.
Manual and automatic runs will be collected. See the 'Guidelines' document in the corpus for details.
Top-k hits from participants (e.g., top-20) plus hits from additional manual runs by the organizers will be pooled.
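The pooling step above can be sketched as follows — a minimal illustration of depth-k pooling, where the judgment pool is the union of each run's top-k hits. The run names and document ids are hypothetical, and k=20 mirrors the example depth mentioned above rather than an official setting.

```python
# Sketch of depth-k pooling: the judgment pool is the union of the
# top-k hits from every submitted run (automatic and manual alike).
# Run names, doc ids, and k=20 are illustrative, not official values.

def pool(runs, k=20):
    # runs: mapping from run name to a ranked list of document ids
    judged = set()
    for ranking in runs.values():
        judged.update(ranking[:k])  # take only the top-k hits of this run
    return judged

# Hypothetical runs: a participant's automatic run and an organizer's manual run
runs = {
    "teamA-auto": ["d1", "d2", "d3"],
    "organizer-manual": ["d2", "d4"],
}
print(sorted(pool(runs)))
```

Documents outside every run's top-k are never judged, which is exactly why the nDCG' adjustment below matters for systems whose hits fall outside the pool.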
Most topics will be assessed once; some will be doubly assessed to check inter-assessor agreement. Assessors will include volunteers from participating teams, along with hired assessors.
During assessment, we propose organizing topics into three sets: (1) all topics, (2) topics where the text alone seems to characterize the topic, and (3) topics where the formula(s) alone seem to characterize the topic.
(Mar 23) Update: we will be using nDCG' (i.e., nDCG computed after removing unjudged hits from a ranking) to promote fair comparison with systems that did not participate in the task.
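To make the metric concrete, here is a minimal sketch of nDCG': unjudged hits are first removed from the ranked list, and standard nDCG is then computed over the condensed list. The relevance judgments (qrels), grade scale, and run below are hypothetical examples, not data from the task.

```python
# Sketch of nDCG' (nDCG-prime): drop unjudged hits, then compute nDCG
# on the condensed ranking. Qrels and the run are hypothetical.
import math

def dcg(gains):
    # Standard DCG with a log2 discount: rank 1 is divided by log2(2) = 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_prime(ranking, qrels):
    # qrels maps doc id -> graded relevance; docs absent from qrels are unjudged.
    condensed = [doc for doc in ranking if doc in qrels]  # remove unjudged hits
    gains = [qrels[doc] for doc in condensed]
    ideal = sorted(qrels.values(), reverse=True)[:len(gains)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical qrels (graded 0-3) and a run with two unjudged hits (d9, d7):
qrels = {"d1": 3, "d2": 2, "d3": 0, "d4": 1}
ranking = ["d9", "d1", "d7", "d2", "d3"]
print(round(ndcg_prime(ranking, qrels), 4))
```

Because d9 and d7 were never pooled, plain nDCG would penalize them as non-relevant; nDCG' instead scores only the judged hits, so a system outside the pooling is not unfairly punished for retrieving unjudged documents.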