Answer Retrieval for Questions on Math

[ News ]

  • Submission deadline has been extended to Monday May 11th (11:59pm AOE)
  • Updated metrics from ARQMath 2020 Task 2 available (below)
  • ARQMath 2021 topics available here (under Task 1 & Task 2 directories).
  • Behrooz Mansouri's ECIR 2021 talk is available here.
  • Slides from the Feb. 11 ARQMath 2021 kickoff meeting available here.
  • Tangent-s Task 1 & 2 baselines from ARQMath 2020 available from GitLab.
  • ARQMath article in the December 2020 issue of the SIGIR Forum.

ARQMath is a co-operative evaluation exercise aiming to advance math-aware search and the semantic analysis of mathematical notation and texts.  ARQMath has two tasks, one for answer retrieval, and one for formula retrieval. Datasets, relevance/training data (e.g., qrels and assessor data), and evaluation scripts may be found using the 'Datasets and Tools' button above. Over 70 annotated topics for each task are available. 

ARQMath is being run for the second time at CLEF 2021. An overview paper (including results) from ARQMath 2020 is available, along with participant papers, in the CLEF 2020 working notes.

  • Errata (v2): ARQMath 2020 Task 2 results updated.  Correction using visual formula identifiers produces larger scores for all metrics, without changes in system rankings. An updated version of the ARQMath 2020 paper is provided below, along with an explanation and illustration of the corrections made. 

For ARQMath 2021 there will be new topics, updated data (e.g., new formula indices), and new tools (baseline systems for both tasks will be shared soon). Some updates to the documentation in the provided datasets and tools are also forthcoming.

All are welcome to join the lab's discussions on the ARQMath forum. Announcements and updates are shared primarily through the forum. If you are not already a member, you will need to request to be added.

All those planning to participate in ARQMath 2021 should 1) join the forum, and 2) register for the lab through CLEF.


  • May 5, 2021
    • Submission deadline extended to May 11.
  • May 3, 2021
    • Updated Errata for Task 2 results in ARQMath 2020 overview paper.
  • Mar 31, 2021
    • ARQMath 2021 topics have been released.
  • Feb. 21, 2021
    • Slides from kickoff meeting on Feb. 11 posted.
    • Tangent-s baselines from ARQMath 2020 released.  
  • Feb. 5, 2021
    • ARQMath 2021 kickoff meeting will be held Feb. 11th.
  • Dec. 18, 2020
    • ARQMath 2021 web page released.
    • Participants should register through the CLEF 2021 registration site.
    • Errata for ARQMath 2020 Task 2 nDCG' values released (details above).
    • Updated formula index files and evaluation scripts are available for download.
    • Training data (i.e., relevance data for topics) are available from the 'Datasets and Tools' button above. This includes over 70 annotated topics for each task.

Lab Overview

Task Overview Paper (please cite if you use ARQMath data or tools):
Richard Zanibbi, Douglas W. Oard, Anurag Agarwal, Behrooz Mansouri: Overview of ARQMath 2020 (Updated Working Notes Version): CLEF Lab on Answer Retrieval for Questions on Math. CLEF (Working Notes) 2020.  Note: please see the errata and corrected paper provided above.

Tasks. There are two main tasks (see task pages below for details).
    Task 1 - Find Answers
    Task 2 - Formula Search

Collection. The test collection is built using data from Mathematics Stack Exchange, an online Question Answering (QA) site. There were approximately 1.1 million questions on the forum when the main collection was created. Please see the Guidelines document and README files in the collection for additional details.

Formula Representations. We provide three formula encodings. The first two encodings represent formula appearance by the positions of symbols on writing lines (LaTeX and Presentation MathML), while the third encoding represents formula syntax (i.e., the mathematical operations) by operators, arguments, and the order of operations (Content MathML). Both appearance and math syntax encodings are trees: Symbol Layout Trees (SLTs) for appearance, and Operator Trees (OPTs) for formula operation syntax.
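
As a rough illustration of the three encodings, the sketch below writes the formula x squared in LaTeX, Presentation MathML, and Content MathML, then prints the two MathML trees as nested outlines. The markup here is hand-written for illustration and is not taken from the ARQMath collection, and the `outline` helper is a hypothetical name. The outlines show the raw MathML trees only; they are not the SLT or OPT structures that Tangent-S builds from them.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Hand-written example encodings of the same formula (not from the collection).
LATEX = r"x^{2}"
PRESENTATION_MATHML = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <msup><mi>x</mi><mn>2</mn></msup>
</math>"""
CONTENT_MATHML = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><power/><ci>x</ci><cn>2</cn></apply>
</math>"""


def outline(element):
    """Return a nested outline of an XML tree: tag names plus leaf text."""
    tag = element.tag.replace(MATHML_NS, "")
    text = (element.text or "").strip()
    children = [outline(child) for child in element]
    if children:
        return f"{tag}({', '.join(children)})"
    return f"{tag}:{text}" if text else tag


appearance_tree = outline(ET.fromstring(PRESENTATION_MATHML))
operator_tree = outline(ET.fromstring(CONTENT_MATHML))

print(LATEX)
print(appearance_tree)  # layout: a superscript node holding x and 2
print(operator_tree)    # syntax: the power operator applied to x and 2
```

The Presentation MathML tree records where symbols sit (a base with a superscript), while the Content MathML tree records which operation is applied to which arguments.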

Important Note: For Task 2, formulas are grouped by appearance into 'visually distinct' groups prior to assessment. For ARQMath 2021, we have pre-computed and enumerated these groups, and provided the group id for each formula in the formula index files. The groups are computed using the formulas' Tangent-S Symbol Layout Tree (SLT) representations, falling back to LaTeX strings where SLT construction fails. See the task paper for more details.
  • Our sincerest thanks to Vít Novotný and Deyan Ginev for help with LaTeXML that allowed us to increase the number of formulas available in MathML formats for the 2nd edition of the lab.
  • We also thank Frank Tompa for suggesting including formula appearance groups in the provided index files.
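
The grouping into visually distinct formulas described in the note above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the lab's actual implementation: the function name `assign_visual_groups`, the formula ids, and the SLT placeholder strings are all hypothetical, and the real Tangent-S SLT string format and the layout of the formula index files are not reproduced here.

```python
def assign_visual_groups(formulas):
    """Map each formula id to a visual group id.

    `formulas` is a list of (formula_id, slt_string_or_None, latex) tuples.
    An SLT of None stands for a formula whose SLT construction failed.
    """
    group_ids = {}   # grouping key -> group id
    assignment = {}  # formula id -> group id
    for formula_id, slt, latex in formulas:
        # Group by SLT when available; otherwise fall back to the LaTeX string.
        # The key is tagged so an SLT string never collides with a LaTeX string.
        key = ("slt", slt) if slt is not None else ("latex", latex)
        if key not in group_ids:
            group_ids[key] = len(group_ids) + 1  # enumerate groups from 1
        assignment[formula_id] = group_ids[key]
    return assignment


# Hypothetical formula records; the SLT strings are placeholders only.
formulas = [
    ("q1_f1", "SLT-x-super-2", "x^2"),
    ("a7_f3", "SLT-x-super-2", "x^{2}"),   # different LaTeX, same appearance
    ("a9_f1", "SLT-y-super-2", "y^2"),
    ("a9_f2", None, r"\foo{x}"),           # SLT failed: LaTeX fallback
    ("b2_f5", None, r"\foo{x}"),           # same fallback string, same group
]

groups = assign_visual_groups(formulas)
print(groups)
```

Two formulas written with different LaTeX but the same layout tree land in one group, which is why grouping on the SLT is preferred and the LaTeX string is used only as a fallback.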

Key Dates (2021)

  • January  - Initial community meeting & task discussion (date TBD)
  • April 1 (originally Feb 28) - New evaluation tasks released
  • May 11 (extended from May 7) - Last day to submit runs for ARQMath 2021
  • May 28 - Participant papers due
  • June 11 - 1) Notification of participant paper acceptance, 2) Lab overview (LNCS draft) provided to participants
  • June 25 - Camera Ready LNCS Overview Paper submitted
  • July 2 - Camera Ready Copy of Participant Papers and Overview Paper for CEUR-WS working notes
  • July 12-16 - CEUR-WS preview available for checking by participants and lab organizers


Richard Zanibbi
Department of Computer Science
Rochester Institute of Technology

Behrooz Mansouri
Department of Computer Science
Rochester Institute of Technology

Douglas W. Oard
College of Information Studies, and
Institute for Advanced Computer Studies
University of Maryland, College Park

Anurag Agarwal
School of Mathematical Sciences
Rochester Institute of Technology


ARQMath is supported in part by the National Science Foundation (NSF), USA.

ARQMath 2021 @ CLEF 2021