Answer Retrieval for Questions on Math

Update: Overview and Team Papers now available!
Join us Sept. 25th at CLEF (click for schedule)

ARQMath is a co-operative evaluation exercise intended to advance math-aware search and the semantic analysis of mathematical notation and texts. ARQMath is being run for CLEF 2020.

A description of ARQMath is provided in an ECIR 2020 Task Paper (presentation slides include updates). ARQMath has two tasks, one for answer retrieval, and one for formula retrieval.

Registration is now closed (CLEF registration site), but feel free to join our discussions on the ARQMath forum.

Follow us on Twitter (#ARQMath).


Tasks. There are two main tasks (see the individual task pages for details):
    Task 1 - Find Answers
    Task 2 - Formula Search

Collection. The test collection is built from Mathematics Stack Exchange, an online Question Answering (QA) site. There are approximately 1.1 million questions on the forum. Please see the Guidelines document and README files in the collection for additional details.
Formula Representations. We provide three encodings for formulas. The first two encodings represent formula appearance by the positions of symbols on writing lines (LaTeX and Presentation MathML), while the third encoding represents formula semantics by operators, arguments, and the order of operations (Content MathML). Both appearance and semantic encodings are represented using trees: Symbol Layout Trees (SLTs) for appearance, and Operator Trees (OPTs) for formula semantics.
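As a rough illustration of the three encodings, the sketch below writes the formula x^2 in LaTeX, Presentation MathML, and Content MathML, then parses the two MathML strings into nested tuples. The formula, the helper names (to_tuple, show), and the use of raw MathML trees as stand-ins for SLTs and OPTs are assumptions made for this example; the trees shipped with the collection may be structured differently.

```python
import xml.etree.ElementTree as ET

# Three encodings of the same formula, x^2 (illustrative, not taken from the collection).
LATEX = r"x^2"
PRESENTATION_MML = "<msup><mi>x</mi><mn>2</mn></msup>"       # appearance: base and superscript
CONTENT_MML = "<apply><power/><ci>x</ci><cn>2</cn></apply>"  # semantics: operator and arguments


def to_tuple(node):
    """Convert an XML element into a nested (tag, text, children) tuple."""
    text = (node.text or "").strip()
    return (node.tag, text, [to_tuple(child) for child in node])


def show(tree, depth=0):
    """Return an indented rendering of the tree, one node per line."""
    tag, text, children = tree
    line = "  " * depth + tag + (f" '{text}'" if text else "")
    return "\n".join([line] + [show(child, depth + 1) for child in children])


# Stand-in for a Symbol Layout Tree: symbols placed relative to the writing line.
layout_tree = to_tuple(ET.fromstring(PRESENTATION_MML))
# Stand-in for an Operator Tree: the operator comes first, followed by its arguments.
operator_tree = to_tuple(ET.fromstring(CONTENT_MML))

print(show(layout_tree))
print(show(operator_tree))
```

The layout tree records only that 2 sits in superscript position relative to x, while the operator tree states that the operation is a power applied to x and 2.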

Task Overview Paper (please cite this paper if you use the collection). Mansouri, B., Agarwal, A., Oard, D., and Zanibbi, R. (2020) Finding Old Answers to New Math Questions: The ARQMath Lab at CLEF 2020. Proc. European Conference on Information Retrieval (ECIR), LNCS Vol. 12036, pp. 564-571, Springer.

Some older task materials (Please note: the tasks have evolved):


  • (Aug 14) The updated task overview paper has been posted. Registration is now open for CLEF 2020 (link above); please join us!
  • (June 5) Submission deadline extended to Monday, June 8th, 11:59pm (AoE). This is a hard deadline, and it will not be extended again.
  • (May 11) Data Set and Guidelines Updated.  An issue with formula region tagging has been addressed in an updated version of the dataset (v1.1). The guidelines have been updated to make clear that answer posts must be returned for Task 1, and formulas in question or answer posts (but not in comments or other regions) may be returned for Task 2.
  • (Apr 16) Final push for participants - please join us! Link is above.
  • (Apr 16) Guidelines Document. A guidelines document summarizing task resources, evaluation protocols, and requirements for submitted runs is now available with the data on Google Drive.
  • (Apr 11) Submission Deadline Update: Submissions are now open until June 5th. Please note that this will be an enforced hard deadline.
  • (Apr 11) Topics for Tasks 1 and 2 are now available. Other Data Now Available: the formula index has been updated and now includes Content and Presentation MathML representations; HTML files for showing topics in Tasks 1 and 2, along with threads for the collection, are now available; a document summarizing the evaluation protocol and result formats for Tasks 1 and 2 is also available.
  • (Mar 23) Topics for Task 1 (answer retrieval) and the LaTeX formula index are now available. We will release the formula indices in operator tree (Content MML) and layout tree (Presentation MML) soon. Topics for Task 2 will be released soon.
  • (Mar 11) Other updates:
    • Topic Release: We plan to release topics for Task 1 (answer retrieval) and Task 2 (formula retrieval) by Friday, March 20th.
    • For those seeking training data for formula search/similarity, you may wish to look at the NTCIR-12 Wikipedia Formula Browsing Task.  Relevance judgement data must be obtained with permission from NTCIR (please see details online).
    • An example system that supports math-aware search in Math Stack Exchange is Approach0. Please note that this version of the system does not index the specific collection used for ARQMath.
    • Task 2 goals/evaluation defined. Query formulas will come from question posts in Math Stack Exchange, and formula search topics will provide both a formula and its associated post. Retrieved formulas will be evaluated based on their interpretation within the posts where they appear. For each unique formula in the evaluation pool, we will sample posts where the formula appears, and ask assessors to evaluate the formula's relevance separately for each post. We plan to use the maximum score for each retrieved formula in ranking, but will make all assessments available for future study and for training future systems.
    • Ranking metrics: For both Tasks 1 and 2 we will use nDCG' to rank systems. nDCG' is simply nDCG (normalized Discounted Cumulative Gain) after removing unevaluated hits. This will allow systems that do not participate in ARQMath to be evaluated fairly. For the task write-up, additional metrics will be used to analyze results. 
    • The evaluation framework for both tasks has now been implemented in Turkle (by Harman and Costello at Johns Hopkins). We are in the process of testing the system. Relevance assessments will be made using a four-level Likert scale (0: non-relevant, 1: low relevance, 2: medium relevance, 3: high relevance).
  • (Nov 2) A document providing more detailed plans for the ARQMath lab is now available.
  • (Nov 1) The Google Group/mailing list for the lab has been created (ARQMath Lab). To register for the lab, please join the forum, and then submit an email to the "Registration" topic thread.
  • (Nov 1) Registration information will also be available soon; this will be managed through our Google Group (see link above).
  • Doug Oard (one of the task organizers) and his student Han-Chin Shing are attending CLEF 2019, where they will present a poster summarizing the lab, and look to recruit participants and collect ideas and feedback.
  • The ARQMath CLEF lab proposal is available. While the proposal defines major directions for the task, details are still up for discussion - please do get in touch if you have questions or suggestions.
  • If you are interested in participating in the task, or have questions or feedback, please send email to
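
The Task 2 scoring rule and the nDCG' metric described in the Mar 11 updates above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's evaluation code: the function names, the sample identifiers, the linear gain (the raw 0-3 score), and the log2 rank discount are all assumptions made for the example.

```python
import math
from collections import defaultdict


def max_relevance_per_formula(assessments):
    """Collapse per-post judgements to one score per formula (the maximum)."""
    best = defaultdict(int)
    for (formula_id, _post_id), score in assessments.items():
        best[formula_id] = max(best[formula_id], score)
    return dict(best)


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (rank 1 is undiscounted)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_prime(ranking, qrels):
    """nDCG computed after dropping hits that have no relevance judgement."""
    judged_gains = [qrels[hit] for hit in ranking if hit in qrels]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True))
    return dcg(judged_gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical assessments: (formula id, post id) -> score on the 0-3 scale.
assessments = {("f1", "p1"): 1, ("f1", "p2"): 3, ("f2", "p3"): 2, ("f3", "p4"): 0}
qrels = max_relevance_per_formula(assessments)

# "f9" was never assessed, so it is removed before scoring
# instead of being counted as non-relevant.
run = ["f9", "f1", "f2", "f3"]
score = ndcg_prime(run, qrels)
print(qrels, score)
```

Because the unassessed hit is removed rather than penalized, a run that was not part of the assessment pool is scored only on the hits for which judgements exist.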

Key Dates

  • Nov. 5, 2019: Registration opens (through the ARQMath forum)
  • Dec. 3, 2019 (originally Nov. 1): Release of data and sample queries
  • Mar. 23 (Task 1) and Apr. 11 (Task 2), 2020 (originally Jan. 15): Test queries released
  • June 8, 2020 (originally May 1, then June 5): Submissions close
  • July 5, 2020:  Task paper draft and results shared with participants
  • July 17, 2020: Participant papers due
  • Aug. 28, 2020 (originally Aug. 15): Final lab report submitted
  • Sept. 22-25, 2020: CLEF 2020 (virtual)


Richard Zanibbi
Department of Computer Science
Rochester Institute of Technology

Behrooz Mansouri
Department of Computer Science
Rochester Institute of Technology

Douglas W. Oard
College of Information Studies, and
Institute for Advanced Computer Studies (UMIACS)
University of Maryland, College Park

Wei Zhong
Department of Computer Science
Rochester Institute of Technology

Anurag Agarwal
School of Mathematical Sciences
Rochester Institute of Technology


ARQMath is supported in part by the National Science Foundation (NSF) USA.
