Answer Retrieval for Questions on Math

New: ARQMath dataset released!
Please join us on the ARQMath Forum (Google Groups) 

ARQMath is a co-operative evaluation exercise intended to advance math-aware search and the semantic analysis of mathematical notation and texts. ARQMath is being run for CLEF 2020.

Descriptions of the lab are available in the task overview and our poster from CLEF 2019. A shorter overview is also provided below. If you would like to register or join the lab discussions, please join the ARQMath forum.

Follow us on Twitter (#ARQMath).


  • (Nov 2) A document providing more detailed plans for the ARQMath lab is now available.
  • (Nov 1) The Google Group/mailing list for the lab has been created (ARQMath Lab). To register for the lab, please join the forum, and then submit an email to the "Registration" topic thread.
  • (Nov 1) Registration information will also be available soon; this will be managed through our Google Group (see link above).
  • Doug Oard (one of the task organizers) and his student Han-Chin Shing are attending CLEF 2019, where they will present a poster summarizing the lab, and look to recruit participants and collect ideas and feedback.
  • The ARQMath CLEF lab proposal is available. While the proposal defines major directions for the task, details are still up for discussion; please get in touch if you have questions or suggestions.
  • If you are interested in participating in the task, or have questions or feedback, please send email to


Tasks. There are two main tasks (see the individual task pages for details):
    Task 1 - Find Answers. Given a posted question as a query, search all answer posts and return relevant answers.
    Task 2 - Formula Search. Given a formula taken from a posted question, search formulas in question and answer posts, and return relevant formulas.

Collection. The test collection is being created from Mathematics Stack Exchange, an online Question Answering (QA) site. There are approximately 1.1 million questions on the forum.  

Formula Representations. We will use three encodings to represent formulas. The first two encodings represent formula appearance by the positions of symbols on writing lines (LaTeX and Presentation MathML), while the third encoding represents formula semantics in terms of operators, arguments, and order of operations (Content MathML). Both appearance and semantic encodings are represented using trees: Symbol Layout Trees (SLTs) for appearance, and Operator Trees (OPTs) for formula semantics.
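To make the three encodings concrete, the following minimal sketch shows one small formula, x^2 + y, written by hand in each encoding and then read into simple tree-like structures. It is an illustration only: the strings are not taken from the ARQMath collection, the function names operator_tree and layout_symbols are hypothetical, XML namespaces are omitted, and the flat list of (symbol, position) pairs is a deliberate simplification of a real Symbol Layout Tree.

```python
import xml.etree.ElementTree as ET

# Three hand-written encodings of the same formula, x^2 + y (illustrative only).
LATEX = r"x^2 + y"

PRESENTATION_MATHML = (
    "<math><mrow>"
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "<mo>+</mo>"
    "<mi>y</mi>"
    "</mrow></math>"
)

CONTENT_MATHML = (
    "<math><apply><plus/>"
    "<apply><power/><ci>x</ci><cn>2</cn></apply>"
    "<ci>y</ci>"
    "</apply></math>"
)


def operator_tree(node):
    """Turn a Content MathML element into a nested (operator, arg, ...) tuple."""
    if node.tag == "apply":
        # The first child names the operator; the rest are its arguments.
        operator, *arguments = list(node)
        return (operator.tag, *[operator_tree(arg) for arg in arguments])
    # Leaf node: an identifier (<ci>) or a number (<cn>).
    return node.text


def layout_symbols(node, position="baseline"):
    """List (symbol, position) pairs from Presentation MathML, left to right.

    Only <math>, <mrow>, <msup> and token elements are handled here.
    """
    if node.tag in ("math", "mrow"):
        return [pair for child in node for pair in layout_symbols(child, position)]
    if node.tag == "msup":
        base, exponent = list(node)
        return layout_symbols(base, position) + layout_symbols(exponent, "superscript")
    # Token element such as <mi>, <mn> or <mo>.
    return [(node.text, position)]


# Appearance: which symbols appear, and where they sit relative to the writing line.
appearance = layout_symbols(ET.fromstring(PRESENTATION_MATHML))

# Semantics: operators applied to arguments, with order of operations explicit.
semantics = operator_tree(ET.fromstring(CONTENT_MATHML)[0])

print(LATEX)
print(appearance)
print(semantics)
```

The appearance view records only that "2" is written as a superscript of "x", while the semantic view states that a power operator is applied to x and 2 before the addition.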

Key Dates

  • Nov. 5, 2019: Registration opens (through the ARQMath forum)
  • Dec. 3, 2019 (originally Nov. 1): Release of data and sample queries
  • Jan. 15, 2020: Test queries released
  • May 1, 2020: Submissions close
  • Aug. 15, 2020: Final lab report
  • Sept. 22-25, 2020: CLEF 2020


Richard Zanibbi
Department of Computer Science
Rochester Institute of Technology

Behrooz Mansouri
Department of Computer Science
Rochester Institute of Technology

Douglas W. Oard
College of Information Studies, and
Institute for Advanced Computer Studies (UMIACS)
University of Maryland, College Park

Wei Zhong
Department of Computer Science
Rochester Institute of Technology

Anurag Agarwal
School of Mathematical Sciences
Rochester Institute of Technology


ARQMath is supported in part by the National Science Foundation (NSF) USA.
