Math search for the masses.


  • (Dec) The Alfred P. Sloan Foundation has funded a small grant in support of completing the implementation of MathSeer within CiteSeerX.
  • (Nov) Behrooz Mansouri has successfully defended his PhD dissertation proposal on improving the use of surrounding context in formula search, and related work in math question answer retrieval.
  • (Oct) A cleaner and faster version of SymbolScraper is now available.
  • (Sept) The first version of the MathSeer extraction pipeline, which locates and recognizes formulas in PDF documents, is now available through GitLab, just in time for Ayush K. Shah and Abhisek Dey's talk and poster presentations at GREC 2021 and ICDAR 2021.
  • (June - July) The ARQMath-2 math-aware search task at CLEF 2021 is running! Our thanks to Behrooz Mansouri, who has led the assessment effort and managed the data collection and associated tools.
  • (May) MathDeck was demonstrated in an interactivity session at CHI 2021. Congratulations to Yancarlos Diaz, Gavin Nishizawa, and Behrooz Mansouri, the student authors on the paper.
  • (May) Yancarlos Diaz and Robin Avenoso have successfully defended their MSc theses! Their thesis documents are available here.
  • (May) Shaurya Rohatgi is completing an internship at Allen AI this summer.
  • (April) Behrooz Mansouri's paper on the first learning-to-rank model for formula search has been accepted for publication at SIGIR 2021, and he will be participating in the SIGIR 2021 Doctoral Consortium.
  • (April) Ayush Kumar Shah and Abhisek Dey's paper on the MathSeer formula extraction and evaluation pipeline has been accepted for publication at ICDAR 2021.
  • (April) Prof. Zanibbi gave a seminar talk on the MathSeer project in the Boise State Computing PhD seminar series.
  • (Feb) ARQMath was presented at ECIR 2021 by Behrooz Mansouri (video).

MathDeck @ CHI 2021

An updated version of our math-aware search interface with multimodal LaTeX editing (MathDeck) was demonstrated at CHI 2021 (demo video).
Note: MathDeck works best with Google Chrome. 

ARQMath-3 / 2022 

The third Answer Retrieval for Questions on Math (ARQMath) Lab will be run for CLEF 2022 (overview video). Visit the task web page and Twitter page for more information. ARQMath includes tasks for mathematical question answer retrieval and math formula retrieval, and possibly a pilot task on open-domain question answering.


We are creating a system to make finding mathematical information easier. We want students of all ages and the general public to be able to quickly look up unfamiliar symbols, and see how formulas are defined, used, and analyzed in online resources like Wikipedia, Math StackExchange, and technical document collections such as CiteSeerX.

These technologies will also be useful for math experts, and for exploring how math is used within and across disciplines. For example, a mathematician studying graph theory could use our system to find related applications in physics, ecology, and social networks.  

Research Goals

To be successful, we need to create innovative search engines, interfaces, and algorithms for extracting and recognizing math. Here are the research topics we are currently working on: 
  • How people search for math online
  • Search interfaces that make it easy to author formulas and include math in queries, and that present search results so they are easily read, organized, and reused
  • Indexing and search techniques for individual formulas
  • Indexing and search techniques for document collections that contain both text and math, with support for queries that combine keywords and formulas
  • Fast and accurate recognition of math in handwriting and images
  • Fast and accurate extraction of math from web pages and technical documents (including PDF files, which do not represent the locations or content of formulas)

Related Activities

  • ARQMath. The Answer Retrieval for Questions on Math task is being run for a third time as part of CLEF 2022. See the ARQMath web page and Twitter page for updates. Registration for ARQMath-3 is open.
  • CROHME + TFD 2019 Competition. In 2019 we again ran an international competition organized around data and evaluation tools for advancing the state of the art in handwritten formula recognition and in detecting formulas in document images. Mahshad Mahdavi and Richard Zanibbi co-organized the competition along with Harold Mouchère (Univ. Nantes, France) and Utpal Garain (ISI, India). The ICDAR paper on the competition is available.

The MathSeer Team

MathSeer is being developed through a collaboration of students and faculty at the Document and Pattern Recognition Lab at the Rochester Institute of Technology and the Intelligent Information Systems Research Laboratory at Penn State, along with faculty from the RIT Math department and the Computational Linguistics and Information Processing Lab at the University of Maryland, College Park.

Our multi-disciplinary team includes recognized experts in Information Retrieval, Pattern Recognition, Mathematics, and Math Education. Additional information may be found on the Members page.  


The MathSeer project is made possible through research grants from the Alfred P. Sloan Foundation and the National Science Foundation (USA). All materials on this website reflect the work and opinions of the project team, and not those of the Alfred P. Sloan Foundation or the NSF.
