I am a Professor of Computer Science at RIT, where I direct the Document and Pattern Recognition Lab.  I hold a PhD and Master's in Computer Science, a BA with a minor in Computer Science, and a Bachelor of Music degree, all from Queen's University, Canada.

My research interests include information retrieval, document recognition, pattern recognition, and machine learning. I was a Program Co-Chair for ICDAR 2023, and I previously chaired the ICFHR 2018, DRR 2012, and DRR 2013 conferences. I also serve on program committees for information retrieval conferences (e.g., SIGIR and the new SIGIR-AP conference).

At RIT I teach courses on Information Retrieval (grad/undergrad) and Machine Learning (undergrad). I am also the head of the AI Cluster within the Computer Science department. 

Please click on the links above for more information regarding my research, teaching, software and data from the dprl, and resources for students. Some recent news is included below.

Group Photo, ICDAR 2023 (August, San Jose). My thanks to everyone who helped make this a fun and productive meeting. It was an honor serving on the Program Committee alongside Gernot Fink, Rajiv Jain, and Koichi Kise.


News

  • (Jan 2024) Three students from my IR class, Ben Giacalone, Greg Paiement, and Quinn Tucker, have published an interesting paper on the role of the [MASK] tokens in the ColBERT retrieval model, which will be presented at the European Conference on Information Retrieval (ECIR) this March in Glasgow, Scotland.
  • (Nov, 2023) I have posted a small Python debugging library on GitLab that was created for my classes and the dprl. The library is organized around pretty-printed debug checks/tests with descriptive messages. I've called it the Message-Oriented Debugging Library for Python (msg_debug). It avoids the need to repeatedly add/remove print, input, and assert statements to check values and types, and provides functions to record and report execution times. This is useful when program requirements keep changing and bugs abound.
  • (Nov, 2023) A paper describing the dprl's ChemScraper parser for molecular diagrams in PDF drawing instructions ('born-digital') is available on arXiv here. The system can also generate annotated training data for visual parsers that recognize raster images (i.e., pixel-based, such as PNG). A link to associated code is provided in a footnote. Congratulations to Ayush Kumar Shah, Bryan Manrique Amador, Abhisek Dey, Ming Creekmore, and Blake Ocampo (PhD candidate, UIUC Dept. of Chemistry) on a job well done.
  • (Sept, 2023) Congratulations to former dprl PhD student Wei Zhong, who successfully defended his dissertation on math-aware search at the University of Waterloo (advisor: Jimmy Lin). Wei had to switch schools and countries due to visa restrictions during COVID. This past summer he also worked as a research intern at Microsoft Research.
  • (Aug 24, 2023) I gave the keynote talk at GREC 2023 in San Jose, which was held as part of ICDAR 2023. The talk was an overview of MathDeck and related work in math formula recognition and search. My thanks to everyone who attended; it was a very good experience!
  • (July 11, 2023) A demonstration paper for the MathDeck system searching text and formulas in PDF files from the ACL Anthology will appear at SIGIR 2023. Bryan Amador, Matt Langsenkamp, Abhisek Dey, and Ayush Kumar Shah created the demo. A number of past dprl students made important contributions as part of the MathSeer project. Our poster and demonstration video illustrate math+text search, formula editing, and formula reuse and annotation in MathDeck.
  • (June 27, 2023) The first (rough, in-progress) version of the ChemScraper tool developed by the dprl, Denmark Lab, and NCSA has been released. There are plans to regularly update the tool moving forward; look for updates in the coming months, and more broadly in the larger AlphaSynthesis platform that ChemScraper is a part of. This work is being done through the MMLI, an NSF-funded AI Center. 
  • (April 19, 2023) Ayush Kumar Shah published a paper on an improved math formula parsing model using line-of-sight graphs: the Line-of-sight with Graph Attention Parser (LGAP), previously named QD-GGA. The paper will be presented at ICDAR 2023.
  • (Oct 27, 2022) I gave a talk for the Topos Institute on Mathematical Information Retrieval. The Topos colloquium series page, with a YouTube link and slides from the talk, is online here: https://topos.site/topos-colloquium. A direct link to the YouTube video is located here.

Richard Zanibbi's Home Page (RIT)