SymbolScraper (by Ritvik Joshi, Parag Mali, Puneeth Kukkadapu, and Mahshad Mahdavi, 2019). An extension of Apache PDFBox that reports character and symbol codes along with their precise bounding box locations in PDF files.
Formula Search Engines
Approach0 (Wei Zhong, 2019). Search engine that uses operator trees representing the mathematical operations in a formula for retrieval. Retrieval is done using paths of varying lengths from the leaves (operands) to the root of an operator tree.
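The leaf-to-root path idea can be illustrated with a minimal Python sketch. The `Node` class, the function names, and the choice to truncate each path at several lengths are assumptions made for illustration only; this is not Approach0's actual index format or code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical operator-tree node: an operator or an operand label."""
    label: str
    children: List["Node"] = field(default_factory=list)


def leaf_root_paths(root: Node) -> List[Tuple[str, ...]]:
    """Return one path per leaf, listed from the leaf (operand) up to the root."""
    paths: List[Tuple[str, ...]] = []

    def walk(node: Node, ancestors: List[str]) -> None:
        trail = ancestors + [node.label]          # root-first labels down to node
        if not node.children:                     # a leaf is an operand
            paths.append(tuple(reversed(trail)))  # flip to leaf-first order
        for child in node.children:
            walk(child, trail)

    walk(root, [])
    return paths


def path_prefixes(path: Tuple[str, ...], min_len: int = 2) -> List[Tuple[str, ...]]:
    """Truncate a leaf-to-root path at every length from min_len up to the full path."""
    return [path[:n] for n in range(min_len, len(path) + 1)]


# Operator tree for a + b * c
tree = Node("add", [Node("a"), Node("times", [Node("b"), Node("c")])])
paths = leaf_root_paths(tree)
print(paths)
print(path_prefixes(paths[1]))
```

Here each operand contributes one path, and the shorter prefixes stand in for "paths of varying lengths" so that a query can match part of a larger formula.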
Tangent-v (ECIR 2019 version) (Kenny Davila, 2019). A version of Tangent-v created for visual formula search in .png (raster) and .pdf (vector) formats. Search results from Kenny Davila and Ritvik Joshi's ECIR 2019 paper are included in the package.
Tangent-v (Kenny Davila, 2018). A visual formula (and more generally, graphics) search engine. The system searches for formulas based on the appearance of their constituent symbols and the symbols' relative spatial positions in line-of-sight graphs. Tangent-v has been successfully applied to formulas in raster images and vector images (e.g., PDF), and to searching for formulas in lecture videos using rendered LaTeX formula queries.
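As a rough illustration of a line-of-sight graph, the Python sketch below connects two symbols when the straight segment between their bounding-box centers is not blocked by any other symbol's box. The center-to-center test, the box format `(x0, y0, x1, y1)`, and all function names are simplifying assumptions for this example; Tangent-v's own visibility computation may differ.

```python
from itertools import combinations


def center(box):
    """Center point of an axis-aligned box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def segment_hits_box(p, q, box):
    """Liang-Barsky style test: does the segment p -> q touch the box?"""
    x0, y0, x1, y1 = box
    dx, dy = q[0] - p[0], q[1] - p[1]
    t_min, t_max = 0.0, 1.0
    for delta, start, lo, hi in ((dx, p[0], x0, x1), (dy, p[1], y0, y1)):
        if delta == 0:
            # Segment is parallel to this axis: it must already lie in the slab.
            if start < lo or start > hi:
                return False
        else:
            t_a, t_b = (lo - start) / delta, (hi - start) / delta
            t_min = max(t_min, min(t_a, t_b))
            t_max = min(t_max, max(t_a, t_b))
            if t_min > t_max:
                return False
    return True


def line_of_sight_edges(symbols):
    """symbols maps a symbol name to its box; returns the unblocked pairs."""
    edges = []
    for a, b in combinations(symbols, 2):
        p, q = center(symbols[a]), center(symbols[b])
        blocked = any(
            segment_hits_box(p, q, box)
            for name, box in symbols.items()
            if name not in (a, b)
        )
        if not blocked:
            edges.append((a, b))
    return edges


# Hypothetical boxes for the symbols of x^2 + y (image coordinates, y grows downward)
symbols = {
    "x": (0, 0, 10, 10),
    "2": (11, -8, 17, -2),
    "+": (20, 0, 30, 10),
    "y": (40, 0, 50, 10),
}
edges = line_of_sight_edges(symbols)
print(edges)
```

In this toy layout, `x` and `y` are not connected because the `+` sits between them, while each symbol is connected to its visible neighbors.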
Tangent-s (Kenny Davila, Richard Zanibbi, Andrew Kane, and Frank Wm. Tompa, 2017). Search engine using a combination of operator trees, which represent formula semantics, and layout trees, which represent formula appearance. Both formula appearance and semantics are queried using pairs of symbols and their relative paths in each type of tree; the appearance and semantics search results are then combined before final results are returned.
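The symbol-pair idea can be sketched in Python as follows. The tuple-based tree encoding, the edge labels (`n` for the next symbol on the baseline, `a` for above/superscript), and the optional `window` limit on path length are assumptions chosen for this example rather than Tangent-s's actual data structures.

```python
def descendants(tree, prefix=""):
    """Yield (symbol, path) for every node strictly below the root of tree.

    A tree is a tuple (symbol, [(edge_label, subtree), ...]).
    """
    _, edges = tree
    for label, child in edges:
        path = prefix + label
        yield child[0], path
        yield from descendants(child, path)


def symbol_pairs(tree, window=None):
    """Collect (ancestor, descendant, relative_path) tuples for the whole tree."""
    pairs = []
    for sym, path in descendants(tree):
        # Optionally keep only pairs whose symbols are at most `window` edges apart.
        if window is None or len(path) <= window:
            pairs.append((tree[0], sym, path))
    for _, child in tree[1]:
        pairs.extend(symbol_pairs(child, window))
    return pairs


# Layout tree for x^2 + y: x has superscript 2, and is followed by + and then y
layout_tree = ("x", [("a", ("2", [])), ("n", ("+", [("n", ("y", []))]))])

all_pairs = symbol_pairs(layout_tree)
near_pairs = symbol_pairs(layout_tree, window=1)
print(all_pairs)
print(near_pairs)
```

The same function could be run over an operator tree encoded the same way, giving one set of tuples per tree type; how the two result lists are scored and merged is not shown here.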
Search Interfaces and Search Engines
MathSeer Formula Editor and Search Interface (Gavin Nishizawa, Yancarlos Diaz, and Wei Zhong). The front-end for the MathSeer system, with support for math input using handwriting, formula images, and LaTeX. The interface supports easy saving and re-use of formulas and parts of formulas. (First release planned for Summer 2019)
A number of formula search engines are currently in development...stay tuned.
Math Formula Parser (by Mahshad Mahdavi, Michael Condon, and Kenny Davila). Python-based system with modules for recognizing formulas in handwritten strokes or images. (Release planned for Summer 2019)