Tangent-S Math Formula Search Engine Paper to Appear at SIGIR 2017
PhD student Kenny Davila and Dr. Richard Zanibbi from the DPRL Lab
(http://www.cs.rit.edu/~dprl) had a short paper describing the Tangent-S math
formula search engine accepted for poster presentation at the 40th
International ACM SIGIR Conference on Research and Development in Information
Retrieval in Tokyo, Japan (http://sigir.org/sigir2017). The conference
received a record number of 398 short-paper submissions and accepted only 121
(30%) of them.
Tangent-S searches math formulae by appearance, semantics, or a combination of
the two. Search is performed using a three-layer model: the first layer matches formulae by
pairs of symbols, the second layer re-ranks top candidates by best query
formula match, and the third layer re-ranks again using linear regression over
metrics for the best query formula match. Visual and semantic search results
can be combined at the third layer, using the query match metrics from both representations.
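
The three-layer cascade described above can be illustrated with a small toy sketch. This is not the Tangent-S implementation: formulae are treated as space-separated token strings instead of symbol layout or operator trees, the function names (symbol_pairs, pair_score, match_metrics, search) are hypothetical, the match metrics are simple stand-ins, and the layer-three weights are made-up values standing in for coefficients that a linear regression would learn from relevance judgments.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def symbol_pairs(formula):
    """Toy stand-in for symbol-pair extraction: every in-order pair of tokens."""
    return Counter(combinations(formula.split(), 2))


def pair_score(query, candidate):
    """Layer 1 score: Dice coefficient over the two multisets of symbol pairs."""
    q, c = symbol_pairs(query), symbol_pairs(candidate)
    total = sum(q.values()) + sum(c.values())
    overlap = sum((q & c).values())  # multiset intersection
    return 2 * overlap / total if total else 0.0


def match_metrics(query, candidate):
    """Toy match metrics: fraction of query and of candidate tokens aligned."""
    q, c = query.split(), candidate.split()
    if not q or not c:
        return 0.0, 0.0
    blocks = SequenceMatcher(None, q, c, autojunk=False).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return matched / len(q), matched / len(c)


def search(query, collection, weights=(0.0, 0.7, 0.3), top_k=3):
    """Rank formulae with a three-stage cascade (assumed weights, toy metrics)."""
    # Layer 1: cheap retrieval of the top_k candidates by symbol-pair overlap.
    candidates = sorted(
        collection, key=lambda f: pair_score(query, f), reverse=True
    )[:top_k]

    # Layer 2: re-rank candidates by a single best-match score
    # (harmonic mean of the two match metrics).
    def layer2(formula):
        recall, precision = match_metrics(query, formula)
        denom = recall + precision
        return 2 * recall * precision / denom if denom else 0.0

    candidates.sort(key=layer2, reverse=True)

    # Layer 3: re-rank again with a linear model over the match metrics.
    # In a real system the weights would come from a fitted regression.
    bias, w_recall, w_precision = weights

    def layer3(formula):
        recall, precision = match_metrics(query, formula)
        return bias + w_recall * recall + w_precision * precision

    return sorted(candidates, key=layer3, reverse=True)
```

Calling search("x ^ 2 + y ^ 2", formulas) on a small list of token strings returns at most top_k formulae ordered by the layer-three score. Combining appearance and semantic results would, under the same assumptions, amount to concatenating the metrics from both representations before applying the linear model.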
For the NTCIR-12 Wikipedia Formula Browsing task benchmark, each layer
increased ranking quality, with the combination of appearance and semantic
representations working best. The relatively simple model used in Tangent-S produces
high-quality search results and can search large formula collections in real time.