Task 1. Online Handwritten Formula Recognition

For the traditional task in CROHME, participants must convert a list of handwritten strokes, captured as polylines from a tablet or similar device, into a Symbol Layout Tree (SLT). The SLT captures the segmentation of strokes into symbols, the classification of those symbols, and the spatial relationships between them. SLTs are represented using labeled directed graphs, so that all segmentation, classification, and relationship (parsing) errors can be automatically identified and compiled using tools developed for CROHME (CROHMELib and LgEval). Participating systems are ranked based on the number of correctly recognized formulas (expression rate).
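As a rough illustration of the ranking metric, the sketch below computes an expression rate over a set of formulas. It assumes a simplified, hypothetical in-memory label graph (a dictionary of primitive labels plus a dictionary of labeled edges); this is not the .lg file format and not how LgEval is implemented. A formula counts as correct only if every node and edge label matches the reference.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a labeled directed graph (NOT the .lg format):
#   nodes: primitive (stroke) id -> symbol label
#   edges: (parent id, child id) -> relationship label
LabelGraph = Tuple[Dict[str, str], Dict[Tuple[str, str], str]]


def expression_rate(predicted: List[LabelGraph],
                    reference: List[LabelGraph]) -> float:
    """Fraction of formulas whose predicted graph equals the reference exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    # A formula is correct only if all node labels and all edge labels agree.
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)


# Illustrative data: "x^2" written with two strokes for x and one stroke for 2.
truth_x2: LabelGraph = (
    {"s1": "x", "s2": "x", "s3": "2"},
    {("s1", "s3"): "Sup", ("s2", "s3"): "Sup"},
)
# Same strokes, but the superscript was misread as a right-adjacent symbol.
wrong_x2: LabelGraph = (
    {"s1": "x", "s2": "x", "s3": "2"},
    {("s1", "s3"): "Right", ("s2", "s3"): "Right"},
)

rate = expression_rate([truth_x2, wrong_x2], [truth_x2, truth_x2])
```

Here one of the two formulas matches its reference exactly, so `rate` is 0.5. The relationship labels "Sup" and "Right" are used for illustration only.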

Subtask a (isolated math symbol recognition) uses the isolated symbols extracted from the full expression data set. Symbols (stroke groups) in the groundtruth annotations have unique identifiers that allow the location of each symbol in a formula to be identified. In subtask b, participants parse formulas from provided symbols, using the same set of training and test expressions. No awards will be given for the subtasks.

Task 2. Offline Handwritten Formula Recognition

For offline recognition of handwritten inputs, we will render images from the (x,y) points in the CROHME InkML files. As in the previous task, for a given test image, participating systems must produce one .lg file. Note that because primitive-level information (connected components) is not provided, we evaluate systems based on the correct symbols and the correct relations between symbols (symbolic evaluation). Systems can produce a LaTeX string or a Presentation MathML tree as output. LaTeX and MathML output should be converted to symbolic LG for evaluation using the provided tools (tex2symlg and mml2symlg). There is also a tool to convert .lg files to symbolic label graphs (lg2symlg) for interested participants (although they will be defining their own ‘stroke’ primitives in that case).

For evaluation, we will use the same evaluation tools as in the online recognition task, except that the new symLG format ignores the segmentation into connected components (CCs) and the correspondence of CCs to symbols. Again, participants will be ranked by the expression rate of their system.

Task 3. Detection of Formulas in Document Pages

In this task, for a given document page, participating systems identify the location of formulas using bounding boxes. Evaluation will be done by calculating the intersection over union (IoU) with the groundtruth annotations. We will use thresholds of 50% and 75% to observe coarse and fine detection of formula regions. Participants will also have the option to use character-level information, but they must still submit final math regions for the IoU calculation, where each region is defined by the characters that the system detected as math characters. This reflects how detection of math regions in born-digital documents (e.g., PDFs generated using a word processor) would be performed when characters are available.
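The IoU measure described above can be sketched as follows. This is a minimal illustration rather than the official evaluation code: the (x_min, y_min, x_max, y_max) box layout, the function names, and the choice to take the tight enclosing box of the detected math characters as the region are all assumptions made for the example.

```python
from typing import Iterable, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in page coordinates.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes have zero area."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def enclosing_box(char_boxes: Iterable[Box]) -> Box:
    """Tight box around a group of character boxes (assumed region definition)."""
    boxes = list(char_boxes)
    if not boxes:
        raise ValueError("at least one character box is required")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


truth = (0.0, 0.0, 10.0, 10.0)
detected = (0.0, 0.0, 10.0, 5.0)  # covers only the top half of the formula
score = iou(truth, detected)
region = enclosing_box([(1.0, 2.0, 3.0, 4.0), (2.0, 1.0, 5.0, 3.0)])
```

In this example `score` is 0.5, so the detection would count at the 50% threshold but not at 75% (assuming the threshold comparison is inclusive).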

We will give two awards: 1) for the system with the best offline performance (F-measure on math boxes at an IoU threshold of 0.75), and 2) for the system with the best performance from characters (F-measure for formula detection using characters as input).
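To make the award criterion concrete, here is a hedged sketch of an F-measure over detected math boxes at a given IoU threshold. The matching strategy (greedy, one-to-one, each detection paired with its best unmatched groundtruth box), the inclusive threshold comparison, and all names are assumptions for illustration; the official evaluation may match boxes differently.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def f_measure(detections: List[Box], truths: List[Box],
              threshold: float = 0.75) -> float:
    """F-measure with greedy one-to-one matching (an assumed strategy)."""
    matched = set()  # indices of groundtruth boxes already claimed
    true_positives = 0
    for det in detections:
        best_idx, best_score = None, 0.0
        for idx, gt in enumerate(truths):
            if idx in matched:
                continue
            score = iou(det, gt)
            if score > best_score:
                best_idx, best_score = idx, score
        # Count the detection only if its best match reaches the threshold.
        if best_idx is not None and best_score >= threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


truths = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
detections = [
    (0.0, 0.0, 10.0, 10.0),      # exact match
    (50.0, 50.0, 60.0, 60.0),    # false positive
    (20.0, 20.0, 30.0, 25.0),    # half of the second formula, IoU 0.5
]
fine = f_measure(detections, truths, threshold=0.75)
coarse = f_measure(detections, truths, threshold=0.5)
```

With these illustrative boxes, the partial detection is rejected at the 0.75 threshold (precision 1/3, recall 1/2) and accepted at 0.5 (precision 2/3, recall 1), which shows how the two thresholds separate fine from coarse detection.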