-----------------------------------------------------------------------
  README - InftyMCCDB-2 Data Set
  Document and Pattern Recognition Lab, RIT, USA

  Created by Michael Condon and Richard Zanibbi, 2017 
  README by Mahshad Mahdavi (2019) and Richard Zanibbi (2021)

  See Michael Condon's MSc thesis for more details: 
  https://www.cs.rit.edu/~dprl/files/Condon_MScThesis_2017.pdf

-----------------------------------------------------------------------

The InftyMCCDB-2 dataset is a modified version of InftyCDB-2, which contains
mathematical expressions from scanned article pages. The original dataset
was created by Dr. Masakazu Suzuki's group at Kyushu University (Japan).
We thank Dr. Suzuki for permitting us to distribute the data set.

The original dataset has 21,056 math expressions. We removed formulas with
matrices and grids, leaving 19,381 formulas. 

The dataset includes 213 symbol classes and is split into two sets, training
(12,551 images) and testing (6,830 images), with approximately the same
distribution of symbol classes and relation classes in each. The expressions
range in size from a single symbol to more than 75 symbols, with an average of
7.33 symbols per expression.

The original InftyCDB-2 provides ground truth at the symbol level. We extracted
the bounding boxes of connected components and generated new ground truth for
each image using a labeled adjacency matrix ('label graph') representation.
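To make the labeled adjacency matrix idea concrete, here is a minimal sketch,
not the tooling used to build the dataset. It assumes nodes are connected
components (primitives), with each primitive's symbol class on the diagonal.
Off-diagonal cells hold a merge label for primitives of the same symbol, or a
spatial relation label between primitives of related symbols. The labels "*"
(merge) and "_" (no edge), the relation name "Right", and the function name
label_adjacency_matrix are all hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

NO_EDGE = "_"   # hypothetical label for primitive pairs with no relation
MERGE = "*"     # hypothetical label joining primitives of the same symbol


def label_adjacency_matrix(
    prim_labels: List[str],
    symbols: Dict[str, List[int]],
    relations: List[Tuple[str, str, str]],
) -> List[List[str]]:
    """Build an n x n labeled adjacency matrix over n primitives.

    prim_labels[i] is the symbol class of primitive i (stored on the diagonal),
    symbols maps a symbol id to the primitive indices it contains, and
    relations holds (parent symbol id, child symbol id, relation label).
    """
    n = len(prim_labels)
    matrix = [[NO_EDGE] * n for _ in range(n)]
    # Diagonal: node (primitive) labels.
    for i, label in enumerate(prim_labels):
        matrix[i][i] = label
    # Off-diagonal within a symbol: merge edges encode segmentation.
    for prims in symbols.values():
        for a in prims:
            for b in prims:
                if a != b:
                    matrix[a][b] = MERGE
    # Off-diagonal between symbols: copy the relation label to every
    # (parent primitive, child primitive) pair.
    for parent, child, rel in relations:
        for a in symbols[parent]:
            for b in symbols[child]:
                matrix[a][b] = rel
    return matrix


# Hypothetical formula "x = 1": the "=" sign has two connected components.
m = label_adjacency_matrix(
    ["x", "=", "=", "1"],
    {"x_1": [0], "=_1": [1, 2], "1_1": [3]},
    [("x_1", "=_1", "Right"), ("=_1", "1_1", "Right")],
)
for row in m:
    print(row)
```

Because relations are stored between primitives, a multi-component symbol
such as "=" repeats its relation label on each of its component rows or
columns, while the merge cells record which components form one symbol.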

Label graph (.lg) ground truth files are provided, along with a .png image for
each expression (i.e., the input files).


Data:
  ids/                   lists of integer formula identifiers (all formulas, training set, test set)
  png/                   directory of PNG formula images 
  train/                 training formulas in Object-Relation .lg format (12551)
  test/                  testing formulas in Object-Relation .lg format (6830)
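Pairing each ground truth file with its input image is a common first step. The
sketch below assumes that a formula's .lg file and its .png image share the same
file stem (e.g. 12345.lg and 12345.png); that naming scheme is an assumption, so
check the actual file names first. The functions pair_by_stem and pair_split are
hypothetical helpers, and the directory names follow the listing above.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def pair_by_stem(
    lg_names: Iterable[str], png_names: Iterable[str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Match .lg files to .png files with the same stem (assumed naming).

    Returns (pairs, unmatched_lg), both in sorted .lg order.
    """
    png_by_stem: Dict[str, str] = {Path(p).stem: p for p in png_names}
    pairs: List[Tuple[str, str]] = []
    unmatched: List[str] = []
    for lg in sorted(lg_names):
        stem = Path(lg).stem
        if stem in png_by_stem:
            pairs.append((lg, png_by_stem[stem]))
        else:
            unmatched.append(lg)
    return pairs, unmatched


def pair_split(root: str, split: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair <root>/<split>/*.lg with <root>/png/*.png, e.g. split="train"."""
    base = Path(root)
    lg_names = [p.name for p in (base / split).glob("*.lg")]
    png_names = [p.name for p in (base / "png").glob("*.png")]
    return pair_by_stem(lg_names, png_names)
```

If the naming assumption holds, pair_split(root, "train") should return 12,551
pairs and no unmatched files; a non-empty unmatched list suggests the stems
differ and the matching rule needs adjusting.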


Label Graphs and Evaluation Tools:

For a description of the label graph (.lg) format, and to evaluate .lg results,
use the LgEval library, available here:
  
  https://gitlab.com/dprl/lgeval
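LgEval's documentation is the authoritative description of the format. As a
rough, unofficial sketch, the reader below assumes Object-Relation lines of the
form "O, objId, class, weight, prim1, prim2, ..." and "R, parentId, childId,
relation, weight", treats "EO" as an alternative edge tag (an assumption), skips
lines starting with "#", and assumes labels contain no raw commas. The function
name parse_or_lg, the sample formula, and the "Right" relation label are
hypothetical.

```python
from typing import Dict, List, Tuple

Objects = Dict[str, Tuple[str, List[str]]]  # objId -> (class, primitive ids)
Relations = List[Tuple[str, str, str]]      # (parentId, childId, relation)


def parse_or_lg(text: str) -> Tuple[Objects, Relations]:
    """Parse Object-Relation label graph text (assumed line layout)."""
    objects: Objects = {}
    relations: Relations = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = [f.strip() for f in line.split(",")]
        kind = fields[0]
        if kind == "O" and len(fields) >= 4:
            # fields[3] is the weight; remaining fields list primitive ids.
            prims = [p for p in fields[4:] if p]
            objects[fields[1]] = (fields[2], prims)
        elif kind in ("R", "EO") and len(fields) >= 4:
            relations.append((fields[1], fields[2], fields[3]))
    return objects, relations


sample = """# IUD, example
O, x_1, x, 1.0, 0
O, =_1, =, 1.0, 1, 2
O, 1_1, 1, 1.0, 3
R, x_1, =_1, Right, 1.0
R, =_1, 1_1, Right, 1.0
"""
objects, relations = parse_or_lg(sample)
print(len(objects), "symbols,", len(relations), "relations")
```

Counting the objects in each ground truth file gives a per-formula symbol
count, which can be compared against the size statistics given earlier in
this README.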
