Mansouri, B., Zanibbi, R., Oard, D.W., and Agarwal, A.

Visually distinct formula groups are computed using their Tangent-S Symbol Layout Tree (SLT) representations, falling back to LaTeX strings where SLT construction fails. See the previous task overview papers for more details. Our thanks to Frank Tompa for suggesting that formula appearance groups be included in the provided index files.
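The fallback rule described above can be illustrated with a small sketch. This is not the Tangent-S implementation: it assumes each SLT has already been serialized to a string (or is `None` where construction failed), and the function name `assign_visual_groups` and the tuple layout are hypothetical.

```python
def assign_visual_groups(formulas):
    """Assign a group number to each formula.

    `formulas` is an iterable of (formula_id, slt_string_or_None, latex)
    tuples. This layout is an assumption made for illustration only.
    Formulas with the same SLT string share a group; when no SLT is
    available, the raw LaTeX string is used as the grouping key instead.
    """
    group_of_key = {}  # grouping key -> group number
    groups = {}        # formula id -> group number
    for formula_id, slt, latex in formulas:
        # Tag the key with its kind so that an SLT string and a LaTeX
        # string that happen to be equal never collide.
        key = ("slt", slt) if slt is not None else ("latex", latex)
        # New keys receive the next unused group number, starting at 1.
        group = group_of_key.setdefault(key, len(group_of_key) + 1)
        groups[formula_id] = group
    return groups
```

Tagging the key with its source keeps SLT-based and LaTeX-based groups separate, which seems the safer choice when the two representations are not directly comparable.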

<span class="math-container" id="844">...</span>
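As a rough illustration of how spans of this form might be read back out of a post body, the sketch below uses Python's standard `html.parser`. The class name `MathSpanExtractor` and the helper `extract_formulas` are hypothetical, and the sketch assumes only what the example above shows: a `span` whose class is `math-container` and whose `id` is the formula identifier.

```python
from html.parser import HTMLParser


class MathSpanExtractor(HTMLParser):
    """Collects (formula id, inner text) pairs from math-container spans."""

    def __init__(self):
        super().__init__()
        self.formulas = []        # list of (id, text) tuples
        self._current_id = None   # id of the span being read, if any
        self._depth = 0           # nesting depth of spans inside it
        self._buffer = []         # text pieces seen inside the span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._current_id is not None:
            # A nested span inside a formula span: just track depth.
            self._depth += 1
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "math-container" in classes and attr.get("id"):
            self._current_id = attr["id"]
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag != "span" or self._current_id is None:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buffer).strip()
            self.formulas.append((self._current_id, text))
            self._current_id = None

    def handle_data(self, data):
        if self._current_id is not None:
            self._buffer.append(data)


def extract_formulas(html_text):
    """Return a list of (formula id, span text) pairs found in html_text."""
    parser = MathSpanExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.formulas
```

For example, `extract_formulas('<p>Let <span class="math-container" id="844">$x^2$</span> be given.</p>')` returns the single pair `("844", "$x^2$")`.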

The ARQMath collection itself is provided as a set of XML files from the Internet Archive, along with **HTML thread files** and **formula index files**. In addition, two tools were created to generate viewable question threads and formula index files.

**Formula index** - LaTeX formulas are assigned identifiers, and a separate TSV file with a formula index is produced. In a second step, formulas are converted from LaTeX to Presentation MathML and Content MathML, and new TSV formula index files are created for the MathML representations.

**HTML question thread files** - Viewable question threads generated from the XML post data. These are intended for use by participants for study and checking, and are also used during assessment.
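A TSV formula index of this kind could be loaded along the following lines. This is only a sketch: the column names `id` and `formula` are assumptions made for illustration (the actual header of the index files is not given here), and `load_formula_index` is a hypothetical helper.

```python
import csv


def load_formula_index(lines, id_column="id", formula_column="formula"):
    """Map formula identifiers to formula strings.

    `lines` is any iterable of text lines (for example an open file)
    whose first row is a tab-separated header. The default column
    names are assumptions, not the documented layout of the index.
    """
    # QUOTE_NONE keeps quote characters inside LaTeX or MathML literal
    # instead of treating them as CSV field delimiters.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    index = {}
    for row in reader:
        index[row[id_column]] = row[formula_column]
    return index
```

When reading from disk, opening the file with `open(path, encoding="utf-8", newline="")` and passing the file object as `lines` follows the usual recommendation for the `csv` module.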

The following two GitHub repositories are used: (1) **ARQMathCode**, which generates the LaTeX formula index and HTML thread files, and provides scripts for creating the MathML TSV index files using LaTeXML (v. 0.8.5); and (2) **MIR-MU ARQMath data processing**, which converts LaTeX formulas to Presentation MathML and Content MathML.

The GitHub repositories for these two software tools are available here:
