- This is a dataset of molecule images and their corresponding MOL files.

- Each MOL file has the same base name as its image.
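
  Because an image and its MOL file share a base name, the two can be matched by swapping the file extension. The sketch below is only an illustration: the function name pair_images_with_mol, the ".png" image extension, the lowercase ".mol" extension and the single-directory layout are all assumptions, not something this README specifies.

```python
from pathlib import Path


def pair_images_with_mol(directory, image_suffix=".png"):
    """Return (image, mol) path pairs that share a base name.

    Assumptions (not stated in the README): images and MOL files sit in
    the same directory, images end in `image_suffix`, and MOL files end
    in ".mol".
    """
    root = Path(directory)
    pairs = []
    for image in sorted(root.glob(f"*{image_suffix}")):
        mol = image.with_suffix(".mol")  # same base name, different extension
        if mol.exists():
            pairs.append((image, mol))
    return pairs
```

  Images without a matching MOL file are skipped rather than reported, which keeps the sketch short; adjust image_suffix to whatever format the images actually use.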

- Images were obtained after scanning and automatically clipping structures from a molecule catalogue provided by http://www.maybridge.com

- Documents were scanned at a resolution of 600x600 dpi.

- The CAS numbers accompanying the structures were used to look up the corresponding InChI identifiers in online databases.

- These InChIs were converted into MOL files using Open Babel.
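
  The exact Open Babel invocation used for this dataset is not recorded here. As a rough sketch of what such a conversion can look like, the helper below builds an obabel command line that reads an InChI file and writes a MOL file. The function names, the file paths and the use of --gen2d (which asks Open Babel to generate 2D coordinates) are assumptions for illustration.

```python
import subprocess


def obabel_inchi_to_mol_command(inchi_path, mol_path):
    """Build an `obabel` command converting an InChI file to a MOL file.

    Assumption: 2D coordinates are wanted, hence --gen2d. The options
    actually used to build the dataset are not documented.
    """
    return [
        "obabel",
        "-iinchi", str(inchi_path),  # input format and input file
        "-omol",                     # output format
        "-O", str(mol_path),         # output file
        "--gen2d",                   # generate 2D coordinates
    ]


def convert_inchi_to_mol(inchi_path, mol_path):
    """Run the conversion; requires the `obabel` executable on PATH."""
    subprocess.run(obabel_inchi_to_mol_command(inchi_path, mol_path), check=True)
```

  Building the command separately from running it makes it easy to inspect or log the command before any file is written.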

- Each file name consists of 3 parts separated by hyphens:
  the image source name, the page number and a random unique identifier
  for example: maybridge-0522-554070631:
     maybridge is the catalogue source
     0522 is the page number (page 522) on which this image appears
     554070631 is a random unique identifier
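
  The naming scheme above can be split back into its three parts with a few lines of code. This is a minimal sketch; the names StructureName and parse_structure_name are hypothetical, and it expects the base name without a file extension.

```python
from typing import NamedTuple


class StructureName(NamedTuple):
    source: str      # catalogue source, e.g. "maybridge"
    page: int        # page number in the catalogue
    identifier: str  # random unique identifier, kept as text


def parse_structure_name(stem: str) -> StructureName:
    """Split a base name such as 'maybridge-0522-554070631' into its parts."""
    # Split from the right so only the last two hyphens act as separators.
    source, page, identifier = stem.rsplit("-", 2)
    return StructureName(source, int(page), identifier)


print(parse_structure_name("maybridge-0522-554070631"))
# StructureName(source='maybridge', page=522, identifier='554070631')
```

  The identifier is kept as a string so that any leading zeros survive; the page number is converted to an integer, so 0522 becomes 522.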
  
- If you use this dataset in any published work, please cite our paper:

@InProceedings{SaSeSo-DRR12-ChemicalStructureRecognition-ARuleBasedApproach,
  author =       {Sadawi, Noureddin M. and Sexton, Alan P. and Sorge, Volker},
  title =        {Chemical Structure Recognition: A Rule Based Approach},
  booktitle =    {19th Document Recognition and Retrieval Conference (DRR 2012)},
  year =         2012,
  editor =       {Christian Viard-Gaudin and Richard Zanibbi},
  month =        {January},
  doi =          {10.1117/12.912185},
  publisher =    {SPIE},  
}
  
- For any feedback, suggestions or further information, please email us at:
  nms@cs.bham.ac.uk
