Valuable chemical information is often stored in scientific or patent literature as unstructured text and as images of structures and reactions. Imagine being able to automatically abstract all chemically relevant terms and structure images from full-text articles and to transform them into computer-readable structures that can be stored in databases. Imagine being able to query this data via a structure or substructure search and to access the original source of the information immediately.

Abstraction Level

Chemically relevant terms such as compound names, trivial and trade names, or even chemical fragments in text documents are abstracted using ICANNOTATOR, the chemical annotation software developed at InfoChem. The program also assigns anchor points and highlighting information to the chemical entities found in the source document.
In a second step, these terms are transformed into computer-readable connection tables using ICN2S, the InfoChem Name to Structure module.
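
The two steps above can be illustrated with a minimal sketch. It is not how ICANNOTATOR or ICN2S work internally; it assumes a tiny hypothetical dictionary (NAME_TO_SMILES) in place of a real lexicon and name parser, and it uses SMILES strings rather than full connection tables. The names Annotation and annotate are invented for this example. The point is only to show how each recognized term can carry anchor offsets back into the source text.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-dictionary: chemical name -> SMILES.
# A production system would rely on a large lexicon and a name parser.
NAME_TO_SMILES = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
}


@dataclass
class Annotation:
    start: int   # anchor: character offset where the term begins
    end: int     # anchor: offset one past the last character of the term
    text: str    # the term exactly as it appears in the document
    smiles: str  # structure looked up for the term


def annotate(document: str) -> list[Annotation]:
    """Find known chemical names and record their anchor offsets."""
    # Try longer names first so multi-word names win over shorter ones.
    names = sorted(NAME_TO_SMILES, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        Annotation(m.start(), m.end(), m.group(0), NAME_TO_SMILES[m.group(0).lower()])
        for m in pattern.finditer(document)
    ]
```

Calling annotate on a sentence returns one Annotation per recognized term; slicing the document with an annotation's start and end offsets gives back the original term, which is what makes highlighting and linking possible later on.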

In a parallel process, images representing chemical structures or reactions are recognized and converted into computer-readable formats with an image-to-structure conversion tool developed in-house.

One of the biggest challenges is reliable verification of the extracted information in terms of chemical validity. Poor image quality, ambiguous notation or incorrect names can all be sources of error. Consequently, strict chemical validation of the generated content using specific verification and checking tools is indispensable.
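
As an illustration of what one such check might look like, the sketch below tests a connection table for atoms whose total bond order exceeds a simple valence limit. This is an assumed, deliberately simplified rule (neutral atoms, a handful of elements, no charges or aromaticity), not a description of InfoChem's verification tools; MAX_VALENCE and valence_errors are hypothetical names.

```python
# Maximum total bond order allowed per element in this sketch.
# Assumption: neutral atoms only; hydrogens may be left implicit.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1}


def valence_errors(atoms, bonds):
    """Return a list of human-readable problems found in a connection table.

    atoms: list of element symbols, e.g. ["C", "O"]
    bonds: list of (atom_index_a, atom_index_b, bond_order) tuples, 0-based
    """
    errors = []
    used = [0] * len(atoms)  # summed bond order per atom

    for a, b, order in bonds:
        in_range = 0 <= a < len(atoms) and 0 <= b < len(atoms)
        if not in_range or a == b:
            errors.append(f"bond ({a}, {b}) references an invalid atom")
            continue
        used[a] += order
        used[b] += order

    for i, symbol in enumerate(atoms):
        limit = MAX_VALENCE.get(symbol)
        if limit is None:
            errors.append(f"atom {i}: unknown element {symbol!r}")
        elif used[i] > limit:
            errors.append(f"atom {i} ({symbol}): valence {used[i]} exceeds {limit}")

    return errors
```

An empty list means the table passed this one check. A misread image that yields, say, a triple bond to oxygen would be reported and could be routed to manual review instead of being loaded.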

Management Level

The abstracted data, in the form of connection tables, must still undergo a quality assurance process. It can then be loaded into a chemistry data cartridge such as InfoChem's ICCARTRIDGE and queried by structure, substructure or similarity. When a common RDBMS such as Oracle® is used, combined structure, full-text and factual data searches are also possible.
Anchor points in the original literature help link each hit in the hit list directly to the document containing the desired information.
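
The role of the anchor points can be sketched with an in-memory SQLite table. This is only a stand-in under stated assumptions: it is not ICCARTRIDGE or Oracle, it matches structures by exact SMILES string rather than by substructure or similarity, and the table layout and the function names build_index and find_sources are hypothetical.

```python
import sqlite3


def build_index(records):
    """Load (doc_id, start, end, term, smiles) records into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotation ("
        "doc_id TEXT, start_offset INTEGER, end_offset INTEGER, "
        "term TEXT, smiles TEXT)"
    )
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn


def find_sources(conn, smiles):
    """Return (doc_id, start, end, term) for every occurrence of a structure.

    Exact string match only; a chemistry cartridge would perform real
    structure, substructure or similarity matching here.
    """
    cursor = conn.execute(
        "SELECT doc_id, start_offset, end_offset, term FROM annotation "
        "WHERE smiles = ? ORDER BY doc_id, start_offset",
        (smiles,),
    )
    return cursor.fetchall()
```

Each row in the result identifies a document and the character range of the matching term, so a hit list built from it can open the source document at the right position.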

