
Chemisches Zentralblatt


InfoChem is performing automatic chemical named entity recognition on Chemisches Zentralblatt, one of the most important abstracts journals for the period 1830-1969.
The aim of the project is to build a structure-searchable database that offers language-independent searching of this important historical source.

What is Chemisches Zentralblatt?


Chemisches Zentralblatt is the first and oldest abstracts journal published in the field of chemistry. It covers the chemical literature from 1830 to 1969, describing the "birth" of chemistry as a science, as distinct from alchemy.
In 140 years, Chemisches Zentralblatt published 900,000 pages: 700,000 pages contain ca. 2 million abstracts, and 200,000 pages are indexes.

Approach and applied technology


The documents, mainly in .tiff format, undergo OCR processing. The ICANNOTATOR then performs named entity recognition with the support of the optimized SPRESI dictionaries. The extracted names are converted into connection tables by the name-to-structure tool, which is also integrated in the ICANNOTATOR and supported by the dictionaries. The connection tables and the associated names are stored in a database, and combined text and structure searches can then be performed on a federated search system. From the hit list, a direct link to the original literature leads straight to the page containing the information.
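The dictionary-supported steps described above can be illustrated with a minimal Python sketch. It is not the ICANNOTATOR: the toy dictionary, the use of SMILES strings in place of connection tables, the page identifier and the function names are all assumptions made for illustration only.

```python
import re

# Hypothetical toy dictionary: lower-cased chemical name -> SMILES string.
# It only stands in for the SPRESI dictionaries, whose format is not described here.
NAME_TO_SMILES = {
    "benzol": "c1ccccc1",
    "toluol": "Cc1ccccc1",
    "nitrobenzol": "O=[N+]([O-])c1ccccc1",
}


def build_pattern(names):
    """Compile one case-insensitive pattern that matches any dictionary name as a whole word."""
    # Longest names first, so a longer entry is tried before a shorter one that is its prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def annotate(page_id, text, dictionary):
    """Return one record per recognized name, with its page, offset and looked-up structure."""
    pattern = build_pattern(dictionary)
    records = []
    for match in pattern.finditer(text):
        name = match.group(1)
        records.append({
            "page": page_id,                       # lets a hit link back to the source page
            "name": name,                          # name as it appears in the OCR text
            "start": match.start(),                # character offset within the page text
            "structure": dictionary[name.lower()], # dictionary lookup replaces name-to-structure
        })
    return records


# Hypothetical OCR output for a single page.
ocr_text = "Nitrobenzol wird aus Benzol gewonnen; Toluol reagiert analog."
records = annotate("page-0001", ocr_text, NAME_TO_SMILES)
for record in records:
    print(record["page"], record["name"], record["structure"])
```

A real name-to-structure step parses systematic nomenclature rather than looking names up one by one, so the lookup here only marks where that conversion would take place.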
To assess the quality of the automatic process, we also manually abstracted structures from a sample set and performed a quantitative comparison.

Results


We have automatically abstracted 900,000 pages, obtaining ca. 1 million unique chemical names and 500,000 unique structures. The quantitative comparison with a manually abstracted sample set shows over 60% recall and nearly 90% precision for our process.
Scientists can now perform combined text and structure / substructure searches in Chemisches Zentralblatt using the federated search system ICFEDSEARCH.
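For readers unfamiliar with the two quality measures reported above, the following Python sketch shows how recall and precision can be computed when an automatically abstracted set is compared with a manually abstracted one. The structure sets are invented toy data, not the project's sample set, and the sketch assumes that both sets use the same canonical structure identifiers so that equal structures compare as equal strings.

```python
def precision_recall(automatic, manual):
    """Compare automatically extracted structures with a manual reference set."""
    automatic, manual = set(automatic), set(manual)
    true_positives = len(automatic & manual)  # structures found by both approaches
    # Precision: share of automatically extracted structures that are correct.
    precision = true_positives / len(automatic) if automatic else 0.0
    # Recall: share of manually abstracted structures that were found automatically.
    recall = true_positives / len(manual) if manual else 0.0
    return precision, recall


# Invented toy data (SMILES strings), for illustration only.
manual = {"c1ccccc1", "Cc1ccccc1", "CCO", "CC(=O)O", "O=[N+]([O-])c1ccccc1"}
automatic = {"c1ccccc1", "Cc1ccccc1", "CCO", "CCC"}

p, r = precision_recall(automatic, manual)
print(f"precision={p:.2f} recall={r:.2f}")  # prints: precision=0.75 recall=0.60
```

In this toy case three of the four automatically extracted structures are correct (precision 0.75), and three of the five reference structures were found (recall 0.60).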


Please feel free to contact us if you need further information.




Last modification: January 14, 2010.


InfoChem Gesellschaft für chemische Information mbH

Landsberger Straße 408/V
D-81241 München
Germany

Phone: +49 (0)89 58 30 02
Fax: +49 (0)89 580 38 39
Email: info@infochem.de