Imago OCR¶
Overview¶
Imago OCR is a toolkit for 2D chemical structure image recognition. It contains a GUI program and a command-line utility, as well as a documented API for developers. Imago is completely free and open-source, while also available on a commercial basis.
The core of Imago is written from scratch in modern C++ and uses well-known optical recognition algorithms, which makes it both portable and fast.
Note¶
The Imago OCR project is under active development. You can send us your comments and suggestions and get timely replies from the development team.
| Source image | Recognized structure |
|---|---|
Recognizable Molecule Features¶
Single, double, triple bonds, bridged bonds
Atom labels, subscripts, isotopes, charges
Superatoms and abbreviations expansion
Aromatic rings
Stereochemistry (up- and down-bonds)
You can find more examples on this page.
Online Demo¶
You can evaluate the Imago OCR recognition quality on the Imago Demo web page.
Comparison with other systems¶
We created a detailed report with sets of different images that compares Imago OCR with other publicly available solutions. The report is available on a separate web page. The scripts and the image sets are available in the download section.
If you can suggest other test sets or other publicly available solutions, we would be happy to include them in the report.
Resources¶
Presentation at the Symposium on 244th ACS National Meeting & Exposition:
Portability¶
The Imago library is written in portable C++ and supports Linux, Windows, and Mac OS X, in both 32-bit and 64-bit versions.
Imago exposes a C interface to applications, and a Java wrapper is available for all supported platforms. Two front ends are provided: a Java GUI application called Imago OCR Visual Tool and a command-line utility, imago_console.
List of Dependencies¶
The dependencies are included in the distribution packages, so you do not need to download any of them separately to run the programs or to compile the source code.
Imago C++ dependencies:
OpenCV library
PicoPNG (optional module for loading PNG images, modified for fail-safe loading)
Java-specific dependencies:
JNA (for Java wrapper)
PDFRenderer (only for Imago OCR Visual Tool)
Java Advanced Imaging (JAI) (only for Imago OCR Visual Tool, part of Java SDK)
You can find more details on the dependencies (including their licenses) on a separate page.
Supported Data Formats¶
Both the Imago OCR project and the imago_console tool support the most popular raster image formats: PNG, JPEG, BMP (using RGB 24bpp), DIB (using RGB 24bpp), TIFF, PBM, and others (depending on platform).
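When feeding many files to a recognizer, it can help to check the file type up front. The following is a minimal sketch of such a pre-check in Python, based only on the standard magic bytes of the formats listed above. It is not part of Imago; the function names sniff_image_format and bmp_bits_per_pixel are hypothetical, and the BMP depth check assumes the common 40-byte BITMAPINFOHEADER layout.

```python
import struct

# Magic-byte prefixes for the raster formats named above.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"P1", "PBM"),        # plain (ASCII) PBM
    (b"P4", "PBM"),        # raw (binary) PBM
]


def sniff_image_format(header: bytes):
    """Return a format name for the leading bytes of a file, or None."""
    for magic, name in _SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def bmp_bits_per_pixel(header: bytes):
    """Return the bit depth of a BMP, assuming a BITMAPINFOHEADER.

    The 16-bit little-endian bit count sits at byte offset 28 in that
    layout. Returns None if the data is not a BMP or is too short.
    """
    if not header.startswith(b"BM") or len(header) < 30:
        return None
    (bpp,) = struct.unpack_from("<H", header, 28)
    return bpp
```

To use it, read the first 32 bytes of a file in binary mode and pass them to both functions; a BMP for which bmp_bits_per_pixel returns something other than 24 would fall outside the "RGB 24bpp" case mentioned above.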
Imago OCR Visual Tool users can also open PDF files, choose the required page (for PDF or TIFF documents), and select the fragment to be recognized.
Developers who use the C API can pass images in any supported format, or raw image data, to the library. Recognition results can be saved as MDL (Symyx, Accelrys) Molfiles. Imago OCR Visual Tool can also copy the recognized molecule to the system clipboard.
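For readers who want to post-process the saved Molfiles, here is a minimal Python sketch that reads the atom and bond blocks of a V2000 Molfile using its fixed-column layout. The sample molecule (ethanol, heavy atoms only) is hand-written for illustration and is not actual Imago output; the function name parse_molfile_v2000 is hypothetical, and the sketch assumes the V2000 variant of the format.

```python
# Hand-written V2000 Molfile for ethanol (heavy atoms only).
SAMPLE_MOLFILE = "\n".join([
    "",                        # line 1: molecule name (may be blank)
    "  hand-written example",  # line 2: program / timestamp
    "",                        # line 3: comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

# Bond stereo codes in the V2000 bond block (wedge bonds).
_STEREO = {1: "up", 6: "down"}


def parse_molfile_v2000(text):
    """Return (atoms, bonds) read from a V2000 Molfile string.

    atoms: list of (symbol, x, y)
    bonds: list of (begin_atom, end_atom, order, stereo), 1-based atom
           indices, stereo being "up", "down", or None.
    """
    lines = text.splitlines()
    counts = lines[3]  # counts line follows the three header lines
    if "V2000" not in counts:
        raise ValueError("only V2000 Molfiles are handled in this sketch")
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        symbol = line[31:34].strip()
        atoms.append((symbol, float(line[0:10]), float(line[10:20])))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        stereo_code = int(line[9:12].strip() or "0")
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9]),
                      _STEREO.get(stereo_code)))
    return atoms, bonds
```

Calling parse_molfile_v2000(SAMPLE_MOLFILE) yields three atoms and two single bonds. For anything beyond quick inspection, a full cheminformatics toolkit is a safer choice than a hand-rolled parser like this one.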
Download and Install¶
Look at the Downloads page for the installation package suitable for your system. There is an installer for Windows, and zip files for Linux and Mac OS X, which you can simply unpack into /usr/local/bin, /opt, or your home directory.
You can run Imago OCR Visual Tool even without installing any files using Java Web Start technology. Open the following JNLP-file to execute Imago OCR Visual Tool.
License¶
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Feedback¶
Do you need assistance using our tools? Do you need a feature? Do you want to send a patch to us? Did you find a bug? Please write to the following e-mail and let us know:
Commercial Availability¶
If the open-source license of Imago does not fit your needs, please contact us to discuss the purchase of a commercial license. You may need the commercial license if you want to:
Receive ongoing support and maintenance
Include Imago as a component in your proprietary software product
Do any other development/testing required for a proprietary software product
Visit our SolutionsHub page for more details.