BaH.Tesseract - Released

Brucey(Posted 2009) [#1]
Since it is 2009 already, I thought it was about time I released a new module.

BaH.Tesseract is an OCR module. It can read an image and recognise characters, converting them to a String. There's a small usage example here.

The module can be downloaded from the maxmods downloads page.

The provided examples use BaH.FreeImage to load the example .tif images. But since the API accepts a TPixmap, you can use any image loader you want. We like to give you choice here at BaH Central :-)

Note that tesseract is only an OCR module. It doesn't do page analysis, meaning it won't understand multiple columns, or pages with "pictures". However, the API lets you specify rectangular areas within an image, so you can be quite precise.
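[Editor's sketch] The post does not show how the module takes its rectangle, so the following is not the BaH.Tesseract API. It is a plain Python illustration of what restricting recognition to a rectangular area amounts to: cut the region out of the pixel grid, then recognise only that part. The function name `crop_region` and the list-of-rows image layout are assumptions made for the example.

```python
def crop_region(pixels, x, y, width, height):
    """Return the sub-image covered by the rectangle (x, y, width, height).

    `pixels` is a list of rows, each row a list of pixel values.
    The rectangle is clamped so it never reaches outside the image.
    """
    img_h = len(pixels)
    img_w = len(pixels[0]) if img_h else 0
    # Clamp the top-left corner to the image bounds.
    x0 = max(0, min(x, img_w))
    y0 = max(0, min(y, img_h))
    # Clamp the bottom-right corner, never letting it pass the top-left one.
    x1 = max(x0, min(x + width, img_w))
    y1 = max(y0, min(y + height, img_h))
    return [row[x0:x1] for row in pixels[y0:y1]]


# A tiny 4x3 "page": each number stands in for a pixel value.
page = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]

# Treat the page as two columns and cut each one out separately.
left_column = crop_region(page, 0, 0, 2, 3)
right_column = crop_region(page, 2, 0, 2, 3)
```

For a two-column page, each column would be cut out like this and recognised on its own, which works around the missing page analysis.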

Since the API is very simple (there are only 6 or 7 functions in total), the documentation may seem a little sparse. But it's all there :-)

:o)

Enjoy!


slenkar(Posted 2009) [#2]
It's like the thingy in Adobe PDF reader that lets you search through text in a PDF document, right? Because it recognises printed characters in an image.


Brucey(Posted 2009) [#3]
Something like that.

Say you have a scanned page from somewhere - like an old book, at between 200dpi and 300dpi for best results - you simply feed the image into the module, and all going well, you should get back a String representation of the page.
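[Editor's sketch] To check whether an existing scan falls in the 200dpi to 300dpi range mentioned above, divide its pixel width by the physical page width. This is ordinary arithmetic in Python, not part of the module; the function names and the 6 inch page width are assumptions made for the example.

```python
def estimated_dpi(pixel_width, page_width_inches):
    """Estimate scan resolution from the image width and the paper width."""
    return pixel_width / page_width_inches


def in_suggested_range(dpi, low=200, high=300):
    """True when dpi lies within the range suggested in the post."""
    return low <= dpi <= high


# A page 6 inches wide that scanned to 1800 pixels across.
dpi = estimated_dpi(1800, 6.0)
ok = in_suggested_range(dpi)
```

A scan that comes out well below the range is probably better rescanned at a higher resolution than fed in as it is.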