Paul Ferguson wrote:
adammclean wrote:
FineReader seems to restrict the number of pages you can read in the different versions, and charges about 10 pence ($0.20) per page for the 2500 page version.
At nearly 300 pounds ($600) it is too expensive for me!
I believe the Open Source Tesseract software can also cope with Fraktur:
http://code.google.com/p/tesseract-ocr/
I have recently been using Tesseract in connection with a project and I thought I would report back here as I am very pleased with the results.
1. If you are using Windows then the best approach seems to be to download the FreeOCR program from here (using the blue button):
http://www.paperfile.net/download.html
2. Now you will need to download the Fraktur file deu-frak.traineddata.gz from here:
https://code.google.com/p/tesseract-ocr/downloads/list
3. You will need 7-Zip or a similar program to decompress the .gz file. Decompress the file to the tessdata subdirectory of FreeOCR.
4. Now rename the deu-frak.traineddata file; otherwise it won't show up in the FreeOCR language list, since deu is already used for standard German.
I renamed the file to ger.traineddata, and the language then shows as ger.
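For anyone who would rather script steps 3 and 4 than use 7-Zip by hand, here is a minimal Python sketch. It only assumes that the .gz file has already been downloaded; the function name install_traineddata and the example FreeOCR path in the comment are my own illustrations, not anything FreeOCR provides, so adjust them to your own installation.

```python
import gzip
import shutil
from pathlib import Path


def install_traineddata(gz_path, tessdata_dir, new_name="ger.traineddata"):
    """Decompress a .traineddata.gz file into tessdata_dir, saving it as new_name.

    Hypothetical helper: it combines the decompress and rename steps,
    so the Fraktur data does not clash with the standard German 'deu' entry.
    """
    target_dir = Path(tessdata_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create tessdata if missing
    target = target_dir / new_name
    # gzip.open reads the compressed stream; copyfileobj writes it out unpacked
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Example call (the install path below is an assumption, check your own):
# install_traineddata("deu-frak.traineddata.gz",
#                     r"C:\Program Files\FreeOCR\tessdata")
```

After running it, the language should appear in the FreeOCR list under the new name (ger in this case).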
Away you go...
You can process both scanned files and PDFs downloaded from the Internet, although I didn't get particularly good results with Google Books.
Not perfect but a huge timesaver!
N.B. I found it ran OK under Windows 7 but I had problems with the dreadful Windows 8.
There are also versions available for Linux, Mac etc.
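On those systems FreeOCR is not available, so the usual route is the tesseract command-line program itself. The sketch below is only a rough illustration: it assumes tesseract is installed and on the PATH, that the Fraktur data is installed under the language name deu-frak (the name may differ between Tesseract versions), and the function names and file names are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="deu-frak"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="deu-frak"):
    """Run tesseract on one image; the text is written to <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)  # raises CalledProcessError if tesseract fails
    return f"{output_base}.txt"


# Example call (file names are placeholders):
# run_ocr("page001.png", "page001")
```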
Hope this is of some use anyway,
Paul
Last edited on Fri Jul 26th, 2013 12:21 pm by Paul Ferguson