Alchemy discussion forum Home

Electronic Boehme
 Moderated by: alchemyd  
Paul Ferguson
Member


Joined: Fri Feb 15th, 2008
Location:  
Posts: 1538
Status:  Offline
 Posted: Tue Sep 30th, 2008 11:58 am
Neil J Mann wrote:
The King's Garden Center is a resource that is completely new to me -- many thanks, Paul, for pointing it out, since there seem to be some interesting texts gathered here. The Boehme texts seem to be largely the same as Pass the Word's (titles, formatting, etc.), but I'll do some further searching tonight.


I forgot to mention the splendid:

http://www.sacred-texts.com/

Paul Ferguson
 Posted: Fri Jul 26th, 2013 10:58 am
Paul Ferguson wrote:
adammclean wrote:
FineReader seems to restrict the number of pages you can read in the different versions, and charges about 10 pence ($0.20) per page for the 2500-page version.

At nearly 300 pounds ($600) it is too expensive for me!

I believe the open-source Tesseract software can also cope with Fraktur:

http://code.google.com/p/tesseract-ocr/



I have recently been using Tesseract in connection with a project and I thought I would report back here as I am very pleased with the results.

1. If you are using Windows then the best approach seems to be to download the FreeOCR program from here (using the blue button):

http://www.paperfile.net/download.html

2. Now you will need to download the Fraktur file deu-frak.traineddata.gz from here:

https://code.google.com/p/tesseract-ocr/downloads/list

3. You will need 7-Zip or a similar program to decompress the .gz file. Decompress the file to the tessdata subdirectory of FreeOCR.

4. Now rename the deu-frak.traineddata file; otherwise it won't show up in the FreeOCR language list, since deu is already used for standard German. I renamed the file to ger.traineddata, and the language then shows as ger.
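
For anyone who would rather script steps 3 and 4 than use 7-Zip by hand, here is a minimal Python sketch. It only decompresses the .gz file and saves it under the new name. The function name install_traineddata and the example folder path are my own assumptions, not part of FreeOCR, so adjust them to wherever your tessdata folder actually lives.

```python
import gzip
import shutil
from pathlib import Path


def install_traineddata(gz_path, tessdata_dir, new_name="ger"):
    """Decompress a .traineddata.gz file into tessdata_dir under a new language code.

    gz_path      -- the downloaded file, e.g. deu-frak.traineddata.gz
    tessdata_dir -- the tessdata folder of your OCR program (assumed location)
    new_name     -- the language code the file should show up as
    """
    tessdata = Path(tessdata_dir)
    tessdata.mkdir(parents=True, exist_ok=True)
    target = tessdata / f"{new_name}.traineddata"
    # gzip.open yields the decompressed byte stream; copy it straight to disk.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Called as install_traineddata("deu-frak.traineddata.gz", r"C:\FreeOCR\tessdata") (a hypothetical path), it should leave a ger.traineddata file in that folder, which is the same result as steps 3 and 4 done manually.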

Away you go...

You can process both scanned files and .pdfs which you have downloaded from the Internet, although I didn't get particularly good results with Google Books.

Not perfect but a huge timesaver!

N.B. I found it ran OK under Windows 7 but I had problems with the dreadful Windows 8.
There are also versions available for Linux, Mac etc.
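
On Linux or Mac, where FreeOCR is not available, Tesseract can be run directly from the command line as "tesseract IMAGE OUTPUTBASE -l LANG". The sketch below wraps that call in Python. It assumes the tesseract executable is on your PATH and that the Fraktur file is installed in its tessdata folder under the name deu-frak; the function names and the file names page.png and page are placeholders of mine. Note that Tesseract itself expects image files (PNG, TIFF and so on), so a PDF would have to be converted to images first.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="deu-frak"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="deu-frak"):
    """Run Tesseract; it writes the recognised text to output_base + '.txt'."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".txt"
```

For example, run_ocr("page.png", "page") should produce page.txt next to the image, provided the language file is installed.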

Hope this is some use anyway,

Paul

Last edited on Fri Jul 26th, 2013 12:21 pm by Paul Ferguson

Paul Ferguson
 Posted: Mon Jul 11th, 2016 10:27 am
The free pdf-reader Foxit Reader:

https://www.foxitsoftware.com/products/pdf-reader/

does a pretty good job of OCR'ing Fraktur and Schwabacher files.

Just open the downloaded file in Foxit and save it as a .TXT file. This should OCR it reasonably well into Antiqua, though it doesn't seem to work with certain files. If it doesn't work with your file, look for an alternative source file, for example via Wikisource: https://en.wikisource.org/wiki/Main_Page

If you use the free word-processing software LibreOffice:

https://www.libreoffice.org/

with the Alt Search extension:

http://extensions.libreoffice.org/extension-center/alternative-dialog-find-replace-for-writer

you can strip out most of the white space in your OCR'ed file by searching for /p (End of paragraph) and replacing it with nothing.
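
If you would rather clean the .TXT file outside LibreOffice, a few lines of Python can do something similar. This is only a sketch of an alternative, not what Alt Search does: it joins the lines inside each paragraph and reduces any run of blank lines to a single one, so paragraph breaks survive. The function name collapse_whitespace is my own.

```python
import re


def collapse_whitespace(text):
    """Join the lines of each paragraph and keep one blank line between paragraphs."""
    # A paragraph break is a newline followed by at least one more
    # (possibly space-filled) blank line.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within a paragraph, squeeze every run of spaces/newlines to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Read the OCR output with open(..., encoding="utf-8"), pass the text through collapse_whitespace, and write the result back to a new file so the original stays untouched.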

You'll still need to do a bit of work on it depending on the quality of the original scan but it is obviously much faster than transcribing manually. And all completely free!

Last edited on Mon Jul 11th, 2016 10:29 am by Paul Ferguson

Paul Ferguson
 Posted: Mon Jan 30th, 2017 07:07 pm
Fraktur file for FreeOCR now available here:

https://osdn.net/projects/sfnet_tesseract-ocr-alt/downloads/deu-frak.traineddata.gz/

Seems to work OK with Windows 10.

