View Post
Poster: | R Pal | Date: | Aug 30, 2016 4:36pm |
Forum: | opensource | Subject: | Re: I would like to remaster a book with many OCR errors |
Would you consider downloading the original PDF and re-uploading it as a new item?
If you then wait for the other file formats to be derived, you could re-download them, correct the errors and bad text, and re-upload the corrected files to the new item's directory. That may work.
Does someone here have any experience with correcting previously uploaded and OCRed files?
Reply
Poster: | maven-raven | Date: | Aug 31, 2016 11:33am |
Forum: | opensource | Subject: | Re: I would like to remaster a book with many OCR errors |
It's probably better if we still keep the current version of the book around. It might confuse people if we replace the files. I will try to put together a new version of the book and then upload that independently. I will leave a comment on the original book page so that others know where to look for the remastered version.
I have already downloaded the scanned images. I will work with the txt version of the book, and whenever I encounter an OCR error, I will correct it. Where there is any ambiguity, I will consult the scans.
I still don't know which file format would be most beneficial to the Internet Archive. I could certainly create a PDF file, but I would also not mind using something that gives the document a more explicit structure, like DocBook XML.
And I am a programmer. I know how to automate repetitive corrections.
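To give a rough idea of what I mean by automating repetitive corrections, here is a minimal Python sketch. The correction table, the "suspicious" pattern, and the function names are all made up for illustration; the real list would be built from the errors actually found in this book, and anything flagged would still be checked by hand against the scans.

```python
import re

# Hypothetical correction table: each entry maps a recurring OCR misreading
# to its fix. These pairs are illustrative, not taken from the actual book.
CORRECTIONS = [
    (re.compile(r"\btbe\b"), "the"),
    (re.compile(r"\bwbich\b"), "which"),
]

# Assumed heuristic for "ambiguous" text: a digit sandwiched between letters
# (e.g. "c1ear"). Such lines are only reported, never changed automatically.
SUSPICIOUS = re.compile(r"[A-Za-z][0-9][A-Za-z]")


def apply_corrections(text):
    """Apply every known correction and count how often each one fired."""
    counts = {}
    for pattern, replacement in CORRECTIONS:
        text, n = pattern.subn(replacement, text)
        counts[pattern.pattern] = n
    return text, counts


def flag_ambiguous(text):
    """Return (line number, line) pairs that should be compared with the scan."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPICIOUS.search(line)
    ]


if __name__ == "__main__":
    sample = "tbe quick fox, wbich ran\nc1ear sky\n"
    fixed, counts = apply_corrections(sample)
    print(fixed)
    print(counts)
    print(flag_ambiguous(fixed))
```

Keeping the safe replacements separate from the merely suspicious lines means the script never guesses where the scan has to decide.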
Reply
Poster: | Jeff Kaplan | Date: | Aug 31, 2016 2:18pm |
Forum: | opensource | Subject: | Re: I would like to remaster a book with many OCR errors |
This post was modified by Jeff Kaplan on 2016-08-31 21:18:01
This post was modified by Jeff Kaplan on 2016-08-31 21:18:25
Reply
Poster: | maven-raven | Date: | Sep 1, 2016 12:03am |
Forum: | opensource | Subject: | Re: I would like to remaster a book with many OCR errors |
Is there a section on the Internet Archive where you have non-OCRed books? I could just post the corrected book there instead.
Reply
Poster: | Jeff Kaplan | Date: | Sep 1, 2016 8:46am |
Forum: | opensource | Subject: | Re: I would like to remaster a book with many OCR errors |
Reply
Poster: | R Pal | Date: | Aug 31, 2016 4:13pm |
Forum: | opensource | Subject: | Re: I would like to remaster a book with many OCR errors |