<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to 128: Add support for ocrmypdf</title><link>https://sourceforge.net/p/gscan2pdf/feature-requests/128/</link><description>Recent changes to 128: Add support for ocrmypdf</description><atom:link href="https://sourceforge.net/p/gscan2pdf/feature-requests/128/feed.rss" rel="self"/><language>en</language><lastBuildDate>Sun, 29 Jan 2023 20:41:58 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/gscan2pdf/feature-requests/128/feed.rss" rel="self" type="application/rss+xml"/><item><title>#128 Add support for ocrmypdf</title><link>https://sourceforge.net/p/gscan2pdf/feature-requests/128/?limit=25#9331</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I've been using it in a bash script that uses GNU find to list PDFs, then pdfinfo to ignore them unless they came from gscan2pdf, and pdftotext to ignore ones that already have text.  Then it just runs the ocrmypdf command line.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">nwdm</dc:creator><pubDate>Sun, 29 Jan 2023 20:41:58 -0000</pubDate><guid>https://sourceforge.netbe3d94fb454c475c8fb792bc8ba111d7df8255f3</guid></item><item><title>#128 Add support for ocrmypdf</title><link>https://sourceforge.net/p/gscan2pdf/feature-requests/128/?limit=25#7041</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I am in the process of rewriting gscan2pdf in Python. 
When I have finished, it should be reasonably easy to hook into ocrmypdf (which is also Python) to more accurately place the text than gscan2pdf currently does.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jeffrey Ratcliffe</dc:creator><pubDate>Sun, 29 Jan 2023 15:08:47 -0000</pubDate><guid>https://sourceforge.netbbedb65f6821ed3020dedadd312342007b9bffac</guid></item><item><title>Add support for ocrmypdf</title><link>https://sourceforge.net/p/gscan2pdf/feature-requests/128/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;ocrmypdf is awesome!&lt;br/&gt;
&lt;a href="https://ocrmypdf.readthedocs.io/en/latest/introduction.html" rel="nofollow"&gt;https://ocrmypdf.readthedocs.io/en/latest/introduction.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I found that Tesseract was better than gocr in gscan2pdf.  But while investigating a way to asynchronously offload OCRing, I stumbled upon a Python project called ocrmypdf.  And it is amazing!&lt;/p&gt;
&lt;p&gt;It uses Tesseract, I think.  It seems to be more accurate.  It lines up better with the document image.  And it is easier to select and copy from, because it uses a transparent font: you see the original document image, but you select the OCR'd text.&lt;/p&gt;
&lt;p&gt;I'd love it if it were an option in gscan2pdf next to Tesseract and gOCR.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">nwdm</dc:creator><pubDate>Sun, 29 Jan 2023 15:03:41 -0000</pubDate><guid>https://sourceforge.net0d6bd048d980a270d5a9c2d2821b57513b777381</guid></item></channel></rss>