Convert PDF to Text Using OCR
2020-05-05 ::
( 1 minute reading )
Make sure ImageMagick and Tesseract are installed:
$ sudo apt install imagemagick tesseract-ocr
It’s a two-step process:
- Convert the PDF to .tiff using convert from ImageMagick:
$ convert -density 300 input.pdf -depth 8 output.tiff
- Convert the .tiff to text using tesseract, which generates out.txt (tesseract appends the .txt extension to the output name itself):
$ tesseract output.tiff out
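If you do this often, the two steps can be wrapped in a small shell function. This is only a minimal sketch: the name pdf_to_text is made up for illustration, and it assumes the input file name ends in .pdf and that convert and tesseract are on your PATH.

```shell
#!/bin/sh
# pdf_to_text is a hypothetical helper name, not part of ImageMagick or Tesseract.
# Usage: pdf_to_text input.pdf   -> writes input.tiff and input.txt next to the PDF
pdf_to_text() {
    input="$1"
    base="${input%.pdf}"    # strip the .pdf extension (assumes a lowercase .pdf suffix)

    # Step 1: rasterize the PDF at 300 DPI with 8 bits per channel
    convert -density 300 "$input" -depth 8 "$base.tiff" || return 1

    # Step 2: OCR the TIFF; tesseract adds .txt to the output base on its own
    tesseract "$base.tiff" "$base" || return 1

    # Print the path of the generated text file
    echo "$base.txt"
}
```

Calling pdf_to_text input.pdf should leave input.txt beside the original file. The intermediate .tiff is kept on purpose so you can inspect it if the OCR result looks wrong; delete it afterwards if you don't need it.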