Extract text from a binary file/image/other text formats
The Docker image uses the /data folder as a volume from which documents are read and to which output is written, so you need to provide a host folder to be mapped to /data.
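As a sketch of how the mapping works, the hypothetical `textract_in_docker` function below (not part of the image) mounts the folder that contains the input file at /data and passes the file by its bare name, because inside the container it is found at /data/<name>. It assumes, as the command in this section appears to, that the image resolves relative file names against /data. The `DRY_RUN=1` switch is also an assumption of this sketch: it prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the image): run textract on one file
# by mounting that file's folder at /data.
# Set DRY_RUN=1 to print the docker command instead of running it.
textract_in_docker() {
  local input="$1"
  local dir base out
  dir="$(cd "$(dirname "$input")" && pwd)"   # absolute host folder to mount at /data
  base="$(basename "$input")"                # file name as seen inside /data
  out="${base%.*}.txt"                       # e.g. BookReporter.pdf -> BookReporter.txt
  local cmd=(docker run --rm -v "$dir:/data" kunalshah/textract:latest -o "$out" "$base")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"                # show the command only
  else
    "${cmd[@]}"                              # actually run the container
  fi
}
```

For example, `textract_in_docker ~/Downloads/BookReporter.pdf` would write ~/Downloads/BookReporter.txt, regardless of the directory you run it from.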
For example, download the BookReporter.pdf file to the Downloads folder in your home directory (~/Downloads).
To extract text from BookReporter.pdf and save it to BookReporter.txt, run the following from ~/Downloads (the current directory is mounted as /data):
cd ~/Downloads
docker run \
  --rm \
  -v "$(pwd):/data" \
  kunalshah/textract:latest \
  -o BookReporter.txt \
  BookReporter.pdf
View the converted text file:
cat BookReporter.txt
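Before reading the result, it can help to confirm that textract actually produced text; a scanned document with no recognizable text may yield an empty file. The hypothetical `check_output` function below is a sketch using only standard shell tools and is not part of the image.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of the image): report the word count of a
# converted file, or fail if the file is missing or empty.
check_output() {
  local out="$1"
  if [ -s "$out" ]; then
    # wc -w pads its output on some systems, so strip the spaces
    printf '%s: %s words\n' "$out" "$(wc -w < "$out" | tr -d ' ')"
  else
    printf '%s is missing or empty\n' "$out" >&2
    return 1
  fi
}
```

Running `check_output BookReporter.txt` from ~/Downloads prints the word count, or returns a non-zero status if the conversion produced nothing.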