
Converting Lots of PDFs to TXTs in Ubuntu/Debian

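For those of you who are struggling to find a way to convert PDF files into TXT files, here is a quick bash script. There are many alternatives out there, but none were reliable for me. Run it from the directory that holds your PDFs: it writes PostScript copies to a ps/ folder and the text output to a txt/ folder. You'll need to have acroread and ghostscript installed for this to work.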
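Before converting a large batch, it can help to confirm that the commands the script calls are actually on your PATH, so it stops early instead of failing on every file. Below is a minimal sketch; the function name require_tools is a hypothetical helper, and it only uses the shell built-in command -v.

```shell
#!/bin/bash
# Sketch: report every required command that is missing from PATH.
# require_tools is a hypothetical helper name, not part of any package.
require_tools() {
    local tool
    local missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing required command: $tool" >&2
            missing=1
        fi
    done
    # 0 if everything was found, 1 if anything was missing
    return "$missing"
}

# Usage: stop before converting anything if a tool is absent.
# require_tools acroread ps2txt || exit 1
```

It checks every tool before returning, so you see all missing commands at once rather than one per run.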


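#!/bin/bash
# Convert every PDF in the current directory to plain text.
mkdir -p ps txt                      # -p: don't fail if the folders already exist
shopt -s nullglob                    # skip the loop entirely when there are no PDFs
for f in *.pdf
do
    echo "Processing $f"
    acroread -toPostScript "$f" ps/  # PDF -> ps/<name>.ps
    g=$(basename "$f" .pdf)          # strip the .pdf extension
    ps2txt "ps/$g.ps" > "txt/$g.txt"
done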

You can also change the ps2txt line (the second to last line) to read
ps2txt "ps/$g.ps" | grep -v "EXCLUDE" > "txt/$g.txt"
where EXCLUDE is a pattern matching lines you want removed from every output file, such as a header or footer that repeats on each page. Please let me know if you have any problems.
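If you need to strip more than one kind of line, grep accepts several patterns via repeated -e options. The sketch below wraps that in a small function; exclude_lines is a hypothetical helper name, and -F makes grep treat each pattern as a literal string (drop -F if you want regular expressions).

```shell
#!/bin/bash
# Sketch: drop every line containing any of the given fixed strings.
# exclude_lines is a hypothetical helper name, not part of any package.
exclude_lines() {
    # With no patterns, pass the input through unchanged.
    if [ "$#" -eq 0 ]; then
        cat
        return
    fi
    local p
    local -a args=()
    for p in "$@"; do
        args+=(-e "$p")   # one -e per pattern, so a pattern may contain spaces
    done
    grep -v -F "${args[@]}"
}
```

In the script the ps2txt line would then become something like ps2txt "ps/$g.ps" | exclude_lines "CONFIDENTIAL" "Page " > "txt/$g.txt", where the two strings are only placeholders for whatever repeats in your documents.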

enjoy,
db
