Batch process and Viewing results

Anonymous — Tue, 07 Dec 2004 18:28:14 -0000

The Google Search Appliance uses pdftohtml version
0.33a. That version is unable to read some OCR'ed
files, so their converted output is blank and the
appliance does not index them. We have approximately
10,000 files that we want to run through 0.33a.
Those that come out blank will be re-scanned with
different software.

Do you know of a way to use your software to
process multiple files in one batch? Additionally,
how can we tell which converted HTML files are
blank, other than opening and viewing each one?

Thank you,
wongn@metro.net
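
One possible approach to both questions, as a rough sketch rather than a tested recipe: loop over the PDFs, call pdftohtml once per file, then flag any output whose visible text is empty. The function names (convert_all, find_blank_html, is_blank_html) are made up for this illustration. The `-noframes` flag and the "input.pdf output-base" argument order are assumptions about the installed build, so check `pdftohtml -h` for 0.33a before relying on them.

```python
import subprocess
import sys
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring <script>, <style>, and <title> content."""

    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def is_blank_html(html, min_chars=1):
    """Return True if the page has fewer than min_chars non-whitespace characters."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    # str.split() with no arguments also drops non-breaking spaces (&nbsp;).
    text = "".join("".join(parser.chunks).split())
    return len(text) < min_chars


def convert_all(pdf_dir, out_dir, command=("pdftohtml", "-noframes")):
    """Run the converter on every *.pdf in pdf_dir; return names that failed.

    The default command and its flag are assumptions, not verified against 0.33a.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        result = subprocess.run(
            [*command, str(pdf), str(out_dir / pdf.stem)],
            capture_output=True,
        )
        if result.returncode != 0:
            failed.append(pdf.name)
    return failed


def find_blank_html(out_dir):
    """Return the sorted names of *.html files in out_dir with no visible text."""
    blank = []
    for page in sorted(Path(out_dir).glob("*.html")):
        html = page.read_text(encoding="utf-8", errors="replace")
        if is_blank_html(html):
            blank.append(page.name)
    return blank


if __name__ == "__main__" and len(sys.argv) == 3:
    for name in convert_all(sys.argv[1], sys.argv[2]):
        print("conversion failed:", name)
    for name in find_blank_html(sys.argv[2]):
        print("blank:", name)
```

Saved under a name of your choosing and run with a PDF directory and an output directory as its two arguments, this would print one line per failed conversion and one per blank page. The page title is ignored on purpose, on the assumption that the converter may write a title even when the body has no text. Raising min_chars would also catch files that contain only a few stray characters.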
