Recent changes to feature-requests

Problems converting some kinds of PDFs

2011-10-07T00:15:23Z

Hi, I need help:
I have installed the tool on our Linux server and everything is fine. It already works with some kinds of PDF files, which are converted fine. But some kinds of PDFs (apparently newer versions) cannot be converted. This error appears:

Error (0): PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table

But I CAN open the file, so the error doesn't make sense.

I think the only requirements for the modification are:
1. The software should be able to take the PDF filename as an argument.
2. It should then be able to output 1.html document.

That is the same as what the pdftohtml tool does now, but the new tool should be able to convert ALL PDF documents.

Does anybody have the same problem? Could anybody help me and modify the tool to convert all kinds of PDFs? Of course I will PAY for the help.

With regards,
Markus

Join words exceeding line

2007-10-04T14:22:05Z

In http://www.staff.amu.edu.pl/~insfil/problemy-dyskusje/tom3/7.pdf
we have "konsek-" at the end of one line and "wencje" at the start of the next line; the output must be "konsekwencje".
Likewise, "sprzecz-" + "ności" must be "sprzeczności".
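
A minimal sketch of the joining step, written as a post-processing pass in Python over already extracted text. This is not pdftohtml code; the function name join_hyphenated is made up for illustration, and the rule (letter, hyphen, line break, letter) is an assumption that will also join genuine compound words that happen to break at a hyphen.

```python
import re

# Hypothetical helper, not part of pdftohtml: joins a word that was split
# with a trailing hyphen at a line break. \w is Unicode-aware in Python 3,
# so letters such as "ś" are handled.
_SPLIT_WORD = re.compile(r"(\w)-\n[ \t]*(\w)")

def join_hyphenated(text):
    """Return text with 'konsek-\\nwencje' style splits joined into one word."""
    return _SPLIT_WORD.sub(r"\1\2", text)

if __name__ == "__main__":
    sample = "ma konsek-\nwencje oraz sprzecz-\nności"
    print(join_hyphenated(sample))
```

A dictionary check on the joined word would be needed to tell a soft line-break hyphen from a real one; the sketch above does not attempt that.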

CHM output

2006-09-27T21:25:54Z

This might be a job for a different program that uses
PdfToHtml, though I wish it could generate CHMs (using
the LZM library from
http://www.speakeasy.org/~russotto/chm/). CHM is a very
usable format in a lot of cases.
Or maybe PdfToHtml could generate a set of files ready
for CHM packing.

Character Spacing and Word Spacing Info

2005-05-16T13:07:50Z

The text being retrieved is pretty good, but the width
could be accompanied by the scaling information that
should be applied to the text so that it fits within that width.

It would be a great addition!!!

Also, in some places I found that the function
state->getHorizScaling() returns 1.0000, whereas in the actual
document you can see by eye that there is
some scaling, definitely less than 90%. So you
could also add this info to the text tags.

I am attaching the file fw4.pdf. Look at the line "Add lines from
1 to G............" and notice how the scaling differs from
the lines adjacent to it.

text extraction, but still preserve pdf formatting

2005-03-18T13:40:48Z

I use pdftohtml for text processing purposes, but the
<br> in the <div>s breaks the continuity of the
paragraphs. I could just edit the HTML and remove the
<br>, but then I'd lose the original PDF layout, which I
don't want to do.

Would it be possible to add an option that uses the width
property in style to set the width, rather than using <br>?
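
To make the idea concrete, here is a rough Python sketch that post-processes one <div>: it replaces each <br> with a space and prepends a width to the existing style attribute. The function name br_to_width, the pixel width and the sample markup are all assumptions for illustration; the real pdftohtml output may be structured differently, and a regex is only adequate for simple markup like this.

```python
import re

def br_to_width(div_html, width_px):
    """Hypothetical helper: drop <br> line breaks and set a CSS width instead."""
    # Replace every <br>, <br/> or <br /> (any case) with a single space.
    body = re.sub(r"\s*<br\s*/?>\s*", " ", div_html, flags=re.IGNORECASE)
    # Prepend the width to the style attribute of the first <div>.
    return re.sub(
        r'(<div\b[^>]*\bstyle=")',
        rf"\1width:{width_px}px;",
        body,
        count=1,
        flags=re.IGNORECASE,
    )

if __name__ == "__main__":
    sample = '<div style="position:absolute;top:10;left:20">one<br>two</div>'
    print(br_to_width(sample, 300))
```

The browser then wraps the text inside the given width, so the paragraph stays continuous in the source while the block keeps roughly its original extent.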

Have a look here: www.jumpdemo.com to get an idea of
what I'm trying to achieve.

cheers.

Pure text support?

2005-01-19T20:44:48Z

This program is the closest I've come to a pure PDF to
text converter. If I could just get the program to skip
the HTML formatting it would be perfect. Maybe a -text
command line argument could be added in the future?
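
Until something like that exists, one workaround is to strip the markup afterwards. The sketch below uses Python's standard html.parser module; the class and function names are made up for illustration, it treats <br> as a newline, and it assumes the page has no <script> or <style> content worth filtering out.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text content and turns <br> into a newline."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br/> also ends up here via the default handle_startendtag.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html):
    """Hypothetical helper: return the plain text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

if __name__ == "__main__":
    print(html_to_text("<div>Hello<br>world &amp; more</div>"))
```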

HTML 3.2 support

2004-01-29T18:12:13Z

Many places require HTML documents in 3.2 format, and
it's impossible to find any software out there that
will save to 3.2 these days. This would be an
excellent feature in this package.

Easier navigation of generated document

2004-01-14T11:04:08Z

Some browsers, most notably Opera, can interpret link
tags in the header of a document to provide easy
navigation to the next and previous documents.

It would help users to navigate large documents
produced in complex mode if these tags were added.

For instance:

Opera will put a button above the window with the label
Home.

It can do the same for next and previous, etc. See
"12.1.2 Other link relationships"
<http://www.w3.org/TR/REC-html40/struct/links.html#h-12.1.2>
for the W3C recommendations.
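
As an illustration of what the generated header could contain, here is a small Python sketch that builds the link tags for one page of a multi-page document. The function name and the page{}.html naming scheme are assumptions, not what pdftohtml actually produces; "start", "prev" and "next" are link types listed in HTML 4.01.

```python
from html import escape

def nav_link_tags(page, total, name="page{}.html"):
    """Hypothetical helper: <link> tags for page number `page` out of `total`."""
    def tag(rel, number):
        href = escape(name.format(number))
        return f'<link rel="{rel}" href="{href}">'

    tags = [tag("start", 1)]
    if page > 1:
        tags.append(tag("prev", page - 1))
    if page < total:
        tags.append(tag("next", page + 1))
    return tags

if __name__ == "__main__":
    # The tags would go inside <head> of the generated page.
    print("\n".join(nav_link_tags(2, 3)))
```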

Anyway, apart from always wanting more, I'm really
pleased with pdftohtml. It is already saving a lot of time
and effort.

Vector graphics

2003-11-17T20:50:16Z

Hi,
is it possible to enable the vector graphics processing?

-noframes improvement

2003-10-29T17:40:10Z

-noframes works very nicely in 0.36 (since frames are
evil), but it would be even nicer to have "Previous" and
"Next" links at the top and/or bottom of each page.

This would also help with paging through a doc even with icky
frames, by having the links in a consistent place on
each page instead of cascading down the left hand
frame.
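
Roughly what I have in mind, as a Python sketch rather than a patch: a function that returns the link bar for a given page, to be written at the top and/or bottom of that page. The function name and the page{}.html file naming are assumptions for illustration only.

```python
from html import escape

def nav_bar(page, total, name="page{}.html"):
    """Hypothetical helper: 'Previous | Next' links for one page."""
    links = []
    if page > 1:
        links.append(f'<a href="{escape(name.format(page - 1))}">Previous</a>')
    if page < total:
        links.append(f'<a href="{escape(name.format(page + 1))}">Next</a>')
    # First page has no Previous, last page has no Next.
    return " | ".join(links)

if __name__ == "__main__":
    print(nav_bar(2, 3))
```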

Thanks!