<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en" xmlns="http://www.w3.org/2005/Atom"><title>Recent changes to 7: pdf forms to xml forms</title><link href="https://sourceforge.net/p/pdf2xml/feature-requests/7/" rel="alternate"/><link href="https://sourceforge.net/p/pdf2xml/feature-requests/7/feed.atom" rel="self"/><id>https://sourceforge.net/p/pdf2xml/feature-requests/7/</id><updated>2013-12-16T14:02:50.292000Z</updated><subtitle>Recent changes to 7: pdf forms to xml forms</subtitle><entry><title>#7 pdf forms to xml forms</title><link href="https://sourceforge.net/p/pdf2xml/feature-requests/7/?limit=25#ee6b" rel="alternate"/><published>2013-12-16T14:02:50.292000Z</published><updated>2013-12-16T14:02:50.292000Z</updated><author><name>Herve Dejean</name><uri>https://sourceforge.net/u/dejean/</uri></author><id>https://sourceforge.netfbd346fd01cc92591e66be709cd0cfa7d9224615</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Lines roughly correspond to TEXT tags. Simply concatenating the TOKEN contents recreates the line. TOKEN elements are generated because they carry typographical information for each token.&lt;/p&gt;
&lt;p&gt;RE: forms, pdf2xml extracts information found in the PDF. Your PDF form is a set of text and graphical information. The form structure is not explicitly given. It has to be generated.&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>pdf forms to xml forms</title><link href="https://sourceforge.net/p/pdf2xml/feature-requests/7/" rel="alternate"/><published>2013-11-24T21:17:08.060000Z</published><updated>2013-11-24T21:17:08.060000Z</updated><author><name>Anonymous</name><uri>https://sourceforge.net/u/userid-None/</uri></author><id>https://sourceforge.nete117d5da14d7977d8ad76d744da576e52593491d</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Using:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://hivelocity.dl.sourceforge.net/project/pdf2xml/binaries/Linux%2064%20v1.2.7/pdftoxml.linux64.exe.1.2_7.gz"&gt;http://hivelocity.dl.sourceforge.net/project/pdf2xml/binaries/Linux%2064%20v1.2.7/pdftoxml.linux64.exe.1.2_7.gz&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;downloaded on:&lt;/p&gt;
&lt;p&gt;2013-01-12&lt;/p&gt;
&lt;p&gt;and applied to:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.irs.gov/pub/irs-pdf/f1040.pdf" rel="nofollow"&gt;http://www.irs.gov/pub/irs-pdf/f1040.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;which was downloaded on:&lt;/p&gt;
&lt;p&gt;2013-03-11&lt;/p&gt;
&lt;p&gt;produces what looks like a ... element for each word.&lt;br /&gt;
For example, the attachment shows a portion of the XML output after&lt;br /&gt;
running it through xmlindent.&lt;/p&gt;
&lt;p&gt;Could pdf2xml be modified so that words on the same line are concatenated&lt;br /&gt;
into a single, say, ... element to make the XML easier to read?&lt;br /&gt;
The code here:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.mobipocket.com/dev/pdf2xml/pdf2xml.zip" rel="nofollow"&gt;http://www.mobipocket.com/dev/pdf2xml/pdf2xml.zip&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;does that; hence, it must be possible.&lt;/p&gt;
&lt;p&gt;Also, f1040.pdf has many PDF form fields that don't appear in the&lt;br /&gt;
resulting .xml file produced by pdf2xml.  Could pdf2xml be modified to&lt;br /&gt;
produce some type of XForms fields, something like those shown here:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://xformsinstitute.com/essentials/browse/ch02s02.php" rel="nofollow"&gt;http://xformsinstitute.com/essentials/browse/ch02s02.php&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks for all the work on this.&lt;/p&gt;
&lt;p&gt;I'm a pretty good C++ programmer and I'm trying to understand PDF;&lt;br /&gt;
hence, maybe I could provide some help with these features.&lt;/p&gt;
&lt;p&gt;-regards,&lt;br /&gt;
Larry&lt;/p&gt;&lt;/div&gt;</summary></entry></feed>