<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en" xmlns="http://www.w3.org/2005/Atom"><title>Recent changes to support-requests</title><link href="https://sourceforge.net/p/pdfbox/support-requests/" rel="alternate"/><link href="https://sourceforge.net/p/pdfbox/support-requests/feed.atom" rel="self"/><id>https://sourceforge.net/p/pdfbox/support-requests/</id><updated>2022-11-29T18:40:10.258000Z</updated><subtitle>Recent changes to support-requests</subtitle><entry><title>PDFBox jar file as Reference for MS Access VBA project?</title><link href="https://sourceforge.net/p/pdfbox/support-requests/32/" rel="alternate"/><published>2022-11-29T18:40:10.258000Z</published><updated>2022-11-29T18:40:10.258000Z</updated><author><name>Tony Macelli</name><uri>https://sourceforge.net/u/tonymac/</uri></author><id>https://sourceforge.netb7f156bf6e957270d67d8c80fdf57b97b6d65743</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;I want to read and work with my PDF files through VBA API calls to the PDFBox library. I downloaded a PDFBox jar file from &lt;a href="https://pdfbox.apache.org/download.cgi" rel="nofollow"&gt;https://pdfbox.apache.org/download.cgi&lt;/a&gt;, placed it in the Program Files folder on my PC (Win 11 Home), opened an MS Access VBA database (part of MS Office 365), and in the VBA interface used Tools | References... and browsed to the jar file. An error resulted: "Can't add a reference to the specified file."&lt;/p&gt;
&lt;p&gt;I tried this three times, once with each of the following files: pdfbox-2.0.1.jar, pdfbox-2.0.16.jar, and&lt;br/&gt;
pdfbox-app-3.0.0-alpha3.jar, but in each case the resulting error was the same.&lt;/p&gt;
&lt;p&gt;What am I doing wrong, and how can I achieve my aim? Thanks.&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>can it be used with php</title><link href="https://sourceforge.net/p/pdfbox/support-requests/31/" rel="alternate"/><published>2009-10-07T08:10:57Z</published><updated>2009-10-07T08:10:57Z</updated><author><name>Anonymous</name><uri>https://sourceforge.net/u/userid-None/</uri></author><id>https://sourceforge.net54b094a33c23617630fbdbb05be183182b794da9</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;I am desperately trying to extract plain text from PDF documents. I hacked together some code, but it can only process 80% of my PDF file collection.&lt;br /&gt;
Can PDFBox be of any help to me?&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>can it be used with php</title><link href="https://sourceforge.net/p/pdfbox/support-requests/30/" rel="alternate"/><published>2009-10-07T07:24:35Z</published><updated>2009-10-07T07:24:35Z</updated><author><name>Anonymous</name><uri>https://sourceforge.net/u/userid-None/</uri></author><id>https://sourceforge.net913c4a4f079f8d7457b81e2f3a8e2ee27170ad63</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;I am desperately trying to extract plain text from PDF documents. I hacked together some code, but it can only process 80% of my PDF file collection.&lt;br /&gt;
Can PDFBox be of any help to me?&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>Extracting subscript char - Issue in some pdf</title><link href="https://sourceforge.net/p/pdfbox/support-requests/29/" rel="alternate"/><published>2009-05-05T09:47:22Z</published><updated>2009-05-05T09:47:22Z</updated><author><name>eclipse79</name><uri>https://sourceforge.net/u/eclipse79/</uri></author><id>https://sourceforge.net93c4297f82cfb170341f169bbfa8afd346953f46</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Hello, &lt;br /&gt;
I'm using PDFBox 0.7.3 to extract text from PDF files, but I have noticed a problem with subscript characters. The issue occurs in most of my PDFs (though not all). The word "CO2", where the 2 is a subscript character, appears very often, and for some files the extracted text has a CRLF before and after the "2".&lt;br /&gt;
These are some examples:&lt;/p&gt;
&lt;p&gt;Inoltre, utilizzando unicamente combustibili fossili, il comparto non ha la possibilità di ridurre le&lt;br /&gt;
emissioni di CO&lt;br /&gt;
2&lt;br /&gt;
. &lt;/p&gt;
&lt;p&gt;- la riduzione dell'impronta CO&lt;br /&gt;
2&lt;br /&gt;
complessiva,&lt;/p&gt;
&lt;p&gt;Can anybody help me?&lt;br /&gt;
Thank you&lt;br /&gt;
Eclipse79&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>*****PDFBox has moved to Apache*****</title><link href="https://sourceforge.net/p/pdfbox/support-requests/28/" rel="alternate"/><published>2008-07-23T00:12:33Z</published><updated>2008-07-23T00:12:33Z</updated><author><name>Ben Litchfield</name><uri>https://sourceforge.net/u/benlitchfield/</uri></author><id>https://sourceforge.net3dc4d15028c9c9d3a4d89cf6343dff1330211e64</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;All new bugs should be posted on the Apache PDFBox page.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://issues.apache.org/jira/browse/PDFBOX" rel="nofollow"&gt;https://issues.apache.org/jira/browse/PDFBOX&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;or visit the current Apache PDFBox project page&lt;/p&gt;
&lt;p&gt;&lt;a href="http://incubator.apache.org/projects/pdfbox.html" rel="nofollow"&gt;http://incubator.apache.org/projects/pdfbox.html&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>ClassCastException issue when extracting graphics</title><link href="https://sourceforge.net/p/pdfbox/support-requests/27/" rel="alternate"/><published>2008-06-17T08:28:03Z</published><updated>2008-06-17T08:28:03Z</updated><author><name>Anonymous</name><uri>https://sourceforge.net/u/userid-None/</uri></author><id>https://sourceforge.net5b5d0c9e2b7fd0b6b48d24f9bbe91c89043760af</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Hello&lt;/p&gt;
&lt;p&gt;I am evaluating PDFBox 7.0.13 to extract images out of a bunch of PDF files. These PDF files are all scanned documents. The graphics will then be passed to an OCR program to extract the text.&lt;br /&gt;
During the execution, about 15% of the documents fail with 2 types of errors:&lt;br /&gt;
-------------------------------------------------&lt;br /&gt;
java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be cast to org.pdfbox.cos.COSDictionary&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDCcitt$TiffWrapper.buildHeader(PDCcitt.java:501)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDCcitt$TiffWrapper.&amp;lt;init&amp;gt;(PDCcitt.java:363)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDCcitt$TiffWrapper.&amp;lt;init&amp;gt;(PDCcitt.java:354)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDCcitt.write2OutputStream(PDCcitt.java:128)&lt;br /&gt;
at PDFBox1.parseDocument(PDFBox1.java:237)&lt;br /&gt;
at PDFBox1.processAll(PDFBox1.java:108)&lt;br /&gt;
at PDFBox1.main(PDFBox1.java:468)&lt;br /&gt;
Failed to process - reason: Failed to parse file&lt;br /&gt;
-------------------------------------------------&lt;br /&gt;
java.lang.ArrayIndexOutOfBoundsException&lt;br /&gt;
at java.lang.System.arraycopy(Native Method)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.predictor.None.decode(None.java:71)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:154)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:166)&lt;br /&gt;
at PDFBox1.parseDocument(PDFBox1.java:237)&lt;br /&gt;
at PDFBox1.processAll(PDFBox1.java:108)&lt;br /&gt;
at PDFBox1.main(PDFBox1.java:468)&lt;br /&gt;
-------------------------------------------------&lt;br /&gt;
My problem is that these documents are classified, so I cannot submit a test case.&lt;br /&gt;
Basically, I have 2 questions:&lt;br /&gt;
1. Since these problems always occur at the same address, can you identify the cause without a test case?&lt;br /&gt;
2. Does the CVS version (7.0.14) contain a fix for these problems?&lt;/p&gt;
&lt;p&gt;Best regards&lt;/p&gt;
&lt;p&gt;JP&lt;br /&gt;
dev@softpark.ws&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>embedded tif extraction from pdf</title><link href="https://sourceforge.net/p/pdfbox/support-requests/26/" rel="alternate"/><published>2008-03-11T13:15:33Z</published><updated>2008-03-11T13:15:33Z</updated><author><name>Anonymous</name><uri>https://sourceforge.net/u/userid-None/</uri></author><id>https://sourceforge.net96884e3adc38f678b7b6e76de02f66cfe63333ee</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;After extracting the TIFF image from the PDF file (see attachment ae90.pdf), the "Photometric Interpretation" of the resulting TIFF image is inverted (white pixels are black, black pixels are white). I tried PDFBox-0.7.2.jar, PDFBox-0.7.3.jar, and finally PDFBox-0.7.4-dev-20080306.jar.&lt;/p&gt;
&lt;p&gt;Thank you for your help&lt;/p&gt;
&lt;p&gt;christian&lt;/p&gt;
&lt;p&gt;cnczech@web.de&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>PDFTextStripper not handling some Japanese</title><link href="https://sourceforge.net/p/pdfbox/support-requests/25/" rel="alternate"/><published>2007-11-29T15:33:43Z</published><updated>2007-11-29T15:33:43Z</updated><author><name>sflaumen</name><uri>https://sourceforge.net/u/sflaumen/</uri></author><id>https://sourceforge.net046e2332627d80d272c579fd9cf6068a56370f53</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Using this code sequence: &lt;/p&gt;
&lt;p&gt;PDDocument document = PDDocument.load(stream);&lt;br /&gt;
PDFTextStripper stripper = new PDFTextStripper();&lt;br /&gt;
String contents = stripper.getText(document);&lt;/p&gt;
&lt;p&gt;some Japanese documents are handled properly. This is shown by viewing the chars in the String "contents".&lt;br /&gt;
However, other Japanese documents produce garbage non-Japanese characters as viewed in the String contents. &lt;/p&gt;
&lt;p&gt;The documents that PDFTextStripper does not handle properly display a prompt when opened in Acrobat Reader, saying that a Japanese language support pack needs to be installed to view the document properly. The documents that are handled properly display Japanese characters fine in Acrobat Reader. Installing the language support pack is not a solution, since it would only fix the display in Acrobat Reader. This code needs to run on a Unix server, so even if the support pack helped on a PC (unlikely), it would have no effect on the task when run on Unix.&lt;/p&gt;
&lt;p&gt;This appears to be an encoding issue. However, unlike similar issues that have been reported, the above code completes successfully; only the extracted text is wrong, as described above.&lt;/p&gt;
&lt;p&gt;Attached is an example of a PDF file that is not handled properly by PDFTextStripper and requires a Japanese language pack to view in Acrobat Reader.&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>File size shrinks after populating data into the fields</title><link href="https://sourceforge.net/p/pdfbox/support-requests/24/" rel="alternate"/><published>2007-09-12T14:44:58Z</published><updated>2007-09-12T14:44:58Z</updated><author><name>chapsi</name><uri>https://sourceforge.net/u/chapsi12/</uri></author><id>https://sourceforge.net54bc3bdea822ab27dd8b4fbaa297056a108e03ff</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Hi all,&lt;/p&gt;
&lt;p&gt;I am trying to populate the fields in a PDF with data using the PDFBox API. I notice that the original document is around 124 KB and the newly populated PDF is only about 82 KB. When I try to print this modified PDF, I can't, although I can open the modified PDF in Adobe Reader and use File --&amp;gt; Print. &lt;/p&gt;
&lt;p&gt;Has anyone seen this kind of problem before? Why would the file size be smaller after the fields are populated? I would expect it only to grow. &lt;/p&gt;
&lt;p&gt;Appreciate any responses... &lt;/p&gt;
&lt;p&gt;Thanks,&lt;br /&gt;
Chapsi&lt;/p&gt;&lt;/div&gt;</summary></entry><entry><title>Extracting text by ID</title><link href="https://sourceforge.net/p/pdfbox/support-requests/23/" rel="alternate"/><published>2007-09-04T19:51:01Z</published><updated>2007-09-04T19:51:01Z</updated><author><name>Umkhulubaas</name><uri>https://sourceforge.net/u/umk/</uri></author><id>https://sourceforge.nete601e8b025bdcc154e9ab8bee75f8708dd1ed29c</id><summary type="html">&lt;div class="markdown_content"&gt;&lt;p&gt;Attachment upload for:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://sourceforge.net/forum/forum.php?thread_id=1812274&amp;amp;forum_id=267205"&gt;http://sourceforge.net/forum/forum.php?thread_id=1812274&amp;amp;forum_id=267205&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</summary></entry></feed>