<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to support-requests</title><link>https://sourceforge.net/p/pdfbox/support-requests/</link><description>Recent changes to support-requests</description><atom:link href="https://sourceforge.net/p/pdfbox/support-requests/feed.rss" rel="self"/><language>en</language><lastBuildDate>Tue, 29 Nov 2022 18:40:10 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/pdfbox/support-requests/feed.rss" rel="self" type="application/rss+xml"/><item><title>PDFBox jar file as Reference for MS Access VBA project?</title><link>https://sourceforge.net/p/pdfbox/support-requests/32/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Intending to read and work with my PDF files through VBA API calls to the PDFBox library, I downloaded a PDFBox jar file from &lt;a href="https://pdfbox.apache.org/download.cgi" rel="nofollow"&gt;https://pdfbox.apache.org/download.cgi&lt;/a&gt;, placed it in the Program Files folder on my PC (Win 11 Home), opened an MS Access VBA database (part of MS Office 365), and in the VBA interface used Tools | References ... and browsed to the jar file. An error resulted: "Can't add a reference to the specified file."&lt;/p&gt;
&lt;p&gt;I tried this three times, with each of the following files: pdfbox-2.0.1.jar, pdfbox-2.0.16.jar, and pdfbox-app-3.0.0-alpha3.jar, but in each case the resulting error was the same.&lt;/p&gt;
&lt;p&gt;What am I doing wrong and how can I achieve my aim? Thanks.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tony Macelli</dc:creator><pubDate>Tue, 29 Nov 2022 18:40:10 -0000</pubDate><guid>https://sourceforge.netb7f156bf6e957270d67d8c80fdf57b97b6d65743</guid></item><item><title>can it be used with php</title><link>https://sourceforge.net/p/pdfbox/support-requests/31/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;I am desperately trying to extract plain text from PDF documents. I hacked together some code, but it can only process 80% of my PDF file collection.&lt;br /&gt;
Can PDFBox be of any help to me?&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Wed, 07 Oct 2009 08:10:57 -0000</pubDate><guid>https://sourceforge.net54b094a33c23617630fbdbb05be183182b794da9</guid></item><item><title>can it be used with php</title><link>https://sourceforge.net/p/pdfbox/support-requests/30/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;I am desperately trying to extract plain text from PDF documents. I hacked together some code, but it can only process 80% of my PDF file collection.&lt;br /&gt;
Can PDFBox be of any help to me?&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Wed, 07 Oct 2009 07:24:35 -0000</pubDate><guid>https://sourceforge.net913c4a4f079f8d7457b81e2f3a8e2ee27170ad63</guid></item><item><title>Extracting subscript char - Issue in some pdf</title><link>https://sourceforge.net/p/pdfbox/support-requests/29/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello, &lt;br /&gt;
I'm using PDFBox 0.7.3 to extract text from PDF files, but I have noticed a problem with subscript characters. The issue occurs in most of my PDFs (not all). The word "CO2", where the 2 is a subscript character, appears very often. For some files, the extracted text has a CRLF before and after the "2".&lt;br /&gt;
These are some examples:&lt;/p&gt;
&lt;p&gt;Inoltre, utilizzando unicamente combustibili fossili, il comparto non ha la possibilità di ridurre le&lt;br /&gt;
emissioni di CO&lt;br /&gt;
2&lt;br /&gt;
. &lt;/p&gt;
&lt;p&gt;- la riduzione dell?impronta CO&lt;br /&gt;
2&lt;br /&gt;
complessiva,&lt;/p&gt;
&lt;p&gt;Can anybody help me?&lt;br /&gt;
Thank you&lt;br /&gt;
Eclipse79&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">eclipse79</dc:creator><pubDate>Tue, 05 May 2009 09:47:22 -0000</pubDate><guid>https://sourceforge.net93c4297f82cfb170341f169bbfa8afd346953f46</guid></item><item><title>*****PDFBox has moved to Apache*****</title><link>https://sourceforge.net/p/pdfbox/support-requests/28/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;All new bugs should be posted on the Apache PDFBox page.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://issues.apache.org/jira/browse/PDFBOX" rel="nofollow"&gt;https://issues.apache.org/jira/browse/PDFBOX&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;or visit the current Apache PDFBox project page&lt;/p&gt;
&lt;p&gt;&lt;a href="http://incubator.apache.org/projects/pdfbox.html" rel="nofollow"&gt;http://incubator.apache.org/projects/pdfbox.html&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ben Litchfield</dc:creator><pubDate>Wed, 23 Jul 2008 00:12:33 -0000</pubDate><guid>https://sourceforge.net3dc4d15028c9c9d3a4d89cf6343dff1330211e64</guid></item><item><title>ClassCastException issue when extracting graphics</title><link>https://sourceforge.net/p/pdfbox/support-requests/27/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello&lt;/p&gt;
&lt;p&gt;I am evaluating PDFBox 7.0.13 to extract images out of a bunch of PDF files. These PDF files are all scanned documents. The graphics will then be passed to an OCR program to extract the text.&lt;br /&gt;
During the execution, about 15% of the documents fail with 2 types of errors:&lt;br /&gt;
-------------------------------------------------&lt;br /&gt;
java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be cast to org.pdfbox.cos.COSDictionary&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDCcitt$TiffWrapper.buildHeader(PDCcitt.java:501)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDCcitt$TiffWrapper.&amp;lt;init&amp;gt;(PDCcitt.java:363)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDCcitt$TiffWrapper.&amp;lt;init&amp;gt;(PDCcitt.java:354)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDCcitt.write2OutputStream(PDCcitt.java:128)&lt;br /&gt;
at PDFBox1.parseDocument(PDFBox1.java:237)&lt;br /&gt;
at PDFBox1.processAll(PDFBox1.java:108)&lt;br /&gt;
at PDFBox1.main(PDFBox1.java:468)&lt;br /&gt;
Failed to process - reason: Failed to parse file&lt;br /&gt;
-------------------------------------------------&lt;br /&gt;
java.lang.ArrayIndexOutOfBoundsException&lt;br /&gt;
at java.lang.System.arraycopy(Native Method)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.predictor.None.decode(None.java:71)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:154)&lt;br /&gt;
at org.pdfbox.pdmodel.graphics.xobject.PDPixelMap.write2OutputStream(PDPixelMap.java:166)&lt;br /&gt;
at PDFBox1.parseDocument(PDFBox1.java:237)&lt;br /&gt;
at PDFBox1.processAll(PDFBox1.java:108)&lt;br /&gt;
at PDFBox1.main(PDFBox1.java:468)&lt;br /&gt;
-------------------------------------------------&lt;br /&gt;
My problem is that these documents are classified, so I cannot submit a test case.&lt;br /&gt;
Basically, I have 2 questions:&lt;br /&gt;
1. Since these problems always occur at the same place in the code, can you identify the cause without a test case?&lt;br /&gt;
2. Does the CVS version (7.0.14) contain a fix for these problems?&lt;/p&gt;
&lt;p&gt;Best regards&lt;/p&gt;
&lt;p&gt;JP&lt;br /&gt;
dev@softpark.ws&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Tue, 17 Jun 2008 08:28:03 -0000</pubDate><guid>https://sourceforge.net5b5d0c9e2b7fd0b6b48d24f9bbe91c89043760af</guid></item><item><title>embedded tif extraction from pdf</title><link>https://sourceforge.net/p/pdfbox/support-requests/26/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;After extracting the TIFF image from the PDF file (see attachment ae90.pdf), the "Photometric Interpretation" of the TIFF image is changed (white pixels are black, black pixels are white). I used PDFBox-0.7.2.jar, PDFBox-0.7.3.jar, and finally PDFBox-0.7.4-dev-20080306.jar.&lt;/p&gt;
&lt;p&gt;Thank you for your help&lt;/p&gt;
&lt;p&gt;christian&lt;/p&gt;
&lt;p&gt;cnczech@web.de&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Anonymous</dc:creator><pubDate>Tue, 11 Mar 2008 13:15:33 -0000</pubDate><guid>https://sourceforge.net96884e3adc38f678b7b6e76de02f66cfe63333ee</guid></item><item><title>PDFTextStripper not handling some Japanese</title><link>https://sourceforge.net/p/pdfbox/support-requests/25/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Using this code sequence: &lt;/p&gt;
&lt;p&gt;PDDocument document = PDDocument.load(stream);&lt;br /&gt;
PDFTextStripper stripper = new PDFTextStripper();&lt;br /&gt;
String contents = stripper.getText(document);&lt;/p&gt;
&lt;p&gt;some Japanese documents are handled properly. This is shown by viewing the chars in the String "contents".&lt;br /&gt;
However, other Japanese documents produce garbage non-Japanese characters as viewed in the String contents. &lt;/p&gt;
&lt;p&gt;The documents that PDFTextStripper does not handle properly display a prompt when opened in Acrobat Reader, saying that a Japanese language support pack needs to be installed to view the document properly. The ones that are handled properly display Japanese characters fine when viewed through Acrobat Reader. Installing the language support pack is not a solution, since it would only fix the display in Acrobat Reader. This code needs to run on a Unix server, so even if the support pack helped on a PC (unlikely), it would have no effect on the task when run on Unix.&lt;/p&gt;
&lt;p&gt;This appears to be an encoding issue; however, unlike similar issues that have been reported, the above code completes successfully. The only problem is that the results are garbled as described above.&lt;/p&gt;
&lt;p&gt;Attached is an example of a PDF file that is not handled properly by PDFTextStripper and requires a Japanese language pack to view in Acrobat Reader.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">sflaumen</dc:creator><pubDate>Thu, 29 Nov 2007 15:33:43 -0000</pubDate><guid>https://sourceforge.net046e2332627d80d272c579fd9cf6068a56370f53</guid></item><item><title>File size shrinks after populating data into the fields</title><link>https://sourceforge.net/p/pdfbox/support-requests/24/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi all,&lt;/p&gt;
&lt;p&gt;I am trying to populate the fields in a PDF with data using the PDFBox API. I notice that the original document is around 124 KB and the newly populated PDF is only about 82 KB. I am trying to print this modified PDF and I can't print it. But I can open the modified PDF in Adobe Reader and do a File --&amp;gt; Print. &lt;/p&gt;
&lt;p&gt;Has anyone seen this kind of problem before? Why would the file size be smaller after the fields are populated? It should only be larger. &lt;/p&gt;
&lt;p&gt;Appreciate any responses... &lt;/p&gt;
&lt;p&gt;Thanks,&lt;br /&gt;
Chapsi&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">chapsi</dc:creator><pubDate>Wed, 12 Sep 2007 14:44:58 -0000</pubDate><guid>https://sourceforge.net54bc3bdea822ab27dd8b4fbaa297056a108e03ff</guid></item><item><title>Extracting text by ID</title><link>https://sourceforge.net/p/pdfbox/support-requests/23/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Attachment upload for:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://sourceforge.net/forum/forum.php?thread_id=1812274&amp;amp;forum_id=267205"&gt;http://sourceforge.net/forum/forum.php?thread_id=1812274&amp;amp;forum_id=267205&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Umkhulubaas</dc:creator><pubDate>Tue, 04 Sep 2007 19:51:01 -0000</pubDate><guid>https://sourceforge.nete601e8b025bdcc154e9ab8bee75f8708dd1ed29c</guid></item></channel></rss>