| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| 2025.05.22 source code.tar.gz | 2025-05-23 | 36.6 MB | |
| 2025.05.22 source code.zip | 2025-05-23 | 36.9 MB | |
| README.md | 2025-05-23 | 2.4 kB | |
| Totals: 3 Items | | 73.6 MB | 0 |

This release introduces significant enhancements to PDF processing, including image extraction and OCR integration, alongside major internal refactorings that modernize the core data flow and parser architecture. It also includes several fixes related to thread safety, library linking, and testing infrastructure, particularly improving test discovery and execution on Windows.
From PDFs, images now take flight,
Through refined chains, data flows bright.
With steadier tests and safer threads,
Docwire advances, new paths it treads.
🖼️🔗⚙️
- Features
  - PDF Processing: Added extraction of images from PDF files. Extracted images can now be processed by subsequent chain elements, including content type detection and OCR.
  - Writers: Updated the HTML and plain text writers to support image tags, including data URLs for embedded images and text derived from OCR.
  - Testing: Implemented automatic tests for the new PDF image extraction and OCR capabilities.
- Improvements
  - Core Architecture: Significantly refactored the data processing mechanism within chain elements. This modernizes the core data flow, makes processing progression (continue, skip, stop) explicit, and allows more flexible tag emission, including parsers sending data back to the chain for reprocessing.
  - Parser Architecture: Refactored parsers to implement `ChainElement` directly and to use enhanced `data_source` checks, eliminating the `Parser` base class.
  - Testing & CI: Enhanced CI by adding explicit runs of automatic test discovery to catch issues that `ctest` might silently ignore.
  - Testing & CI: Improved error reporting in CI by separating the execution of API automatic tests from example runs.
  - Code Organization: Moved the `HTMLWriter` and `HTMLExporter` classes from `docwire_core` to the `docwire_html` library.
- Fixes
  - Core: Ensured thread-safe initialization of parser MIME type vectors, including a specific fix for `PSTParser`.
  - Testing: Resolved test discovery issues on Windows by implementing a custom `main()` function for automatic tests instead of linking `gtest_main`.
  - Build: Addressed linking issues with the `docwire_html` library.
  - Build: Fixed `mailio` library linking to ensure compatibility with version 0.25.1 following a vcpkg upgrade.