The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
License
GNU Library or Lesser General Public License version 2.0 (LGPLv2)