The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.

License

GNU Library or Lesser General Public License version 2.0 (LGPLv2)

Additional Project Details

Languages

English

Intended Audience

Advanced End Users, Developers, System Administrators

User Interface

Plugins

Programming Language

Java

Related Categories

Java Internet Software, Java Web Scrapers

Registered

2006-11-06