duplicate free download

Showing 3 open source projects for "duplicate"

1

ydata-profiling

Create HTML profiling reports from pandas DataFrame objects

ydata-profiling's primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience that is consistent and fast. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame and lets that analysis be exported in formats such as HTML and JSON.
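
As a rough illustration of the "one-line" workflow described above, the sketch below wraps report creation and export in a small helper. The helper name build_profile_report and the default file names are assumptions made for this example; it relies on the ProfileReport class and its to_file method from the ydata_profiling package, and it assumes pandas and ydata-profiling are installed.

```python
def build_profile_report(df, html_path="report.html", json_path="report.json"):
    """Profile a pandas DataFrame and export the analysis as HTML and JSON.

    Hypothetical helper for illustration; only ProfileReport and to_file
    come from ydata-profiling itself.
    """
    # Imported lazily so this module loads even where ydata-profiling is absent.
    from ydata_profiling import ProfileReport

    # The "one-line" EDA step: build the extended analysis of the DataFrame.
    report = ProfileReport(df, title="Profiling Report")

    # to_file picks the output format from the file extension.
    report.to_file(html_path)
    report.to_file(json_path)
    return report
```

Calling build_profile_report(df) on any DataFrame should leave report.html and report.json in the working directory, much as df.describe() prints a short summary to the console.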

Downloads: 5 This Week

Last Update: 3 days ago
2

The Timeline Project

Cross-platform app for displaying and navigating events on a timeline.

The Timeline Project aims to create a free, cross-platform application for displaying and navigating events on a timeline.

46 Reviews

Downloads: 207 This Week

Last Update: 3 days ago
3

text-dedup

All-in-one text de-duplication

text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible deduplication strategies, making it ideal for cleaning web-scraped data, language model training datasets, or document archives.
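
To make the MinHash and Jaccard-threshold idea above more concrete, here is a minimal standard-library sketch of near-duplicate detection. It is not text-dedup's API: every name in it (shingles, minhash_signature, estimate_jaccard, deduplicate), the 3-word shingle size, the 128 hash functions, and the 0.6 threshold are assumptions chosen for illustration. It also compares documents pairwise, which does not scale; large-scale tools typically add locality-sensitive hashing on top of the signatures.

```python
import hashlib
import re

NUM_PERM = 128          # number of hash functions (assumed value)
MAX_HASH = 2**64 - 1    # placeholder signature value for empty documents


def shingles(text, n=3):
    """Return the set of n-word shingles of a text (lowercased word tokens)."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < n:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def minhash_signature(items, num_perm=NUM_PERM):
    """Keep, for each salted hash function, the minimum hash over all items."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(min(
            (int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big")
             for item in items),
            default=MAX_HASH,
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def deduplicate(docs, threshold=0.6):
    """Return indices of documents kept after dropping near-duplicates.

    Greedy pass: a document is dropped when its estimated similarity to any
    already-kept document reaches the threshold.
    """
    kept, kept_sigs = [], []
    for index, doc in enumerate(docs):
        sig = minhash_signature(shingles(doc))
        if all(estimate_jaccard(sig, other) < threshold for other in kept_sigs):
            kept.append(index)
            kept_sigs.append(sig)
    return kept
```

For example, deduplicate applied to two sentences that differ by one trailing word and a third unrelated sentence would be expected to keep the first and third. Raising the threshold keeps more borderline pairs, and lowering it removes them more aggressively.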

Downloads: 0 This Week

Last Update: 2025-04-08