Showing 448 open source projects for "data"

  • 1
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 91 This Week
  • 2
    Great Expectations

    Always know what to expect from your data

    Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams. Expectations are assertions for data.
    Downloads: 3 This Week
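The idea that "expectations are assertions for data" can be illustrated with a plain-Python sketch. This does not use the Great Expectations API; the helper `expect_values_between`, the result keys, and the sample rows are all hypothetical and exist only to show the shape of a declarative check that reports what failed instead of raising on the first bad value.

```python
from typing import Any

# Hypothetical helper, NOT part of Great Expectations: checks that every
# value in one column falls inside an inclusive range.
def expect_values_between(rows: list[dict[str, Any]], column: str,
                          low: float, high: float) -> dict[str, Any]:
    unexpected = [row[column] for row in rows
                  if not (low <= row[column] <= high)]
    # Report the outcome as data so a pipeline can log or act on it.
    return {"success": not unexpected, "unexpected_values": unexpected}

# Invented sample data for illustration.
rows = [{"age": 34}, {"age": 29}, {"age": -5}]
result = expect_values_between(rows, "age", 0, 120)
print(result)
```

Returning a result record, as opposed to using a bare `assert`, is one way to make such checks usable for documentation and profiling as well as testing.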
  • 3
    TOML

    Tom Preston-Werner's obvious, minimal language

    ...TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages. TOML shares traits with other file formats used for application configuration and data serialization, such as YAML and JSON. TOML and JSON both are simple and use ubiquitous data types, making them easy to code for or parse with machines. TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. ...
    Downloads: 6 This Week
  • 4
    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 12 This Week
  • 5
    FreeTAKServer

    Situational Awareness Server compatible with TAK clients

    FTS is a Python3 implementation of a TAK Server for devices like ATAK, WinTAK, and ITAK. It is cross-platform and runs on anything from a multi-node installation on AWS down to the Android edition. It's free and open source (released under the Eclipse Public License). FTS allows you to connect ATAK clients to share geo-information, chat with all the connected clients, exchange files, and more. It intends to support all the major use cases of the original TAK server.
    Downloads: 9 This Week
  • 6
    lxml

    The lxml XML toolkit for Python

    A Python library for efficient XML and HTML processing, known for speed and compatibility. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible with but superior to the well-known ElementTree API. The latest release works with all CPython versions from 3.6 to 3.12. See the introduction for more information about the...
    Downloads: 16 This Week
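For readers unfamiliar with the ElementTree API mentioned in the lxml entry, here is a small sketch using the standard library's `xml.etree.ElementTree`. Given the description's "mostly compatible" claim, the same calls should largely carry over to `lxml.etree`, but that substitution is an assumption and is not exercised here. The XML document is invented for illustration.

```python
import xml.etree.ElementTree as ET  # stdlib ElementTree implementation

# Invented sample document.
XML_TEXT = (
    "<catalog>"
    "<book id='1'><title>First</title></book>"
    "<book id='2'><title>Second</title></book>"
    "</catalog>"
)

root = ET.fromstring(XML_TEXT)

# iter() walks the tree; findtext() reads the text of a child element.
titles = [book.findtext("title") for book in root.iter("book")]

# findall() matches direct children; get() reads an attribute value.
ids = [book.get("id") for book in root.findall("book")]

print(titles, ids)
```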
  • 7
    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 14 This Week
  • 8
    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. The report includes high correlation warnings based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik), as well as the most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic). ...
    Downloads: 1 This Week
  • 9
    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 18 This Week
  • 10
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 3 This Week
  • 11
    Cortex Analyzers

    Cortex Analyzers Repository

    Analyzers can be written in any programming language supported by Linux such as Python, Ruby, Perl, etc. Refer to the How to Write and Submit an Analyzer page for details on how to write and submit one.
    Downloads: 5 This Week
  • 12
    msgspec

    A fast serialization and validation library

    msgspec is a fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML.
    Downloads: 2 This Week
  • 13
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 11 This Week
  • 14
    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    An Open-Source Python3 tool for recognizing layouts, tables, math formulas, and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical...
    Downloads: 9 This Week
  • 15
    TexText

    Re-editable LaTeX/Typst graphics for Inkscape

    Re-editable LaTeX and Typst graphics for Inkscape. TexText is a Python extension for the vector graphics editor Inkscape that lets you add LaTeX- and Typst-generated SVG elements to your drawing and re-edit them later.
    Downloads: 7 This Week
  • 16
    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. It serves as a door to the multimodal world: a super-expressive data structure for representing complicated, mixed, or nested text, image, video, audio, and 3D mesh data.
    Downloads: 0 This Week
  • 17
    OSCAL

    Open Security Controls Assessment Language (OSCAL)

    ...Public contributions to this project are welcome. With this effort, we are stressing the agile development of a set of minimal formats that are generic enough to capture the breadth of data in scope (controls specifications), while also capable of ad-hoc tuning and extension to support peculiarities of both (industry or sector) standards and new control types. The OSCAL website provides an overview of the OSCAL project, including an XML and JSON schema reference, examples, and other resources.
    Downloads: 1 This Week
  • 18
    Extract TOTP/HOTP secrets

    Extract one time password (OTP) secrets from QR codes

    The Python script extract_otp_secrets.py extracts one-time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator".
    Downloads: 2 This Week
  • 19
    Awkward Array

    Manipulate JSON-like data with NumPy-like idioms

    Awkward Array is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms. Arrays are dynamically typed, but operations on them are compiled and fast. Their behavior coincides with NumPy when array dimensions are regular and generalizes when they're not.
    Downloads: 0 This Week
  • 20
    openvpn-monitor

    openvpn-monitor is a web based OpenVPN monitor

    ...It typically runs on the same host as the OpenVPN server, although it does not need to. openvpn-monitor is a web-based OpenVPN monitor that shows current connection information, such as users, location, and data transferred.
    Downloads: 2 This Week
  • 21
    Rapid LaTeX OCR

    Formula recognition based on LaTeX-OCR and ONNXRuntime

    Formula recognition based on LaTeX-OCR and ONNXRuntime. rapid_latex_ocr is a tool to convert formula images to LaTeX format. The inference code in the repo is modified from LaTeX-OCR; all models have been converted to ONNX format and the inference code has been simplified, so inference is faster and easier to deploy. The repo contains only inference code for ONNX-format models, based on ONNXRuntime or OpenVINO, and does not contain model training code. If you want to train your own model, please move to...
    Downloads: 5 This Week
  • 22
    pyserde

    Yet another serialization library on top of dataclasses

    Yet another serialization library on top of dataclasses, inspired by serde-rs. Declare a class with pyserde's @serde decorator.
    Downloads: 0 This Week
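To show what "serialization on top of dataclasses" means in practice, the sketch below uses only the standard library (`dataclasses` and `json`) as a stand-in. It is not pyserde and does not use its @serde decorator; the `Point` class and the `to_json` / `from_json` helpers are hypothetical names defined here for illustration.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Point:
    x: int
    y: int

# Hypothetical helpers defined in this example (stdlib stand-in, not pyserde).
def to_json(obj) -> str:
    # asdict() turns the dataclass instance into a plain dict of its fields.
    return json.dumps(asdict(obj))

def from_json(cls, text: str):
    # Rebuild the instance by passing the decoded fields to the constructor.
    return cls(**json.loads(text))

encoded = to_json(Point(1, 2))
decoded = from_json(Point, encoded)
print(encoded, decoded)
```

This stand-in handles only flat classes with JSON-native field types; nested dataclasses and type validation are the kind of work a dedicated library takes on.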
  • 23
    Texify

    Math OCR model that outputs LaTeX and markdown

    Texify is an OCR model that converts images or PDFs containing math into Markdown and LaTeX that can be rendered by MathJax ($$ and $ are delimiters). It can run on CPU, GPU, or MPS.
    Downloads: 2 This Week
  • 24
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 0 This Week
  • 25
    granary

    The social web translator

    The social web translator. Fetches and converts data between social networks, HTML and JSON with microformats2, ActivityStreams/ActivityPub, Atom, JSON Feed, and more. Granary is a library and REST API that fetches and converts between a wide variety of social data sources and formats. Free yourself from silo API chaff and expose the sweet social data foodstuff inside in standard formats and protocols.
    Downloads: 0 This Week