Showing 252 open source projects for "data"

View related business solutions
  • Skillfully - The future of skills based hiring Icon
    Skillfully - The future of skills based hiring

    Realistic Workplace Simulations that Show Applicant Skills in Action

    Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.
    Learn More
  • Gearset | The complete Salesforce DevOps solution Icon
    Gearset | The complete Salesforce DevOps solution

    Salesforce DevOps done right.

    Gearset is the only platform you need for unparalleled deployment success, continuous delivery, automated testing and backups.
    Learn More
  • 1
    AWS Data Wrangler

    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source python initiative that extends the power of Pandas library to AWS connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual ETL tasks like load/unload data from Data Lakes, Data Warehouses, and Databases. ...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 2
    Anna’s Archive

    Anna’s Archive

    Comprehensive search engine for books, papers, comics, magazines

    ...It relies heavily on technologies such as Elasticsearch for search functionality and MariaDB for structured data storage, enabling fast and efficient querying across massive datasets. The system is designed with redundancy and replication in mind, allowing distributed deployments and mirrored environments to handle high traffic and large data volumes. It also includes tooling for importing datasets, managing metadata, and maintaining structured archives using custom formats.
    Downloads: 139 This Week
    Last Update:
    See Project
  • 3
    Mihomo

    Mihomo

    A simple Python Pydantic model for Honkai

    Mihomo is a Python client library leveraging Pydantic to model parsed Honkai: Star Rail user data from the Mihomo public API. It provides structured types, type hints, and convenience methods to fetch and transform player profiles, daily stats, and character details efficiently.
    Downloads: 194 This Week
    Last Update:
    See Project
  • 4
    dxy-covid-19-crawler

    dxy-covid-19-crawler

    Realtime crawler for COVID-19 outbreak statistics from DXY data

    ...DXY-COVID-19-Crawler automatically crawls data at regular intervals, typically every minute, ensuring that newly published statistics are captured as quickly as possible. Retrieved data is stored in MongoDB and archived so that the entire progression of the outbreak can be traced over time. It also provided an API that allowed developers to easily access the collected data for building dashboards, visualizations, and other analytical tools.
    Downloads: 18 This Week
    Last Update:
    See Project
  • SIEM | API Security | Log Management Software Icon
    SIEM | API Security | Log Management Software

    AI-Powered Security and IT Operations Without Compromise.

    Built on the Graylog Platform, Graylog Security is the industry’s best-of-breed threat detection, investigation, and response (TDIR) solution. It simplifies analysts’ day-to-day cybersecurity activities with an unmatched workflow and user experience while simultaneously providing short- and long-term budget flexibility in the form of low total cost of ownership (TCO) that CISOs covet. With Graylog Security, security analysts can:
    Learn More
  • 5
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    ...It can be used for data mining, monitoring and automated testing.
    Downloads: 28 This Week
    Last Update:
    See Project
  • 6
    AIOHTTP

    AIOHTTP

    Asynchronous HTTP client/server framework for asyncio and Python

    ...Now it is possible by registering special signal handlers on every request processing stage. The main change is dropping yield from support and using async/await everywhere. Farewell, Python 3.4. You often want to send some sort of data in the URL’s query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g. httpbin.org/get?key=val. Requests allows you to provide these arguments as a dict, using the params keyword argument. aiohttp internally performs URL canonicalization before sending request.
    Downloads: 183 This Week
    Last Update:
    See Project
  • 7
    theHarvester

    theHarvester

    E-mails, subdomains and names

    ...Use it for open source intelligence (OSINT) gathering to help determine a company's external threat landscape on the internet. The tool gathers emails, names, subdomains, IPs and URLs using multiple public data sources.
    Downloads: 43 This Week
    Last Update:
    See Project
  • 8
    Helium Browser

    Helium Browser

    Private, fast, and honest web browser

    ...Helium blocks ads and trackers by default through an integrated, unbiased uBlock Origin extension prepackaged as a native browser component. Its UI and feature set emphasize minimalism, no “smart” recommendations, account sync, or background data collection, resulting in a distraction-free browsing experience that respects user autonomy. The browser is available across macOS, Linux, and Windows, each version built from a fully open source pipeline for reproducibility and trust. Development focuses on maintaining compatibility with modern web standards while decoupling Chromium from its Google dependencies and services.
    Downloads: 101 This Week
    Last Update:
    See Project
  • 9
    Dataproc Templates

    Dataproc Templates

    Dataproc templates and pipelines for solving simple in-cloud data task

    Dataproc templates are designed to address various in-cloud data tasks, including data import/export/backup/restore and bulk API operations. These templates leverage the power of Google Cloud's Dataproc, supporting both Dataproc Serverless and Dataproc clusters. Google provides this collection of pre-implemented Dataproc templates as a reference and for easy customization.
    Downloads: 13 This Week
    Last Update:
    See Project
  • Deliver and Track Online and Live Training Fast and Easy with Axis LMS! Icon
    Deliver and Track Online and Live Training Fast and Easy with Axis LMS!

    Axis LMS targets HR departments for employee or customer training,

    Axis LMS enables you to deliver learning and training everywhere through a flexible and easy-to-use LMS that is designed to enhance your training, automate your workflows, and engage your learners.
    Learn More
  • 10
    PostHog

    PostHog

    PostHog provides open-source web & product analytics

    PostHog is an all‑in‑one open‑source platform for product and web analytics—offering event-based analytics, session recording, feature flagging, A/B testing, cohorts, and more—that you can self‑host, with full support for data privacy and enterprise compliance. Sync data from external tools like Stripe, Hubspot, your data warehouse, and more. Query it alongside your product data. Run custom filters and transformations on your incoming data. Send it to 25+ tools or any webhook in real time or batch export large amounts to your warehouse. Capture traces, generations, latency, and cost for your LLM-powered app.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 11
    douyin

    douyin

    Open source Douyin crawler for collecting and downloading public data

    DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12
    watercrawl

    watercrawl

    AI-ready web crawler that extracts and structures website content

    ...WaterCrawl supports customizable extraction rules so users can focus only on relevant elements while ignoring unnecessary page components. WaterCrawl also offers real-time monitoring capabilities, allowing users to track crawling progress, performance metrics, and errors during large data collection jobs. Developers can integrate the tool into applications through a REST API and multiple client SDKs, enabling automated data pipelines and AI data preparation workflows.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 13
    Weibo Crawler

    Weibo Crawler

    Python crawler for collecting and downloading Sina Weibo user data

    ...It also captures detailed data about each post, including the content, publishing time, topics, mentions, likes, reposts, and comments. In addition to textual data, the project can download original media from posts, such as images, videos, and Live Photo content. Collected data can be exported to structured formats such as CSV or JSON or stored in databases for further analysis and research.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 14
    SearXNG

    SearXNG

    Free internet metasearch engine which aggregates

    ...Instead of maintaining its own index, it queries numerous external search providers and merges the results into a single interface, increasing coverage and diversity of information. One of its core principles is privacy, as it does not track users, store personal data, or create search profiles, making it a strong alternative to traditional search engines. . The system can be self-hosted, allowing individuals or organizations to run their own private search instance with full control over configuration and data handling. It supports extensive customization, including selecting which engines to query, filtering content, and adjusting ranking or display behavior. ...
    Downloads: 21 This Week
    Last Update:
    See Project
  • 15
    openvpn-monitor

    openvpn-monitor

    openvpn-monitor is a web based OpenVPN monitor

    ...It typically runs on the same host as the OpenVPN server, however, it does not necessarily need to. OpenVPN-monitor is a web-based OpenVPN monitor, that shows current connection information, such as users, location, and data transferred.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    Linkedin Scraper

    Linkedin Scraper

    A library that scrapes Linkedin for user data

    Linkedin Scraper is a library that scrapes Linkedin for user data. Version 2.0.0 and before is called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper. The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. So by setting scrape=False, it doesn't automatically scrape the profile, but Chrome will open the linkedin page anyways.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 17
    OpenWPM

    OpenWPM

    A web privacy measurement framework

    OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 18
    hosts

    hosts

    Consolidate and extend hosts files from several well-curated sources

    ...Currently, we offer the following categories: fakenews, social, gambling, and porn. Extensions are optional, and can be combined in various ways with the base hosts file. The combined products are stored in the alternates folder. Data for extensions are stored in the extensions folder. You manage extensions by curating this folder tree, where you will find the data for fakenews, social, gambling, and porn extension data that we maintain and provide for you. Create an optional blacklist file. The contents of this file (containing a listing of additional domains in hosts file format) are appended to the unified hosts file during the update process. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 19
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    ...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Algo VPN

    Algo VPN

    Set of Ansible scripts that simplifies the setup of a personal VPN

    ...For anyone who is privacy conscious, travels for work frequently, or can’t afford a dedicated IT department, this one’s for you. Really, the paid-for services are just commercial honeypots. If an attacker can compromise a VPN provider, they can monitor a whole lot of sensitive data. Paid-for VPNs tend to be insecure: they share keys, their weak cryptography gives a false sense of security, and they require you to trust their operators. Even if you’re not doing anything wrong, you could be sharing the same endpoint with someone who is. In that case, your network traffic will be analyzed when law enforcement makes that seizure.
    Downloads: 60 This Week
    Last Update:
    See Project
  • 21
    FinalRecon

    FinalRecon

    All-in-one Python web reconnaissance tool for fast target analysis

    FinalRecon is an all-in-one web reconnaissance tool written in Python that helps security professionals gather information about a target website quickly and efficiently. It combines multiple reconnaissance techniques into a single command-line utility so users do not need to run several separate tools to collect similar data. FinalRecon focuses on providing a fast overview of a web target while maintaining accuracy in the collected results. It includes modules for gathering server information, analyzing SSL certificates, performing WHOIS lookups, and crawling website resources. FinalRecon can also enumerate DNS records, discover subdomains, search for directories and files, and scan common network ports. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    changedetection.io

    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 23
    finvizfinance

    finvizfinance

    Finviz analysis python library

    ...Stock charts, fundamental & technical information, insider information and stock news. Forex charts and performance. Crypto charts and performance. Screener and Group provide data frames for comparing stocks according to different filters and trading signals. Getting information (fundament, description, outer rating, stock news, inside trader) of an individual stock.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 24
    SEO Machine

    SEO Machine

    A specialized Claude Code workspace for creating long-form

    ...The system uses specialized commands and agents to perform tasks such as keyword research, competitor analysis, content drafting, and optimization. It incorporates real data sources like Google Analytics and Search Console to guide decision-making and improve content effectiveness. The architecture emphasizes context-awareness, using brand voice, style guides, and keyword strategies to maintain consistency across outputs. It also includes performance evaluation tools that score content and suggest improvements before publishing.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 25
    Snoop Project

    Snoop Project

    This is the most powerful software taking into account CIS location

    ...Snoop is a research work (own database / closed bugbounty) in the field of searching and processing public data on the Internet. In terms of specialized search, Snoop is able to compete with traditional search engines.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB