Showing 49 open source projects for "processing"

View related business solutions
  • Skillfully - The future of skills based hiring Icon
    Skillfully - The future of skills based hiring

    Realistic Workplace Simulations that Show Applicant Skills in Action

    Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.
    Learn More
  • The sales CRM that makes your life easy, so all you have to do is sell. Icon
    The sales CRM that makes your life easy, so all you have to do is sell.

    The simpler way to sell

    Welcome to the simpler way to sell. Pipedrive is CRM software that makes your life easy, for less legwork and more sales. Let us track your sales conversations, eliminate admin tasks, get you more leads and uncover how you win, because your day belongs to you. Join more than 100,000 sales teams around the world that use the CRM rated #1 by SoftwareReviews in 2019. Start your free 14-day trial and get full access – no credit card needed.
    Try it free
  • 1
    Python API for JMComic

    Python API for JMComic

    Python crawler and API for downloading JMComic albums and images

    ...It includes command-line functionality and configuration files so users can customize download behavior, directory structures, and performance settings without modifying code. It also supports plugin-based extensions that allow additional processing.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 2
    MDCx

    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    ...MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports image processing tasks such as downloading and cropping artwork used by media centers. It includes several interfaces, allowing users to operate it through a graphical desktop application, a browser-based web interface, or command-line utilities depending on their workflow. Its architecture separates core scraping logic from the user interfaces, allowing the same metadata processing system to be reused across different modes.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 3
    AIOHTTP

    AIOHTTP

    Asynchronous HTTP client/server framework for asyncio and Python

    ...A long awaited new feature is tracing client request life cycle to figure out when and why client request spends a time waiting for connection establishment, getting server response headers etc. Now it is possible by registering special signal handlers on every request processing stage. The main change is dropping yield from support and using async/await everywhere. Farewell, Python 3.4. You often want to send some sort of data in the URL’s query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g. httpbin.org/get?key=val. Requests allows you to provide these arguments as a dict, using the params keyword argument. aiohttp internally performs URL canonicalization before sending request.
    Downloads: 183 This Week
    Last Update:
    See Project
  • 4
    Scrapy-Redis

    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single redis queue. Best suitable for broad multi-domain crawls. Scraped items gets pushed into a redis queued meaning that you can start as many as needed post-processing processes sharing the items queue. Scheduler + Duplication Filter, Item Pipeline, Base Spiders. Default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between python versions. Version 0.3 changed the requests serialization from marshal to cPickle, therefore persisted requests using version 0.2 will not able to work on 0.3. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Electronic Lab Notebook (ELN) Software Icon
    Electronic Lab Notebook (ELN) Software

    Ideal for any lab. Whether you’re just starting up, a small or large academic institution, or a globally operating company.

    eLabJournal is an all-in-one Electronic Lab Notebook (ELN) software that includes sample tracking and protocol management modules.
    Learn More
  • 5
    Trafilatura

    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogroll etc.) and second by including information such as author and date in order to make sense of the data. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    newspaper4k

    newspaper4k

    Python library for scraping and analyzing online news articles easily

    ...Newspaper4k also includes natural language processing capabilities that can generate summaries and identify keywords from extracted article text. Newspaper4k supports both single-article extraction and full news site processing, allowing users to build sources representing entire publications and iterate through their articles. It maintains compatibility with the original project so that existing code written for newspaper3k can continue working with minimal changes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    ...Several scripts also incorporate multi-threading and proxy usage to improve scraping efficiency and help avoid common anti-scraping limitations. In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    Snoop Project

    Snoop Project

    This is the most powerful software taking into account CIS location

    ...Snoop project is developed without taking into account the opinions of the NSA and their friends, that is, it is available to the average user. Snoop is a research work (own database / closed bugbounty) in the field of searching and processing public data on the Internet. In terms of specialized search, Snoop is able to compete with traditional search engines.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    douyin

    douyin

    Open source Douyin crawler for collecting and downloading public data

    DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It...
    Downloads: 8 This Week
    Last Update:
    See Project
  • Award Winning Time and Labor Software Icon
    Award Winning Time and Labor Software

    Synerion offers time and labor, advanced scheduling, absence management, labor allocation, timesheets, coreHR and more.

    Stop wasting time and resources on manual and error-prone paper-based workforce management with Synerion. Synerion offers a comprehensive range of workforce management solutions that goes beyond time and tracking. The platform also offers enhanced scheduling features, labor costing, absence management, and payroll integration.
    Learn More
  • 10
    watercrawl

    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website....
    Downloads: 8 This Week
    Last Update:
    See Project
  • 11
    Grab Framework Project

    Grab Framework Project

    Web Scraping Framework

    Grab is a python framework for building web scrapers. With Grab you can build web scrapers of various complexity, from simple 5-line scripts to complex asynchronous website crawlers processing millions of web pages. Grab provides an API for performing network requests and for handling the received content e.g. interacting with DOM tree of the HTML document. The single request/response API that allows you to build network request, perform it and work with the received content. The API is built on top of urllib3 and lxml libraries. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Tweepy

    Tweepy

    Twitter for Python

    An easy-to-use Python library for accessing the Twitter API. You can also use Git to clone the repository from GitHub to install the latest development version. The easiest way to install the latest version from PyPI is by using pip. Twitter requires all requests to use OAuth for authentication. The API class provides access to the entire twitter RESTful API methods. Each method can accept various parameters and return responses. When we invoke an API method most of the time returned back to...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    news-please

    news-please

    Python tool for crawling and extracting structured data from news site

    ...It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a site. It combines several established technologies and libraries to perform web crawling and content extraction, enabling reliable processing across a wide range of news sources. Developers can use the software either as a standalone command line application or integrate it into their own Python applications through its library interface. Extracted article data can be stored in different formats and systems, including JSON files or database-backed storage solutions.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    diskover-community

    diskover-community

    Open source file indexing & storage analytics powered by Elasticsearch

    Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    DNS Crawler

    DNS Crawler

    A Bulk Domain Assessment Tool

    DNS Crawler is a lightweight, Python-based utility designed for efficient batch processing and assessment of internet domain names. It reads from a list of domains formatted as: domain_name <tab> or ; optional_comment and generates a detailed, Excel-compatible CSV report with columns including: DOMAIN: Domain name REG: Registrar SOA, NS, MX, TXT, SPF, DMARC, MS, A, PTR: Common DNS records for comprehensive domain analysis NOTE: Optional comments from the original input file Whether you're managing multiple domains or auditing for cross-platform DNS consistency, DNS Crawler simplifies the process, offering a clear, structured output for easy review and reporting.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    apache-logs-to-mysql

    Apache Log Parser and Data Normalization Application

    Apache Log Parser and Data Normalization Application Python handles File Processing & MySQL handles Data Processing ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema to automate importing Access & Error files and normalizing data into database designed for reports & data analysis. Runs on Windows, Linux and MacOS & tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 & 9.1.0. 4 LogFormats & 2 ErrorLogFormats can be loaded and 5 MySQL Stored Procedures can be processed in a single Python `ProcessLogs function` execution. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    mlscraper

    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages. Once trained, the generated scraper...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    pspider

    pspider

    Simple Python framework for building multithreaded web crawlers

    PSpider is a lightweight web crawling framework written in Python designed to simplify the development of custom web spiders. It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data. Tasks are managed through queues, allowing different parts of the crawler to process work...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    ruia

    ruia

    Async Python framework for fast and flexible web scraping spiders

    ...Developers can define structured fields to extract information from HTML content and process responses asynchronously to improve crawling performance. It also supports middleware and plugin systems that allow customization of request handling, response processing, and additional functionality.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 20
    TRACARDI - Customer Data Platform

    TRACARDI - Customer Data Platform

    TRACARDI free open-source customer data platform

    ...TRACARDI with is API first approach enables you to collect data from multiple channels. Regardless if it is web site, mobile app or CRM system open Api let you send data for further processing. Integrate data into one consistent user profile. TRACARDI is free open-source platform which you can extend the way you want it. TRACARDI is a low-code framework you can integrate with other parts of your ecosystem. Integrate it with other open source platforms.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    mzitu

    mzitu

    Python crawler that downloads image galleries and analyzes titles

    ...This makes the repository both a scraping example and a small data analysis experiment built around the collected content. Overall, mzitu serves as a learning-oriented implementation of Python web scraping, data processing, and visualization techniques.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Twitter Intelligence

    Twitter Intelligence

    Twitter Intelligence OSINT project performs tracking and analysis

    ...SQLite is used as the database. Tweet data is stored on the Tweet, User, Location, Hashtag, HashtagTweet tables. The database is created automatically. analysis.py performs analysis processing. User, hashtag, and location analyzes are performed. You must write Google Map API Key in setting.py to display Google Maps.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    gain

    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24

    survol

    RDF-based framework monitoring business systems activity

    ...all communicating with each other, manipulating your data, and whose software architecture has become, with time, complicated, difficult to understand, and undocumented. Data are aggregated with an RDF inference engine, creating a global vision of the business information processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    uncaptcha

    uncaptcha

    Defeating Google's audio reCaptcha with 85% accuracy

    ...By combining outputs from several transcription engines, the system increases the likelihood of correctly identifying the spoken digits or phrases required to solve the challenge. It employs signal processing techniques such as segmenting audio clips into individual components before transcription, which improves accuracy in noisy or complex audio conditions. The project was developed as part of academic research to highlight potential weaknesses in CAPTCHA systems and includes disclaimers emphasizing responsible use. While it achieved high success rates at the time of publication, later updates to reCAPTCHA have reduced its effectiveness.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB