Showing 70 open source projects for "python data analysis"

  • 1
    mzitu

    Python crawler that downloads image galleries and analyzes titles

    ...Using text segmentation and frequency analysis, the project can create a word cloud representing common keywords found in the dataset. This makes the repository both a scraping example and a small data analysis experiment built around the collected content. Overall, mzitu serves as a learning-oriented implementation of Python web scraping, data processing, and visualization techniques.
    Downloads: 3 This Week
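The frequency-analysis step behind a word cloud can be sketched as follows. This is a minimal illustration, not the project's code: the `keyword_frequencies` helper, the stop-word list, and the sample titles are all hypothetical, and a simple regular expression stands in for whatever text segmentation the project actually uses.

```python
import re
from collections import Counter

# Hypothetical stop words; a real project would use a fuller list.
STOP_WORDS = {"the", "a", "of", "and", "in"}


def keyword_frequencies(titles, top_n=3):
    """Split titles into lowercase words and count the most common ones."""
    counts = Counter()
    for title in titles:
        # Regex tokenization is an assumption standing in for real segmentation.
        words = re.findall(r"[a-z0-9]+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up sample titles for illustration only.
titles = [
    "Sunset in the mountains",
    "Mountains and lakes",
    "Lakes of the north",
]
print(keyword_frequencies(titles, top_n=2))
```

The resulting (word, count) pairs are the kind of input a word-cloud renderer would typically consume.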
  • 2
    Twitter Intelligence

    Twitter Intelligence OSINT project performs tracking and analysis

    A Python 3.x project for tracking and analyzing Twitter without using the Twitter API. The package dependencies are listed in requirements.txt and must be installed before running. SQLite is used as the database, and the database is created automatically. Tweet data is stored in the Tweet, User, Location, Hashtag, and HashtagTweet tables. The analysis.py script performs the analysis.
    Downloads: 0 This Week
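To make the storage layout more concrete, here is a rough sketch of how the five tables named in the description might relate. Only the table names come from the description; every column, the sample rows, and the `top_hashtags` query are assumptions made for illustration.

```python
import sqlite3

# In-memory database; the column layout below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Location (id INTEGER PRIMARY KEY, place TEXT);
CREATE TABLE Tweet (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES User(id),
    location_id INTEGER REFERENCES Location(id),
    text TEXT
);
CREATE TABLE Hashtag (id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
CREATE TABLE HashtagTweet (
    hashtag_id INTEGER REFERENCES Hashtag(id),
    tweet_id INTEGER REFERENCES Tweet(id)
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO User VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO Tweet (id, user_id, text) VALUES (?, 1, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)
conn.executemany("INSERT INTO Hashtag VALUES (?, ?)", [(1, "python"), (2, "osint")])
conn.executemany("INSERT INTO HashtagTweet VALUES (?, ?)", [(1, 1), (1, 2), (2, 3)])


def top_hashtags(connection, limit=5):
    """Count tweets per hashtag through the HashtagTweet link table."""
    return connection.execute(
        """
        SELECT h.tag, COUNT(*) AS n
        FROM HashtagTweet ht
        JOIN Hashtag h ON h.id = ht.hashtag_id
        GROUP BY h.tag
        ORDER BY n DESC, h.tag
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


print(top_hashtags(conn))
```

A link table such as HashtagTweet is the usual way to model the many-to-many relation between tweets and hashtags.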
  • 3
    WeChatSogou

    Python library to crawl and retrieve data from WeChat accounts

    WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in applications or analysis pipelines. ...
    Downloads: 0 This Week
  • 4
    pyspider

    A powerful spider (web crawler) system in Python

    pyspider is a powerful spider (web crawler) system written in Python. Its components are connected by a message queue. Every component, including the message queue, runs in its own process or thread and is replaceable. That means that when processing is slow, you can run many processor instances to make full use of multiple CPUs, or deploy them to multiple machines. This architecture makes pyspider fast.
    Downloads: 0 This Week
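The idea of scaling out by running several processor instances behind a message queue can be sketched with the standard library. This is not pyspider's code: the `processor` and `run_pipeline` names are hypothetical, threads stand in for separate processes or machines, and "processing" a page is reduced to measuring its length.

```python
import queue
import threading


def processor(inbox, outbox):
    """Take pages from the inbox queue until a None sentinel arrives."""
    while True:
        page = inbox.get()
        if page is None:
            break
        # Placeholder work: a real processor would parse the page.
        outbox.put((page, len(page)))


def run_pipeline(pages, num_processors=3):
    """Fan pages out to several processor threads through one queue."""
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=processor, args=(inbox, outbox))
        for _ in range(num_processors)
    ]
    for worker in workers:
        worker.start()
    for page in pages:
        inbox.put(page)
    for _ in workers:
        inbox.put(None)  # one sentinel per worker
    for worker in workers:
        worker.join()
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return sorted(results)


print(run_pipeline(["aa", "b", "cccc"]))
```

Because the workers only share the queues, adding more of them requires no change to the producer side.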
  • 5
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results.
    Downloads: 1 This Week
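As a rough illustration of concurrent requests with asyncio, the sketch below fetches several pages at once while capping concurrency. It does not use Gain's API: `fetch` and `crawl` are hypothetical names, and the network call is simulated with `asyncio.sleep` so the example runs offline.

```python
import asyncio


async def fetch(url, delay=0.01):
    """Simulated download; a real spider would issue an HTTP request here."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"


async def crawl(urls, max_concurrency=2):
    """Fetch all URLs concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(pairs)


pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
print(sorted(pages))
```

The semaphore is one common way to keep a crawler from opening too many connections at once.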
  • 6
    Toapi

    Convert websites into structured APIs automatically with a Python tool

    Toapi is a Python library designed to transform ordinary websites into usable API services. Instead of building a traditional web crawler that collects and stores data before exposing it through an API, Toapi simplifies the process by allowing developers to define data structures that automatically generate an API layer from existing web pages. It works by parsing HTML content from a source site and mapping selected elements into structured data that can be returned as JSON through API endpoints. ...
    Downloads: 1 This Week
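The mapping from selected HTML elements to JSON fields can be illustrated with the standard library alone. This sketch does not use Toapi's API: the `FieldExtractor` class, the `page_to_json` helper, and the rule format (field name mapped to a CSS class) are assumptions chosen to keep the example small.

```python
import json
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class matches a field rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules  # hypothetical format: {"field": "css-class"}
        self.record = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field, css_class in self.rules.items():
            if css_class in classes:
                self._current = field

    def handle_data(self, data):
        # Store the first non-empty text found inside a matched element.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def page_to_json(html, rules):
    """Parse html and return the matched fields as a JSON string."""
    parser = FieldExtractor(rules)
    parser.feed(html)
    return json.dumps(parser.record, sort_keys=True)


HTML = '<div><h1 class="title">Hello</h1><span class="author">Ann</span></div>'
RULES = {"title": "title", "author": "author"}
print(page_to_json(HTML, RULES))
```

An API endpoint would then return this JSON string as its response body.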
  • 7
    haipproxy

    Distributed proxy IP pool for web crawlers using Scrapy and Redis

    ...It automatically crawls proxy resources from the internet and aggregates them into a centralized pool that can be accessed by distributed spiders and scraping systems. It is built using Python and relies on Scrapy for high-performance crawling while Redis is used for data storage, communication, and task coordination between components. It includes crawlers that discover proxy servers, validators that test proxy availability and performance, and schedulers that manage crawling and validation tasks. HAipproxy aims to maintain a high availability proxy pool with low latency so that scraping frameworks can rotate proxies efficiently and avoid blocking during large-scale data collection. ...
    Downloads: 0 This Week
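The interplay between validators and a low-latency pool might look roughly like the sketch below. It is not HAipproxy's implementation: `ProxyPool` is a hypothetical in-memory stand-in for the Redis-backed pool, and the latency figures are made up in place of real network checks.

```python
class ProxyPool:
    """In-memory stand-in for a shared proxy pool (the project uses Redis)."""

    def __init__(self):
        self._latency = {}  # proxy address -> last measured latency in seconds

    def validate(self, proxy, check):
        """Run check on a proxy; keep it with its latency or drop it on failure."""
        latency = check(proxy)
        if latency is None:
            self._latency.pop(proxy, None)
        else:
            self._latency[proxy] = latency

    def fastest(self, n=1):
        """Return up to n proxies, lowest latency first."""
        return sorted(self._latency, key=self._latency.get)[:n]


# Hypothetical measurements; None marks a proxy that failed validation.
FAKE_LATENCY = {
    "10.0.0.1:8080": 0.30,
    "10.0.0.2:8080": None,
    "10.0.0.3:8080": 0.12,
}

pool = ProxyPool()
for address in FAKE_LATENCY:
    pool.validate(address, FAKE_LATENCY.get)

print(pool.fastest(2))
```

A scraper rotating proxies would ask the pool for the fastest entries before each batch of requests.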
  • 8
    Perl Web Scraping Project

    Perl Web Scraping Project

    ...It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis. Web scraping a web page involves fetching it and extracting from it.[1][2] Fetching is the downloading of a page (which a browser does when you view the page). Therefore, web crawling is a main component of web scraping, to fetch pages for later processing.
    Downloads: 0 This Week
  • 9
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    DSTK - DataScience ToolKit is free, open source software for statistical analysis, data visualization, text analysis, and predictive analytics. A newer version with a smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straightforward and easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broad workbench that also includes text analysis and predictive analytics features. ...
    Downloads: 1 This Week
  • 10
    JAWS - Just Another Web Scraper

    A simple web scraper using regular expressions or HTML Agility Pack

    JAWS, or Just Another Web Scraper, is part of the data scraping software developed by SVbook, alongside JATI (Image to Text) and JAVT (Video to Text). JAWS offers an easy interface for scraping data from websites using regular expressions, text preprocessing, or HTML Agility Pack.
    Downloads: 2 This Week
  • 11
    sqliv

    Massive SQL injection vulnerability scanner for automated web testing

    SQLiv is a command-line security tool designed to identify SQL injection vulnerabilities in web applications through automated scanning techniques. Written primarily in Python, the project focuses on discovering potentially vulnerable web pages by analyzing URLs that contain database query parameters. It can perform large-scale scanning by using search engine queries known as SQL injection dorks to collect candidate websites and then test them for vulnerabilities. In addition to bulk scanning, SQLiv supports targeted analysis of specific domains or individual URLs, allowing security researchers to focus on particular web applications. ...
    Downloads: 5 This Week
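The first step described above, picking out URLs that carry query parameters, can be sketched as follows. This covers candidate selection only and is not SQLiv's code: `has_query_parameters` and `candidate_urls` are hypothetical helper names, and the sample URLs are invented.

```python
from urllib.parse import parse_qsl, urlsplit


def has_query_parameters(url):
    """Return True when the URL's query string holds at least one parameter."""
    query = urlsplit(url).query
    return bool(parse_qsl(query, keep_blank_values=True))


def candidate_urls(urls):
    """Keep only URLs with query parameters, preserving their order."""
    return [u for u in urls if has_query_parameters(u)]


# Made-up sample URLs.
urls = [
    "https://example.com/item.php?id=3",
    "https://example.com/about",
    "https://example.com/list?cat=",
]
print(candidate_urls(urls))
```

Static pages without parameters are filtered out early, which keeps the later testing stage focused on a smaller set.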
  • 12
    Simple-Scrape is a simple web-scraping library that allows for programmatic access to HTML code. No further techniques are needed and the library is very compact and thus easy to use.
    Downloads: 0 This Week
  • 13

    newsscrape

    Collects news headlines and analyzes how they relate to a news category

    newsscrape scrapes news headlines and analyzes how each one relates to a news category.
    - It extracts the RSS feed from Google News.
    - Each news headline is matched against a Google News category such as Entertainment or Sports.
    - It is called from a scheduler to collect this data at 5-minute intervals and accumulate it in a database.
    - It contains R statistical computing scripts that learn which word patterns in a headline lead to a particular category.
    - To test its accuracy in...
    Downloads: 0 This Week
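The step of pulling headlines out of an RSS feed and pairing each with a category might look like the sketch below. The feed content is an invented sample, `headlines_with_category` is a hypothetical helper, and reading the category from the channel title is an assumption about how the feed is laid out.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real run would download the RSS text instead.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Sports</title>
<item><title>Local team wins final</title></item>
<item><title>Marathon record broken</title></item>
</channel></rss>"""


def headlines_with_category(rss_text):
    """Return (headline, category) pairs from a single-category RSS feed."""
    channel = ET.fromstring(rss_text).find("channel")
    # Assumption: the channel title names the category of every item.
    category = channel.findtext("title")
    return [(item.findtext("title"), category) for item in channel.iter("item")]


for headline, category in headlines_with_category(SAMPLE_RSS):
    print(category, "|", headline)
```

Pairs like these are what a scheduler could append to a database on each run for later statistical analysis.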
  • 14

    IAD dispatch web scraper

    A very simple web scraper for taxi dispatch data.

    Introduction: The Dulles International Airport (IAD) near Washington, D.C. has a taxi service provided by the Washington Flyer. Taxi cabs are leased by drivers and rides are regulated using a queue system. Drivers enter a corral near the Arrival gate and wait for dispatchers to announce passengers. There is a website that displays useful information about the queue. The number of taxis waiting in queue, the wait time of the last vehicle out, and the number of taxis to exit the corral in...
    Downloads: 0 This Week
  • 15

    python-web_excavator

    General Data Mining API: Only write HTML parsing code.

    A general web scraper that uses the requests library to communicate with the website. Scraper() contains a parser object to which you can add parsing handles. ParseHandle() holds the code that mines your data from an HTML source. Repo: https://github.com/crispycret/web_excavator
    Downloads: 0 This Week
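The pattern of a scraper that owns a parser with pluggable handles can be sketched as below. The classes are deliberately named `SketchParser` and `SketchScraper` because they are hypothetical stand-ins, not the library's actual Scraper() or ParseHandle() signatures, and a fake fetch function replaces the requests call so the example runs offline.

```python
import re


class SketchParser:
    """Holds named parsing handles, each a function from HTML to a value."""

    def __init__(self):
        self.handles = {}

    def add_handle(self, name, func):
        self.handles[name] = func

    def parse(self, html):
        return {name: func(html) for name, func in self.handles.items()}


class SketchScraper:
    """Fetches a page and passes it to the parser's handles."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable url -> HTML text (requests in the real project)
        self.parser = SketchParser()

    def scrape(self, url):
        return self.parser.parse(self.fetch(url))


def fake_fetch(url):
    """Offline stand-in for an HTTP request."""
    return "<title>Demo</title><p>one</p><p>two</p>"


scraper = SketchScraper(fake_fetch)
scraper.parser.add_handle("title", lambda h: re.search(r"<title>(.*?)</title>", h).group(1))
scraper.parser.add_handle("paragraphs", lambda h: len(re.findall(r"<p>", h)))
print(scraper.scrape("https://example.com/"))
```

With this split, only the handle functions need to change from one target site to the next.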
  • 16

    Domain Analyzer Security Tool

    Finds all the security information for a given domain name

    Domain analyzer is a security analysis tool which automatically discovers and reports information about the given domain. Its main purpose is to analyze domains in an unattended way.
    Downloads: 2 This Week
  • 17

    Job Crawler

    Job Data Collection - Web Crawler

    ...Moreover, the program relies on these figures to perform a detailed analysis of the employment situation in the states of the USA. What is the hot job in your state? This report explains how to design and implement a solution for a job data collection system. It also includes links to the source code, class diagram, and algorithm
    Downloads: 0 This Week
  • 18

    Web Crawler Security Tool

    A web crawler oriented to information security.

    Last update on Tue Mar 26 16:25 UTC 2012. The Web Crawler Security tool is a Python-based tool that automatically crawls a web site. It is a web crawler oriented to help in penetration testing tasks. Its main task is to search and list all the links (pages and files) in a web site. The crawler was completely rewritten in v1.0, bringing many improvements: improved data visualization, an interactive option to download files, faster crawling, export of the list of found files into a separate file (useful to crawl a site once, then download files and analyse them with FOCA), an output log in Common Log Format (CLF), basic authentication handling, and more! ...
    Downloads: 0 This Week
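The core task of listing every link (pages and files) found in a page can be sketched with the standard library. This is not the tool's code: `LinkCollector` and `list_links` are hypothetical names, and the HTML is an invented sample rather than a crawled page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather href and src targets, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.add(urljoin(self.base_url, value))


def list_links(html, base_url):
    """Return the sorted, de-duplicated absolute links in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return sorted(collector.links)


# Invented sample page.
PAGE = (
    '<a href="/docs/report.pdf">report</a>'
    '<img src="logo.png">'
    '<a href="https://example.org/">external</a>'
)
for link in list_links(PAGE, "https://example.com/index.html"):
    print(link)
```

A crawler would feed each newly found same-site link back into its queue and repeat this step per page.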
  • 19
    A toolkit for crawling information from web pages by combining different kinds of "actions". Actions are simple operations such as navigating to a specified URL or extracting text from the HTML. A graphical user interface is also available.
    Downloads: 0 This Week
  • 20
    studiMaps is a web-based application for the visualization and analysis of social networks. It consists of two software components: a web crawler for gathering data and the web-based application for visualization.
    Downloads: 0 This Week