Search Results for "data collection algorithm"

Showing 315 open source projects for "data collection algorithm"

  • 1
    Pythonic Data Structures and Algorithms

    Minimal examples of data structures and algorithms in Python

    The Pythonic Data Structures and Algorithms repository by keon is a hands-on collection of implementations of classical data structures and algorithms written in Python. It offers working, often well-commented code for many standard algorithmic problems — from sorting/searching to graph algorithms, dynamic programming, data structures, and more — making it a valuable resource for learning and reference.
    Downloads: 0 This Week
  • 2
    X's Recommendation Algorithm

    Source code for the X Recommendation Algorithm

    The Algorithm is Twitter’s open source release of the core ranking system that powers the platform’s home timeline. It provides transparency into how tweets are selected, prioritized, and surfaced to users, reflecting Twitter’s move toward openness in recommendation algorithms. The repository contains the recommendation pipeline, which incorporates signals such as engagement, relevance, and content features, and demonstrates how they combine to form ranked outputs.
    Downloads: 2 This Week
  • 3
    Polymarket Data

    Polymarket Data Retriever that fetches, processes, and structures data

    ...It begins by fetching market metadata such as questions, outcomes, and trading volumes, then proceeds to scrape order-filled events from a GraphQL-based subgraph, and finally transforms these raw events into structured trade-level records with calculated prices and directions. One of its key strengths is its ability to run incrementally and resume operations automatically, making it suitable for long-running data collection without duplication or data loss.
    Downloads: 9 This Week
  • 4
    Data Science Articles from CodeCut

    Collection of useful data science topics along with articles

    The Data-science repository from CodeCutTech is a curated collection of educational content focused on practical tools and workflows used in modern data science projects. Instead of providing a single software package, the repository aggregates articles, tutorials, and examples covering many topics within the data science ecosystem. The materials address areas such as MLOps, data management, project organization, testing practices, visualization techniques, and productivity tools used by data scientists. ...
    Downloads: 0 This Week
  • 5
    how-to-optim-algorithm-in-cuda

    How to optimize algorithms in CUDA

    how-to-optim-algorithm-in-cuda is an open educational repository focused on teaching developers how to optimize algorithms for high-performance execution on GPUs using CUDA. The project combines technical notes, code examples, and practical experiments that demonstrate how common computational kernels can be optimized to improve speed and memory efficiency. Instead of presenting only theoretical explanations, the repository includes hand-written CUDA implementations of fundamental operations...
    Downloads: 2 This Week
  • 6
    Data Science Interviews

    Data science interview questions and answers

    Data Science Interviews is an open-source repository that collects common data science interview questions along with community-provided answers and explanations. The project serves as a preparation resource for students, job seekers, and professionals who want to review the technical knowledge required for data science roles. The repository organizes questions into different categories including theoretical machine learning concepts, technical programming questions, and probability or...
    Downloads: 0 This Week
  • 7
    cracking-the-data-science-interview

    A Collection of Cheatsheets, Books, Questions, and Portfolio

    Cracking the Data Science Interview is an open educational repository that collects study materials, resources, and reference links for preparing for data science interviews. The project organizes content across many fundamental areas of data science, including statistics, probability, SQL, machine learning, and deep learning. It includes cheat sheets that summarize important technical concepts commonly discussed during technical interviews. The repository also provides links to recommended...
    Downloads: 3 This Week
  • 8
    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 12 This Week
  • 9
    D4RL

    Collection of reference environments for offline reinforcement learning

    ...Researchers can load a dataset for a given task (e.g., maze navigation, manipulation) and apply their algorithm without the need to collect fresh transitions, which accelerates experimentation and comparison. The API is based on Gymnasium (via gym.make) and each environment also exposes a method get_dataset() that returns the offline data to learn from. The repository emphasizes open science, reproducibility, and benchmarking at scale, making it easier to compare algorithms on equal footing.
    Downloads: 0 This Week
  • 10
    harmonypy

    Integrate multiple high-dimensional datasets with fuzzy k-means

    Harmony is an algorithm for integrating multiple high-dimensional datasets. harmonypy is a port of the harmony R package by Ilya Korsunsky. Harmony is a general-purpose R package with an efficient algorithm for integrating multiple data sets. It is especially useful for large single-cell datasets such as single-cell RNA-seq.
    Downloads: 0 This Week
  • 11
    The Grand Complete Data Science Guide

    Data Science Guide With Videos And Materials

    The Grand Complete Data Science Materials is a repository curated by a data-science educator that aggregates a wide range of learning resources — from basic programming and math foundation to advanced topics in machine learning, deep learning, natural language processing, computer vision, and deployment practices — into a structured, centralized collection aimed at learners seeking a comprehensive path to data science mastery.
    Downloads: 0 This Week
  • 12
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    ...Several scripts also incorporate multi-threading and proxy usage to improve scraping efficiency and help avoid common anti-scraping limitations. In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 3 This Week
  • 13
    ML-NLP

    Common machine learning and NLP knowledge points with code implementations

    ML-NLP is a large open-source repository that collects theoretical knowledge, practical explanations, and code examples related to machine learning, deep learning, and natural language processing. The project is designed primarily as a learning resource for algorithm engineers and students preparing for technical interviews in machine learning or NLP roles. It compiles important concepts that frequently appear in machine learning discussions, including neural network architectures, training...
    Downloads: 0 This Week
  • 14
    AI Hedge Fund

    An AI Hedge Fund Team

    This repository demonstrates how to build a simplified, automated hedge fund strategy powered by AI/ML. It integrates financial data collection, preprocessing, feature engineering, and predictive modeling to simulate decision-making in trading. The code shows workflows for pulling stock or market data, applying machine learning algorithms to forecast trends, and generating buy/sell/hold signals based on the predictions. Its structure is educational: intended more as a proof-of-concept than a ready-to-use financial product, giving learners insight into the mechanics of quantitative finance automation. ...
    Downloads: 4 This Week
  • 15
    openTSNE

    Extensible, parallel implementations of t-SNE

    openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE) [1], a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets. openTSNE incorporates the latest improvements to the t-SNE algorithm, including the ability to add new data points to existing embeddings [2], massive speed improvements [3] [4] [5] that enable t-SNE to scale to millions of data points, and various tricks to improve the global alignment of the resulting visualizations.
    Downloads: 0 This Week
  • 16
    Deepchecks

    Test Suites for validating ML models & data

    Deepchecks is the leading tool for testing and validating your machine learning models and data, and it enables doing so with minimal effort. Deepchecks accompanies you through various validation and testing needs such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model, and comparing different models. While you’re in the research phase, and want to validate your data, find potential methodological problems, and/or validate your model and evaluate it. ...
    Downloads: 0 This Week
  • 17
    Anna’s Archive

    Comprehensive search engine for books, papers, comics, magazines

    Anna’s Archive is a large-scale open-source search engine and data aggregation platform designed to index and provide access to a vast collection of books, academic papers, comics, magazines, and other digital texts through a unified interface. The project includes all the infrastructure required to run a full instance locally or in production, combining web servers, databases, and search indexing systems into a scalable architecture.
    Downloads: 67 This Week
  • 18
    Professional Services

    Common solutions and tools developed by Google Cloud

    The Professional Services repository is a collection of real-world solutions, tools, and reference implementations developed by Google Cloud’s Professional Services team to address common enterprise challenges. Unlike simple sample repositories, it focuses on production-oriented use cases such as data pipelines, machine learning workflows, infrastructure automation, and security management.
    Downloads: 4 This Week
  • 19
    douyin

    Open source Douyin crawler for collecting and downloading public data

    DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages.
    Downloads: 4 This Week
  • 20
    DATAGEN

    AI-driven multi-agent research assistant automating hypothesis

    DATAGEN is an AI-driven multi-agent research and data analysis platform designed to automate complex analytical workflows. The system coordinates multiple specialized AI agents that collaborate to perform tasks such as hypothesis generation, data collection, analysis, visualization, and report creation. Instead of requiring users to manually orchestrate each stage of a research process, the platform allows these agents to coordinate automatically and handle the workflow end-to-end. ...
    Downloads: 0 This Week
  • 21
    Amazing-Python-Scripts

    Curated collection of Amazing Python scripts

    ...Examples include scripts for sentiment analysis, data scraping, web automation, log analysis, and interactive applications such as games or voice-controlled tools. The project also provides contribution guidelines and documentation so that developers can easily collaborate and expand the collection of scripts.
    Downloads: 1 This Week
  • 22
    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository of teaching examples for writing web spiders and crawlers in Python, part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, possibly concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. ...
    Downloads: 0 This Week
  • 23
    latexcv

    A collection of cv and resume templates written in LaTeX

    A collection of user-friendly LaTeX CV and résumé templates (packaged within the R Markdown vitae ecosystem), offering simple themes and templates for creating professional CVs without heavy TeX coding. Supports multiple display themes such as classic, modern, and sidebar layouts.
    Downloads: 0 This Week
  • 24
    Argilla

    The open-source data curation platform for LLMs

    ...Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can use and combine your preferred libraries without implementing any specific interface. Most annotation tools treat data collection as a one-off activity at the beginning of each project. In real-world projects, data collection is a key activity of the iterative process of ML model development. Once a model goes into production, you want to monitor and analyze its predictions, and collect more data to improve your model over time. Argilla is designed to close this gap, enabling you to iterate as much as you need.
    Downloads: 2 This Week
  • 25
    DreamerV3

    Mastering Diverse Domains through World Models

    ...This approach enables the algorithm to efficiently learn policies for decision-making tasks that would otherwise require enormous amounts of data or computational resources. DreamerV3 was designed as a general reinforcement learning framework that can solve diverse tasks using the same configuration of hyperparameters across many environments. In research demonstrations, the algorithm has been shown to perform strongly across more than one hundred control tasks and complex simulated environments.
    Downloads: 0 This Week