processing free download

Data-Juicer

Data processing for and with foundation models

Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.

Downloads: 2 This Week

Last Update: 2026-03-17

spaCy

Industrial-strength Natural Language Processing (NLP)

spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. From the start it has been designed for real-world applications: building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration, and more. According to independent benchmarks, spaCy is the fastest syntactic parser in the world, with accuracy within 1% of the best available. ...

Downloads: 92 This Week

Last Update: 2026-03-29

HanLP

Han Language Processing

HanLP is a multilingual Natural Language Processing (NLP) library composed of a series of models and algorithms. Built on TensorFlow 2.0, it was designed to advance state-of-the-art deep learning techniques and popularize the application of natural language processing in both academia and industry. HanLP is capable of lexical analysis (Chinese word segmentation, part-of-speech tagging, named entity recognition), syntax analysis, text classification, and sentiment analysis. ...

Downloads: 12 This Week

Last Update: 2025-03-07

SciSpaCy

A full spaCy pipeline and models for scientific/biomedical documents

ScispaCy is a spaCy extension optimized for processing biomedical and scientific text, providing domain-specific NLP models for tasks like named entity recognition (NER) and dependency parsing.

Downloads: 2 This Week

Last Update: 2025-10-01

DOLMA

Data and tools for generating and inspecting OLMo pre-training data

Dolma is an open dataset and accompanying toolkit for curating, generating, and inspecting the pre-training data used for the OLMo language models.

Downloads: 10 This Week

Last Update: 2025-06-25

ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs

ExtractThinker is a document intelligence library that uses LLMs to extract and analyze information from documents and other data sources, aiding data processing and knowledge discovery.

Downloads: 7 This Week

Last Update: 2025-06-09

deepdoctection

A Repo For Document AI

...It does not implement models but enables you to build pipelines using widely recognized libraries for object detection, OCR and selected NLP tasks, and provides an integrated framework for fine-tuning, evaluating and running models. For more specific text processing tasks, use one of the many other great NLP libraries.

Downloads: 3 This Week

Last Update: 2026-04-09

Classical Language Toolkit (CLTK)

The Classical Language Toolkit

The Classical Language Toolkit (CLTK) is a Python library offering natural language processing support for classical languages, including Latin, Greek, and others.

Downloads: 6 This Week

Last Update: 2025-05-04

STORM

An LLM-powered knowledge curation system that researches topics

STORM is an open-source, LLM-powered knowledge curation system developed by Stanford's OVAL lab. It researches a given topic and generates a long-form report with citations.

Downloads: 5 This Week

Last Update: 2025-01-23

Datasets

Hub of ready-to-use datasets for ML models

Datasets is a library for easily accessing and sharing datasets and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, it processes large datasets with zero-copy reads, free of memory constraints, for optimal speed and efficiency.

Downloads: 6 This Week

Last Update: 2026-03-23

MindNLP

Easy-to-use and high-performance NLP and LLM framework

MindNLP is a natural language processing library built on the MindSpore framework, providing tools and models for various NLP tasks.

Downloads: 0 This Week

Last Update: 2025-11-05

Underthesea

Underthesea - Vietnamese NLP Toolkit

Underthesea is a Vietnamese NLP toolkit providing various text processing capabilities, including word segmentation, part-of-speech tagging, and named entity recognition.

Downloads: 0 This Week

Last Update: 6 days ago

Chonkie

The no-nonsense RAG chunking library

Chonkie is a lightweight library for splitting text into chunks for retrieval-augmented generation (RAG) pipelines.

Downloads: 8 This Week

Last Update: 2025-03-01

PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 9 This Week

Last Update: 2025-07-01

Pyreft

ReFT: Representation Finetuning for Language Models

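Pyreft is a library from Stanford NLP for representation fine-tuning (ReFT), a parameter-efficient approach that adapts language models by learning interventions on their hidden representations, with an emphasis on resource-conserving training and customizability for NLP tasks.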

Downloads: 5 This Week

Last Update: 2025-02-04

BEIR

A Heterogeneous Benchmark for Information Retrieval

BEIR is a benchmark framework for evaluating information retrieval models across various datasets and tasks, including document ranking and question answering.

Downloads: 6 This Week

Last Update: 2025-06-04

Text Generation Inference

Large Language Model Text Generation Inference

Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.

Downloads: 9 This Week

Last Update: 2025-12-18

FastRAG

Efficient Retrieval Augmentation and Generation Framework

fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. fastRAG is designed to empower researchers and developers with a comprehensive tool set for advancing retrieval augmented generation.

Downloads: 7 This Week

Last Update: 2025-01-24

DeepSparse

Sparsity-aware deep learning inference runtime for CPUs

A sparsity-aware enterprise inferencing system for AI models on CPUs. Maximize your CPU infrastructure with DeepSparse to run performant computer vision (CV), natural language processing (NLP), and large language models (LLMs).

Downloads: 1 This Week

Last Update: 2025-06-02

Hazm

Persian NLP Toolkit

Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.

Downloads: 0 This Week

Last Update: 2026-04-01

Diffgram

Training data (data labeling, annotation, workflow) for all data types

...Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.

Downloads: 8 This Week

Last Update: 2024-10-14

NVIDIA NeMo

Toolkit for conversational AI

...Supported models: Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), LSTM-CTC. NGC collection of pre-trained speech processing models.

Downloads: 3 This Week

Last Update: 2026-03-23

AdalFlow

The library to build & auto-optimize LLM applications

AdalFlow is a library for building LLM applications and automatically optimizing them, letting users design and run LLM task pipelines with minimal code.

Downloads: 3 This Week

Last Update: 2025-09-25

NNCF

Neural Network Compression Framework for enhanced OpenVINO

NNCF (Neural Network Compression Framework) is an optimization toolkit for deep learning models, designed to apply quantization, pruning, and other techniques to improve inference efficiency.

Downloads: 5 This Week

Last Update: 2026-04-08

Detoxify

Trained models & code to predict toxic comments

Detoxify is a deep learning-based tool for detecting and filtering toxic language in online conversations, leveraging Transformer models for high accuracy.

Downloads: 2 This Week

Last Update: 2026-03-26

Search Results for "processing"

Showing 125 open source projects for "processing"
