Data processing for and with foundation models
Training data (data labeling, annotation, workflow) for all data types
ExtractThinker is a Document Intelligence library for LLMs
Extract schema, statistics and entities from datasets
Data and tools for generating and inspecting OLMo pre-training data
A curated list of data mining papers about fraud detection
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Industrial-strength Natural Language Processing (NLP)
Build AI-powered semantic search applications
Hub of ready-to-use datasets for ML models
Efficient few-shot learning with Sentence Transformers
A natural language interface for computers
Easy-to-use and powerful NLP library with an awesome model zoo
Superlinked is a Python framework for AI Engineers
Fast and customizable framework for automatic ML model creation
Haystack is an open-source NLP framework to interact with your data
The Classical Language Toolkit
Toolkit for conversational AI
A repo for Document AI
Public opinion analysis system
Easy-to-use and high-performance NLP and LLM framework
Stanford NLP Python library for many human languages
Dealing with all kinds of unstructured data, for tasks such as reverse image search
The most accurate natural language detection library for Python
Data loaders and abstractions for text and NLP