Data processing for and with foundation models
ExtractThinker is a Document Intelligence library for LLMs
A curated list of data mining papers about fraud detection
Training data (data labeling, annotation, workflow) for all data types
Data and tools for generating and inspecting OLMo pre-training data
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Extract schema, statistics and entities from datasets
Industrial-strength Natural Language Processing (NLP)
Text mining using tidy tools
Hub of ready-to-use datasets for ML models
The Classical Language Toolkit
A Repo For Document AI
Superlinked is a Python framework for AI Engineers
Fast and customizable framework for automatic ML model creation
Easy-to-use and high-performance NLP and LLM framework
A persistent, network-resilient, full-text search library
Easy-to-use and powerful NLP library with an awesome model zoo
Toolkit for conversational AI
Stanford CoreNLP, a Java suite of core NLP tools
Efficient few-shot learning with Sentence Transformers
Modest natural-language processing
State-of-the-art Natural Language Processing
Haystack is an open-source NLP framework to interact with your data
Public opinion analysis system
The most accurate natural language detection library for Python