Data and tools for generating and inspecting OLMo pre-training data
A curated list of data mining papers about fraud detection
A Heterogeneous Benchmark for Information Retrieval
A repo for Document AI
Extract schema, statistics and entities from datasets
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models
Fast and customizable framework for automatic ML model creation
The library to build & auto-optimize LLM applications
Pretrained model hub for Keras 3
Semantic search and workflows for medical/scientific papers
Efficient Retrieval Augmentation and Generation Framework
A full spaCy pipeline and models for scientific/biomedical documents
Libraries for applying sparsification recipes to neural networks
Data processing for and with foundation models
Neural Network Compression Framework for enhanced OpenVINO inference
Efficient few-shot learning with Sentence Transformers
A coding-free framework built on PyTorch
Easy-to-use and powerful NLP library with an awesome model zoo
Training data (data labeling, annotation, workflow) for all data types
Making large AI models cheaper, faster and more accessible
A Unified Library for Parameter-Efficient Learning
Data loaders and abstractions for text and NLP
Recognition and resolution of numbers, units, date/time, etc.
Hub of ready-to-use datasets for ML models
Build AI-powered semantic search applications