Showing 106 open source projects for "transformers"

  • 1
    SHAP

    A game-theoretic approach to explain the output of ML models

    SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. While SHAP can explain the output of any machine learning model, we have developed a high-speed exact algorithm for tree ensemble methods. Fast C++ implementations are supported for XGBoost, LightGBM, CatBoost, scikit-learn and pyspark...
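    The Shapley value credits each feature with its weighted average marginal contribution over all subsets of the other features. The sketch below uses brute-force enumeration. It is not SHAP's own Tree SHAP or Kernel SHAP algorithms, and it is not the `shap` package API. It computes exact Shapley values for a hypothetical three-feature model and fills absent features from an assumed all-zero baseline:

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature subset (O(2^n) model calls)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x; the rest come from the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for every subset S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model: an additive term plus a pairwise interaction.
    return 2 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])
```

    The values sum to toy_model(x) - toy_model(baseline) = 8, which is the efficiency property. The interaction term z[1] * z[2] = 6 is split evenly between features 1 and 2.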
  • 2
    BitNet

    BitNet: Scaling 1-bit Transformers for Large Language Models

    BitNet is a machine learning research implementation that explores extremely low-precision neural network architectures designed to dramatically reduce the computational cost of large language models. The project implements the BitNet architecture described in research on scaling transformer models using extremely low-bit quantization techniques. In this approach, neural network weights are quantized to approximately one bit per parameter, allowing models to operate with far lower memory...
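    To make "one bit per parameter" concrete, here is a minimal sketch of sign binarization with a single per-tensor scale. The weights are zero-centered by their mean, reduced to +1/-1, and rescaled by the mean absolute weight. This follows the general recipe for 1-bit weights, but the exact scaling rule and the helper names are assumptions. It is not code from the BitNet repository:

```python
def binarize_weights(weights):
    """Return (+1/-1 signs, shared scale) for a flat list of real-valued weights."""
    n = len(weights)
    alpha = sum(weights) / n                  # mean, subtracted to zero-center the weights
    beta = sum(abs(w) for w in weights) / n   # mean |w|, one scale shared by the whole tensor
    signs = [1 if w - alpha > 0 else -1 for w in weights]
    return signs, beta


def binary_dot(signs, beta, activations):
    """Dot product against binarized weights: additions/subtractions plus one final multiply."""
    acc = sum(a if s > 0 else -a for s, a in zip(signs, activations))
    return beta * acc


weights = [0.4, -0.2, 0.1, -0.5]
signs, beta = binarize_weights(weights)
print(signs, round(beta, 4), round(binary_dot(signs, beta, [1.0, 2.0, 3.0, 4.0]), 4))
```

    Each weight is then stored as a single sign bit, and the per-weight multiplications in the dot product disappear.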
  • 3
    xFormers

    Hackable and optimized Transformers building blocks

    xFormers is a modular, performance-oriented library of transformer building blocks, designed to let researchers and engineers compose, experiment with, and optimize transformer architectures more flexibly than monolithic frameworks allow. It abstracts components like attention layers, feedforward modules, normalization, and positional encoding, so you can mix and match or swap optimized kernels easily. One of its key goals is efficient attention: it supports dense, sparse, low-rank, and...
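    Memory-efficient attention kernels, such as those xFormers provides, avoid materializing the full attention-score matrix. The core trick can be sketched in plain Python for a single query. Keys and values are consumed one at a time with a running maximum and normalizer (an "online softmax"), and the result matches naive softmax attention. This illustrates the technique only. It is not xFormers' API or kernel code:

```python
import math


def naive_attention(q, keys, values):
    """softmax(q . k / sqrt(d))-weighted sum of values, computing all scores up front."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(len(values[0]))]


def streaming_attention(q, keys, values):
    """Same result, but never stores the score vector: one key/value pair at a time."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        s = sum(qi * ki for qi, ki in zip(q, k)) * scale
        new_max = max(running_max, s)
        correction = math.exp(running_max - new_max)  # rescales earlier terms to the new max
        p = math.exp(s - new_max)
        denom = denom * correction + p
        acc = [a * correction + p * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(naive_attention(q, keys, values))
print(streaming_attention(q, keys, values))
```

    Real kernels apply the same rescaling to whole blocks of keys held in fast on-chip memory rather than to one key at a time.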
  • 4
    HY-MT

    Hunyuan Translation Model Version 1.5

    HY-MT (Hunyuan Translation) is a high-quality multilingual machine translation model suite developed to support mutual translation across dozens of languages with strong performance even at smaller model scales. It ships with both a 1.8B-parameter model and a larger 7B model, the latter optimized not only for direct translation but also for formatted and contextualized output, allowing better handling of terminology and mixed-language content. The project emphasizes both speed and...
  • 5
    Coconut

    Training Large Language Models to Reason in a Continuous Latent Space

    ...It supports training across multiple reasoning paradigms—including standard Chain-of-Thought (CoT), no-thought, and hybrid configurations—using configurable training stages and latent representations. The repository is built with Hugging Face Transformers, PyTorch Distributed, and Weights & Biases (wandb) for logging, supporting large-scale experiments on mathematical and logical reasoning datasets such as GSM8K, ProntoQA, and ProsQA.
  • 6
    Giskard

    Collaborative & Open-Source Quality Assurance for all AI models

    ...Giskard automatically generates relevant tests based on the vulnerabilities detected by the scan. You can easily customize the tests depending on your use case by defining domain-specific data slicers and transformers as fixtures of your test suites.
  • 7
    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...
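    At its core, embeddings-based semantic search is nearest-neighbor ranking by vector similarity. The sketch below ranks documents by cosine similarity. The three-dimensional vectors are hand-made stand-ins for real model embeddings, and the helper names are hypothetical, not txtai's API:

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_vec, index, top_k=2):
    """Return the top_k document texts whose embeddings are most similar to the query."""
    scored = sorted(((cosine(query_vec, vec), doc) for doc, vec in index.items()), reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Hypothetical embeddings: dimension 0 loosely means "animals", dimension 2 "finance".
index = {
    "feline pets": [0.9, 0.1, 0.0],
    "stock market news": [0.0, 0.2, 0.95],
    "cats sleeping in the sun": [0.85, 0.15, 0.05],
}
query = [1.0, 0.0, 0.0]  # stand-in embedding for the query "kittens"
print(search(query, index))
```

    Neither top result shares a keyword with "kittens". The match comes entirely from proximity in the vector space.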
  • 8
    Qwen2.5-Math

    A series of math-specific large language models of our Qwen2 series

    Qwen2.5-Math is a series of mathematics-specialized large language models in the Qwen2 family, released by Alibaba’s QwenLM. It includes base models (1.5B / 7B / 72B parameters), instruction-tuned versions, and a reward model (RM) to improve alignment. Unlike its predecessor Qwen2-Math, Qwen2.5-Math supports both Chain-of-Thought (CoT) reasoning and Tool-Integrated Reasoning (TIR) for solving math problems, and works in both Chinese and English. It is optimized for solving mathematical...
  • 9
    MatMul-Free LM

    Implementation of the MatMul-free LM

    MatMul-Free LM is an experimental implementation of a large language model architecture designed to eliminate traditional matrix multiplication operations used in transformer networks. Since matrix multiplication is one of the most computationally expensive components of modern language models, the project explores alternative computational strategies that reduce hardware requirements while maintaining comparable performance. The architecture relies on quantization-aware training and...
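    One way to avoid multiplications is to constrain weights to the ternary set {-1, 0, +1}, which reduces a matrix-vector product to signed accumulation. The sketch below assumes an absmean quantization rule: scale by the mean absolute weight, round, then clip. That rule is borrowed here from the BitNet b1.58 line of work as an illustration, and the code is not the project's:

```python
def ternarize(weights):
    """Quantize a weight matrix to {-1, 0, 1} plus one scale (absmean rule, assumed)."""
    flat = [abs(w) for row in weights for w in row]
    gamma = sum(flat) / len(flat) or 1.0  # fall back to 1.0 for an all-zero matrix
    ternary = [[max(-1, min(1, round(w / gamma))) for w in row] for row in weights]
    return ternary, gamma


def ternary_matvec(ternary, x, gamma=1.0):
    """gamma * (W x) for ternary W: additions/subtractions per row, one scale per output."""
    out = []
    for row in ternary:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0: the input is skipped entirely
        out.append(gamma * acc)
    return out


W = [[0.9, -0.05, -1.2], [0.3, 0.6, -0.1]]
T, gamma = ternarize(W)
x = [1.0, 2.0, 3.0]
print(T, ternary_matvec(T, x, gamma))
```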
  • 10
    Intel LLM Library for PyTorch

    Accelerate local LLM inference and finetuning

    ...IPEX-LLM supports a wide range of popular models, including architectures such as LLaMA, Mistral, Qwen, and other transformer-based systems. The library can integrate with common AI frameworks and serving tools such as Hugging Face Transformers, LangChain, and vLLM, allowing developers to incorporate optimized inference into existing pipelines.
  • 11
    GLM-4

    GLM-4 series: Open Multilingual Multimodal Chat LMs

    GLM-4 is a family of open models from ZhipuAI that spans base, chat, and reasoning variants at both 32B and 9B scales, with long-context support and practical local-deployment options. The GLM-4-32B-0414 models are trained on ~15T tokens of high-quality data (including substantial synthetic reasoning data), then post-trained with preference alignment, rejection sampling, and reinforcement learning to improve instruction following, coding, function calling, and agent-style behaviors. The...
  • 12
    Qwen3-Omni

    Qwen3-Omni is a natively end-to-end, omni-modal LLM

    Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video and delivers real-time streaming responses in text and natural speech. It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and...
  • 13
    LLaMA-Mesh

    Unifying 3D Mesh Generation with Language Models

    LLaMA-Mesh is a research framework that extends large language models so they can understand and generate 3D mesh data alongside text. The system introduces a method for representing 3D meshes in a textual format by encoding vertex coordinates and face definitions as sequences that can be processed by a language model. By serializing 3D geometry into text tokens, the approach allows existing transformer architectures to generate and interpret 3D models without requiring specialized visual...
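    The serialization idea can be sketched with the plain-text OBJ format, where `v x y z` lines list vertices and `f a b c` lines list triangles by 1-based vertex index. Quantizing coordinates to a small integer grid keeps the token count low. The 64-bin grid, the [-1, 1] coordinate range, and the helper names below are assumptions for illustration, not the project's code:

```python
def mesh_to_text(vertices, faces, bins=64):
    """Serialize a triangle mesh as OBJ-style text with integer-quantized coordinates."""
    lines = []
    for x, y, z in vertices:
        # Map each coordinate from [-1, 1] to an integer bin 0..bins-1.
        q = [min(bins - 1, max(0, int((c + 1) / 2 * bins))) for c in (x, y, z)]
        lines.append("v {} {} {}".format(*q))
    for a, b, c in faces:
        lines.append(f"f {a + 1} {b + 1} {c + 1}")  # OBJ face indices are 1-based
    return "\n".join(lines)


def text_to_mesh(text, bins=64):
    """Parse the text back into (vertices, faces), mapping each bin to its center."""
    vertices, faces = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *vals = line.split()
        if kind == "v":
            vertices.append(tuple((int(v) + 0.5) / bins * 2 - 1 for v in vals))
        elif kind == "f":
            faces.append(tuple(int(v) - 1 for v in vals))  # back to 0-based indices
    return vertices, faces


triangle = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
text = mesh_to_text(triangle, [(0, 1, 2)])
print(text)
restored, faces = text_to_mesh(text)
```

    The round trip is lossy only up to the bin width, which trades geometric precision for shorter token sequences.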
  • 14
    IQuest-Coder-V1 Model Family

    New family of code large language models (LLMs)

    IQuest-Coder-V1 is a cutting-edge family of open-source large language models specifically engineered for code generation, deep code understanding, and autonomous software engineering tasks. The models range in size from tens of billions of parameters down to smaller footprints and are trained with a novel multi-stage code-flow paradigm that captures how real software evolves over time, not just static code snapshots, giving them a deeper semantic understanding of programming logic. They support native long contexts...
  • 15
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while...
  • 16
    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    Perception Models is a state-of-the-art framework developed by Facebook Research for advanced image and video perception tasks. It introduces two primary components: the Perception Encoder (PE) for visual feature extraction and the Perception Language Model (PLM) for multimodal decoding and reasoning. The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. Meanwhile,...
  • 17
    Seldon Core

    An MLOps framework to package, deploy, monitor and manage models

    ...Built on Kubernetes, it runs on any cloud and on-premises. It is framework agnostic, supports top ML libraries, toolkits, and languages, and enables advanced deployments with experiments, ensembles, and transformers. Our open-source framework makes it easier and faster to deploy your machine learning models and experiments at scale on Kubernetes. The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable.
  • 18
    InfiniteYou

    Flexible Photo Recrafting While Preserving Your Identity

    InfiniteYou is an open-source image-generation and “identity-preserving image editing / generation” framework from ByteDance, designed to generate high-fidelity images that preserve a subject’s identity while allowing flexible editing or re-creation according to textual prompts. Using an architecture built around diffusion transformers (DiTs), InfiniteYou introduces a component called InfuseNet that injects identity features derived from reference images into the generation process — via residual connections — so that the output matches the person’s identity closely, without sacrificing visual quality or text-image alignment. The team uses a multi-stage training strategy with synthetic multi-sample data per identity to fine-tune for both identity consistency and aesthetic quality. ...
  • 19
    AutoGPTQ

    An easy-to-use LLM quantization package with user-friendly APIs

    AutoGPTQ is an implementation of GPTQ, a post-training quantization method for generative pre-trained transformers. It compresses the weights of large language models (LLMs) to low bit widths for faster, lighter inference with minimal loss of accuracy.
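    GPTQ is easiest to understand against the baseline it improves on. The sketch below shows plain round-to-nearest (RTN) 4-bit quantization of one weight group with a scale and zero point. GPTQ itself goes further: it quantizes weights column by column and uses approximate second-order information to compensate for each rounding error in the weights not yet quantized. The helper names are hypothetical, and this is not AutoGPTQ's API:

```python
def quantize_rtn(weights, bits=4):
    """Asymmetric min-max round-to-nearest quantization of one group of weights."""
    qmax = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)  # integer code that represents the real value 0.0
    codes = [max(0, min(qmax, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize(codes, scale, zero):
    """Map integer codes back to approximate real-valued weights."""
    return [(c - zero) * scale for c in codes]


group = [-0.8, -0.1, 0.0, 0.3, 0.75]
codes, scale, zero = quantize_rtn(group)
restored = dequantize(codes, scale, zero)
print(codes, [round(r, 3) for r in restored])
```

    Each weight now needs 4 bits plus a shared scale and zero point per group, and the per-weight error stays within half a quantization step.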
  • 20
    Recurrent Interface Network (RIN)

    Implementation of Recurrent Interface Network (RIN)

    Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in PyTorch. The author unknowingly reinvented the induced set-attention block from the Set Transformer paper. They also combine this with the self-conditioning technique from the Bit Diffusion paper, specifically for the latents. The last ingredient seems to be a new noise function based on the sigmoid, which the author claims works better than the cosine schedule for larger images. The big surprise is that the generations can reach this level of fidelity. ...
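    A sigmoid-based noise schedule can be compared directly with the cosine schedule. The sketch below uses a commonly cited form of the sigmoid schedule. The parameter names and defaults (start=-3, end=3, tau=1) are assumptions and are not confirmed by this project. Both functions return the signal level gamma(t), which falls from 1 at t = 0 toward 0 at t = 1:

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sigmoid_schedule(t, start=-3.0, end=3.0, tau=1.0, clip_min=1e-9):
    """Signal level gamma(t) for t in [0, 1], shaped by a rescaled sigmoid."""
    v_start = sigmoid(start / tau)
    v_end = sigmoid(end / tau)
    gamma = (v_end - sigmoid((t * (end - start) + start) / tau)) / (v_end - v_start)
    return min(1.0, max(clip_min, gamma))


def cosine_schedule(t, s=0.008, clip_min=1e-9):
    """Cosine schedule gamma(t) = cos^2(((t + s) / (1 + s)) * pi / 2), for comparison."""
    gamma = math.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return min(1.0, max(clip_min, gamma))


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  sigmoid={sigmoid_schedule(t):.4f}  cosine={cosine_schedule(t):.4f}")
```

    Varying start, end, and tau reshapes how quickly signal is destroyed, which is the knob the author's claim about larger images concerns.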
  • 21
    Transformers4Rec

    Transformers4Rec is a flexible and efficient library

    ...The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation algorithms usually ignore the temporal dynamics and the sequence of interactions when trying to model user behavior. Generally, the next user interaction is related to the sequence of the user's previous choices. ...
  • 22
    DeepSeek MoE

    Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

    ...The repo publishes both Base and Chat variants of the 16B MoE model (deepseek-moe-16b) and provides evaluation results across benchmarks. It also includes a quick start with inference instructions (using Hugging Face Transformers) and guidance on fine-tuning (DeepSpeed, hyperparameters, quantization). The licensing is MIT for code, with a “Model License” applied to the models.
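    DeepSeekMoE pairs fine-grained routed experts with a few shared experts that every token always passes through. The routing step can be sketched as follows. Here the gate logits are passed in directly rather than computed by a learned gate, and the experts are toy functions. None of the names come from the repository:

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    total = sum(e)
    return [v / total for v in e]


def moe_forward(x, gate_logits, routed_experts, shared_experts, k=2):
    """Sum of all shared experts plus the top-k routed experts weighted by gate probability."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    out = [0.0] * len(x)
    for expert in shared_experts:  # shared experts process every token
        out = [o + y for o, y in zip(out, expert(x))]
    for i in top:  # only the selected routed experts run
        y = routed_experts[i](x)
        out = [o + probs[i] * y_j for o, y_j in zip(out, y)]
    return out, top


routed = [lambda v, c=c: [c * t for t in v] for c in (1.0, 2.0, 3.0, 4.0)]  # toy experts
shared = [lambda v: list(v)]  # toy identity shared expert
out, chosen = moe_forward([1.0, 1.0], [0.0, 5.0, 0.0, 4.0], routed, shared, k=2)
print(chosen, out)
```

    Only k routed experts run per token, so compute grows with k rather than with the total number of experts.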
  • 23
    CogView

    Text-to-image generation. The repo for the NeurIPS 2021 paper

    CogView is a large-scale pretrained text-to-image transformer model, introduced in the NeurIPS 2021 paper CogView: Mastering Text-to-Image Generation via Transformers. With 4 billion parameters, it was one of the earliest transformer-based models to successfully generate high-quality images from natural language descriptions in Chinese, with partial support for English via translation. The model incorporates innovations such as PB-relax and Sandwich-LN to enable stable training of very deep transformers without NaN loss issues. ...
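    Sandwich-LN adds a second LayerNorm at the end of each residual branch, so the branch output is renormalized before it is added back to the residual stream. The sketch below contrasts it with a standard Pre-LN block. Learnable gain and bias parameters are omitted for brevity, and the code illustrates the idea rather than reproducing CogView's implementation:

```python
import math


def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learnable gain/bias)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def pre_ln_block(x, sublayer):
    """Pre-LN residual block: x + f(LN(x))."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def sandwich_ln_block(x, sublayer):
    """Sandwich-LN residual block: x + LN(f(LN(x)))."""
    return [a + b for a, b in zip(x, layer_norm(sublayer(layer_norm(x))))]


def exploding(v):
    # Hypothetical sublayer whose outputs blow up, as can happen deep in a large model.
    return [1e6 * t for t in v]


x = [0.1, -0.3, 0.5, 0.2]
print(max(abs(t) for t in pre_ln_block(x, exploding)))
print(max(abs(t) for t in sandwich_ln_block(x, exploding)))
```

    With Pre-LN the huge branch output passes straight into the residual stream. With Sandwich-LN each added element stays below sqrt(d) in magnitude, where d is the vector length.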
  • 24
    ReplitLM

    Inference code and configs for the ReplitLM model family

    ReplitLM is a family of open-source language models developed by Replit for assisting with programming tasks such as code generation and completion. The project includes model checkpoints, configuration files, and example code that enable developers to run and experiment with the models locally or within machine learning frameworks. These models are designed specifically for coding workflows and are trained on large datasets of source code covering many programming languages and development...
  • 25
    DiT (Diffusion Transformers)

    Official PyTorch implementation of "Scalable Diffusion Models with Transformers"

    DiT (Diffusion Transformer) is a powerful architecture that applies transformer-based modeling directly to diffusion generative processes for high-quality image synthesis. In place of the convolutional U-Net backbone common in diffusion models, DiT operates in the latent space of a pretrained autoencoder. It splits the latent into patch tokens and processes them through transformer blocks with fixed sine-cosine positional embeddings, offering scalability and superior sample quality. The model architecture parallels large language models but for image tokens: each block...
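    The "patchify" step that turns a latent into a token sequence can be sketched as follows. For a C x H x W latent and patch size p, it yields (H/p) * (W/p) tokens of length p * p * C, so a 4 x 32 x 32 latent with p = 2 becomes 256 tokens. The nested-list layout and the within-patch ordering are assumptions for illustration. In DiT, each token is then linearly embedded and positional embeddings are added:

```python
def patchify(latent, p):
    """Split a C x H x W latent (nested lists) into non-overlapping p x p patch tokens."""
    c, h, w = len(latent), len(latent[0]), len(latent[0][0])
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size p")
    tokens = []
    for i in range(0, h, p):  # patch rows, top to bottom
        for j in range(0, w, p):  # patch columns, left to right
            # For each spatial position in the patch, emit all channels.
            tokens.append([latent[ch][i + di][j + dj]
                           for di in range(p) for dj in range(p) for ch in range(c)])
    return tokens


C, H, W = 4, 32, 32
latent = [[[float(ch * H * W + y * W + x) for x in range(W)] for y in range(H)] for ch in range(C)]
tokens = patchify(latent, p=2)
print(len(tokens), len(tokens[0]))
```

    Halving p quadruples the token count, which is how DiT trades compute for sample quality at a fixed model size.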