Fast, flexible, and easy-to-use probabilistic modelling in Python
MoBA: Mixture of Block Attention for Long-Context LLMs
Qwen3.5 is the large language model series developed by Qwen team
Wan2.2: Open and Advanced Large-Scale Video Generative Model
A Powerful Native Multimodal Model for Image Generation
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation
Open-source, high-performance AI model with advanced reasoning
Running a big model on a small laptop
Towards self-verifiable mathematical reasoning
Qwen3-Coder is the code version of Qwen3
From nobody to big model (LLM) hero
Collection of links for free stock photography, video, and illustration
Qwen3.6 is the large language model series developed by Qwen team
System-level intelligent router for Mixture-of-Models in the cloud
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Kimi K2 is the large language model series developed by Moonshot AI
157 models, 30 providers, one command to find what runs on your hardware
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Powerful AI language model (MoE) optimized for efficiency and performance
Moonshot's most powerful AI model
Open-weight, large-scale hybrid-attention reasoning model
Strong, Economical, and Efficient Mixture-of-Experts Language Model
Clean and efficient FP8 GEMM kernels with fine-grained scaling