Browse free open source AI Models and projects for Windows below. Use the toggles on the left to filter open source AI Models by OS, license, language, programming language, and project status.

  • Share your screen instantly while on a phone call with CrankWheel for an engaging presentation. Icon
    Share your screen instantly while on a phone call with CrankWheel for an engaging presentation.

    For salespeople and customer service agents who want to compliment their phone calls with visual elements.

    Our 10x simpler screen sharing tool is designed for you if you spend your days on the phone with clients, and need to add a visual presentation to close sales. No more scheduling a follow-up meeting, or teaching them to use a complex tool. Send them a text message or email, and they see your screen in seconds.
    Learn More
  • ACI Learning: Internal Audit, Cybersecurity, and IT Training Icon
    ACI Learning: Internal Audit, Cybersecurity, and IT Training

    Proven skill building for every aspect of your support or IT team.

    Traditional training doesn't equip employees with the practical skills they need to drive business success. ACI Learning provides hands-on IT and cybersecurity training designed to build real-world, on-the-job skills. Our outcome-based programs empower employees with certification prep, industry-recognized credentials, and flexible learning options. With expert-led video training, labs, and scalable solutions, we help businesses, individuals, governments, and academic institutions develop a skilled workforce, align with business goals, and stay ahead in a rapidly evolving digital world.
    Learn More
  • 1
    GLM-5

    GLM-5

    From Vibe Coding to Agentic Engineering

    GLM-5 is a next-generation open-source large language model (LLM) developed by the Z .ai team under the zai-org organization that pushes the boundaries of reasoning, coding, and long-horizon agentic intelligence. Building on earlier GLM series models, GLM-5 dramatically scales the parameter count (to roughly 744 billion) and expands pre-training data to significantly improve performance on complex tasks such as multi-step reasoning, software engineering workflows, and agent orchestration compared to its predecessors like GLM-4.5. It incorporates innovations like DeepSeek Sparse Attention (DSA) to preserve massive context windows while reducing deployment costs and supporting long context processing, which is crucial for detailed plans and agent tasks.
    Downloads: 225 This Week
    Last Update:
    See Project
  • 2
    LTX-2.3

    LTX-2.3

    Official Python inference and LoRA trainer package

    LTX-2.3 is an open-source multimodal artificial intelligence foundation model developed by Lightricks for generating synchronized video and audio from prompts or other inputs. Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while simultaneously producing corresponding audio elements such as speech, music, ambient sound, or effects. This unified approach allows creators to generate complete multimedia sequences where motion, timing, and sound are aligned automatically. LTX-2 is designed for both research and production workflows and can generate high-resolution video clips with precise control over structure, motion, and camera behavior.
    Downloads: 189 This Week
    Last Update:
    See Project
  • 3
    llama.cpp

    llama.cpp

    Port of Facebook's LLaMA model in C/C++

    The llama.cpp project enables the inference of Meta's LLaMA model (and other models) in pure C/C++ without requiring a Python runtime. It is designed for efficient and fast model execution, offering easy integration for applications needing LLM-based capabilities. The repository focuses on providing a highly optimized and portable implementation for running large language models directly within C/C++ environments.
    Downloads: 152 This Week
    Last Update:
    See Project
  • 4
    Wan2.2

    Wan2.2

    Wan2.2: Open and Advanced Large-Scale Video Generative Model

    Wan2.2 is a major upgrade to the Wan series of open and advanced large-scale video generative models, incorporating cutting-edge innovations to boost video generation quality and efficiency. It introduces a Mixture-of-Experts (MoE) architecture that splits the denoising process across specialized expert models, increasing total model capacity without raising computational costs. Wan2.2 integrates meticulously curated cinematic aesthetic data, enabling precise control over lighting, composition, color tone, and more, for high-quality, customizable video styles. The model is trained on significantly larger datasets than its predecessor, greatly enhancing motion complexity, semantic understanding, and aesthetic diversity. Wan2.2 also open-sources a 5-billion parameter high-compression VAE-based hybrid text-image-to-video (TI2V) model that supports 720P video generation at 24fps on consumer-grade GPUs like the RTX 4090. It supports multiple video generation tasks including text-to-video.
    Downloads: 131 This Week
    Last Update:
    See Project
  • The #1 AI-Powered eLearning Platform Icon
    The #1 AI-Powered eLearning Platform

    For users seeking a platform to generate online courses using AI

    Transform your content into engaging eLearning experiences with Coursebox, the #1 AI-powered eLearning authoring tool. Our platform automates the course creation process, allowing you to design a structured course in seconds. Simply make edits, add any missing elements, and your course is ready to go. Whether you want to publish privately, share publicly, sell your course, or export it to your LMS, Coursebox has you covered.
    Learn More
  • 5
    DeepSeek-V3

    DeepSeek-V3

    Powerful AI language model (MoE) optimized for efficiency/performance

    DeepSeek-V3 is a robust Mixture-of-Experts (MoE) language model developed by DeepSeek, featuring a total of 671 billion parameters, with 37 billion activated per token. It employs Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture to enhance computational efficiency. The model introduces an auxiliary-loss-free load balancing strategy and a multi-token prediction training objective to boost performance. Trained on 14.8 trillion diverse, high-quality tokens, DeepSeek-V3 underwent supervised fine-tuning and reinforcement learning to fully realize its capabilities. Evaluations indicate that it outperforms other open-source models and rivals leading closed-source models, achieving this with a training duration of 55 days on 2,048 Nvidia H800 GPUs, costing approximately $5.58 million.
    Downloads: 120 This Week
    Last Update:
    See Project
  • 6
    Demucs

    Demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

    Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation—extracting individual instrument or vocal tracks from a mixed audio file. The system is based on a U-Net-like convolutional architecture combined with recurrent and transformer elements to capture both short-term and long-term temporal structure. It processes raw waveforms directly rather than spectrograms, allowing for higher-quality reconstruction and fewer artifacts in separated tracks. The repository includes pretrained models for common tasks such as isolating vocals, drums, bass, and accompaniment from stereo music, achieving state-of-the-art results in benchmarks like MUSDB18. Demucs supports GPU-accelerated inference and can process multi-channel audio with chunked streaming for real-time or batch operation. It also provides training scripts and utilities to fine-tune on custom datasets, along with remixing and enhancement tools.
    Downloads: 102 This Week
    Last Update:
    See Project
  • 7
    LTX-2

    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. The framework targets both interactive graphical applications and media-rich experiences, making it a solid foundation for games, creative tools, or visualization systems that demand both performance and flexibility. While being low-level, it also provides sensible defaults and helper abstractions that reduce boilerplate and help teams maintain clear, maintainable code.
    Downloads: 83 This Week
    Last Update:
    See Project
  • 8
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. Wan2.1’s architecture balances generation quality and inference cost, paving the way for later improvements seen in Wan2.2 such as Mixture-of-Experts and enhanced aesthetics. It was trained on large-scale video and image datasets, providing generalization across diverse scenes and motion patterns.
    Downloads: 82 This Week
    Last Update:
    See Project
  • 9
    PaddleOCR

    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    PaddleOCR offers exceptional, multilingual, and practical Optical Character Recognition (OCR) tools that can help users train better models and apply them into practice. Inspired by PaddlePaddle, PaddleOCR is an ultra lightweight OCR system, with multilingual recognition, digit recognition, vertical text recognition, as well as long text recognition. It features a PPOCR series of high-quality pre-trained models, which includes: ultra lightweight ppocr_mobile series models, general ppocr_server series models, and ultra lightweight compression ppocr_mobile_slim series models. PaddleOCR is easy to install and easy to use on Windows, Linux, MacOS and other systems.
    Downloads: 81 This Week
    Last Update:
    See Project
  • All-in-One Inspection Software Icon
    All-in-One Inspection Software

    flowdit is a connected worker platform tailored for industry needs in commissioning, quality, maintenance, and EHS management.

    Optimize Frontline Operations: Elevate Equipment Uptime, Operational Excellence, and Safety with Connected Teams and Data, Including Issue Capture and Corrective Action.
    Learn More
  • 10
    ACE-Step 1.5

    ACE-Step 1.5

    The most powerful local music generation model

    ACE-Step 1.5 is an advanced open-source foundation model for AI-driven music generation that pushes beyond traditional limitations in speed, musical coherence, and controllability by innovating in architecture and training design. It integrates cutting-edge generative techniques—such as diffusion-based synthesis combined with compressed autoencoders and lightweight transformer elements—to produce high-quality full-length music tracks with rapid inference times, capable of generating a complete song in seconds on modern GPUs while remaining efficient enough to run on consumer-grade hardware with minimal memory requirements. Beyond straightforward text-to-music synthesis, ACE-Step 1.5 enables flexible creative workflows, including tasks like cover generation, editing existing tracks, transforming vocals to background accompaniment, and stylistic personalization using low-rank adaptation from just a few example songs.
    Downloads: 80 This Week
    Last Update:
    See Project
  • 11
    GLM-4.7

    GLM-4.7

    Advanced language and coding AI model

    GLM-4.7 is an advanced agent-oriented large language model designed as a high-performance coding and reasoning partner. It delivers significant gains over GLM-4.6 in multilingual agentic coding, terminal-based workflows, and real-world developer benchmarks such as SWE-bench and Terminal Bench 2.0. The model introduces stronger “thinking before acting” behavior, improving stability and accuracy in complex agent frameworks like Claude Code, Cline, and Roo Code. GLM-4.7 also advances “vibe coding,” producing cleaner, more modern UIs, better-structured webpages, and visually improved slide layouts. Its tool-use capabilities are substantially enhanced, with notable improvements in browsing, search, and tool-integrated reasoning tasks. Overall, GLM-4.7 shows broad performance upgrades across coding, reasoning, chat, creative writing, and role-play scenarios.
    Downloads: 78 This Week
    Last Update:
    See Project
  • 12
    GLM-4.5

    GLM-4.5

    GLM-4.5: Open-source LLM for intelligent agents by Z.ai

    GLM-4.5 is a cutting-edge open-source large language model designed by Z.ai for intelligent agent applications. The flagship GLM-4.5 model has 355 billion total parameters with 32 billion active parameters, while the compact GLM-4.5-Air version offers 106 billion total parameters and 12 billion active parameters. Both models unify reasoning, coding, and intelligent agent capabilities, providing two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for immediate responses. They are released under the MIT license, allowing commercial use and secondary development. GLM-4.5 achieves strong performance on 12 industry-standard benchmarks, ranking 3rd overall, while GLM-4.5-Air balances competitive results with greater efficiency. The models support FP8 and BF16 precision, and can handle very large context windows of up to 128K tokens. Flexible inference is supported through frameworks like vLLM and SGLang with tool-call and reasoning parsers included.
    Downloads: 76 This Week
    Last Update:
    See Project
  • 13
    GLM-4.6

    GLM-4.6

    Agentic, Reasoning, and Coding (ARC) foundation models

    GLM-4.6 is the latest iteration of Zhipu AI’s foundation model, delivering significant advancements over GLM-4.5. It introduces an extended 200K token context window, enabling more sophisticated long-context reasoning and agentic workflows. The model achieves superior coding performance, excelling in benchmarks and practical coding assistants such as Claude Code, Cline, Roo Code, and Kilo Code. Its reasoning capabilities have been strengthened, including improved tool usage during inference and more effective integration within agent frameworks. GLM-4.6 also enhances writing quality, producing outputs that better align with human preferences and role-playing scenarios. Benchmark evaluations demonstrate that it not only outperforms GLM-4.5 but also rivals leading global models such as DeepSeek-V3.1-Terminus and Claude Sonnet 4.
    Downloads: 61 This Week
    Last Update:
    See Project
  • 14
    DeepSeek R1

    DeepSeek R1

    Open-source, high-performance AI model with advanced reasoning

    DeepSeek-R1 is an open-source large language model developed by DeepSeek, designed to excel in complex reasoning tasks across domains such as mathematics, coding, and language. DeepSeek R1 offers unrestricted access for both commercial and academic use. The model employs a Mixture of Experts (MoE) architecture, comprising 671 billion total parameters with 37 billion active parameters per token, and supports a context length of up to 128,000 tokens. DeepSeek-R1's training regimen uniquely integrates large-scale reinforcement learning (RL) without relying on supervised fine-tuning, enabling the model to develop advanced reasoning capabilities. This approach has resulted in performance comparable to leading models like OpenAI's o1, while maintaining cost-efficiency. To further support the research community, DeepSeek has released distilled versions of the model based on architectures such as LLaMA and Qwen.
    Downloads: 58 This Week
    Last Update:
    See Project
  • 15
    Kimi K2.5

    Kimi K2.5

    Moonshot's most powerful AI model

    Kimi K2.5 is Moonshot AI’s open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed vision and text tokens. Based on a 1T-parameter Mixture-of-Experts (MoE) architecture with 32B activated parameters, it integrates advanced language reasoning with strong visual understanding. K2.5 supports both “Thinking” and “Instant” modes, enabling either deep step-by-step reasoning or low-latency responses depending on the task. Designed for agentic workflows, it features an Agent Swarm mechanism that decomposes complex problems into coordinated sub-agents executing in parallel. With a 256K context length and MoonViT vision encoder, the model excels across reasoning, coding, long-context comprehension, image, and video benchmarks. Kimi K2.5 is available via Moonshot’s API (OpenAI/Anthropic-compatible) and supports deployment through vLLM, SGLang, and KTransformers.
    Downloads: 52 This Week
    Last Update:
    See Project
  • 16
    SAM 3

    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short phrase or exemplars, scaling to a vastly larger set of categories than traditional closed-set models. This capability is grounded in a new data engine that automatically annotated over four million unique concepts, producing a massive open-vocabulary segmentation dataset and enabling the model to achieve 75–80% of human performance on the SA-CO benchmark, which itself spans 270K unique concepts.
    Downloads: 51 This Week
    Last Update:
    See Project
  • 17
    Kimi K2

    Kimi K2

    Kimi K2 is the large language model series developed by Moonshot AI

    Kimi K2 is Moonshot AI’s advanced open-source large language model built on a scalable Mixture-of-Experts (MoE) architecture that combines a trillion total parameters with a subset of ~32 billion active parameters to deliver powerful and efficient performance on diverse tasks. It was trained on an enormous corpus of over 15.5 trillion tokens to push frontier capabilities in coding, reasoning, and general agentic tasks while addressing training stability through novel optimizer and architecture design strategies. The model family includes variants like a foundational base model that researchers can fine-tune for specific use cases and an instruct-optimized variant primed for general-purpose chat and agent-style interactions, offering flexibility for both experimentation and deployment. With its high-dimensional attention mechanisms and expert routing, Kimi-K2 excels across benchmarks in live coding, math reasoning, and problem solving.
    Downloads: 49 This Week
    Last Update:
    See Project
  • 18
    FLUX.2

    FLUX.2

    Official inference repo for FLUX.2 models

    FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved. It supports high-resolution output (up to ~4 megapixels), which allows for photography-quality images, detailed product shots, infographics or UI mockups rather than just low-resolution drafts. FLUX.2 is built with a modern architecture (a flow-matching transformer + a revamped VAE + a strong vision-language encoder), enabling strong prompt adherence, correct rendering of text/typography in images, reliable lighting, layout, and physical realism, and consistent style/character/product identity across multiple generations or edits.
    Downloads: 45 This Week
    Last Update:
    See Project
  • 19
    FastSD CPU

    FastSD CPU

    Fast stable diffusion on CPU and AI PC

    FastSD CPU is an optimized fork of Stable Diffusion designed to run efficiently on CPUs and devices without dedicated GPUs by leveraging Latent Consistency Models and Adversarial Diffusion Distillation techniques that accelerate inference. It focuses on bringing fast text-to-image generation to mainstream hardware like desktop CPUs, lower-end laptops, or edge devices without requiring high-end graphics processors. The repository contains multiple interfaces including a desktop GUI for simple generation, an advanced web-based UI with support for extensions like LoRA and ControlNet, and a command-line interface for scripted usage or server deployments. With support for performance-oriented libraries such as OpenVINO and hardware acceleration on platforms like Intel AI PCs, FastSD CPU aims to shrink generation times dramatically compared with naive CPU implementations.
    Downloads: 43 This Week
    Last Update:
    See Project
  • 20
    Z-Image

    Z-Image

    Image generation model with single-stream diffusion transformer

    Z-Image is an efficient, open-source image generation foundation model built to make high-quality image synthesis more accessible. With just 6 billion parameters — far fewer than many large-scale models — it uses a novel “single-stream diffusion Transformer” architecture to deliver photorealistic image generation, demonstrating that excellence does not always require extremely large model sizes. The project includes several variants: Z-Image-Turbo, a distilled version optimized for speed and low resource consumption; Z-Image-Base, the full-capacity foundation model; and Z-Image-Edit, fine-tuned for image editing tasks. Despite its compact size, Z-Image produces outputs that closely rival those from much larger models — including strong rendering of bilingual (English and Chinese) text inside images, accurate prompt adherence, and good layout and composition.
    Downloads: 40 This Week
    Last Update:
    See Project
  • 21
    Hunyuan3D 2.0

    Hunyuan3D 2.0

    High-Resolution 3D Assets Generation with Large Scale Diffusion Models

    The Hunyuan3D-2 model, developed by Tencent, is designed for generating high-resolution 3D assets using large-scale diffusion models. This model offers advanced capabilities for creating detailed 3D models, including texture enhancements, multi-view shape generation, and rapid inference for real-time applications. It is particularly useful for industries requiring high-quality 3D content, such as gaming, film, and virtual reality. Hunyuan3D-2 supports various enhancements and is available for deployment through tools like Blender and Hugging Face. Includes a user-friendly production/studio tool (Hunyuan3D-Studio) to manipulate/animate meshes. Condition-aligned shape generation via the DiT model, so generated mesh is influenced by input images or prompts.
    Downloads: 35 This Week
    Last Update:
    See Project
  • 22
    DeepSeek Coder V2

    DeepSeek Coder V2

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models

    DeepSeek-Coder-V2 is the version-2 iteration of DeepSeek’s code generation models, refining the original DeepSeek-Coder line with improved architecture, training strategies, and benchmark performance. While the V1 models already targeted strong code understanding and generation, V2 appears to push further in both multilingual support and reasoning in code, likely via architectural enhancements or additional training objectives. The repository provides updated model weights, evaluation results on benchmarks (e.g. HumanEval, MultiPL-E, APPS), and new inference/serving scripts. Compared to the original, DeepSeek-Coder-V2 likely incorporates improved context management, caching strategies, or enhanced infilling capabilities. The project aims to provide a more performant and reliable open-source alternative to closed-source code models, optimized for practical usage in code completion, infilling, and code understanding across English and Chinese codebases.
    Downloads: 34 This Week
    Last Update:
    See Project
  • 23
    FLUX.1

    FLUX.1

    Official inference repo for FLUX.1 models

    FLUX.1 repository contains inference code and tooling for the FLUX.1 text-to-image diffusion models, enabling developers and researchers to generate and edit images from natural-language prompts using open-weight versions of the model on their own hardware or within custom applications. The project is part of a larger family of FLUX models developed by Black Forest Labs, designed to produce high-quality, detailed visuals from text descriptions with competitive prompt adherence and artistic fidelity. This repo focuses on running the open-source model variants efficiently, providing scripts, model loading logic, and examples for local installations, and supports integration with Python toolchains like PyTorch and popular generative pipelines. Users can launch CLI tools to generate images, experiment with different FLUX variants, and extend the base code for research-oriented applications.
    Downloads: 33 This Week
    Last Update:
    See Project
  • 24
    Stable Diffusion

    Stable Diffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

    Stable Diffusion Version 2. The Stable Diffusion project, developed by Stability AI, is a cutting-edge image synthesis model that utilizes latent diffusion techniques for high-resolution image generation. It offers an advanced method of generating images based on text input, making it highly flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models. Stability AI's approach to image synthesis has contributed to creating detailed, scalable images while maintaining efficiency.
    Downloads: 285 This Week
    Last Update:
    See Project
  • 25
    Easy Diffusion

    Easy Diffusion

    An easy 1-click way to create beautiful artwork on your PC using AI

    Easy Diffusion is a widely used community-driven repository offering a simple, one-click way to install and use Stable Diffusion-based generative AI on a personal computer without advanced technical skills or prior setup. It provides a browser-based user interface that runs locally, allowing users to type text prompts and immediately generate images directly within their web browser, democratizing access to powerful text-to-image models for artists and hobbyists alike. The project abstracts away environment setup, dependencies, and model installation — tasks that can be daunting to beginners — and instead lets users focus on creative experimentation with prompt phrasing, model parameters, and image output settings. Because it’s designed to be easy to install and use, EasyDiffusion’s interface includes options for queuing multiple jobs, applying modifiers like upscaling or face correction, and adjusting generation parameters like guidance scale and resolution.
    Downloads: 30 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB