Compare the Top AI Models for Mac as of April 2026 - Page 2

  • 1
    Ministral 8B

    Ministral 8B

    Mistral AI

    Mistral AI has introduced two advanced models for on-device computing and edge applications, named "les Ministraux": Ministral 3B and Ministral 8B. These models excel in knowledge, commonsense reasoning, function-calling, and efficiency within the sub-10B parameter range. They support up to 128k context length and are designed for various applications, including on-device translation, offline smart assistants, local analytics, and autonomous robotics. Ministral 8B features an interleaved sliding-window attention pattern for faster and more memory-efficient inference. Both models can function as intermediaries in multi-step agentic workflows, handling tasks like input parsing, task routing, and API calls based on user intent with low latency and cost. Benchmark evaluations indicate that les Ministraux consistently outperforms comparable models across multiple tasks. As of October 16, 2024, both models are available, with Ministral 8B priced at $0.1 per million tokens.
    Starting Price: Free
  • 2
    Mistral Small

    Mistral Small

    Mistral AI

    On September 17, 2024, Mistral AI announced several key updates to enhance the accessibility and performance of their AI offerings. They introduced a free tier on "La Plateforme," their serverless platform for tuning and deploying Mistral models as API endpoints, enabling developers to experiment and prototype at no cost. Additionally, Mistral AI reduced prices across their entire model lineup, with significant cuts such as a 50% reduction for Mistral Nemo and an 80% decrease for Mistral Small and Codestral, making advanced AI more cost-effective for users. The company also unveiled Mistral Small v24.09, a 22-billion-parameter model offering a balance between performance and efficiency, suitable for tasks like translation, summarization, and sentiment analysis. Furthermore, they made Pixtral 12B, a vision-capable model with image understanding capabilities, freely available on "Le Chat," allowing users to analyze and caption images without compromising text-based performance.
    Starting Price: Free
  • 3
    LTXV

    LTXV

    Lightricks

    LTXV offers a suite of AI-powered creative tools designed to empower content creators across various platforms. LTX provides AI-driven video generation capabilities, allowing users to craft detailed video sequences with full control over every stage of production. It leverages Lightricks' proprietary AI models to deliver high-quality, efficient, and user-friendly editing experiences. LTX Video uses a breakthrough called multiscale rendering, starting with fast, low-res passes to capture motion and lighting, then refining with high-res detail. Unlike traditional upscalers, LTXV-13B analyzes motion over time, front-loading the heavy computation to deliver up to 30× faster, high-quality renders.
    Starting Price: Free
  • 4
    Kimi K2

    Kimi K2

    Moonshot AI

    Kimi K2 is a state-of-the-art open source large language model series built on a mixture-of-experts (MoE) architecture, featuring 1 trillion total parameters and 32 billion activated parameters for task-specific efficiency. Trained with the Muon optimizer on over 15.5 trillion tokens and stabilized by MuonClip’s attention-logit clamping, it delivers exceptional performance in frontier knowledge, reasoning, mathematics, coding, and general agentic workflows. Moonshot AI provides two variants, Kimi-K2-Base for research-level fine-tuning and Kimi-K2-Instruct pre-trained for immediate chat and tool-driven interactions, enabling both custom development and drop-in agentic capabilities. Benchmarks show it outperforms leading open source peers and rivals top proprietary models in coding tasks and complex task breakdowns, while its 128 K-token context length, tool-calling API compatibility, and support for industry-standard inference engines.
    Starting Price: Free
  • 5
    Grok Code Fast 1
    Grok Code Fast 1 is a high-speed, economical reasoning model designed specifically for agentic coding workflows. Unlike traditional models that can feel slow in tool-based loops, it delivers near-instant responses, excelling in everyday software development tasks. Built from scratch with a programming-rich corpus and refined on real-world pull requests, it supports languages like TypeScript, Python, Java, Rust, C++, and Go. Developers can use it for everything from zero-to-one project building to precise bug fixes and codebase Q&A. With optimized inference and caching techniques, it achieves impressive responsiveness and a 90%+ cache hit rate when integrated with partners like GitHub Copilot, Cursor, and Cline. Offered at just $0.20 per million input tokens and $1.50 per million output tokens, Grok Code Fast 1 strikes a strong balance between speed, performance, and affordability.
    Starting Price: $0.20 per million input tokens
  • 6
    Kimi K2 Thinking

    Kimi K2 Thinking

    Moonshot AI

    Kimi K2 Thinking is an advanced open source reasoning model developed by Moonshot AI, designed specifically for long-horizon, multi-step workflows where the system interleaves chain-of-thought processes with tool invocation across hundreds of sequential tasks. The model uses a mixture-of-experts architecture with a total of 1 trillion parameters, yet only about 32 billion parameters are activated per inference pass, optimizing efficiency while maintaining vast capacity. It supports a context window of up to 256,000 tokens, enabling the handling of extremely long inputs and reasoning chains without losing coherence. Native INT4 quantization is built in, which reduces inference latency and memory usage without performance degradation. Kimi K2 Thinking is explicitly built for agentic workflows; it can autonomously call external tools, manage sequential logic steps (up to and typically between 200-300 tool calls in a single chain), and maintain consistent reasoning.
    Starting Price: Free
  • 7
    Mistral Large 3
    Mistral Large 3 is a next-generation, open multimodal AI model built with a powerful sparse Mixture-of-Experts architecture featuring 41B active parameters out of 675B total. Designed from scratch on NVIDIA H200 GPUs, it delivers frontier-level reasoning, multilingual performance, and advanced image understanding while remaining fully open-weight under the Apache 2.0 license. The model achieves top-tier results on modern instruction benchmarks, positioning it among the strongest permissively licensed foundation models available today. With native support across vLLM, TensorRT-LLM, and major cloud providers, Mistral Large 3 offers exceptional accessibility and performance efficiency. Its design enables enterprise-grade customization, letting teams fine-tune or adapt the model for domain-specific workflows and proprietary applications. Mistral Large 3 represents a major advancement in open AI, offering frontier intelligence without sacrificing transparency or control.
    Starting Price: Free
  • 8
    Kimi K2.5

    Kimi K2.5

    Moonshot AI

    Kimi K2.5 is a next-generation multimodal AI model designed for advanced reasoning, coding, and visual understanding tasks. It features a native multimodal architecture that supports both text and visual inputs, enabling image and video comprehension alongside natural language processing. Kimi K2.5 delivers open-source state-of-the-art performance in agent workflows, software development, and general intelligence tasks. The model offers ultra-long context support with a 256K token window, making it suitable for large documents and complex conversations. It includes long-thinking capabilities that allow multi-step reasoning and tool invocation for solving challenging problems. Kimi K2.5 is fully compatible with the OpenAI API format, allowing developers to switch seamlessly with minimal changes. With strong performance, flexibility, and developer-focused tooling, Kimi K2.5 is built for production-grade AI applications.
    Starting Price: Free
  • 9
    GLM-5

    GLM-5

    Zhipu AI

    GLM-5 is Z.ai’s latest large language model built for complex systems engineering and long-horizon agentic tasks. It scales significantly beyond GLM-4.5, increasing total parameters and training data while integrating DeepSeek Sparse Attention to reduce deployment costs without sacrificing long-context capacity. The model combines enhanced pre-training with a new asynchronous reinforcement learning infrastructure called slime, improving training efficiency and post-training refinement. GLM-5 achieves best-in-class performance among open-source models across reasoning, coding, and agent benchmarks, narrowing the gap with leading frontier models. It ranks highly on evaluations such as Vending Bench 2, demonstrating strong long-term planning and operational capabilities. The model is open-sourced under the MIT License.
    Starting Price: Free
  • 10
    Composer 2
    Composer 2 is an advanced AI coding model integrated into Cursor, designed to deliver high-level programming performance at a cost-efficient price. It is trained on long-horizon coding tasks, enabling it to solve complex problems that require multiple steps and actions. The model demonstrates strong improvements across key benchmarks, including Terminal-Bench and SWE-bench Multilingual. With enhanced intelligence and efficiency, it provides faster and more accurate code generation. Composer 2 combines strong performance with affordable pricing, making it accessible for developers and teams.
    Starting Price: $0.50/M input
  • 11
    GLM-5.1

    GLM-5.1

    Zhipu AI

    GLM-5.1 is the latest iteration of Z.ai’s GLM series, designed as a frontier-level, agent-oriented AI model optimized for coding, reasoning, and long-horizon workflows. It builds on the GLM-5 architecture, which uses a Mixture-of-Experts (MoE) design to deliver high performance while keeping inference costs efficient, and is part of a broader push toward open-weight, developer-accessible models. A core focus of GLM-5.1 is enabling agentic behavior, meaning it can plan, execute, and iterate across multi-step tasks rather than simply responding to single prompts. It is specifically designed to handle complex workflows such as debugging code, navigating repositories, and executing chained operations with sustained context. Compared to earlier models, GLM-5.1 improves reliability in long interactions, maintaining coherence across extended sessions and reducing breakdowns in multi-step reasoning.
    Starting Price: Free
  • 12
    Qwen3.6-Max-Preview
    Qwen3.6-Max-Preview is a next-generation frontier language model designed to push the limits of intelligence, instruction following, and real-world agent capabilities within the Qwen ecosystem. Building on the Qwen3 series, this preview release introduces stronger world knowledge, sharper instruction alignment, and significant improvements in agentic coding performance, enabling the model to better handle complex, multi-step tasks and software engineering workflows. It is engineered for advanced reasoning and execution scenarios, where the model not only generates responses but also interacts with tools, processes long contexts, and supports structured problem-solving across domains such as coding, research, and enterprise workflows. The architecture continues the Qwen focus on large-scale, high-efficiency models capable of handling extensive context windows and delivering consistent performance across multilingual and knowledge-intensive tasks.
    Starting Price: Free
  • 13
    Kimi K2.6

    Kimi K2.6

    Moonshot AI

    Kimi K2.6 is a next-generation agentic AI model developed by Moonshot AI, designed to push forward real-world execution, coding, and multi-step reasoning beyond earlier K2 and K2.5 versions. It builds on a Mixture-of-Experts architecture and the multimodal, agent-first foundation of the Kimi series, combining language understanding, coding, and tool use into a single system capable of planning and executing complex workflows. It introduces deeper reasoning capabilities and significantly improved agent planning, allowing it to break down tasks, coordinate tools, and handle multi-file or multi-step problems with greater accuracy and efficiency. It supports advanced tool calling with high reliability, enabling integration with external systems such as web search or APIs, and includes built-in validation mechanisms to ensure correct execution formats.
    Starting Price: Free
  • 14
    Jurassic-2
    Announcing the launch of Jurassic-2, the latest generation of AI21 Studio’s foundation models, a game-changer in the field of AI, with top-tier quality and new capabilities. And that's not all, we're also releasing our task-specific APIs, with plug-and-play reading and writing capabilities that outperform competitors. Our focus at AI21 Studio is to help developers and businesses leverage reading and writing AI to build real-world products with tangible value. Today marks two important milestones with the release of Jurassic-2 and Task-Specific APIs, empowering you to bring generative AI to production. Jurassic-2 (or J2, as we like to call it) is the next generation of our foundation models with significant improvements in quality and new capabilities including zero-shot instruction-following, reduced latency, and multi-language support. Task-specific APIs provide developers with industry-leading APIs that perform specialized reading and writing tasks out-of-the-box.
    Starting Price: $29 per month
  • 15
    Stable LM

    Stable LM

    Stability AI

    Stable LM: Stability AI Language Models. The release of Stable LM builds on our experience in open-sourcing earlier language models with EleutherAI, a nonprofit research hub. These language models include GPT-J, GPT-NeoX, and the Pythia suite, which were trained on The Pile open-source dataset. Many recent open-source language models continue to build on these efforts, including Cerebras-GPT and Dolly-2. Stable LM is trained on a new experimental dataset built on The Pile, but three times larger with 1.5 trillion tokens of content. We will release details on the dataset in due course. The richness of this dataset gives Stable LM surprisingly high performance in conversational and coding tasks, despite its small size of 3 to 7 billion parameters (by comparison, GPT-3 has 175 billion parameters). Stable LM 3B is a compact language model designed to operate on portable digital devices like handhelds and laptops, and we’re excited about its capabilities and portability.
    Starting Price: Free
  • 16
    Dolly

    Dolly

    Databricks

    Dolly is a cheap-to-build LLM that exhibits a surprising degree of the instruction following capabilities exhibited by ChatGPT. Whereas the work from the Alpaca team showed that state-of-the-art models could be coaxed into high quality instruction-following behavior, we find that even years-old open source models with much earlier architectures exhibit striking behaviors when fine tuned on a small corpus of instruction training data. Dolly works by taking an existing open source 6 billion parameter model from EleutherAI and modifying it ever so slightly to elicit instruction following capabilities such as brainstorming and text generation not present in the original model, using data from Alpaca.
    Starting Price: Free
  • 17
    mT5

    mT5

    Google

    Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text transformer model, trained following a similar recipe as T5. This repo can be used to reproduce the experiments in the mT5 paper. mT5 is pretrained on the mC4 corpus, covering 101 languages: Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, and more.
    Starting Price: Free
  • 18
    Cerebras-GPT
    State-of-the-art language models are extremely challenging to train; they require huge compute budgets, complex distributed compute techniques and deep ML expertise. As a result, few organizations train large language models (LLMs) from scratch. And increasingly those that have the resources and expertise are not open sourcing the results, marking a significant change from even a few months back. At Cerebras, we believe in fostering open access to the most advanced models. With this in mind, we are proud to announce the release to the open source community of Cerebras-GPT, a family of seven GPT models ranging from 111 million to 13 billion parameters. Trained using the Chinchilla formula, these models provide the highest accuracy for a given compute budget. Cerebras-GPT has faster training times, lower training costs, and consumes less energy than any publicly available model to date.
    Starting Price: Free
  • 19
    Falcon-40B

    Falcon-40B

    Technology Innovation Institute (TII)

    Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license. Why use Falcon-40B? It is the best open-source model currently available. Falcon-40B outperforms LLaMA, StableLM, RedPajama, MPT, etc. See the OpenLLM Leaderboard. It features an architecture optimized for inference, with FlashAttention and multiquery. It is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions. ⚠️ This is a raw, pretrained model, which should be further finetuned for most usecases. If you are looking for a version better suited to taking generic instructions in a chat format, we recommend taking a look at Falcon-40B-Instruct.
    Starting Price: Free
  • 20
    Falcon-7B

    Falcon-7B

    Technology Innovation Institute (TII)

    Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license. Why use Falcon-7B? It outperforms comparable open-source models (e.g., MPT-7B, StableLM, RedPajama etc.), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. See the OpenLLM Leaderboard. It features an architecture optimized for inference, with FlashAttention and multiquery. It is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions.
    Starting Price: Free
  • 21
    RedPajama

    RedPajama

    RedPajama

    Foundation models such as GPT-4 have driven rapid improvement in AI. However, the most powerful models are closed commercial models or only partially open. RedPajama is a project to create a set of leading, fully open-source models. Today, we are excited to announce the completion of the first step of this project: the reproduction of the LLaMA training dataset of over 1.2 trillion tokens. The most capable foundation models today are closed behind commercial APIs, which limits research, customization, and their use with sensitive data. Fully open-source models hold the promise of removing these limitations, if the open community can close the quality gap between open and closed models. Recently, there has been much progress along this front. In many ways, AI is having its Linux moment. Stable Diffusion showed that open-source can not only rival the quality of commercial offerings like DALL-E but can also lead to incredible creativity from broad participation by communities.
    Starting Price: Free
  • 22
    Vicuna

    Vicuna

    lmsys.org

    Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90%* of cases. The cost of training Vicuna-13B is around $300. The code and weights, along with an online demo, are publicly available for non-commercial use.
    Starting Price: Free
  • 23
    MPT-7B

    MPT-7B

    MosaicML

    Introducing MPT-7B, the latest entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Now you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch. For inspiration, we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens!
    Starting Price: Free
  • 24
    OpenLLaMA

    OpenLLaMA

    OpenLLaMA

    OpenLLaMA is a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset. Our model weights can serve as the drop in replacement of LLaMA 7B in existing implementations. We also provide a smaller 3B variant of LLaMA model.
    Starting Price: Free
  • 25
    GPT4All

    GPT4All

    Nomic AI

    GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade CPUs. The goal is simple - be the best instruction-tuned assistant-style language model that any person or enterprise can freely use, distribute and build on. A GPT4All model is a 3GB - 8GB file that you can download and plug into the GPT4All open-source ecosystem software. Nomic AI supports and maintains this software ecosystem to enforce quality and security alongside spearheading the effort to allow any person or enterprise to easily train and deploy their own on-edge large language models. Data is one the most important ingredients to successfully building a powerful, general-purpose large language model. The GPT4All community has built the GPT4All open source data lake as a staging ground for contributing instruction and assistant tuning data for future GPT4All model trains.
    Starting Price: Free
  • 26
    ChatGLM

    ChatGLM

    Zhipu AI

    ChatGLM-6B is an open-source, Chinese-English bilingual dialogue language model based on the General Language Model (GLM) architecture with 6.2 billion parameters. Combined with model quantization technology, users can deploy locally on consumer-grade graphics cards (only 6GB of video memory is required at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese Q&A and dialogue. After about 1T identifiers of Chinese and English bilingual training, supplemented by supervision and fine-tuning, feedback self-help, human feedback reinforcement learning and other technologies, ChatGLM-6B with 6.2 billion parameters has been able to generate answers that are quite in line with human preferences.
    Starting Price: Free
  • 27
    Jan

    Jan

    Jan

    10x productivity with customizable AI assistants, global hotkeys, and in-line AI. Seamless integration into your mobile workflows with elegant features. Conversations, preferences, and model usage stay on your computer—secure, exportable, and can be deleted at any time.
    Starting Price: Free
  • 28
    Mixtral 8x7B

    Mixtral 8x7B

    Mistral AI

    Mixtral 8x7B is a high-quality sparse mixture of experts model (SMoE) with open weights. Licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.
    Starting Price: Free
  • 29
    Llama 3
    We’ve integrated Llama 3 into Meta AI, our intelligent assistant, that expands the ways people can get things done, create and connect with Meta AI. You can see first-hand the performance of Llama 3 by using Meta AI for coding tasks and problem solving. Whether you're developing agents, or other AI-powered applications, Llama 3 in both 8B and 70B will offer the capabilities and flexibility you need to develop your ideas. With the release of Llama 3, we’ve updated the Responsible Use Guide (RUG) to provide the most comprehensive information on responsible development with LLMs. Our system-centric approach includes updates to our trust and safety tools with Llama Guard 2, optimized to support the newly announced taxonomy published by MLCommons expanding its coverage to a more comprehensive set of safety categories, code shield, and Cybersec Eval 2.
    Starting Price: Free
  • 30
    Codestral

    Codestral

    Mistral AI

    We introduce Codestral, our first-ever code model. Codestral is an open-weight generative AI model explicitly designed for code generation tasks. It helps developers write and interact with code through a shared instruction and completion API endpoint. As it masters code and English, it can be used to design advanced AI applications for software developers. Codestral is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. It also performs well on more specific ones like Swift and Fortran. This broad language base ensures Codestral can assist developers in various coding environments and projects.
    Starting Price: Free
MongoDB Logo MongoDB