Name                                  Modified    Size
koboldcpp_cu12.exe                    2024-12-23  596.2 MB
koboldcpp.exe                         2024-12-23  477.9 MB
koboldcpp-mac-arm64                   2024-12-23  27.6 MB
koboldcpp-linux-x64-nocuda            2024-12-23  64.6 MB
koboldcpp-linux-x64-cuda1210          2024-12-23  671.2 MB
koboldcpp-linux-x64-cuda1150          2024-12-23  586.0 MB
koboldcpp_oldcpu.exe                  2024-12-23  478.1 MB
koboldcpp_nocuda.exe                  2024-12-23  64.1 MB
koboldcpp-1.80.3 source code.tar.gz   2024-12-23  29.1 MB
koboldcpp-1.80.3 source code.zip      2024-12-23  29.5 MB
README.md                             2024-12-23  4.9 kB
koboldcpp_tools_19dec.zip             2024-12-19  14.9 MB

Totals: 12 items, 3.0 GB

koboldcpp-1.80.3

End of the year edition


  • NEW: Added support for image multimodal input with Qwen2-VL! You can grab the quantized mmproj here for the 2B and 7B models, and then grab the 2B or 7B Instruct models from Bartowski.
  • Note: Qwen2-VL will use the CPU for CLIP on Metal and Vulkan; it works fine on CUDA and CPU. Follow https://github.com/ggerganov/llama.cpp/issues/10843 for progress.
  • For a quick start, here's a working template you can use (a launch sketch also follows this list).
  • NEW: Vulkan now has coopmat1 support, making it significantly faster on modern Nvidia cards (credits @0cc4m)
  • Added a few new QoL flags (an example invocation follows this list):
  • --moeexperts - Override the number of experts to use in MoE models
  • --failsafe - A proper way to set failsafe mode, which disables all CPU intrinsics and GPU usage.
  • --draftgpulayers - Set the number of layers to offload for the speculative decoding draft model
  • --draftgpusplit - GPU layer distribution ratio for the draft model (default = same as main). Only takes effect with multiple GPUs.
  • Fixes for the buggy tkinter GUI launcher window on Linux (thanks @henk717)
  • Restored support for ARM quants in Kobold (e.g. Q4_0_4_4), but you should consider switching to q4_0 eventually.
  • Fixed a bug that caused context corruption when aborting a generation midway through prompt processing
  • Added new field suppress_non_speech to Whisper allowing banning "noise annotation" logits (e.g. Barking, Doorbell, Chime, Muzak)
  • Improved compile flags on ARM; self-compiled builds now use the correct native flags and should be significantly faster (tested on a Pi and in Termux). Simply run make for a native ARM build, or make LLAMA_PORTABLE=1 for a slower portable build (see the build sketch after this list).
  • trim_stop now defaults to true (output will no longer contain the stop sequence by default)
  • Debug mode shows drafted tokens and, when enabled, allows a mismatched vocab for speculative decoding (not recommended)
  • Handle more generation parameters in the ollama API emulation
  • Handle pyinstaller temp paths for chat adapters when saving a kcpps config file
  • Default image gen sampler set to Euler
  • MMQ is now the default for the CLI as well. Use the nommq flag to disable it (e.g. --usecublas all nommq); old flags still work.
  • Upgrade build to use C++17
  • Always use PCI Bus ID order for CUDA GPU listing consistency (match nvidia-smi)
  • Updated Kobold Lite, multiple fixes and improvements
  • NEW: Added LaTeX rendering together with markdown. Uses standard \[...\] \(...\) and $$...$$ syntax.
  • You can now manually upload an audio file to transcribe in settings.
  • Better regex to trigger image generation
  • Aesthetic UI fixes
  • Added q as an alias to query for direct URL querying (e.g. http://localhost:5001?q=what+is+love)
  • Added support for AllTalk v2 API. AllTalk v1 is still supported automatically (credits @erew123)
  • Added support for Mantella XTTS (XTTS fork)
  • Toggle to disable "non-speech" whisper output (see above)
  • Consolidated Instruct templates (Mistral V3 merged to V7)
  • Merged fixes and improvements from upstream
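
As a quick illustration of the Qwen2-VL setup above, a launch might look like the following. This is a minimal sketch: the GGUF filenames are placeholders for whichever quantized files you actually downloaded, and only --model and --mmproj are essential here.

```
# Minimal sketch - filenames are placeholders, substitute your own downloads,
# and substitute your platform's binary (e.g. koboldcpp.exe on Windows).
# --mmproj attaches the vision projector needed for image input.
./koboldcpp --model Qwen2-VL-7B-Instruct-Q4_K_M.gguf \
  --mmproj mmproj-Qwen2-VL-7B-f16.gguf \
  --contextsize 4096
```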
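The new QoL flags, combined into example invocations. Again a sketch under assumptions: the model filenames are placeholders, and the speculative decoding example assumes the --draftmodel flag from an earlier release.

```
# MoE + speculative decoding sketch (placeholder filenames).
# --moeexperts overrides how many experts the MoE model activates;
# --draftgpulayers / --draftgpusplit control GPU offload for the draft model.
./koboldcpp --model Mixtral-8x7B-Instruct-Q4_K_M.gguf \
  --draftmodel Mistral-7B-Instruct-Q4_K_M.gguf \
  --moeexperts 2 --draftgpulayers 99 --draftgpusplit 1 1

# Troubleshooting launch: disables all CPU intrinsics and GPU usage.
./koboldcpp --model model.gguf --failsafe

# MMQ is now on by default; pass nommq to turn it off.
./koboldcpp --model model.gguf --usecublas all nommq
```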
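And the two ARM build commands mentioned above, run from a source checkout:

```
# Native ARM build - picks up the correct native compile flags for this machine.
make

# Portable (slower) ARM build that avoids machine-specific instructions.
make LLAMA_PORTABLE=1
```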

Hotfix 1.80.1 - Fixed macOS and Vulkan CLIP for Qwen2-VL
Hotfix 1.80.2 - Fixed drafting EOS issue
Hotfix 1.80.3 - Fixed clblast oldcpu not getting set correctly

To use, download and run koboldcpp.exe, which is a one-file pyinstaller. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller. If you have an Nvidia GPU but an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe. If you have a newer Nvidia GPU, you can use the CUDA 12 version, koboldcpp_cu12.exe (much larger, slightly faster). If you're using Linux, select the appropriate Linux binary instead (not an exe). If you're on modern macOS (M1, M2, M3), you can try the koboldcpp-mac-arm64 binary. If you're using AMD, we recommend trying the Vulkan option (available in all releases) first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client). A minimal API sketch follows below.
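
For a quick sanity check from the command line, you can hit the KoboldAI-compatible generate endpoint once the server is up. The payload below is a minimal assumed example; see the API documentation for the full parameter list.

```
# Minimal generation request against a running KoboldCpp instance.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is love?", "max_length": 64}'

# Or use the new q alias to open the Lite UI with a query directly:
#   http://localhost:5001?q=what+is+love
```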

For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.

Source: README.md, updated 2024-12-23