| Name | Modified | Size |
| --- | --- | --- |
| koboldcpp_nocuda.exe | 2023-08-07 | 22.2 MB |
| koboldcpp.exe | 2023-08-07 | 284.4 MB |
| koboldcpp-1.39.1 source code.tar.gz | 2023-08-07 | 10.2 MB |
| koboldcpp-1.39.1 source code.zip | 2023-08-07 | 10.3 MB |
| README.md | 2023-08-07 | 1.3 kB |

Totals: 5 items, 327.1 MB

koboldcpp-1.39.1

  • Fix SSE streaming to handle headers correctly during abort (Credits: @duncannah)
  • Bugfix for --blasbatchsize values -1 and 1024 (fixes an alloc-blocks error)
  • Added experimental support for --blasbatchsize 2048 (note: buffers are doubled when this is selected, using much more memory)
  • Added support for 12k and 16k --contextsize options. Please let me know if you encounter issues.
  • Pulled upstream improvements, further CUDA speedups for MMQ mode for all quant types.
  • Fix for some LLAMA 65B models being detected as LLAMA2 70B models.
  • Revert to upstream approach for CUDA pool malloc (1.39.1 - done only for MMQ).
  • Updated Lite, includes adding support for importing Tavern V2 card formats, with world info (character book) and clearer settings edit boxes.

To use, download and run koboldcpp.exe, a single-file PyInstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, connect at http://localhost:5001 (or point the full KoboldAI client at that address).
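As a sketch, a command-line launch might look like the following (the model filename is a placeholder; --contextsize and --blasbatchsize are the options described in the notes above):

```shell
# Launch KoboldCpp with a local model.
# "yourmodel.bin" is a placeholder - substitute your own model file.
# --contextsize 16384 uses the new 16k context option;
# --blasbatchsize 512 sets a moderate BLAS batch size.
koboldcpp.exe --model yourmodel.bin --contextsize 16384 --blasbatchsize 512

# Once the model is loaded, open the UI in a browser:
#   http://localhost:5001
```

Run koboldcpp.exe --help for the full list of launch parameters.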

For more information, be sure to run the program from command line with the --help flag.

Source: README.md, updated 2023-08-07