| Name | Modified | Size |
|------|----------|------|
| koboldcpp.exe | 2023-09-07 | 285.5 MB |
| koboldcpp_nocuda.exe | 2023-09-07 | 22.9 MB |
| koboldcpp-1.43 source code.tar.gz | 2023-09-07 | 10.6 MB |
| koboldcpp-1.43 source code.zip | 2023-09-07 | 10.8 MB |
| README.md | 2023-09-07 | 2.3 kB |

Totals: 5 items, 329.8 MB, 0 downloads/week

koboldcpp-1.43

  • Re-added support for automatic RoPE scale calculation based on a model's training context (n_ctx_train); this triggers if you do not explicitly specify --ropeconfig. For example, llama2 models will by default use a smaller rope scale than llama1 models for the same specified --contextsize. Setting --ropeconfig overrides this. This feature was bugged and removed in the previous release, but it should be working correctly now.
  • If a GPU number is provided and no tensor split is specified, the HIP and CUDA visible-devices variables are now set to that GPU only.
  • Fixed RWKV models being broken after recent upgrades.
  • Tweaked --unbantokens to decrease the banned token logit values further, as very rarely they could still appear. Still not using -inf as that causes issues with typical sampling.
  • Integrated SSE streaming improvements from @kalomaze.
  • Added a mutex for thread-safe polled streaming, from @Elbios.
  • Added support for older GGML (ggjt_v3) for 34B llama2 models, by @vxiiduu. Note that this may still have issues if n_gqa is not 1, in which case using GGUF would be better.
  • Fixed support for Windows 7, which should work in noavx2 and failsafe modes again. Also, SSE3 flags are now enabled for failsafe mode.
  • Updated Kobold Lite, now uses placeholders for instruct tags that get swapped during generation.
  • Tab navigation order improved in the GUI launcher, though some elements like checkboxes still require the mouse to toggle.
  • Pulled other fixes and improvements from upstream.
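The automatic RoPE scaling mentioned above can be pictured with a minimal sketch. This is an illustrative linear-scaling calculation, not koboldcpp's exact formula (the real implementation derives its rope config from n_ctx_train internally, and `auto_rope_scale` here is a hypothetical name):

```python
def auto_rope_scale(requested_ctx: int, n_ctx_train: int) -> float:
    # Stretch factor needed for the trained context window to cover
    # the requested context; 1.0 means no scaling is required.
    return max(1.0, requested_ctx / n_ctx_train)

# For the same --contextsize, a llama2 model (n_ctx_train = 4096)
# needs a smaller stretch than a llama1 model (n_ctx_train = 2048).
llama1_scale = auto_rope_scale(4096, 2048)  # 2.0
llama2_scale = auto_rope_scale(4096, 4096)  # 1.0
```

Passing `--ropeconfig` explicitly bypasses any automatic calculation of this kind.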

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with your desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).

For more information, be sure to run the program from command line with the --help flag.
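Once connected, the server speaks the KoboldAI-compatible HTTP API. As a minimal sketch (assuming the default port 5001 and the `/api/v1/generate` endpoint; `build_generate_request` is a hypothetical helper introduced here):

```python
import json
import urllib.request

def build_generate_request(prompt: str, max_length: int = 80,
                           host: str = "http://localhost:5001"):
    # Build a POST request for the KoboldAI-style generate endpoint
    # without sending it, so it can be inspected or sent later.
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode()
    return urllib.request.Request(
        f"{host}/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("Once upon a time")
# With a model loaded and the server running, urllib.request.urlopen(req)
# should return JSON of the form {"results": [{"text": "..."}]}.
```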

Of Note:

  • Reminder that HIPBLAS requires self compilation, and is not included by default in the prebuilt executables.
  • Remember that token unbans can now be set via API (and Lite) in addition to the command line.
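For the API route, a sketch of what a per-request unban might look like, assuming the KoboldAI-style generate payload accepts a `use_default_badwordsids` field (setting it to `False` requests that the default token bans not be applied; treat the field name as an assumption and check the API docs for your version):

```python
import json

# Hypothetical per-request payload: disable the default banned-token
# list for this generation only, instead of using --unbantokens globally.
payload = {
    "prompt": "Write one sentence.",
    "max_length": 60,
    "use_default_badwordsids": False,
}
body = json.dumps(payload)  # send as the JSON body of /api/v1/generate
```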
Source: README.md, updated 2023-09-07