| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| koboldcpp_ggml_tools_21nov.zip | 2023-11-21 | 3.4 MB | |
| koboldcpp_nocuda.exe | 2023-11-20 | 26.1 MB | |
| koboldcpp.exe | 2023-11-20 | 289.0 MB | |
| koboldcpp-1.50.1 source code.tar.gz | 2023-11-20 | 11.9 MB | |
| koboldcpp-1.50.1 source code.zip | 2023-11-20 | 12.0 MB | |
| README.md | 2023-11-20 | 2.2 kB | |
| Totals: 6 Items | | 342.5 MB | 0 |
koboldcpp-1.50.1
- Improved automatic GPU layer selection: In the GUI launcher with CuBLAS, it will now automatically select all layers to do a full GPU offload if it thinks you have enough VRAM to support it.
- Added a short delay to the Abort function in Lite, hopefully fixes the glitches with retry and abort.
- Fixed automatic RoPE values for Yi and Deepseek. If no `--ropeconfig` is set, the preconfigured rope values in the model now take priority over the automatic context rope scale.
- The above fix should also allow YaRN RoPE-scaled models to work correctly by default, assuming the model has been correctly converted. Note: customized YaRN configuration flags are not yet available.
- The OpenAI-compatible `/v1/completions` endpoint has been enhanced, adding extra unofficial parameters that Aphrodite uses, such as Min-P, Top-A and Mirostat. However, OpenAI does not support separate `memory` fields or sampler order, so the Kobold API will still give better results there.
- SSE streaming support has been added for the OpenAI `/v1/completions` endpoint (tested working in SillyTavern).
- Custom DALL-E endpoints are now supported, for use with OAI proxies.
- Pulled fixes and improvements from upstream, updated Kobold Lite
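As a rough sketch, a request against the OpenAI-compatible endpoint with the extra unofficial samplers might be built like this. The field names `min_p`, `top_a` and `mirostat` are assumptions based on common conventions in Aphrodite-style extensions, so verify them against your build's API documentation:

```python
import json
import urllib.request

def build_completion_request(prompt, max_tokens=80):
    # Payload for the OpenAI-compatible /v1/completions endpoint.
    # The extra sampler fields below are unofficial extensions;
    # their exact names here are assumptions.
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "min_p": 0.05,    # Min-P sampling (unofficial extension)
        "top_a": 0.0,     # Top-A sampling (unofficial extension)
        "mirostat": 2,    # Mirostat mode (unofficial extension)
        "stream": False,  # set True to receive SSE streaming
    }

payload = build_completion_request("Once upon a time")
req = urllib.request.Request(
    "http://localhost:5001/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment with a running koboldcpp instance
```

Note that for `memory` fields and custom sampler order, the release notes above recommend the native Kobold API instead.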
To use, download and run koboldcpp.exe, which is a one-file PyInstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller. If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork.
Hotfix 1.50.1:
- Fixed a regression with older RWKV/GPT-2/GPT-J/GPT-NeoX models that caused a segfault.
- If ropeconfig is not set, apply auto linear rope scaling multiplier for rope-tuned models such as Yi when used outside their original context limit.
- Fixed another bug in Lite with the retry/abort button.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect via your browser at http://localhost:5001 (or use the full koboldai client).
For more information, be sure to run the program from command line with the --help flag.
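Beyond the browser UI, the local server can also be queried programmatically. A minimal sketch, assuming your build exposes the KoboldAI-style `/api/v1/generate` endpoint on the default port:

```python
import json
import urllib.request

KOBOLD_URL = "http://localhost:5001"  # default koboldcpp port

def build_generate_request(prompt, max_length=60):
    # Field names follow the KoboldAI API convention; check your
    # build's API docs if the endpoint differs (assumption here).
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        KOBOLD_URL + "/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With koboldcpp running and a model loaded, you could then do:
# with urllib.request.urlopen(build_generate_request("Hello,")) as resp:
#     print(json.load(resp)["results"][0]["text"])
```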