| Name | Modified | Size |
|---|---|---|
| koboldcpp_nocuda.exe | 2023-08-07 | 22.2 MB |
| koboldcpp.exe | 2023-08-07 | 284.4 MB |
| koboldcpp-1.39.1 source code.tar.gz | 2023-08-07 | 10.2 MB |
| koboldcpp-1.39.1 source code.zip | 2023-08-07 | 10.3 MB |
| README.md | 2023-08-07 | 1.3 kB |
koboldcpp-1.39.1
- Fix SSE streaming to handle headers correctly during abort (Credits: @duncannah)
- Bugfix for `--blasbatchsize -1` and `1024` (fix alloc blocks error)
- Added experimental support for `--blasbatchsize 2048` (note: buffers are doubled if that is selected, using much more memory)
- Added support for 12k and 16k `--contextsize` options. Please let me know if you encounter issues.
- Pulled upstream improvements, further CUDA speedups for MMQ mode for all quant types.
- Fix for some LLAMA 65B models being detected as LLAMA2 70B models.
- Reverted to the upstream approach for CUDA pool malloc (in 1.39.1, applied only for MMQ).
- Updated Lite, including support for importing Tavern V2 card formats, with world info (character book) and clearer settings edit boxes.
To use, download and run koboldcpp.exe, which is a one-file PyInstaller build. If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
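For example, a launch from the command line might look like the sketch below. The model filename is a placeholder, and `--contextsize` and `--blasbatchsize` are the options discussed in the notes above; run `--help` for the authoritative list of flags and their accepted values.

```
koboldcpp.exe --model your-model.ggmlv3.q4_0.bin --contextsize 4096 --blasbatchsize 512
```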
Once the model is loaded, you can connect at the address below (or use the full KoboldAI client):
http://localhost:5001
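As a quick connectivity check, here is a minimal Python sketch that sends a prompt to the KoboldAI-compatible generate endpoint. It assumes the server is running locally on the default port 5001 and that the third-party `requests` library is installed; the prompt and `max_length` values are arbitrary examples.

```python
import requests

# Assumes koboldcpp is serving the KoboldAI-compatible API
# at the default address shown above.
payload = {
    "prompt": "Once upon a time,",
    "max_length": 50,  # number of tokens to generate
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
resp.raise_for_status()

# The KoboldAI API returns generated text under results[0]["text"].
print(resp.json()["results"][0]["text"])
```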
For more information, run the program from the command line with the --help flag.