    gpu_poor

    Calculate token/s & GPU memory requirement for any LLM

    ...By analyzing factors such as model size, context length, batch size, and GPU specifications, the system estimates how much VRAM a model will require and how fast it can generate tokens during inference. The tool also breaks down where GPU memory goes: model weights, the KV cache, activations, and other runtime overhead. This breakdown lets developers weigh trade-offs between quantization methods such as GGML, bitsandbytes, and QLoRA before deploying a model, which makes gpu_poor particularly useful for researchers and hobbyists who want to check whether a model fits their hardware before committing to a setup.
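    As a rough illustration of the kind of estimate such a tool performs, the Python sketch below sums model weights, KV cache, activations, and a fixed overhead fraction. The function name, formulas, and default factors (bytes per weight for fp16/int8/4-bit, the 10% overhead) are illustrative assumptions, not gpu_poor's actual implementation.

```python
def estimate_inference_vram_gb(
    n_params_b: float,               # model size in billions of parameters
    n_layers: int,                   # number of transformer layers
    hidden_size: int,                # model hidden dimension
    context_len: int,                # prompt + generated tokens
    batch_size: int = 1,
    bytes_per_weight: float = 2.0,   # fp16 = 2, int8 = 1, 4-bit quant ~ 0.5
    overhead_frac: float = 0.10,     # assumed runtime/CUDA overhead fraction
) -> dict:
    """Rough breakdown of GPU memory needed for LLM inference (illustrative)."""
    # Weights: parameter count times bytes per parameter after quantization.
    weights = n_params_b * 1e9 * bytes_per_weight

    # KV cache: 2 tensors (K and V) per layer, one hidden-size vector per token,
    # stored here in fp16 (2 bytes).
    kv_cache = 2 * n_layers * hidden_size * context_len * batch_size * 2

    # Activations: a crude per-token estimate; real calculators model this per layer.
    activations = batch_size * context_len * hidden_size * 2

    overhead = (weights + kv_cache + activations) * overhead_frac
    total = weights + kv_cache + activations + overhead

    gb = lambda x: round(x / 1024**3, 2)
    return {
        "weights_gb": gb(weights),
        "kv_cache_gb": gb(kv_cache),
        "activations_gb": gb(activations),
        "overhead_gb": gb(overhead),
        "total_gb": gb(total),
    }


if __name__ == "__main__":
    # Example: a hypothetical 7B model at 4-bit quantization with a 4096-token context.
    print(estimate_inference_vram_gb(
        n_params_b=7, n_layers=32, hidden_size=4096,
        context_len=4096, bytes_per_weight=0.5,
    ))
```

    In a sketch like this, the quantization choice only changes the weight term, while the KV cache grows linearly with context length and batch size, which is why long-context inference can exhaust VRAM even for heavily quantized models.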