Showing 59 open source projects for "gpu as computation"

  • 1
    GPU Puzzles

    GPU Puzzles

    Solve puzzles. Learn CUDA

    GPU Puzzles is an educational project designed to teach GPU programming concepts through interactive coding exercises and puzzles. Instead of presenting traditional lecture-style explanations, the project immerses learners directly in hands-on programming tasks that demonstrate how GPU computation works. The exercises are implemented using Python with the Numba CUDA interface, which allows Python code to compile into GPU kernels that run on CUDA-enabled hardware. ...
    Downloads: 0 This Week
    Last Update:
    See Project
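    The flavor of the exercises: assuming Numba and a CUDA-capable GPU, a kernel is an ordinary Python function compiled with @cuda.jit and launched over a grid of threads. The kernel below is only illustrative, not one of the actual puzzles.

        import numpy as np
        from numba import cuda

        @cuda.jit
        def add_ten(x, out):
            i = cuda.threadIdx.x          # one thread per element, single block
            if i < x.size:
                out[i] = x[i] + 10

        x = np.arange(32, dtype=np.float32)
        out = np.zeros_like(x)
        add_ten[1, 32](x, out)            # launch config: 1 block of 32 threads
        print(out[:5])                    # [10. 11. 12. 13. 14.]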
  • 2
    Shumai

    Shumai

    Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

    ...It can automatically leverage GPU acceleration on Linux (via CUDA) and CPU computation on macOS.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    DualPipe

    DualPipe

    A bidirectional pipeline parallelism algorithm

    DualPipe is a bidirectional pipeline parallelism algorithm open-sourced by DeepSeek, introduced in their DeepSeek-V3 technical framework. The main goal of DualPipe is to maximize overlap between computation and communication phases during distributed training, thus reducing idle GPU time (i.e. “pipeline bubbles”) and improving cluster efficiency. Traditional pipeline parallelism methods (e.g. 1F1B or staggered pipelining) leave gaps because forward and backward phases can’t fully overlap with communication. DualPipe addresses that by scheduling micro-batches from both ends of the pipeline in a bidirectional fashion—i.e. some micro-batches flow forward while others flow backward—so that computation on one partition can coincide with communication for another.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Qulacs

    Qulacs

    Variational Quantum Circuit Simulator for Quantum Computation Research

    Qulacs is a Python/C++ library for fast simulation of large, noisy, or parametric quantum circuits. It is developed at QunaSys, Osaka University, NTT, and Fujitsu.
    Downloads: 0 This Week
    Last Update:
    See Project
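    A minimal example of the Python interface, preparing a two-qubit Bell state (the circuit itself is illustrative):

        from qulacs import QuantumState, QuantumCircuit

        state = QuantumState(2)             # |00>
        circuit = QuantumCircuit(2)
        circuit.add_H_gate(0)               # Hadamard on qubit 0
        circuit.add_CNOT_gate(0, 1)         # entangle qubits 0 and 1
        circuit.update_quantum_state(state)
        print(state.get_vector())           # amplitudes of (|00> + |11>) / sqrt(2)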
  • 5
    Flash-MoE

    Flash-MoE

    Running a big model on a small laptop

    Flash-MoE is a high-performance implementation of mixture-of-experts (MoE) architectures designed to optimize the efficiency and scalability of large AI models. It focuses on accelerating routing and computation by leveraging optimized kernels and memory management techniques, allowing models to dynamically select specialized sub-networks during inference. The project aims to reduce the computational cost typically associated with MoE systems while maintaining or improving performance. It likely includes support for GPU acceleration and parallel processing, enabling it to handle large-scale workloads effectively. ...
    Downloads: 0 This Week
    Last Update:
    See Project
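    For orientation only, this is the top-k expert routing that MoE layers perform, written in plain PyTorch; it is not Flash-MoE's API, and all names here are hypothetical.

        import torch

        def route_topk(tokens, gate_weight, k=2):
            # tokens: [n_tokens, d_model]; gate_weight: [d_model, n_experts]
            probs = torch.softmax(tokens @ gate_weight, dim=-1)
            weights, expert_ids = torch.topk(probs, k, dim=-1)   # k experts per token
            return weights, expert_ids

        tokens = torch.randn(4, 16)
        gate_weight = torch.randn(16, 8)          # a router over 8 experts
        w, ids = route_topk(tokens, gate_weight)
        print(ids)                                # chosen expert indices per token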
  • 6
    FlexLLMGen

    FlexLLMGen

    Running large language models on a single GPU

    ...The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. This design allows organizations to deploy powerful language models for high-volume tasks without the infrastructure costs typically associated with large-scale AI systems. The project is particularly useful for workloads that prioritize throughput over latency, including benchmarking experiments and large corpus analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    ImplicitGlobalGrid.jl

    ImplicitGlobalGrid.jl

    Distributed parallelization of stencil-based GPU and CPU applications

    ...Samuel Omlin) with Stanford University (Dr. Ludovic Räss) and the Swiss Geocomputing Centre (Prof. Yuri Podladchikov). It renders the distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid almost trivial and enables close-to-ideal weak scaling of real-world applications on thousands of GPUs [1, 2, 3]. ImplicitGlobalGrid relies on the Julia MPI wrapper (MPI.jl) to perform halo updates close to the hardware limit and leverages CUDA-aware or ROCm-aware MPI for GPU applications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    CubeCL

    CubeCL

    Multi-platform high-performance compute language extension for Rust

    CubeCL is a low-level compute language and compiler framework designed to simplify and optimize GPU programming for high-performance workloads, particularly in machine learning and numerical computing. It provides an abstraction layer that allows developers to write portable, hardware-efficient compute kernels without directly dealing with complex GPU APIs such as CUDA or OpenCL. CubeCL focuses on delivering predictable performance and composability by exposing explicit control over memory...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    AirLLM

    AirLLM

    AirLLM 70B inference with single 4GB GPU

    AirLLM is an open source Python library that enables extremely large language models to run on consumer hardware with very limited GPU memory. The project addresses one of the main barriers to local LLM experimentation by introducing a memory-efficient inference technique that loads model layers sequentially rather than storing the entire model in GPU memory. This layer-wise inference approach allows models with tens of billions of parameters to run on devices with only a few gigabytes of VRAM. ...
    Downloads: 1 This Week
    Last Update:
    See Project
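    The layer-wise idea, illustrated with plain PyTorch rather than AirLLM's own API: keep the transformer blocks on the CPU and move only one at a time into VRAM.

        import torch

        def layerwise_forward(hidden, layers, device="cuda"):
            # layers: transformer blocks resident on CPU (or memory-mapped from disk)
            for layer in layers:
                layer.to(device)                   # bring just this block into VRAM
                with torch.no_grad():
                    hidden = layer(hidden.to(device))
                layer.to("cpu")                    # evict it before loading the next one
                torch.cuda.empty_cache()
            return hidden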
  • 10
    SwissGL

    SwissGL

    SwissGL is a minimalistic wrapper on top of WebGL2 JS API

    SwissGL is a compact JavaScript library that provides a streamlined abstraction layer over the WebGL2 API, designed to minimize boilerplate when building GPU-accelerated graphics, simulations, and procedural visualizations. Acting as a "Swiss Army knife" for WebGL2, it simplifies shader, texture, and framebuffer management into a single, expressive interface that enables developers to write complex GPU workflows in just a few lines of code. The library centers around one main function that...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    PyTorch

    PyTorch

    Open source machine learning framework

    PyTorch is a Python package that offers Tensor computation (like NumPy) with strong GPU acceleration and deep neural networks built on a tape-based autograd system. This project allows for fast, flexible experimentation and efficient production. PyTorch consists of torch (Tensor library), torch.autograd (tape-based automatic differentiation library), torch.jit (a compilation stack [TorchScript]), torch.nn (neural networks library), torch.multiprocessing (Python multiprocessing), and torch.utils (DataLoader and other utility functions). ...
    Downloads: 106 This Week
    Last Update:
    See Project
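    A minimal example of the tape-based autograd at the core of the torch package:

        import torch

        x = torch.randn(3, requires_grad=True)
        y = (x ** 2).sum()      # forward pass is recorded on the autograd tape
        y.backward()            # reverse-mode differentiation replays the tape
        print(x.grad)           # equals 2 * x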
  • 12

    Halide

    A language for fast, portable data-parallel computation

    Halide is a programming language for fast, portable data-parallel computation. It was designed to make writing high-performance image and array processing code much easier on modern machines. It works on all major operating systems, with several CPU architectures (x86, ARM, MIPS, Hexagon, PowerPC) and GPU compute APIs (CUDA, OpenCL, OpenGL, among others). It is not a standalone programming language, however; rather, it is embedded in C++, which means you write C++ code that builds an in-memory representation of a Halide pipeline using Halide's C++ API. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Zoo Design Studio

    Zoo Design Studio

    The Zoo Design Studio app

    ...Users can interact with the system through a familiar point-and-click interface, but every action is translated into code in the underlying modeling language, ensuring consistency between visual and programmatic representations. The application is powered by a GPU-first geometry engine that streams rendered output as video frames, enabling high-performance modeling even when heavy computation is offloaded to remote infrastructure. It uses WebSockets for real-time communication between the client and the modeling engine, allowing immediate feedback and interactive design updates.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 14
    LLaMA-Factory

    LLaMA-Factory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    LLaMA-Factory is a unified fine-tuning and training framework covering more than 100 large language models and vision-language models, including Meta's LLaMA family. It enables researchers and developers to train and customize models efficiently using advanced optimization techniques such as LoRA and QLoRA.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    CTranslate2

    CTranslate2

    Fast inference engine for Transformer models

    ...The project implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU. The execution is significantly faster and requires less resources than general-purpose deep learning frameworks on supported models and tasks thanks to many advanced optimizations: layer fusion, padding removal, batch reordering, in-place operations, caching mechanism, etc. The model serialization and computation support weights with reduced precision: 16-bit floating points (FP16), 16-bit integers (INT16), and 8-bit integers (INT8). ...
    Downloads: 13 This Week
    Last Update:
    See Project
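    A small sketch of the Python API; the model directory is a placeholder for the output of the CTranslate2 converters, and the input must already be tokenized (e.g. with SentencePiece) to match that model.

        import ctranslate2

        translator = ctranslate2.Translator("ende_ct2/", device="cuda",
                                            compute_type="int8_float16")
        results = translator.translate_batch([["▁Hello", "▁world"]])  # pre-tokenized
        print(results[0].hypotheses[0])                                # best hypothesis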
  • 16
    Hasktorch

    Hasktorch

    Tensors and neural networks in Haskell

    Hasktorch is a powerful Haskell library for tensor computation and neural network modeling, built on top of libtorch (the backend of PyTorch). It brings differentiable programming, automatic differentiation, and efficient tensor operations into Haskell’s strongly typed functional paradigm. This project is in active development, so expect changes to the library API as it evolves. We would like to invite new users to join our Hasktorch discord space for questions and discussions....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    TensorLy

    TensorLy

    Tensor Learning in Python

    TensorLy is a Python library that aims to make tensor learning simple and accessible. It makes it easy to perform tensor decomposition, tensor learning, and tensor algebra. Its backend system lets the same code run seamlessly on NumPy, PyTorch, JAX, TensorFlow, CuPy, or Paddle, and methods scale on CPU or GPU.
    Downloads: 0 This Week
    Last Update:
    See Project
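    A minimal example: a CP decomposition on the default NumPy backend; switching tl.set_backend to "pytorch" or "jax" moves the same code to a GPU-capable backend.

        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import parafac

        tl.set_backend("numpy")
        X = tl.tensor(np.random.rand(8, 8, 8))
        weights, factors = parafac(X, rank=3)    # three rank-1 components
        print([f.shape for f in factors])        # [(8, 3), (8, 3), (8, 3)]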
  • 18
    Profile Data

    Profile Data

    Analyze computation-communication overlap in V3/R1

    profile-data is a repository that publishes profiling traces and metrics from DeepSeek's training and inference infrastructure (especially during the DeepSeek-V3 / R1 experiments). The profiling data offers insight into computation-communication overlap, pipeline scheduling (e.g. DualPipe), and how MoE, EP (expert parallelism), and other parallelism strategies interact in real systems. The repository contains JSON trace files such as train.json, prefill.json, and decode.json, along with associated assets. Users can load them into tools like Chrome tracing to inspect GPU idle times, overlapping operations, and scheduling alignment. ...
    Downloads: 2 This Week
    Last Update:
    See Project
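    A sketch for inspecting the traces outside chrome://tracing, assuming the files follow the Chrome trace-event format (either a bare list of events or a dict with a "traceEvents" key); the exact layout of the repository's files may differ.

        import json

        with open("train.json") as f:
            trace = json.load(f)
        events = trace["traceEvents"] if isinstance(trace, dict) else trace
        timed = [e for e in events if isinstance(e, dict) and "dur" in e]
        for e in sorted(timed, key=lambda e: e["dur"], reverse=True)[:5]:
            print(e.get("name", "?"), e["dur"])  # the five longest-running events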
  • 19
    Faster Whisper

    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...
    Downloads: 14 This Week
    Last Update:
    See Project
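    A minimal example following the documented API; the audio path is a placeholder, and compute_type controls the quantization used by the CTranslate2 backend.

        from faster_whisper import WhisperModel

        model = WhisperModel("small", device="cuda", compute_type="int8_float16")
        segments, info = model.transcribe("audio.wav", beam_size=5)
        for seg in segments:                           # a lazy generator of segments
            print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")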
  • 20
    clip-retrieval

    clip-retrieval

    Easily compute clip embeddings and build a clip retrieval system

    clip-retrieval is an open-source toolkit designed to build large-scale semantic search systems for images and text by leveraging CLIP embeddings to enable multimodal retrieval. It allows developers to compute embeddings for both images and text efficiently and then index them for fast similarity search across massive datasets. The system is optimized for performance and scalability, capable of processing tens or even hundreds of millions of embeddings using GPU acceleration. It includes...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Meridian

    Meridian

    Meridian is an MMM framework

    ...The framework provides a robust foundation for constructing in-house MMM pipelines capable of handling both national and geo-level data, with built-in support for calibration using experimental data or prior knowledge. Meridian uses the No-U-Turn Sampler (NUTS) for Markov Chain Monte Carlo (MCMC) sampling to produce statistically rigorous results, and it includes GPU acceleration to significantly reduce computation time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    uzu

    uzu

    A high-performance inference engine for AI models

    uzu is a high-performance inference engine designed to run artificial intelligence models efficiently on Apple Silicon hardware. Written primarily in Rust and leveraging Apple’s Metal framework, the project focuses on maximizing performance when executing large language models and other AI workloads on devices such as Mac computers with M-series chips. The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Koila

    Koila

    Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code

    Koila is a lightweight Python library designed to help developers avoid memory errors when training deep learning models with PyTorch. The library introduces a lazy evaluation mechanism that delays computation until it is actually required, allowing the framework to better estimate the memory requirements of a model before execution. By building a computational graph first and executing operations only when necessary, koila reduces the risk of running out of GPU memory during the forward pass of neural network training. This approach enables developers to experiment with larger batch sizes and more complex architectures while maintaining stable training behavior. ...
    Downloads: 1 This Week
    Last Update:
    See Project
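    A sketch of the usage pattern shown in the project's README (the exact signature may have changed since): lazy() wraps the input tensors and marks dimension 0 as the batch dimension, so the real computation is deferred and split to fit available VRAM.

        import torch
        from koila import lazy

        model = torch.nn.Linear(128, 10).cuda()
        inputs = torch.randn(64, 128).cuda()
        labels = torch.randint(0, 10, (64,)).cuda()

        inputs, labels = lazy(inputs, labels, batch=0)   # entry point as per the README
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        loss.backward()                                  # graph is evaluated here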
  • 24
    EvoTorch

    EvoTorch

    Advanced evolutionary computation library built on top of PyTorch

    EvoTorch is an evolutionary optimization framework built on top of PyTorch, developed by NNAISENSE. It is designed for large-scale optimization problems, particularly those that require evolutionary algorithms rather than gradient-based methods.
    Downloads: 0 This Week
    Last Update:
    See Project
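    A short example in the style of the project's documentation, minimizing a toy sphere function with SNES; the constructor arguments shown are assumptions based on the documented API.

        import torch
        from evotorch import Problem
        from evotorch.algorithms import SNES
        from evotorch.logging import StdOutLogger

        def sphere(x: torch.Tensor) -> torch.Tensor:
            return torch.sum(x ** 2)

        problem = Problem("min", sphere, solution_length=10, initial_bounds=(-1, 1))
        searcher = SNES(problem, stdev_init=0.5)
        _ = StdOutLogger(searcher)            # print status at each generation
        searcher.run(100)                     # evolve for 100 generations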
  • 25
    Bend

    Bend

    A massively parallel, high-level programming language

    Bend is a massively parallel, high-level programming language developed by the Higher Order Company. It offers a Python-like, functional syntax while compiling to the HVM2 (Higher-order Virtual Machine) runtime, which evaluates programs as interaction combinators. Because parallelism comes from the evaluation model rather than explicit annotations, anything that can run in parallel does run in parallel: there are no threads, locks, or atomics to manage, and the same program can target multi-core CPUs or CUDA GPUs without changes.
    Downloads: 0 This Week
    Last Update:
    See Project