Showing 106 open source projects for "git:/git.code.sf.net/p/docfetcher/code"

  • 1
    LlamaDeploy

    Deploy your agentic workflows to production

    ...The project provides an asynchronous architecture that allows developers to deploy complex multi-agent workflows as scalable microservices. It enables teams to move from experimental prototypes to production systems with minimal changes to existing LlamaIndex code, making it easier to operationalize AI agents. The system supports orchestrating multiple services, handling communication between agents, and managing workflow execution in distributed environments. Developers can define workflows that involve multiple steps such as data retrieval, reasoning, tool invocation, and response generation, then deploy them using the framework’s infrastructure tools. ...
    Downloads: 4 This Week
  • 2
    self-llm

    Tutorial tailored for Chinese-speaking beginners on rapid fine-tuning

    self-llm is an open source educational project created by the Datawhale community that serves as a practical guide for deploying, fine-tuning, and using open-source large language models on Linux systems. The repository focuses on helping beginners and developers understand how to run and customize modern LLMs locally rather than relying solely on hosted APIs. It provides step-by-step tutorials covering environment setup, model deployment, inference workflows, and efficient fine-tuning...
    Downloads: 4 This Week
  • 3
    Agentless

    An agentless approach to automatically solving software development problems

    Agentless is an open-source framework that applies large language models to automatically resolve software development issues without relying on complex autonomous agent systems. The project proposes an alternative approach to AI-driven code repair that avoids the overhead of multi-agent orchestration by using a structured pipeline for identifying and fixing bugs. When solving a problem, the system first performs localization to determine which files, functions, or code segments are most likely responsible for the issue. It then generates multiple candidate patches for the identified locations using language model reasoning and diff-style edits. ...
    Downloads: 0 This Week
  • 4
    II Agent

    A new open-source framework to build and deploy intelligent agents

    ...The platform allows users to interact with multiple AI models within a single environment while connecting those models to external services and knowledge sources. Through a unified interface, users can switch between models, access specialized tools, and execute tasks that require information retrieval, code execution, or file analysis. The architecture focuses on transforming traditional software tools into autonomous assistants capable of completing tasks independently based on user instructions. II-Agent supports integration with modern AI services and can coordinate interactions between different models and capabilities within the same workflow.
    Downloads: 0 This Week
  • 5
    llms-from-scratch-cn

    Build a large language model from scratch with only a Python foundation

    ...Through a collection of notebooks, code examples, and translated learning materials, users can explore how to implement components such as multi-head attention, data loaders, and training pipelines using Python and PyTorch.
    Downloads: 0 This Week
  • 6
    Agentic RAG for Dummies

    A modular Agentic RAG built with LangGraph

    Agentic RAG for Dummies is an educational repository that demonstrates how to build retrieval-augmented generation systems combined with autonomous AI agents. The project explains the principles behind agentic retrieval pipelines where language models can dynamically decide when to retrieve information, analyze results, and plan further actions. Instead of relying on static retrieval pipelines, the system shows how agents can orchestrate retrieval, reasoning, and tool usage in a more...
    Downloads: 4 This Week
  • 7
    Qwen3 Embedding

    Designed for text embedding and ranking tasks

    ...It achieves state-of-the-art performance on benchmarks like MTEB (Multilingual Text Embedding Benchmark) and supports instruction-aware embedding (i.e. embedding task instructions along with queries) and flexible embedding/vector dimension definitions. It is meant for tasks such as text retrieval, classification, clustering, bitext mining, and code retrieval.
    Downloads: 0 This Week
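The two features named in the Qwen3 Embedding description, instruction-aware queries and a flexible output dimension, can be illustrated with a minimal sketch. This is not the project's code. The prompt template in `build_instructed_query` and the truncate-then-renormalize approach in `resize_embedding` are both assumptions made for illustration, and both function names are hypothetical.

```python
import math
from typing import List


def build_instructed_query(task: str, query: str) -> str:
    """Prepend a task instruction to a query (hypothetical template)."""
    return f"Instruct: {task}\nQuery: {query}"


def resize_embedding(vector: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit length.

    Assumption: a smaller embedding is obtained by truncation followed by
    L2 normalization. Check the model documentation before relying on this.
    """
    if dim <= 0 or dim > len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    truncated = vector[:dim]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        return truncated  # an all-zero vector cannot be normalized
    return [x / norm for x in truncated]


if __name__ == "__main__":
    print(build_instructed_query("Retrieve relevant passages", "What is MTEB?"))
    print(resize_embedding([3.0, 4.0, 12.0], 2))
```

The instructed string would be passed to the embedding model in place of the raw query; documents are typically embedded without an instruction.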
  • 8
    OmniBox

    Collect, organize, use, and share, all in OmniBox

    ...Inspired by the omnibox concept used in modern browsers, the system combines search functionality with command execution so that users can access information and perform tasks without navigating complex menus. The mirrored distribution on SourceForge exists to provide an additional download source and preserve access to the software’s source code independent of its original repository. Tools like Omnibox typically emphasize extensibility, allowing developers to add plugins or integrations that connect the interface to other systems such as APIs, search engines, or automation tools.
    Downloads: 3 This Week
  • 9
    Awesome LLM Apps

    Collection of awesome LLM apps with AI Agents and RAG using OpenAI

    ...The list spans a wide range of categories including productivity tools, creative assistants, utilities, education platforms, research frameworks, and niche vertical apps, showcasing how generative models are being used across domains. Each entry includes a brief description, language model dependencies, technology stack notes, and sometimes links to demos or source code, making it easy to explore ideas and reuse concepts for your own projects. Because the landscape of LLM-powered applications changes quickly, the repository is designed to be updated regularly through community contributions, ensuring it stays current with new tools and releases.
    Downloads: 3 This Week
  • 10
    LLMs-from-scratch

    Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

    LLMs-from-scratch is an educational codebase that walks through implementing modern large-language-model components step by step. It emphasizes building blocks—tokenization, embeddings, attention, feed-forward layers, normalization, and training loops—so learners understand not just how to use a model but how it works internally. The repository favors clear Python and NumPy or PyTorch implementations that can be run and modified without heavyweight frameworks obscuring the logic. Chapters...
    Downloads: 3 This Week
  • 11
    OSS-Fuzz Gen

    LLM powered fuzzing via OSS-Fuzz

    OSS-Fuzz-Gen is a companion project that helps automatically create or improve fuzz targets for open-source codebases, aiming to increase coverage in OSS-Fuzz with minimal maintainer effort. It analyses a library’s APIs, examples, and tests to propose harnesses that exercise parsers, decoders, or protocol handlers—precisely the code where fuzzing pays off. The system integrates with modern LLM-assisted workflows to draft harness code and then iterates based on build errors or low coverage signals. Importantly, it aligns with OSS-Fuzz conventions, generating corpus seeds, build rules, and sanitizer settings so projects can plug in quickly. Reports highlight what functions were targeted, how coverage evolved, and where manual hints could unlock more paths. ...
    Downloads: 0 This Week
  • 12
    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    ...It is evaluated on many benchmarks (speech recognition, speech translation, sound classification, emotion recognition, etc.) and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. Code and examples are provided for Hugging Face transformers, with usage via AutoProcessor and the model classes. It reports high performance on standard benchmarks for ASR, speech-emotion recognition, vocal sound classification, and speech translation.
    Downloads: 0 This Week
  • 13
    langrocks

    Tools like web browser, computer access and code runner for LLMs

    Langrocks is a collection of tools for LLMs, such as a web browser, computer access, and a code runner.
    Downloads: 0 This Week
  • 14
    Curated Transformers

    PyTorch library of curated Transformer models and their components

    State-of-the-art transformers, brick by brick. Curated Transformers is a transformer library for PyTorch. It provides state-of-the-art models that are composed of a set of reusable components. Supports state-of-the-art transformer models, including LLMs such as Falcon, Llama, and Dolly v2. Implementing a feature or bugfix benefits all models. For example, all models support 4/8-bit inference through the bitsandbytes library and each model can use the PyTorch meta device to avoid unnecessary...
    Downloads: 3 This Week
  • 15
    files-to-prompt

    Concatenate a directory full of files into a single prompt

    ...It includes rich filtering controls, letting you limit by extension, include or skip hidden files, and ignore paths that match glob patterns or .gitignore rules. The output format is flexible: you can emit plain text, Markdown with fenced code blocks, or a Claude-XML style format designed for structured multi-file prompts. It can read file paths from stdin (including NUL-separated paths), which makes it easy to combine with find, rg, or other shell tools.
    Downloads: 0 This Week
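The filtering and Markdown output described for files-to-prompt can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's implementation or its command-line interface. The function names `iter_files` and `to_markdown` and their parameters are hypothetical, and `.gitignore` handling is left out.

```python
import fnmatch
import os
from typing import Iterable, Iterator, Optional, Tuple


def iter_files(
    root: str,
    extensions: Optional[Tuple[str, ...]] = None,
    include_hidden: bool = False,
    ignore_patterns: Iterable[str] = (),
) -> Iterator[str]:
    """Yield paths under `root` that pass the extension, hidden-file, and glob filters."""
    patterns = list(ignore_patterns)

    def ignored(name: str) -> bool:
        if not include_hidden and name.startswith("."):
            return True
        return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if not ignored(d))
        for name in sorted(filenames):
            if ignored(name):
                continue
            if extensions and not name.endswith(extensions):
                continue
            yield os.path.join(dirpath, name)


def to_markdown(paths: Iterable[str]) -> str:
    """Concatenate files into one prompt, each in its own fenced code block."""
    fence = "`" * 3
    parts = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            body = handle.read().rstrip()
        language = os.path.splitext(path)[1].lstrip(".")
        parts.append(f"{path}\n{fence}{language}\n{body}\n{fence}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(to_markdown(iter_files(".", extensions=(".py",), ignore_patterns=["test_*"])))
```

Pruning `dirnames` in place keeps the walk from entering hidden or ignored directories at all, which matters for large trees such as `.git`.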
  • 16
    LLaMA Models

    Utilities intended for use with Llama models

    ...The project’s issues and releases reflect an actively used coordination point for the ecosystem, where guidance, utilities, and compatibility notes are published. It complements separate repos that carry code and demos (for example inference kernels or cookbook content) by keeping authoritative metadata and specs here. Model lineages and size variants are documented externally (e.g., Llama 3.x and beyond), with this repo providing the “single source of truth” links and utilities. In practice, teams use llama-models as a reference when selecting variants, aligning licenses, and wiring in helper scripts for deployment.
    Downloads: 2 This Week
  • 17
    Guardrails

    Adding guardrails to large language models

    Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs). At the heart of Guardrails is the RAIL spec. RAIL is intended to be a language-agnostic, human-readable format for specifying structure and type information, validators and corrective actions over LLM outputs. We create a RAIL spec to describe the expected structure and types of the LLM output, the quality criteria for the output to be considered valid,...
    Downloads: 1 This Week
  • 18
    how-to-optim-algorithm-in-cuda

    How to optimize algorithms in CUDA

    how-to-optim-algorithm-in-cuda is an open educational repository focused on teaching developers how to optimize algorithms for high-performance execution on GPUs using CUDA. The project combines technical notes, code examples, and practical experiments that demonstrate how common computational kernels can be optimized to improve speed and memory efficiency. Instead of presenting only theoretical explanations, the repository includes hand-written CUDA implementations of fundamental operations such as reductions, element-wise computations, softmax, and attention mechanisms. ...
    Downloads: 1 This Week
  • 19
    llm.c

    LLM training in simple, raw C/CUDA

    llm.c is a minimalist, systems-level implementation of a small transformer-based language model in C that prioritizes clarity and educational value. By stripping away heavy frameworks, it exposes the core math and memory flows of embeddings, attention, and feed-forward layers. The code illustrates how to wire forward passes, losses, and simple training or inference loops with direct control over arrays and buffers. Its compact design makes it easy to trace execution, profile hotspots, and understand the cost of each operation. Portability is a goal: it aims to compile with common toolchains and run on modest hardware for small experiments. ...
    Downloads: 1 This Week
  • 20
    Mosec

    A high-performance ML model serving framework that offers dynamic batching

    Mosec is a high-performance and flexible model-serving framework for building ML model-enabled backends and microservices. It bridges the gap between the machine learning models you have just trained and an efficient online service API.
    Downloads: 1 This Week
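Dynamic batching, which the Mosec entry mentions, generally means grouping requests that arrive close together so the model runs once per group. The sketch below shows the general technique with the Python standard library only. It is not Mosec's API, and the function name `collect_batch` and the parameters `max_batch_size` and `max_wait_s` are hypothetical.

```python
import queue
import time
from typing import Any, List


def collect_batch(
    requests: "queue.Queue[Any]", max_batch_size: int, max_wait_s: float
) -> List[Any]:
    """Wait for one request, then gather more until the batch is full or time runs out."""
    batch = [requests.get()]  # block until at least one request is available
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # wait budget spent with no further requests
    return batch


if __name__ == "__main__":
    pending: "queue.Queue[int]" = queue.Queue()
    for request_id in range(5):
        pending.put(request_id)
    print(collect_batch(pending, max_batch_size=4, max_wait_s=0.05))
    print(collect_batch(pending, max_batch_size=4, max_wait_s=0.05))
```

The two limits trade throughput against latency: a larger `max_batch_size` makes better use of the accelerator, while `max_wait_s` caps how long the first request in a batch can be delayed.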
  • 21
    embedchain

    Framework to easily create LLM powered bots over any dataset

    Embedchain is a framework to easily create LLM-powered bots over any dataset. If you want a JavaScript version, check out embedchain-js. Embedchain empowers you to create chatbot models similar to ChatGPT, using your own evolving dataset. Start building LLM-powered bots in under 30 seconds.
    Downloads: 0 This Week
  • 22
    ModernBERT

    Bringing BERT into modernity via both architecture changes and scaling

    ModernBERT is an open-source research project that modernizes the classic BERT encoder architecture by incorporating recent advances in transformer design, training techniques, and efficiency improvements. The goal of the project is to bring BERT-style models up to date with the capabilities of modern large language models while preserving the strengths of bidirectional encoder architectures used for tasks such as classification, retrieval, and semantic search. ModernBERT introduces...
    Downloads: 1 This Week
  • 23
    Purple Llama

    Set of tools to assess and improve LLM security

    ...Its scope spans input and output safeguards, cybersecurity-focused evaluations, and reference shields that can be inserted at inference time. The project evolves as a hub for safety research artifacts like Llama Guard and Code Shield, along with dataset specs and how-to guides for integrating checks into applications. CyberSecEval, one of its flagship components, provides repeatable evaluations for security risk, including agent-oriented tasks such as automated patching benchmarks. The aim is to make safety practical: ship testable baselines, publish metrics, and provide drop-in implementations that reduce friction for teams adopting Llama. ...
    Downloads: 1 This Week
  • 24
    LLMs-Zero-to-Hero

    From zero to large language model (LLM) hero

    LLMs-Zero-to-Hero is an open-source educational project designed to guide learners through the complete process of understanding and building large language models from the ground up. The repository presents a structured learning pathway that begins with fundamental concepts in machine learning and progresses toward advanced topics such as model pre-training, fine-tuning, and deployment. Rather than relying entirely on existing frameworks, the project encourages readers to implement...
    Downloads: 0 This Week
  • 25
    Happy-LLM

    Large Language Model Principles and Practice Tutorial from Scratch

    Happy-LLM is an open-source educational project created by the Datawhale AI community that provides a structured and comprehensive tutorial for understanding and building large language models from scratch. The project guides learners through the entire conceptual and practical pipeline of modern LLM development, starting with foundational natural language processing concepts and gradually progressing to advanced architectures and training techniques. It explains the Transformer...
    Downloads: 0 This Week