LLMOps

LLM Observability & Tracing

  • Agenta-AI/agenta - Agenta is an open-source LLMOps platform designed to accelerate the development of reliable LLM applications, offering integrated prompt management, evaluation, and observability features.
  • Arize-ai/openinference - OpenInference provides conventions and instrumentation for OpenTelemetry to enable detailed tracing and observability of AI applications, especially those built with LLMs and agents.
  • Arize-ai/phoenix - Phoenix is an open-source AI observability platform for LLM application experimentation, evaluation, and troubleshooting, providing tracing, evaluation, dataset management, prompt management, and a...
  • chirpz-ai/pandaprobe - PandaProbe is an open-source agent engineering platform for collaboratively tracing, evaluating, monitoring, and debugging AI agents, with integrations for LangGraph, CrewAI, and other agent SDKs.
  • cloudshipai/station - Station is an open-source, Git-backed runtime for deploying and orchestrating intelligent multi-agent AI systems on self-hosted infrastructure with built-in evaluation and observability.
  • comet-ml/opik - Opik is an open-source platform for comprehensive observability, evaluation, and optimization of LLM applications, RAG systems, and agentic workflows.
  • cyberark/agentwatch - Agentwatch is a platform-agnostic observability framework for monitoring AI agent interactions, LLM calls, and tool usage across various AI development frameworks, providing real-time insights and ...
  • deepsense-ai/ragbits - Ragbits is a toolkit for rapid development and operation of GenAI applications, providing building blocks for LLM integration, RAG processing, multi-agent workflows, observability, and testing.
  • evilmartians/agent-prism - AgentPrism is an open-source library of React components for visualizing traces from AI agents, turning complex OpenTelemetry and Langfuse data into clear, debuggable diagrams.
  • Helicone/helicone - Helicone is an open-source LLM observability platform and AI gateway that provides monitoring, evaluation, prompt management, and intelligent routing for large language models.
  • however-yir/knowledgeops-agent - KnowledgeOps Agent is an enterprise-grade Spring AI platform designed for multi-tenant RAG, tool calling, and agent workflow orchestration, featuring robust security, observability, and evaluation ...
  • Javis603/token-monitor - A real-time desktop widget to monitor token usage, AI limits, and costs across various AI coding tools, featuring multi-device synchronization and historical usage trends.
  • langfuse/langfuse - Langfuse is an open-source LLM engineering platform for developing, monitoring, evaluating, and debugging AI applications, offering observability, prompt management, and evaluation capabilities.
  • langfuse/oss-llmops-stack - An open-source, modular LLMOps stack combining LiteLLM for LLM API unification, routing, and cost control, with Langfuse for detailed observability, prompt versioning, and performance evaluation in...
  • latitude-dev/latitude-llm - Latitude is an open-source AI monitoring platform that provides issue detection, human-aligned evaluations, and agent-native tracing for LLM applications and AI agents.
  • liaohch3/claude-tap - A local proxy and trace viewer for AI coding agents, capturing and inspecting API traffic to debug agent behavior and analyze prompts, messages, and tool definitions.
  • lmnr-ai/lmnr - Laminar is an open-source observability platform purpose-built for AI agents, offering tracing, evaluation, AI monitoring, SQL access, dashboards, and data annotation for LLM-based applications.
  • msfirebird/claw-lens - An open-source, local-first observability dashboard for OpenClaw AI agents, providing cost analytics, live monitoring, deep session inspection, and security auditing.
  • openlit/openlit - OpenLIT is an open-source platform offering OpenTelemetry-native observability for LLMs, including GPU monitoring, guardrails, evaluations, prompt management, and API key vault, to streamline AI de...
  • palico-ai/palico-ai - Palico AI is an integrated framework for iterative development, evaluation, and production of LLM applications, offering tools for building, improving performance, and debugging.
  • pydantic/logfire - Pydantic Logfire is an observability platform for Python applications, built on OpenTelemetry, that provides detailed insights into production systems, including those using LLMs and FastAPI.
  • raga-ai-hub/RagaAI-Catalyst - RagaAI Catalyst is a Python SDK for comprehensive observability, monitoring, and evaluation of AI agents and LLM applications, offering tracing, debugging, and advanced analytics.
  • Scale3-Labs/langtrace - Langtrace is an open-source, OpenTelemetry-based observability tool providing real-time tracing, evaluations, and metrics for LLM applications, with coverage of LLMs, LLM frameworks, and vector databases.
  • stainlu/hermes-labyrinth - Hermes Labyrinth is a read-only observability plugin for Hermes Agent, visualizing autonomous agent journeys and interactions with prompts, tools, and memory into a navigable map.
  • traceloop/openllmetry - OpenLLMetry provides open-source observability for LLM applications by extending OpenTelemetry to capture traces and metrics from LLM providers, vector databases, and AI frameworks.
  • traceloop/openllmetry-js - OpenLLMetry-JS provides open-source observability for LLM applications in JavaScript/TypeScript, built on OpenTelemetry to trace interactions with LLM providers and vector databases.
  • traceroot-ai/traceroot - TraceRoot is an open-source observability and self-healing platform for AI agents, providing tracing, AI-powered debugging, and detectors for production issues like hallucinations and tool failures.
  • VasiHemanth/tokentelemetry - TokenTelemetry is a 100% local, open-source observability dashboard for AI coding and autonomous agents, tracking token usage, costs, tool calls, and session traces.
  • vllora/vllora - vLLora is a lightweight, real-time debugging and observability tool for AI agents, providing tracing and analysis of LLM interactions via an OpenAI-compatible API.
  • VoltAgent/voltagent - VoltAgent is an end-to-end AI Agent Engineering Platform offering an open-source TypeScript framework for building intelligent agents and a VoltOps Console for observability, automation, deployment...

↑ Back to TOC

LLM Evaluation & Testing

  • alphadl/AdaRubrics - AdaRubric offers task-adaptive rubrics and dense reward signals for evaluating LLM agent trajectories, enhancing evaluation reliability and reward learning.
  • athina-ai/athina-evals - A Python SDK offering 50+ preset and custom evaluations for LLM-generated responses, integrating with the Athina IDE for experimentation and dataset comparison.
  • confident-ai/deepeval - DeepEval is an open-source LLM evaluation framework, offering a variety of metrics and tools for assessing the performance of AI agents, RAG pipelines, and chatbots through unit testing.
  • coze-dev/coze-loop - Coze Loop is an open-source platform offering full-lifecycle management for AI agents, encompassing development, debugging, evaluation, and monitoring.
  • cvs-health/langfair - LangFair is a Python library for conducting use-case level bias and fairness assessments of large language models (LLMs) by allowing users to bring their own prompts for evaluation.
  • cvs-health/uqlm - UQLM is a Python library for detecting and mitigating hallucination in Large Language Model (LLM) outputs using uncertainty quantification techniques.
  • cyberark/FuzzyAI - FuzzyAI is an automated LLM fuzzing tool designed to identify and mitigate potential jailbreaks and security vulnerabilities in LLM APIs.
  • darkrishabh/agent-skills-eval - A test runner for Agent Skills that evaluates the effectiveness of AI agent skills by comparing model performance with and without a skill, using a judge model for grading.
  • evidentlyai/evidently - Evidently is an open-source Python framework for evaluating, testing, and monitoring ML and LLM systems, providing comprehensive data and model quality checks from experiments to production.
  • EvolvingLMMs-Lab/lmms-eval - LMMs-Eval is a unified, reproducible, and efficient evaluation toolkit for large multimodal models (LMMs) across text, image, video, and audio tasks.
  • future-agi/future-agi - Future AGI is an open-source, end-to-end platform for evaluating, observing, simulating, and protecting LLM and AI agent applications, offering tracing, evals, guardrails, and a performant gateway.
  • GiovanniPasq/chunky - Chunky is an open-source toolkit for preparing documents for Retrieval Augmented Generation (RAG) pipelines, offering PDF-to-Markdown conversion, cleaning, chunk inspection, and chunking strategy c...
  • Giskard-AI/giskard-oss - Giskard is an open-source Python library for testing and evaluating agentic systems and LLM applications, offering tools for scenario-based testing, red teaming, and vulnerability scanning.
  • hegelai/prompttools - PromptTools provides open-source utilities for experimenting with, testing, and evaluating prompts, LLMs, and vector databases through code, notebooks, and a local playground.
  • ianarawjo/ChainForge - ChainForge is an open-source visual programming environment designed for battle-testing, comparing, and evaluating prompts and LLM responses across different models and settings.
  • ifixai-ai/iFixAi - iFixAi is a diagnostic tool that evaluates AI models and agents for operational misalignment, including fabrication, manipulation, deception, unpredictability, and opacity, by running up to 45 insp...
  • iMeanAI/WebCanvas - WebCanvas is an open-source framework for building, training, and evaluating LLM-based web agents in dynamic, real-time online environments.
  • JinjieNi/MixEval - MixEval is a dynamic, ground-truth-based benchmark and evaluation suite for large language and multimodal models.
  • juanjuandog/FinSight-AI - FinSight AI is an open-source AI equity research agent that develops evidence-grounded reports with resilient workflow orchestration, RAG evaluation, and comprehensive backend infrastructure.
  • JudgmentLabs/judgeval - Judgeval is an open-source Python SDK enabling continuous improvement for AI agents through OpenTelemetry-based tracing, agent-judge evaluations, and online monitoring of LLM-powered applications.
  • langwatch/better-agents - Better Agents is a CLI tool and set of standards for building, testing, and collaborating on AI agents, integrating with various frameworks and coding assistants for production readiness.
  • langwatch/langwatch - LangWatch is a platform for end-to-end LLM evaluations, AI agent testing, and observability, offering tools for simulations, performance monitoring, prompt optimization, and an AI gateway for gover...
  • LeoYeAI/myclaw-bench - MyClaw Bench provides a comprehensive benchmark for evaluating AI agents on OpenClaw, featuring 45 tasks across four difficulty tiers with a focus on real-world outcomes and complex reasoning.
  • LLAMATOR-Core/llamator - LLAMATOR is a Python framework for red teaming and security testing of chatbots, Generative AI systems, LLMs, RAGs, Agents, and Vision Language Models (VLMs) against various attacks and vulnerabili...
  • Marker-Inc-Korea/AutoRAG - AutoRAG is an open-source framework designed to automate the evaluation and optimization of Retrieval-Augmented Generation (RAG) pipelines using an AutoML-style approach for specific datasets.
  • msoedov/agentic_security - Agentic Security is an open-source vulnerability scanner and AI red teaming kit designed to test Large Language Models (LLMs) and agent workflows against jailbreaks, fuzzing, and multimodal attacks.
  • NVIDIA/garak - Garak is an open-source LLM vulnerability scanner designed to red-team and assess generative AI models for weaknesses like hallucination, data leakage, prompt injection, and toxicity.
  • onyx-dot-app/EnterpriseRAG-Bench - EnterpriseRAG-Bench offers a benchmark dataset and evaluation framework for RAG systems using realistic company internal documents and a diverse set of questions.
  • PacificAI/langtest - LangTest is an open-source library for testing and evaluating Large Language Models and NLP models for various quality aspects like robustness, bias, fairness, and accuracy.
  • plurai-ai/intellagent - IntellAgent evaluates and optimizes conversational AI agents through simulated, realistic synthetic interactions to uncover failure points and improve performance.
  • prometheus-eval/prometheus-eval - Prometheus-Eval is a framework and a collection of open-source LLM judges designed for evaluating the quality of LLM responses in generation tasks, supporting both absolute grading and pairwise ran...
  • promptfoo/promptfoo - Promptfoo is a CLI and library for evaluating LLM applications, offering automated testing, red teaming, and vulnerability scanning for prompts, models, agents, and RAGs.
  • Raudaschl/rag-fusion - RAG-Fusion enhances RAG with multi-query generation and Reciprocal Rank Fusion to improve retrieval, especially under term mismatch, and includes an evaluation harness using NFCorpus/BEIR.
  • relari-ai/continuous-eval - continuous-eval is an open-source framework for data-driven, modular evaluation of LLM-powered applications, offering a comprehensive metric library and probabilistic evaluation capabilities.
  • rhesis-ai/rhesis - Rhesis is an open-source collaborative testing platform for LLM and agentic applications, providing AI-powered test generation, conversation simulation, adversarial testing, and comprehensive evalu...
  • superlinear-ai/raglite - RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) that provides configurable components for LLMs, vector databases, and rerankers, with optimized strategies for chunking, retriev...
  • TIGER-AI-Lab/ClawBench - ClawBench is an open-source benchmark for evaluating AI browser agents on a diverse set of everyday online tasks across live websites, measuring end-to-end task success.
  • truera/trulens - TruLens is an open-source framework for systematically evaluating and tracking LLM applications and AI agents, providing fine-grained instrumentation and comprehensive feedback functions.
  • uptrain-ai/uptrain - UpTrain is an open-source platform providing evaluation and monitoring for Generative AI applications, offering preconfigured checks, root cause analysis, and production monitoring for LLMs.
  • vectara/open-rag-eval - An open-source Python toolkit for evaluating Retrieval-Augmented Generation (RAG) pipelines, offering flexible metrics and connectors without requiring golden answers.
  • vibrantlabsai/ragas - Ragas is an evaluation framework for LLM applications that provides objective metrics, test data generation, and feedback loops for continuous improvement.
  • ZhangJinHaHaHa/AgentLens - AgentLens is a decentralized marketplace and infrastructure for AI Agents, providing verifiable proof of capabilities, security, and track record using on-chain auditing, TEE attestation, and ZK pr...

↑ Back to TOC

Prompt Management

  • agentmark-ai/agentmark - AgentMark is an open-source platform for defining, managing, and evaluating AI agent prompts, datasets, and traces directly within a Git repository using OpenTelemetry for observability.
  • austin-starks/Promptimizer - An automated, AI-powered framework that uses genetic algorithms and machine learning to optimize LLM prompts, illustrated with an AI-driven stock screening example.
  • BoundaryML/baml - BAML (Basically a Made-up Language) is an AI framework that facilitates reliable AI workflows and agents by transforming prompt engineering into schema engineering for structured output generation.
  • bujue3709/GPT-Conversation-Toolkit - A browser extension for the ChatGPT web platform, offering conversation management, export, search, prompt management, and timeline navigation features to enhance user experience.
  • dot-agent/nextpy - Nextpy is a framework for building self-modifying software, focusing on guardrails, structured outputs, a powerful prompt engine for pre-compiling and session state, and optimized code generation f...
  • genkit-ai/genkit - Genkit is an open-source framework by Google for building and operating AI-powered applications across multiple languages, featuring unified APIs for various models, structured outputs, multi-modal...
  • langfuse/mcp-server-langfuse - A Model Context Protocol (MCP) server that integrates with Langfuse to provide prompt discovery, retrieval, and management capabilities.
  • lastmile-ai/aiconfig - AIConfig is an open-source framework for building and managing generative AI applications by version-controlling prompts, models, and parameters as JSON-serializable configurations.
  • microsoft/aici - AICI (Artificial Intelligence Controller Interface) allows building Wasm-based controllers to constrain and direct LLM output in real time, enabling advanced generation strategies.
  • microsoft/prompty - Prompty is a markdown-based file format (.prompty) and runtime for creating, managing, and executing LLM prompts, providing tools for development, previewing, and tracing.
  • minipuft/claude-prompts - A prompt template server for Claude, enabling hot-reload, thinking frameworks, and quality gates for crafting reusable prompts and orchestrating agentic workflows.
  • neuron-core/neuron-ai - Neuron is a PHP framework for building and orchestrating AI agents, supporting LLM integration, prompt management, RAG, multi-agent workflows, and observability.
  • Open-Source-Legal/OpenContracts - OpenContracts is an open-source document intelligence platform that processes unstructured documents into a programmable citation graph using AI agents, structured extraction, and a Model Context P...
  • patterns-ai-core/langchainrb - Langchain.rb provides a Ruby interface for building LLM-powered applications, offering a unified API for various LLM providers, prompt management, and RAG capabilities.
  • pezzolabs/pezzo - Pezzo is an open-source, cloud-native LLMOps platform for streamlined prompt design, version management, instant delivery, collaboration, troubleshooting, and observability of AI operations.
  • SynaLinks/synalinks - SynaLinks is an open-source neuro-symbolic framework for creating, training, evaluating, and deploying advanced LLM-based applications like RAGs, autonomous agents, and self-evolving reasoning syst...

↑ Back to TOC

LLM Gateways & Proxies

  • adaline/gateway - Adaline Gateway is a fully local, production-grade SDK providing a unified interface for calling 300+ LLMs with built-in features like batching, retries, caching, callbacks, and OpenTelemetry ...
  • agentgateway/agentgateway - Agentgateway is an open-source proxy offering unified connectivity and governance for AI agents and LLM providers, encompassing security, observability, and advanced traffic management features.
  • ai-forever/gpt2giga - A FastAPI proxy that translates OpenAI- and Anthropic-compatible API requests to the GigaChat API, enabling seamless integration of GigaChat with existing LLM applications.
  • apache/apisix - Apache APISIX is a dynamic, real-time, high-performance API Gateway that can also function as an AI Gateway, providing AI proxying, load balancing for LLMs, and robust security for AI agents.
  • atopos31/llmio - LLMIO is a Go-based LLM load-balancing gateway providing a unified API, weighted scheduling, observability, and an admin UI for managing various LLM providers.
  • BerriAI/litellm - LiteLLM is an open-source AI Gateway and Python SDK providing a unified interface to over 100 LLM providers, with features like cost tracking, guardrails, load balancing, and observability.
  • bestruirui/octopus - Octopus is a self-hosted LLM API aggregation and load balancing service that provides a unified gateway for multiple LLM providers, intelligent routing, and analytics for cost and usage tracking.
  • bionic-gpt/bionic-gpt - Bionic is an on-premise, secure, and scalable LLM gateway and RAG platform designed to replace ChatGPT while maintaining data confidentiality and offering advanced features like AI assistants, toke...
  • bitrouter/bitrouter - BitRouter is an open-source, local-first LLM router built in Rust that optimizes AI agent performance and cost by dynamically routing requests to the most appropriate LLM, supporting multiple provi...
  • caidaoli/ccLoad - ccLoad is an AI API gateway that provides smart routing, automatic failover, exponential cooldown, multi-URL scheduling, real-time monitoring, and cost control for various LLM APIs.
  • casdoor/casdoor - Casdoor is an open-source, "AI-first" Identity and Access Management (IAM) and Model Context Protocol (MCP) gateway, providing authentication and authorization for AI applications and agents.
  • Chleba/ollamaMQ - ollamaMQ is a high-performance, asynchronous proxy and load balancer for Ollama and LM Studio APIs, providing multi-backend load balancing, fair-share queuing, model-aware routing, and a real-time ...
  • coaidev/coai - CoAI.Dev is a next-generation, multi-tenant LLM gateway and AIGC solution offering unified API access, load balancing, cost management, and various AI application features for over 200 models from ...
  • dataiku/kiji-proxy - Kiji Privacy Proxy is an intelligent privacy layer for AI APIs that automatically detects and masks personally identifiable information (PII) in requests to AI services.
  • decolua/9router - 9Router is an AI router and token saver that connects various AI coding tools to over 40 AI providers, optimizing usage with auto-fallback, quota tracking, and token compression.
  • diegosouzapw/OmniRoute - OmniRoute is a free AI gateway that unifies access to over 170 AI providers, offering token compression and auto-fallback, and aggregating free tiers to provide billions of free tokens monthly.
  • dwgx/WindsurfAPI - WindsurfAPI is an OpenAI and Anthropic compatible API proxy that translates requests to Windsurf's internal gRPC protocol, providing access to over 100 LLM models with account pooling, rate limitin...
  • ENTERPILOT/GoModel - GoModel is a fast, lightweight AI gateway written in Go, providing a unified OpenAI-compatible API for various LLM providers with observability, guardrails, streaming, and cost tracking.
  • envoyproxy/ai-gateway - Envoy AI Gateway leverages Envoy Gateway to manage and optimize request traffic flows to various Generative AI services and self-hosted models, offering a tiered gateway approach for robust AI infr...
  • Fast-Editor/Lynkr - Lynkr is an HTTP proxy CLI tool designed to optimize interactions with LLMs, particularly for AI coding assistants, by compressing tokens, managing caching, and routing requests for efficiency and ...
  • ferro-labs/ai-gateway - Ferro Labs AI Gateway is a high-performance Go-native LLM gateway for routing requests across 30+ providers with features like caching, guardrails, A/B testing, and cost controls.
  • guanxiaol/WindsurfPoolAPI - WindsurfPoolAPI is an enterprise-grade multi-account pool proxy for the Windsurf AI platform, supporting over 113 models via OpenAI and Anthropic APIs with features like load balancing and token an...
  • higress-group/higress - Higress is a cloud-native AI gateway based on Istio and Envoy, providing unified management, observability, and traffic control for LLM APIs and Model Context Protocol (MCP) servers.
  • intentee/paddler - Paddler is an open-source LLM/VLM load balancer and serving platform for self-hosting and scaling models, built around llama.cpp for efficient inference with dynamic model swapping and observability.
  • kaitranntt/ccs - CCS is a multi-provider profile and runtime manager for various AI models and APIs, enabling seamless switching between Claude, Gemini, Copilot, OpenRouter, and local models without configuration o...
  • katanemo/plano - Plano is an AI-native proxy and data plane for agentic applications, providing built-in orchestration, safety, observability, and intelligent LLM routing to simplify the production deployment of AI...
  • Kenza-AI/sagify - Sagify simplifies LLM and ML model deployment, management, and inference on AWS SageMaker, featuring an LLM Gateway for unified access to various large language models.
  • Kong/kong - Kong Gateway is a cloud-native API and AI gateway offering high performance, extensibility via plugins, and advanced AI traffic capabilities including multi-LLM support, semantic security, and cach...
  • LeenHawk/gproxy - GPROXY is a Rust-based, high-performance, multi-provider LLM proxy server that unifies OpenAI, Claude, and Gemini-style APIs, offering multi-tenant authorization, rate limiting, quota management, a...
  • maximhq/bifrost - Bifrost is a high-performance AI gateway that unifies access to over 23 providers through a single OpenAI-compatible API, offering features like automatic failover, load balancing, semantic caching...
  • Mintplex-Labs/anything-llm - AnythingLLM is an all-in-one local-first AI application for chatting with documents, managing AI agents, and integrating with various LLMs and vector databases, offering dynamic model routing and m...
  • mnfst/manifest - Manifest is a sophisticated LLM gateway and router designed to optimize AI application costs and performance by intelligently routing queries to the most suitable LLM provider and model.
  • NadirRouter/NadirClaw - NadirClaw is an open-source LLM router and AI cost optimizer that intelligently routes prompts to different language models based on complexity, reducing API costs by 40-70% through an OpenAI-compa...
  • Nayjest/lm-proxy - LM-Proxy is a lightweight, OpenAI-compatible HTTP LLM proxy/gateway for multi-provider inference (Google, Anthropic, OpenAI, PyTorch), supporting real-time streaming, API key management, and dynami...
  • nextlevelbuilder/goclaw - GoClaw is a multi-tenant AI agent platform and gateway built in Go, enabling the deployment and orchestration of AI agent teams with extensive LLM provider support, sophisticated memory management,...
  • Nexus-Router/nexus - Nexus is an AI gateway that unifies access to multiple LLM providers and Model Context Protocol (MCP) servers, offering robust routing, security, and governance for AI stacks.
  • nyroway/nyro - Nyro is a self-hosted AI gateway that translates protocols between different AI tools and model providers, enabling interoperability and flexible model routing without code changes.
  • octelium/octelium - Octelium is a self-hosted zero-trust secure access platform operating as a ZTNA, VPN, API/AI/MCP gateway, and PaaS, with specific features for AI/LLM gateway functionality.
  • open-bias/open-bias - Open Bias is an open-source reliability harness that acts as a proxy between applications and LLM providers to enforce runtime rules and policies, preventing off-policy behavior.
  • packyme/privacy-filter - Privacy Filter is a Go-based LLM privacy gateway that redacts PII and secrets from text with millisecond latency, specifically designed to ensure data privacy before reaching large language models.
  • PAIArtCom/Clipal - Clipal is a local LLM API gateway and reverse proxy designed for developer productivity, offering unified access, failover, and key management for AI coding assistants like Claude Code, Codex CLI, ...
  • peva3/SmarterRouter - SmarterRouter is an intelligent LLM gateway and VRAM-aware router that profiles models, aggregates benchmarks, and automatically routes queries to the best available LLM, supporting local and exter...
  • Portkey-AI/gateway - Portkey AI Gateway is a fast, open-source AI gateway designed for routing requests to over 1,600 LLMs, featuring integrated guardrails, automatic retries, and load balancing for reliable and secure...
  • QuantumNous/new-api - new-api is a unified LLM gateway and AI asset management system that enables aggregation, distribution, and cross-conversion of various LLMs into OpenAI, Claude, or Gemini compatible formats, offer...
  • reshaprio/reshapr - reShapr is an open-source, no-code MCP server that transforms traditional REST, GraphQL, and gRPC APIs into LLM-friendly tools, optimizing context windows and enabling AI-native API access.
  • romgX/openrelay - OpenRelay is an AI model router and proxy that unifies numerous free and paid AI model quotas into a single local endpoint, enabling their use across various AI tools and IDEs.
  • schmitech/orbit - ORBIT is a self-hosted AI gateway and retrieval-adapter layer designed for private, multi-model RAG applications, offering secure inference, data retrieval, and agentic tool-calling capabilities.
  • starbaser/ccproxy - ccproxy is a CLI-based transparent network interceptor and proxy for LLM clients, enabling cross-provider routing, request/response transformation, and custom hooks for various large language models.
  • taichuy/1flowbase - 1flowbase is an open-source virtual model gateway that allows users to build multi-model workflows, publish them as OpenAI/Claude-compatible endpoints, and gain visibility into trace, token, latenc...
  • theopenco/llmgateway - LLM Gateway is an open-source API gateway for Large Language Models, providing unified access, API key management, usage analytics, multi-provider routing, and performance monitoring for various LL...
  • ThinkWatchProject/ThinkWatch - ThinkWatch is an enterprise-grade AI gateway for secure, audited, and governed access to AI APIs and Model Context Protocol (MCP) tools, providing unified proxying, RBAC, rate limiting, and cost trac...
  • thushan/olla - Olla is a high-performance, lightweight proxy and load balancer for LLM infrastructure, providing intelligent routing, automatic failover, and unified model discovery across diverse inference backe...
  • TPIsoftwareOSPO/digiRunner-Open-Source - digiRunner is an enterprise-grade API Gateway that acts as a unified control plane for both microservices and AI services, providing governance, cost control, and prompt management for LLMs.
  • traceloop/hub - Traceloop Hub is a high-performance, OpenTelemetry-based LLM gateway written in Rust, centralizing control and tracing of LLM calls across multiple providers with built-in observability.
  • vllm-project/semantic-router - vLLM Semantic Router is an intelligent routing system designed for managing and orchestrating diverse AI/ML models (mixture-of-models) across various environments, focusing on efficiency, safety, a...
  • voidmind-io/voidllm - VoidLLM is a privacy-first, self-hosted LLM proxy and AI gateway designed for teams, offering features like load balancing, multi-provider routing, API key management, usage tracking, and rate limi...
  • w8123/EnterpriseAgentFramework - EnterpriseAgentFramework is a Java/Spring Boot platform for registering, governing, orchestrating, and exposing enterprise APIs as AI capabilities for agents, focusing on production-grade AI operat...
  • Writesonic/GPTRouter - GPTRouter is an AI model gateway for managing multiple LLMs and image models, providing universal API access, smart fallbacks, automatic retries, and reduced latency for reliable AI application per...

↑ Back to TOC

AI Safety & Guardrails

  • agentcontrol/agent-control - Agent Control provides a centralized control plane for enforcing runtime guardrails and safety policies for AI agents, blocking prompt injections, PII leakage, and other risks.
  • cuga-project/cuga-agent - CUGA is an open-source generalist agent harness for enterprises, supporting complex task execution across the web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aw...
  • deadbits/vigil-llm - Vigil is a security scanner for LLM prompts and responses, designed to detect prompt injections, jailbreaks, and other adversarial attacks using various scanning methods like vector databases, YARA...
  • Justin0504/Aegis - Aegis is a pre-execution firewall for AI agents, providing runtime policy enforcement, cryptographic audit trails, human-in-the-loop approvals, and a kill switch without code changes.
  • microsoft/agent-governance-toolkit - AI Agent Governance Toolkit (AGT) provides policy enforcement, identity management, execution sandboxing, and reliability engineering to secure autonomous AI agents in production.
  • microsoft/presidio - Presidio is an open-source framework by Microsoft for detecting, redacting, masking, and anonymizing sensitive data (PII/PHI) across text, images, and structured data, suitable as an AI safety guar...
  • NVIDIA-NeMo/Guardrails - NVIDIA NeMo Guardrails is an open-source toolkit for adding programmable guardrails to LLM-based conversational applications, focusing on safety, security, and controlled dialog.
  • pegasi-ai/reins - Reins provides security controls for AI agents by enforcing deterministic policies, scanning for vulnerabilities, tracking drift with an immutable audit trail, and intervening on risky actions.
  • privacera/paig - PAIG (Privacera AI Guardrails) is an open-source framework designed to protect Generative AI applications by ensuring security, safety, and observability for responsible AI deployment.
  • protectai/llm-guard - LLM Guard is a comprehensive open-source security toolkit designed to fortify Large Language Model (LLM) interactions by providing robust sanitization, malicious content detection, data leakage pre...
  • SponsioLabs/Sponsio - Sponsio provides deterministic runtime safety solutions for AI agents, enforcing contracts on agent procedures in milliseconds with zero LLM cost.
  • superagent-ai/superagent - Superagent is an open-source SDK providing safety features for AI applications, including prompt injection detection, PII redaction, repository scanning for threats, and red teaming capabilities fo...
  • Tencent/AI-Infra-Guard - AI-Infra-Guard is a full-stack AI red teaming platform providing comprehensive security analysis, vulnerability scanning, and jailbreak evaluation for AI ecosystems and LLMs.
  • toby-bridges/api-relay-audit - A local security audit tool for AI API relays and LLM proxies, designed to detect prompt injection, model substitution, tool-call rewriting, and other tampering.
  • ucsandman/DashClaw - DashClaw is an AI agent governance runtime that intercepts actions, enforces guard policies, manages approvals, and produces audit-ready decision trails for AI agents interacting with real systems.
  • wuyoscar/Internal-Safety-Collapse - ISC-Bench is a research project and benchmark for evaluating the "Internal Safety Collapse" vulnerability in LLMs, where models bypass safety controls when completing complex tasks.

Model Serving & Inference

Model Serving Frameworks

  • ajndkr/lanarky - Lanarky is a Python web framework built on FastAPI, specifically designed for creating LLM-powered microservices with native streaming support for HTTP and WebSockets.
  • basetenlabs/truss - Truss is a CLI tool and framework for packaging, deploying, and serving AI/ML models in production, handling containerization, dependency management, and GPU configuration.
  • bentoml/BentoDiffusion - BentoDiffusion offers example projects for self-hosting and deploying various diffusion models using BentoML for image and video generation through text prompts.
  • bentoml/BentoML - BentoML is an open-source framework for building, shipping, and scaling AI applications, providing tools to serve AI/ML models as production-ready API endpoints.
  • bentoml/OpenLLM - OpenLLM allows developers to self-host and run any open-source or custom LLMs as OpenAI-compatible API endpoints in the cloud, streamlining deployment and serving.
  • Bessouat40/RAGLight - RAGLight is a modular Python framework designed for Retrieval-Augmented Generation (RAG), offering flexible integration with various LLMs, embeddings, and vector stores, and supporting agentic RAG ...
  • BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU - A CPU-based inference API for YOLOv3 and YOLOv4 object detection models, designed for easy deployment via Docker and Docker Swarm with RESTful endpoints.
  • BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU - A GPU-accelerated REST API for real-time object detection inference using YOLOv3 and YOLOv4 Darknet models, deployable via Docker or Docker Swarm.
  • containers/podman-desktop-extension-ai-lab - A Podman Desktop extension for local LLM experimentation and application development using containerized inference servers and a recipe catalog.
  • containers/ramalama - RamaLama is an open-source tool that simplifies local AI model serving for inference using OCI containers, abstracting hardware complexities and enabling container-centric development for AI.
  • dphnAI/aphrodite-engine - Aphrodite Engine is a large-scale LLM inference engine built on vLLM's PagedAttention technology, optimizing the serving of Hugging Face-compatible models with high performance and support for vari...
  • efeslab/Nanoflow - Nanoflow is a high-performance, throughput-oriented serving framework for Large Language Models (LLMs) that utilizes intra-device parallelism and asynchronous CPU scheduling.
  • eightBEC/fastapi-ml-skeleton - A FastAPI skeleton application designed to speed up the deployment and serving of machine learning models in production.
  • GetSoloTech/solo-cli - Fast CLI for deploying and serving AI models, especially for physical AI and robotics, optimized for edge and on-device operations.
  • golf-mcp/golf - Golf is a Python framework for building and deploying AI agent servers, providing infrastructure for authentication, observability, and managing tools, prompts, and resources.
  • jina-ai/serve - A framework for building and deploying cloud-native AI services, with native support for ML frameworks, high-performance serving, LLM streaming, and Kubernetes/Docker Compose deployment.
  • jundot/omlx - oMLX is an LLM inference server optimized for Apple Silicon, offering continuous batching, tiered KV caching, and a macOS menu bar interface for managing models locally.
  • kserve/kserve - KServe is a standardized, distributed platform for serving generative and predictive AI models on Kubernetes, providing scalable deployment and management.
  • Michael-A-Kuykendall/shimmy - Shimmy is a pure-Rust WebGPU inference engine providing OpenAI-compatible endpoints for local GGUF models, featuring the Airframe engine and TurboShimmy INT4 KV cache compression for efficient GPU util...
  • ModelTC/LightLLM - LightLLM is a lightweight, scalable, and high-performance Python-based framework specifically designed for Large Language Model (LLM) inference and serving.
  • mosecorg/mosec - Mosec is a high-performance, Rust and Python-based ML model serving framework that provides dynamic batching, pipelined stages, and CPU/GPU support for efficient online inference.
  • msoedov/langcorn - LangCorn is an API server that facilitates serving LangChain LLM applications and agents using FastAPI, simplifying deployment and providing a robust inference solution.
  • openinfer-project/openinfer - openinfer is a pure Rust and CUDA LLM inference engine designed for serving frontier-scale models without Python framework runtimes, offering high performance and low resource footprint.
  • openvinotoolkit/model_server - OpenVINO Model Server is a high-performance serving system for AI/ML models, optimized for OpenVINO and Intel architectures, offering efficient model inference via gRPC or REST APIs including OpenA...
  • pipeless-ai/pipeless - Pipeless is an open-source framework for building and deploying real-time computer vision applications, managing multimedia pipelines, model inference, and multi-stream processing.
  • predibase/lorax - LoRAX is an inference server designed to efficiently serve thousands of fine-tuned LLMs using LoRA adapters on a single GPU, optimizing cost, throughput, and latency.
  • roboflow/inference - Roboflow Inference is a self-hostable server for deploying and managing computer vision models and AI workflows on edge devices or other infrastructure, supporting various models and vision tasks.
  • ServerlessLLM/ServerlessLLM - ServerlessLLM is a system for efficiently serving, multiplexing, and fine-tuning large language models on shared GPUs with ultra-fast model loading and an OpenAI-compatible API.
  • superduper-io/superduper - SuperDuperDB is an end-to-end framework for building AI applications and agents by integrating AI models directly into databases, facilitating inference, RAG, and AI agent orchestration.
  • superlinked/sie - Superlinked Inference Engine (SIE) is an open-source inference server that unifies the serving of over 85 pre-configured models for embeddings, reranking, and extraction, supporting seamless deploy...
  • Tejas-TA/predikit - Predikit bridges traditional ML models (scikit-learn, XGBoost) with AI agents by automatically generating LLM-callable tools with typed I/O and OpenAI function schemas, simplifying model integration.
  • thu-pacman/chitu - Chitu is a high-performance inference framework for large language models, designed for efficiency, flexibility, and availability across various hardware platforms.
  • underneathall/pinferencia - Pinferencia is a lightweight Python library for deploying machine learning models as inference servers with auto-generated GUI and REST APIs, supporting various ML frameworks and the KServe API.
  • vllm-project/vllm - vLLM is a high-throughput and memory-efficient serving and inference engine for large language models, featuring PagedAttention, continuous batching, and extensive hardware and model support.
  • vllm-project/vllm-omni - vLLM-Omni is an extension of vLLM designed for efficient serving and inference of omni-modality AI models, encompassing text, image, video, and audio data processing.

Inference Optimization

  • Adlik/Adlik - Adlik is an end-to-end framework for optimizing and accelerating deep learning inference across cloud, edge, and device environments.
  • AI-Hypercomputer/JetStream - JetStream is an optimized engine for large language model (LLM) inference on XLA devices, primarily TPUs, focusing on throughput and memory efficiency.
  • aiptimizer/TurboOCR - TurboOCR is a high-performance, GPU-accelerated OCR server designed for fast and accurate text extraction from images and PDFs, leveraging TensorRT and PP-OCRv5.
  • alibaba/rtp-llm - RTP-LLM is Alibaba's high-performance inference engine for Large Language Models, designed to accelerate the serving of diverse LLM applications in production environments.
  • brontoguana/krasis - Krasis is a hybrid LLM runtime focused on efficiently running large Mixture-of-Experts models on consumer-grade NVIDIA GPUs with limited VRAM.
  • chengzeyi/ParaAttention - ParaAttention accelerates Diffusion Transformer (DiT) model inference through context parallel attention and dynamic caching, supporting Ulysses and Ring-style parallelism.
  • EfficientMoE/MoE-Infinity - MoE-Infinity is a PyTorch library for cost-effective, fast, and easy serving of Mixture-of-Experts (MoE) Large Language Models, optimizing inference on memory-constrained GPUs.
  • insight-platform/Savant - Savant is an open-source framework for building high-performance, real-time multimedia AI applications, specifically computer vision and video analytics pipelines, on NVIDIA hardware for both edge ...
  • intel/xFasterTransformer - xFasterTransformer is an optimized inference solution for Large Language Models on Intel Xeon platforms, leveraging hardware capabilities for high performance and scalability.
  • interestingLSY/swiftLLM - SwiftLLM is a compact, high-performance LLM inference system designed for research, offering vLLM-equivalent performance with a significantly smaller codebase for easy understanding and modification.
  • jd-opensource/xllm - xLLM is an efficient inference engine optimized for Chinese AI accelerators, providing high-performance, low-latency deployment for large language models within enterprise settings.
  • kossisoroyce/timber - Timber is an AOT compiler that transforms classical ML models (XGBoost, LightGBM, scikit-learn, CatBoost, ONNX) into native C99 inference code for extremely fast, portable, and low-overhead model s...
  • msnh2012/Msnhnet - Msnhnet is a lightweight C++ inference framework for deploying PyTorch models, supporting architectures such as YOLO and ResNet on CPU and GPU, with optimizations for embedded devices.
  • nobodywho-ooo/nobodywho - NobodyWho is an efficient, on-device inference engine that enables local execution of LLMs and SLMs across various platforms without requiring API keys.
  • ome-projects/ome - OME (Open Model Engine) is a Kubernetes operator designed for enterprise-grade management, deployment, and serving of Large Language Models (LLMs), optimizing resource utilization and supporting va...
  • open-compress/claw-compactor - Claw Compactor is an LLM token compression engine that uses a 14-stage Fusion Pipeline for content-aware, reversible compression to reduce LLM inference costs and optimize context windows.
  • ovg-project/kvcached - kvcached is a KV cache library that introduces virtual memory abstraction for LLM serving on shared GPUs, enabling elastic and demand-driven KV cache allocation for improved GPU utilization under d...
  • PaddlePaddle/Paddle.js - Paddle.js is a browser-based deep learning inference engine for Baidu PaddlePaddle, enabling model loading and execution directly in web environments with WebGL, WebGPU, and WebAssembly support.
  • PrithivirajDamodaran/FlashRank - FlashRank is an ultra-lite and super-fast Python library designed to re-rank search results in RAG pipelines using state-of-the-art LLMs and cross-encoders without requiring PyTorch or Hugging Face...
  • psmarter/mini-infer - mini-infer is a large language model (LLM) inference engine built from scratch, featuring optimized techniques like paged KV cache, continuous batching, chunked prefill, and speculative decoding fo...
  • qualcomm/ai-hub-models - Qualcomm AI Hub Models provides pre-optimized machine learning models for efficient deployment and inference on Qualcomm hardware, offering tools for compilation, quantization, profiling, and runni...
  • raketenkater/llm-server - An intelligent launcher and server for GGUF LLMs, automating multi-GPU tensor-splitting, MoE expert placement, hardware-matched downloads, and performance tuning for optimal inference.
  • SearchSavior/OpenArc - OpenArc is an inference engine for Intel devices, enabling the serving of various AI models like LLMs, VLMs, Whisper, and embedding models via OpenAI-compatible endpoints with OpenVINO acceleration.
  • ShannonAI/service-streamer - Service Streamer is middleware that optimizes deep learning model inference by batching discrete web requests into mini-batches, significantly boosting GPU utilization and overall system performanc...
  • siliconflow/onediff - OneDiff is an acceleration library for diffusion models, providing out-of-the-box performance optimizations for popular UIs and libraries like Hugging Face Diffusers and ComfyUI.
  • StarlightSearch/EmbedAnything - EmbedAnything is a highly performant, modular, and memory-safe Rust-based pipeline for generating multimodal embeddings and streaming them to vector databases, supporting various sources and infere...
  • Tencent/FeatherCNN - FeatherCNN is a high-performance, lightweight inference engine for convolutional neural networks, specifically optimized for ARM CPUs on mobile and embedded devices.
  • Tencent/Forward - Forward is a high-performance deep learning inference acceleration framework developed by Tencent, leveraging TensorRT for optimized deployment of models on NVIDIA GPUs with support for major frame...
  • vllm-project/vllm-ascend - vLLM Ascend is a community-maintained hardware plugin for seamlessly running vLLM and various large language models on Ascend NPUs, optimizing inference performance.
  • youssofal/MTPLX - MTPLX is a macOS-native inference engine for Apple Silicon that leverages multi-token prediction (MTP) for accelerated local LLM serving, offering significant speed improvements.
  • zhihu/ZhiLight - ZhiLight is a highly optimized LLM inference acceleration engine for Llama and its variants, developed by Zhihu and ModelBest Inc., designed for efficient deployment on various NVIDIA GPUs.
  • zilliztech/GPTCache - GPTCache is a library that creates a semantic cache for LLM queries, significantly reducing API costs and improving response times by storing and reusing previous LLM responses.

Vector Databases & Retrieval Infrastructure

  • activeloopai/deeplake - Deep Lake is an AI Data Runtime that provides a serverless multimodal datalake with integrated vector search, optimizing data storage and streaming for AI/ML applications and agentic RAG.
  • airweave-ai/airweave - Airweave is an open-source context retrieval layer for AI agents and RAG systems, unifying data from various sources into an LLM-friendly search interface.
  • alibaba/zvec - Zvec is an open-source, lightweight, and lightning-fast in-process vector database designed for embedding directly into applications, providing low-latency and scalable similarity search.
  • Anush008/fastembed-rs - FastEmbed-rs is a Rust library providing efficient local generation of text and image vector embeddings and document reranking for AI applications and RAG systems.
  • astronomer/ask-astro - Ask Astro is an open-source reference implementation of an LLM application architecture, providing a Q&A interface for Airflow and Astronomer, utilizing RAG, prompt orchestration, and feedback loops.
  • caura-ai/caura-memclaw - MemClaw is an open-source, governed, shared memory system for multi-tenant, multi-agent AI fleets, designed for scalable and efficient knowledge transfer and retrieval.
  • CaviraOSS/OpenMemory - A self-hosted, local-first cognitive memory engine for LLMs and AI agents, offering multi-sector memory, temporal reasoning, and explainable recall instead of just vector retrieval.
  • christopherkarani/Wax - Wax is a Swift-native memory engine for AI agents, offering on-device, single-file storage for documents, embeddings, and structured knowledge with blazing-fast RAG on Apple Silicon.
  • ClaudioDrews/memory-os - A 7-layer memory operating system for LLM agents like Hermes, providing persistent, context-aware memory using Qdrant for vector storage and various other mechanisms for structured facts, session r...
  • Corpus-OS/corpusos - Corpus OS provides a wire-first, vendor-neutral protocol suite and SDK for standardizing LLM, Embedding, Vector, and Graph infrastructure for AI frameworks.
  • danny-avila/rag_api - ID-based RAG FastAPI: An asynchronous, scalable FastAPI service for document indexing and retrieval using LangChain and PostgreSQL/pgvector, designed for targeted queries at the file level.
  • datastax/jvector - JVector is an advanced, embedded, graph-based approximate nearest neighbor (ANN) vector search engine for Java, optimized for large-scale, high-dimensional data retrieval.
  • devflowinc/trieve - Trieve is an all-in-one platform for semantic search, recommendations, and Retrieval-Augmented Generation (RAG) offered via API, featuring self-hosting, hybrid search, and bring-your-own-model capa...
  • different-ai/embedbase - Embedbase is an AI backend-as-a-service that provides a dead-simple API for LLM interaction and semantic search through hosted embeddings, supporting various LLM providers.
  • dingodb/dingo - DingoDB is a distributed multi-modal vector database offering unified SQL (MySQL-compatible) for structured and unstructured data, ensuring high concurrency and ultra-low latency.
  • divagr18/memlayer - Memlayer provides a plug-and-play, persistent memory layer for LLMs and AI agents, enabling intelligent context recall and knowledge extraction through hybrid vector and graph storage.
  • EmbeddedLLM/JamAIBase - JamAI Base is an open-source RAG backend platform with an intuitive spreadsheet-like UI, offering built-in LLM, vector embeddings, and reranker orchestration for AI application development.
  • endee-io/endee - Endee is a high-performance open-source vector database designed for AI search and retrieval workloads, supporting RAG, semantic search, and hybrid retrieval with optimized indexing and execution.
  • epsilla-cloud/vectordb - Epsilla is a high-performance, open-source vector database management system focused on scalable and cost-effective similarity search for embedding vectors.
  • getmetal/motorhead - Motorhead is an LLM memory and information retrieval server that provides API endpoints for managing conversational memory, summarization, and retrieval-augmented generation (RAG) through vector si...
  • giancarloerra/SocratiCode - SocratiCode is an open-source codebase context engine that provides deep semantic understanding of entire codebases for AI assistants, enabling hybrid search, dependency graphs, and impact analysis.
  • HelixDB/helix-db - HelixDB is a graph-vector database built in Rust, designed to unify data types like graph, vector, KV, document, and relational data for AI applications and knowledge graphs.
  • hora-search/hora - Hora is an efficient, Rust-based library offering a collection of approximate nearest neighbor search algorithms for high-performance similarity search.
  • infiniflow/infinity - Infinity is an AI-native database designed for LLM applications, offering incredibly fast hybrid search across dense vectors, sparse vectors, tensors, and full-text.
  • jina-ai/vectordb - A Pythonic vector database for efficient storage and retrieval of embeddings, leveraging DocArray and Jina for scalable solutions locally or in the cloud.
  • kantord/SeaGOAT - SeaGOAT is a local-first semantic code search engine that uses vector embeddings to enable natural language queries and regular expressions across your codebase without external API calls.
  • kelindar/search - A Go library for embedded vector search and semantic embeddings, using llama.cpp and GGUF BERT models, suitable for small to medium-scale applications with GPU acceleration.
  • lancedb/lancedb - LanceDB is an open-source, embedded, and cloud-native vector database designed for fast, scalable, and production-ready multimodal vector search, built on the Lance columnar format.
  • llm-tools/embedJs - EmbedJs is a Node.js RAG framework for building personalized LLM applications by segmenting data, generating embeddings, and integrating with vector databases for optimized retrieval.
  • mage0535/hermes-memory-installer - An agent-agnostic memory sidecar that provides persistent memory, layered recall, and knowledge graphing for AI coding agents, integrating with existing systems without modifying agent internals.
  • memvid/memvid - Memvid is a portable, single-file memory layer for AI agents, offering instant retrieval and long-term memory without needing complex RAG pipelines or server-based vector databases.
  • microsoft/SPTAG - SPTAG is a distributed approximate nearest neighbor (ANN) search library by Microsoft for large-scale vector search scenarios, offering high-quality vector index building, searching, and distribute...
  • milvus-io/milvus - Milvus is a high-performance, cloud-native vector database designed for scalable approximate nearest neighbor (ANN) vector search, efficiently organizing and searching vast amounts of unstructured ...
  • milvus-io/pymilvus - Python SDK for Milvus, an open-source vector database designed for AI applications, enabling seamless interaction for vector storage and similarity search.
  • myscale/MyScaleDB - MyScaleDB is a SQL vector database built on ClickHouse, designed for high-performance vector search, filtered search, and full-text search in scalable AI applications.
  • neuml/txtai - txtai is an all-in-one AI framework providing an embeddings database for semantic search and LLM orchestration capabilities like RAG and agentic workflows.
  • NeumTry/NeumAI - Neum AI is a data platform for managing large-scale vector embedding creation and synchronization to provide context for LLMs through Retrieval-Augmented Generation (RAG).
  • nuclia/nucliadb - NucliaDB is an AI search database for unstructured data, built for Retrieval-Augmented Generation (RAG), offering hybrid search with vector, full-text, and graph indexes.
  • oramasearch/orama - Orama is a JavaScript search engine providing full-text, vector, and hybrid search capabilities, designed for use in browsers, servers, or edge networks, and supporting RAG pipelines.
  • orneryd/NornicDB - NornicDB is a distributed graph and vector database with temporal MVCC, offering Neo4j Bolt/Cypher and Qdrant gRPC compatibility, designed for AI-native workloads like Graph-RAG and agent memory.
  • pathwaycom/llm-app - Ready-to-run cloud templates for building real-time RAG, AI pipelines, and enterprise search applications that synchronize with various live data sources.
  • paulpierre/markdown-crawler - A multithreaded web crawler that converts web pages into markdown files, specifically designed to preprocess data for LLM RAG applications.
  • philippgille/chromem-go - Chromem-go is an embeddable vector database for Go, offering a Chroma-like interface with zero third-party dependencies, designed for in-memory operation with optional persistence.
  • pixeltable/pixeltable - Pixeltable is a unified multimodal backend that integrates data storage, model execution, embedding indexing, and serving for AI data applications.
  • postgresml/postgresml - PostgresML is a PostgreSQL extension that integrates machine learning and AI capabilities directly into the database, enabling in-database inference, RAG pipelines, and vector search with GPU accel...
  • qdrant/qdrant - Qdrant is an open-source, high-performance vector similarity search engine and vector database designed specifically for AI applications, enabling fast storage, search, and management of vectors wi...
  • qdrant/qdrant-client - Python client library for the Qdrant vector search engine, facilitating interaction with Qdrant instances for vector storage, search, and remote inference capabilities.
  • rapidsai/cuvs - cuVS is a GPU-accelerated library providing state-of-the-art algorithms for vector similarity search and clustering, designed to simplify GPU usage in AI and data mining applications.
  • Restream/reindexer - Reindexer is an embeddable, in-memory, document-oriented database offering high-performance full-text search, k-nearest neighbors (KNN) search, and hybrid search capabilities.
  • run-llama/llama_index - LlamaIndex is an open-source data framework for building LLM applications by connecting custom data sources to large language models, focusing on data ingestion, indexing, and retrieval augmented g...
  • RyanCodrai/turbovec - TurboVec is a Rust-based approximate nearest neighbor (ANN) vector index with Python bindings, built on Google Research's TurboQuant algorithm for efficient, memory-optimized vector similarity search.
  • SeekStorm/SeekStorm - SeekStorm is a high-performance, Rust-native search engine offering sub-millisecond vector and lexical search capabilities as an in-process library and multi-tenancy server.
  • StarTrail-org/LEANN - LEANN is an innovative, lightweight, and private vector database designed for personal AI, enabling RAG applications with significantly reduced storage requirements by recomputing embeddings on-dem...
  • supervc-stack/VectorChord - VectorChord is a PostgreSQL extension designed for scalable, high-performance, and cost-effective vector search, enabling efficient storage and retrieval of billions of vectors.
  • tantaraio/voy - Voy is a WASM-based vector similarity search engine implemented in Rust, optimized for fast, tiny, and tree-shakable nearest neighbor search on edge servers and in web applications.
  • Tencent/WeKnora - WeKnora is an open-source, LLM-powered knowledge framework for enterprise document understanding, semantic retrieval, and autonomous reasoning, featuring RAG, ReAct agents, and an auto-maintaining ...
  • tensorchord/pgvecto.rs - pgvecto.rs is a PostgreSQL extension written in Rust, purpose-built for scalable, low-latency, and hybrid-enabled vector similarity search directly within Postgres.
  • topoteretes/cognee - Cognee is an open-source AI memory platform that provides AI agents with persistent long-term memory through a self-hosted knowledge graph, combining vector embeddings and graph reasoning.
  • unum-cloud/USearch - USearch is a high-performance, compact, and broadly compatible single-file similarity search and clustering engine for vectors and texts, primarily focused on user-defined metrics with minimal depe...
  • unum-cloud/UStore - UStore is a multi-modal transactional database designed for AI and semantic search, featuring vector-search integration and APIs for various data types.
  • vearch/vearch - Vearch is a cloud-native distributed vector database designed for efficient similarity search of embedding vectors in AI applications, offering hybrid search, performance, scalability, and reliabil...
  • VectifyAI/PageIndex - PageIndex is a vectorless, reasoning-based RAG system that builds hierarchical tree indexes from documents and uses LLMs to reason over them for context-aware retrieval.
  • verygoodplugins/automem - AutoMem is a graph-vector memory service providing durable, relational, and context-aware long-term memory for AI assistants using a dual-storage layer of FalkorDB and Qdrant.
  • vespa-engine/vespa - Vespa is an AI search platform for serving and organizing vectors, tensors, text, and structured data, enabling real-time inference and retrieval at any scale.
  • weaviate/weaviate - Weaviate is an open-source, cloud-native vector database for semantic search, combining vector similarity search with keyword filtering, RAG, and reranking capabilities.
  • yoanbernabeu/grepai - grepai is a privacy-first CLI for semantic code search, enabling AI agents and developers to find relevant code by intent using vector embeddings, drastically reducing token usage.
  • zilliztech/attu - Attu is an AI-native GUI for managing Milvus vector databases, offering multi-cluster management, data exploration, vector search, an AI assistant, and monitoring tools.
  • zilliztech/claude-context - Claude Context provides semantic code search for AI coding agents, allowing them to access the entire codebase as context in a cost-effective manner using a vector database.
  • zilliztech/deep-searcher - DeepSearcher is an open-source tool that combines LLMs and vector databases to perform deep research, evaluation, and reasoning on private data, generating accurate answers and comprehensive reports.
  • zilliztech/VectorDBBench - VectorDBBench is a comprehensive benchmark tool for evaluating the performance and cost-effectiveness of various vector databases and cloud services across diverse scenarios.
  • Zleap-AI/SAG - SAG is an out-of-the-box document retrieval workbench based on the SAG RAG technique, offering a conversational interface, knowledge graph visualization, and advanced retrieval functionalities for ...

AI Orchestration & Deployment

Workflow Orchestration for AI

  • 0xPlaygrounds/rig - Rig is a Rust library designed for building modular and scalable LLM-powered applications, offering a unified interface for multiple model providers and vector stores.
  • AgentEra/Agently - Agently is an AI application runtime framework for building reliable GenAI services, focusing on stable contracts, observable execution, and restart-safe workflow boundaries for LLM applications.
  • apache/burr - Apache Burr is a Python framework for building and operating stateful AI applications like chatbots and agents, providing state management, tracing, and persistence.
  • bosun-ai/swiftide - Swiftide is an opinionated Rust framework for building LLM applications, offering an agent harness, typed task graphs for orchestration, and streaming RAG pipelines for indexing and querying.
  • chatchat-space/LangGraph-Chatchat - LangGraph-Chatchat is an open-source, offline-deployable RAG and Agent application built with LangGraph, supporting open-source LLMs and vector databases for knowledge-based Q&A.
  • cheshire-cat-ai/core - Cheshire Cat is an open-source framework for building and operating custom AI agents and conversational layers, featuring an API-first design, RAG integration, plugin extensibility, and support for...
  • ComposioHQ/composio - Composio provides SDKs for building AI agents that can interact with more than 1,000 tools, offering tool search, context management, and authentication across various AI frameworks.
  • covibes/zeroshot - Zeroshot is an open-source CLI that orchestrates multi-agent AI coding workflows to autonomously implement, review, test, and verify code changes for software engineering tasks.
  • dataelement/bisheng - BISHENG is an open LLM application DevOps platform focusing on enterprise scenarios, providing comprehensive features for GenAI workflows, RAG, agents, model management, evaluation, and enterprise-...
  • dynamiq-ai/dynamiq - Dynamiq is an orchestration framework for developing and streamlining agentic AI and LLM applications, specializing in RAG and multi-agent workflows.
  • framerslab/agentos - AgentOS is a TypeScript framework for building AI agents with cognitive memory, runtime tool forging, multi-agent orchestration, and support for 11 LLM providers.
  • GoogleCloudPlatform/agent-starter-pack - A Python package providing production-ready templates for GenAI agents on Google Cloud, focusing on infrastructure, CI/CD, observability, and security.
  • Haohao-end/openagent - A full-stack platform for teams to build, orchestrate, publish, and operate AI applications, with visual workflows, dataset management, and multi-model support.
  • holon-run/holon - Holon is a local workbench that provides a continuous, event-driven execution environment for AI agents, allowing them to perform tasks that span multiple sessions and human interactions.
  • iflytek/astron-agent - Astron Agent is an enterprise-grade platform for building, orchestrating, and deploying AI agent applications with integrated RPA, model management, and high-availability features.
  • jaylfc/taOS - taOS is a self-hosted, distributed AI agent operating system designed for consumer hardware, providing a web desktop, app store, agent deployment, and a framework-agnostic AI memory system for orch...
  • Josh-XT/AGiXT - AGiXT is a comprehensive AI automation platform that orchestrates instruction management and complex task execution across diverse AI providers, featuring adaptive memory and a versatile plugin sys...
  • julep-ai/julep - Julep is an open-source platform for building, orchestrating, and self-hosting agent-based AI workflows with persistent memory and tool integration.
  • LiteLLM-Labs/litellm-agent-control-plane - LiteLLM Agent Control Plane provides a unified interface and management for deploying, running, and orchestrating various AI agents across different runtimes.
  • memodb-io/Acontext - Acontext enhances AI agent learning by automatically capturing successful agent interactions as "skill memory" in human-readable Markdown files, facilitating reuse and introspection.
  • neurocult/agency - Agency is a Go library for building autonomous AI agents and generative AI applications, with a clean, idiomatic API that simplifies interactions with LLMs and other generative models.
  • operand/agency - Agency is a minimal Python framework that provides an Actor model for building and operating agent-integrated systems, enabling flexible and scalable communication between AI agents and traditional...
  • plastic-labs/honcho - Honcho is memory infrastructure for building stateful AI agents, enabling them to understand and retain information about people, agents, groups, projects, and ideas over time.
  • SmythOS/sre - The SmythOS Runtime Environment (SRE) is an open-source runtime and SDK for building, running, and managing production-ready AI agents, providing OS-level abstractions for AI resources.
  • stoyan-stoyanov/llmflows - LLMFlows is a Python framework for building, operating, and debugging transparent LLM applications with explicit control over prompts, LLM calls, and dependencies.
  • TencentCloud/TencentDB-Agent-Memory - TencentDB Agent Memory is a memory management system for AI agents, featuring a 4-tier progressive pipeline to improve efficiency and reasoning by categorizing and condensing agent memories without...
  • ThousandBirdsInc/chidori - Chidori is a reactive runtime and agent framework for building durable, replayable, and resumable AI agents using plain async TypeScript, featuring automatic checkpointing and replay with zero LLM ...
  • TransformerOptimus/SuperAGI - SuperAGI is an open-source framework designed for building, managing, and running autonomous AI agents, offering tools for deployment, capability extension, and performance monitoring.
  • trypromptly/LLMStack - LLMStack is a no-code platform for building and deploying generative AI agents and applications by chaining multiple LLMs, integrating custom data, and connecting to business processes, offering bo...
  • ultracontext/ultracontext - UltraContext provides open-source context infrastructure for AI agents, enabling real-time capture, sharing, and versioning of conversational context across different agents and LLM frameworks.
  • unohee/OpenSwarm - OpenSwarm is an autonomous AI agent orchestrator that uses multiple LLMs (Claude, GPT, local models) to process Linear issues, perform coding tasks, and maintain long-term memory of what it learns about a repository.

↑ Back to TOC

Model Monitoring & Governance

Model & Data Drift Monitoring

  • deepchecks/deepchecks - Deepchecks is an open-source platform providing continuous validation for AI and ML models and data from research to production, focusing on testing, CI, and monitoring.
  • MAIF/eurybia - Eurybia is a Python library for detecting data and model drift, validating data, and generating comprehensive HTML reports for AI governance and model monitoring.
  • NannyML/nannyml - NannyML is an open-source Python library for post-deployment ML model monitoring, offering performance estimation, data drift detection, and intelligent linking of drift alerts to performance changes.
  • squaredev-io/whitebox - Whitebox is an open-source, end-to-end ML monitoring platform with edge capabilities that integrates with Kubernetes, focusing on classification and regression model metrics, data/model drift, and ...

↑ Back to TOC

AI Governance & Compliance

  • kitops-ml/kitops - KitOps is a CNCF open-source tool for packaging, versioning, and securely sharing AI/ML models, datasets, code, and configuration into an OCI Artifact for simplified deployment and governance.
  • semantica-agi/semantica - Semantica is an AI-native knowledge graph intelligence framework that provides an auditable, governed, and explainable context and accountability layer for AI agents and LLM systems.
  • verifywise-ai/verifywise - VerifyWise is an AI governance platform providing tools for LLM evaluation, risk management, and compliance with AI regulations including the EU AI Act, ISO 42001, and NIST AI RMF.

↑ Back to TOC