Awesome Infra for AI

LLMOps

LLM Observability & Tracing

Agenta-AI/agenta - Agenta is an open-source LLMOps platform designed to accelerate the development of reliable LLM applications, offering integrated prompt management, evaluation, and observability features.
Arize-ai/openinference - OpenInference provides conventions and instrumentation for OpenTelemetry to enable detailed tracing and observability of AI applications, especially those built with LLMs and agents.
Arize-ai/phoenix - Phoenix is an open-source AI observability platform for LLM application experimentation, evaluation, and troubleshooting, providing tracing, evaluation, dataset management, prompt management, and a...
chirpz-ai/pandaprobe - PandaProbe is an open-source agent engineering platform for collaboratively tracing, evaluating, monitoring, and debugging AI agents, with integrations for LangGraph, CrewAI, and other agent SDKs.
cloudshipai/station - Station is an open-source, Git-backed runtime for deploying and orchestrating intelligent multi-agent AI systems on self-hosted infrastructure with built-in evaluation and observability.
comet-ml/opik - Opik is an open-source platform for comprehensive observability, evaluation, and optimization of LLM applications, RAG systems, and agentic workflows.
cyberark/agentwatch - Agentwatch is a platform-agnostic observability framework for monitoring AI agent interactions, LLM calls, and tool usage across various AI development frameworks, providing real-time insights and ...
deepsense-ai/ragbits - Ragbits is a toolkit for rapid development and operation of GenAI applications, providing building blocks for LLM integration, RAG processing, multi-agent workflows, observability, and testing.
evilmartians/agent-prism - AgentPrism is an open-source library of React components for visualizing traces from AI agents, turning complex OpenTelemetry and Langfuse data into clear, debuggable diagrams.
Helicone/helicone - Helicone is an open-source LLM observability platform and AI gateway that provides monitoring, evaluation, prompt management, and intelligent routing for large language models.
however-yir/knowledgeops-agent - KnowledgeOps Agent is an enterprise-grade Spring AI platform designed for multi-tenant RAG, tool calling, and agent workflow orchestration, featuring robust security, observability, and evaluation ...
Javis603/token-monitor - A real-time desktop widget to monitor token usage, AI limits, and costs across various AI coding tools, featuring multi-device synchronization and historical usage trends.
langfuse/langfuse - Langfuse is an open-source LLM engineering platform for developing, monitoring, evaluating, and debugging AI applications, offering observability, prompt management, and evaluation capabilities.
langfuse/oss-llmops-stack - An open-source, modular LLMOps stack combining LiteLLM for LLM API unification, routing, and cost control, with Langfuse for detailed observability, prompt versioning, and performance evaluation in...
latitude-dev/latitude-llm - Latitude is an open-source AI monitoring platform that provides issue detection, human-aligned evaluations, and agent-native tracing for LLM applications and AI agents.
liaohch3/claude-tap - A local proxy and trace viewer for AI coding agents, capturing and inspecting API traffic to debug agent behavior and analyze prompts, messages, and tool definitions.
lmnr-ai/lmnr - Laminar is an open-source observability platform purpose-built for AI agents, offering tracing, evaluation, AI monitoring, SQL access, dashboards, and data annotation for LLM-based applications.
msfirebird/claw-lens - An open-source, local-first observability dashboard for OpenClaw AI agents, providing cost analytics, live monitoring, deep session inspection, and security auditing.
openlit/openlit - OpenLIT is an open-source platform offering OpenTelemetry-native observability for LLMs, including GPU monitoring, guardrails, evaluations, prompt management, and API key vault, to streamline AI de...
palico-ai/palico-ai - Palico AI is an integrated framework for iterative development, evaluation, and production of LLM applications, offering tools for building, improving performance, and debugging.
pydantic/logfire - Pydantic Logfire is an observability platform built on OpenTelemetry for Python applications, providing detailed insights into production systems, including those using LLMs and FastAPI.
raga-ai-hub/RagaAI-Catalyst - RagaAI Catalyst is a Python SDK for comprehensive observability, monitoring, and evaluation of AI agents and LLM applications, offering tracing, debugging, and advanced analytics.
Scale3-Labs/langtrace - Langtrace is an open-source, OpenTelemetry-based observability tool providing real-time tracing, evaluations, and metrics for LLM applications, including LLMs, LLM frameworks, and vector databases.
stainlu/hermes-labyrinth - Hermes Labyrinth is a read-only observability plugin for Hermes Agent, visualizing autonomous agent journeys and interactions with prompts, tools, and memory into a navigable map.
traceloop/openllmetry - OpenLLMetry provides open-source observability for LLM applications by extending OpenTelemetry to capture traces and metrics from LLM providers, vector databases, and AI frameworks.
traceloop/openllmetry-js - OpenLLMetry-JS provides open-source observability for LLM applications in JavaScript/TypeScript, built on OpenTelemetry to trace interactions with LLM providers and vector databases.
traceroot-ai/traceroot - TraceRoot is an open-source observability and self-healing platform for AI agents, providing tracing, AI-powered debugging, and detectors for production issues like hallucinations and tool failures.
VasiHemanth/tokentelemetry - TokenTelemetry is a 100% local, open-source observability dashboard for AI coding and autonomous agents, tracking token usage, costs, tool calls, and session traces.
vllora/vllora - vLLora is a lightweight, real-time debugging and observability tool for AI agents, providing tracing and analysis of LLM interactions via an OpenAI-compatible API.
VoltAgent/voltagent - VoltAgent is an end-to-end AI Agent Engineering Platform offering an open-source TypeScript framework for building intelligent agents and a VoltOps Console for observability, automation, deployment...

LLM Evaluation & Testing

alphadl/AdaRubrics - AdaRubric offers task-adaptive rubrics and dense reward signals for evaluating LLM agent trajectories, enhancing evaluation reliability and reward learning.
athina-ai/athina-evals - A Python SDK offering 50+ preset and custom evaluations for LLM-generated responses, integrating with the Athina IDE for experimentation and dataset comparison.
confident-ai/deepeval - DeepEval is an open-source LLM evaluation framework, offering a variety of metrics and tools for assessing the performance of AI agents, RAG pipelines, and chatbots through unit testing.
coze-dev/coze-loop - Coze Loop is an open-source platform offering full-lifecycle management for AI agents, encompassing development, debugging, evaluation, and monitoring.
cvs-health/langfair - LangFair is a Python library for conducting use-case level bias and fairness assessments of large language models (LLMs) by allowing users to bring their own prompts for evaluation.
cvs-health/uqlm - UQLM is a Python library for detecting and mitigating hallucination in Large Language Model (LLM) outputs using uncertainty quantification techniques.
cyberark/FuzzyAI - FuzzyAI is an automated LLM fuzzing tool designed to identify and mitigate potential jailbreaks and security vulnerabilities in LLM APIs.
darkrishabh/agent-skills-eval - A test runner for Agent Skills that evaluates the effectiveness of AI agent skills by comparing model performance with and without a skill, using a judge model for grading.
evidentlyai/evidently - Evidently is an open-source Python framework for evaluating, testing, and monitoring ML and LLM systems, providing comprehensive data and model quality checks from experiments to production.
EvolvingLMMs-Lab/lmms-eval - LMMs-Eval is a unified, reproducible, and efficient evaluation toolkit for large multimodal models (LMMs) across text, image, video, and audio tasks.
future-agi/future-agi - Future AGI is an open-source, end-to-end platform for evaluating, observing, simulating, and protecting LLM and AI agent applications, offering tracing, evals, guardrails, and a performant gateway.
GiovanniPasq/chunky - Chunky is an open-source toolkit for preparing documents for Retrieval Augmented Generation (RAG) pipelines, offering PDF-to-Markdown conversion, cleaning, chunk inspection, and chunking strategy c...
Giskard-AI/giskard-oss - Giskard is an open-source Python library for testing and evaluating agentic systems and LLM applications, offering tools for scenario-based testing, red teaming, and vulnerability scanning.
hegelai/prompttools - PromptTools provides open-source utilities for experimenting with, testing, and evaluating prompts, LLMs, and vector databases through code, notebooks, and a local playground.
ianarawjo/ChainForge - ChainForge is an open-source visual programming environment designed for battle-testing, comparing, and evaluating prompts and LLM responses across different models and settings.
ifixai-ai/iFixAi - iFixAi is a diagnostic tool that evaluates AI models and agents for operational misalignment, including fabrication, manipulation, deception, unpredictability, and opacity, by running up to 45 insp...
iMeanAI/WebCanvas - WebCanvas is an open-source framework for building, training, and evaluating LLM-based web agents in dynamic, real-time online environments.
JinjieNi/MixEval - MixEval is a dynamic, ground-truth-based benchmark and evaluation suite for large language and multimodal models.
juanjuandog/FinSight-AI - FinSight AI is an open-source AI equity research agent that develops evidence-grounded reports with resilient workflow orchestration, RAG evaluation, and comprehensive backend infrastructure.
JudgmentLabs/judgeval - Judgeval is an open-source Python SDK enabling continuous improvement for AI agents through OpenTelemetry-based tracing, agent-judge evaluations, and online monitoring of LLM-powered applications.
langwatch/better-agents - Better Agents is a CLI tool and set of standards for building, testing, and collaborating on AI agents, integrating with various frameworks and coding assistants for production readiness.
langwatch/langwatch - LangWatch is a platform for end-to-end LLM evaluations, AI agent testing, and observability, offering tools for simulations, performance monitoring, prompt optimization, and an AI gateway for gover...
LeoYeAI/myclaw-bench - MyClaw Bench provides a comprehensive benchmark for evaluating AI agents on OpenClaw, featuring 45 tasks across four difficulty tiers with a focus on real-world outcomes and complex reasoning.
LLAMATOR-Core/llamator - LLAMATOR is a Python framework for red teaming and security testing of chatbots, Generative AI systems, LLMs, RAGs, Agents, and Vision Language Models (VLMs) against various attacks and vulnerabili...
Marker-Inc-Korea/AutoRAG - AutoRAG is an open-source framework designed to automate the evaluation and optimization of Retrieval-Augmented Generation (RAG) pipelines using an AutoML-style approach for specific datasets.
msoedov/agentic_security - Agentic Security is an open-source vulnerability scanner and AI red teaming kit designed to test Large Language Models (LLMs) and agent workflows against jailbreaks, fuzzing, and multimodal attacks.
NVIDIA/garak - Garak is an open-source LLM vulnerability scanner designed to red-team and assess generative AI models for weaknesses like hallucination, data leakage, prompt injection, and toxicity.
onyx-dot-app/EnterpriseRAG-Bench - EnterpriseRAG-Bench offers a benchmark dataset and evaluation framework for RAG systems using realistic company internal documents and a diverse set of questions.
PacificAI/langtest - LangTest is an open-source library for testing and evaluating Large Language Models and NLP models for various quality aspects like robustness, bias, fairness, and accuracy.
plurai-ai/intellagent - IntellAgent evaluates and optimizes conversational AI agents through simulated, realistic synthetic interactions to uncover failure points and improve performance.
prometheus-eval/prometheus-eval - Prometheus-Eval is a framework and a collection of open-source LLM judges designed for evaluating the quality of LLM responses in generation tasks, supporting both absolute grading and pairwise ran...
promptfoo/promptfoo - Promptfoo is a CLI and library for evaluating LLM applications, offering automated testing, red teaming, and vulnerability scanning for prompts, models, agents, and RAGs.
Raudaschl/rag-fusion - RAG-Fusion enhances RAG with multi-query generation and Reciprocal Rank Fusion to improve retrieval, especially under term mismatch, and includes an evaluation harness using NFCorpus/BEIR.
relari-ai/continuous-eval - continuous-eval is an open-source framework for data-driven, modular evaluation of LLM-powered applications, offering a comprehensive metric library and probabilistic evaluation capabilities.
rhesis-ai/rhesis - Rhesis is an open-source collaborative testing platform for LLM and agentic applications, providing AI-powered test generation, conversation simulation, adversarial testing, and comprehensive evalu...
superlinear-ai/raglite - RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) that provides configurable components for LLMs, vector databases, and rerankers, with optimized strategies for chunking, retriev...
TIGER-AI-Lab/ClawBench - ClawBench is an open-source benchmark for evaluating AI browser agents on a diverse set of everyday online tasks across live websites, measuring end-to-end task success.
truera/trulens - TruLens is an open-source framework for systematically evaluating and tracking LLM applications and AI agents, providing fine-grained instrumentation and comprehensive feedback functions.
uptrain-ai/uptrain - UpTrain is an open-source platform providing evaluation and monitoring for Generative AI applications, offering preconfigured checks, root cause analysis, and production monitoring for LLMs.
vectara/open-rag-eval - An open-source Python toolkit for evaluating Retrieval-Augmented Generation (RAG) pipelines, offering flexible metrics and connectors without requiring golden answers.
vibrantlabsai/ragas - Ragas is an evaluation framework for LLM applications that provides objective metrics, test data generation, and feedback loops for continuous improvement.
ZhangJinHaHaHa/AgentLens - AgentLens is a decentralized marketplace and infrastructure for AI Agents, providing verifiable proof of capabilities, security, and track record using on-chain auditing, TEE attestation, and ZK pr...

Prompt Management

agentmark-ai/agentmark - AgentMark is an open-source platform for defining, managing, and evaluating AI agent prompts, datasets, and traces directly within a Git repository using OpenTelemetry for observability.
austin-starks/Promptimizer - An automated, AI-powered framework that uses genetic algorithms and machine learning to optimize LLM prompts, illustrated with an AI-driven stock screening example.
BoundaryML/baml - BAML (Basically a Made-up Language) is an AI framework that facilitates reliable AI workflows and agents by transforming prompt engineering into schema engineering for structured output generation.
bujue3709/GPT-Conversation-Toolkit - A browser extension for the ChatGPT web platform, offering conversation management, export, search, prompt management, and timeline navigation features to enhance user experience.
dot-agent/nextpy - Nextpy is a framework for building self-modifying software, focusing on guardrails, structured outputs, a powerful prompt engine for pre-compiling and session state, and optimized code generation f...
genkit-ai/genkit - Genkit is an open-source framework by Google for building and operating AI-powered applications across multiple languages, featuring unified APIs for various models, structured outputs, multi-modal...
langfuse/mcp-server-langfuse - A Model Context Protocol (MCP) server that integrates with Langfuse to provide prompt discovery, retrieval, and management capabilities.
lastmile-ai/aiconfig - AIConfig is an open-source framework for building and managing generative AI applications by version-controlling prompts, models, and parameters as JSON-serializable configurations.
microsoft/aici - AICI (Artificial Intelligence Controller Interface) allows building Wasm-based controllers to constrain and direct LLM output in real time, enabling advanced generation strategies.
microsoft/prompty - Prompty is a markdown-based file format (.prompty) and runtime for creating, managing, and executing LLM prompts, providing tools for development, previewing, and tracing.
minipuft/claude-prompts - A prompt template server for Claude, enabling hot-reload, thinking frameworks, and quality gates for crafting reusable prompts and orchestrating agentic workflows.
neuron-core/neuron-ai - Neuron is a PHP framework for building and orchestrating AI agents, supporting LLM integration, prompt management, RAG, multi-agent workflows, and observability.
Open-Source-Legal/OpenContracts - OpenContracts is an open-source document intelligence platform that processes unstructured documents into a programmable citation graph using AI agents, structured extraction, and a Model Context P...
patterns-ai-core/langchainrb - Langchain.rb provides a Ruby interface for building LLM-powered applications, offering a unified API for various LLM providers, prompt management, and RAG capabilities.
pezzolabs/pezzo - Pezzo is an open-source, cloud-native LLMOps platform for streamlined prompt design, version management, instant delivery, collaboration, troubleshooting, and observability of AI operations.
SynaLinks/synalinks - SynaLinks is an open-source neuro-symbolic framework for creating, training, evaluating, and deploying advanced LLM-based applications like RAGs, autonomous agents, and self-evolving reasoning syst...

LLM Gateways & Proxies

adaline/gateway - Adaline Gateway is a fully local, production-grade SDK providing a unified interface for calling over 300 LLMs with built-in features like batching, retries, caching, callbacks, and OpenTelemetry ...
agentgateway/agentgateway - Agentgateway is an open-source proxy offering unified connectivity and governance for AI agents and LLM providers, encompassing security, observability, and advanced traffic management features.
ai-forever/gpt2giga - A FastAPI proxy that translates OpenAI- and Anthropic-compatible API requests to the GigaChat API, enabling seamless integration of GigaChat with existing LLM applications.
apache/apisix - Apache APISIX is a dynamic, real-time, high-performance API Gateway that can also function as an AI Gateway, providing AI proxying, load balancing for LLMs, and robust security for AI agents.
atopos31/llmio - LLMIO is a Go-based LLM load-balancing gateway providing a unified API, weighted scheduling, observability, and an admin UI for managing various LLM providers.
BerriAI/litellm - LiteLLM is an open-source AI Gateway and Python SDK providing a unified interface to over 100 LLM providers, with features like cost tracking, guardrails, load balancing, and observability.
bestruirui/octopus - Octopus is a self-hosted LLM API aggregation and load balancing service that provides a unified gateway for multiple LLM providers, intelligent routing, and analytics for cost and usage tracking.
bionic-gpt/bionic-gpt - Bionic is an on-premise, secure, and scalable LLM gateway and RAG platform designed to replace ChatGPT while maintaining data confidentiality and offering advanced features like AI assistants, toke...
bitrouter/bitrouter - BitRouter is an open-source, local-first LLM router built in Rust that optimizes AI agent performance and cost by dynamically routing requests to the most appropriate LLM, supporting multiple provi...
caidaoli/ccLoad - ccLoad is an AI API gateway that provides smart routing, automatic failover, exponential cooldown, multi-URL scheduling, real-time monitoring, and cost control for various LLM APIs.
casdoor/casdoor - Casdoor is an open-source, "AI-first" Identity and Access Management (IAM) and Model Context Protocol (MCP) gateway, providing authentication and authorization for AI applications and agents.
Chleba/ollamaMQ - ollamaMQ is a high-performance, asynchronous proxy and load balancer for Ollama and LM Studio APIs, providing multi-backend load balancing, fair-share queuing, model-aware routing, and a real-time ...
coaidev/coai - CoAI.Dev is a next-generation, multi-tenant LLM gateway and AIGC solution offering unified API access, load balancing, cost management, and various AI application features for over 200 models from ...
dataiku/kiji-proxy - Kiji Privacy Proxy is an intelligent privacy layer for AI APIs that automatically detects and masks personally identifiable information (PII) in requests to AI services.
decolua/9router - 9Router is an AI router and token saver that connects various AI coding tools to over 40 AI providers, optimizing usage with auto-fallback, quota tracking, and token compression.
diegosouzapw/OmniRoute - OmniRoute is a free AI gateway that unifies access to over 170 AI providers, offering token compression, auto-fallback, and aggregating free tiers to provide billions of free tokens monthly.
dwgx/WindsurfAPI - WindsurfAPI is an OpenAI- and Anthropic-compatible API proxy that translates requests to Windsurf's internal gRPC protocol, providing access to over 100 LLMs with account pooling, rate limitin...
ENTERPILOT/GoModel - GoModel is a fast, lightweight AI gateway written in Go, providing a unified OpenAI-compatible API for various LLM providers with observability, guardrails, streaming, and cost tracking.
envoyproxy/ai-gateway - Envoy AI Gateway leverages Envoy Gateway to manage and optimize request traffic flows to various Generative AI services and self-hosted models, offering a tiered gateway approach for robust AI infr...
Fast-Editor/Lynkr - Lynkr is an HTTP proxy CLI tool designed to optimize interactions with LLMs, particularly for AI coding assistants, by compressing tokens, managing caching, and routing requests for efficiency and ...
ferro-labs/ai-gateway - Ferro Labs AI Gateway is a high-performance Go-native LLM gateway for routing requests across 30+ providers with features like caching, guardrails, A/B testing, and cost controls.
guanxiaol/WindsurfPoolAPI - WindsurfPoolAPI is an enterprise-grade multi-account pool proxy for the Windsurf AI platform, supporting over 113 models via OpenAI and Anthropic APIs with features like load balancing and token an...
higress-group/higress - Higress is a cloud-native AI gateway based on Istio and Envoy, providing unified management, observability, and traffic control for LLM APIs and Model Context Protocol (MCP) servers.
intentee/paddler - Paddler is an open-source LLM/VLM load balancer and serving platform for self-hosting and scaling models, built around llama.cpp for efficient inference with dynamic model swapping and observability.
kaitranntt/ccs - CCS is a multi-provider profile and runtime manager for various AI models and APIs, enabling seamless switching between Claude, Gemini, Copilot, OpenRouter, and local models without configuration o...
katanemo/plano - Plano is an AI-native proxy and data plane for agentic applications, providing built-in orchestration, safety, observability, and intelligent LLM routing to simplify the production deployment of AI...
Kenza-AI/sagify - Sagify simplifies LLM and ML model deployment, management, and inference on AWS SageMaker, featuring an LLM Gateway for unified access to various large language models.
Kong/kong - Kong Gateway is a cloud-native API and AI gateway offering high performance, extensibility via plugins, and advanced AI traffic capabilities including multi-LLM support, semantic security, and cach...
LeenHawk/gproxy - GPROXY is a Rust-based, high-performance, multi-provider LLM proxy server that unifies OpenAI, Claude, and Gemini-style APIs, offering multi-tenant authorization, rate limiting, quota management, a...
maximhq/bifrost - Bifrost is a high-performance AI gateway that unifies access to over 23 providers through a single OpenAI-compatible API, offering features like automatic failover, load balancing, semantic caching...
Mintplex-Labs/anything-llm - AnythingLLM is an all-in-one local-first AI application for chatting with documents, managing AI agents, and integrating with various LLMs and vector databases, offering dynamic model routing and m...
mnfst/manifest - Manifest is a sophisticated LLM gateway and router designed to optimize AI application costs and performance by intelligently routing queries to the most suitable LLM provider and model.
NadirRouter/NadirClaw - NadirClaw is an open-source LLM router and AI cost optimizer that intelligently routes prompts to different language models based on complexity, which the project says reduces API costs by 40-70% through an OpenAI-compa...
Nayjest/lm-proxy - LM-Proxy is a lightweight, OpenAI-compatible HTTP LLM proxy/gateway for multi-provider inference (Google, Anthropic, OpenAI, PyTorch), supporting real-time streaming, API key management, and dynami...
nextlevelbuilder/goclaw - GoClaw is a multi-tenant AI agent platform and gateway built in Go, enabling the deployment and orchestration of AI agent teams with extensive LLM provider support, sophisticated memory management,...
Nexus-Router/nexus - Nexus is an AI gateway that unifies access to multiple LLM providers and Model Context Protocol (MCP) servers, offering robust routing, security, and governance for AI stacks.
nyroway/nyro - Nyro is a self-hosted AI gateway that translates protocols between different AI tools and model providers, enabling interoperability and flexible model routing without code changes.
octelium/octelium - Octelium is a self-hosted zero-trust secure access platform operating as a ZTNA, VPN, API/AI/MCP gateway, and PaaS, with specific features for AI/LLM gateway functionality.
open-bias/open-bias - Open Bias is an open-source reliability harness that acts as a proxy between applications and LLM providers to enforce runtime rules and policies, preventing off-policy behavior.
packyme/privacy-filter - Privacy Filter is a Go-based LLM privacy gateway that redacts PII and secrets from text with millisecond latency, specifically designed to ensure data privacy before reaching large language models.
PAIArtCom/Clipal - Clipal is a local LLM API gateway and reverse proxy designed for developer productivity, offering unified access, failover, and key management for AI coding assistants like Claude Code, Codex CLI, ...
peva3/SmarterRouter - SmarterRouter is an intelligent LLM gateway and VRAM-aware router that profiles models, aggregates benchmarks, and automatically routes queries to the best available LLM, supporting local and exter...
Portkey-AI/gateway - Portkey AI Gateway is a fast, open-source AI gateway designed for routing requests to over 1,600 LLMs, featuring integrated guardrails, automatic retries, and load balancing for reliable and secure...
QuantumNous/new-api - new-api is a unified LLM gateway and AI asset management system that enables aggregation, distribution, and cross-conversion of various LLMs into OpenAI-, Claude-, or Gemini-compatible formats, offer...
reshaprio/reshapr - reShapr is an open-source, no-code MCP Server that transforms traditional REST, GraphQL, and gRPC APIs into LLM-friendly tools, optimizing context windows and enabling AI-native API access.
romgX/openrelay - OpenRelay is an AI model router and proxy that unifies numerous free and paid AI model quotas into a single local endpoint, enabling their use across various AI tools and IDEs.
schmitech/orbit - ORBIT is a self-hosted AI gateway and retrieval-adapter layer designed for private, multi-model RAG applications, offering secure inference, data retrieval, and agentic tool-calling capabilities.
starbaser/ccproxy - ccproxy is a CLI-based transparent network interceptor and proxy for LLM clients, enabling cross-provider routing, request/response transformation, and custom hooks for various large language models.
taichuy/1flowbase - 1flowbase is an open-source virtual model gateway that allows users to build multi-model workflows, publish them as OpenAI/Claude-compatible endpoints, and gain visibility into trace, token, latenc...
theopenco/llmgateway - LLM Gateway is an open-source API gateway for Large Language Models, providing unified access, API key management, usage analytics, multi-provider routing, and performance monitoring for various LL...
ThinkWatchProject/ThinkWatch - ThinkWatch is an enterprise-grade AI gateway for secure, audited, and governed access to AI APIs and Model Context Protocol (MCP) tools, providing unified proxying, RBAC, rate limiting, and cost trac...
thushan/olla - Olla is a high-performance, lightweight proxy and load balancer for LLM infrastructure, providing intelligent routing, automatic failover, and unified model discovery across diverse inference backe...
TPIsoftwareOSPO/digiRunner-Open-Source - digiRunner is an enterprise-grade API Gateway that acts as a unified control plane for both microservices and AI services, providing governance, cost control, and prompt management for LLMs.
traceloop/hub - Traceloop Hub is a high-performance, OpenTelemetry-based LLM gateway written in Rust, centralizing control and tracing of LLM calls across multiple providers with built-in observability.
vllm-project/semantic-router - vLLM Semantic Router is an intelligent routing system designed for managing and orchestrating diverse AI/ML models (mixture-of-models) across various environments, focusing on efficiency, safety, a...
voidmind-io/voidllm - VoidLLM is a privacy-first, self-hosted LLM proxy and AI gateway designed for teams, offering features like load balancing, multi-provider routing, API key management, usage tracking, and rate limi...
w8123/EnterpriseAgentFramework - EnterpriseAgentFramework is a Java/Spring Boot platform for registering, governing, orchestrating, and exposing enterprise APIs as AI capabilities for agents, focusing on production-grade AI operat...
Writesonic/GPTRouter - GPTRouter is an AI model gateway for managing multiple LLMs and image models, providing universal API access, smart fallbacks, automatic retries, and reduced latency for reliable AI application per...

AI Safety & Guardrails

agentcontrol/agent-control - Agent Control provides a centralized control plane for enforcing runtime guardrails and safety policies for AI agents, blocking prompt injections, PII leakage, and other risks.
cuga-project/cuga-agent - CUGA is an open-source generalist agent harness for enterprises, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aw...
deadbits/vigil-llm - Vigil is a security scanner for LLM prompts and responses, designed to detect prompt injections, jailbreaks, and other adversarial attacks using various scanning methods like vector databases, YARA...
Justin0504/Aegis - Aegis is a pre-execution firewall for AI agents, providing runtime policy enforcement, cryptographic audit trails, human-in-the-loop approvals, and a kill switch without code changes.
microsoft/agent-governance-toolkit - AI Agent Governance Toolkit (AGT) provides policy enforcement, identity management, execution sandboxing, and reliability engineering to secure autonomous AI agents in production.
microsoft/presidio - Presidio is an open-source framework by Microsoft for detecting, redacting, masking, and anonymizing sensitive data (PII/PHI) across text, images, and structured data, suitable as an AI safety guar...
NVIDIA-NeMo/Guardrails - NVIDIA NeMo Guardrails is an open-source toolkit for adding programmable guardrails to LLM-based conversational applications, focusing on safety, security, and controlled dialog.
pegasi-ai/reins - Reins provides security controls for AI agents by enforcing deterministic policies, scanning for vulnerabilities, tracking drift with an immutable audit trail, and intervening on risky actions.
privacera/paig - PAIG (Privacera AI Guardrails) is an open-source framework designed to protect Generative AI applications by ensuring security, safety, and observability for responsible AI deployment.
protectai/llm-guard - LLM Guard is a comprehensive open-source security toolkit designed to fortify Large Language Model (LLM) interactions by providing robust sanitization, malicious content detection, data leakage pre...
SponsioLabs/Sponsio - Sponsio provides deterministic runtime safety solutions for AI agents, enforcing contracts on agent procedures in milliseconds with zero LLM cost.
superagent-ai/superagent - Superagent is an open-source SDK providing safety features for AI applications, including prompt injection detection, PII redaction, repository scanning for threats, and red teaming capabilities fo...
Tencent/AI-Infra-Guard - AI-Infra-Guard is a full-stack AI red teaming platform providing comprehensive security analysis, vulnerability scanning, and jailbreak evaluation for AI ecosystems and LLMs.
toby-bridges/api-relay-audit - A local security audit tool for AI API relays and LLM proxies, designed to detect prompt injection, model substitution, tool-call rewriting, and other tampering.
ucsandman/DashClaw - DashClaw is an AI agent governance runtime that intercepts actions, enforces guard policies, manages approvals, and produces audit-ready decision trails for AI agents interacting with real systems.
wuyoscar/Internal-Safety-Collapse - ISC-Bench is a research project and benchmark for evaluating the "Internal Safety Collapse" vulnerability in LLMs, where models bypass safety controls when completing complex tasks.

↑ Back to TOC

Model Serving & Inference

Model Serving Frameworks

ajndkr/lanarky - Lanarky is a Python web framework built on FastAPI, specifically designed for creating LLM-powered microservices with native streaming support for HTTP and WebSockets.
basetenlabs/truss - Truss is a CLI tool and framework for packaging, deploying, and serving AI/ML models in production, handling containerization, dependency management, and GPU configuration.
bentoml/BentoDiffusion - BentoDiffusion offers example projects for self-hosting and deploying diffusion models with BentoML, covering image and video generation from text prompts.
bentoml/BentoML - BentoML is an open-source framework for building, shipping, and scaling AI applications, providing tools to serve AI/ML models as production-ready API endpoints.
bentoml/OpenLLM - OpenLLM lets developers self-host and run any open-source or custom LLM as an OpenAI-compatible API endpoint in the cloud, streamlining deployment and serving.
Bessouat40/RAGLight - RAGLight is a modular Python framework designed for Retrieval-Augmented Generation (RAG), offering flexible integration with various LLMs, embeddings, and vector stores, and supporting agentic RAG ...
BMW-InnovationLab/BMW-YOLOv4-Inference-API-CPU - A CPU-based inference API for YOLOv3 and YOLOv4 object detection models, designed for easy deployment via Docker and Docker Swarm with RESTful endpoints.
BMW-InnovationLab/BMW-YOLOv4-Inference-API-GPU - A GPU-accelerated REST API for real-time object detection inference using YOLOv3 and YOLOv4 Darknet models, deployable via Docker or Docker Swarm.
containers/podman-desktop-extension-ai-lab - A Podman Desktop extension for local LLM experimentation and application development using containerized inference servers and a recipe catalog.
containers/ramalama - RamaLama is an open-source tool that simplifies local AI model serving for inference using OCI containers, abstracting hardware complexities and enabling container-centric development for AI.
dphnAI/aphrodite-engine - Aphrodite Engine is a large-scale LLM inference engine built on vLLM's PagedAttention technology, optimizing the serving of Hugging Face-compatible models with high performance and support for vari...
efeslab/Nanoflow - Nanoflow is a high-performance, throughput-oriented serving framework for Large Language Models (LLMs) that utilizes intra-device parallelism and asynchronous CPU scheduling.
eightBEC/fastapi-ml-skeleton - A FastAPI skeleton application designed to speed up the deployment and serving of machine learning models in production.
GetSoloTech/solo-cli - Fast CLI for deploying and serving AI models, especially for physical AI and robotics, optimized for edge and on-device operations.
golf-mcp/golf - Golf is a Python framework for building and deploying AI agent servers, providing infrastructure for authentication, observability, and managing tools, prompts, and resources.
jina-ai/serve - A framework for building and deploying cloud-native AI services, with native support for ML frameworks, high-performance serving, LLM streaming, and Kubernetes/Docker Compose deployment.
jundot/omlx - oMLX is an LLM inference server optimized for Apple Silicon, offering continuous batching, tiered KV caching, and a macOS menu bar interface for managing models locally.
kserve/kserve - KServe is a standardized, distributed platform for serving generative and predictive AI models on Kubernetes, providing scalable deployment and management.
Michael-A-Kuykendall/shimmy - Shimmy is a pure-Rust WebGPU inference engine providing OpenAI-compatible endpoints for local GGUF models, featuring Airframe engine and TurboShimmy INT4 KV cache compression for efficient GPU util...
ModelTC/LightLLM - LightLLM is a lightweight, scalable, and high-performance Python-based framework specifically designed for Large Language Model (LLM) inference and serving.
mosecorg/mosec - Mosec is a high-performance ML model serving framework written in Rust and Python that provides dynamic batching, pipelined stages, and CPU/GPU support for efficient online inference.
msoedov/langcorn - LangCorn is an API server that facilitates serving LangChain LLM applications and agents using FastAPI, simplifying deployment and providing a robust inference solution.
openinfer-project/openinfer - openinfer is a pure Rust and CUDA LLM inference engine designed for serving frontier-scale models without Python framework runtimes, offering high performance and low resource footprint.
openvinotoolkit/model_server - OpenVINO Model Server is a high-performance serving system for AI/ML models, optimized for OpenVINO and Intel architectures, offering efficient model inference via gRPC or REST APIs including OpenA...
pipeless-ai/pipeless - Pipeless is an open-source framework for building and deploying real-time computer vision applications, managing multimedia pipelines, model inference, and multi-stream processing.
predibase/lorax - LoRAX is an inference server designed to efficiently serve thousands of fine-tuned LLMs using LoRA adapters on a single GPU, optimizing cost, throughput, and latency.
roboflow/inference - Roboflow Inference is a self-hostable server for deploying and managing computer vision models and AI workflows on edge devices or other infrastructure, supporting various models and vision tasks.
ServerlessLLM/ServerlessLLM - ServerlessLLM is a system for efficiently serving, multiplexing, and fine-tuning large language models on shared GPUs with ultra-fast model loading and an OpenAI-compatible API.
superduper-io/superduper - SuperDuperDB is an end-to-end framework for building AI applications and agents by integrating AI models directly into databases, facilitating inference, RAG, and AI agent orchestration.
superlinked/sie - Superlinked Inference Engine (SIE) is an open-source inference server that unifies the serving of over 85 pre-configured models for embeddings, reranking, and extraction, supporting seamless deploy...
Tejas-TA/predikit - Predikit bridges traditional ML models (scikit-learn, XGBoost) with AI agents by automatically generating LLM-callable tools with typed I/O and OpenAI function schemas, simplifying model integration.
thu-pacman/chitu - Chitu is a high-performance inference framework for large language models, designed for efficiency, flexibility, and availability across various hardware platforms.
underneathall/pinferencia - Pinferencia is a lightweight Python library for deploying machine learning models as inference servers with auto-generated GUI and REST APIs, supporting various ML frameworks and the KServe API.
vllm-project/vllm - vLLM is a high-throughput and memory-efficient serving and inference engine for large language models, featuring PagedAttention, continuous batching, and extensive hardware and model support.
vllm-project/vllm-omni - vLLM-Omni is an extension of vLLM designed for efficient serving and inference of omni-modality AI models, encompassing text, image, video, and audio data processing.

↑ Back to TOC

Inference Optimization

Adlik/Adlik - Adlik is an end-to-end framework for optimizing and accelerating deep learning inference across cloud, edge, and device environments.
AI-Hypercomputer/JetStream - JetStream is an optimized engine for large language model (LLM) inference on XLA devices, primarily TPUs, focusing on throughput and memory efficiency.
aiptimizer/TurboOCR - TurboOCR is a high-performance, GPU-accelerated OCR server designed for fast and accurate text extraction from images and PDFs, leveraging TensorRT and PP-OCRv5.
alibaba/rtp-llm - RTP-LLM is Alibaba's high-performance inference engine for Large Language Models, designed to accelerate the serving of diverse LLM applications in production environments.
brontoguana/krasis - Krasis is a hybrid LLM runtime focused on efficiently running large Mixture-of-Experts models on consumer-grade NVIDIA GPUs with limited VRAM.
chengzeyi/ParaAttention - ParaAttention accelerates Diffusion Transformer (DiT) model inference through context parallel attention and dynamic caching, supporting Ulysses and Ring-style parallelism.
EfficientMoE/MoE-Infinity - MoE-Infinity is a PyTorch library for cost-effective, fast, and easy serving of Mixture-of-Experts (MoE) Large Language Models, optimizing inference on memory-constrained GPUs.
insight-platform/Savant - Savant is an open-source framework for building high-performance, real-time multimedia AI applications, specifically computer vision and video analytics pipelines, on NVIDIA hardware for both edge ...
intel/xFasterTransformer - xFasterTransformer is an optimized inference solution for Large Language Models on Intel Xeon platforms, leveraging hardware capabilities for high performance and scalability.
interestingLSY/swiftLLM - SwiftLLM is a compact, high-performance LLM inference system designed for research, offering vLLM-equivalent performance with a significantly smaller codebase for easy understanding and modification.
jd-opensource/xllm - xLLM is an efficient inference engine optimized for Chinese AI accelerators, providing high-performance, low-latency deployment for large language models within enterprise settings.
kossisoroyce/timber - Timber is an AOT compiler that transforms classical ML models (XGBoost, LightGBM, scikit-learn, CatBoost, ONNX) into native C99 inference code for extremely fast, portable, and low-overhead model s...
msnh2012/Msnhnet - Msnhnet is a lightweight C++ inference framework for deploying PyTorch models, supporting architectures such as YOLO and ResNet on CPU and GPU, with optimizations for embedded devices.
nobodywho-ooo/nobodywho - NobodyWho is an efficient, on-device inference engine that enables local execution of LLMs and SLMs across various platforms without requiring API keys.
ome-projects/ome - OME (Open Model Engine) is a Kubernetes operator designed for enterprise-grade management, deployment, and serving of Large Language Models (LLMs), optimizing resource utilization and supporting va...
open-compress/claw-compactor - Claw Compactor is an LLM token compression engine that uses a 14-stage Fusion Pipeline for content-aware, reversible compression to reduce LLM inference costs and optimize context windows.
ovg-project/kvcached - kvcached is a KV cache library that introduces virtual memory abstraction for LLM serving on shared GPUs, enabling elastic and demand-driven KV cache allocation for improved GPU utilization under d...
PaddlePaddle/Paddle.js - Paddle.js is a browser-based deep learning inference engine for Baidu PaddlePaddle, enabling model loading and execution directly in web environments with WebGL, WebGPU, and WebAssembly support.
PrithivirajDamodaran/FlashRank - FlashRank is an ultra-lite and super-fast Python library designed to re-rank search results in RAG pipelines using state-of-the-art LLMs and cross-encoders without requiring PyTorch or Hugging Face...
psmarter/mini-infer - mini-infer is a large language model (LLM) inference engine built from scratch, featuring optimized techniques like paged KV cache, continuous batching, chunked prefill, and speculative decoding fo...
qualcomm/ai-hub-models - Qualcomm AI Hub Models provides pre-optimized machine learning models for efficient deployment and inference on Qualcomm hardware, offering tools for compilation, quantization, profiling, and runni...
raketenkater/llm-server - An intelligent launcher and server for GGUF LLMs, automating multi-GPU tensor-splitting, MoE expert placement, hardware-matched downloads, and performance tuning for optimal inference.
SearchSavior/OpenArc - OpenArc is an inference engine for Intel devices, enabling the serving of various AI models like LLMs, VLMs, Whisper, and embedding models via OpenAI-compatible endpoints with OpenVINO acceleration.
ShannonAI/service-streamer - Service Streamer is middleware that optimizes deep learning model inference by batching discrete web requests into mini-batches, significantly boosting GPU utilization and overall system performanc...
siliconflow/onediff - OneDiff is an acceleration library for diffusion models, providing out-of-the-box performance optimizations for popular UIs and libraries like Hugging Face Diffusers and ComfyUI.
StarlightSearch/EmbedAnything - EmbedAnything is a highly performant, modular, and memory-safe Rust-based pipeline for generating multimodal embeddings and streaming them to vector databases, supporting various sources and infere...
Tencent/FeatherCNN - FeatherCNN is a high-performance, lightweight inference engine for convolutional neural networks, specifically optimized for ARM CPUs on mobile and embedded devices.
Tencent/Forward - Forward is a high-performance deep learning inference acceleration framework developed by Tencent, leveraging TensorRT for optimized deployment of models on NVIDIA GPUs with support for major frame...
vllm-project/vllm-ascend - vLLM Ascend is a community-maintained hardware plugin for seamlessly running vLLM and various large language models on Ascend NPUs, optimizing inference performance.
youssofal/MTPLX - MTPLX is a macOS-native inference engine for Apple Silicon that leverages multi-token prediction (MTP) for accelerated local LLM serving, offering significant speed improvements.
zhihu/ZhiLight - ZhiLight is a highly optimized LLM inference acceleration engine for Llama and its variants, developed by Zhihu and ModelBest Inc., designed for efficient deployment on various NVIDIA GPUs.
zilliztech/GPTCache - GPTCache is a library that creates a semantic cache for LLM queries, significantly reducing API costs and improving response times by storing and reusing previous LLM responses.

↑ Back to TOC

Vector Databases & Retrieval Infrastructure

activeloopai/deeplake - Deep Lake is an AI Data Runtime that provides a serverless multimodal datalake with integrated vector search, optimizing data storage and streaming for AI/ML applications and agentic RAG.
airweave-ai/airweave - Airweave is an open-source context retrieval layer for AI agents and RAG systems, unifying data from various sources into an LLM-friendly search interface.
alibaba/zvec - Zvec is an open-source, lightweight, and lightning-fast in-process vector database designed for embedding directly into applications, providing low-latency and scalable similarity search.
Anush008/fastembed-rs - FastEmbed-rs is a Rust library providing efficient local generation of text and image vector embeddings and document reranking for AI applications and RAG systems.
astronomer/ask-astro - Ask Astro is an open-source reference implementation of an LLM application architecture, providing a Q&A interface for Airflow and Astronomer, utilizing RAG, prompt orchestration, and feedback loops.
caura-ai/caura-memclaw - MemClaw is an open-source, governed, shared memory system for multi-tenant, multi-agent AI fleets, designed for scalable and efficient knowledge transfer and retrieval.
CaviraOSS/OpenMemory - A self-hosted, local-first cognitive memory engine for LLMs and AI agents, offering multi-sector memory, temporal reasoning, and explainable recall instead of just vector retrieval.
christopherkarani/Wax - Wax is a Swift-native memory engine for AI agents, offering on-device, single-file storage for documents, embeddings, and structured knowledge with blazing-fast RAG on Apple Silicon.
ClaudioDrews/memory-os - A 7-layer memory operating system for LLM agents like Hermes, providing persistent, context-aware memory using Qdrant for vector storage and various other mechanisms for structured facts, session r...
Corpus-OS/corpusos - Corpus OS provides a wire-first, vendor-neutral protocol suite and SDK for standardizing LLM, Embedding, Vector, and Graph infrastructure for AI frameworks.
danny-avila/rag_api - ID-based RAG FastAPI: An asynchronous and scalable FastAPI service for document indexing and retrieval using LangChain and PostgreSQL/pgvector, designed for targeted queries at the file level.
datastax/jvector - JVector is an advanced, embedded, graph-based approximate nearest neighbor (ANN) vector search engine for Java, optimized for large-scale, high-dimensional data retrieval.
devflowinc/trieve - Trieve is an all-in-one platform for semantic search, recommendations, and Retrieval-Augmented Generation (RAG) offered via API, featuring self-hosting, hybrid search, and bring-your-own-model capa...
different-ai/embedbase - Embedbase is an AI backend-as-a-service that provides a dead-simple API for LLM interaction and semantic search through hosted embeddings, supporting various LLM providers.
dingodb/dingo - DingoDB is a distributed multi-modal vector database offering unified SQL (MySQL-compatible) for structured and unstructured data, ensuring high concurrency and ultra-low latency.
divagr18/memlayer - Memlayer provides a plug-and-play, persistent memory layer for LLMs and AI agents, enabling intelligent context recall and knowledge extraction through hybrid vector and graph storage.
EmbeddedLLM/JamAIBase - JamAI Base is an open-source RAG backend platform with an intuitive spreadsheet-like UI, offering built-in LLM, vector embeddings, and reranker orchestration for AI application development.
endee-io/endee - Endee is a high-performance open-source vector database designed for AI search and retrieval workloads, supporting RAG, semantic search, and hybrid retrieval with optimized indexing and execution.
epsilla-cloud/vectordb - Epsilla is a high-performance, open-source vector database management system focused on scalable and cost-effective similarity search for embedding vectors.
getmetal/motorhead - Motorhead is an LLM memory and information retrieval server that provides API endpoints for managing conversational memory, summarization, and retrieval-augmented generation (RAG) through vector si...
giancarloerra/SocratiCode - SocratiCode is an open-source codebase context engine that provides deep semantic understanding of entire codebases for AI assistants, enabling hybrid search, dependency graphs, and impact analysis.
HelixDB/helix-db - HelixDB is a graph-vector database built in Rust, designed to unify data types like graph, vector, KV, document, and relational data for AI applications and knowledge graphs.
hora-search/hora - Hora is an efficient, Rust-based library offering a collection of approximate nearest neighbor search algorithms for high-performance similarity search.
infiniflow/infinity - Infinity is an AI-native database designed for LLM applications, offering fast hybrid search across dense vectors, sparse vectors, tensors, and full-text.
jina-ai/vectordb - A Pythonic vector database for efficient storage and retrieval of embeddings, leveraging DocArray and Jina for scalable solutions locally or in the cloud.
kantord/SeaGOAT - SeaGOAT is a local-first semantic code search engine that uses vector embeddings to enable natural language queries and regular expressions across your codebase without external API calls.
kelindar/search - A Go library for embedded vector search and semantic embeddings, using llama.cpp and GGUF BERT models, suitable for small to medium-scale applications with GPU acceleration.
lancedb/lancedb - LanceDB is an open-source, embedded, and cloud-native vector database designed for fast, scalable, and production-ready multimodal vector search, built on the Lance columnar format.
llm-tools/embedJs - EmbedJs is a Node.js RAG framework for building personalized LLM applications by segmenting data, generating embeddings, and integrating with vector databases for optimized retrieval.
mage0535/hermes-memory-installer - An agent-agnostic memory sidecar that provides persistent memory, layered recall, and knowledge graphing for AI coding agents, integrating with existing systems without modifying agent internals.
memvid/memvid - Memvid is a portable, single-file memory layer for AI agents, offering instant retrieval and long-term memory without needing complex RAG pipelines or server-based vector databases.
microsoft/SPTAG - SPTAG is a distributed approximate nearest neighbor (ANN) search library by Microsoft for large-scale vector search scenarios, offering high-quality vector index building, searching, and distribute...
milvus-io/milvus - Milvus is a high-performance, cloud-native vector database designed for scalable approximate nearest neighbor (ANN) vector search, efficiently organizing and searching vast amounts of unstructured ...
milvus-io/pymilvus - Python SDK for Milvus, an open-source vector database designed for AI applications, enabling seamless interaction for vector storage and similarity search.
myscale/MyScaleDB - MyScaleDB is a SQL vector database built on ClickHouse, designed for high-performance vector search, filtered search, and full-text search in scalable AI applications.
neuml/txtai - txtai is an all-in-one AI framework providing an embeddings database for semantic search and LLM orchestration capabilities like RAG and agentic workflows.
NeumTry/NeumAI - Neum AI is a data platform for managing large-scale vector embedding creation and synchronization to provide context for LLMs through Retrieval-Augmented Generation (RAG).
nuclia/nucliadb - NucliaDB is an AI search database for unstructured data, built for Retrieval-Augmented Generation (RAG), offering hybrid search with vector, full-text, and graph indexes.
oramasearch/orama - Orama is a JavaScript search engine providing full-text, vector, and hybrid search capabilities, designed for use in browsers, servers, or edge networks, and supporting RAG pipelines.
orneryd/NornicDB - NornicDB is a distributed graph and vector database with temporal MVCC, offering Neo4j Bolt/Cypher and Qdrant gRPC compatibility, designed for AI-native workloads like Graph-RAG and agent memory.
pathwaycom/llm-app - Ready-to-run cloud templates for building real-time RAG, AI pipelines, and enterprise search applications that synchronize with various live data sources.
paulpierre/markdown-crawler - A multithreaded web crawler that converts web pages into markdown files, specifically designed to preprocess data for LLM RAG applications.
philippgille/chromem-go - Chromem-go is an embeddable vector database for Go, offering a Chroma-like interface with zero third-party dependencies, designed for in-memory operation with optional persistence.
pixeltable/pixeltable - Pixeltable is a unified multimodal backend that integrates data storage, model execution, embedding indexing, and serving for AI data applications.
postgresml/postgresml - PostgresML is a PostgreSQL extension that integrates machine learning and AI capabilities directly into the database, enabling in-database inference, RAG pipelines, and vector search with GPU accel...
qdrant/qdrant - Qdrant is an open-source, high-performance vector similarity search engine and vector database designed specifically for AI applications, enabling fast storage, search, and management of vectors wi...
qdrant/qdrant-client - Python client library for the Qdrant vector search engine, facilitating interaction with Qdrant instances for vector storage, search, and remote inference capabilities.
rapidsai/cuvs - cuVS is a GPU-accelerated library providing state-of-the-art algorithms for vector similarity search and clustering, designed to simplify GPU usage in AI and data mining applications.
Restream/reindexer - Reindexer is an embeddable, in-memory, document-oriented database offering high-performance full-text search, k-nearest neighbors (KNN) search, and hybrid search capabilities.
run-llama/llama_index - LlamaIndex is an open-source data framework for building LLM applications by connecting custom data sources to large language models, focusing on data ingestion, indexing, and retrieval augmented g...
RyanCodrai/turbovec - TurboVec is a Rust-based approximate nearest neighbor (ANN) vector index with Python bindings, built on Google Research's TurboQuant algorithm for efficient, memory-optimized vector similarity search.
SeekStorm/SeekStorm - SeekStorm is a high-performance, Rust-native search engine offering sub-millisecond vector and lexical search, available as an in-process library and as a multi-tenant server.
StarTrail-org/LEANN - LEANN is an innovative, lightweight, and private vector database designed for personal AI, enabling RAG applications with significantly reduced storage requirements by recomputing embeddings on-dem...
supervc-stack/VectorChord - VectorChord is a PostgreSQL extension designed for scalable, high-performance, and cost-effective vector search, enabling efficient storage and retrieval of billions of vectors.
tantaraio/voy - Voy is a WASM vector similarity search engine implemented in Rust, built to be fast, tiny, and tree-shakable for nearest neighbor search on edge servers and in web applications.
Tencent/WeKnora - WeKnora is an open-source, LLM-powered knowledge framework for enterprise document understanding, semantic retrieval, and autonomous reasoning, featuring RAG, ReAct agents, and an auto-maintaining ...
tensorchord/pgvecto.rs - pgvecto.rs is a PostgreSQL extension written in Rust, purpose-built for scalable, low-latency, and hybrid-enabled vector similarity search directly within Postgres.
topoteretes/cognee - Cognee is an open-source AI memory platform that provides AI agents with persistent long-term memory through a self-hosted knowledge graph, combining vector embeddings and graph reasoning.
unum-cloud/USearch - USearch is a high-performance, compact, and broadly compatible single-file similarity search and clustering engine for vectors and texts, primarily focused on user-defined metrics with minimal depe...
unum-cloud/UStore - UStore is a multi-modal transactional database designed for AI and semantic search, featuring vector-search integration and APIs for various data types.
vearch/vearch - Vearch is a cloud-native distributed vector database designed for efficient similarity search of embedding vectors in AI applications, offering hybrid search, performance, scalability, and reliabil...
VectifyAI/PageIndex - PageIndex is a vectorless, reasoning-based RAG system that builds hierarchical tree indexes from documents and uses LLMs to reason over them for context-aware retrieval.
verygoodplugins/automem - AutoMem is a graph-vector memory service providing durable, relational, and context-aware long-term memory for AI assistants using a dual-storage layer of FalkorDB and Qdrant.
vespa-engine/vespa - Vespa is an AI search platform for serving and organizing vectors, tensors, text, and structured data, enabling real-time inference and retrieval at any scale.
weaviate/weaviate - Weaviate is an open-source, cloud-native vector database for semantic search, combining vector similarity search with keyword filtering, RAG, and reranking capabilities.
yoanbernabeu/grepai - grepai is a privacy-first CLI for semantic code search, enabling AI agents and developers to find relevant code by intent using vector embeddings, drastically reducing token usage.
zilliztech/attu - Attu is an AI-native GUI for managing Milvus vector databases, offering multi-cluster management, data exploration, vector search, an AI assistant, and monitoring tools.
zilliztech/claude-context - Claude Context provides semantic code search for AI coding agents, allowing them to access the entire codebase as context in a cost-effective manner using a vector database.
zilliztech/deep-searcher - DeepSearcher is an open-source tool that combines LLMs and vector databases to perform deep research, evaluation, and reasoning on private data, generating accurate answers and comprehensive reports.
zilliztech/VectorDBBench - VectorDBBench is a comprehensive benchmark tool for evaluating the performance and cost-effectiveness of various vector databases and cloud services across diverse scenarios.
Zleap-AI/SAG - SAG is an out-of-the-box document retrieval workbench based on the SAG RAG technique, offering a conversational interface, knowledge graph visualization, and advanced retrieval functionalities for ...

↑ Back to TOC

AI Orchestration & Deployment

Workflow Orchestration for AI

0xPlaygrounds/rig - Rig is a Rust library designed for building modular and scalable LLM-powered applications, offering a unified interface for multiple model providers and vector stores.
AgentEra/Agently - Agently is an AI application runtime framework for building reliable GenAI services, focusing on stable contracts, observable execution, and restart-safe workflow boundaries for LLM applications.
apache/burr - Apache Burr is a Python framework for building and operating stateful AI applications like chatbots and agents, providing state management, tracing, and persistence.
bosun-ai/swiftide - Swiftide is an opinionated Rust framework for building LLM applications, offering an agent harness, typed task graphs for orchestration, and streaming RAG pipelines for indexing and querying.
chatchat-space/LangGraph-Chatchat - LangGraph-Chatchat is an open-source, offline-deployable RAG and Agent application built with LangGraph, supporting open-source LLMs and vector databases for knowledge-based Q&A.
cheshire-cat-ai/core - Cheshire Cat is an open-source framework for building and operating custom AI agents and conversational layers, featuring an API-first design, RAG integration, plugin extensibility, and support for...
ComposioHQ/composio - Composio provides SDKs for building AI agents capable of interacting with over 1000 tools, offering capabilities like tool search, context management, and authentication across various AI frameworks.
covibes/zeroshot - Zeroshot is an open-source CLI that orchestrates multi-agent AI coding workflows to autonomously implement, review, test, and verify code changes for software engineering tasks.
dataelement/bisheng - BISHENG is an open LLM application DevOps platform focusing on enterprise scenarios, providing comprehensive features for GenAI workflows, RAG, agents, model management, evaluation, and enterprise-...
dynamiq-ai/dynamiq - Dynamiq is an orchestration framework for developing and streamlining agentic AI and LLM applications, specializing in RAG and multi-agent workflows.
framerslab/agentos - AgentOS is a TypeScript framework for building AI agents with cognitive memory, runtime tool forging, multi-agent orchestration, and support for 11 LLM providers.
GoogleCloudPlatform/agent-starter-pack - Agent Starter Pack is a Python package providing production-ready templates for GenAI agents on Google Cloud, focusing on infrastructure, CI/CD, observability, and security.
Haohao-end/openagent - OpenAgent is a full-stack platform for building, orchestrating, publishing, and operating AI applications, with visual workflows, dataset management, and multi-model support.
holon-run/holon - Holon is a local workbench that provides a continuous, event-driven execution environment for AI agents, allowing them to perform tasks that span multiple sessions and human interactions.
iflytek/astron-agent - Astron Agent is an enterprise-grade platform for building, orchestrating, and deploying AI agent applications with integrated RPA, model management, and high-availability features.
jaylfc/taOS - taOS is a self-hosted, distributed AI agent operating system designed for consumer hardware, providing a web desktop, app store, agent deployment, and a framework-agnostic AI memory system for orch...
Josh-XT/AGiXT - AGiXT is a comprehensive AI automation platform that orchestrates instruction management and complex task execution across diverse AI providers, featuring adaptive memory and a versatile plugin sys...
julep-ai/julep - Julep is an open-source platform for building, orchestrating, and self-hosting agent-based AI workflows with persistent memory and tool integration.
LiteLLM-Labs/litellm-agent-control-plane - LiteLLM Agent Control Plane provides a unified interface and management for deploying, running, and orchestrating various AI agents across different runtimes.
memodb-io/Acontext - Acontext enhances AI agent learning by automatically capturing successful agent interactions as "skill memory" in human-readable Markdown files, facilitating reuse and introspection.
neurocult/agency - Agency is a Go library for building autonomous AI agents and generative AI applications in clean, idiomatic Go, simplifying interactions with LLMs and other generative AI models.
operand/agency - Agency is a minimal Python framework that provides an Actor model for building and operating agent-integrated systems, enabling flexible and scalable communication between AI agents and traditional...
plastic-labs/honcho - Honcho is memory infrastructure for building stateful AI agents, enabling them to understand and retain information about people, agents, groups, projects, and ideas over time.
SmythOS/sre - SmythOS is an open-source runtime environment (SRE) and SDK for building, running, and managing production-ready AI agents, providing OS-level abstractions for AI resources.
stoyan-stoyanov/llmflows - LLMFlows is a Python framework for building, operating, and debugging transparent LLM applications with explicit control over prompts, LLM calls, and dependencies.
TencentCloud/TencentDB-Agent-Memory - TencentDB Agent Memory is a memory management system for AI agents, featuring a 4-tier progressive pipeline to improve efficiency and reasoning by categorizing and condensing agent memories without...
ThousandBirdsInc/chidori - Chidori is a reactive runtime and agent framework for building durable, replayable, and resumable AI agents using plain async TypeScript, featuring automatic checkpointing and replay with zero LLM ...
TransformerOptimus/SuperAGI - SuperAGI is an open-source framework designed for building, managing, and running autonomous AI agents, offering tools for deployment, capability extension, and performance monitoring.
trypromptly/LLMStack - LLMStack is a no-code platform for building and deploying generative AI agents and applications by chaining multiple LLMs, integrating custom data, and connecting to business processes, offering bo...
ultracontext/ultracontext - UltraContext provides open-source context infrastructure for AI agents, enabling real-time capture, sharing, and versioning of conversational context across different agents and LLM frameworks.
unohee/OpenSwarm - OpenSwarm is an autonomous AI agent orchestrator that uses multiple LLMs (Claude, GPT, local models) to process Linear issues, perform coding tasks, and maintain long-term memory for repository learning.

Model Monitoring & Governance

Model & Data Drift Monitoring

deepchecks/deepchecks - Deepchecks is an open-source platform providing continuous validation for AI and ML models and data from research to production, focusing on testing, CI, and monitoring.
MAIF/eurybia - Eurybia is a Python library for detecting data and model drift, validating data, and generating comprehensive HTML reports for AI governance and model monitoring.
NannyML/nannyml - NannyML is an open-source Python library for post-deployment ML model monitoring, offering performance estimation, data drift detection, and intelligent linking of drift alerts to performance changes.
squaredev-io/whitebox - Whitebox is an open-source, end-to-end ML monitoring platform with edge capabilities that integrates with Kubernetes, focusing on classification and regression model metrics, data/model drift, and ...

AI Governance & Compliance

kitops-ml/kitops - KitOps is a CNCF open-source tool for packaging, versioning, and securely sharing AI/ML models, datasets, code, and configuration into an OCI Artifact for simplified deployment and governance.
semantica-agi/semantica - Semantica is an AI-native knowledge graph intelligence framework that provides an auditable, governed, and explainable context and accountability layer for AI agents and LLM systems.
verifywise-ai/verifywise - VerifyWise is an AI governance platform providing tools for LLM evaluation, risk management, and compliance with AI regulations including the EU AI Act, ISO 42001, and NIST AI RMF.
