llama-cpp-agent

Build agents with llama.cpp, vLLM, and TGI

About

The llama-cpp-agent framework is a tool designed to simplify interactions with Large Language Models (LLMs). It provides interfaces for chatting with LLMs, executing function calls, generating structured output, performing retrieval-augmented generation (RAG), and processing text with agentic chains and tools. The framework integrates with the llama.cpp server, llama-cpp-python, and OpenAI-compatible endpoints that support grammar-based sampling, offering flexibility and extensibility.

Key Features

- Simple Chat Interface: Engage in seamless conversations with LLMs.
- Structured Output: Generate structured output (objects) from LLMs.
- Single and Parallel Function Calling: Execute functions using LLMs.
- RAG (Retrieval-Augmented Generation): Perform retrieval-augmented generation with ColBERT reranking.
- Agent Chains: Process text using agent chains with tools, supporting Conversational, Sequential, and Mapping Chains.
- Compatibility: Works with llama-index tools and OpenAI tool schemas.
- Flexibility: Suitable for applications ranging from casual chat to domain-specific function execution.
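To illustrate the OpenAI tool-schema compatibility mentioned above, here is a minimal, self-contained sketch of how a plain Python function can be turned into an OpenAI-style tool schema of the kind such frameworks consume. The helper and function names below are illustrative only and are not part of the llama-cpp-agent API.

```python
import inspect
import json
from typing import get_type_hints

# Map a few common Python annotations to JSON-schema type names.
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}


def to_openai_tool_schema(func):
    """Build an OpenAI-style tool schema from a plain Python function.

    Illustrative helper, not part of llama-cpp-agent itself.
    """
    hints = get_type_hints(func)
    hints.pop("return", None)
    properties = {
        name: {"type": PY_TO_JSON.get(tp, "string")}
        for name, tp in hints.items()
    }
    # Parameters without a default value are marked as required.
    required = [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_weather(city: str, unit: str = "celsius") -> str:
    """Return the current weather for a city."""
    return f"Sunny in {city}"


schema = to_openai_tool_schema(get_weather)
print(json.dumps(schema, indent=2))
```

A schema like this can then be handed to any endpoint or framework that accepts OpenAI tool definitions; the model's structured tool-call output maps back onto the original function's parameters.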