arXiv 2026 · January 5, 2026

Orchestral AI: A Lightweight Framework for Provider-Agnostic LLM Agents

Alexander Roman et al.

Orchestral addresses the agent framework dilemma: vendor lock-in through provider-specific SDKs versus complex multi-package ecosystems that obscure control flow. The framework provides a unified Python interface for building LLM agents across all major providers while maintaining synchronous execution for deterministic behavior and straightforward debugging. Automatic tool schema generation from Python type hints, a modular hook system for safety and workflow control, and features like cost tracking and LaTeX export make it suitable for both production deployment and scientific research.

Categories: AI Agents, Developer Tools

Key Findings

  1. Write once, run anywhere: switch between OpenAI, Anthropic, Google, Mistral, Groq, or local Ollama models by changing one line of code
  2. Type-safe tools from Python hints: the @define_tool() decorator auto-generates JSON schemas, eliminating handwritten descriptors
  3. Synchronous by design: deterministic execution with full stack traces makes debugging straightforward, unlike async event-loop frameworks
  4. Deploys anywhere Python runs: no database servers, no message queues, just pip install. Works on serverless, Raspberry Pi, or embedded in research projects
  5. Production and research features combined: cost tracking, conversation persistence, LaTeX export, and multi-layered security in one lightweight package
  6. Battle-tested in science: powering HEPTAPOD (particle physics) and ASTER (exoplanet research) at universities and national labs

TL;DR
  1. The Problem. LLM agent frameworks force you to choose: vendor lock-in (Claude SDK, OpenAI Assistants) or sprawling complexity (LangChain's multi-package ecosystem). Both make debugging painful and deployment complicated.

  2. The Solution. Orchestral provides a single Python package with a unified interface across all major providers. Synchronous execution means deterministic behavior and full stack traces. Type hints auto-generate tool schemas.

  3. The Result. One codebase works with OpenAI, Anthropic, Google, Mistral, Groq, AWS Bedrock, and local Ollama models. Already powering research at particle physics labs and exoplanet research teams.

  4. The Safety. Built-in hooks prevent dangerous commands (like rm -rf) and enforce "read-before-edit" policies—critical for autonomous coding agents operating on real codebases.

Research Overview

Every LLM agent framework makes the same promise: simplify building AI agents. But developers face a persistent dilemma.

Option A: Provider-specific SDKs. Anthropic's Claude SDK, OpenAI's Assistants API. Deep integration with one provider, but your code is locked in. Want to compare Claude vs GPT-4 on your task? Rewrite everything.

Option B: General-purpose frameworks. LangChain, CrewAI, AutoGPT. Multi-provider support, but at a cost. LangChain alone splits across langchain-core, langchain-community, and dozens of integration packages. Control flow hides behind abstractions. Debugging means navigating layers of callbacks and state machines.

What is tool calling?

Tool calling (or function calling) lets an LLM invoke external functions during a conversation. The model decides when to call a tool, generates the arguments, and receives the result. This enables agents to search the web, run code, query databases, or interact with any API.

Orchestral takes a different approach: one lightweight Python package that works with every major provider while keeping execution synchronous and debuggable. The Agent object is the only orchestrator. Control flow is explicit. A single pip install gets you started.

The framework powers real research. HEPTAPOD uses it for particle physics simulations at Fermilab. ASTER uses it for exoplanet atmosphere analysis. Both required the combination that Orchestral provides: provider flexibility for cost optimization, reproducible workflows for publication, and a deployment model that embeds into existing research codebases.

The Framework Problem

The LLM agent landscape has fragmented into competing approaches, each with significant tradeoffs.

General-purpose orchestration

LangChain dominates this category with hundreds of integrations across multiple packages. The architecture uses chains for sequential processing and LangGraph for complex workflows. This modularity enables flexibility but introduces complexity. Developers must navigate abstraction layers, understand state machine semantics, and manage dependencies across packages.

Debugging a LangChain agent often means tracing through callbacks, inspecting intermediate states, and understanding how data flows between chain components. For researchers who need to know exactly what code executes and when, this opacity is problematic.

Multi-agent systems

CrewAI focuses on orchestrating role-playing agents that collaborate on tasks. The role-based architecture simplifies multi-agent coordination but reveals limitations in practice: poor logging infrastructure makes debugging difficult, customization options are constrained, and transitioning to production requires implementing monitoring independently.

Provider-specific SDKs

Anthropic's Claude SDK provides deep integration with Claude models, including automatic context compaction and MCP extensibility. The SDK powers Claude Code for IDE integration. But this tight coupling creates constraints: agents become tied to development environments, deployment requires IDE infrastructure, and embedding agents in research projects means working around the IDE-centric architecture.

The commitment to a single provider eliminates the ability to compare models or optimize costs across vendors.

The cost of complexity

AutoGPT pioneered fully autonomous agents with impressive capabilities but a heavy footprint. The split between FastAPI backend, PostgreSQL database, and Next.js frontend creates deployment complexity unsuitable for simple use cases or embedding in research projects.

For scientific computing, these frameworks present particular challenges. Reproducibility demands understanding exactly what code executes. Long-running computations require robust error handling. Publication workflows need documentation and export capabilities. Most frameworks prioritize enterprise deployment over research iteration.

Architecture

Orchestral centers everything on the Agent object. The Agent contains three core components: LLM, Tools, and Context. The framework's modular structure separates provider integration, tool execution, conversation management, and user interfaces.

[Figure] Orchestral Agent Architecture: a lightweight agent with provider-agnostic LLM, type-safe tools, and hook system

The Agent as orchestrator

The Agent manages the conversation loop:

  1. Receive user message
  2. Send to LLM with available tools
  3. If LLM returns tool calls, execute them through the hook system
  4. Send tool results back to LLM
  5. Repeat until LLM returns a final response

This loop is explicit. No hidden state machines, no callback chains. A debugger can step through every line.

from orchestral import Agent, Claude
 
agent = Agent(
    llm=Claude(model='claude-sonnet-4-0'),
    tools=[analyze_data, search_web],
    system_prompt="You are a research assistant"
)
 
# Simple text interaction
response = agent.send_text_message("Hello")
 
# Full tool-calling loop
response = agent.run(
    "Analyze the dataset and summarize findings",
    max_iterations=8
)
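
Under the hood, the run() loop is conceptually simple. The sketch below is illustrative only; the method and attribute names (add_user_message, llm.call, execute_tool, add_tool_result, response.tool_calls) are assumptions, not the framework's actual internals:

# Simplified, hypothetical sketch of the Agent.run() conversation loop
def run(self, message, max_iterations=8):
    self.context.add_user_message(message)                  # assumed Context helper
    response = None
    for _ in range(max_iterations):
        response = self.llm.call(self.context, self.tools)  # assumed provider call
        if not response.tool_calls:
            return response                                  # final answer: loop ends
        for call in response.tool_calls:
            result = self.execute_tool(call)                 # pre-hooks, tool, post-hooks
            self.context.add_tool_result(call, result)       # results go back to the LLM
    return response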

Context management

The Context object manages conversation history and validates message sequences. LLM APIs are strict about tool call and result matching. Missing a tool result or providing results without calls causes cryptic errors. Context automatically detects and cleans orphaned results before API calls.

Conversations serialize to a provider-agnostic format. A conversation saved with Claude can be loaded and continued with GPT-4:

# GPT and Context are assumed to be importable from the top-level package,
# mirroring the Agent/Claude imports above
from orchestral import Agent, GPT, Context

# Save conversation
agent.context.save_json("conversation.json")

# Load and continue with a different provider
context = Context.load_json("conversation.json")
agent = Agent(
    llm=GPT(model='gpt-4'),
    tools=tools,
    context=context
)

Provider Agnosticism

Orchestral supports all major LLM providers through a unified interface. The LLM object provides an abstract base class that each provider implements.

# Switch providers by changing one line
llm = Claude(model='claude-sonnet-4-0')
# llm = GPT(model='gpt-4')
# llm = Gemini(model='gemini-2.0-flash')
# llm = Ollama(model='llama3.1:70b')
# llm = Groq(model='llama-3.1-70b-versatile')
# llm = Mistral(model='mistral-large')
 
agent = Agent(llm=llm, tools=tools)
# Identical API regardless of provider

Each provider implements the same core methods:

  • Input Processing — Convert unified Context to provider format
  • API Calls — Synchronous and streaming invocation
  • Output Processing — Parse responses to unified Response objects
  • Tool Schema Conversion — Translate tool definitions to provider format

Provider-specific complexity stays encapsulated. Application code never imports provider-specific classes directly.

The business value is concrete: prototype with free local models (Llama 3 via Ollama), then deploy with GPT-4 or Claude for production—without rewriting a single line of code. No vendor lock-in means you can always switch to whoever offers the best price-performance ratio.

Cost-aware workflows

Different models have different costs. Orchestral's provider abstraction enables cost-conscious research workflows:

# Explore with local model (zero cost)
explorer = Agent(
    llm=Ollama(model='llama3.1:70b'),
    tools=tools
)
explorer.run("Analyze dataset, plan approach")
 
# Switch to powerful model, reusing context
context = explorer.context
reasoner = Agent(
    llm=Claude(model='claude-opus-4-0'),
    tools=tools,
    context=context
)
reasoner.run("Execute the planned analysis")

Use local models for drafts and refinement. Use expensive models for final results. The context transfers seamlessly.

Privacy-first deployments: Using Ollama enables fully offline, air-gapped agents. For sensitive data—medical records, proprietary code, financial documents—no data ever leaves your infrastructure. The same agent code runs on a laptop with no internet connection, critical for regulated industries like healthcare and finance.

Synthetic LLMs

The CheapLLM router automatically selects the cheapest available provider based on which API keys are configured. Applications deployed to users with varying key configurations get a working model automatically, without hardcoding a provider.
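
A minimal sketch of how this might look in application code (the CheapLLM import path and constructor are assumed here):

from orchestral import Agent, CheapLLM  # import path assumed

# CheapLLM routes to the cheapest provider for which an API key is configured
agent = Agent(llm=CheapLLM(), tools=tools)
response = agent.run("Summarize the latest results")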

Type-Safe Tools

Tools enable LLMs to invoke external functions. Orchestral provides two approaches for tool definition.

Decorator-based tools

The @define_tool() decorator turns any Python function into a tool:

from orchestral import define_tool
 
@define_tool()
def calculate_energy(mass: float, c: float = 299792458.0):
    """Calculate relativistic energy E=mc²
 
    Args:
        mass: Mass in kilograms
        c: Speed of light in m/s
    """
    return mass * c ** 2

The framework automatically:

  1. Extracts parameter types from Python type hints
  2. Generates JSON schemas compatible with each provider
  3. Validates inputs via Pydantic
  4. Converts between provider-specific formats
  5. Handles execution and error formatting

No handwritten JSON schemas. The docstring becomes the tool description. Type hints become the parameter schema.

Type hints → JSON schema (automatic):

# Your Python function                    # Generated JSON Schema
def calculate_energy(                     # {
    mass: float,                          #   "properties": {
    c: float = 299792458.0                #     "mass": {"type": "number"},
):                                        #     "c": {"type": "number", "default": 299792458.0}
    """Calculate E=mc²"""                 #   },
    return mass * c ** 2                  #   "required": ["mass"]
                                          # }

The translation is automatic: float → "number", str → "string", list[int] → {"type": "array", "items": {"type": "integer"}}. No manual schema maintenance.

[Figure] Type-Safe Tool Generation: Python type hints automatically become JSON schemas

Class-based tools

For stateful tools, extend BaseTool:

from orchestral.tools.base import BaseTool, RuntimeField
 
class DataAnalysisTool(BaseTool):
    """Analyze numerical dataset"""
    data_path: str | None = RuntimeField(
        description="Path to CSV data file"
    )
    method: str = RuntimeField(
        default="mean",
        description="Analysis method"
    )
 
    def _run(self) -> str:
        import pandas as pd
        df = pd.read_csv(self.data_path)
        result = getattr(df, self.method)()
        return f"Analysis result: {result}"

RuntimeField defines parameters the LLM provides (appearing in tool schemas). StateField maintains internal state invisible to the LLM. This separation enables tools to maintain context across calls without exposing implementation details.
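
A minimal sketch of the pattern, assuming StateField is importable from the same module as RuntimeField and accepts a default argument:

from orchestral.tools.base import BaseTool, RuntimeField, StateField  # StateField location assumed

class CounterTool(BaseTool):
    """Echo a label and count how many times the tool has been called"""
    label: str = RuntimeField(description="Label to echo back")  # visible to the LLM
    call_count: int = StateField(default=0)                      # internal state, hidden from the LLM

    def _run(self) -> str:
        self.call_count += 1
        return f"{self.label}: call #{self.call_count}"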

Built-in tools

Orchestral includes production-ready tools covering common workflows:

  • Filesystem: ReadFile, WriteFile, EditFile, ListDirectory, FileSearch
  • Execution: RunCommand (persistent shell), RunPython
  • Web: WebSearch, ArxivTool
  • Utilities: TodoRead/TodoWrite, DisplayImage

Each tool includes comprehensive error handling, input validation, and safety mechanisms.

The Hook System

Hooks intercept tool execution at two points: before execution (pre-hooks) and after (post-hooks). Pre-hooks can approve, reject, or modify tool calls. Post-hooks transform outputs.

Security hooks

from orchestral.hooks import (
    SafeguardHook,      # LLM-based safety analysis
    UserApprovalHook,   # Three-tier classification
    DangerousCommandHook # Pattern-based blocking
)
 
agent = Agent(
    llm=llm,
    tools=tools,
    tool_hooks=[
        DangerousCommandHook(),  # Block rm -rf, eval()
        UserApprovalHook(),       # Ask for approval
        SafeguardHook()           # LLM judge
    ]
)

Hooks chain in sequence. The first rejection short-circuits execution.

[Figure] Hook Security Pipeline: multi-layered safety gates before tool execution

Output hooks

from orchestral.hooks import TruncateOutputHook, SummarizeOutputHook
 
agent = Agent(
    llm=llm,
    tools=[ReadFileTool()],
    tool_hooks=[
        TruncateOutputHook(max_chars=5000),
        SummarizeOutputHook(llm=cheap_llm)
    ]
)

Tool outputs can be enormous. Post-hooks truncate or summarize to preserve context window for meaningful information.

Custom hooks

Implement domain-specific workflows:

# ToolHook and ToolHookResult are assumed to live in orchestral.hooks,
# alongside the built-in hooks imported above
from orchestral.hooks import ToolHook, ToolHookResult

class BudgetControlHook(ToolHook):
    """Reject operations once the conversation exceeds a cost budget"""
    def __init__(self, max_cost: float):
        self.max_cost = max_cost

    def before_call(self, tool, context):
        if context.total_cost > self.max_cost:
            return ToolHookResult(
                approved=False,
                message=f"Budget exceeded: ${context.total_cost:.2f}",
                should_interrupt=True
            )
        return ToolHookResult(approved=True)
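
Attaching the custom hook follows the same pattern as the built-in hooks shown above:

agent = Agent(
    llm=llm,
    tools=tools,
    tool_hooks=[BudgetControlHook(max_cost=5.00)]
)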

Read-before-edit safety

The EditFileTool tracks which files have been read in the current conversation. Attempting to edit a file that hasn't been read triggers rejection:

EditFile: example.txt
Error: File Not Read
Reason: You must read the file before editing it
Use read_file to see the current content first

This prevents blind overwrites without special prompting. The safety mechanism operates through metadata in the Context layer.
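
Conceptually, the check amounts to consulting that metadata before applying an edit. The sketch below is illustrative only; the metadata key and function shape are assumptions, not the framework's actual implementation:

def check_read_before_edit(context_metadata: dict, file_path: str) -> str | None:
    """Return an error message if the file has not been read yet, else None."""
    files_read = context_metadata.get("files_read", set())  # assumed metadata key
    if file_path not in files_read:
        return (
            "Error: File Not Read\n"
            "Reason: You must read the file before editing it\n"
            "Use read_file to see the current content first"
        )
    return None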

Research Features

Orchestral includes features specifically designed for scientific computing workflows.

Publication-ready output

The web UI includes copy buttons that export conversation snippets as LaTeX code, saving researchers hours of formatting time for papers and supplementary materials. Add \input{orchestral.tex} to your paper's preamble, then paste formatted conversations directly:

User: I just arrived in Paris, do I need my coat?

Agent: GetWeather(location="Paris, France")

Temperature: 26°C, Clear Skies

You probably won't need your coat. It should be pleasant!

The module provides colored boxes matching the UI appearance, enabling reproducible records of agent-assisted analysis integrated directly into papers.

Persistent terminal sessions

The RunCommandTool preserves working directory and environment variables across calls:

agent.run("cd /data && ls")
agent.run("pwd")  # Returns: /data (cd persisted)
agent.run("export DATA_PATH=/data/exp1")
agent.run("echo $DATA_PATH")  # Returns: /data/exp1

This matches how humans naturally use command-line interfaces. Complex workflows requiring directory navigation, environment setup, and stateful operations work as expected.

Subagents

Orchestral supports agentic tools: tools that contain their own Agent instance. The BaseSubagent class enables hierarchical task decomposition.

Hierarchical Delegation vs. Multi-Agent Chat

Orchestral uses hierarchical delegation: a manager agent calls a subordinate agent as a tool, gets a result, and continues. This differs from "collaborative chat" frameworks (like CrewAI) where multiple agents discuss in a shared conversation. Hierarchical delegation is simpler to debug—you see exactly which agent did what—and preserves context better since results are summarized before returning.

class DocumentAnalyzer(BaseSubagent):
    """Extract structured information from documents"""
 
    def _run(self, query: str) -> str:
        # Subagent has its own LLM and tools
        result = self.agent.run(
            f"Analyze documents for: {query}"
        )
        return result.text

Subagents explore and filter content, returning only essential information to the parent agent's context. This improves information density and preserves the parent's context window for high-level reasoning.
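
A usage sketch follows; the subagent's constructor arguments are assumed and may differ from the actual signature:

# The subagent is handed to a parent agent like any other tool
analyzer = DocumentAnalyzer()  # carries its own LLM and tools; constructor arguments assumed

manager = Agent(
    llm=Claude(model='claude-sonnet-4-0'),
    tools=[analyzer],
    system_prompt="Delegate document analysis to the analyzer tool"
)
manager.run("Extract the key results from the collected reports")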

MCP integration

The Model Context Protocol (MCP) standardizes tool sharing across AI applications. Orchestral's MCP support enables:

  • Use MCP servers: Connect to databases, APIs, and services
  • Share tools: Expose Orchestral tools as MCP servers
  • Ecosystem access: Leverage the growing MCP ecosystem

MCP tools are adapted to the BaseTool interface, ensuring they work identically across all providers.

Real-World Deployments

Orchestral powers research at multiple institutions.

HEPTAPOD: Particle physics

The HEP Toolkit for Agentic Planning, Orchestration, and Deployment applies Orchestral to Beyond the Standard Model physics at Fermilab. Agents manage Monte Carlo simulations—the computationally intensive statistical method that generates millions of simulated particle collision events. An agent might invoke MadGraph to generate event files, Pythia8 for particle showering, and ROOT for statistical analysis, chaining these heavyweight tools in a reproducible workflow. The framework provides a structured, auditable layer between researchers, LLMs, and computational infrastructure.

ASTER: Exoplanet research

The Agentic Science Toolkit for Exoplanet Research combines tools for downloading planetary parameters from the NASA Exoplanet Archive, generating TauREx forward models, and performing Bayesian retrieval analysis. A case study of WASP-39b demonstrated ASTER's ability to rapidly execute retrieval analysis, recovering atmospheric parameters reported in the literature.

Both projects required:

  • Provider flexibility for cost optimization
  • Reproducible workflows for publication
  • Lightweight deployment that embeds in existing codebases
  • Cost tracking for computational budgets

Framework Comparison

[Figure] Orchestral vs. popular agent frameworks on key dimensions

Implementation Guide

Installation

pip install orchestral-ai

Requires Python 3.10+ (for modern type hint syntax like str | None and list[int]).

Quick start

from orchestral import Agent, Claude, define_tool
 
@define_tool()
def search_papers(query: str, limit: int = 5):
    """Search academic papers on arXiv"""
    # Implementation here
    pass
 
agent = Agent(
    llm=Claude(model='claude-sonnet-4-0'),
    tools=[search_papers],
    system_prompt="You are a research assistant"
)
 
response = agent.run("Find recent papers on LLM agents")
print(response.text)
print(f"Cost: ${agent.context.total_cost:.4f}")

Provider configuration

Set API keys as environment variables:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...
export GROQ_API_KEY=...

Or configure programmatically:

llm = Claude(
    model='claude-sonnet-4-0',
    api_key='sk-ant-...'
)

Streaming

Streaming uses synchronous generators:

for chunk in agent.stream_text_message(prompt):
    print(chunk, end='', flush=True)

The framework handles chunk collection, tool call aggregation, and building the final Response object. No async/await complexity.

Why Synchronous Matters

Python's async/await creates "event loop nightmares" in Jupyter notebooks—the primary environment for data scientists. Nested event loops fail silently, stack traces become unreadable, and debugging requires understanding asyncio internals. Orchestral's synchronous design gives you normal Python stack traces: when something breaks, you see exactly which line caused it.

Adding custom tools

Wrap existing functions with the decorator:

from orchestral import define_tool
from my_database import query_database, format_results  # format_results assumed to live alongside query_database

@define_tool()
def search_database(query: str, limit: int = 10):
    """Search the internal database

    Args:
        query: SQL-like query string
        limit: Maximum results to return
    """
    results = query_database(query, limit=limit)
    return format_results(results)

Deployment options

Orchestral deploys anywhere Python runs.

[Figure] Deploys anywhere Python runs: no databases, no message queues, just pip install

The deployment footprint is a Python interpreter and pip. No database servers, no message queues, no persistent infrastructure required.

Limitations

The paper acknowledges several constraints.

No automatic context compaction

Users must manually manage context when approaching model limits. LLM-based summarization of conversation history is planned but not yet implemented.
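
Until then, one workaround is to summarize the conversation and restart from that summary. A rough sketch using only the APIs shown above (it assumes send_text_message returns a Response with a .text attribute, as run() does):

# Ask the current agent to compress the conversation into a summary
summary = agent.send_text_message(
    "Summarize this conversation: key results, decisions, and open tasks"
).text

# Start a fresh agent whose system prompt carries the summary forward
fresh_agent = Agent(
    llm=Claude(model='claude-sonnet-4-0'),
    tools=tools,
    system_prompt=f"You are a research assistant. Prior session summary:\n{summary}"
)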

Sequential tool execution

Tools execute sequentially within each turn. Parallel tool execution could improve performance for independent operations but adds complexity that conflicts with the framework's synchronous design philosophy.

Single-agent focus

Orchestral focuses on single-agent workflows. Multi-agent collaboration patterns like CrewAI's role-based teams require manual orchestration. The architecture supports this through subagents, but no built-in multi-agent primitives exist.

Multimodal limitations

Image understanding is supported (models can analyze images), but image generation and multimodal tool outputs are not yet integrated.


Paper: arXiv:2601.02577
Authors: Alexander Roman, Jacob Roman (Orchestral AI)
Code: github.com/orchestral-ai/orchestral
Documentation: orchestral-ai.com

Authors

Alexander Roman (Orchestral AI), Jacob Roman (Orchestral AI)

Cite this paper

Alexander Roman, Jacob Roman (2026). Orchestral AI: A Lightweight Framework for Provider-Agnostic LLM Agents. arXiv 2026.
