Operations & Observability
Clasper Ops provides comprehensive operational tooling for running AI agents in production. This document covers tracing, replay, the evaluation framework, workspace versioning, and the skill registry.
Tracing
Every agent execution produces a detailed trace for debugging and analysis.
Trace Structure

    interface AgentTrace {
      id: string; // UUID v7 (time-ordered)
      tenantId: string;
      workspaceId: string;
      agentRole?: string;
      startedAt: string;
      completedAt?: string;
      durationMs?: number;
      workspaceHash?: string; // SHA256 of workspace at execution time
      skillVersions: Record<string, string>;
      model: string;
      provider: string;
      input: {
        message: string;
        messageHistory: number;
      };
      steps: TraceStep[];
      output?: {
        message: string;
        toolCalls: ToolCallTrace[];
      };
      usage: {
        inputTokens: number;
        outputTokens: number;
        totalCost: number;
      };
      redactedPrompt?: string;
      error?: string;
      labels?: Record<string, string>;
      taskId?: string;
      documentId?: string;
      messageId?: string;
    }

Trace Steps
Traces record every significant step:

    type TraceStep = LLMCallStep | ToolCallStep | ToolResultStep | ErrorStep;

    interface LLMCallStep {
      type: 'llm_call';
      timestamp: string;
      durationMs: number;
      data: {
        model: string;
        provider: string;
        inputTokens: number;
        outputTokens: number;
        cost: number;
        hasToolCalls: boolean;
        finishReason?: string;
      };
    }

    interface ToolCallStep {
      type: 'tool_call';
      timestamp: string;
      durationMs: number;
      data: {
        toolCallId: string;
        toolName: string;
        arguments: unknown; // Redacted
        permitted: boolean;
        permissionReason?: string;
      };
    }

    interface ToolResultStep {
      type: 'tool_result';
      timestamp: string;
      durationMs: number;
      data: {
        toolCallId: string;
        toolName: string;
        success: boolean;
        result?: unknown; // Redacted
        error?: string;
      };
    }

    interface ErrorStep {
      type: 'error';
      timestamp: string;
      durationMs: number;
      data: {
        code: string;
        message: string;
        recoverable: boolean;
      };
    }

Correlation IDs
Every request receives a trace ID that:
- Is returned in the response (trace_id field)
- Is included in the X-Trace-Id response header
- Can be passed in via the X-Trace-Id request header for correlation
- Links all audit log entries for the request
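The correlation rules above can be sketched from the client side. This is a minimal illustration, not Clasper's client API: `withTraceId` and `readTraceId` are hypothetical helpers, and the only names taken from the text are the `X-Trace-Id` header and the `trace_id` response field.

```typescript
// Hypothetical client-side helpers for trace correlation (not part of Clasper).
type HeaderMap = Record<string, string>;

/** Attach a caller-chosen trace ID so the server can correlate the request. */
function withTraceId(headers: HeaderMap, traceId: string): HeaderMap {
  return { ...headers, "X-Trace-Id": traceId };
}

/** Read the trace ID back, preferring the response header over the body field. */
function readTraceId(
  responseHeaders: HeaderMap,
  body: { trace_id?: string }
): string | undefined {
  // HTTP header names are case-insensitive, so compare in lower case.
  const match = Object.entries(responseHeaders).find(
    ([name]) => name.toLowerCase() === "x-trace-id"
  );
  return match?.[1] ?? body.trace_id;
}
```

Passing the same ID on the request and checking it on the response is a cheap way to confirm that audit log entries and traces will line up with the caller's own logs.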
Building Traces
Use the TraceBuilder for constructing traces:

    import { TraceBuilder } from './lib/tracing/trace.js';

    const trace = new TraceBuilder({
      tenantId: "tenant-123",
      workspaceId: "workspace-123",
      model: "gpt-4o-mini",
      provider: "openai",
      agentRole: "jarvis",
      inputMessage: "Find the fastest route"
    });

    trace.addLLMCall({
      model: "gpt-4o-mini",
      provider: "openai",
      inputTokens: 1200,
      outputTokens: 300,
      cost: 0.00025,
      hasToolCalls: true
    }, 2000);

    trace.addToolCall({
      toolCallId: "call-1",
      toolName: "search",
      arguments: { query: "test" },
      permitted: true
    }, 50);

    trace.addToolResult({
      toolCallId: "call-1",
      toolName: "search",
      success: true,
      result: { hits: 3 }
    }, 450);

    trace.setOutput("Here is the result", []);
    const finalTrace = trace.complete();

Querying Traces

    # List recent traces
    GET /traces?limit=50

    # Filter by tenant
    GET /traces?tenant_id=tenant-123

    # Filter by status
    GET /traces?status=error

    # Get full trace
    GET /traces/0194c8f0-7e1a-7000-8000-000000000001

Operations Console (v1.2)
The human-facing Operations Console is served at /ops and provides a trace explorer, diffing, promotion, rollback, skill operations, and cost/risk dashboards.
Ops API endpoints are under /ops/api/* and require Authorization: Bearer <OIDC JWT>.
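As a rough sketch of the authentication requirement above, a client could centralize the header construction as follows. `buildOpsRequest` is a hypothetical helper and `/ops/api/traces` is an illustrative path, not a documented endpoint; only the `/ops/api/` prefix and the `Authorization: Bearer` scheme come from the text.

```typescript
// Hypothetical helper: builds the URL and headers for an Ops API call.
interface OpsRequest {
  url: string;
  headers: Record<string, string>;
}

function buildOpsRequest(baseUrl: string, path: string, oidcJwt: string): OpsRequest {
  // Refuse paths outside the Ops API prefix so the token is never sent elsewhere.
  if (!path.startsWith("/ops/api/")) {
    throw new Error(`Not an Ops API path: ${path}`);
  }
  if (oidcJwt.trim() === "") {
    throw new Error("An OIDC JWT is required");
  }
  return {
    // Strip trailing slashes from the base URL to avoid "//" in the result.
    url: baseUrl.replace(/\/+$/, "") + path,
    headers: { Authorization: `Bearer ${oidcJwt}` },
  };
}
```

The returned `url` and `headers` can be handed to whatever HTTP client the caller already uses.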
Replay & Diff
Traces can be replayed for debugging and comparison.
Replay Context
Get everything needed to replay a trace:

    GET /traces/:id/replay

Returns:

    {
      "trace_id": "...",
      "original_request": {
        "message": "...",
        "messages": [...],
        "metadata": {...}
      },
      "workspace_snapshot": {
        "hash": "abc123...",
        "files": {
          "AGENTS.md": "...",
          "SOUL.md": "..."
        }
      },
      "skill_versions": {
        "summarize": "1.2.0"
      }
    }

Diff Scenarios
Use replay for:
- Model comparison - Run same input with different model
- Skill regression - Test new skill version against baseline
- Workspace changes - Compare behavior after prompt updates
- Debugging - Reproduce issues with exact context
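Each of the scenarios above ends with comparing a baseline output against a new one. The following is a minimal sketch of such a comparison for structured outputs; `diffOutputs` and `OutputDiff` are hypothetical names introduced here for illustration and are not Clasper's own comparison routine.

```typescript
// Hypothetical field-level diff between two structured agent outputs.
interface OutputDiff {
  added: string[];   // keys present only in the candidate
  removed: string[]; // keys present only in the baseline
  changed: string[]; // keys present in both with different values
  identical: boolean;
}

function diffOutputs(
  baseline: Record<string, unknown>,
  candidate: Record<string, unknown>
): OutputDiff {
  const added: string[] = [];
  const removed: string[] = [];
  const changed: string[] = [];

  for (const key of Object.keys(candidate)) {
    if (!(key in baseline)) {
      added.push(key);
    } else if (JSON.stringify(baseline[key]) !== JSON.stringify(candidate[key])) {
      // JSON comparison is sensitive to property order inside nested objects;
      // that is acceptable for a sketch but may over-report changes.
      changed.push(key);
    }
  }
  for (const key of Object.keys(baseline)) {
    if (!(key in candidate)) removed.push(key);
  }

  const identical = added.length + removed.length + changed.length === 0;
  return { added, removed, changed, identical };
}
```

Free-text responses usually need a looser comparison (for example, a similarity score), so a key-by-key diff like this fits best when outputs are structured.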
Programmatic Replay

    const traceStore = getTraceStore();

    // Get replay context
    const context = traceStore.getReplayContext(traceId);

    // Modify and re-run
    const result = await agent.run({
      ...context.original_request,
      model: 'gpt-4o', // Try different model
    });

    // Compare outputs
    const diff = compareOutputs(context.original_response, result.response);

Evaluation Framework
Run evaluations to detect regressions and measure agent performance.
Evaluation Dataset

    interface EvalDataset {
      name: string;
      description?: string;
      cases: EvalCase[];
    }

    interface EvalCase {
      id: string;
      name?: string;
      input: Record<string, unknown>;
      expectedOutput?: Record<string, unknown>;
      expectedBehavior?: string; // For subjective evals
      tags?: string[];
    }

Running Evaluations

    POST /evals/run
    {
      "name": "ticket-summarizer-v1.2",
      "cases": [
        {
          "id": "case-1",
          "name": "Happy path",
          "input": { "ticket_id": "T-123" },
          "expected_output": { "sentiment": "positive" }
        },
        {
          "id": "case-2",
          "name": "Error handling",
          "input": { "ticket_id": "invalid" },
          "expected_output": { "error": true }
        }
      ],
      "options": {
        "skill": "ticket_summarizer",
        "skill_version": "1.2.0",
        "model": "gpt-4o-mini",
        "parallel": 3
      }
    }

Evaluation Results

    interface EvalResult {
      id: string;
      datasetName: string;
      startedAt: Date;
      completedAt: Date;
      results: CaseResult[];
      summary: EvalSummary;
      config: EvalOptions;
    }

    interface CaseResult {
      caseId: string;
      status: 'passed' | 'failed' | 'error';
      actualOutput: unknown;
      expectedOutput: unknown;
      score: number; // 0-1
      durationMs: number;
      traceId: string; // Link to full trace
    }

    interface EvalSummary {
      total: number;
      passed: number;
      failed: number;
      errors: number;
      avgScore: number;
      totalDurationMs: number;
      totalCost: number;
    }

Drift Detection
Compare evaluations over time:

    const evalRunner = getEvalRunner();

    // Run current evaluation
    const current = await evalRunner.run(dataset, options);

    // Get baseline (previous run)
    const baseline = evalRunner.getResult(baselineId);

    // Compare
    const drift = evalRunner.compareToBaseline(current, baseline);

    console.log(drift);
    // {
    //   overallDrift: 0.05, // 5% regression
    //   improvedCases: ['case-3'],
    //   regressedCases: ['case-1', 'case-2'],
    //   newFailures: ['case-2'],
    //   recommendations: ['Review case-2 for regression']
    // }

Golden Datasets
Maintain golden datasets for critical paths:

    // Store golden dataset
    const dataset: EvalDataset = {
      name: 'ticket-summarizer-golden',
      cases: loadGoldenCases('./evals/ticket-summarizer.json')
    };

    // Run and compare to baseline
    const result = await evalRunner.run(dataset, { skill: 'ticket_summarizer' });
    const drift = evalRunner.compareToBaseline(result, lastGoldenRun);

    if (drift.overallDrift > 0.1) {
      throw new Error('Regression detected: 10%+ drift from baseline');
    }

Workspace Versioning
Track workspace changes with content-addressable storage.
Snapshots
Create a snapshot of the current workspace:

    // Programmatic
    const versioning = getWorkspaceVersioning();
    const version = versioning.snapshot(workspaceId, 'Updated AGENTS.md');

Returns:

    interface WorkspaceVersion {
      hash: string; // SHA256 content hash
      workspaceId: string;
      createdAt: Date;
      message?: string;
      files: Record<string, FileSnapshot>;
    }

    interface FileSnapshot {
      path: string;
      hash: string;
      size: number;
      content: string;
    }

Version History

    // List versions
    const versions = versioning.listVersions(workspaceId, { limit: 10 });

    // Get specific version
    const version = versioning.getVersion(hash);

    // Get latest
    const latest = versioning.getLatestVersion(workspaceId);

Diffing

    // Diff between two versions
    const diff = versioning.diff(oldHash, newHash);

    console.log(diff);
    // {
    //   added: ['skills/new-skill/SKILL.md'],
    //   removed: [],
    //   modified: ['AGENTS.md'],
    //   unchanged: ['SOUL.md', 'IDENTITY.md'],
    //   changes: {
    //     'AGENTS.md': {
    //       oldHash: 'abc...',
    //       newHash: 'def...',
    //       oldContent: '...',
    //       newContent: '...'
    //     }
    //   }
    // }

    // Diff from current state
    const pendingChanges = versioning.diffFromCurrent(lastVersionHash);

Rollback

    // Rollback to a previous version
    versioning.rollback(previousVersionHash);

    // This:
    // 1. Restores all files to that version's state
    // 2. Creates an audit log entry
    // 3. Does NOT delete the old version (versions are immutable)

Best Practices
- Snapshot before changes - Always capture baseline
- Include messages - Document why changes were made
- Link to traces - Record workspace hash in traces
- Prune old versions - Keep recent + tagged versions
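The pruning guideline above ("keep recent + tagged versions") could be implemented along these lines. This is a sketch under stated assumptions: the `tagged` flag and the `selectVersionsToPrune` helper are hypothetical, since the `WorkspaceVersion` shape shown earlier has no tag field and the versioning API for deleting versions is not documented here.

```typescript
// Hypothetical pruning policy: keep the N most recent versions plus any tagged ones.
interface VersionInfo {
  hash: string;
  createdAt: Date;
  tagged: boolean; // assumption: some external record marks versions worth keeping
}

/** Returns the hashes that are safe to prune, newest candidates first. */
function selectVersionsToPrune(versions: VersionInfo[], keepRecent: number): string[] {
  // Copy before sorting so the caller's array is left untouched.
  const newestFirst = [...versions].sort(
    (a, b) => b.createdAt.getTime() - a.createdAt.getTime()
  );
  return newestFirst
    .filter((version, index) => index >= keepRecent && !version.tagged)
    .map((version) => version.hash);
}
```

Before deleting anything, it is worth checking that no stored trace still references a pruned hash through its `workspaceHash`, since replay depends on that snapshot.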
Skill Registry
Versioned, immutable storage for skill manifests.
Publishing Skills

    const registry = getSkillRegistry();

    const manifest: SkillManifest = {
      name: 'ticket_summarizer',
      version: '1.2.0',
      description: 'Summarizes support tickets',
      inputs: {
        ticket_id: { type: 'string', required: true }
      },
      outputs: {
        summary: { type: 'string' },
        sentiment: { type: 'string', enum: ['positive', 'neutral', 'negative'] }
      },
      permissions: {
        tools: ['read_ticket', 'get_user']
      },
      instructions: '...'
    };

    const published = registry.publish(manifest, 'user-123');
    // { name, version, checksum, publishedAt, publishedBy }

Version Immutability
Once published, a version cannot be modified:

    // This will throw an error
    registry.publish({ name: 'my-skill', version: '1.0.0', ... });
    // Error: Version 1.0.0 already exists

Querying Skills

    // Get latest version
    const skill = registry.get('ticket_summarizer');

    // Get specific version
    const v1 = registry.get('ticket_summarizer', '1.0.0');

    // List all versions
    const versions = registry.listVersions('ticket_summarizer');

    // Search skills
    const results = registry.search('ticket', { limit: 10 });

Skill Manifest Format (YAML)

    name: ticket_summarizer
    version: 1.2.0
    description: Summarizes support tickets with sentiment analysis

    inputs:
      ticket_id:
        type: string
        description: The ticket ID to summarize
        required: true
      include_history:
        type: boolean
        description: Whether to include ticket history
        default: false

    outputs:
      summary:
        type: string
        description: Summary of the ticket
      sentiment:
        type: string
        enum: [positive, neutral, negative]
        description: Overall sentiment
      key_issues:
        type: array
        items: { type: string }
        description: List of key issues identified

    permissions:
      tools:
        - read_ticket
        - get_user_info
      models:
        - gpt-4o
        - gpt-4o-mini

    gates:
      env:
        - TICKET_API_KEY

    redaction:
      patterns:
        - email
        - phone
      strategy: mask

    tests:
      - name: happy_path
        input:
          ticket_id: "T-123"
        expected_output:
          sentiment: positive
      - name: negative_sentiment
        input:
          ticket_id: "T-456"
        expected_output:
          sentiment: negative

    instructions: |
      When summarizing a ticket:

      1. Fetch the ticket using read_ticket(ticket_id)
      2. Analyze the content for key issues
      3. Determine overall sentiment
      4. Return structured summary

      Keep summaries concise (2-3 sentences).

Skill Testing
Run tests defined in skill manifests:

    POST /skills/registry/ticket_summarizer/test

Or programmatically:

    const tester = getSkillTester();

    const skill = registry.get('ticket_summarizer', '1.2.0');
    const results = await tester.runTests(skill.manifest);

    console.log(results);
    // {
    //   skill: 'ticket_summarizer',
    //   version: '1.2.0',
    //   results: [
    //     { name: 'happy_path', status: 'passed', durationMs: 1500 },
    //     { name: 'negative_sentiment', status: 'failed', error: '...' }
    //   ],
    //   summary: { total: 2, passed: 1, failed: 1 }
    // }

Database
All operational data is stored in SQLite.
Tables
Section titled “Tables”| Table | Purpose |
|---|---|
traces | Agent execution traces |
audit_log | Immutable audit log |
skill_registry | Versioned skill manifests |
tenant_budgets | Per-tenant budget tracking |
workspace_versions | Workspace snapshots |
eval_results | Evaluation results |
Database Path
Configure via environment:

    CLASPER_DB_PATH=./clasper.db # Default

Database Stats

    GET /db/stats

Returns:

    {
      "path": "./clasper.db",
      "size_bytes": 1048576,
      "tables": {
        "traces": 1250,
        "audit_log": 15000,
        "skill_registry": 25,
        "tenant_budgets": 10,
        "workspace_versions": 50,
        "eval_results": 100
      }
    }

Initialization
The database is initialized automatically on startup:

    import { initDatabase } from './lib/core/db.js';

    // Called automatically by server
    initDatabase();

WAL Mode
The SQLite database runs in WAL (Write-Ahead Logging) mode for better concurrent access:

    PRAGMA journal_mode = WAL;
    PRAGMA foreign_keys = ON;

Library Structure
Operations code is organized into modules:

    src/lib/
    ├── core/
    │   ├── config.ts               # Configuration
    │   └── db.ts                   # Database initialization
    ├── tracing/
    │   ├── trace.ts                # Trace model & builder
    │   └── traceStore.ts           # Trace storage
    ├── skills/
    │   ├── skillManifest.ts        # YAML manifest parsing
    │   ├── skillRegistry.ts        # Version registry
    │   └── skillTester.ts          # Test runner
    ├── workspace/
    │   ├── workspace.ts            # Workspace loader
    │   └── workspaceVersioning.ts  # Versioning
    └── evals/
        └── evals.ts                # Evaluation framework