Cost Management Guide¶
Effective cost management is crucial when running AI orchestration at scale. This guide helps you optimize spending while maintaining task quality.
Understanding Costs¶
Token Pricing¶
Pricing per million tokens at the time of writing (verify current provider rates before budgeting):
| Agent | Input ($/1M tokens) | Output ($/1M tokens) | Avg Cost per Task |
|---|---|---|---|
| Claude | $3.00 | $15.00 | $5-50 |
| Q Chat | $0.50 | $1.50 | $1-10 |
| Gemini | $0.50 | $1.50 | $1-10 |
Cost Calculation¶
Example: a task uses 100K input tokens and 50K output tokens (0.1M and 0.05M):
- With Claude: (0.1 × $3.00) + (0.05 × $15.00) = $0.30 + $0.75 = $1.05
- With Q Chat: (0.1 × $0.50) + (0.05 × $1.50) = $0.05 + $0.075 = $0.125
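The same arithmetic can be wrapped in a small helper for quick estimates. This is a sketch: `estimate_cost` and the `PRICES` dict are hypothetical names (not part of Ralph Orchestrator), and the rates are copied from the pricing table above, so update them if provider pricing changes.

```python
# Hypothetical helper: estimate task cost from token counts.
# Prices are USD per million tokens (input, output), from the pricing table.
PRICES = {
    "claude": (3.00, 15.00),
    "q": (0.50, 1.50),
    "gemini": (0.50, 1.50),
}


def estimate_cost(agent: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one task."""
    input_price, output_price = PRICES[agent]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


if __name__ == "__main__":
    for agent in ("claude", "q"):
        print(f"{agent}: ${estimate_cost(agent, 100_000, 50_000):.3f}")
```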
Cost Control Mechanisms¶
1. Hard Limits¶
Set maximum spending caps:
```bash
# Strict $10 limit
python ralph_orchestrator.py --max-cost 10.0
# Conservative token limit
python ralph_orchestrator.py --max-tokens 100000
```
2. Context Management¶
Reduce token usage through smart context handling:
```bash
# Aggressive context management
python ralph_orchestrator.py \
  --context-window 50000 \
  --context-threshold 0.6  # Summarize at 60% full
```
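Assuming `--context-threshold` is the fraction of `--context-window` at which summarization starts (as the comment on the command suggests), the trigger reduces to a single comparison. `should_summarize` is a hypothetical illustration of that rule, not the orchestrator's internal code.

```python
def should_summarize(context_tokens: int, context_window: int, threshold: float) -> bool:
    """Return True once the context is at least `threshold` full.

    Illustrative only: mirrors the documented meaning of
    --context-window and --context-threshold, not Ralph's internals.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return context_tokens / context_window >= threshold


# With --context-window 50000 --context-threshold 0.6,
# summarization starts at 30,000 tokens.
print(should_summarize(29_999, 50_000, 0.6))  # False
print(should_summarize(30_000, 50_000, 0.6))  # True
```

A lower threshold summarizes earlier, which keeps each request smaller at the cost of discarding detail sooner.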
3. Agent Selection¶
Choose cost-effective agents:
```bash
# Development: Use cheaper agents
python ralph_orchestrator.py --agent q --max-cost 5.0
# Production: Use quality agents with limits
python ralph_orchestrator.py --agent claude --max-cost 50.0
```
Optimization Strategies¶
1. Tiered Agent Strategy¶
Use different agents for different task phases:
```bash
# Phase 1: Research with Q (cheap)
echo "Research the problem" > research.md
python ralph_orchestrator.py --agent q --prompt research.md --max-cost 2.0
# Phase 2: Implementation with Claude (quality)
echo "Implement the solution" > implement.md
python ralph_orchestrator.py --agent claude --prompt implement.md --max-cost 20.0
# Phase 3: Testing with Q (cheap)
echo "Test the solution" > test.md
python ralph_orchestrator.py --agent q --prompt test.md --max-cost 2.0
```
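Because each phase has its own `--max-cost`, the worst case for the whole pipeline is the sum of the per-phase caps. A small sketch, with the phases hard-coded from the commands above (the `Phase` type and `command_for` helper are illustrative, and only flags already shown in this guide are used), that builds each command line and reports the combined ceiling:

```python
import shlex
from dataclasses import dataclass


@dataclass
class Phase:
    agent: str
    prompt: str
    max_cost: float


# Phases copied from the shell example above.
PHASES = [
    Phase("q", "research.md", 2.0),
    Phase("claude", "implement.md", 20.0),
    Phase("q", "test.md", 2.0),
]


def command_for(phase: Phase) -> list[str]:
    """Build the argv for one phase."""
    return [
        "python", "ralph_orchestrator.py",
        "--agent", phase.agent,
        "--prompt", phase.prompt,
        "--max-cost", str(phase.max_cost),
    ]


worst_case = sum(p.max_cost for p in PHASES)
for p in PHASES:
    print(shlex.join(command_for(p)))
print(f"Worst-case total spend: ${worst_case:.2f}")  # $24.00
```

To actually run the phases in order, each argv could be passed to `subprocess.run(argv, check=True)`, which stops the pipeline if a phase fails.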
2. Prompt Optimization¶
Reduce token usage through efficient prompts:
Before (Expensive)¶
```markdown
Please create a comprehensive web application with the following features:
- User authentication system with registration, login, password reset
- Dashboard with charts and graphs
- API with full CRUD operations
- Complete test suite
- Detailed documentation
[... 5000 tokens of requirements ...]
```
After (Optimized)¶
```markdown
Build user auth API:
- Register/login endpoints
- JWT tokens
- PostgreSQL storage
- Basic tests
See spec.md for details.
```
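To compare prompt versions before spending anything, a common rule of thumb for English text is roughly four characters per token. The sketch below uses that heuristic (an approximation, not any agent's real tokenizer) with the Claude input price from the pricing table; if the prompt is re-sent on every iteration, the difference is multiplied by the iteration count.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real tokenizers differ by model."""
    return max(1, round(len(text) / chars_per_token))


def input_cost(text: str, price_per_million: float = 3.00) -> float:
    """Approximate input cost in USD at the given per-million-token price."""
    return estimate_tokens(text) / 1_000_000 * price_per_million


optimized = """Build user auth API:
- Register/login endpoints
- JWT tokens
- PostgreSQL storage
- Basic tests
See spec.md for details."""

print(estimate_tokens(optimized), f"${input_cost(optimized):.6f}")
```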
3. Context Window Management¶
Automatic Summarization¶
```bash
# Trigger summarization early to save tokens
python ralph_orchestrator.py \
  --context-window 100000 \
  --context-threshold 0.5  # Summarize at 50%
```
Manual Context Control¶
Add context instructions like these to your prompt file:
```markdown
## Context Management
When context reaches 50%, summarize:
- Keep only essential information
- Remove completed task details
- Compress verbose outputs
```
4. Iteration Optimization¶
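Fewer, focused iterations save money. `--max-iterations` caps how many iterations a run may take, so a lower cap bounds how many times the agent is invoked:
```bash
# High cap: room for many quick, costly iterations (expensive)
python ralph_orchestrator.py --max-iterations 100  # ❌
# Lower cap: fewer, focused iterations (economical)
python ralph_orchestrator.py --max-iterations 20  # ✅
```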
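To pick a sensible cap, it can help to project the worst-case spend. Assuming a rough average cost per iteration (the $0.40 figure below is invented purely for illustration; measure your own from the metrics files), the projection is the smaller of the iteration-based estimate and the `--max-cost` cap. `projected_spend` is a hypothetical helper.

```python
from typing import Optional


def projected_spend(avg_cost_per_iteration: float, max_iterations: int,
                    max_cost: Optional[float] = None) -> float:
    """Upper-bound estimate of a run's spend in USD."""
    estimate = avg_cost_per_iteration * max_iterations
    return estimate if max_cost is None else min(estimate, max_cost)


print(projected_spend(0.40, 100))        # 40.0
print(projected_spend(0.40, 20))         # 8.0
print(projected_spend(0.40, 100, 10.0))  # 10.0
```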
Cost Monitoring¶
Real-time Tracking¶
Monitor costs during execution:
Output:
Cost Reports¶
Access detailed cost breakdowns:
```python
import json
from pathlib import Path

# Sum the cost recorded in each metrics file
metrics_dir = Path('.agent/metrics')
total_cost = 0.0
for metric_file in metrics_dir.glob('metrics_*.json'):
    with open(metric_file) as f:
        data = json.load(f)
    total_cost += data.get('cost', 0)

print(f"Total cost: ${total_cost:.2f}")
```
Cost Dashboards¶
Create monitoring dashboards:
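```python
#!/usr/bin/env python3
import json
from pathlib import Path

import matplotlib.pyplot as plt

# Collect (iteration, cumulative cost) pairs; match only metrics files
# so state_latest.json in the same directory is not plotted
points = []
for metric_file in Path('.agent/metrics').glob('metrics_*.json'):
    with open(metric_file) as f:
        data = json.load(f)
    points.append((data.get('iteration', 0), data.get('total_cost', 0)))

# Sort by iteration so the line is drawn in order
points.sort()
iterations = [iteration for iteration, _ in points]
costs = [cost for _, cost in points]

plt.plot(iterations, costs)
plt.xlabel('Iteration')
plt.ylabel('Cumulative Cost ($)')
plt.title('Ralph Orchestrator Cost Progression')
plt.savefig('cost_report.png')
```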
Budget Planning¶
Task Cost Estimation¶
| Task Type | Complexity | Recommended Budget | Agent |
|---|---|---|---|
| Simple Script | Low | $0.50 - $2 | Q Chat |
| Web API | Medium | $5 - $20 | Gemini/Claude |
| Full Application | High | $20 - $100 | Claude |
| Data Analysis | Medium | $5 - $15 | Gemini |
| Documentation | Low-Medium | $2 - $10 | Q/Claude |
| Debugging | Variable | $5 - $50 | Claude |
Monthly Budget Planning¶
```python
# Calculate monthly budget needs
tasks_per_month = 50
avg_cost_per_task = 10.0
safety_margin = 1.5
monthly_budget = tasks_per_month * avg_cost_per_task * safety_margin
print(f"Recommended monthly budget: ${monthly_budget}")
```
Cost Optimization Profiles¶
Minimal Cost Profile¶
Maximum savings, acceptable quality:
```bash
python ralph_orchestrator.py \
  --agent q \
  --max-tokens 50000 \
  --max-cost 2.0 \
  --context-window 30000 \
  --context-threshold 0.5 \
  --checkpoint-interval 10
```
Balanced Profile¶
Good quality, reasonable cost:
```bash
python ralph_orchestrator.py \
  --agent gemini \
  --max-tokens 200000 \
  --max-cost 10.0 \
  --context-window 100000 \
  --context-threshold 0.7 \
  --checkpoint-interval 5
```
Quality Profile¶
Best results, controlled spending:
```bash
python ralph_orchestrator.py \
  --agent claude \
  --max-tokens 500000 \
  --max-cost 50.0 \
  --context-window 200000 \
  --context-threshold 0.8 \
  --checkpoint-interval 3
```
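If you switch between these profiles often, it can help to keep them in one place. A minimal sketch: the `PROFILES` dict and `profile_args` helper are hypothetical, while the flag names and values are copied from the three profiles above.

```python
# Values copied from the Minimal, Balanced, and Quality profiles.
PROFILES = {
    "minimal": {"agent": "q", "max-tokens": 50000, "max-cost": 2.0,
                "context-window": 30000, "context-threshold": 0.5,
                "checkpoint-interval": 10},
    "balanced": {"agent": "gemini", "max-tokens": 200000, "max-cost": 10.0,
                 "context-window": 100000, "context-threshold": 0.7,
                 "checkpoint-interval": 5},
    "quality": {"agent": "claude", "max-tokens": 500000, "max-cost": 50.0,
                "context-window": 200000, "context-threshold": 0.8,
                "checkpoint-interval": 3},
}


def profile_args(name: str) -> list[str]:
    """Expand a profile into command-line flags for ralph_orchestrator.py."""
    args: list[str] = []
    for flag, value in PROFILES[name].items():
        args += [f"--{flag}", str(value)]
    return args


for name in PROFILES:
    print(f"{name}: python ralph_orchestrator.py {' '.join(profile_args(name))}")
```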
Advanced Cost Management¶
Dynamic Agent Switching¶
Switch agents based on budget remaining:
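```python
# Sketch of dynamic switching: pass the result to --agent
def choose_agent(remaining_budget: float) -> str:
    """Pick a cheaper agent as the remaining budget (USD) shrinks."""
    if remaining_budget > 20:
        return "claude"
    elif remaining_budget > 5:
        return "gemini"
    else:
        return "q"
```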
Cost-Aware Prompts¶
Include cost considerations in prompts:
```markdown
## Budget Constraints
- Maximum budget: $10
- Optimize for efficiency
- Skip non-essential features if approaching limit
- Prioritize core functionality
```
Batch Processing¶
Combine multiple small tasks:
```bash
# Inefficient: Multiple orchestrations (illustrative costs)
python ralph_orchestrator.py --prompt task1.md  # $5
python ralph_orchestrator.py --prompt task2.md  # $5
python ralph_orchestrator.py --prompt task3.md  # $5
# Total: $15

# Efficient: Batched orchestration
cat task1.md task2.md task3.md > batch.md
python ralph_orchestrator.py --prompt batch.md  # $10
# Total: $10 (33% savings)
```
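Plain `cat` joins the files with no boundary between tasks. A variant that labels each task, so the agent can tell where one ends and the next begins, is sketched below; the script name `make_batch.py` and the `## Task N` heading format are arbitrary choices, not part of Ralph Orchestrator.

```python
import sys
from pathlib import Path


def format_batch(tasks: list[tuple[str, str]]) -> str:
    """Join (name, prompt) pairs into one prompt with a heading per task."""
    sections = [
        f"## Task {i}: {name}\n\n{text.strip()}\n"
        for i, (name, text) in enumerate(tasks, start=1)
    ]
    return "\n".join(sections)


def main(paths: list[str], output: str = "batch.md") -> None:
    """Read each task file and write the combined prompt to `output`."""
    tasks = [(p, Path(p).read_text(encoding="utf-8")) for p in paths]
    Path(output).write_text(format_batch(tasks), encoding="utf-8")
    print(f"Wrote {len(tasks)} tasks to {output}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python make_batch.py task1.md task2.md ...")
    else:
        main(sys.argv[1:])
```

Usage: `python make_batch.py task1.md task2.md task3.md`, then `python ralph_orchestrator.py --prompt batch.md`.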
Cost Alerts¶
Setting Up Alerts¶
```bash
#!/bin/bash
# cost_monitor.sh: email an alert when total cost exceeds COST_LIMIT
COST_LIMIT=25.0

CURRENT_COST=$(python -c "
import json
with open('.agent/metrics/state_latest.json') as f:
    print(json.load(f)['total_cost'])
")

if (( $(echo "$CURRENT_COST > $COST_LIMIT" | bc -l) )); then
    echo "ALERT: Cost exceeded $COST_LIMIT" | mail -s "Ralph Cost Alert" admin@example.com
fi
```
Automated Stops¶
Implement circuit breakers:
```python
# cost_breaker.py: exit non-zero once 90% of the budget is spent
import json
import sys

with open('.agent/metrics/state_latest.json') as f:
    state = json.load(f)

if state['total_cost'] > state['max_cost'] * 0.9:
    print("WARNING: 90% of budget consumed")
    sys.exit(1)
```
ROI Analysis¶
Calculating ROI¶
```python
# ROI calculation
hours_saved = 10    # Hours of manual work saved
hourly_rate = 50    # Developer hourly rate
ai_cost = 25        # Cost of AI orchestration

value_created = hours_saved * hourly_rate
roi = (value_created - ai_cost) / ai_cost * 100

print(f"Value created: ${value_created}")
print(f"AI cost: ${ai_cost}")
print(f"ROI: {roi:.1f}%")
```
Cost-Benefit Matrix¶
| Task | Manual Hours | Manual Cost | AI Cost | Savings |
|---|---|---|---|---|
| API Development | 40h | $2000 | $50 | $1950 |
| Documentation | 20h | $1000 | $20 | $980 |
| Testing Suite | 30h | $1500 | $30 | $1470 |
| Bug Fixing | 10h | $500 | $25 | $475 |
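The matrix assumes a $50/hour rate, the same `hourly_rate` as the ROI example. A short sketch that recomputes the Manual Cost and Savings columns from the hours and AI costs in the table, plus per-row ROI using the formula above (`savings` is an illustrative helper):

```python
HOURLY_RATE = 50  # USD per hour, matching hourly_rate in the ROI example

# (task, manual hours, AI cost in USD) taken from the matrix above
ROWS = [
    ("API Development", 40, 50),
    ("Documentation", 20, 20),
    ("Testing Suite", 30, 30),
    ("Bug Fixing", 10, 25),
]


def savings(hours: float, ai_cost: float, hourly_rate: float = HOURLY_RATE) -> float:
    """Manual cost minus AI cost, in USD."""
    return hours * hourly_rate - ai_cost


for task, hours, ai_cost in ROWS:
    manual_cost = hours * HOURLY_RATE
    saved = savings(hours, ai_cost)
    roi = saved / ai_cost * 100  # same ROI formula as the example above
    print(f"{task:<16} manual ${manual_cost:>5}  savings ${saved:>5}  ROI {roi:.0f}%")
```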
Best Practices¶
1. Start Small¶
Test with minimal budgets first:
```bash
# Test run
python ralph_orchestrator.py --max-cost 1.0 --max-iterations 5
# Scale up if successful
python ralph_orchestrator.py --max-cost 10.0 --max-iterations 50
```
2. Monitor Continuously¶
Track costs in real-time:
```bash
# Terminal 1: Run orchestration
python ralph_orchestrator.py --verbose
# Terminal 2: Monitor costs
watch -n 5 'tail -n 20 .agent/metrics/state_latest.json'
```
3. Optimize Iteratively¶
- Analyze cost reports
- Identify expensive operations
- Refine prompts and settings
- Test optimizations
4. Set Realistic Budgets¶
- Development: 50% of production budget
- Testing: 25% of production budget
- Production: Full budget with safety margin
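The ratios above translate directly into numbers. A minimal sketch (`environment_budgets` is a hypothetical helper) that derives development and testing budgets from a production budget; pass a production figure that already includes your safety margin:

```python
def environment_budgets(production_budget: float) -> dict[str, float]:
    """Derive development and testing budgets from the production budget.

    Ratios follow the guideline above: development 50%, testing 25%.
    """
    return {
        "development": production_budget * 0.50,
        "testing": production_budget * 0.25,
        "production": production_budget,
    }


print(environment_budgets(50.0))
# {'development': 25.0, 'testing': 12.5, 'production': 50.0}
```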
5. Document Costs¶
Keep records for analysis:
```bash
# Save a cost report after each successful run
mkdir -p reports
python ralph_orchestrator.py && \
  cp .agent/metrics/state_latest.json "reports/run_$(date +%Y%m%d_%H%M%S).json"
```
Troubleshooting¶
Common Issues¶
- Unexpected high costs
  - Check token usage in metrics
  - Review prompt efficiency
  - Verify context settings
- Budget exceeded quickly
  - Lower the context window
  - Lower the context threshold so summarization starts earlier
  - Use a cheaper agent
- Poor results with budget constraints
  - Increase the budget slightly
  - Optimize prompts
  - Consider a phased approach
Next Steps¶
- Review Agent Selection for cost-effective choices
- Optimize Prompts for efficiency
- Configure Checkpointing to save progress
- Explore Examples for cost-optimized patterns