Executive Summary
- Agents analyzed: 25 workflow runs (Mar 1–2, 2026)
- Total tokens: 24.15M | Total cost: $5.71 | Total duration: 2.6h
- Agent quality score: 86/100 (↑ 1 from 85)
- Agent effectiveness score: 84/100 (↓ 1 from 85 — AI Moderator now day 4)
- Errors: 7 (4 lockdown P0/P1, 1 AI Moderator, + lockdown-adjacent failures)
- Top performers: The Great Escapi, Contribution Check, Daily Safe Outputs Conformance Checker
- Needs attention: AI Moderator (day 4 failure), Chroma Issue Indexer (8.24M tokens/run), Lockfile Statistics Analysis Agent (cost creeping up)
Performance Rankings
Top Performing Agents 🏆
- The Great Escapi (Quality: 96/100, Effectiveness: 97/100)
  - Run #1464: 74k tokens, 3.4m, 0 errors; ultra-efficient and consistent
  - Best token-to-output ratio in the ecosystem
  - §22587511713
- Contribution Check (Quality: 93/100, Effectiveness: 93/100)
  - Run #85: 218k tokens, 3.1m, 0 errors; reliably fast
  - §22586971804
- Daily Safe Outputs Conformance Checker (Quality: 93/100, Effectiveness: 92/100)
  - Run #25: 232k tokens, $0.47, 3.9m, 11 turns, 0 errors
  - Efficient use of Claude, highly consistent
  - §22586671855
- Repository Tree Map Generator (Quality: 90/100, Effectiveness: 90/100)
  - Run #66: 187k tokens, 3.9m, 0 errors; small, clean, reliable
  - §22584398552
- Semantic Function Refactoring (Quality: 88/100, Effectiveness: 87/100)
  - Run #162: 1.23M tokens, $1.35, 7.0m, 47 turns; improving trend
  - Cost trajectory: $2.36 → $1.72 → $1.35 ✅ (↓ 43% over 3 days)
  - §22587119823
Agents Needing Attention 📉
- AI Moderator (Quality: 35/100, Effectiveness: 20/100)
  - Day 4 failure: OpenAI cybersecurity restriction on the gpt-5.3-codex model
  - Blocking all moderation workflows; no user content is being moderated
  - Issue created 2026-03-01; status unknown
  - Action needed: verify the issue is being tracked; confirm a model switch is underway
- Chroma Issue Indexer (Quality: 72/100, Effectiveness: 78/100)
  - Run #212: 8.24M tokens, 13.9m; an outlier (most other runs < 1.5M)
  - 144 blocked Serena MCP socket requests per run (steady pattern)
  - Token usage roughly 10× the ecosystem average; needs root cause analysis
  - Still succeeding (0 errors), but efficiency is a concern
  - §22584737911
- Lockfile Statistics Analysis Agent (Quality: 80/100, Effectiveness: 78/100)
  - Run #172: 1.17M tokens, $1.61, 9.4m, 29 turns
  - Cost trend: $1.36 → $1.53 → $1.61 ↑ (slowly creeping up; watch)
  - §22585850530
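To put the Chroma Issue Indexer figure in context, the multiple over the ecosystem average can be worked out from the summary numbers. This is a rough sketch; it assumes the 24.15M total covers all 25 analyzed runs, including the Chroma run, which the report does not state explicitly. The function name is illustrative.

```python
# Figures taken from the Executive Summary and the Chroma Issue Indexer entry.
TOTAL_TOKENS_M = 24.15   # total tokens across the analysis period, in millions
RUN_COUNT = 25           # workflow runs analyzed
CHROMA_TOKENS_M = 8.24   # tokens used by the Chroma Issue Indexer run, in millions


def ratio_to_average(outlier: float, total: float, runs: int,
                     exclude_outlier: bool = False) -> float:
    """Return how many times larger the outlier is than the per-run average.

    With exclude_outlier=True the outlier is removed from the average first,
    which gives the multiple relative to the *other* runs.
    """
    if exclude_outlier:
        total -= outlier
        runs -= 1
    return outlier / (total / runs)


ratio_incl = ratio_to_average(CHROMA_TOKENS_M, TOTAL_TOKENS_M, RUN_COUNT)
ratio_excl = ratio_to_average(CHROMA_TOKENS_M, TOTAL_TOKENS_M, RUN_COUNT,
                              exclude_outlier=True)
print(f"vs. average of all runs:   {ratio_incl:.1f}x")
print(f"vs. average of other runs: {ratio_excl:.1f}x")
```

Under that assumption the multiple lands between about 8.5× and 12.4×, so "10×" reads as a reasonable rounded estimate.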
Lockdown-Failed Agents (External Factor — Not Agent Quality)
❌ These failures are NOT due to agent quality but to a missing `GH_AW_GITHUB_TOKEN` secret:
- Issue Monster (~50+ failures/day) — issue [aw] Issue Monster failed #18919 OPEN, expires 2026-03-07
- PR Triage Agent (every 6h) — issue [aw] PR Triage Agent failed #18952 OPEN, expires 2026-03-08
- Daily Issues Report (daily) — failing 119+ consecutive runs, no active issue
- Org Health Report (weekly) — lockdown-related, no active issue
Quality Analysis
Output Quality Distribution (2026-03-02 sample)
| Agent | Tokens | Cost | Duration | Turns | Errors | Score |
|---|---|---|---|---|---|---|
| The Great Escapi | 74k | - | 3.4m | - | 0 | 96/100 |
| Contribution Check | 218k | - | 3.1m | - | 0 | 93/100 |
| Daily Safe Outputs Conformance Checker | 232k | $0.47 | 3.9m | 11 | 0 | 93/100 |
| Repository Tree Map Generator | 187k | - | 3.9m | - | 0 | 90/100 |
| Semantic Function Refactoring | 1.23M | $1.35 | 7.0m | 47 | 0 | 88/100 |
| Daily Team Evolution Insights | 244k | $0.66 | 8.7m | 7 | 0 | 87/100 |
| The Daily Repository Chronicle | 782k | - | 8.2m | - | 0 | 85/100 |
| Lockfile Statistics Analysis Agent | 1.17M | $1.61 | 9.4m | 29 | 0 | 80/100 |
| Slide Deck Maintainer | 1.63M | - | 8.4m | - | 0 | 80/100 |
| Chroma Issue Indexer | 8.24M | - | 13.9m | - | 0 | 72/100 |
| AI Moderator | N/A | N/A | N/A | N/A | ❌ | 35/100 |
Quality tier breakdown:
- Excellent (90-100): 4 agents
- Good (80-89): 5 agents
- Fair (60-79): 1 agent
- Poor (<40): 1 agent (AI Moderator — external cause)
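The tier counts above can be reproduced from the scores in the table. The sketch below uses the tier boundaries as listed; the report defines no tier for scores of 40 to 59, so the `Unclassified` label for that band is an assumption made only to keep the function total.

```python
from collections import Counter

# Scores copied from the "Output Quality Distribution" table.
SCORES = {
    "The Great Escapi": 96,
    "Contribution Check": 93,
    "Daily Safe Outputs Conformance Checker": 93,
    "Repository Tree Map Generator": 90,
    "Semantic Function Refactoring": 88,
    "Daily Team Evolution Insights": 87,
    "The Daily Repository Chronicle": 85,
    "Lockfile Statistics Analysis Agent": 80,
    "Slide Deck Maintainer": 80,
    "Chroma Issue Indexer": 72,
    "AI Moderator": 35,
}


def tier(score: int) -> str:
    """Map a 0-100 quality score to the tier names used in the report."""
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 60:
        return "Fair"
    if score >= 40:
        return "Unclassified"  # assumption: the report lists no 40-59 tier
    return "Poor"


tier_counts = Counter(tier(score) for score in SCORES.values())
for name, count in tier_counts.items():
    print(f"{name}: {count}")
```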
Firewall Analysis (2026-03-02)
All "-" domain blocks are Serena MCP local socket calls (expected pattern). Real blocked domains of concern:
| Agent | Total Req | Blocked | Notable Blocks |
|---|---|---|---|
| Chroma Issue Indexer | 294 | 144 | - (Serena) |
| Lockfile Statistics Analysis Agent | 110 | 81 | - (Serena), go.dev |
| CLI Version Checker | 248 | 157 | go.dev, golang.org, proxy.golang.org, release-assets |
| Slide Deck Maintainer | 94 | 60 | - (Serena) |
| Daily Security Red Team | 103 | 55 | - (Serena) |
| Daily Testify Uber Super Expert | 88 | 51 | - (Serena) |
| Daily Copilot PR Merged Report | 72 | 49 | - (Serena) |
| Semantic Function Refactoring | 70 | 47 | - (Serena) |
The firewall is blocking CLI Version Checker's requests to go.dev, golang.org, and proxy.golang.org. This may indicate a workflow that needs to expand its network allowlist or stop downloading Go dependencies.
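Raw blocked counts favor agents that make many requests, so the blocked share per agent can be a more comparable signal. A minimal sketch using the rows from the firewall table; the function name is illustrative and not part of any existing tooling.

```python
# (agent, total requests, blocked requests) copied from the firewall table.
FIREWALL_ROWS = [
    ("Chroma Issue Indexer", 294, 144),
    ("Lockfile Statistics Analysis Agent", 110, 81),
    ("CLI Version Checker", 248, 157),
    ("Slide Deck Maintainer", 94, 60),
    ("Daily Security Red Team", 103, 55),
    ("Daily Testify Uber Super Expert", 88, 51),
    ("Daily Copilot PR Merged Report", 72, 49),
    ("Semantic Function Refactoring", 70, 47),
]


def blocked_share(rows):
    """Return (agent, blocked / total) pairs sorted by share, highest first."""
    shares = [(agent, blocked / total) for agent, total, blocked in rows]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


ranked = blocked_share(FIREWALL_ROWS)
for agent, share in ranked:
    print(f"{agent}: {share:.0%} blocked")
```

By share, Chroma Issue Indexer has the lowest blocked fraction in the table even though its absolute count is among the highest.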
Behavioral Patterns
Productive Patterns ✅
- Semantic Function Refactoring cost optimization: $2.36 → $1.72 → $1.35 over 3 days; Claude (claude-sonnet) efficiency is improving with context refinement
- Meta-orchestrator coordination: Campaign Manager + Workflow Health + Agent Performance sharing memory cleanly
- Copilot efficiency tier: Multiple Copilot agents completing in 3-4 minutes with <250k tokens
Problematic Patterns ⚠️
- Chroma Issue Indexer token explosion: 8.24M tokens (10× typical) — may be processing the entire issue index each run without caching
- AI Moderator stale failure: Day 4 of same OpenAI cybersecurity restriction — no automatic recovery, needs manual model switch
- Lockdown cascade: 4 workflows failing on same root cause (missing token) with no fix path — creates alert fatigue
Trends
| Metric | 2/27 | 3/1 | 3/2 | Trend |
|---|---|---|---|---|
| Agent Quality | 84/100 | 85/100 | 86/100 | ↑ improving |
| Agent Effectiveness | 85/100 | 85/100 | 84/100 | ↓ slight |
| Semantic Refactoring cost | $2.36 | $1.72 | $1.35 | ↓ ✅ |
| Lockfile Statistics cost | $1.36 | $1.53 | $1.61 | ↑ watch |
| AI Moderator failures | day 1 | day 3 | day 4 | ↑ worsening |
| Chroma blocked req (daily) | ~62 | ~70 | 144 | ↑ worsening |
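The cost trends in the table can be expressed as percentage changes between the first and last data points. The sketch below shows the arithmetic for the two cost rows; the helper name is illustrative.

```python
def pct_change(first: float, last: float) -> float:
    """Percentage change from the first value to the last value."""
    return (last - first) / first * 100


# Per-run cost in USD for 2/27, 3/1, and 3/2, taken from the Trends table.
semantic_costs = [2.36, 1.72, 1.35]
lockfile_costs = [1.36, 1.53, 1.61]

semantic_change = pct_change(semantic_costs[0], semantic_costs[-1])
lockfile_change = pct_change(lockfile_costs[0], lockfile_costs[-1])

print(f"Semantic Function Refactoring: {semantic_change:+.0f}%")
print(f"Lockfile Statistics Analysis Agent: {lockfile_change:+.0f}%")
```

This gives about -43% for Semantic Function Refactoring, matching the figure quoted earlier, and about +18% for Lockfile Statistics Analysis Agent.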
Recommendations
High Priority
- Investigate Chroma Issue Indexer token usage (8.24M tokens/run)
  - Likely root cause: scanning all GitHub issues on each run without incremental indexing
  - Recommendation: implement delta-only indexing and cache the last-indexed state
  - Expected improvement: 80%+ reduction in token usage
- AI Moderator: confirm model migration underway (Day 4 failure)
  - The issue was created on 2026-03-01; verify it has been assigned and triaged
  - If no progress by 2026-03-04, escalate to maintainers
  - Temporary workaround: switch to the claude or copilot engine while the OpenAI restriction is resolved
- Daily Issues Report: create tracking issue (119+ consecutive failures)
  - Currently has no active tracking issue despite daily failures
  - The root cause is the lockdown, but the failure still needs visibility
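The delta-only indexing recommended for the Chroma Issue Indexer could look roughly like the sketch below. It is illustrative only: the state file layout, the `updated_at` field name, and all function names are assumptions, and it is not the indexer's actual implementation. The idea is to persist the newest timestamp seen and only pass issues updated after it to the expensive indexing step.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def load_last_indexed(state_path: Path) -> datetime:
    """Read the cached high-water mark; fall back to 'index everything'."""
    if not state_path.exists():
        return datetime.min.replace(tzinfo=timezone.utc)
    return parse_ts(json.loads(state_path.read_text())["last_indexed"])


def select_changed(issues: list[dict], last_indexed: datetime) -> list[dict]:
    """Keep only issues updated after the cached high-water mark."""
    return [i for i in issues if parse_ts(i["updated_at"]) > last_indexed]


def save_last_indexed(state_path: Path, indexed: list[dict]) -> None:
    """Advance the high-water mark to the newest issue just indexed."""
    if not indexed:
        return  # nothing new was indexed, so keep the previous state
    newest = max(parse_ts(i["updated_at"]) for i in indexed)
    state_path.write_text(json.dumps({"last_indexed": newest.isoformat()}))
```

On each run the workflow would call `load_last_indexed`, index only what `select_changed` returns, then call `save_last_indexed`. Where the state file lives between runs (cache, artifact, or repository memory) is left open here.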
Medium Priority
- CLI Version Checker network allowlist: blocked go.dev/golang.org/proxy.golang.org
  - Likely trying to download the Go toolchain; should declare these domains in `network.allowed`
  - Or refactor so that Go package downloads are not required at runtime
- Lockfile Statistics Analysis Agent cost trend: $1.61 and rising
  - Monitor for 3 more days; if cost exceeds $2.00/run, investigate optimization
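The monitoring rule for the Lockfile Statistics Analysis Agent can be written as a small check. This is a sketch under the thresholds stated above; the status labels and function name are assumptions chosen for illustration.

```python
def cost_status(costs: list[float], threshold: float = 2.00) -> str:
    """Classify a per-run cost series (oldest first, in USD).

    Returns "investigate" once the latest run exceeds the threshold,
    "watch" while costs rise on every run but stay under it, else "ok".
    """
    latest = costs[-1]
    if latest > threshold:
        return "investigate"
    rising = len(costs) > 1 and all(
        later > earlier for earlier, later in zip(costs, costs[1:])
    )
    return "watch" if rising else "ok"


# Cost history from the Trends table (2/27, 3/1, 3/2).
lockfile_status = cost_status([1.36, 1.53, 1.61])
print(lockfile_status)
```

With the current three data points the agent stays in the "watch" state; a single run above $2.00 would move it to "investigate".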
Low Priority
- Metrics Collector recovery — data is stale (last successful run: 2026-01-18)
- Without working metrics, trend analysis relies on manual log pulls
- Follow up on the P2 issue created 2026-03-01
Actions Taken This Run
- ✅ Analyzed 25 workflow runs (2026-03-01 to 2026-03-02)
- ✅ Identified Chroma Issue Indexer token anomaly (8.24M tokens)
- ✅ Tracked AI Moderator day 4 failure progression
- ✅ Generated this performance report
- ✅ Updated shared memory (`agent-performance-latest.md`, `shared-alerts.md`)
- ℹ️ No new improvement issues created (AI Moderator issue already exists from 3/1)
Next Steps
- Verify AI Moderator issue is actively triaged — escalate if no progress by 3/4
- Investigate Chroma Issue Indexer 8.24M token usage
- Create tracking issue for Daily Issues Report lockdown failures
- Monitor Lockfile Statistics cost trend over next 3 days
- Follow up on Metrics Collector P2 fix
Analysis period: 2026-03-01 to 2026-03-02
Next report: 2026-03-03
Run: §22587812861
References:
- §22587511713 — The Great Escapi (top performer)
- §22584737911 — Chroma Issue Indexer (8.24M tokens)
- §22587119823 — Semantic Function Refactoring (improving)
Warning
This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.
Generated by Agent Performance Analyzer - Meta-Orchestrator
- expires on Mar 3, 2026, 5:42 PM UTC