Agent-Driven Development — The Autonomous QA Loop

How FlowPilot dispatches QA assignments via A2A to ClawOne, an OpenClaw agent, which inspects the platform via MCP and reports back, and how human triage turns findings into permanent source fixes.

Automated Testing Is Old — Agentic Quality Loops Are New

Automated testing is not new. CI pipelines, unit tests, integration tests, and regression suites have existed for decades.

What changes in agentic systems is not the existence of tests — it is the shape of the loop:

| Traditional Automated Testing | Agent-Driven Development |
|---|---|
| Human writes assertions | Agent discovers issues in live system context |
| CI run checks pass/fail | External agent audits behavior, memory, and drift |
| Report created | Structured findings become objectives automatically |
| Human fixes instance issue | Triage decides runtime fix vs source fix |
| Next release may regress | Source fixes raise baseline for future installs |

The key distinction is closed-loop remediation. Findings do not end as reports. They become actions. Actions become permanent improvements. The system gets better cycle by cycle.


The Architecture: Three Channels, One Loop

Read this chapter with one framing in mind: this is not a feature-level QA integration. It is an operating model. The system can run autonomously, but legitimacy still comes from explicit boundaries and human judgment where intent is ambiguous.

Flowwink exposes a three-channel architecture that lets an external agent act as a persistent development partner:

┌──────────────────────────────────────────────────────────────┐
│                    THE AUTONOMOUS QA LOOP                    │
│                                                              │
│  1. FlowPilot dispatches assignment to ClawOne via A2A       │
│     └── Includes: scenario, scope, MCP credentials           │
│                    │                                         │
│                    ▼                                         │
│  2. ClawOne connects to Flowwink MCP                         │
│     └── Reads resources: health, skills, activity, identity  │
│     └── Inspects pages, products, leads via MCP tools        │
│                    │                                         │
│                    ▼                                         │
│  3. ClawOne reports structured findings via MCP              │
│     └── openclaw_report_finding (type, severity, context)    │
│                    │                                         │
│                    ▼                                         │
│  4. FlowPilot picks up findings in next heartbeat            │
│     └── High/critical → auto-create objective                │
│                    │                                         │
│                    ▼                                         │
│  5. Human triage: dismiss / runtime fix / source fix         │
└──────────────────────────────────────────────────────────────┘

The key insight: FlowPilot is not a passive receiver. It is the dispatcher. It initiates the audit by sending ClawOne an A2A assignment, complete with the MCP credentials and scope needed to perform the inspection. ClawOne executes autonomously, then reports its findings back through Flowwink's MCP server, the same channel it used to inspect. FlowPilot processes the results.

No human initiates the loop. No human routes the findings. Up to the triage step, the loop runs itself.
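
A minimal sketch of step 4, the heartbeat pickup, assuming findings are stored with a `processed` flag and that every new finding (including auto-promoted ones) still lands in the triage queue. `Finding`, `Objective`, `heartbeat`, and the `low`/`medium` severities are hypothetical, not Flowwink's schema.

```typescript
// Hypothetical shapes; Flowwink's actual schema is not shown in this chapter.
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  id: string;
  title: string;
  severity: Severity;
  processed: boolean; // set once a heartbeat has picked the finding up
}

interface Objective {
  findingId: string;
  goal: string;
}

// Severities that trigger automatic objective creation (step 4 of the loop).
const AUTO_OBJECTIVE: ReadonlySet<Severity> = new Set<Severity>(["high", "critical"]);

// One heartbeat tick: pick up new findings, promote high/critical ones to
// objectives, and queue every new finding for human triage.
function heartbeat(findings: Finding[]): { objectives: Objective[]; awaitingTriage: Finding[] } {
  const objectives: Objective[] = [];
  const awaitingTriage: Finding[] = [];
  for (const f of findings) {
    if (f.processed) continue; // handled in an earlier tick; skipping keeps ticks idempotent
    f.processed = true;
    if (AUTO_OBJECTIVE.has(f.severity)) {
      objectives.push({ findingId: f.id, goal: `Resolve: ${f.title}` });
    }
    awaitingTriage.push(f);
  }
  return { objectives, awaitingTriage };
}
```

Running `heartbeat` twice over the same list creates no duplicate objectives, since the second tick sees every finding as already processed.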


The Three Channels

| Channel | Transport | Role in the Loop |
|---|---|---|
| A2A | JSON-RPC 2.0 | FlowPilot dispatches the assignment to ClawOne, including MCP credentials and audit scope |
| MCP | Streamable HTTP | ClawOne inspects platform data and reports findings directly |
| OpenResponses | OpenAI Responses API | Bounded, schema-validated tasks when deterministic output is required |

Why A2A for Dispatch?

The dispatch step matters. FlowPilot doesn’t wait for ClawOne to connect — it actively sends the assignment. This means:

  • Credential isolation — MCP credentials are scoped per-assignment, not stored in ClawOne
  • Scope control — FlowPilot defines what ClawOne can inspect in this cycle
  • Audit trail — every dispatch is logged with timestamp, scope, and peer
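
The dispatch step can be sketched as a JSON-RPC 2.0 request plus an audit-log entry built in one place. The method name `message/send` and the message/parts layout are loosely modelled on the A2A protocol and should be checked against the spec version your peer implements; `Assignment`, `dispatch`, and all field names are hypothetical.

```typescript
// Hypothetical assignment payload; field names are illustrative.
interface Assignment {
  scenario: string;    // e.g. "seo-audit"
  scope: string[];     // MCP tools/resources allowed this cycle
  mcpEndpoint: string; // where ClawOne should connect
  token: string;       // credential valid for this assignment only
  expiresAt: string;   // ISO 8601 expiry
}

interface DispatchLogEntry {
  timestamp: string;
  peer: string;
  scope: string[];
}

// Generic JSON-RPC 2.0 request envelope.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Build the request and its audit entry together, so no assignment
// leaves FlowPilot without being recorded.
function dispatch(
  peer: string,
  requestId: number,
  assignment: Assignment,
  now: Date,
): { request: JsonRpcRequest; log: DispatchLogEntry } {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: requestId,
    // Assumption: an A2A "message/send"-style call carrying the assignment as a data part.
    method: "message/send",
    params: {
      message: {
        role: "user",
        messageId: `dispatch-${requestId}`,
        parts: [{ kind: "data", data: assignment }],
      },
    },
  };
  // The log records peer, scope, and time, but never the token.
  const log: DispatchLogEntry = { timestamp: now.toISOString(), peer, scope: [...assignment.scope] };
  return { request, log };
}
```

Keeping the token out of the log entry means the audit trail never doubles as a credential store.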

Why MCP for Inspection?

MCP gives ClawOne direct access to platform data without going through FlowPilot as an intermediary:

| Resource | What ClawOne reads |
|---|---|
| flowwink://health | Site statistics, active objectives, system status |
| flowwink://skills | Full skill registry with descriptions and metadata |
| flowwink://activity | Recent FlowPilot actions and heartbeat history |
| flowwink://identity | FlowPilot’s soul and current configuration |

Tools (40+ and growing): Pages, blog posts, leads, products, bookings, orders, settings — ClawOne can read and, where permitted, write directly.
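
MCP requests are also JSON-RPC 2.0 messages, which over Streamable HTTP are POSTed to the server's MCP endpoint: `resources/read` takes a resource `uri`, and `tools/call` takes a tool `name` plus `arguments`. The sketch builds both and adds a hypothetical server-side guard enforcing the per-assignment scope and expiry. `ScopedCredential` and `isPermitted` are illustrative; the guard recognizes only these two methods and denies everything else, which a real server would relax for `initialize` and the list methods.

```typescript
type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params: Record<string, unknown> };

// MCP "resources/read": fetch one resource by URI, e.g. flowwink://health.
function readResource(id: number, uri: string): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "resources/read", params: { uri } };
}

// MCP "tools/call": invoke a named tool with arguments.
function callTool(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical per-assignment credential as the server sees it.
interface ScopedCredential {
  scope: ReadonlySet<string>; // resource URIs and tool names allowed this cycle
  expiresAt: Date;
}

// Deny expired credentials and any resource or tool outside the assignment scope.
function isPermitted(req: JsonRpcRequest, cred: ScopedCredential, now: Date): boolean {
  if (now.getTime() >= cred.expiresAt.getTime()) return false;
  const target =
    req.method === "resources/read" ? String(req.params.uri)
    : req.method === "tools/call" ? String(req.params.name)
    : null;
  return target !== null && cred.scope.has(target);
}
```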

ClawOne calls the openclaw_report_finding MCP tool as the final step of the inspection. A typical finding payload:

{
  "title": "Missing meta description on /pricing",
  "type": "seo",
  "severity": "high",
  "description": "Page has no meta description. Default placeholder from template is still in place.",
  "context": {
    "path": "/pricing",
    "current_value": null,
    "recommendation": "Add 120-140 char meta description targeting conversion intent"
  }
}
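
Because the finding is a structured contract, the receiving side can validate it before it enters the queue. A minimal sketch, assuming the field names in the example above; the `low | medium | high | critical` scale is an assumption (the text only names high and critical), and `parseFinding` is a hypothetical helper.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

interface FindingInput {
  title: string;
  type: string; // e.g. "seo"
  severity: Severity;
  description: string;
  context?: Record<string, unknown>;
}

const SEVERITIES: readonly Severity[] = ["low", "medium", "high", "critical"];

// Validate untrusted tool arguments; return the typed finding or every problem found.
function parseFinding(raw: unknown): { ok: true; finding: FindingInput } | { ok: false; errors: string[] } {
  if (typeof raw !== "object" || raw === null) return { ok: false, errors: ["payload must be an object"] };
  const r = raw as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of ["title", "type", "description"] as const) {
    if (typeof r[key] !== "string" || (r[key] as string).trim() === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  if (!SEVERITIES.includes(r.severity as Severity)) {
    errors.push(`severity must be one of ${SEVERITIES.join(", ")}`);
  }
  if (r.context !== undefined && (typeof r.context !== "object" || r.context === null)) {
    errors.push("context must be an object");
  }
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, finding: r as unknown as FindingInput };
}
```

Collecting all errors instead of failing on the first lets the reporting agent fix a malformed payload in one round trip.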

The Missing Piece: Human Triage

The automated loop handles detection and routing. But the real breakthrough comes when you add human triage between finding and fix:

            Finding arrives
                   │
                   ▼
┌──────────────────────────────────────┐
│           Human Triage               │
│                                      │
│  False positive? ──► Dismiss         │
│  Runtime issue?  ──► Agent objective │
│  Source issue?   ──► Developer fix   │
└──────────────────────────────────────┘

Why Triage Matters

Without triage, the loop generates noise. An agent will flag draft content as “missing metadata” and placeholder products as “incomplete” — because from the agent’s perspective, they look identical to real gaps.

A human who understands intent can classify a finding in seconds. The agent cannot.

The three outcomes:

| Outcome | When | Example |
|---|---|---|
| Dismiss | Intentional state or false positive | Draft blog posts flagged as incomplete |
| Runtime fix | Real issue, one instance | Update meta description on /pricing |
| Source fix | Template or seed data gap | Add meta description default to template TypeScript |
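
The three outcomes map onto a discriminated union, so every triaged finding carries exactly the data its path needs. A sketch with hypothetical type and action names; the `never` check makes the compiler flag any outcome added later but not routed.

```typescript
// One variant per triage outcome.
type Triage =
  | { outcome: "dismiss"; reason: string }
  | { outcome: "runtime_fix"; objective: string }
  | { outcome: "source_fix"; target: string }; // e.g. a template file path

interface Action {
  kind: "archive" | "create_objective" | "open_source_issue";
  note: string;
}

function route(findingTitle: string, t: Triage): Action {
  switch (t.outcome) {
    case "dismiss":
      // Keep the reason: dismissals document intentional state.
      return { kind: "archive", note: `${findingTitle}: ${t.reason}` };
    case "runtime_fix":
      return { kind: "create_objective", note: t.objective };
    case "source_fix":
      return { kind: "open_source_issue", note: `${findingTitle} -> fix in ${t.target}` };
    default: {
      const unreachable: never = t; // compile error if a new outcome is not handled
      throw new Error(`unhandled outcome: ${JSON.stringify(unreachable)}`);
    }
  }
}
```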

Fix-at-Source: The Quality Ratchet

The most significant outcome of triage is identifying source-level fixes — changes that go into templates, seed data, or default configurations rather than one-off runtime patches.

| Fix type | Scope | Persistence | Example |
|---|---|---|---|
| Runtime fix | One site | Until reset | Update a page’s meta description |
| Source fix | All future sites | Permanent | Fix template’s meta description default |

Source fixes compound. Every audit cycle that produces a source fix raises the baseline for every future installation. After several cycles, your templates ship with SEO metadata, proper content structure, and validated configurations — because real audits found real gaps.

A Real Cycle

An audit of Flowwink’s starter templates produced these findings:

| Finding | Count | Action | Outcome |
|---|---|---|---|
| Pages missing meta descriptions | 16 | Source fix | Added 80–140 char descriptions to template TypeScript |
| Blog titles exceeding 60 characters | 18 | Source fix | Shortened in template source |
| Draft blog posts flagged as incomplete | 4 | Dismissed | Intentional placeholders |
| Missing product images | 3 | Dismissed | Comparison table entries, not real products |
| Missing header/footer config | 1 | Dismissed | False positive: config exists at site level |

Result: 2 source fixes covering 34 findings, 3 dismissed categories covering 8 findings, and 0 runtime patches. Every future Flowwink installation now ships with proper SEO metadata by default.
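
The result counts finding categories, not individual findings. Tallying the audit table both ways makes the difference explicit; the rows are copied from the table and the code is plain arithmetic.

```typescript
// Rows copied from the audit table.
const rows = [
  { finding: "Pages missing meta descriptions", count: 16, action: "source_fix" },
  { finding: "Blog titles exceeding 60 characters", count: 18, action: "source_fix" },
  { finding: "Draft blog posts flagged as incomplete", count: 4, action: "dismiss" },
  { finding: "Missing product images", count: 3, action: "dismiss" },
  { finding: "Missing header/footer config", count: 1, action: "dismiss" },
] as const;

type ActionKind = (typeof rows)[number]["action"];

// Distinct categories handled by an action, and the individual findings they cover.
function tally(action: ActionKind): { categories: number; findings: number } {
  const matching = rows.filter((r) => r.action === action);
  return { categories: matching.length, findings: matching.reduce((sum, r) => sum + r.count, 0) };
}

console.log(tally("source_fix")); // { categories: 2, findings: 34 }
console.log(tally("dismiss"));    // { categories: 3, findings: 8 }
```

Two source fixes resolving 34 findings is the ratchet in miniature: one template edit closes a whole class of issues.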


The Three Layers

Layer 1: AUTONOMOUS DETECTION
  FlowPilot dispatches → ClawOne inspects via MCP → Findings reported

Layer 2: HUMAN TRIAGE
  Developer classifies: false positive / runtime fix / source fix

Layer 3: PERMANENT REMEDIATION
  Source fixes go into templates → baseline rises → every new installation benefits

This is not traditional QA with extra steps. It is a fundamentally different model:

| Traditional QA | Agent-Driven Development |
|---|---|
| Periodic, manual | Continuous, autonomous |
| Human initiates | Agent initiates, human triages |
| Reactive | Proactive |
| Fixes one instance | Source fixes benefit all future instances |
| Reports → Jira → sprint | Reports → triage → permanent fix |
| Knowledge lives in tickets | Knowledge lives in templates and code |
| Resets with each release | Compounds across releases |

Why This Is More Than Test Automation

A fair question is: “isn’t this just automated testing with new branding?”

No. The difference is architectural:

| Classic Test Automation | Agentic Quality Loop |
|---|---|
| Static assertions against expected output | Adaptive inspection of live system behavior |
| Test runner invokes system | One agent dispatches another agent |
| Failures create reports/tickets | Findings become objectives and remediation workflows |
| Focus on instance correctness | Focus on instance + template/source correctness |
| Little memory across runs | Persistent peer memory across audit cycles |

This is why the loop feels qualitatively different in practice. It does not just verify behavior. It evolves behavior.

What This Means for the Control Plane

This loop is a concrete demonstration of what makes the control plane valuable. The LLM isn’t the product. The orchestration around it — the dispatch, the credentials, the reporting contract, the triage step, the source fix workflow — that is the product.

Five things that make this work, none of which involve model quality:

  1. FlowPilot dispatches via A2A — the loop starts autonomously, no human trigger
  2. MCP credentials are scoped per-assignment — security without friction
  3. openclaw_report_finding is a structured contract — findings are machine-readable, not prose
  4. Human triage is part of the design — not an afterthought
  5. Source fix workflow closes the loop — findings produce permanent improvements

Remove any one of these and the loop degrades to a traditional audit report that sits in a log.


Strategic Implications

  1. Leverage asymmetry — ClawOne’s capabilities improve independently; Flowwink benefits automatically from every OpenClaw ecosystem improvement
  2. Quality ratchet — each source fix is permanent; the baseline only moves forward
  3. Development velocity — agents discover issues no test checklist would catch
  4. Compounding returns — after N audit cycles, templates approach a zero-defect baseline
  5. Institutional knowledge — dismissed findings document architectural decisions that would otherwise be invisible

Configuration

For OpenClaw Operators

  1. Create a Flowwink API key in Developer → MCP Keys
  2. FlowPilot dispatches A2A assignments that include the MCP endpoint and scoped credentials
  3. ClawOne connects on assignment receipt — no manual configuration per audit cycle

For Flowwink Administrators

  1. Control which skills are exposed via the Shield toggle in Skills management
  2. Monitor peer activity in Federation → Activity Log
  3. Review incoming findings in FlowPilot → QA Findings
  4. Triage: dismiss, create runtime objective, or escalate to developer for source fix
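
A sketch of what the Shield toggle has to guarantee, assuming it persists a per-skill boolean (called `exposedViaMcp` here, a hypothetical name). The same check runs when tools are listed and when one is called, because hiding a tool from the list alone does not stop a peer that already knows its name.

```typescript
// Hypothetical registry entry; exposedViaMcp stands in for whatever the Shield toggle stores.
interface Skill {
  name: string;
  description: string;
  exposedViaMcp: boolean;
}

// What a tools/list handler would advertise: only skills the admin has exposed.
function listExposedTools(registry: readonly Skill[]): { name: string; description: string }[] {
  return registry
    .filter((s) => s.exposedViaMcp)
    .map((s) => ({ name: s.name, description: s.description }));
}

// Enforce the same rule on tools/call, so a hidden skill cannot be invoked by name.
function assertCallable(registry: readonly Skill[], toolName: string): Skill {
  const skill = registry.find((s) => s.name === toolName);
  if (!skill || !skill.exposedViaMcp) {
    throw new Error(`tool not available: ${toolName}`);
  }
  return skill;
}
```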

The Pattern Beyond Flowwink

Any platform can implement this loop. The required primitives:

  1. Expose an MCP server — tools and resources give agents direct data access
  2. Implement a reporting contract — structured findings with type and severity
  3. Build a triage interface — dismiss / runtime / source classification
  4. Connect findings to objectives — auto-create for high/critical findings
  5. Establish a source fix workflow — escalation path for template and seed data fixes

The three-channel architecture (A2A dispatch + MCP inspection + structured reporting) is general. Flowwink is one implementation. The pattern works for any platform where you want an external agent as a persistent quality partner that improves the product — not just the instance.


Next: deterministic, schema-bound QA via OpenResponses — the bounded audit lane. Federation in Practice →

Community — Under Development

This is your handbook

Agentic AI is evolving fast. The patterns, the laws, the architecture — they need to stay current with the community's collective knowledge.

If you have thoughts on autonomous agents, or if you want to contribute to the work around AI-operated CMS, CRM, and ERP systems — whether it's a production story, a pattern you've discovered, or an idea you want to explore — I'd love to hear from you.

Connect on GitHub