Experiment: GitHub Actions Agent Runtime MVP
Hypothesis
A fully autonomous SDLC pipeline, from issue triage to merge, can be implemented using only GitHub Actions workflows and AI agent CLIs, with the repository itself as the coordinator (no orchestrator agent). The review/fix loop should converge within a practical number of iterations.
Background
Fullsend's core claim is that the repo is the coordinator. This experiment tests whether GitHub Actions' native event system (issue events, PR events, review events) can drive a multi-agent pipeline without a central orchestrator. Each agent runs in its own workflow, triggered by the previous agent's side effects.
The key question is whether the event-driven model provides enough coordination to handle the full lifecycle, including the review/fix loop where agents iterate autonomously until the reviewer approves.
Test repo: nonflux/integration-service (public playground fork)
Setup
Pipeline Architecture
TL;DR: Pipeline in Action

GIF preview: view the full-quality video (webm) for a clearer look.
Every agent input passes through a Google Cloud Model Armor scan before processing. If prompt injection is detected, the input is blocked and a human is notified.
The Four Agents
| Agent | Trigger | Model | Tools | Token | Purpose |
|---|---|---|---|---|---|
| Triage | issues.opened | gemini-3-flash-preview | gh, read_file (restricted) | AGENT_TOKEN (PAT) | Classify issues, add labels, determine readiness |
| Implementation | issues.labeled (ready-for-implementation) | gemini-3-pro-preview | Unrestricted shell (git, gh, make, go) | AGENT_TOKEN (PAT) | Create branch, write code, run tests, open PR |
| Review | pull_request.opened/synchronize | Pluggable (see below) | read_file, git only (no gh) | Reviewer App token (post step only) | Review code quality and security |
| Fix | pull_request_review.submitted | gemini-3-pro-preview | Unrestricted shell (git, gh, make, go) | Agent App token | Read feedback, fix code, push |
Pluggable Reviewer Architecture
The review agent slot is designed to be filled by any code review tool that can post a GitHub pull_request_review with APPROVE or CHANGES_REQUESTED. The fix agent triggers on the review event regardless of which reviewer posted it.
| Reviewer | Type | Identity | Status |
|---|---|---|---|
| Gemini CLI (built-in) | Gemini via run-gemini-cli action | fullsend-reviewer[bot] (GitHub App) | Verified on PR #7 |
| cicaddy-action | Reusable GitHub Action | fullsend-reviewer[bot] (GitHub App) | Verified on PR #1 |
| Qodo (formerly Codium) | Third-party SaaS | qodo-merge-pro[bot] | Planned |
| CodeRabbit | Third-party SaaS | coderabbitai[bot] | Planned |
For the fix agent to trigger from a reviewer, the fix-agent.yml if: condition must include the reviewer's bot login:
github.event.review.user.login == 'fullsend-reviewer[bot]'
# To add more reviewers:
# || github.event.review.user.login == 'qodo-merge-pro[bot]'
# || github.event.review.user.login == 'coderabbitai[bot]'

Third-party reviewers (Qodo, CodeRabbit) don't need identity isolation setup because they already have their own bot identity. The main integration point is the fix agent's trigger filter.
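The trigger filter above amounts to an exact-match allowlist of reviewer logins. As a rough sketch only, the same check could be repeated as a shell guard in the first step of the fix job. The function name `is_trusted_reviewer` is hypothetical and is not part of the experiment's workflows.

```shell
# Hypothetical guard: succeeds (exit 0) only for allowlisted reviewer logins.
# Quoted case patterns are matched literally, so "[bot]" is not treated as a glob.
is_trusted_reviewer() {
  case "$1" in
    'fullsend-reviewer[bot]') return 0 ;;
    # To add more reviewers, extend the pattern list, for example:
    # 'qodo-merge-pro[bot]' | 'coderabbitai[bot]') return 0 ;;
    *) return 1 ;;
  esac
}
```

A step could call `is_trusted_reviewer "$REVIEWER_LOGIN" || exit 1`, where `REVIEWER_LOGIN` is an assumed environment variable populated from `github.event.review.user.login`.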
Identity Model
| Component | GitHub Identity | Token Source | Purpose |
|---|---|---|---|
| Triage Agent | fullsend-agent[bot] or PAT | AGENT_TOKEN (PAT) | Label issues, post triage comment |
| Implementation Agent | PAT user identity | AGENT_TOKEN (PAT) | Push branches, open PRs (must trigger review agent) |
| Review Agent | fullsend-reviewer[bot] | GitHub App (REVIEWER_APP_ID) | Post reviews, trigger fix agent |
| Fix Agent | fullsend-agent[bot] | GitHub App (AGENT_APP_ID) | Push commits, post status |
| Cancel stale fixes | github-actions[bot] | GITHUB_TOKEN | Cancel workflows (no event trigger needed) |
| Injection scanner | Service account | GCP_SA_KEY (static key; see 015) | Scan for prompt injection |
Why PAT for Implementation Agent? Events created with GITHUB_TOKEN don't trigger other workflow runs. Using a PAT from a separate identity avoids this restriction, so the Implementation Agent's PR triggers the Review Agent.
Why separate GitHub App for Reviewer? GitHub does not let a pull request's author approve their own PR. A separate reviewer identity solves the self-approval problem.
Review Agent Flow
Fix Agent Flow
Concurrency Model
Review Agent group: fullsend-review-pr-{N}
cancel-in-progress: true
→ New push cancels stale review, latest review wins
Fix Agent group: fullsend-fix-pr-{N}
cancel-in-progress: true
→ New fix cancels stale fix of same trigger type
Cross-workflow:
Review Agent step 1 → Cancel stale fix agents via API
(prevents stale fixes from pushing after new code lands)

Security Layers
- Model Armor: Scans PR body, diff, review comments, and issue content for prompt injection at every agent entry point
- Fail-closed scanning: UNKNOWN or error from Model Armor → block (not pass)
- Token isolation (Review Agent): Review agent's LLM has no gh tool and no write-capable token; write actions only in deterministic post step
- Unique file paths: ${RUN_ID}-$(openssl rand -hex 8) prevents artifact hijack between steps
- CODEOWNERS: Agent workflow files, scripts, agent config (GEMINI.md, .gemini/), and CODEOWNERS itself require fullsend-sig team approval
- Branch protection: require_code_owner_reviews: true enforces CODEOWNERS approval on main; dismiss_stale_reviews: true invalidates prior approvals after new pushes
- Fork PR blocking: Fix agent refuses to run on fork PRs
- Iteration cap: Hard limit at 45 review-fix cycles, escalates to human
- Identity separation: Reviewer and fixer are different GitHub Apps
- Least-privilege tools: Each agent's tool allowlist is restricted to what it needs
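The fail-closed rule in the list above can be illustrated with a small decision function. This is a minimal sketch, not the experiment's script: the function name `scan_decision` is hypothetical, and `NO_MATCH_FOUND` is an assumed name for the scanner's explicit "clean" state. The source only states that UNKNOWN or an error must block.

```shell
# Map a scanner result string to a pipeline decision.
# Fail-closed: only an explicit "no match" passes. A match, UNKNOWN, an empty
# result (e.g. the scan call failed), or any unexpected value blocks.
# NO_MATCH_FOUND is an assumed state name; adjust it to the scanner's real output.
scan_decision() {
  case "$1" in
    NO_MATCH_FOUND) echo "allow" ;;
    *)              echo "block" ;;
  esac
}
```

The point of the design is that the allow branch is the only one that names a value; everything else falls through to block, so a new or misspelled state cannot silently pass.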
Branch Protection (Recommended Target Repo Config)
The test repo (nonflux/integration-service) uses these branch protection settings on main:
| Setting | Value | Why |
|---|---|---|
| Require pull request reviews | Yes | Agents cannot push directly to main |
| Required approving review count | 1 | At least one approval before merge |
| Require code owner reviews | Yes | Enforces CODEOWNERS rules: agent workflow/config changes need team approval |
| Dismiss stale reviews | Yes | New push (e.g., fix agent commit) invalidates prior approval, forcing re-review |
| Require last push approval | No | Not needed: the review agent re-reviews on every synchronize event |
CODEOWNERS protects agent infrastructure from self-modification:
# Agent workflows: listed individually, not as a glob
/.github/workflows/triage-agent.yml @nonflux/fullsend-sig
/.github/workflows/implementation-agent.yml @nonflux/fullsend-sig
/.github/workflows/review-agent.yml @nonflux/fullsend-sig
/.github/workflows/fix-agent.yml @nonflux/fullsend-sig
# Agent config and scripts
/GEMINI.md @nonflux/fullsend-sig
/.gemini/ @nonflux/fullsend-sig
/.github/scripts/ @nonflux/fullsend-sig
# CODEOWNERS itself: prevents agents from weakening review gates
/.github/CODEOWNERS @nonflux/fullsend-sig

The combination of CODEOWNERS + require_code_owner_reviews means agents can push workflow changes during the review/fix loop, but those changes cannot be merged without human team approval.
In-Place Comment Updates
The review and fix agents each use a single PR comment with an HTML marker (<!-- fullsend:review-agent --> / <!-- fullsend:fix-agent -->), editing it in-place on each new cycle. Previous content is collapsed into a <details> block. GitHub tracks the edit history, keeping the PR timeline clean.
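The in-place update can be pictured as a pure string transformation: put the marker first, then the new cycle's content, then the previous body folded into a `<details>` block. The sketch below is an illustration under assumptions, not the experiment's script; the function name `build_comment_body` and the summary text "Previous cycle" are hypothetical.

```shell
# Build the next body for an agent's single PR comment.
# $1 = HTML marker, $2 = content for the new cycle, $3 = previous body (may be empty)
build_comment_body() {
  marker="$1"; new_content="$2"; previous="$3"
  printf '%s\n%s\n' "$marker" "$new_content"
  if [ -n "$previous" ]; then
    printf '\n<details>\n<summary>Previous cycle</summary>\n\n'
    # Drop the old marker line so the marker appears exactly once in the result.
    printf '%s\n' "$previous" | grep -vF -- "$marker" || true
    printf '\n</details>\n'
  fi
}
```

The resulting text could then be sent as the `body` of a PATCH request to the REST endpoint for updating an issue comment, after locating the existing comment by searching for the marker.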
Context Passing Between Steps
Running
Demo 1: First-Pass Approval (Issue #15 → PR #16)
Created Issue #15 (gosec G708 warnings fix). The pipeline processed it end-to-end:
- Triage agent (~1 min): classified as bug, added labels including ready-for-implementation
- Implementation agent (~18 min): created branch, wrote fix, opened PR #16
- Review agent (~5 min): APPROVED on first pass
Total: ~24 min, zero human intervention.
Demo 2: Review/Fix Loop (Issue #19 → PR #20)
Created Issue #19 (finalizer removal bug). This time the review agent found issues:
| Stage | Agent | Duration | Result |
|---|---|---|---|
| Triage | Triage Agent | ~1 min | Labels: bug, kind/bug, area/controller, area/gitops, priority/high, ready-for-implementation |
| Implementation | Implementation Agent | ~18 min | Created PR #20 |
| Review (1st) | Review Agent | ~7 min | CHANGES_REQUESTED |
| Fix | Fix Agent | ~30 min | Commit bb94e01 (fix: address review feedback) |
| Review (2nd) | Review Agent | ~4 min | APPROVED ("LGTM") |
Total: ~60 min, zero human intervention. The review/fix loop converged in one iteration.
Full demo video (58s, webm): end-to-end pipeline from issue creation to approval.
Review/Fix Loop in PR Timeline

The PR timeline shows the full autonomous loop: review agent requested changes → fix agent committed fix: address review feedback → review agent approved on re-review.
Review/Fix Loop: Verified on PR #7
PR #7 verified the review/fix loop end-to-end with 3 successful bot commits during development of the pipeline itself. This was the primary validation PR where the concurrency model, identity isolation, and comment update patterns were iterated on.
Results
What Works
| Feature | Status | Notes |
|---|---|---|
| Full review→fix→push loop | Working | 3 successful bot commits on PR #7 |
| Full issue→triage→implement→review→merge pipeline | Working | Issue #15 → PR #16 (first-pass), Issue #19 → PR #20 (with fix loop) |
| In-place comment updates | Working | Single comment per agent, collapsed history |
| Deterministic comment posting | Working | Shell step, not LLM; eliminates duplicates |
| Separate bot identities | Working | fullsend-reviewer[bot] reviews, fullsend-agent[bot] pushes |
| Fix agent triggering from review | Working | pull_request_review event from app token |
| /fix-agent human command | Partial | Trigger verified (workflow started, +1 reaction posted), but cancelled by incoming commit before full run completed |
| workflow_dispatch fallback | Working | For when events are throttled |
| Model Armor integration | Working | Hard block on PR body, filter-and-continue on diff/comments |
| Iteration cap (45) | Working | Escalates to human with CODEOWNERS mention |
| Strategy escalation (>=5 cycles) | Working | Prompt tells agent to try new approach |
| Cancel stale fix agents | Working | Review agent cancels in-progress fixes on new push |
| Unique artifact file paths | Working | Run ID + random hex prevents hijack |
| Token isolation | Working | LLM gets read-only token; write-capable token is used only in the post step |
| Pluggable reviewer slot | Working | Any tool that posts a GitHub review can fill the slot |
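The iteration cap and strategy escalation rows in the table above describe a simple threshold policy. A minimal sketch of that policy follows; the function name `next_action` and the returned labels are hypothetical, and treating the cap as inclusive (cycle 45 itself escalates) is an assumption.

```shell
# Thresholds taken from the table: new strategy from cycle 5, hard cap at 45.
MAX_CYCLES=45
STRATEGY_ESCALATION_AT=5

# Decide what the fix agent should do for a given review/fix cycle number.
next_action() {
  cycle="$1"
  if [ "$cycle" -ge "$MAX_CYCLES" ]; then
    echo "escalate-to-human"      # stop and hand over to a human
  elif [ "$cycle" -ge "$STRATEGY_ESCALATION_AT" ]; then
    echo "try-new-strategy"       # prompt tells the agent to change approach
  else
    echo "continue"
  fi
}
```

Checking the cap first matters: a cycle count at or above the cap also satisfies the strategy threshold, so the order of the branches decides which action wins.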
Timing Data (from PR #7)
Review Agent (11 successful runs)
| Run | Duration | Notes |
|---|---|---|
| 23645733079 | ~5 min | Early, smaller diff |
| 23646406199 | ~3.5 min | |
| 23646546863 | ~4.5 min | |
| 23646792809 | ~5 min | |
| 23648438805 | ~6.5 min | |
| 23649109721 | ~4.5 min | |
| 23649767441 | ~5 min | |
| 23650887715 | ~10 min | Larger diff after workflow changes |
| 23652775666 | ~8.5 min | |
| 23654379038 | ~4.5 min | |
| 23657972959 | ~5 min | |
| Mean | ~5.5 min | |
Fix Agent (3 successful runs)
| Run | Duration | Notes |
|---|---|---|
| 23608344342 | ~7 min | Small fix |
| 23627557889 | ~21 min | Larger fix |
| 23644666720 | ~17 min | After workflow refactoring |
| Mean | ~15 min | Range: 7-21 min |
Full Loop Cycle (review + fix + push)
- Best case: ~12 min (fast review + small fix)
- Typical: ~20-25 min
- Worst case: ~35 min (long review + complex fix)
Fix Agent Telemetry (from gemini-output artifacts)
| Metric | Run 23644666720 (~17 min) | Run 23627557889 (~21 min) |
|---|---|---|
| LLM API time (gemini-3.1-pro) | 283.9s (~4.7 min, 29%) | 341.8s (~5.7 min, 29%) |
| Tool execution time (shell) | 689.5s (~11.5 min, 71%) | 832.9s (~13.9 min, 70%) |
| Loop detector (gemini-3-flash) | 4.3s (<1%) | 12.2s (1%) |
| API requests | 44 (0 errors) | 62 (0 errors) |
| Tool calls | 48 (47 shell, 1 read_file) | 61 (60 shell, 1 read_file) |
| Total tokens | 1.49M (89% cached) | 2.38M (91% cached) |
Key findings:
- ~70% of fix agent time is shell command execution (make test, make lint, git, gh api), not LLM inference
- LLM thinking is only ~30% of total time (~5-6 min per run)
- Token caching is highly effective (89-91% hit rate); Gemini caches prior context across turns of a multi-turn session
- The read_file tool is barely used (1 call per run); the agent prefers run_shell_command for reading files
- The loop detector (flash model) adds negligible overhead
- Optimization target is tool execution, not model speed: faster test/lint or parallelized commands would have the most impact
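The percentages in the telemetry table appear to be each component's share of the summed component times (LLM + tool + loop detector). That reading is an assumption, but it reproduces the table's figures, as the sketch below shows. The function name `time_shares` is hypothetical.

```shell
# Print each component's share of the summed telemetry time as whole percentages.
# $1 = LLM API seconds, $2 = tool execution seconds, $3 = loop detector seconds
time_shares() {
  awk -v llm="$1" -v tool="$2" -v loop="$3" 'BEGIN {
    total = llm + tool + loop
    printf "llm=%.0f%% tool=%.0f%% loop=%.0f%%\n",
      100 * llm / total, 100 * tool / total, 100 * loop / total
  }'
}

time_shares 283.9 689.5 4.3     # run 23644666720 → llm=29% tool=71% loop=0%
time_shares 341.8 832.9 12.2    # run 23627557889 → llm=29% tool=70% loop=1%
```

The `loop=0%` result for the first run corresponds to the table's "<1%" entry (4.3s out of roughly 978s).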
What Failed and Was Reverted
| Approach | Why It Failed |
|---|---|
| Shared concurrency group for review+fix | Review and fix cancel each other. Max queue depth 1. |
| cancel-in-progress per workflow in shared group | GitHub applies the incoming workflow's setting, not the running one's, so behavior is asymmetric. |
| LLM posts comments via prompt instructions | LLM unreliably follows complex shell sequences. Creates duplicates. |
| gh auth login in post step with GH_TOKEN env | Conflicts: gh complains when both are set. GH_TOKEN env var is sufficient. |
| GITHUB_TOKEN passed to LLM for read-only | GITHUB_TOKEN has pull-requests: write from workflow permissions. Agent can still post as wrong bot. |
| workflows: write in YAML permissions block | Causes parse error with workflow_dispatch trigger. App token handles this at the GitHub App settings level. |
| Trusted scripts from base branch checkout | Scripts don't exist on base branch (main) yet, only on the PR branch. Breaks the workflow. |
| Review content in gh pr review --body | Creates visible review with full content + separate issue comment = duplicate noise. |
Known Issues
| # | Issue | Status |
|---|---|---|
| 001 | Fix agent cycle time slow (7-21 min per iteration); 70% is shell execution | Open |
| 002 | Fix agent workflow push permission vs guardrails | Resolved |
| 003 | GitHub throttles pull_request_review events after ~20 from same bot on one PR | Open |
| 004 | Concurrent bot + human fixes share concurrency group, causing cancellation | Open |
| 005 | Review verdict entries (--request-changes) accumulate in PR timeline | Open |
| 006 | Prompt injection scan returns match state only, not flagged snippet | Open |
| 007 | Workflows hardcoded per repo; need reusable workflows for multi-repo | Open |
| 009 | Triage agent only triggers on issues.opened; no retry on failure | Open |
| 010 | Fix agent modifying workflow files can break agent pipelines | Open |
| 011 | run-gemini-cli output parsing failure with special characters (upstream) | Open |
| 012 | Prompt injection scan truncates diff at 1000 lines; bypass possible | Open |
| 013 | GITHUB_OUTPUT multi-line value injection via echo "instruction=$X" | Open |
| 014 | Triage agent multi-label race causes implementation agent concurrency cancellation | Open |
| 015 | GCP auth uses static SA key; replace with Workload Identity Federation | Open |
| 016 | Agent telemetry trapped in artifacts; export to GCP Cloud Monitoring | Open |
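Issue 013 arises because `echo "instruction=$X"` writes one `name=value` line per line of `$X`, so a newline inside the value can smuggle in an extra output. One common mitigation is the delimiter-based multi-line syntax that GitHub Actions supports for `$GITHUB_OUTPUT`, with a random delimiter. The sketch below shows that approach as a possible fix, not the fix adopted in the experiment; the function name `set_multiline_output` is hypothetical, and `od` on `/dev/urandom` stands in for `openssl rand -hex 8`.

```shell
# Append a possibly multi-line value to $GITHUB_OUTPUT using the
# "name<<DELIMITER ... DELIMITER" form. Lines between the delimiters are
# treated as the value, so "other=value" inside it is not a separate output.
set_multiline_output() {
  name="$1"; value="$2"
  # Random delimiter that the value's author cannot predict.
  delim="EOF_$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
  {
    printf '%s<<%s\n' "$name" "$delim"
    printf '%s\n' "$value"
    printf '%s\n' "$delim"
  } >> "$GITHUB_OUTPUT"
}
```

A step would call `set_multiline_output instruction "$X"` in place of the `echo` form.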
Security Notes
- All agent inputs are scanned by Google Cloud Model Armor for prompt injection before processing
- Scanner operates in fail-closed mode (unknown results → block)
- The review agent's LLM never has write access to GitHub; all mutations happen in deterministic shell steps
- Each workflow uses unique temp file paths (${RUN_ID}-$(openssl rand -hex 8)) to prevent artifact hijack between steps
- The fix agent has an iteration cap (45 cycles) to prevent infinite loops
- Fork PRs are blocked from triggering the fix agent
- Identity separation: reviewer and fixer are different GitHub Apps, which prevents self-approval
- See experiments/model-armor-vs-claude-triage/ for prompt injection scanner effectiveness testing
- See experiments/prompt-injection-defense/ for AI agent inherent injection resistance testing
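The unique temp file path pattern from the notes above can be wrapped in a small helper. This is a sketch under assumptions: the function name `unique_artifact_path`, the `/tmp` default, and the `local` placeholder for a missing `RUN_ID` are all hypothetical, and the `od` fallback is only there for machines without `openssl`.

```shell
# Build a per-run, unpredictable path of the form <dir>/<RUN_ID>-<16 hex chars>.
# $1 = directory (defaults to /tmp); RUN_ID is expected in the environment.
unique_artifact_path() {
  dir="${1:-/tmp}"
  suffix="$(openssl rand -hex 8 2>/dev/null)" || suffix=""
  if [ -z "$suffix" ]; then
    # Fallback when openssl is unavailable: 8 random bytes from /dev/urandom as hex.
    suffix="$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
  fi
  printf '%s/%s-%s\n' "$dir" "${RUN_ID:-local}" "$suffix"
}
```

Because the suffix is generated at the moment of use, a later step (or an agent with shell access) cannot pre-create or guess the file name that a deterministic step will read.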
