AIShell-Gate 1.0 Beta — Tester Guide and Test Plan

!! BETA RELEASE — TEST IN ISOLATION !! Run all tests on a clean, isolated Linux system. Do not run live execution tests (without --dry-run) on any system you care about until you are confident in the policy configuration. When in doubt, add --dry-run — it runs the full evaluation and confirmation flow without executing anything.

Confidentiality Notice
This document and all materials delivered with it — including the AIShell-Gate binary executables, the aishell-confirm.sh script, and all accompanying documentation — are the confidential and proprietary property of AIShell Labs LLC, Copyright © 2026 AIShell Labs LLC, Winston-Salem NC USA. All Rights Reserved.

These materials are provided solely for the purpose of beta evaluation as described in this document. By accepting delivery of these materials, the recipient agrees: (1) to hold them in strict confidence and not to disclose, reproduce, or distribute them to any third party; (2) to use them solely for beta evaluation on an isolated, non-production system; (3) to return or destroy all copies upon request by AIShell Labs; and (4) that no licence or other right in the software or documentation is granted except as explicitly stated herein.

Questions regarding permitted use: info@aishell.org

00 Welcome and What We Need

Thank you for taking the time to test AIShell-Gate. You are in a small group of the first people outside AIShell Labs to run this system, and your feedback will directly shape the 1.0 release.

This guide does two things at once. It introduces you to the system progressively — from the simplest possible command to a full AI pipeline — and it collects structured feedback at each stage so we know exactly what worked, what confused you, and what broke.

Please work through the stages in order. Each stage builds on the previous one. You do not need to complete every stage — stop wherever you lose confidence or run into a problem, and report what happened. A tester who gets to Stage 4 and hits an error is giving us something more valuable than one who skips to Stage 9.

Pick a path before you start

This guide covers the full system across twelve stages, which is more than most testers will complete in a single session. Before your first session, pick one of the three paths below. Your coordinator may have already assigned you a path; if not, pick the one that matches the time you have and the depth you want.

Path	Stages	Time	What you learn
A — Interactive	0, 1, 10	~45 min	Interactive mode, both halves. Stage 1 is the policy engine — you type commands, it explains flag reasoning and risk scoring but never executes. Stage 10 is the executor's interactive mode — same typed-command feel, now commands actually run, under full policy enforcement and audit. A clean two-step arc: learn the decision model, then see it enforced.
B — Through the executor	0–6	~90 min	Policy engine basics through to real JSON plan execution with audit logging. Adds preset comparison, plans with denials, and best-effort/fail-fast strategy. The path most AI-pipeline testers should take.
C — Full system	0–11	3+ hours	Everything. Adds custom policy files, jail root containment, large multi-action plans, and the Enterprise features (HMAC audit chain, `--audit-verify`, `--dump-policy`).

Depth beats breadth. We would rather have a thorough Path A report than a rushed Path C one. Observation quality comes before coverage speed. If you want to return for another session later and do Path B or C, we welcome that.

About the commands. AIShell-Gate requires a fair number of flags, and a JSON plan can be daunting to type from scratch. Every command in this guide is complete and self-contained, ready to copy and paste exactly as written.

What we are trying to learn from you

Five questions drive the beta programme. Keeping these in mind as you work through the stages will help you notice the kind of feedback that is most valuable to us:

Does the documentation work? Can a practitioner who has never seen AIShell-Gate install it and operate it usefully from only the shipped documentation? Where did you struggle, and why?
Is the policy model correct? Do the risk scores match your assessment of actual command danger? Are confirmation level thresholds calibrated correctly for your environment?
Does the confirmation machinery work? Do all four levels (none, plan, action, typed) behave as documented? Is the typed challenge a meaningful gate or trivial to bypass without reading the command?
Are there failure modes we have not anticipated? Edge cases in command parsing, policy matching, audit chain handling, or the FIFO relay that only surface in a real environment.
What is your overall practitioner impression? Would a sysadmin in your position find the system credible and useful as a production tool?

To report feedback: email info@aishell.org with the subject line Beta Feedback — [your name]. You can paste your notes from the feedback boxes in this document directly into the email. Even a brief "Stage 3 worked, Stage 4 crashed with this error" is useful.

To report a security issue: email security@aishell.org privately. Do not post security findings in a public channel.

Test environment: All tests should be run on Linux x86_64 or arm64. macOS and FreeBSD are not supported in this beta: the code relies on pipe2() as implemented on Linux, and the binary will fail at startup on other platforms. A fresh VM or container is ideal.

01 Your Background (Quick Survey)

Before you start, please note your answers to the following — include them at the top of your feedback email. This helps us interpret your results correctly.

Background Survey — copy into your feedback email

Name / handle (or anonymous):
Years of Unix/Linux experience:
Primary role (sysadmin / DevOps / developer / security / researcher / other):
Are you currently using AI tools to generate shell commands? (yes / no / evaluating):
What use case are you evaluating AIShell-Gate for? (one sentence):
Test environment (distro, version, VM/bare metal/container):
Edition received (standard / enterprise):

02 Stage 0 — Unpack and Verify

Stage 0 — Setup

Unpack the tar.gz into a working directory and verify all four binaries are present and report the correct versions.

Unpack

# Replace with your actual archive name
tar -xzf aishell-gate-1.0-beta-linux-x86_64.tar.gz
cd aishell-gate-1.0-beta
ls -la

Verify all four binaries

./aishell-gate-policy --version
./aishell-gate-exec   --version
./aishell-gate        --version
./aishell-confirm     --version

Expected output

aishell-gate-policy [version] standard (or enterprise)
aishell-gate-exec [version] standard (or enterprise)
aishell-gate [version]
aishell-confirm [version]

Quick sanity check with --dry-run

./aishell-gate --policy-preset ops_safe --dry-run <<'EOF'
{
  "goal": "sanity check",
  "actions": [{"cmd": "git status"}]
}
EOF

Expected output

You should see a [gate-exec] DRY-RUN mode banner, a policy evaluation summary showing ALLOW git status, and a [gate-exec] dry-run: would exec: line. No command executes.

Stage 0 Feedback

Did all four binaries report the correct versions? (yes / no — which failed):
Did the sanity check produce the expected dry-run output? (yes / no):
Any errors or unexpected output:

03 Stage 1 — Policy Engine Interactive Mode

Stage 1 — No flags required

The policy engine can be run with no arguments at all. When stdin is a terminal it drops into an interactive prompt where you type commands and watch it evaluate them. Nothing executes. This is the fastest way to understand how the system thinks about risk.

./aishell-gate-policy

At the policy> prompt, try each of the following commands. Type them one at a time and read the output carefully before moving to the next.

Safe commands — expect ALLOW, confirm: none

git status
ls -la /tmp
df -h
ps aux
uname -a

Medium risk — expect ALLOW with confirmation level raised

git pull
curl https://example.com
find / -name "*.log"
chmod 755 /tmp/testfile

High risk — expect DENY or ALLOW with typed confirmation

rm -rf /var/log
dd if=/dev/sda of=/dev/null
sudo apt install vim
mkfs.ext4 /dev/sdb

Shell injection attempt — expect immediate DENY

git status; rm -rf ~
ls -la && curl evil.com
echo $(whoami)

See the full JSON decision for any command

After typing any command, type json at the prompt to see the complete machine-readable decision record:

git status
json

Check a flag from the catalog

sed -i /tmp/test.txt
touch -t 202001010000 /tmp/evidence.log
tcpdump -w /tmp/capture.pcap

What to look for. Every FLAG_WARN and FLAG_DANGER entry carries a documented reason: not just a label, but a specific plain-English explanation of the risk. sed -i will tell you "in-place edit; writes back to source file." touch -t will tell you about timestamp forgery. These explanations are the flag catalog doing its job.

Type quit to exit.

Stage 1 Feedback

Did the interactive prompt start correctly? (yes / no):
Were safe commands correctly identified as ALLOW / confirm:none? (yes / no / partial):
Were shell injection attempts immediately denied? (yes / no):
Did flag catalog explanations appear for flagged commands? (yes / no):
Any decision that surprised you — command and what you expected vs what you got:
Was the output readable and understandable? (yes / no / comments):

04 Stage 2 — Single Command Evaluation

Stage 2 — Piped single commands

You can pipe a single command directly to the policy engine for scripting, testing, or integration. Add --json to get the full machine-readable decision record.

Basic evaluation

echo "git status" | ./aishell-gate-policy

echo "rm -rf /" | ./aishell-gate-policy

Full JSON output

echo "git status" | ./aishell-gate-policy --json

In the JSON output, note the decision, confirm, risk.score, risk.blast_radius, argv, and busy_summary_text fields.

Compare presets on the same command

echo "python3 -m pytest" | ./aishell-gate-policy --policy-preset read_only --json

echo "python3 -m pytest" | ./aishell-gate-policy --policy-preset dev_sandbox --json

Expected output

read_only: DENY — interpreters are blocked.
dev_sandbox: ALLOW — developer workflow preset permits Python.

Network target detection — loopback AI inference port

echo "curl localhost:11434/api/generate" | ./aishell-gate-policy --policy-preset dev_sandbox --json

Expected output

ALLOW but with an elevated confirmation level and a loopback/AI-inference warning in the output. Port 11434 is the Ollama default — the engine recognises it as a local AI inference endpoint and flags it for operator attention.

Stage 2 Feedback

Did JSON output appear correctly? (yes / no):
Did the preset comparison produce different results for python3? (yes / no):
Did the Ollama port trigger a warning? (yes / no / what did you see):
Any commands that produced unexpected decisions:

05 Stage 3 — Exploring Presets

Stage 3 — Preset comparison

All seven built-in presets can be explored interactively. Each opens a policy session with a different posture. Try running the same commands across multiple presets to see how the risk profile changes.

Read-only — inspection only

./aishell-gate-policy --policy-preset read_only

Try: ls -la, cat /etc/hostname, git status, git pull, make, rm /tmp/test

Developer sandbox

./aishell-gate-policy --policy-preset dev_sandbox

Try: git pull, npm install, pip install requests, make clean, sudo apt install, dd if=/dev/zero of=test bs=1M count=1

CI build — unattended pipeline

./aishell-gate-policy --policy-preset ci_build

Try: make, npm test, cargo build, git status, sudo, curl https://example.com

Danger zone — minimal restrictions

./aishell-gate-policy --policy-preset danger_zone

Note: most commands ALLOW but confirmation levels will be high. Try: rm /tmp/test, chmod 777 /tmp, dd if=/dev/zero of=/tmp/test

Dump the effective policy stack (Enterprise)

./aishell-gate-policy --policy-preset dev_sandbox --dump-policy

Enterprise only. Standard edition will emit a clear "not available in standard edition" message. If you have the enterprise binary, this prints your operator overlay layers (base, project, user) as JSON — useful for verifying that your custom rules are active and in the correct layer. The built-in standard_policy layer and command catalog are not included in the output.

Stage 3 Feedback

Did the presets produce noticeably different behaviour? (yes / no):
Any preset whose behaviour surprised you:
Enterprise: did --dump-policy produce JSON output? (yes / no / N/A):
Standard: did --dump-policy produce a clear edition message? (yes / no / N/A):

06 Stage 4 — First Executor Plan

Stage 4 — Always use --dry-run first

The executor reads a JSON action plan and submits each command to the policy engine in sequence. We always use --dry-run first — the full evaluation runs, confirmation gates fire, the audit log records a DRY_RUN_SUPPRESSED event, but nothing executes.

Note that we use ./aishell-gate here — the recommended entry point. It handles --policy-binary automatically.

Minimal plan — two safe commands

./aishell-gate --policy-preset ops_safe --dry-run <<'EOF'
{
  "goal": "check repository state",
  "source": "ai",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "git log"}
  ]
}
EOF

Expected output

DRY-RUN banner. Two ALLOW decisions. Two [gate-exec] dry-run: would exec: lines. Exit code 0.
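
Several expected results in this guide include an exit code, which is easy to lose once output scrolls. The following is a minimal sketch of a wrapper that prints the exit code after any command. The name run_and_report is a hypothetical helper for illustration, not part of AIShell-Gate, and the demo uses standard utilities rather than the gate itself.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with AIShell-Gate): run any command,
# then print its exit code on stderr so it can be pasted into a report.
run_and_report() {
    "$@"
    rc=$?
    printf 'exit code: %d\n' "$rc" >&2
    return "$rc"
}

# Demo with standard utilities; substitute an ./aishell-gate invocation.
run_and_report true
run_and_report false || echo "non-zero exit observed"
```

Because the wrapper passes stdin through untouched, it should also work in front of the heredoc-style invocations used in this guide, for example run_and_report ./aishell-gate --policy-preset ops_safe --dry-run followed by the same heredoc.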

Plan from a file

mkdir -p ~/aishell-test

cat > ~/aishell-test/plan1.json <<'EOF'
{
  "goal": "inspect system state",
  "source": "ai",
  "strategy": "fail_fast",
  "actions": [
    {"cmd": "uname -a"},
    {"cmd": "df -h"},
    {"cmd": "ps aux"},
    {"cmd": "git status"}
  ]
}
EOF

./aishell-gate \
  --policy-preset ops_safe \
  --plan ~/aishell-test/plan1.json \
  --dry-run

Expected output

All four commands evaluated. uname, df, ps, and git status should all ALLOW at confirm:none under ops_safe. Four dry-run suppressed lines.

Stage 4 Feedback

Did the inline plan produce the expected output? (yes / no):
Did the file-based plan work? (yes / no):
Was the output clear and readable? (yes / no / comments):
Any errors:

07 Stage 5 — Plans with Denials

Stage 5 — Watching the gate refuse

The most important thing to verify is that the gate actually denies what it should. These plans include commands that will be blocked.

Mixed plan — some allowed, some denied

./aishell-gate --policy-preset ops_safe --dry-run <<'EOF'
{
  "goal": "deploy and clean up",
  "source": "ai",
  "strategy": "fail_fast",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "rm -rf /var/log/app"},
    {"cmd": "sudo systemctl restart myapp"}
  ]
}
EOF

Expected output

git status: ALLOW. rm -rf /var/log/app: DENY (recursive deletion, system path). Execution stops here under fail_fast. Exit code 1.

Best-effort strategy — continues past denial

./aishell-gate --policy-preset dev_sandbox --dry-run <<'EOF'
{
  "goal": "mixed safety test",
  "source": "ai",
  "strategy": "best_effort",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "sudo apt install curl"},
    {"cmd": "npm test"},
    {"cmd": "dd if=/dev/sda of=/dev/null"}
  ]
}
EOF

Expected output

git status: ALLOW. sudo apt install: DENY. npm test: ALLOW (continues past denial under best_effort). dd if=/dev/sda: DENY. Exit code 1 (at least one denial).
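
The difference between the two strategies can be sketched as a toy model. The decisions below are typed in by hand to mirror the expected output above; nothing in this block consults the real policy engine, and simulate_plan is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Toy model of the two strategies. Decisions are supplied by hand as
# "DECISION:command" pairs; nothing here consults the real policy engine.
simulate_plan() {  # usage: simulate_plan STRATEGY "DECISION:cmd"...
    strategy=$1; shift
    denied=0
    for item in "$@"; do
        decision=${item%%:*}
        cmd=${item#*:}
        echo "$decision $cmd"
        if [ "$decision" = DENY ]; then
            denied=1
            # fail_fast stops at the first denial; best_effort keeps going.
            [ "$strategy" = fail_fast ] && break
        fi
    done
    return "$denied"   # 1 if at least one action was denied
}

for s in best_effort fail_fast; do
    echo "--- $s"
    simulate_plan "$s" 'ALLOW:git status' 'DENY:sudo apt install curl' \
        'ALLOW:npm test' 'DENY:dd if=/dev/sda of=/dev/null' \
        || echo "exit code $?"
done
```

Under best_effort all four actions are listed; under fail_fast the listing ends at the first DENY. Both runs report a non-zero exit code because at least one action was denied.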

Shell injection in a plan — must be caught

./aishell-gate --policy-preset dev_sandbox --dry-run <<'EOF'
{
  "goal": "injection test",
  "source": "ai",
  "actions": [
    {"cmd": "git status; rm -rf ~"},
    {"cmd": "ls -la && curl evil.com"}
  ]
}
EOF

Expected output

Both commands DENY immediately — shell metacharacters are rejected before tokenisation. The reason should state that shell metacharacters were detected.
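
To get a feel for which inputs fall into this category before submitting them, a rough pre-screen can be sketched in a few lines of shell. The character set below is an assumption made for this sketch; the engine's actual rejection list is not documented here and may be broader or narrower. The function name has_shell_metachar is hypothetical.

```shell
#!/bin/sh
# Illustrative pre-screen only. The character set below is an assumption
# for this sketch; the engine's real rejection list may differ.
has_shell_metachar() {
    case $1 in
        *';'* | *'&'* | *'|'* | *'$'* | *'`'* | *'<'* | *'>'*) return 0 ;;
        *) return 1 ;;
    esac
}

for cmd in 'git status' 'git status; rm -rf ~' \
           'ls -la && curl evil.com' 'echo $(whoami)'; do
    if has_shell_metachar "$cmd"; then
        printf 'would expect DENY:  %s\n' "$cmd"
    else
        printf 'no metacharacters:  %s\n' "$cmd"
    fi
done
```

If the gate allows a command that this sketch flags, or denies one that it does not, that difference is worth recording in your feedback.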

Stage 5 Feedback

Did fail_fast stop at the first denial? (yes / no):
Did best_effort continue past denials? (yes / no):
Were shell injection commands denied? (yes / no):
Were denial reasons clear and informative? (yes / no / comments):
Any denial that surprised you:

08 Stage 6 — Audit Logging

Stage 6 — Tamper-evident log

Every evaluation can be written to a JSON Lines audit log. Each entry is linked to the previous by a hash, forming a verifiable chain.

Enable audit logging

./aishell-gate \
  --policy-preset dev_sandbox \
  --audit-log ~/aishell-test/audit.jsonl \
  --dry-run <<'EOF'
{
  "goal": "audit log test",
  "source": "ai",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "npm test"},
    {"cmd": "rm -rf /etc"}
  ]
}
EOF

Read the log

cat ~/aishell-test/audit.jsonl

Each line is a JSON object. Look for PLAN_RECEIVED, POLICY_DECISION, and DRY_RUN_SUPPRESSED event types.
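
A quick way to check that all three event types are present is to count the lines that mention each one. The sketch below matches the event name anywhere on the line, so it does not assume which JSON field carries it. The sample log in the demo is made up, and its "event" field name is hypothetical; count_events is likewise a name chosen for illustration.

```shell
#!/bin/sh
# Count lines mentioning each event type named in this guide.
# Matches the event name anywhere on the line, so it makes no
# assumption about which JSON field carries it.
count_events() {
    log=$1
    for ev in PLAN_RECEIVED POLICY_DECISION DRY_RUN_SUPPRESSED; do
        printf '%-20s %s\n' "$ev" "$(grep -c "$ev" "$log")"
    done
}

# Demo on a made-up three-line log; the "event" field name is hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"event":"PLAN_RECEIVED"}
{"event":"POLICY_DECISION"}
{"event":"POLICY_DECISION"}
EOF
count_events "$sample"
rm -f "$sample"
```

Against a real log the call would be count_events ~/aishell-test/audit.jsonl.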

Verify the audit chain (Enterprise)

./aishell-gate-exec --audit-verify ~/aishell-test/audit.jsonl

Enterprise only. Standard edition will emit "not available in standard edition." If you have the enterprise binary, this should report the chain intact. Then try editing one character in the log file with a text editor and re-running — the chain should report a break at that entry.
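
For testers who have not worked with hash-chained logs, the following toy sketch shows why a one-character edit is detectable. It is not the AIShell-Gate log format: the line layout, the "genesis" seed, and the function names are all assumptions made for illustration. It relies on sha256sum and on GNU sed -i, both normally present on a Linux test system.

```shell
#!/bin/sh
# Toy hash chain, NOT the AIShell-Gate log format: each output line is
# "<sha256 of previous hash + payload> <payload>".
chain_append() {   # usage: chain_append FILE PAYLOAD
    prev=$(tail -n 1 "$1" 2>/dev/null | cut -d' ' -f1)
    hash=$(printf '%s%s' "${prev:-genesis}" "$2" | sha256sum | cut -d' ' -f1)
    printf '%s %s\n' "$hash" "$2" >> "$1"
}

chain_verify() {   # usage: chain_verify FILE; reports the first broken line
    prev=genesis n=0
    while IFS= read -r line; do
        n=$((n + 1))
        stored=${line%% *}      # hash recorded on this line
        payload=${line#* }      # everything after the first space
        want=$(printf '%s%s' "$prev" "$payload" | sha256sum | cut -d' ' -f1)
        if [ "$stored" != "$want" ]; then
            echo "chain broken at line $n"
            return 1
        fi
        prev=$stored
    done < "$1"
    echo "chain intact ($n entries)"
}

log=$(mktemp)
chain_append "$log" 'git status ALLOW'
chain_append "$log" 'npm test ALLOW'
chain_append "$log" 'rm -rf /etc DENY'
chain_verify "$log"
sed -i '2s/ALLOW/DENY/' "$log"      # tamper with the second entry
chain_verify "$log" || true
rm -f "$log"
```

Because each hash covers the previous hash as well as the payload, changing any entry invalidates the stored hash on that line, which is the behaviour the tampering test above asks you to confirm in the real log.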

Stage 6 Feedback

Was the audit log created at the expected path? (yes / no):
Did each action produce a log entry? (yes / no):
Enterprise: did --audit-verify report the chain intact? (yes / no / N/A):
Enterprise: did tampering produce a chain break report? (yes / no / N/A):
Was the log format readable without a JSON parser? (yes / no):

09 Stage 7 — Custom Policy File

Stage 7 — Write your own rules

Policy files let you extend or override the built-in presets. Three optional files can be layered: base, project, and user. Here we write a project-level policy.

Create a policy file

cat > ~/aishell-test/project_policy.json <<'EOF'
{
  "cmd_allow": [
    { "pattern": "git status",  "confirm": "none",   "reason": "safe read" },
    { "pattern": "git diff",    "confirm": "none",   "reason": "safe read" },
    { "pattern": "git pull",    "confirm": "plan",   "reason": "modifies repo" },
    { "pattern": "npm test",    "confirm": "plan",   "reason": "run tests" },
    { "pattern": "npm install", "confirm": "action", "reason": "installs packages" }
  ],
  "cmd_deny": [
    { "pattern": "curl", "reason": "no outbound network in this project" },
    { "pattern": "wget", "reason": "no outbound network in this project" }
  ],
  "writable_dirs": [ "/home/user/myproject", "/tmp" ]
}
EOF

Test an allowed command

echo "git status" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --policy-project ~/aishell-test/project_policy.json

Test an explicitly denied command

echo "curl https://example.com" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --policy-project ~/aishell-test/project_policy.json

Expected output

git status: ALLOW, confirm:none, matched your project rule.
curl: DENY, reason shows your project policy denial message.

Test a typo in the policy file — should fail closed

cat > ~/aishell-test/typo_policy.json <<'EOF'
{
  "cmd_denny": [
    { "pattern": "curl", "reason": "typo test" }
  ]
}
EOF

echo "git status" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --policy-project ~/aishell-test/typo_policy.json

Expected output

A hard error naming the unknown key cmd_denny. The engine fails closed on unknown policy keys rather than silently ignoring them. The command is NOT evaluated.

Generate a template to start from

./aishell-gate-policy --dump-standard-template \
  | sed '1,/^{$/{ /^{$/!d }' \
  > ~/aishell-test/template_base.json

wc -l ~/aishell-test/template_base.json

Stage 7 Feedback

Did the project policy correctly allow/deny the test commands? (yes / no):
Did the typo policy produce a named error rather than silently ignoring the key? (yes / no):
Did --dump-standard-template produce a usable template? (yes / no):
Was writing a policy file intuitive? (yes / no / comments):

10 Stage 8 — Jail Root Containment

Stage 8 — Directory confinement

The --jail-root flag tells the policy engine to enforce path containment. Any write-class command targeting a path outside the jail root is denied, regardless of policy rules.

mkdir -p /tmp/aishell-jail/project

Command inside the jail — should allow

echo "ls -la /tmp/aishell-jail/project" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --jail-root /tmp/aishell-jail

Command outside the jail — should deny

echo "ls -la /etc" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --jail-root /tmp/aishell-jail

Prefix attack — must NOT allow

mkdir -p /tmp/aishell-jailbreak/attack

echo "ls -la /tmp/aishell-jailbreak/attack" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --jail-root /tmp/aishell-jail

Expected output

/tmp/aishell-jailbreak/attack must be DENIED even though it shares the /tmp/aishell-jail prefix. The engine checks that the character after the prefix is / or end-of-string, not just a prefix match.
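
The boundary-aware prefix test described above can be sketched as a small shell function. This is an illustration of the rule, not the engine's code: it compares strings only and does not resolve symlinks or ".." components, and the name inside_jail is hypothetical.

```shell
#!/bin/sh
# Sketch of the boundary-aware prefix test described above. It compares
# strings only; it does not resolve symlinks or ".." components.
inside_jail() {    # usage: inside_jail JAIL_ROOT PATH
    root=${1%/}    # drop one trailing slash from the root, if present
    case $2 in
        "$root" | "$root"/*) return 0 ;;   # exact match, or root followed by /
        *) return 1 ;;
    esac
}

for p in /tmp/aishell-jail /tmp/aishell-jail/project \
         /tmp/aishell-jailbreak/attack /etc; do
    if inside_jail /tmp/aishell-jail "$p"; then
        echo "inside:  $p"
    else
        echo "outside: $p"
    fi
done
```

A naive test that only checked whether the path begins with /tmp/aishell-jail would wrongly accept /tmp/aishell-jailbreak/attack, which is exactly the case this stage is designed to catch.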

Plan confined to jail

./aishell-gate \
  --policy-preset dev_sandbox \
  --jail-root /tmp/aishell-jail \
  --dry-run <<'EOF'
{
  "goal": "work inside jail only",
  "source": "ai",
  "actions": [
    {"cmd": "ls -la /tmp/aishell-jail/project"},
    {"cmd": "ls -la /etc"},
    {"cmd": "ls -la /tmp/aishell-jailbreak"}
  ]
}
EOF

Stage 8 Feedback

Did paths inside the jail allow correctly? (yes / no):
Did paths outside the jail deny correctly? (yes / no):
Did the prefix attack path deny correctly? (yes / no — this is important):
Any unexpected behaviour:

11 Stage 9 — Large Plans

Stage 9 — Full pipeline simulation

These plans simulate realistic AI agent workloads — the kind of multi-step sequences an AI coding assistant or CI pipeline agent would generate.

Full CI build cycle simulation

cat > ~/aishell-test/ci_plan.json <<'EOF'
{
  "goal": "full CI cycle: pull, install dependencies, lint, test, report",
  "source": "ai",
  "strategy": "fail_fast",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "git pull"},
    {"cmd": "npm install"},
    {"cmd": "npm run lint"},
    {"cmd": "npm test"},
    {"cmd": "npm run build"},
    {"cmd": "git log"},
    {"cmd": "df -h"},
    {"cmd": "ps aux"}
  ]
}
EOF

./aishell-gate \
  --policy-preset ci_build \
  --audit-log ~/aishell-test/audit.jsonl \
  --plan ~/aishell-test/ci_plan.json \
  --dry-run

DevOps deployment simulation — with one bad action

cat > ~/aishell-test/deploy_plan.json <<'EOF'
{
  "goal": "deploy release v2.4.1 to staging",
  "source": "ai",
  "strategy": "fail_fast",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "git pull"},
    {"cmd": "make clean"},
    {"cmd": "make"},
    {"cmd": "cp dist/app /srv/staging/app"},
    {"cmd": "rm -rf /var/log/staging"},
    {"cmd": "ls -la /srv/staging"}
  ]
}
EOF

./aishell-gate \
  --policy-preset dev_sandbox \
  --jail-root /srv/staging \
  --audit-log ~/aishell-test/audit.jsonl \
  --plan ~/aishell-test/deploy_plan.json \
  --dry-run

Expected output

The first five commands should evaluate. rm -rf /var/log/staging should DENY — recursive deletion of a system-adjacent path. Under fail_fast, the plan stops there. The final ls is never reached.

Security audit simulation — read-only inspection

./aishell-gate --policy-preset read_only --dry-run <<'EOF'
{
  "goal": "security audit: inspect system configuration and running services",
  "source": "ai",
  "strategy": "best_effort",
  "actions": [
    {"cmd": "uname -a"},
    {"cmd": "cat /etc/os-release"},
    {"cmd": "ps aux"},
    {"cmd": "ss -tlnp"},
    {"cmd": "df -h"},
    {"cmd": "ls -la /etc/cron.d"},
    {"cmd": "cat /etc/passwd"},
    {"cmd": "find /tmp -type f"},
    {"cmd": "git log"},
    {"cmd": "ls -la /var/log"}
  ]
}
EOF

Stage 9 Feedback

Did the CI plan evaluate all nine actions? (yes / no):
Did the deploy plan stop at the rm -rf action? (yes / no):
Did the read-only audit plan complete without denials? (yes / no — which were denied):
Performance: did large plans evaluate noticeably slowly? (yes / no / how long):
Overall: did the gate behave as you would expect a policy boundary to behave? (yes / no / comments):

12 Stage 10 — Executor Interactive Mode

Stage 10 — Interactive execution

This is the second half of AIShell-Gate's interactive surface. Stage 1 was the policy engine in interactive mode: you typed commands and the engine explained its decisions, but nothing ever ran. Stage 10 is the executor in interactive mode: you type commands the same way, you see the same decisions, but now the allowed ones actually execute under full policy enforcement and audit. Same typed-command feel, real consequences.

This is also how a human operator uses AIShell-Gate in day-to-day work. The policy engine's decisions apply equally whether a human types the command or an AI agent submits it — the gate does not discriminate by source.

This stage executes commands for real. Run only on a test system. Use the --dry-run flag if you want to observe the confirmation flow without actually executing anything.

Interactive with --dry-run (safe)

./aishell-gate --policy-preset ops_safe --interactive --dry-run

Type commands at the exec> prompt. Try: git status, ls -la, rm /tmp/test. Observe the confirmation level declared for each before the dry-run suppression fires. Type quit to exit.

Interactive live execution (on a safe test machine only)

./aishell-gate \
  --policy-preset ops_safe \
  --interactive \
  --audit-log ~/aishell-test/audit.jsonl

Try low-risk commands that will execute immediately, then a medium-risk command that requires plan-level confirmation. Observe the confirmation prompts. Type quit to end the session and close the audit log.

Stage 10 Feedback

Did interactive mode start correctly? (yes / no):
Were confirmation prompts presented at the correct levels? (yes / no):
Did low-risk commands execute immediately without prompts? (yes / no):
Was the interactive experience usable as a daily workflow? (yes / no / comments):

13 Stage 11 — Enterprise Features

Stage 11 — Enterprise edition only

Enterprise edition only. If you received the standard edition, skip this stage. Standard binaries will emit a clear "not available in standard edition" message for each of these flags. Please note whether that message appeared clearly.

Keyed HMAC audit chain

# Generate a key
dd if=/dev/urandom bs=32 count=1 2>/dev/null | xxd -p | tr -d '\n' \
  > ~/aishell-test/audit.key
chmod 640 ~/aishell-test/audit.key

# Run a plan with a keyed audit chain
./aishell-gate \
  --policy-preset dev_sandbox \
  --audit-log ~/aishell-test/keyed_audit.jsonl \
  --dry-run <<'EOF'
{
  "goal": "keyed audit test",
  "source": "ai",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "npm test"}
  ]
}
EOF

Verify the keyed chain

./aishell-gate-exec --audit-verify ~/aishell-test/keyed_audit.jsonl

Dump effective policy as JSON

./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --policy-project ~/aishell-test/project_policy.json \
  --dump-policy | head -60

Stage 11 Feedback

Enterprise: did the keyed audit chain verify successfully? (yes / no / N/A):
Enterprise: did --dump-policy produce JSON output showing your overlay layers? (yes / no / N/A):
Standard: did enterprise flags produce clear edition messages? (yes / no / N/A):

14 Final Feedback

Thank you for working through the test plan. Please copy the questions below into your feedback email to info@aishell.org with subject Beta Feedback — [your name].

Functionality

What worked

Which stages completed without issues:
Which features worked better than you expected:

What did not work

Which stages failed or behaved unexpectedly:
Exact error messages (paste the full output if possible):
The command that triggered the error:

Policy and correctness

Policy decisions

Any ALLOW decision that should have been a DENY (command and reason):
Any DENY decision that you think was incorrect (command and reason):
Any confirmation level that felt wrong — too high or too low (command and what you expected):
Any flag assessment reason that was unclear or inaccurate:

Usability

Experience

Was the output readable without consulting documentation? (yes / no / comments):
Was the JSON plan format intuitive to write? (yes / no / what was confusing):
Was the policy file format intuitive? (yes / no / what was confusing):
What would reduce the friction of using this day-to-day:

Deployment

Production readiness

Would you deploy this in a production or near-production environment? (yes / no / with what changes):
What is the primary use case you would deploy it for:
What is the single most important thing to fix before 1.0:
What is the single most important feature to add after 1.0:
Any other comments, suggestions, or observations:

Thank you. Every report we receive, including "I got to Stage 2 and it crashed", is genuinely useful. You are helping shape a tool that will sit between AI systems and production infrastructure. The feedback you provide today directly influences who owns that boundary and how well it holds.

Testers — You Can Stop Reading Here

Everything below this line is for the program coordinator managing the beta engagement. It is included in the same document to keep the beta programme in a single file. Testers can ignore this section entirely — your work ends with the Final Feedback section above.

C0 Coordinator Section

This section contains the materials needed to run a beta engagement: the pre-handoff checklist, the structured report card template for capturing tester observations, and the rating scheme used to categorise findings.

The test philosophy is depth over breadth. A thorough report covering four stages is more valuable than a rushed pass over all twelve. Observation quality beats coverage speed.

C1 Coordinator Pre-Handoff Checklist

Before handing the package to each tester, confirm all of the following:

The beta package contains all binaries (aishell-gate-policy, aishell-gate-exec), all launcher scripts (aishell-gate.sh, aishell-confirm.sh, aishell-pipe.sh, aishell-gate-mcp.py), and all shipped documentation.
The tester has been told: use the documentation, not the coordinator, for how-to questions.
The tester has been told: fill feedback in during the session, not from memory afterward.
The tester's environment is isolated from production systems — a dedicated VM, a container with a full init system, or a non-production development workstation.
The tester understands the test is depth-over-breadth: thorough coverage of a few stages beats a rushed pass over all twelve.
The tester has the submission address (info@aishell.org) and the Report Card template below.
The coordinator is prepared to not answer how-to questions for the duration of the engagement.

Why “no how-to support”? The tester's ability or inability to find answers in the shipped documentation is one of the primary things being measured. If you answer the question, you have destroyed that data point. The correct response to “how do I X?” is: “That information is in the documentation. Please find it there and note whether you could.”

C2 Rating Scheme

The tester applies one of four ratings to each observation recorded on the Report Card:

Rating	Meaning
Pass	The system behaved as documented. No friction.
Fail	The system did not behave as documented, or the documentation was wrong.
Unexpected	The system behaved differently from what the tester expected, but the documentation may or may not cover it. Record both expected and actual.
Note	Behaviour was correct but worth flagging: a risk score opinion, a usability observation, a documentation clarity issue.

C3 Report Card Template

The tester completes one row per observation during the session. A stage that produces multiple interesting observations should have multiple rows, all referencing the same stage but with different inputs. A stage that produced a clean expected result needs only one row.

Header — completed once per submission

Tester:            ___________________________
Date:              ___________________________
OS / Distro:       ___________________________
Kernel version:    ___________________________
Policy binary:     _________________ (from --version)
Exec binary:       _________________ (from --version)
Edition:           [ ] Standard   [ ] Enterprise
Total hours:       _______

Observation rows

Reproduce the following row format as many times as needed. The example row shows the intended level of detail:

Stage	Command / Input (verbatim)	Output / Result received	Expected	Rating	Notes / Friction
Stage 5	./aishell-gate-policy --policy-preset read_only > rm -rf /var	DENY: recursive deletion denied	DENY	Pass	Decision clear. Reason text helpful.

Bug reports (for Fail or Unexpected rows)

For any row rated Fail or Unexpected where the behaviour looks like a bug, attach a separate plain-text bug note containing:

One-line summary
The exact input
The complete policy or preset in effect
The full terminal output
The exit code
Whether the issue is reproducible — if it occurred only once and could not be reproduced, say so

Attachments to include with submission

Any policy files written during the test
Audit logs from Stage 6 (Audit Logging) and from any later stage run with --audit-log, including the deliberately tampered copy
Any JSON plans constructed during the test

Submission

Completed Report Card, bug notes, and attachments to info@aishell.org. Subject line: Beta Report — [tester name or alias] — [date]. The tester may also include a brief free-form general impressions note, but the Report Card is the primary deliverable.

C4 Out of Scope for This Beta

If the tester encounters any of the following through natural exploration they should note observations, but should not spend planned time here:

Multi-session concurrent remote deployment. Single-session remote use is in scope; concurrent multi-session is not a priority for 1.0 beta.
Production system deployment. No part of this test should occur on infrastructure serving real workloads.
Integration with a live autonomous AI agent. The tester submits JSON plans they construct themselves or via the reference pipeline script.
Security penetration testing. Use the system as intended and report where it does not work as documented.
Performance and load testing.