AIShell-Gate 1.0 Beta  ·  Tester Guide and Test Plan

Beta Tester Guide

A progressive walkthrough from first run to complex AI pipeline — with copy-paste commands at every step and structured feedback collection throughout.

aishell-gate-policy  ·  aishell-gate-exec  ·  aishell-gate  ·  aishell-confirm

Copyright © 2026 AIShell Labs LLC Winston-Salem NC USA. All Rights Reserved. — www.aishellgate.com  ·  www.aishell.org

!! BETA RELEASE — TEST IN ISOLATION !! Run all tests on a clean, isolated Linux system. Do not run live execution tests (without --dry-run) on any system you care about until you are confident in the policy configuration. When in doubt, add --dry-run — it runs the full evaluation and confirmation flow without executing anything.
Confidentiality Notice
This document and all materials delivered with it — including the AIShell-Gate binary executables, the aishell-confirm.sh script, and all accompanying documentation — are the confidential and proprietary property of AIShell Labs LLC, Copyright © 2026 AIShell Labs LLC, Winston-Salem NC USA. All Rights Reserved.

These materials are provided solely for the purpose of beta evaluation as described in this document. By accepting delivery of these materials, the recipient agrees: (1) to hold them in strict confidence and not to disclose, reproduce, or distribute them to any third party; (2) to use them solely for beta evaluation on an isolated, non-production system; (3) to return or destroy all copies upon request by AIShell Labs; and (4) that no licence or other right in the software or documentation is granted except as explicitly stated herein.

Questions regarding permitted use: info@aishell.org

00 Welcome and What We Need

Thank you for taking the time to test AIShell-Gate. You are in a small group of the first people outside AIShell Labs to run this system, and your feedback will directly shape the 1.0 release.

This guide does two things at once. It introduces you to the system progressively — from the simplest possible command to a full AI pipeline — and it collects structured feedback at each stage so we know exactly what worked, what confused you, and what broke.

Please work through the stages in order. Each stage builds on the previous one. You do not need to complete every stage — stop wherever you lose confidence or run into a problem, and report what happened. A tester who gets to Stage 4 and hits an error is giving us something more valuable than one who skips to Stage 9.

Pick a path before you start

This guide covers the full system across twelve stages, which is more than most testers will complete in a single session. Before your first session, pick one of the three paths below. Your coordinator may have already assigned you a path; if not, pick the one that matches the time you have and the depth you want.

Depth beats breadth. We would rather have a thorough Path A report than a rushed Path C one. Observation quality comes before coverage speed. If you want to return for another session later and do Path B or C, we welcome that.
About the commands. AIShell-Gate requires a fair number of flags, and a JSON plan can be daunting to type from scratch. Every command block in this guide is complete and self-contained. Copy and paste it exactly as written.

What we are trying to learn from you

Five questions drive the beta programme. Keeping these in mind as you work through the stages will help you notice the kind of feedback that is most valuable to us:

  1. Does the documentation work? Can a practitioner who has never seen AIShell-Gate install it and operate it usefully from only the shipped documentation? Where you struggle — why?
  2. Is the policy model correct? Do the risk scores match your assessment of actual command danger? Are confirmation level thresholds calibrated correctly for your environment?
  3. Does the confirmation machinery work? Do all four levels (none, plan, action, typed) behave as documented? Is the typed challenge a meaningful gate or trivial to bypass without reading the command?
  4. Are there failure modes we have not anticipated? Edge cases in command parsing, policy matching, audit chain handling, or the FIFO relay that only surface in a real environment.
  5. What is your overall practitioner impression? Would a sysadmin in your position find the system credible and useful as a production tool?

To report feedback: email info@aishell.org with the subject line Beta Feedback — [your name]. You can paste your notes from the feedback boxes in this document directly into the email. Even a brief "Stage 3 worked, Stage 4 crashed with this error" is useful.

To report a security issue: email security@aishell.org privately. Do not post security findings in a public channel.

Test environment: run all tests on Linux x86_64 or arm64. macOS and FreeBSD are not supported in this beta. The binaries are built for Linux only (they depend on pipe2() and other Linux system interfaces) and will not run on other platforms. A fresh VM or container is ideal.

01 Your Background (Quick Survey)

Before you start, please note your answers to the following — include them at the top of your feedback email. This helps us interpret your results correctly.

02 Stage 0 — Unpack and Verify

Stage 0 — Setup

Unpack the tar.gz into a working directory and verify all four binaries are present and report the correct versions.

Unpack

# Replace with your actual archive name
tar -xzf aishell-gate-1.0-beta-linux-x86_64.tar.gz
cd aishell-gate-1.0-beta
ls -la

Verify all four binaries

./aishell-gate-policy --version
./aishell-gate-exec   --version
./aishell-gate        --version
./aishell-confirm     --version
Expected output
aishell-gate-policy [version] standard (or enterprise)
aishell-gate-exec [version] standard (or enterprise)
aishell-gate [version]
aishell-confirm [version]
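You may prefer a single pass/fail check. The loop below runs the same four version checks and flags any binary that is missing or not executable. It is a convenience sketch and is not part of the shipped package. The function name check_binaries is hypothetical.

```shell
# Hypothetical helper, not shipped with AIShell-Gate: print the version of each
# binary in DIR (default: current directory), or flag it if missing or not executable.
check_binaries() {
  dir=${1:-.}
  missing=0
  for bin in aishell-gate-policy aishell-gate-exec aishell-gate aishell-confirm; do
    if [ -x "$dir/$bin" ]; then
      "$dir/$bin" --version
    else
      echo "MISSING or not executable: $dir/$bin"
      missing=$((missing + 1))
    fi
  done
  # Return non-zero if any binary failed the check
  [ "$missing" -eq 0 ]
}
# Usage, from inside the unpacked directory: check_binaries
```

A non-zero return means at least one binary failed the check. Record which one on your Report Card.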

Quick sanity check with --dry-run

./aishell-gate --policy-preset ops_safe --dry-run <<'EOF'
{
  "goal": "sanity check",
  "actions": [{"cmd": "git status"}]
}
EOF
Expected output
You should see a [gate-exec] DRY-RUN mode banner, a policy evaluation summary showing ALLOW git status, and a [gate-exec] dry-run: would exec: line. No command executes.

03 Stage 1 — Policy Engine Interactive Mode

Stage 1 — No flags required

The policy engine can be run with no arguments at all. When stdin is a terminal it drops into an interactive prompt where you type commands and watch it evaluate them. Nothing executes. This is the fastest way to understand how the system thinks about risk.

./aishell-gate-policy

At the policy> prompt, try each of the following commands. Type them one at a time and read the output carefully before moving to the next.

Safe commands — expect ALLOW, confirm: none

git status
ls -la /tmp
df -h
ps aux
uname -a

Medium risk — expect ALLOW with confirmation level raised

git pull
curl https://example.com
find / -name "*.log"
chmod 755 /tmp/testfile

High risk — expect DENY or ALLOW with typed confirmation

rm -rf /var/log
dd if=/dev/sda of=/dev/null
sudo apt install vim
mkfs.ext4 /dev/sdb

Shell injection attempt — expect immediate DENY

git status; rm -rf ~
ls -la && curl evil.com
echo $(whoami)

See the full JSON decision for any command

After typing any command, type json at the prompt to see the complete machine-readable decision record:

git status
json

Check flags from the catalog

sed -i /tmp/test.txt
touch -t 202001010000 /tmp/evidence.log
tcpdump -w /tmp/capture.pcap
What to look for: every FLAG_WARN and FLAG_DANGER entry carries a documented reason. This is not just a label but a specific plain-English explanation of the risk. sed -i will tell you "in-place edit; writes back to source file." touch -t will tell you about timestamp forgery. These explanations are the flag catalog doing its job.

Type quit to exit.

04 Stage 2 — Single Command Evaluation

Stage 2 — Piped single commands

You can pipe a single command directly to the policy engine for scripting, testing, or integration. Add --json to get the full machine-readable decision record.

Basic evaluation

echo "git status" | ./aishell-gate-policy
echo "rm -rf /" | ./aishell-gate-policy
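You can replay a whole list of commands, such as the Stage 1 lists, without retyping them. Loop over a file and pipe each line to the engine separately, exactly as above.

This is a sketch, and it makes some assumptions:

- eval_each and the POLICY_BIN override are hypothetical names introduced here.
- The engine evaluates one piped command per invocation, as in the examples above.

```shell
# Hypothetical helper: evaluate each line of stdin as a separate command,
# one policy-engine invocation per line. Blank lines and # comments are skipped.
# POLICY_BIN (hypothetical) overrides the engine path; extra args pass through.
eval_each() {
  bin=${POLICY_BIN:-./aishell-gate-policy}
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;
    esac
    printf '=== %s\n' "$line"
    # Pipe only this one line to the engine so it cannot read the rest of the list;
    # if the engine exits non-zero, print that status and carry on
    printf '%s\n' "$line" | "$bin" "$@" || printf '(exit status %s)\n' "$?"
  done
}
# Usage: eval_each --policy-preset dev_sandbox < ~/aishell-test/commands.txt
```

read -r keeps each line literal, so entries such as echo $(whoami) reach the engine unexpanded.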

Full JSON output

echo "git status" | ./aishell-gate-policy --json

In the JSON output, note the decision, confirm, risk.score, risk.blast_radius, argv, and busy_summary_text fields.
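When comparing many decisions, it can help to pull out just these fields. The sketch below makes some assumptions:

- The --json output is a single JSON object.
- decision, confirm and argv are top-level keys.
- risk is a nested object holding score and blast_radius, which is how the dotted names above read.

Adjust the paths if your build nests them differently. summarize_decision is a hypothetical helper, and it needs python3.

```shell
# Hypothetical helper: print the key decision fields from one --json record on stdin.
# Assumes the layout described above (risk.score read as {"risk": {"score": ...}}).
summarize_decision() {
  python3 -c '
import json, sys
d = json.load(sys.stdin)
risk = d.get("risk", {})
print("decision:", d.get("decision"))
print("confirm: ", d.get("confirm"))
print("score:   ", risk.get("score"))
print("blast:   ", risk.get("blast_radius"))
print("argv:    ", d.get("argv"))
'
}
# Usage: echo "git status" | ./aishell-gate-policy --json | summarize_decision
```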

Compare presets on the same command

echo "python3 -m pytest" | ./aishell-gate-policy --policy-preset read_only --json
echo "python3 -m pytest" | ./aishell-gate-policy --policy-preset dev_sandbox --json
Expected output
read_only: DENY — interpreters are blocked.
dev_sandbox: ALLOW — developer workflow preset permits Python.

Network target detection — loopback AI inference port

echo "curl localhost:11434/api/generate" | ./aishell-gate-policy --policy-preset dev_sandbox --json
Expected output
ALLOW but with an elevated confirmation level and a loopback/AI-inference warning in the output. Port 11434 is the Ollama default — the engine recognises it as a local AI inference endpoint and flags it for operator attention.

05 Stage 3 — Exploring Presets

Stage 3 — Preset comparison

All seven built-in presets can be explored interactively. Each opens a policy session with a different posture. Try running the same commands across multiple presets to see how the risk profile changes.

Read-only — inspection only

./aishell-gate-policy --policy-preset read_only

Try: ls -la, cat /etc/hostname, git status, git pull, make, rm /tmp/test

Developer sandbox

./aishell-gate-policy --policy-preset dev_sandbox

Try: git pull, npm install, pip install requests, make clean, sudo apt install, dd if=/dev/zero of=test bs=1M count=1

CI build — unattended pipeline

./aishell-gate-policy --policy-preset ci_build

Try: make, npm test, cargo build, git status, sudo, curl https://example.com

Danger zone — minimal restrictions

./aishell-gate-policy --policy-preset danger_zone

Note: most commands ALLOW but confirmation levels will be high. Try: rm /tmp/test, chmod 777 /tmp, dd if=/dev/zero of=/tmp/test
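To see the whole risk profile at a glance, you can evaluate one command under several presets in a loop. This is a sketch. compare_presets and POLICY_BIN are hypothetical names, and the list covers only the five presets named in this guide. Add the remaining built-in presets if you know their names.

```shell
# Hypothetical helper: evaluate one command under each preset named in this guide.
compare_presets() {
  bin=${POLICY_BIN:-./aishell-gate-policy}
  for preset in read_only dev_sandbox ci_build ops_safe danger_zone; do
    printf '=== %s\n' "$preset"
    # A non-zero engine exit is reported rather than stopping the loop
    printf '%s\n' "$1" | "$bin" --policy-preset "$preset" \
      || printf '(exit status %s)\n' "$?"
  done
}
# Usage: compare_presets "git pull"
```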

Dump the effective policy stack (Enterprise)

./aishell-gate-policy --policy-preset dev_sandbox --dump-policy
Enterprise only. Standard edition will emit a clear "not available in standard edition" message. If you have the enterprise binary, this prints your operator overlay layers (base, project, user) as JSON — useful for verifying that your custom rules are active and in the correct layer. The built-in standard_policy layer and command catalog are not included in the output.

06 Stage 4 — First Executor Plan

Stage 4 — Always use --dry-run first

The executor reads a JSON action plan and submits each command to the policy engine in sequence. We always use --dry-run first. The full evaluation runs and confirmation gates fire, but nothing executes. If an audit log is enabled (see Stage 6), it records DRY_RUN_SUPPRESSED events.

Note that we use ./aishell-gate here. It is the recommended entry point and handles --policy-binary for you automatically.

Minimal plan — two safe commands

./aishell-gate --policy-preset ops_safe --dry-run <<'EOF'
{
  "goal": "check repository state",
  "source": "ai",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "git log"}
  ]
}
EOF
Expected output
DRY-RUN banner. Two ALLOW decisions. Two [gate-exec] dry-run: would exec: lines. Exit code 0.

Plan from a file

mkdir -p ~/aishell-test

cat > ~/aishell-test/plan1.json <<'EOF'
{
  "goal": "inspect system state",
  "source": "ai",
  "strategy": "fail_fast",
  "actions": [
    {"cmd": "uname -a"},
    {"cmd": "df -h"},
    {"cmd": "ps aux"},
    {"cmd": "git status"}
  ]
}
EOF

./aishell-gate \
  --policy-preset ops_safe \
  --plan ~/aishell-test/plan1.json \
  --dry-run
Expected output
All four commands are evaluated. Under ops_safe, uname, df, ps and git status should all ALLOW at confirm:none. You should see four [gate-exec] dry-run: would exec: lines.
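Hand-edited plans are easy to break with a stray comma. Before submitting a plan file, run a quick syntax check with Python's standard json.tool module. It tells you whether a parse problem is in your JSON or in the gate. validate_json is a hypothetical helper. It checks syntax only, not whether the plan's keys are ones the executor accepts.

```shell
# Hypothetical helper: report whether each file argument is well-formed JSON.
# Uses python3's standard json.tool module; checks syntax only.
validate_json() {
  rc=0
  for f in "$@"; do
    if python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "OK       $f"
    else
      echo "INVALID  $f"
      rc=1
    fi
  done
  return "$rc"
}
# Usage: validate_json ~/aishell-test/plan1.json
```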

07 Stage 5 — Plans with Denials

Stage 5 — Watching the gate refuse

The most important thing to verify is that the gate actually denies what it should. These plans include commands that will be blocked.

Mixed plan — some allowed, some denied

./aishell-gate --policy-preset ops_safe --dry-run <<'EOF'
{
  "goal": "deploy and clean up",
  "source": "ai",
  "strategy": "fail_fast",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "rm -rf /var/log/app"},
    {"cmd": "sudo systemctl restart myapp"}
  ]
}
EOF
Expected output
git status: ALLOW. rm -rf /var/log/app: DENY (recursive deletion, system path). Execution stops here under fail_fast. Exit code 1.

Best-effort strategy — continues past denial

./aishell-gate --policy-preset dev_sandbox --dry-run <<'EOF'
{
  "goal": "mixed safety test",
  "source": "ai",
  "strategy": "best_effort",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "sudo apt install curl"},
    {"cmd": "npm test"},
    {"cmd": "dd if=/dev/sda of=/dev/null"}
  ]
}
EOF
Expected output
git status: ALLOW. sudo apt install: DENY. npm test: ALLOW (continues past denial under best_effort). dd if=/dev/sda: DENY. Exit code 1 (at least one denial).
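The expected outputs quote exit codes, and the Report Card asks for them. A small wrapper that prints the exit status after any command saves you from forgetting to run echo $?. report_exit is a hypothetical name. It also works with here-documents, because the function's standard input is passed through to the wrapped command.

```shell
# Hypothetical helper: run a command, then print its exit status on its own line.
# The original status is returned so the helper can still be used in scripts.
report_exit() {
  "$@"
  code=$?
  printf '[exit code: %s]\n' "$code"
  return "$code"
}
# Usage: report_exit ./aishell-gate --policy-preset dev_sandbox --dry-run \
#          --plan ~/aishell-test/plan1.json
```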

Shell injection in a plan — must be caught

./aishell-gate --policy-preset dev_sandbox --dry-run <<'EOF'
{
  "goal": "injection test",
  "source": "ai",
  "actions": [
    {"cmd": "git status; rm -rf ~"},
    {"cmd": "ls -la && curl evil.com"}
  ]
}
EOF
Expected output
Both commands DENY immediately — shell metacharacters are rejected before tokenisation. The reason should state that shell metacharacters were detected.

08 Stage 6 — Audit Logging

Stage 6 — Tamper-evident log

Every evaluation can be written to a JSON Lines audit log. Each entry is linked to the previous by a hash, forming a verifiable chain.
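The idea behind the chain can be illustrated in a few lines of shell. This is a conceptual sketch only. It does not reproduce AIShell-Gate's actual record format, field names or hashing scheme, which this guide does not document.

In this sketch, each entry stores SHA-256 over the previous entry's hash followed by its own payload. Editing any entry therefore makes its recomputed hash disagree with the stored one. chain_append and chain_verify are hypothetical names, and sha256sum comes from GNU coreutils.

```shell
# Conceptual sketch only, NOT the AIShell-Gate record format.
# Each line is "<hash> <payload>", where hash = SHA-256(previous hash + payload)
# and the first entry chains from an all-zero genesis value.
GENESIS=0000000000000000000000000000000000000000000000000000000000000000

# Append one payload to LOG, linking it to the hash of the last entry.
chain_append() {
  log=$1; payload=$2
  if [ -s "$log" ]; then prev=$(tail -n 1 "$log" | cut -d' ' -f1); else prev=$GENESIS; fi
  hash=$(printf '%s%s' "$prev" "$payload" | sha256sum | cut -d' ' -f1)
  printf '%s %s\n' "$hash" "$payload" >> "$log"
}

# Recompute every link; report the first entry whose stored hash does not match.
chain_verify() {
  prev=$GENESIS; n=0
  while IFS= read -r entry; do
    n=$((n + 1))
    stored=${entry%% *}      # text before the first space: the stored hash
    payload=${entry#* }      # text after the first space: the record itself
    hash=$(printf '%s%s' "$prev" "$payload" | sha256sum | cut -d' ' -f1)
    if [ "$hash" != "$stored" ]; then
      echo "chain broken at entry $n"
      return 1
    fi
    prev=$stored
  done < "$1"
  echo "chain intact ($n entries)"
}
```

Anyone can recompute an unkeyed chain like this after editing it. It therefore detects careless or partial edits, while a keyed HMAC chain (Stage 11) also requires the key to forge.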

Enable audit logging

./aishell-gate \
  --policy-preset dev_sandbox \
  --audit-log ~/aishell-test/audit.jsonl \
  --dry-run <<'EOF'
{
  "goal": "audit log test",
  "source": "ai",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "npm test"},
    {"cmd": "rm -rf /etc"}
  ]
}
EOF

Read the log

cat ~/aishell-test/audit.jsonl

Each line is a JSON object. Look for PLAN_RECEIVED, POLICY_DECISION, and DRY_RUN_SUPPRESSED event types.
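For a quick count of each event type instead of reading every line, you can grep for the three names. This sketch matches the names as plain text, so it makes no assumption about which JSON field holds them. The trade-off is that a name quoted elsewhere in a record would also be counted. count_events is a hypothetical helper.

```shell
# Hypothetical helper: count occurrences of the event types named in this stage.
# Matches the names as plain text, so it does not depend on the JSON field name.
count_events() {
  grep -oE 'PLAN_RECEIVED|POLICY_DECISION|DRY_RUN_SUPPRESSED' "$1" | sort | uniq -c
}
# Usage: count_events ~/aishell-test/audit.jsonl
```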

Verify the audit chain (Enterprise)

./aishell-gate-exec --audit-verify ~/aishell-test/audit.jsonl
Enterprise only. Standard edition will emit "not available in standard edition." If you have the enterprise binary, this should report the chain intact. Then try editing one character in the log file with a text editor and re-running — the chain should report a break at that entry.

09 Stage 7 — Custom Policy File

Stage 7 — Write your own rules

Policy files let you extend or override the built-in presets. Three optional files can be layered: base, project, and user. Here we write a project-level policy.

Create a policy file

cat > ~/aishell-test/project_policy.json <<'EOF'
{
  "cmd_allow": [
    { "pattern": "git status",  "confirm": "none",   "reason": "safe read" },
    { "pattern": "git diff",    "confirm": "none",   "reason": "safe read" },
    { "pattern": "git pull",    "confirm": "plan",   "reason": "modifies repo" },
    { "pattern": "npm test",    "confirm": "plan",   "reason": "run tests" },
    { "pattern": "npm install", "confirm": "action", "reason": "installs packages" }
  ],
  "cmd_deny": [
    { "pattern": "curl", "reason": "no outbound network in this project" },
    { "pattern": "wget", "reason": "no outbound network in this project" }
  ],
  "writable_dirs": [ "/home/user/myproject", "/tmp" ]
}
EOF

Test an allowed command

echo "git status" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --policy-project ~/aishell-test/project_policy.json

Test an explicitly denied command

echo "curl https://example.com" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --policy-project ~/aishell-test/project_policy.json
Expected output
git status: ALLOW, confirm:none, matched your project rule.
curl: DENY, reason shows your project policy denial message.

Test a typo in the policy file — should fail closed

cat > ~/aishell-test/typo_policy.json <<'EOF'
{
  "cmd_denny": [
    { "pattern": "curl", "reason": "typo test" }
  ]
}
EOF

echo "git status" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --policy-project ~/aishell-test/typo_policy.json
Expected output
A hard error naming the unknown key cmd_denny. The engine fails closed on unknown policy keys rather than silently ignoring them. The command is NOT evaluated.

Generate a template to start from

# Keep only the JSON body: drop any banner lines printed before the opening { line
./aishell-gate-policy --dump-standard-template \
  | sed '1,/^{$/{ /^{$/!d }' \
  > ~/aishell-test/template_base.json

wc -l ~/aishell-test/template_base.json
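The template also works as a reference list of policy keys. The engine rejects unknown keys, so you can compare the top-level keys of your own file against the template before loading it.

This sketch makes some assumptions:

- The filtered template_base.json is valid JSON with an object at the top level.
- The template lists every key the engine accepts.

policy_keys and unknown_keys are hypothetical helpers, and they need python3.

```shell
# Hypothetical helper: list the top-level keys of a JSON policy file, one per line.
policy_keys() {
  python3 -c 'import json, sys; print("\n".join(sorted(json.load(open(sys.argv[1])))))' "$1"
}

# Hypothetical helper: show keys present in FILE ($1) but absent from TEMPLATE ($2).
# Both lists go through the same C-locale sort so comm sees consistent ordering.
unknown_keys() {
  keys_tmp=$(mktemp)
  policy_keys "$2" | LC_ALL=C sort > "$keys_tmp"
  policy_keys "$1" | LC_ALL=C sort | LC_ALL=C comm -23 - "$keys_tmp"
  rm -f "$keys_tmp"
}
# Usage: unknown_keys ~/aishell-test/typo_policy.json ~/aishell-test/template_base.json
```

Against the typo file above, you would expect cmd_denny to be listed.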

10 Stage 8 — Jail Root Containment

Stage 8 — Directory confinement

The --jail-root flag tells the policy engine to enforce path containment. Any write-class command targeting a path outside the jail root is denied, regardless of policy rules.

mkdir -p /tmp/aishell-jail/project

Command inside the jail — should allow

echo "ls -la /tmp/aishell-jail/project" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --jail-root /tmp/aishell-jail

Command outside the jail — should deny

echo "ls -la /etc" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --jail-root /tmp/aishell-jail

Prefix attack — must NOT allow

mkdir -p /tmp/aishell-jailbreak/attack

echo "ls -la /tmp/aishell-jailbreak/attack" | ./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --jail-root /tmp/aishell-jail
Expected output
/tmp/aishell-jailbreak/attack must be DENIED even though it shares the /tmp/aishell-jail prefix. The engine checks that the character after the prefix is / or end-of-string, not just a prefix match.
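The boundary rule is easy to get wrong, so it is worth seeing in isolation. The sketch below illustrates the rule as described above and is not the engine's code. A path counts as inside the jail only if it equals the jail root or continues with a / immediately after it.

It compares strings only. It does not resolve .., symlinks or relative paths, which a real containment check would also need to handle. inside_jail is a hypothetical name.

```shell
# Illustration of the boundary rule described above (not AIShell-Gate's code).
# Returns 0 if TARGET equals ROOT or lies beneath it, 1 otherwise.
# String comparison only: no handling of "..", symlinks or relative paths.
inside_jail() {
  root=${1%/}          # drop a trailing slash so "/tmp/jail/" behaves like "/tmp/jail"
  target=$2
  case $target in
    "$root")    return 0 ;;   # exact match: end-of-string right after the prefix
    "$root"/*)  return 0 ;;   # the character right after the prefix is "/"
    *)          return 1 ;;   # everything else, including /tmp/aishell-jailbreak/...
  esac
}
# Usage: inside_jail /tmp/aishell-jail /tmp/aishell-jailbreak/attack || echo DENY
```

Quoting "$root" inside the case patterns makes it match literally, so characters such as * in a jail path are not treated as wildcards.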

Plan confined to jail

./aishell-gate \
  --policy-preset dev_sandbox \
  --jail-root /tmp/aishell-jail \
  --dry-run <<'EOF'
{
  "goal": "work inside jail only",
  "source": "ai",
  "actions": [
    {"cmd": "ls -la /tmp/aishell-jail/project"},
    {"cmd": "ls -la /etc"},
    {"cmd": "ls -la /tmp/aishell-jailbreak"}
  ]
}
EOF

11 Stage 9 — Large Plans

Stage 9 — Full pipeline simulation

These plans simulate realistic AI agent workloads — the kind of multi-step sequences an AI coding assistant or CI pipeline agent would generate.
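Long plans written by hand invite quoting mistakes. To generate your own variations, a small shell function can emit a plan in the same shape as the examples in this guide: goal, source, strategy and actions.

make_plan and json_escape are hypothetical helpers. The escaping handles backslashes and double quotes only. That is enough for the commands in this guide, but not for tabs or other control characters.

```shell
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Backslashes first, then double quotes; control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: make_plan GOAL STRATEGY CMD... prints a plan on stdout.
make_plan() {
  goal=$1; strategy=$2; shift 2
  printf '{\n  "goal": "%s",\n  "source": "ai",\n  "strategy": "%s",\n  "actions": [\n' \
    "$(json_escape "$goal")" "$(json_escape "$strategy")"
  first=1
  for cmd in "$@"; do
    # Separate actions with a comma, but not before the first one
    if [ "$first" -eq 1 ]; then first=0; else printf ',\n'; fi
    printf '    {"cmd": "%s"}' "$(json_escape "$cmd")"
  done
  printf '\n  ]\n}\n'
}
# Usage: make_plan "inspect system state" fail_fast "uname -a" "df -h" \
#          > ~/aishell-test/generated.json
```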

Full CI build cycle simulation

cat > ~/aishell-test/ci_plan.json <<'EOF'
{
  "goal": "full CI cycle: pull, install dependencies, lint, test, report",
  "source": "ai",
  "strategy": "fail_fast",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "git pull"},
    {"cmd": "npm install"},
    {"cmd": "npm run lint"},
    {"cmd": "npm test"},
    {"cmd": "npm run build"},
    {"cmd": "git log"},
    {"cmd": "df -h"},
    {"cmd": "ps aux"}
  ]
}
EOF

./aishell-gate \
  --policy-preset ci_build \
  --audit-log ~/aishell-test/audit.jsonl \
  --plan ~/aishell-test/ci_plan.json \
  --dry-run

DevOps deployment simulation — with one bad action

cat > ~/aishell-test/deploy_plan.json <<'EOF'
{
  "goal": "deploy release v2.4.1 to staging",
  "source": "ai",
  "strategy": "fail_fast",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "git pull"},
    {"cmd": "make clean"},
    {"cmd": "make"},
    {"cmd": "cp dist/app /srv/staging/app"},
    {"cmd": "rm -rf /var/log/staging"},
    {"cmd": "ls -la /srv/staging"}
  ]
}
EOF

./aishell-gate \
  --policy-preset dev_sandbox \
  --jail-root /srv/staging \
  --audit-log ~/aishell-test/audit.jsonl \
  --plan ~/aishell-test/deploy_plan.json \
  --dry-run
Expected output
The first five commands should evaluate. rm -rf /var/log/staging should DENY — recursive deletion of a system-adjacent path. Under fail_fast, the plan stops there. The final ls is never reached.

Security audit simulation — read-only inspection

./aishell-gate --policy-preset read_only --dry-run <<'EOF'
{
  "goal": "security audit: inspect system configuration and running services",
  "source": "ai",
  "strategy": "best_effort",
  "actions": [
    {"cmd": "uname -a"},
    {"cmd": "cat /etc/os-release"},
    {"cmd": "ps aux"},
    {"cmd": "ss -tlnp"},
    {"cmd": "df -h"},
    {"cmd": "ls -la /etc/cron.d"},
    {"cmd": "cat /etc/passwd"},
    {"cmd": "find /tmp -type f"},
    {"cmd": "git log"},
    {"cmd": "ls -la /var/log"}
  ]
}
EOF

12 Stage 10 — Executor Interactive Mode

Stage 10 — Interactive execution

This is the second half of AIShell-Gate's interactive surface. Stage 1 was the policy engine in interactive mode: you typed commands and the engine explained its decisions, but nothing ever ran. Stage 10 is the executor in interactive mode: you type commands the same way, you see the same decisions, but now the allowed ones actually execute under full policy enforcement and audit. Same typed-command feel, real consequences.

This is also how a human operator uses AIShell-Gate in day-to-day work. The policy engine's decisions apply equally whether a human types the command or an AI agent submits it — the gate does not discriminate by source.

This stage executes commands for real. Run only on a test system. Use the --dry-run flag if you want to observe the confirmation flow without actually executing anything.

Interactive with --dry-run (safe)

./aishell-gate --policy-preset ops_safe --interactive --dry-run

Type commands at the exec> prompt. Try: git status, ls -la, rm /tmp/test. Observe the confirmation level declared for each before the dry-run suppression fires. Type quit to exit.

Interactive live execution (on a safe test machine only)

./aishell-gate \
  --policy-preset ops_safe \
  --interactive \
  --audit-log ~/aishell-test/audit.jsonl

Try low-risk commands that will execute immediately, then a medium-risk command that requires plan-level confirmation. Observe the confirmation prompts. Type quit to end the session and close the audit log.

13 Stage 11 — Enterprise Features

Stage 11 — Enterprise edition only
Enterprise edition only. If you received the standard edition, you can skip this stage. If you do try these flags, standard binaries should emit a clear "not available in standard edition" message for each one. Please note whether that message appeared clearly.

Keyed HMAC audit chain

# Generate a key
dd if=/dev/urandom bs=32 count=1 2>/dev/null | xxd -p | tr -d '\n' > ~/aishell-test/audit.key
chmod 640 ~/aishell-test/audit.key

# Run a plan with a keyed audit chain
./aishell-gate \
  --policy-preset dev_sandbox \
  --audit-log ~/aishell-test/keyed_audit.jsonl \
  --dry-run <<'EOF'
{
  "goal": "keyed audit test",
  "source": "ai",
  "actions": [
    {"cmd": "git status"},
    {"cmd": "npm test"}
  ]
}
EOF
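The key-generation step above uses xxd, which is usually packaged with vim and may be missing from a minimal test VM. If so, od from GNU coreutils produces the same kind of key: 32 random bytes written as 64 lowercase hex characters on a single line. gen_hex_key is a hypothetical helper.

```shell
# Alternative to the xxd pipeline: 32 random bytes as 64 lowercase hex characters.
# od -An drops the offset column, -v prints every line (no '*' for repeats),
# -tx1 prints one hex pair per byte; tr strips the spaces and newlines od inserts.
gen_hex_key() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n' > "$1"
  chmod 640 "$1"
}
# Usage: gen_hex_key ~/aishell-test/audit.key
```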

Verify the keyed chain

./aishell-gate-exec --audit-verify ~/aishell-test/keyed_audit.jsonl

Dump effective policy as JSON

./aishell-gate-policy \
  --policy-preset dev_sandbox \
  --policy-project ~/aishell-test/project_policy.json \
  --dump-policy | head -n 60

14 Final Feedback

Thank you for working through the test plan. Please copy the questions below into your feedback email to info@aishell.org with subject Beta Feedback — [your name].

Functionality

Policy and correctness

Usability

Deployment

Thank you. Every report we receive, including "I got to Stage 2 and it crashed", is genuinely useful. You are helping shape a tool that will sit between AI systems and production infrastructure. The feedback you provide today directly influences who owns that boundary and how well it holds.

Testers — You Can Stop Reading Here

Everything below this line is for the programme coordinator managing the beta engagement. It is included in the same document to keep the beta programme in a single file. Testers can ignore this section entirely; your work ends with the Final Feedback section above.

C0 Coordinator Section

This section contains the materials needed to run a beta engagement: the pre-handoff checklist, the structured report card template for capturing tester observations, and the rating scheme used to categorise findings.

The test philosophy is depth over breadth. A thorough report covering four stages is more valuable than a rushed pass over all twelve. Observation quality beats coverage speed.

C1 Coordinator Pre-Handoff Checklist

Before handing the package to each tester, confirm all of the following:

Why “no how-to support”: the tester's ability or inability to find answers in the shipped documentation is one of the primary things being measured. If you answer the question, you have destroyed that data point. The correct response to “how do I X?” is: “that information is in the documentation; please find it there and note whether you could.”

C2 Rating Scheme

The tester applies one of four ratings to each observation recorded on the Report Card:

C3 Report Card Template

The tester completes one row per observation during the session. A stage that produces multiple interesting observations should have multiple rows, all referencing the same stage but with different inputs. A stage that produced a clean expected result needs only one row.

Header — completed once per submission

Tester:            ___________________________
Date:              ___________________________
OS / Distro:       ___________________________
Kernel version:    ___________________________
Policy binary:     _________________ (from --version)
Exec binary:       _________________ (from --version)
Edition:           [ ] Standard   [ ] Enterprise
Total hours:       _______

Observation rows

Reproduce the following row format as many times as needed. The shaded example shows the intended level of detail:

Bug reports (for Fail or Unexpected rows)

For any row rated Fail or Unexpected where the behaviour looks like a bug, attach a separate plain-text bug note containing:

Attachments to include with submission

Submission

Completed Report Card, bug notes, and attachments to info@aishell.org. Subject line: Beta Report — [tester name or alias] — [date]. The tester may also include a brief free-form general impressions note, but the Report Card is the primary deliverable.

C4 Out of Scope for This Beta

If the tester encounters any of the following through natural exploration they should note observations, but should not spend planned time here: