
Testing

Every PR should pass this checklist before merge.

1. Test Suite (required)

python3 -m pytest tests/ -x -q
  • Zero new failures (existing known failures documented below)
  • New features must include tests

Known failures:

  • test_fast_classifier.py — requires transformers or optimum+onnxruntime (guardian ML deps). Not a blocker.

2. Import Check (required)

All new modules must import cleanly:

python3 -c "from <new_module> import <main_function>"

No missing dependencies, no circular imports.
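
When a PR adds several modules, the one-liner above can be extended into a small script that imports each one and reports failures together. This is a minimal sketch: `check_imports` is a hypothetical helper, not part of the repository, and the module names in the example are placeholders.

```python
import importlib
import traceback


def check_imports(module_names):
    """Try to import each module; return {name: short error} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # Covers missing dependencies, circular imports, and any other
            # error raised while the module body runs at import time.
            # Keep only the last traceback line, e.g. "ModuleNotFoundError: ...".
            failures[name] = traceback.format_exc().strip().splitlines()[-1]
    return failures


# Placeholder names: replace with the modules added in the PR.
for name, error in check_imports(["json", "module_that_does_not_exist"]).items():
    print(f"FAIL {name}: {error}")
```

An empty result means every listed module imported cleanly.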

3. Docker Build (if applicable)

New Dockerfiles must build:

docker build -t <image-name> executors/<executor_name>/
  • Image builds without errors
  • Image size is reasonable (about 10MB for a minimal base image, at most about 500MB for a full image)

4. Smoke Test (required)

Manually exercise the new feature. Examples by category:

Executors

# Inline test
python3 -c "from executors.<name>.executor import <func>; print(<func>(<args>))"

# Container test (if Docker available)
docker run --rm <image> <args>

Bridge Endpoints

# Start bridge
python3 runner.py bridge &

# Hit endpoint
curl -s -X POST http://localhost:8099/<tool>/<action> \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"key": "value"}'

# Test auth rejection (prints only the status code; expect 401 or 403)
curl -s -o /dev/null -w "%{http_code}\n" -X POST http://localhost:8099/<tool>/<action> \
  -H "Authorization: Bearer wrongtoken"
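
The same pair of requests can be scripted so that the status codes are compared automatically. This is a minimal sketch using only the standard library: `post_status` is a hypothetical helper, and the path and token values in the usage note are placeholders to replace with a real endpoint and token.

```python
import json
import urllib.error
import urllib.request

# The bridge runs locally, so skip any proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post_status(url, token, payload=None):
    """POST a JSON body with a bearer token and return the HTTP status code."""
    body = json.dumps(payload or {}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    try:
        with _opener.open(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return err.code
```

With the bridge running, `post_status("http://localhost:8099/<tool>/<action>", "wrongtoken")` should return 401 or 403, and the same call with a valid token should not.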

Config/Model Changes

# Verify YAML parsing
python3 -c "from taskrunner.models import load_task; t = load_task('agent.yaml'); print(t)"

CLI Commands

python3 runner.py --help
python3 runner.py <new_command> --help

5. Error Paths (required for new features)

Test at least:

  • Missing/invalid input — clear error message
  • Missing dependencies — helpful error, not stack trace
  • Timeouts — handled gracefully
  • Auth failures — rejected with 401/403
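
The first three checks can be exercised with a few lines of Python. This is a minimal sketch: `run_with_timeout` and `ExecutorError` are hypothetical names for illustration and do not exist in the repository; a real executor would surface errors through its own interface.

```python
import subprocess
import sys


class ExecutorError(Exception):
    """Hypothetical error type carrying a short, user-facing message."""


def run_with_timeout(argv, timeout_s):
    """Run a command given as an argument list; map failures to ExecutorError."""
    if not argv:
        raise ExecutorError("no command given: expected a non-empty argument list")
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except FileNotFoundError:
        # Missing dependency: name the binary instead of showing a traceback.
        raise ExecutorError(f"command not found: {argv[0]!r}; is it installed?") from None
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        raise ExecutorError(f"command timed out after {timeout_s}s") from None
    return done.stdout


# Each failure mode should surface as one short message, not a stack trace.
for bad_call in (
    lambda: run_with_timeout([], 1),
    lambda: run_with_timeout(["no-such-binary-xyz"], 1),
    lambda: run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5),
):
    try:
        bad_call()
    except ExecutorError as exc:
        print(f"error: {exc}")
```

The same three calls translate directly into pytest cases that assert on the message text.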

6. Integration (required)

  • New executor wired into orchestrator.py dispatch?
  • New tool defined in agent.yaml?
  • Existing tests still pass? (no regressions from rename/refactor)

7. Security (required)

  • No shell=True in subprocess calls
  • No string interpolation in shell commands — use argument lists
  • No secrets in log output or error messages
  • Auth tokens validated on all endpoints
  • Container security flags preserved (--read-only, --cap-drop=ALL, --security-opt=no-new-privileges)
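
The first two items come down to passing the command as a list, so that no shell ever parses user-controlled text. This is a minimal sketch using only the standard library: `echo_arg` is a hypothetical helper, and the child process simply prints its argument back.

```python
import subprocess
import sys


def echo_arg(user_input):
    """Pass untrusted input as a single argv element; no shell is involved."""
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input]
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout.rstrip("\n")


# Shell metacharacters arrive in the child process as plain text.
print(echo_arg("hello; echo injected"))
```

Had the same string been interpolated into a command run with shell=True, the part after the semicolon would have run as a second command.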

8. Cross-branch Dependencies

Before merging, check:

  • Does this branch depend on another unmerged PR?
  • Will this break other open PRs? (especially shared files like orchestrator.py, models.py, agent.yaml)
  • After merge, do other branches need rebasing?

macOS Tool Testing

Apple ecosystem tools need permissions pre-granted. First-time setup:

# Grant Full Disk Access to Terminal (or your terminal app)
# System Settings → Privacy & Security → Full Disk Access → add Terminal

# Test CLIs directly first
memo notes                    # Apple Notes
remindctl today              # Apple Reminders
things inbox --limit 5       # Things 3
imsg --help                  # iMessage

If any CLI hangs or shows a permission dialog, fix permissions before testing the executor/bridge.

CI (GitHub Actions)

The CI workflow runs python3 -m pytest tests/ -x -q automatically on push. Guardian ML tests are expected to fail in CI (no model weights). Everything else should pass.

Running Full Validation

Quick one-liner to validate a branch:

# Run tests + syntax check + shell=True scan
python3 -m pytest tests/ -x -q && \
python3 -c "import ast, pathlib; [ast.parse(p.read_text()) for p in pathlib.Path('.').rglob('*.py')]" && \
if grep -rn "shell=True" executors/ taskrunner/ bridge/ 2>/dev/null; then echo "⚠️  shell=True found!"; false; else echo "✅ No shell=True"; fi
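
A text search for shell=True also matches comments and strings. Where that causes false positives, the scan can be done on the parsed syntax tree instead. This is a minimal sketch: `find_shell_true` and `scan_tree` are hypothetical helpers, and the directory names are the ones used in the grep command above.

```python
import ast
import pathlib


def find_shell_true(source, filename="<string>"):
    """Return sorted line numbers of calls that pass the keyword shell=True."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (
                    kw.arg == "shell"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True
                ):
                    hits.append(node.lineno)
    return sorted(hits)


def scan_tree(*roots):
    """Scan every .py file under the given directories; skip missing ones."""
    found = []
    for root in roots:
        base = pathlib.Path(root)
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*.py")):
            text = path.read_text(encoding="utf-8")
            for lineno in find_shell_true(text, str(path)):
                found.append((str(path), lineno))
    return found


for path, lineno in scan_tree("executors", "taskrunner", "bridge"):
    print(f"{path}:{lineno}: shell=True")
```

This only catches the literal keyword form; a value passed through a variable (for example shell=use_shell) would still need manual review.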