CROW BLUE AGENT READINESS STANDARD · V0.1 · CALL FOR COMMENTS

The standard for
production-ready agentic AI.

Ten domains. What a grumpy senior developer would ask before signing off on an agent going anywhere near production. Not a regulatory framework. Not a compliance checklist. A punchlist written for the people who build agents and the organizations that depend on them.

Version 0.1 · April 2026 · crow.blue/standard


Items marked Required are verified by Signet before a full governance credential is issued. Items marked Advisory are surfaced during registration and periodic review. Items marked Attestation require a human to confirm — they cannot be verified programmatically.

This is version 0.1. It is a call for comments. Submit feedback at the form below. All substantive comments will be acknowledged and addressed in the public change log.


01 · DATA HANDLING

Data Handling

1.1 Data inventory Required

Every data source the agent accesses is documented and classified. Classification uses one of: public, internal, confidential, restricted, personal_data, sensitive_personal_data, financial, health, regulated. Unclassified data sources are flagged and must be resolved before a full credential is issued.

1.2 PII minimization Advisory

The agent does not send personally identifiable information to a model endpoint unless it is necessary for the task. If PII must be sent, it is documented in the registration record and the model provider's data retention policy is reviewed and accepted.

1.3 Model provider data retention Attestation

The builder has reviewed the model provider's data processing terms and confirmed that inputs to the model are not retained for training purposes, or has accepted the risk and documented the decision.

1.4 Output data classification Advisory

The agent's outputs are classified at least as high as its most sensitive input. An agent that reads confidential data does not write outputs to a public channel without explicit human review.

1.5 Data minimization in prompts Advisory

Prompts do not include more data than necessary. If a full record is available but only a subset is needed, the agent sends the subset. Prompt engineering reviews are part of the change management process.

1.6 Logging hygiene Required

Application logs do not contain raw model inputs or outputs that include PII or confidential data. Log sanitization is implemented before logs are written to any storage or aggregation system.

1.7 Data residency Attestation

For agents operating under jurisdictional data requirements (GDPR, state privacy laws, sector-specific regulations), the builder has confirmed that data processed by the agent does not leave the required jurisdiction.

1.8 Retention and deletion Advisory

If the agent stores outputs, a retention policy is defined. Outputs are deleted on schedule. If the agent processes personal data, subject deletion requests can be fulfilled — the agent's outputs can be identified and removed.


02 · RESILIENCY AND FAILURE HANDLING

Resiliency and Failure Handling

2.1 Model unavailability handling Required

The agent has explicit handling for model API unavailability. It does not silently fail, loop indefinitely, or propagate an error as a valid result. The failure mode is documented: fail fast, retry with backoff, queue for later, or alert and stop.

2.2 Retry strategy Required

Where retries are appropriate, the agent implements exponential backoff with jitter. It does not hammer the model API on failure. Maximum retry count and total timeout are defined and bounded.

2.3 Malformed output handling Required

The agent validates model output before acting on it. If the output does not conform to the expected schema or contains nonsense, the agent does not proceed. Downstream systems are not corrupted by garbage outputs.

2.4 Partial failure handling Advisory

For multi-step workflows, the agent handles partial completion. If step 3 of 5 fails, the state is known, the partial work is either rolled back or held for review, and the agent does not leave the system in an inconsistent state.

2.5 Timeout enforcement Required

Every external call the agent makes — to the model, to APIs, to databases — has an explicit timeout. The agent does not wait indefinitely. Timeouts are appropriate to the operation and documented.

2.6 Circuit breaker Advisory

For agents that call downstream services repeatedly, a circuit breaker pattern is implemented. If a downstream service is consistently failing, the agent backs off rather than contributing to a cascade failure.

2.7 Graceful degradation Advisory

Where possible, the agent has a degraded mode. If the model is unavailable but the task is critical, a fallback path exists: a human queue, a rule-based fallback, or a held queue for processing when the model recovers.

2.8 Idempotency Required

For agents that write to databases, send messages, or trigger workflows, operations are idempotent where possible. Running the same operation twice does not produce duplicate records, duplicate messages, or duplicate charges. (Required for agents that write to external systems.)


03 · HUMAN OVERSIGHT

Human Oversight

3.1 Oversight model declared Required

The registration record specifies the agent's human oversight model: always reviewed before action, sampled review, exception-only review, or fully automated. The declared model is appropriate to the risk tier and the consequences of errors.

3.2 Escalation path defined Required

There is a defined escalation path for cases the agent cannot handle with sufficient confidence. The escalation destination is a real person or queue, not a dead end.

3.3 Confidence thresholds Advisory

For agents that make classifications or decisions, a confidence threshold is defined below which the agent escalates to human review rather than acting. The threshold is calibrated to the risk of false positives and false negatives.

3.4 Override mechanism Required

Humans can override the agent's output or decision after the fact. The override mechanism is documented, accessible to the appropriate people, and the override is logged. (Required for high and critical risk tiers.)

3.5 Stop mechanism Required

The agent can be stopped. There is a documented, tested path to halt the agent immediately if it is producing harmful outputs or behaving unexpectedly. The person responsible for triggering the stop is identified.

3.6 Human escalation telemetry Required

Every human escalation event is emitted to the governance telemetry stream. The escalation rate is monitored. An unusual change in escalation rate — spike or drop — triggers a governance review.

3.7 Consequential decision review Attestation

For agents that make decisions with direct consequences for people — hiring, lending, benefits, healthcare — the builder has confirmed that a human reviews decisions before they take effect, or that an appeal mechanism is available to affected individuals.

3.8 Oversight documentation Attestation

The oversight model is documented in plain language that affected stakeholders can understand. "The system automatically approves requests under $500" is documentation. "AI-powered decisioning" is not.


04 · OUTPUT VALIDATION AND QUALITY

Output Validation and Quality

4.1 Output schema validation Required

If the agent produces structured output (JSON, CSV, a database record, a classification), the output is validated against a schema before it is used. Invalid outputs are rejected, not silently coerced.

4.2 Output range and plausibility checks Advisory

For numerical or quantitative outputs, plausibility checks are implemented. An agent that produces a price estimate of $0.00 or $999,999,999.99 should not forward that to a billing system without review.

4.3 Hallucination risk assessment Attestation

The builder has considered the agent's exposure to hallucination and documented the mitigations. For tasks where hallucinated outputs are high-risk (medical, legal, financial), additional validation or human review is required.

4.4 Citation and sourcing Advisory

Where the agent's outputs reference facts, documents, or data, sources are traceable. The agent does not fabricate citations. If the agent uses retrieval-augmented generation, the retrieved content is logged.

4.5 Output sanitization Required

Agent outputs that are rendered in a UI, written to a database, or forwarded to another system are sanitized appropriately. SQL injection, XSS, and prompt injection via output are considered and mitigated.

4.6 Evaluation suite Advisory

A set of known-good inputs and expected outputs exists for the agent's primary tasks. This evaluation suite is run before any significant change to prompts, model versions, or tool configurations. Results are logged.

4.7 Edge case inventory Advisory

The builder has documented the inputs for which the agent's behavior is uncertain or known to be poor. These edge cases are either handled explicitly or routed to human review.


05 · SECURITY

Security

5.1 Prompt injection defense Required

The agent's input handling considers prompt injection. Inputs from untrusted sources — user text, retrieved documents, external API responses — are not directly concatenated into system prompts without sanitization or structural separation.

5.2 Least privilege Required

The agent has access to only the data sources and tools it needs for its stated purpose. Credentials are scoped to the minimum necessary permissions. An agent that reads a database does not have write access unless it writes.

5.3 Credential management Required

The agent's credentials — API keys, database passwords, OAuth tokens — are stored in a secrets manager or environment variables. They are not hardcoded in source code, prompt templates, or configuration files checked into version control.

5.4 Credential rotation Advisory

The agent's credentials are rotated on a defined schedule or on personnel change. The rotation process is documented and tested. Rotation does not require downtime.

5.5 Tool call validation Advisory

For agents that invoke tools (APIs, code execution, file system operations), the tool call parameters are validated before execution. The agent cannot be instructed to call a tool with parameters outside its expected range.

5.6 Output injection prevention Required

The agent's outputs are not forwarded directly to systems that interpret them as instructions. An agent that writes to a message queue does not produce messages that downstream consumers will execute as code or commands.

5.7 Dependency security Advisory

The agent's dependencies — SDK versions, model client libraries, tool libraries — are tracked and updated. Known vulnerabilities in dependencies are remediated on a defined schedule.

5.8 Secrets in prompts Required

Prompts do not contain secrets, credentials, or sensitive configuration values. If prompts must reference configuration, they use references to secrets manager values, not the values themselves.

5.9 Agent-to-agent trust Advisory

If the agent receives instructions from another agent, those instructions are treated with appropriate skepticism. The source agent's identity is verified where possible. Instructions that expand the receiving agent's permissions or scope are rejected.


06 · BIAS AND FAIRNESS

Bias and Fairness

6.1 Demographic impact assessment Attestation

For agents that make or influence decisions affecting people, the builder has assessed whether the agent's outputs vary systematically by demographic group. The assessment is documented and mitigations are implemented where disparate impact is found. (Required for high and critical risk tiers.)

6.2 Training data appropriateness Attestation

The builder has considered whether the base model's training data is appropriate for the agent's use case. Known limitations of the model — demographic biases, knowledge cutoffs, domain gaps — are documented and handled.

6.3 Protected categories Required

The agent does not use protected characteristics (race, gender, age, religion, national origin, disability status) as inputs to decisions unless explicitly permitted by law and documented. Even where permitted, their use is reviewed by legal counsel. (Required for agents making consequential decisions about people.)

6.4 Feedback loop monitoring Advisory

If the agent's outputs influence future inputs — recommendations that shape behavior, classifications that affect what data the agent sees — the feedback loop is documented and monitored for drift toward harmful patterns.

6.5 Appropriate use boundaries Attestation

The agent has defined boundaries for what it will and will not do. These boundaries are implemented in the prompt and tested. The agent refuses requests outside its intended scope rather than attempting them with lower quality.

6.6 Vulnerable population handling Attestation

For agents that interact with or make decisions about vulnerable populations (minors, elderly, people with disabilities, people in crisis), additional safeguards are implemented and documented. (Where applicable.)


07 · TRANSPARENCY AND DISCLOSURE

Transparency and Disclosure

7.1 AI disclosure Required

People who interact with or are significantly affected by the agent's outputs are informed that AI is involved. The disclosure is clear, not buried in terms of service. (Required for high and critical risk tiers.)

7.2 Decision explainability Advisory

Where the agent makes or contributes to a decision, a plain-language explanation of the decision is available to the affected person on request. "The system flagged your application for manual review" is not sufficient. "Your application was flagged because the income documentation did not match the stated salary" is.

7.3 Human review availability Advisory

Affected individuals can request human review of an agent's decision. The request path is documented and accessible. Requests are fulfilled on a defined timeline.

7.4 Agent identity disclosure Required

Agents that interact with humans in real time — chat, voice, email — identify themselves as AI when directly asked. They do not claim to be human. (Required for agents that interact with humans directly.)

7.5 Limitation disclosure Attestation

Known limitations of the agent are disclosed to the people who depend on its outputs. Users who act on agent outputs know the conditions under which those outputs are unreliable.


08 · CODE QUALITY AND MAINTAINABILITY

Code Quality and Maintainability

8.1 Prompt documentation Required

System prompts are documented. The intent of each instruction is explained in comments or accompanying documentation. Someone other than the original author can understand why the prompt is structured as it is.

8.2 Version control Required

All agent code, prompt templates, configuration, and tool definitions are in version control. Changes are tracked. The current production version is identifiable.

8.3 Change log Advisory

Significant changes to the agent — prompt revisions, model version updates, tool additions, behavioral changes — are recorded in a change log with the date, the change, the reason, and the person responsible.

8.4 Unit tests Required

The agent's core logic — input parsing, output formatting, tool call construction, error handling — has unit tests. Tests cover happy paths and known edge cases. Tests pass before any deployment.

8.5 Integration tests Advisory

End-to-end tests exist that exercise the agent's full workflow with realistic inputs. These tests are run before significant changes and their results are logged.

8.6 Readable code Advisory

The agent's code is written for the next developer, not just for the computer. Functions are named clearly. Complex logic is commented. Magic values are named constants. The codebase can be understood by a competent developer in a reasonable amount of time.

8.7 Dependency documentation Required

The agent's dependencies — libraries, external services, model endpoints — are documented. Versions are pinned in the dependency manifest. A new developer can set up a working environment from the documentation.

8.8 Known limitations documented Advisory

The code includes comments identifying known limitations, workarounds, and technical debt. Future maintainers are not surprised by behavior they cannot explain.

8.9 Dead code removed Advisory

Commented-out code, unused imports, experimental features, and debug logging are not present in the production codebase. If something is not used, it is deleted.


09 · VERSIONING AND CHANGE MANAGEMENT

Versioning and Change Management

9.1 Model version pinning Required

The agent specifies an explicit model version, not a floating alias like "latest." The behavior of a pinned version is stable. Model updates are deliberate, not automatic.

9.2 Model update process Required

When the organization wants to update the model version, there is a documented process: update in staging, run the evaluation suite, compare outputs against the previous version, approve, deploy. The update is not made directly in production.

9.3 Prompt change process Required

Changes to system prompts follow the same process as code changes: version control, review, testing against the evaluation suite, documented rationale. A prompt change that affects production behavior is not made casually.

9.4 Behavioral regression testing Advisory

Before any significant change — model update, prompt change, tool addition — the agent's outputs on a standard set of inputs are compared to the previous version. Unexpected regressions are investigated before the change is deployed.

9.5 Rollback capability Required

The previous version of the agent — previous prompt, previous model version, previous tool configuration — can be restored within a defined time window. The rollback process is documented and has been tested.

9.6 Staged rollout Advisory

For significant changes to high-traffic agents, changes are rolled out to a subset of traffic before full deployment. Monitoring confirms expected behavior at partial deployment before full rollout proceeds.

9.7 Deprecation planning Advisory

When the agent will be retired or replaced, the deprecation plan is documented and communicated to stakeholders before the agent is shut down. Data produced by the agent is retained or migrated per the relevant retention policy.


10 · COST AND RESOURCE MANAGEMENT

Cost and Resource Management

10.1 Cost attribution Required

The agent's costs are attributed to a cost center, project, or owner. Finance can answer "how much did this agent cost last month" without a manual investigation.

10.2 Spend limit Required

A spend limit is defined for the agent. If the agent exceeds the limit, it stops or alerts. The limit is appropriate to the agent's expected workload and the organization's tolerance for unexpected spend.

10.3 Per-call cost awareness Advisory

The builder knows the approximate cost per invocation of the agent. Cost per call is documented. Input and output token counts are logged. Cost efficiency is a design consideration, not an afterthought.

10.4 Cost anomaly detection Required

Unusual changes in the agent's cost are detected and alerted. A 10x spike in cost triggers a notification to the agent owner. The owner can determine whether the spike is expected. (Covered by Signet telemetry where Signet is deployed.)

10.5 Token efficiency Advisory

Prompts do not include unnecessary context. Conversation history is managed — old context is truncated or summarized rather than passed indefinitely. Token usage is periodically reviewed for waste.

10.6 Idle resource cleanup Advisory

Agents that are no longer in use are deprecated and their infrastructure is cleaned up. Zombie agents — deployed but no longer invoked — are identified in the registry and reviewed.


Mapping to Regulatory Frameworks

The following table maps CBARS domains to the primary regulatory frameworks relevant to enterprise AI governance. Mappings are conservative — only direct, defensible connections are listed.

Domain EU AI Act 2024 NIST AI RMF 1.0 ISO 42001
1. Data handling Art. 10 (data governance) MAP, MEASURE 6.1, 8.4
2. Resiliency Art. 15 (robustness) MEASURE, MANAGE 8.5
3. Human oversight Art. 14 (human oversight) GOVERN 6.2
4. Output validation Art. 15 (accuracy) MEASURE 8.5
5. Security Art. 15 (cybersecurity) MANAGE 8.3
6. Bias and fairness Art. 10, Art. 9 (risk management) MAP, MEASURE 6.1
7. Transparency Art. 13 (transparency) GOVERN 7.5
8. Code quality Art. 11 (technical documentation) GOVERN 8.2
9. Change management Art. 9 (risk management) MANAGE 8.5
10. Cost management GOVERN

Submit a comment

CBARS is a call for comments. We want input from senior engineers, security teams, governance teams, and anyone who has been burned by the things it covers. All substantive comments are acknowledged and addressed in the public change log.

Download CBARS v0.1

PDF format. Freely reproducible with attribution.
Signet ships this standard with every provisioning event.

Download PDF →