A structured governance assessment for AI systems approaching deployment or scale. Evaluates decision criticality, explainability, human-in-the-loop design, data and bias risk, operational readiness, clinical safety, regulatory alignment, and post-deployment monitoring. Delivers a Go, Conditional Go, or No-Go recommendation.
9 governance layers
27 questions
Approximately 20 minutes
Self-assessment version. The facilitated diagnostic includes independent validation, layer-by-layer analysis, and a board-ready Go / Conditional Go / No-Go brief with explicit approval conditions.
Layer 1 of 9
Decision Criticality & Scope
Establishes whether the AI system is operating in a decision domain where errors carry material clinical, financial, or reputational consequences. High-stakes domains are those in which errors are difficult to detect or reverse.
Question 1
Is the AI system's decision scope precisely defined — including what it is permitted to decide, recommend, flag, or withhold — and is this documented and agreed with clinical and operational leadership?
Satisfied — 3 points: The system's decision scope is formally documented, approved by relevant leadership, and operationally enforced through technical constraints.
Partial — 2 points: A general scope definition exists but lacks formal approval or complete technical enforcement boundaries.
Not Satisfied — 0 points: The system's scope is informal or poorly defined. Scope creep is a real risk.
Question 2
Has the system been explicitly classified by decision criticality — and are higher-criticality functions subject to stricter governance controls than lower-criticality ones?
Satisfied — 3 points: A formal criticality classification framework is in place. High-criticality functions have additional oversight, approval, and monitoring requirements.
Partial — 2 points: Criticality is informally understood but has not been translated into differentiated governance controls.
Not Satisfied — 0 points: No criticality classification exists. All AI functions are governed uniformly regardless of consequence.
Question 3
Is there a defined mechanism for detecting when the AI system is operating outside its intended decision scope — and a clear protocol for escalation when this occurs?
Satisfied — 3 points: Out-of-scope detection mechanisms exist (technical and/or procedural), with a tested escalation pathway and documented response protocol.
Partial — 2 points: Escalation protocols exist but out-of-scope detection relies primarily on manual oversight rather than systematic monitoring.
Not Satisfied — 0 points: No systematic out-of-scope detection. The system could drift beyond its intended scope without triggering any structured response.
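For illustration, a minimal sketch of what technical out-of-scope detection can look like in practice. The approved decision types, the escalation hook, and all names here are hypothetical stand-ins, not a prescribed design:

```python
# Illustrative sketch only. The approved scope set, decision object, and
# escalation hook are hypothetical stand-ins for organisation-specific ones.
from dataclasses import dataclass

APPROVED_DECISION_TYPES = {"flag_for_review", "recommend_code", "request_info"}

@dataclass
class AIDecision:
    decision_type: str
    case_id: str

def escalate(decision: AIDecision, reason: str) -> None:
    # In practice: notify the accountable owner and write an incident record.
    print(f"ESCALATION [{decision.case_id}]: {reason}")

def enforce_scope(decision: AIDecision) -> bool:
    """Block and escalate any decision type outside the documented scope."""
    if decision.decision_type not in APPROVED_DECISION_TYPES:
        escalate(decision, f"'{decision.decision_type}' is outside approved scope")
        return False
    return True
```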
Layer 2 of 9
Explainability & Transparency
Assesses whether the AI system can provide clear, auditable explanations for its outputs — at the level required by clinical reviewers, senior leadership, and regulators.
Question 4
Can the system produce an explanation of its output — in plain language — that is sufficient for the reviewing clinician or decision-maker to assess and, if necessary, override it?
Satisfied — 3 points: The system generates plain-language explanations of sufficient detail that users can assess reasoning, identify relevant factors, and make informed override decisions.
Partial — 2 points: Explanations are available but are technically complex, incomplete, or require interpretation by data science staff rather than clinical reviewers.
Not Satisfied — 0 points: The system operates as a "black box". Outputs are produced without accessible explanation.
Question 5
Are confidence scores or uncertainty estimates provided alongside AI outputs — and are users trained to interpret and act on them appropriately?
Satisfied — 3 points: Confidence scores or uncertainty indicators are displayed, interpreted, and acted upon by trained users as part of the standard workflow.
Partial — 2 points: Confidence scores are available but user training on their interpretation is incomplete or inconsistent.
Not Satisfied — 0 points: No confidence scoring or uncertainty communication. Users receive binary outputs without reliability context.
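As a sketch of how confidence communication might be operationalised, the following maps a raw model score to reviewer guidance. The band boundaries are hypothetical; in practice they would be set and approved by clinical governance:

```python
# Illustrative sketch only. Band boundaries are hypothetical; real thresholds
# would be set and approved by clinical governance.
def confidence_guidance(score: float) -> str:
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must be in [0, 1]")
    if score >= 0.90:
        return "high confidence: standard review applies"
    if score >= 0.70:
        return "moderate confidence: closer review recommended"
    return "low confidence: treat output as advisory only"
```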
Question 6
Is there a complete and auditable record of each AI decision, including the input data, model version, output, confidence level, and the human action taken in response?
Satisfied — 3 points: A complete, tamper-evident audit trail is maintained for all AI-assisted decisions, accessible to compliance, clinical governance, and regulators on request.
Partial — 2 points: Logging exists but is incomplete — missing some data fields (e.g. model version, human response) or not readily accessible for audit.
Not Satisfied — 0 points: No comprehensive audit trail. AI decisions cannot be reconstructed or reviewed after the fact.
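One way to make an audit trail tamper-evident is to chain each record to its predecessor with a cryptographic hash. The sketch below covers the fields named in Question 6; the field names and the chaining scheme are illustrative assumptions, not a prescribed format:

```python
# Illustrative sketch only. Field names and the hash-chaining scheme are
# assumptions, not a prescribed audit format.
import hashlib
import json
from datetime import datetime, timezone

def append_audit_record(log: list, *, input_hash: str, model_version: str,
                        output: str, confidence: float,
                        human_action: str) -> dict:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_hash": input_hash,      # hash of the input, not raw patient data
        "model_version": model_version,
        "output": output,
        "confidence": confidence,
        "human_action": human_action,  # e.g. "accepted", "overridden"
        "prev_hash": log[-1]["hash"] if log else "genesis",
    }
    # Chaining each record to its predecessor makes later tampering detectable.
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)
    return record
```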
Layer 3 of 9
Human-in-the-Loop Design
Evaluates whether the system has been designed to keep human judgment meaningfully in the decision loop — not as a procedural formality, but as a genuine check on AI outputs.
Question 7
Is the human review step in this system designed as a genuine check — with sufficient time, information, and authority for the reviewer to make an independent assessment — rather than a rubber-stamp workflow?
Satisfied — 3 points: Human review is structured to support genuine independent assessment — reviewers receive full context, have adequate time, and are empowered to override without friction.
Partial — 2 points: Review occurs but workflow incentives (speed, volume, alert fatigue) undermine the quality of human oversight in practice.
Not Satisfied — 0 points: Human review is nominal. Reviewers typically accept AI outputs without independent assessment due to volume, time, or systemic pressure.
Question 8
Are override rates actively monitored — and is there a structured process for investigating patterns in overrides that may indicate model drift, population shift, or systematic failure?
Satisfied — 3 points: Override rates are tracked, analysed regularly, and trigger structured investigation when thresholds are exceeded. Findings feed into model review cycles.
Partial — 2 points: Overrides are logged but aggregate analysis is infrequent or lacks defined investigation triggers.
Not Satisfied — 0 points: Override rates are not systematically tracked or analysed. The organisation lacks insight into whether human-AI agreement is meaningful or forced.
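A minimal sketch of the kind of override-rate tracking Question 8 describes, using a rolling window with an investigation threshold. The window size and threshold here are hypothetical governance choices, not recommended values:

```python
# Illustrative sketch only. Window size and threshold are hypothetical
# governance choices.
from collections import deque

class OverrideMonitor:
    def __init__(self, window: int = 500, threshold: float = 0.15):
        self.events = deque(maxlen=window)  # True = reviewer overrode the AI
        self.threshold = threshold

    def record(self, overridden: bool) -> None:
        self.events.append(overridden)

    def needs_investigation(self) -> bool:
        if len(self.events) < self.events.maxlen:
            return False                    # insufficient data in the window
        rate = sum(self.events) / len(self.events)
        return rate > self.threshold        # candidate drift / failure signal
```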
Question 9
Is there a clearly defined and tested "kill switch" — a mechanism to pause or halt the AI system rapidly if a critical error, safety concern, or scope breach is identified?
Satisfied — 3 points: A tested kill switch mechanism exists with defined authority for activation, a clear fallback workflow, and a tested recovery process.
Partial — 2 points: A suspension mechanism exists but has not been tested under realistic conditions or lacks clear authority assignment.
Not Satisfied — 0 points: No tested kill switch mechanism. Halting the system under emergency conditions would be disruptive or operationally unclear.
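At its simplest, a kill switch is a flag checked before every inference call, with a documented fallback route when it is set. A minimal sketch follows; the authority model, flag persistence, and logging are left as assumptions:

```python
# Illustrative sketch only. A production version would persist the flag,
# verify the activating authority, and log the event.
class KillSwitch:
    def __init__(self) -> None:
        self.halted = False
        self.reason = ""

    def halt(self, authorised_by: str, reason: str) -> None:
        self.halted = True
        self.reason = f"{reason} (authorised by {authorised_by})"

def run_case(switch: KillSwitch, model, case):
    if switch.halted:
        # Route to the documented manual fallback workflow, not the model.
        return {"route": "manual_review", "note": switch.reason}
    return {"route": "ai_assisted", "output": model(case)}
```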
Layer 4 of 9
Data Quality & Bias Risk
Tests whether the training and operational data is of sufficient quality, representativeness, and recency to support safe deployment — and whether known bias risks have been assessed and mitigated.
Question 10
Is the model's training data representative of the population on which it will be deployed — including age, gender, clinical complexity, and any relevant demographic or coding patterns specific to your context?
Satisfied — 3 points: Training data representativeness has been formally assessed and validated against the deployment population, with documented gaps and mitigation steps.
Partial — 2 points: Representativeness has been considered but a formal validation has not been conducted or gaps have been identified without mitigation.
Not Satisfied — 0 points: No representativeness assessment. The model has been trained on available data without formal evaluation of its fit to the deployment population.
Question 11
Has the model been tested for differential performance across demographic subgroups — and are performance gaps documented, accepted by governance, and monitored in production?
Satisfied — 3 points: Subgroup performance testing has been conducted, results reviewed by governance, performance gaps are documented, and ongoing monitoring is in place.
Partial — 2 points: Subgroup testing has been done but governance review is incomplete or ongoing monitoring has not been implemented.
Not Satisfied — 0 points: No subgroup performance testing. The model may systematically underperform for specific populations without detection.
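In practice, subgroup testing reduces to computing standard metrics per cohort and comparing them. A minimal sketch computing sensitivity and specificity by subgroup from labelled validation cases; subgroup keys and data shapes are hypothetical:

```python
# Illustrative sketch only. Subgroup keys and labels are hypothetical.
def subgroup_metrics(cases):
    """cases: iterable of (subgroup, y_true, y_pred) with binary labels."""
    counts = {}
    for group, y_true, y_pred in cases:
        c = counts.setdefault(group, {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
        if y_true == 1:
            c["tp" if y_pred == 1 else "fn"] += 1
        else:
            c["tn" if y_pred == 0 else "fp"] += 1
    results = {}
    for group, c in counts.items():
        pos, neg = c["tp"] + c["fn"], c["tn"] + c["fp"]
        results[group] = {
            "sensitivity": c["tp"] / pos if pos else None,
            "specificity": c["tn"] / neg if neg else None,
        }
    return results  # compare across groups to surface performance gaps
```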
Question 12
Is there a defined data drift detection process — specifically designed to identify when the operational data distribution has diverged sufficiently from training data to degrade model performance?
Satisfied — 3 points: Automated or structured data drift monitoring is in place with defined alert thresholds and a model review/retrain trigger protocol.
Partial — 2 points: Data drift monitoring exists but relies primarily on manual periodic review rather than systematic automated detection.
Not Satisfied — 0 points: No data drift monitoring. Model degradation due to distribution shift would not be detected until outcome quality declines.
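One widely used drift statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against its training distribution. A minimal sketch for a single numeric feature; the conventional rules of thumb (around 0.1 to warn, 0.25 to act) would still need governance sign-off:

```python
# Illustrative sketch only, for a single numeric feature. Values outside the
# training range fall outside the bins; a production check would handle them.
import numpy as np

def psi(training: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    edges = np.histogram_bin_edges(training, bins=bins)
    expected = np.histogram(training, bins=edges)[0] / len(training)
    actual = np.histogram(production, bins=edges)[0] / len(production)
    expected = np.clip(expected, 1e-6, None)  # guard log(0) in sparse bins
    actual = np.clip(actual, 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))
```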
Layer 5 of 9
Operational Readiness
Evaluates whether the operational environment — clinical workflow, IT infrastructure, and staff capability — is prepared to support safe and effective AI deployment at scale.
Question 13
Has the AI system been integrated into the clinical workflow in a way that minimises friction, prevents alert fatigue, and supports rather than disrupts the review process?
Satisfied — 3 points: Workflow integration has been co-designed with clinical teams, alert thresholds have been calibrated to prevent fatigue, and usability has been tested in production conditions.
Partial — 2 points: Integration exists but alert fatigue, friction, or usability issues have been identified and not yet resolved.
Not Satisfied — 0 points: The system has been deployed without structured workflow integration analysis. Clinician adoption is low or resistance is significant.
Question 14
Are all staff who interact with the AI system — including reviewers, supervisors, and technical operators — trained specifically on this system's capabilities, limitations, and failure modes?
Satisfied — 3 points: Role-specific training has been completed for all user groups, covering capabilities, limitations, known failure modes, and escalation protocols. Training is refreshed regularly.
Partial — 2 points: Training exists but is generic, incomplete for some user groups, or does not cover failure modes and limitations in sufficient depth.
Not Satisfied — 0 points: No structured, system-specific training. Staff rely on informal knowledge and vendor documentation only.
Question 15
Is there a defined and tested fallback procedure — describing exactly how clinical and operational teams should function if the AI system becomes unavailable or produces manifestly unreliable outputs?
Satisfied — 3 points: Fallback procedures are documented, tested, and known to all relevant staff. The organisation can operate safely at full capacity without the AI system.
Partial — 2 points: Fallback procedures exist in theory but have not been tested or are not widely known to operational staff.
Not Satisfied — 0 points: No tested fallback. The organisation has become operationally dependent on the AI system without a safe contingency.
Layer 6 of 9
Governance & Accountability
Tests whether clear ownership, accountability structures, and governance processes are in place to manage the AI system through its operational lifecycle.
Question 16
Is there a named individual — at sufficient seniority — who holds explicit accountability for this AI system's performance, safety, and compliance outcomes?
Satisfied — 3 points: A named senior individual (clinical or executive) holds formal accountability, with this responsibility recorded in governance documents and visible to the board.
Partial — 2 points: Accountability is distributed across a team or is held at a level below what the system's criticality demands.
Not Satisfied — 0 points: No named accountability. If the system causes harm, it is unclear who is responsible.
Question 17
Is there a formal AI governance committee or equivalent body that reviews this system's performance, incidents, and proposed changes at defined intervals?
Satisfied — 3 points: A formal governance body with relevant clinical, technical, and executive representation meets at defined intervals and maintains decision records.
Partial — 2 points: Oversight exists informally or via a general IT/clinical committee without dedicated AI governance protocols.
Not Satisfied — 0 points: No formal oversight. The system operates without structured periodic governance review.
Question 18
Is there an incident reporting and investigation mechanism specifically designed for AI-related events — including near-misses, unexpected outputs, and user-reported anomalies?
Satisfied — 3 points: A dedicated AI incident reporting pathway exists, is actively used, and feeds into structured root-cause analysis and governance review.
Partial — 2 points: AI incidents are reported through general clinical incident systems not specifically designed for AI failure modes.
Not Satisfied — 0 points: No structured AI incident reporting. Anomalies and near-misses are not systematically captured or investigated.
Layer 7 of 9
Clinical Safety & Error Tolerance
Assesses whether the consequences of AI error have been explicitly modelled — and whether the system's error tolerance matches the clinical context in which it operates.
Question 19
Has the clinical harm potential of false positives and false negatives been explicitly assessed — and are the thresholds accepted by clinical governance with documented rationale?
Satisfied — 3 points: Error consequences (false positives and negatives) have been clinically assessed, acceptable thresholds defined, and these have been formally reviewed and accepted by clinical governance.
Partial — 2 points: Error impact has been considered informally, but formal threshold-setting and governance acceptance have not been completed.
Not Satisfied — 0 points: No formal clinical error impact assessment. The system has been deployed without explicit agreement on what constitutes an acceptable error rate.
Question 20
For high-stakes or irreversible clinical decisions, does the system architecture require explicit human confirmation before action is taken — preventing autonomous AI-driven decisions?
Satisfied — 3 points: High-stakes decisions require explicit human confirmation as a technical requirement — the system cannot act autonomously in these contexts.
Partial — 2 points: Human confirmation is the intended process but is not technically enforced — it could be bypassed in practice.
Not Satisfied — 0 points: The system can take or trigger high-stakes actions autonomously or with minimal human confirmation.
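The distinction between Satisfied and Partial here is technical enforcement. As a sketch, confirmation can be modelled as a required argument whose absence is a hard failure rather than a warning; the types and names are hypothetical:

```python
# Illustrative sketch only. The confirmation type and action interface are
# hypothetical; the point is that enforcement is technical, not procedural.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class HumanConfirmation:
    reviewer_id: str
    case_id: str

def execute_high_stakes_action(action, case_id: str,
                               confirmation: Optional[HumanConfirmation] = None):
    # Absence of a matching confirmation is a hard failure, not a warning:
    # the system cannot act autonomously in this context.
    if confirmation is None or confirmation.case_id != case_id:
        raise PermissionError("explicit human confirmation required")
    return action(case_id)
```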
Question 21
Has the system undergone independent clinical safety testing — distinct from technical validation — in conditions representative of its intended deployment environment?
Satisfied — 3 points: Independent clinical safety testing has been conducted in realistic conditions by parties independent of the development team, with documented findings.
Partial — 2 points: Clinical testing has occurred but was conducted by the development team or in conditions that do not fully represent the deployment environment.
Not Satisfied — 0 points: No independent clinical safety testing. Validation has been technical only.
Layer 8 of 9
Regulatory & Compliance Alignment
Assesses whether the AI system has been reviewed against applicable regulatory frameworks — and whether compliance obligations are actively managed through the operational lifecycle.
Question 22
Has the system been reviewed against applicable regulatory requirements — including SFDA guidelines, NPHIES compliance obligations, CHI standards, or equivalent frameworks in your jurisdiction?
Satisfied — 3 points: A formal regulatory review has been completed, applicable requirements identified, compliance gaps addressed, and regulatory status is actively maintained.
Partial — 2 points: A review has occurred but is incomplete, informal, or has not resulted in documented compliance positions for all applicable requirements.
Not Satisfied — 0 points: No formal regulatory review. Compliance status is assumed rather than verified.
Question 23
Is there a process to assess the compliance implications of model updates, data changes, or scope expansions — preventing inadvertent regulatory exposure from operational changes?
Satisfied — 3 points: A change control process requires compliance review before material model or scope changes are implemented. Records of compliance decisions are maintained.
Partial — 2 points: Change control exists for technical changes but does not systematically trigger compliance review for model or scope modifications.
Not Satisfied — 0 points: No compliance gate in the change management process. Regulatory exposure can accumulate undetected through routine model updates.
Question 24
Are data privacy and patient consent obligations for AI-processed data explicitly assessed, documented, and compliant — not assumed to be covered by existing general consent frameworks?
Satisfied — 3 points: AI-specific data privacy and consent requirements have been formally assessed, documented, and verified as compliant with applicable law and patient rights frameworks.
Partial — 2 points: Privacy and consent have been considered but AI-specific obligations have not been assessed separately from general clinical consent.
Not Satisfied — 0 points: No AI-specific privacy or consent assessment. The organisation assumes existing frameworks are sufficient.
Layer 9 of 9
Post-Deployment Monitoring
The final governance layer. Assesses whether the organisation has the infrastructure and commitment to monitor, learn from, and continuously improve the AI system after it goes live.
Question 25
Are post-deployment performance metrics for this AI system defined, measured, and reviewed against pre-deployment benchmarks at regular intervals?
Satisfied — 3 points: Performance metrics are defined pre-deployment, measured continuously in production, and formally compared against pre-deployment benchmarks at defined review intervals.
Partial — 2 points: Some post-deployment monitoring exists but it is ad hoc, lacks pre-defined benchmarks, or is not reviewed at a governance level.
Not Satisfied — 0 points: No structured post-deployment performance monitoring. The system is assumed to be performing as expected without measurement.
Question 26
Is there a defined model review and retraining schedule — and are retrain decisions based on objective performance criteria rather than arbitrary timelines?
Satisfied — 3 points: A defined retraining protocol exists with performance-based triggers (not just calendar-based), validation requirements before redeployment, and governance sign-off.
Partial — 2 points: Retraining is scheduled but based on fixed intervals rather than objective performance triggers. Governance sign-off is informal.
Not Satisfied — 0 points: No defined retraining protocol. The model version in production may drift significantly from optimal performance before review.
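A performance-based trigger can be as simple as comparing production metrics against pre-deployment benchmarks with an agreed tolerance. A minimal sketch, assuming higher-is-better metrics and hypothetical names:

```python
# Illustrative sketch only. Assumes higher-is-better metrics; names and the
# tolerance are hypothetical and would be set by governance.
def degraded_metrics(production: dict, benchmarks: dict,
                     tolerance: float = 0.05) -> list:
    """Return metrics that have fallen beyond tolerance below their benchmark."""
    # A non-empty result triggers model review, not automatic retraining.
    return [name for name, baseline in benchmarks.items()
            if name in production and production[name] < baseline - tolerance]
```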
Question 27
Does the organisation have a structured mechanism for clinical and operational staff to provide ongoing feedback on AI system performance — and are these insights systematically incorporated into governance and model review?
Satisfied — 3 points: A structured feedback mechanism is in place, actively used, and findings are formally reviewed and incorporated into governance and model improvement cycles.
Partial — 2 points: Informal feedback channels exist but there is no structured mechanism to ensure insights are captured, reviewed, and acted upon.
Not Satisfied — 0 points: No feedback mechanism. Frontline experience with the system is not captured or incorporated into governance.
AI Deployment Governance Gate — Results
Your AI Decision Risk Profile
0 / 81
Layer-by-Layer Breakdown
Decision Criticality & Scope: 0/9
Explainability & Transparency: 0/9
Human-in-the-Loop Design: 0/9
Data Quality & Bias Risk: 0/9
Operational Readiness: 0/9
Governance & Accountability: 0/9
Clinical Safety & Error Tolerance: 0/9
Regulatory & Compliance Alignment: 0/9
Post-Deployment Monitoring: 0/9
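For reference, the arithmetic behind these figures: each of the 27 questions scores 0, 2, or 3 points, so each three-question layer has a 9-point maximum and the overall maximum is 81. A minimal sketch of the computation:

```python
# Illustrative sketch only: 27 questions scored 0 / 2 / 3, three per layer,
# giving 9 points per layer and 81 overall.
def score(responses: dict) -> tuple:
    """responses: {layer_name: [q1, q2, q3]} with each value in {0, 2, 3}."""
    per_layer = {layer: sum(points) for layer, points in responses.items()}
    return sum(per_layer.values()), per_layer  # (total /81, per-layer /9)
```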
Indicative Findings
This self-assessment demonstrates the methodology. The full governance gate goes further.
A facilitated HealthElevate AI Governance Gate engagement includes independent validation, layer-by-layer written analysis, risk mitigation mapping, and a board-ready Go / Conditional Go / No-Go brief with explicit approval conditions.