[OR] [P2] [S4] [ST] [C11] Metrics and Evaluation of Results

[P2] [S4] Chapter 11

Metrics and Evaluation of Results

Introduction

Scenario testing generates significant insights, but only when results are properly measured and evaluated. Without structured metrics and evaluation frameworks, organisations risk reducing scenario testing to a procedural exercise rather than a meaningful validation of resilience.

Evaluation bridges the gap between execution and improvement. It enables organisations to determine whether they remained within defined impact tolerances, how effectively they responded to disruption, and where critical weaknesses exist.

In the context of operational resilience, evaluation is not simply about whether a test “passed” or “failed,” but about understanding performance under stress and identifying opportunities to strengthen resilience capabilities.

Purpose of the Chapter

This chapter explains how to assess scenario test performance against resilience objectives. It sets out a structured approach to defining key quantitative metrics, conducting qualitative assessments, and performing gap analysis and root cause identification to support continuous improvement.

Key Metrics for Scenario Testing

Metrics provide objective evidence of how the organisation performed during the scenario. They should be aligned to impact tolerance, ensuring that testing outcomes reflect real resilience expectations.

Impact Tolerance Breaches

The most critical metric in operational resilience is whether the organisation remained within its defined impact tolerance.

Key considerations:

Was the maximum tolerable downtime (MTD) exceeded?
Was there a breach of maximum tolerable data loss (MTDL)?
Were customer impact thresholds exceeded (e.g., number of affected customers)?
Were regulatory obligations breached?

Evaluation approach:

No breach: Service remained within tolerance → indicates acceptable resilience level
Near breach: Threshold approached → highlights potential vulnerabilities
Breach: Threshold exceeded → requires immediate remediation and escalation

Impact tolerance breaches are often the primary focus for regulators, as they directly reflect the organisation’s ability to maintain critical services.
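The three-way evaluation above can be expressed as a simple classification. The sketch below is illustrative only: it assumes each tolerance dimension (downtime, data loss, customers affected) is a single numeric threshold, and the 80% "near breach" cut-off held in the hypothetical `NEAR_BREACH_RATIO` is an assumption, not a regulatory value. Regulatory obligations are usually a yes/no judgement and are left out.

```python
from enum import Enum

# Hypothetical setting: an observation at or above this share of the
# tolerance (but not over it) is reported as a "near breach".
NEAR_BREACH_RATIO = 0.8


class ToleranceOutcome(Enum):
    NO_BREACH = "no breach"      # service remained within tolerance
    NEAR_BREACH = "near breach"  # threshold approached
    BREACH = "breach"            # threshold exceeded


def classify(observed: float, tolerance: float,
             near_ratio: float = NEAR_BREACH_RATIO) -> ToleranceOutcome:
    """Classify one measured value against its impact tolerance threshold."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if observed > tolerance:
        return ToleranceOutcome.BREACH
    if observed >= near_ratio * tolerance:
        return ToleranceOutcome.NEAR_BREACH
    return ToleranceOutcome.NO_BREACH


def classify_all(observed: dict[str, float],
                 tolerances: dict[str, float]) -> dict[str, ToleranceOutcome]:
    """Classify every tolerance dimension measured during the test."""
    return {name: classify(observed[name], limit)
            for name, limit in tolerances.items()}
```

With a 240-minute downtime tolerance, for example, 200 minutes of downtime is classified as a near breach, while 20 minutes of data loss against a 15-minute tolerance is a breach.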

Recovery Time vs Thresholds

Recovery performance is a fundamental measure of resilience. It assesses how quickly services can be restored compared to predefined thresholds.

Key metrics include:

Actual recovery time vs Recovery Time Objective (RTO)
Time to detect the incident
Time to initiate response actions
Time to fully restore service

Evaluation insights:

Delays in detection may indicate monitoring gaps
Delays in response may highlight escalation or coordination issues
Delays in recovery may reveal technical or resource constraints

Tracking recovery time provides a clear indication of operational readiness and efficiency.
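The recovery metrics, and the insights drawn from them, can be derived from four timestamps recorded during the test. The following sketch assumes those timestamps are captured in a hypothetical `IncidentTimeline` record and measures each phase between consecutive timestamps; it then points to the slowest phase as the first place to investigate, using the mapping listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Likely area to investigate when a phase is slow (from the insights above).
PHASE_HINTS = {
    "time_to_detect": "possible monitoring gap",
    "time_to_respond": "possible escalation or coordination issue",
    "time_to_restore": "possible technical or resource constraint",
}


@dataclass
class IncidentTimeline:
    disruption_start: datetime  # when the simulated disruption began
    detected: datetime          # when the incident was detected
    response_started: datetime  # when response actions were initiated
    service_restored: datetime  # when the service was fully restored


def recovery_metrics(t: IncidentTimeline, rto: timedelta) -> dict:
    """Split recovery into phases and compare the total against the RTO."""
    phases = {
        "time_to_detect": t.detected - t.disruption_start,
        "time_to_respond": t.response_started - t.detected,
        "time_to_restore": t.service_restored - t.response_started,
    }
    if any(d < timedelta(0) for d in phases.values()):
        raise ValueError("timeline timestamps are out of order")
    total = t.service_restored - t.disruption_start
    slowest = max(phases, key=lambda name: phases[name])
    return {
        **phases,
        "total_recovery_time": total,
        "within_rto": total <= rto,
        "rto_margin": rto - total,  # negative when the RTO was missed
        "slowest_phase": slowest,
        "hint": PHASE_HINTS[slowest],
    }
```

For example, a disruption at 09:00 that is detected at 09:20, with response starting at 09:50 and service restored at 12:10, gives a total of 3 h 10 min. Against a three-hour RTO, `within_rto` is `False` and `time_to_restore` is the slowest phase.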

Service Degradation Levels

Operational resilience is not only about complete service outages but also about partial degradation.

Examples of service degradation:

Slower transaction processing
Reduced system availability
Limited functionality (e.g., read-only access)
Increased error rates

Metrics to consider:

Percentage reduction in service capacity
Duration of degraded performance
Volume of failed or delayed transactions

Importance:

Service degradation often occurs before full failure and can significantly impact customers. Evaluating degradation helps organisations understand resilience under stress, not just during complete outages.
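Degradation metrics can be computed from throughput samples taken at fixed intervals during the scenario. In this sketch the `Sample` record, the baseline (normal) throughput and the 95% `degraded_below` cut-off are all assumptions chosen for illustration; degraded time is the count of degraded intervals, which need not be contiguous.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Measurements for one fixed-length interval of the scenario."""
    throughput: float  # transactions processed in the interval
    failed: int        # transactions that failed in the interval
    delayed: int       # transactions completed late in the interval


def degradation_metrics(samples: list[Sample], baseline: float,
                        interval_min: int = 1,
                        degraded_below: float = 0.95) -> dict[str, float]:
    """Summarise partial degradation relative to baseline throughput.

    `degraded_below` is a hypothetical cut-off: an interval counts as
    degraded when throughput falls below that share of the baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    # Capacity lost in each interval, as a percentage of the baseline.
    reductions = [max(0.0, (baseline - s.throughput) / baseline * 100)
                  for s in samples]
    degraded = [s for s in samples if s.throughput < degraded_below * baseline]
    return {
        "peak_capacity_reduction_pct": max(reductions, default=0.0),
        "degraded_minutes": len(degraded) * interval_min,
        "failed_transactions": sum(s.failed for s in samples),
        "delayed_transactions": sum(s.delayed for s in samples),
    }
```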

Qualitative Assessment

While quantitative metrics provide measurable outcomes, qualitative assessment captures the human and organisational dimensions of resilience. These are often the determining factors in real-world incidents.

Decision-Making Effectiveness

Scenario testing provides a unique opportunity to evaluate how decisions are made under pressure.

Assessment areas include:

Speed of decision-making
Appropriateness of decisions
Alignment with policies and procedures
Ability to prioritise competing objectives

Key questions:

Were decisions made promptly?
Were escalation thresholds clearly understood?
Did decision-makers have sufficient information?

Strong decision-making is critical, particularly at the crisis management level, where delays or poor judgment can amplify the impact of disruption.

Communication and Coordination

Effective communication is essential for managing disruption across teams and stakeholders.

Internal communication:

Clarity and timeliness of information sharing
Coordination between business, IT, and risk functions
Effectiveness of escalation channels

External communication:

Customer communication
Regulatory notification
Media handling

Coordination assessment:

Were roles and responsibilities clearly understood?
Did teams collaborate effectively?
Were there any breakdowns in communication flow?

Poor communication is one of the most common causes of failure during crises, even when technical recovery is successful.

Behavioural and Cultural Factors

Qualitative evaluation should also consider organisational behaviour, including:

Leadership effectiveness
Team collaboration
Adherence to established processes
Ability to operate under stress

These factors provide insight into the organisation’s resilience culture, which is a key determinant of sustained operational resilience.

Gap Analysis

Gap analysis involves comparing actual performance against expected performance, identifying areas where resilience capabilities fall short.

Identifying Gaps

Gaps may arise in multiple areas, including:

Technology (e.g., system recovery limitations)
Processes (e.g., unclear procedures or workflows)
People (e.g., lack of training or awareness)
Third-party dependencies (e.g., vendor response delays)

Categorising Gaps

For effective prioritisation, gaps should be categorised based on:

Severity (critical, high, medium, low)
Impact on critical business services (CBS)
Likelihood of occurrence
Regulatory implications
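Once gaps are categorised, the criteria can be combined into an ordering for review. This sketch assumes a hypothetical `Gap` record and ranks gaps lexicographically: severity first, then regulatory implications, then the number of critical business services (CBS) affected, then likelihood. The order of the criteria is a design choice, not a prescribed method.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Gap:
    description: str
    area: str                  # technology, process, people or third party
    severity: Severity
    affected_cbs: list[str] = field(default_factory=list)
    likelihood: float = 0.5    # hypothetical 0-1 estimate of occurrence
    regulatory: bool = False   # True if the gap has regulatory implications


def priority_key(gap: Gap) -> tuple:
    """Severity first, then regulatory implications, number of CBS affected,
    and finally likelihood."""
    return (gap.severity, gap.regulatory, len(gap.affected_cbs), gap.likelihood)


def prioritise_gaps(gaps: list[Gap]) -> list[Gap]:
    """Return gaps ordered from highest to lowest priority."""
    return sorted(gaps, key=priority_key, reverse=True)
```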

Linking Gaps to Impact Tolerance

A critical aspect of gap analysis is understanding how identified gaps relate to impact tolerance:

Does the gap increase the risk of future breaches?
Does it affect recovery capability or response time?
Does it expose systemic weaknesses across multiple CBS?

This ensures that remediation efforts are aligned with resilience priorities.
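Two of these questions lend themselves to a simple cross-check between the gap register and the test results. The sketch below assumes gaps are mapped to the services they affect and that each service's test outcome was recorded as "no breach", "near breach" or "breach"; the threshold of two services for a "systemic" gap is an assumption. Whether a gap affects recovery capability or response time still needs analyst judgement.

```python
# Test outcomes treated as evidence of elevated breach risk.
AT_RISK_OUTCOMES = {"near breach", "breach"}


def link_gaps_to_tolerance(gap_to_cbs: dict[str, set[str]],
                           cbs_outcome: dict[str, str],
                           systemic_threshold: int = 2) -> dict[str, dict]:
    """For each gap, list the affected services whose test outcome was a
    near breach or breach, and flag gaps spanning several services."""
    report = {}
    for gap, services in gap_to_cbs.items():
        at_risk = sorted(s for s in services
                         if cbs_outcome.get(s) in AT_RISK_OUTCOMES)
        report[gap] = {
            "at_risk_services": at_risk,
            "systemic": len(services) >= systemic_threshold,
        }
    return report
```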

Root Cause Identification

Identifying gaps is not sufficient: organisations must understand the root causes behind them to implement effective corrective actions.

Root Cause Analysis Techniques

Common techniques include:

5 Whys Analysis – drilling down to underlying causes
Fishbone (Ishikawa) Diagrams – categorising causes across people, process, and technology
Timeline Analysis – identifying where delays or failures occurred
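Timeline analysis can start from a list of timestamped events reconstructed from logs, chat records and facilitator notes. This minimal sketch sorts the events, measures the time between consecutive ones and returns the longest steps as candidates for deeper analysis (for example, with the 5 Whys). Event labels are free text and purely illustrative.

```python
from datetime import datetime, timedelta


def step_durations(events: list[tuple[datetime, str]]) -> list[tuple[str, str, timedelta]]:
    """Time elapsed between each pair of consecutive timeline events."""
    ordered = sorted(events)
    return [(start_label, end_label, end_time - start_time)
            for (start_time, start_label), (end_time, end_label)
            in zip(ordered, ordered[1:])]


def longest_steps(events: list[tuple[datetime, str]],
                  top: int = 3) -> list[tuple[str, str, timedelta]]:
    """The slowest steps: first candidates for root cause review."""
    return sorted(step_durations(events), key=lambda step: step[2],
                  reverse=True)[:top]
```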

Types of Root Causes

Root causes typically fall into the following categories:

a. Process Failures

Inadequate or outdated procedures
Lack of clear escalation paths

b. Technology Limitations

System capacity constraints
Ineffective failover mechanisms

c. People and Capability Issues

Insufficient training or awareness
Lack of decision-making authority

d. Third-Party Weaknesses

Vendor dependency risks
Lack of contractual resilience requirements

From Symptoms to Causes

Organisations must avoid focusing only on symptoms (e.g., delayed recovery) and instead identify underlying causes (e.g., unclear escalation protocols or lack of system redundancy).

Integrating Evaluation into Continuous Improvement

Evaluation should not be a standalone activity but part of a broader continuous improvement cycle.

Documentation of Findings

All metrics, observations, gaps, and root causes should be documented in a structured report.
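One way to keep the report structured is to hold findings in typed records and serialise them to JSON for storage or hand-off to a tracking tool. The names below (`ScenarioTestReport`, `Finding`, `cause_type`) are hypothetical, not a prescribed template.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Finding:
    gap: str
    root_cause: str
    cause_type: str  # process, technology, people or third party


@dataclass
class ScenarioTestReport:
    scenario: str
    metrics: dict[str, float] = field(default_factory=dict)
    observations: list[str] = field(default_factory=list)
    findings: list[Finding] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the report, including nested findings, as JSON."""
        return json.dumps(asdict(self), indent=2)
```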

Prioritisation of Actions

Actions should be prioritised based on:

Risk to CBS
Likelihood of recurrence
Regulatory expectations
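The three prioritisation factors can be combined into a single score. This sketch assumes each factor is rated on a 1 to 5 scale and uses hypothetical weights (50% risk to critical business services, 30% recurrence, 20% regulatory); the scales and weights are assumptions each organisation would need to calibrate.

```python
from dataclasses import dataclass

# Hypothetical weights; each organisation would calibrate its own.
WEIGHTS = {"cbs_risk": 0.5, "recurrence": 0.3, "regulatory": 0.2}


@dataclass
class Action:
    title: str
    cbs_risk: int    # 1 (low) to 5 (high): risk to critical business services
    recurrence: int  # 1 (rare) to 5 (almost certain): likelihood of recurrence
    regulatory: int  # 1 (none) to 5 (explicit): regulatory expectation

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for name in WEIGHTS:
            if not 1 <= getattr(self, name) <= 5:
                raise ValueError(f"{name} must be between 1 and 5")

    def score(self) -> float:
        """Weighted sum of the three prioritisation factors."""
        return sum(w * getattr(self, name) for name, w in WEIGHTS.items())


def prioritise(actions: list[Action]) -> list[Action]:
    """Highest-scoring actions first."""
    return sorted(actions, key=lambda a: a.score(), reverse=True)
```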

Feedback into the Resilience Framework

Evaluation outcomes should feed into:

Updates to impact tolerance
Improvements in business continuity and crisis management plans
Enhancements to technology and infrastructure
Refinement of scenario testing approaches

Tracking and Governance

Remediation actions should be tracked through governance structures, ensuring accountability and timely closure.
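Tracking can be supported by a simple register of remediation actions with owners and due dates. The sketch below assumes a hypothetical `RemediationAction` record and produces the figures a governance forum might review: open actions, overdue actions and their owners, and actions closed after their due date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RemediationAction:
    ref: str
    owner: str
    due: date
    closed_on: Optional[date] = None  # None while the action is still open


def governance_summary(actions: list[RemediationAction], today: date) -> dict:
    """Counts and references for periodic governance review."""
    open_actions = [a for a in actions if a.closed_on is None]
    overdue = [a for a in open_actions if today > a.due]
    return {
        "open": len(open_actions),
        "overdue": sorted(a.ref for a in overdue),
        "owners_with_overdue": sorted({a.owner for a in overdue}),
        "closed_late": sorted(a.ref for a in actions
                              if a.closed_on is not None and a.closed_on > a.due),
    }
```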

Conclusion

Metrics and evaluation are the foundation of effective scenario testing. By combining quantitative measures such as impact tolerance breaches, recovery times, and service degradation with qualitative assessments of decision-making and communication, organisations can gain a comprehensive view of their resilience performance.

Through structured gap analysis and root cause identification, scenario testing becomes a powerful diagnostic tool, highlighting weaknesses and driving targeted improvements.

Ultimately, a robust evaluation framework ensures that scenario testing delivers meaningful insights, enabling organisations to strengthen their ability to withstand disruption and consistently operate within defined impact tolerances.


More Information About OR-5000 [OR-5] or OR-300 [OR-3]

To learn more about the course and schedule, click the buttons below for the OR-300 Operational Resilience Implementer and OR-5000 Operational Resilience Expert Implementer courses.




Conducting Scenario Testing: A Practical Guide for Operational Resilience Implementation


Operational Resilience Certified Planner-Specialist-Expert

