eBook 3: Chapter 5
Metrics and Performance Measurement
Introduction
Metrics and performance measurement are fundamental to validating operational resilience. While testing and exercising demonstrate capabilities, metrics provide the quantitative evidence needed to assess whether financial institutions can meet regulatory expectations and sustain their Critical Business Services (CBS).
The Monetary Authority of Singapore (MAS) emphasises that financial institutions must establish clear recovery objectives, monitoring indicators, and performance thresholds to ensure effective response and recovery. These metrics enable organisations to track resilience performance, identify gaps, and drive continuous improvement.
This chapter explores three critical categories of resilience measurement: recovery metrics, service availability metrics, and resilience KPIs and KRIs.
Recovery Metrics (RTO / RPO)
Defining Recovery Metrics
Recovery metrics are the most fundamental indicators of resilience, defining how quickly and effectively a service or system can be restored after disruption.
Key metrics include:
- Recovery Time Objective (RTO)
The maximum acceptable time to restore a system or function after a disruption
- Recovery Point Objective (RPO)
The maximum acceptable amount of data loss, measured as a period of time before the disruption
- Service Recovery Time Objective (SRTO)
A MAS-specific metric that defines the target time to restore a critical business service to a minimum acceptable level
MAS Expectations on Recovery Metrics
MAS requires financial institutions to:
- Establish SRTOs for each Critical Business Service
- Align recovery objectives with customer obligations and systemic impact
- Implement recovery strategies capable of meeting these objectives
SRTOs are particularly important as they shift the focus from system-level recovery to service-level outcomes, reinforcing the service-centric approach to operational resilience.
Role in Testing and Measurement
During scenario testing, recovery metrics are used to:
- Measure actual recovery time vs. target recovery time
- Identify gaps between planned and achieved recovery performance
- Validate whether CBS can be restored within acceptable thresholds
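The comparison of actual against target recovery described above can be sketched in Python. This is a minimal illustration, not a prescribed MAS method: the `RecoveryTestResult` class, its field names, and the simplified data-loss calculation (time between the last recoverable data point and the start of the disruption) are hypothetical assumptions, and the service name and timestamps are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTestResult:
    """Outcome of one scenario test for a Critical Business Service (hypothetical fields)."""
    service: str
    srto: timedelta                  # target time to restore minimum service level
    rpo: timedelta                   # maximum acceptable data loss, as a time window
    disruption_start: datetime       # when the simulated disruption began
    service_restored: datetime       # when minimum acceptable service resumed
    last_recoverable_data: datetime  # timestamp of the latest data recovered intact

    @property
    def actual_recovery_time(self) -> timedelta:
        return self.service_restored - self.disruption_start

    @property
    def actual_data_loss(self) -> timedelta:
        # Simplification: data written between the last recoverable point
        # and the start of the disruption is treated as lost.
        return self.disruption_start - self.last_recoverable_data

    @property
    def recovery_overrun(self) -> timedelta:
        # How far the test missed the SRTO; zero when the target was met.
        return max(self.actual_recovery_time - self.srto, timedelta(0))

    def meets_srto(self) -> bool:
        return self.actual_recovery_time <= self.srto

    def meets_rpo(self) -> bool:
        return self.actual_data_loss <= self.rpo


# Hypothetical test record: 4-hour SRTO, 15-minute RPO.
result = RecoveryTestResult(
    service="Retail payments",
    srto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
    disruption_start=datetime(2024, 3, 1, 9, 0),
    service_restored=datetime(2024, 3, 1, 13, 30),
    last_recoverable_data=datetime(2024, 3, 1, 8, 50),
)
print(result.actual_recovery_time)              # 4:30:00
print(result.recovery_overrun)                  # 0:30:00
print(result.meets_srto(), result.meets_rpo())  # False True
```

In this hypothetical run the service was restored 30 minutes after its SRTO, a gap that would feed into the review of recovery strategies, while the 10 minutes of data loss stayed within the 15-minute RPO.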
Key Challenges
- Unrealistic or overly optimistic RTO/SRTO definitions
- Lack of alignment between IT recovery and business service expectations
- Inconsistent measurement across systems and services
Service Availability Metrics
Moving Beyond Recovery
While recovery metrics focus on restoration after disruption, service availability metrics measure the ongoing reliability and continuity of services.
Key Service Availability Metrics
- Uptime / Availability Percentage
The proportion of time a service is operational (e.g., 99.9% availability)
- Transaction Success Rate
Percentage of transactions successfully processed without failure
- Service Degradation Levels
Measurement of reduced performance (e.g., latency, delays)
- Incident Frequency and Duration
Number and length of service disruptions over a given period
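As a rough illustration of how these availability metrics relate, the Python sketch below computes availability percentage, the downtime budget implied by an availability target, transaction success rate, and incident frequency and duration. The function names, the 30-day period, and the incident figures are hypothetical. It also assumes availability is simply elapsed time minus recorded downtime, whereas real end-to-end service measurement usually has to combine data from many underlying systems and dependencies.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Incident:
    service: str
    duration: timedelta  # time the service was unavailable


def total_downtime(incidents: list[Incident]) -> timedelta:
    return sum((i.duration for i in incidents), timedelta(0))


def availability_pct(period: timedelta, incidents: list[Incident]) -> float:
    """Share of the period during which the service was operational."""
    return 100.0 * (1 - total_downtime(incidents) / period)


def downtime_budget(period: timedelta, target_pct: float) -> timedelta:
    """Maximum downtime allowed in the period for a given availability target."""
    return period * (1 - target_pct / 100)


def transaction_success_rate(succeeded: int, attempted: int) -> float:
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 100.0 * succeeded / attempted


def incident_summary(incidents: list[Incident]) -> tuple[int, timedelta, timedelta]:
    """Incident frequency, total duration and mean duration for the period."""
    count = len(incidents)
    total = total_downtime(incidents)
    mean = total / count if count else timedelta(0)
    return count, total, mean


# Hypothetical month with two short outages of one service.
month = timedelta(days=30)
incidents = [Incident("Mobile banking", timedelta(minutes=20)),
             Incident("Mobile banking", timedelta(minutes=25))]

print(f"{availability_pct(month, incidents):.3f}%")  # 99.896%
print(downtime_budget(month, 99.9))                  # 0:43:12
print(incident_summary(incidents))
```

Two outages totalling 45 minutes already exceed the roughly 43 minutes of downtime that a 99.9% target allows in a 30-day month, which is why availability percentages are best read together with incident frequency and duration.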
MAS Perspective
MAS expects financial institutions to ensure that critical business services remain available and can be promptly resumed following disruption.
This means availability metrics must be:
- Aligned with customer expectations
- Linked to impact tolerances
- Monitored continuously in real time
Role in Operational Resilience
Service availability metrics enable organisations to:
- Detect early signs of service degradation
- Prevent incidents from escalating into full disruptions
- Maintain service delivery within acceptable impact thresholds
Key Challenges
- Difficulty in measuring end-to-end service availability across dependencies
- Limited visibility across third-party and cloud environments
- Over-reliance on system-level metrics rather than service-level outcomes
Resilience KPIs and KRIs
Understanding KPIs and KRIs
Operational resilience requires a balanced set of:
- Key Performance Indicators (KPIs) – Measure effectiveness of resilience capabilities
- Key Risk Indicators (KRIs) – Signal potential risks and vulnerabilities
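Together, they provide a view of resilience in both directions: KPIs look backward at how capabilities have performed, while KRIs look forward to emerging risks and vulnerabilities.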
Examples of Resilience KPIs
- Percentage of CBS meeting their defined SRTO targets
- Success rate of scenario testing exercises
- Time taken to detect and respond to incidents
- Percentage of critical dependencies mapped and validated
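Several of these KPIs reduce to simple ratios and averages. The Python sketch below assumes hypothetical inputs (a per-CBS status record, counts of scenario tests run and passed, and incident detection times in minutes); how an institution defines a "passed" test or a "validated" dependency is its own judgement and is not specified here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CbsStatus:
    """Hypothetical per-service status record."""
    name: str
    met_srto_in_last_test: bool
    dependencies_total: int
    dependencies_mapped_validated: int


def pct(part: float, whole: float) -> float:
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def resilience_kpis(services: list[CbsStatus], tests_run: int, tests_passed: int,
                    detection_minutes: list[float]) -> dict[str, float]:
    return {
        # Percentage of CBS that met their SRTO in the latest test
        "cbs_meeting_srto_pct": pct(sum(s.met_srto_in_last_test for s in services),
                                    len(services)),
        # Share of scenario tests whose objectives were achieved
        "scenario_test_success_pct": pct(tests_passed, tests_run),
        # Mean time from incident start to detection
        "mean_time_to_detect_min": float(mean(detection_minutes)),
        # Critical dependencies mapped and validated across all CBS
        "dependencies_mapped_pct": pct(
            sum(s.dependencies_mapped_validated for s in services),
            sum(s.dependencies_total for s in services),
        ),
    }


# Hypothetical portfolio of three Critical Business Services.
services = [
    CbsStatus("Payments", True, 40, 38),
    CbsStatus("Trade settlement", False, 25, 20),
    CbsStatus("Customer onboarding", True, 15, 15),
]
kpis = resilience_kpis(services, tests_run=12, tests_passed=9,
                       detection_minutes=[12, 30, 9])
for name, value in kpis.items():
    print(f"{name}: {value:.2f}")
```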
Examples of Resilience KRIs
- Number of unresolved high-risk vulnerabilities
- Level of third-party concentration risk
- Frequency of system failures or near-misses
- Gaps in staff readiness and training coverage
MAS Expectations
MAS requires financial institutions to establish risk monitoring indicators and thresholds as part of their operational risk management framework.
This includes:
- Defining risk appetite and tolerance levels
- Monitoring indicators aligned to these thresholds
- Escalating breaches to senior management
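One way to operationalise appetite, tolerance and escalation is a three-level (green/amber/red) status per indicator. The Python sketch below assumes that convention, with invented KRI names and threshold values; it is not a MAS-prescribed scheme. Readings within appetite are green, readings between appetite and tolerance are amber, and readings beyond tolerance are red and listed for escalation.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    WITHIN_APPETITE = "green"
    ABOVE_APPETITE = "amber"
    TOLERANCE_BREACHED = "red"


@dataclass
class Kri:
    name: str
    appetite: float              # level the institution is willing to operate at
    tolerance: float             # outer limit; beyond this is a breach
    higher_is_worse: bool = True

    def assess(self, value: float) -> Status:
        # For indicators where a lower value is worse (e.g. training coverage),
        # negate everything so a single "above threshold" comparison works.
        sign = 1 if self.higher_is_worse else -1
        v, a, t = sign * value, sign * self.appetite, sign * self.tolerance
        if v <= a:
            return Status.WITHIN_APPETITE
        if v <= t:
            return Status.ABOVE_APPETITE
        return Status.TOLERANCE_BREACHED


def escalations(kris: list[Kri], readings: dict[str, float]) -> list[str]:
    """Names of KRIs whose latest reading breaches tolerance and must be escalated."""
    return [k.name for k in kris
            if k.assess(readings[k.name]) is Status.TOLERANCE_BREACHED]


# Hypothetical indicators and thresholds.
kris = [
    Kri("unresolved_high_risk_vulns", appetite=5, tolerance=10),
    Kri("top_vendor_share_pct", appetite=30, tolerance=40),
    Kri("training_coverage_pct", appetite=95, tolerance=85, higher_is_worse=False),
]
readings = {"unresolved_high_risk_vulns": 7, "top_vendor_share_pct": 45,
            "training_coverage_pct": 80}

for k in kris:
    print(k.name, k.assess(readings[k.name]).value)
print("Escalate to senior management:", escalations(kris, readings))
```

The `higher_is_worse` flag covers indicators such as training coverage, where a falling value signals rising risk.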
Integrating KPIs and KRIs into Governance
Effective use of KPIs and KRIs requires:
- Regular reporting to senior management and the Board
- Integration into risk dashboards and governance frameworks
- Alignment with scenario testing and audit findings
Key Challenges
- Selecting meaningful and actionable indicators
- Avoiding excessive metrics that dilute focus
- Ensuring data accuracy and consistency
Linking Metrics to Impact Tolerance
Metrics must ultimately support the organisation’s impact tolerance framework.
This means:
- Recovery metrics validate whether services can be restored within tolerance
- Availability metrics ensure services remain within acceptable disruption levels
- KPIs and KRIs provide ongoing monitoring of resilience health
By linking all metrics to impact tolerance, organisations ensure that performance measurement is aligned with real business outcomes, not just technical indicators.
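To show how the three metric families can be read against a single impact tolerance, the Python sketch below combines them into one view per service. The `ServiceResilienceView` class, its fields, the rule that an SRTO should sit within impact tolerance, and all figures are illustrative assumptions rather than a defined framework.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ServiceResilienceView:
    service: str
    impact_tolerance: timedelta      # maximum tolerable disruption for the CBS
    srto: timedelta                  # recovery objective, should sit within tolerance
    tested_recovery_time: timedelta  # from the latest scenario test
    availability_pct: float          # measured over the reporting period
    min_availability_pct: float      # acceptable availability level
    open_kri_breaches: int           # from KRI monitoring

    def findings(self) -> list[str]:
        issues = []
        # Recovery metrics: objective and tested result against tolerance
        if self.srto > self.impact_tolerance:
            issues.append("SRTO is set beyond impact tolerance")
        if self.tested_recovery_time > self.impact_tolerance:
            issues.append("Tested recovery exceeded impact tolerance")
        elif self.tested_recovery_time > self.srto:
            issues.append("Tested recovery missed SRTO but stayed within tolerance")
        # Availability metrics
        if self.availability_pct < self.min_availability_pct:
            issues.append("Availability below acceptable level")
        # KRIs
        if self.open_kri_breaches > 0:
            issues.append(f"{self.open_kri_breaches} KRI tolerance breach(es) open")
        return issues


view = ServiceResilienceView(
    service="Retail payments",
    impact_tolerance=timedelta(hours=6),
    srto=timedelta(hours=4),
    tested_recovery_time=timedelta(hours=4, minutes=30),
    availability_pct=99.95,
    min_availability_pct=99.9,
    open_kri_breaches=1,
)
for issue in view.findings():
    print(f"{view.service}: {issue}")
```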
Embedding Metrics into Continuous Improvement
Metrics are not static—they must drive continuous improvement.
Financial institutions should:
- Analyse performance gaps identified during testing
- Update recovery strategies and controls
- Refine metrics based on evolving risks and business changes
- Incorporate lessons learned from incidents and near-misses
MAS emphasises that resilience frameworks must be continuously reviewed and enhanced to remain effective in a dynamic environment.
Conclusion
Metrics and performance measurement are essential for demonstrating operational resilience.
Guided by the expectations of the Monetary Authority of Singapore, financial institutions must establish robust recovery metrics, service availability indicators, and resilience KPIs and KRIs to validate their ability to deliver critical business services.
By integrating these metrics into testing, governance, and continuous improvement processes, organisations can move beyond theoretical resilience to measurable, evidence-based performance.
Ultimately, effective measurement ensures that resilience is not assumed but quantified, monitored, and continuously strengthened.
Gain Competency: For organisations looking to accelerate their journey, BCM Institute’s training and certification programs, including the OR-5000 Operational Resilience Expert Implementer course, provide in-depth insights and practical toolkits for effectively embedding this model.
More Information About OR-5000 [OR-5] or OR-300 [OR-3]
To learn more about course content and schedules, see BCM Institute's OR-300 Operational Resilience Implementer course and OR-5000 Operational Resilience Expert Implementer course.