Without ongoing monitoring, even well-defined tolerances can quickly become outdated or misaligned with reality.
To keep impact tolerances effective, organisations need a structured approach to metrics, monitoring, and continuous improvement, so that tolerances stay relevant, achievable, and aligned with both operational capability and regulatory expectations.
The purpose of this chapter is to:
Effective monitoring begins with clearly defined and measurable metrics.
Service availability is a fundamental indicator of resilience.
Key Measures:
Example:

| CBS | Target Availability | Actual Performance |
| --- | --- | --- |
| Deposit Services | 99.9% | 99.7% |
| Payment Services | 99.95% | 99.8% |

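
To make the gap between target and actual availability concrete, the percentages in the example can be converted into implied downtime. The following is a minimal sketch, not a prescribed method: the 30-day measurement period, the `AvailabilityRecord` class, and the `downtime_minutes` helper are illustrative assumptions, and only the service names and percentages come from the example above.

```python
from dataclasses import dataclass

# Assumption: availability is measured over a 30-day month.
MINUTES_PER_PERIOD = 30 * 24 * 60  # 43,200 minutes


@dataclass(frozen=True)
class AvailabilityRecord:
    """Hypothetical record pairing a service with its target and actual availability."""
    service: str
    target_pct: float
    actual_pct: float

    @property
    def meets_target(self) -> bool:
        return self.actual_pct >= self.target_pct

    @property
    def shortfall_pct(self) -> float:
        """Percentage points below target (0.0 when the target is met)."""
        return max(0.0, self.target_pct - self.actual_pct)


def downtime_minutes(availability_pct: float,
                     period_minutes: int = MINUTES_PER_PERIOD) -> float:
    """Downtime implied by an availability percentage over the period."""
    return (100.0 - availability_pct) / 100.0 * period_minutes


# Figures taken from the example table.
records = [
    AvailabilityRecord("Deposit Services", 99.9, 99.7),
    AvailabilityRecord("Payment Services", 99.95, 99.8),
]

for r in records:
    allowed = downtime_minutes(r.target_pct)
    actual = downtime_minutes(r.actual_pct)
    status = "meets target" if r.meets_target else "below target"
    print(f"{r.service}: {status}; "
          f"allowed {allowed:.1f} min, implied actual {actual:.1f} min")
```

Under the 30-day assumption, 99.7% availability implies roughly 130 minutes of downtime, against about 43 minutes allowed at a 99.9% target. Expressing the shortfall in minutes is often easier to compare with recovery-time tolerances than a difference of 0.2 percentage points.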
This measures how well the organisation performs relative to defined impact tolerances.
Key Measures:
Example

| CBS | Defined MTD | Actual Recovery Time | Result |
| --- | --- | --- | --- |
| Deposit Services | 4 hours | 3.5 hours | Within tolerance |
| Payment Services | 2 hours | 2.5 hours | Breach |
| Digital Banking | 3 hours | 2 hours | Within tolerance |

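
The Result column in the example follows from a simple comparison of actual recovery time against the defined MTD. A minimal sketch of that check is shown here; the `RecoveryResult` class and its field names are hypothetical, and treating a recovery time exactly equal to the MTD as within tolerance is an assumption rather than something the example specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryResult:
    """Hypothetical record of one recovery measured against its MTD."""
    service: str
    mtd_hours: float
    actual_hours: float

    @property
    def headroom_hours(self) -> float:
        """Positive when recovery beat the MTD, negative when it overran."""
        return self.mtd_hours - self.actual_hours

    @property
    def status(self) -> str:
        # Assumption: recovering exactly at the MTD counts as within tolerance.
        return "Within tolerance" if self.actual_hours <= self.mtd_hours else "Breach"


# Figures taken from the example table.
results = [
    RecoveryResult("Deposit Services", 4.0, 3.5),
    RecoveryResult("Payment Services", 2.0, 2.5),
    RecoveryResult("Digital Banking", 3.0, 2.0),
]

for r in results:
    print(f"{r.service}: {r.status} (headroom {r.headroom_hours:+.1f} h)")

breaches = [r.service for r in results if r.status == "Breach"]
print("Breaches:", breaches)
```

Tracking headroom alongside the pass or fail result shows how close a service came to its limit, which a simple "Within tolerance" label hides.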
KRIs provide early warning signals that tolerance thresholds may be at risk.
KRIs should be:

| KRI | Threshold | Action Trigger |
| --- | --- | --- |
| System uptime degradation | < 98% | Investigate and escalate |
| Transaction backlog growth | > 20% increase | Activate mitigation measures |
| Third-party service latency | > 30% above baseline | Engage the vendor and monitor |
| Incident frequency | > 3 major incidents/month | Review root causes |
| Staff availability | < 80% critical roles filled | Activate contingency staffing |

KRIs should signal:
Key Principle
KRIs enable organisations to act before a tolerance is breached, not after.
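
The KRI table above can be read as a set of threshold rules, each paired with an action. One way to sketch that evaluation is shown below. The `KRI` class, the `triggered_actions` function, and the sample readings are hypothetical; the thresholds and actions are taken from the table, with every threshold treated as a strict comparison.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class KRI:
    """Hypothetical rule: the action fires when breached_when(value, threshold) is true."""
    name: str
    breached_when: Callable[[float, float], bool]
    threshold: float
    action: str


# Thresholds and actions from the KRI table; values are percentages
# except incident frequency, which is a count per month.
KRIS = [
    KRI("System uptime degradation", operator.lt, 98.0, "Investigate and escalate"),
    KRI("Transaction backlog growth", operator.gt, 20.0, "Activate mitigation measures"),
    KRI("Third-party service latency", operator.gt, 30.0, "Engage the vendor and monitor"),
    KRI("Incident frequency", operator.gt, 3.0, "Review root causes"),
    KRI("Staff availability", operator.lt, 80.0, "Activate contingency staffing"),
]


def triggered_actions(readings: dict[str, float]) -> list[tuple[str, str]]:
    """Return (KRI name, action) for every reading that crosses its threshold."""
    actions = []
    for kri in KRIS:
        value = readings.get(kri.name)
        if value is None:
            continue  # no reading for this KRI in the current period
        if kri.breached_when(value, kri.threshold):
            actions.append((kri.name, kri.action))
    return actions


# Illustrative readings only; these numbers are not from the source.
sample_readings = {
    "System uptime degradation": 97.5,    # uptime %, below 98
    "Transaction backlog growth": 12.0,   # % increase, below 20
    "Third-party service latency": 30.0,  # % above baseline, not strictly above 30
    "Incident frequency": 4.0,            # major incidents this month
    "Staff availability": 85.0,           # % of critical roles filled
}

for name, action in triggered_actions(sample_readings):
    print(f"{name}: {action}")
```

Keeping the rules as data rather than hard-coded conditions makes it easier to adjust thresholds as tolerances are refined, without changing the evaluation logic.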
Monitoring must be supported by systems and processes that provide real-time or near real-time visibility.
Operational teams should monitor:
Regular reporting should include:

| Reporting Level | Focus |
| --- | --- |
| Operational | Daily/real-time performance metrics |
| Management | Weekly/monthly performance trends |
| Senior Management / Board | Strategic overview, breaches, and risks |

Organisations should implement alerts for:
Continuous improvement relies on structured feedback mechanisms.

| Source | Insight Provided |
| --- | --- |
| Incident Reports | Actual disruption impact and response effectiveness |
| Scenario Testing | Performance under simulated stress conditions |
| Customer Feedback | Perceived service quality and pain points |
| Audit Findings | Governance and control weaknesses |
| Regulatory Feedback | Compliance gaps and expectations |
| Operational Metrics | Trends and performance deviations |

A structured approach should include:
Example

| Event | Lesson Learned | Improvement Action |
| --- | --- | --- |
| Payment outage | Recovery time exceeded tolerance | Upgrade failover systems |
| Cyber incident | Detection delay | Enhance monitoring tools |
| Third-party failure | Lack of backup vendor | Establish an alternate provider |

Impact tolerance should evolve through a structured improvement cycle.
Monitoring and continuous improvement support all stages of the lifecycle:

| Lifecycle Stage | Role of Monitoring |
| --- | --- |
| Plan | Define metrics and KRIs |
| Implement | Monitor performance against tolerance |
| Test | Validate through scenario testing |
| Improve | Refine tolerances and capabilities |

| Challenge | Description |
| --- | --- |
| Inadequate metrics | Lack of meaningful or measurable indicators |
| Data fragmentation | Inconsistent data across systems |
| Delayed reporting | Lack of real-time visibility |
| Reactive approach | Acting only after incidents occur |
| Weak feedback loops | Lessons not translated into improvements |

Monitoring, metrics, and continuous improvement are essential to ensuring that impact tolerances remain relevant and effective in a changing environment. By establishing clear performance indicators, implementing robust monitoring systems, and embedding structured feedback loops, organisations can maintain visibility over their resilience capabilities and respond proactively to emerging risks.
Continuous improvement transforms impact tolerance from a static threshold into a living capability, enabling organisations to adapt, strengthen, and sustain resilience over time. Ultimately, this ensures that critical business services can be delivered consistently within acceptable limits, even in the face of evolving disruptions.