Managing Business Risk in a World Built on Hyperscale Cloud

Over the last few years, almost every organisation has increased its dependence on large-scale technology platforms: hyperscale public clouds, global CDNs, identity services and SaaS applications.
Amazon Web Services (AWS), Microsoft Azure, Google Cloud and providers such as Cloudflare now sit in the critical path for a huge proportion of business operations. When they work, they are enormously powerful. When they fail, the impact is immediate and highly visible.
High-profile incidents involving these platforms in 2024 and 2025 have highlighted a simple truth: scale does not eliminate vulnerability. In several cases, relatively small configuration or software changes have caused global disruption for hours at a time.
For CIOs and IT leaders, the key question is no longer whether these platforms are “safe”, but:
How exposed is our business if one of these providers suffers a major incident?
This article looks at why this risk exists, why it particularly affects mid-sized and enterprise organisations, and how Covenco helps to reduce that risk without asking you to abandon hyperscale cloud.
Concentration of Risk in Hyperscale and Internet Infrastructure
Most recent large-scale outages have not been caused by basic operational errors. They are usually the result of complex interactions inside highly automated, global systems:
- Configuration changes propagating across a worldwide network
- New security or traffic-management features interacting badly with existing rules
- Control-plane or quota systems failing in ways that affect multiple regions at once
Incidents at major providers in 2024–2025 have shown similar patterns:
- Public cloud regions experiencing significant service disruption because an internal automation or control component failed
- Global content delivery and security providers serving errors or empty responses across large parts of the internet due to an update gone wrong
- Security update failures impacting millions of endpoints simultaneously
These are not isolated “one-off” stories. They demonstrate that highly centralised platforms carry concentrated risk:
- A single provider often sits in front of, or underneath, thousands of services
- When something goes wrong at that layer, everything that depends on it is affected
- Customers may not even be aware of all the upstream and downstream dependencies involved
In other words, many organisations have unintentionally placed a great deal of operational risk into a small number of large technology stacks.
Why Mid-Sized and Enterprise Organisations Are Particularly Exposed
For most mid-sized and enterprise organisations, the current reality looks something like this:
- Production workloads in Azure, AWS or Google Cloud
- Microsoft 365 for email, collaboration and often identity
- Cloudflare or another CDN/WAF provider in front of public-facing applications
- A mixture of on-premises systems (IBM, Dell, etc.) and line-of-business SaaS
Architecturally, this makes sense. Hyperscalers and global infrastructure providers offer:
- Elastic capacity and global reach
- Modern services for analytics, AI and security
- Strong baseline resilience within their own platforms
The risk arises when all of the following are true:
- Critical production systems are concentrated in one cloud region or platform
- Backups and recovery tools also live in that same environment
- DNS, identity, traffic management and security controls depend on the same few providers
When a major incident happens upstream, organisations in this position experience more than a simple application outage. They may find that:
- Users cannot authenticate into systems that are technically still running
- Administration portals and cloud consoles are unreachable or degraded
- Backup and recovery interfaces, also running in the affected cloud, are unavailable just when they are needed most
This is what turns a contained technical issue into a full-scale business interruption.
Principles for a More Resilient Architecture
Addressing this risk does not require abandoning hyperscale platforms. AWS, Azure, Google Cloud and providers like Cloudflare will remain central to most IT strategies.
The objective is to avoid relying on them as both the production platform and the only recovery path.
Four practical principles can significantly improve resilience:
1. Treat Major Incidents as Foreseeable Events
Given the number and scale of recent outages, it is reasonable to plan for them as part of normal risk management:
- Define clear Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) for critical services
- Run scenario planning for events such as:
  - A major cloud region being degraded or unreachable
  - A global CDN/WAF provider serving errors or incomplete content
  - Identity or DNS services being partially unavailable
- Document what the organisation can still do under those conditions, and where the hard limits are
This turns “we hope it is fine” into a structured, testable resilience stance.
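As a rough illustration of what "testable" can mean here, the sketch below records RTO and RPO targets per service and compares them with the figures measured in a recovery exercise. The service names, targets and results are invented for the example, and the `RecoveryTarget`, `ExerciseResult` and `find_gaps` names are hypothetical rather than part of any product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    """Agreed objectives for one critical service (illustrative values only)."""
    service: str
    rto: timedelta  # maximum tolerable time to restore the service
    rpo: timedelta  # maximum tolerable window of lost data


@dataclass(frozen=True)
class ExerciseResult:
    """What a recovery test or structured exercise actually measured."""
    service: str
    restore_time: timedelta
    data_loss_window: timedelta


def find_gaps(targets, results):
    """Return human-readable gaps between agreed targets and test results."""
    measured = {r.service: r for r in results}
    gaps = []
    for t in targets:
        r = measured.get(t.service)
        if r is None:
            # An untested service is itself a gap: the target is only a hope.
            gaps.append(f"{t.service}: no recovery test on record")
            continue
        if r.restore_time > t.rto:
            gaps.append(f"{t.service}: restore took {r.restore_time}, RTO is {t.rto}")
        if r.data_loss_window > t.rpo:
            gaps.append(f"{t.service}: lost {r.data_loss_window} of data, RPO is {t.rpo}")
    return gaps


# Hypothetical targets and one hypothetical exercise result.
targets = [
    RecoveryTarget("order-portal", rto=timedelta(hours=4), rpo=timedelta(minutes=15)),
    RecoveryTarget("finance-erp", rto=timedelta(hours=8), rpo=timedelta(hours=1)),
]
results = [
    ExerciseResult("order-portal", restore_time=timedelta(hours=6),
                   data_loss_window=timedelta(minutes=10)),
]

for gap in find_gaps(targets, results):
    print(gap)
```

In this made-up data set, one service misses its RTO and the other has never been tested. Both are the kind of finding a scenario exercise is meant to surface before a real incident does.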
2. Separate Production from Recovery
If production systems are concentrated in one public cloud platform, recovery should not depend on that platform being healthy.
This is a core design principle for Covenco’s services. Covenco operates:
- Independent, UK-based private cloud and recovery platforms, architected specifically for data protection, disaster recovery and business continuity
- ISO 27001–certified environments, where information security, availability and recoverability are audited and managed as an integrated whole
In practice, this allows customers to:
- Run production wherever it makes most sense – public cloud, on-premises, or hybrid
- Maintain a “break glass” copy of their critical data, and the tools to restore it, in a separate operational and technical domain
If a hyperscale platform has a significant incident, you still have an independent platform from which to recover.
3. Prioritise Speed and Practicality of Recovery
Having backups is not the same as being able to recover quickly.
For on-premises and hosted infrastructure, Covenco frequently designs solutions around IBM FlashSystem with Safeguarded Copy and modern FlashCore technology, enabling:
- Immutable, point-in-time copies taken frequently on primary storage
- The ability to restore a clean copy of data in around 60 seconds for many workloads, turning a potential multi-day recovery into a brief, controlled interruption
- Built-in anomaly detection capabilities to help identify potential ransomware or data corruption events quickly
This emphasis on fast, predictable recovery means:
- Critical services can be restored while investigations into the root cause continue
- Recovery plans are tested against realistic timeframes, not best-case assumptions
- Organisations are less reliant on mass-restore operations inside an already stressed public cloud region during a major incident
4. Avoid Placing Backups in the Same “Blast Radius”
Backup strategies for cloud workloads often default to “more of the same”: back up cloud workloads within the same cloud.
Veeam’s Data Cloud offering provides a strong platform-as-a-service option for organisations invested in Azure and Microsoft 365. At the same time, there is a strategic question to answer:
Should the backup control plane and storage reside in the same hyperscale environment as the production workloads?
Covenco’s own “Veeam Data Cloud or BaaS?” guidance compares two approaches:
- Veeam Data Cloud – a SaaS platform operating within a single public cloud
- Independent Backup as a Service (BaaS) – Veeam-powered backups, designed and managed by Covenco and stored in Covenco’s private UK data centres
The independent BaaS model provides:
- A logically and physically separate failure domain for backup data and recovery tooling
- Operational management by a team whose core focus is backup, recovery and continuity
- A recovery process that does not depend on the health of the same cloud platform that has just experienced an incident
The right choice depends on risk appetite, regulatory environment and existing architecture – but it is important to make that choice explicitly.
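One way to make that choice explicit is to write down where production, backup storage and the backup control plane each run, and check for overlap. The sketch below is a minimal illustration under simplifying assumptions: the `Placement` type, the `shares_blast_radius` function and the provider and region labels are all hypothetical, and a real assessment would also consider identity, DNS and network paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where one component runs. Values are illustrative labels only."""
    provider: str  # e.g. "azure" or "private-uk"
    region: str    # e.g. "uksouth" or "dc-1"


def shares_blast_radius(production, backup_storage, backup_control_plane):
    """List the ways the backup path overlaps with the production platform."""
    overlaps = []
    components = (
        ("backup storage", backup_storage),
        ("backup control plane", backup_control_plane),
    )
    for label, placement in components:
        if placement.provider == production.provider:
            # Same region is the tightest coupling; same provider still
            # shares control-plane and account-level failure modes.
            scope = "region" if placement.region == production.region else "provider"
            overlaps.append(f"{label} shares a {scope} with production")
    return overlaps


production = Placement("azure", "uksouth")

# Hypothetical "more of the same" design: everything inside one cloud.
same_cloud = shares_blast_radius(
    production,
    backup_storage=Placement("azure", "uksouth"),
    backup_control_plane=Placement("azure", "westeurope"),
)

# Hypothetical independent design: backups held on a separate platform.
independent = shares_blast_radius(
    production,
    backup_storage=Placement("private-uk", "dc-1"),
    backup_control_plane=Placement("private-uk", "dc-1"),
)

print(same_cloud)
print(independent)
```

An empty result does not prove independence. It only shows that the most obvious shared dependency, the cloud provider itself, has been removed from the recovery path.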
Covenco’s Role in Reducing Hyperscale Dependence Risk
Covenco does not position itself “against” AWS, Azure, Google Cloud or global infrastructure providers. In many environments, those platforms are the right place for production workloads.
Instead, Covenco’s role is to provide an independent layer of resilience alongside them.
Key elements include:
- Independent backup, DR and recovery platforms
  - UK-based private cloud and data centres designed specifically for data protection and business continuity
  - Managed Backup as a Service (BaaS) and Disaster Recovery as a Service (DRaaS) using Veeam and IBM technologies
- Deep ecosystem partnerships and expertise
  - IBM Gold Business Partner for Power Systems and Storage, with decades of experience in mission-critical environments
  - Veeam Platinum / VCSP partner, delivering Veeam-powered services at scale for mid-sized and enterprise customers
- Security and governance
  - ISO 27001 certification for information security management
  - Working in line with UK security standards such as Cyber Essentials, appropriate for a provider handling backup, DR and cyber-resilient infrastructure
Above all, Covenco’s experience is grounded in real incident response: recovering customers from hardware failures, ransomware, accidental deletions and upstream platform outages. The focus is not just on storing data, but on getting businesses operational again.
Practical Next Steps for IT and Business Leaders
If recent outages have prompted questions from your board or leadership team, you do not need to respond with speculation or fear. You need a clear, defensible plan.
Three practical steps:
- Map Critical Dependencies
  - Identify which business-critical services depend on which cloud regions, CDNs, DNS and identity providers.
  - Include backup infrastructure: where it runs, what it relies on, and how it is accessed during an incident.
- Assess Recovery Capability Realistically
  - For each key system, document how long it would take to restore a usable version, and where it would be restored to.
  - Validate those assumptions through tests or structured exercises, not just design documents.
- Evaluate the Case for an Independent Recovery Platform
  - Determine whether relying solely on hyperscale platforms for both production and recovery fits your risk appetite.
  - Where it does not, explore independent options – such as Covenco’s Veeam-based BaaS and DRaaS on private UK infrastructure – that can complement your existing investments in AWS, Azure, Google Cloud and others.
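The dependency-mapping step can start very simply. The sketch below assumes a hand-maintained inventory in which each service lists the upstream providers it relies on, then inverts it to show what a single provider incident would take down. Every service and provider name is a made-up placeholder, and `impact_by_provider` is a hypothetical helper, not an existing tool.

```python
from collections import defaultdict

# Hypothetical inventory: each service lists the upstream providers it needs.
# Backup tooling is included deliberately, as it is easy to overlook.
dependencies = {
    "customer-portal": {"cloud:azure-uksouth", "cdn:example-cdn", "identity:example-idp"},
    "email": {"saas:microsoft-365", "identity:example-idp"},
    "backup-console": {"cloud:azure-uksouth", "identity:example-idp"},
    "warehouse-system": {"onprem:dc-1"},
}


def impact_by_provider(deps):
    """Invert the map: for each provider, which services fail if it fails?"""
    impact = defaultdict(set)
    for service, providers in deps.items():
        for provider in providers:
            impact[provider].add(service)
    return impact


impact = impact_by_provider(dependencies)

# Show the providers with the widest impact first (ties sorted by name).
for provider in sorted(impact, key=lambda p: (-len(impact[p]), p)):
    print(provider, "->", sorted(impact[provider]))
```

In this invented inventory, the backup console depends on the same cloud region and identity provider as the customer portal, which is exactly the overlap the mapping exercise is intended to expose.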
If you would like to discuss how an independent backup and recovery platform could reduce your exposure to hyperscale and infrastructure-level outages, the Covenco team can help review your current position and outline practical options.
The objective is straightforward:
Even in a world built on large-scale cloud and internet infrastructure, your organisation should be able to withstand incidents and recover quickly, rather than being entirely at the mercy of someone else’s bad day.