Defining the Journey—the Four Cloud Adoption Patterns
This is the second post in our series, “Network Operations and Security Professionals’ Guide to Managing Public Cloud Journeys”, which we will release as a white paper after we complete the draft and have some time for public feedback. You might want to start with our first post. Special thanks to Gigamon for licensing. As always, the content is being developed completely independently using our Totally Transparent Research methodology.
Understanding Cloud Adoption Patterns
Cloud adoption patterns represent the most common ways organizations move from traditional operations into cloud computing. They contain the hard lessons learned by those who went before. While every journey is distinct, hands-on projects and research have shown us a broad range of consistent experiences, which organizations can use to better manage their own projects. The patterns won’t tell you exactly which architectures and controls to put in place, but they can serve as a great resource to point you in the right general direction and help guide your decisions.
Another way to think of cloud adoption patterns is as embodying the aggregate experiences of hundreds of organizations. To go back to our analogy of hiking up a mountain, it never hurts to ask the people who have already finished the trip what to look out for.
Characteristics of Cloud Adoption Patterns
We will get into more descriptive detail as we walk through each pattern, but we find this grid useful to define the key characteristics.
| Characteristics | Developer Led | Data Center Transformation | Snap Migration | Native New Build |
| --- | --- | --- | --- | --- |
| Size | Medium/Large | Large | Medium/Large | All (project-only for mid-large) |
| Vertical | All (minus financial and government) | All, including Financial and Government | Variable | All |
| Speed | Fast then slow | Slow (2-3 years or more) | 18-24 months | Fast as DevOps |
| Security | Late | Early | Trailing | Mid to late |
| Network Ops | Late | Early | Early to mid | Late (developers manage) |
| Tooling | New + old when forced | Culturally influenced; old + new | Panic (a lot of old) | New, unless culturally forced to old |
| Budget Owner | Project based/no one | IT, Ops, Security | IT or poorly defined | Project-based, some security for shared services |
- Size: The most common organization sizes. For example, developer-led projects are common in large companies but rarely seen in small startups, which can skip directly to native new builds.
- Vertical: We see these patterns across all verticals, but in highly-regulated ones like financial services and government, certain patterns are less common due to tighter internal controls and compliance requirements.
- Speed: The overall velocity of the project, which often varies during the project lifetime. We’ll dig into this more, but one example is developer-led, where initial setup and deployment are very fast, but then wrangling in central security and operational control can take years.
- Risk: This is an aggregate of risk to the organization and of project failure. For example in a snap migration everything tends to move faster than security and operations can keep up, which creates a high chance of configuration error.
- Security: When security is engaged and starts influencing the project.
- Network Ops: When network operations becomes engaged and starts influencing the project. While the security folks are used to being late to the party, since developers can build their own networks with a few API calls, this is often a new and unpleasant experience for networking professionals.
- Tooling: The kind of tooling used to support the project. “New” means new, cloud-native tools. “Old” means the tools you already run in your data centers.
- Budget Owner: Someone has to pay at some point. This is important because it represents potential impact on your budget, but also indicates who tends to have the most control over a project.
In this section we will describe what the patterns look like, and identify some key risks. In our next section we will offer some top-line recommendations to improve your chances of success.
One last point before we jump into the patterns themselves: while they focus on the overall experiences of an organization, patterns also apply at the project level, and an organization may experience multiple patterns at the same time. For example it isn’t unusual for a company with a “new to cloud” policy to also migrate existing resources over time as a long-term project. This places them in both the data center transformation and native new build patterns.
Developer Led
Mark was eating a mediocre lunch at his desk when a new “priority” ticket dropped into his network ops queue. “Huh, we haven’t heard from that team in a while… weird.” He set the microwaved leftovers to the side and clicked open the request… Request for firewall rule change: Allow port 3306 from IP 52.11.33.xxx/32. Mission critical timeline. “What the ?!? That’s not one of our IPs?” Mark thought as he ran a lookup. “amazonaws.com? You have GOT to be kidding me? We shouldn’t have anything up there.” Mark fired off emails to his manager and the person who sent the ticket, but he had a bad feeling he was about to get dragged into the kind of mess that would seriously ruin his plans for the next few months.
Developer-led projects are those where a developer or team builds something in a cloud on their own, and central IT is then forced to support it. We sometimes call this “developer tethering”, because these often unsanctioned and/or uncoordinated projects anchor an organization to a cloud provider, and drag the rest of the organization in after them. These projects aren’t always against policy – this pattern is also common in mergers and acquisitions. Nor is this necessarily a first step into the cloud overall – it can also be a project which pulls an enterprise into a new cloud provider, rather than their existing preferred cloud provider.
This creates a series of tough issues. To meet the definition of this pattern we assume you can’t just shut the project down, but actually need to support it. The project has been developed and deployed without the input of security or networking, and may have access to production data.
- Size: We mostly see this pattern in medium and large organizations. In smaller enterprises the overall scope of what’s going on is easier to understand, whereas larger organizations tend to have an increasing number of teams operating at least semi-independently. Larger organizations are also more likely to engage in M&A activity which forces them to support new providers. In fact most multi-cloud deployments we run into result directly from acquiring a company using something like Azure when everything else is on AWS.
- Vertical: This pattern is everywhere, but less common in highly regulated and tightly governed organizations – particularly financial services and government. Not every financial services organization is well-governed so don’t assume you are immune, but controls tend to be tighter on more sensitive data, so when you do hit this pattern the risk might be lower. In government it’s actually budgets more than regulations which limit this pattern – few government employees can get away with throwing AWS onto corporate cards.
- Speed: In the beginning, at least once security and networking find out about the project, there is a big rush to manage the largest risks and loop the project into some sort of central management. This flurry of activity then slows down into a longer, more methodical wrangling to bring everything up to standard. It starts with stopgaps, such as opening up firewalls to specific IP ranges or throwing in a VPN connection. A longer process follows to rework both the deployment and internal support, such as by setting up direct connect.
- Risk: These are high-risk situations. Security was likely not involved, and we often find a high number of configuration errors when assessing these environments. They can often function as an isolated outpost for a while, but there are still risks of failed integration when the organization tries to pull them back into the fold – especially if they require complicated network connectivity.
- Security: Security is typically involved only late in development, or after the project is actually deployed, since the project team was off running on their own.
- Network Ops: As with security, networking enters late. If the project doesn’t require connectivity back to an existing network they might not be involved at all.
- Tooling: Most often these projects leverage integrated tools provided by the cloud service provider. There is rarely budget for security or network specific tooling beyond that, since CSP tools all ‘hide’ within basic costs of cloud deployment. One problem we sometimes see is that after the project is discovered and there’s the mad rush to bring it under central management, a bag of existing tools which might be a poor fit for the cloud platform are forced into place. This is most common with network and endpoint security tools which aren’t cloud native – a virtual appliance isn’t necessarily a cloud native answer to a problem.
- Budget Owner: The project team somehow managed to get budget to start the deployment, which they can use as a cudgel to limit external management. This may fall apart as the project grows and costs increase (pro tip: they always do), and the project has to steal budget from someplace else.
The risks should be obvious: you have an unsanctioned application stack running in an unapproved cloud, with which you may have little experience, uncoordinated with security or networking. However, many project teams try to do the right things. You can’t assume the project is an abject failure. Some of these projects are significantly better designed and managed, from the cloud standpoint, than lift and shift or other cloud initiatives. It all depends on the team. Based on our experience:
- Security configuration errors are highly likely.
- There may be unapproved and ad hoc network connections back to existing resources. At times these are unapproved VPN connections, SSH jump boxes, or similar.
- Deployment environments may be messy, full of cruft and design flaws.
- Development/testing and production are generally intermingled in the same account/subscription/project, which creates a larger blast radius for attacks.
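The last risk in this list, intermingled environments, is one that can be checked mechanically once you have an inventory. The following is a minimal sketch, assuming a hypothetical inventory of `(account, resource, environment)` records; the account IDs, resource names, and the `prod`/`test`/`dev` tag values are all invented for illustration, and a real assessment would pull this data from the cloud provider's inventory or tagging APIs.

```python
from collections import defaultdict

# Hypothetical inventory records: (account_id, resource_name, environment tag).
# These values are made up for illustration only.
INVENTORY = [
    ("acct-111", "orders-db", "prod"),
    ("acct-111", "orders-db-test", "test"),
    ("acct-111", "build-runner", "dev"),
    ("acct-222", "billing-api", "prod"),
    ("acct-333", "sandbox-vm", "dev"),
]

# Assumption: anything not tagged "prod" counts as non-production.
PRODUCTION = {"prod"}


def find_mixed_accounts(inventory):
    """Return accounts that hold both production and non-production resources."""
    envs_by_account = defaultdict(set)
    for account_id, _resource, environment in inventory:
        envs_by_account[account_id].add(environment)

    mixed = {}
    for account_id, envs in envs_by_account.items():
        has_prod = bool(envs & PRODUCTION)
        has_non_prod = bool(envs - PRODUCTION)
        if has_prod and has_non_prod:
            mixed[account_id] = sorted(envs)
    return mixed


if __name__ == "__main__":
    for account_id, envs in find_mixed_accounts(INVENTORY).items():
        print(f"{account_id}: mixed environments {envs}")
```

An account flagged here is one where a compromise of a development resource may share a blast radius with production data, which is the concern raised above.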
Data Center Transformation
Mark glanced up at the wall of sticky notes layered on top of the whiteboard’s innumerable chicken scratched architectural diagrams. A year of planning and setup, and they were finally about to move the first production application.
“Okay,” started Sarah, “we’ve tested the latency on the direct connect but we are still having problems updating the IPs for the firewall rules in the Dallas DC. The application team says they need more flexibility since they want to deploy with infrastructure as code and use auto scale groups in different availability zones. They claim that trying to manage everything with such restricted IPs doesn’t work well in their VNets. Something about web servers and app servers reusing each other’s IPs as they scale up and down and the firewall team is too slow to update the rules.”
Mark interjected, “What if we drop the firewalls on the direct connects? Then they can use what they want within their CIDR blocks?”
“Isn’t that a security risk?”
“I don’t think so,” replied Mark, “after going through the cloud training last month I’m starting to believe we’ve been thinking about this all wrong. The cloud isn’t the untrusted network – we’re just as likely to get breached from the data center or someone compromising an admin’s laptop on the corporate network.”
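The friction in this conversation comes from matching firewall rules to individual addresses while auto scaling keeps reassigning them. As a rough illustration only, the sketch below uses Python's standard `ipaddress` module to contrast a hand-maintained per-host allow list with a rule scoped to the application's whole CIDR block. The address range and host IPs are hypothetical, and this is not a recommendation for any particular firewall policy.

```python
import ipaddress

# Hypothetical address plan: the CIDR block assigned to the application's cloud network.
APP_CIDR = ipaddress.ip_network("10.20.0.0/16")

# Hypothetical per-host rules that a firewall team maintains by hand.
STATIC_ALLOWED_HOSTS = {
    ipaddress.ip_address("10.20.1.15"),
    ipaddress.ip_address("10.20.1.16"),
}


def allowed_by_host_rules(source_ip: str) -> bool:
    """True only if the exact address was added to the per-host list."""
    return ipaddress.ip_address(source_ip) in STATIC_ALLOWED_HOSTS


def allowed_by_cidr_rule(source_ip: str) -> bool:
    """True if the address falls anywhere inside the application's CIDR block."""
    return ipaddress.ip_address(source_ip) in APP_CIDR


if __name__ == "__main__":
    # An instance launched by auto scaling gets a fresh address inside the block.
    new_instance = "10.20.3.77"
    print("per-host rules allow it:", allowed_by_host_rules(new_instance))
    print("CIDR rule allows it:", allowed_by_cidr_rule(new_instance))
```

The per-host list needs a rule change for every new instance, while the CIDR-scoped rule tolerates address churn at the cost of trusting everything inside the block, which is exactly the trade-off Sarah questions.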
Data center transformations are long-term projects, where the migration is methodical and centrally planned. That isn’t always beneficial – these projects are often hindered by overanalysis and huge requirements documents, which can result in high costs and slow timelines. They also tend to bring their own particular set of design flaws. In particular, there is often a focus on building a perfect landing zone or “minimum viable cloud” which replicates the existing data center, rather than taking advantage of native capabilities of the cloud platform. Existing tooling and knowledge are thrown at the problem, rather than trying to do things the “cloud way”.
Not to spoil our recommendations, but treating the migration as a series of incremental projects rather than a monolithic deployment will dramatically improve your chance of success. Culture, silos, politics, and experience all significantly impact how well these projects go.
- Size: You need a data center to transform, so this pattern lends itself to very large, large, and sometimes mid-sized enterprises.
- Vertical: This pattern is common across all verticals which meet the size requirements. Five years ago we weren’t seeing it with regulated industries, but cloud computing has long since passed that limitation.
- Speed: These projects tend to move at a snail’s pace. There are a lot of planning cycles, and building baseline cloud infrastructure, before any production workloads are moved. In some cases we see progressive organizations breaking things into smaller projects rather than shoehorning everything into one (or a small number of) cloud environments, but this is uncommon. Multi-year projects are the norm, although more agile approaches are possible.
- Risk: The risk of a security failure is lower due to the slower pace and tighter controls, but there can be a high risk of project failure, depending on approach. Large monolithic cloud environments are highly prone to failure within 18-24 months. Compartmentalized deployments (using multiple accounts, subscriptions, and projects) have a lower chance of major failure.
- Security: Security is engaged early. The risk is that the security team isn’t familiar or experienced with cloud, and attempts to push traditional techniques and tools which don’t work well in cloud.
- Network Ops: Similar to security, networking is involved early. And as with security, the risk is not having the cloud domain knowledge for an effective and appropriate design.
- Tooling: Tooling depends on culture, silos, and politics. There is excellent opportunity to use cloud-native tooling, including existing tools with cloud-native capabilities. But we also see, as in our opening story, frequent reliance on existing tools and techniques which aren’t well-suited to cloud and thus end up causing problems.
- Budget Owner: These projects tend to have a central budget, so shared service teams such as operations, networking, and security may be able to draw on this budget or submit their own requests for additional project funding.
There are two major categories of risks, depending on the overall transformation approach:
- Large, monolithic projects where you set everything up in a small number of cloud environments and try to make them look like your existing data center. These are slow and prone to breaking horribly in 18-24 months. These ‘monocloud’ deployments tend to end poorly. IAM boundaries and service limits are two of the largest obstacles. Agility is also often reduced, which even pushes some teams to avoid the cloud. Costs are also typically higher. Organizations tend to find themselves on this path if they don’t have enough internal cloud knowledge and experience. Both cloud providers and independent consultants usually just nod their heads and say ‘yes’ when you approach them with a monocloud proposal, because they want your business – even though they know the obstacles you will eventually hit.
- Discrete, project-based deployment transformations leverage some shared services, but application stacks and business units have their own cloud accounts/subscriptions/projects (under the central organization/tenant). This cloud-native approach avoids many problems of monolithic deployment, but brings its own costs and complexities. Managing large numbers of cloud environments (hundreds are typical, and thousands are very real) requires deep expertise and proper tooling. The flexible nature of software defined networks in the cloud creates a complex problem, especially when different projects need to talk to each other and back to non-cloud resources which many enterprises never move to the cloud.
Snap Migration
Sitting in the back of the conference room, Bill whispered to Clarice, “No way. No forking way. This is NOT going to end well.”
“Do they have any friggin’ idea how bad this is?” she replied. “How do they possibly expect us to move 3 entire data centers in 18 months up to Amazon… I can’t even get a VM for testing in less than 3 months!”
“I get they want out of our crappy contract before the renewal, but this is insane.”
“Heh,” huffed Bill, “Maybe they’ll finally approve those cloud training classes we’ve been asking for.”
“Yeah right,” she replied sarcastically, “like they’ll train us instead of throwing cash at some consultant.”
Snap migrations are the worst of all worlds. Massive projects driven by hard deadlines, they are nearly always doomed to some level of failure. In our experience the decision-makers behind these projects rarely understand the complexity of migrating to cloud, and are overly influenced by executive sales teams and consultants. That isn’t to say they are always doomed to complete failure, but the margins are thin and you will be walking a tightrope of risks.
There is a subset of this pattern for more limited projects which don’t encompass absolutely everything. For example imagine the same contract renewal drive, but for a subsidiary or acquisition rather than the entire organization. The smaller the scale the lower the risk.
- Size: Mid to large. You need to be big enough to have data centers, but not so big that you own the real estate they sit on. A defining characteristic is that these projects are often driven by contract renewals on hosted or managed data centers. That’s why there’s a hard deadline… management wants out, as much as possible, before they get locked into the next 7-year renewal.
- Vertical: Organizations across all verticals find themselves in hosting contracts which they want out of, so this affects all types of organizations. We even know of projects in highly regulated financial services which you’d think would never accept this level of risk. Government is the least likely, and tends to be driven more by whichever political appointee decides they want to shake things up.
- Speed: 18-24 months for the first phase. We rarely see less than 12 months. Sometimes there will be a shorter initial push to get out of at least some centers as the contract moves into a month-by-month extension phase.
- Risk: As high as it gets in every possible way. Organizations falling into this pattern might have some internal cloud experience, but as a rule not enough people with enough depth to support the needed scale. There is heavy reliance on outside help, but few consulting firms (or cloud providers themselves) have a deep bench of solid experts who can avoid all the common pitfalls.
- Security: Security is engaged somewhat but can’t do anything to slow things down. They are also typically tasked with building out their own shared services so likely don’t have the manpower to evaluate individual projects. They tend to trail behind deployments, trying to assess and clean things up after the fact. They often get to set a few hardline policies up front (typically relating to Internet accessibility), but until they stand up their own monitoring and enforcement capabilities, things slip through the cracks.
- Network Ops: There is a bit more variability here, depending on the deployment style. If there is a monocloud (or small number of environments), networking is typically engaged early and plays a very strong role in getting things set up. They are tasked with configuring the larger pipes needed for such large migrations. The risk is that they often lack cloud experience, and introduce designs which work well in a data center but fit poorly in cloud deployments.
- Tooling: Panic is the name of the game. The initial focus is on the tools at hand, and vendors already in place, combined with cloud-native tools. We hate to say it, but this can be deeply influenced not only by culture but by which consultants are already in the door. Eventually the project starts introducing more cloud-native tooling to solve specific problems. For example in projects we’ve seen, visibility (cloud assessment and mapping) tools tend to be early buys.
- Budget Owner: This can be poorly defined, but often pulls from a central IT budget or specially designated project budget. Whoever controls the money has the most influence on the project. The chances of success go up when all teams are properly funded and staffed. Also, water is wet.
Risks abound, but we can categorize them based on project characteristics:
- As any IT pro knows, every project of this scale runs over time and budget.
- There is often a reliance on outside contractors who push things along quickly, but don’t know (or care) enough to have a sense of the enterprise risk. Their job is to get things moved – not necessarily to do so the safest way. This can lead to exposure as they accept risks a company employee might avoid.
- Security often lacks general cloud security knowledge, as well as provider and platform experience. They can build this but it takes time, and in the process the organization will likely accumulate technical security debt. For example two of the most common flaws we find on assessments are overly-privileged IAM and poorly segregated networks.
- Rapidly designing a cloud network at scale is difficult and complex, especially for a team who is still keeping the existing environment running, and (like security) probably lacks deep cloud experience. We often see one or two members of a team tasked as cloud experts, but this really isn’t enough. With the time constraints of the project, the network often ends up poorly compartmentalized, and projects tend to be shoveled into shared VPCs/VNets in ways which later run up against service limits, performance problems, and other constraints.
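To make the "overly-privileged IAM" flaw mentioned above more concrete, here is a minimal sketch of the kind of check an assessment might start with. It assumes policies written in the AWS IAM JSON policy layout (`Statement`, `Effect`, `Action`, `Resource`). The sample policy and bucket name are hypothetical, and this only catches the bluntest case, an `Allow` statement with a wildcard action on a wildcard resource.

```python
import json

# Hypothetical policy document, invented for illustration.
POLICY_JSON = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Allow", "Action": "*", "Resource": "*"},
    {"Effect": "Deny", "Action": "iam:*", "Resource": "*"}
  ]
}
"""


def _as_list(value):
    """IAM policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_overly_broad_statements(policy):
    """Return indexes of Allow statements granting every action on every resource."""
    findings = []
    statements = _as_list(policy.get("Statement", []))
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions and "*" in resources:
            findings.append(index)
    return findings


if __name__ == "__main__":
    policy = json.loads(POLICY_JSON)
    for index in find_overly_broad_statements(policy):
        print(f"Statement {index} allows all actions on all resources")
```

A real review would also look at service-level wildcards, conditions, and which principals the policy is attached to. The point of the sketch is only that this class of technical debt is detectable early if someone is tasked with looking.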
Native New Build
John was actually excited when he walked into the meeting room. It had been a long time since he got the chance to stretch his security-creativity muscles. He nodded to Maria from networking as he pulled out an open Aeron-knockoff chair and dropped his new matte-black Surface on the conference table. He still wasn’t sure what stickers to throw on it, but after a little burn-in period during the Azure training he and Maria just finished up, he was getting used to working off an underpowered device. He still had his old desktop for handling all the data center thick clients, but for this Azure project all he needed was a web browser and some PowerShell. Although he kind of envied the consultants with their brand new MacBooks.
“Hey everyone,” Wendy started, “we have a tight timeline but we finally have approval for the new Azure subscription. Maria, can you get the network connected up?”
“Actually, I don’t think we need to. John signed off on using JIT connections and client VPNs instead of requiring a dedicated backhaul. We went through the architecture and there aren’t any dependencies on internal resources. We know we’ll need more of a hybrid design for the CRM project but we are free and clear for this one.”
Native new build projects are true cloud-native deployments. That doesn’t mean they are all brand new projects – this pattern also includes refactors and rearchitectures, so long as the eventual product design is cloud native. These may also include hybrid deployments – the new build may still need connections back to the premises.
- Size: All sizes. In a large enterprise this will likely be a designated project or a series of projects (especially in a “new to cloud” organization). In a small startup the entire company could be a new build.
- Vertical: All verticals, even government and highly regulated industries. We have worked on these projects with financials, state governments, and even public utilities.
- Speed: As “fast as DevOps”. We don’t mean that facetiously – some teams are faster and some slower, but we nearly always see DevOps techniques used, and they often define the overall project pace. These are developer-driven projects.
- Risk: We will talk more about risk in a moment, but here we just note that risk is highly variable, and depends on the skills and training of the project team.
- Security: Unlike our previous example, security may be late to the project. There is usually an inflection point, when the project is getting close to production, when security gets pulled in. Before that the developers themselves manage most security. Organizations are more likely to struggle with this on their early projects, but tend to integrate security earlier as more and more moves to cloud, and as skills and staffing improve.
- Network Ops: Networking is more likely to be engaged early if there are hybrid connectivity requirements, or may not be involved at all, depending on the overall architecture. These days we see a growing number of serverless (or mostly serverless) deployments where there isn’t even a project network and all the components talk to each other within the “metastructure” of the cloud management plane.
- Tooling: Typically newer, cloud native, and often CSP provided tools. Quite a few of these projects start in their own cloud silo and use the cloud provider’s tooling, but as more of these projects deploy there is an increased requirement for central management and shared services (such as security assessment) to be added. We do sometimes see development teams forced to use traditional on-premise tools, but this tends to be culturally driven, as it isn’t usually the best solution to the problems at hand.
- Budget Owner: Project based. Once you do enough of these there will also be shared services budgets for teams like security and networking.
Despite our optimistic opening, these projects also come with their own risks. Cloud native deployments aren’t risk-free by any stretch of the imagination – they just carry different risks. The project may be well-segregated in its deployment environment (an Azure subscription in our example) but that doesn’t mean developers won’t be over-provisioned. Or that a PaaS endpoint in the VNet won’t ignore the expected Network Security Group rules (yes, that happens).
- This pattern can carry all the risks of the developer-led pattern if it is poorly governed. We have seen large organizations running dozens or hundreds of these projects, all poorly governed, each carrying tons of risks. If you read about a big cloud breach at a client who was proudly on stage at their cloud provider’s conferences, odds are they are poorly governed internally.
- Cloud native services have different risks which take time to understand and learn to manage. In our data center migration patterns there is less reliance on the latest and greatest “serverless this, AI that”, so traditional security and management techniques can be more effective. With native new builds you may be using services the cloud provider itself barely understands.
- Friction between security and the project team can badly impact the final product. Overly proscriptive security pushes teams toward workarounds. Early partnering, ideally during initial development of the architecture, with security pros trained on the cloud platform, reduces risk.
- Managing a lot of these projects at scale is really, really hard. Setting up effective shared services and security and network support (especially when hybrid or peered networks are involved) takes deep expertise. The cloud providers are often terrible at helping you plan for this – they just want you to move as many workloads onto them as quickly as possible.