As we wrap up our series on Data Security in the SaaS age, let’s work through a scenario to show how these concepts apply in practice. We’ll revisit the “small, but rapidly growing” pharmaceutical company we used as an example in our Data Guardrails and Behavioral Analytics paper. The CISO has seen the adoption of SaaS accelerate over the past two years. Given the increasing demand to work from anywhere at all organizations, the CTO and CEO have decided to minimize on-premise technology assets.
A few years ago they shifted their approach to use data guardrails and behavioral analytics to protect the sensitive research and clinical trial data generated by the business. But they still need a structured program and appropriate tools to protect their SaaS applications. With hundreds of SaaS applications in use and many more coming, it can be a bit overwhelming to the team, who needs to both understand their extended attack surface and figure out how to protect it at scale. With guidance from their friends at Securosis, they start by looking at a combination of risk (primarily to high-profile data) and broad usage within the business, as they figure out which SaaS application to focus on protecting first.
The senior team decides to start with CRM. Why? After file storage/office automation, CRM tends to be the most widespread application, representing the most sensitive information stored in a SaaS application: customer data. They also have many business partners and vendors accessing the data and the application, because they have multiple (larger) organizations bringing their drugs to market; they want to make sure all those constituencies have the proper entitlements within their CRM. Oh yeah, and their auditors were in a few months back, and suggested that assessing their SaaS applications needs to be a priority, given the sensitive data stored there.
As we described in our last post, we’ll run through a process to determine who should use the data and how. For simplicity’s sake, we’ll generalize and answer these questions at a high level, but you should dig down much deeper to drive policy.
- What’s the data? The CRM has detailed data on all the doctors visited by the sales force. It also contains an extract of prescribing data to provide results to field reps. The CRM has data from across the globe, even though business partners distribute the products in non-US geographies, to provide an overview of sales and activity trends for each product.
- Who needs to see the data? Everyone in the company’s US field organization needs access to the data, as well as the marketing and branding teams focused on targeting more effective advertising. Where it gets a little squishy is the business partners, who also need access to the data. But multiple business partners are serving different geographies, so tagging is critical to ensure each customer is associated with the proper distribution partner. Federated identity allows business partner personnel to access the CRM system, with limited privileges.
- What do they need to do with the data? The field team needs to be able to create and modify customer records. The marketing team just needs read-only access. Business partners update the information in the CRM but cannot create new accounts. That happens through a provider registration process to ensure multiple partners don’t call on the same doctors or medical offices. Finally, doctors want to see their prescribing history so they need access as well.
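The answers above translate directly into access rules. Here is a minimal sketch of how the partner-tagging requirement might look in code; the roles, permission sets, and customer fields are invented for illustration, not drawn from any real CRM schema:

```python
# Hypothetical access rules derived from the three questions above.
# Role permission sets: field reps create/modify, marketing is
# read-only, partners update but never create accounts.
FIELD_TEAM = {"read", "create", "update"}
MARKETING = {"read"}
PARTNER = {"read", "update"}  # account creation happens via registration

def partner_can_see(partner_region, customer):
    """A partner only sees customers tagged to their own region."""
    return customer.get("partner_region") == partner_region

# Illustrative customer records with the region tag that associates
# each doctor with the proper distribution partner.
customers = [
    {"id": 1, "doctor": "Dr. A", "partner_region": "APAC"},
    {"id": 2, "doctor": "Dr. B", "partner_region": "EMEA"},
]

visible = [c["id"] for c in customers if partner_can_see("APAC", c)]
```

The point of the sketch is the tagging: without a reliable region tag on every customer record, there is no attribute to hang the partner entitlement on.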
If the team were starting from scratch, they would enumerate and build out the policies from whole cloth, and then deploy the CRM with the right rules the first time. But that train has already left the station. Thousands of people (internal, business partners, and customers) already access the CRM system, so the first order of business is a quick assessment of the SaaS application’s current configuration.
They didn’t have the internal bandwidth to perform the assessment manually during the timeframe required by the auditors, so they engaged a consulting firm which leveraged a SaaS management tool for the assessment. What they found was problematic. The initial entitlements allowed medical practices to access their prescribing history. But with overly broad privileges, any authorized user for a specific medical practice could see their entire customer record — which included not just the history of all interactions, but also notes from the sales rep. And let’s just say some of the reps were brutally honest about what they thought of some doctors.
Given the potential to upset important customers, it’s time to hit the fire alarm and kick in the damage control process. The internal IT team managing the CRM took a quick look and realized the access rule change happened within the last 48 hours. And only a handful of customers accessed their records since then. They reverted to the more restrictive policy, removed access to the affected records, and asked some (fairly irate) VPs to call customers to smooth over any ruffled feathers. The cardiologist who probably should have taken their own advice about health and fitness appreciated this gesture (and mentioned enjoying the humble pie).
There were a few other over-privileged settings, but they mostly affected internal resources. For example the clinical team had access to see detailed feedback on a recent trial, even though company policy is only to share anonymized information with clinicians. Though not a compliance issue, this did violate internal policy. They also found some problems with business partner access rules, as business partners in Asia could see all the accounts in Asia. They couldn’t make changes (such as reassigning doctors to other partners), but partners should only see the data for doctors they registered.
The other policies still reflect current business practices, so after addressing these issues, the team felt good about their security posture.
But, of course, they cannot afford to get too comfortable given the constant flow of new customers, new partners, and new attacks. The last aspect of the SaaS data security program is monitoring. It’s essential to monitor the SaaS application and gain understanding of how each change impacts security posture. It’s also critical to have clear notification and automated remediation processes for specific issues. Before claiming victory over the security of the CRM application and moving on to the next SaaS application, the team also needs to make sure there is a schedule of periodic assessment to revisit entitlements.
In our pharmaceutical company the security and ops teams got together to define and document ongoing assessment and remediation. A set of issues were identified as never acceptable, including things such as privilege escalation to make customer changes (without authorization), downloading clinical data, and providing global access to customer data, among others. For never-acceptable issues, the security team is empowered to make necessary changes to protect data. Period. In some situations you take action immediately and ask for permission later, because critical data is at risk.
For other issues with less risk, the ops team set an expectation with the security team for what information they needed to assess urgency and make appropriate fixes. They also set SLAs for response to issues received, based on the criticality of the data.
And with that we wrap up our series on Data Security in the SaaS Age. Since we first published Tidal Forces back in early 2017, it’s become clear that SaaS will supplant pretty much all back-office applications. That has come to fruition, and if anything will accelerate now that on-premise resources are mostly out of reach. So many of our tried and true approaches to data security must evolve, and we hope we have laid out a path which will help you get there.
Our last post in Data Security in a SaaS World discussed how the use and sharing phases of the (frankly partially defunct) Data Security Lifecycle remain relevant. That approach hinges on a detailed understanding of each application to define appropriate policies for what is allowed and by whom. To be clear, these are not – and cannot be – generic policies. Each SaaS application is different and as such your policies must be different, so you (or a vendor or service provider) need to dig into it to understand what it does and who should do it.
Now the fun part. The typical enterprise has hundreds, if not thousands, of SaaS services. So what’s the best approach to secure those applications? Any answer requires gratuitous use of many platitudes, including both “How do you eat an elephant? One bite at a time.” and that other old favorite, “You can’t boil the ocean.”
Whichever pithy analogy you favor for providing data security for SaaS, you need to think small, by setting policies to protect one application or service at a time. We’re looking for baby steps, not big bangs. The big bang killed initiatives like DLP. (You remember DLP, right?) Not that folks don’t do DLP successfully today – they do – but if you try to classify all the data and build rules for every possible data loss… you’ll get overwhelmed, and then it’s hard to complete the project.
We’ve been preaching this small and measured approach for massive, challenging projects like SIEM for years. You don’t set up all the SIEM rules and use cases at once – at least not if you want the project to succeed. The noise will bury you, and you’ll stop using the tool. People with successful SIEM implementations under their belts started small with a few use cases, then added more once they figured out how to make those first few work.
The Pareto principle applies here, bigtime. You can eliminate the bulk of your risk by protecting 20% of your SaaS apps. But if you use 1,000 SaaS apps, you still need to analyze and set policies for 200 apps – a legitimately daunting task. We’re talking about a journey here, one that takes a while. So prioritization of your SaaS applications is essential for project success.
We’ll also discuss opportunities to accelerate the process later on — you can jump the proverbial line with smart technology use.
The first SaaS app you run through the process should be an essential app with pretty sensitive data. We can bet it will be either your office suite (Office365 or G Suite), your CRM tool (likely Salesforce), your file storage service (typically Dropbox or Box), or your ERP or HR package (SAP, Workday, or Oracle).
These applications represent your most sensitive data, so you’ll then want to maximize risk mitigation. Start with the app with the most extensive user base. We’ll illustrate the process with CRM. We get going by answering a few standard questions:
- What’s the data? Your CRM has all your marketing and sales data, including a lot of sensitive customer/prospect data. It may also have your customer support case data, which is pretty sensitive.
- Who needs to see the data? Define who needs to see the data, and use the groups or roles within your identity store – no reason to reinvent the wheel. We discussed the role of federation in our previous post, and this is why. Don’t forget to consider external constituencies – auditors, contractors, or even customers.
- What do they need to do with the data? For each role or group, figure out whether they need to read, write, or otherwise manage data. You can get more specific and define different rights for different data types as required. For example, finance people may have read access to the sales pipeline, while sales operations folks have full access.
Do you see what we did there? We just built a simple entitlement matrix. That wasn’t so scary, was it?
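That entitlement matrix can be captured in a few lines. This is a hypothetical sketch mapping roles to data types to allowed operations, using the finance/sales operations example from above; every role and data type name here is an assumption for illustration:

```python
# A minimal entitlement matrix: role -> data type -> allowed operations.
# Mirrors the example above: finance gets read access to the pipeline,
# sales operations gets full access.
ENTITLEMENTS = {
    "sales":     {"pipeline": {"read", "write"}, "support_cases": {"read"}},
    "finance":   {"pipeline": {"read"}},
    "sales_ops": {"pipeline": {"read", "write", "manage"}},
}

def allowed(role, data_type, operation):
    """Default deny: anything not in the matrix is refused."""
    return operation in ENTITLEMENTS.get(role, {}).get(data_type, set())
```

Documenting the matrix in a structured form like this (rather than prose) also makes the later steps – loading policies into the app and auditing them afterward – far easier.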
Once you have the entitlement matrix documented, you write the policies. At that point, you load your policies into the application. Then wash, rinse, and repeat for the other SaaS apps you need to protect.
Each SaaS app will have a different process to implement these policies, so there isn’t a whole lot of leverage to be gained in this effort. But you probably aren’t starting from scratch either. A lot of this work happens when deploying the applications initially. Hopefully, it’s a matter of revisiting original entitlements for effectiveness and consistency. But not always. To accelerate a PoC, the vendor uses default entitlements, and the operations team doesn’t always revisit them when the application goes from testing into production deployment.
Once the entitlements are defined (or revisited), and you’ve implemented acceptable policies in the application, you reach the operational stage. Many organizations fail here. They get excited to lock things down during initial deployment but seem to forget that moves, adds, and changes happen every day. New capabilities get rolled out weekly. So when they periodically check policies every quarter or year, they are surprised by how much changed and the resulting security issues.
So continuous monitoring becomes critical to maintain the integrity of data in SaaS apps. You need to watch for changes, with a mechanism to ensure they are authorized and legitimate. It sounds like a change control process, right? What happens if the security team (or even the IT team in some cases) doesn’t operate these apps?
We’ve seen this movie before. It’s like dealing with an application built in the cloud by a business unit. The BU may have operational responsibilities, but the security team should assume responsibility for enforcing governance policies. Security needs access to the SaaS app to monitor changes and ensure adherence to policy.
And that’s the point. Security doesn’t need to have operational responsibilities for SaaS applications. But they need to assess the risk of access when building the entitlement matrix and to monitor the application to ensure changes don’t violate policies or add attack surface.
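Conceptually, the monitoring job is a diff: compare the application's current entitlements against the approved baseline and flag any drift for review. A toy sketch, with entirely hypothetical data structures:

```python
def drift(baseline, current):
    """Return (role, permission) pairs granted beyond the approved baseline."""
    findings = []
    for role, perms in current.items():
        extra = perms - baseline.get(role, set())
        for p in sorted(extra):
            findings.append((role, p))
    return findings

# Illustrative snapshots: someone quietly added export rights to the
# partner role since the baseline was approved.
baseline = {"partner": {"read"}}
current = {"partner": {"read", "export"}}
```

Here `drift(baseline, current)` surfaces the unauthorized `export` grant; in practice the snapshot would come from the SaaS app's admin API and each finding would feed the change-control process rather than a list.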
Buy vs. Build
As described above, thinking small and building access rules and monitoring for each SaaS app will be resource-intensive. So is there a scalable way to secure SaaS apps? SaaS providers may provide deployment tools to define access rules quickly and efficiently. It’s in their interest to get your organization up and running as quickly as possible.
You could also send alerts about entitlement changes to your SIEM or security analytics tool to monitor for potentially malicious activity. Once you’ve identified a problem, it can be sent to your work management system (ServiceNow, Jira, etc.) to apply the requested changes. Or you can DIY (Do It Yourself).
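If you go the DIY route, the glue is usually just a structured alert. A sketch of packaging an entitlement-change finding as a generic JSON payload a SIEM or ticketing integration could consume; the field names are assumptions, not any vendor's actual schema:

```python
import json

def make_ticket(app, role, change, severity="high"):
    """Package an entitlement-change finding as a generic JSON payload.

    Field names are illustrative; map them to whatever your SIEM or
    work management system (ServiceNow, Jira, etc.) actually expects.
    """
    return json.dumps({
        "source": "saas-entitlement-monitor",
        "app": app,
        "role": role,
        "change": change,
        "severity": severity,
    })

payload = make_ticket("CRM", "partner", "export permission added")
```

The heavy lifting is not the payload – it's deciding which findings rate a ticket versus an automated rollback, which is the policy work described throughout this series.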
Alternatively, you can accelerate your security process by looking at SaaS management tools that have already done the work to understand each SaaS app. These tools provide a quick assessment of current entitlements, and then both manage and monitor access rules going forward. They also provision and deprovision users from the SaaS application, and stay on top of frequent changes within the app.
For example, when Salesforce rolls out a new service (or more likely acquires it and bundles it into your package), how should that impact your entitlements? DIY requires that you evaluate the new service yourself, while a vendor should have those changes enumerated in their tool sooner than you can.
Are these tools a panacea? Of course not – no tool offers everything you need for every application you need to protect. It comes down to resource allocation and risk management. If you have the resources to do it yourself, have at it. And if you aren’t worried about losing data from a particular SaaS app, don’t worry about it. It’s not like you don’t have other worries.
But if your security team remains resource-constrained and a SaaS app stores sensitive data, a SaaS management tool warrants investigation. Here are a few things to consider:
- Application support: Does the tool support your top 10-15 applications? Over time the library of supported applications should grow, but you need to protect your highest-profile applications initially. Understand how well the vendor’s SDK can support new and currently unsupported apps, and how well the vendor will support the SDK.
- Granularity of policy: Does the tool offer the ability to set sufficiently detailed policies for the application? Can you restrict access to certain parts of the application? Are geographic limitations and device restrictions supported? Many policy knobs are available – make sure you can adequately protect your applications.
- Supported Use Cases: Does the tool offer the capabilities you need? Consider the critical use cases of access control, data security, and provisioning/deprovisioning; and understand how the application’s API capabilities will constrain your use cases.
- Integration with existing Ops tools: Make sure the tool will integrate with your SIEM/monitoring platform, work management tool (to track service tickets), and an automation tool.
The build versus buy decision is challenging in an early market. The tool you like may only support a handful of the SaaS applications you use, so its immediate value may be limited. But don’t just consider today’s capabilities – also think about the 18-24 month roadmap for new application support, and how confident you are in the vendor hitting those milestones. Or you can always build it yourself.
We’ll wrap up this series in our next post by detailing a quick win, where we apply these concepts to a simplified but realistic scenario to show how the approach could work out for you.
As we launched our series on Data Security in the SaaS Age, we described the challenge of protecting data as it continues to spread across dozens (if not hundreds) of different cloud providers. We also focused attention on the Data Breach Triangle, as the best tool we can think of to keep focused on addressing at least one of the underlying prerequisites for a data breach (data, exploit, and exfiltration). If you break any leg of the triangle, you stop the breach.
The objective of this research is to rethink data security, which requires us to revisit where we’ve been. That brings us back to the Data Security Lifecycle, which we last updated in 2011, in parts one, two, and three.
At the highest level, the Data Security Lifecycle lays out six phases from creation to destruction. We depict it as a linear progression, but data can bounce between phases without restriction, and need not pass through all stages (for example, not all data is eventually destroyed).
- Create: This is probably better called Create/Update because it applies to creating or changing a data/content element, not just a document or database. Creation is generating new digital content or altering/updating of existing content.
- Store: Storing is the act of committing digital data to some sort of storage repository, and typically occurs nearly simultaneously with creation.
- Use: Data is viewed, processed, or otherwise used in some sort of activity.
- Share: Exchange of data between users, customers, or partners.
- Archive: Data leaves active use and enters long-term storage.
- Destroy: Permanent destruction of data using physical or digital means such as crypto-shredding.
With this lifecycle in mind, you can evaluate data and make decisions about appropriate locations and access. You need to figure out where the data can reside, which controls apply to each possible location, and how to protect data as it moves. Then go through a similar exercise to specify rules for access, determining who can access the data and how. And your data security strategy depends on protecting all critical data, so you need to run through this exercise for every important data type.
Then dig another level down to figure out which functions (such as Access, Process, Store) apply to each phase of the lifecycle. Finally, you can determine which controls enable data usage for which functions. Sound complicated? It is, enough that it’s impractical to use this model at scale. That’s why we need to rethink data security.
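The arithmetic makes the impracticality concrete. This back-of-the-envelope sketch uses invented but plausible numbers to show how quickly the policy space explodes:

```python
# Illustrative only: how many policy decisions the full lifecycle model
# demands for a single application. All counts are assumptions.
data_types = 50   # distinct sensitive data types in one app
phases = 6        # Create, Store, Use, Share, Archive, Destroy
functions = 3     # Access, Process, Store

policy_decisions = data_types * phases * functions
# 900 decisions for one application; multiply by hundreds of SaaS apps,
# then map controls to each decision, and manual upkeep collapses.
```

Even with generous simplification, the model produces hundreds of decisions per application before a single control is mapped, which is why we argue below for narrowing focus to the phases you actually influence.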
Self-flagellation aside, we can take advantage of the many innovations we’ve seen since 2011 in the areas of application consumption and data provenance. We are building fewer applications and embracing SaaS. For the applications you still build, you leverage cloud storage and other platform services. So data security is not entirely your problem anymore.
To be clear, you are still accountable to protect the critical data – that doesn’t change. But you can share responsibility for data security. You set policies but within the framework of what your provider supports. Managing this shared responsibility becomes the most significant change in how we view data security. And we need this firmly in mind when we think about security controls.
Adapting to What You Control
Returning to the Data Breach Triangle, you can stop a breach by eliminating the data to steal, stopping the exploit, or preventing egress/exfiltration. In SaaS you cannot control the exploit, so forget that. You also probably don’t see the traffic going directly to a SaaS provider unless you inefficiently force all traffic through an inspection point. So focusing on egress/exfiltration probably won’t suffice either.
That leaves you to control the data. Specifically to prevent access to sensitive data, and restrict usage to authorized parties. If you prevent unauthorized parties from accessing data, it’s tough for them to steal it. If we can ensure that only authorized parties can perform certain functions with data, it’s hard for them to misuse it. And yes – we know this is much easier said than done.
Restated, data security in a SaaS world requires much more focus on access and application entitlements. You handle it by managing entitlements at scale. An entitlement ensures the right identity (user, process, or service) can perform the required function at an approved time. Screw this up and you don’t have many chances left to protect your data, because you can’t see the network or control application code.
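An entitlement as described above can be modeled as a small tuple: identity, function, and an approved time window. A purely illustrative data model, with invented identities and windows:

```python
from datetime import time

# Each entitlement: (identity, function, window_start, window_end).
# Identities and functions are hypothetical examples.
ENTITLEMENTS = [
    ("svc-reporting", "read:pipeline", time(0, 0), time(23, 59)),
    ("contractor-42", "read:accounts", time(9, 0), time(17, 0)),
]

def permitted(identity, function, at):
    """True only if this identity may perform this function right now."""
    return any(
        i == identity and f == function and start <= at <= end
        for i, f, start, end in ENTITLEMENTS
    )
```

Notice the default-deny shape: an identity/function pair not in the list is simply refused, which is the posture you want when entitlements are your last meaningful control.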
If we dig back into the traditional Data Security Lifecycle, the SaaS provider handles a lot of these functions – including creation, storage, archiving, and destruction. You can indeed extract data from a SaaS provider for backup or migration, but we’re not going there now. We will focus on the Use and Share phases.
This isn’t much of a lifecycle anymore, is it? Alas, we should probably relegate the full lifecycle to the dustbin of “it seemed like a good idea at the time.” The modern critical requirements for data security involve setting up data access policies, determining the right level of authorization for each SaaS application, and continuously monitoring and enforcing policies.
The Role of Identity in Data Protection
You may have heard the adage that “Identity is the new perimeter.” Platitudes aside, it’s basically true, and SaaS data security offers a good demonstration. Every data access policy associates with an identity. Authorization policies within SaaS apps depend on identity as well.
Your SaaS data security strategy hinges on identity management, like most other things you do in the cloud. This dependency puts a premium on federation because managing hundreds of user lists and handling the provisioning/deprovisioning process individually for each application doesn’t scale. A much more workable plan is to implement an identity broker to interface with your authoritative source and federate identities to each SaaS application. This becomes part of your critical path to provide data security. But that’s a bit afield from this research, so we need to leave it at that.
Data Guardrails and Behavioral Analytics
If managing data security for SaaS applications boils down to being able to set and enforce policies for each SaaS app, your efforts require a clear understanding of each specific application, so you can set appropriate access and authorization policies. Yes, SaaS vendors should make that easy, but in reality… not so much.
But setting policies is only the first step. Environments change regularly, as new modules become operational and new users onboard. Policies change frequently, which creates an opportunity for mistakes and attacks.
That all brings us to Data Guardrails and Behavioral Analytics. We first introduced this concept back in late 2018. For the backstory, you can take a deep dive into the concept in two parts. But here’s a high-level recap of the concepts:
- Data Guardrails: We see Guardrails as a means of enforcing best practices without slowing down or impacting typical operations. Typically used within the context of cloud security (like, er, DisruptOps), a data guardrail enables data to be used in approved ways while blocking unauthorized use. To bust out an old network security term, you can think of guardrails like “default-deny” for data. You define acceptable practices and don’t allow anything else.
- Data Behavioral Analytics: Many of you have heard of UBA (User Behavioral Analytics), which profiles all user activity and then monitors for anomalous activities which could indicate an insider threat. But what if you turned UBA inside out and focused on data? Using similar analytics you could profile all data usage in your environment, and then look for abnormal patterns that warrant investigation.
The key difference is that data guardrails leverage this knowledge with deterministic models and processes to define who can do what and stop everything else. Data behavioral analytics extends the analysis to include current and historical activity, using machine learning algorithms to identify unusual patterns that bypass other data security controls.
It’s not an either/or choice, as we figure out how to enforce data security policies in all these SaaS environments. You start by defining data guardrails, which provide you the ability to establish allowed activities in each SaaS application for each user, group, and role. Then you monitor data usage for potentially malicious activities.
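The two mechanisms can be sketched side by side. This is a toy illustration under invented policies and thresholds: the guardrail is a deterministic default-deny check, while the behavioral piece flags usage that deviates far from its historical baseline:

```python
# Guardrail: explicit allow-list of (role, action) pairs; everything
# else is blocked. Roles and actions are hypothetical.
ALLOWED = {("analyst", "read"), ("analyst", "export_summary")}

def guardrail(role, action):
    """Deterministic default deny for data actions."""
    return (role, action) in ALLOWED

def anomalous(records_accessed, baseline_mean, tolerance=3.0):
    """Crude behavioral check: flag access volume far above the
    historical mean. Real tools use ML models, not a fixed multiplier."""
    return records_accessed > baseline_mean * tolerance
```

The guardrail stops what you know is wrong; the behavioral check catches activity the guardrail allows (a legitimate analyst reading records) but at a volume that suggests exfiltration.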
We favor this approach to dealing with SaaS (and even most cloud-based data usage) because it enables you to focus on what you control. You still think about data through its lifecycle – but your responsibilities within the lifecycle have changed with shared responsibilities.
Our next post will dig into how to implement this kind of approach in a SaaS world. To give you a little hint, you’ll need to think small for a huge impact.
Between Mira and me, we have 5 teenagers. For better or worse, the teenage experience of the kids this year looks quite a bit different; thanks COVID! They haven’t really been able to go anywhere, and although things are loosening up a bit here in Atlanta, we’ve been trying to keep them pretty isolated. To the degree we can.
In having the kids around a lot more, you can’t help but notice both the subtle and major differences. Not just in personality, but in interests and motivation.
Last summer (2019) was a great example. Our oldest, Leah, was around after returning from a trip to Europe with her Mom. (Remember when you could travel abroad? Sigh.) She’s had different experiences each summer, including a bunch of travel and different camps. Our second oldest (Zach) also spent the summer in ATL. But he was content to work a little, watch a lot of YouTube, and hang out with us. Our third (Ella) and fifth (Sam) went to their camps, where each has been going for 7-8 years. It’s their home and their camp friends are family. And our fourth (Lindsay) explored Israel for a month. Many campers believe in “10 for 2.” They basically have to suffer through life for 10 months to enjoy the 2 months at camp each year. I think of it as 12 for 2 because we have to work hard for the entire year to pay for them to go away.
Even if all of the kids need to spend the summer near ATL, they’ll do their own thing in their own way. But that way is constantly evolving. I’ve seen the huge difference 6 months at college made for Leah. I expect a similar change for Z when he (hopefully) goes to school in the fall. As the kids get older, they learn more and inevitably think they’ve figured it out. Just like 19-year-old Mike had all the answers, each of the kids will go through that invincibility stage.
The teenage years are challenging because even though the kids think they know everything, we still have some control over them. If they want to stay in our home, they need to adhere to certain rules and there is (almost) daily supervision. Not so much when they leave the nest, and that means they need to figure things out – themselves. I have to get comfortable letting them be and learning lessons. After 50+ years of screwing things up, I’ve made a lot of those mistakes (A LOT!) and could help them avoid a bunch of heartburn and wasted time.
But then I remember I’ve spent most of my life being pretty hard-headed, and that I didn’t listen to my parents trying to tell me things either. I guess I shouldn’t say didn’t, because I’m not sure if they tried to tell me anything. I wasn’t listening.
The kids have to walk their own path(s). Even when it means inevitable failure, heartbreak, and angst. That’s how they learn. That’s how I learned. It’s an important part of the development process. Life can be unforgiving at times, and shielding the kids from disappointment doesn’t prepare them for much of anything.
The key is to be there when they fall. To help them understand what went wrong and how they can improve the next time. If they aren’t making mistakes, they aren’t doing enough. There should be no stigma attached to failing, only to quitting. If they are making the same mistakes over and over again, then I’m not doing my job as a parent and mentor.
I guess one of the epiphanies I’ve had over the past few years is that my path was the right path. For me. I could have done so many things differently. But I’m very happy with where I am now and am grateful for the experiences, which have made me. That whole thing about being formed in the crucible of experience is exactly right.
So that’s my plan. Embrace and celebrate each child’s differences and the different paths they will take. Understand that their experiences are not mine and they have to make and then own their choices, and deal with the consequences. Teach them they need to introspect and learn from everything they do. And to make sure they know that when they fall on their ass, we’ll be there to pick them up and dust them off.
Photo credit: “Sakura Series” originally uploaded by Nick Kenrick
Securosis has a long history of following and publishing on data security. Rich was the lead analyst on DLP about a zillion years ago during his time with Gartner. And when Securosis first got going (even before Mike joined), it was on the back of data security advisory and research. Then we got distracted by this cloud thing, and we haven’t gone back to refresh our research, given some minor shifts in how data is used and stored with SaaS driving the front office and IaaS/PaaS upending the data center (yes, that was sarcasm). We described a lot of our thinking on the early stages of this transition in Tidal Forces 1 and Tidal Forces 3, and it seems (miraculously) a lot of what we expected 3 years ago has come to pass.
But data security remains elusive. You can think of it as a holy grail of sorts. We’ve been espousing the idea of “data-centric security” for years, focusing on protecting the data, which then allows you to worry less about securing devices, networks, and associated infrastructure. As with most big ideas, it seemed like a good idea at the time.
In practice, data-centric security has been underwhelming: having security policy and protection travel along with the data proved too much to ask, as data spreads to every SaaS service you know about (and a bunch you don’t know about). How did Digital Rights Management work out at scale? Right.
The industry scaled back expectations and started to rely on techniques like tactical encryption, mostly using built-in capabilities (FDE for structured data, and embedded encryption for file systems). These provided a path of least resistance to satisfy compliance requirements, and to “feel” that the data was protected. Though to be clear, this was mostly security theater, as compromising the application still provided unfettered access to the data.
Other techniques, like masking and tokenization, also provided at least a “means” to shield the sensitive data from interlopers. New tactics like test data generation tools also provide an option to ensure that developers don’t inadvertently expose production data. But even with all of these techniques, most organizations still struggle with protecting their data. And it’s not getting easier.
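To make these techniques concrete, here is a minimal sketch of masking versus tokenization (the function names and in-memory vault are illustrative, not any specific product): masking permanently obscures most of a value, while tokenization swaps it for a random surrogate that only the vault’s owner can map back.

```python
import secrets

# Illustrative only: a real tokenization vault would be a hardened,
# access-controlled datastore, not an in-memory dict.
_token_vault = {}

def mask(value: str, visible: int = 4) -> str:
    """Static masking: hide all but the last few characters."""
    return "*" * (len(value) - visible) + value[-visible:]

def tokenize(value: str) -> str:
    """Replace a sensitive value with a random, reversible surrogate token."""
    token = "tok_" + secrets.token_hex(8)
    _token_vault[token] = value
    return token

def detokenize(token: str) -> str:
    """Map a token back to the original value; requires access to the vault."""
    return _token_vault[token]
```

A masked card number keeps only its last four digits, so a compromised test database leaks nothing useful; a token can still be exchanged for the real value, but only through the vault, which is why the vault becomes the thing you protect.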
The Data Breach Triangle
Back in 2009, we introduced a concept called The Data Breach Triangle, which gave us a simple construct to enumerate a few different ways to stop a data breach. You need to break one of the legs of the triangle.
- Data: The equivalent of fuel – information to steal or misuse.
- Exploit: The combination of a vulnerability and an exploit path which allows an attacker unapproved access to the data.
- Egress: A path for the data to leave the organization. It could be digital, such as a network egress, or physical, such as portable storage or a stolen hard drive.
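The triangle’s logic can be sketched in a few lines (the class and field names are mine, purely illustrative): a breach requires all three legs, so breaking any single one stops it.

```python
from dataclasses import dataclass

@dataclass
class BreachTriangle:
    data: bool     # sensitive information worth stealing or misusing
    exploit: bool  # a way for the attacker to gain unapproved access
    egress: bool   # a path for the data to leave the organization

    def breach_possible(self) -> bool:
        # A breach needs all three legs; break any one and the breach fails.
        return self.data and self.exploit and self.egress
```

So outbound filtering or DLP (egress gone) stops the breach even when the exploit succeeded, which is exactly the point of the model.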
Most of the modern-day security industry has focused on stopping the exploit, either by blocking delivery of the exploit (firewall/IPS) or preventing compromise of the device (endpoint protection). There have also been attempts to stop the egress of sensitive data via outbound filters/firewalls/web proxies or DLP.
As described above, attempts to protect or shield the data have been hard to scale. So what do we get? Consistent breaches. Normalized breaches. To the point that an organization losing tens of millions of identities no longer even registers as news.
SaaS exacerbates the issue
Protecting data continues to get more complicated. SaaS has won. As we described in Tidal Forces, SaaS is the new front office. If anything, the remote work phenomenon driven by the inability to congregate in offices safely will accelerate this trend.
Protecting data was hard enough when we knew where it was. I used to joke about how unsettling it was back in 1990 when my company outsourced the mainframe, and it was suddenly in Dallas rather than in our building in Arlington, VA. But at least all of our data was in one place. Now, most organizations have dozens (or hundreds) of different organizations controlling critical corporate data. Yeah, the problem isn’t getting easier.
Rethinking Data Security
What we’ve been doing hasn’t worked. Not at scale anyway. We’ve got to take a step back and stop trying to solve yesterday’s problem. Protecting data by encrypting it, masking it, tokenizing it, or putting a heavy usage policy around it wasn’t the answer, for many reasons. The technology industry has rethought applications and the creation, usage, and storage of data. Thus, we security people need to rethink data security for this new SaaS reality. We must both rethink the expectations of what data security means, as well as the potential solutions. That’s what we’ll do in this blog series Data Security for the SaaS Age.
We haven’t been publishing as much research over the past few years, so it probably makes sense to revisit our Totally Transparent Research methodology. We’ll post all of the research to the blog, and you can weigh in to let us know we’re full of crap, or that we’re missing something rather important. Comments on this post are good, or reach out via email or Twitter.
Once we have the entire series posted and have gathered feedback from folks far smarter than us, we’ll package up the research as a paper and license it to a company to educate its customers. In this case, we plan to license the paper to AppOmni (thanks to them), although they can decide not to license it at the end of the process – for any reason. This approach allows us to write our research without worrying about anyone exerting undue influence. If they don’t like the paper, they don’t license it. Simple.
In the next post, we focus on the solution, which isn’t a product or a service; rather it’s a process. We update the Data Security Lifecycle for modern times, highlighting the need for a systematic approach to identifying critical data and governing the use of that data in the various SaaS applications in use.
Do you ever play those wacky question games with your friends? You know, where the questions try to embarrass you and make you say silly things? I was never much of a game player, but sometimes it’s fun. At some point in every game, a question about your favorite physical feature comes up. A lot of people say their eyes. Or their legs. Or maybe some other (less obvious) feature. It would also be interesting to ask your significant other or friends what they thought. I shudder to think about that. But if you ask me, the answer is pretty easy.
It’s my hair. Yeah, that sounds a bit vain, but I do like my hair. Even though it turned gray when I was in my early 30s, that was never an impediment. It probably helped early in my career, as it made me seem a bit older and more experienced, even though I had no idea what I was doing (I still don’t). The only issue that ever materialized was when I first started dating Mira (who also has great hair). She showed my picture to her daughter (who was 12 at the time), and she asked, “why are you dating that old guy?” That still cracks me up.
This COVID thing has created a big challenge for me. I usually wear my hair pretty short, trimmed with a clipper on the sides, and styled up top. But for a couple of months, seeing my stylist wasn’t an option. So my hair has grown. And grown. And grown. As it gets longer, it elevates. It’s like a bird’s nest elevation. You know, like losing your keys in there elevation. I could probably fit a Smart Car in there if I don’t get it cut at some point soon.
If I’m going to grow my hair out, I want to have Michael Douglas’s hair. His hair is incredible, especially during his Black Rain period. The way his hair flowed as he was riding the motorcycle through Tokyo in that movie. It was awesome, but that is not to be. My destiny is to have big bird nest hair.
Mira told me to shave it off. I have a bunch of friends that have done the home haircut, and it seems to work OK. I learned that a friend of mine has been doing his hair at home for years. And he looks impeccable even during the pandemic. I’m a bit jealous.
I even bought a hair clipper to do it myself. I figured I’d let one of the kids have fun with it, and it would make for a fun activity. What else are we doing? The clipper is still in its packaging. I can’t bring myself to use it. Even if the self-cut turned out to be a total fiasco, my hair grows so fast it would only take a few weeks to grow out. So we aren’t talking about common sense here. There is something deeper in play, which took me a little while to figure out.
I used to wear my hair very short in college during my meathead stage. So it’s not that I’m scared of really short hair. Then I remembered the one time I did a buzz cut as an adult. It was the mid-90s when I was 60 lbs heavier and into denim shirts. Yes, denim shirts were cool back then, trust me. So combine a big dude with a buzz cut in a denim shirt, and then one of my friends told me I looked like Grossberger from Stir Crazy, that was that. No more buzz cut. Clearly, I’m still scarred from that.
I guess I have a bit of a Samson complex. It’s like I’ll lose my powers if I get a terrible haircut. I’m not sure what powers I have, but I’m not going to risk it. I’ll just let the nest keep growing. Mira says she likes it, especially when I gel my hair into submission and comb it straight back. I call it the poofy Gekko look. But I fear the gel strategy won’t last for much longer. By the end of the day, the top is still under control, but my sides start to go a little wacky, probably from me running my hands through my hair throughout the day. I kind of look like Doc Brown from Back to the Future around 6 PM. It’s pretty scary.
What to do? It turns out hair salons were among the first businesses to reopen in Georgia. So I made an appointment for mid-June to get a cut from my regular stylist. Is it a risk? Yes. And I’ve never checked her license, but I’m pretty sure her name isn’t Delilah. The salon is taking precautions. I’ll be wearing a mask, and so will she. We have to wait outside, and she cleans and disinfects everything between customers.
It’s a risk that I’m willing to take. Because at some point, we have to return to some sense of normalcy. And for me, getting my hair cut without risking a Grossberger is the kind of normalcy I need.
The pandemic is hard on everyone. (says the Master of the Obvious) It’s a combination of things. There are layers of fear — both from the standpoint of the health impact, as well as the financial challenges facing so many. We cannot underestimate the human toll, and unfortunately, the US has never prioritized mental health. As I mentioned last week in my inaugural new Insight, I’m not scared for myself, although too many people I care about are in vulnerable demographics. I’m lucky that (at least for now) the business is OK. I work in an industry that continues to be important and for a company that is holding its own.
But it’s hard not to let the fear run rampant. The Eastern philosophies teach us to stay in the moment. To try to focus on what’s right in front of you. Do not fixate on decisions made or roads not taken. Do not think far ahead about all of the things that may or may not come to pass. Stay right here in the experience of the present. And I try. I really try to keep the things I control at the forefront.
Yet there is so much I don’t control about this situation. And that creates a myriad of challenges. For example, I don’t control the behavior of others. I believe the courteous thing to do now is wear a mask when in public. There are certainly debates about whether the masks make a real difference in controlling the spread of the novel coronavirus.
But when someone near me is wearing a mask, it’s a sign (to me anyway) that they care about other people. Maybe I’m immunocompromised (thankfully I’m not). Maybe I live with someone elderly. They don’t know. The fact is they likely don’t have the infection. But perhaps they do. It’s about consideration, not about personal freedoms. I have the right to approach someone sitting nearby and fart (from 6 feet away, of course). But I don’t do that because it’s rude. I put wearing a mask into the same category.
But alas, I don’t control whether other people wear masks. I can only avoid those that don’t. NY Governor Andrew Cuomo said it pretty well.
I don’t control who takes isolation seriously and who doesn’t. Many people have decided to organize small quarantine pods that isolate together and see no one else. This arrangement requires discipline and trust, and doesn’t scale much past 2 or 3 families. Being in a blended household means my pod was defined for me: there’s my household, and the households of both of our former spouses.
It’s hard to keep everyone in sync. My kids were staying with their Mom in the early days of quarantine. But my son was seeing other kids in the neighborhood. Not a lot, but a few. And supposedly those kids were staying isolated – until they weren’t. One of the neighbors had a worker in the house and then had a visitor who was a healthcare professional in Canada. Sigh. So he goes into isolation for two weeks, and I can’t see my kids.
Then my former spouse got religion about isolation and decided that she wasn’t comfortable with my pod, which includes Mira’s former spouse. She doesn’t know him, and in this situation, trust is challenging. Sigh. Another six weeks of not seeing my kids. Mira and I have done a few social distance walks with them, but it’s hard. You wonder if they are too close. So we adapted and set up chairs in a parking lot and hung out. It’s tough. All I wanted to do was hug my kids, but I couldn’t.
To be clear, in the grand scheme of things, this is a minor problem. A point in time that will pass. Maybe in 6 months, or maybe in a year. But it will pass. And I’ve got it good, given my health and ability to still work. Many people don’t. They may be alone, or they may not have a job. Those are big problems. But I also don’t want to minimize my experience. It sucks not to be able to parent your kids.
It’s getting more complicated by the day. Things in Georgia (where I live) are opening up. Many of the kids’ friends are getting together, and the reality is that we can’t keep them isolated forever. So their Mom and I decided we would keep things locked down through the end of May and then revisit our decision in June. My kids could stay with me for a little while. And that happened last week. When I went over to pick them up, I was overcome. It was only a hug, but it felt like a lot more than that. Over the past week, I got to wake them up, pester them to do online classes, eat with them, and sit next to them as we watched something on Netflix.
We were going to figure out week by week where the kids would stay. I’m not going anywhere, so that would work great. But the best-laid plans… I found out that my oldest is seeing her friends. And isn’t socially distancing. Sigh. She’s an adult (if you call 19 an adult), and she made the decision. I’m unhappy but trying to be kind. I’m trying to understand her feelings as her freshman year in college abruptly ended. She went from the freedom of being independent (if you call college independent living) to being locked up in her Mom’s house. When you are 19, you don’t really think about the impact of your actions on other people. You can get depressed and forget about the rules, willing to do almost anything just to take a drive with a couple of friends.
And now the other house where my kids live is no longer in my pod. One of the kids is with me, and she’ll stay for a couple of weeks. But after that, we have to go back to isolation. It’ll be no more hugs for a while. And it makes me sad.
Photo credit: “Hug?” originally uploaded by Simon Hayhurst
It’s a sunny late spring day. Mike steps into the dank building and can smell the must. It feels old but familiar. Strangely familiar. The building looks the same, but he knows it’s different. Too much time has passed. He steps into the confessional and starts to talk.
Mike: Forgive me. It’s been almost 3 and a half years since I’ve been here. I’d say it was because I have been busy, which I have. But it’s not that. I spent close to 13 years here, and I had gone through a pretty significant personal transformation. As I was navigating the associated transitions, I guess I just wanted to live a bit and integrate a lot of the lessons I’ve learned behind the scenes for a while.
Confessor: OK. That seems reasonable. How’s that been going?
Mike: Pretty good, I’d say. I mentioned my new love (her name is Mira). We got married in mid-2017. I packed my oldest daughter off to college last August, and my step-son leaves for his college, hopefully at the end of this summer. We’ve got a wonderful blended family, and we’ve made some close friends as well. Physically I’m good too. I’ve been able to maintain my fitness through intense workouts (thanks to OrangeTheory) and use the time in class as my mindfulness practice. And I just try to improve a little bit each day and live my life with kindness and grace.
Confessor: How’s work going? You mentioned being busy, but what does that mean? Everyone is busy.
Mike: That’s a good point. Culturally there is some kind of weird incentive to be busy. Or to look busy, anyway. Rich and I have been grinding away. Adrian decided to move on last December, so we’ve just kept pushing forward. Evidently cloud security is a thing, so we’ve benefited from being in the right place at the right time. But we spend a lot of time thinking about how work changes and the impact to security. We don’t quite know what it will look like, but we’re pretty sure it accelerates a lot of the trends we’ve been talking about for the past 5 years. I’m also happy to say DisruptOps is doing well (we closed a Series A back in late February). I guess I’m just grateful. I work with great people and I can still pay the bills, so no complaints.
Confessor: Hmmm. So you are in a good spot personally and the business is doing well. It seems that you used the time away from here productively. Why come back now?
Mike: I found that being here was a way of documenting my journey, for me. And that many of the people here enjoyed it and learned a thing or two. The fact is we are in the midst of a very uncertain time. Our society has undergone shocks to the system and we’re all trying to figure out what a “new normal” looks like. I don’t have any answers, to be clear, but I want to share my fears, my hopes, and my experiences and hope that we’ll all navigate these challenging and turbulent waters together.
Confessor: Fear. That’s a good place to start. What are you scared of?
Mike: Simply put, that COVID-19 impacts people that I love. We’ve been lucky so far, taking the quarantine seriously, but I’m not taking that for granted, and I continue to stay inside. Good thing I can come here virtually. Strangely enough, I have little fear regarding my own physical well-being. I made a deal with Mira that we’d be together for at least 44 years, and I plan to make good on that deal. But our parents are old and, in some cases, immunocompromised. We can’t control what other people do, or whether they respect the threat or the science. So it’s definitely scary.
Confessor: How are you holding up mentally?
Mike: It’s tough. My head was spinning. I was consumed by the news and reacting to most every Tweet. It wasn’t productive. So I’ve started seated meditation again. I just needed to shut down my thoughts, even for a short time, and open up to possibility. To get into the habit of controlling my thoughts, my outlook, and my mood. Meditation helps me do that. And it’s hard to not be able to do the things we love and have no idea when things will return to some semblance of normal. You know, doing simple things that I took for granted, like travel. Mira and I love to travel and we’re very fortunate to go on very cool trips. We can’t see shows or live sports for the time being. That sucks. I also value the time I can spend with clients and at conferences. Who knew that the RSA Conference would be the last time many of us will travel for business for who knows how long? But you make the best of it.
Confessor: We’ve changed a lot in the time that you were away. There are new people here. Some have moved on.
Mike: It’s not like I’m the same person either. We’re all constantly changing. The goal is to navigate change in the most graceful way possible. I like to think my changes have been positive. I don’t need to act like a grump anymore, I was happy to leave that aspect of my persona behind. I think there is also something to be said about the wisdom of experience. I don’t claim to be wise, but I have a lot of experience. Mostly screwing things up. Hopefully, I’ll be able to continue sharing that experience here and we can learn together. We’re in uncharted territory and that can be pretty exciting if you are open to the inevitable changes ahead.
Confessor: So when will you be back? And I suspect it won’t look the same, will it?
Mike: You are pretty perceptive. I always enjoyed that about being here. I’m going to aim to visit twice a month. Maybe more often when I have a lot to say. Maybe a little less often at times too. And yes, it will look a bit different. First off, I’m changing the name. Kind of. When I retired a few years ago, it was because the term incite didn’t fit anymore. But the idea of providing insight does. It’s really what I want to do. So that’s what we’ll call my periodic visits. So welcome back to the Insight.
Confessor: I have to say, I’m glad you’re back. It’s been way too long…
Mike: Thanks. It’s nice to be home.
Photo credit: “seeking confession” from Chris Booth
Although this is a security blog, this post has absolutely nothing to do with security. No parallels from medicine, no mindset lessons, just some straight-up biology. As many readers know, I am a licensed Paramedic. I first certified in the early 1990s, dropped down to EMT for a while, and bumped back up to full medic two years ago. Recently I became interested in flight and critical care, and completed an online critical care and flight medic course from the great team at FlightBridgeED. Paramedics don’t normally work with ventilators; it is an add-on skill specific to flight and critical care (ICU) transports. I’m a neophyte at ventilator management, with online and book training but no practice, but I understand the principles, and thanks to studying molecular biology back in college I have a decent understanding of cellular processes.
COVID-19 dominates all our lives now, and rightfully so. Ventilators are now a national concern and one the technology community is racing to help with. Because of my background I’ve found myself answering a lot of questions on COVID-19, ARDS, and ventilators. While I’m a neophyte at running vents, I’m pretty decent at translating complex technical subjects for non-experts. Here’s my attempt to help everyone understand things a little better.
The TL;DR is that COVID-19 causes damage to the lungs which, for some people, triggers the body to over-react with too much inflammation. The resulting extra fluid gets in the way of gas exchange in your lungs, and oxygen can’t get into your bloodstream as easily. You don’t actually stop breathing, so we use ventilators to change the pressures and oxygen levels, to try to diffuse more oxygen through this barrier and into your bloodstream without causing more damage by over-inflating the lungs.
We start with respiration
Before we get into COVID and ventilators we need to understand a little anatomy and physiology.
Cells need oxygen to convert fuel into energy. Respiration is the process of getting oxygen into cells and taking waste products, predominantly CO2, out. We get oxygen from our environment and release CO2 through ventilation, which is just air moving in and out of our lungs. Those gases are moved around in our blood, and the actual gas exchange occurs in the super-small capillaries that basically wrap around our cells. The process of getting blood to tissues is called perfusion.
These are all just technical terms to say that our lungs take in oxygen and release carbon dioxide, our circulatory system moves the gases around, and the gases are exchanged in and out of our cells down in those super-small capillaries. Oxygen is a toxin, and CO2 diffused in blood is an acid, so our body has all sorts of mechanisms to keep things running. Everything works thanks to diffusion and a few gas laws (Graham’s, Henry’s, and Dalton’s being the big ones).
Our lungs have branches and end in leaves called alveoli. Alveoli are pretty wild; they have super-thin walls to allow gases to pass through, and are surrounded by capillaries to transfer the gases into and out of our blood. They look like clumps of bubbles, since they maximize surface area to facilitate the greatest amount of gas exchange in the smallest amount of space. Healthy alveoli are covered with a thin liquid called surfactant that keeps them lubricated, so they can open and close and slide around each other as we breathe. Want to know one reason smokers and vapers have bad lungs? All those extra chemicals muck up the surfactant, thicken the cell walls, and cause other damage, and a bunch of the alveoli clump together and lose surface area, in a process called atelectasis (remember that word for a few paragraphs).
Our body always wants to keep things in balance, and has a bunch of tools to nudge things in different directions. The important bit for our discussion today is that ventilation is managed using how much we breathe in with a given breath (tidal volume), and how many times a minute we breathe (respiratory rate). This combination is called our minute ventilation, and it’s about 6-8 liters per minute. This matches our circulation (cardiac output), which is around 5 liters per minute at rest. The amount of oxygen delivered to our cells is a combination of our cardiac output and the amount of oxygen in our blood.
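As a quick back-of-the-envelope sketch (the specific numbers here are mine, chosen only to fall inside the typical ranges above):

```python
def minute_ventilation_liters(tidal_volume_ml: float, respiratory_rate: float) -> float:
    """Minute ventilation = tidal volume x respiratory rate, converted to liters/minute."""
    return tidal_volume_ml * respiratory_rate / 1000.0

# An illustrative 500 mL tidal volume at 14 breaths per minute
# works out to 7.0 L/min, squarely in the typical 6-8 L/min range.
```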
We need good gas exchange with our environment, good gas exchange into our bloodstream, and good gas exchange into our cells. COVID-19 screws up the gas exchange in our lungs and everything falls apart from there.
Acute Respiratory Distress Syndrome
ARDS is basically your body’s immune system gone haywire. It starts with lung damage, which can be from an infection, trauma, or even metabolic causes. (One of the big issues with ventilators is that the wrong settings can actually cause ARDS.) That damage triggers an inflammatory response. A key piece of inflammation is that various chemical mediators alter cell walls, especially those of the capillaries, which start leaking fluid. In the lungs this causes a nasty cascade:
- Fluid leaks from the capillaries and forms a barrier/buffer of liquid between the alveoli and the capillaries, separating them. This makes gas exchange harder.
- Fluid leaks into the alveoli themselves, further inhibiting gas exchange.
- The cells are damaged by all this inflammation, triggering an even stronger immune response. Your body is now in a vicious cycle, making things worse by trying to make them better.
- This liquid, and a bunch of the inflammatory chemicals, dilute the surfactant and damage the alveolar walls, causing atelectasis. In later stages of ARDS your body starts throwing in additional tools which effectively stuff up the lungs and put even more barriers in place for gas diffusion.
- The flood of all these inflammatory mediators and other cells/chemicals your body needs for its immune response can cause other issues and shortages throughout the rest of the body.
The net result is that gas exchange is MUCH harder, and your body is fighting against itself, making things worse. ARDS is REALLY HARD to reverse, since we need to try to keep gas exchange running while the body chills the hell out. If you do survive, you probably have lung damage that will take a long time to recover from. And let’s be clear: I am skipping some major aspects of ARDS to keep things as simple as possible.
COVID-19 and ARDS
SARS-CoV-2 attacks the lungs. This can trigger ARDS, but as you can see from what I described it… might not. It all comes down to your biology, how bad your infection is, and how your body responds.
If you are infected and have shortness of breath that could be because of the direct infection itself, like any other respiratory infection. Your gas exchange is still probably decent but you are exhausted due to less oxygen and your body’s general immune response.
At some point ARDS might kick in. ARDS can be super rapid, which explains some people feeling sick but okay and then dying that night. Based on my current reading we don’t know all the risk factors to go from COVID-19 to ARDS. ARDS is the main killer with COVID-19.
Mechanical Ventilation for ARDS
The primary treatment for ARDS is mechanical ventilation. But a specific way of using ventilators! This is the confusing bit I see lost in a lot of non-medical-professional discussions.
The most basic ventilation settings are tidal volume and respiratory rate (that “minute ventilation” we talked about), and with them we are just replicating a person breathing. That isn’t enough to treat ARDS.
ARDS is a failure of gas diffusion, not ventilation. We are moving air but the oxygen isn’t getting into the blood, and the carbon dioxide isn’t getting out. It’s because all that swelling creates a shunt that blocks the gas exchange.
We fix this using two things. First, we increase expiratory pressure, called “PEEP” (positive end-expiratory pressure): basically, we keep extra pressure in the lungs so they never fully deflate. This keeps as many alveoli open as possible, maximizing surface area, and adds extra pressure to try to “push” the gas into the bloodstream. We still need to watch our overall pressure, since too much just irritates the lungs more and makes the problem worse. See the complexity here? We need more pressure to improve gas diffusion (notice I’m skipping over CO2 for now; it’s a factor, but O2 is the bigger issue). Keeping a bit of constant pressure also reduces the movement of the alveolar walls, reducing irritation and injury.
The second thing we do is increase the amount of oxygen in each breath. But not too much, since if we hit 100% in the bloodstream all the extra oxygen is toxic.
For ARDS it’s really important to keep track of a range of pressures beyond PEEP. The idea is to run a lower tidal volume with a higher respiratory rate to reduce further lung injury, but use higher expiratory pressure to keep the alveoli inflated and gas exchanging. It would be a LOT easier if we could just ramp all the pressures up and push the oxygen through all the extra fluid and into the bloodstream more forcefully, but this just exacerbates the problem in ARDS. Instead we balance pressures to keep as much pressure in the lungs as long as possible, WITHOUT stretching the lungs and causing more damage. Aside from PEEP we can control the driving pressure (how hard the push is), the overall Mean Airway Pressure (the pressure average from breath to breath), inspiration and expiration times, inspiratory flow rate, and more. All of these are adjusted to try and recruit as many alveoli as possible and diffuse as much gas as possible, without creating further injury.
Think back on everything I just described: at no point did I say “and then the patient stops breathing”. We use different sedation and pain management strategies, but unless we paralyze the patient (which we try to minimize) they will also be breathing spontaneously. Ventilators use sensors to detect this and, depending on the mode, support spontaneous breaths. This is also important for getting the patient off the vent and breathing on their own.
The important piece to remember is that when treating ARDS with mechanical ventilation we are really supporting respiration, which is the gas exchange. We use different features of ventilators to diffuse as much oxygen as possible into the bloodstream (and eventually our cells) while trying to minimize additional damage. COVID-19 sometimes causes ARDS which creates a “shunt” by separating the capillaries from the alveoli with fluid, plus leaking fluid into the lungs, all of which gets in the way of oxygen molecules getting into the bloodstream and then being delivered to cells.
Hit me up if you have any other questions and I’ll do my best to respond. Hopefully I hit the right balance of medical jargon and layperson terms to make this a little bit more understandable.
This is the third post in our series, “Network Operations and Security Professionals’ Guide to Managing Public Cloud Journeys”, which we will release as a white paper after we complete the draft and have some time for public feedback. You might want to start with our first and second posts. Special thanks to Gigamon for licensing. As always, the content is being developed completely independently using our Totally Transparent Research methodology.
Learning cloud adoption patterns doesn’t just help us identify key problems and risks – we can use them to guide operational decisions to address the issues they consistently raise. This research focuses on managing networks and network security, but the patterns include broad security and operational implications which cover all facets of your cloud journey. Governance issues aside, we find that networking is typically one of the first areas of focus for organizations, so it’s a good target for our first focused research. (For the curious, IAM and compliance are two other top areas organizations focus on, and struggle with, early in the process).
Recommendations for a Safe and Smooth Journey
Mark sighed with relief and satisfaction as he validated that the VPN certs had propagated, and approved the ticket for the firewall rule change. The security group was already in good shape, and they had managed to avoid adding any kind of direct connect to the AWS account for the formerly-rogue project.
He pulled up their new cloud assessment dashboard, and all the critical issues were cleared. It would still take the IAM team and the project’s developers a few months to scale down unneeded privileges, but… not his problem. The federated identity portal was already hooked up, and he would get real-time alerts on any security group changes.
“Now onto the next one,” he mumbled after he glanced at his queue and lost his short-lived satisfaction.
“Hey, stop complaining!” remarked Sarah, “We should be clear after this backlog now that accounting is watching the credit cards for cloud charges; just run the assessment and see what we have before you start complaining.”
Having your entire organization dragged into the cloud thanks to the efforts of a single team is disconcerting, but not unmanageable. The following steps will help you both wrangle the errant project under control, and build a base for moving forward. This was the first adoption pattern we started to encounter a decade ago as cloud started growing, so there are plenty of lessons to pull from. Based on our experiences, a few principles really help manage the situation:
- Remember that to fit this pattern you should be new either to the cloud in general, or to this cloud platform specifically. These are not recommendations for unsanctioned projects covered by your existing experience and footprint.
- Don’t be antagonistic. Yes, the team probably knew better and shouldn’t have done it… but your goal now is corrective actions, not punitive.
- Your goal is to reduce urgent risks while developing a plan to bring the errant project into the fold.
- Don’t simply apply your existing policies and tooling from other environments to this one. You need tooling and processes appropriate for this cloud provider.
- In our experience, despite the initial angst, these projects are excellent opportunities to learn your initial lessons on this platform, and to start building out for a larger supported program. If you keep one eye on immediate risks and the other on long-term benefits, everything should be fine.
The following recommendations go a long way towards reducing risks and increasing your chances of success. But before the bullet points we have one overarching recommendation: As you gain control over the unapproved project, use it to learn the particulars of this cloud provider and build out your core cloud management capabilities. When you assess, set yourself up to support your next ten assessments. When you enable monitoring and visibility, do so in a way which supports your next projects. Wherever possible build a core service rather than a one-off.
- Step one is to figure out what you are dealing with:
- How many environments are involved? How many accounts, subscriptions, or projects?
- How are the environments structured? This involves mapping out the application, the PaaS services offered by the provider (such as load balancers and serverless capabilities), the IAM configuration, the network(s), and the data storage.
- How are the services configured?
- How are the networks structured and connected? The Software Defined Networks (SDN) used by all major cloud platforms only look the same on the surface – under the hood they are quite a bit different.
- And, most importantly, where does this project touch other enterprise resources and data? This is essential for understanding your exposure. Are there unknown VPN connections? Did someone peer through an existing dedicated network pipe? Is the project talking to an internal database over the Internet? We’ve seen all these and more.
- Then prioritize your biggest risks:
- Internet exposures are common and one of the first things to lock down. We commonly see resources such as administrative servers and jump boxes exposed to the Internet at large. In nearly every single assessment we find at least one instance or container with port 22 exposed to the world. The quick fix is to lock these down to your known IP address ranges. Cloud providers’ security groups simply drop any traffic which doesn’t meet the rules, which makes them an extremely effective security control and a better first step than trying to push everything through an on-premise firewall or virtual appliance.
- Identity and Access Management is the next big piece to focus on. This research is focused more on networking, so we won’t spend much time on this here. But when developers build out environments they almost always over-privilege access to themselves and application components. They also tend to use static credentials, because unsanctioned projects are unlikely to integrate into your federated identity management. Sweep out static credentials, enable federation, and turn on MFA everywhere you can.
- Misconfigurations of cloud services are next. Public storage buckets, unsecured API gateways, and other services which are Internet exposed but won’t show up if you only look at the virtual networks.
- After cleaning those up it’s time to start layering in longer-term remediations and your gameplan. This is a huge topic so we will focus on network management and security:
- On early discovery of developer-led projects, it is very common to want to tie the errant cloud account back into your on-premise network for connectivity and management. This instinct is usually wrong. Networking wasn’t involved at the start, so it is unlikely there is an established network connection, and adding one won’t necessarily provide any benefits. If the account is okay on its own, leave it. While outside the scope of this research, a wide range of techniques is available to provide necessary services to disconnected cloud accounts… or cloud-native connections such as service endpoints which achieve the same goals without the heavy lifting on fat pipes and CIDR segmenting.
- We aren’t suggesting you don’t manage the network – we are saying you don’t need to simply wire it up to your existing infrastructure to manage it or the resources it contains.
- A big complication in integrating an unplanned SDN is its existing IP addressing (if there is even a virtual network – a real question thanks to the new serverless architectures). This may be further motivation to keep it as a separate enclave.
- We assume you followed our advice above and locked down the perimeter. Now it’s important to fully map out all the internal connections, including connections between different virtual networks and accounts which are peered or otherwise connected using cloud-native techniques such as service endpoints.
- One of the most common networking mistakes seen in these kinds of projects is too-open internal networks. Clouds default to least privilege, but it is still all too easy to just open everything up to reduce friction during development. Use your map to start compartmentalizing internally. This may include network structure changes (routing and subnet modifications) which are easier to update with API calls and console clicks than stringing wires between routers.
- Security groups should reference each other (in Azure you need Application Security Groups) instead of relying on IP addressing for internal cloud connections. This is fundamental to cloud networks, but not where people with traditional network security backgrounds tend to start.
- Virtual security appliances (we are mostly talking about firewalls, IDS, and IPS) should only be used when security groups and native cloud capabilities can’t meet your needs. Virtual appliances are expensive to run because cloud providers charge for their compute cycles, and they create unnecessary chokepoints which affect performance and reliability. The most common situations where you still need them are FQDN-based outbound filtering, and specific blocklists which are hard or impossible to enforce with cloud-native security groups.
- Lastly, once everything is in a known good state, you should implement continuous configuration assessments and guardrails to keep things that way. For example in a production application you should generate an alert for any security group change, creation of new internet gateways, and other structural changes. All providers support monitoring these changes but you will likely need third-party tooling to pull the results together across providers and accounts.
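The triage steps above can be sketched in code. This hypothetical Python helper scans security group rules in the shape returned by boto3’s `describe_security_groups` and flags anything open to the entire Internet on sensitive ports; the function name and port list are illustrative, not a real API.

```python
# Sketch: flag security group rules exposed to the entire Internet.
# Operates on rule dicts shaped like boto3's describe_security_groups
# output; the helper and its port list are illustrative assumptions.

SENSITIVE_PORTS = {22, 3389, 3306, 5432}  # SSH, RDP, MySQL, PostgreSQL

def world_open_rules(security_groups):
    """Return (group_id, port) pairs reachable from 0.0.0.0/0."""
    findings = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            open_to_world = any(
                r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
            )
            if not open_to_world:
                continue
            lo = perm.get("FromPort", 0)
            hi = perm.get("ToPort", 65535)
            for port in SENSITIVE_PORTS:
                if lo <= port <= hi:
                    findings.append((sg["GroupId"], port))
    return findings

sgs = [{
    "GroupId": "sg-0123",
    "IpPermissions": [{
        "FromPort": 22, "ToPort": 22,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
}]
print(world_open_rules(sgs))  # → [('sg-0123', 22)]
```

In practice you would feed this from the provider’s API across every account you discover, and treat each finding as a candidate for restricting to known IP ranges.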
Overall the key to handling this situation is to avoid panic, focus on obvious risks first, and then take your time to sweep through the rest in as cloud and provider specific a way as possible. Use it as a base to build your program, understanding that you will need to make short-term sacrifices to handle any significant exposures.
Data Center Transformation
Sarah snagged an extra chair outside Mark’s cubicle as he shoved a pile of office detritus to the side to make space for her laptop.
“Okay,” she started, “I published the PrivateLink endpoint for the log receiver and set the internal domain name, but I need you to open up the security group and approve my PR on the CloudFormation templates to deploy all the service endpoints into the VPCs.”
“No problem,” he replied, “we got approval from the cloud team last week so we’re good to go. Do we need to talk to the server image team to embed the DNS for the log agents?”
“They already have it and are publishing the new base AMIs that need it. We think most teams will just set their agents to save instance and container logs directly to S3, but some of the legacy stuff still needs to push them over the network. We are also letting teams use their own PrivateLink addresses if they want to swap out a local collector.”
“Nice,” said Mark, “this will really help us drop some peering connections on the transit gateway. And I’m meeting with the database team next week to see if we can start moving them over.”
Large multi-year data center moves are some of the most complex projects in information technology. Moving everything from one physical location to another is a massive undertaking. Doing so while keeping services up and running, without shutting the business down (planned or unplanned), is even more so. Swapping to an entirely different technology foundation at the same time? That can be the definition of insanity, yet every single organization of any size does it at some point.
The most common mistakes we see involve shoehorning traditional architectural and security concepts into the cloud – which can lead to extended timelines, increased costs, and long-term management issues. A few principles will keep you moving in the right direction:
- If you are bad at network management and security in your existing data center, you will be pleasantly surprised at how little changes in cloud. Look at cloud as an opportunity to do things better, ideally in a cloud-native way. Don’t just bring across your existing practices without change – especially bad habits.
- Time is your friend. Don’t rush, and don’t let your cloud provider push you into moving faster than you are comfortable with. Their priorities are not yours.
- Don’t assume your existing tools and processes will work well in the cloud. We see many organizations bring things across due to employee familiarity, or because they already have licenses. Those aren’t the best reasons to deploy something in an entirely new operating environment.
- That said, these days many products offer extensions for the cloud. You should still evaluate them instead of assuming they will meet your needs in the cloud, but they might be a useful bridge.
- Learn first, move second. Take the time (if you have it) to hire and build the skills needed to operate on your new platform. You absolutely cannot expect your existing team to handle both the current environment and cloud if you don’t give them the time to learn the skills and do the job.
In the developer-led pattern we had to balance closing immediate risks against simultaneously building support for an entirely new operating environment and preparing for long-term support. Scary and difficult, but also usually self-constrained to something manageable like a single application stack. In a data center transformation the challenges are scale, a complete transition to a new environment, and any need to carry over legacy resources not designed to run in the cloud.
- Start by building your plan:
- You will not want to run everything in a single huge account/subscription/project on just 1-3 virtual networks. This is all too common, and it falls apart within 18-24 months due to service limits, differences in how cloud networks work, and cloud-native application requirements.
- You will want multiple cloud environments (accounts/subscriptions/projects are the terms used by different providers) and very likely multiple virtual networks in each environment. These are needed for blast radius control, managing service limits, and limiting IAM scopes.
- Map out your existing applications and environments (networks, cross-app connectivity, associated security controls, and related supporting services such as DNS and logging). Create a registry and then prioritize and sequence your moves.
- Map out your application dependencies. You might have 50 applications which all connect back to a shared customer database. This directly impacts how you structure your accounts, virtual networks, and connectivity options.
- Design a flexible architecture. Think of it as a scaffold to build on as you pull project by project into the cloud. You don’t need to definitively plan out every piece of the migration before you move, unless you really like spending massive amounts of money on project managers and consultants.
- Then start building the scaffold:
- Start with foundational shared services you will need across all your cloud environments: logging/monitoring receivers, cloud assessment (cloud security posture management), cloud automation (including cloud detection and response), other visibility/monitoring tools, and IAM.
- You will likely need at least one transit network (a central virtual network used to peer your other virtual networks, even across cloud environments). Design this network (in its own account) for transit only… don’t design it to contain any actual resources (except possibly some shared services).
- Many shared services work better as “endpoint services” which are published within the cloud provider but don’t require network peering outside. Implementation is quite different at each cloud provider, so we can’t get more specific in this research, but endpoint services really enable you to take advantage of cloud software defined networks, and reduce reliance on fixed IP addresses and traditional network segmentation.
- Build infrastructure as code templates for your “landing zones” for the new accounts you will create for various projects. These can and should embed foundational security controls, such as links to transit networks and endpoint services (as appropriate), baseline network security controls, and implementation of the assessment, monitoring/logging, visibility, automation, IAM, and other core tools you use to track each of your environments.
- Don’t forget, these are just pointers to get you started – we aren’t trying to downplay the complexity of such projects.
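To make the “landing zone” idea concrete, here is a minimal sketch of assembling a baseline as a CloudFormation-style document in Python. The resource names and property values are placeholders; a real template would also embed your transit attachments, logging destinations, IAM baseline, and monitoring hooks.

```python
import json

# Sketch: assemble a minimal "landing zone" baseline as a
# CloudFormation-style document. Resource names and property values
# are placeholders, not a complete or production-ready template.

def landing_zone_template(project, cidr):
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Baseline landing zone for {project}",
        "Resources": {
            "ProjectVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": cidr},
            },
            # Baseline network visibility from day one; a real FlowLog
            # also needs a log destination and delivery permissions.
            "FlowLogs": {
                "Type": "AWS::EC2::FlowLog",
                "Properties": {
                    "ResourceId": {"Ref": "ProjectVpc"},
                    "ResourceType": "VPC",
                    "TrafficType": "ALL",
                },
            },
        },
    }

doc = landing_zone_template("analytics", "10.42.0.0/16")
print(json.dumps(doc, indent=2))
```

The point is less the specific resources than the pattern: every new account starts from a template that already carries your controls, rather than being built by hand.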
- With the scaffold in place it’s time to start migrating workloads:
- Think of this as an iterative process. Just as you build a scaffold and smaller environments, move your projects over in a prioritized order to help you learn as you go.
- As you move each project over, try to refactor and rearchitect to the best of your ability. For example you should “fit the network to the application” – you can now have multiple software-defined networks, each containing the bare minimum to support one project. This really helps reduce attack surface and provides compartmentalization.
- Keep up with continuous assurance. Mistakes happen and your shared monitoring, visibility, and remediation tools will help reduce exposure. Don’t wait until the end for one big assessment.
These migrations and transformations can be overwhelming if you try to plan everything out as one giant project. If you think in terms of building central services and a scaffold, then migrating projects iteratively, you reduce risk while increasing your chance of success.
Snap Migration
Clarice swapped in the new security group and closed out the last (for now) high-priority ticket. She checked her queue and the latest assessment results, and everything looked okay.
“Well,” she thought to herself, “I guess it’s time to start hitting the internal groups.”
She Slacked Bill, “I think I have the dev teams locked down to our sanctioned CIDRs, how’s the service endpoint project going?”
“Pretty good… the log receiver is set up and we are close to cutting over the customer DB. We still need to peer the CRM stack’s network, but I think we can start weaning off some of the marketing apps and shut down those direct network connections.”
“Cool. Paul is assessing the rest of the spaghetti mess. It will take a bit to break out most of the apps into their own accounts, but at least we have a good base for the new projects.”
Snap migrations can be the riskiest of all adoption patterns: short timelines, critical resources, and rarely the needed skills and staff. They combine the messiness of the developer-led pattern with the scale of a data center transformation. In our experience these projects often include a heavy dose of cloud provider or consultant pressure to move fast and gloss over complexity.
Let’s start with our principles:
- Your primary objective is to minimize immediate risk while creating a baseline to use as you clean things up over time after the cutover.
- Get the right people with the right skills. This includes training, hiring, and consulting. Make sure you really vet the people you are bringing in – even your cloud provider’s experts may be fresh out of school with little real-world experience.
- Don’t just copy and paste your existing network into the cloud. This approach always fails within 18-24 months, for many reasons we have already cited.
- Constantly look for opportunities to manage blast radius. Use multiple virtual networks and accounts, and only connect them where needed.
- You typically won’t have time in a snap migration for any serious refactoring or rearchitecting. Instead focus on a strong scaffold and management controls, with the expectation that you can start making things a little more cloud native once the main cutover is complete.
These are simply bad situations, which you need to make the best of. Making some smart decisions early on will go a long way to helping you set yourself up for iterative cleanup after the mad rush is over.
- Start by building a scaffold – not a parking lot.
- Follow our recommendations for the data center transformation pattern.
- While you might need to replicate your current network, nothing says you have to do that in a single virtual network. With peering and transit networks, you can architect your new cloud network with subnets in separate virtual networks and accounts based on projects, then connect them together with your cloud provider’s peering capabilities. For example you can create the 10.0.1.0/24 subnet in one virtual network in one cloud account, and the 10.0.2.0/24 subnet in an entirely different virtual network and account, then peer them together.
- This improves your long-term security because account segregation, even across peered networks, helps manage the service limit and IAM issues which cause so many problems when everything is in one account. For example if different projects share the same virtual network, it is hard to designate IAM privileges so the various administrators cannot affect each other’s resources.
- Knowing your subnets and connectivity requirements is a key factor for success.
- As with our data center transformation pattern, build your shared services after (or concurrently with) your network scaffold.
- Be cautious and judicious in allowing Internet access. Controlling the public perimeter early is crucial. Quite a bit can be accidentally opened up during data migrations, as teams rush to throw assets into the cloud, so make sure you keep a continuous eye on things.
- Also track the network connections to your on-premise environments. At some point many of these openings should be shut down, as projects complete migration and no longer need to call back to the doomed data center.
- To the best of your ability also implement in-cloud network segregation with security groups. Another issue we often see is excessive security group openings within the network – ops, devs, or even security may not know all the right port and protocol combinations for a given application. There is literally zero cost to more security groups, which are effectively firewalls around every resource. Use them to your advantage and dial down permissions.
- In the long term you will want to sweep through and refactor and rearchitect where you can. This is much easier if you migrated into multiple accounts and virtual networks.
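The subnet-per-account approach above is easier to get right if you carve the allocations programmatically rather than by hand. A small sketch using Python’s standard `ipaddress` module (the /16 and account names are made up):

```python
import ipaddress

# Sketch: carve one corporate allocation into per-account /24s and
# verify nothing overlaps before peering the virtual networks.

def plan_subnets(supernet, accounts, new_prefix=24):
    """Assign one subnet per account, in order, from the supernet."""
    pool = ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix)
    return {account: next(pool) for account in accounts}

def overlaps(subnets):
    """Return any pair of assigned subnets that collide."""
    nets = list(subnets.values())
    return [
        (a, b) for i, a in enumerate(nets) for b in nets[i + 1:]
        if a.overlaps(b)
    ]

plan = plan_subnets("10.0.0.0/16", ["shared-services", "crm", "marketing"])
print(plan["crm"])     # → 10.0.1.0/24
print(overlaps(plan))  # → [] -- safe to peer
```

Each /24 can then live in its own virtual network and account, peered through your transit network, which preserves the addressing scheme without forcing everything into one flat network.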
Native New Build
Maria checked the assessment results from Dev and everything looked good. The Internet facing bit was just a single page app hosted in S3, but the Lambda functions needed network access to hit the Elasticsearch cluster. The security groups were locked down tight and the logging all hooked in using S3 and SNS so they didn’t need to link back using the logging PrivateLink. The security and networking IAM roles had the right permissions for the monitoring tools and the IR team could escalate to write access as needed.
“Hey John, do you know what org unit we are dropping this marketing app into? I want to check the SCPs to make sure nothing will break.”
“Yep, let me check… ” he replied, “looks like the default marketing one.”
“Cool, I’ll go approve it for prod and promote the Terraform build.”
Cloud native doesn’t mean a project is inherently secure, but it does completely shift the security and networking focus. The key principles are:
- Cloud security and operations start with architecture and end with automation. A well-designed architecture will reduce most risks. Automation maintains a strong and safe posture over time.
- Serverless, containers, and other emerging technologies are the norm. You may or may not have networks, and the networks you do have will be quite different from traditional infrastructure.
- Your public-facing perimeter is more than just what your virtual networks expose. Many services in cloud providers are (potentially) directly public-facing, and must be managed at the configuration level.
- Subdomain takeovers in cloud are very common due to these services. Make sure you are monitoring at the DNS level, not just IP addresses.
- The biggest issues we see for this pattern are mostly related to governance. Dev teams are allowed to move fast and break things, and while there is nothing inherently wrong with that, it becomes a problem when they move faster than security can contain risks. Early engagement, architectural support, continuous monitoring, and strong team relations are essential for success.
- Fit networks to applications. We will talk more about this later in recommendations, but this is a core philosophy: start with the application’s needs, and build the network to fit them.
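The subdomain takeover point above lends itself to simple automation. This sketch flags CNAME records pointing at cloud-hosted targets for manual review; it operates on records you have already exported from your DNS provider, and the suffix list is illustrative and deliberately incomplete.

```python
# Sketch: flag CNAME records pointing at cloud-hosted targets for
# takeover review. Works on records exported from your DNS provider;
# the suffix list is an illustrative assumption, not exhaustive.

CLOUD_SUFFIXES = (".s3.amazonaws.com", ".cloudfront.net",
                  ".azurewebsites.net", ".github.io")

def takeover_candidates(records):
    """records: iterable of (name, rtype, target) tuples."""
    return [
        (name, target) for name, rtype, target in records
        if rtype == "CNAME" and target.endswith(CLOUD_SUFFIXES)
    ]

records = [
    ("promo.example.com", "CNAME", "old-campaign.s3.amazonaws.com"),
    ("www.example.com", "A", "203.0.113.10"),
]
print(takeover_candidates(records))
# → [('promo.example.com', 'old-campaign.s3.amazonaws.com')]
```

A real monitor would then check whether each flagged target still resolves to a resource you control; a CNAME left pointing at a deleted bucket or app is the classic takeover setup.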
As your organization becomes more and more cloud-native, you will want to start with your people, then set up a secure foundation for individual projects to execute on:
- Invest in people. Hire smart, train them, and allow them to become experts on the platforms you deploy on. When you transition employees with traditional skills to build cloud-native projects, don’t force them to split their time. Let them focus.
- Your scaffold will be similar to the ones we recommend for data center transformation, but you should plan on different network and security architectures. In many cloud-native deployments there is no customer-managed network.
- Rely more on object storage (such as S3), service endpoints, API gateways, and other tools which don’t require managing IP addresses for shared services. That said, you will always still need some virtual networks and a transit gateway.
- Set standards for your container networks and integrate them into your overall network management. Publish guidelines and even templates to build an easy path for independent teams to follow. Container networks can be easy to lose track of, especially when they are self-contained.
- Continuous integration and infrastructure as code are your friends. Develop supported templates for different patterns (e.g., serverless, containers, standard virtual networks) which integrate your monitoring, logging, management, and security tools. Project teams can build these into their own templates; offering an easy path again helps encourage compliance.
- You will need to continuously monitor and enforce your standards across hundreds or even thousands of cloud accounts. Build this early and automate provisioning through infrastructure as code and other automation capabilities.
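Continuous enforcement across many accounts ultimately reduces to comparing each account’s exported settings against a required baseline. A minimal sketch; the control names are hypothetical, and in practice this data would come from your CSPM tooling or the providers’ own configuration services.

```python
# Sketch: check exported per-account settings against a required
# baseline. The control names are hypothetical -- in practice this
# data comes from CSPM tooling or provider configuration services.

BASELINE = {"flow_logs": True, "config_recorder": True, "mfa_on_root": True}

def baseline_gaps(accounts):
    """Return {account_id: [missing controls]} for non-compliant accounts."""
    gaps = {}
    for account_id, settings in accounts.items():
        missing = [k for k, v in BASELINE.items() if settings.get(k) != v]
        if missing:
            gaps[account_id] = missing
    return gaps

fleet = {
    "111111111111": {"flow_logs": True, "config_recorder": True,
                     "mfa_on_root": True},
    "222222222222": {"flow_logs": False, "config_recorder": True},
}
print(baseline_gaps(fleet))
# → {'222222222222': ['flow_logs', 'mfa_on_root']}
```

Run something like this on every change event and on a schedule, and feed the gaps into the same queue your teams already work from, so drift gets fixed as routine work rather than in an annual audit scramble.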
As a final reminder, cloud native architectures and operations are very different. Your core skills and objectives are the same, but the implementation details are incredibly different and often don’t even translate between cloud providers. Providers launch new features and services on a daily basis, further challenging overworked security and operations teams.
Learn, take your time, work well with project teams, be nimble, and if you are in management… give your people time to keep up with the rapid rate of change.
This is the second post in our series, “Network Operations and Security Professionals’ Guide to Managing Public Cloud Journeys”, which we will release as a white paper after we complete the draft and have some time for public feedback. You might want to start with our first post. Special thanks to Gigamon for licensing. As always, the content is being developed completely independently using our Totally Transparent Research methodology.
Understanding Cloud Adoption Patterns
Cloud adoption patterns represent the most common ways organizations move from traditional operations into cloud computing. They contain the hard lessons learned by those who went before. While every journey is distinct, hands-on projects and research have shown us a broad range of consistent experiences, which organizations can use to better manage their own projects. The patterns won’t tell you exactly which architectures and controls to put in place, but they can serve as a great resource to point you in the right general direction and help guide your decisions.
Another way to think of cloud adoption patterns is as embodying the aggregate experiences of hundreds of organizations. To go back to our analogy of hiking up a mountain, it never hurts to ask the people who have already finished the trip what to look out for.
Characteristics of Cloud Adoption Patterns
We will get into more descriptive detail as we walk through each pattern, but we find this grid useful to define the key characteristics.
|Characteristic|Developer Led|Data Center Transformation|Snap Migration|Native New Build|
|---|---|---|---|---|
|Size|Medium/Large|Large|Medium/Large|All (project-only for mid-large)|
|Vertical|All (minus financial and government)|All, including financial and government|Variable|All|
|Speed|Fast then slow|Slow (2-3 years or more)|18-24 months|Fast as DevOps|
|Security|Late|Early|Trailing|Mid to late|
|Network Ops|Late|Early|Early to mid|Late (developers manage)|
|Tooling|New + old when forced|Culturally influenced; old + new|Panic (a lot of old)|New, unless culturally forced to old|
|Budget Owner|Project based/no one|IT, Ops, Security|IT or poorly defined|Project-based, some security for shared services|
- Size: The most common organization sizes. For example, developer-led projects are rarely seen in small startups, which can skip directly to native new builds, but are common in large companies.
- Vertical: We see these patterns across all verticals, but in highly-regulated ones like financial services and government, certain patterns are less common due to tighter internal controls and compliance requirements.
- Speed: The overall velocity of the project, which often varies during the project lifetime. We’ll dig into this more later, but an example is developer-led, where initial setup and deployment are very fast, but wrangling in central security and operational control can then take years.
- Risk: This is an aggregate of risk to the organization and of project failure. For example in a snap migration everything tends to move faster than security and operations can keep up, which creates a high chance of configuration error.
- Security: When security is engaged and starts influencing the project.
- Network Ops: When network operations becomes engaged and starts influencing the project. While the security folks are used to being late to the party, since developers can build their own networks with a few API calls, this is often a new and unpleasant experience for networking professionals.
- Tooling: The kind of tooling used to support the project. “New” means new, cloud-native tools. “Old” means the tools you already run in your data centers.
- Budget Owner: Someone has to pay at some point. This is important because it represents potential impact on your budget, but also indicates who tends to have the most control over a project.
The Cloud Adoption Patterns
In this section we will describe what the patterns look like, and identify some key risks. In our next section we will offer some top-line recommendations to improve your chances of success.
One last point before we jump into the patterns themselves: while they focus on the overall experiences of an organization, patterns also apply at the project level, and an organization may experience multiple patterns at the same time. For example it isn’t unusual for a company with a “new to cloud” policy to also migrate existing resources over time as a long-term project. This places them in both the data center transformation and native new build patterns.
Mark was eating a mediocre lunch at his desk when a new “priority” ticket dropped into his network ops queue. “Huh, we haven’t heard from that team in a while… weird.” He set the microwaved leftovers to the side and clicked open the request… Request for firewall rule change: Allow port 3306 from IP 52.11.33.xxx/32. Mission critical timeline. “What the…?!? That’s not one of our IPs,” Mark thought as he ran a lookup. “amazonaws.com? You have GOT to be kidding me. We shouldn’t have anything up there.” Mark fired off emails to his manager and the person who sent the ticket, but he had a bad feeling he was about to get dragged into the kind of mess that would seriously ruin his plans for the next few months.
Developer-led projects are when a developer or team builds something in a cloud on their own, and central IT is then forced to support it. We sometimes call this “developer tethering”, because these often unsanctioned and/or uncoordinated projects anchor an organization to a cloud provider, and drag the rest of the organization in after them. These projects aren’t always against policy – this pattern is also common in mergers and acquisitions. Nor is it necessarily a first step into the cloud overall – it can also be a project which pulls an enterprise into a new cloud provider, rather than their existing preferred one.
This creates a series of tough issues. To meet the definition of this pattern we assume you can’t just shut the project down, but actually need to support it. The project has been developed and deployed without the input of security or networking, and may have access to production data.
- Size: We mostly see this pattern in medium and large organizations. In smaller enterprises the overall scope of what’s going on is easier to understand, whereas larger organizations tend to have an increasing number of teams operating at least semi-independently. Larger organizations are also more likely to engage in M&A activity which forces them to support new providers. In fact most multi-cloud deployments we run into result directly from acquiring a company using something like Azure when everything else is on AWS.
- Vertical: This pattern is everywhere, but less common in highly regulated and tightly governed organizations – particularly financial services and government. Not every financial services organization is well-governed so don’t assume you are immune, but controls tend to be tighter on more sensitive data, so when you do hit this pattern the risk might be lower. In government it’s actually budgets more than regulations which limit this pattern – few government employees can get away with throwing AWS onto corporate cards.
- Speed: In the beginning, at least once security and networking find out about the project, there is a big rush to manage the largest risks and loop the project into some sort of central management. This flurry of activity then slows down into a longer, more methodical wrangling to bring everything up to standard. It starts with stopgaps, such as opening up firewalls to specific IP ranges or throwing in a VPC connection, which is followed by a longer process to rework both the deployment and internal support, such as by setting up direct connect.
- Risk: These are high-risk situations. Security was likely not involved, and we often find a high number of configuration errors when assessing these environments. They can often function as an isolated outpost for a while, but there are still risks of failed integration when the organization tries to pull them back into the fold – especially if they require complicated network connectivity.
- Security: Security is typically involved only late in development, or after the project is actually deployed, since the project team was off running on their own.
- Network Ops: As with security, networking enters late. If the project doesn’t require connectivity back to an existing network they might not be involved at all.
- Tooling: Most often these projects leverage the integrated tools provided by the cloud service provider. There is rarely budget for security- or network-specific tooling beyond that, since CSP tools all ‘hide’ within the basic costs of the cloud deployment. One problem we sometimes see is that after the project is discovered, and there’s a mad rush to bring it under central management, a bag of existing tools which may be a poor fit for the cloud platform is forced into place. This is most common with network and endpoint security tools which aren’t cloud native – a virtual appliance isn’t necessarily a cloud native answer to a problem.
- Budget Owner: The project team somehow managed to get budget to start the deployment, which they can use as a cudgel to limit external management. This may fall apart as the project grows and costs increase (pro tip: they always do), and the project has to steal budget from someplace else.
These risks should be obvious: you have an unsanctioned application stack running in an unapproved cloud, with which you may have little experience, uncoordinated with security or networking. However, many project teams try to do the right things, and you can’t assume the project is an abject failure. Some of these projects are significantly better designed and managed, from the cloud standpoint, than lift and shift or other cloud initiatives. It all depends on the team. Based on our experience:
- Security configuration errors are highly likely.
- There may be unapproved and ad hoc network connections back to existing resources. At times these are unapproved VPN connections, SSH jump boxes, or similar.
- Deployment environments may be messy, full of cruft and design flaws.
- Development/testing and production are generally intermingled in the same account/subscription/project, which creates a larger blast radius for attacks.
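Many of these risks are detectable with even crude automation. As an illustration (the rule format below is a simplified stand-in for what a CSP’s describe APIs return, not any specific schema), a few lines of Python can flag the classic world-open configuration error:

```python
# Minimal sketch: flag firewall/security-group rules open to the world.
# The dict layout here is illustrative, not a real provider's schema --
# in practice you'd build these records from describe-security-groups
# (or equivalent) output.

RISKY_PORTS = {22, 3306, 3389, 5432}  # SSH, MySQL, RDP, PostgreSQL

def world_open_findings(rules):
    """Return rules that expose a risky port to 0.0.0.0/0."""
    findings = []
    for rule in rules:
        if rule.get("source") == "0.0.0.0/0" and rule.get("port") in RISKY_PORTS:
            findings.append(rule)
    return findings

if __name__ == "__main__":
    sample = [
        {"group": "web", "port": 443, "source": "0.0.0.0/0"},     # fine for a web tier
        {"group": "db", "port": 3306, "source": "0.0.0.0/0"},     # the classic mistake
        {"group": "admin", "port": 22, "source": "10.0.0.0/8"},   # internal only
    ]
    for f in world_open_findings(sample):
        print(f"world-open: {f['group']} port {f['port']}")
```

This is deliberately naive – real assessment tools evaluate port ranges, IPv6, and rule precedence – but even this level of checking catches the kind of exposure these projects tend to accumulate.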
Data Center Transformation
Mark glanced up at the wall of sticky notes layered on top of the whiteboard’s innumerable chicken scratched architectural diagrams. A year of planning and setup, and they were finally about to move the first production application.
“Okay,” started Sarah, “we’ve tested the latency on the direct connect but we are still having problems updating the IPs for the firewall rules in the Dallas DC. The application team says they need more flexibility, since they want to deploy with infrastructure as code and use auto scale groups in different availability zones. They claim that trying to manage everything with such restricted IPs doesn’t work well in their VNets. Something about web servers and app servers reusing each other’s IPs as they scale up and down, and the firewall team being too slow to update the rules.”
Mark interjected, “What if we drop the firewalls on the direct connects? Then they can use what they want within their CIDR blocks?”
“Isn’t that a security risk?”
“I don’t think so,” replied Mark, “after going through the cloud training last month I’m starting to believe we’ve been thinking about this all wrong. The cloud isn’t the untrusted network – we’re just as likely to get breached from the data center or someone compromising an admin’s laptop on the corporate network.”
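Mark’s suggestion boils down to allowing the project’s CIDR blocks instead of chasing individual instance addresses. A toy sketch using Python’s standard ipaddress module (the addresses are invented) shows why per-instance /32 rules can’t keep up with auto scaling:

```python
import ipaddress

# Toy illustration: per-instance /32 rules break as auto scaling churns
# addresses, while one rule for the project's CIDR block keeps covering
# whatever instances come and go. All addresses below are made up.

def covered(rule_cidrs, instance_ip):
    """True if any allowed CIDR contains the instance's address."""
    ip = ipaddress.ip_address(instance_ip)
    return any(ip in ipaddress.ip_network(c) for c in rule_cidrs)

per_instance_rules = ["10.20.1.15/32", "10.20.1.16/32"]  # brittle allowlist
cidr_rule = ["10.20.1.0/24"]                             # the whole subnet

new_instance = "10.20.1.57"  # appears after a scale-out event
print(covered(per_instance_rules, new_instance))  # False: another firewall ticket
print(covered(cidr_rule, new_instance))           # True: no rule change needed
```

The cloud-native versions of this idea go further – referencing security groups or tags rather than addresses at all – but the containment logic above is the core of why static IP lists fight the platform.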
Data center transformations are long-term projects, where the migration is methodical and centrally planned. That isn’t always beneficial – these projects are often hindered by overanalysis and huge requirements documents, which can result in high costs and slow timelines. They also tend to bring their own particular set of design flaws. In particular, there is often a focus on building a perfect landing zone or “minimum viable cloud” which replicates the existing data center, rather than taking advantage of native capabilities of the cloud platform. Existing tooling and knowledge are thrown at the problem, rather than trying to do things the “cloud way”.
Not to spoil our recommendations, but treating the migration as a series of incremental projects rather than a monolithic deployment will dramatically improve your chance of success. Culture, silos, politics, and experience all significantly impact how well these projects go.
- Size: You need a data center to transform, so this pattern lends itself to very large, large, and sometimes mid-sized enterprises.
- Vertical: This pattern is common across all verticals which meet the size requirements. Five years ago we weren’t seeing it with regulated industries, but cloud computing has long since passed that limitation.
- Speed: These projects tend to move at a snail’s pace. There are a lot of planning cycles, and building baseline cloud infrastructure, before any production workloads are moved. In some cases we see progressive organizations breaking things into smaller projects rather than shoehorning everything into one (or a small number of) cloud environments, but this is uncommon. Multi-year projects are the norm, although more agile approaches are possible.
- Risk: The risk of a security failure is lower due to the slower pace and tighter controls, but there can be a high risk of project failure, depending on approach. Large monolithic cloud environments are highly prone to failure within 18-24 months. Compartmentalized deployments (using multiple accounts, subscriptions, and projects) have a lower chance of major failure.
- Security: Security is engaged early. The risk is that the security team isn’t familiar or experienced with cloud, and attempts to push traditional techniques and tools which don’t work well in cloud.
- Network Ops: Similar to security, networking is involved early. And as with security, the risk is not having the cloud domain knowledge for an effective and appropriate design.
- Tooling: Tooling depends on culture, silos, and politics. There is excellent opportunity to use cloud-native tooling, including existing tools with cloud-native capabilities. But we also see, as in our opening story, frequent reliance on existing tools and techniques which aren’t well-suited to cloud and thus end up causing problems.
- Budget Owner: These projects tend to have a central budget, so shared service teams such as operations, networking, and security may be able to draw on this budget or submit their own requests for additional project funding.
There are two major categories of risks, depending on the overall transformation approach:
- Large, monolithic projects where you set everything up in a small number of cloud environments and try to make them look like your existing data center. These are slow and prone to breaking horribly in 18-24 months. These ‘monocloud’ deployments tend to end poorly. IAM boundaries and service limits are two of the largest obstacles. Agility is also often reduced, which even pushes some teams to avoid the cloud. Costs are also typically higher. Organizations tend to find themselves on this path if they don’t have enough internal cloud knowledge and experience. Both cloud providers and independent consultants usually just nod their heads and say ‘yes’ when you approach them with a monocloud proposal, because they want your business – even though they know the obstacles you will eventually hit.
- Discrete, project-based deployment transformations leverage some shared services, but application stacks and business units have their own cloud accounts/subscriptions/projects (under the central organization/tenant). This cloud-native approach avoids many problems of monolithic deployment, but brings its own costs and complexities. Managing large numbers of cloud environments (hundreds are typical, and thousands are very real) requires deep expertise and proper tooling. The flexibility of software-defined networks in the cloud adds its own complexity, especially when different projects need to talk to each other, and back to the non-cloud resources which many enterprises never move to the cloud.
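One nice property of the compartmentalized approach is that it can be verified mechanically, such as confirming no single account mixes environments. A minimal sketch (the inventory format is hypothetical – in practice you would build it from your provider’s organizations and tagging APIs):

```python
# Sketch of a compartmentalization check: in a project-based deployment
# each cloud account/subscription should hold a single application stack
# and a single environment. The inventory format is hypothetical.

def mixed_environment_accounts(inventory):
    """Return account names whose workloads span more than one environment."""
    flagged = []
    for account, workloads in inventory.items():
        envs = {w["env"] for w in workloads}
        if len(envs) > 1:
            flagged.append(account)
    return flagged

inventory = {
    "payments-prod": [{"name": "api", "env": "prod"}, {"name": "db", "env": "prod"}],
    "research-sandbox": [{"name": "etl", "env": "dev"},
                         {"name": "trial-data", "env": "prod"}],  # larger blast radius
}
print(mixed_environment_accounts(inventory))  # ['research-sandbox']
```

Checks like this only work when there is a reliable inventory to run them against, which is exactly why the compartmentalized pattern demands proper tooling.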
Snap Migration
Sitting in the back of the conference room, Bill whispered to Clarice, “No way. No forking way. This is NOT going to end well.” “Do they have any friggin’ idea how bad this is?” she replied. “How do they possibly expect us to move 3 entire data centers in 18 months up to Amazon… I can’t even get a VM for testing in less than 3 months! I get they want out of our crappy contract before the renewal, but this is insane.” “Heh,” huffed Bill, “Maybe they’ll finally approve those cloud training classes we’ve been asking for.” “Yeah right,” she replied sarcastically, “like they’ll train us instead of throwing cash at some consultant.”
Snap migrations are the worst of all worlds. Massive projects driven by hard deadlines, they are nearly always doomed to some level of failure. In our experience the decision-makers behind these projects rarely understand the complexity of migrating to cloud, and are overly influenced by executive sales teams and consultants. That isn’t to say they are always doomed to complete failure, but the margins are thin and you will be navigating a tightrope of risks.
There is a subset of this pattern for more limited projects which don’t encompass absolutely everything. For example imagine the same contract renewal drive, but for a subsidiary or acquisition rather than the entire organization. The smaller the scale the lower the risk.
- Size: Mid to large. You need to be big enough to have data centers, but not so big that you own the real estate they sit on. A defining characteristic is that these projects are often driven by contract renewals on hosted or managed data centers. That’s why there’s a hard deadline… management wants out, as much as possible, before they get locked into the next 7-year renewal.
- Vertical: Organizations across all verticals find themselves in hosting contracts which they want out of, so this affects all types of organizations. We even know of projects in highly regulated financial services which you’d think would never accept this level of risk. Government is the least likely, and tends to be driven more by whichever political appointee decides they want to shake things up.
- Speed: 18-24 months for the first phase. We rarely see less than 12 months. Sometimes there will be a shorter initial push to get out of at least some centers as the contract moves into a month-by-month extension phase.
- Risk: As high as it gets in every possible way. Organizations falling into this pattern might have some internal cloud experience, but as a rule not enough people with enough depth to support the needed scale. There is heavy reliance on outside help, but few consulting firms (or cloud providers themselves) have a deep bench of solid experts who can avoid all the common pitfalls.
- Security: Security is engaged somewhat but can’t do anything to slow things down. They are also typically tasked with building out their own shared services so likely don’t have the manpower to evaluate individual projects. They tend to trail behind deployments, trying to assess and clean things up after the fact. They often get to set a few hardline policies up front (typically relating to Internet accessibility), but until they stand up their own monitoring and enforcement capabilities, things slip through the cracks.
- Network Ops: There is a bit more variability here, depending on the deployment style. If there is a monocloud (or small number of environments), networking is typically engaged early and plays a very strong role in getting things set up. They are tasked with configuring the larger pipes needed for such large migrations. The risk is that they often lack cloud experience, and introduce designs which work well in a data center but fit poorly in cloud deployments.
- Tooling: Panic is the name of the game. The initial focus is on the tools at hand, and vendors already in place, combined with cloud-native tools. We hate to say it, but this can be deeply influenced not only by culture but by which consultants are already in the door. Eventually the project starts introducing more cloud-native tooling to solve specific problems. For example in projects we’ve seen, visibility (cloud assessment and mapping) tools tend to be early buys.
- Budget Owner: This can be poorly defined, but often pulls from a central IT budget or specially designated project budget. Whoever controls the money has the most influence on the project. The chances of success go up when all teams are properly funded and staffed. Also, water is wet.
Risks abound, but we can categorize them based on project characteristics:
- As any IT pro knows, every project of this scale runs over time and budget.
- There is often a reliance on outside contractors who push things along quickly, but don’t know (or care) enough to have a sense of the enterprise risk. Their job is to get things moved – not necessarily to do so the safest way. This can lead to exposure as they accept risks a company employee might avoid.
- Security often lacks general cloud security knowledge, as well as provider and platform experience. They can build this but it takes time, and in the process the organization will likely accumulate technical security debt. For example two of the most common flaws we find on assessments are overly-privileged IAM and poorly segregated networks.
- Rapidly designing a cloud network at scale is difficult and complex, especially for a team who is still keeping the existing environment running, and (like security) probably lacks deep cloud experience. We often see one or two members of a team tasked as cloud experts, but this really isn’t enough. With the time constraints of the project, the network often ends up poorly compartmentalized, and projects tend to be shoveled into shared VPCs/VNets in ways which later run up against service limits, performance problems, and other constraints.
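The first of those common flaws – overly-privileged IAM – is also the easiest to screen for. A minimal sketch which flags all-powerful statements in an IAM-style policy document (this follows the standard Statement/Effect/Action/Resource layout, but skips conditions, NotAction, and the rest of real policy evaluation):

```python
# Sketch: flag the overly-privileged statements we most often find on
# assessments. The document shape follows the standard IAM policy grammar
# (Version/Statement/Effect/Action/Resource); real policies also require
# evaluating conditions, NotAction, etc., which this toy check skips.

def wildcard_statements(policy):
    """Return Allow statements using '*' for both Action and Resource."""
    flagged = []
    for stmt in policy.get("Statement", []):
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if stmt.get("Effect") == "Allow" and "*" in actions and "*" in resources:
            flagged.append(stmt)
    return flagged

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::logs/*"},
        {"Effect": "Allow", "Action": "*", "Resource": "*"},  # admin-by-accident
    ],
}
print(len(wildcard_statements(policy)))  # 1
```

Commercial and CSP-native tools do this far more thoroughly, but a sweep even this crude across newly migrated accounts tends to surface accumulated technical security debt quickly.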
Native New Build
John was actually excited when he walked into the meeting room. It had been a long time since he got the chance to stretch his security-creativity muscles. He nodded to Maria from networking as he pulled out an open Aeron-knockoff chair and dropped his new matte-black Surface on the conference table. He still wasn’t sure what stickers to throw on it, but after a little burn-in period during the Azure training he and Maria just finished up, he was getting used to working off an underpowered device. He still had his old desktop for handling all the data center thick clients, but for this Azure project all he needed was a web browser and some PowerShell. Although he kind of envied the consultants with their brand new MacBooks. “Hey everyone,” Wendy started, “we have a tight timeline but we finally have approval for the new Azure subscription. Maria, can you get the network connected up?” “Actually, I don’t think we need to. John signed off on using JIT connections and client VPNs instead of requiring a dedicated backhaul. We went through the architecture and there aren’t any dependencies on internal resources. We know we’ll need more of a hybrid design for the CRM project but we are free and clear for this one.”
Native new build projects are true cloud-native deployments. That doesn’t mean they are all brand new projects – this pattern also includes refactors and rearchitectures, so long as the eventual product design is cloud native. These may also include hybrid deployments – the new build may still need connections back to the premises.
- Size: All sizes. In a large enterprise this will likely be a designated project or a series of projects (especially in a “new to cloud” organization). In a small startup the entire company could be a new build.
- Vertical: All verticals. Even government and highly regulated industries. We have worked on these projects with financials, state governments, and even public utilities.
- Speed: As “fast as DevOps”. We don’t mean that facetiously – some teams are faster and some slower, but we nearly always see DevOps techniques used, and they often define the overall project pace. These are developer-driven projects.
- Risk: We will talk more about risk in a moment, but here we just note that risk is highly variable, and depends on the skills and training of the project team.
- Security: Unlike our previous example, security may be late to the project. There is usually an inflection point, when the project is getting close to production, when security gets pulled in. Before that the developers themselves manage most security. Organizations tend to struggle with this on their early projects, but integrate security earlier over time, as more and more moves to cloud and skills and staffing improve.
- Network Ops: Networking is more likely to be engaged early if there are hybrid connectivity requirements, or may not be involved at all, depending on the overall architecture. These days we see a growing number of serverless (or mostly serverless) deployments where there isn’t even a project network and all the components talk to each other within the “metastructure” of the cloud management plane.
- Tooling: Typically newer, cloud native, and often CSP provided tools. Quite a few of these projects start in their own cloud silo and use the cloud provider’s tooling, but as more of these projects deploy there is an increased requirement for central management and shared services (such as security assessment) to be added. We do sometimes see development teams forced to use traditional on-premise tools, but this tends to be culturally driven, as it isn’t usually the best solution to the problems at hand.
- Budget Owner: Project based. Once you do enough of these there will also be shared services budgets for teams like security and networking.
Despite our optimistic opening, these projects also come with their own risks. Cloud native deployments aren’t risk-free by any stretch of the imagination – they just carry different risks. The project may be well-segregated in its deployment environment (an Azure subscription in our example) but that doesn’t mean developers won’t be over-provisioned. Or that a PaaS endpoint in the VNet won’t ignore the expected Network Security Group rules (yes, that happens).
- This pattern can carry all the risks of the developer-led pattern if it is poorly governed. We have seen large organizations running dozens or hundreds of these projects, all poorly governed, each carrying tons of risks. If you read about a big cloud breach at a client who was proudly on stage at their cloud provider’s conferences, odds are they are poorly governed internally.
- Cloud native services have different risks which take time to understand and learn to manage. In our data center migration patterns there is less reliance on the latest and greatest “serverless this, AI that”, so traditional security and management techniques can be more effective. With native new builds you may be using services the cloud provider itself barely understands.
- Friction between security and the project team can badly impact the final product. Overly prescriptive security pushes teams toward workarounds. Early partnering, ideally during initial development of the architecture, with security pros trained on the cloud platform, reduces risk.
- Managing a lot of these projects at scale is really, really hard. Setting up effective shared services and security and network support (especially when hybrid or peered networks are involved) takes deep expertise. The cloud providers are often terrible at helping you plan for this – they just want you to move as many workloads onto them as quickly as possible.
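At that scale, even a crude guardrail sweep across project environments beats none at all. A hedged sketch (the guardrail names and environment records are hypothetical – real data would come from your CSP’s config and policy APIs):

```python
# Sketch: with dozens or hundreds of project environments, a minimal
# guardrail sweep is a starting point for governance. The guardrail names
# and environment records are hypothetical; in practice they'd come from
# your cloud provider's config/policy APIs.

REQUIRED_GUARDRAILS = {"flow_logs_enabled", "mfa_on_root", "audit_trail_enabled"}

def guardrail_gaps(environments):
    """Map each environment name to the guardrails it is missing."""
    return {
        name: sorted(REQUIRED_GUARDRAILS - set(enabled))
        for name, enabled in environments.items()
        if REQUIRED_GUARDRAILS - set(enabled)
    }

envs = {
    "checkout-prod": {"flow_logs_enabled", "mfa_on_root", "audit_trail_enabled"},
    "ml-experiments": {"flow_logs_enabled"},  # the typical ungoverned project
}
print(guardrail_gaps(envs))
```

The point isn’t this particular list of checks – it’s that shared-service security teams need some programmatic view across every project environment, or the poorly governed ones stay invisible until they make the news.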
This is the first post in a series on “Network Operations and Security Professionals’ Guide to Managing Public Cloud Journeys” which we will release as a white paper after completion and time for public feedback. Special thanks to Gigamon for licensing. As always, the content is being developed completely independently using our Totally Transparent Research methodology.
Cloud computing is different, disruptive, and transformative. It has little patience for traditional practices and existing architectures. Cloud requires change, and while there is a growing body of documentation on the end states you should strive for, there is a lack of guidance on how to get there. Cloud computing may be a journey, but it’s one with many paths to what is often an all too nebulous destination.
Although every individual enterprise has different goals, needs, and capabilities for their cloud transition, our experience and research has identified a series of relatively consistent patterns. One way to think of moving to cloud is as a mountain with a single peak and everyone starting from the same trailhead. But this simplistic view, which all too often underlies conference presentations and tech articles, fails to capture the individual nature of the opportunities and challenges you will face. On the other end we can think of it as staring at a mountain range with innumerable peaks, starting points, and paths… and a distinct lack of an accurate map. This is the view that ends up with hands thrown in the air, expressions of impossibility, and analysis paralysis.
But our research and experience land us between those two extremes. Instead of a single constrained path which doesn’t reflect individual needs, or totally individualized paths which require you to build everything and learn every lesson from scratch, we see a smaller set of options with consistent characteristics and experiences. Think of it as starting from a few trailheads, landing on a few peaks, and only dealing with a handful of well-marked trails. These won’t cover every option, but knowing them can be a surprisingly useful way to better structure your journey, move up the hill more gracefully, and avoid falling off some really REALLY sharp cliff edges.
Introducing the Cloud Adoption Patterns
Cloud adoption patterns represent a consolidated set of cloud adoption journeys, compiled through discussions with hundreds of enterprises and dozens of hands-on projects. More nebulous than specific cloud controls, they are a general way of predicting and understanding the problems an organization will face when moving to cloud, based on their starting point and destination. These patterns have different implications across functional teams, and are especially useful for network operations and network security, since the adoption pattern tends to fairly accurately predict many of the architectural implications, which then map directly to management processes.
For example, there are huge differences between a brand new startup or cloud project without any existing resources, a major data center migration, and a smaller migration of key applications. Even a straight-up lift and shift migration is extremely different as a one-off or smaller project than when it’s wrapped up in a massive data center move with a hard cutoff deadline, thanks to an existing hosting contract expiring. Both cases take an existing application stack and move it to the cloud, but the different scope and time constraints dramatically affect the actual migration process.
We’ll cover them in more detail in the next post, but the four patterns we’ve identified are:
- Developer-led – where a dev team starts building something in the cloud outside normal processes, and pulls the rest of the organization in behind them.
- Data center transformation – an operations-led process defined by an organization planning a methodical migration out of existing data centers and into the cloud, sometimes over a decade or more.
- Snap migrations – where an enterprise is forced out of some or all of their data centers on a short timeline, due to contract renewals or other business drivers.
- Native new build – where the organization plans to build a new application (or applications) completely in the cloud using native technologies.
You likely noticed that we didn’t mention some common terms like “refactor” and “new to cloud”. Those are important concepts but we consider those options on the journey, not the definition of the journey itself. Our four patterns are defined by the drivers for your cloud migration and your desired end-state.
Using the Cloud Adoption Patterns
The adoption patterns provide a framework to think about your upcoming (or in-process) journey, and help identify both strategies for success and potential points of failure. These aren’t prescriptive like the Cloud Security Maturity Model or the Cloud Controls Matrix; they won’t tell you exactly which controls to implement, but are instead more helpful in choosing a path, defining priorities, mapping architectures, and adjusting processes.
Going back to our mountain climbing analogy, the cloud adoption patterns point you down the right path and help you decide which gear to take, but it’s still up to you to load your pack, know how to use the gear, plan your stops, and remember to wear sunscreen. The patterns represent a set of characteristics we consistently see based on how organizations move to cloud. Any individual organization might experience multiple patterns across different projects. For example, a single project might behave more like a startup while you may concurrently be running a larger data center migration.
In our next post we will detail the patterns with their defining characteristics. You can use this to determine your overall organizational journey as well as those for individual projects that might not be totally aligned with the organization at large. To help you better internalize these patterns we will provide fictional examples based on our real experiences and projects. Once you know which path you are on, our final sections will make top line recommendations for network operations and security and tie back to our examples to show how they play out in real life. We will also highlight the most common pitfalls and their potential consequences.
This research should help you better understand which approaches will work best for your project in your organization. We are focusing this first round on networking, but future work will build on this base and cover additional operational areas.
Cloud migrations may be tough and confusing. While nothing will let you skip over the hard work, learning the lessons of those who have already climbed the mountain will save costs, reduce frustrations, and increase the chances of success.
For Rich and me, it seems like forever that we’ve been doing this cloud thing. We previewed the first CCSK class back at RSAC 2011, so we’re closing in on 10 years of hands-on, in the weeds cloud stuff. It’s fundamentally changed Securosis, and we ended up as founders of DisruptOps as well.
Yet as the cloud giveth, it also taketh away. Adrian’s unique perspective on application and cloud security made him a great candidate to join Bank of America, so he did. It’s a great opportunity, but we’ll certainly miss having him around during RSAC week. Especially since it means I’ll have to get the aspirin and Tums for you derelicts.
But that’s not this year’s DRB theme. We picked (IM)MATURITY because it’s hard to keep in mind that we’re still in the very early stages of the cloud disruption of IT. The questions we get now are less “what is this cloud thing?” and more “what does the journey look like?” We didn’t have a decent answer, so we set out to find one.
That led us to partner with our friends at IANS to develop a Cloud Security Maturity Model that gives you a sense of the cloud journey and helps you understand how to increase your cloud security capability and maturity. At this year’s DRB, we’ll have the model on hand, as well as an online diagnostic where you can do a self-assessment against the model. We may even have a new strategic relationship to announce at the breakfast. (hush, hush – don’t tell anyone)
Let’s celebrate both our maturity in the security space (yes, this is my 24th RSA Conference) while acknowledging the immaturity of securing the cloud by once again holding the most kick-ass breakfast of the year. There are breakfast impostors now, filling up the TableTop Tap House on the other mornings of the conference. But we are the true breakfast innovators of the security industry. There can be only ONE Disaster Recovery Breakfast.
Kidding aside, the DRB happens only because of the support of our long-time supporters LaunchTech and DisruptOps, and our media partner Security Boulevard. We’re excited to welcome IANS, Cloud Security Alliance, Highwire PR, and AimPoint Group to the family as well. Please make sure to say hello and thank them for helping support your recovery.
As always, the breakfast will be Thursday morning of RSA Week (February 27) from 8-11 at Tabletop Tap House in the Metreon (fka Jillian’s). It’s an open door – come and leave as you want.
The breakfast spread will be awesome (it always is), and the bar will be open. I am still drinking Decaf, but I’ve traded in my Bailey’s for a little Amarula after sampling the nectar on my recent trip to South Africa.
Please always always always remember what the DR Breakfast is all about. There will be no spin, no magicians, and we’re pretty sure Rich will keep his pants on – it’s just a place to find a bit of respite amongst the maelstrom of RSAC.
To help us estimate numbers, RSVP to rsvp (at) securosis (dot) com.
I never thought I would say this, but I am leaving Securosis. By the time you read this I will have started a new position with Bank of America. I have been asked to help out with application and cloud security efforts. I have been giving a lot of thought to what I like to do, what makes me happy, and what I want to do with the rest of my career, and I came to the realization it is time for a change. There are aspects of the practice of security which I can never explore with Securosis or DisruptOps. The bank offers many challenges – and operates at a scale – which I have never experienced. That, and I will get to work with a highly talented team already in place. I could not really have written a better job description for myself, so I am jumping at this opportunity.
12 years ago I sat down with Rich to discuss, “What comes next?” when the demise of my former company was imminent. When he asked me to consider joining him, I asked, “What exactly is it you do?” I really didn’t know what analysts actually do, nor how the profession works. Mike will tell you I still don’t, and he is probably right. When I joined, friends and associates asked, “What the hell are you doing?” and said, “That is not what you are good at!”, and told me Securosis would never survive – there was no way we could compete against the likes of Gartner, Forrester, and IDC. But Securosis has been an amazing success. After a few years I think we proved that “No Bullshit” research, and our open Totally Transparent Research model, both work. We have been very lucky to have found such a workable niche, and had so much fun filling it.
But more than anything I am very thankful for being able to work with Mike and Rich over the last decade. I simply could not ask for better business partners. Both are smart to the point of being prescient, tremendously supportive, and have an intuitive grasp of the security industry. And when we get it wrong – which we have done more than we like to admit – we learn from our mistakes and move on. I recently read a quote from Chuck Akre: “Good judgement comes from experience, and experience comes from bad judgement.” We always wondered how three guys with big egos and aspirations could coexist, but I think our ability to work as a team has been due in large part to learning from each other, learning from our mistakes, and constantly trying to improve the business. And we have constantly pushed this business to improve and move in new directions: from pure research, to training, to the research Nexus, to cloud security, security consulting, investment due diligence, and eventually launching DisruptOps. Each and every change was to address something in the industry we felt was not being served. Even the Disaster Recovery Breakfast stems from that ethos – another way we wanted to do things differently. And we have gained a lot of – ahem – experience along the way. It has been one hell of a ride! Thank you, Rich and Mike.
A hearty thank you to Chris Pepper for enduring my writing style and lack of grammar all these years. Early on one reader went so far as to compare my writing to Nazi atrocities against the English language, which was uncalled for, but perhaps not so far off the mark. Chris has helped me get a lot better, and for that I am very grateful.
Finally, I wanted to thank all the readers of the posts, Friday Summaries, Incites, and research papers we have produced over the years. Some 15,000 blog posts, hundreds of research papers, and more webcasts than I can count. The support of the security community has made this work so rewarding. Thank you all for participating and helping us make Securosis a success.
Posted under: For Research Library
Today we are launching our 2019 updated research paper from our recent series, Understanding and Selecting RASP (Runtime Application Self-Protection). RASP was part of the discussion on application security in just about every one of the hundreds of calls we have taken, and it’s clear that there is a lot of interest – and confusion – on the subject, so it was time to publish a new take on this category. And we would like to heartily thank Contrast Security for licensing this content. Without this type of support we could not bring this level of research to you, both free of charge and without requiring registration. We think this research paper will help developers and security professionals who are tackling application security from within understand what other security measures are at their disposal to protect application stacks from attack.
And to be honest, we were surprised by the volume of questions being asked. Each team was either considering RASP, or already engaged in a proof-of-concept with a RASP vendor. This was typically in response to difficulties with existing Web Application Firewalls (WAF), as those platforms have not fared well as development has gotten more agile. Since 2017 we have engaged in over 250 additional conversations on what has turned into a ‘DevSecOps’ phenomenon, with both security and development groups asking about RASP, how it deploys, and the realistic benefits it might provide. And make no mistake, it was not just IT security asking about WAF replacements, but security and development – facing a mountain of ‘technical debt’ in security defects – asking about monitoring and blocking malicious requests in production.
In this paper we cover what RASP is, how it works, use cases and how to differentiate one RASP product from another. And we address the perspectives and specific concerns of IT, application security and developer audiences.
Again, thank you to Contrast Security for licensing this research. You can download from the attachment link below, or from the research library. And you can tune into our joint webcast on November 19 by registering here: Evaluating RASP Platforms.
Posted under: Heavy Research
As mentioned in an earlier section, DevOps is not all about tools and technology – much of its success lies in how people work within this model. We have already gone into great detail about tools and process, and we approached much of the content from the perspective of security practitioners getting on board with DevOps. And since this paper is geared towards helping security folks, here we outline their role in a DevOps environment. We hope this summation helps you work with these other teams and reduce friction.
And while we have intentionally called this research paper Enterprise DevSecOps, keep in mind your development and IT counterparts think there is no such thing. To them security becomes part of the operational process of integrating and delivering code. We call security out as a separate thing because, while woven into the DevOps framework, it’s substantially more difficult for security personnel to fit in. You need to look at how you can improve delivery of secure code without waste and without introducing bottlenecks in a development process you may not be intimately familiar with. The good news is that security fits nicely within a DevOps model, but you need to tailor things to work within the automation and orchestration employed by your organization to be successful.
The Security Pro’s Responsibilities
Learn the DevOps model: We have not even touched the theory and practice of DevOps in this paper. There is a lot for you to learn about base concepts and practices. To work in a DevOps environment you need to understand what it is and how it works. You need to understand the cultural and philosophical changes, and how they affect process, tooling, and priorities. You will need to understand your company’s approach to best integrate security tooling and metrics. Once you understand the mechanics of the development team, you’ll have a better idea of how to work with them, in the context of their process.
Learn how to be agile: Your participation in a DevOps team means you need to fit into DevOps — not the other way around. The goal of DevOps is fast, faster, fastest: small iterative changes that offer quick feedback, ultimately reducing Work In Progress (WIP). Small, iterative changes to improve security fit this model. You will prioritize delivery of secure software over delivery of new features, and changing a longstanding culture of ‘feature first’ is often a huge undertaking. You need to adjust requirements and recommendations so they fit into the process, often simplified into small steps, with enough information for the tasks to be both automated and monitored. You can recommend manual code reviews or fuzz testing, so long as you understand where they fit within the process, and what can — and cannot — gate releases.
Educate: Our experience shows that one of the best ways to bring a development team up to speed in security is training: in-house explanations or demonstrations, third-party experts to help with application security or threat modeling, eLearning, or various commercial courses. The historical downside has been cost, with many classes costing thousands of dollars. You’ll need to evaluate how best to use your resources — the answer typically includes some eLearning for all employees, and select people attending classes and then teaching peers. On-site experts can be expensive, but an entire group can participate in training.
Grow your own support: There is simply no way for security teams to scale out their knowledge without help. This does not mean hundreds of new security employees, it means deputizing developers to help with product security. Security teams are typically small and outnumbered by developers 100:1. What’s more, security people are not present at most development meetings, so they lack visibility in day-to-day agile activities. To help extend the reach of the security team, see if you can get someone on each development team – what is called a ‘security champion’ – to act as a security advocate. This helps not only extend the security team’s reach, but also expand security awareness.
Help DevOps teams understand threats: Most developers don’t fully grasp how attackers approach a system, or what it means when a code or SQL injection attack is possible. The depth and breadth of security threats is outside their experience, and most firms do not teach threat modeling. The OWASP Top Ten is a good guide to the types of code deficiencies that plague development teams, but you should map these threats back to real-world examples: show the damage a SQL injection attack can cause, and explain how a Heartbleed-type vulnerability can completely expose customer credentials. Think of these real-world use cases as ‘shock and awe’, which goes a long way toward helping developers understand why security matters.
Advise on remediation practices: Your security program is inadequate if it simply says to “encrypt data” or “install a WAF”. All too often, developers and IT have a singular idea of what constitutes security, centered on a single tool they want to set and forget. Help build out the elements of the security program, including both in-code enhancements and supporting tools. Teach how each of those helps address specific threats, and offer help with deployment and policy setup. In the past, firms produced code style guides to teach younger developers what properly written code looks like; typically these are not online. Consider a wiki page for secure coding practices and other reference materials that are easily discovered and easily readable by people without a security background.
Help evaluate security tools: It is unusual for people outside security to fully understand what security tools do, or how they work. So you can help in two ways. First, help developers select tools. Misconceptions are rampant, and not just because vendors over-promise capabilities; it is also uncommon for developers to evaluate the effectiveness of code scanners, activity monitors, or even patch management systems. In your role as advisor it is your responsibility to help DevOps understand what the tools can provide and what fits within your testing framework. Sure, you might not be able to evaluate the quality of an API, but you can tell when a product fails to deliver meaningful results. Second, help position the expenditure, as it’s not always clear to the people holding the purse strings how specific tools address security and compliance requirements. You should specify functional and reporting requirements for the tool that meet the business needs.
Help with priorities: During our research many security pros told us that all vulnerabilities started looking like high priorities, and it was incredibly difficult to differentiate a vulnerability with real impact on the organization from one without. The field of exposure analysis is outside the skill set of developers. You need to help fill this gap, because not every vulnerability poses real risk. And security folks have a long history of sounding like the terrorism threat scale, with vague warnings about “severe risk” and “high threat levels”. None of these warnings are valuable without mapping a threat to possible exploitations, what the exploit might mean to the business, and what you can do to address and reduce the risk. For example, you might be able to remediate a critical application vulnerability in code, patch supporting systems, disable the feature if it’s not critical, block with IDS or firewalls, or even filter with WAF or RASP technologies. There are also cases where code exploitation cannot actually harm the business, so ‘do nothing’ is the right answer. Rather than the knee-jerk “OMG! Fix it now!” reaction we have historically seen, there are typically several options to address a vulnerability, so presenting tradeoffs to a DevOps team allows them to select the best fit.
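As a sketch of this kind of tradeoff analysis, a simple exploitability-times-impact ranking can separate fix-now findings from those that can be mitigated or deferred. The fields, ratings, and threshold below are illustrative assumptions, not a standard:

```python
# Illustrative triage sketch: scores and fields are assumptions,
# not a standard. Each finding is rated 1-5 on two axes.

def risk_score(finding):
    """Combine exploitability and business impact into a 1-25 score."""
    return finding["exploitability"] * finding["impact"]

def triage(findings, threshold=15):
    """Split findings into fix-now and mitigate/defer buckets."""
    ranked = sorted(findings, key=risk_score, reverse=True)
    urgent = [f for f in ranked if risk_score(f) >= threshold]
    deferred = [f for f in ranked if risk_score(f) < threshold]
    return urgent, deferred

findings = [
    {"id": "SQLI-1", "exploitability": 5, "impact": 5},  # internet-facing
    {"id": "XSS-7",  "exploitability": 3, "impact": 2},  # internal tool only
    {"id": "CONF-2", "exploitability": 1, "impact": 4},  # needs local access
]

urgent, deferred = triage(findings)
print([f["id"] for f in urgent])    # ['SQLI-1']
print([f["id"] for f in deferred])  # ['XSS-7', 'CONF-2']
```

Presenting the deferred bucket alongside mitigation options (a WAF rule, a feature flag, a patch) is what turns the list into a conversation rather than a mandate.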
Write tests: DevOps has placed some operations and release management personnel in the uncomfortable position of having to learn to script, code, and open their work to public review. It pushes people outside their comfort zones in the short term, but that is a key part of building a cohesive team in the medium term. It is perfectly acceptable for security folks to contribute tests to the team: scans that validate certificates, checks for known SQL injection attacks, open source tools for locating vulnerabilities, and so on. If you’re worried about it, help out and integrate unit and regression tests. Integrate and ingratiate! To participate in a DevOps team, where automation plays a key part, you will likely need to know how to write scripts or templates. The good news is that your policies are now embodied in the definition of the environment. Don’t be afraid that you don’t know source code control or the right format for the scripts; this is an area where developers are usually keen to assist. You may need to learn a bit of scripting before your tests can be integrated into the build and deployment servers, but then you’ll do more than preach security — you can contribute!
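As an example of the kind of test a security pro can contribute, here is a minimal sketch (in Python, with deliberately simplified illustrative patterns) that scans a source tree for committed credentials and private keys, suitable for wiring into a build job:

```python
# Sketch of a contributed security test: flag files that appear to
# contain secrets. Patterns are simplified assumptions; real scans
# should use a maintained ruleset.
import re
import tempfile
from pathlib import Path

SECRET_PATTERNS = [
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key id
    re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
]

def scan_tree(root):
    """Return (path, pattern) pairs for every suspicious match."""
    findings = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        for pattern in SECRET_PATTERNS:
            if pattern.search(text):
                findings.append((str(path), pattern.pattern))
    return findings

# Quick demonstration against a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.py").write_text('password = "hunter2"\n')
    Path(tmp, "main.py").write_text("x = 1\n")
    hits = scan_tree(tmp)
print(len(hits))  # 1 -- in a build job, any hit should fail the build
```

In a real pipeline the script would exit non-zero on any hit, which is all a build server needs to fail the build.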
In this section we show you how to weave security into the fabric of your DevOps automation framework. We are going to address the questions “We want to integrate security testing into the development pipeline, and are going to start with static analysis. How do we do this?”, “We understand ‘shift left’, but are the tools effective?” and “What tools do you recommend we start with, and how do we integrate them?”. As DevOps encourages testing in all phases of development and deployment, we will discuss what a build pipeline looks like, and the tooling appropriate for each stage. The security tests typically sit side by side with the functional and regression tests your quality assurance teams have likely already deployed. And beyond those typical post-build testing points, you can include testing on a developer’s desktop prior to check-in, in the code repositories before and after builds, and in pre-deployment staging areas.
During a few of the calls we had, several of the senior security executives did not know what constituted a build process. This is not a condemnation – many people in security have not participated in software production and delivery – so we want to outline the process and the terminology used by developers. If you’re already familiar with this, skip forward to ‘Building a Security Toolchain’.
Most of you reading this will be familiar with “nightly builds”, where all code checked in the previous day is compiled overnight. And you’re just as familiar with the morning ritual of sipping coffee while you read through the logs to see if the build failed and why. Most development teams have been doing this for a decade or more. The automated build is the first of many steps that companies go through on their way toward full automation of the processes that support code development. Over the last several years we have mashed our ‘foot to the floor’, leveraging more and more automation to accelerate the pace of software delivery.
The path to DevOps is typically taken in two phases: first with continuous integration, which manages the building and testing of code; then continuous deployment, which assembles the entire application stack into an executable environment. And at the same time, there are continuous improvements to all facets of the process, making it easier, faster and more reliable. It takes a lot of work to get here, and the scripts and templates used often take months just to build out the basics, and years to mature them into reliable software delivery infrastructure.
The essence of Continuous Integration (CI) is that developers regularly check in small iterative code advances. For most teams this involves many updates to a shared source code repository, and one or more builds each day. The key is smaller, simpler additions, where we can more easily and quickly find code defects. These are essentially Agile concepts, implemented in processes which drive code, rather than processes that drive people (such as scrums and sprints). The definition of CI has evolved over the last decade, but in a DevOps context CI implies that code is not only built and integrated with supporting libraries, but also automatically dispatched for testing. Additionally, DevOps CI implies that code modifications are not applied to branches, but directly into the main body of the code, reducing the complexity and integration nightmares that can plague development teams.
This sounds simple, but in practice it requires considerable supporting infrastructure. Builds must be fully scripted, and the build process runs as code changes are made. With each successful build the application stack is bundled and passed along for testing. Test code for unit, functional, regression, and security testing is built before it is needed, checked into repositories, and made part of the automated process. Tests commence automatically whenever a new application bundle is available, which means new tests are applied with each new build as well. It also means that before tests can be launched, test systems must be automatically provisioned, configured, and seeded with the necessary data. Automation scripts must provide monitoring of each part of the process, and communicate success or failure back to Development and Operations teams as events occur. Creating the scripts and tools to make all this possible requires Operations, Testing, and Development teams to work closely together.
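At its simplest, the orchestration described above is a chain of scripted stages, each gating the next, with results reported back to the teams. In this sketch the stage bodies and the notification hook are stand-ins for real build, test, and chat/email integrations:

```python
# Minimal pipeline driver sketch. The stage bodies and the notify
# hook are stand-ins for real build, test, and chat/email integrations.

def notify(message):
    print(f"[pipeline] {message}")  # stand-in for a real notification hook

def run_pipeline(stages):
    """Run stages in order; stop and report on the first failure."""
    for name, stage in stages:
        if not stage():
            notify(f"stage '{name}' FAILED")
            return False
        notify(f"stage '{name}' passed")
    return True

stages = [
    ("build",          lambda: True),  # compile and bundle the stack
    ("unit tests",     lambda: True),
    ("security scans", lambda: True),  # SAST, composition analysis, ...
    ("deploy to test", lambda: True),  # provision and seed test systems
]

print(run_pipeline(stages))  # True
```

Real CI servers (Jenkins, Bamboo, and the like) implement this loop for you; the point is that security scans are just another stage that can stop the line.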
The following graphic shows an automated build pipeline, including security test points, for containers. Again, this level of orchestration does not happen overnight; rather it is an evolutionary process that takes months to establish the basics and years to mature. But that is the very essence of continuous improvement.
Continuous Deployment looks very similar to CI, but focuses on releasing software to end users rather than building it. It involves similar packaging, testing, and monitoring tasks, with some additional wrinkles. Upon successful completion of a build cycle, the results feed the Continuous Deployment (CD) process. CD takes another giant step forward in automation and resiliency by automating release management, provisioning, and final configuration for the application stack, then launching the new application code.
When we talk about CD, there are two ways people embrace this concept. Some teams simply launch the new version of their application into an existing production environment. The CD process automates the application layer, but not the server, data or network layers. We find this common with on-premise applications and in private cloud deployments, and some public cloud deployments still use this model as well.
A large percentage of the security teams we spoke with are genuinely scared of Continuous Deployment. They state “How can you possibly allow code to go live without checks and oversight!”, missing the point that the code does not go live until all of the security tests have been passed. And some CI pipelines contain manual inspection points for some tests. In our experience, CD means more reliable and more secure releases. CD addresses dozens of issues that plague code deployment — particularly in the areas of error-prone manual changes, and discrepancies in revisions of supporting libraries between production and development. Once sufficient testing is in place, there should be no reason to mistrust CD.
Not all firms release code into production every day; in fact less than 10% of the firms we speak with truly release continually, but some notable companies like Netflix, Google and Etsy have automated releases once their tests have completed. But most firms (i.e.: those not in the content or retail verticals) do not have a good business need to release updates multiple times a day, so they don’t.
Managed / Blue-Green Deployments
Most firms have a slower release cycle, often with a ‘go live’ cadence of every one to three sprints. We call these ‘managed releases’ as the execution and timing is manual, but most of the actions are automated. Plus these firms employ another very powerful technique: automated infrastructure deployment, where you cycle the entire infrastructure stack along with the application. These deployments rely upon automating the environment the software runs in; this can be as simple as standing up a Kubernetes cluster, leveraging OpenShift to run Terraform templates into Google GCP, or launching an entire AWS environment via CloudFormation templates. The infrastructure and the application are all code, so both are launched in tandem. This is becoming common in public cloud deployments.
This method of release offers significant advantages, and provides the foundation for what is called ‘Blue-Green’ (or Red-Black) deployment. Old and new code run side by side, close to mirror images, each on its own set of servers. While the old code (Blue) continues to serve user requests, the new code (Green) is exercised only by select users or test harnesses. A rollout is a simple redirection at the load balancer level, so internal users and live customers can be slowly redirected to the Green servers, essentially using them as testers for the new system. If the new code passes all required tests, the load balancer sends all traffic to the Green servers, the Blue servers are retired, and Green becomes the new Blue. If errors are discovered, the load balancers are pointed back to the old Blue code until a new patched version is available. This is essentially pre-release testing in production, with near-instantaneous rollback in the event defects or security problems are discovered.
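The cutover logic above can be sketched as a gradual traffic ramp with an error budget. The weights and threshold are illustrative assumptions, and the load balancer and health checks are abstracted into simple functions:

```python
# Blue-Green cutover sketch. Steps and error budget are illustrative;
# error_rate_green stands in for real health checks behind the balancer.

def shift_traffic(error_rate_green, steps=(0.05, 0.25, 0.5, 1.0),
                  max_errors=0.01):
    """Ramp the share of traffic sent to Green; roll back on errors."""
    for weight in steps:
        if error_rate_green(weight) > max_errors:
            return "blue"   # rollback: balancer points at old code again
    return "green"          # cutover complete; Blue can be retired

# Healthy release: error rate stays inside the budget at every step.
print(shift_traffic(lambda w: 0.001))                        # green
# Bad release: errors spike once real traffic hits the new servers.
print(shift_traffic(lambda w: 0.2 if w >= 0.25 else 0.001))  # blue
```

The design point is that rollback is just a routing decision, which is why Blue-Green makes ‘testing in production’ tolerable.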
Where to Test
Desktop Security Tests
Integrated Development Environments (IDEs) are the norm for most developers: Visual Studio, Eclipse, IntelliJ, and many more, some tailored to specific languages or environments. These desktop tools not only help with building code, but integrate syntax checkers, runtimes, terminals, packages, and lots of other features to make building code easier. Commercial security vendors offer plug-ins for the popular tools, usually providing a form of static analysis. Some of these tools are interactive, giving advice as code is written, while others check currently modified files on demand, before code is committed. There are even one or two that do not actually plug into the IDE, but work as standalone tools run prior to check-in. As these scans typically cover just the module or container the developer is working on, they run very quickly. And getting the security scans right at this stage lowers the likelihood that the build server will find and fail on security issues after code is checked in.
The development teams we have worked with that use these tools find them effective, delivering as promised. Still, many individual developers we speak with do not like these security plug-ins, finding them noisy and distracting. We recommend the use of desktop scanners when possible, but recognize the cultural impediment to adoption, and caution that security teams may need to help change the culture and let adoption grow over time.
Code Repository Scanning
Source code management, configuration management databases, container registries and similar types of tools store code and help with management tasks like versioning, IAM, and approval processes. From a security standpoint, several composition analysis vendors have integrated their products to check that open source versioning is correct and that the platforms in use do not contain known CVEs. Additional facilities to create digital fingerprints for known good versions, and other version control features, are becoming more common.
Build Server Integration
Build servers build applications. Applications are often assembled from multiple sources of in-house developed and open source code, so it is common for many ‘artifacts’ to be used in construction. Build servers like Jenkins and Bamboo fortunately have the hooks needed to massage these artifacts, both before and after a build. This is commonly how testing is integrated into the build pipeline, and leveraging this capability will be central to your security test integration. Composition analysis, SAST, and custom tests are commonly integrated at this stage.
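A post-build hook of this sort often boils down to a small gate script the build server invokes: run each security tool, and fail the build if any exits non-zero. The commands below are placeholders, not real scanner CLIs:

```python
# Post-build security gate sketch: the build server calls this after
# artifacts are produced. The commands below are placeholders, not
# real scanner CLIs.
import subprocess
import sys

CHECKS = [
    [sys.executable, "-c", "print('composition analysis: ok')"],
    [sys.executable, "-c", "print('sast scan: ok')"],
]

def run_checks(checks):
    """Run each tool; collect the commands that exited non-zero."""
    failed = []
    for cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed

failures = run_checks(CHECKS)
print("gate failures:", failures)  # []
# In the real hook: sys.exit(1 if failures else 0) to fail the build.
```

Because the gate is just exit codes, the same script works unchanged under Jenkins, Bamboo, or any other build server that treats a non-zero exit as failure.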
For ‘code complete’ or systemic testing, where all the parts of the application and the supporting application stack are assembled, a pre-production ‘staging area’ is set up to mimic the production environment and facilitate a full battery of tests. We are seeing several trends of late, and pre-production testing is one of them. As public cloud resources allow for rapid elasticity and on-demand resource procurement, firms are spinning up test environments, running their QA and security tests, then shutting them back down again to reduce costs. In some cases this is used for DAST testing that used to be performed in production. And in most cases this is how the Blue-Green deployment model is leveraged, running many different types of tests in a new environment parallel to the existing production environment.
Building a Security Toolchain
Static Application Security Testing (SAST) examines all code — or runtime binaries — to support a thorough search for common vulnerabilities. These tools are highly effective at finding flaws, even in code that has been manually reviewed. Your selection criteria will likely boil down to speed of scan, ease of integration, readability of results, and lack of false positives. Most of these platforms have gotten much better at providing analysis that is useful for developers, not just security geeks. And many of the products are being updated to offer full functionality via APIs or build scripts. If you have a choice, select tools with APIs for integration into the DevOps process, and which don’t require “code complete”. We have seen a slight reduction in use of these tests, as they often take hours or days to run — in a DevOps environment that can prevent them from running inline as a gate to certification or deployment. As mentioned above, most teams are adjusting to support out-of-band — or what we are calling ‘parallelized’ — testing for static analysis. We highly recommend keeping SAST testing inline if possible, focusing on new sections of code to reduce runtime.
Rather than scanning code or binaries like SAST, Dynamic Application Security Testing (DAST) dynamically ‘crawls’ an application’s interface, testing how it reacts to various inputs. These scanners cannot see what’s going on behind the scenes, but they offer valuable insight into how code behaves, and can flush out errors in dynamic code paths which other tests may not see. The good news is they tend to have low rates of false positives. These tests are typically run against fully built applications, and can be destructive, so the tools often offer settings to run more aggressively in test environments. And like SAST, DAST may require some time to fully scan code, so inline tests that gate a release are often run against new code only, while full application sweeps run ‘in parallel’.
Composition and Vulnerability Analysis
Composition analysis tools check versions of open source libraries to assess open source risk, both from security vulnerabilities and potential licensing issues. Things like Heartbleed, misconfigured databases, and Struts vulnerabilities may not be part of your application testing at all, but they are all critical application stack vulnerabilities. Some people equate vulnerability testing with DAST, but there are other ways to identify vulnerabilities. In fact there are several kinds of vulnerability scans; some look at settings like platform configuration, patch levels, or application composition to detect known vulnerabilities. Make sure you broaden your scans to include your application, your application stack, and the open source platforms that support it.
Manual Code Review
Some organizations find it more than a bit scary to fully automate deployment, so they want a human to review changes before new code goes live — we understand. But there are very good security reasons for review as well. In an environment as automation-centric as DevOps, it may seem antithetical to endorse manual code reviews or security inspection, but manual review is still highly desirable. Some types of vulnerabilities are not detected by scanning tools. Manual reviews often catch obvious stuff that tests miss and developers overlook on their first (and only) pass. And developers’ ability to write security unit tests varies. Whether through developer error or reviewer skill, people writing tests miss stuff which manual inspections catch. Your tool belt should include manual code inspection — at least periodic spot checks of new code, and of things like Dockerfiles which are often omitted from scans.
Many firms still leverage Web Application Firewalls, but usage is on the decline. We are seeing increased usage of Runtime Application Self Protection in production to augment logging and protection efforts. These platforms instrument code, provide runtime protection, and in some cases identify which lines of application code are vulnerable.
DevOps requires better monitoring, to collect the metrics you need to make adjustments based on operational data. To validate that new application deployments are functioning, monitoring and instrumentation are more commonly built in. In some cases these are custom packages and ELK stacks; in others it is as simple as leaving logging and ‘debug’ style statements – traditionally used during the development phase – turned on in production. This is more prominent in public cloud IaaS, where you are fully responsible for data and application security, and native logs do not provide adequate visibility.
Security Unit Tests
Unit testing is where you check small sub-components or fragments (‘units’) of an application. These tests are written by programmers as they develop new functions, and are commonly run by developers prior to code check-in, but possibly within the build pipeline as well. These tests are intended to be long-lived, checked into the source repository along with new code, and run by every subsequent developer who contributes to that code module. For security these may be straightforward — such as SQL injection against a web form — or more sophisticated attacks specific to the function under test, such as business logic attacks — all to ensure that each new bit of code correctly reflects the developers’ intent. Every unit test focuses on specific pieces of code — not systems or transactions. Unit tests attempt to catch errors very early in the process, per Deming’s assertion that the earlier flaws are identified, the less expensive they are to fix. In building out unit tests you will need supporting developer infrastructure to embody your tests, and you will need to encourage the team to take testing seriously enough to build good tests. Having multiple team members contribute to the same code, each writing unit tests, helps identify weaknesses a single programmer might not consider.
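A minimal sketch of such a security unit test, using an in-memory SQLite database and a hypothetical `find_user` helper as the function under test, verifies that a classic injection payload is treated as a literal string:

```python
# Security unit test sketch: find_user is a hypothetical helper under
# test. Because input is bound as a parameter, the injection payload
# is treated as a literal string and matches nothing.
import sqlite3

def find_user(conn, username):
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

assert find_user(conn, "' OR '1'='1") == []      # payload is inert
assert find_user(conn, "alice") == [("alice",)]  # normal lookup works
print("injection unit test passed")
```

Checked into the repository next to the query helper, this test keeps running for every developer who later touches that module.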
Security Regression Tests
A regression test verifies that recently changed code still functions as intended. In a security context this is particularly important to ensure that vulnerabilities remain fixed. DevOps regression tests are commonly run in parallel with functional tests — after the code stack is built out. In some cases they need a dedicated environment, because security testing can be destructive, causing side effects unacceptable on production servers with real customer data. Virtualization and cloud infrastructure are leveraged to expedite instantiation of new test environments. The tests themselves are typically home-built test cases which exploit previously discovered vulnerabilities, either as unit or systemic tests. The author uses these types of tests to ensure that credentials like passwords and certificates are not included in files, and that infrastructure does not allow port 22 or 3389 access.
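A sketch of those last two checks – secrets committed to files, and forbidden administrative ports – might look like the following. The patterns and port list are illustrative; a real regression suite would walk actual source trees and firewall rules:

```python
import re

SECRET_PATTERNS = [
    re.compile(r"password\s*=\s*['\"].+['\"]", re.IGNORECASE),
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
]
FORBIDDEN_PORTS = {22, 3389}  # SSH and RDP must stay closed

def file_leaks_secret(text):
    """Regression check: previously purged credential patterns must not reappear."""
    return any(p.search(text) for p in SECRET_PATTERNS)

def firewall_regressions(open_ports):
    """Return any forbidden ports that have crept back into the ruleset."""
    return sorted(FORBIDDEN_PORTS & set(open_ports))
```

Run on every build, these checks keep a fixed vulnerability fixed, which is the whole point of a security regression test.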
Chaos engineering is the practice of introducing random failures, usually in the production environment, to see how application environments handle adverse conditions. Firms like Netflix have pioneered this field to force their development teams to understand common failure types, and build graceful failure and recovery into their code. From a security standpoint, if attackers can force an application into a bad state, they can often coerce it into performing tasks it was never intended to perform. Building ruggedness into the code improves reliability and security.
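At a very small scale the same idea can be exercised in code: inject faults into a dependency and verify the caller degrades gracefully. This sketch is purely illustrative – real chaos engineering operates on live infrastructure, not a toy wrapper – and every name here is hypothetical:

```python
import random

def chaotic(fn, failure_rate=0.3, seed=42):
    """Wrap a dependency so calls randomly fail, simulating an unreliable service."""
    rng = random.Random(seed)
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return fn(*args, **kwargs)
    return wrapper

def fetch_price(sku):
    return {"sku": sku, "price": 100}

def resilient_fetch(fetch, sku, retries=5, default=None):
    """Graceful degradation: retry a few times, then fall back to a safe default."""
    for _ in range(retries):
        try:
            return fetch(sku)
        except ConnectionError:
            continue
    return default

flaky = chaotic(fetch_price)
result = resilient_fetch(flaky, "A1", default={"sku": "A1", "price": None})
```

Whether the injected fault fires or not, the caller always returns something sane rather than propagating the bad state an attacker could exploit.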
At its simplest, fuzz testing is essentially throwing lots of random garbage at applications to see whether any particular (type of) garbage causes errors. Many dynamic scanning vendors will tell you they provide fuzzing. They don’t. Go to any security conference — Black Hat, DefCon, RSA, or B-Sides — and you will see that most security researchers prefer fuzzing to find vulnerable code. It has become essential for identifying misbehaving code which may be exploitable. Over the last 10 years, with Agile development processes and even more with DevOps, we have seen a steady decline in use of fuzz testing by development and QA teams. This is because it’s slow; running through a large body of possible malicious inputs takes substantial time. This is a little less of an issue with web applications, because attackers can’t throw everything at the code, but much more problematic for applications delivered to users (including mobile apps, desktop applications, and automotive systems). We almost excluded this section because it is rare to see true fuzz testing in use, but for critical systems periodic – and out-of-band – fuzz testing should be part of your security testing efforts.
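For illustration, a bare-bones fuzz harness looks something like this; `parse_record` is a deliberately trivial stand-in for the code under test, and the only pass criterion is that garbage input never produces anything worse than a controlled rejection:

```python
import random

def parse_record(data):
    """Toy length-prefixed parser: the first byte declares the payload length."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload

def fuzz(parser, iterations=10_000, seed=1):
    """Throw random garbage at the parser; anything but ValueError is a finding."""
    rng = random.Random(seed)
    findings = []
    for _ in range(iterations):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        try:
            parser(blob)
        except ValueError:
            pass  # controlled rejection: acceptable
        except Exception as exc:
            findings.append((blob, exc))  # unexpected crash: potential vulnerability
    return findings

findings = fuzz(parse_record)
```

Real fuzzers (AFL, libFuzzer, and the like) add coverage feedback and input mutation on top of this loop, which is exactly why thorough runs take so much time.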
Risk and Exposure Analysis
Integrating security findings from application scans into bug tracking systems is not that difficult technically. Most products offer it as a built-in feature. The hard part is figuring out what to do with the data once obtained. Is a discovered security vulnerability a real issue? If it is not a false positive, can the vulnerability be exploited? What is its criticality and priority, relative to everything else? And if we choose to address it, which of our options (a unit test, a patch, RASP) should we use?
Another aspect to consider is how this information is distributed without overloading stakeholders. With DevOps you need to close the loop on issues within infrastructure and security testing as well as code. Dev and Ops offer different possible solutions to most vulnerabilities, so the people managing security need to include operations teams as well. Patching, code changes, blocking, and functional whitelisting are all options for closing security gaps, so you’ll need both Dev and Ops to weigh the tradeoffs.
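A sketch of that triage-and-routing logic, with hypothetical finding records and an assumed suppression list for known false positives:

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def triage(findings, suppressions=frozenset()):
    """Drop known false positives, then order the backlog by severity."""
    real = [f for f in findings if f["id"] not in suppressions]
    return sorted(real, key=lambda f: SEVERITY_RANK[f["severity"]])

def route(finding):
    """Ops can virtually patch infrastructure issues; code flaws go to Dev."""
    return "ops" if finding["category"] in {"config", "network"} else "dev"

findings = [
    {"id": "F2", "severity": "high", "category": "config"},
    {"id": "F1", "severity": "critical", "category": "sqli"},
    {"id": "F3", "severity": "low", "category": "xss"},
]
queue = triage(findings, suppressions={"F3"})
```

The routing rule is the part worth arguing over internally: it encodes, in one place, which team owns which class of fix.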
Posted under: Heavy Research
This post is intended to help security folks create an outline or structure for an application security program. We are going to answer such common questions as “How do we start building out an application security strategy?”, “How do I start incorporating DevSecOps?” and “What application security standards should I follow?”. I will discuss the Software Development Lifecycle (SDLC), introduce security items to consider as you put your plan in place, and reference some application security standards for use as guideposts for what to protect against. This post will help your strategy; the next one will cover tactical tool selection.
Security Planning and your SDLC
A Secure Software Development Lifecycle (S-SDLC) essentially describes how security fits into the different phases of a Software Development Lifecycle. We will look at each phase in an SDLC and discuss which security tools and techniques are appropriate. Note that an S-SDLC is typically drawn as a waterfall development process, with different phases in a linear progression, but that’s really just for clearer depiction – the actual SDLC which is being secured is as likely to be Agile, Extreme, or Spiral as Waterfall. There are good reasons to base an S-SDLC on a more modern SDLC; but the architecture, design, development, testing, and deployment phases all map well to development stages in any development process. They provide a good jumping-off point to adapt current models and processes into a DevOps framework.
As in our previous post, we want you to think of the S-SDLC as a framework for building your security program, not a full step-by-step process. We recognize this is a departure from what is taught in classrooms and wikis, but it is better for planning security in each phase.
Define and Architect
- Reference Security Architectures: Reference security architectures exist for different types of applications and services, including web applications, data processing applications, identity and access management services for applications, stream/event processing, messaging, and so on. The architectures are even more effective in public cloud environments, Kubernetes clusters, and service mesh environments – where we can tightly control via policy how each application operates and communicates. With cloud services we recommend you leverage service provider guidelines on deployment security, and while they may not call them ‘reference security architectures’ they do offer them. Educate yourself on the application platforms and ask software designers and architects which methods they employ. Do not be surprised if for legacy applications they give you a blank stare. But new applications should include plans for process isolation, segregation, and data security, with a full IAM model to promote segregation of duties and data access control.
- Operational Standards: Work with your development teams to define minimal security testing requirements, and critical and high priority issues. You will need to negotiate which security flaws will fail a build, and define the process in advance. You will probably need an agreement on timeframes for fixing issues, and some type of virtual patching to address hard-to-fix application security issues. You need to define these things up front and make sure your development and IT partners agree.
- Security Requirements: Just as with minimum functional tests which must run prior to code acceptance, you’ll have a set of security tests you run prior to deployment. These may be an agreed-upon battery of unit tests, written by your team, for specific threats. Or you may require that all OWASP Top Ten vulnerabilities be mitigated in code or supporting products, mapping each threat to a specific security control for all web applications. Regardless of what you choose, your baseline requirements should account for new functionality as well as old. A growing body of tests requires more resources for validation and can slow your test and deployment cycle over time, so you have some decisions to make regarding which tests can block a release vs. what you scan for post-production.
- Monitoring and Metrics: If you will make small iterative improvements with each release, what needs fixing? Which code modules are problematic for security? What is working and how can you prove it? Metrics are key to answering all these questions. You need to think about what data you want to collect and build it into your CI/CD and production environments to measure how your scripts and tests perform. That means you need to engage developers and IT personnel in collecting data. You’ll continually evolve collection and use of metrics, but plan for basic collection and dissemination of data from the get-go.
- Security Design Principles: Some application security design and operational principles offer significant security improvement. Things like ephemeral instances to aid patching and reduce attacker persistence, immutable services to remove attack surface, configuration management to ensure servers and applications are properly set up, templating environments for consistent cloud deployment, automated patching, segregation of duties by locking development and QA personnel out of production resources, and so on. Just as important, these approaches are key to DevOps because they make delivery and management of software faster and easier. It sounds like a lot to tackle, but IT and development pitch in as it makes their lives easier too.
- Secure the Deployment Pipeline: With both development and production environments more locked down, development and test servers become more attractive targets. Traditionally these environments run with little or no security. But the need for secure source code management, build servers, and deployment pipelines is growing. And as CI/CD pipelines offer an automated pathway into production, you’ll need at minimum stricter access controls for these systems – particularly build servers and code repositories. And given scripts running continuously in the background with minimal human oversight, you’ll need additional monitoring to catch errors and misuse. Many of the tools offer good security, with digital fingerprinting, 2FA, logging, role-based access control, and other security features. When deployed in cloud environments, where the management plane allows control of your entire environment, great care must be taken with access controls and segregation of duties.
- Threat Modeling: Threat modeling remains one of the most productive exercises in security. DevOps does not change that, but it does open up opportunities for security team members to instruct dev team members on common threat types, and to help plan out unit tests to address attacks. This is when you need to decide whether you will develop this talent in-house or engage a consultant as there really is no product to do this for you. Threat modeling is often performed during design phases, but can also occur as smaller units of code are developed, and sometimes enforced with home-built unit tests.
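The operational standards and security requirements above ultimately reduce to a gating decision in the build pipeline. A minimal sketch, with severity thresholds and a remediation budget standing in for whatever your teams actually negotiate:

```python
def gate_build(findings, fail_on=frozenset({"critical", "high"}), budget=0):
    """Break the build when blocking-severity findings exceed the agreed budget."""
    blocking = [f for f in findings if f["severity"] in fail_on]
    return {"passed": len(blocking) <= budget, "blocking": blocking}

result = gate_build([
    {"id": "APP-12", "severity": "high"},
    {"id": "APP-13", "severity": "low"},
])
clean = gate_build([{"id": "APP-13", "severity": "low"}])
```

The value of writing the policy down as code is that the negotiated agreement — which flaws fail a build, and how many are tolerable — is enforced identically on every run.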
Infrastructure and Automation First

Automation and Continuous Improvement are key DevOps principles, and just as valuable for security. As discussed in the previous post, automation is essential, so you need to select and deploy security tooling. We stress this because planning matters: development needs to know which tools and tests to deploy before they can deliver new code. Keep in mind that many security tools require some development skill to integrate, so plan either to get your staff to help, or to engage professional services. The bad news is that there is up-front cost and work to be done in preparation; the good news is that each and every future build will benefit from these efforts.
- Automation First: Remember that development is not the only group writing code and building scripts – operations is now up to their elbows as well. This is how DevOps helps bring patching and hardening to a new level. Operations’ DevOps role is to provide build scripts which build out the infrastructure for development, testing, and production servers. The good news is that you are now testing exact copies of production. Templates and configuration management address a problem traditional IT has struggled with for years: ad hoc undocumented work that ‘tweaks’ the environment to get it working. Again, there is a great deal of work to get environments fully automated – on servers, network configuration, applications, and so on – but it makes future efforts faster and more consistent. Most teams we spoke with build new machine images every week, and update their scripts to apply patches, updating configurations and build scripts for different environments. But this work ensures consistency and a secure baseline.
- Secure Code Repositories: You want to provide developers an easy way to get secure and (internally) approved open source libraries. Many clients of ours keep local copies of approved libraries and make it easy to get access to these resources. Then they use a combination of composition analysis tools and scripts, before code is deployed into production, to ensure developers are using approved versions. This helps reduce use of vulnerable open source.
- Security in the Scrum: As mentioned in the previous section, DevOps is process neutral. You can use Spiral, or Agile, or surgical-team approach as you prefer. But Agile Scrums and Kanban techniques are well suited to DevOps. Their focus on smaller, focused, quickly demonstrable tasks aligns nicely. We recommend setting up your “security champions” program at this time, training at least one person on each team on security basics, and determining which team members are interested in security topics. This way security tasks can easily be distributed to team members with interest and skill in tackling them.
- Test Driven Development: A core tenet of Continuous Integration is to never check in broken or untested code. The definitions of broken and untested are up to you. Rather than writing giant waterfall-style specification documents for code quality or security, you’re documenting policies in functional scripts and programs. Unit tests and functional tests not only define but enforce security requirements. Many development teams use what is called “test driven development”, where the tests to ensure desired functionality – and avoid undesired outcomes – are constructed along with the code. These tests are checked in and become a permanent part of the application test suite. Security teams do not leverage this type of testing enough, but it is an excellent way to detect security issues specific to your code which commercial tools cannot.
- Design for Failure: DevOps turns many long-held principles of both IT and software development upside down. For example durability used to mean ‘uptime’, but now it’s speed of replacement. Huge documents with detailed product specifications have been replaced by Post-It notes. And for security, teams once focused on getting code to pass functional requirements now look for ways to break applications before someone else can. This new approach of “chaos engineering”, which intentionally breaks application deployments, forces engineers to build in reliability and security. A line from James Wickett’s Gauntlt page: Be Mean To Your Code – And Like It expresses the idea eloquently. The goal is not just to test functions during automated delivery, but to really test the ruggedness of code, and substantially raise the minimum security of an acceptable release. We harden an application by intentionally pummeling it with all sorts of functional, stress, and security tests before it goes live – reducing the time required for security experts to test code hands-on. If you can figure out some way to break your application, odds are attackers can too, so build the test – and the remedy – before it goes live. You need to plan for these tests, and the resources needed to build them.
- Parallelize Security Testing: A problem common to all Agile development approaches is what to do about tests which take longer than a development cycle. For example we know that fuzz testing critical pieces of code takes longer than an average Agile sprint. SAST scans of large bodies of code often take an order of magnitude longer than the build process. DevOps is no different – with CI and CD code may be delivered to users within hours of its creation, and it may not be possible to perform complete white-box testing or dynamic code scanning. To address this issue DevOps teams run multiple security tests in parallel to avoid delays. They break down large applications into services to speed up scans as well. Validation against known critical issues is handled by unit tests for quick spot checks, with failures kicking code back to the development team. Code scanners are typically run in parallel with unit or other functional tests. Our point here is that you, as a security professional, should look for ways to speed up security testing. Organizing tests for efficiency vs. speed – and completeness vs. time to completion – was an ongoing balancing act for every development team we spoke with. Focusing scans on specific areas of code helps find issues faster. Several firms also discussed plans to maintain pre-populated and fully configured test servers – just as they do with production servers – waiting for the next test cycle to avoid latency. Rewriting and reconfiguring test environments for efficiency and quick deployments help with CI.
- Elasticity FTW: With the public cloud and virtualized resources it has become much easier to quickly provision test servers. We now have the ability to spin up new environments with a few API calls and shrink them back down when not in use. Take advantage of on-demand elastic cloud services to speed up security testing.
- Test Data Management: Developers and testers have a very bad habit of copying production data into development and test environments to improve their tests. This has been the source of many data breaches over the last couple decades. Locking down production environments so QA and Dev personnel cannot exfiltrate regulated data is great, but also make sure they cannot bypass your security controls. Data masking, tokenization, and similar tools can produce quality test data, minimizing their motivation to use production data. These tools deliver test data derived from production data, but stripped of sensitive information. This approach has proven successful for many firms, and most vendors offer suitable API or automation capabilities for DevOps pipelines.
- Manual vs. Automated Deployment: It is easy enough to push new code into production with automation. Vetting that code, or rolling back in case of errors, is much harder. Most teams we spoke with are not yet completely comfortable with fully automated deployment – it scares the hell out of many security folks. Continuous software delivery is really only used by a small minority of firms. Most only release new code to customers every few weeks, often after a series of sprints. These companies execute many deployment actions through scripts, but launch the scripts manually when Operations and Development resources are available to fully monitor the push. Some organizations really are comfortable with fully-automated pushes to production, releasing several times per day. There is no single right answer, but either way automation performs the bulk of the work, freeing up personnel to test and monitor.
- Deployment and Rollback: To double-check that code which worked in pre-deployment tests still works once deployed, teams we spoke with still do ‘smoke’ tests, but they have evolved them to incorporate automation and more granular control over rollouts. We saw three tricks commonly used to augment deployment. The first and most powerful is called Blue-Green or Red-Black deployment. Old and new code run side by side, each on its own set of servers. A rollout is a simple flip at the load balancer level, and if errors are discovered the load balancers are pointed back to the older code. The second, canary testing, is where a small subset of individual sessions are directed towards the new code – first employee testers, then a subset of real customers. If the canary dies (errors are encountered), the new code is retired until the issue can be fixed, and the process repeated. Finally, feature tagging enables and disables new code elements through configuration files. If errors are discovered in a new section of code, the feature can be toggled off until it is fixed. The degrees of automation and human intervention vary greatly between models and organizations, but overall these deployments are far more automated than traditional web services environments.
- Production Security Tests: Applications often continue to function even when security controls fail. For example a new deployment script might miss an update to web application firewall policies, or an application could launch without any firewall protection. Validation – at least sanity checks on critical security components – is essential for the production environment. Most of the larger firms we spoke with employ penetration testers, and many have full-time “Red Teams” examining application runtime security for flaws.
- Automated Runtime Security: Many firms employ Web Application Firewalls (WAF) as part of their application security programs, usually in order to satisfy PCI-DSS requirements. Most firms we spoke with were dissatisfied with these tools, so while they continue to leverage WAF blacklists, they were adopting Runtime Application Self-Protection (RASP) to fill remaining gaps. RASP is an application security technology which embeds into an application or application runtime environment, examining requests at the application layer to detect attacks and misuse in real time. Beyond just “WAF in the application context”, RASP can monitor and enforce at many points within an application framework, both tailoring protection to specific types of attacks and allowing web application requests to “play out” until it becomes clear a request is indeed malicious before blocking it. Almost every application security and DevOps call we took over the last three years included discussion of RASP, and most firms we spoke with have deployed the technology.
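To illustrate the Blue-Green model described above at its simplest: rollout and rollback are just a pointer flip at the routing layer. The pools here are stand-in functions rather than real server fleets, so treat this as a sketch of the mechanism, not a deployment tool:

```python
class BlueGreenRouter:
    """Route traffic between two identical pools; rollback is a pointer flip."""
    def __init__(self, blue, green):
        self.pools = {"blue": blue, "green": green}
        self.live = "blue"
        self.previous = "blue"

    def deploy(self, pool_name):
        self.previous, self.live = self.live, pool_name

    def rollback(self):
        self.live = self.previous

    def handle(self, request):
        return self.pools[self.live](request)

router = BlueGreenRouter(blue=lambda r: "v1:" + r, green=lambda r: "v2:" + r)
router.deploy("green")
after_deploy = router.handle("/home")    # now served by the new code
router.rollback()                        # errors found: flip back instantly
after_rollback = router.handle("/home")
```

Because the old pool keeps running untouched, rollback carries none of the risk of re-deploying old code under pressure.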
Application Security Standards
A handful of application security standards are available. The Open Web Application Security Project (OWASP) Top Ten and the SANS Common Weakness Enumeration Top 25 are the most popular, but other lists of threats and common weaknesses are available, typically focused on specific subtopics such as cloud deployment or application security measurement. Each tends to be embraced by one or more standards organizations, so which you use is generally dictated by which industry you are in. Or you can use all of them.
Regardless of your choice, the idea is to understand what attacks are common and account for them with one or more security controls and application security tests in your build pipeline. Essentially you build out a matrix of threats, and map them to security controls. This step helps you plan out what security tools you will adopt and put into your build process, and which you will use in production.
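The threat-to-control matrix can itself live as code, which makes coverage gaps checkable in the pipeline. The threats, controls, and test names below are illustrative placeholders, not a recommended taxonomy:

```python
THREAT_MATRIX = {
    "sql_injection":   {"controls": ["parameterized queries", "WAF rule"],
                        "tests": ["sast", "unit"]},
    "broken_auth":     {"controls": ["central IAM", "2FA"],
                        "tests": ["dast"]},
    "vulnerable_deps": {"controls": ["approved library mirror"],
                        "tests": ["composition_analysis"]},
}

def coverage_gaps(matrix, pipeline_tests):
    """List threats with no automated test wired into the build pipeline."""
    return sorted(threat for threat, row in matrix.items()
                  if not set(row["tests"]) & pipeline_tests)

gaps = coverage_gaps(THREAT_MATRIX, pipeline_tests={"sast", "unit"})
```

Run as part of planning, this turns “do we cover the OWASP Top Ten?” from a meeting question into a reportable list.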
All that leads up to our next post: Building a Security Tool Chain.
In our first paper on ‘Building Security Into DevOps’, given the ‘newness’ of DevOps for most of our readers, we included a discussion of the foundational principles and how DevOps is meant to help tackle numerous problems common to software delivery. Please refer to that paper if you want more detailed background information. For our purposes here we will discuss just a few principles that directly relate to the integration of security teams and testing with DevOps principles. These concepts lay the foundations for addressing the questions we raised in the first section, and readers will need to understand them as we discuss security tooling and approaches in a DevOps environment.
DevOps and Security
Build Security In
It is a terrible truth, but wide use of application security techniques within the code development process is relatively new. Sure, the field of study is decades old, but application security was more often bolted on with network or application firewalls, not baked into the code itself. Security product vendors discovered that understanding application requests in order to detect and then block attacks is incredibly difficult to do outside the application; it is far more effective to fix vulnerable code and close off attack vectors at the source. Add-on tools are getting better – and some work inside the application context – but it remains better to address issues in the code when possible.
A central concept when building security in is ‘shift left’, or the idea that we integrate security testing earlier within the Software Development Lifecycle (SDLC) – the phases of which are typically listed left to right as design, development, testing, pre-production and production. Essentially we shift more resources away from production on the extreme right, and put more into design, testing and development phases. Born out of lean manufacturing, Kaizen and Deming’s principles, these ideas have been proven effective, but typically applied to the manufacture of physical goods. DevOps has promoted use in software development, demonstrating we can improve security at a lower cost by shifting security defect detection earlier in the process.
Automation is one of the keys to success for most firms we speak with, to the point that engineering teams often treat DevOps and automation as synonymous. The reality is that the cultural and organizational changes which come with DevOps are equally important; automation is just the most quantifiable benefit.
Automation brings speed, consistency, and efficiency to all parties involved. DevOps, like Agile, is geared towards doing less, better, and faster. Software releases occur more regularly, with less code change between them. Less work means better focus and more clarity of purpose with each release, resulting in fewer mistakes. It also means it’s easier to roll back in the event of a mistake. Automation helps people get their jobs done with less hands-on work, but because automation software does exactly the same things every time, consistency is the most conspicuous benefit.
The place where automation is first applied – and where its benefits are most pronounced – is the application build server. Build servers (e.g., Jenkins, Bamboo), commonly called Continuous Integration (CI) servers, automatically construct an application – and possibly the entire application stack – as code is changed. Once the application is built, these platforms may also launch QA and security tests, kicking failed builds back to the development team. Automation benefits other facets of software production, including reporting, metrics, quality assurance, and release management, but security testing benefits are our focus in this research.
At the outset this may not seem like much: calling security testing tools automatically instead of running tests manually. But that perspective misses the fundamental benefits of automated security testing. Automation is how we ensure each software update includes security tests, ensuring consistency. Automation is how we avoid the mistakes and omissions common with repetitive and – let’s be totally transparent here – boring manual tasks. Most importantly, security teams are typically outnumbered by developers at a ratio of 100 to one, so automation is the key ingredient for scaling security coverage without scaling security headcount.
A key DevOps principle is to break down silos and foster better cooperation between developers and supporting QA, IT, security, and other teams. We have heard this idea so often it sounds cliché, but the reality is that few in software development have actually made changes to implement it. Most DevOps-centric firms are changing development team composition to include representatives from all disciplines, so every team has someone who knows a little security and/or represents security interests, even on small teams. Those that do realize the benefits of not just better communication, but true alignment of goals and incentives. Development in isolation is incentivized to write new features. Quality Assurance in isolation is incented to get code coverage for various tests. When everyone on a team is responsible for the successful release of new software, priorities and behavior change.
This remains a bit of a problem for many of the firms we have interviewed. The majority are large, with hundreds of development teams located in different countries, some of them third-party (i.e., external) consultants. It is hard to get consistency across all these teams, and even harder to get general participation. Managerial structures are set up so development managers manage developers, not IT personnel. The management tools for feature tracking, trouble-ticketing, and resource allocation are geared towards a siloed structure. And many leading security tools analyze and report defects to security professionals, not to the developers or IT personnel who resolve issues. Progress is still measured in feature output and code coverage, and bonuses awarded accordingly.
The point here is this cultural change, and the great benefits derived, are not realized without some changes to the supporting systems and structures. This is a very hard adjustment, one where the various managers are all looking to implement policies as if they have full oversight, missing the point that they too need to adopt the ‘one team’ approach with their peers to effectively enact changes.
Security Practitioners and Application Security
Many security folks struggle with DevSecOps – and application security in general – because they do not have backgrounds in software development. Most security practitioners come from a network security background, and many CISOs we speak with are more risk and compliance focused, so there is a general lack of understanding of software development. This lack of knowledge of development tooling, processes, and the common challenges developers are trying to overcome means security teams seldom understand why automated build servers, central code repositories, containers, Agile, and DevOps have caught fire and been widely adopted in a very short time. Here we discuss some of the drivers for changes in development practices, and the key areas security teams need to understand to get a handle on application security.
- Knowledge of Process: We are not here to teach the nuances of development process, but we want to point out the reason processes change: speed. Waterfall, Spiral, Prototype Evolutions, Extreme Programming, Agile, and Agile with Scrum are all process variations made over the last 20 years. Each came with the same goals: reduce complexity (i.e., simplify requirements) and speed up software delivery. When you understand that the majority of changes to how we build software over the last 20 years have been made to address these two goals, you start to understand that the process itself is not important; the goal of delivering better software, faster, is. Daily scrums, bi-weekly software delivery (i.e., sprints), Kanban, Agile, test driven development, and automated build servers are tools to advance the state of the art. So it is critical for security professionals to understand that security testing and policies should embrace these same ideals. And lastly, DevOps is process independent; you can embrace DevOps and still have a waterfall process, but DevOps certainly fits more naturally with Agile.
- Knowledge of Tools: Software development leverages many tools to manage code and process. Of these, the two most important to security are code repositories and build tools. Repositories like Git manage application code, giving developers a shared location to store code and track versions and changes. Others, like Docker Registry, are specifically used for containers. These tools are essential for developers to manage the code they are building, but also important to security because they provide a place where code can be inspected. Build servers like Jenkins and Bamboo automate the build, testing, and delivery of code. But rather than at a component or module level, they are typically employed for full-app-stack testing. Developers and quality assurance teams use the build server to launch functional, regression, and unit testing; security teams should also leverage these build servers to integrate security testing (e.g., SAST, DAST, composition analysis, security unit tests) so it fits within the same build process and uses all the same management and communications tools. It is important for security teams to understand which tools the development team uses, and who controls those resources, and to arrange the integration of security testing.
- Everything Is Code: Applications are software. That much is well understood, but what is less well understood is that in many environments – especially public cloud – your servers, networks, messaging, IAM and every other bit of the infrastructure may be defined as configuration scripts, templates or application code. IT teams now define an entire data center with a template comprised of a few hundred lines of script. The implication for security practitioners is twofold: security policies can also be defined in scripts/code, and you can examine code repositories to ensure templates, scripts and code are secure before they are run. This is a fundamental change to how security audits may be conducted.
- Open Source: Open source software plays a huge part in application development, and is so universally embraced in the development community that it is almost impossible to find a new application development project which does not leverage it. This means a large portion of your code may not be tested the way you think it is, or developers may intentionally use old, vulnerable versions. Why? Because the old version works with their code. Changing the library might break the application and require more work. Developers are incentivized to get code working, and we have witnessed heroic efforts on their part to avoid new (i.e.: patched) open source versions for the sake of stability. So we want you to come away with two points: you need to test open source code before it hits production, and you need to ensure developers do not surreptitiously swap out trusted versions of open source libraries for older, probably vulnerable, versions.
- Tooling and Dev ‘Buy-In’: The first step most security teams take when introducing security into application development is to run static analysis scans. The good news is that most security practitioners know what SAST is and what it does. The bad news is that security started with older SAST tools which were slow, produced output intelligible only to security folks, generated ‘false positive’ alerts, and lacked the critical APIs needed to fully integrate with the rest of the build process. All told, those efforts were developer-hostile, and most development teams reacted in kind by ignoring the scans or removing the tools from the build process. Two key points here: you want to select tools that operationally fit the development model (faster, easier, better), and you want tools that are actually effective. Left to their own decision-making process, developers will always choose the easiest tool to integrate, not the most effective security scanner. It is important that the security team be part of the tool selection process to ensure security scans provide adequate analysis.
- Security Friction & Cultural Dynamics: Most application security teams are playing catch-up. Development is (usually) already agile, and if some of your development organizations are embracing DevOps, it’s possible IT and QA are also agile. This means security folks are the non-agile anomaly; anything you do or ask for adds time and complexity, the antithesis of software engineering goals. This topic is so important that I have added the entire next section, ‘Scaling Security’, on methods to address the cultural friction between security and development.
- SDLC and S-SDLC: Many application security teams approach the problem by looking at the Software Development LifeCycle (SDLC), with the goal of applying some form of security analysis in each phase of the lifecycle. A secure SDLC (S-SDLC) will typically include threat modeling during design, composition analysis during development, static analysis during the build phase, and any number of tests pre-production. This is an excellent way to set up a process-independent application security program. As many large organizations come to understand, each of your development teams employs a slightly different process, and it is entirely possible your company uses every known development process in existence. This is a huge headache, but the S-SDLC becomes your yardstick: use the S-SDLC as the policy template, then map the security controls to fit within the different processes.
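The "Everything Is Code" point above lends itself to a quick illustration: if infrastructure is defined in templates, a security policy can itself be a script that inspects those templates before anything is deployed. Here is a minimal Python sketch; the template structure is a simplified, hypothetical example rather than any real provider's schema:

```python
# Sketch: inspect an infrastructure-as-code template before deployment.
# The template layout below is hypothetical, for illustration only.

def find_open_ingress(template):
    """Return the names of firewall rules that allow inbound traffic
    from anywhere (0.0.0.0/0)."""
    findings = []
    for name, resource in template.get("resources", {}).items():
        if resource.get("type") != "firewall_rule":
            continue
        for rule in resource.get("ingress", []):
            if rule.get("source") == "0.0.0.0/0":
                findings.append(name)
    return findings

template = {
    "resources": {
        "web_sg": {"type": "firewall_rule",
                   "ingress": [{"source": "0.0.0.0/0", "port": 443}]},
        "db_sg": {"type": "firewall_rule",
                  "ingress": [{"source": "10.0.0.0/8", "port": 5432}]},
    }
}
print(find_open_ingress(template))  # -> ['web_sg']
```

A check like this can run in the same code review or build step as any other test, which is exactly the audit change described above.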
As we mentioned in the introductory section, most security teams are vastly outnumbered. As an example, I spoke with three midsized firms this week; their development personnel ranged from 800 to 2,000, while their security teams ranged from 12 to 25. Of the security personnel, each typically had two or three individuals with a background in application security. While they may be as rare as unicorns, that does not mean they have magical powers to cover all development operations, so they need to learn ways to scale their expertise across the enterprise. Further, they need to do it in a way that meshes with development ideals, getting the software development teams to perform the security controls they design. Here are several methods that work.
- Automation: We have already discussed automation to some degree, so I will keep this short. Automation is how security analysis occurs faster, more often, and without direct operation by security teams. Security tools that perform automated analysis – either out of the box or via custom checks of your design – are critical to scaling across multiple development teams. And yes, this means that for every build pipeline in your company you will have to integrate tools with that pipeline, so this takes time. It also means that not only is the scan automated, but the distribution of results is integrated with other tools and processes. This is how teams scale, and it is instrumental to the next two items on our list.
- Failing the Build: Development and security teams commonly have friction between them. Security typically hands development managers scan results with thousands of defects, which development managers interpret as “Dude, your code sucks, what’s wrong with you, fix it now!” One way to reduce this friction is to take the output from static or dynamic scans, discuss the scope of the problem and what critical defects mean, and come to agreement on what is reasonable to fix in the mid term. Once everyone agrees on what constitutes a critical issue, and what time period is reasonable to fix it in, you instruct the security tools to fail a build when critical bugs are discovered. This process takes some time to implement, and some pain to work through, but it changes the nature of the relationship between security and development. No longer is security saying the code is defective; an unbiased tool is reporting a defect to development. Security is no longer the bad guy standing in the way of progress; instead development must meet a new quality standard, one focused on code quality as it relates to security defects. It also changes the relationship because developers often need assistance understanding the nature of a defect, or want to tackle classes of defects instead of individual bugs, so they come to security for help. Failing the build creates a sea change in the relationship between the two groups. It takes time to get there, and any security tooling that produces false positives magnifies the difficulty, but this step is critical for DevOps teams.
- Metrics: Metrics are critical to understanding the scope of your application security issues, and security tools are how you will collect most metrics. Even if you do not fail the build, and even if the results are not shared with anyone outside security, integrating security testing into build servers and code repositories is critical for gaining visibility and metrics. These metrics will help you decide where to spend your budget, be it additional tooling, developer education, or runtime protection. And they will be your guide to the effectiveness of the tools, education and runtime protection you implement. Without metrics, you’re just guessing.
- Security Champions: One of the most effective methods I have discovered to scale security is to deputize willing developers – those with an active interest in security – as ‘security champions’ on their development teams. Most developers have some interest in security, and they know security education makes them more valuable to the company, which often means raises. For security, this means you have a liaison on each development team – someone you can ask questions of, and who will come to you when they have questions. Typically security teams cultivate these relationships through education, maintaining a ‘center of excellence’ where developers and security pros can ask questions (e.g.: Slack channels), sending developers to security conferences, or simply sponsoring events like lunches where security topics are discussed. Regardless of how you do it, this is an excellent way to scale security without scaling security headcount, and we recommend you set aside some budget and resources, as it returns far more benefit than it costs.
- Education: If you want developers to understand security and threats to applications, educate them. Engineering leads and VPs of Engineering, given the expense of headcount, are usually under tight restrictions on educational budgets. To fill the gaps, it is not uncommon for security teams to shoulder the expense of educating select developers on skills the organization lacks. Sometimes this is done through the purchase of security-related CBTs, sometimes through professional services from security tool vendors, and sometimes through specific classes from SANS or other institutes. Understanding how to remediate application security issues, security reference architectures, how to perform threat modeling, and how to use security tools are all common topics.
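To make "Failing the Build" above concrete, here is a hedged sketch of what a build-gate step might look like once the two teams have agreed on a threshold. The severity names and finding format are assumptions, not any particular scanner's output:

```python
# Sketch: fail the build only on findings at or above the severity
# threshold that security and development agreed upon.

AGREED_THRESHOLD = "critical"   # assumption: negotiated with dev managers
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def gate(findings, threshold=AGREED_THRESHOLD):
    """Return the findings that should fail the build."""
    floor = SEVERITY_RANK[threshold]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= floor]

scan_results = [
    {"id": "SQLI-14", "severity": "critical"},
    {"id": "XSS-203", "severity": "medium"},
]
blocking = gate(scan_results)
if blocking:
    print(f"Build failed: {len(blocking)} blocking finding(s)")
    # In a real pipeline a nonzero exit code here fails the build.
```

Because the tool enforces the agreed threshold mechanically, neither side has to argue about individual findings after the fact.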
This series will be a bit longer than most. Over the last few years we have accumulated considerable research, and while we will attempt to be concise, there is simply a lot of material to cover to address the questions from section one.
Next up I will discuss putting together a secure SDLC, and the integration of security testing in the development process.
Posted under: Heavy Research
DevOps is an operational framework that promotes software consistency and standardization through automation. It helps address many of the nightmare development issues around integration, testing, patching and deployment – both by breaking down barriers between different development teams, and by prioritizing things that make software development faster and easier.
DevSecOps is the integration of security teams and security tools directly into the software development lifecycle, leveraging the automation and efficiencies of DevOps to ensure application security testing occurs with every build cycle. This promotes security and consistency, and ensures security is no less important than other quality metrics or features. Automated security testing, just like automated application builds and deployments, must be assembled with the rest of the infrastructure.
And therein lies the problem. Software developers have traditionally not embraced security. It’s not that they do not care about security; rather, they have been incentivized to focus on delivery of new features and functions. DevOps changes priorities by automating build processes to make them faster, easier and more consistent, but that does not mean developers go out of their way to include security or security tooling. That’s often because security tools don’t integrate well with development tools and processes, flood queues with unintelligible findings, and lack development-centric filters to help prioritize work. Worse, security platforms – and the security professionals who recommended them – have been difficult to work with, or even failed to offer API-layer support for integration.
On the other side of the equation are security teams, who are fearful of automated software processes and commonly ask “How do we get control over development?” The very nature of this question misses both the spirit of DevSecOps and the efforts of development organizations to get faster, more efficient, and more consistent with each software release. The only way for security teams to cope with the changes occurring within software development, and to scale their relatively small organizations, is to become just as agile as development teams and embrace automation.
Why This Research Paper?
We typically discuss the motivation for our research papers to help readers understand our goals and what we wish to convey. This is doubly so when we update a research paper, as it helps us spotlight recent changes in the industry that have made older papers inaccurate, or left them short in describing recent trends. As DevOps adoption has matured considerably in four years, we have a lot to talk about.
This effort will be a major rewrite of our 2015 research on Building Security into DevOps, with significant additions around common questions security teams ask about DevSecOps, and a thorough update on tooling and approaches to integration. Much of this paper will reflect the 400+ conversations since 2017 with some 200+ security teams at Fortune 2000 firms, so we will include considerably more discussion points derived from those conversations. And as DevOps has been around for many years now, we will spend less time on what DevSecOps is or why it is beneficial, and more on a pragmatic discussion of how to put together a DevSecOps program.
Now, let’s shake things up a bit.
Different Focus, Different Value
There is a plethora of new surveys and research papers available, and some of them are very good. And there are more conferences and online resources popping up than I can count. For example, Veracode recently released the latest iteration of their State of Software Security (SoSS) report, and it’s a monster, with loads of data and observations. The key takeaways are that the agility and automation employed by DevSecOps teams provide demonstrable security benefits, including faster patching cycles, shorter duration of flaw persistence, faster reduction of technical debt, and ‘easier’ scanning which meant faster problem identification. Sonatype’s recently released 2019 State of the Software Supply Chain report shows that ‘Exemplary Project Teams’ which leverage DevOps principles drastically reduce code deployment failure rates, and remediate vulnerabilities in half the time of average groups. And we have events like All Day DevOps, where hundreds of DevOps practitioners share stories on cultural transformations, continuous integration / continuous deployment (CI:CD) techniques, site reliability engineering, and DevSecOps. All of which is great, and offers a body of qualitative and quantitative data on why DevOps works and how practitioners are evolving their programs.
That’s not what this paper is about. Those resources are not addressing the questions I am being asked each and every week.
This paper is about putting together a comprehensive DevSecOps program. Overwhelmingly we are asked “How do I put a DevSecOps program together?” and “How does security fit into DevOps?” These audiences are not looking for justification, nor for point stories on nuances addressing specific impediments; they want a security program in line with peer organizations, one which embraces ‘security best practices’. They are overwhelmingly security and IT practitioners, largely left behind by development teams who have at the very least embraced Agile concepts, if not DevOps outright. The challenge is to understand what development is trying to accomplish, integrate – in some fashion – with those teams, and figure out how to leverage automated security testing to be at least as agile as the development teams are.
DevOps vs DevSecOps
Which leads us to another controversial topic, and another reason this research is different: the name DevSecOps. It is our contention that calling out security – the ‘Sec’ in ‘DevSecOps’ – is needed given the lack of maturity and understanding of this topic.
Stated another way, practitioners who have fully embraced the DevOps movement will tell you there is no reason to add ‘Sec’ to DevOps, as security is just another ingredient. The DevOps ideal is to break down silos between individual teams (e.g.: architecture, development, IT, security and QA) to promote teamwork and better incentivize each team member toward the same goals. If security is just another set of skills blended into the overall effort of building and delivering software, there is no reason to call it out any more than we call out quality assurance. Philosophically, these proponents are right. In practice, we are not there yet. Developers may embrace the idea, but they generally suck at facilitating team integration. Sure, security is free to participate, but it’s up to security folks to learn where they can integrate, and they are typically asked to bring skills to the party they may not possess. It’s passive-aggressive team building!
Automated security testing, just like automated application builds and deployments, takes time and skill to build out. In our typical engagements with clients, the developers are absent from the call. A divide still exists, with little communication between security and what is usually dozens – or hundreds – of dispersed development teams. When developers are present, they state that the security team can create scripts to integrate security testing into the build server; that they can codify security policies; that security can stitch together security analysis tools with trouble ticketing and metrics in a few lines of Python. After all, many IT practitioners are learning to script configuration management and build templates to define infrastructure deployments, so why not security? This completely misses the reality that few security practitioners are capable of coding at this level. Worse, most firms we speak with have a ratio of around 100 developers for every security practitioner, and there is simply no way to scale security resources across all development projects.
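For what it's worth, the "few lines of Python" glue those developers describe might look something like the sketch below. The `create_ticket` function here is a stand-in for a real ticketing API call, not any specific product's interface:

```python
# Sketch: push security scan findings into a trouble-ticket system.
# create_ticket is a placeholder for a real tracker's API client.

def file_tickets(findings, create_ticket):
    """Open one ticket per finding and return the new ticket IDs."""
    return [create_ticket(title=f"[security] {f['id']}",
                          body=f"Severity: {f['severity']}")
            for f in findings]

# Stand-in implementation so the sketch runs on its own:
_counter = iter(range(1000, 2000))
def create_ticket(title, body):
    return f"TICKET-{next(_counter)}"

ids = file_tickets([{"id": "SQLI-14", "severity": "critical"}], create_ticket)
print(ids)  # -> ['TICKET-1000']
```

The point stands, though: writing even this much glue, and maintaining it across dozens of pipelines, assumes skills and headcount most security teams simply do not have.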
It does not help that many security professionals are early in their process of understanding DevOps and various methods developers have been adopting over the last decade to become more agile. Security is genuinely behind the curve, and it seems the bulk of the available research – mentioned above – is not aimed at tackling security’s introduction and integration.
Lastly, we have one more very important reason we choose to use the DevSecOps name: the efforts to secure application code are very different from the efforts to secure infrastructure and supporting components. The checks which validate application code is secure (i.e.: ‘DevSec’) differ from the tooling and processes used to verify the supporting infrastructure (i.e.: ‘SecOps’) is secure. These are two different disciplines, with different tooling and approaches, and they should be discussed as separate efforts.
We went through all our call notes from the last three years and tallied the questions we were asked. The following are the most common questions, in order of how often each was asked.
- We want to integrate security testing into the development pipeline, and are going to start with static analysis. How do we do this?
- How do we build an application security strategy in light of automation, CI:CD and DevOps?
- How do we start building out an application security strategy? What application security standards should we follow?
- Development is releasing code to production every day. How do we get control over development? Can we realistically modify developers behavior?
- What is the best way to introduce DevSecOps? Where do I start? What are the foundational pieces?
- How do we get different units to adopt the same program (different teams all do things differently) when right now some are waterfall, some agile, some DevOps?
- How should we (Security) work with Dev?
- We understand shift left, but are the tools effective? What tools do you recommend to start with?
These questions share some common threads: they all come from firms which already have at least some DevOps teams in place, where security has some understanding of the intent of DevOps, but they are all starting from scratch. Even teams with security tests already built into the development pipeline struggle with the value each tool provides, pushback from developers over false positives, how to work with developers, and how to scale across many development teams for consistency. And we find during calls and engagements that security is not quite in tune with why developers embrace DevOps, and misses the thrust of the effort, which is why the two groups are commonly out of sync.
The following is a list of questions we came up with that security teams should ask, but don’t.
- How do we fit – culturally and operationally – into DevSecOps?
- How do we get visibility into Development and Development practices?
- How do we know changes are effective? What metrics should we collect and monitor?
- How do we support Dev?
- Do we need to know how to code?
Over the course of this series we will address both lists of questions. Next up, we will provide a brief introduction to DevSecOps principles and the role of security in DevOps.
We want to take a more formal look at the RASP selection process. For the 2016 version of this paper, the market was young enough that a simple list of features was enough to differentiate one platform from another. But the current level of platform maturity makes top-tier products more difficult to differentiate.
In our previous section we discussed principal use cases, then delved into technical and business requirements. Depending upon who is driving your evaluation, your list of requirements may look like either of those. With those driving factors in mind – and we encourage you to refer back as you go through this list – here is our recommended process for evaluating RASP. We believe this process will help you identify which products and vendors fit your requirements, and avoid some pitfalls along the way.
- Create a selection committee: Yes, we hate the term ‘committee’ as well, but the reality is that when RASP effectively replaces WAF (whether or not WAF is actually going away), RASP requirements come from multiple groups. RASP affects not only the security team, but also development, risk management, compliance, and operational teams as well. So it’s important to include someone from each of those teams (to the degree they exist in your organization) on the committee. Ensure that anyone who could say no, or subvert the selection at the 11th hour, is on board from the beginning.
- Define systems and platforms to monitor: Is your goal to monitor select business applications or all web-facing applications? Are you looking to block application security threats, or only for monitoring and instrumentation to find security issues in your code? These questions can help you refine and prioritize your functional needs. Most firms start small, figure out how best to deploy and manage RASP, then grow over time. Legacy apps, Struts-based applications, and applications which process highly sensitive data may be your immediate priorities; you can monitor other applications later.
- Determine security requirements: The committee approach is incredibly beneficial for understanding true requirements. Sitting down with the entire selection team usually adjusts your perception of what a platform needs to deliver, and the priorities of each function. Everyone may agree that blocking threats is a top priority, but developers might feel that platform integration is the next highest priority, while IT wants trouble-ticket system integration but security wants language support for all platforms in use. Create lists of “must have”, “should have”, and “nice to have”.
- Define technical requirements: Here the generic needs determined earlier are translated into specific technical features, and any additional requirements are considered. With this information in hand, you can document requirements to produce a coherent RFI.
Evaluate and Test Products
- Issue the RFI: Larger organizations should issue an RFI through established channels and contact a few leading RASP vendors directly. If you are in a smaller organization, start by sending your RFI to a trusted VAR and emailing a few RASP vendors which look appropriate. A Google search or brief contact with an industry analyst can help you understand who the relevant vendors are.
- Define the short list: Before bringing anyone in, match any materials from vendors and other sources against your RFI and draft RFP. Your goal is to build a short list of 3 products which can satisfy most of your needs. Also use outside research sources (like Securosis) and product comparisons. Understand that you’ll likely need to compromise at some point in this process, as it’s unlikely any vendor can meet every requirement.
- The dog & pony show: Bring the vendors in, but instead of generic presentations and demonstrations, ask the vendors to walk you through specific use cases which match your expected needs. This is critical because they are very good at showing eye candy and presenting the depth of their capabilities, but having them attempt to deploy and solve your specific use cases will help narrow down the field and finalize your requirements.
- Finalize RFP: At this point you should completely understand your specific requirements, so you can issue a final formal RFP. Bring any remaining products in for in-house testing.
- In-house deployment testing: Set up several test applications if possible; we find public and private cloud resources effective for setting up private test environments to put tools through their paces. Additionally, this exercise will very quickly show you how easy or hard a product is to use. Try embedding the product into a build tool and see how much of the heavy lifting the vendor has done for you. Since this reflects day-to-day efforts required to manage a RASP solution, deployment testing is key to overall satisfaction.
- In-house effectiveness testing: You’ll want to replicate the key capabilities in house. Build a few basic policies to match your use cases, and then violate them. You need a real feel for monitoring, alerting, and workflow. Many firms replay known attacks, or use penetration testers or red teams to hammer test applications to ensure RASP detects and blocks the malicious requests they are most worried about. Many firms leverage OWASP testing tools to exercise all major attack vectors and verify that RASP provides broad coverage. Make sure to tailor some of their features to your environment to ensure customization, UI, and alerts work as you need. Are you getting too many alerts? Are some of their findings false positives? Do their alerts contain actionable information so a developer can do something with them? Put the product through its paces to make sure it meets your needs.
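The attack-replay approach described in effectiveness testing can be as simple as the following sketch. The payloads and the 403-means-blocked convention are assumptions; substitute your own attack corpus and whatever your RASP actually returns when it blocks a request:

```python
# Sketch: replay known attack payloads against a test application and
# report which ones the RASP failed to block. `send` is whatever
# function submits a request and returns the HTTP status code; here a
# stub simulates a RASP that blocks with 403.

ATTACK_PAYLOADS = [
    ("sqli", "' OR '1'='1"),
    ("xss", "<script>alert(1)</script>"),
]

def replay(send, payloads=ATTACK_PAYLOADS):
    """Return the names of payloads that were NOT blocked."""
    return [name for name, body in payloads if send(body) != 403]

def fake_send(body):  # stand-in for a real HTTP client call
    return 403 if ("'" in body or "<script" in body) else 200

print(replay(fake_send))  # -> [] (everything was blocked)
```

In a real evaluation, `send` would wrap an HTTP client pointed at your test application, and the payload list would come from OWASP testing tools or your red team's corpus.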
Selection and Deployment
- Select, negotiate, and purchase: Once testing is complete, take the results to your full selection committee and begin negotiations with your top two choices – assuming more than one meets your needs. This takes more time, but it is very useful to know you can walk away from a vendor if they won’t play ball on pricing, terms, or conditions. Pay close attention to pricing models – are they per application, per application instance, per server, or some hybrid? As you expand your RASP footprint, you want to know it will not cause your bill to skyrocket.
- Implementation planning: Congratulations, you’ve selected a product, navigated the procurement process, and made a sales rep happy. Now the next stage of work begins: at the end of selection you need to plan deployment. That means nailing down details: lining up resources, getting access/credentials to devices, locking in an install schedule, and even the logistics of getting devices to the right locations. No matter how well you execute on selection, unless you implement flawlessly and focus on quick wins and immediate value from the RASP platform, your project will be a failure.
- Professional services: In some cases initial setup is where the majority of the work takes place. Unlike WAF, day-to-day maintenance of RASP tends to be minor. Because of this some vendors, either directly or through partners, can help with integration and templating initial deployment.
I can hear the groans from small and medium-sized businesses looking at this process and thinking it is a ridiculous amount of detail. We deliberately created a granular selection process for you to pare down to suit your organization’s requirements; we want to make sure we captured all the gory details some organizations need to work through for successful procurement. Our process is appropriate for a large enterprise, but a little pruning can make it a good fit for a smaller group. That’s the great thing about process: you can change it however you see fit, at no expense.
For many organizations, implementing a Runtime Application Self-Protection (RASP) platform is a requirement. Given the sheer volume of existing application security defects, and the rate of discovery of new attacks, there is no suitable option other than runtime protection. Regardless of whether it’s driven by compliance, operational security, or something else, we need to react to threats – without weeks to fix, test, and deploy. Quick and efficient handling of attacks, plus reasonable instrumentation to determine which parts of an application are vulnerable, is critical for security and development teams to tackle application security issues. RASP provides an effective tool to bridge short-term requirements and long-term application security goals.