Enterprise DevSecOps: Security Planning
Posted under: Heavy Research
This post is intended to help security folks create an outline or structure for an application security program. We are going to answer such common questions as “How do we start building out an application security strategy?”, “How do I start incorporating DevSecOps?” and “What application security standards should I follow?”. I will discuss the Software Development Lifecycle (SDLC), introduce security items to consider as you put your plan in place, and reference some application security standards for use as guideposts for what to protect against. This post will help your strategy; the next one will cover tactical tool selection.
Security Planning and your SDLC
A Secure Software Development Lifecycle (S-SDLC) essentially describes how security fits into the different phases of a Software Development Lifecycle. We will look at each phase in an SDLC and discuss which security tools and techniques are appropriate. Note that an S-SDLC is typically drawn as a waterfall development process, with different phases in a linear progression, but that’s really just for clearer depiction – the actual SDLC which is being secured is as likely to be Agile, Extreme, or Spiral as Waterfall. There are good reasons to base an S-SDLC on a more modern SDLC; but the architecture, design, development, testing, and deployment phases all map well to development stages in any development process. They provide a good jumping-off point to adapt current models and processes into a DevOps framework.
As in our previous post, we want you to think of the S-SDLC as a framework for building your security program, not a full step-by-step process. We recognize this is a departure from what is taught in classrooms and wikis, but it is better for planning security in each phase.
Define and Architect
- Reference Security Architectures: Reference security architectures exist for different types of applications and services, including web applications, data processing applications, identity and access management services for applications, stream/event processing, messaging, and so on. The architectures are even more effective in public cloud environments, Kubernetes clusters, and service mesh environments – where we can tightly control via policy how each application operates and communicates. With cloud services we recommend you leverage service provider guidelines on deployment security, and while they may not call them ‘reference security architectures’ they do offer them. Educate yourself on the application platforms and ask software designers and architects which methods they employ. Do not be surprised if for legacy applications they give you a blank stare. But new applications should include plans for process isolation, segregation, and data security, with a full IAM model to promote segregation of duties and data access control.
- Operational Standards: Work with your development teams to define minimal security testing requirements, and critical and high priority issues. You will need to negotiate which security flaws will fail a build, and define the process in advance. You will probably need an agreement on timeframes for fixing issues, and some type of virtual patching to address hard-to-fix application security issues. You need to define these things up front and make sure your development and IT partners agree.
- Security Requirements: Just as with minimum functional tests which must run prior to code acceptance, you’ll have a set of security tests you run prior to deployment. These may be an agreed upon battery of unit tests for specific threats your which team writes. Or you may require all OWASP Top Ten vulnerabilities be mitigated in code or supporting products, mapping each threat to a specific security control for all web applications. Regardless of what you choose, your baseline requirements should account for new functionality as well as old. A growing body of tests requires more resources for validation and can slow your test and deployment cycle over time, so you have some decisions to make regarding which tests can block a release vs. what you scan for post-production.
- Monitoring and Metrics: If you will make small iterative improvements with each release, what needs fixing? Which code modules are problematic for security? What is working and how can you prove it? Metrics are key to answering all these questions. You need to think about what data you want to collect and build it into your CI:CD and production environments to measure how your scripts and tests perform. That means you need to engage developers and IT personnel in collecting data. You’ll continually evolve collection and use of metrics, but plan for basic collection and dissemination of data from the get-go.
- Security Design Principles: Some application security design and operational principles offer significant security improvement. Things like ephemeral instances to aid patching and reduce attacker persistence, immutable services to remove attack surface, configuration management to ensure servers and applications are properly set up, templating environments for consistent cloud deployment, automated patching, segregation of duties by locking development and QA personnel out of production resources, and so on. Just as important, these approaches are key to DevOps because they make delivery and management of software faster and easier. It sounds like a lot to tackle, but IT and development pitch in as it makes their lives easier too.
- Secure the Deployment Pipeline: With both development and production environments more locked down, development and test servers become more attractive targets. Traditionally these environments run with little or no security. But the need for secure source code management, build servers, and deployment pipelines is growing. And as CI/CD pipelines offer an automated pathway into production, you’ll need at minimum stricter access controls for these systems – particularly build servers and code repositories. And given scripts running continuously in the background with minimal human oversight, you’ll need additional monitoring to catch errors and misuse. Many of the tools offer good security, with digital fingerprinting, 2FA, logging, role-based access control, and other security features. When deployed in cloud environments, where the management plane allows control of your entire environment, great care must be taken with access controls and segregation of duties.
- Threat Modeling: Threat modeling remains one of the most productive exercises in security. DevOps does not change that, but it does open up opportunities for security team members to instruct dev team members on common threat types, and to help plan out unit tests to address attacks. This is when you need to decide whether you will develop this talent in-house or engage a consultant as there really is no product to do this for you. Threat modeling is often performed during design phases, but can also occur as smaller units of code are developed, and sometimes enforced with home-built unit tests.
Infrastructure and Automation First: Automation and Continuous Improvement are key DevOps principles, and just as valuable for security. As discussed in the previous post, automation is essential, so you need to select and deploy security tooling. We stress this because planning is important and helps development plan out the tools and tests they need to deploy before they can deliver new code. Keep in mind that many security tools require some development skill to integrate, so plan either to get your staff to help, or engage professional services. The bad news is that there is up-front cost and work to be done in preparation; the good news is that each and every build in the future will benefit from these efforts.
- Automation First: Remember that development is not the only group writing code and building scripts – operations is now up to their elbows as well. This is how DevOps helps bring patching and hardening to a new level. Operations’ DevOps role is to provide build scripts which build out the infrastructure for development, testing, and production servers. The good news is that you are now testing exact copies of production. Templates and configuration management address a problem traditional IT has struggled with for years: ad hoc undocumented work that ‘tweaks’ the environment to get it working. Again, there is a great deal of work to get environments fully automated – on servers, network configuration, applications, and so on – but it makes future efforts faster and more consistent. Most teams we spoke with build new machine images every week, and update their scripts to apply patches, updating configurations and build scripts for different environments. But this work ensures consistency and a secure baseline.
- Secure Code Repositories: You want to provide developers an easy way to get secure and (internally) approved open source libraries. Many clients of ours keep local copies of approved libraries and make it easy to get access to these resources. Then they use a combination of composition analysis tools and scripts, before code is deployed into production, to ensure developers are using approved versions. This helps reduce use of vulnerable open source.
- Security in the Scrum: As mentioned in the previous section, DevOps is process neutral. You can use Spiral, or Agile, or surgical-team approach as you prefer. But Agile Scrums and Kanban techniques are well suited to DevOps. Their focus on smaller, focused, quickly demonstrable tasks aligns nicely. We recommend setting up your “security champions” program at this time, training at least one person on each team on security basics, and determining which team members are interested in security topics. This way security tasks can easily be distributed to team members with interest and skill in tackling them.
- Test Driven Development: A core tenet of Continuous Integration is to never check in broken or untested code. The definitions of broken and untested are up to you. Rather than writing giant waterfall-style specification documents for code quality or security, you’re documenting policies in functional scripts and programs. Unit tests and functional tests not only define but enforce security requirements. Many development teams use what is called “test driven development”, where the tests to ensure desired functionality – and avoid undesired outcomes – are constructed along with the code. These tests are checked in and become a permanent part of the application test suite. Security teams do no leverage this type of testing enough, but this is an excellent way to detect security issues specific to code which commercial tools do not.
- Design for Failure: DevOps turns many long-held principles of both IT and software development upside down. For example durability used to mean ‘uptime’, but now it’s speed of replacement. Huge documents with detailed product specifications have been replaced by Post-It notes. And for security, teams once focused on getting code to pass functional requirements now look for ways to break applications before someone else can. This new approach of “chaos engineering”, which intentionally breaks application deployments, forces engineers to build in reliability and security. A line from James Wickett’s Gauntlt page: Be Mean To Your Code – And Like It expresses the idea eloquently. The goal is not just to test functions during automated delivery, but to really test the ruggedness of code, and substantially raise the minimum security of an acceptable release. We harden an application by intentionally pummeling it with all sorts of functional, stress, and security tests before it goes live – reducing the time required for security experts to test code hands-on. If you can figure out some way to break your application, odds are attackers can too, so build the test – and the remedy – before it goes live. You need to plan for these tests, and the resources needed to build them.
- Parallelize Security Testing: A problem common to all Agile development approaches is what to do about tests which take longer than a development cycle. For example we know that fuzz testing critical pieces of code takes longer than an average Agile sprint. SAST scans of large bodies of code often take an order of magnitude longer than the build process. DevOps is no different – with CI and CD code may be delivered to users within hours of its creation, and it may not be possible to perform complete white-box testing or dynamic code scanning. To address this issue DevOps teams run multiple security tests in parallel to avoid delays. They break down large applications into services to speed up scans as well. Validation against known critical issues is handled by unit tests for quick spot checks, with failures kicking code back to the development team. Code scanners are typically run in parallel with unit or other functional tests. Our point here is that you, as a security professional, should look for ways to speed up security testing. Organizing tests for efficiency vs. speed – and completeness vs. time to completion – was an ongoing balancing act for every development team we spoke with. Focusing scans on specific areas of code helps find issues faster. Several firms also discussed plans to maintain pre-populated and fully configured tests servers – just as they do with production servers – waiting for the next test cycle to avoid latency. Rewriting and reconfiguring test environments for efficiency and quick deployments help with CI.
- Elasticity FTW: With the public cloud and virtualized resources it has become much easier to quickly provision test servers. We now have the ability to spin up new environments with a few API calls and shrink them back down when not in use. Take advantage of on-demand elastic cloud services to speed up security testing.
- Test Data Management: Developers and testers have a very bad habit of copying production data into devopment and test environments to improve their tests. This had been the source of many data breaches over the last couple decades. Locking down production environments so QA and Dev personnel cannot exfiltrate regulated data is great, but also ensure they do not bypass your security controls. Data masking, tokenization, and various tools can produce quality test data, minimizing their motivation to use production data. These tools deliver test data derived from production data, but stripped of sensitive information. This approach has proven successful for many firms, and most vendors offer suitable API or automation capabilities for DevOps pipelines.
- Manual vs. Automated Deployment:: It is easy enough to push new code into production with automation. Vetting that code, or rolling back in case of errors, is much harder. Most teams we spoke with are not yet completely comfortable with fully automated deployment – it scares the hell out of many security folks. Continuous software delivery is really only used by a small minority of firms. Most only release new code to customers every few weeks, often after a series of sprints. These companies execute many deployment actions through scripts, but launch the scripts manually when Operations and Development resources are available to fully monitor the push. Some organizations really are comfortable with fully-automated pushes to production, releasing several times per day. There is no single right answer, but either way automation performs the bulk of the work, freeing up personnel to test and monitor.
- Deployment and Rollback: To double-check that code which worked in pre-deployment tests still works in the development environment, teams we spoke with still do ‘smoke’ tests, but they have evolved them to incorporate automation and more granular control over rollouts. We saw three tricks commonly used to augment deployment. The first and most powerful is called Blue-Green or Red-Black deployment. Old and new code run side by side, each on its own set of servers. A rollout is a simple flip at the load balancer level, and if errors are discovered the load balancers are pointed back to the older code. The second, canary testing, is where a small subset of individual sessions are directed towards the new code – first employee testers, then a subset of real customers. If the canary dies (errors are encountered), the new code is retired until the issued can be fixed, when the process is repeated. Finally, feature tagging enables and disables new code elements through configuration files. If event errors are discovered in a new section of code, the feature can be toggled off until it is fixed. The degrees of automation and human intervention vary greatly between models and organizations, but overall these deployments are far more automated that traditional web services environments.
- Production Security Tests: Applications often continue to function even when security controls fail. For example a new deployment scripts might miss an update to web application firewall policies, or an application could launch without any firewall protection. Validation – at least sanity checks on critical security components – is essential for the production environment. Most of the larger firms we spoke with employ penetration testers, and many have full-time “Red Teams” examining application runtime security for flaws.
- Automated Runtime Security: Many firms employ Web Application Firewalls (WAF) as part of their application security programs, usually in order to satisfy PCI-DSS requirements. Most firms we spoke with were dissatisfied with these tools, so while they continue to leverage WAF blacklists, they were adopting Runtime Application Self-Protection (RASP) to fill remaining gaps. RASP is an application security technology which embeds into an application or application runtime environment, examining requests at the application layer to detect attacks and misuse in real time. Beyond just “WAF in the application context”, RASP can monitor and enforce at many points within an application framework, both tailoring protection to specific types of attacks and allowing web application requests to “play out” until it becomes clear a request is indeed malicious before blocking it. Almost every application security and DevOps call we took over the last three years included discussion of RASP, and most firms we spoke with have deployed the technology.
Application Security Standards
A handful of application security standards are available. The Open Web Application Security Project (OWASP) Top Ten and the SANS Common Weakness Enumeration Top 25 are the most popular, but other lists of threats and common weaknesses are available, typically focused on specific subtopics such as cloud deployment or application security measurement. Each tends to be embraced by one or more standards organizations, so which you use is generally dictated by which industry you are in. Or you can use all of them.
Regardless of your choice, the idea is to understand what attacks are common and account for them with one or more security controls and application security tests in your build pipeline. Essentially you build out a matrix of threats, and map them to security controls. This step helps you plan out what security tools you will adopt and put into your build process, and which you will use in production.
All that leads up to our next post: Building a Security Tool Chain.