Oh no, not I, I will survive
Oh, as long as I know how to hide, I know I’ll stay alive
I’ve got all my life to live
And I’ve got all my risk to give and I’ll survive.
I will survive, hey hey!
~ Theme song of vulns without Kenna Security
Veracode recently published their 9th volume of the State of Software Security report, and as always, it’s a great read for those responsible for developing and ensuring secure software. Something that caught our eye in this particular volume is their inclusion of a technique called survival analysis. Because Veracode worked with our partners at the Cyentia Institute on the data crunching behind that report, we thought it would be interesting to apply survival analysis to our own data and see what new insights it yielded.
What is survival analysis?
Before we dive into the data, let’s review the technique itself. That bastion of contemporary human knowledge, Wikipedia, gives the following explanation for survival analysis:
Survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems. [It] attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?
It’s not hard to see why this applies readily to software flaws: we have an entity (a flaw) that is born (discovered through code review/analysis) and lives for some time until the developer gods decide its fate and “kill” (fix) it. We can also use survival analysis to determine how various factors affect the probability of survival. As an example, the Veracode report shows how more frequent code scanning correlates with a dramatic reduction in the lifespan of flaws (more scanning => faster fixes). That’s an underlying assumption of DevSecOps, but it’s powerful to see that assumption tested and supported. Hence our interest in how survival analysis might help us test assumptions in the world of vulnerability management.
Does it apply to vulnerability management?
So how do we apply survival analysis to vulnerability management? Or, more directly, how do we apply it to Kenna’s dataset on and around vulnerabilities? Talking about a vulnerability “surviving” or not may seem a bit odd and imply that its “death” is a negative thing. But awkward semantics have no bearing on the applicability of survival analysis. We could just as easily talk about the “persistence” of vulnerabilities rather than “survivability” and study what proportion of them remain open after some passing of time. Semantics aside, let’s first confirm we have the necessary ingredients to properly apply the technique.
Instead of flaws or bugs, vulnerabilities (CVEs to be exact) are our entity of interest. We know when vulnerabilities are detected in a customer environment and when they are remediated, and we constantly monitor their “state” in between. So we have data pertaining to events and the timing of those events. Furthermore, we’re very interested in knowing which factors increase or decrease the likelihood and timing of observed vulnerabilities being remediated. Looks like we’re a go for survival analysis. Let’s do this!
How long do vulnerabilities survive?
Recall from our Prioritization to Prediction report that the majority of published CVEs are not exploited “in the wild” (by which we mean some organization observed suspicious/malicious activity targeting that CVE). It similarly stands to reason that not all CVEs will be “live” within assets deployed across any given enterprise network. For the subset of CVEs that are live, we start the survivability clock ticking as soon as their existence is identified by vulnerability scanners, penetration tests, etc.
If a new customer observed 100 live/open vulnerabilities within their assets today (Day 0), some might be remediated before quitting time, but let’s say for this example that 90 of them lived to see another day. The survivability rate on Day 0 would be 90%. As time passes and vulnerabilities continue to be killed (remediated), that proportion will drop. Of course, subsequent scans/tests will identify new vulnerabilities, so there’s a constant give and take over time. But survival analysis is mainly concerned with time-to-event measurements, and so the “survivability clock” for every vulnerability starts ticking at its own Day 0, regardless of the actual date it was first observed.
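To make the time-to-event idea concrete, here is a minimal sketch of a Kaplan-Meier estimator, one common way to compute a survival curve. The post does not say which estimator was actually used, so the choice of method, the function name `kaplan_meier`, and the toy inputs are all assumptions for illustration. Each vulnerability contributes the number of days since its own Day 0 and a flag saying whether it was remediated or was still open when observation ended.

```python
from collections import Counter


def kaplan_meier(durations, remediated):
    """Estimate a survival curve for a set of vulnerabilities.

    durations:  days from first detection (Day 0) to remediation, or to the
                end of observation if the vulnerability is still open.
    remediated: True if the vulnerability was fixed, False if still open.
    Returns a list of (day, survival_probability), one entry per day on
    which at least one vulnerability was fixed.
    """
    fixed_on = Counter(d for d, done in zip(durations, remediated) if done)
    leaving_on = Counter(durations)  # fixed or no longer observed on that day
    at_risk = len(durations)         # vulnerabilities still being tracked
    survival = 1.0
    curve = []
    for day in sorted(leaving_on):
        if fixed_on[day]:
            # Fraction of the tracked vulnerabilities that made it past this day.
            survival *= 1 - fixed_on[day] / at_risk
            curve.append((day, survival))
        at_risk -= leaving_on[day]
    return curve


# Toy version of the example above: 100 open vulnerabilities, 10 fixed on Day 0,
# the other 90 still open when (hypothetically) last observed 30 days later.
durations = [0] * 10 + [30] * 90
remediated = [True] * 10 + [False] * 90
for day, rate in kaplan_meier(durations, remediated):
    print(f"Day {day}: {rate:.0%} surviving")
```

The still-open vulnerabilities are not discarded. They count toward the population at risk until the day they were last observed, which is how survival analysis handles records whose ending has not happened yet.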
Okay, okay—enough of that, already. Let’s fast-forward through the (rather complex) process of survival analysis and look at the results. The chart below is an aggregate view of survival rates for 190 million CVEs observed across a sample of 12 organizations.
Notice how the aggregate survival rate (the blue line) drops rather quickly during the first month after Day 0. A little less than a third of CVEs are killed off (remediated) in that timeframe. But it takes another two months to cross the 50% survival rate and another three months after that (180 days from discovery) to reduce the population of live/unremediated CVEs to 1 in 3. About 18% survive a year or more in the enterprise environments we sampled.
Keep in mind the survival rate in the plot above is shown in the aggregate. Just like nobody has exactly 3.1 children (US average), no single organization will follow this line exactly. But it’s a great reference point for maturing the way we look at vulnerability resiliency and remediation.
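Milestones like “when does the survival rate cross 50%?” can be read off any survival curve with a few lines of code. The sketch below assumes the curve is a list of (day, survival rate) points. The helper name `first_day_at_or_below` is hypothetical, and the sample points are rough readings of the figures quoted in the text, not the underlying data.

```python
def first_day_at_or_below(curve, threshold):
    """Return the first day on which survival is <= threshold, else None.

    curve: list of (day, survival_probability) tuples sorted by day.
    """
    for day, survival in curve:
        if survival <= threshold:
            return day
    return None


# Rough readings taken from the prose, NOT the real dataset.
approx_curve = [(30, 0.68), (90, 0.50), (180, 0.33), (365, 0.18)]

print(first_day_at_or_below(approx_curve, 0.50))   # day half are remediated
print(first_day_at_or_below(approx_curve, 1 / 3))  # day only 1 in 3 remain open
print(first_day_at_or_below(approx_curve, 0.10))   # None: not reached within a year
```

Running the same helper on per-organization curves would be one simple way to compare each organization against the aggregate line.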
Where are we going with this?
Obvious questions arise from these results. Is there wide variation across organizations? Do some organizations start out strong and later slow down? Do others show the opposite behavior? What circumstances and characteristics influence survival rates in individual organizations? How can we help make sure the most risky vulns are killed off quickly?
We know you have these questions because we have them too. And we’re actively working to answer them along with other important questions relevant to how our customers prioritize remediation efforts. If you’re interested in what we’re learning on that front, be on the lookout for our upcoming report in collaboration with the Cyentia Institute, where you’ll see more survival analysis along with a lot of other super interesting research into vulnerability risk and remediation!
- If you’re wondering why the 190 million figure is more than 1,000 times higher than the total number of published CVEs, it’s because of duplication across assets. The same CVE may exist on tens of thousands of desktops in an organization, and the lifecycle of each instance must be tracked individually.