Considering the lessons learned from the recent CrowdStrike outage, what are some recommended changes or best practices for Business Continuity Plans (BCP) that you think could benefit other organizations?
We fared pretty well despite a high number of impacted Windows machines. Having a list of the applications most critical to the company already available meant we could prioritize. It also meant noisy people did not get serviced just because they were noisy, and the business priorities stayed the priority.
Furthermore, everyone knowing where they were on the list helped manage the anxiety application owners would otherwise have felt about not knowing when they would get support.
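The prioritized list described above can be sketched as a simple ranked recovery queue. This is a minimal illustration, not the poster's actual tooling: the `Application` type, the numeric `impact_rank` scale (1 = most critical), and the sample applications are all hypothetical. The key point is that ordering depends only on business impact, so how loudly someone asks never changes their place, and each owner can look up their position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    owner: str
    impact_rank: int  # hypothetical scale: 1 = most critical to the business


def recovery_order(apps: list[Application]) -> list[Application]:
    """Sort applications by business impact, most critical first.

    Ties are broken by name so the order is deterministic.
    Nothing about who is asking (or how loudly) enters the sort key.
    """
    return sorted(apps, key=lambda a: (a.impact_rank, a.name))


def queue_position(apps: list[Application], name: str) -> int:
    """Return the 1-based recovery position of an application,
    so its owner knows where they stand."""
    for pos, app in enumerate(recovery_order(apps), start=1):
        if app.name == name:
            return pos
    raise KeyError(name)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    apps = [
        Application("payroll", "finance", 2),
        Application("order-entry", "sales", 1),
        Application("intranet-wiki", "it", 5),
    ]
    for pos, app in enumerate(recovery_order(apps), start=1):
        print(pos, app.name, app.owner)
```

Publishing the output of `recovery_order` (and letting owners query `queue_position`) is one way to give everyone the visibility into their place on the list that the post credits with reducing anxiety.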
One recommendation that comes to mind is maintaining a comprehensive index of where your systems are integrated and which vendors you are using. Often, this information is not readily available. Conducting an audit of your entire infrastructure and feeding the results into a governance plan for disruptive events is crucial.
Taking a step back from the specifics of the CrowdStrike incident, the essential task is to identify failure nodes, or single points of failure, and build mitigations for them into your BCPs. In this case, the failure of our EDR (endpoint detection and response) solution wasn't something we had anticipated. Now we need to consider whether deploying multiple EDR vendors is worth the cost and effort to avoid a similar situation in the future. The broader question is how to find those single points of failure and architect around them in our BCPs.
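Combining the system and vendor inventory with the search for single points of failure can be sketched as a small "what if this one component fails?" simulation. This is a hypothetical illustration, not a tool the poster describes: the dependency map format, the component names (including `edr-vendor-a`), and the sample data are all assumptions. Each node lists requirements, and each requirement is a set of alternatives where any one working alternative is enough. The graph is assumed to be acyclic.

```python
# Hypothetical dependency map: node -> list of requirements.
# Each requirement is a set of alternatives; ANY one working alternative
# satisfies it. Assumes the dependency graph has no cycles.
Deps = dict[str, list[set[str]]]


def is_up(node: str, failed: set[str], deps: Deps, memo: dict[str, bool]) -> bool:
    """A node is up if it has not failed and every one of its
    requirements has at least one alternative that is up."""
    if node in memo:
        return memo[node]
    if node in failed:
        memo[node] = False
        return False
    up = all(
        any(is_up(alt, failed, deps, memo) for alt in requirement)
        for requirement in deps.get(node, [])
    )
    memo[node] = up
    return up


def single_points_of_failure(deps: Deps, critical: list[str]) -> dict[str, list[str]]:
    """Fail each non-critical component on its own and report which
    critical services go down as a result."""
    components = set(deps) | {
        alt for reqs in deps.values() for req in reqs for alt in req
    }
    spofs: dict[str, list[str]] = {}
    for component in sorted(components - set(critical)):
        memo: dict[str, bool] = {}  # fresh cache per failure scenario
        down = [s for s in critical if not is_up(s, {component}, deps, memo)]
        if down:
            spofs[component] = down
    return spofs


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    deps: Deps = {
        "order-entry": [{"windows-fleet"}, {"erp-saas"}],
        "payroll": [{"windows-fleet"}, {"payroll-vendor"}],
        "windows-fleet": [{"edr-vendor-a"}],
        "erp-saas": [{"idp-primary", "idp-backup"}],  # redundant identity providers
    }
    for component, impacted in single_points_of_failure(deps, ["order-entry", "payroll"]).items():
        print(component, "->", impacted)
```

In this sample, the redundant identity providers never appear in the output, while the single EDR vendor takes down both critical services. Whether adding a second EDR vendor would actually create that kind of redundancy in practice is exactly the cost-versus-effort question raised above; the model only makes the question visible.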
This incident should serve as a wake-up call for business leaders. The failure didn't originate at a fintech or medical operation; it came from CrowdStrike, a company known for its cybersecurity expertise. If such a well-funded and knowledgeable company can experience this, it underscores the need for all business leaders to ensure they have a well-tested and well-thought-out business continuity plan. This is not just an IT issue but a broader business imperative. Leaders who don't already have robust BCPs in place need to invest in IT to drive significant progress and change.
After our comprehensive risk assessment following the outage, we realized the importance of considering the impact on the software supply chain. Mitigating these disruptions involves enhanced due diligence in vendor management, including contractual due diligence. Taking a componentized approach to analysis is also essential. For instance, that means understanding when open source is in play, identifying dependencies, and performing vulnerability scanning not only on our own systems but also on the third-party software stacks we use. Additionally, combining this with secure software development practices, such as training staff, conducting peer code reviews, and maintaining a 360° view, is vital. Previously, we might have assumed that software from reputable companies was automatically secure, but now we know better.