Friday, July 19, 2024, was a challenging day for many organizations, teams, and individuals worldwide. It was stressful, and the impacts are still being assessed. It will take time for many organizations to fully understand the ramifications of what has been called the single largest IT outage in history. On a personal note, my own family was affected, which brought the matter home for me, as it did for many others.
The outage resulted from a content update that CrowdStrike pushed to its global customer base. Only systems running the Microsoft Windows operating system were affected. CrowdStrike has clearly stated that neither Apple Mac nor Linux hosts were affected and that the outage was not the result of a cyberattack, a fact that brought many people relief, if only briefly. CrowdStrike CEO and industry veteran George Kurtz stated on the official CrowdStrike blog and through the media that “The issue has been identified, isolated, and a fix has been deployed.”
What We Should Focus on and Take Away from This Event
In my opinion, what we as an industry and as organizations should focus on as a result of this unfortunate situation boils down to the following:
- Producing high-quality, consistent code, and the products built on that code, is difficult and warrants an unrelenting commitment to quality, process, procedure, and thoroughness (see SDLC+Security, The Rugged Software Movement, and many other sources on this topic). CrowdStrike and, in this case, Microsoft are no strangers to producing high-quality, effective, globally adopted, and trusted products, and I believe this unfortunate event will further strengthen their resolve to deliver high-quality, performant technology as they move on from it.
- Resilience and fault tolerance in applications, small systems, and internetworking infrastructure should be factored into design by vendors and consumers alike, because both the vendor/provider side and the consumer side (in this case business consumers and beyond) place a high degree of trust in these things. From an IT/internetworking perspective, it is clear to me that although the cloud has merit as an infrastructure option, it is just one such construct and should be scrutinized and reconsidered. The net effect observed in this incident was exacerbated by the use of cloud, and it reached systems both in the cloud and on premises, the latter of which required a significant amount of human intervention.
- Resilience and fault tolerance in design influence and underscore confidence in business continuity planning and disaster recovery (BCP/DR). Historically, BCP/DR has been treated as an adjunct element of cybersecurity (see the CIA model and the domains associated with the ISC2 CISSP, among many other sources, for more detail). The reality is that a lapse in BCP/DR, and the resulting loss of availability, can affect risk posture and attack surface exposure and management, though time will tell to what extent.
- Defense in depth is still, despite many novel and noteworthy attempts by marketing teams through the years to message to the contrary, a terribly important aspect of building and managing solid security programs, and of designing, architecting, and managing highly secure networks that continue to deliver visibility into, and cognizance of, the state of the network (and the assets associated with and attached to it). With it, the risk posture and attack surface remain defensible even when something is unavailable, evaded, or failing.
At NetWitness, many of us are personal friends and colleagues of individuals and teams at both CrowdStrike and Microsoft. We are also proud to share many joint customers with both organizations, and we will continue to do our best to be good stewards of those relationships while remaining dedicated to providing the highest-quality products, services, and guidance we can to those customers, as well as to organizations we do not yet work with. If you or your organization has been impacted by this recent CrowdStrike content push and would like to speak with anyone here at NetWitness about what you can do, beyond the measures CrowdStrike has laid out to date, to ensure your organization has the most comprehensive visibility and network detection and response at its disposal, please do not hesitate to contact us.
i https://www.cnbc.com/2024/07/19/latest-live-updates-on-a-major-it-outage-spreading-worldwide.html
i https://www.tomsguide.com/news/live/microsoft-worldwide-outage-live
i https://www.yahoo.com/news/microsoft-outage-live-crowdstrike-boss-170429333.html
ii https://www.crowdstrike.com/blog/our-statement-on-todays-outage/
ii https://x.com/george_kurtz/status/1814235001745027317?s=46
ii https://www.crowdstrike.com/blog/statement-on-falcon-content-update-for-windows-hosts/