For a tech team, the events you end up attending each year can easily blur into a haze of brunches, happy hours and schwag. With that in mind, on the morning of March 28, we hauled ourselves to Workbench, a NYC-based VC fund with a pretty neat – and definitely roomier than expected – venue close to Midtown Manhattan.
From the get-go, however, it became clear that this wouldn't be your run-of-the-mill event. For one, the crowd seemed genuinely engaged, and not merely as a networking exercise; the room had a studious vibe.
Chaos Community Day 2019 – or #chaosday19 if you’re keen to catch up on the social blow-by-blow – was the fourth of its kind, organized by Casey Rosenthal, who has been part of the chaos community pretty much since its inception.
Casey – who literally co-wrote the book on chaos engineering – cut his teeth as an engineering manager at Netflix, where he managed the traffic and chaos teams, the latter of which he also helped build. If you’re keen, his interview with DevSecOpsdays is worth a listen.
In his own words, “chaos engineering day originally brought people together who helped shape how people run code in production,” but now, four editions in, the growing community seems perfectly placed to shape the future of how systems are deployed at scale.
2015
Four years ago, the challenge for managers looking to build chaos engineering teams was that, well… chaos engineering didn’t exist – not in its current form anyway. Therein, however, lay an opportunity, and so the very first chaos engineering community day was born; yes, as a means of bringing like-minded people together, but with a pragmatic purpose as well: to recruit folks whose skills were close enough to tackle the tasks at hand.
2019
Fast-forward to 2019 and we’re a far cry from those beginnings. #chaosday19 pulled in 150 people with expertise ranging from security to traffic management. Kicking off at 9am, this year’s lineup included leading members of the chaos community, who freely shared insights into their slice of the industry. The super talented Denise Yu (Twitter @deniseyu21) sketched out the highlights of their presentations, and we’ve added a few curated social media posts for posterity.
8 Traps of Chaos Engineering
You can't have a prescriptive approach to the unpredictable.
Last week, I gave a talk at #chaosday19 on "Chaos Engineering Traps" I see out in the wild. I shared the slides and contents of the talk here: https://t.co/zs8QAyXgmy
— Nora Jones (@nora_js) April 6, 2019
"you can't automate yourself out of the relationship building needed for chaos engineering" @nora_js #chaosday19
— Jessica DeVita (@UberGeekGirl) March 28, 2019
#chaosday19 @gen_nja is getting started with his presentation “Black tie Chaos: Failing formally” pic.twitter.com/e6eaAu9i87
— Tom Leaman (@tleam) March 28, 2019
Black Tie Chaos: Failing Formally
Nathan spoke about applying chaos engineering to safety-critical systems like autonomous vehicles. A key takeaway was that, as we build and operate systems with ever more layers, the growing number of professionals involved in creating any single system itself adds to its complexity.
"There’s no value in just running some stuff and seeing it’s on fire. You need to plan."
#chaosday19 @gen_nja : "We're starting to see the unfortunate transition in our systems where we're bolting on ever increasing layers of complexity and abstraction onto our manual systems" [ed: direct reference to the 737Max issues recently]
— Tom Leaman (@tleam) March 28, 2019
Closing the Loop on Chaos with Observability
Distributed systems are particularly hostile to being cloned or imitated.
@mipsytipsy on being ready for Chaos Testing. "You are ready to bring in chaos when every time you're paged, it's for something you don't know how to fix. If you know how to fix the things causing problems, fix those first. Then do Chaos Testing." #chaosday19 #chaosengineering
— Wesley Reisz (@wesreisz) March 28, 2019
"It was always a lie that we could predict what our systems will do." @mipsytipsy on testing in production #chaosday19 #chaosengineering #observability
— Wesley Reisz (@wesreisz) March 28, 2019
Chaos Testing & Security
People operate differently when they expect things to fail.
There is more and more talk about #chaosengineering. By having a look at how active @serverlesschaos has been since January it's clear that the bot is retweeting more and more. It's also quite easy to see what week #srecon and #chaosday19 was. 💥 pic.twitter.com/gWrWSV11yU
— Gunnar Grosch (@GunnarGrosch) April 12, 2019
Reconciling & Chaos
Reconciling is the new regression testing... but better.
👏👏👏 Really great talk about reconciling chaos with @kubernetesio control loops by continuous auditing @krisnova of @VMware at Chaos Community Day v4 #ChaosEngineering pic.twitter.com/nIcH1P6T0t
— Joyce Lin (@PetuniaGray) March 28, 2019
@krisnova on Reconciling Chaos. Chaos is complete disorder and confusion. Chaos is human. Enter https://t.co/SCK3q9GTjq.#chaosday19 pic.twitter.com/UjhCRzxWcp
— Gunnar Grosch (@GunnarGrosch) March 28, 2019
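Kris Nova's talk framed chaos through Kubernetes-style control loops, which repeatedly compare desired state with observed state and act to close the gap. The Python below is a minimal, hypothetical sketch of that reconcile pattern, not the Kubernetes API or code from the talk; `reconcile`, `replica_index` and the `replica-N` naming are illustrative assumptions. A chaos event (here, a replica vanishing) simply shows up as drift that the next pass detects and corrects, and the returned action list stands in for the audit trail.

```python
from __future__ import annotations


def replica_index(name: str) -> int:
    """Parse the numeric suffix of a hypothetical 'replica-N' name."""
    return int(name.rsplit("-", 1)[1])


def reconcile(desired: int, running: set[str]) -> list[str]:
    """One pass of a toy control loop (illustrative, not Kubernetes code).

    Compares the desired replica count with the observed set of running
    replicas, mutates `running` to close the gap, and returns the actions
    taken so every correction leaves an audit trail.
    """
    actions: list[str] = []

    # Observed < desired: start replicas, reusing the lowest free index.
    i = 0
    while len(running) < desired:
        name = f"replica-{i}"
        if name not in running:
            running.add(name)
            actions.append(f"start {name}")
        i += 1

    # Observed > desired: stop surplus replicas, highest index first.
    # sorted() copies the set, so discarding while iterating is safe.
    for name in sorted(running, key=replica_index, reverse=True):
        if len(running) <= desired:
            break
        running.discard(name)
        actions.append(f"stop {name}")

    return actions


if __name__ == "__main__":
    running: set[str] = set()
    print(reconcile(3, running))  # ['start replica-0', 'start replica-1', 'start replica-2']
    running.discard("replica-1")  # chaos: a replica disappears
    print(reconcile(3, running))  # ['start replica-1']
    print(reconcile(3, running))  # [] (steady state, nothing to do)
```

Calling `reconcile` on a timer or on change events is what turns it into a loop. The design point worth noting is that the loop never asks *why* state drifted, so a deliberate chaos injection and an organic failure go through exactly the same code path.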
#ChaosDay19 talk by Narendra Nalabothula, member of Capital One’s central chaos team. Exploring Chaos in a financial firm. pic.twitter.com/R6bEongy5W
— Tom Leaman (@tleam) March 28, 2019
Chaos Engineering at Capital One
Look at failure as a certainty... and an opportunity.
Implementing chaos engineering at a financial institution comes with a unique set of challenges and, according to Narendra, resiliency and avoiding single points of failure go hand in hand.
Padma Gopalan - DiRT at Scale
Failure isn't just the loss of computing.
#chaosday19 : How do we test? Measure everything! Real outages including triggers, Incident Management signals and indicators [ed: would love to understand what measures are being utilized for Incident Management are we talking about things like TT-Respond, TT-Communicate etc?]
— Tom Leaman (@tleam) March 28, 2019
Ashutosh Raina - Madaari for the Monkeys
Failure is inevitable, let's fail in a controlled environment.
Chaos engineering isn’t about one person laboring through complex tests. Nothing grows in a vacuum, and chaos engineering is no different: you’re only as good as the team you’re able to build.
As security remains top of mind, a major theme emerging from this year’s event is that security isn’t something to be plastered on afterward; it’s an essential part of the process. As Nora Jones put it:
Creating the chaos is easy, thinking about safety is hard.
Finally, as chaos engineering spreads across major engineering teams, it’s important to understand that no engineering process – chaos or otherwise – happens without planning. Even chaos needs some orchestration, and it’s when the world ISN’T on fire that you’re able to experiment.
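The idea that you only experiment when the world isn't already on fire, and that safety is the hard part, maps onto a common guard pattern: confirm a steady-state metric before injecting anything, limit the blast radius, abort as soon as the metric drifts past a threshold, and always roll back. The Python below is a hypothetical sketch of that pattern, not tooling from any of the talks; `run_experiment`, the `error_rate`/`inject_fault`/`rollback` callables, the 1% threshold and the 10% blast radius are all illustrative assumptions.

```python
from __future__ import annotations

import random
from typing import Callable


def run_experiment(
    hosts: list[str],
    error_rate: Callable[[], float],      # hypothetical steady-state probe
    inject_fault: Callable[[str], None],  # hypothetical fault injector for one host
    rollback: Callable[[str], None],      # hypothetical undo for one host
    max_error_rate: float = 0.01,         # assumed abort threshold (1%)
    blast_radius: float = 0.1,            # assumed fraction of hosts to target
    rng: random.Random | None = None,
) -> str:
    """Run one guarded chaos experiment and report how it ended."""
    # 1. Don't experiment while the system is already unhealthy.
    if error_rate() > max_error_rate:
        return "skipped: system not in steady state"

    # 2. Limit the blast radius to a small random subset of hosts.
    rng = rng or random.Random()
    count = min(len(hosts), max(1, int(len(hosts) * blast_radius)))
    targets = rng.sample(hosts, count)

    injected: list[str] = []
    try:
        for host in targets:
            inject_fault(host)
            injected.append(host)
            # 3. Abort as soon as the steady-state hypothesis is violated.
            if error_rate() > max_error_rate:
                return f"aborted after {len(injected)} host(s): error rate over threshold"
        return f"completed on {len(injected)} host(s): steady state held"
    finally:
        # 4. Always undo what was injected, whether we finished or aborted.
        for host in injected:
            rollback(host)


if __name__ == "__main__":
    hosts = [f"host-{i}" for i in range(20)]
    broken: set[str] = set()

    def probe() -> float:
        # Toy metric: each broken host adds 0.4% to the error rate.
        return 0.004 * len(broken)

    print(run_experiment(hosts, probe, broken.add, broken.discard, rng=random.Random(0)))
    # completed on 2 host(s): steady state held
```

Putting the rollback in `finally` means the cleanup runs on every exit path, including an early abort, which is the "thinking about safety" half of the experiment rather than the "creating the chaos" half.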
2020
Judging by the level of discussion both during and between presentations, it was clear that the folks who took the time to attend the 2019 event were genuinely invested in chaos engineering as a discipline, and our team, for one, can’t wait to see what next year’s event brings to the podium.