Rivian Failed OTA - A Story on How to NOT do Major Incident Management

Rivian's latest OTA release, 2023.42.0, failed to install on production fleet vehicles because of a "fat finger" mistake in the release process, leaving hundreds (?) of vehicles in a semi-broken state. Rivian has been quiet on incident communications and has not given owners a consistent message of support and transparency.

Apollo 13 - Failure is Not An Option (an example of exemplary Incident Response Coordination)
📣
[Update 11/14 1pm Eastern] About 12 hours after Wassym's update was posted on Reddit, emails went out to the owners of vehicles identified as impacted (example below).

[Update 11/14 ~3pm Eastern] Owners also received the same or similar text by SMS at their registered phone number.

[Update 11/14 ~4pm Eastern] Jose, the contributor behind RivianTrackr/@RivianSoftware, posted a timeline on their WordPress site (https://rivian.software/2023-42-timeline/). Check that location for future incident timelines.
Example email template sent to impacted owners.
There is no excuse for this failed OTA or for the failed process that led to it and the resulting fallout, and this post isn't trying to make one. I'm advocating on behalf of impacted owners for clearer communication from Rivian about the actual user impact (are vehicles safe to drive?) and the scale of impact to the fleet (numbers we know Rivian has access to).

If you are an EV owner, you have likely heard that around 7pm EST on November 13th there was a relatively large-scale incident (the actual impact isn't public knowledge): Rivian's November 2023 update (2023.42.0) left vehicles around the world "bricked" or "semi-bricked" after the OTA failed to update and install on the various subsystems in the vehicle.

For those who haven't been following along, below are some public sources on what happened, how Rivian responded, and what owners themselves are reporting (in real time in some cases).

The popular @RivianTrackr account posted the state of the driver and infotainment screens to Twitter almost immediately (I'm not calling it X or ex... just not doing it!)
2023.42 OTA Update Issue
by u/WassymRivian in Rivian
/u/WassymRivian (Rivian's VP of Software, Wassym Bensaid) posted a public message to Reddit about the failed update, along with a potential cause they have identified.
Snapshot of public messaging as of 11/14 12:00pm (no update since 10:45pm the evening before)

Failure of Incident Management

First and foremost, this incident is a direct example of failed public handling of a major incident (if it classifies as one, and I believe it does, though I don't know the internal classifications Rivian's incident response team uses to determine this). Below I highlight a few of the failures in this event so far (things are still unfolding as I type).

  1. No immediate messaging to impacted owners via their preferred communication method (e.g. phone, email, text)
  2. Public messaging went to Reddit (a subset of the owner community); it should have been posted to a Rivian-owned news or communication source
  3. Support teams were overwhelmed and out of the loop as to what was going on, leading to more confused owner experiences and more venting to owner communities
  4. Inconsistent Support messaging about whether vehicles in the failure state are safe to drive: some owners were told "not to drive your vehicle" while others were told their "vehicle is driveable"
  5. Lack of public communication on user/fleet impact: percentage of vehicles, number of vehicles, number of owners, etc.

As many passionate owners have mentioned in various forms, this style of communication and owner engagement is unfortunately par for the course for Rivian. As a fellow owner and software engineer, I hope and pray that Rivian is listening, has the guts and drive to be better in this department, and really leans in to determine which processes and protocols need to change to give their community the relationship and communication it deserves in times like this.

How a number of Rivian owners feel right now (how many though... unknown)

Software failures will and do happen.

How an organization responds to failure and communicates during the resolution of that failure can make or break it. Building and retaining trust through openness and transparency can be painful. Showing, accepting and talking openly about our faults is painful, not intuitive, and too often frowned upon in society as a sign of weakness. However, I've learned through marriage that a lack of open and honest communication does the opposite of building trust... it can lead to walls being built up, isolation, fear, assumptions, loss of trust, and ultimately a broken, unrepairable connection to another individual (or thousands, millions even).

As a trained Major Incident Manager for a Cloud Software company as part of my day job, I can say the way this incident has been managed is not up to par. Call it growing pains, call it start up life, call it what you want... but the process here is broken and needs to be fixed sooner rather than later, ideally before the next wide scale incident (was this wide scale? we don't know, and that adds to the noise here). Rivian does know the impact of this and, good or bad as it may be, needs to own that story, because if they won't, then owners, media and the public will, and they often assume the worst, which is never a good thing.

Rivian definitely knows each and every impacted vehicle by VIN/VehicleId (as ElectraFi proves through its insight using Rivian's own Cloud API). For ElectraFi, the count is 31 of the 263 contributing vehicles in the fleet (production and pre-production included). Obviously (or maybe not so obviously) this is going to be an inflated number, as I would assume the users who have and know about ElectraFi are more prone to install available software earlier in the release pipe (maybe not anymore?).

ElectraFi.com Firmware Tracker
ElectraFi Rivian User Fleet Firmware Visibility Tracker

The bad production fleet release occurred on November 13th, so installations tracked by ElectraFi before 11/13 are "internal beta" fleet vehicles that have opted into ElectraFi and/or under-NDA content writers who had a "press pre-production" release made available to them.

If the numbers above are accurate and are extrapolated across the production fleet, then the worst case is that 10-15% of vehicles may have been impacted. I have reason to believe it is drastically less than that, but still wider in scale than a normal day-one OTA release has been in Rivian's past 1+ years of consumer vehicle deliveries and 18+ OTA releases.
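For anyone who wants to check that back-of-the-napkin math, here is a minimal Python sketch. It assumes the 31 tracked vehicles are the ones that took 2023.42.0, and it uses a Wilson score interval (my choice, not anything Rivian or ElectraFi publishes) to show how wide the uncertainty is with a sample of only 263 vehicles. The function and variable names are made up for illustration.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Figures quoted above from the ElectraFi tracker.
tracked_on_release = 31   # assumption: these are the vehicles that took 2023.42.0
tracked_total = 263       # all contributing vehicles (production + pre-production)

share = tracked_on_release / tracked_total
low, high = wilson_interval(tracked_on_release, tracked_total)

print(f"observed share: {share:.1%}")
print(f"rough 95% range: {low:.1%} to {high:.1%}")
```

Keep in mind the range only reflects sampling noise. It does nothing to correct for ElectraFi users being early adopters, which is why I expect the real fleet number to be lower.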

Silver Linings

I don't want to make excuses, but there are some silver linings in this production fleet incident that we have only now learned about. Rivian Engineering may have known these things themselves, but nothing in the public space confirmed them until this event happened last night and owners around the country were using Twitter, Reddit, Discord, Facebook, Rivian Owner Forums and Signal to communicate with each other, working out which systems worked and which didn't, as well as some creative workarounds.

For context, my wife and I received our OTA notifications, and although I'm one who likes to beta test (or nightly test) software for the "greater good" (to my wife's displeasure), the OTA rollout campaign was pulled/halted before I could tap the Install Update button for my 2022 Rivian Blue R1T.

Rollout Campaign was Cancelled/Halted Mid-Flight

Many owners like ourselves received notifications after our vehicles phoned home and downloaded the ~3GB update file last night (around 6-7pm EST on a Monday). As far as I remember, Rivian has never released an OTA on a Monday; releases usually land on a Thursday or Friday. From the outside, it also looks like this first release cohort was larger than in the past (though that is my perception and not based on any data).

Sully (my Rivian Blue R1T) Network Usage During Download
Download Progress of 2023.42.0 OTA Release
Status "Ready to Install" Never Triggered - After Downloaded Release Campaign Reverted

Thankfully, many (?) owners received these "ready to install" mobile notifications only to find that there was no button or other way to install the update, because their vehicles had reverted their "Next OTA Version" back to the current production release, 2023.38.0, which was released in October.

Driver's Screen - Critical Functions Only

The driver's screen appears to run on a different control module than the infotainment screen (never really confirmed, but now it seems to be the case): it is still functioning AND even shows your speed and selected gear (e.g. Drive, Reverse, Neutral, Hold). What isn't functioning on this screen are the driver information blocks on the left and the real-time augmented reality / Unreal Engine backed driving visualizations.

Driver's display showing drive mode, estimated miles and SoC (state of charge), gear selection as well as current Speed.

Camera Displays - Rear/Reverse and Front/Forward Defaults

Though the main infotainment display is not functioning in this state, the reverse cameras do still appear automatically, as many owners confirmed and posted to their online communities.

Rear camera display turns on automatically on center screen in Reverse
Shifting from Reverse to Drive toggles the front/drive camera.

HVAC - Remote Controlled Only (not while driving)

The first fear everyone had was "will my truck be hot / cold, as HVAC controls are on the main screen". This is a legitimate fear, but as some have found out, you can send HVAC commands remotely from the official Rivian mobile app. However, this only works for "pre-conditioning" and for setting the default climate preferences for the interior cabin and seats... it doesn't allow dynamic changes while the vehicle is in motion. To change temperature or seat controls, for the time being you need to pull over safely and change the settings manually in the mobile app.

Media Coverage

Rivian 2023.42 Software Update Soft-Bricks EVs, Some Might Need Physical Repairs
Rivian tested the 2023.42 software update internally and considered it ready for prime time. However, it sent an incorrect security certificate when it d...
Rivian software update bricks infotainment system, fix not obvious
On Monday, Rivian released an incremental software update 2023.42 that bricked the infotainment system in R1Ses and R1Ts. The company...
Latest Over The Air Update Can Brick Some Rivian Models Requiring Repairs - AutoSpies Auto News
Rivian’s Software Update Glitch Leaves Customers in the Dark - OPP.Today

and counting...