A firmware bug that caused a storage array to fail was the root cause of the outage that downed many critical Internal Revenue Service systems for almost half of Tax Day 2018, according to a new report.
While noting that the IRS’s “substantial efforts” allowed it to resume processing tax returns after 11 hours, the report from the Treasury Inspector General for Tax Administration (TIGTA) said that the service had missed opportunities to
Firmware is the basic software that allows a piece of hardware to operate. IBM had identified the particular bug at fault in the IRS outage as far back as June 2017, and had developed a fix and released it with its November 2017 “microcode bundle.”
However, a contractor recommended keeping an older microcode bundle “because it was considered more stable,” TIGTA reported, and the IRS agreed with the recommendation. Although IBM wrote a script for another client that suffered an outage from the same bug in January 2018, the company “did not provide the IRS with any details regarding the other client outage or the availability of a script that would have prevented the Tax Day outage,” according to TIGTA.
The report identified a number of issues relating to both the outage and its aftermath:
- While not particularly old, the IRS’s current “Tier 1” storage environment has no automatic failover or built-in redundancies;
- Meetings to review monthly microcode bundles are not minuted and the decisions taken in them are not documented;
- The IRS storage services contractor failed to meet a number of service-level objectives on April 17, and should pay damages to the government; and,
- In the IRS’s “lessons learned” process, there is no overall strategy to consolidate the resulting action steps.
TIGTA recommended that the IRS formalize and document both its intended response to the outage and its ongoing monthly microcode bundle meetings, and that it modify its contract with the storage services provider and seek damages.
The IRS agreed with all the recommendations, and has begun implementing some of them.