Riju not available
Incident Report for Radian LLC
Postmortem

This issue was caused by the removal of the deprecated --graph option from the Docker daemon in Docker Engine 23.0.0 (https://docs.docker.com/engine/release-notes/23.0/#2300). Unfortunately, the existing use of the deprecated option does not appear to have produced any warnings, so the problem went unnoticed before the upgrade.

More information at https://docs.docker.com/engine/deprecated/#-g-and---graph-flags-on-dockerd.
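
As a rough illustration of the kind of change involved, the sketch below rewrites the deprecated -g/--graph flags to --data-root, the replacement named in the Docker deprecation notice, on the ExecStart line of a systemd unit. It is not the actual riju-init-volume script or the committed fix; the function name migrate_graph_flag and the path /mnt/riju/docker are hypothetical and used only for the example.

```python
import re

# Matches the deprecated dockerd flags "-g PATH", "--graph PATH" and "--graph=PATH".
# The lookbehind keeps the pattern from matching in the middle of another token.
_DEPRECATED_FLAG = re.compile(r"(?<!\S)(?:-g|--graph)(?:=|\s+)(\S+)")


def migrate_graph_flag(unit_text: str) -> str:
    """Replace -g/--graph with --data-root on ExecStart lines only.

    Hypothetical helper for illustration; other lines are returned unchanged.
    """
    out = []
    for line in unit_text.splitlines(keepends=True):
        if line.lstrip().startswith("ExecStart="):
            line = _DEPRECATED_FLAG.sub(r"--data-root \1", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical unit fragment; the data path is an assumption.
    unit = "[Service]\nExecStart=/usr/bin/dockerd -g /mnt/riju/docker -H fd://\n"
    print(migrate_graph_flag(unit), end="")
```

Restricting the substitution to ExecStart lines avoids touching unrelated settings that happen to contain "-g".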

This resulted in an outage only because maintenance was performed directly on the production instance. The inability to deploy changes properly is a result of https://github.com/radian-software/riju/issues/168 and will be resolved by migrating to Kubernetes as outlined in that issue.

A fix has been retroactively pushed into version control at https://github.com/radian-software/riju/commit/23fc1617f4e8a3f5969c5aa0853e65b36a2e358d.

Posted Mar 27, 2023 - 18:02 PDT

Resolved
We identified that the fix to the Docker systemd file was reverted by the riju-init-volume script, which modifies this unit file at startup. After updating the script to use the correct argument, the problem is resolved and Riju is back up.
Posted Mar 27, 2023 - 17:59 PDT
Update
We identified that the Docker upgrade resulted in the '-g' command-line option no longer being accepted by the Docker daemon. Riju uses this argument to ensure that Docker uses the attached EBS volume for data storage, so the daemon was unable to start. This issue has been addressed, but Riju is still down.
Posted Mar 27, 2023 - 17:56 PDT
Investigating
A routine system package upgrade on the production server appears to have caused downtime. The Riju user interface is not available.
Posted Mar 27, 2023 - 17:52 PDT
This incident affected: Riju (Riju - Web interface, Riju - API).