On 2021-08-11 from 21:48 UTC to 22:07 UTC, the Panopto EU Cloud had a full site outage. All users were unable to access their sites and could not view or edit existing recordings or upload new ones.

RCA: Starting at 21:48 UTC, the Panopto EU Cloud web servers received a surge of incoming requests. Each of these requests had to be handled by a subservice, so the web servers sent a corresponding request to that subservice. Because of both the large number of requests and the processing cost of each one, the subservice quickly became backlogged, leading to elevated request latencies. Compounding the problem, a configuration flaw in the web servers meant these long-running requests were not terminated in a timely manner, so the web servers themselves became backlogged and unable to handle general site traffic. Our operations team mitigated the issue by manually scaling servers to exceed the demands of the temporary surge of requests.

To prevent a recurrence of this issue, our cloud operations team will update the web server configurations to ensure subservice requests have appropriate timeouts.
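
As an illustration only, the kind of timeout described above can be sketched in a few lines of Python. This is not Panopto's actual configuration or code: the function names, the 0.2-second budget, and the "503" response string are all hypothetical, and the subservice is simulated with a sleep. The point is simply that a bounded wait lets a web server abandon a slow subservice call instead of holding the request open.

```python
import asyncio

# Hypothetical time budget for one subservice call (an assumption, not a real setting).
SUBSERVICE_TIMEOUT_SECONDS = 0.2


async def call_subservice(delay_seconds: float) -> str:
    """Stand-in for a subservice request; the sleep simulates processing time."""
    await asyncio.sleep(delay_seconds)
    return "ok"


async def handle_request(delay_seconds: float) -> str:
    """Handle one web request, giving up on the subservice after the budget expires."""
    try:
        return await asyncio.wait_for(
            call_subservice(delay_seconds),
            timeout=SUBSERVICE_TIMEOUT_SECONDS,
        )
    except asyncio.TimeoutError:
        # Fail fast so the web server is freed for other traffic.
        return "503 subservice timeout"


def run(delay_seconds: float) -> str:
    """Synchronous helper that runs a single simulated request."""
    return asyncio.run(handle_request(delay_seconds))
```

In this sketch, `run(0.0)` completes normally, while `run(1.0)` is cut off after the budget and returns the error response, so a backlogged subservice cannot tie up the web server indefinitely.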

