Those of you going on holiday this week may have found the process a little less straightforward than usual, especially if you were travelling by air from Gatwick Airport on Monday 20th August 2018. Rather unfortunately for many travellers, the flight information screens at Gatwick failed, requiring staff to resort to whiteboards to keep passengers up to date with flight departures and arrivals.
On the face of it, especially given the Twitter frenzy that subsequently broke out, this might seem like a major failure for the airport and the IT infrastructure that supports it. But just how business critical was this issue, and was the contingency planning and response appropriate given the public nature of the service?
When looking at services and considering how critical they are to the business, you need to consider the risk, the impact, the possible mitigation and ultimately the cost of the mitigated or unmitigated impact. In this case, Gatwick could quite easily have mitigated this failure by having a redundant link to their flight information service provider. Certainly, at the University of Brighton we have provisioned resilient links to all our sites, and we always connect to different exchanges so that we are resilient against a failure of our downline supplier's equipment. However, for a university, internet connectivity is business critical because the core business is teaching, learning and research, all of which depend heavily on access to information and data. For an airport, this may not be true so long as flight control systems and air traffic management can continue. But does that always apply?
Essentially, it all comes down to risk and what value you place on it. When considering how much to spend on mitigating a risk, one must consider the probability of the risk and the impact if it materialises, and then weigh that against the cost of the mitigation. In the case of Gatwick, they appear to have concluded that mitigation using whiteboards was cost-effective and maintained core business without the need to invest in additional resilient data connections. Whilst travellers may have been dissatisfied, departures and arrivals still took place on time and the core business of the airport was largely unaffected. The damage to reputation may also carry a cost which must be considered, but this could be short-lived provided the mitigation is effective.
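To make that weighing concrete, here is a minimal sketch in Python of one common way to do it: add the yearly cost of each mitigation to the expected yearly loss that remains (probability multiplied by impact) and compare the totals. Every name and figure below is invented purely for illustration; none of them are real Gatwick or University of Brighton numbers, and a real assessment would need estimates agreed with the business.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    """One way of handling a risk. All values used below are hypothetical."""
    name: str
    annual_cost: float           # yearly cost of providing the mitigation
    residual_probability: float  # chance per year that the failure still occurs
    residual_impact: float       # cost per incident with this mitigation in place

    def expected_annual_total(self) -> float:
        # Cost of the mitigation plus the expected loss that remains
        # (probability of the failure multiplied by its impact).
        return self.annual_cost + self.residual_probability * self.residual_impact


def cheapest(options):
    """Return the option with the lowest expected yearly total."""
    return min(options, key=lambda m: m.expected_annual_total())


# Illustrative figures only (assumptions, not data from any real airport).
options = [
    Mitigation("do nothing", 0.0, 0.2, 500_000.0),
    Mitigation("whiteboards and extra staff", 5_000.0, 0.2, 50_000.0),
    Mitigation("redundant data link", 60_000.0, 0.01, 500_000.0),
]

for option in options:
    print(f"{option.name}: {option.expected_annual_total():,.0f} per year")

best = cheapest(options)
print(f"Lowest expected cost: {best.name}")
```

With these made-up numbers the manual fallback comes out cheapest, but the result is only as good as the estimates. Raise the impact figure for the whiteboard option to allow for reputational damage and the redundant link can quickly become the better choice.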
So, in conclusion, when looking at business-critical services in your IT estate, you need to consider the value of each service to the business and have agreed service levels in place. Assess the risk and impact of failure for each service individually, not forgetting any underpinning infrastructure, and devise a mitigation that meets the needs of the business at a cost appropriate to the level of risk.
The one final question for Gatwick, and for anyone else considering business-critical services, is this: do you have a sufficiently good understanding of the risk in the first place? This is not the first public failure of infrastructure at Gatwick, and the last time it happened it disrupted core functions, including arrivals and departures, in the peak Christmas holiday season.