Tuesday, 7 May 2013

De-risking Financial Systems - Through Knowledge And Experience


It takes a great deal of expertise to understand the complex business flows, and the logic behind the numerous business decisions, that are implemented across financial institutions. This is taken as a given, but with the increase in internet-facing and mobile-based systems, the pressure to change a financial institution’s core systems is growing. Changing such systems with zero defects and high availability is a big challenge.
An even greater challenge for IT departments is keeping pace with the rate of change in technology. This requires recruiting talented and experienced technicians, while also making sure there is enough business expertise to confirm that the changes implemented match the business needs.
These issues were well illustrated last summer when the American investment firm Knight Capital lost over $450 million in trying to keep up with changes in high frequency trading. The fault involved trading in 150 stocks during a 45-minute period. Orders were being placed as buy-high and sell-low when they were meant to be the other way around. As a result, Knight Capital lost 75 percent of its share value during the 48 hours that followed, forcing the firm to seek emergency funding. Ten years earlier, Knight Capital had experienced a similar fault with its trading systems.

Fortunately, on that occasion, the processes creating the losses ran more slowly, which enabled the regulators to cancel all the trades affected by the errors. This time, however, the regulators chose not to do so because they viewed the error as an example of incompetence. Given that this was a recurrence of an earlier fault, why had Knight Capital not resolved it? The answer is complex and lies as much with the business knowledge required to minimize systems risk as with the programming knowledge needed to implement high frequency algorithms.

In this case it was clear that the systems teams did not recognize the impact the issue would have on the business. They misjudged both the impact (in terms of losses) and the response under a changed regulatory environment. Systems are only as good as the people who program them, and most of those people are technicians. Technicians work to business specifications and translate them into programmable workflows and processes, so they are susceptible to misinterpreting business logic when a specification is incomplete or incorrect. Business knowledge is crucial for verifying the specifications and for spotting the system defects that incompleteness or errors can cause.

Furthermore, business knowledge can help anticipate the operational risks that identified defects are likely to cause, that is, the business criticality of each defect, and so indicate which ones require urgent attention. Domain-aware team members are therefore needed to visualize the likely failure scenarios and prioritize them by business risk, so that those with the greatest impact can be addressed immediately with the required resources. Our experience over the years in testing financial systems suggests that having business specifications validated early by domain experts significantly reduces the likelihood of critical flaws creeping into systems at go-live. Even once the need for a domain-aware unit within the project team is recognized, an important question remains: where should such expertise reside, and how can this unit be optimally built and utilized within a project team?

There are a few options that we could evaluate:

a) Expand the Business Analyst team: While some members write business specifications, the rest verify them. But how is one to decide who does what? Even though the expanded BA team allows domain experts to verify the specifications, it could compromise two key requirements: that the verifiers have verification skills and that they operate independently.

b) Create a separate unit of ‘Business Specification’ verifiers: While this unit can be created with domain experts who have the required verification skills, it would be one more entity to manage, leading to a need for greater coordination, further splits in responsibilities and a strong likelihood of an overall increase in effort. It is certainly not a cost-effective option, even if it meets the need for independence.

c) Early involvement of a domain-aware testing team: The team (as a unit) would need to get involved at the business-specifications stage itself. This is feasible only if the team has the necessary domain expertise to verify specifications. This could be the optimal choice, as it allows for more streamlined coordination across the project than the other two options and makes the testing team responsible for quality throughout the lifecycle, thereby strengthening project governance.

Early involvement of the testing team also allows for greater re-use of scripts across the various stages of testing. All in all, organizations can optimize their testing costs by as much as 40%. With financial authorities seriously reviewing plans to introduce “capital requirements” for banks to cover operational risks (in addition to those for trading and credit risks), the impact of systemic issues will no longer stay within the IT domain, but will extend to a financial institution’s business model. In conclusion, the greatest challenge for any firm is to get the business and technology arms of the organization working in unison, taking into account the geographic spread and the frequency of technology updates. The testing process is the key to de-risking system changes.

Testing is the one area that the business and technology teams have to get right, by making it a continuum rather than a passing phase or a one-shot activity. With the likely tightening of regulatory requirements to manage operational risks, ensuring that systems go live right the first time, without causing any disruption, is not only a CIO responsibility but a matter for the Board. With a domain-aware testing team involved from the start, organizations can drastically reduce their “cost per defect” and significantly reduce the operational risks caused by system failures.
