Choosing the Right Metrics for AI Success

As automation continues to advance across industries, enterprises are relying ever more heavily on AI to run their business processes. Inevitably, those in charge of implementing AI in workflows need some way of measuring the success of those implementations. Access to an accurate, up-to-date picture of processes, activities, and routines is key.

If you can't measure it, you can't improve it

The obvious reason for developing AI success metrics was articulated over a century ago in a maxim often attributed to Lord Kelvin: "If you can't measure it, you can't improve it." If the purpose of an AI implementation is to improve work processes, then it is paramount to know how much those improvements are affecting the company's bottom line, and, if they are not, to understand why and how to fix that.

Enterprises already use all sorts of metrics to quantify results and income, so evaluating the impact of AI on business operations is a logical next step. Such metrics require clear criteria that allow project owners to accurately determine whether a project is advancing in the right direction, and to make adjustments when it is not. AI projects should also be testable using real data from real projects.

But developing metrics for measuring AI success can be tricky. There are two main reasons for this:

The first is that deep learning models are trained by minimizing a mathematically convenient loss function. Gradient descent needs a smooth, ideally convex, objective in order to make reliable progress toward a minimum instead of stalling in poor local minima. Such training losses, however, are not necessarily in line with operations. Performance metrics must therefore be determined by the business owners rather than dictated by what the optimizer happens to handle well.
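
To make the gap concrete, here is a minimal sketch, assuming a hypothetical binary document classifier: the cross-entropy loss the optimizer minimizes and a business-side metric (the share of documents confident and correct enough to skip manual review, a name invented for illustration) are computed from the same predictions yet tell different stories.

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Surrogate training loss: smooth and differentiable, optimizer-friendly."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def straight_through_rate(y_true, p_pred, threshold=0.9):
    """Hypothetical business metric: share of documents confident enough
    to skip manual review AND actually classified correctly."""
    confident = np.maximum(p_pred, 1 - p_pred) >= threshold
    correct = (p_pred >= 0.5) == y_true.astype(bool)
    return np.mean(confident & correct)

y = np.array([1, 0, 1, 1, 0, 1])          # ground-truth labels
p = np.array([0.97, 0.08, 0.62, 0.91, 0.45, 0.88])  # model confidences

print(f"training loss:         {cross_entropy(y, p):.3f}")
print(f"straight-through rate: {straight_through_rate(y, p):.2%}")
```

A model can keep nudging the training loss down without moving the business metric at all, which is exactly why the latter has to be defined by the business side.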

This leads into the second reason why measuring AI success can be tricky: a single business problem is typically broken down into several models, each of which must be measured independently, multiplying the layers of complexity in defining AI success criteria. The sketch below shows how per-model metrics might roll up into a single end-to-end figure.
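
The following sketch uses a hypothetical invoice-processing pipeline; all model names and figures are invented for illustration. Treating each per-model metric as a rough stage-level pass rate (a simplification), it shows why no single model's score answers the business question on its own.

```python
# Hypothetical decomposition: one business problem (automated invoice
# processing) split into several models, each with its own metric.
per_model_metrics = {
    "page_classifier": {"metric": "accuracy",      "value": 0.98},
    "table_detector":  {"metric": "IoU",           "value": 0.87},
    "cell_extractor":  {"metric": "cell accuracy", "value": 0.94},
}

# In a serial pipeline, errors compound: as a rough upper bound, the share
# of invoices that pass every stage is at most the product of the
# per-stage success rates.
end_to_end_upper_bound = 1.0
for name, m in per_model_metrics.items():
    print(f"{name:18s} {m['metric']:14s} {m['value']:.2f}")
    end_to_end_upper_bound *= m["value"]

print(f"end-to-end success (rough upper bound): {end_to_end_upper_bound:.2%}")
```

Three individually strong models still yield an end-to-end figure around 80% here, which is the number operations actually cares about.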

Foxes guarding the hen house

These added complexities often make it tempting for businesses to simply delegate the task of establishing AI success metrics to data scientists, often the very same data scientists who produce the models in the first place. Such a situation creates an inherent, if not obvious, conflict of interest, with the additional detrimental side effect of encouraging data scientists to invent new metrics on a weekly basis rather than focus on the more crucial task of advancing the AI itself.

Delegation also fails for a more basic reason: while data scientists understand how the models work, they are not privy to key information about business operations, such as current and prospective workflows. The enterprise side, on the other hand, creates those workflows in response to business trends.

Cognaize's solution: merging business with data science

The enterprise side sets the business criteria, and the data scientists develop the models based on those criteria. Recognizing this specialization of tasks, Cognaize proposes developing metrics for determining AI success in a collaborative environment: metrics are decided upon by the business side and then signed off on by the data scientists. For tables, for example, cell-based content metrics are critical for the business side, while Intersection over Union (IoU) is what data scientists might use; the sketch below shows both side by side.
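
Here is a minimal sketch of those two views of the same table-extraction output, with boxes and cell values invented for illustration: IoU measures how well a detected table region overlaps the ground truth, while cell-content accuracy measures whether the extracted values are actually right.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def cell_accuracy(expected, extracted):
    """Business-side metric: share of cells whose extracted content matches."""
    matches = sum(e == x for e, x in zip(expected, extracted))
    return matches / len(expected)

# A detected table region can overlap the ground truth well (high IoU)...
print(f"IoU:           {iou((10, 10, 200, 120), (14, 12, 205, 118)):.2f}")
# ...while the business only accepts the result if the cell values match.
print(f"cell accuracy: {cell_accuracy(['1,250', '3.4%', 'Q2'],
                                      ['1,250', '3.4%', 'Q3']):.2%}")
```

A high IoU with a wrong cell value is a success by one yardstick and a failure by the other, which is why both sides need to sign off on the metric set.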

Another critical aspect is testing not only against an immutable golden dataset but also monitoring metrics on real data on an ongoing basis. Perhaps the most crucial benefit is Cognaize’s ability to automatically monitor metrics and the time spent on validation during production. This combination makes it much easier for operations to determine current AI success and to suggest steps for further improvement.
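
As a closing illustration, here is a minimal sketch of what such production monitoring might look like. The MetricsMonitor class and all field names are hypothetical, not Cognaize’s actual tooling.

```python
import time

class MetricsMonitor:
    """Hypothetical production monitor: logs the model's score and the
    human validation time per document, then aggregates both."""

    def __init__(self):
        self.records = []

    def log(self, doc_id, model_confidence, validation_seconds):
        # Record one processed document with its model score and the
        # time a reviewer spent validating it.
        self.records.append({
            "doc_id": doc_id,
            "confidence": model_confidence,
            "validation_seconds": validation_seconds,
            "ts": time.time(),
        })

    def summary(self):
        n = len(self.records)
        avg_conf = sum(r["confidence"] for r in self.records) / n
        avg_val = sum(r["validation_seconds"] for r in self.records) / n
        return {"documents": n,
                "avg_confidence": round(avg_conf, 3),
                "avg_validation_seconds": round(avg_val, 1)}

monitor = MetricsMonitor()
monitor.log("inv-001", 0.96, 12.0)   # fast sign-off: model likely correct
monitor.log("inv-002", 0.71, 95.0)   # long validation: a signal to investigate
print(monitor.summary())
```

Tracking validation time alongside model metrics gives operations an early warning: when reviewers start spending longer on documents the model claims to be confident about, the metrics and the reality have drifted apart.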