Spotting Red Flags during Machine Learning Project Ideation

How many times have you finished building a model only to be told that it won’t be used? How many times have you planned a machine learning project that kinda went nowhere despite everyone’s best efforts?

I’ve had a few of these frustrating experiences first-hand, and this is how I learned to avoid them.

1. Having a success metric is not enough

A success metric for a machine learning project is usually defined as one or several specific model performance metrics, for example 90% precision and 90% recall. Often, stakeholders readily sign off on a success metric, but do not understand how it translates to the actual endpoint they care about: increased clickthrough rates, increased production, or client happiness. When stakeholders have not prepared for the business reality of deploying a model with 90% precision and 90% recall, your model is not getting deployed even if you achieve it. Success metrics need to be translated into actual business consequences, and this should be made clear at the beginning of a project.
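
As a back-of-the-envelope sketch of what that translation can look like: the numbers below (prediction volume, dollar values per outcome) are purely illustrative assumptions, not figures from any real project.

```python
# Back-of-the-envelope translation of model metrics into business terms.
# All numbers below are illustrative assumptions, not real project figures.

monthly_positive_predictions = 10_000   # assumed volume the model acts on
precision = 0.90                        # the agreed success metric
value_per_true_positive = 5.0           # assumed $ gained per correct action
cost_per_false_positive = 2.0           # assumed $ lost per wrong action

true_positives = monthly_positive_predictions * precision
false_positives = monthly_positive_predictions * (1 - precision)

net_value = (true_positives * value_per_true_positive
             - false_positives * cost_per_false_positive)

print(f"~{false_positives:,.0f} wrong actions per month at 90% precision")
print(f"Estimated net value: ${net_value:,.0f}/month")
```

Even a rough calculation like this forces the conversation about what 10% wrong actions per month actually costs, and whether the business is prepared to absorb it.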

It isn’t easy for data scientists to define success metrics for a project, because many parts of the business are opaque to technical teams. However, there are many tell-tale signs of a lack of clarity or buy-in on success metrics, and spotting them early and having those conversations can steer your project away from dangerous, time-wasting waters. A definite sign of a lack of clarity is when stakeholders do not have a plan for how to handle the errors of a machine learning model. Which brings me to my second point.

2. Not having a plan for handling model errors

Machine learning models are not perfect. If a stakeholder requires deterministic outputs that are always correct, a machine learning model is not the right solution. A good way to clarify this point is to ask what the plan is for handling model errors. If the response is centered around reducing errors … it would be a good time to have this conversation.

Errors are common, and they need to be handled. One simple way to handle machine learning errors is to reduce them to the point where the benefits of the model’s successful outcomes outweigh the costs of its errors. For example, a recommendation system for online shopping does not need to recommend desirable items to users all the time; it is better to have some relevant recommendations than to have none at all.

For models where errors are very problematic, a human backstop is needed. If a classification model has 90% accuracy above 70% confidence, that means records with lower confidence need to be manually reviewed by subject matter experts (SMEs). The model can still provide its top choices to SMEs and significantly reduce review time. In cases like this, it is important that stakeholders expect this additional cost and are not expecting a fully automated pipeline.
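
A minimal sketch of what that routing might look like is below; the 0.70 threshold, field names, and example records are hypothetical assumptions, not taken from a real system.

```python
# Hypothetical sketch: route low-confidence predictions to SME review.
# The 0.70 threshold and record structure are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.70

def route_prediction(record_id: str, label: str, confidence: float) -> dict:
    """Auto-accept confident predictions; queue the rest for SME review."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"id": record_id, "label": label, "status": "auto-accepted"}
    # Low-confidence records keep the model's top choice as a suggestion,
    # which speeds up manual review rather than replacing it.
    return {"id": record_id, "suggested_label": label, "status": "needs-review"}

predictions = [("rec-1", "invoice", 0.95), ("rec-2", "receipt", 0.55)]
for rec_id, label, conf in predictions:
    print(route_prediction(rec_id, label, conf))
```

The point of the sketch is the split itself: stakeholders should see up front that a portion of records will land in the "needs-review" queue, and budget SME time for it.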

In my experience, SMEs, when benchmarked against each other on the same task, typically score well below 100% agreement. The more complex the task, the lower the rate of agreement. This is a useful exercise to conduct to illustrate why an imperfect system can still provide great business value.
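
If you want to run that benchmark, a quick sketch using scikit-learn is below; the labels are fabricated for illustration, and in practice you would use a shared sample that two SMEs have actually labelled.

```python
# Illustrative benchmark of SME agreement on the same labelling task.
# The labels below are made up; in practice use a real shared sample.
from sklearn.metrics import accuracy_score, cohen_kappa_score

sme_a = ["spam", "ham", "spam", "spam", "ham", "spam", "ham", "ham"]
sme_b = ["spam", "ham", "ham",  "spam", "ham", "spam", "spam", "ham"]

raw_agreement = accuracy_score(sme_a, sme_b)   # fraction of identical labels
kappa = cohen_kappa_score(sme_a, sme_b)        # agreement corrected for chance

print(f"Raw agreement: {raw_agreement:.0%}")
print(f"Cohen's kappa: {kappa:.2f}")
```

Showing stakeholders that their own experts disagree some of the time reframes the conversation: the model doesn’t need to be perfect, it needs to be useful relative to the human baseline.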

With the advent of LLMs, the need for humans has decreased, but they’re still essential to model development!

3. Humans are still required with LLMs

While LLMs have introduced new avenues for data labelling and automated model evaluation, we still need SMEs.

I’ve been in projects (pre-LLM) where non-technical stakeholders did not fully understand how much time subject matter experts had to spend labelling and evaluating data. This resulted in misalignment, as there wasn’t enough budget left for labelling halfway through the project.

4. Fixated on the solution, losing sight of the business problem

Sometimes, stakeholders will specify the type of solution to use (genAI, machine learning, etc.) instead of clearly defining the business problem.

These projects are extremely difficult to work on, because the problem is not clearly defined. It’s not clear what the success metrics are, or when the project is finally good enough to move on from. It’s hard to prevent scope creep when the goal of the project is ill-defined. It is also more difficult to establish tangible benefits from the project, and it’s at high risk of getting discontinued.

You can avoid these situations by steering conversations with non-technical stakeholders towards tangible outcomes and measurable metrics, especially in early conversations, and away from discussions of which specific methods and models you’re using.

Conclusion

It isn’t difficult to build a model, but it is difficult to build a model that actually solves business problems. When you spot any hint of these red flags during project ideation, it’s always worth asking a clarifying question. It could save you weeks or months of development time.