What Data Scientists Wish Managers Knew

Published in Machine Learning in Practice · 8 min read · Jan 23, 2020

Good managers are empathetic to their employees, and good managers of machine learning projects understand the perspective of data scientists. Bridging the divide between technical and non-technical team members is an ongoing challenge in business, in part because the language each speaks can be so different. This article summarizes several important themes heard from data scientists[1] about the common misconceptions they feel executives have about machine learning.

Not Every Business Process Is Right for AI

With the rise of machine learning, it is common for organizations to think of any problem as a nail to hit with a machine learning hammer. In some cases, managers get over-eager to apply machine learning when simpler techniques may work just as well if not better.

To find the right fit for AI and ML, focus on solutions to business processes that are:

Labor Intensive. They require a lot of people to do simple, repetitive work.

Well Defined. AI does well with games like chess and Go, which have well-defined rules that confine player activity. Well-defined environments are AI’s sweet spot. Processes that require abstraction and high levels of reasoning are more challenging. An AI judge of criminal law, for instance, would need to not only know all the legal precedents, but make decisions in many legal gray areas in which highly intelligent humans passionately disagree. Training such a system and measuring its success would be a highly subjective process with no universally acceptable definition of human-level accuracy.

(Relatively) Relaxed. Getting a prediction accuracy of 80 or even 90 percent is often feasible. Getting to 99 percent or 99.9 percent is very hard. You want a situation where 85 to 90 percent is good enough.

If you can accomplish your goal by building a simple set of rules, do so. It’s not wise to use machine learning simply so you can feel, or tell your boss, that you’re doing it. Chasing the latest trends and buzz can unnecessarily overcomplicate a process when a simple linear model would’ve sufficed.

Say your current problem is to predict aircraft engine failure. If you know that current voltage or speed above a certain threshold should raise a warning, you can just build a traditional program. Understanding that something is failing is relatively simple. But understanding why it’s failing can be more complicated and something that even engineering experts may not fully understand. In this case you might need machine learning: you collect all of the operational data for the engines and then have data scientists use ML algorithms to try to find the patterns in data.
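As a rough illustration of the "traditional program" case, a threshold check needs no model at all. The sketch below is only an assumption of what such a rule might look like: the limits, field names, and the `warnings_for` helper are hypothetical and not taken from any real engine specification.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration; real values would come
# from the engine's engineering specification.
MAX_VOLTAGE = 28.0       # volts
MAX_SPEED_RPM = 15000.0  # revolutions per minute


@dataclass
class EngineReading:
    voltage: float
    speed_rpm: float


def warnings_for(reading: EngineReading) -> list[str]:
    """Return one warning for each limit the reading exceeds."""
    warnings = []
    if reading.voltage > MAX_VOLTAGE:
        warnings.append("voltage above threshold")
    if reading.speed_rpm > MAX_SPEED_RPM:
        warnings.append("speed above threshold")
    return warnings


print(warnings_for(EngineReading(voltage=30.5, speed_rpm=12000.0)))
# prints ['voltage above threshold']
```

A rule like this says that something is out of range. It says nothing about why, which is where pattern-finding over the full operational data may earn its keep.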

Executives Shouldn’t Pick Algorithms

Yes, some business executives are highly technical and are able to be hands-on. But in general, decisions about algorithmic techniques are best left to dedicated data scientists.

Business leaders should focus instead on describing the problem they’re trying to solve, the value they’re trying to achieve, and making sure that their vision is communicated well to a data science team. They should then let the data scientists use their years of training and expertise to figure out the right approach.

Nothing irks a data scientist more than for an executive to suggest a machine learning algorithm they read about over the weekend as if the data scientist has never heard of it.

Data Is Everything

Data is critical to every AI project. But not all data is equal. There are all kinds of data; it can be biased, noisy, mislabeled, or flat-out incorrect. Discussions between managers and data scientists shouldn’t skip over fundamental questions about data quality to get to the ML.

Data scientists certainly hope that the data is ready for machine learning. They’d likely prefer to be training the model and making predictions. But a significant portion — even a vast majority — of the work involved can be the data management and wrangling that comes before ML.

If a data scientist asks a business leader, “Do you have data?”, the leader will often say, “Of course, I have a lot of data.” But the question to ask is whether the data is actually accessible, sizable, understandable, usable, and maintainable.

Let’s say there’s a large database of patient data with a variety of information about the patients, including age, address, and birthday. The data scientist knows that data exists, so when they ask to get a list of patients and their ages, they think that’s an easy request.

But the data scientist may have to start with the date the patient was treated and given a patient ID. From there, the data scientist has to map this ID to another ID, then go to a third database, using the second ID, to find the patient’s birthdate. Only after a calculation on the dates can the patient’s age be computed. This illustrates the underlying complexity of a task like “find patient age.” On the surface, it may seem like a simple request, but it may not always be so easy to fulfill.
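To make the chain of lookups concrete, here is a minimal sketch in which plain dictionaries stand in for the three databases. Every table name, ID, and date is invented for illustration; a real system would involve queries, access permissions, and messier data.

```python
from datetime import date

# Hypothetical stand-ins for three separate databases. All values are made up.
treatments = {"T-100": {"patient_id": "P-1", "treated_on": date(2019, 6, 15)}}
id_crosswalk = {"P-1": "R-9"}  # patient ID -> ID used by the third database
registry = {"R-9": {"birthdate": date(1980, 9, 1)}}


def age_on(birthdate: date, on: date) -> int:
    """Whole years between a birthdate and a reference date."""
    had_birthday = (on.month, on.day) >= (birthdate.month, birthdate.day)
    return on.year - birthdate.year - (0 if had_birthday else 1)


def patient_age_at_treatment(treatment_id: str) -> int:
    """Follow the ID chain across all three tables, then compute the age."""
    treatment = treatments[treatment_id]                 # database 1
    registry_id = id_crosswalk[treatment["patient_id"]]  # database 2
    birthdate = registry[registry_id]["birthdate"]       # database 3
    return age_on(birthdate, treatment["treated_on"])


print(patient_age_at_treatment("T-100"))
# prints 38
```

Even in this toy version, "find patient age" takes three lookups and a date calculation, and any missing or mismatched ID along the way breaks the chain.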

Don’t Use Your Neighbor’s Garbage

There’s a common phrase about data when it comes to machine learning: garbage in, garbage out. If the data going in is bad, then the results coming out of a model will also be bad. Also, garbage data can lead data scientists down rabbit holes that are a waste of time.

A more sinister version of this is what Infinia ML’s head of data science, Ya Xue, calls your neighbor’s garbage. The data could be perfectly formatted and prepared, but if it is unrelated to the problem you are trying to solve, it’s still a waste of time. A data scientist may bang her head against a wall trying to make the data work when it is never going to happen — the data isn’t applicable to the problem at hand. Don’t just get any data set in the hopes it can help with the problem. The wrong data sets can be at best a distraction and at worst lead to false results.

There’s No One Way to Measure Success

Measuring success is specific to the business case. If you’re predicting failure, for instance, then your metric might be the share of failures you predict correctly. Data scientists and subject matter experts need to discuss metrics that make sense for them.

Success metrics can be hard to determine. How does applying a machine learning technique to the data, gaining insight, and making predictions directly relate to ROI? It may not be wise to hold yourself to a certain numeric goal. It could be success enough to get to the point of making change within your organization and getting machine learning used as part of the overall process. It will evolve from there.

Communication Problems Are Often a Matter of Language

Data scientists use a unique set of terms, and communication is very important. This problem is not unique to machine learning and data scientists — there is often a communication gap between technical people and business people.

Data scientists shouldn’t hope business people will work hard to understand what they’re saying. They should do a bit of education to try to bridge the gap themselves. Ya Xue likes to give a Machine Learning 101 tutorial every time she starts working with a new client to at least solve the language description problem.

Terminology is critical. When data scientists talk to business leaders, it can sound like they’re saying different things because they’re using different terminology when in fact they are actually talking about the same thing. Getting on the same page can save countless hours down the road.

Leadership Support Is Essential

We are still in the early days of machine learning. There is a significant backlog of opportunities within just about every company. That said, many of the most valuable projects may be very bespoke and new, not only to the company, but in the broader industry. In some cases, it takes a leap of faith and investment to pursue these projects. That’s why it’s important to have leadership support from the top.

In some cases, it may take months or even years to see a machine learning project all the way through to its conclusion. Without the necessary executive support, it could be tempting to stop the project midstream or lose focus and move on to something else. Also, it is very common for ML projects to span departments or require collaboration from multiple groups. Without support from the top, it may be difficult to get the buy-in necessary from each of these groups. An ML project may not work as a skunkworks effort or from the bottom up.

Think Carefully about How ML Will Change Your Business

Even though it first emerged over 50 years ago, machine learning is a relatively new technology in the business world. When it gets plugged into a system, it can bring a big change to your current business process. For example, you may need to have a new user interface and a new quality assurance testing routine.

When some business leaders think, “I should use ML,” they aren’t always thinking about the future. The first question is if ML can succeed. But if it can succeed, the next question is how that will change the current business process. Many leaders don’t think this through. Imagine the ML project goes extremely well. What changes will that cause in the organization? Don’t let a project’s success be a surprise.

Machine Learning Takes Patience

AI is like any other kind of research: it requires time, effort, and investment. It can’t just happen overnight. You see some business leaders say, “We are going to do ML to transform our business.” Then they may have one or two data scientists use an off-the-shelf tool, give it a quick try, and say, “Oh, it doesn’t work. We are done with ML.” That’s the wrong approach. Machine learning requires some patience.

Most people don’t understand the iterative nature of a data project and the process of developing a model. Business leaders may be more familiar with the Six Sigma DMAIC process: define, measure, analyze, improve, control. A data project is a faster iteration of the first four steps of this cycle.

Data scientists could go through 10, even 20 iterations: define the problem, try some algorithms, measure the results, analyze the data to find out why the approach is or isn’t working, then go back and modify the algorithm. These iterations require patience. Give your data scientists the opportunity to cross the finish line. Maybe the first attempt won’t work, but a couple of iterations later could result in success. Moreover, expect projects to take longer than you think.

You Are Just Starting When You Get to Production

Even when data scientists build a model and deploy it to production, that’s not the end. Businesses need to monitor the model over time to make sure it stays healthy. Model accuracy is going to decay over time, and predictions are going to get worse and worse as behaviors and the environment change. The data that the model was built from may no longer be representative of the current situation or the current operating environment.
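One simple way to picture this kind of monitoring is to track accuracy over the most recent predictions and compare it with the accuracy measured at deployment. The sketch below is an assumption about how that could be done, not a description of any particular tool; the `AccuracyMonitor` class, its window size, and its tolerance are all hypothetical.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag decay."""

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.05):
        self.baseline = baseline    # accuracy measured at deployment
        self.tolerance = tolerance  # allowed drop before raising a flag
        self.outcomes = deque(maxlen=window)  # True where a prediction was right

    def record(self, predicted, actual) -> None:
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> float:
        if not self.outcomes:
            raise ValueError("no predictions recorded yet")
        return sum(self.outcomes) / len(self.outcomes)

    def has_decayed(self) -> bool:
        # Judge only once the window is full, to avoid noisy early readings.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


# Made-up example: a model that scored 0.90 at launch now gets 7 of 10 right.
monitor = AccuracyMonitor(baseline=0.90, window=10)
for i in range(10):
    monitor.record(predicted=1, actual=1 if i < 7 else 0)

print(monitor.has_decayed())
# prints True
```

A check like this depends on learning the true outcome for each prediction, which in practice may arrive late or only for some cases.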

Summary

It’s important to walk a mile in the shoes of data scientists. Understanding the challenges they face, and knowing what they wish business leaders understood, can ultimately make your ML projects go more smoothly.

Trust your data scientists. They can do great things with your support.

James Kotecki is the Director of Marketing & Communications at Infinia ML, a team of data scientists, engineers, and business experts putting machine learning to work.

[1] Much of this chapter is based on ideas from the panel “What Data Scientists Wish Executives Understood,” with Ya Xue and Brett Wujek, moderated by Michael Eagan. Rethinc. Machine Learning Symposium, November 30, 2018, Chapel Hill, NC.