Today I’m going to cover a topic many product managers feel uncomfortable with: how to work with data scientists.
I’ve been waiting to write this piece for more than a year. Actually, this is one of the first posts I wanted to write when I started this newsletter/blog, but for some weird reason – I didn’t feel ready.
I feel that now is the time.
I had worked with data scientists before, and back when I was still writing code I wrote several ‘learning’ algorithms myself (long before the field was formalized as ‘machine learning and data science’). Hence, I already had some solid knowledge more than a year ago – but something told me to wait. And I’m glad I did.
You see – for the last year I’ve been managing both the product and the data science teams in my company. This year has taught me a lot, and I believe I’m now in a position to offer a perspective on the topic that only a few people in the industry can provide.
This post is addressed to product managers who have a data science team in their workplace and work with that team either directly or indirectly.
In this post I will guide you through what you need to know, what you don’t need to know, what to focus on and how to properly interact with such teams in order to maximize their (and your) impact on the business.
But as always, first things first:
Why do many product managers avoid working with DS teams?
I get it – it can be scary.
Data scientists are usually very smart, well paid and possess unique knowledge that no one else has. They often speak with tons of confidence and throw academic terms at you.
For many product managers – this can be very intimidating.
In addition, many data scientists, like traditional scientists, have a strong affinity for research. They love data, love playing with data, love experimenting with their models and at the end of the day – you may be just an interruption. Of course, I’m stereotyping here a bit. Not all data scientists are like that, though from my personal experience – many are.
By itself – there is nothing wrong with someone who finds joy in what they do and has a strong passion for it. This is actually a good thing.
The problem may arise if the obsession to do ‘proper research’ or ‘build a proper academically-based pipeline’ starts conflicting with the business goals of the company, and precious resources and time are getting wasted.
This is where you should step in. But as I hinted above – on many occasions your involvement is not really welcomed or not well understood. As a result, when you are looking for answers about why a delivery is so late, you may receive a bunch of very well phrased explanations, full of academic terms, as to why their model is still not delivering the expected results, and why it’s perfectly ok to let it run for another quarter to see if it gets better.
At this point – many product managers who dared to question the output of the DS team will be scratching their heads, not sure how to respond. After all – you were given a well thought-out response (that you clearly don’t understand, but can’t argue with) as to why you are not getting any meaningful results from the team. You will probably turn around and leave with your tail between your legs, feeling very frustrated.
Don’t worry. I’m here to help. At the end of this series – I hope you’ll know how to address such situations and how to properly execute your job and align the data science team with the business goals and work effectively together to deliver a meaningful impact.
The basics of data science
In order to effectively communicate with data science teams you need to speak their language. The good news is that you don’t need to understand most of the terms they might throw at you.
The ‘semi-bad’ news is that you do need to understand some terms in order to properly do your job.
Let’s go over those.
At the heart of data science work there is the machine learning model. This is what the data scientists are focused on building and improving as part of their day-to-day job.
I will explain what it is shortly, but essentially, from the business perspective – it’s a magic black box that you feed with data and that outputs the needed business decisions. It could be:
- Providing the right recommendation to the right user
- Detecting an anomaly in some electronic transactions
- Identifying illegal activities in video footage
- Optimizing routes for deliveries
- Optimizing inventory management
And tons of other use cases.
Now, I tried to search the web for a proper definition of a ‘machine learning model’ and I was surprised to find out that this is more challenging than I thought.
It seems like most of the relevant articles out there are written by scientists for scientists and fail to explain this term in words that any human can understand. Luckily, ChatGPT to the rescue (but even then I had to steer it in the right direction):
Me: Can you explain to me what a machine learning model is?
ChatGPT: A machine learning model is a mathematical model that is used to predict outcomes based on certain inputs. These models are trained on data, and they use that data to make predictions or decisions without being explicitly programmed to perform the task. Machine learning models can be used for a wide range of tasks, such as image classification, language translation, and even playing games. There are many different types of machine learning models, including decision trees, random forests, support vector machines, and neural networks.
Me: You say it’s a mathematical model, but isn’t it a software program that leverages math?
ChatGPT: Yes, you are correct. A machine learning model is actually implemented as a software program that uses math to make predictions or decisions. The math used in a machine learning model is used to create a mathematical function that can take in certain inputs and produce a prediction or decision as output. The software program that implements the model is used to train the model on data, and then make predictions or decisions based on the trained model.
For example, a machine learning model might be used to predict the likelihood that a customer will churn (stop using a company’s products or services). The model might be trained on data about the customer, such as their age, location, and how long they have been a customer. The model would use this data to learn patterns that are associated with churn, and then it could make predictions about whether a new customer is likely to churn or not.
Ok, so essentially a model is a software program that, having learned from past data, can provide decisions or predictions about current data that it has never ‘seen’ before.
It usually achieves this by applying sophisticated calculations that very often rely on math or statistics.
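To make this less abstract, here is a deliberately tiny sketch of the churn example. Everything in it is my own assumption for illustration: the data is made up, and the ‘model’ is nothing more than the churn rate measured per tenure bucket. Real models are far more sophisticated, but the shape is the same: learn from past data, then answer questions about a customer you have never seen.

```python
from collections import defaultdict

# Hypothetical past data: (months_as_customer, churned)
past_customers = [
    (1, True), (2, True), (3, False), (2, True),
    (14, False), (18, False), (24, False), (13, True),
]

def tenure_bucket(months):
    """Group customers into two coarse tenure buckets."""
    return "new" if months < 12 else "established"

def train(samples):
    """'Training' here just means measuring the churn rate per bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [churned, total]
    for months, churned in samples:
        bucket = tenure_bucket(months)
        counts[bucket][0] += int(churned)
        counts[bucket][1] += 1
    return {b: churned / total for b, (churned, total) in counts.items()}

def predict_churn_probability(model, months):
    """Predict for a customer the model has never seen."""
    return model[tenure_bucket(months)]

model = train(past_customers)
print(predict_churn_probability(model, 5))   # a new customer
print(predict_churn_probability(model, 30))  # an established customer
```

The point is not the math (there is barely any here) but the two separate steps: `train` looks only at the past, and `predict_churn_probability` is asked about the present.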
[Side note – this leads us to the main difference between data scientists and software engineers or data analysts – data scientists have strong knowledge of statistics and math, and they need to know how to design models for different business domains (as there are plenty of academic approaches). Implicitly it means that they are most probably highly intelligent individuals with a strong affinity for data. I’m pointing this out because you need to understand the persona if you wish to effectively communicate with them.]
The models used within companies can greatly vary from each other as they are based on the business problem at hand and the data the company has access to. I’ll discuss later what it means to you, so for now let’s leave it with that.
Training the models
If you are spending time near data scientists you will definitely hear them discussing the ‘training’ of their models. The process of training a model means that the model is usually being ‘fed’ with past data that will help it improve and deliver better results.
Generally speaking, models are split into two kinds: supervised and unsupervised. The type of the model is usually determined by the type of the problem/domain.
If the model is supervised it means that it can be provided with a training set of samples and their corresponding expected results. For example, if the problem the DS team is trying to solve is to design a model that, given an image, can properly identify which animal (if any) appears in the image – then we can easily provide a large database of images, featuring all types of animals, and then classify (label) each sample with the name of the animal which is featured in it.
You can then provide this information (all the images and their associated labeling) to the model and ‘train’ it. If the model is properly designed then eventually it will be able to automatically identify, with high accuracy, which animal appears in an image.
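Here is a minimal sketch of what ‘training on labeled samples’ can look like. It is an illustration under my own assumptions, not how an image model really works: instead of images I use two made-up numeric measurements per animal, and the technique is a simple nearest-centroid classifier (average the samples of each label, then pick the closest average).

```python
import math

# Hypothetical labeled training set: ((weight_kg, height_cm), label)
training_set = [
    ((4.0, 25.0), "cat"), ((5.0, 23.0), "cat"), ((3.5, 24.0), "cat"),
    ((30.0, 60.0), "dog"), ((25.0, 55.0), "dog"), ((35.0, 65.0), "dog"),
]

def train(samples):
    """Compute the average feature vector (centroid) for each label."""
    sums = {}
    for features, label in samples:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {label: [t / count for t in total]
            for label, (total, count) in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

centroids = train(training_set)
print(predict(centroids, (4.2, 26.0)))   # an unseen, cat-sized sample
print(predict(centroids, (28.0, 58.0)))  # an unseen, dog-sized sample
```

Notice that the labels do all the teaching here. Without them, `train` would have nothing to average per animal.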
In some domains, it’s not trivial to tell what the expected result for each sample is. For example – let’s say the team is tasked with designing a model that identifies suspicious behavior in videos. In that case – it’s very hard to define what ‘suspicious’ behavior is, and therefore we might want to use a large training set of video footage which includes only ‘normal’ behavior and design the model to identify deviations from such behavior. Since no sample comes with a label of the expected result, we’ll be dealing with an unsupervised model.
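A rough sketch of the ‘learn what normal looks like, then flag deviations’ idea follows. The assumptions are all mine: instead of video I use a single made-up measurement (say, walking speed in meters per second), and the deviation test is a plain z-score with an arbitrary threshold of 3 standard deviations.

```python
import statistics

# Hypothetical measurements taken from 'normal' footage only (no labels)
normal_speeds = [1.2, 1.4, 1.3, 1.5, 1.1, 1.3, 1.4, 1.2]

def train(samples):
    """Learn what 'normal' looks like: the mean and the spread."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomaly(model, value, threshold=3.0):
    """Flag values that sit too many standard deviations from the mean."""
    mean, stdev = model
    return abs(value - mean) / stdev > threshold

model = train(normal_speeds)
print(is_anomaly(model, 1.35))  # close to what we saw during training
print(is_anomaly(model, 4.0))   # far from anything we saw during training
```

Nobody ever told this model what ‘suspicious’ means. It only knows what it has seen, and it complains when something looks different.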
Whether the team has decided to take the ‘supervised’ or the ‘unsupervised’ approach – is not really your concern for most of the time, but we’ll talk about it later. For now it’s just important to grasp the concept of ‘training’.
Explore/Exploit and having a Control
How does the DS team know it’s doing a good job? That’s a great question that we’ll cover at length later.
But I do want to describe some of the concepts that help with such a measurement.
The first one is what is often referred to as the ‘Control’. Responsible DS teams would probably define and maintain a control group for their experiments. This group represents a baseline to compare to. Or, in other words – what would have happened if the DS models didn’t exist at all. Usually it means providing random results.
For example – in the case of animal identification above – if an image containing a cat were given as an input to the ‘Control’ model – then the result would probably be some random animal. The Control model could determine that it is indeed a cat, but with the same probability it could determine it’s an elephant. In other words – a totally random output.
It makes a lot of sense to have such a baseline model in place for evaluating the performance of the various models. If one of the designed models is giving results which are quite similar or inferior to the Control – then this model is definitely broken and needs to be fixed.
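The comparison against a Control can be sketched in a few lines. Treat this as an illustration only: `candidate_model` is a hypothetical stand-in that I rigged to be wrong on every 10th image, the ‘images’ are just index/label pairs, and the list of animals is made up.

```python
import random

ANIMALS = ["cat", "dog", "elephant", "horse"]

def control_model(_image, rng):
    """The Control: ignores the input and answers with a random animal."""
    return rng.choice(ANIMALS)

def candidate_model(image):
    """Hypothetical stand-in for a trained model: wrong on every 10th image."""
    index, true_label = image
    if index % 10 == 0:
        return ANIMALS[(ANIMALS.index(true_label) + 1) % len(ANIMALS)]
    return true_label

def accuracy(predictions, labels):
    """Share of predictions that match the expected label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

rng = random.Random(42)  # fixed seed so the run is repeatable
images = [(i, ANIMALS[i % len(ANIMALS)]) for i in range(1000)]
labels = [label for _, label in images]

control_accuracy = accuracy([control_model(img, rng) for img in images], labels)
candidate_accuracy = accuracy([candidate_model(img) for img in images], labels)
print(f"control: {control_accuracy:.2f}, candidate: {candidate_accuracy:.2f}")
```

With four animals to choose from, the Control should land somewhere around 25% accuracy. A candidate that cannot clearly beat that number is not doing its job.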
If the DS team is disciplined enough – the Control will be encoded in the production flow. This may pose a problem if the models are serving ‘live’ traffic/customers – because it means you probably need to ‘sacrifice’ some of your traffic or customers’ requests by letting the Control model serve them. The problem, of course, is that these users/customers will almost always receive random results and will most likely be disappointed with the service. On the other hand, if you don’t expose your Control to real traffic then you might not have a solid baseline, and hence you’ll have difficulty evaluating the quality of the model.
There are ways around that or compromises which are ‘good enough’ for most cases, but I don’t wish to delve into this now. I just wanted you to understand the concept of having a control group because we’ll discuss it later.
Another concept, which is in the same ‘neighborhood’ of evaluating the performance of various models is the concept of ‘explore/exploit’.
Let’s say you have a model which is performing ‘quite well’ in your domain. However, the DS team has designed 2 other models which theoretically provide even better results.
On one hand – experimenting with these models when you already have a ‘good horse’ might be risky. On the other hand – if you don’t do that – you might never improve beyond ‘quite well’.
The answer to that is to work in ‘explore/exploit’ mode. It means that you decide in advance about how much you wish to be ‘experimental’ (let’s say 15%) and for the rest of your traffic/input data you’ll be ‘exploiting’ your best model so far.
Hence, if you decide to allocate 15% to experimentation then 15% of your traffic/requests will be served randomly by one of the experimental models. The rest of the traffic will be served by your best performing model so far.
By leveraging the explore/exploit technique your team will be able to learn over time if one of the experimental models is outperforming the ‘exploited’ model and replace it. And if the team has decided to put in the extra effort then such a replacement can be done automatically by the code.
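The traffic split described above can be sketched like this. The model names are hypothetical placeholders and the 15% comes from the example above; a real system would also record how each model performed so that the best one can be promoted.

```python
import random
from collections import Counter

EXPLORE_RATE = 0.15                          # share of traffic used for experiments
BEST_MODEL = "model_a"                       # hypothetical best model so far
EXPERIMENTAL_MODELS = ["model_b", "model_c"] # hypothetical challengers

def choose_model(rng):
    """Route one request: explore with probability EXPLORE_RATE, else exploit."""
    if rng.random() < EXPLORE_RATE:
        return rng.choice(EXPERIMENTAL_MODELS)
    return BEST_MODEL

rng = random.Random(7)  # fixed seed so the run is repeatable
counts = Counter(choose_model(rng) for _ in range(10_000))
for name, served in sorted(counts.items()):
    print(f"{name}: {served / 10_000:.1%} of requests")
```

Over many requests, roughly 85% go to the best model and the remaining 15% are shared between the experimental ones.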
To make their decisions, models evaluate various ‘features’. Features are attributes of the data that can be considered.
It’s the responsibility of the DS team to select the features the model needs to consider when it needs to make a decision.
In the example provided earlier about identifying animals in images there are several potential features the model may consider, such as colors, shapes and textures that appear in each image.
The team might assign different weights to different features. For example – they may instruct the model that a positive identification of a specific texture is more meaningful than the color of the object for identifying an animal.
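A toy sketch of what ‘different weights for different features’ means is shown below. The weights and the feature names are made up for illustration, and in practice such weights are often learned during training rather than typed in by hand.

```python
# Hypothetical weights: texture counts more than shape, shape more than color
FEATURE_WEIGHTS = {"texture": 0.6, "shape": 0.3, "color": 0.1}

def match_score(feature_matches):
    """Combine per-feature match scores (0.0 to 1.0) into one weighted score."""
    return sum(FEATURE_WEIGHTS[name] * score
               for name, score in feature_matches.items())

# Same candidate animal, two different pieces of evidence
texture_only = match_score({"texture": 1.0, "shape": 0.0, "color": 0.0})
color_only = match_score({"texture": 0.0, "shape": 0.0, "color": 1.0})
print(texture_only, color_only)
```

A perfect texture match moves the score much more than a perfect color match, which is exactly the instruction the team gave the model.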
The features selection process can become more complex when the context is bigger. For example, when you design a model for serving the best content piece to a visiting user, there are plenty of features to consider, such as:
- Any metadata of the content – title, text, image, tags, publication time, etc.
- Any metadata about the visitor – geo, age, sex, etc.
- Any metadata about the HTTP request itself – user agent, IP address, cookie information, etc.
This features selection process is therefore very impactful on the quality of the decisions the model will generate.
At first glance – you might wonder why not consider all available features and that’s it. The answer is that each feature you consider creates more possible combinations of inputs and outputs and this number can become really huge. The problem with that is you’ll need a huge amount of traffic/inputs in order to learn something meaningful, and effectively your ‘learning’ will become much slower to the point where it no longer converges.
Therefore, proper features selection is very important, and choosing the wrong features for a model can definitely hold the team back from delivering a true business impact.
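To get a feel for how quickly the number of combinations grows, here is a back-of-the-envelope calculation. The features and the number of possible values for each are numbers I made up for the sake of the example.

```python
import math

# Hypothetical features and how many distinct values each one can take
feature_cardinalities = {
    "country": 50,
    "age_group": 6,
    "device_type": 3,
    "content_tag": 100,
}

def input_combinations(cardinalities):
    """Every added feature multiplies the number of distinct inputs."""
    return math.prod(cardinalities.values())

print(input_combinations(feature_cardinalities))

# Adding just one more feature (hour of the day, 24 values)
with_hour = {**feature_cardinalities, "hour_of_day": 24}
print(input_combinations(with_hour))
```

Four modest features already give 90,000 distinct inputs, and adding the hour of the day pushes that beyond 2 million. Each of those combinations needs enough traffic of its own before the model can learn anything about it.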
Being statistically significant
As I noted when discussing the features selection process – the more data points you wish to consider, the more traffic or input data you’ll need. This is because any decision should be made only once we have received enough samples to be statistically significant.
For most domains I know – for each combination of input-result you probably need to gather tens of thousands of samples before you can ‘learn’ something definitive about the quality of this result given this specific input.
Sometimes you might be ok with fewer samples, but you need a true justification for that.
As a product manager it’s important that you understand how much data the DS team will need in order to make true progress.
For example – if the model is installed on eCommerce sites and responds to requests from users, then it will take ages to learn something meaningful if it’s only installed on sites with up to 100 visits per day.
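The arithmetic behind ‘it will take ages’ is simple enough to sketch. The assumptions are mine: 10,000 samples per combination (the low end of ‘tens of thousands’), traffic that spreads evenly across combinations, and a made-up count of 20 combinations for the second case.

```python
import math

def days_to_enough_samples(samples_per_combination, combinations, visits_per_day):
    """Days of traffic needed, assuming visits spread evenly across combinations."""
    return math.ceil(samples_per_combination * combinations / visits_per_day)

# A single input combination on a site with 100 visits per day
print(days_to_enough_samples(10_000, 1, 100))

# The same site when the model distinguishes 20 input combinations
print(days_to_enough_samples(10_000, 20, 100))
```

That is 100 days for a single combination, and 2,000 days (more than five years) for 20 of them. This is the kind of quick estimate worth doing together with the DS team before committing to a delivery date.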
I believe those are the basic concepts we need to understand in order to effectively engage with data scientists. By no means is this an exhaustive list. There are tons of other concepts and terms in the data science universe, but introducing them might just confuse you, and most of them don’t need to be fully grasped by a product manager.
In the next post we will learn how to effectively execute the product delivery process with data science teams. I will be using the concepts we learned today in the next post, so make sure you fully grasp them.
So… that’s it for today! Stay tuned for the next post!
If you found this post/series useful – please let me know in the comments. If you think others can benefit from it – feel free to share it with them.
Thank you, and until next time 🙂