The success of a machine learning project depends heavily on the team involved. In this second pill, we would like to illustrate, based on our experience, the combination of profiles that best drives success.
Data Engineer
This profile is generally in charge of building the data pipelines. As stated in our previous post, data wrangling and transformation are key aspects of the project. Aggregating, denormalizing or joining data from different sources generally requires knowledge of both relational and non-relational databases, a deep understanding of SQL and familiarity with complementary tools such as Python. This profile is also key in the move to production, where an inefficient data flow could render the project useless if results cannot be delivered within the required time.
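Here is a minimal sketch of this kind of work. It assumes an in-memory SQLite database stands in for the relational source and a list of Python dicts stands in for documents read from a non-relational store. The snippet aggregates transactions per store and day in SQL and then denormalizes the result in Python. The table and column names, the functions (`build_example_db`, `daily_revenue`, `denormalize`) and the toy rows are all hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def build_example_db() -> sqlite3.Connection:
    # Hypothetical toy data standing in for a relational source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (store_id INTEGER, sale_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(1, "2024-01-01", 10.0), (1, "2024-01-01", 15.0),
         (1, "2024-01-02", 7.5), (2, "2024-01-01", 20.0)],
    )
    return conn


def daily_revenue(conn: sqlite3.Connection) -> List[Tuple]:
    # Aggregate in SQL: one row per store and day.
    return conn.execute(
        """
        SELECT store_id, sale_date, SUM(amount) AS revenue, COUNT(*) AS n_tx
        FROM transactions
        GROUP BY store_id, sale_date
        ORDER BY store_id, sale_date
        """
    ).fetchall()


def denormalize(rows: List[Tuple], store_docs: List[Dict]) -> List[Dict]:
    # Join in Python with store documents from a second (assumed non-relational) source.
    # Stores without a document get region=None, i.e. left-join semantics.
    by_id = {doc["store_id"]: doc for doc in store_docs}
    return [
        {"store_id": store_id, "sale_date": sale_date, "revenue": revenue,
         "n_tx": n_tx, "region": by_id.get(store_id, {}).get("region")}
        for store_id, sale_date, revenue, n_tx in rows
    ]


if __name__ == "__main__":
    docs = [{"store_id": 1, "region": "north"}, {"store_id": 2, "region": "south"}]
    for row in denormalize(daily_revenue(build_example_db()), docs):
        print(row)
```

Pushing the aggregation into SQL keeps the volume handed to Python small, which matters once the same flow runs in production.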
Data Analyst
This profile acts as the liaison between data and business, as it fully understands business concepts and constraints. Its primary and most important tasks are feature engineering and feature selection, carried out together with the Machine Learning Engineer. It also works with the business to create understandable plots that help surface insights during the project, and it ensures that the selected features make sense and that no conceptual errors are introduced. A common mistake in ML projects is using variables that will not be available when predictions are generated. For instance, when predicting demand in a store, it is tempting to use the daily number of customers visiting the store as the main descriptor. However, that figure is only known once the day is over. It will therefore not be available at prediction time, and a model that relies on it leads to a failed project.
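One common way to avoid this mistake is to replace the same-day customer count with a lagged one. This is offered as a hedged sketch, not as the authors' method. It assumes that when forecasting day t, only the count from day t - 1 (or earlier) is known. The function name `build_training_rows` and the default one-day lag are illustrative assumptions.

```python
from typing import Dict, List


def build_training_rows(
    customers: List[int], demand: List[float], lag: int = 1
) -> List[Dict[str, float]]:
    """Pair demand on day t (target) with the customer count from day t - lag.

    The same-day count customers[t] is deliberately NOT used: it is only known
    after day t ends, so it would be unavailable when the forecast is made.
    The one-day default lag is an assumption about when data arrives.
    """
    if len(customers) != len(demand):
        raise ValueError("customers and demand must cover the same days")
    if lag < 1:
        raise ValueError("lag must be >= 1 to avoid using same-day information")
    return [
        {"customers_lag": customers[t - lag], "demand": demand[t]}
        for t in range(lag, len(demand))
    ]
```

The first `lag` days are dropped because no earlier count exists for them. Refusing `lag=0` makes the leaky configuration fail loudly instead of silently.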
Machine Learning Engineer
This role works closely with the Data Analyst and the Data Engineer. Its first task is to perform exploratory data analysis (EDA) with the Data Engineer. The goal is to discard irrelevant features, impute missing values in the relevant ones and create the first ML-ready dataset. During this phase, the ML Engineer must study the data statistically to discover possible trends and relationships. These results are later presented to the business stakeholders so that they can be interpreted. Next, the ML Engineer works closely with the Data Analyst to understand business concepts, generate new features and, finally, train and evaluate models. Once the final model has been chosen, the ML Engineer collaborates with the Developer to take it into production.
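The cleaning step of the EDA can be sketched with the Python standard library alone. The rules are illustrative assumptions, not a prescribed procedure. Columns that are constant, or mostly missing (above a hypothetical 50% threshold), are treated as irrelevant. The remaining gaps are filled with the column median. The function name `clean_features` is hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def clean_features(
    data: Dict[str, List[Optional[float]]],
    max_missing_ratio: float = 0.5,  # hypothetical threshold
) -> Dict[str, List[float]]:
    """Drop uninformative columns and median-impute the remaining gaps."""
    cleaned: Dict[str, List[float]] = {}
    for name, values in data.items():
        present = [v for v in values if v is not None]
        if not values or not present:
            continue  # empty column: nothing to learn from
        missing_ratio = 1 - len(present) / len(values)
        if missing_ratio > max_missing_ratio:
            continue  # mostly missing: treated as irrelevant (assumed rule)
        if len(set(present)) == 1:
            continue  # constant column: carries no signal
        median = statistics.median(present)
        cleaned[name] = [median if v is None else v for v in values]
    return cleaned
```

The median is used here because it is less sensitive to outliers than the mean. Whether imputation is appropriate at all for a given feature is exactly the kind of question to review with the Data Engineer and the business.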
Developer
Once all the features have been created and selected and a final model has been trained, a dashboard, website or app is needed so that users can consume the results. The Developer takes part in this last step of the project, collaborating with all the previous profiles. It puts the model into production on top of the data pipelines built by the Data Engineer. Together with the Data Analyst and the ML Engineer, it also builds an application that delivers the results to the end users.
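As a rough illustration of the serving side, the sketch below exposes a model behind a small JSON endpoint using Python's built-in `http.server`. The `predict` function is a hypothetical stand-in for the trained model. The `/predict` path, port and payload shape are assumptions. `http.server` is not meant for production use, so this only illustrates the request/response flow.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict


def predict(features: Dict[str, float]) -> float:
    # Hypothetical stand-in for the trained model: a fixed linear rule.
    return 2.0 * features.get("customers_lag", 0.0) + 5.0


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Only the assumed /predict route is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"prediction": predict(features)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    # Bind to localhost only; port 8000 is an arbitrary choice.
    return HTTPServer(("127.0.0.1", port), PredictionHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

A dashboard or app would then send features such as `{"customers_lag": 10}` to `/predict` and display the returned value.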
Sys Admin
Finally, once the model is in production, the Sys Admin is in charge of monitoring the data flows and ensuring that predictions are served within the agreed time. If automatic re-training of the model has been scheduled, this profile should also ensure that the process runs smoothly. The model must be evaluated at the agreed frequency so that re-training is triggered when needed.
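The checks described above can be sketched as small functions. They rest on assumptions that are ours, not the authors':

- The agreed time is a latency SLA on the 95th percentile, computed with the nearest-rank method.
- Evaluation runs on a fixed interval.
- Re-training is triggered when the evaluation error exceeds a baseline by more than 10%.

All names and thresholds are hypothetical.

```python
import math
from datetime import datetime, timedelta
from typing import List


def latency_within_sla(latencies_ms: List[float], sla_ms: float, percentile: float = 0.95) -> bool:
    """True if the nearest-rank percentile of observed latencies meets the SLA."""
    if not latencies_ms:
        raise ValueError("no latency samples to check")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(percentile * len(ordered)))  # nearest-rank method
    return ordered[rank - 1] <= sla_ms


def evaluation_due(last_evaluation: datetime, now: datetime, every: timedelta) -> bool:
    """True once the agreed evaluation interval (an assumed fixed period) has elapsed."""
    return now - last_evaluation >= every


def should_retrain(current_error: float, baseline_error: float, tolerance: float = 0.10) -> bool:
    """True if the evaluation error degraded by more than `tolerance` (relative, assumed 10%)."""
    return current_error > baseline_error * (1 + tolerance)
```

In a scheduled job, the typical sequence is to call `evaluation_due` first. If it returns True, the model is evaluated on recent data, and `should_retrain` decides whether to launch the re-training pipeline. `latency_within_sla` runs independently over the serving logs.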
Author: Adrián López
Coauthor: César Hernández