What is the secret formula for MLOps success – And where to look for it?

Clearly, no mastermind holds the key. A better place to search might be in the sleepless nights and the overtime hours spent on operationalizing machine learning models.

This search for the pieces of the formula is what I had to do when I started working, a few months ago, on the product side of TeachableHub’s machine learning deployment platform. As a novice in the field, I was thrown in at the deep end, and it was overwhelming, to say the least. Thankfully, I had my team and the people of the awesome MLOps.community and DataTalks.Club communities by my side.

Putting one foot in front of the other on the way to uncovering the secret, I spoke with more than 100 ML professionals to learn about their MLOps journeys. My expedition is far from over, but I saw three ‘signposts’ along the way that helped me better understand what the formula for MLOps success looks like. I hope you find them helpful on your journey too.

“Don’t Innovate for Innovation’s Sake”

Khaby Lame’s TikTok profile

“Focus more on solving a problem, rather than using a particular tool…”

Luigi Patruno shared this in his second talk with Demetrios Brinkmann about productionization.

Carving out new paths requires a lot more energy and resources, which is especially true for machine learning. Before searching for any secret formula for successful ML model operationalization, we need to take a really good look first at the problem and then at the tools we are planning to use for it. That is the foundation for an impactful and beneficial MLOps introduction later on, not only for the Machine Learning and Data Science teams but also for the company as a whole.

“There’s a shocking number of what people classify as DS/ML work that can be solved in SQL (Quintile bucketing, windowing ops, building linear equations, etc.) It might execute in seconds vs. the ML approach — 1 hour to train, 10 mins to validate, and the code you need to maintain. We’re here to solve problems, not to get fancy.”

From Benjamin Wilson’s talk with Alexey Grigorev on how to run away from complexity.
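To make the quote concrete, here is a minimal sketch of quintile bucketing done with a single SQL window function instead of a trained model. The `orders` table, its columns, and the `quintile_buckets` helper are hypothetical and made up purely for illustration. The SQL runs through Python's built-in `sqlite3` module, and the sketch assumes the bundled SQLite is version 3.25 or newer, which is when window functions such as `NTILE` became available.

```python
import sqlite3


def quintile_buckets(values):
    """Assign each value to a quintile (1 = lowest fifth) with SQL's NTILE.

    Returns a list of (customer_id, spend, quintile) tuples ordered by
    customer_id. The table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, spend REAL)")
    # customer_id is simply the position of the value in the input list.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", list(enumerate(values))
    )
    rows = conn.execute(
        "SELECT customer_id, spend, "
        "       NTILE(5) OVER (ORDER BY spend) AS quintile "
        "FROM orders ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in quintile_buckets([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]):
        print(row)
```

There is nothing to train, validate, or retrain here: the whole "model" is one query that the database can rerun whenever the data changes.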

Hyped about the latest cutting-edge technology and FOMO

“AI FOMO leads to teams throwing ML at the wrong problems and then getting disappointed when the models perform poorly or never make it to production”

Chris Sears, Field CTO of Hypergiant

Fear of missing out makes some companies grab on to machine learning and then go looking for a problem it can solve, instead of turning to it as a solution to an already established problem.

We have met people from organizations of various sizes, and they shared common observations on how decisions at the executive level are affected by:

  • A lack of understanding of the technology,
  • Poor communication with the data science and engineering teams,
  • An unhealthy innovative spirit at times.

Together, these often lead to selecting a fancy-sounding problem and fixating on innovating with machine learning, which in the end results in a mismatch and the failure of the whole initiative.

ML as a one-off project or as a continuously developing product?

The observations above are a contributing factor to companies treating ML as a one-off project rather than as a product. The lead ML engineer of a major fintech company building a fraud detection system shared a similar observation with us:

“Companies that are starting with the problem first, improving on a defined metric and reach ML as a solution naturally are the ones that will treat their models as a continuously developing product”

In her experience, the distinguishing feature between the two types of companies and approaches is having a defined key metric that is directly tied to the core of the problem. Having that in place allows ML/DS teams to show clear results to the company’s executives, gain their trust, and get a “green light” for further development. It also makes the work more impactful and enjoyable, and benefits the business as a whole.

The ML Jackpot!

To conclude this point, I’d like to quote two experts from the MLOps.community who shared practical experience about innovating not for innovation’s sake but for the problem’s sake. The first is Oguzhan Gencoglu, from his discussion with Demetrios Brinkmann on the Law of Diminishing Returns for Running AI Proofs-of-Concept.

“If you gather 3 or 4 people with tech and business backgrounds and put them in a room for some time to think of all the possible problems in the organization that can be solved with ML you can create a backlog of problems. From which you can select the one that will be the best match with the highest ROI. This incredibly increases the chances of hitting the ML Jackpot!”

Supporting and expanding on this quote, Simon Shaienks, Product Marketing expert at Snitch AI, added:

“We have a similar approach. ROI and impact are great, but feasibility and complexity are equally as important. We gather the best use case but don’t necessarily go for the highest ROI first if it’s too complex. We tend to focus on quick wins more than big bets.”

You can find more on his thoughts about the complexity vs. impact balance here.
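One way to picture the backlog exercise from the two quotes above is a simple scoring pass over candidate use cases. The sketch below is only an illustration under loud assumptions: the use cases, the 1-to-5 scores, and the `quick_win_score` formula (ROI plus feasibility minus complexity) are all hypothetical and not something either speaker described.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    roi: int          # 1 (low) to 5 (high), hypothetical estimate
    feasibility: int  # 1 (hard to do) to 5 (easy to do), hypothetical
    complexity: int   # 1 (simple) to 5 (very complex), hypothetical


def quick_win_score(use_case):
    """Reward ROI and feasibility, penalize complexity (assumed formula)."""
    return use_case.roi + use_case.feasibility - use_case.complexity


def rank_backlog(backlog):
    """Return the backlog sorted from most to least attractive quick win."""
    return sorted(backlog, key=quick_win_score, reverse=True)


# Made-up backlog, as it might come out of a workshop with 3 or 4 people.
backlog = [
    UseCase("churn prediction", roi=5, feasibility=2, complexity=5),
    UseCase("invoice categorization", roi=3, feasibility=5, complexity=2),
    UseCase("demand forecasting", roi=4, feasibility=3, complexity=4),
]
ranked = rank_backlog(backlog)

if __name__ == "__main__":
    for use_case in ranked:
        print(use_case.name, quick_win_score(use_case))
```

With these made-up numbers, the use case with the highest ROI ends up last because it is also the most complex, which matches the "quick wins over big bets" preference. Real weights would have to come from the business and the team together.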

The Focus

And the second is a continuation from Luigi Patruno’s talk with Demetrios Brinkmann about productionization:

“If you have strong leadership on the executive table to direct your attention to the really important problems can translate to huge wins for the company as a whole. Focus on THE problem that is going to be the most important for the business”

They both touch on the topic from slightly different angles, but their points feel equally eye-opening and important to me.

“Start with the end in mind”

Moving on to productionized models, Oguzhan Gencoglu made another great point in the same talk, explaining from his experience why a lot of successful PoCs don’t make it to production.

While all the reasons he shared are valid, the thing that struck me the most is that ML teams are missing the holistic view from the beginning:

“If you don’t plan during the Proofs-of-Concept, it will never make it to production so, in the beginning, you should be aware of the holistic view of the constraints. Your models are made for production”.

A smarter way of building PoCs is to think of them as products that will need to get into a real business environment. Of course, that doesn’t mean data scientists shouldn’t experiment and test ideas. But considering the requirements and resources needed to get a particular model working in production from the beginning brings them closer to success.

“Getting into new areas with many unknowns, we, of course, use the scientific methodology, but still maintaining an iterative way of working and keeping in mind the business problem”

Shared by Georgi Kostadinov, Co-founder & CTO at Kelvin Health and Head of AI at Imagga

Photo by Ariel Biller

The positive exceptions

A lot of teams have had to go through several iterations in defining and establishing their machine learning capabilities. Among them were two separate, fairly small teams we met, each consisting of 2–3 Data Scientists and a single ML engineer. One was just starting to improve its e-commerce recommendation system in production, and the other was in its second year of running live models for optimizing freight distribution channels. They both followed Elena Samuylova’s advice on how to avoid failing at productionizing ML models, which she shared in her talk with Alexey Grigorev:

“Educate non-technical stakeholders and learn from them

Tight collaboration and cross-functional teams

Be highly iterative — iterate on the problem statement, data, and model

Scoping — don’t rush into modeling or settle for convenience

Start small deploy fast — measure time to results, not results

Clear metrics — don’t lose sight of business success criteria

Start with a suitable MVP/prototype”

On the one hand, both teams thought in terms of infrastructure, load balancing, redundancy, availability, performance, scaling, data security, compatibility, reproducibility, etc.; on the other, they considered the team’s capacity and know-how. While weighing all of this, they always made sure the core metric and the requirements for the problem were communicated with the business. That way, they built a holistic view of their roles and responsibilities in putting models into production, which helped them execute much faster than the other teams we’ve spoken to.

But Unfortunately…

Unfortunately, in our observation, these two cases are more of an exception, as the majority of teams think about these things as they go. No wonder Ale Solano’s first message got pinned right at the top of the introductions in the MLOps community:

“I was a happy data scientist until we decided it was time for deploying our models.”

It is common among many DS/ML teams that when the time for productionizing the model comes, they are caught off guard due to poor planning.

Gina Blaber quoting Dinesh Nirmal from his talk on Operationalizing Machine learning

Unsiloing the Data Science teams

As Chris Sears best puts it:

“I think the challenge is getting DS teams embedded or collaborating with the folks that have the deep domain knowledge to identify and frame the ML problem correctly. Small teams do that better than large corporate orgs where there’s a DS team silo’d off in their own world. I used to be at AWS and working backwards from a customer outcome or experience was one of our key principles.”

In response to that challenge, Soumanta Das shared with us how they are currently addressing the issue at Yugen:

“For the past few months at Yugen, we’ve been pairing Data Scientists and ML Engineers to work together on a new model release or to roll out an A/B test, to foster a positive culture of collaboration. We’ve noticed the following encouraging trends:

• Increased appreciation of system design and architecture. All our Data Scientists now spend more time designing flows before they write the first line of code.

• Using system design documents as a tool to onboard new team members. We’re saving a lot of time that would have been earlier spent in knowledge transition meetings.

• Higher influx of ideas. ML Engineers feel more confident suggesting ideas to drive key KPIs”

“Aim for the low hanging fruits”

Reaching the end of any endeavor requires strategy, and the same applies to succeeding with MLOps. One such strategy is to take on a less challenging problem, or a part of it, at the beginning and find the easiest way to solve it. “Simple bets with quick wins” can be a good foundation and preparation for the more complex problems.

The Strategy

So when it comes to introducing MLOps to your organization, do you go for the lowest hanging fruits first or the highest priority problems?

Photo by Paul Hanaoka on Unsplash

In one of the MLOps community’s Coffee Sessions, Nick Masca shared War Stories from Productionising ML and answered:

“Early on I focused on delivering value, gaining trust through making some early wins, but also having a 3-year plan and thinking about the bigger picture around automating at scale. It gave me both the technical (path to production) and non-tech (customer behavior, measurement issues, business impact) learnings much faster. So start small, think big.”

Strategically picking problems with the grand scheme of things in mind from the very beginning allows for quick learning and early results, and later on prepares the team for the more challenging ones. Georgi Kostadinov also shared their experience with picking the problems to focus on:

“Throughout the years we’ve noticed that the time spent on hyper optimizing algorithms yields less overall improvement as compared to focusing on improving the data set or switching the perspective on solving a particular problem. The Pareto effect applies with full force 20% of the effort brings 80% of the model accuracy”

The IKEA effect

Taking a certain approach often depends on the company’s culture. Later in his talk with Alexey Grigorev, Benjamin Wilson touches heavily on that point:

“I have seen that effect with certain companies that I’ve interacted with, where they’ve built that “10,000 piece desk”. They love it. But they also hate it, because they can’t build any more desks, can’t build the chair that goes along because they’re too busy fixing the desk over and over again. The whole DS team spends 90% of their time just fixing and gluing back on little pieces that keep falling off.”

Along the same line of thought, one Full-stack Data Scientist who was tasked with introducing MLOps at the start-up he works for shared with us:

“I don’t want to reinvent the wheel. If there is an MLOps platform that can manage the infrastructure in an optimized way and has some of the best practices built in the workflow, I’ll take it.”

They are a small team of 3 Data Scientists, including him, and their focus is on delivering faster and continuously improving their models. He wanted to avoid building things in-house, as that would add unnecessary complexity to his work, and this was in line with the company’s culture. They were actively looking to outsource this part of the process, and exploring our solution and others was a possible low-hanging fruit for them.

Avoiding unnecessary complexity

Another example of successfully applying this strategy came from Utkarsh Agrawal, Senior Machine Learning Engineer at Trell. Their main focus at the time was improving their personalization system for a video-sharing platform. They applied this approach on a smaller scale in the way they constructed their production pipeline:

“With each model, we focus on putting a stripped-down version, quickly deploy in production and then in the span of a few weeks to continuously add complexity”.

Continuously improving their models was possible because they had well-established performance metrics. Taking one problem at a time and focusing on it was their way to nail the engagement rate, the core metric they were trying to improve. This strategy allowed them to build up the complexity required for accurate personalization models and to fully address the problem over time.

Coming full circle on the MLOps journey isn’t easy, and it’s not something that happens in a day. Starting small but thinking big is making my work, and seemingly others’ work too, less overwhelming and a lot more exciting. I hope it can do the same for you.

Fin

The amazing collaborative spirit in the communities is a major driving force in the progress of the MLOps space.

Thanks to it, a search for any kind of success formula is possible. Step by step, my team and I put together insights from all the shared knowledge and experience to paint a detailed picture. If you want to see it for yourself, sign up to demo TeachableHub’s platform.

The journey continues, and these were some of the “signposts” that helped me build up an understanding of successful MLOps. Hopefully, they will save you time and give you some sense of direction. I’m interested to hear how you are navigating the space and to build a more comprehensive guide together. Contact me on LinkedIn and stay tuned for part II.


WRITTEN BY Hristo Krastev from TeachableHub | LinkedIn

PHOTO BY Raghav Bhasin