Generationship
25 MIN

Ep. #40, ExperimentOps with Salma Mayorquin of Remyx AI

about the episode

In episode 40 of Generationship, Salma Mayorquin of Remyx AI unpacks the shift from traditional MLOps to ExperimentOps—a framework for scaling insight in modern AI development. With stories from Netflix, Yelp, and Stripe, she explains how teams are bringing foundation models into production and what it takes to keep up. If you're grappling with LLMs, MLOps, or decision paralysis in GenAI, this one’s for you.

Salma Mayorquin is a machine learning expert and the co-founder of Remyx AI, where she focuses on operationalizing AI experimentation for engineering teams. With a background spanning MLOps, deep learning, and applied research, her past work includes open-source tools like ActionAI and innovations like a yoga-correcting smart mirror. She’s passionate about making tailored AI more accessible and impactful.

transcript

Rachel Chalmers: Today it's a real pleasure to have Salma Mayorquin back on the show.

Salma is an expert in machine learning, skilled in advising on MLOps strategies for everyone from small teams to Fortune 500 companies.

She's delivered technical guidance and demos to clients on leveraging Databricks for advanced analytics as a solutions architect.

An ML and deep learning subject matter expert, she's created a smart mirror that uses image recognition to teach and correct yoga postures, which was featured in Magpie Magazine.

And she has open-sourced ActionAI, a toolkit for building lightweight, custom action recognition models that couples pose estimation with LSTMs, earning recognition from NVIDIA's developer community.

Today she's the co-founder of Remyx AI, which is committed to democratizing tailored machine learning. Salma, welcome back on the show.

Salma Mayorquin: Thank you so much, Rachel. I'm so excited to be here with you and thanks again for sharing the time and space.

Rachel: It's wonderful to have you. You've been learning a lot about how AI developers work. What kinds of experiments are people running these days?

Salma: Yeah, I think first of all, it's a super exciting space. It's still heating up. There's so much still happening.

I think over the last few years especially, lots of folks have started to productionize the more foundation-model style of machine learning.

Some exemplary teams are now starting to put out blogs about it. Yelp, for instance, is actually starting to embed LLMs throughout their stack.

They're doing a lot of interesting things: creating business summaries with LLMs, helping power smarter search.

So helping people actually get to the intent of what they're searching for on the site, all the way to powering things like a Yelp assistant that helps customers fill out a quote request when they're trying to reach a service provider.

How do they facilitate that conversation with folks? So there are lots of interesting things there, and they're starting to see success and to figure out how they actually go about putting these ideas into production.

There's other groups, like Netflix just recently released a blog where they're describing how they built a foundation model for personalizing content recommendation, which is their bread and butter, right?

So that's a super exciting space, and it's great to see them pioneer that area. Stripe also recently got into the foundation model space by building their own foundation model for payments.

So I think we're starting to see lots of folks who have found their stride on their particular use cases and the kind of value they want to bring to their customers.

And they're starting to figure out the process of taking these ideas that are just breaking in research, hardening them into production, making them reliable, and improving upon them.

Rachel: How does this compare to the early days of productionizing machine learning, which you were also around for?

Salma: Yeah, fun times. I think back then it was a little simpler.

The way that you would work with classical machine learning or deep learning models was a lot simpler than it is today with foundation models.

One way they were simpler was that they were specialized to a specific task.

So if you had, for example, an XGBoost model meant to help predict a particular statistic within your data, that model was specialized only for that task.

If you were using NLP for sentiment analysis, it would only produce a specific set of sentiment labels for the text you provided. If you were using computer vision models for image classification, it would only give you a label for what it saw in the image, based on the examples you provided.

And in the classical machine learning realm, these models required tons of data and lots of feature engineering by data scientists, who put their brains to work figuring out how to wrangle the data and surface new insights and behaviors they saw within it, to help the model learn and pick up those patterns.

Or in the deep learning space, they could benefit from techniques like transfer learning, where you take a base model that had been trained on tons of data for a particular task, and then use a smaller dataset relevant to your specific task to further specialize that base model.
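
For illustration, a minimal transfer-learning sketch along those lines: a backbone pretrained on a large generic dataset is frozen, and only a small task-specific head is trained on the new data. The framework (PyTorch/torchvision) and the car-vs-truck task are assumptions for the example, not details from the episode.

```python
# A minimal transfer-learning sketch: freeze a backbone pretrained on a large
# generic dataset and train only a small task-specific head on the new data.
import torch
import torch.nn as nn
from torchvision import models

# Backbone pretrained on ImageNet-scale data.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained weights; only the new head will learn from the small dataset.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the narrower task (e.g., car vs. truck).
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
# ...train only model.fc on the small, task-specific dataset...
```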

And so in those days, obviously, you were in a particular task's world, and within that world you also had the benefit of evaluation metrics that were a little easier to understand and track.

So for example, if you were training an image classifier that could tell apart a car from a truck, then you had accuracy, precision, all these metrics that you could track over time to see how that model was performing on that specific task.

And it would give you more confidence about putting it out there, right?

Rachel: It's low cardinality, it's a pass-fail test.

Salma: Yeah, pass-fail test.

And so it was a lot easier to tell whether that model was hitting the mark and ready for the big showtime of putting it in front of your users, and then gathering feedback on whether the data distribution had shifted.

Then maybe you had to adjust those labels, or get new samples, maybe new camera angles if you were in the computer vision space with our image classification example.

So the problem was still complex, but relatively simple compared to what we see nowadays with foundation models, right?
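
To make that pass/fail world concrete, here is a small sketch of scoring a fixed-label classifier with standard metrics. The labels and predictions are made up, and scikit-learn is just one convenient way to compute them.

```python
# Task-specific evaluation: a fixed-label classifier scored with standard metrics.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = ["car", "truck", "car", "car", "truck", "truck"]
y_pred = ["car", "truck", "truck", "car", "truck", "car"]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, pos_label="truck"))
print("recall   :", recall_score(y_true, y_pred, pos_label="truck"))
```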

Rachel: We used to think that was complex, but we had no idea what was coming.

Salma: Oh yeah.

We definitely have much more interesting and exciting horizons now when it comes to all the possibilities, which is a great thing, but it also brings a new set of challenges that we have to overcome as a community.

So today, the difference is foundation models, and what we might put in that bucket is GenAI, LLMs, VLMs, all that good stuff.

Foundation models essentially take vast, internet-scale data, and regardless of its shape, format, or type, these models can ingest all of that data, pick up the patterns within it, and then perform a variety of tasks and exhibit a variety of skills in the same model. So one model can now do many things.

You can, for example, take one LLM and ask it to do sentiment analysis, but you can also ask it to do summarization.

You could tell it to generate a story for you. It could do all kinds of things with one set of weights. Which is fantastic, right?

We're all excited that AI is a lot more flexible in regards to putting it out there in the world and having it touch our lives in different ways.

But that also makes the problem harder in terms of how you operationalize that kind of code when it's so flexible.
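
As a rough sketch of "one set of weights, many tasks," the same instruction-tuned model can be pointed at sentiment analysis, summarization, or storytelling just by changing the prompt. The model name below is a placeholder, not one named in the episode.

```python
# One model, many tasks: only the prompt changes.
from transformers import pipeline

generate = pipeline("text-generation", model="some-instruction-tuned-model")  # placeholder

prompts = [
    "Classify the sentiment of this review as positive or negative: 'Great tacos!'",
    "Summarize the following paragraph in one sentence: ...",
    "Write a two-sentence story about a robot learning to play guitar.",
]

for prompt in prompts:
    output = generate(prompt, max_new_tokens=64, return_full_text=False)
    print(output[0]["generated_text"])
```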

So a big problem right now that a lot of folks are still facing, we're all feeling it, is how to evaluate these models, right?

There's many techniques and new ideas that folks have come up with.

So there are things like chatbot arenas, places where people can come in and judge the outputs of models according to their liking or whatever metrics they have in mind.

There are benchmarks that folks are coming up with so we can test the different kinds of skills that might be found within these models.

There's even, getting into the meta of it, the approach where we use another model to help evaluate or judge how a model is performing on a particular task or facet of it.

So that's interesting as well. Now there's just an explosion of ways you could look at this problem and evaluate whether the model is hitting the mark or not, and it's a lot more dependent on what you value.

What do you want this model to do? How do you want to interact with your customers or your world?

And so it becomes kind of a sub-problem that developers need to think about beyond just working with the model asset itself.
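
A hedged sketch of the model-as-judge pattern mentioned above: a second model scores another model's answer against a rubric. The judge model, rubric, and helper function here are hypothetical, shown only to illustrate the shape of the technique.

```python
# "Model as judge": a second model rates an answer against a rubric.
from transformers import pipeline

judge = pipeline("text-generation", model="some-judge-model")  # placeholder model

RUBRIC = (
    "Rate the ANSWER from 1 to 5 for accuracy and helpfulness given the QUESTION. "
    "Reply with only the number."
)

def judge_answer(question: str, answer: str) -> str:
    prompt = f"{RUBRIC}\n\nQUESTION: {question}\nANSWER: {answer}\nSCORE:"
    result = judge(prompt, max_new_tokens=4, return_full_text=False)
    return result[0]["generated_text"].strip()

print(judge_answer("What is the capital of France?", "Paris."))
```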

Another dynamic that's interesting, and kind of goes into that meta world, is that we have coding assistants that are a lot more powerful.

I think the world has been really excited about code generation, vibe coding is a thing, and it's becoming much more accepted in the community.

And that's fantastic: we can finally lower the barrier between having an idea and turning it into a functional prototype.

But we're now left with: what should we build? Is it worth investing our time into that effort?

And how do we wrangle the outputs, which obviously still need some refinement, right?

Like the first prototype is great, but there's still lots to do to get it into that polished state that we want to then present and rely upon.

Rachel: So where are people running up against the limitations of machine learning operations-- the traditional MLOps tools? "Traditional," that's what, seven years old now?

Salma: It's still, I guess in this timeline, it could be considered traditional, right?

Rachel: Yeah.

Salma: Which is great for MLOps. I'm excited that it's finding its place in the community. But you know--

MLOps is still super relevant. Actually, we need it now more than ever, but obviously it won't address all of the issues we're now facing as developers trying to turn these AI artifacts into production-grade code.

So let's look back at some of the frameworks we've used in the past to help us operationalize code, generally speaking.

We have DevOps, which helped us generally operationalize software artifacts.

So being able to build, test, deploy, and iterate on code, integrating it as a system and a process, with artifacts we could track over time.

And we found similarities in that world and tried to translate them into the machine learning world with MLOps.

So just like code, we wanted to treat ML artifacts the same way: models, datasets, hyperparameter searches, all these little artifacts that get produced in the process of creating AI applications.

How do we version control that? How do we keep logs and statistics about what happened in a particular model training session, for example?

Or how do we operationalize a machine learning pipeline that goes from this dataset processing step, to this fine-tuning process, to this deployment process?

And that gave us a framework to work with these model assets that were a little more free-floating, and to pin them down into a process that we could reproduce, rely upon over time, and build upon.
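
A minimal sketch of what pinning a training session down can look like in practice, using MLflow as an example tracker (a common choice, not a tool named in the episode); the parameters, metrics, and file names are placeholders.

```python
# Logging a training run as a tracked, reproducible MLOps artifact.
import mlflow

with mlflow.start_run(run_name="xgboost-churn-v3"):
    # Version the configuration that produced this model...
    mlflow.log_param("max_depth", 6)
    mlflow.log_param("learning_rate", 0.1)
    mlflow.log_param("dataset_version", "2024-05-01")

    # ...the statistics from the training session...
    mlflow.log_metric("val_accuracy", 0.91)
    mlflow.log_metric("val_precision", 0.88)

    # ...and the resulting artifact, so the run can be reproduced or compared later.
    mlflow.log_artifact("model.pkl")
```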

And now, because there's such an explosion of different things that foundation models can do, we need a process to help us prune that space and pin it down into something that is also reproducible and reliable, that can be iterated upon and improved upon just like the other assets.

So that's where we want to help more folks pin these processes down into a scientific process that builds essentially a knowledge flywheel, where you capture all the information you're learning along the way. Maybe I tried a new prompting strategy.

Maybe I also added a new fine-tuning technique that I'm testing out, or I switched the base model. And out of all of those possibilities:

What's actually working? What different levers are interacting with each other to produce a particular outcome?

Is that outcome aligned with my goals? Is it making the experience better? Is it actually aligning with my business KPIs?

Is it actually reducing the time to produce this output? Is it making a better-quality output for my customers, so now they're happier and there are more of them because of it?

So all these things that are usually left to the devices of individual developers or team leads, all this extra work, we're putting down as a process that can then be reproduced by more folks coming in and helping out.

And then, over time, improving upon that asset we're working on.
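
As a loose illustration of that knowledge flywheel, each experiment can be captured as a record tying the levers pulled to the outcomes observed and the business KPIs they moved. The schema below is hypothetical, just to show the idea, not the Remyx data model.

```python
# A hypothetical "knowledge flywheel" record linking levers to outcomes and KPIs.
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    hypothesis: str                                    # what we expected the change to do
    levers: dict = field(default_factory=dict)         # prompt strategy, fine-tune method, base model...
    eval_results: dict = field(default_factory=dict)   # offline scores, judge ratings
    kpi_impact: dict = field(default_factory=dict)     # e.g., time-to-answer, customer satisfaction
    decision: str = ""                                 # keep, revert, or explore further

run = ExperimentRecord(
    hypothesis="Few-shot prompting beats zero-shot for quote requests",
    levers={"prompting": "few-shot", "base_model": "model-A", "fine_tune": None},
    eval_results={"judge_score": 4.2},
    kpi_impact={"quote_completion_rate": "+3%"},
    decision="keep; compare against a fine-tuned model-B next",
)
```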

Rachel: And you've come up with a cool name for this set of processes, ExperimentOps. Do you want to go deeper on that?

Salma: Yeah, we're obviously trying to liken it to the processes that we're already familiar with, like DevOps and MLOps, and to put a name to this whole process that we have sitting in our heads or do in practice over years of experience.

How do we make it a discipline that everyone is up to speed on, so that newcomers or people pivoting into the AI engineering space have all these tools and frameworks in their minds as well to move them forward in the process?

And the way we describe ExperimentOps is essentially as a concept that treats the experimentation and iteration process as a first-class concern.

So our idea is to operationalize knowledge, like MLOps operationalizes data or DevOps operationalizes code.

And the end goal is to help teams maximize their learning velocity and their product impact.

So through all the things that you're testing out, all the possibilities of what you could make or how you could go about developing a particular application:

How quickly can you learn what's working and what isn't, and which directions are worth exploring further? How could the learnings from this particular experiment or product inform adjacent products you might be thinking about building, where you could recycle them?

And then tying all of those efforts directly to how they're impacting your users.

How are they impacting your business, right? Ultimately, you keep feeding what's working and what isn't back into that process.

Rachel: Yeah, demonstrating the ROI of this kind of exploratory process, that's pretty exciting.

So your company Remyx is looking to support this kind of experimentation. Do you want to talk about how that's been implemented?

Salma: Yeah, a good starting point is getting a lay of the land of the AI tooling space right now.

There's a plethora of different tools and they all provide different services.

You obviously have your cloud providers, which provide the base foundations for being able to build anything in the world, right?

And then you have AI development platforms that help narrow the scope of that cloud computing world down to what's relevant for AI development, generally speaking.

So it might include things like MLOps, maybe some DevOps, maybe some development experimentation frameworks or environments like Colab notebooks, things like that.

And then there are other tools that might be really specialized: we help reduce GPU compute use for fine-tuning, or we help optimize and containerize deployments and make that really great.

And in that world, what we want to be is sort of an analytics layer on top, where we enhance all of the tools in your toolkit that you might rely upon for different parts of the process.

And then make all of those tools just smarter, easier, more informed about what your end goals are, what you're currently developing, what you've tried, and how people are responding to it, and then help inform what's next, what's the next direction for you to take.

And the way that we're doing that is we're building a developer platform that integrates with all of these suites of tools, right?

It can be deployed in your own environment, and it has a system of agents embedded in it that can take information from the kinds of experiments you're running in a variety of environments, the constraints you're working under, the conversations you're having with your teammates, the places where you're keeping notes and ideas like a scratch pad, or even a product tracking system with Kanban boards like JIRA or Shortcut.

And then also, how do we tie that to other decision makers or stakeholders of the product itself?

So what are the KPIs that the business is trying to optimize for given these projects?

What's motivating these efforts? We take all of that context and information and start building up institutional knowledge for the organization and the teammates, both in the strategic alignment of how the AI strategy is moving the business forward, and for the individual developers: given all the things you've tried, all the strategies you're testing out, the research that's coming out, what people on social media say is working and not working, and how your users are responding in online metrics.

If you have the luxury of all that, we can say: we think these next three directions are worth investing in.

Do you want us to go ahead and start configuring an experiment for you? Should we canary test some of these things and go from there?

So the idea is: can we have that analytics layer on top of everything you're using and just make it way easier to understand what to do next? What's the next best direction?

Rachel: It sounds super fun and super exciting.

Salma, what are some of your favorite sources for learning about AI? Where are you gleaning all of this insight from?

Salma: I think definitely some fun but hard-earned experience over the years.

It's one of the best ways to do it. It's definitely one of the most interesting in terms of--

If you find a project that's near and dear to your heart, you'll learn a ton. You'll just like dive in. So that's my first piece of advice is find something that is interesting, fun, exciting to you, and figure out how to build something around it.

So if you're interested in music, maybe build a model that can generate music or synthesize new assets in that way. Or if you--

Rachel: We don't have video on the pod obviously, but there's very nice guitar just behind Salma.

Salma: Which reminds me, I got to get to it at some point for sure.

But if there's something that's really exciting to you, use that as motivation to dig into this space. It'll take you into the edges, because if you read books or take a pre-canned tutorial, they'll only take you so far.

That's one of the best sources. Books are also fantastic. Tutorials are great. They'll give you great foundations for how to go about even tackling the space.

There are some great O'Reilly books in the space, and we're working on one that we'll want to release soon enough.

Rachel: Oh, cool.

Salma: We'll help add to that library of knowledge out there.

But yeah, first and foremost, go do a project that you're really interested in and the pieces will fall together as you go along.

Rachel: Are you dogfooding within Remyx? Are you using your own software to run experiments?

Salma: Naturally, right. I think we would have to be the first guinea pigs to really put our money where our mouth is. But yeah, we are, we actually are.

In our experience, we've dabbled with a ton of different kinds of machine learning projects in different industries, and one area that's near and dear to our hearts is robotics.

In our spare time, a lot of our projects look like robotics projects.

And one area in foundation models that is super interesting right now is spatial understanding.

So taking multimodal models in particular and having them gain the skill of estimating distances between objects, or relationships between objects.

So, the guitar is to the left of me, or the fan I have in front of me is a foot away, from my perspective.

And that particular skill could be super valuable for, for example, embodied AI or robotics projects, where if you're navigating a space, it's super helpful to have an estimate of how far you've got to go or whether there are obstacles in your way, and to be able to use that to inform other processes.

So we helped open source a result called SpatialVLM. Basically, some researchers at Google were able to put together a model pipeline that processed raw images into a VQA-style dataset that could then be used to help a model learn this spatial reasoning skill.

And so we've taken that approach, open sourced it, and used it to generate a dataset to train a variety of multimodal models that now have this capability.

It's super exciting, actually. It was referenced very recently in a benchmark arXiv paper.

So more folks are starting to run into that work, validating it, and seeing how our fine-tuning strategy stacks up against other techniques out there that folks are trying out.
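
For a rough sense of that pipeline, raw images pass through perception models and come out as VQA-style question/answer pairs about spatial relationships, which a multimodal model can then be fine-tuned on. The helper functions below are stand-ins for the real perception models, not the actual open-source implementation.

```python
# A loose sketch of a SpatialVLM-style data pipeline: images -> geometry -> VQA pairs.
def estimate_objects_and_depth(image_path: str) -> list[dict]:
    # Placeholder for object detection plus metric depth estimation.
    return [
        {"name": "guitar", "position": (-0.6, 0.0, 1.2)},
        {"name": "fan", "position": (0.0, 0.0, 0.3)},
    ]

def to_vqa_pairs(objects: list[dict]) -> list[dict]:
    # Turn the estimated geometry into question/answer pairs for fine-tuning.
    pairs = []
    for a in objects:
        for b in objects:
            if a is b:
                continue
            side = "left of" if b["position"][0] < a["position"][0] else "right of"
            pairs.append({
                "question": f"Where is the {b['name']} relative to the {a['name']}?",
                "answer": f"The {b['name']} is to the {side} the {a['name']}.",
            })
    return pairs

dataset = to_vqa_pairs(estimate_objects_and_depth("frame_0001.jpg"))
print(dataset[0])
```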

So, we very much use this process internally. We have agents that recommend arXiv papers on a daily basis, with essentially causal reasoning behind them.

So it understands, from our past history and the things we've tried and tested before, which particular levers matter, like prompting strategies versus fine-tuning, or using reasoning-based models versus not.

And it understands what directions are going to be the most fruitful for us to move the project forward.

So that's been super helpful, and we're hoping that more folks, as they start using Remyx AI, find benefits from that as well.

Rachel: I've got three projects I'm thinking of now that we should talk about in more detail.

Salma: That sounds fantastic.

Rachel: Salma, if everything goes exactly the way you'd like it to for the next five years, what does the future look like?

Salma: I think the future is going to be even more interesting than it is today. I'm hoping that more folks are comfortable with AI engineering and excited by it, and that more folks are able to bring their ideas to life.

And I hope it's through tools like Remyx that help prune the wide variety of things you could do and help guide that process along the way.

I think more organizations are going to find their feet amid all the developments in the AI space, find the particular use cases that are really driving value for them, and establish AI engineering departments, just like they might have software engineering departments.

I'm hoping that the discipline and practice of it become as widespread as software engineering. That way more folks have access to what this technology can do.

Rachel: That sounds absolutely great. Last question, favorite question.

If you had your own Generationship, a ship to travel to the stars, what would you name it?

Salma: Oh man, I will describe what I want it to be.

I guess I want it to be the hyperlooper, basically the starship that helps you jump into space, to your next destination.

And at hyper speed, it helps you move faster in the direction you want to go. I think a ship that is also all-encompassing, that helps you explore the world around you.

And again, there's so many things you could do today and the technology that is developing is enabling even more ideas to come to life.

And so a ship that can help take you where your heart desires is what I hope for folks to have.

I'm going to have to circle back to you with a nice name for that, 'cause I'm not clever enough to come up with it right now.

Rachel: It's a wonderful description and if anyone can solve faster than light travel, I think it's you.

So thank you so much for coming back on the show. It's a pleasure to have you.

Salma: Oh, thanks again for giving us the time.