A Healthier Data Diet | Wendy Gonzalez from Sama

You can think of artificial intelligence as being like a living creature—it’s only as healthy as its diet. And unfortunately, AI these days is consuming a lot of junk food. But don’t blame the data—there’s a lot of it, and it’s difficult to organize. Often, the data simply lacks the proper balance and composition to give AI the fullest power.

Wendy Gonzalez is trying to remedy this with her company, Sama. By preparing the data that trains AI, Sama is able to help implement a seemingly endless list of solutions, from finding the right shoe size to saving endangered species.

On this edition of UpTech Report, Wendy tells us about some of the many projects she’s worked on, and she explains the complex relationship between data and AI and how her work through Sama is helping us move closer to a new era of computing technology.

More information:

Wendy Gonzalez is an executive passionate about building high-performing, high-functioning teams that develop and scale innovative, impactful technology. She has two decades of managerial and technology leadership experience at companies including EY, Capgemini, Cycle30 (acquired by Arrow Electronics), and General Communications Inc.

Gonzalez is currently the CEO of Sama, the provider of accurate data for ambitious AI, used by leading technology companies such as Walmart, Google, Nvidia and Getty. Before taking on her role as CEO, Gonzalez was Sama’s President and COO, and is an active Board Member of the Leila Janah Foundation.

DISCLAIMER: Below is an AI generated transcript. There could be a few typos but it should be at least 90% accurate. Watch video or listen to the podcast for the full experience!

Wendy Gonzalez 0:00
The world is constantly changing. And the more AI gets adopted, the more edge cases and use cases are arising.

Alexander Ferguson 0:12
Welcome to UpTech Report

This is our applied tech series. UpTech Report is sponsored by TeraLeap. Learn how to leverage the power of video at Today I’m joined by my guest, Wendy Gonzalez, who’s based in Los Gatos, California. She’s the CEO of Sama. Good to have you on Wendy.

Wendy Gonzalez 0:29
Thank you for having me, Alex.

Alexander Ferguson 0:31
For sure. Now, Sama is a high-quality training data platform. So for those out there who have a lot of data that you have to make sure is accurate for your machine learning models, it’s probably a platform you’re going to want to check out. On your site, you state “accurate data for ambitious AI.” Help me understand: what is the problem you guys are focused on solving, in a nutshell?

Wendy Gonzalez 0:55
So there is a proliferation of inaccurate data out there. And as you can imagine, artificial intelligence is basically only as intelligent as the training data you train it on. So it’s very much a garbage-in, garbage-out kind of situation. What we’re really focused on is companies that are creating world-changing AI, you know, ambitious or enterprise-grade AI. They need a reliable, consistent source of high-accuracy training data to power their machine learning models. And that’s really what we’re focused on.

Alexander Ferguson 1:25
Now, give me a frame of reference for the quantity of data these days if you’re wanting to build a machine learning model. For enterprise and mid-market companies, what scope and size of data are we working with?

Wendy Gonzalez 1:41
Yeah, how much data is enough is a very, very good question. Certainly, if you’re a data scientist, the answer is going to be: it all depends on exactly what you’re trying to do. But there’s no question that in large-scale applications, we’re doing billions of points of annotation, and companies have literally thousands of terabytes of data. So being able to sift through it, to figure out how much data you need and what data is representative, accurate, and complete, is of critical importance to these companies.

Alexander Ferguson 2:10
Help me see your history. Why are you guys different? Basically, what’s your story?

Wendy Gonzalez 2:17
Absolutely. Sama has a really interesting story. We’ve been doing machine learning training data basically since it was adopted at large scale. Companies started using supervised machine learning techniques to build artificial intelligence back in the late 2000s and early 2010s, and that’s about when we started providing training data. Obviously, there’s a lot of proprietary information out there, but one company we can talk about is Microsoft, where we helped power the Xbox Kinect. So, you know, we were part of that. Hopefully you had some amazing dance moves on that, if you were into video gaming. If you did, you can

Alexander Ferguson 3:00
I can thank you for that.

Wendy Gonzalez 3:04
But yeah, we’ve been working in this space for quite some time. What that has really allowed us to do is understand how you get to high accuracy and high quality. And not only how you do that, but how you do that at scale. By at scale, I don’t just mean volume; I mean the fact that the world is constantly changing, and the more AI gets adopted, the more edge cases and use cases are arising. You might have heard of the long tail of AI, and it is 100% accurate and true. Just take the self-driving car as an example. We’ve got cars here in California, but there are also cars all around the world, driving on different sides of the road, in different weather, with different road signage. There’s an infinite number of use cases out there needed to really replicate human context. So having an extensible platform that allows you to evolve as a model evolves is a really big part of how we differentiate ourselves.

Alexander Ferguson 4:04
I feel like the true power of AI right now is in the long tail, the very specific use case in a particular area. We’re probably still far off from general AI, but this is about how you play in one particular area, which requires specific training on that topic. Can you give any other use cases or examples where developing the right, accurate data to train on in a specific area has made a lot of sense?

Wendy Gonzalez 4:40
Oh, yeah, there are so many incredible use cases; it’s really exciting to be in this particular space. There are many, so just tell me when to stop. For example, we do a lot in consumer and media, and there we’re seeing a ton of applications around augmented and virtual reality. As you can imagine, there are applications on your favorite devices or your favorite internet engines around personalization and virtualization. I was on a website the other day where I took a picture of my feet to find out what my shoe size would be and what shoe might look good on me. We’re seeing many applications like this in fashion, retail, and e-commerce, a lot of them around visual search. We may inherently know what plaid or gingham looks like as a pattern or design, but how do you teach a machine to search for it, translate it, and pull back the right kinds of products? So that’s a lot of what we’re seeing in the consumer and media space, as well as the retail e-commerce space. In terms of specific applications, there are also a lot of interesting things happening in food and agriculture, as well as robotics. We are working with companies on things like reducing food waste and detecting plant disease: how do you use computer vision to identify when you actually have disease in your crops and need to address it? We also have some fantastic, super cool use cases around conservation. We’ve worked with the Nature Conservancy and Vulcan, for example. One initiative we’re super proud of is our partnership with Vulcan to reduce elephant poaching in East Africa.
As you can imagine, you’re talking about millions of acres that these rangers need to cover, which is incredibly difficult. So Vulcan had a drone solution, with computer vision cameras, that would allow them to cover all these millions of acres. We helped train the software to detect poaching behaviors. We were looking for cars and fires, and for cars or jeeps that were staying in one location. We also had the chance to tag elephant butts, because that’s how you identify unique elephants: by the length of their tails in aerial footage. We helped drive the accuracy of tagging elephants. So there are so many incredible applications, but these are just a few.

Alexander Ferguson 7:26
It’s helpful to hear some of these stories. When it comes to annotation and quality checking, is the majority of the work in this space image classification, saying, okay, this is what this asset is inside this image? Is that the mainstay right now as far as quality checking and annotation?

Wendy Gonzalez 7:42
It really exists for every data type. What we’ve seen is that the really difficult part is the computer vision side, so images and video, because you’re basically having to structure and identify the context of everything in that particular image. To show what I mean by specificity, I’ll use a retail example: if you’re in a cashierless store, you not only have to track people’s movement, but you have to be able to tag every object sitting in that basket or cart to be able to bill it properly, which is really interesting. And as you can imagine, the world never stays still; products are constantly being introduced, for example. So that’s computer vision. There’s also a lot happening in natural language processing, which is really text: things like chatbots, or text categorization and classification. And then there’s also audio. All of these areas are growing, although I would say computer vision is the area with the greatest growth.

Alexander Ferguson 8:47
What about a company that says, okay, let’s get this data, let’s get it accurate, train our machine learning model on it, and then we’re done? We’re good; we don’t need to worry about it anymore. What do you say to that type of mindset?

Wendy Gonzalez 9:03
Yeah, you can’t put AI up on the shelf and let it collect dust. It doesn’t work that way, for sure. Models degrade over time, because more data gets introduced and the world doesn’t stay still. The simple example I use when I have these conversations is: who would have thought we’d be in this pandemic, and that when you used your device to pay with your Apple Pay or your Google Pay, it might not recognize you with your mask on? Things change. There’s a constant evolution, just like the example I gave for products. If you don’t continue to refresh and tune the models you’ve put into production, they will degrade, and that’s basically what’s called data drift. A lot of companies are finding now that they need to spend roughly 10 to 15% of their product budgets, if you will, on optimizing and maintaining those models. That’s also something we’ve been doing quite a bit of, because if you go to the store or an e-commerce site and can’t find your product, that’s a pretty catastrophic brand or product experience.
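Editor’s note: to make data drift concrete, here is a minimal sketch of one common way to quantify it, the population stability index (PSI), which compares how a numeric feature (for instance, a model’s confidence score) is distributed at training time versus in production. This is not Sama’s method; the choice of PSI, the 10 bins, the 0.2 alert threshold, and the sample data are all illustrative assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift.

    Bin edges come from the range of the reference (training-time) sample.
    Production values outside that range are clamped into the first/last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical scores logged at training time and later in production.
training_scores = [i / 100 for i in range(100)]          # spread evenly over [0, 1)
production_scores = [0.5 + i / 200 for i in range(100)]  # shifted into the top half

psi = population_stability_index(training_scores, production_scores)
if psi > 0.2:  # common rule of thumb, treated here as an assumption
    print(f"PSI={psi:.2f}: significant drift, consider refreshing labels and retraining")
```

A PSI near zero means the two distributions match; in this made-up example the production scores have moved entirely into the upper half of the training range, so the check flags drift, which is the cue to refresh training data and retrain.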

Alexander Ferguson 10:11
For you guys, is your business model simply that someone comes to you and says, we’ve got this much data, please go through it, and it’s priced based on the data size? How do you guys work?

Wendy Gonzalez 10:22
Yeah, definitely. We really do a few things. Our objective, ultimately, is to be the training data pipeline for our clients; we want to really support them in that process. That starts with the labeling itself. You’ve got tons of images, video clips, or audio clips, for example, and we have a platform that allows us to drive high-accuracy labeling at scale, which we do a lot through automation. We also work upstream, because as you can imagine, if you have thousands or millions of data points, you don’t necessarily want to label and train on every single thing: it’s incredibly costly, and it takes a long time to train your models. All of that training and ingestion of data uses GPUs, TPUs, and server farms, and that’s a lot of computing cost. So clients are pulling us upstream to say, hey, not only can you help me transform my data, but can you tell me what’s in it? That way, especially as I build my models, I know how to get to a complete dataset that allows me to reach quality. For example, in e-commerce, maybe we’re not doing so well on organic snacks. That’s the area that needs an injection: we need more granola bars in our dataset, because that category doesn’t seem to be performing at quality.
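Editor’s note: the granola-bar example describes using per-category model performance to decide where new labeled data should go. Below is a minimal sketch of that prioritization, assuming evaluation results are available as (category, was_correct) pairs. The function name, the category names, the results, and the 0.9 accuracy target are hypothetical illustrations, not Sama’s process.

```python
from collections import defaultdict


def categories_needing_data(
    results: list[tuple[str, bool]],
    target_accuracy: float = 0.9,
) -> list[tuple[str, float]]:
    """Return (category, accuracy) for categories below the target, worst first."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for category, ok in results:
        total[category] += 1
        correct[category] += int(ok)
    accuracy = {c: correct[c] / total[c] for c in total}
    weak = [(c, acc) for c, acc in accuracy.items() if acc < target_accuracy]
    # Lowest accuracy first: these categories get new labeled examples first.
    return sorted(weak, key=lambda pair: pair[1])


# Hypothetical evaluation results for an e-commerce product recognizer.
results = (
    [("cereal", True)] * 95 + [("cereal", False)] * 5
    + [("organic snacks", True)] * 60 + [("organic snacks", False)] * 40
    + [("beverages", True)] * 85 + [("beverages", False)] * 15
)

for category, acc in categories_needing_data(results):
    print(f"{category}: {acc:.0%} accuracy, prioritize new labeled examples")
```

Ranking by accuracy is the simplest possible signal; a real pipeline might also weigh how often each category appears in production or how many examples already exist for it.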

Alexander Ferguson 11:50
Gotcha. As far as your platform itself, is there anything you can share about the technology stack, or how it’s different from other options or solutions out there?

Wendy Gonzalez 12:03
Yeah, absolutely. We’ve taken the approach of building our entire platform around high-quality data as well as extensibility. We know the world doesn’t stay still, so the platform needs to continue to evolve. With our AI-powered labeling, we’ve been much more discretely focused on specific entities and types of use cases. To bring you back to what you said earlier, Alex, it’s going to be quite some time before there are really effective generalized world models that see everything like a human would and can detect context, occlusions, depth, all these different things. So our approach is much more focused: we are really driving to end applications and use cases.

Alexander Ferguson 12:54
Like an elephant: hey, let’s train it just to know what this elephant looks like because of the size of its tail.

Wendy Gonzalez 13:01
Yeah, exactly, and driving automation that way. But what’s really interesting is that there are many use cases that cross different industries. For example, people tracking isn’t just needed in a self-driving car, where you need to see pedestrians. You may need it in a smart transit network, or for safety applications. We have companies in the retail space that want to understand foot traffic coming in and out of the store and the peak shopping times, and an urban planning company might need that same sort of people tracking information. So it’s really about how you drive to the use case, and those use cases are scalable and extensible across industry applications, if that makes sense.

Alexander Ferguson 13:47
Gotcha. Do you guys have any API connections that allow folks to easily work within other platforms? Or do they basically use your platform, get the insights, and then export them and use them in other solutions?

Wendy Gonzalez 14:02
You know, APIs are definitely everything. We want to make the process as frictionless as possible. So yes, our platform has APIs to move and launch training data and to do things like transformation, classification, and indexing. We do have a data focus: we’re not trying to replicate the model development process. We’re really trying to solve for high-volume, high-accuracy, high-complexity data needs.

Alexander Ferguson 14:31
Gotcha. I’m curious, Wendy: why do you like what you do? What fascinates you about this whole concept of AI and data management?

Wendy Gonzalez 14:42
Yeah, definitely. Well, I have to admit, I’m a little bit of a data geek. I’ve always loved data; I track data on my sleep and in my home life, but it goes well beyond that. What inspired me here is that, prior to joining Sama, I was in the Internet of Things space. I was there so early we still called it machine-to-machine. That was definitely on the early side, but it’s an incredibly disruptive technology: taking all this information and sensor data, making value out of it, and being able to make decisions. I joined Sama back in 2013 because I thought, holy cow, AI and machine learning are clearly going to be the most disruptive technology of our lifetimes. You’ve got all these end applications that people think about, like chatbots, devices with recognition, and self-driving cars. But at the end of the day, the real magic is a layer beneath. Training data isn’t the, quote unquote, sexy part. But what machine learning models are doing is recognizing a series of patterns, and if you don’t get the data part right, you could have a completely ineffective product. So what drives me is the influence we can have on enabling companies to achieve these amazing end applications and to really democratize AI, if you will. I know that’s a bit of an overstatement, but it truly is the case if you can have access to that data. That’s 80% of the effort, and that’s what really makes the distinction between a high-performing AI application and one that, quite frankly, doesn’t function.

Alexander Ferguson 16:27
If you had to give a word of advice to a business leader who is looking to build an application or a machine learning algorithm and is looking at their data problems, what word of wisdom would you share?

Wendy Gonzalez 16:42
I would say: get a training data strategy. It’s often, honestly, overlooked. I wouldn’t say it’s widely overlooked, but even at some of the most sophisticated companies in the world, the data scientists are focused on building their models. And because training data represents such a large level of effort, a number of research studies basically say that data preparation and labeling are about 80% of the effort; literally, the time it takes represents the vast majority of the work. So having a strategy up front (what data do I need, what is my quality rubric, how am I going to identify what good looks like), put in a simple, straightforward way, is incredibly helpful for making the process and the time to market much faster. Not only is it a more targeted approach, it’s going to end up being a lot less costly and take a lot less time. So keep that in mind, and know that it’s not one and done. Like we were talking about earlier, incorporating it into the cost of building and maintaining your product is really critical.
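Editor’s note: one way to make “what good looks like” concrete is to write the quality rubric down as explicit, checkable thresholds and audit each labeling batch against a small gold-standard set. The sketch below assumes labels are stored as item-to-label mappings; the `QualityRubric` fields, the 95% accuracy bar, the 50-item minimum audit size, and the data are hypothetical, not a rubric described in the interview.

```python
from dataclasses import dataclass


@dataclass
class QualityRubric:
    """Hypothetical rubric: what 'good' means for one labeling batch."""

    min_accuracy: float = 0.95  # share of audited labels that must match the gold label
    min_audit_size: int = 50    # refuse to judge a batch on too few audited items


def batch_passes(labels: dict[str, str], gold: dict[str, str], rubric: QualityRubric) -> bool:
    """Check delivered labels against a gold-standard audit subset."""
    audited = [item for item in gold if item in labels]
    if len(audited) < rubric.min_audit_size:
        return False  # not enough evidence to accept the batch
    matches = sum(labels[item] == gold[item] for item in audited)
    return matches / len(audited) >= rubric.min_accuracy


# Hypothetical audit: 60 gold items, one of which was labeled differently.
gold = {f"img_{i}": "granola bar" for i in range(60)}
labels = dict(gold)
labels["img_0"] = "cereal bar"

print(batch_passes(labels, gold, QualityRubric()))
print(batch_passes(labels, gold, QualityRubric(min_accuracy=0.99)))
```

Keeping the thresholds in one small object makes the rubric easy to review up front and to tighten later as the model, and the definition of “good,” evolves.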

Alexander Ferguson 17:59
For your roadmap and platform, what can you share about exciting features that have just been rolled out or will be rolled out shortly?

Wendy Gonzalez 18:08
Yeah, we’re seeing a lot of really interesting activity in 3D point cloud, so lidar, and what’s called sensor fusion, looking at 3D imagery as well as 2D imagery. We have capabilities there, and we’re constantly looking to evolve and grow them. We also see a lot happening in the video space. We delivered a video annotation platform a couple of years ago and are consistently working on enhancements to drive automation and accuracy. So those are a couple of really interesting focus areas. The other component is something called model optimization, which is really about how you address data drift and create more visibility for your clients, to help them understand where they need to refresh and retrain their models. We’ve done a lot of really exciting development in the analytics space; I think that’s part of our secret sauce. We’re highly instrumented on every aspect of our platform, which allows us to detect data drift, and that’s exciting.

Alexander Ferguson 19:18
What’s the vision for Sama? Where are you guys headed?

Wendy Gonzalez 19:23
Oh, definitely. We really want to be the end-to-end training data pipeline for our partners. I really view us as an enablement platform. We’re not here to create models, for example, that might one day compete with our clients; we’re there to enable the end application, and to do so on an end-to-end basis for the data. That’s everything from supporting them on what data they need, to the best tactic or thinking for labeling high volumes of data, to the ability to maintain and refresh that data to tune and support their models as they go through the application development process.

Alexander Ferguson 20:05
Wendy, I want to ask you a question here. People are always scared of AI, whether it’s Skynet and other scenarios where AI turns out badly, or AI and bias, and there are so many horror stories and concerns about AI taking jobs away. Can you share any happy news? How can AI be used for good?

Wendy Gonzalez 20:26
Absolutely. We’re seeing so many exciting applications of AI addressing all sorts of social ills and issues. I mentioned the Nature Conservancy before; one thing we’ve done with them, for example, is help improve the sustainability of fisheries so that overfished areas aren’t depleted. It’s a combination of protecting the environment by monitoring fish growth and movement, and advising the fishermen whose livelihoods depend on this about how they can work in sustainable areas. That way they can keep their livelihoods without overfishing and harming the ecosystem.

Alexander Ferguson 21:08
So, equipping fishermen to be better at fishing. How does that actually work?

Wendy Gonzalez 21:14
Yeah, it’s amazing. You can use computer vision to understand and detect fish movement. I didn’t think I would be putting those words together in a sentence in my lifetime, but yes, fish movement is a thing. And there are so many different applications, Alex, some that we’re so proud to be a part of. We’re working with a couple of companies on food waste, where we’re understanding how companies are producing food, everywhere from Kenya and beyond. How do you make sure the right amount of food is prepared so that you’re not dumping all this excess food? And if you do have too much, how does that food get channeled to the right places? It’s really about closing the gap: if we’re able to reduce food waste, we’re also able to have a positive effect on the environment, because less wasted food means less of the CO2, the water, and everything else used to produce it. So those are just a couple of examples of really interesting AI for social good. But there are so many more, on everything from sustainability to equity. For example, LinkedIn is doing some really interesting things to ensure that the candidates it surfaces come from diverse backgrounds. So there’s a whole bunch of really interesting stuff happening.

Alexander Ferguson 22:36
You were just telling me about another use case of AI when it comes to chicken eggs.

Wendy Gonzalez 22:42
Yeah. An unfortunate fact of producing eggs is that oftentimes, in order to recognize the gender of the chick, so that you know whether it will be an egg producer or not, they actually have to destroy the chick. One startup has identified a way to identify gender at the egg stage, so that you can avoid destroying millions of little chicks a year for the purpose of producing more eggs. There are some pretty incredible and cool problems that people are solving with AI.

Alexander Ferguson 23:17
Last question for you, just a fun one. I’m curious about tech predictions, looking ahead near term and long term. What do you think we can expect to see when it comes to tech innovation?

Wendy Gonzalez 23:30
That is a great question. Wow, there’s so much out there. Let me start with what I predict won’t happen in the near term, which is the opposite way to answer your question: we’ve got a long way to go. If we’re using a baseball analogy, I’d say we’re probably in the first or second inning of the ballgame as it relates to the deployment of AI. There is so much left to be done. I think we are many, many years away from the notion of generalized world views or concepts, and I would say that across vision, language, and audio. But in the meantime, I think there is an incredible amount of gain to be had across the board for companies that are looking at how to reduce the cost of deploying whatever function it is. On the manufacturing side, we’re seeing a lot of really interesting applications to enhance things like defect detection, so a lot of energy is going into manufacturing. We see a lot coming out of the healthcare space as well: everything from robot-assisted surgeries, which are absolutely incredible, to things like disease detection. There is some amazing work that can be done, and advancements that we see happening there. Those are the areas, in addition to, of course, transportation and consumer media, where we’re already seeing some very good traction. Over the next couple of years, AI is going to be pervasive across literally every industry application. There’s just too much to be gained, if you will. So that’s a broad answer to say that the Singularity isn’t happening anytime soon, and we won’t be worrying about it in the course of the next few years; there’s a lot of work to do.
But it’s incredibly exciting to see how pervasive this is.

Alexander Ferguson 25:32
I appreciate your excitement for the future of AI. I am with you 100% that the use cases are numerous and exciting, and that they depend on accurate data to build the right algorithms. So for those out there, if you do need accurate training data, you can go to Sama, where you’ll be able to learn more and request a demo. Thank you again, Wendy, for your time and for sharing your insight in this discussion. It’s been a good time.

Wendy Gonzalez 26:00
Great. Thank you, Alex. I appreciate it.

Alexander Ferguson 26:04
All right, everyone. We’ll see you on the next episode of UpTech Report. Bye. That concludes the audio version of this episode. To see the original and more, visit our UpTech Report YouTube channel. If you know a tech company we should interview, you can nominate them at UpTech. Or if you just prefer to listen, make sure you’re subscribed to this series on Apple Podcasts, Spotify, or your favorite podcasting app.
