
Data Science for All | Pedro Alves from Ople

Perhaps one of the most significant byproducts of the digital era has been the ability to gather and process massive amounts of data to gain a better understanding of the world around us.

We can now look at business, medicine, education, sports, politics, and more, and discover patterns and relationships that were previously impossible to see.

To date, the everyday application of this ability has remained a kind of wizardry for data scientists. But our guest on this edition of UpTech Report wants to put this power in the hands of anyone.

Pedro Alves is the founder and CEO of Ople, a company that offers a platform that helps organizations gain insights and recognize patterns using machine learning and AI. In our conversation, Pedro explains how this technology works, and gives some fascinating examples of how it’s applied.

More information: https://ople.ai/


Pedro Alves is the founder and CEO of Ople Inc. in San Mateo, CA. Pedro loves data science and has spent the past nineteen years working in the area of artificial intelligence – spanning predicting, analyzing and visualizing data across social media content, photos, genomics, insurance fraud/costs, social graphs, human attraction, spam detection, and topic modeling to name a few.

Realizing how rarely companies get a return on investment in the areas of AI and data science, Pedro decided to found Ople.

Ople.AI empowers organizations to utilize predictive analytics, enabling users to gain deeper insights from the historical data to optimize future business outcomes. With its unique Automated Machine Learning technology, the Ople.AI Platform intelligently manages all the complex operations, including data preparation, feature engineering, model creation, optimization, and deployment.

Pedro also enjoys the startup world, his family, intermittent fasting and helping other companies and startups make the most out of their data science efforts through advisory work.

DISCLAIMER: Below is an AI-generated transcript. There could be a few typos, but it should be at least 90% accurate. Watch the video or listen to the podcast for the full experience!

Pedro Alves 0:00
I always ask, is anybody at the company using that model? You know, nine times out of 10, the answer is, I don’t know. To which I respond, you know, how can you tell me this is a successful project if you don’t even know somebody’s using it?

Alexander Ferguson 0:23
Pedro, I’m excited to be able to chat today and hear more about Ople AI. To begin, can you share: what do you do? What is the concept of the company, in five seconds, very brief?

Pedro Alves 0:35
To help companies get a return on investment in AI.

Alexander Ferguson 0:40
And everyone likes a simple return on investment. Yeah, that’s important. So tell me, where did it begin? What was the problem that you initially saw, like, I need to solve this?

Pedro Alves 0:51
So I’ve been a data scientist for 19 years, some in academia, some in industry, and I started to see the same pattern of problems, which was: even though good scientific work was being done, whether in companies or labs or, you know, research institutions, there was still a gap between, you know, a model being finished and a real return on investment happening. And seeing that gap, that discrepancy, started building up frustration in me. And, you know, seeing that AI is such a wonderful tool, a wonderful field, that wasn’t producing real results for the business, I wanted to do something about that.

Alexander Ferguson 1:41
You saw the potential of it, but saw it wasn’t really being used, really solving problems in the marketplace. What was the first iteration then of the product? Like, alright, now I’m going to make use of it. How did that begin? And where are you now today?

Pedro Alves 1:58
Yeah. So there’s a lot of steps. And like you said, what was the first iteration? What is it now? And those, you know, obviously changed; I think very few products nail it the very first time around. So to start, we created a tool that would take the work of the data scientist and help speed it up. Right, because if you think about what the gap in my mind was, it was the model being finished and handed off to somebody that’s not a data scientist, because that’s ultimately the person who’s gonna use it. It’s a model to improve something for somebody in marketing or sales or logistics, and those are the people using it, right. And when that happens, those people wouldn’t use the model, either because they didn’t trust the model, understand the model, or know what to do with the output. And so those things can be solved with more communication between the data science team, or data scientist, and the final users. The other problem is also problem formulation. Even before we start building a model, data scientists would be building things that nobody else wanted, because there wasn’t enough communication. So I thought, okay, if I automate the work that data scientists spend most of their time on, then they’ll have more time to do more communicating, right? Because I’m freeing them from the thing that takes all the time. And I was very wrong, because the reason they became data scientists was because they liked doing that thing that was taking all their time. And communication was not their real interest.

Yeah, and you know, that’s fine. There’s nothing wrong with that. And, you know, that kind of was highlighted throughout my career. I’ve interviewed over 200 data scientists. And one quick question that I always ask is, tell me about a successful project. And they’ll spend 15 minutes telling me about all these techniques and cool algorithms. And it’s fascinating, and I like it; I’m a technical guy. So you know, we’ll spend the time chatting about it. When they’re finished, I always ask them: is anybody at the company using that model you built? Nine times out of 10, the answer is, I don’t know. To which I respond, you know, how can you tell me this is a successful project if you don’t even know somebody’s using it? You’re telling me it’s 94% accurate? How many correct predictions did it make this year? Nobody was using it. Zero. It was 0% accurate, right? So, you know, that kind of highlights the point that I’m making. And to the trust component: I mean, my first job was in health care. Just imagine a fresh-out-of-college kid with no medical background, except, you know, I did some genomics and stuff, but not really medicine, telling doctors with 30 years of experience how to do their work, right? Because I built a model that says, hey, this patient, this is how this disease is gonna progress, and having them trust, right, a kid that’s barely shaving?

Alexander Ferguson 5:12
You know, that’s not gonna happen.

Pedro Alves 5:14
Not gonna happen, because, you know, the medical field is known for how humble the doctors are, right? Like, they have a history of humbleness. Very much, very much. So anyway, that’s where the product started, right? And then I realized, okay, let’s shift then and see if we can help the non data scientists directly. Right? If we create a tool that’s even easier to use for non data scientists, where they can build their own models and get some value out of it, then maybe we’re going to be helping solve the problem, right? And then it kind of evolved into, okay, what are all the things they need to trust, understand, and know what to do with the output, right, those things that I mentioned are the gap. And once we had those in the product and started interacting with the non data scientists, we also found out something else that was interesting, which was, a lot of times they don’t even care about the model, the prediction, which is the holy grail of why somebody built the model. Sometimes it’s almost secondary, and they just want to understand more about their data. So much more than just, you know, some kind of prediction. And so we started what I dubbed, and I didn’t write about this yet, so the term doesn’t exist, but I’m hoping it’ll catch on, which is data explainability. Right? Everybody talks about model explainability. What about data explainability? What can the model tell us about the data? That’s interesting, right? Things that you wouldn’t be able to find just because the combinatorial explosion, which is factorial, way worse than exponential, right, makes it impossible. So that’s something that we’re going after with the product, and we have a lot of things that we just added that are under that realm, that we’re calling data explainability.

Alexander Ferguson 7:10
So it’s really, you realized communication is the issue. They’re not communicating both before they started, are they solving the right problem or trying to build the right model, and then after, is anybody using it? How are they using it? And where you’ve realized the gap, it sounds like, is they don’t just want, hey, here’s a prediction, go take this and do this. They want to understand facts, interesting knowledge about their data they didn’t know before, so that they can go and make better educated decisions. And so you’re trying to bridge the gap between the data scientists and data analysts looking at the data, and those who are using it, so that they can work better together. Is that correct? Because for your business model, it sounds like your target market has evolved. Is it the data scientists, or is it the business users? The business users? Yeah. But the data scientists are the ones who are on the platform using it as well?

Pedro Alves 8:10
No, the non data scientists are: the business user, data analysts, business analysts, the Tableau user, the Excel spreadsheet user.

Alexander Ferguson 8:18
They’re the ones using it. Yeah. Got it. For a good use case, an example, can you share one of the more recent ones with any clients that kind of gives a good taste of it in action?

Pedro Alves 8:33
Yeah, one is a really simple example. It’s just with sales leads, right? It’s an organization, they have a small sales team, I think 25 sales execs, and they try to sell their product; they have a software product as well. And they sold it to about 100 customers, and they tried to sell to another 200 that didn’t buy it. So they had a small little data set of 300. And then they went to one of those companies that sell sales leads, where you buy, like, here’s the email you need, the contact information, the company size, and whatever. So they have this massive data set of 30,000 sales leads. And they’re like, well, we’re not going to go after 30,000 sales leads. And, you know, we could go after them randomly, or alphabetically, or some other non-intelligent way. Or we could build a model that at least gives us slightly higher chances of going after the deals that might have a chance to close, and that’s how they use our tool. They input the data of the 300 deals that they tried to sell, with 100 successes and 200 failures, and that’s a really small data set. But you know, it’s better than nothing, right? That’s the question people always ask: is this data too big? Is this data too small? It’s like, if it’s all you have, then what good does it do to tell you it’s too small? You’re not going to magically have more data. Use what you have today; it’s better than no model. Right, unless it’s, like, insanely small, but in this case, I wouldn’t call it insanely small. So that’s one use case. The other one, it’s a company, they’re consultants, but they also sell, like, an analytics package, you know, put together with Tableau, where they look at your data and they give you insights about it. And this one is for hospitals, when they’re billing the insurance companies on procedures they’ve done, to get, you know, refunded by the insurance companies.
And it’s a model that basically tells them things that they might have filed incorrectly, you know, problems with the filing. There’s so much money that’s lost because of the back and forth. You know, the insurance company goes back and says, wait, you forgot this or that, or you didn’t file this correctly. And for one of their customers, they said it’s something like 20 to $25 million a month that they lose because they didn’t file things properly. So if we can save even a small percentage of that, that’s an insane amount of money, just from having a model, because they can’t go over and double-check everything. There are too many filings; they don’t have enough people.
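The first use case Pedro describes, scoring a big pool of purchased leads with a model trained on only 300 labeled deals, can be sketched in a few lines. This is a hypothetical illustration with synthetic data and invented features (Ople’s platform automates this end to end); the sketch below is just a plain logistic regression fit by gradient descent:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit a small logistic regression with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

random.seed(0)
# Synthetic stand-in for the ~300 historical deals (two made-up numeric
# features, e.g. company size and engagement, already standardized).
history = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(300)]
labels = [1 if x0 + 0.5 * x1 + random.gauss(0, 0.5) > 0.3 else 0
          for x0, x1 in history]
w, b = train_logistic(history, labels)

# Score the big pool of purchased leads and work the top of the list first.
leads = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
ranked = sorted(zip(score(leads, w, b), leads), reverse=True)
print("top lead probability: %.2f" % ranked[0][0])
```

With only 300 rows the probabilities are rough, which is exactly Pedro’s point: even a rough ranking beats calling leads alphabetically.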

Alexander Ferguson 11:11
Yeah. What is the opposite? If they weren’t using your platform, what would they be using, and what gain do they get by using your platform?

Pedro Alves 11:22
Well, nothing. You know, for some of these scenarios, it’s literally nothing. There are too many things to go after with too few resources. And even though they’re losing that much money, the hit rate would be so small that it still wouldn’t justify it. So they just accept it as a loss; they average it in as, well, that’s the price of doing business, we lose X many million a month. And that’s that.

Alexander Ferguson 11:54
But for your product, Ople, what would a data scientist or a business analyst be using otherwise? And if they came to yours, how is it different?

Pedro Alves 12:10
Yeah, so a data analyst would have to partner with a data science team, or a data scientist, and have the conversations and build the model. And, you know, that would be a several-month-long engagement, especially if you add the fact that most data science teams are really busy, because they’re being bombarded from every department of the company saying, build me this, build me that. And they have to choose, right, they have to prioritize, and sometimes you’re going to be at the bottom of the list. And you might have to wait a long time. I mean, I’ve seen projects where I’ve talked to other data scientists, and they said, oh yeah, that team has been waiting for this project for two years. And here’s the honest answer: they’re never going to get it. The reason is, it’s not a fair line of progression. Because if you have, like, a sales team, marketing team, logistics team, and let’s say, from the CEO down, they’re like, look, Team A has priority because they’re more important to us: Team A is going to produce new projects every few months. So they’re never going to run out of projects, which means you’re at the bottom of the list, and you’re never getting to the top. Team A, every three months, new project, new project, and every project Team A produces is more important than yours. And that’s why the data scientist was honest with me and said, that team’s never gonna get it, like, I’m never gonna have time to do their project. They’ve been waiting for two years, they’re gonna wait for another two, and another two, and another two. So, you know, there’s so much AutoML coming out there, and there are so many data scientists afraid that AutoML is threatening their jobs. And that’s just very wrong. The way I see it is, currently, imagine even a company with 20,000 people; they might have only 10 people that are AI literate, right?
The idea is, the job of the future for the data scientist is not to be the one out of 10,000 that is AI literate. It’s for them to leverage AI and software like AutoML, to then be the shepherd that teaches and spreads AI literacy through the company, and to be kind of the pillar of, you know, help and support. Right? So now he’s equipping thousands to use the software, and he’s still the trusted source to help, to teach, to engage with them, to make sure they’re using it correctly. That’s a super important job. That’s a fantastic job. And there are always going to be some projects that require non-conventional data science and building algorithms that are bespoke, and those are still always going to fall on them, because they require research, they require things that are not just, okay, train a model, tune it, the usual rigmarole, right. And those exceptional projects are still going to fall in their lap.

Alexander Ferguson 15:14
Wow, you pose an interesting concept of data scientists, those who are AI literate, going from head-down laborers to educators and empowerers, empowering the entire business to use AI.

Pedro Alves 15:29
I think that’s what I call data scientist 2.0. We’re, I think, a few years away from that transition happening. It’s not gonna be a friendly transition, it really isn’t. I mean, didn’t the same thing happen with DevOps people and the cloud, right? The cloud came, and every DevOps person was, you know, scared, right? Saying, oh my gosh, my job is at risk, right? Because if my job currently is to run our own cloud, and the cloud out there is offering what I do, then I’m out of a job. They’re not out of a job; there’s still tons of DevOps people, but now their job is managing something much bigger. It’s an elevated position. But it wasn’t a friendly embrace when the cloud came. DevOps people were saying, oh no, we don’t need the cloud, I got this, I got our own little cluster here. You know, and eventually the companies said, you know what, this is way more efficient. And I think the same thing’s gonna happen, and very few are going to be ahead of the curve and adapt. The majority are then going to follow and say, they’re not hiring me for this anymore, but they’re hiring me for this new position, so I better change with the times.

Alexander Ferguson 16:37
I’d love to then push forward a little bit more. What’s your most favorite feature that you guys have recently released, that you’re pretty psyched about?

Pedro Alves 16:47
It’s part of the data explainability. It’s what we call intelligent subpopulations. Sorry, interesting subpopulations. It’s under intelligent insights, that’s like the header, but then it’s interesting subpopulations. So what does that mean? A subpopulation is some kind of intersect of feature values. So for example, let’s say you were trying to sell, whatever, T-shirts online, and you send out these promotional emails to people to try to get them to come and buy. And you’re trying to predict which email best suits which person, to increase the probability of them buying, right? And then you build the model to predict: if I send this email to this person, what’s the probability they’re going to buy our T-shirt? So you have demographic information about the people, maybe level of education, how much money they make, do they own a house, are they married, gender, age, etc. And let’s say, from understanding the data, you know that people over the age of 50 usually have an increased probability of buying your shirts. For some reason, this email works really well for people over the age of 50, right? Let’s say, similarly, people in the state of Florida: of all 50 states, your emails work best in the state of Florida. It’s a clear, like, 10 to 15% higher probability than any other state. But people over the age of 50 and in the state of Florida, the two things combined, you would expect to have an even higher probability of buying your shirt, and they have a negative 30%, compared to baseline, of buying your shirt. So that’s a really weird thing, right? And that subpopulation is defined by, again, people in Florida above the age of 50. That’s the subpopulation. And it’s interesting because it doesn’t behave as expected, right? For some reason, you do well with everybody over the age of 50, and everybody in Florida, except for those two combined. So there’s something there that you need to look at.
I don’t know what it is. You understand your business, you might, and you certainly need to look at that. So we’re automatically finding all these interesting pockets, interesting subpopulations, and bringing to your attention really unexpected behavior, where two feature values together behave erratically. It’s very hard; you can’t just look at every combination, because it’s factorial growth. So very quickly, even with a small number of features and feature values, you’re talking about trillions, quadrillions, quintillions of combinations. So you can’t just go searching for all of them. We automatically find them and bring the most interesting ones to your attention.
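The idea behind these interesting subpopulations can be sketched roughly as follows. How Ople actually scores “interestingness” isn’t described here, so this is a guess at the simplest version: for each pair of feature values, compare the pocket’s outcome rate against an independence-style expectation built from the two single-value lifts, and flag big deviations. The data, feature names, and thresholds are all invented:

```python
from itertools import combinations

def interesting_subpopulations(rows, outcome, min_size=30, gap=0.10):
    """Flag pairs of feature values whose combined outcome rate deviates
    sharply from what the two single-value rates alone would suggest."""
    baseline = sum(r[outcome] for r in rows) / len(rows)
    features = [k for k in rows[0] if k != outcome]

    def rate(subset):
        return sum(r[outcome] for r in subset) / len(subset)

    findings = []
    for fa, fb in combinations(features, 2):
        for va, vb in {(r[fa], r[fb]) for r in rows}:
            sub = [r for r in rows if r[fa] == va and r[fb] == vb]
            if len(sub) < min_size:
                continue
            lift_a = rate([r for r in rows if r[fa] == va]) / baseline
            lift_b = rate([r for r in rows if r[fb] == vb]) / baseline
            expected = baseline * lift_a * lift_b  # multiply the two lifts
            observed = rate(sub)
            if abs(observed - expected) > gap:
                findings.append((fa, va, fb, vb, expected, observed))
    return findings

# Toy data mirroring the example: being over 50 helps, being in Florida
# helps, but the two combined unexpectedly hurt.
rows = []
for state in ("FL", "TX"):
    for over50 in (True, False):
        if state == "FL" and over50:
            buy_rate = 0.05          # the surprising pocket
        elif state == "FL" or over50:
            buy_rate = 0.40          # each factor alone helps
        else:
            buy_rate = 0.20
        n_buy = round(buy_rate * 200)
        for i in range(200):
            rows.append({"state": state, "over50": over50,
                         "bought": 1 if i < n_buy else 0})

for f in interesting_subpopulations(rows, "bought"):
    print(f)
```

On a tiny two-feature grid like this, every cell shifts once one of them does; on real data with dozens of features, the point Pedro makes applies: the pair and triple space grows factorially, so a product has to prune it intelligently rather than enumerate it.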

Alexander Ferguson 19:51
The magic of AutoML, automatically showing it to you. Can you share more, then, about the underlying technology of your platform? Let’s hear a little bit more about how it works and how it stands out from anything else out there.

Pedro Alves 20:04
So one of the things that we did from the very beginning that was very different, and that actually kind of opened the box to do all these different things, was this. If you look at most, probably all, AutoML companies, it’s usually a pretty straightforward formula, right? Data comes in, and there’s going to be a little bit of help with the data. Meaning, if there are missing values, everybody knows how to impute or replace with certain standards; if there are high-cardinality features, if there’s an imbalance in the data set, standard stuff. So you do something to these values, you kind of prepare the data set. Then there’s some feature engineering: you know, what if you try normalizing this feature? What if you try scaling it differently? Standard stuff. After you do all those different things with the data, then you start building models. So you say, well, there are all these different algorithms, why don’t we try them all? Why don’t we try boosting, and logistic regression, and a support vector machine, and, you know, maybe a neural network? And then within each of these algorithms, you have all the parameters. If it’s a random forest: 1,000 trees? 100 trees? How deep are the trees? Depth three, seven, 38? How do you choose when to stop the growth of the tree? Are you going to do some pruning on the trees? Every algorithm has a bunch of parameters, and you can’t just try all values of all parameters, so you’re going to use some kind of intelligent or pre-baked preset values. Now you go from one data set being tried with 10 algorithms, each of these algorithms being tried with 100 different sets of parameter values. So now you’re trying, you know, 1,000 or 2,000 different models. And then you say, okay, which one did best, and that’s it; you present that to your user and say, I tried these 2,000, this one is the best one.
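That “straightforward formula” (enumerate algorithms, enumerate parameter settings, keep the single winner) can be sketched with stand-in models. The two toy algorithms here, a k-nearest-neighbors vote and a one-split stump on a single made-up feature, are illustrations only, not what any particular AutoML product ships:

```python
import random

def knn_predict(train, k, x):
    """Majority vote among the k nearest training points (1-D distance)."""
    nearest = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    return round(sum(label for _, label in nearest) / k)

def stump_predict(threshold, x):
    """One-split 'decision tree': predict positive above the threshold."""
    return 1 if x > threshold else 0

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

random.seed(1)
# Toy 1-D dataset: the label is 1 when the feature is above 0.5, plus noise.
xs = [random.random() for _ in range(400)]
data = [(x, 1 if x + random.gauss(0, 0.1) > 0.5 else 0) for x in xs]
train, valid = data[:300], data[300:]

# Algorithms x parameter settings = the candidate grid.
candidates = []
for k in (1, 5, 15, 51):  # odd k avoids vote ties
    candidates.append((f"knn(k={k})", lambda x, k=k: knn_predict(train, k, x)))
for t in (0.3, 0.4, 0.5, 0.6, 0.7):
    candidates.append((f"stump(t={t})", lambda x, t=t: stump_predict(t, x)))

# Score every candidate on held-out data and present only the winner.
best_name, best_model = max(candidates, key=lambda c: accuracy(c[1], valid))
print("best of", len(candidates), "models:", best_name,
      "accuracy %.2f" % accuracy(best_model, valid))
```

Real systems swap in gradient boosting, SVMs, and neural networks for the toy models, and smarter search for the raw grid, but the select-the-single-best structure is the same.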
So we wanted to model what we did on the way people win Kaggle competitions. If your viewers don’t know, Kaggle is an online platform where companies put money down and say, hey, we’re GE, we would love it if you solved this problem for us, here’s $100,000. And then, you know, 10,000 data scientists go there and try to win that money and build the best models possible. I’ve had a lot of cool collaborations with the very best Kagglers in the world. I mean, we’ve hired the number one Kaggler, who in the past won, I think, over 50 competitions and millions of dollars in prizes, and we understand exactly how they work. And one thing that they all do in common is what’s called ensembles, which is: instead of using just the best model, why don’t we combine the best 100? It’s fascinating. It’s a whole conversation why ensembles work. And it started, like, a couple hundred years ago with a mathematician, a cousin of Darwin, I forget his name. But anyway, he realized that when they had those fairs where people guessed the weight of a cow, you could bring in an expert to guess the weight of the cow, somebody that raised cows and knew cows and had a really good guess. Or you could grab 200 random people and have them guess the weight of the cow. And if you averaged their guesses, it was usually better than the guess of the expert. It’s really interesting, right? To me, it makes visual sense. Think of, like, an expert dart thrower that can always throw a dart within a couple inches of the bull’s eye. And now imagine you have 1,000 people, and each of them throws 10,000 darts, so you have, like, millions of darts. If they all miss randomly, they completely fill the dartboard solid. What’s the average of that? The bull’s eye, right? Anyway, so ensembles work. So we said, okay, let’s do that, let’s ensemble.
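The ox-weighing observation is easy to reproduce numerically. The bias and noise figures below are invented purely to make the point: many independent, unbiased-but-noisy guesses, once averaged, usually beat one low-noise but slightly biased expert:

```python
import random

random.seed(42)
TRUE_WEIGHT = 1198  # pounds; a plausible ox weight for the story

def crowd_beats_expert(trials=1000, crowd_size=200):
    """How often does the crowd's average land closer to the truth
    than a single slightly biased, low-noise expert?"""
    wins = 0
    for _ in range(trials):
        expert = TRUE_WEIGHT + random.gauss(15, 20)   # small bias, small noise
        crowd_avg = sum(TRUE_WEIGHT + random.gauss(0, 150)  # no bias, big noise
                        for _ in range(crowd_size)) / crowd_size
        if abs(crowd_avg - TRUE_WEIGHT) < abs(expert - TRUE_WEIGHT):
            wins += 1
    return wins / trials

print("crowd wins %.0f%% of the time" % (100 * crowd_beats_expert()))
```

Averaging shrinks the crowd’s noise by a factor of √200 (150/√200 ≈ 11 pounds), while the expert’s 15-pound bias never averages away; the same arithmetic is why averaging many diverse models tends to beat the single best one.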

So we ensemble. The problem with ensembles is that you end up with something that, yes, is good for winning a competition, but no, is bad to put in production. It takes a lot of compute. It’s a little more flimsy, as far as smaller things can break, because you’re running different algorithms, sometimes different libraries, different languages. It’s just brittle, is the best way to say it. So people don’t want to put that in production, and it’s also costly to keep. So we said, why don’t we build this ensemble so that we know what the best model would look like, and then use the ensemble to teach one final model to behave like it? So normally, when you train a model, the model is trained on the data, right? All our models are trained on the data except for the last one. The last model is not trained on the data; it’s trained by the other models. And then that final model has all the advantages of a single model, but it has accuracy almost identical to that of the big ensemble. That was a cool differentiator in the beginning, and it certainly helped us raise money; it’s very cool technology. And at first, it just seemed like that was it, that’s all it could do for us. But that process of models teaching models kind of opened the door to a lot of the things that we’re doing now that allow us to do data explainability. It’s a kind of branch off of that technology, into things that you can’t do if you just have a model; you need to have some kind of different system of teaching in order to create these other insights.
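The models-teaching-models step Pedro describes is, in general form, knowledge distillation: fit one student model to the ensemble’s outputs instead of the raw labels. Ople’s actual mechanism isn’t spelled out here, so this is a deliberately tiny illustration, with a five-stump “teacher” ensemble on one made-up feature and a single-threshold student:

```python
import random

random.seed(7)

# "Teacher": an ensemble of five one-split models, averaged into a soft score.
teacher_thresholds = (0.40, 0.45, 0.50, 0.55, 0.60)

def ensemble_score(x):
    return sum(x > t for t in teacher_thresholds) / len(teacher_thresholds)

# The student never sees real labels, only the teacher's outputs on a pool
# of unlabeled points.
pool = [random.random() for _ in range(2000)]
soft_labels = [(x, ensemble_score(x)) for x in pool]

def agreement(threshold):
    """How often a single-threshold student reproduces the teacher's call."""
    return sum((x > threshold) == (s >= 0.5)
               for x, s in soft_labels) / len(soft_labels)

# Distillation: choose the one simple model that best mimics the ensemble.
student = max((t / 100 for t in range(30, 71)), key=agreement)
print("student threshold %.2f, agreement %.3f" % (student, agreement(student)))
```

The student collapses the ensemble into one cheap, production-friendly model with essentially the same decision boundary, which is the trade being described: ensemble-level accuracy without ensemble-level serving cost.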

Alexander Ferguson 25:57
Adding those additional layers that keep it stable, but give it better accuracy, and still very easy to use. I did a quick Google search. So for the audience, I Googled “cousin of Darwin math,” and I see Sir Francis Galton.

Pedro Alves 26:13
Francis Galton, yeah. Not Bacon, no, not Bacon.

Alexander Ferguson 26:20
That’s an amazing concept of how that can play a major role. So for you, moving forward, this is three, four years in now? Almost four, yeah. What are you most excited about coming up, going forward into 2021 and beyond?

Pedro Alves 26:45
So the whole data explainability thing is really new. We worked a lot on that this year, and we’re just now one week away from releasing the subpopulations. So it actually comes out this coming week. And we have two more things already after that: one that’s about a month out, and then the other one is probably another month after that. So I’m excited to see how customers and how people react to that. I mean, we’ve already presented this to people, but not in the product. We’ve presented, like, here’s what the screens are going to look like, here’s what it’s going to do, and we’re trying to get people’s reactions. And so far, the reactions are great. But you know, I want to see what it’s like when it’s actually working in the product and people get their hands on it, and then see if it has the value that I see and that the feedback we’re getting suggests people really want. So I’m excited to see how that plays out, and to open up a whole new field of what the value is behind building machine learning models.

Alexander Ferguson 27:48
Who are the ideal companies, in size and demographic, that can use your platform? Is it a certain size?

Pedro Alves 27:59
Well, I mean, we’ve dealt with companies of every size. Our smallest customer is, I think, five people, the entire company. It’s a small startup, you know, at the beginning; I think they’re in their first year, they just raised some money through an accelerator, I want to say three to $400,000, and like I said, I think five people. And our biggest customer is one of the Fortune 50. So they’re big, big, right? Yeah. So you know, it’s the entire range. Right now, we’ve noticed, kind of stumbled upon, the fact that consultants, consulting companies and system integrators, have really fallen in love with the product, and we’re seeing a lot more success with those companies. A lot of them want to actually embed our software into their software. And so they want to basically offer in their software, right, the ability of not only AutoML, but the whole data explainability and model explainability stuff, and all these values of analytics. And most of the time, it’s not even that they want their customers to build their own models. They’re building the models for their customers, but they want to expose to their customers all these insights that we’re finding. So we’re making it easy for them to embed our software into their software.

Alexander Ferguson 29:22
So it’s more like, almost like an API, or a license, the ability to extend that visibility on to their end customer. Yeah. Gotcha. Your overall business model: is it just a monthly, or a yearly subscription? How does that work?

Pedro Alves 29:38
Yeah, it’s a yearly subscription.

Alexander Ferguson 29:41
Gotcha. And if people want to learn more, where do they go, and what’s a good first step for them to take?

Pedro Alves 29:48
Ople.ai, and you can contact us there if you want. I mean, you could sign up right away to start using it for free, if you want to play with it on your own. But if you need help, if you want to talk to us to understand a little more, we’re happy to schedule some time and show you the product and give you access, but also give you a little one-hour lesson and talk to you about what you think you want to build with AI and data science. We’re happy to do that.

Alexander Ferguson 30:13
Have you seen a company using AI, machine learning, or other technology to transform the way we live, work, and do business? Go to UpTechReport.com and let us know.
