Rewarding the Robots | Chris Nicholson from Pathmind

Anyone who’s seen the film Moneyball understands how computer simulations and statistical analysis have totally transformed the world of sports. Well, it’s not just sports. The same technology that’s used to assess how a batter gets on base is used to analyze how a factory worker retrieves a box.

The vast complexity of supply chain and manufacturing systems makes these industries perfectly positioned for assistance from AI and machine learning. But these technologies are not simple, and even once simulations are designed, there are opportunities to take them further.

Our guest on this edition of UpTech Report is doing that with his company, Pathmind, which offers AI and Deep Reinforcement Learning technologies to bring simulations in supply chain and manufacturing sectors to their fullest potential.

Chris Nicholson is Pathmind’s founder and CEO, and he joins us to explain how Deep Reinforcement Learning uses a reward-based approach to train AI, and he discusses the numerous ways it can help companies increase worker efficiency, save energy, and make smart recommendations for better decision making.

More information:

Chris Nicholson is the founder of Pathmind, an AI startup that applies deep reinforcement learning to supply chain and industrial operations. Pathmind optimizes performance in warehouses and on factory floors, using cloud and edge compute.

Chris previously founded Skymind, an AI company focused on deep learning and machine perception. Before that, he headed communications and recruiting for FutureAdvisor, a Sequoia-backed Y Combinator startup acquired by BlackRock.

DISCLAIMER: Below is an AI-generated transcript. There could be a few typos, but it should be at least 90% accurate. Watch the video or listen to the podcast for the full experience!

Chris Nicholson 0:00
So one way to think about deep reinforcement learning is that it makes agents, like, say, autonomous vehicles or processing machines, aware of each other, so that they can act as a team. And it’s that teamwork that actually produces radically different results.

Alexander Ferguson 0:19
Welcome to UpTech Report. This is our Apply Tech series. UpTech Report is sponsored by TeraLeap; learn how to leverage the power of video. Today, I’m joined by my guest, Chris Nicholson, who’s based in San Francisco, California. He’s the founder and CEO of Pathmind. Welcome, Chris. Good to have you on, man.

Chris Nicholson 0:37
Hi, Alex. Thanks for having me.

Alexander Ferguson 0:39
Now, Pathmind: what I’ve got from your site is that it’s a SaaS platform that enables businesses to apply deep reinforcement learning to real-world scenarios, without data science expertise. Your focus is particularly on manufacturing and supply chain organizations, and you’re trying to help industrial engineers and simulation modelers get better results with reinforcement learning. So this may be a platform to check out for those out there who are interested. Now, on your site, Chris, you state: make better decisions with AI. Help me understand, how did you guys start with Pathmind? What was the problem you were looking to solve?

Chris Nicholson 1:16
You know, I got into AI almost eight years ago now, through the door of deep learning. Deep learning, a lot of people have heard about it; it’s basically deep neural networks, and it’s really made huge strides in our ability to, say, recognize images, understand language, and understand speech as well. So I was coming into AI through that door of deep learning. And it turns out that there was this thing called deep reinforcement learning that actually does something beyond recognition, beyond, say, object recognition in an image. It’s not a perceptive form of AI; it’s what we would call a prescriptive form of AI. It tells you what you ought to do. It makes decisions, usually in a simulated environment, like a video game. So for me, and I think for a lot of people, that’s a lot more impressive. And it’s a lot more intelligent, to decide what to do in order to achieve your goal, right?

Alexander Ferguson 2:18
Somewhat closer to what a human thinks, right? Versus an AI just saying, “Oh, this is green, this is blue.” Versus, “This is green and blue, and you should probably go with the green one because of X.”

Chris Nicholson 2:28
Yeah. And if you see an algorithm recognize a kitten in a photo, you do think to yourself: my three-year-old can do that. But if you see an algorithm win the game of Go against the world champion, you never think: my three-year-old can do that. So it’s another level.

Alexander Ferguson 2:42
That’s where deep reinforcement learning comes into play. Yeah. Help me understand use cases for deep reinforcement learning right now, actual use cases that you’ve seen.

Chris Nicholson 2:56
Yeah, so beyond the video games, which everybody’s kind of seen, there are actually a lot of really interesting real-world use cases, especially in business, where you can quantify your rewards. Obviously, everybody in business is driven to achieve certain goals. Some are intangible; some are very quantifiable, especially around profit, safety, and carbon emissions. We can really measure what’s going on. And as soon as you can measure how close you are to your goal, deep reinforcement learning can actually help you, because the way reinforcement learning works is it takes penalties and rewards. So let’s say whether you reach your goal or not would carry a reward; how close you are to that goal might carry a reward. You can take those penalties and rewards, and you can learn from them. In classical deep learning with object recognition, if you say that’s a dog when it’s really a cat, you’re wrong, and your predictive model has to learn; it updates. In deep reinforcement learning, the structure is really: I take a step, and if I get closer to my goal, if I reach my goal, I get a reward. And if I get farther away, I get a penalty, and I have to learn. So anything you can conceive of like that, deep reinforcement learning can be applied to. And specifically, concretely, we see it being applied in a lot of factories, and a lot of warehouses and supply chain nodes, where people care about throughput and efficiency. So they’re measuring very closely: how many items am I moving through my factory? How quickly am I moving those items, say onto a truck, to get out of here? How fast are my machines able to operate? Is anybody running them? How many collisions am I causing as I move these things through? All those things feed into the rewards that we give the algorithm. And then the algorithm, based on its vast digital experience, is going to be telling you: hey, I see this going on in the factory now; I think you ought to do these things to increase your throughput.
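The penalty-and-reward structure Chris describes can be sketched as a simple per-step scoring function. This is an illustrative sketch, not Pathmind's actual reward function; every name and weight here is a made-up assumption:

```python
def reward(items_moved, collisions, distance_to_goal, prev_distance_to_goal):
    """Score one simulation step: reward throughput and progress, penalize collisions.

    All weights are hypothetical; a real system would tune them to the
    operation's own measures of throughput, safety, and efficiency.
    """
    r = 0.0
    r += 1.0 * items_moved                         # throughput: each item moved earns a reward
    r -= 5.0 * collisions                          # safety: collisions carry a steep penalty
    r += prev_distance_to_goal - distance_to_goal  # progress: getting closer to the goal pays
    return r

# A step that moves 3 items, causes no collisions, and closes 3 units of distance:
print(reward(items_moved=3, collisions=0,
             distance_to_goal=2.0, prev_distance_to_goal=5.0))  # 6.0
```

The weights encode what the operator cares about: making the collision penalty much larger than the per-item reward tells the learning agent that safety outranks raw speed.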

Alexander Ferguson 4:55
Is all deep reinforcement learning the same, no matter where you apply it? Or is it applied differently each time?

Chris Nicholson 5:05
There are a ton of choices to make with deep reinforcement learning, and that’s true of most of AI. So it typically requires a lot of expertise, because expertise, obviously, is the ability to make the right choices. Unfortunately, not a lot of people in the world have been trained to use deep reinforcement learning, even fewer than the ones who’ve been trained to use AI in general. And so one reason why we exist is that those people could probably benefit from the accuracy of deep reinforcement learning, but they don’t yet have the training. So one thing that Pathmind does is vastly simplify what it takes to get good results out of a deep reinforcement learning agent.

Alexander Ferguson 5:47
Help me understand: someone that uses your platform, what are they using, or what are they doing, before they start to implement deep reinforcement learning? And the next question: are people coming to you having already tried deep reinforcement learning? Or are they coming to you fresh out of the gate, like, “I’ve heard about this deep learning, now I’m ready to do it, and you might be a good solution”? Help me understand that.

Chris Nicholson 6:10
Yeah, it’s a bit of both, actually. Deep reinforcement learning is just one optimization tool among many, and these are industries that have cared a lot about optimization for a long time. The way they’ve approached optimization in the past is with tools called mathematical solvers. Linear programming, that’s a really common tool that people use, and it’s very useful, actually, but it has some limits. One of the places where mathematical solvers run into a kind of wall is when you apply them to really complex and dynamic situations. Maybe you have multiple agents moving on a factory floor; maybe you have a lot of variability in your inputs, or in the environment somehow. Those mathematical solvers suddenly can’t solve the problem anymore; it’s too complex and too dynamic. And, unfortunately, a lot of what you and I and factory operators encounter in real life are complex, dynamic situations. So if I want an optimizer that can actually understand where I’m at and the dilemmas I face, and make a recommendation to me, deep reinforcement learning is one of the only algorithms I can turn to.
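The contrast can be made concrete with a toy. A solver computes a fixed plan up front; a reinforcement learner improves its policy from repeated trial and error, so the same update rule keeps working as conditions change. The sketch below is minimal tabular Q-learning on a hypothetical one-dimensional world (nothing like a real factory, and not Pathmind's implementation), where the agent learns that moving right reaches the goal:

```python
import random

def train_q(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a line: start at state 0, goal at the last state.

    Actions: 0 = step left, 1 = step right. Reaching the goal yields reward 1;
    every other step yields 0. The Q-table stores the learned value of each
    (state, action) pair.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy: mostly exploit the current best action, sometimes explore.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 1 if q[s][1] >= q[s][0] else 0
            s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Standard Q-learning update toward reward plus discounted future value.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train_q()
# After training, the greedy action in every non-goal state should be "right" (1).
print([1 if q[s][1] > q[s][0] else 0 for s in range(4)])
```

No planner wrote out "always move right"; the policy falls out of repeated penalties and rewards, which is why the same machinery keeps working when the environment is too messy for a closed-form solver.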

Alexander Ferguson 7:23
Would you say that it really takes it to a whole other level, a different level, that maybe they hadn’t had before? And if that’s the case, what would be the advice that you’d give to someone in the supply chain or manufacturing space when they’re looking at applying it, and they’re trying to do it the right way, or the best way possible? What advice would you be giving?

Chris Nicholson 7:47
Yeah, deep reinforcement learning can take it to another level, depending on the problem, of course. But we do consistently see that people are achieving, let’s say, 10% improvements in their performance; that’s just something pretty typical for us. Sometimes it’s a lot more; sometimes it’s a bit less. But these double-digit percentage improvements are quite common. And then the question people always ask is: how is that happening? How is that possible? Because in supply chain, and in manufacturing and logistics, they’ve been operating lean for so long, they’ve optimized the heck out of everything. They fight hard to squeeze another 1% of efficiency out of their operations. So how does deep reinforcement learning get an additional 10? It’s implausible, to say the least. And the answer to that question is that deep reinforcement learning can coordinate multiple agents at the same time. If you’ve got a factory or a warehouse, it’s never a single-agent situation. There’s always a crowd; you’ve got a lot of moving parts interacting, moving over the field. And it’s really helpful to think of all those moving parts as being part of a team. If they’re incentivized individually, they’re going to get in each other’s way, just as a team of five-year-olds might get in each other’s way. But if you incentivize them, and you train them to play as a team, then all of a sudden you’re at Junior Olympics level. They cohere on another scale, and they become a unit. So one way to think about deep reinforcement learning is that it makes agents, like, say, autonomous vehicles or processing machines, aware of each other so that they can act as a team. And it’s that teamwork that actually produces radically different results.

Alexander Ferguson 9:36
Help me understand the history for you guys. What’s the story of how you got to where you are today?

Chris Nicholson 9:44
So like I said, I came in through deep learning. And we had a deep reinforcement learning element to that; we were building a big open-source project. It was a lot of fun. It was a hard business, because open-source software is free, right? So we were getting to know deep reinforcement learning, and we were approached by some interesting people who said, hey, those algorithms might be helpful for my situation. And their situations were these really hard, complex optimization scenarios with physical operations. And the more we looked into it, the more we saw, the more we became fascinated by it. Not just because it’s a cool application of the AI, although, you know, that’s great too, but because it has an impact on multiple levels. So it’s able to make businesses operate more intelligently, and that’s why they’re buying the software. But by making them more efficient, it’s also serving larger causes that I think probably a lot of your listeners care about as well. So, climate change: you just can’t turn away from it. Weird things are happening in the climate, Texas is freezing over, crazy stuff is happening, and new volatility is being introduced that business has to deal with. And some of that has to do with global warming. Global warming is a function of emissions, and emissions are a function of output, of burning energy. And so if I can come in and say, I’ll make you radically more efficient, and you’ll still meet your production quotas, what I’m really doing is coming at climate change sideways. Everybody talks about Tesla. I love them. Everybody talks about electrification, solar. Those are all hugely important. People don’t talk about the boring but highly important business of just making existing operations more efficient. But that’s what we do.

Alexander Ferguson 11:34
Can you speak to your roadmap and what you’re working on? Maybe some interesting features you just rolled out, or will, that you want to share?

Chris Nicholson 11:43
Yeah, our roadmap, there are a lot of pieces to it, I guess. But what we’ve got now is the ability to train a reinforcement learning agent and deploy it on the edge in a factory, so it can serve low-latency predictions to these machines and keep all those machines connected and coordinated. Up until now, we’ve been working a lot in simulations to train our models, and we’re slowly moving straight to real data. And obviously, many more people have real data than have simulations, right? So being able to expose these agents to the world, just like you and I are exposed to the world, to increase their intelligence, seems like a good way to go. You have fewer moving parts. So our roadmap is training these agents on real data.

Alexander Ferguson 12:32
Right now, someone would provide their simulation, or maybe just a record of historical data, to the platform, run it, and get better insights to take back and apply to their actual situation. What you’re looking toward is actively getting the data from all those different endpoints and learning from it in real time.

Chris Nicholson 12:55
It’s close to that. So what our users do now: we serve industrial engineers, simulation modelers, and mechanical engineers, and they very often have a simulation of some sort, a digital replica of their underlying physical operations. Those simulations can take a long time to build; they’re highly valuable, and in the end, they’re very useful for reinforcement learning agents. But it’s very important to make sure you expose those agents to real data. So we can already run some real data through the simulations. But exposing those agents, first and foremost, to raw data, rather than training them for a lengthy amount of time in the simulation, is one thing that we think will help some people use reinforcement learning better.

Alexander Ferguson 13:43
Could you see a world where it’ll be active reinforcement learning? Meaning there won’t be a separate loop; it will actually be learning and applying what it learns as it goes. Have you seen that happening, or do you see that happening soon?

Chris Nicholson 13:59
That is happening, and that’s going to become more widespread. So by active learning, what I’m hearing you say is online learning: I’m exposed to data, and in real time I’m learning the lessons from that data and altering my behavior to better achieve my goals. We call that online learning. That’s very important, especially if, say, you’re embedded in a vehicle and the weather conditions are changing, or you’re driving somewhere new. You need to be able to adapt to circumstances, even circumstances you haven’t been exposed to before. So active learning is very important.
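The core move of online learning, updating an estimate incrementally as each observation arrives rather than retraining on a stored dataset, can be sketched in a few lines. This is an illustrative toy, not Pathmind's code; the function name and step size are made up:

```python
def online_estimate(stream, alpha=0.5):
    """Exponentially weighted update: each new observation nudges the estimate,
    so the learner tracks conditions that drift over time, with no stored dataset."""
    estimate = 0.0
    history = []
    for x in stream:
        estimate += alpha * (x - estimate)  # move a fraction of the way toward the new data
        history.append(round(estimate, 4))
    return history

# Conditions shift from ~0 to ~10 mid-stream; the online estimate follows the shift.
print(online_estimate([0, 0, 0, 10, 10, 10]))  # [0.0, 0.0, 0.0, 5.0, 7.5, 8.75]
```

The step size `alpha` controls the trade-off: a larger value adapts faster to changing conditions (the changing-weather case), a smaller one smooths out noise.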

Alexander Ferguson 14:35
In the manufacturing and supply chain world, where’s the balance, though, of allowing a deep reinforcement model to learn from and apply the changes automatically, versus having human insight and oversight involved? Where’s the balance?

Chris Nicholson 14:58
Yeah, that’s a really good question. I have to say, first of all, humans are underrated. Humans have a role in these scenarios, and we don’t envision a world of purely dark factories, inhabited only by machines. The human operators, the domain experts, who have a theory of their own operations, are extremely important. And very often, the most important decisions should be left to them. But what reinforcement learning can do, in many cases, is surface strategies for them that they hadn’t thought of themselves. It can expose them to new possibilities. It can point out things in their strategic layout that they haven’t noticed. So there are cases, like an autonomous vehicle, a little Roomba driving around the floor, where you don’t need a human operator; you don’t want someone sitting in a trailer steering that remotely. It’s not important enough. But when you’re operating a whole factory, and you’re making huge scheduling decisions about what’s going to get processed next, there’s a human with their finger on the button. But maybe they’re operating by gut right now. Maybe they’re really letting their gut instinct, or some kind of descriptive analytics about what happened yesterday, guide them. And one thing reinforcement learning can do is act as decision support, which can lead to pretty significant improvements in this kind of human-machine hybrid.

Alexander Ferguson 16:19
So deep reinforcement learning is allowing people to go from just gut to data, to dashboards that reinforce whether what your gut says is right or wrong?

Chris Nicholson 16:33
yeah. And we think about it. We don’t we’re not a software. As a company, we don’t produce software, we produce cyborgs. Right. So I’m not manufacturing humans. But I’m manufacturing the software that that links up and interfaces with them really tightly, so that their powers expand

Alexander Ferguson 16:49
You’re making cyborgs. I love just the perspective of how this technology should augment us and allow us to make better, smarter, faster, and more insightful decisions. And it sounds like this type of technology and your platform are aimed to do this, right? Can you share the future? What do you see maybe two, three years from now, or even five or 10? What is it going to look like?

Chris Nicholson 17:18
Yeah, so, you know, operational technology can move slowly. So I think a lot of things that are present now in one or two factories will simply be much more widespread. One would be a control tower, a center of intelligence in these physical operations, that has a global view of everything going on, everything coming in, everything going out, and everybody inside, and that can help coordinate all those actions. Coordination is terribly difficult. If you’ve ever played the beer game in supply chain, you know how hard it is to time the orders to get a smooth flow from the factory to the retailer. So just getting transparency, and then some intelligence layer on top of that transparent data layer, will lead to vastly more efficient operations up and down the supply chain, from manufacturers down to the consumers. So in five years, if that is widespread, I’ll be super happy. I think that’s a reasonable thing to hope for. As I’m sure you know, it’s very hard to predict more than five years into the future, especially because AI is changing so fast. Think about the last eight years I’ve been in AI. We went from barely recognizing images well to having superhuman abilities to recognize images. We went from barely doing NLP well with deep learning to being able to generate entire paragraphs of novel text that sounds human. It’s eerie, and it’s creepy, and it’s powerful. And all of those things are moving forward together. So are there overlaps between all these AI advances, along with IoT? Absolutely. That image recognition, that kind of perceptive ability, is the basis of how reinforcement learning senses its environment.
The better we get at that, the better those machines are going to be able to steer themselves. And as we look at NLP, natural language processing, being able to respond to language: maybe the humans in the warehouse are going to be talking to those machines, and maybe the machines will be talking back. And I hope they’re polite.

Alexander Ferguson 19:30
Let’s make sure we get nice reinforcement on being polite.

Chris Nicholson 19:33
It starts at home, Alex.

Alexander Ferguson 19:38
With our kids. Now, help me understand just a couple more stats about you guys. Is it a typical SaaS, a monthly or yearly subscription that they could sign up for? How does that work?

Chris Nicholson 19:51
It’s a typical SaaS, a yearly subscription. Although for students and academics, we have some project-based stuff; we have shorter-term subscriptions, so they can prove something out for themselves. Yep.

Alexander Ferguson 20:02
And as far as integrations with other solutions, do you have anything that you can share?

Chris Nicholson 20:08
Yeah, sure. So we integrate with AnyLogic. AnyLogic is really powerful simulation software for stocks and flows: factories, warehouses, ports, airports. It’s very widespread, and it’s used all over industrial engineering; a lot of those folks use it, and students and academics too, for that specific layer of simulation software, which is business operations, and even government operations. So everything from COVID vaccine distribution on the government side all the way over to throughput and efficiency on the factory and warehouse side. We integrate with open-source Python: all of the typical tools of the AI stack, ranging from OpenAI Gym, to Ray and RLlib, to TensorFlow, we integrate with all of that. That’s a very active world, all that open-source stuff, and people are building a lot of simulations in Python as well. So that’s what we do. And I think the important thing for folks is, obviously we integrate with a lot of things on the infrastructure side, but our users don’t have to think about it. They don’t have to choose a cloud; they don’t have to set up distributed computing and maintenance and monitoring. We just do that for them. And we set it up in a way where they can train models quickly and cheaply, maybe more cheaply than they could manage themselves.

Alexander Ferguson 21:35
And would you say that’s the real benefit of choosing your platform? They could do deep reinforcement learning and manage it themselves using open-source tools, but you’re providing a more streamlined, all-in-one solution that just makes it a lot easier.

Chris Nicholson 21:51
Yeah, some can and do do it themselves. Others embark on that endeavor and, a year later, come over to us. Here’s the thing about open source, and I worked in it for a long time: it’s not open source’s job to reduce complexity. Open source is a bunch of components that people cobble together. And one of the ways that open-source companies make money is by solving the complexity of open source, which they sometimes create. So if you’re going into open source, you’d better be ready to deal with complexity. And one of the things that we help people do is ignore and move past that complexity, so they can focus on the complexity of their physical operations, which is the thing they need to solve.

Unknown Speaker 22:37

Alexander Ferguson 22:38
For you guys, how big is the team now?

Unknown Speaker 22:41

Alexander Ferguson 22:42
Nice. And anything else you’d want to share that people should know about if they wanted to get started or look into it?

Chris Nicholson 22:50
I think one of the most interesting things they’ll find when they start working with reinforcement learning is that they’ll have a chance to think about incentives, because in this branch of machine learning, what you do is write reward functions. In supervised learning, you label data, and those are the right answers. In unsupervised learning, you kind of predict a probability distribution of the data. But in reinforcement learning, that feedback loop is based on what you decided is desirable; that’s your reward function: give my agent a reward when it reaches its destination without a collision, or when it produces more items in a factory. So it’s really, forgive this way of putting it, a chance to play God. You can construct incentives; you can design a society of robots. And you can figure out which incentives, just like legislators determine the law that incentivizes you and me, push group behavior in one direction or another. It’s fascinating. You can watch different societies of robots form to serve those goals, because the robots are not rebels. You tell them what to do; they’re going to learn, and they’ll do it. And they’ll do it much more precisely than you would want: they’ll do what you said you wanted rather than what you wanted, which is all about alignment. And what’s really interesting, what I really just can’t get over, is the emergence of swarm intelligence when you have multiple robots. By that I mean you have this complexity, a bunch of individual agents. You could go to each one and just say, go as fast as you can, or deliver as many items as you can. And that would lead to a certain speed, but also to a lot of chaos and collisions.
But if you address them as a group, and you shape these collective incentives, kind of like the golden rule, very often new behaviors emerge that you didn’t program. So one thing we see is that these robots, which normally should be seeking their destination, will get out of each other’s way. They’ll sacrifice for the team. Nobody programmed that. After many years of compute experience run in parallel, which only takes a few hours, but they’ve got years of experience by the time they come out of it, they know that getting out of the way will maximize the collective outcome, which is what I told them I cared about.
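The difference between individual and collective incentives can be sketched as a tiny payoff table. This is a made-up toy, not Pathmind's model: two hypothetical robots approach the same narrow aisle, and every number below is invented for illustration:

```python
# Deliveries earned by (robot A, robot B) for each joint choice of "rush" or "yield".
# Rushing alone delivers fastest; if both rush, they each deliver once but collide.
DELIVERY = {("rush", "rush"):  (1, 1),
            ("rush", "yield"): (2, 0),   # rusher delivers fast, yielder waits
            ("yield", "rush"): (0, 2),
            ("yield", "yield"): (1, 1)}
COLLISION = {("rush", "rush"): 5}        # steep shared penalty when both rush

def individual_rewards(a, b):
    """Selfish incentive: each robot scores only its own deliveries."""
    return DELIVERY[(a, b)]

def team_reward(a, b):
    """Collective incentive: total deliveries minus the shared collision penalty."""
    da, db = DELIVERY[(a, b)]
    return da + db - COLLISION.get((a, b), 0)

# Under individual rewards, "rush" beats "yield" no matter what the other robot
# does, so both rush and collide. Under the team reward, both rushing is the
# worst joint outcome, and having one robot yield emerges as optimal.
best_joint = max(DELIVERY, key=lambda ab: team_reward(*ab))
print(best_joint, team_reward(*best_joint))
```

The "sacrifice for the team" behavior is exactly the yielding outcome here: nothing in the table tells a robot to yield, but once the reward is shared, yielding is what maximizes the score the group is trained on.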

Alexander Ferguson 25:25
There’s no selfishness in the robots’ mindset, because they

Chris Nicholson 25:28
see the collective good. You can program in the selfishness, and you can program it out, which is different from real life.

Alexander Ferguson 25:35
If only we can do that in ourselves and for those around us. I love the passion I can tell you have for this topic, and it’s the reason why, I imagine, you started this company and you guys are leaning forward. So thank you so much for sharing both the insight and where you guys are headed. For those that want to learn more, go to, and you’ll be able to sign up for free. Is that right? What’s a good first step that folks should take?

Chris Nicholson 26:01
There are free accounts. We’ve got simulations already hosted in Pathmind, so if you come sign up for a free account, in two minutes you will see reinforcement learning in action, all these collective incentives leading to the outcomes you want.

Alexander Ferguson 26:17
I love it. Well, thank you again so much, Chris, for your time, and everyone for joining us. We’ll see you on the next episode of UpTech Report. That concludes the audio version of this episode. To see the original and more, visit our UpTech Report YouTube channel. If you know a tech company we should interview, you can nominate them at UpTech Or, if you just prefer to listen, make sure you subscribe to this series on Apple Podcasts, Spotify, or your favorite podcasting app.