2022 ACM Awardee Prof Abbeel For Top Work In AI And Robotics

Tech Industry

ACM PRIZE IN COMPUTING

In April 2022, ACM announced that Pieter Abbeel had been awarded the 2021 ACM Prize in Computing for pioneering work in robot learning. Drawing extensively from the prize announcement, my interview with Abbeel provides a great summary of his contributions.

The $250,000 USD prize, endowed by Infosys Ltd, recognizes Abbeel’s fundamental contributions in computing that, through their depth, impact, and broad implications, exemplify the greatest achievements in the discipline. Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley, Co-Director of the Berkeley AI Research (BAIR) Lab, Co-Founder of Covariant [2017- ], Co-Founder of Gradescope [2014-2018, acquired by Turnitin], Advisor/Investor to many AI/Robotics start-ups, Founding Investment Partner at AIX Ventures, and Host of The Robot Brains Podcast.

ABBEEL’S CONTINUING PIONEERING CONTRIBUTIONS

Abbeel pioneered teaching robots to learn from people (imitation or apprenticeship learning), to learn through their own trial and error (reinforcement learning), to speed up skill acquisition through learning-to-learn (meta-learning), and to perform a task from just one demonstration after having been pre-trained on a large set of demonstrations of related tasks (few-shot imitation learning). His work continues to be the foundation for the next generation of robotics. His robots have learned knot-tying, basic assembly, organizing laundry, locomotion, surgical suturing, detecting objects and planning their trajectories in uncertain situations, and vision-based robotic manipulation. Prior to Abbeel’s contributions, reinforcement learning could perform only simple tasks. Abbeel added “deep” reinforcement learning: the innovation of combining reinforcement learning with deep neural networks ushered in the new field of deep reinforcement learning, which can solve far more complex problems than computer programs developed with reinforcement learning alone.

Abbeel’s key breakthrough contribution in this area was developing a deep reinforcement learning method called Trust Region Policy Optimization. This method stabilizes the reinforcement learning process, enabling robots to learn a range of simulated control skills. By sharing his results, posting video tutorials, and releasing open-source code from his lab, Abbeel helped build a community of researchers that has since pushed deep learning for robotics even further ─ with robots performing ever more complicated tasks.

Abbeel has also made several other pioneering contributions, including: generalized advantage estimation, which enabled the first 3D robot locomotion learning; soft actor-critic, one of the most popular deep reinforcement learning algorithms to date; domain randomization, which showcases how learning across appropriately randomized simulators can generalize surprisingly well to the real world; and hindsight experience replay, which has been instrumental for deep reinforcement learning in sparse-reward/goal-oriented environments.

Abbeel’s courses on AI, Advanced Robotics, and Deep Unsupervised Learning are some of the standard references for the field.

ABBEEL’S TRANSFORMATIONAL ROLES

Abbeel is an active entrepreneur: he has co-founded two companies (Gradescope and Covariant) and spent two years at OpenAI (the AI research organization in San Francisco co-founded by Elon Musk). Gradescope provides instructors with AI that can significantly speed up grading of homework, projects, and exams, and is used at over 1,000 universities. Covariant builds AI for the next generation of robotic automation, enabling robots to see, react, and learn (rather than executing preprogrammed motions as robots do in car factories). Abbeel is also an active start-up investor and advisor. He is a founding partner at AIX Ventures, a venture capital firm focused on AI start-ups. He advises many AI and robotics start-ups, and is a frequently sought-after speaker worldwide for C-suite sessions on AI futures and strategy.

Abbeel is the host of The Robot Brains podcast, which explores what AI and robotics can do today and where they are headed, through conversations with the world’s leading AI and robotics pioneers. He has won numerous awards, including best paper awards at ICML, ICLR, NeurIPS, and ICRA; early career awards from NSF, DARPA, ONR, AFOSR, Sloan, TR35, and IEEE; and the Presidential Early Career Award for Scientists and Engineers (PECASE). His work is frequently featured in the popular press.

COMMENTS FROM ACM AND INFOSYS

“Teaching robots to learn could spur major advances across many industries ─ from surgery and manufacturing to shipping and automated driving,” said ACM President Gabriele Kotsis. “Pieter Abbeel is a recognized leader among a new generation of researchers who are harnessing the latest machine learning techniques to revolutionize this field. Abbeel has made leapfrog research contributions, while also generously sharing his knowledge to build a community of colleagues working to take robots to an exciting new level of ability. His work exemplifies the intent of the ACM Prize in Computing to recognize outstanding work with ‘depth, impact, and broad implications.’”

“Infosys is proud of our longstanding collaboration with ACM, and we are honored to recognize Pieter Abbeel for the 2021 ACM Prize in Computing,” said Salil Parekh, Chief Executive Officer, Infosys. “The robotics field is poised for even greater advances, as innovative new ways are emerging to combine robotics with AI, and we believe researchers like Abbeel will be instrumental in creating the next great advances in this field.”

CHAT WITH PIETER ABBEEL

Updated from 2020: I work pro bono daily with a network of more than 200,000 CEOs, investors, and scientists/experts. The ongoing interviews and Forbes articles reflect insights gained from this work.

Leveraging Abbeel’s great history in deep tech, I reached out to Pieter for an interview appearing with the non-profit ACM Learning Center (under Interviews by Stephen Ibaraki). Here’s a direct link to the interview profile and video. Portions of the interview are summarized below and edited for clarity. AI was used to create the transcript, which is only about 70% accurate for highly technical interviews, so I strongly recommend going directly to the full interview for precision. The edited transcript will help in following and understanding the video interview.

A Chat with Pieter Abbeel: Recipient of the 2021 ACM Prize in Computing (awarded in 2022), Professor and Director of the Robot Learning Lab at UC Berkeley, Co-Director of the Berkeley AI Research (BAIR) Lab, Co-Founder of Covariant [2017- ], Co-Founder of Gradescope [2014-2018, acquired by Turnitin], Founding Investment Partner at AIX Ventures, Host of The Robot Brains Podcast

Stephen Ibaraki

Pieter, thank you for coming in today. You received this outstanding prize, the ACM Prize in Computing. You’ve done so much, and made so many different contributions in the field of robotics, that our audience needs to know about it – I very much appreciate your sharing your insights with our audience.

Pieter Abbeel

Thanks for having me on, Stephen.

Stephen Ibaraki

You’ve had an outstanding arc from very early on, with continuing substantial global contributions. What were the inflection points that made you this outstanding individual?

Pieter Abbeel

When I look back to my childhood, I was just interested in everything. Anything I could learn about was interesting, whether it’s literature, languages, math, physics; I just found everything fascinating. But then at some point, I realized that it’s hard to be at the top of the game if you try to do everything. I had to think hard about what I was actually going to spend my time on, so I could really be at the frontier. Towards the end of my undergraduate, which I did in Belgium, I just got really fascinated, more so than anything else, by artificial intelligence. How is it that humans can think; how is it that humans can make intelligent decisions? How is it possible to write a program that plays chess better than the person who wrote the program? That, to me, was really fascinating: that it’s possible to somehow write these artificial intelligence programs that are smarter than the writer of the program, at the thing they are supposed to do. That really got me going from a pure, intrinsic interest point of view. But also from an impact point of view: it seemed that even if I cared about everything, I couldn’t do everything. Maybe by working on artificial intelligence, in some way, I could be working on everything, because maybe AI could help everything else. Maybe it could, in the future, help biology, physics, and so forth. We’re starting to see some of that very recently; AI is starting to help other disciplines. That helped me to really consolidate: let me just focus on AI, because that’s most interesting to me. Then, of course, even AI itself is a pretty large discipline. These days, it’s a lot more converged. I mean, almost all the recent advances are in deep learning, and variations on the latest version of deep learning lead to the next breakthrough. When I started my PhD, which was in the early 2000s, that wasn’t the case. AI was still more of a scattered field. It was important to pick an application, to be able to make consistent progress.
For me, the natural one was robotics. You might wonder, why robotics? There are other domains, of course, that are really interesting, too. But to me, it seemed that if we really care about artificial intelligence, and building truly smart systems, the most natural thing is to look at robots, because robots are a lot like us, a lot like animals. That’s where we see intelligence: in the real world, natural intelligence is all in physical embodiment. It seemed to me the most natural place to start to try to build AI is tied into physical systems, tied into robots; that is a more natural way to make progress.

Stephen Ibaraki

I get this early interest in physics and everything else tied to science. You want to have global impact. You want this practicality element, and robotics is really the most practical, or one of the most practical, ways to do this; even autonomous vehicles are robots, right? They have vision systems or some other way of understanding the environment, and you embody all of that early work. I noticed early in your career, you studied under Andrew Ng. He’s done a lot of work and he’s quite a well-known venture capitalist. You pioneered things like imitation learning, first via reinforcement learning, and then by combining reinforcement learning with deep neural networks. Can you talk about that journey? What excited you? Why did you do it? How did you create these new paradigms that are used so globally now in the robotics field?

Pieter Abbeel

During my PhD days, given the state of the field at the time, the right way to make progress was to combine deep domain expertise with machine learning. We looked at problems one at a time; one of the hardest open problems in robotic control was helicopter flight: how can you have an autonomous helicopter that flies at the same level of capability as the most expert, most advanced human pilots? What we did there: we brought together techniques from optimization-based control and system identification with machine learning, and together, it allowed us to have the most advanced autonomous helicopter. The helicopter could do flips … all kinds of advanced maneuvers for RC helicopters that almost no human pilots can actually do. But a big part of it was also learning from human demonstrations. We had one of those human pilots showing us what they can do, and we collected data. That was a big part of the process, combined with optimization-based control. Of course, the very big thing that happened in 2012 was the ImageNet breakthrough, where Geoff Hinton and his students showed that deep neural nets can be trained from a lot of data to recognize what’s in images at a level that was unprecedented at the time, a very big leap forward. What it showed is that maybe you can start with a purely data-driven approach, without all the detailed engineering of specific knowledge into the systems directly. To me, that’s the answer when you ask, hey, where does your deep reinforcement learning work come from? I had worked on reinforcement learning quite a bit as part of the helicopter project. That was not deep reinforcement learning. It was regular reinforcement learning, where many parts are engineered and then some parts are learned. In deep reinforcement learning, the idea is that the large neural network is going to learn everything. It’s not just going to be a little bit of extra at the end to make it better.
It’s going to be from scratch; it’s going to learn everything. That, to me, was the reason I thought it was time to revisit reinforcement learning, but now with deep neural networks, because the ImageNet breakthrough from Geoff Hinton and his students in 2012 showed that things can be learned completely. If that works for image recognition, what does it mean for learning to control a robot? Can we, for example, have a humanoid robot that we don’t program anything into … and maybe it just starts lying on the ground. And you just say, I want you, robot, to get up. I want you to figure it out on your own. And that’s really <deep> reinforcement learning, where you don’t tell it how to do anything. The agent is just being scored on the quality of what it’s doing. The simplest example is video games. In a video game, there’s typically a score. And you could say, okay, play the game as many times as you want, and then over time, learn to achieve higher and higher scores. Robotics is similar, but now we have to come up with the score. For the humanoid robot, maybe the score is how high up the head of the robot is; the higher up, the better. And so, when it starts on the ground, it has to learn to stand up to get its head high up. And over time, it actually learns that, and that, to me, was probably the most fascinating result in those early days: that we had this humanoid robot, a simulated robot, capable of, on its own, learning to get up. We didn’t have to tell it what it meant to get up. We didn’t have to tell it, you want to plant one of your hands or plant your feet. Those were the kinds of things that it figured out on its own. I would say that’s generally the beauty of <deep> reinforcement learning. When you look at learning today, there are really three types of learning: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is pattern recognition.
You feed in data, and you say this is the input and this should be the output, and give a lot of examples. For example, an image and a classification of what’s in the image, or a sentence in English and a sentence in Chinese. Give enough examples and the neural network figures out the pattern to go from the input to the output; even for things it’s never seen before, it’s going to be able to do it. Now, the tricky thing with supervised learning is that you need to provide a lot of data that often requires a lot of human effort, because there’s a lot of data out there, but you need to annotate it with the desired output. For robotics, it would mean that you need to provide the correct motor torques at each of the motors of the robot for every situation. To learn something with supervised learning, that’s very tedious. Now, with <deep> reinforcement learning, what you get is this: you just score the system. So, you might say a high score in the game is good, or standing up, meaning the head is at a good height, is good, or maybe running forward, meaning that the center of gravity of the robot is moving forward and it’s at a certain height, is good. And now the beauty is that instead of you telling what all the torques should be for all the motors (and it’s just not really clear how you would do that; what should the torque commands at all the motors be to do running?), you just have to come up with a scoring metric. And then the agent, on its own, will figure out how to achieve a high score. And of course, that’s also how you train, for example, a dog. When you train a dog, you can’t force its muscle contractions and say this is how you’re going to do things; you give it treats or you talk in an encouraging way, or you might talk in a less encouraging way when you don’t like what the dog is doing. But the dog is the one who has to figure out how to do it, how to get you to talk nice to it instead of not so nice.
That’s the beauty of <deep> reinforcement learning. Because it not only means you don’t have to supervise all the details, but also means the system could learn to do things possibly better than you can do them. Because you’re not telling the system, do this this way; you’re telling it, this is what you’re trying to optimize for, see how far you can get. We’ve seen this in DeepMind’s Go systems: better than the best human players. We’ve actually started to see it in some application domains like chip design, where there have been results where chips can be designed with a computer system that uses <deep> reinforcement learning to come up with new designs that are different from the circuit layouts humans had. There are interesting opportunities here in <deep> reinforcement learning to go beyond even what humans can do. And of course, to go back to what I said earlier, there are three types of learning. The third type is unsupervised learning, where there is no input/output annotation; there is not even a score function that you provide of what’s supposed to be maximized. You just have data. And you might wonder, how can we learn from just data that’s seen, if there’s no score, if there is nothing we’re supposed to match? The idea there is the following. We spend a lot of time these days in my lab on this, on the combination of unsupervised learning and <deep> reinforcement learning. The idea is that when we watch the world, what’s happening in the world, we’re learning from that. We’re not trying to optimize anything; we’re just watching. From that we understand how the world works. And then when we’re asked to do something, we can do it much better than if we had not had a chance to watch things happen in the world. And that’s what unsupervised learning is about. Can machines, can robots, watch videos, let’s say on YouTube, and from that, learn how things can be done?
And then when we ask it to do something, be much quicker at acquiring a new skill.
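The scoring idea Abbeel describes, an agent improving purely by trial and error against a score it is given, can be sketched in a few lines. This is a toy hill-climbing illustration with invented names and dynamics, not any of his actual algorithms:

```python
import random

def score(params):
    # Toy "environment": the two policy parameters determine a simulated
    # head height; the score is higher the closer the head is to upright (1.0).
    head_height = params[0] * 0.5 + params[1]
    return 1.0 - abs(1.0 - head_height)

def train(steps=2000, noise=0.1, seed=0):
    """Trial and error: propose a random variation of the policy, keep it
    if it scores higher. The agent is never told the 'right' actions,
    only how well it did."""
    rng = random.Random(seed)
    best = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    best_score = score(best)
    for _ in range(steps):
        candidate = [p + rng.gauss(0, noise) for p in best]
        if score(candidate) > best_score:
            best, best_score = candidate, score(candidate)
    return best_score

print(round(train(), 3))
```

Real deep reinforcement learning replaces the two-number policy with a neural network and the random hill climbing with gradient-based methods such as TRPO, but the contract is the same: the designer supplies only the score, and the agent discovers the behavior.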

Stephen Ibaraki

<I talk about Pieter’s fundamental contributions with Trust Region Policy Optimization, Generalized Advantage Estimation, <Soft Actor-Critic>, Domain Randomization, and Hindsight Experience Replay. Note: Pieter’s references to “reinforcement learning” mean “deep reinforcement learning”>. These are the technical areas you’re really famous for <and more> — widely cited and used. Can you talk about your work in ways people can understand <with an example>?

Pieter Abbeel

Absolutely. When you think about reinforcement learning, it’s trial-and-error learning… the agent is going to learn from repeated trial and error. Now, if you want to apply this in robotics, your robot is going to go through repeated trial and error before it’s going to do the thing you want it to do. In many ways, that’s beautiful, because it’s learning; you can watch it learn over time. But in other ways, at times, it can be impractical. Because if your robot really doesn’t know yet how to do things, it might damage itself, it might do damage to the environment that it’s in, before it actually acquires the skill you wanted it to have. And so doing reinforcement learning directly in the real world can be very, very costly if you’re going to run it that way. It might take a long time, because it might require a lot of trial and error. You might be really busy repairing the robot and fixing up the room or environment it’s in. It’s very natural to then say, why not work in simulation, right? In simulation, the robot can’t really break things. You can always reset the simulator or reset the computer as needed. Also, in simulation, you can often run things faster than real time; you can learn faster than you could in real time. You can run many, many versions of your program in parallel. You can be collecting data faster; it just depends on how many computers you are willing to spin up. There are lots of advantages to learning in simulation. A lot of work is done that way. But there’s a catch, of course. Simulators are never perfectly matched with reality. If your robot is going to learn purely in simulation, and if your simulation is not perfectly matched with reality, then once it’s done learning and you load the neural network onto the real robot, it might actually fail. In fact, most likely it won’t succeed.
The question is then, can we match up the simulator more closely with reality? Because if we can do that, then there’s a higher chance of success. That’s an approach that I have followed many times, and many others have followed many times. It is quite a reasonable approach and a good approach. But it’s typically very hard to get a perfect match between simulation and reality. In the domain randomization work that you referred to, Stephen, we thought about this. The idea we put forward in that paper was essentially showing that maybe your simulator does not need to be all that perfectly matched with reality. Instead, what we’re going to do is build many, many, many versions of the simulator, and they’re all going to be different. Maybe the friction properties between two surfaces, between the feet of the robot and the ground, are a little different in the different simulations. Maybe the mass properties of the robot are a little different. Maybe some delay between a torque command being sent and a torque command being activated at a motor is a little different in different simulators. Maybe the camera is set up in a slightly different position on the robot’s head, and so forth. There are all these variations that we don’t know how to perfectly match with reality. So instead of trying to somehow find a way to perfectly match it up, we say: the things we don’t know, we’re going to vary. We’re going to have maybe 1,000, or even 10,000, 100,000, a million different versions of the simulator, that are all a bit different on these parameters. And now, you might say, well, that’s kind of crazy; instead of trying to get the closest to reality, you’re actually making it different in every simulator. So what’s good about that?
It turns out that if a single neural network can learn to control the robot across all those simulators, even though not a single one is matched with reality, the fact that there is a single neural network that’s learned to control the simulated robot, no matter what version of the simulator, makes it actually very likely it’ll also succeed in the real world, because you’ve learned something very, very robust that can handle a wide range of variation. And then hopefully, that means it can also handle the variation it encounters in the real world. And so that’s domain randomization: we randomize the domain the robot is learning in, and the domain, well, that’s the environment of the robot. That’s the simulator.
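The domain randomization idea can be sketched as follows. Everything here (the one-parameter policy, the friction/mass ranges, the grid search) is invented for illustration, standing in for a neural-network policy trained across thousands of randomized physics engines:

```python
import random

def make_sim(rng):
    """One randomized simulator: friction and mass are drawn at random,
    standing in for physical parameters we can't measure exactly."""
    friction = rng.uniform(0.5, 1.5)
    mass = rng.uniform(0.8, 1.2)
    return lambda action: action * friction / mass  # toy dynamics

def avg_error(gain, sims, target=1.0):
    # Mean squared error of a one-parameter "policy" across all simulators.
    return sum((sim(gain) - target) ** 2 for sim in sims) / len(sims)

rng = random.Random(0)
sims = [make_sim(rng) for _ in range(1000)]  # many randomized domains

# Choose the policy that works best *across all* randomized simulators,
# rather than tuning it perfectly to any single one.
best_gain = min((g / 100 for g in range(1, 300)),
                key=lambda g: avg_error(g, sims))
print(round(best_gain, 2))
```

Because `best_gain` must do reasonably well under every sampled friction/mass combination, it tends to transfer to an unseen "real" setting drawn from similar ranges, which is the intuition behind sim-to-real transfer via randomization.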

Stephen Ibaraki

That’s really fascinating, because there’s just always this barrier and challenge with machine learning and AI, that it is very narrow <a solution to a specific challenge>. Your work is giving generalization capability, which is this big challenge, right? Maybe we can reach some form of artificial general intelligence <AGI>. Do you think we’re going to get to this massive change where we do have true artificial general intelligence, or some major breakthrough? Is there something you’re working on that can lead in this direction, perhaps your work with Hindsight Experience Replay, or a manifestation or iteration of that work, where you can use sparse-reward / goal-oriented environments, which is tied to your domain randomization as well? Do you see it moving in that direction? Are you going to be part of that change? And how?

Pieter Abbeel

That’s pretty much the most frequently asked question. Also one of the hardest ones to answer precisely, of course, because Artificial General Intelligence is this idea that we would end up with something that’s as smart as or even smarter than humans, in a general sense. To make this a bit more concrete: we already have computers that are smarter than humans at very specific tasks. There are video games computers can play better. There are regular games, classical games (chess, checkers) computers can play better, and so forth. But the best computer Go player actually doesn’t know how to physically move a Go piece on the board; all it knows is how to think through the different moves in the game and then display a command on a computer screen. The big missing piece, if we’re looking at AI today, is that artificial general intelligence: the ability to have a system that is extremely general in its capabilities. That can learn new things quickly, the way humans can learn new things quickly in new environments it has never been in before. Maybe it has never been in your kitchen before; it somehow knows how to do things in your kitchen. Maybe it has never driven in a certain city before, but it just knows how to drive there without a map (nothing like that is needed). It just knows how to generalize; generalize across all these different tasks. Personally, I think it’s hard to predict when we’ll actually get to human intelligence. But I think it’s really fascinating to think about this notion. Can we have our agents, our AI systems, learn things, internalize things, that are maximally generalizable? That allows them to learn other things, solve other problems more quickly in the future, rather than being focused on a very specific problem during learning; focus on somehow building a foundation of knowledge that allows them to learn faster in the future.
I’ve actually been thinking about this quite a bit, and the “hindsight experience replay” work you bring up, Stephen, is of course related to that. Let me quickly highlight that, and then I’ll look at the more general picture. “Hindsight experience replay”: the idea is the following. This is a very effective, I would say, modification of the standard reinforcement learning paradigm, which just directly optimizes rewards; a modification that allows the agent to learn from data more effectively. Imagine your agent is trying to do something, and you give it, let’s say, a reward for achieving success. But it hasn’t learned how to achieve success yet. And so now it’s trying, and it always gets zero reward, because it just doesn’t know how to do it yet. But if it always gets zero reward, then it can’t learn anything either, because everything is equally bad; it’s always zero. So then it really has to do random trial and error, hoping to just coincidentally come across the thing that does get a reward. But if the thing the robot is expected to do is complicated, then to randomly run across a success with random actuation of the motors of the robot is very unlikely, right? So, in “hindsight experience replay”, the idea is the following. No matter what the robot does, we’re going to let it learn from it. We’re going to say, okay, I didn’t ask you to, let’s say, fall down instead of running forward; I asked you to run forward, but you fell down. But if I had asked you to fall down, you did the right thing. Or if I had asked you to first fall on your right knee and then fully fall down on the ground, you did the right thing for that request.
So what we get here is a notion that this agent, this robot, is learning a lot about what kinds of commands it already knows how to satisfy, what it has done in the past; it can internalize all these concepts, such that over time it can generalize: I’ve learned to fall down; I’ve learned to fall backwards; I’ve learned to fall sideways; I’ve learned to get my right leg in front of my left leg. It learns all these things. By having a wide range of existing skills that are maybe easier to acquire and easy to randomly run across, it can build up a skill repertoire that makes it easier to later learn the thing you actually care about. That’s “hindsight experience replay”. But when we think about AGI and much more general intelligence, I think what we’re thinking about is, in some sense, a generalization of this. This is not necessarily the guaranteed path to get there; people don’t know how we’re going to get there, and we’re not there yet. But if I think about how we get the most general AI system, I think about a system that has to learn as much as possible from the data that’s available. “Hindsight experience replay” is a way to learn as much as possible from the data the robot is collecting on its own. That’s nice. But there’s so much other data out there, too. There is so much data on the internet that’s already collected; the robot doesn’t have to go collect that data on its own. When I picture the future of robot intelligence, what I picture is a robot that has watched a ton of YouTube videos and other videos that are online. It doesn’t just watch them; it also looks at the annotations that come with them. It might say, oh, that was a video of somebody chopping carrots. That was a video of somebody maybe playing tennis or basketball or something. It’s learning from that the connection between what’s in those videos and how we in language describe what’s going on.
But then it’s also going to be learning to predict the future. Because as it’s watching a video, a natural prediction problem to train the system on is to say, what if I don’t tell you what comes next in the video? Can you fill in the blank? Now, of course, it’s not possible to deterministically know exactly what’s going to happen. Because, you know, I can move my right arm or my left arm up; you cannot predict what I’m going to do next just by watching the previous part of the video. But you can predict a probability distribution over possible futures. Can we ask our deep neural networks to learn to predict probability distributions over possible futures? Those are the kinds of tasks we can give them. And we can do the same thing for text. In fact, in the language domain, that’s something where we’ve seen a lot of excitement in the last five years, out of OpenAI’s GPT models, Google’s BERT models, and so forth. We’ve seen text models that can predict what comes next in an article, not deterministically, but by predicting possible completions that are plausible and likely to include the one that was actually there. We’re going to want the same for videos. Videos are a lot bigger in terms of the amount of storage and the amount of data you need to process. But ultimately, I think that’s going to be at the core of how we get to more generalized robotic capabilities. These deep neural nets will be largely trained on videos. On these videos, they’ll be trained to predict what comes next; predict maybe what was in the past; predict to fill in the blank, and so forth. They’ll predict associated text with those videos, … For practical purposes, there’s essentially infinite video data on the internet for our robots to learn from. And I think 99-plus percent of the data our robots will be trained on will be that.
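The "predict a distribution over possible futures" objective Abbeel describes can be illustrated for text with a deliberately tiny count-based model (a toy stand-in for GPT/BERT-style neural networks, which learn the same kind of conditional distribution at vastly larger scale; the corpus below is invented):

```python
from collections import Counter, defaultdict

def train_next_token(corpus):
    """Count, for each token, how often each other token follows it:
    the simplest possible 'predict what comes next' model."""
    counts = defaultdict(Counter)
    for seq in corpus:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return counts

def next_distribution(counts, token):
    # Turn the follow-counts into a probability distribution, i.e. the
    # model never claims one deterministic future, only likelihoods.
    total = sum(counts[token].values())
    return {t: c / total for t, c in counts[token].items()}

corpus = [["the", "robot", "stands"],
          ["the", "robot", "falls"],
          ["the", "robot", "stands"]]
counts = train_next_token(corpus)
print(next_distribution(counts, "robot"))
```

After "robot", the model assigns probability 2/3 to "stands" and 1/3 to "falls": it does not pick one future, it ranks the plausible ones, which is exactly the shape of objective being described for video.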
But then that doesn’t mean the robots know how to do something themselves; they are just watching videos, effectively. So how do they know about their own hands, their own legs, their own camera system where they collect data from, how they move their head? That part, in my mind, is going to be reinforcement learning. The robot is going to combine, in the same deep neural network, learning from videos and text on the internet with reinforcement learning, in a single neural network. Just the way humans have a single head, a single brain, that they use to learn from what they see in the world, but also to learn from their own experience. So that’s going to be brought together. Now, when I think about reinforcement learning in that context, I don’t think we want to give the robot feedback at all times. We want to give it some feedback sparingly, but we mostly want it to be capable of learning on its own. And so the type of reinforcement learning that is going to happen is not going to be win this game, win that game, or achieve this task or that task; it’s mostly going to be unsupervised reinforcement learning. It’s reinforcement learning where the robot, or in general the agent, gets rewarded not for completing something specific, but for being curious: being curious about the world; being curious about what will happen if I do this, what will happen if I do that. This robot will try a wide range of things on its own to better understand how it can interact with the world. If you transpose this onto humans, think of it as play. When you have children, they just play. They explore the world by playing. And it’s thanks to exploring the world by playing that later they can learn other things more quickly that you might care about. Maybe you want them to clean up their room. But it’s not that from day one you train them to clean up their room; that doesn’t make sense, and it wouldn’t be fun.
But also, it wouldn’t be an effective way to learn, because it’d be too focused. By allowing for play, you allow much more generalized learning. And then later, you can have children or robots learn more specialized things more quickly. So, when I envision the future, I see very large neural networks that will be the brains of our robots: 99-plus percent learning from data on the internet, mostly video; then unsupervised reinforcement learning, think of it as play, to learn about their own bodies’ interaction with the world; and then a little bit of learning with human feedback to learn about the specific thing you want the robot to do for you.
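One common way researchers turn “reward curiosity, not task completion” into an algorithm is a prediction-error bonus: the agent trains a forward model of its world and is paid the model’s surprise. The following Python sketch is an invented miniature of that idea (it is not a specific published method’s code); a running average stands in for the learned forward model.

```python
# Sketch of an unsupervised "curiosity" reward: instead of a task
# reward, the agent earns the prediction error of its own forward
# model, so surprising transitions are worth seeking out, and
# already-understood ones stop paying. Illustrative only.

class ForwardModel:
    """Predicts the next state as a running average per (state, action)."""
    def __init__(self):
        self.memory = {}
    def predict(self, state, action):
        return self.memory.get((state, action), 0.0)
    def update(self, state, action, next_state):
        old = self.predict(state, action)
        self.memory[(state, action)] = old + 0.5 * (next_state - old)

def curiosity_reward(model, state, action, next_state):
    """Reward = how wrong the model was; then the model learns."""
    error = abs(next_state - model.predict(state, action))
    model.update(state, action, next_state)
    return error

model = ForwardModel()
# Repeating the same transition becomes boring: the bonus shrinks.
rewards = [curiosity_reward(model, 0, 1, 4.0) for _ in range(3)]
print(rewards)  # [4.0, 2.0, 1.0]
```

The decaying reward is the “play” dynamic in the passage: once an interaction is predictable, the agent moves on to something it does not yet understand.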

Stephen Ibaraki

I’ve been around a long time, and you had Douglas Lenat and his work, and systems that exist today. You had AI going through an AI winter, where there wasn’t a lot of progress in adoption. Then, as you mentioned, Hinton with ImageNet surprised everybody. You had Jeff Dean <senior fellow at Google> and Andrew Ng working on Google Brain. Pedro Domingos wrote this book, The Master Algorithm, where he talks about the five main schools of AI and argues that if you combine them, you can get more generalization. Judea Pearl has his mathematical causal model, and if you do that correctly, then you can get more generalization. What do you think about all of this? Do you think that there’s enough capability in all of the iterations that are occurring in deep learning that it can just get there? Or do you think it’s going to have to be a hybrid amalgamation that ultimately results in what you’re talking about?

You also mentioned all this progress in the language transformer models, GPT-3 with 175 billion parameters, but the Chinese came out with one that’s 1.7 trillion; there’s probably a GPT-x that’s over tens of trillions. In Jensen Huang’s <CEO NVIDIA> keynotes, Keith Strier, the head of AI, talks about models, at least on the language side, that are going to be over 100 trillion <parameters>. What does that mean? Can that be a factor? Can you provide a cohesive narrative of where you think all of this is going?

Pieter Abbeel

Just zooming out for a moment: the way research is done, research is about trying new things, coming up with new ideas that might be the next big breakthrough. It’s very natural, I think, for researchers to think, okay, maybe this deep learning thing is not everything, maybe we’re going to need other things. And, you know, depending on the researcher, they might be quietly trying to make progress on it, or they might want to declare, like, deep learning is not everything, we need other things. If you look at the past 10 years, 2012 was the ImageNet moment, right? And over those 10 years there have been a good number of people, a small fraction relative to the entire community, who like to speak up and say we will also need other things. That’s not a bad thing. But the truth is that in the last 10 years, every single breakthrough that has been of high impact in AI has been deep learning. There’s literally been no exception that I’m aware of; every headline, every major thing. Is it possible that something happened behind the scenes that is the seed for something new, that in the future will be really important? That’s definitely possible. I mean, Geoff Hinton was working on deep learning since the 60s, 70s, and it didn’t really make its breakthrough till 2012. So obviously, it’s possible something similar is happening. But I would say, if you look at the different frontiers, it’s just moving forward, moving forward, moving forward. In the last 10 years, there’s only been one thing: it has been deep learning all the way. That, of course, makes a lot of people even more excited and hopeful that maybe there’s something else that’ll come later. But it has always been interesting to see these conversations, right? When people say, I think we need something else, I think the natural counter-question has to be: what is something we cannot do today?
What is something we really don’t know how to do today, something you can point to: we could not do X? And then let’s see what gets there. That’s a very productive way to think about it, right? That’s where you turn it from being a skeptic, or maybe hopeful to come up with the next big thing that’s even bigger, into a concrete, constructive way to help us all make progress. And maybe 10 years ago, or even five years ago, people might have said: art. Art is something intrinsically human; there’s no way deep nets that we train are going to be able to do art. But if we look at today’s art creations from deep neural networks, you can just prompt them and say, you know, a bunny on the moon planting a US flag or something, and it’ll actually produce an artistic rendering of that. It’s really interesting that anytime people have mentioned specific things, fairly consistently they get knocked down and checked off with larger, more advanced deep learning. It’s not that deep learning hasn’t changed, right? The deep learning of 2012 is not exactly the same as the deep learning of today. Architectures have changed, best practices have changed; not to mention, the amount of data and compute now used has drastically changed. So right now, personally, I see challenges; I see things we can’t do yet with deep learning. But when I look at my research agenda, I think the most likely path, at least that I see now, to address some of the open challenges, such as a much more generalized intelligence system, is by building on what we have in deep learning and making it more capable, rather than coming from a completely different angle than what we have today. It seems there’s so much room for more. Think about it: today, we don’t have good video prediction models. It’s pretty clear to me that if we had, let’s say, a million times the compute we have today, we should end up with good video prediction models.
Mostly … pre-train our neural nets with video prediction. Well, what does that open up in terms of opportunities, relative to what we’ve seen in language in recent years? I think it’s going to be tremendous for things like robotics, with any kind of visual intelligence. And so to me, that’s the more natural path forward. I applaud the people who try a different path. I think it’s great; I think it’s important that not everybody follows the same path. But to me, there’s this path that seems so obvious, with so much headroom for the next, next, next thing, that I’m pretty excited to work on that path and see how far we can get.

Stephen Ibaraki

When I analyze your career, I see this interdisciplinary approach. Do you also look at this researcher from about 15 years ago…Yamanaka, who came up with his Yamanaka factors, where you can take adult cells and regress them into stem cells? Others have taken that work, and they’re growing organoids. These organoids are things like brain organoids, like mini brains. You can even see electrical patterns that mimic what you see in fetuses up to about 10 months, and then it stops. Why does it stop? It’s because it’s not getting the sensory information that a normal brain would get. There are these ethical questions: how far do we allow this to grow? Are you looking at some of that work to see what’s happening? Because you can get very granular now in the activity that’s occurring and try to mimic that in some fashion with deep learning algorithms. Or there is the work on the neuromorphic side. Last week, I interviewed Jack Dongarra, who won the Turing Award, and he talked about neuromorphic computing. At Oak Ridge they are looking at photonic technology together with their supercomputers and quantum computing and so on. They’re looking at applications in simulations that you mentioned, but very, very detailed simulations. Or what Jensen Huang is doing with his version of the metaverse, called the Omniverse, where you can simulate robotic action and then take it into a factory. Are you looking at some of that other work? The biological aspects, the attempt to put that biological stuff into silicon? And then maybe some of these other paradigms that are out there, like quantum computing, and supercomputing, which is at exascale. There are three <supercomputers> in the US that are exascale, a billion billion operations per second. Right?

Pieter Abbeel

When I think about research, it’s always a combination of looking at the tools you’re already working with and somehow trying to find inspiration to push them further. Pushing them further often means getting inspiration from related fields, or from observations of things humans can do, maybe at a certain age, that our AI systems cannot do at all. Like, wow, how can a two-year-old do this when our AI systems don’t even get close? So, for example, at Berkeley, I have some collaborations with psychologist Alison Gopnik, where we’re looking at how children explore when playing video games versus how our current reinforcement learning agents explore video games: what are the differences, and why are there these differences? And that’s at a higher level than their neuron activations; it’s at the behavioral level: what do we see occur in their behaviors? I think it’s super interesting to see if we can get anything from direct neuroscience and direct measurements of neuron activations, and so forth. Traditionally, that has been hard. What you’re alluding to is that these opportunities might start to open up. But traditionally, that has been very hard. And so traditionally, a lot of the inspiration for me, and as I see it for many others who have had breakthroughs in AI, has come at a slightly higher level of abstraction. It might be: it seems that in the cognitive development of a human or animal, this is the sequence of things that gets acquired; or evolutionarily, early on these were the things that were possible, and later these things became possible. So that might inspire the order in which we investigate things and get to certain levels. And another way to think of the other things you’re alluding to is that, indeed, the more compute cycles we have, likely the more progress we can make more quickly, because our experiments will be faster and larger, and so forth. And so there are really two axes. One is to get more compute.
The other one is to get more compute with less power, so we can put it in our devices more easily. And I think there are a lot of breakthroughs we can expect in the next several years, from existing companies like Nvidia <supported builds>… so many startups working in the computer chip space, trying to innovate and come up with new paradigms that are, I would say, more tailored to neural network compute. Because traditional computers are built for all kinds of purposes; they’re not built specifically with neural networks in mind. When your chip is not designed directly with neural networks in mind, it’s not going to be as optimized in how much compute it can produce when you want to run neural networks through it. I think that’s a pretty big trend we’re seeing that’s quite important for the field. One obvious thing is just the type of compute that happens: essentially, it’s just matrix multiplies, more or less, so specialize for that. And of course, LINPACK and so forth specialized on matrix multiplies; there’s a strong connection there with this year’s Turing Award winner <Jack Dongarra>. But then there are also questions like: do you need exact computation? With neural networks, it seems that maybe eight-bit calculations can at times be good enough; you don’t need 32 or 64 bit. Now, all of a sudden, you open up a lot more compute cycles, because you get away with fewer bits. Can you go to analog? I don’t know, I’m not an expert in that. But those are the kinds of interesting questions to ask, right? Can analog be a more efficient way to get these approximate calculations done, and so forth? Now, I should also highlight there’s a whole other side of the spectrum. It’s like, okay, where do we go? But there’s also: where are we today? And I think, looking at where we are today, we can already build many applications. We talked about it earlier a little bit. But there’s so much more we can do.
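The eight-bit point above can be illustrated with a small sketch: store values as int8 (range -128..127) at some scale, do the arithmetic on cheap integers, and rescale at the end. This is a hand-rolled illustration of the general quantization idea, not any particular chip’s or library’s scheme; the scales chosen here are arbitrary.

```python
# Sketch of 8-bit approximate compute: quantize floats to int8, do an
# integer dot product, rescale, and accept a small approximation error.
# Illustrative only; real quantization schemes pick scales per tensor.

def quantize(xs, scale):
    """Map floats to the int8 range [-128, 127] at the given scale."""
    return [max(-128, min(127, round(x / scale))) for x in xs]

def int8_dot(a, b, scale_a, scale_b):
    """Dot product computed entirely on int8 values, then rescaled."""
    qa, qb = quantize(a, scale_a), quantize(b, scale_b)
    return sum(x * y for x, y in zip(qa, qb)) * scale_a * scale_b

a = [0.5, -1.0, 0.25]
b = [1.0, 0.5, -0.75]
exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(a, b, scale_a=0.01, scale_b=0.01)
print(exact, approx)  # both close to -0.1875
```

The trade being made is exactly the one described: a narrower number format loses precision (and clips values beyond the scale’s range) in exchange for far cheaper arithmetic per operation.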
But there are also things we can already do today. Think about robotics: so far, we talked about it as, what’s the future? But I can also think about what’s today. Today, we have a company, Covariant, where we do exactly this. We can put a robot in a warehouse to go help out. And that is using today’s deep learning technology, a combination of supervised learning, unsupervised learning, and reinforcement learning, to do reliable pick and pack operations in a warehouse. And the reason I’m bringing it up is that, one, it has very big impact already today, but two, Stephen, to your earlier question about all these other ways of encoding knowledge, right, where people just hard-code knowledge: when we started Covariant, we told people we’re going to do 100% learning based; we’re going to learn. And they would say, well, but aren’t there things you can hard-code? Can’t you just say “if then else” this? And the “if then else” is correct, right? It’s usually correct. But when it’s wrong, how are you going to overrule it? Once you start putting in rules, then you have to put in another “if then else”, and another “if then else”, and at some point, things become unwieldy. The classical AI example is, of course, birds can fly. But then what if it’s a penguin? Or what if the bird injured its wings; then this bird can maybe not fly? So then, birds can fly except when they are penguins or have injured their wings? Things get very unwieldy very, very quickly, right? Is it still a bird if it cannot fly? These “if then else” rules have that issue when you logically try to constrain things. What we said at Covariant, which is part of our core philosophy: if we learn everything, you can always correct things by providing more data. If it doesn’t understand something right now, you provide more data evidencing the thing you want it to understand. You train it on additional data.
And it’ll absorb it, assuming the neural network is correctly architected and large enough to absorb the data, and so forth. And that’s the beauty. Because if you start hard-coding things, as you’re alluding to, there are some classical AI approaches that hard-code all kinds of facts about the world and so forth. You can actually use that as a data engine. You can say: this is part of my data. Anything you want to hard-code about the world, we’re not going to hard-code it; we’re going to use it as a data engine; it’s going to be part of our data. It’ll generate interesting data that our system can learn from, but the system is not going to take it as rules; it’s going to take it as data examples of things that can happen in the world. And it’ll see those examples, combine them with examples that come from the actual world, and from that internalize the most accurate model it can put together for what it’s faced with. In our case, that is robots doing pick and pack. But this philosophy is actually much more general: avoid this notion of hard-coding, but don’t completely throw out the things you could hard-code. You can still use them as data engines.
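The contrast between a rule cascade and a “rule as data engine” can be sketched in a few lines. The bird rule below, with its exception chain, and the example generator that wraps it are both invented for illustration; the point is that the same hand-written knowledge can either constrain a system directly or simply emit labeled examples for a learner to absorb alongside real-world data.

```python
# Illustrative sketch: a hand-coded rule with its exception cascade,
# reused not as a constraint but as a generator of labeled examples
# that a learned model could train on together with real-world data.

def bird_rule(bird):
    """The classic 'birds can fly' rule, patched with exceptions."""
    if bird["species"] == "penguin":
        return False
    if bird.get("injured_wing"):
        return False
    return True  # ...until the next exception forces another branch

def generate_examples(rule, candidates):
    """Use the rule as a data engine: emit (features, label) pairs."""
    return [(bird, rule(bird)) for bird in candidates]

candidates = [
    {"species": "sparrow"},
    {"species": "penguin"},
    {"species": "sparrow", "injured_wing": True},
]
for features, label in generate_examples(bird_rule, candidates):
    print(features, "can fly:", label)
```

In the learned setup, a wrong or incomplete rule is corrected the same way any other error is: by adding more data, rather than by growing the exception chain inside the rule itself.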

Stephen Ibaraki

You are the Director of the famous Robot Learning Lab and Co-Director of the Berkeley AI Research Lab, which is also world famous. I want to get more into Covariant; your past exit with Gradescope; your venture capital group, AIX Ventures; and then your Robot Brains Podcast. It’s really, really exciting what you’re doing at Berkeley. Can you map out where you see that going into the future, and some exciting things that you’re working on that people can relate to or that have practical applications?

Pieter Abbeel

I’m super excited about robots that combine learning from all the data on the internet with their own experiences, where most of the data will come from the internet, video, possibly also text, and a little bit of experience comes from themselves. Not a little bit because we’re constraining it; the robot should collect as much data itself as it possibly can. But even if the robot is collecting as much data as it possibly can, it’s going to be so much less than what’s available on the internet, collected by billions of people and posted there. It’s always going to be only a small fraction that the robot can collect on its own, relative to what’s on the internet. The ability to somehow learn from both, a single system that learns from both and has the ability to quickly acquire new skills, that’s really what’s always on my mind … think about the research at Berkeley. It’s how do we somehow pre-train a robot such that it’s ready to learn new things really, really quickly? Ideally, pretty much on its own when it’s learning the new things, but mostly pre-training from data that’s available online, because that’s the biggest data source we have today.

Stephen Ibaraki

I guess this gets into the one-shot, few-shot kind of concept, right? Small models as well. I see so many ramifications of this. You’re leading this work. You also have a community aspect; you share videos; you’re very much into open source. You’re really developing, and have developed, an amazing community out there, and you’re recognized for that in your ACM award as well. Let’s go now to your venture work and the fact that you’ve co-founded Covariant. Can you explain what you’re doing, and what’s your ultimate goal for Covariant? Then we’ll talk a little bit about Gradescope, which has already been acquired, and then I’ll get into your ventures, but those are sort of separate buckets. Let’s talk about these companies you’ve started or have exited. Are you thinking of creating more companies? What’s your mind frame in that area?

Pieter Abbeel

It’s always hard to predict the future, but I can share how the existing companies came about. Gradescope is a company that provides AI to help with grading of homework, exams, and projects. It’s such a pain point to grade students’ homework, exams, and projects, for instructors, for teachers, for teaching assistants, and so forth. This came about from just my teaching work at Berkeley… It just seemed from a different era, not what we should be doing today. If we just scan that work, or students just scan and upload, … let’s keep everything digital, or turn it into digital, and see what we can do. That’s what I built with a few of my students at the time. It has been acquired, as you alluded to, by a bigger education company, Turnitin; at this point, it still lives under its own brand and is used in many, many, many places to save people time. This was just an obvious pain point that needed to be addressed, both for my own life and for all my colleagues. Covariant is a bit different. Covariant is something that I see more as a very long time coming. It wasn’t like, oh my god, I’ve now run into this pain point, let’s address it. It’s just the notion that in the future, robots should be helping us with pretty much everything. There should be a robot organizing my laundry; there should be a robot cleaning my house; there should be a robot cooking a meal for me, maybe; there should be robots helping us build the things that we build in factories; there should be robots sorting through objects in warehouses and fulfilling orders; there should be robots doing so many things, freeing up our time to do other things. And the reality is, we’re not there. If you look around in your house right now, you probably don’t see a whole lot of robots; maybe if you’re lucky, there’s a Roomba roaming the floor, but that’s probably it. So why is that? And what are robots doing today?
Well, if you look at the majority of robots, what they’re doing is they are in car factories and electronics factories. And the reason they are there is that these are mass production places where the same thing gets built over and over and over. It’s literally the same thing, and these robots can go through the exact same pre-programmed motions every single time. And that’s enough to build the car. It’s a beautiful orchestration of pre-programmed motions. That is what’s happening. Even though it’s beautiful when you watch it, and amazing that they can do precision welding, assembly, and so forth, it’s just pre-programmed motions. If we have more capable AI, naturally we should be able to go well beyond pre-programmed motions. The robots of the future should be robots that look around, react to what they see, and understand how to act in that world. They understand what they’re faced with, how to react to it, and in exactly what way to achieve their goals. For these robots of the future, what’s missing is not the mechanics; what’s missing is the brain. That’s what we’re building at Covariant. We’re building the brain for the robots of the future. Now, with what I just described, you might naturally dream of robots that, you know, come into your home and take care of a lot of things. And that is part of the future. But at Covariant, we’re also focused near term. We’re building a very general brain for our robots, but we also want to build a business today, and from that expand the business to other use cases, of course. And we found that the most natural place for more intelligent robots to come into the world is warehouses. Warehouses, to many people, might sound like factories; factories and warehouses are often used in the same sentence, right? But warehouses are very different. Because in warehouses, let’s say you go to an Amazon warehouse, a Walmart warehouse, Target, and so forth.
There are hundreds of thousands of different SKUs (stock keeping units), the types of items that are stored, and this changes all the time: their packaging changes all the time, and so does exactly where they are located. Even within a bin, how the clutter is presented to the robot is always different. And so what that means is that a robot that just blindly executes pre-programmed motions is useless for manipulating objects in a warehouse. We need much smarter robots. And so that’s what we’re building at Covariant: robots that, through learning, have acquired the ability to do very reliable pick and pack in warehouses. When I say pick and pack, it turns out there are many, many incarnations of that. It’s a general skill that can be put to life in many specific applications: it can be sorting onto a conveyor; it can be singulation into, maybe, a pocket sorter; it can be fulfilling an order directly out of storage bins, and so forth. It can be the other way around: when things arrive in the warehouse, sorting through them and deciding where they have to be stored. So, a lot of different tasks, but at the core it’s pick and place, or pick and pack. And it’s a massive growth area. What we see is that there’s just a lot of demand to get automation into these warehouses. So that’s where, initially, we’re bringing in the Covariant brain and the Covariant robots, because we bring full solutions to these places, not just a brain in a box where you then have to figure out what to do with it. The customer gets a full solution; sometimes we deliver it fully ourselves, sometimes with partners, but either way, ultimately the customer has a full solution available to them that does whatever they need done at that location in their warehouse.

Stephen Ibaraki

Let’s get into your work as founding investment partner at AIX Ventures, which is a venture capital group. What’s your investment thesis? Do you invest very early on, or later on? Are you working with government, and so on, with your venture capital group?

Pieter Abbeel

The foundational hypothesis here is that AI is going to enable building so many new products and so many new companies that this is a really good place to be investing. AI can impact robotics, as I talked about with Covariant, but it can impact many other types of robotics, right? It can impact drone technologies; it can impact healthcare; it can impact work interfaces that we use to be more productive at the work we do; it can impact legal. It can really impact anything we do as humans, because what makes us special as humans is intelligence, and AI is an artificial version of intelligence. It can impact everything we do, with so many new product possibilities. It’s very natural to expect that there will be many, many really impactful, exciting companies emerging in the space where AI enables something new. And the “X” <AIX Ventures> here refers to: it could be anything; it’s AI enabling something else to be built. It doesn’t have to be that we’re just building AI for AI’s sake. In fact, probably very few companies will build AI for AI’s sake; it’s not clear that’s a natural business model. But AI for X, where X could be pretty much anything. Now, of course, you might wonder why I am spending time on this, right? Venture capital is a bit of a different world, though closely related, of course, to technology; it’s the investment side of it. What happened for me personally is that, over the years, through my activities as founder of Gradescope and Covariant, through the fact that there are more and more companies being founded in the AI space, and with my core technical expertise, my heart, being in AI, I ended up naturally becoming an advisor or early-stage angel investor in AI companies. So about a half year-ish ago, I was exchanging notes with my good friend Richard Socher, who also founded a couple of companies and was chief scientist at Salesforce for many years.
We were just exchanging notes on our angel investing, and we were like, hey, you know, maybe we should think of a way that we can do this together in a more structured way and allow other people to co-invest with us. The natural way to do it is to set up a venture fund, because people can put money into the venture fund, we put our own money into it also, of course, and then they’re effectively co-investing with us as we make investments. From there, of course, we realized we don’t want it to be just the two of us; the two of us are so busy, he has his company YOU.com, I have Covariant and Berkeley. We don’t have a ton of time to spend on this. We want to spend a small amount of time that’s very high leverage, where we can make a difference. But then we need to find other people who are full time, who can do all the other work that is also really critical in a venture firm. And so there are several other people at AIX Ventures who are full time, purely focused on AI ventures as their main job. Whereas Richard and I, as well as Anthony Goldbloom from Kaggle and Chris Manning from Stanford, the four of us are essentially focused on the high-leverage things, where you might spend, you know, five minutes here, 10 minutes there, being able to help a company with the right intro or the right insight. And so, for me, it’s a very, very rewarding thing to be doing, because it’s something I spend extremely little time on, but with the very little time I spend, I have extremely high impact. And that’s fun: you spend a very small amount of time, yet it has a ton of leverage, and it can be really helpful to people starting, you know, a new company in a very interesting space. Just a little bit of work here and there, a tiny bit of effort, as much as can be helpful, … for very high impact, which is always a lot of fun.

Stephen Ibaraki

<To tie up loose ends from prior comments made by Pieter, I talk about recommendations on analog computing with Hava Siegelmann; high bandwidth chip integration with Philip Wong at Taiwan Semiconductor (TSMC); and recommend the audience follow Pieter’s The Robot Brains Podcast>. The last question is: what recommendations do you want to leave with the audience?

Pieter Abbeel

That’s a very general, open-ended question; let’s see. I’ll give a recommendation a bit more specific to the AI space, and maybe to the younger generation, because it’s hard to give super general advice to everybody. I think AI is super exciting. One of the amazing things is that it doesn’t require a crazy amount of studying to get up to speed. Once you’re familiar with basic undergrad math, some linear algebra, some vector calculus at the undergrad level, and a couple of programming courses, Python is the go-to language in AI these days, so be familiar with Python programming. With very basic math, you can start diving in. That’s part of the beauty, and part of why we’re seeing an explosion of applications being built. If you want to be at the frontier of research, you have to put a bit more time into it, of course, to really understand where things are headed and where you can contribute. But even then, it’s not the kind of thing where it takes an inordinate amount of time to be able to start doing something interesting. Often, when undergraduates come to me at Berkeley to get involved in research, I’ll point them to a total of four courses: a basic intro to deep learning course; a more advanced deep learning course; then a reinforcement learning course; and an unsupervised learning course. With the combination of those four courses, they are really starting to be up to speed, able to understand a lot of what’s happening today and to build things themselves. That’s really fascinating: such a powerful technology is really learnable in a relatively short amount of time. It’s exciting days.
