Deep learning is everywhere. This branch of artificial intelligence curates your social media and serves your Google search results. Soon, deep learning could also check your vitals or set your thermostat.
MIT researchers have developed a system that could bring deep learning neural networks to new – and much smaller – places, like the tiny computer chips in wearable medical devices, household appliances, and the 250 billion other objects that constitute the IoT.
The system, called MCUNet, designs compact neural networks that deliver unprecedented speed and accuracy for deep learning on IoT devices, despite limited memory and processing power. The technology could facilitate the expansion of the IoT universe while saving energy and improving data security.
The Internet of Things
The IoT was born in the early 1980s. Grad students at Carnegie Mellon University, including Mike Kazar ’78, connected a Coca-Cola machine to the internet. The group’s motivation was simple: laziness.
They wanted to use their computers to confirm the machine was stocked before trekking from their office to make a purchase. It was the world’s first internet-connected appliance. “This was pretty much treated as the punchline of a joke,” says Kazar, now a Microsoft engineer. “No one expected billions of devices on the internet.”
Since that Coke machine, everyday objects have become increasingly networked into the growing IoT. That includes everything from wearable heart monitors to smart fridges that tell you when you’re low on milk.
IoT devices often run on microcontrollers – simple computer chips with no operating system, minimal processing power, and less than one thousandth of the memory of a typical smartphone. So pattern-recognition tasks like deep learning are difficult to run locally on IoT devices. For complex analysis, IoT-collected data is often sent to the cloud, making it vulnerable to hacking.
“How do we deploy neural nets directly on these tiny devices? It’s a new research area that’s getting very hot,” says Han. “Companies like Google and ARM are all working in this direction.” Han is too.
With MCUNet, Han’s group codesigned two components needed for “tiny deep learning” – the operation of neural networks on microcontrollers. One component is TinyEngine, an inference engine that directs resource management, akin to an operating system. TinyEngine is optimized to run a particular neural network structure, which is selected by MCUNet’s other component: TinyNAS, a neural architecture search algorithm.
Designing a deep network for microcontrollers isn’t easy. Existing neural architecture search techniques start with a big pool of possible network structures based on a predefined template, then they gradually find the one with high accuracy and low cost. While the method works, it’s not the most efficient.
“It can work pretty well for GPUs or smartphones,” says Lin. “But it’s been difficult to directly apply these techniques to tiny microcontrollers, because they are too small.”
So Lin developed TinyNAS, a neural architecture search method that creates custom-sized networks. “We have a lot of microcontrollers that come with different power capacities and different memory sizes,” says Lin. “So we developed the algorithm [TinyNAS] to optimize the search space for different microcontrollers.”
The customized nature of TinyNAS means it can generate compact neural networks with the best possible performance for a given microcontroller – with no unnecessary parameters. “Then we deliver the final, efficient model to the microcontroller,” says Lin.
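The idea of fitting the search to each chip can be illustrated with a minimal sketch. This is not the TinyNAS algorithm itself: the cost model, the accuracy proxy, and the candidate widths and resolutions below are all hypothetical stand-ins, chosen only to show how a memory budget shrinks the pool of candidates before the best one is picked.

```python
from itertools import product


def estimate_peak_memory_kb(width_mult, resolution):
    # Hypothetical cost model: activation memory grows linearly with channel
    # width and with the square of the input resolution (1 byte per value).
    base_channels = 16
    return base_channels * width_mult * resolution * resolution / 1024


def proxy_accuracy(width_mult, resolution):
    # Hypothetical stand-in for measured accuracy: larger networks score
    # higher, with diminishing returns.
    return 1.0 - 1.0 / (1.0 + width_mult * resolution / 64)


def search(memory_budget_kb,
           widths=(0.25, 0.5, 0.75, 1.0),
           resolutions=(48, 64, 96, 128)):
    # Keep only the candidates that fit the target chip's memory budget.
    feasible = [(w, r) for w, r in product(widths, resolutions)
                if estimate_peak_memory_kb(w, r) <= memory_budget_kb]
    if not feasible:
        return None  # nothing fits on this chip
    # Among the candidates that fit, pick the one with the best proxy score.
    return max(feasible, key=lambda cand: proxy_accuracy(*cand))


for budget in (320, 128, 64, 8):
    print(budget, "KB ->", search(budget))
```

Under these made-up numbers, each budget yields a differently shaped network, and a budget that is too small yields no network at all, which is the behavior the quote describes: the search adapts to the microcontroller rather than the other way around.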
To run that tiny neural network, a microcontroller also needs a lean inference engine. A typical inference engine carries some dead weight – instructions for tasks it may rarely run. The extra code poses no problem for a laptop or smartphone, but it could easily overwhelm a microcontroller.
“It doesn’t have off-chip memory, and it doesn’t have a disk,” says Han. “Everything put together is just one megabyte of flash, so we have to really carefully manage such a small resource.” Cue TinyEngine.
The researchers developed their inference engine in conjunction with TinyNAS. TinyEngine generates the essential code necessary to run TinyNAS’ customized neural network. Any deadweight code is discarded, which cuts down on compile time.
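A rough way to picture “keep only what is needed” is shown below. The kernel names and byte sizes are invented for illustration and are not taken from TinyEngine or any real inference library; the sketch only contrasts shipping every kernel with shipping the kernels one specific model actually uses.

```python
# Hypothetical kernel library: operator name -> compiled code size in bytes.
KERNEL_LIBRARY = {
    "conv2d": 12_000,
    "depthwise_conv2d": 9_000,
    "fully_connected": 4_000,
    "lstm": 20_000,
    "softmax": 1_500,
    "avg_pool": 2_000,
}


def required_kernels(model_ops):
    # A model may call the same operator many times; code is needed only once.
    return sorted(set(model_ops))


def binary_size(kernels):
    return sum(KERNEL_LIBRARY[name] for name in kernels)


# Hypothetical operator sequence for one searched network.
model_ops = ["conv2d", "depthwise_conv2d", "conv2d", "avg_pool", "fully_connected"]

generic = binary_size(KERNEL_LIBRARY)                 # ship everything
specialized = binary_size(required_kernels(model_ops))  # ship only what is used
print(generic, specialized)
```

Because the network is known ahead of time, the unused operators (here, `lstm` and `softmax`) never reach the chip.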
“We keep only what we need,” says Han. “And since we designed the neural network, we know exactly what we need. That’s the advantage of system-algorithm codesign.”
In the group’s tests of TinyEngine, the compiled binary code was between 1.9 and 5 times smaller than that of comparable microcontroller inference engines from Google and ARM.
TinyEngine also contains innovations that reduce runtime, including in-place depth-wise convolution, which cuts peak memory usage nearly in half. After codesigning TinyNAS and TinyEngine, Han’s team put MCUNet to the test.
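Why an in-place depth-wise convolution can cut peak memory nearly in half follows from one property: each output channel depends only on its own input channel. The sketch below is a plain-Python illustration of that idea, not TinyEngine’s implementation; the 3x3 kernel, zero padding, and function names are assumptions made for the example.

```python
def conv2d_same(channel, kernel):
    """3x3 convolution of one H x W channel, stride 1, zero padding."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += channel[yy][xx] * kernel[ky][kx]
            out[y][x] = acc
    return out


def depthwise_out_of_place(activations, kernels):
    # Builds a full output tensor while the input is still alive:
    # peak is roughly 2 * C * H * W values.
    return [conv2d_same(ch, k) for ch, k in zip(activations, kernels)]


def depthwise_in_place(activations, kernels):
    # One channel-sized scratch buffer is enough, because channel c of the
    # output needs only channel c of the input: peak is C * H * W + H * W.
    for c, kernel in enumerate(kernels):
        scratch = conv2d_same(activations[c], kernel)
        activations[c] = scratch  # the input channel is replaced by its result
    return activations


def peak_values(channels, height, width, in_place):
    per_channel = height * width
    if in_place:
        return channels * per_channel + per_channel
    return 2 * channels * per_channel
```

For a hypothetical layer with 16 channels of 32 x 32 values, `peak_values` gives 32,768 values out of place and 17,408 in place, a ratio of about 0.53. The more channels a layer has, the closer the saving gets to one half.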
MCUNet’s first challenge was image classification. The researchers used the ImageNet database to train the system with labeled images, then to test its ability to classify novel ones. On a commercial microcontroller they tested, MCUNet successfully classified 70.7 percent of the novel images — the previous state-of-the-art neural network and inference engine combo was just 54 percent accurate. “Even a 1 percent improvement is considered significant,” says Lin. “So this is a giant leap for microcontroller settings.”
The team found similar results in ImageNet tests of three other microcontrollers. And on both speed and accuracy, MCUNet beat the competition for audio and visual “wake-word” tasks, where a user initiates an interaction with a computer using vocal cues (think: “Hey, Siri”) or simply by entering a room. The experiments highlight MCUNet’s adaptability to numerous applications.
The promising test results give Han hope that MCUNet will become the new industry standard for microcontrollers. “It has huge potential,” he says.
The advance “extends the frontier of deep neural network design even farther into the computational domain of small energy-efficient microcontrollers,” says Kurt Keutzer, a computer scientist at the University of California at Berkeley, who was not involved in the work. He adds that MCUNet could “bring intelligent computer-vision capabilities to even the simplest kitchen appliances, or enable more intelligent motion sensors.”
MCUNet could also make IoT devices more secure. “A key advantage is preserving privacy,” says Han. “You don’t need to transmit the data to the cloud.”
Analyzing data locally reduces the risk of personal information being stolen — including personal health data. Han envisions smart watches with MCUNet that don’t just sense users’ heartbeat, blood pressure, and oxygen levels, but also analyze and help them understand that information.
MCUNet could also bring deep learning to IoT devices in vehicles and rural areas with limited internet access.
Plus, MCUNet’s slim computing footprint translates into a slim carbon footprint. “Our big dream is for green AI,” says Han, adding that training a large neural network can burn carbon equivalent to the lifetime emissions of five cars. MCUNet on a microcontroller would require a small fraction of that energy.
“Our end goal is to enable efficient, tiny AI with less computational resources, less human resources, and less data,” says Han.
Artificial Intelligence (or, if you prefer, Machine Learning) is today’s hot buzzword. Unlike many buzzwords that have come before it, though, this stuff isn’t vaporware dreams: it’s real, it’s here already, and it’s changing your life whether you realize it or not.
A quick overview of AI/ML
Before we go too much further, let’s talk quickly about that term “Artificial Intelligence.” Yes, it’s warranted; no, it doesn’t mean KITT from Knight Rider, or Samantha, the all-too-human unseen digital assistant voiced by Scarlett Johansson in 2013’s Her. Aside from being fictional, KITT and Samantha are examples of strong artificial intelligence, also known as Artificial General Intelligence (AGI). On the other hand, artificial intelligence (without the “strong” or “general” qualifiers) is an established academic term dating back to the 1955 proposal for the Dartmouth Summer Research Project on Artificial Intelligence (DSRPAI), written by Professors John McCarthy and Marvin Minsky.
All “artificial intelligence” really means is a system that emulates problem-solving skills normally seen in humans or animals. Traditionally, there are two branches of AI—symbolic and connectionist. Symbolic means an approach involving traditional rules-based programming—a programmer tells the computer what to expect and how to deal with it, very explicitly. The “expert systems” of the 1980s and 1990s were examples of symbolic (attempts at) AI; while occasionally useful, it’s generally considered impossible to scale this approach up to anything like real-world complexity.
Artificial Intelligence in the commonly used modern sense almost always refers to connectionist AI. Connectionist AI, unlike symbolic AI, isn’t directly programmed by a human. Artificial neural networks are the most common type of connectionist AI, also sometimes referred to as machine learning. My colleague Tim Lee just got done writing about neural networks last week—you can get caught up right here.
If you wanted to build a system that could drive a car, instead of programming it directly you might attach a sufficiently advanced neural network to its sensors and controls, and then let it “watch” a human driving for tens of thousands of hours. The neural network begins to attach weights to events and patterns in the data flow from its sensors that allow it to predict acceptable actions in response to various conditions. Eventually, you might give the network conditional control of the car’s controls and allow it to accelerate, brake, and steer on its own—but still with a human available. The partially trained neural network can continue learning in response to when the human assistant takes the controls away from it. “Whoops, shouldn’t have done that,” and the neural network adjusts weighted values again.
Sounds very simple, doesn’t it? In practice, not so much: there are many different types of neural networks (simple, convolutional, generative adversarial, and more), and none of them is very bright on its own; the brightest is roughly similar in scale to a worm’s brain. Most complex, really interesting tasks will require networks of neural networks that preprocess data to find areas of interest, pass those areas of interest on to other neural networks trained to more accurately classify them, and so forth.
One last piece of the puzzle is that, when dealing with neural networks, there are two major modes of operation: inference and training. Training is just what it sounds like—you give the neural network a large batch of data that represents a problem space, and let it chew through it, identifying things of interest and possibly learning to match them to labels you’ve provided along with the data. Inference, on the other hand, is using an already-trained neural network to give you answers in a problem space that it understands.
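The split between the two modes can be seen in a toy example. The sketch below uses a single artificial neuron (a perceptron) rather than a real deep network, and the logical-AND data set is chosen only because it is tiny; the function names are illustrative.

```python
def predict(weights, bias, inputs):
    # Inference: apply already-learned weights to new inputs.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(examples, epochs=20, learning_rate=1):
    # Training: repeatedly compare predictions with the provided labels
    # and nudge the weights whenever the prediction is wrong.
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Labeled data: the output is 1 only when both inputs are 1.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND_EXAMPLES)   # the expensive, one-time step
print(predict(weights, bias, (1, 1)))  # inference with the trained weights
print(predict(weights, bias, (0, 1)))
```

Training loops over the labeled data many times and changes the weights; inference is a single cheap pass that leaves them untouched.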
Both inference and training workloads can operate several orders of magnitude more rapidly on GPUs than on general-purpose CPUs—but that doesn’t necessarily mean you want to do absolutely everything on a GPU. It’s generally easier and faster to run small jobs directly on CPUs rather than invoking the initial overhead of loading models and data into a GPU and its onboard VRAM, so you’ll very frequently see inference workloads run on standard CPUs.
The last decade has seen remarkable improvements in the ability of computers to understand the world around them. Photo software automatically recognizes people’s faces. Smartphones transcribe spoken words into text. Self-driving cars recognize objects on the road and avoid hitting them.
Underlying these breakthroughs is an artificial intelligence technique called deep learning. Deep learning is based on neural networks, a type of data structure loosely inspired by networks of biological neurons. Neural networks are organized in layers, with outputs from one layer connected to inputs of the next layer.
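A layered network can be sketched in a few lines: each layer computes weighted sums of what it receives and hands its results to the layer after it. The weights, the ReLU activation, and the two-layer shape below are arbitrary choices for illustration, not values from any trained model.

```python
def relu(value):
    # A common activation function: pass positives through, clip negatives.
    return max(0.0, value)


def dense(inputs, weights, biases):
    # One layer: every neuron takes a weighted sum of all inputs plus a bias.
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    # Each layer's outputs become the next layer's inputs.
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Two layers: 2 inputs -> 2 hidden neurons -> 1 output neuron.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-0.5]),
]

print(forward([2.0, 1.0], LAYERS))  # [2.0]
```

A “deep” network is the same structure with many more layers and far more neurons per layer.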
Computer scientists have been experimenting with neural networks since the 1950s. But two big breakthroughs—one in 1986, the other in 2012—laid the foundation for today’s vast deep learning industry. The 2012 breakthrough—the deep learning revolution—was the discovery that we can get dramatically better performance out of neural networks with not just a few layers but with many. That discovery was made possible thanks to the growing amount of both data and computing power that had become available by 2012.