In a paper published in the peer-reviewed scientific journal Nature last week, scientists at Google Brain introduced a deep reinforcement learning technique for floorplanning, the process of arranging the placement of different components of computer chips. The researchers managed to use the reinforcement learning technique to design the next generation of Tensor Processing Units, Google’s specialized artificial intelligence processors.
The use of software in chip design is not new. But according to the Google researchers, the new reinforcement learning model “automatically generates chip floorplans that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance and chip area.” And it does it in a fraction of the time it would take a human to do so.
The AI’s superiority to human performance has drawn a lot of attention. One media outlet described it as “artificial intelligence software that can design computer chips faster than humans can” and wrote that “a chip that would take humans months to design can be dreamed up by [Google’s] new AI in less than six hours.” Another outlet wrote, “The virtuous cycle of AI designing chips for AI looks like it’s only just getting started.”
But while reading the paper, what amazed me was not the intricacy of the AI system used to design computer chips but the synergies between human and artificial intelligence.
Analogies, intuitions, and rewards
The paper describes the problem as follows: “Chip floorplanning involves placing netlists onto chip canvases (two-dimensional grids) so that performance metrics (for example, power consumption, timing, area and wirelength) are optimized, while adhering to hard constraints on density and routing congestion.”
Basically, what you want to do is place the components in the optimal way. However, as with many other problems, finding optimal designs becomes more difficult as the number of components in a chip grows. Existing software helps speed up the process of discovering chip arrangements, but it falls short when the target chip grows in complexity.
The researchers decided to draw on the way reinforcement learning has solved other problems with very large search spaces, such as the game of Go. “Chip floorplanning is analogous [emphasis mine] to a game with varying pieces (for example, netlist topologies, macro counts, macro sizes and aspect ratios), boards (varying canvas sizes and aspect ratios) and win conditions (relative importance of different evaluation metrics or different density and routing congestion constraints),” the researchers wrote.
This is the manifestation of one of the most important and complex aspects of human intelligence: analogy. We humans can draw out abstractions from a problem we solve and then apply those abstractions to a new problem. While we take these skills for granted, they’re what makes us very good at transfer learning. This is why the researchers could reframe the chip floorplanning problem as a board game and tackle it in the same way that other scientists had solved the game of Go.
Deep reinforcement learning models can be especially good at searching very large spaces, a feat that is physically impossible with the computing power of the brain. But the scientists faced a problem that was orders of magnitude more complex than Go.
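For a rough sense of that gap, the paper compares the number of ways to place 1,000 clusters on a 1,000-cell grid (on the order of 1,000!) with the state space of Go (about 10^360). The snippet below is a minimal sketch of that arithmetic. The function and variable names are illustrative, and it works with base-10 logarithms so it never has to build the enormous numbers themselves.

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) using the log-gamma function, since lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(10)


# Exponent of the placement state space quoted in the paper: 1,000!
placement_exponent = log10_factorial(1000)

# Exponent of the Go state space quoted in the paper: 10^360
go_exponent = 360

# How many orders of magnitude separate the two search spaces
gap_in_orders_of_magnitude = placement_exponent - go_exponent

print(f"1000! is roughly 10^{math.floor(placement_exponent)}")
print(f"That is about {math.floor(gap_in_orders_of_magnitude)} orders of magnitude beyond Go")
```

The exponent comes out a little above 2,567, which is consistent with the paper's "greater than" bound for the placement problem.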
“[The] state space of placing 1,000 clusters of nodes on a grid with 1,000 cells is of the order of 1,000! (greater than 10^2500), whereas Go has a state space of 10^360,” the researchers wrote. The chips they wanted to design would be composed of millions of nodes.
They solved the complexity problem with an artificial neural network that could encode chip designs as vector representations, which made it much easier to explore the problem space. According to the paper, “Our intuition [emphasis mine] was that a policy capable of the general task of chip placement should also be able to encode the state associated with a new unseen chip into a meaningful signal at inference time. We therefore trained a neural network architecture capable of predicting reward on placements of new netlists, with the ultimate goal of using this architecture as the encoder layer of our policy.”
The term intuition is often used loosely. But it is a very complex and little-understood process that involves experience, unconscious knowledge, pattern recognition, and more. Our intuitions come from years of working in one field, but they can also be obtained from experiences in other areas. Fortunately, putting these intuitions to the test is becoming easier with the help of high-power computing and machine learning tools.
It’s also worth noting that reinforcement learning systems need a well-designed reward. In fact, some scientists believe that with the right reward function, reinforcement learning is enough to reach artificial general intelligence. Without the right reward, however, an RL agent can get stuck in endless loops, doing stupid and meaningless things. In one well-known example, an RL agent playing Coast Runners tries to maximize its points and abandons the main goal, which is to win the race.
Google’s scientists designed the reward for the floorplanning system as the “negative weighted sum of proxy wirelength, congestion and density.” The weights are hyperparameters they had to adjust during the development and training of the reinforcement learning model. With the right reward, the reinforcement learning model was able to take advantage of its compute power and find all kinds of ways to design floorplans that maximized the reward.
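To make the “negative weighted sum” concrete, here is a minimal sketch in Python. Several things in it are assumptions for illustration: the weight values, the toy netlist, and the use of half-perimeter wirelength (a common proxy in placement tools) as the wirelength estimate. The congestion and density numbers are simply passed in as made-up scores, since the article does not describe how they are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical values: in the paper these are hyperparameters tuned during development.
    wirelength: float = 1.0
    congestion: float = 0.5
    density: float = 0.5


def half_perimeter_wirelength(net, positions):
    """Width plus height of the bounding box around all components on one net."""
    xs = [positions[name][0] for name in net]
    ys = [positions[name][1] for name in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def proxy_wirelength(nets, positions):
    """Sum the bounding-box estimate over every net in the (toy) netlist."""
    return sum(half_perimeter_wirelength(net, positions) for net in nets)


def reward(wirelength, congestion, density, weights=RewardWeights()):
    """Negative weighted sum: lower cost on every term means a higher reward."""
    cost = (
        weights.wirelength * wirelength
        + weights.congestion * congestion
        + weights.density * density
    )
    return -cost


# Toy netlist: three components, each pair connected by a net (illustrative only).
nets = [("cpu", "cache"), ("cpu", "io"), ("cache", "io")]
compact = {"cpu": (0, 0), "cache": (1, 0), "io": (0, 1)}
spread = {"cpu": (0, 0), "cache": (5, 0), "io": (0, 5)}

# Made-up congestion and density scores, held equal so only wirelength differs.
compact_reward = reward(proxy_wirelength(nets, compact), congestion=0.2, density=0.3)
spread_reward = reward(proxy_wirelength(nets, spread), congestion=0.2, density=0.3)

print("compact placement reward:", compact_reward)
print("spread placement reward:", spread_reward)
```

With the other terms held equal, the compact placement earns the higher (less negative) reward. Changing the weights shifts which trade-offs the agent is pushed toward, which is why tuning them matters.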
Curated datasets
The deep neural network used in the system was developed using supervised learning. Supervised machine learning requires labeled data to adjust the parameters of the model during training. Google’s scientists created “a dataset of 10,000 chip placements where the input is the state associated with a given placement and the label is the reward for that placement.” To avoid manually creating every floorplan, the researchers used a mix of human-designed plans and computer-generated data. There’s not much information in the paper about how much human effort was involved in the evaluation of the algorithm-generated examples included in the training dataset. But without quality training data, supervised learning models will end up making poor inferences. In this sense, the AI system is different from other reinforcement learning programs such as AlphaZero, which developed its game-playing policy without the need for human input. In the future, the researchers might develop an RL agent that can design its own floorplans without the need for supervised learning components. But my guess is that, given the complexity of the problem, there’s a great chance that solving such problems will continue to require a combination of human intuition, machine learning, and high-performance computing.
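The shape of that supervised setup can be sketched in a few lines of Python. This is a toy stand-in and not the paper's architecture: the three-number state vector, the synthetic labels, the linear predictor, and the small dataset size (200 examples instead of 10,000) are all assumptions chosen to keep the example self-contained.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPlacement:
    state: tuple    # hypothetical feature vector describing one placement
    reward: float   # label: the reward computed for that placement


def make_dataset(n, seed=0):
    """Generate synthetic (state, reward) pairs from a made-up linear reward."""
    rng = random.Random(seed)
    hidden_weights = (1.0, 0.5, 0.5)  # illustrative only
    examples = []
    for _ in range(n):
        state = tuple(rng.random() for _ in hidden_weights)
        label = -sum(w * x for w, x in zip(hidden_weights, state))
        examples.append(LabeledPlacement(state, label))
    return examples


def predict(weights, state):
    """Linear reward predictor standing in for the neural network."""
    return sum(w * x for w, x in zip(weights, state))


def train(examples, epochs=50, learning_rate=0.1):
    """Fit the predictor with plain stochastic gradient descent on squared error."""
    weights = [0.0] * len(examples[0].state)
    for _ in range(epochs):
        for example in examples:
            error = predict(weights, example.state) - example.reward
            for i, x in enumerate(example.state):
                weights[i] -= learning_rate * error * x
    return weights


def mean_squared_error(weights, examples):
    return sum((predict(weights, e.state) - e.reward) ** 2 for e in examples) / len(examples)


training_set = make_dataset(200, seed=0)
held_out_set = make_dataset(50, seed=1)
learned_weights = train(training_set)

print("held-out error:", mean_squared_error(learned_weights, held_out_set))
```

The point of the sketch is the data layout: each example pairs a placement's state with the reward it earned, and the model learns to predict that reward for placements it has not seen. The quality of those labels bounds the quality of the predictor.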
Reinforcement learning design vs human design
Among the interesting aspects of the work presented by Google’s researchers is the layout of the chips. We humans use all kinds of shortcuts to overcome the limits of our brains. We can’t tackle complex problems in one big chunk. But we can design modular, hierarchical systems to divide and conquer complexity. Our ability to think and design top-down architectures has played a great part in developing systems that can perform very complicated tasks.
I’ll give an example from software engineering, my own area of expertise. In theory, you can write an entire program as one very large, contiguous stream of commands in a single file. But software developers never write their programs that way. We create software in small pieces (functions, classes, modules) that interact with each other through well-defined interfaces. We then nest those pieces into larger pieces and gradually create a hierarchy of components. You don’t need to read every line of a program to understand what it does. Modularity enables multiple programmers to work on a single program and several programs to reuse previously built components. Sometimes, just looking at the class architecture of a program is enough to point you in the right direction to locate a bug or find the right place to add an upgrade. We often trade speed for modularity and better design.
After a fashion, the same can be seen in the design of computer chips. Human-designed chips tend to have neat boundaries between different modules. On the other hand, the floorplans designed by Google’s reinforcement learning agent follow the path of least resistance, regardless of how the layout looks.
I’m interested to see whether this will become a sustainable model of design in the future or if it will require some type of compromise between highly optimized machine learning–generated designs and top-down order imposed by human engineers.
AI + human intelligence
As Google’s reinforcement learning–powered chip designer shows, innovations in AI hardware and software will continue to require abstract thinking, finding the right problems to solve, developing intuitions about the solutions, and choosing the right kind of data to validate the solutions. Those are the kinds of skills that better AI chips can enhance but not replace. At the end of the day, I don’t see this as a story of “AI outsmarting humans,” “AI creating smarter AI,” or AI developing “recursive self-improvement” capabilities. It is rather a manifestation of humans finding ways to use AI as a prop to overcome their own cognitive limits and extend their capabilities. If there’s a virtuous cycle, it’s one of AI and humans finding better ways to cooperate.
This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve.