17 Dec 18
Forecasting musical taste. Detecting metastatic tumors. Generating synthetic scans of brain cancer. Creating virtual environments from real-world videos. Identifying victims of human trafficking. Defeating chess grandmasters and expert Dota 2 esports teams. And taking the wheel from human taxi drivers.
That’s just a sampling of artificial intelligence (AI) systems’ achievements in 2018, and evidence of how rapidly the field is advancing. At the current pace of change, analysts at the McKinsey Global Institute predict that, in the U.S. alone, AI will help to capture 20-25 percent in net economic benefits (equating to $13 trillion globally) in the next 12 years.
Some of the most impressive work has arisen from the study of deep neural networks (DNNs), a category of machine learning architecture based on data representations. They’re loosely modeled on the brain: DNNs comprise artificial neurons (i.e., mathematical functions) connected with synapses that transmit signals to other neurons. Said neurons are arranged in layers, and those signals — the product of data, or inputs, fed into the DNN — travel from layer to layer and slowly “tune” the DNN by adjusting the synaptic strength — weights — of each neural connection. Over time, after hundreds or even millions of cycles, the network extracts features from the dataset and identifies trends across samples, eventually learning to make novel predictions.
It was only three decades ago that a foundational weight-calculating technique, backpropagation, was detailed in a landmark paper (“Learning Representations by Back-propagating Errors”) authored by David Rumelhart, Geoffrey Hinton, and Ronald Williams. Backpropagation, aided by ever-cheaper, more robust computer hardware, has enabled monumental leaps in computer vision, natural language processing, machine translation, drug design, and material inspection, where some DNNs have produced results superior to those of human experts.
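To make the idea of layers, weights, and backpropagation concrete, here is a minimal sketch in plain Python. It is an illustration only, not code from the 1986 paper or from any DNN library: the function name `train_tiny_network`, the three hidden neurons, the learning rate, and the logical-OR toy dataset are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_tiny_network(samples, hidden=3, epochs=5000, lr=0.5, seed=0):
    """Train a one-hidden-layer network; return (first_loss, last_loss, predict)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # The weights (the "synaptic strengths") start as small random numbers.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Signals travel from the input layer to the hidden layer to the output.
        h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
             for j in range(hidden)]
        y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
        return h, y

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            h, y = forward(x)
            total += 0.5 * (y - target) ** 2
            # Backward pass: the chain rule carries the output error back
            # to every weight that contributed to it.
            delta_out = (y - target) * y * (1 - y)
            delta_hidden = [delta_out * w2[j] * h[j] * (1 - h[j])
                            for j in range(hidden)]
            for j in range(hidden):
                w2[j] -= lr * delta_out * h[j]
                b1[j] -= lr * delta_hidden[j]
                for i in range(n_in):
                    w1[j][i] -= lr * delta_hidden[j] * x[i]
            b2 -= lr * delta_out
        losses.append(total / len(samples))
    return losses[0], losses[-1], lambda x: forward(x)[1]


# Toy dataset (an assumption for illustration): the logical OR of two inputs.
OR_GATE = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
first_loss, last_loss, predict = train_tiny_network(OR_GATE)
print(first_loss, last_loss)
```

Each pass over the four samples nudges the weights a little, which is the slow “tuning” described above; real DNNs do the same thing with millions of weights and far larger datasets.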
The challenges of AGI
So are DNNs the harbinger of superintelligent robots? Demis Hassabis doesn’t believe so — and he would know. He’s the cofounder of DeepMind, a London-based machine learning startup founded with the mission of applying insights from neuroscience and computer science toward the creation of artificial general intelligence (AGI) — in other words, systems that could successfully perform any intellectual task that a human can.
“There’s still much further to go,” he told VentureBeat at the NeurIPS 2018 conference in Montreal in early December. “Games or board games are quite easy in some ways because the transition model between states is very well-specified and easy to learn. Real-world 3D environments and the real world itself is much more tricky to figure out … but it’s important if you want to do planning.”
Hassabis, a chess prodigy and University of Cambridge graduate who early in his career worked as lead programmer on the video games Theme Park and Black & White, studied neuroscience at University College London, the Massachusetts Institute of Technology, and Harvard University, where he coauthored research on the autobiographical memory and episodic memory systems. In 2010 he cofounded DeepMind, which only three years later unveiled a pioneering AI system that whizzed through Atari games using only raw pixels as inputs.
In the years since Google purchased DeepMind for £400 million, it and its medical research division, DeepMind Health, have dominated headlines with AlphaGo — an AI system that bested world champion Lee Sedol at the Chinese game Go — and an ongoing collaboration with the University College London Hospital that’s produced models exhibiting “near-human performance” on CT scan segmentation. More recently, DeepMind researchers debuted a protein-folding algorithm — AlphaFold — that nabbed first prize in the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP) by successfully identifying the most accurate structure for 25 out of 43 proteins. And this month, DeepMind published a paper in the journal Science showing that its AlphaZero system, a spiritual successor to AlphaGo, can play three different games — chess, a Japanese variant of chess called shogi, and Go — well enough to beat celebrated human players.
Despite DeepMind’s impressive achievements, Hassabis cautions that they by no means suggest AGI is around the corner — far from it. Unlike the AI systems of today, he says, people draw on intrinsic knowledge about the world to perform prediction and planning. Compared to even novices at Go, chess, and shogi, AlphaGo and AlphaZero are at a bit of an information disadvantage.
“These [AI] systems [are] learning to see, first of all, and then they’re learning to play,” Hassabis said. “Human players can learn [to play something like an] Atari game much more quickly … than an algorithm can [because] they … can ascribe motifs to … pixels quite quickly to identify if it’s something they need to run away from or go towards.”
To get models like AlphaZero to beat a human, it takes somewhere in the ballpark of 700,000 training steps — each step representing 4,096 board positions — on a system with thousands of Google-designed application-specific chips optimized for machine learning. That equates to about 9 hours of training for chess, 12 hours of training for shogi, and 13 days for Go.
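A quick back-of-the-envelope calculation puts those figures in perspective. The sketch below uses only the numbers quoted above; the function names are hypothetical, and the total counts positions processed, which are not necessarily distinct positions.

```python
TRAINING_STEPS = 700_000     # ballpark figure quoted in the article
POSITIONS_PER_STEP = 4_096   # board positions per step, as quoted


def total_positions(steps=TRAINING_STEPS, per_step=POSITIONS_PER_STEP):
    """Board positions processed over the whole training run."""
    return steps * per_step


def steps_per_second(hours, steps=TRAINING_STEPS):
    """Average training steps per second for a run lasting `hours`."""
    return steps / (hours * 3600)


print(f"{total_positions():,}")        # 2,867,200,000
print(round(steps_per_second(9), 1))   # chess, 9 hours: 21.6
```

In other words, roughly 2.9 billion board positions pass through the system during training, far more than any human player sees in a lifetime.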
DeepMind isn’t the only one contending with the limitations of current AI design.
In a blog post earlier this year, OpenAI (a nonprofit San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, and Peter Thiel, among other tech luminaries) peeled back the curtains on OpenAI Five, the bot responsible for beating a five-person team that included four professional Dota 2 players this summer. It plays 180 years’ worth of games every day (80 percent against itself and 20 percent against past selves), the organization said, on a whopping 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores on Google’s Cloud Platform. Even after all that training, it struggles to apply skills it’s acquired to tasks beyond a specific game.
“We don’t have systems that can … transfer in an efficient way knowledge they have from one domain to the next. I think you need things like concepts or extractions to do that,” Hassabis said. “Building models against games is relatively easy, because it’s easy to go from one step to another, but we would like to be able to imbue … systems with generative model capabilities … which would make it easier to do planning in those environments.”
Most AI systems today also don’t scale very well. AlphaZero, AlphaGo, and OpenAI Five leverage a type of machine learning known as reinforcement learning, in which an AI-controlled software agent learns to take actions in an environment (a board game, for example, or a multiplayer online battle arena, or MOBA) to maximize a reward.
It’s helpful to imagine a system of Skinner boxes, said Hinton in an interview with VentureBeat. Skinner boxes — which derive their name from pioneering Harvard psychologist B. F. Skinner — make use of operant conditioning to train subject animals to perform actions, such as pressing a lever, in response to stimuli, like a light or sound. When the subject performs a behavior correctly, they receive some form of reward, often in the form of food or water.
The problem with reinforcement learning methods in AI research is that the reward signals tend to be “wimpy,” Hinton said. In some environments, agents become stuck looking for patterns in random data — the so-called “noisy TV problem.”
“Every so often you get a scalar signal that tells you that you did good, and it’s not very often, and there’s not very much information, and you’d like to train the system with millions of parameters or trillions of parameters just based on this very wimpy signal,” he said. “What you [can] do is use a vast amount of computation — a lot of the impressive demos rely on vast amounts of computation. That’s one direction, [but] it doesn’t really appeal to me. I think what [researchers] need is better insights.”
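The sparse scalar signal Hinton describes can be seen in even the smallest reinforcement learning setup. The sketch below uses tabular Q-learning, a textbook method that is far simpler than the systems behind AlphaZero or OpenAI Five. The five-cell corridor, the single reward at the far end, the purely random exploration, and all names and parameter values are assumptions made for illustration.

```python
import random

N_STATES = 5         # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)   # index 0 moves left, index 1 moves right


def step(state, action):
    """Deterministic corridor: reward 1.0 only on reaching the goal, else 0.0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    """Learn action values from a reward that arrives once per episode at most."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Purely random behavior; Q-learning still learns the greedy values.
            a = rng.randrange(2)
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in every non-goal cell according to the learned values."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]


q_table = q_learning()
print(greedy_policy(q_table))
```

The agent receives a nonzero reward only on the final move of an episode, and every other update has to be inferred by passing that one number backward through the table. With five cells that is manageable; with millions of parameters it becomes the “wimpy signal” problem.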
Like Hassabis, Hinton knows what he’s talking about. He has spent the past 30 years tackling a few of AI’s biggest challenges, now divides his time between the Google Brain deep learning research team and the University of Toronto, and has been referred to by some as the “Godfather of Deep Learning.” In addition to his seminal work in DNNs, Hinton has authored or coauthored over 200 peer-reviewed publications in machine learning, perception, memory, and symbol processing, and he’s relatively recently turned his attention to capsule neural networks, machine learning systems containing structures that help build more stable representations.
He says that collective decades of research have convinced him that the way to solve reinforcement learning’s scalability problem is to amplify the signal with a hierarchical architecture.
“Suppose you have a big … organization, and the reinforcement signal comes in at the top, and the CEO gets told the company made lots of profits this year — that’s his reinforcement signal,” Hinton explained. “And let’s say it comes in once a quarter. That’s not much signal to train a whole big hierarchy of people to do [a couple of tasks], but if the CEO has a few vice presidents and gives each vice president a goal in order to maximize his reward … that’ll lead to more profits and he’ll get rewarded.”
In this arrangement, even when the reward doesn’t come in — perhaps because the analogical CEO gave a vice president the wrong goal — the cycle will continue, Hinton said. Vice presidents always learn something, and those somethings are likely to become useful in the future eventually.
“By creating subgoals, and paying off people to achieve these subgoals, you can magnify these wimpy signals by creating many more wimpy signals,” he added.
It’s a deceptively complex thought experiment. Those vice presidents, as it were, need a channel (mid-level and low-level managers, in the analogy) to communicate the goals, subgoals, and associated reward conditions. Each “employee” in the system needs to be able to decide whether they did the right thing, so that they know why they’re being rewarded. And so they need a language system.
“It’s a problem of getting systems where modules create subgoals for other modules,” Hinton said. “You can think of a shepherd with a sheepdog. They create languages which aren’t in English, and a well-trained sheepdog and a shepherd can communicate incredibly well. But imagine if the sheepdog had its own sheepdogs. Then it would have to take what comes from the person, in these gestures and so on, and it would have to make up ways of talking to the sub-sheepdogs.”
Fortunately, a recent AI breakthrough dubbed Transformers could be a step in the right direction.
In a blog post and accompanying paper last year (“Attention Is All You Need”), Google researchers introduced a new type of neural architecture, the aforementioned Transformer, capable of outperforming state-of-the-art models in language translation tasks, all while requiring less computation to train.
Building on its work in Transformers, Google in November open-sourced Bidirectional Encoder Representations from Transformers, or BERT. BERT learns to model relationships between sentences by pretraining on a task that can be generated from any corpus, and enables developers to train a “state-of-the-art” NLP model in 30 minutes on a single Cloud TPU (tensor processing unit, Google’s cloud-hosted accelerator hardware) or a few hours on a single graphics processing unit.
“Transformers … [are] neural nets in which you have routing,” Hinton explained. “Currently in neural nets, you have the activities that change fast, the weights that change slowly, and that’s it. Biology is telling you what you want to do is have activities that change fast, and then you want to modify synapses at many different timescales so that you can have a memory for what happened recently … [and] easily recover that. [With Transformers], a group of neurons figures out something, and it doesn’t just send it to everybody it’s connected to — it sort of figures out to send it to those guys there who know how to deal with it and not those guys over there who don’t know how to deal with it.”
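The routing Hinton alludes to corresponds to the attention operation at the heart of the Transformer paper, which weights each stored value by how well its key matches a query: softmax(QKᵀ / √d) V. The sketch below shows that operation for a single query in plain Python. It leaves out the learned projections, multiple heads, and everything else a real Transformer adds, and the function names and toy vectors are assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim_v)]
    return weights, output


# Toy example: the query points the same way as the first key,
# so most of the output is drawn from the first value.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
weights, output = attention([4.0, 0.0], keys, values)
print(weights, output)
```

The weights decide where information flows: the value whose key best matches the query dominates the output, while poorly matching ones are almost ignored.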
It’s not a new idea. Hinton pointed out that, in the 1970s, most of the work on neural nets focused on memory, with the goal of storing information by modifying weights so it could be recreated rather than simply pulled from some form of storage.
“You don’t actually store [the information] literally like you would in a filing cabinet; you modify parameters such that if I give you a little bit of a thing, you can fill in the rest, much like making a dinosaur out of a few fragments,” he said. “All I’m saying is that we should use that idea for short-term memory, and not just for long-term memory, and it will solve all sorts of problems.”
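One classic way to see memory stored in weights rather than in a filing cabinet is a Hopfield-style associative memory. The sketch below is a stand-in chosen for illustration, not necessarily the specific models Hinton has in mind: it stores two patterns by adjusting weights, then rebuilds one of them from a corrupted cue. The patterns and function names are assumptions.

```python
def store(patterns):
    """Hebbian storage: each weight sums the products of the two units it links."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, cue, steps=5):
    """Repeatedly set every unit to the sign of its weighted input."""
    state = list(cue)
    n = len(state)
    for _ in range(steps):
        state = [1 if sum(w[i][j] * state[j] for j in range(n)) >= 0 else -1
                 for i in range(n)]
    return state


PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]
weights = store([PATTERN_A, PATTERN_B])

# Corrupt the first element of PATTERN_A and let the network fill it back in.
damaged = [-1, 1, 1, 1, -1, -1, -1, -1]
print(recall(weights, damaged) == PATTERN_A)  # True
```

Nothing in `weights` is a literal copy of either pattern, yet a fragment is enough to recover the whole, which is the dinosaur-from-fragments idea in miniature.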
AI and bias
Projecting ahead a bit, Hinton believes that, taking a page from biology, AI systems of the future will be mostly of the unsupervised variety. Unsupervised learning, a branch of machine learning that gleans knowledge from unlabeled, unclassified, and uncategorized data, is almost humanlike in its ability to learn commonalities and react to their presence or absence, he says.
“In general, people don’t have labeled data. It’s not like you see a scene, and then someone puts a microelectrode into your inferior temporal cortex and says, ‘This is the one that should go ping,’” he said. “I think that’s a much more biological way to do learning … That’s mostly what the brain does.”
“We [at DeepMind are] working toward a kind of neuroscience roadmap with the cognitive abilities we think are going to be required in order to have a fully functional human-level AI system,” Hassabis said, “capable of transfer learning, conceptual knowledge, maybe creativity in some sense, imagining future scenarios, counterfactuals and planning for the future, language usage, and symbolic reasoning. These are all things that humans do effortlessly.”
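For a sense of what learning without labels looks like in practice, consider k-means clustering, one of the simplest unsupervised methods. The sketch below is only an illustration and not a method Hinton names: it groups a handful of numbers into two clusters without ever being told which number belongs where. The data, the initialization scheme, and the function name are assumptions, and the function expects at least two clusters.

```python
def kmeans_1d(points, k=2, iterations=10):
    """Plain k-means on numbers: no labels, just repeated assign-and-average."""
    ordered = sorted(points)
    # Assumed initialization: spread the starting centroids across the data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of the points assigned to it.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


readings = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(readings)
print(centroids, clusters)
```

The algorithm discovers the two groups purely from the commonalities in the data, which is the property that makes unsupervised methods attractive when labels are scarce.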
As AI becomes increasingly sophisticated, however, there’s a concern among some technologists and ethicists that it will absorb and reflect biases present in available training data. In fact, there’s evidence that has already happened.
AI research scientists at Google recently set loose a pretrained AI model on a freely available, open source dataset. One photo — a Caucasian bride in a Western-style, long and full-skirted wedding dress — resulted in labels like “dress,” “women,” “wedding,” and “bride.” However, another image — also of a bride, but of Asian descent and in ethnic dress — produced labels like “clothing,” “event,” and “performance art.” Worse, the model completely missed the person in the image.
Meanwhile, in a pair of studies commissioned by The Washington Post in July, smart speakers made by Amazon and Google were 30 percent less likely to understand non-American accents than those of native-born speakers. And corpora like Switchboard, a dataset used by companies such as IBM and Microsoft to gauge the error rates of voice models, have been shown to skew toward users from particular regions of the country.
Computer vision algorithms haven’t fared much better on the bias front.
A study published in 2012 showed that facial algorithms from vendor Cognitec performed 5 to 10 percent worse on African Americans than on Caucasians. More recently, it was revealed that a system deployed by London’s Metropolitan Police produces as many as 49 false matches for every hit. And in a test this summer of Amazon’s Rekognition service (the accuracy of which the Seattle company disputes), the American Civil Liberties Union demonstrated that, when fed 25,000 mugshots from a “public source” and tasked with comparing them to official photos of Congressional members, the service misidentified 28 lawmakers as criminals.
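Those ratios are easier to compare as percentages. The short calculation below uses the figures quoted above plus one number the article does not state: 535, the count of voting members of Congress, which is assumed here to be the size of the photo set in the ACLU test. The function names are hypothetical.

```python
def false_match_share(false_matches_per_hit):
    """Share of all matches that are false, given false matches per true hit."""
    return false_matches_per_hit / (false_matches_per_hit + 1)


def misidentified_percent(misidentified, total):
    """Percentage of people in the photo set who were wrongly matched."""
    return round(100 * misidentified / total, 1)


VOTING_MEMBERS_OF_CONGRESS = 535  # assumption: not a figure from the article

print(false_match_share(49))                                   # 0.98
print(misidentified_percent(28, VOTING_MEMBERS_OF_CONGRESS))   # 5.2
```

Put that way, as many as 98 percent of the Metropolitan Police system’s matches would be false, and, under the assumption above, roughly one lawmaker in 20 was wrongly flagged in the ACLU test.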
Hinton, for his part, isn’t discouraged by the negative press. He contends that a clear advantage of AI is the flexibility it affords — and the ease with which biases in the data can be modeled.
“Anything that learns from data is going to learn all the biases in the data,” he said. “The good news is that, if you can model [biases in the] data, you can … counteract them pretty effectively. There’s all sorts of ways of doing that.”
That doesn’t always work with humans, he pointed out.
“If you have people doing the jobs, you can try and model their biases; telling them not to be biased doesn’t quite work [like] subtracting the biases. So I think it’ll be much easier in a machine learning system … to deal with [bias].”
To Hinton’s point, an emerging class of bias mitigation tools promises to usher in more impartial AI systems.
In May, Facebook announced Fairness Flow, which automatically warns if an algorithm is making an unfair judgment about a person based on his or her race, gender, or age. Accenture released a toolkit that automatically detects bias in AI algorithms and helps data scientists mitigate that bias. Microsoft launched a solution of its own in May, and in September, Google debuted the What-If Tool, a bias-detecting feature of the TensorBoard web dashboard for its TensorFlow machine learning framework.
IBM, not to be outdone, in the fall released AI Fairness 360, a cloud-based, fully automated suite that “continually provides [insights]” into how AI systems are making their decisions and recommends adjustments — such as algorithmic tweaks or counterbalancing data — that might lessen the impact of prejudice. And recent research from its Watson and Cloud Platforms group has focused on mitigating bias in AI models, specifically as they relate to facial recognition.
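Counterbalancing data can be as simple as reweighting samples. The sketch below illustrates reweighing, a generic preprocessing technique in which each sample is weighted by P(group) × P(label) / P(group, label) so that group membership and outcome become statistically independent in the weighted data. It is not the API of any of the tools named above, and the toy data and function names are assumptions.

```python
from collections import Counter


def reweigh(groups, labels):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights)
                   if g == group and y == 1)
    return positive / total


# Toy data: group "a" receives the positive label far more often than "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
uniform = [1.0] * len(labels)
balanced = reweigh(groups, labels)

print(positive_rate(groups, labels, uniform, "a"),
      positive_rate(groups, labels, uniform, "b"))    # 0.75 0.25
print(positive_rate(groups, labels, balanced, "a"),
      positive_rate(groups, labels, balanced, "b"))
```

Before reweighting, the two groups see positive outcomes 75 percent and 25 percent of the time; with the weights applied, both rates move to the overall average of 50 percent, so a model trained on the weighted data has less of the original skew to learn.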
“One good thing about very fast computers is that you can now write software that’s not totally efficient, but that’s easy to understand, because you’ve got speed you can burn,” Hinton said. “People don’t like doing that, but that’s what you really want to do: you want to make your code not totally efficient so that you keep it simple … With [things that are] incredibly accurate, you have room to make them a little less accurate to achieve other things you want. And that seems to me a fair tradeoff.”
AI and jobs
Hinton is optimistic, too, about AI’s impact on the job market.
“The phrase ‘artificial general intelligence’ carries with it the implication that this sort of single robot is suddenly going to be smarter than you. I don’t think it’s going to be that. I think more and more of the routine things we do are going to be replaced by AI systems — like the Google Assistant.”
Analysts at Forrester recently projected that robotic process automation (RPA) and artificial intelligence (AI) will create digital workers — software that automates tasks traditionally performed by humans — for more than 40 percent of companies next year, and that in 2019, roughly 10 percent of U.S. jobs will be eliminated by automation. Moreover, the World Economic Forum, PricewaterhouseCoopers, and Gartner have predicted that AI could make redundant as many as 75 million jobs by 2025.
Hinton argues that AGI won’t make humans redundant, though. Rather, he says, it will remain for the most part myopic in its understanding of the world, at least in the near future. And he believes that it’ll continue to improve our lives in small but meaningful ways.
“[AI in the future is] going to know a lot about what you’re probably going to want to do and how to do it, and it’s going to be very helpful. But it’s not going to replace you,” he said. “If you took [a] system that was developed to be able to be very good [at driving], and you sent it on its first date, I think it would be a disaster.”
And for dangerous tasks currently performed by humans, that’s a step in the right direction, according to Hinton.
“[People] should be really afraid to ride in a car that’s controlled by a huge neural net that has no way of telling you what it’s doing,” he said. “That’s called a taxi driver.”