For the last seven years, computing giant IBM has released The Five in Five, its forecast of technologies and innovations that its researchers believe will come to pass in five years’ time. This year, IBM has done something unusual. Instead of listing five disparate ideas, it has put the weight of its five predictions behind a single notion: In five years’ time, computers will be able to see, hear, touch, taste, and smell, albeit in their own ways.
It’s easy to say computers already do things like this. After all, computers and websites can “see” well enough to cull through images for recognizable faces, apps can identify songs by “hearing” snippets, and the Curiosity rover is (in a sense) “tasting” and “smelling” rocks and soil samples to better understand the Martian environment. But IBM is looking beyond these specialized applications to computers that can analyze and interpret the real world in real time, then proactively reprogram themselves to improve at particular sensory tasks, the same way a musician trains her ear or a gourmand hones his palate.
If IBM is right, we could be at the beginning of a new age of computing, where devices move on from being simple calculators and bit-pushers to things that can understand their world — and ours.
IBM’s prediction has less to do with better sensors and more to do with better ways to interpret what comes from them, a field known as cognitive computing.
Today’s processors basically consider one command at a time, perform that function, and blindly move on to the next. These sequences of instructions can be dizzyingly complex, but the processors are just automatons that can only do what they’re told. Generally, these kinds of computers are dubbed Von Neumann machines, after Princeton mathematician John von Neumann, who laid out the idea in 1945. They’re tremendously powerful and flexible tools responsible for many technological breakthroughs of the last six decades, from digital data storage to personal computers, the Internet, and mobile technologies. But while these processors have become ever smaller and more complex, they still only do what people tell them to do: nothing more, and nothing less.
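To make that one-command-at-a-time behavior concrete, here is a minimal sketch of the loop at the heart of such a machine. The three-instruction "language" (ADD, MUL, JUMP_IF_NEG) is invented for illustration and does not correspond to any real processor.

```python
# Toy illustration only: a made-up three-instruction machine, not a real CPU.
def run(program, x=0):
    """Execute instructions strictly in order, one at a time."""
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, arg = program[pc]          # fetch the current instruction
        if op == "ADD":                # decode it and execute it
            x += arg
        elif op == "MUL":
            x *= arg
        elif op == "JUMP_IF_NEG":      # jump to instruction `arg` if x is negative
            if x < 0:
                pc = arg
                continue
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1                        # blindly move on to the next instruction
    return x


program = [("ADD", 2), ("MUL", 5), ("ADD", -3)]
result = run(program)
print(result)  # 7, i.e. (0 + 2) * 5 - 3
```

However long the program gets, the machine never does anything the list of instructions did not spell out.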
Cognitive computing applies concepts from neurobiology to computing, including the ways our senses process information and the way our brains develop skills and capabilities. Although cognitive computing develops in part from work in artificial intelligence, the idea is not to create machine intelligence or thinking machines such as the fictional AIs that turn up as villains in so many stories. Instead, the idea is to create devices and services that function in a similar way to human senses — only perhaps faster and with a great deal more precision — to help us with everyday tasks. In essence, cognitive computing is about creating tools that can see, hear, perceive, and draw conclusions about things in very human-like ways. It’s meant to extend our senses and capabilities to new levels, much as we’ve done with tools like microscopes, telescopes, and space probes.
IBM is one of the few companies on the planet that tackle cognitive computing. Over the decades it has amassed a tremendous intellectual property portfolio and continues to invest heavily in difficult, long-term projects that push the limits of computational power and real-time systems. One recent example is Watson, the supercomputing system that roundly defeated all-time champions on the television quiz show Jeopardy last year. (Watson’s technology is now being put to work in health care.) Another example is TrueNorth, which IBM is calling its first cognitive-computing chip. Although it’s based on the same fundamental technologies as traditional Von Neumann processors, TrueNorth is designed to mimic some of the structure of an organic brain using a massively parallel architecture. It simulates axons, neurons, dendrites, and synapses across a network of processing cores, and uses a parallel compiler that actually maps the long-distance neural pathways of a macaque monkey. TrueNorth is being developed with DARPA (the same folks who brought us the Internet back in the 1960s). Eventually, they aspire to create a cognitive-computing architecture that closely approximates the human brain. IBM already got there earlier this year with simulators on the Lawrence Livermore National Lab Sequoia supercomputer, although the simulation was running more than 1,500 times slower than real time.
The key to cognitive computing is that the systems can modify their behavior over time based both on new input (including sensory data like images and sound) and on feedback from humans about whether they’re on the right track. In a sense, cognitive computing systems will be trained to do things that normally require a human, like recognizing pictures, understanding and acting upon speech, or connecting seemingly disparate pieces of information to draw an expert conclusion. Even better, they will be able to constantly improve their performance without being reprogrammed or having to wait for new versions.
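As a rough illustration of learning from feedback rather than from reprogramming, the sketch below uses a perceptron, a decades-old textbook technique chosen here purely as a stand-in; it is not IBM’s method. The class name, the two made-up features ("greenness" and "edge straightness"), and the sample values are all assumptions for the example.

```python
# Hypothetical sketch: a perceptron-style learner standing in for a
# "trainable" system. Not IBM's technology; all names and data are invented.
class FeedbackLearner:
    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Guess 1 (forest) or 0 (cityscape) from the current weights."""
        score = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return 1 if score > 0 else 0

    def feedback(self, features, correct_label):
        """Nudge the weights only when a human says the guess was wrong."""
        error = correct_label - self.predict(features)  # -1, 0, or +1
        if error != 0:
            self.weights = [
                w + self.learning_rate * error * f
                for w, f in zip(self.weights, features)
            ]
            self.bias += self.learning_rate * error
        return error == 0  # True if the guess was already right


# Made-up features per image: [greenness, edge straightness].
examples = [
    ([0.9, 0.1], 1),  # forest
    ([0.8, 0.2], 1),  # forest
    ([0.2, 0.9], 0),  # cityscape
    ([0.1, 0.8], 0),  # cityscape
]

learner = FeedbackLearner(n_features=2)
for _ in range(10):                      # several rounds of human feedback
    for features, label in examples:
        learner.feedback(features, label)

print(learner.predict([0.85, 0.15]))  # 1 (forest)
print(learner.predict([0.15, 0.85]))  # 0 (cityscape)
```

Nobody rewrites the program between rounds; only the weights change, which is the sense in which such a system improves without waiting for a new version.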
So how does IBM believe cognitive computing will enable computers to augment our senses in the next five years?
Vision
A computer as simple as a point-and-shoot camera can already recognize faces, but cognitive computing will allow computers to recognize different elements of photos or videos in real time, much the way a human would. For instance, vision systems could be trained to pick out items in scenes based on things like color values, angles, and edge information, so that they could easily distinguish (say) a forest from a cityscape, or a desert from the inside of a store. When applied to video, a computer could monitor security camera footage for prowlers, or issue a real-time alert when a basement floods. Online, cognitive computing systems could look at photos uploaded to social networks and alert authorities about possible emergencies or security problems. The technology could also be applied to high-resolution medical scans, enabling doctors and diagnosticians to review data more comprehensively and perhaps catch some conditions long before they exhibit symptoms.
Of course, the technology has all sorts of commercial applications. Images of every product you buy and every photo you upload to social networks could be analyzed to pick up on your interests. Take a lot of pictures of sports cars? Ads for Porsches might start appearing on your smartphone. If you take a picture of some awesome ankle boots you see on the subway, a coupon for something like Fluevog Shoes might mysteriously be delivered a few minutes later. Similarly, taking cell phone video of that fender-bender so you have evidence you weren’t at fault might make car insurance offers roll in.
Hearing
As with vision, computers can already recognize and process speech, but it’s hard work for a traditional machine. Systems like Apple’s Siri and Google Voice Search have to offload the heavy lifting to cloud-based systems because it’s too much for a phone to handle – that’s why they don’t work offline.
But IBM imagines many other uses besides virtual assistants. For instance, a computer could understand and interpret an infant’s sounds, then send messages to parents or caretakers. A project called Deep Thunder is already using audio data (among other things) to make quick, hyper-local weather forecasts in flood- and slide-prone areas of Brazil. Smartphones could understand when you’re talking to something (or someone) besides the phone and automatically mute their microphone. Analyzing ultrasonics in real time could allow us to listen to bats or dolphins, and medical devices that restore human hearing, like cochlear implants, could be dramatically improved.
Touch
Rather than suggesting that computers will be able to better understand touch in the near future, IBM believes they’ll be able to reproduce it for us in ways never before possible. In five years, IBM says, “you will be able to touch through your phone.” The same modern haptic technology that provides a subtle vibration when you touch a button on your phone could be improved to provide much more advanced feedback that simulates textures, such as the coarseness of pumice or the slightly soft feel of a ripe pear.
There’s reason to be dubious about this particular technology. Many aspects of touch, like mass, specific heat, density, and size, aren’t related to texture. Nonetheless, haptic technology can be much more precise than what’s used to make phones and game controllers vibrate, and things like 3D printers have already paved the way for high-resolution, portable data formats for textures. IBM seems focused on retail applications, like enabling users to feel simulated clothing fabric before they decide to buy. If the technology works out, it could have lots of other applications, including gaming: imagine having to find your way through levels or puzzles using nothing but touch. One advantage of this technology is that it doesn’t seem dependent at all on the heavy lifting of cognitive computing. All the pieces seem to exist right now, which may make it the most feasible of IBM’s forecasts.
Smell and taste
Cognitive computers that understand smell and taste could essentially play the role of perfumer or flavoring manufacturer. By analyzing how different chemical compounds in food react with each other (and how humans sense them), a computer could concoct new flavor combinations and recipes that can do everything from making school lunches more appealing to improving nutrition in underdeveloped regions. In haute cuisine, a computer might dream up a flavor combination that even seasoned chefs would never have considered (figs, beets, and pulque, anyone?) but that still delights our palates. Hits would quickly “trickle down” to ordinary fare.
A computer with a sense of smell could analyze chemical signatures (whether in the air or on surfaces, objects, or people) and apply highly specific knowledge to interpret that information. One day smartphones might have the sophisticated nose of a wine connoisseur, or be able to detect that a person is getting sick (or at least needs a mint) just by analyzing their breath when they speak on the phone. Phones might also be able to identify flowers (or perfumes) just by scent. Since instruments can be so much more sensitive than the human nose, the technology also has major applications in health care, emergency services, and industry: Imagine hospital equipment that can tell whether or not it’s sterile, smartphones and other equipment that can help locate trapped survivors (or ruptured gas lines) in a disaster, or even smartphones that can tell you how fresh a loaf of bread (or some deli salad) might be.
Is any of this practical?
The resource-intensive nature of IBM’s cognitive computing ambitions probably means that, even if some of these technologies can be demonstrated in five years, they’re certainly not going to be mainstream.
With a few possible exceptions (like being able to “touch” textures through a smartphone or interpret baby noises), many of IBM’s cognitive-computing applications will require major real-time horsepower. IBM’s TrueNorth simulation was running on a Blue Gene/Q supercomputer capable of 16.32 petaflops; back in June of this year, it was the fastest supercomputer in the world. Computer hardware is always advancing rapidly, but that kind of processing power isn’t going to make it into smartphones or traditional PCs in the next five years. The best hope is that compute-intensive sensory applications might become available as cloud-based services.
While IBM’s moxie to take on massive computing projects is certainly to be admired, it’s not necessarily the only way to engineer systems that give human-like results. Companies like Google, for instance, face gargantuan computing problems with things like their core Web search, which not only has to keep a constantly updated index of essentially the entire Internet, but also has to present relevant search results nearly instantly. Google doesn’t do this with cognitive computing and hardware on the scale of the human brain. Instead, it relies on actual humans: By analyzing the way millions of its users interact with its services, Google is essentially crowd-sourcing real-life human intelligence to make its systems deliver what people want. It’s not cheap, but for now it’s more practical than throwing supercomputers at these problems. After all, there are billions of humans on the Internet, and only two or three computers on the planet right now potentially capable of doing things like the TrueNorth simulation.
Fortunately, these two approaches are not incompatible, and it’s possible some of IBM’s forecast sensory technologies might come to pass in the semi-near future through clever combinations of human input and trainable computing resources. One day, asking our phones about the funny noise the car is making or whether the milk is starting to go sour might be as everyday as sending a text message or sharing a photo.