Sensors and machine learning: How applications can see, hear, feel, smell, and taste | Tech News
Through the power of machine and deep learning, faster CPUs, and new types of sensors, computers can now see, hear, feel, smell, taste, and speak. Each of these senses combines some kind of sensor (like a camera) with some kind of mathematical algorithm, usually a supervised machine learning algorithm and a trained model.
Here is what is available.
See: image and facial recognition
Recent research into image and facial recognition lets computers not only detect that an object is present but also detect multiple instances of similar objects. Facebook and Google have been leading the way here, with multiple open source releases. Facebook has stated that it has a goal of detecting objects in video.
This area has come a long way in recent years: objects in an image can now be segmented from other objects. However, just because you found something and can segment it from its surroundings doesn’t mean you know what it is; that requires training a model that recognizes those things.
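To make that segmentation-versus-recognition gap concrete, here is a minimal sketch using SciPy’s connected-component labeling on a made-up binary image: the code can count that there are distinct blobs, but nothing in it says what any blob is.

```python
# Sketch of the segmentation-vs-recognition gap: connected-component
# labeling finds *that* distinct blobs exist in an image, but attaching a
# name to each blob still requires a trained classifier.
import numpy as np
from scipy import ndimage

# A tiny, invented binary "image" with two separate foreground regions.
image = np.array([[1, 1, 0, 0, 0],
                  [1, 1, 0, 0, 0],
                  [0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1],
                  [0, 0, 0, 0, 0]])

labeled, n_objects = ndimage.label(image)  # segment: find distinct blobs
print(n_objects)  # 2 -- two separate objects found...
# ...but nothing here can say whether either blob is a sheep or a cat.
```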
These are powerful tools, but they are extremely data-hungry. So Facebook and Google can release them, gain the benefits of research and community-developed derivatives, and not worry too much about competition in this area. Simply put, few organizations have millions or billions of images to put through them, let alone the computing power to spend on training.
In essence, classifying objects with machine or deep learning is first a matter of “seeing” a lot of instances of a sheep or a cat, including their variants (big ones, little ones, furry ones, less-furry ones, skinny ones, fat ones, tailless ones). Then it is a matter of training a model that recognizes all of those variants.
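A minimal sketch of that see-many-then-train loop, using scikit-learn’s built-in handwritten-digits dataset as a stand-in for a real photo collection (serious vision work would use a deep network and far more data):

```python
# Minimal sketch of supervised image classification: show the model many
# labeled examples, then ask it to recognize new ones. The digits dataset
# stands in for real photos; a production system would use a CNN.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()  # 8x8 grayscale images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = SVC(gamma=0.001)     # a simple classifier; deep nets scale better
clf.fit(X_train, y_train)  # "seeing" many labeled instances

accuracy = clf.score(X_test, y_test)
print(f"accuracy on unseen images: {accuracy:.2f}")
```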
While Facebook and Google are clearly putting the most weight into this field, there are other tools like the venerable OpenCV library, a grab-bag of functionality, and OpenFace, which is focused on just facial recognition.
There is even JeVois (French for “I see”), a smart camera that works with Arduino devices and ships with pretrained models based on open source libraries; it is trained to recognize about 1,000 different objects. You can, of course, tweak things with your own models. So your plan to create an autonomous quadcopter is indeed possible!
Hear: Speech recognition and sound classification
Much of computer “hearing” is focused on speech recognition; however, sound classification is possible too. Obviously this exists, because Shazam is a thing, but the models for general sound classification aren’t as available or as broad as you’d hope. Still, pyAudioAnalysis lets you take a .wav file and classify the sounds in it.
Did you capture bird song or road noise? As with image recognition, answering that means training a classification model. This area seems less invested in, maybe because Facebook is largely watched on mute, and because while there is a video.google.com and an images.google.com, there is no sounds.google.com.
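A toy sketch of that bird-song-or-road-noise idea, with synthesized audio standing in for real recordings. The class names, frequencies, and single-feature approach are invented for illustration; a real tool like pyAudioAnalysis extracts far richer features from .wav files.

```python
# Hedged sketch of sound classification: turn audio into a feature, then
# label a new clip by comparing that feature against known classes.
import numpy as np

RATE = 8000  # samples per second

def tone(freq, seconds=1.0):
    """Synthesize a pure sine tone, standing in for a real recording."""
    t = np.linspace(0, seconds, int(RATE * seconds), endpoint=False)
    return np.sin(2 * np.pi * freq * t)

def dominant_freq(signal):
    """One crude feature: the frequency bin with the most energy."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / RATE)
    return freqs[np.argmax(spectrum)]

# "Training": remember one feature value per labeled class (invented bands).
classes = {"road noise": dominant_freq(tone(120)),
           "bird song": dominant_freq(tone(2500))}

def classify(signal):
    f = dominant_freq(signal)
    return min(classes, key=lambda name: abs(classes[name] - f))

print(classify(tone(2300)))  # a new clip near the "bird song" band
```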
For speech recognition, you can find open source implementations that use the more traditional hidden Markov models, like CMUSphinx, as well as ones like Kaldi that use a neural network. There are other implementations, but the big breakdown is between online and offline decoding: “online” means the decoder can read off a mic as you speak; “offline” means you have to wait until you have a complete .wav file.
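A sketch of that online-versus-offline distinction in control-flow terms only, with no real recognizer attached (the function names and messages are invented, not any library’s API):

```python
# Sketch of online vs. offline decoding. An offline decoder needs the whole
# recording before it starts; an online decoder consumes small chunks as a
# mic delivers them and can emit partial results along the way.
import numpy as np

def offline_decode(audio):
    # Offline: the full .wav is already available; decode in one pass.
    return f"decoded {len(audio)} samples at once"

def online_decode(chunks):
    # Online: consume fixed-size chunks as they arrive, keeping running
    # state so partial hypotheses are possible before the speaker finishes.
    seen = 0
    for chunk in chunks:
        seen += len(chunk)
        yield f"partial result after {seen} samples"

audio = np.zeros(16000)  # one second of "silence" at 16 kHz
print(offline_decode(audio))
for partial in online_decode(np.array_split(audio, 4)):
    print(partial)
```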
Feel: one sense with little public technology
When it comes to touch, not much seems to have happened in terms of detecting how something “feels” using touch sensors. Mainly, touch sensors are used in control applications (like the old Nintendo Power Glove that everyone wanted but never got, and that apparently didn’t work all that well).
There are “did you touch it” sensors for Arduino, along with libraries and sensors for detecting gestures. Probably the most promising “did you touch it” innovation is capacitive woven fabric. However, when it comes to the more practical machine task of touching a surface to see if there is a defect, most applications are optical or ultrasonic.
Smell: the electronic nose
Yes, computers can smell. Yes, there are practical uses for this. And the “electronic nose” has been around for a while.
For the cheap version, a sensor is tied to an Arduino device and “inhales” gases. Based on the concentrations of gases present, it can “detect” things like which hops were used in a beer or whether the air is becoming toxic. These technologies have been used for everything from bomb sniffing to quality control.
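A hedged sketch of the software side of that cheap electronic nose. Each “sniff” is a vector of readings from a few gas sensors, and a smell is recognized by comparing against the average vector recorded for each known substance. The sensor channels and all the values here are invented for illustration (the MQ-series parts named in the comment are common hobbyist gas sensors).

```python
# Hedged sketch of an electronic nose: classify a vector of gas sensor
# readings by nearest centroid among previously recorded substances.
import numpy as np

# Columns: invented readings from e.g. MQ-3 (alcohol), MQ-4 (methane),
# and MQ-135 (air quality) hobbyist gas sensors.
training = {
    "citrus hops": np.array([[0.61, 0.10, 0.22], [0.58, 0.12, 0.25]]),
    "pine hops":   np.array([[0.30, 0.09, 0.55], [0.33, 0.11, 0.52]]),
}
centroids = {name: readings.mean(axis=0)
             for name, readings in training.items()}

def sniff(reading):
    """Label a new sniff by its closest known centroid."""
    return min(centroids,
               key=lambda n: np.linalg.norm(centroids[n] - reading))

print(sniff(np.array([0.60, 0.11, 0.24])))  # closest to "citrus hops"
```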
Taste: another sense with little public technology
What is taste to a computer? Taste is subjective, and remember that a lot of human taste is actually smell. The sensors here are chemical, microbial, pH, and titration sensors. The practical applications are wide, such as detecting whether you’re sick, whether you have adequate glucose levels, or whether something is poisoned.
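Since there is so little public code here, the best that can be offered is an illustrative sketch of “tasting” as thresholding chemical sensor readings. The channel names and thresholds below are invented for illustration and are not medical or food-safety values.

```python
# Illustrative sketch only: "tasting" as thresholding chemical sensor
# readings. Thresholds and channels are invented, not real reference values.
def taste_check(ph, glucose_mg_dl):
    findings = []
    if ph < 4.0:
        findings.append("unusually acidic")
    if glucose_mg_dl > 180:
        findings.append("high sugar")
    return findings or ["nothing unusual"]

print(taste_check(ph=3.2, glucose_mg_dl=95))
```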
There is again a big overlap with smell, just as in human anatomy. This sense has the least public source code, and training a model probably means having access to a chem lab or data from one.
No, you can’t build Commander Data yet
With the five senses covered, can we build Star Trek: The Next Generation’s Commander Data yet, or at least his stupid cousin B4 (since we don’t have AGI yet)? Probably not. Even if you have the sensors and libraries, we’re still a ways away from having fully trained models everywhere. Also, this stuff is data-hungry, and much of it is still too slow for practical real-time use.
As a result, we are still working toward practical facial recognition in video. Touch is mainly “did you touch it?” or other single-purpose sensors. Smell is much the same, and taste even more so.
Still, like much of machine and deep learning, as long as you have a single-purpose application (like “is the coffee rotten?”), AI and sensors have come a long way. Maybe computers aren’t up to humans’ level of the five senses, but they do have these senses, and there are widely available implementations, both free and proprietary, for developers to use.