They're listening to every word. That's perhaps the most common fear associated with devices with microphones, and a huge part of what's fueled increased scrutiny of how companies handle data collected by their smart products.
Now, Amazon is at the center of a potentially massive data scandal that suggests workers, not just artificial intelligence, are listening to Alexa voice commands.
Amazon has always been honest that its digital assistant is a work in progress. As it states in Alexa's FAQ page, Amazon uses data to train "these systems," and the more it uses, the better Alexa works. What's more, training Alexa with voice recordings from a wide range of customers "helps ensure Alexa works well for everyone."
Amazon Is Listening To Your Voice Commands
But an in-depth investigation conducted by Bloomberg reveals that one way Amazon perfects Alexa is by having actual human beings listen to real-life voice recordings. This process is known as data annotation, and The Verge notes that it has quietly become one of the core elements of the recent machine learning revolution, which has brought massive improvements in natural language processing, translation, and image and object recognition.
Supervised Learning
The idea is that AI algorithms can only improve over time if the data they get can easily be parsed and sorted into categories. They aren't powerful or smart enough to interpret data themselves. When a user gives an Amazon Echo a voice command, Alexa doesn't always hear it accurately. That is where Amazon workers come in. They listen to the exchange, label the data correctly, and feed it back to the system to "teach" it. This method of "supervised learning" isn't at all new; Apple, Facebook, and Google use it to improve their respective services.
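To make the idea concrete, here is a minimal sketch of supervised learning in Python. It is not Amazon's system: the transcripts, the intent labels, and the `train` and `predict` functions are all hypothetical, and a simple word-counting classifier (naive Bayes) stands in for whatever models Alexa actually uses. It only shows the general pattern of learning from examples that a person has already labeled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical annotated examples: (corrected transcript, intent label).
# In the workflow described above, a human reviewer would supply both.
LABELED = [
    ("play some jazz music", "music"),
    ("play the new album", "music"),
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
    ("set a timer for ten minutes", "timer"),
    ("set an alarm for seven", "timer"),
]

def train(examples):
    """Count words per label; these counts are the whole 'model'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            # Laplace smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(LABELED)
print(predict(model, "play jazz"))  # → music
```

In this toy setup, each newly labeled recording would simply be appended to `LABELED` and the model retrained, which is the "feed it back" step in miniature.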
But in the case of Amazon, Bloomberg sheds light on the thousands of Amazon employees around the world tasked with listening to Alexa recordings. More problematic, however, is the fact that most users aren't made aware that this is happening. Worse yet, some recordings might contain identifiable information about the person speaking, which could open a path to data abuse.
In response to Bloomberg's report, Amazon said that it only annotates "an extremely small sample" of voice recordings to "improve the customer experience."
"[T]his information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone." Amazon added that it employs "strict technical and operational safeguards" and that it has a "zero tolerance policy for the abuse of our system."
Furthermore, it said employees don't have access to the identity of the person speaking, and any information that may have been included in the voice recording is treated with high confidentiality.
According to Amazon, Echo devices don't store audio unless they detect a wake word or are activated by the press of a button. However, Alexa sometimes records audio even when it hasn't been triggered or prompted. Whether or not a recording was intended, Bloomberg reports that transcribers are still required to parse it. One of the sources said that each auditor transcribes as many as 100 recordings a day in which Alexa received no wake word or was triggered by accident.