As anyone who has an Amazon Echo knows, the device is in constant listening mode so it can respond to questions and give answers at a moment’s notice.
That is convenient, but it also means the device could end up being used to eavesdrop on its surroundings.
“Without its continuous listening,” said Arden Rubens of Checkmarx in a post, “such voice assistants would require activation buttons and would understandably not be the incredibly effortless helpers that they are today. However, with this device’s rise in popularity, one of today’s biggest fears in connection to such devices is privacy. Especially when it comes to a user’s fear of being unknowingly recorded.”
With this in mind, Maty Siman and Shimi Eshkenazi from the Checkmarx Research Lab decided to test the idea of turning their own Amazon Echo into a tapping device.
The team’s first challenge was activating the Echo, given that audio from Intelligent Personal Assistant (IPA) devices is streamed to the cloud only after the wake-up word (“Alexa”) is detected.
Therefore, the only option left for the team was to try to turn the Echo into a recording device after the wake-up word is detected. Once that happens, Alexa launches the requested capability or application, so the next step was to work out how to build a harmless-looking but malicious skill that secretly records and transcribes what the user is saying and then sends everything directly to the hacker.
Two challenges stood between the team and their goal:
1. They had to ensure the Alexa recording session would stay alive after the user received a silent response from the device
2. They wanted the listening device to accurately transcribe the voice received by the skill
“They needed to find a way to keep the Alexa recording session alive after the user received a response from the benign part of the skill, and do so without providing any audial indication to disclose that the device was still ‘listening,’” Rubens said. This was not completely straightforward, given that the Echo device needs to be prompted by users between cycles; otherwise, the session ends after each response to protect users’ privacy.
They also needed to find a way to accurately transcribe the voice received by the skill application. Skills perform well when they are configured to accept a specific sentence format with placeholders for closed lists of values. Since they didn’t want to limit themselves to specific conversations, they set out to find a way for the Echo to accept any text.
The Checkmarx Research Lab disclosed this attack scenario to Amazon Lab126 and worked closely with their team to mitigate the risk. Some of the measures that were put in place are:
1. Setting specific criteria to identify (and reject if necessary) eavesdropping skills during certification
2. Detecting empty-reprompts and taking appropriate actions
3. Detecting longer-than-usual sessions and taking appropriate actions
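To make the last two measures concrete, here is a minimal Python sketch of what such checks might look like. It is not Amazon's implementation, which the article does not describe. It assumes the skill's reply follows the publicly documented Alexa response JSON layout (`response.shouldEndSession`, `response.reprompt.outputSpeech`), and the function names and the 120-second threshold are hypothetical choices made for illustration.

```python
import re

# Hypothetical threshold: the article does not say what "longer than usual" means.
MAX_SESSION_SECONDS = 120.0

_TAG_RE = re.compile(r"<[^>]+>")  # strips SSML tags such as <speak>


def has_empty_reprompt(envelope: dict) -> bool:
    """Return True if the reply keeps the session open with a silent reprompt."""
    body = envelope.get("response", {})
    # Assumption: a missing shouldEndSession is treated as "session ends".
    if body.get("shouldEndSession", True):
        return False  # the session closes, so nothing keeps listening
    reprompt = body.get("reprompt")
    if reprompt is None:
        return False  # no reprompt at all; out of scope for this check
    speech = reprompt.get("outputSpeech", {})
    raw = speech.get("text") or speech.get("ssml") or ""
    # A reprompt is "empty" if nothing audible is left once markup is removed.
    return _TAG_RE.sub("", raw).strip() == ""


def is_long_session(started_at: float, now: float,
                    limit: float = MAX_SESSION_SECONDS) -> bool:
    """Return True if the session has been open longer than the limit (seconds)."""
    return now - started_at > limit
```

A reviewer or monitoring service could run both checks on every reply from a skill and flag, or end, any session where either one returns True.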