side-channel attacks

Eavesdropping on Phone Taps from Voice Assistants

The microphones on voice assistants are very sensitive, and can snoop on all sorts of data:

In Hey Alexa what did I just type? we show that when sitting up to half a meter away, a voice assistant can still hear the taps you make on your phone, even in presence of noise. Modern voice assistants have two to seven microphones, so they can do directional localisation, just as human ears do, but with greater sensitivity. We assess the risk and show that a lot more work is needed to understand the privacy implications of the always-on microphones that are increasingly infesting our work spaces and our homes.

From the paper:

Abstract: Voice assistants are now ubiquitous and listen in on our everyday lives. Ever since they became commercially available, privacy advocates worried that the data they collect can be abused: might private conversations be extracted by third parties? In this paper we show that privacy threats go beyond spoken conversations and include sensitive data typed on nearby smartphones. Using two different smartphones and a tablet we demonstrate that the attacker can extract PIN codes and text messages from recordings collected by a voice assistant located up to half a meter away. This shows that remote keyboard-inference attacks are not limited to physical keyboards but extend to virtual keyboards too. As our homes become full of always-on microphones, we need to work through the implications.

Manipulating Systems Using Remote Lasers

Many systems are vulnerable:

Researchers at the time said that they were able to launch inaudible commands by shining lasers — from as far as 360 feet — at the microphones on various popular voice assistants, including Amazon Alexa, Apple Siri, Facebook Portal, and Google Assistant.

[…]

They broadened their research to show how light can be used to manipulate a wider range of digital assistants — including Amazon Echo 3 — but also sensing systems found in medical devices, autonomous vehicles, industrial systems and even space systems.

The researchers also delved into how the ecosystem of devices connected to voice-activated assistants — such as smart-locks, home switches and even cars — also fail under common security vulnerabilities that can make these attacks even more dangerous. The paper shows how using a digital assistant as the gateway can allow attackers to take control of other devices in the home: Once an attacker takes control of a digital assistant, he or she can have the run of any device connected to it that also responds to voice commands. Indeed, these attacks can get even more interesting if these devices are connected to other aspects of the smart home, such as smart door locks, garage doors, computers and even people’s cars, they said.

Another article. The researchers will present their findings at Black Hat Europe — which, of course, will be happening virtually — on December 10.

Determining What Video Conference Participants Are Typing from Watching Shoulder Movements

Determining What Video Conference Participants Are Typing from Watching Shoulder Movements

Accuracy isn’t great, but that it can be done at all is impressive.

Murtuza Jadiwala, a computer science professor heading the research project, said his team was able to identify the contents of texts by examining body movement of the participants. Specifically, they focused on the movement of their shoulders and arms to extrapolate the actions of their fingers as they typed.

Given the widespread use of high-resolution web cams during conference calls, Jadiwala was able to record and analyze slight pixel shifts around users’ shoulders to determine if they were moving left or right, forward or backward. He then created a software program that linked the movements to a list of commonly used words. He says the “text inference framework that uses the keystrokes detected from the video … predict[s] words that were most likely typed by the target user. We then comprehensively evaluate[d] both the keystroke/typing detection and text inference frameworks using data collected from a large number of participants.”

In a controlled setting, with specific chairs, keyboards and webcam, Jadiwala said he achieved an accuracy rate of 75 percent. However, in uncontrolled environments, accuracy dropped to only one out of every five words being correctly identified.

Other factors contribute to lower accuracy levels, he said, including whether long sleeve or short sleeve shirts were worn, and the length of a user’s hair. With long hair obstructing a clear view of the shoulders, accuracy plummeted.

Sidebar photo of Bruce Schneier by Joe MacInnis.