Multi-User Smart Speakers - A Narrative Review of Concerns and Problematic Interactions

Smart speakers in multi-user spaces, such as Amazon Echos, introduce risks to both owners and anyone sharing the space. They store voice recordings of user requests, and anyone in range can potentially interact with the device. Smart speakers are usually bound to a single account despite being shareable by design, which introduces potential tensions between users. We systematically searched the literature for findings on concerns and scenarios in which problems may arise and synthesised the resulting 20 papers in a narrative review. Owners were concerned about other users' potentially malicious interactions, device faults, and third-party sharing. In contrast, bystanders worried about "being listened to" and about a lack of awareness and protections. Our findings show a clear gap in the literature on the privacy concerns of regular and incidental secondary users of smart speakers.


Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). CHI EA '23, April 23-28, 2023

INTRODUCTION
Smart speakers are internet-connected speakers with built-in microphones, hosting a smart voice assistant [3]. Prominent products are Amazon Echos or Google Nest speakers, previously Google Home. They are often found in smart homes alongside smart TVs, doorbells, and thermostats. They are very versatile and can be programmed to interact with services ranging from entertainment to the control of smart home appliances.
Compared to other smart home devices, smart speakers pose unique security and privacy risks. Smart speakers are typically placed in shared locations like kitchens and living rooms, hence such devices are expected to be shared [15,17,22], and they can control communal smart devices like smart lights. Unlike with general smart home devices, smart speaker interactions, requests as well as responses, are audible to anyone in the room. This has led to privacy issues such as revealing calendar entries or items from an online order [19,32]. Smart speakers also accept voice requests from anyone within audible range, leading to further security risks such as unauthorised purchases or unlocking of doors [19,24,32,36].
Smart speakers also collect and store voice recordings of anyone who interacts with them [1,5,18]. People worry about having their voice recorded [11,12], especially in a private environment like a home. Even if people do not wish to interact with the smart speaker, there is a risk that their voice is recorded, processed and reviewable by the owner. If a smart speaker hears anything remotely resembling its wake word (e.g. 'Alexa'), it records and sends the request off for processing [18]. This happens regardless of whether the request was made intentionally or not [7,27,32,37,39].
Prior work shows that for smart homes and smart speakers alike, typically one motivated, tech-savvy user makes the decision, accepts the risks, and sets up the smart speaker with their account [17,35]. Other people living in the same space, like family members and cohabitants, are often not consulted [17,27]. Bystanders visiting the space have even less notice [27,39]. These secondary users may have concerns and may not understand the data handling procedure and risks, let alone accept them [17,22,27]. The clear differences between account owners and secondary users in terms of control, device awareness and risk acceptance [23,27] have led to tensions between users in smart homes [17,21,35,40].
In this narrative qualitative review, we examine what is known about security and privacy concerns regarding smart speakers in shared spaces to identify gaps in the literature. Specifically, we answer the following research questions:
RQ1: What concerns do users have about multiple people using smart speaker technology?
RQ2: Which scenarios/anecdotes have users experienced involving multiple people that made them feel worried or awkward, or where the desired level of security or privacy was not possible to achieve?
Our review shows a clear gap in work on the concerns of cohabitants of, and visitors to, shared spaces with smart speakers. Existing work either focuses on the concerns of account owners of smart speakers or looks at the different user groups in smart homes. Most studies looking at cohabitants focus on young children, but not adults. We conclude that there is a clear need to better understand the concerns of adult cohabitants and bystanders.

BACKGROUND
Typically, interactions with smart speakers are entirely voice-based, aided by LED indicators that show when the device is "listening" to the user's request [18]. An interaction begins when a user says the wake word. The device then begins to record the request, which is sent to the manufacturer's cloud service for speech and request processing [3,5,10,18]. Abdi et al. define two types of interactions: built-in skills such as information retrieval or weather, where the request is handled by the manufacturer, and third-party skills such as Spotify music or smart home control, where the request is passed on to the third party [1]. Due to the nature of a smart speaker, anyone within audible range can interact with it. Manufacturers offer mechanisms such as voice recognition and authentication PINs for additional security; however, these are rarely used due to a lack of awareness [1,20,27].

Not just any smart home device
While smart speakers are often part of a smart home, used as a hub to control smart home devices by voice [17,40], they stand out from other kinds of smart home devices due to their potential for data collection and their interaction type. People are usually comfortable with the collection of environmental data such as room temperature, but they worry about the collection of personally identifying data such as video and audio [11,12], especially data collection in a private space [11]. However, smart speakers need to record and store voice recordings to offer their hands-free service [1,5,18]. By installing the device, owners, who may also have concerns regarding data collection and potential data leakage [1,23,27], are making a privacy-convenience trade-off and deciding to accept the data collection. Other users often have to accept the situation [17,22,27]. Some bystanders directly interact with smart speakers when visiting, while others report being accidentally recorded when the device activates [23,27,38,39].
Smart speakers not only differ from other sensory smart home devices because of the data they are collecting, but also in how they are delivering their services. Since requests and responses are given in a voice-based manner, they can be overheard by anyone close enough. Therefore, interactions that used to be private, such as reading emails or checking the calendar, can be shared with other people in the room [8,19].

Multi-user smart homes
Smart speakers and smart homes are rarely used by only one person. In both cases, there is typically a tech-savvy user who drives the installation [1,17,21,34]. This person has administrative power over the devices, and thus more control than other users, but is also often responsible for dealing with faults and setting up security [17,35]. Cohabitants of the owner were often not consulted upon adoption and were found to be more passive and less motivated to use the devices [17,21,34]. Account owners were also often not aware of concerns other users may have [22], despite having privacy concerns themselves [40-42]. Although some installers are aware of risks, they have a limited understanding of smart home systems, and their concerns are shaped by their experience in other domains like internet browsing [34]. These factors lead to gaps in threat models [40].
Research has also mapped out bystander privacy concerns in smart homes [17,21,26,38,39], but it is not clear to what extent these findings translate to the specific setting of shared smart speakers as they differ considerably in their heterogeneous data collection and interaction type.

METHODOLOGY
We conducted a qualitative narrative review [13] of the literature in computer science and adjacent areas, using the databases Web of Science, ACM Digital Library, and IEEE Xplore. The query for the initial literature review was created using the SPIDER tool [28] and covered the following components: (1) synonyms for smart speakers or product names, (2) terms that cover worries and attitudes, (3) terms for the multi-user aspect (e.g. secondary users, shared devices), and (4) a list of qualitative research methods. The search terms used for the review are provided in the table in Appendix A.
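To make the query structure concrete, the following is a minimal sketch of how four component term lists could be combined into one Boolean search string: synonyms within a component are joined with OR, and the components are joined with AND. The terms below are illustrative placeholders chosen for this sketch, not the search terms actually used in the review, and the exact syntax accepted by each database may differ.

```python
# Illustrative placeholder terms only; the review's actual terms are in its appendix.
COMPONENTS = {
    "device": ["smart speaker", "voice assistant", "Amazon Echo", "Google Home"],
    "attitude": ["concern", "worry", "attitude"],
    "multi_user": ["secondary user", "shared device", "bystander"],
    "method": ["interview", "diary study", "focus group"],
}


def or_group(terms):
    """Join synonyms with OR, quoting each term so multi-word phrases stay intact."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(components):
    """Combine one OR-group per component with AND."""
    return " AND ".join(or_group(terms) for terms in components.values())


query = build_query(COMPONENTS)
print(query)
```

A paper matches such a query only if it hits at least one term from every component, which is what keeps the result set focused on qualitative work about shared voice devices.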
Because not all Symposium on Usable Privacy and Security (SOUPS) papers are indexed in the three selected databases, the lead researcher additionally reviewed the titles and abstracts of all SOUPS publications between 2012 and 2022; two papers were added for further review. Two reviewers independently screened all 110 titles and abstracts, then the full texts of the 32 papers that passed abstract screening. Conflicts among the reviewers were resolved by discussion. An additional 15 papers were included based on a manual search of the references found in the initial systematic review process. A publication was considered if: (1) it focused on smart speakers or other interaction-based voice assistants, (2) the study had a qualitative aspect, and (3) it suggested findings on concerns or potential scenarios regarding shared smart speakers. We required a qualitative component as we extracted quotes and findings on a participant basis rather than large-scale summaries. The final analysis was conducted on 20 papers. Figure 1 summarises the described methodology.
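The three inclusion criteria and the two-reviewer screening step can be read as a simple filter followed by a disagreement check. The sketch below illustrates that reading; the `Publication` record, its field names, and the sample entries are hypothetical and stand in for judgments that the reviewers made manually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    pub_id: str
    covers_voice_assistant: bool    # criterion (1)
    has_qualitative_aspect: bool    # criterion (2)
    reports_sharing_findings: bool  # criterion (3)


def is_eligible(pub):
    """A publication is considered only if all three criteria hold."""
    return (pub.covers_voice_assistant
            and pub.has_qualitative_aspect
            and pub.reports_sharing_findings)


def conflicts(reviewer_a, reviewer_b):
    """Return the ids on which two independent include/exclude decisions differ."""
    return sorted(pid for pid in reviewer_a if reviewer_a[pid] != reviewer_b.get(pid))


# Hypothetical records, for illustration only.
candidates = [
    Publication("paper-A", True, True, True),
    Publication("paper-B", True, False, True),
    Publication("paper-C", False, True, True),
]
eligible_ids = [pub.pub_id for pub in candidates if is_eligible(pub)]

reviewer_a = {"paper-A": True, "paper-B": False, "paper-C": False}
reviewer_b = {"paper-A": True, "paper-B": True, "paper-C": False}
to_discuss = conflicts(reviewer_a, reviewer_b)

print(eligible_ids, to_discuss)
```

Only the items returned by `conflicts` would need to go to discussion; everything else is settled by the two matching decisions.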
To answer RQ1, we extracted any findings which could be interpreted as a concern or worry or a lack thereof regarding sharing smart speakers. We then categorised these extracts to identify who the user is concerned about. For example, if the extract mentioned that owners are concerned about visitors overhearing their interaction, it was categorised as owners concerned about visitors. Since little differentiation was made between related and unrelated cohabitants, the two categories were combined. To answer RQ2, we extracted any situation that was quoted or described in the results and used inductive thematic analysis to discern common patterns. The analysis was conducted by the lead researcher and revised in discussion with the remaining authors of the paper.
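The RQ1 categorisation described above amounts to labelling each extract with who holds the concern and whom or what it is about, then tallying the pairs. The following sketch shows one possible way to do this, including the merge of related and unrelated cohabitants into a single category. The role labels and sample extracts are hypothetical paraphrases used for illustration, not the coded data from the review.

```python
from collections import Counter

# Related and unrelated cohabitants are merged, as described in the text.
MERGED_ROLES = {
    "related cohabitant": "cohabitant",
    "unrelated cohabitant": "cohabitant",
}


def normalise(role):
    """Map a raw role label onto its merged category."""
    return MERGED_ROLES.get(role, role)


# (concern holder, concern target, short description); hypothetical examples.
extracts = [
    ("owner", "visitor", "visitors overhearing an interaction"),
    ("owner", "related cohabitant", "children pranking the owner"),
    ("owner", "unrelated cohabitant", "housemate browsing call history"),
    ("visitor", "device", "not knowing what data is collected"),
]

tally = Counter((normalise(holder), normalise(target)) for holder, target, _ in extracts)

for (holder, target), count in tally.most_common():
    print(f"{holder} concerned about {target}: {count}")
```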

RESULTS
As shown in Table 1, 12 (60%) of all papers cover smart speakers or other voice interfaces, while 8 (40%) papers contain valuable insights gained as part of a smart home study. Most studies (17 papers, 85%) focus on owners, while only 6 (30%) include information from the perspective of bystanders or non-users, and 7 (35%) cover cohabitants who are not family. Methodologically, interviews dominate (16 papers, 80%), followed by diary studies (7 papers, 35%).
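The percentages above follow directly from the paper counts over the 20 reviewed papers. The short check below recomputes them; the dictionary keys are labels introduced for this sketch. Note that the user-group and method categories overlap, since one paper can cover several groups or methods, so those shares are not expected to sum to 100%.

```python
TOTAL_PAPERS = 20

# Paper counts as reported in the text; keys are labels chosen for this sketch.
counts = {
    "smart speaker or voice interface": 12,
    "smart home study": 8,
    "owners": 17,
    "bystanders or non-users": 6,
    "non-family cohabitants": 7,
    "interviews": 16,
    "diary studies": 7,
}


def share(count, total=TOTAL_PAPERS):
    """Percentage of papers, rounded to the nearest whole number."""
    return round(100 * count / total)


shares = {label: share(count) for label, count in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} papers ({pct}%)")
```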

RQ1: Concerns about multi-user scenarios
Our first research question focuses on the concerns and worries that smart speaker users have with regard to sharing a device. Unsurprisingly, the majority of concerns extracted from the set of papers were mentioned by account owners. Account owners and cohabitants were mostly concerned about other potential users. In contrast, visitors seem to worry about the device and the manufacturer rather than other people. While most concerns were about participants' own privacy, safety, or security, we also came across participants worrying about other people's comfort and privacy.

Concerns about 'other people'.
Most concerns were mentioned by owners and related to potential other users within the smart speaker's proximity. Owners were often afraid that 'other people' may overhear their interactions [15,20,23,29,33]. "There were concerns about housemates overhearing phone conversations conducted over the speaker. [. . .] 'I do not want other people in the household to hear me talking about work, and my wife does not want everyone else to hear her talking to her friends.'" [20]. One person was even worried about being judged for the kind of tasks they use their smart speaker for [29]. Owners also mentioned other possibilities for unexpected privacy invasion, for example, similar voices leading to mismatched voice authentication or to calls to the account owner's contacts [9,20]. Another major concern of owners as well as of some cohabitants was 'other people's' inappropriate behaviour [9,14,20,27,29]. They mention rudeness, accessing improper content, or causing annoyance through pranks. "When we asked participants about the possible motivation of insider threat actors, they suspected that friends and children would prank them." [29].
A number of owners were afraid that strangers may use their smart speaker to gain access to their house or their data, or use it to get on their network for other malicious purposes [20,29]. They were also afraid of strangers overhearing the authentication PIN and thus bypassing the security mechanisms for sensitive actions [29]. A few participants feared malevolence by close acquaintances, such as an ex-partner misusing their access after the relationship ends [29], or a housemate going through contacts and call history on purpose [20]. We found that cohabitants and visitors mentioned fewer concerns regarding 'other people'. Cohabitants mostly worried about inappropriate behaviour. Surprisingly, we did not find bystanders concerned about account owners seeing or hearing their interaction. Their concerns mainly related to the nature of the device, which is discussed in subsection 4.1.3 (Concerns about Device Infrastructure / Nature).

Concerns for 'other people'.
In a few cases, participants were concerned about 'other people's' privacy rather than their own. They worried about the amount of their children's data captured and stored [16] or about intruding upon others' privacy themselves, either by interacting with another person's smart speaker [27] or by overly monitoring their teenager's request history [25]. Some participants actively sought to protect 'other people's' privacy. They mentioned warning their visitors and offering to mute or unplug the device to avoid causing discomfort to their clients and coworkers [7]. "We never discussed the matter. But whenever clients [they] are in my home, I make sure to plug off all the smart assistants" [7]. Davitt and Brown found that nursing homes did not allow smart speakers in shared rooms to preserve the roommate's privacy [9].

Concerns about Device Infrastructure / Nature.
Account owners, cohabitants, and bystanders all reported worries regarding the nature or functionality of smart speakers. Surprisingly, only Meng et al. reported that visitors were concerned about intruding on the owner's privacy or the owner seeing the interaction [27]. More commonly, visitors were concerned about not knowing what data might be collected or how it is being used [2,27,39]. They worried about 'being listened to' and not even knowing of the device in the room [2,27,39]. Ahmed et al. discovered the need for tangible privacy mechanisms, as visitors worry about protecting themselves but do not know what being 'protected' actually looks like: "[We] observed that our participants understood the 'on' state with a high level of certainty. However, they often were not clear about the characteristics of the 'off' state." [2]. Visitors also worry about the lack of protection mechanisms available and feel awkward asking for protection in a social situation [27].
Owners also had concerns about their device, but of a different nature. First, they mentioned unwanted sharing of their personal data with third parties [17,20,27]. "Almost half of all participants mentioned a loss of privacy as a risk when data they expect to be private is disclosed to a third party either through a data leak, sharing between companies or users." [27]. The second concern is related to device faults. Findings show concerns regarding faulty authentication methods, a lack of granular permissions for different users, and wrong outcomes due to misinterpreted requests [15,20,32].

RQ2: Scenarios
With our second research question, we explored multi-user smart speaker situations which made users uncomfortable. From our set of papers, we were able to extract over 66 scenarios, which illustrate concerns or show the root of a concern. In our analysis, three themes emerged: intentional misuse, unexpected behaviour, and device sharing.

Intentional Misuse.
The clearest cluster of scenarios involved malicious or naïve misuse. Misuse included a variety of minor annoyances like pranks [6,17,30,40], children's obsessive repetition of a certain interaction [6,17], disrespect towards the device [27], and cheating on homework [16,33]. Geeng and Roesner's study on smart homes reports: "When P14 had guests over, they [playfully] tried to use Amazon Echo voice commands to place orders from Amazon. P14 was annoyed about that, but had the ordering functionality disabled." [17].

Unexpected Behaviour.
Most papers reported scenarios in which a smart speaker behaved differently than the user expected and caused discomfort, distress, or even unintended data leakage. Some users were frustrated by being unable to get the smart speaker to do what they wanted [17,30]. For example, "P1 logged that his girlfriend was annoyed she could not use the voice command 'turn off TV' to turn off the TV, since P1 has Apple, Chromecast, and Fire TV, each requiring a specific command, e.g., 'turn off Fire TV'." [17].
In situations where a smart speaker mistakenly activates, participants reported feeling uncomfortable [7,27,32,37,39]. "It is kind of annoying when he [my father] isn't there, I unplug it because it is kind of weird like if we are talking just amongst ourselves and he says something vaguely like 'okay Google' which is the activation thing, it will start listening and it is kind of weird." [39].
Participants also reported unease and embarrassment when requests were misinterpreted [16,32,33]. A participant in Shank and Gott's survey on AI leaking private information describes: "One of the children in my family was asking a device to play a certain song. Apparently, the device didn't 'hear' correctly. The information that came out was very disturbing and should have not been heard by anyone under the age of 18. [. . .]" [32].
Unexpected smart speaker behaviour may also cause private information to be leaked [9,14,27,32,37,40]. In some situations, the revelation was unprompted, such as the case where an Amazon Echo repeated conversations verbatim [37]. In another case, a teenager in an Asian Indian family describes an awkward situation: "I once thought I was alone in the room, so I asked for some sensitive information from the speaker instead of searching the info on my phone. It blasted the response on full volume, which led my mom to come to the room" [14].

Device Sharing.
Sharing devices naturally leads to the need to coordinate with co-users. Situations ranged from tensions at adoption [27,33] and coordination of sharing [15,16,23,30] to controlling the speakers [6,17,30,40]. Two participants in Meng et al.'s interview study described being uncomfortable about not being consulted before adoption, one explaining: "There was nothing like 'Hey, there is going to be a potential spy in the house'. There was no foreknowledge on my part. I remember being remotely annoyed by that" [27].
From the scenarios collected from the selected set of papers, we found differences in how sharing smart speakers is coordinated in a home. For unrelated cohabitants, ownership is the clear and decisive factor in prioritisation [16,23]. Deciding on who gets to use the device is more complex in a family setting as smart speakers are considered 'family' devices [15]. Some described following a 'first come, first serve' approach [15], while others used task priority, social hierarchy or prior communication to decide who got to control the device [14,30]. Tensions arise when rules are broken or misinterpreted, or the speaker is used in a way that bothers other users [14,15].

DISCUSSION AND CONCLUSION
For RQ1, our findings show that account owners of smart speakers worry about 'other people' with regard to their smart speakers. They are also concerned about device faults, which can cause unwanted privacy intrusion, and about third-party data sharing. This pattern agrees with Luria et al.'s findings on the hierarchical difference in social roles regarding smart speakers between 'insiders' and 'outsiders' [25]. In contrast, bystanders were found to worry about being 'listened to', being unaware of the presence of a smart speaker, and the lack of protection mechanisms available to them. While there was little differentiation between related and unrelated cohabitants, children emerged as a separate user group. With over half of the reviewed papers including families, the number of concerns related to children is unsurprising. Reporting entities were usually parents, not split into account owners and non-owners.
For RQ2, we identified three groups of scenarios which caused discomfort. The first cluster covers misuse, which aligns well with owners' concerns about other users. Participants also reported cases of unexpected smart speaker behaviour, such as unprompted activation or misheard requests, which led to sensitive information being revealed to bystanders. The last set of scenarios evidenced tensions arising from sharing smart speakers, ranging from adoption to coordination of usage; some cases showed cohabitants being annoyed when they lacked the knowledge of how to operate the device.
We found substantial gaps in understanding of privacy concerns of cohabitants, bystanders, and non-users. Although previous work on secondary users showed them to be less motivated and rather passive towards smart devices [17,27,34], there are still concerns and privacy threats that are specific to this group and need to be addressed in future smart speaker technology. Work on addressing those concerns is underway for smart homes [4,31,38,39], exploring design alternatives such as mobile apps [4] or tangible sensor controls [31], but there is little work for smart speakers.
When establishing models of perceived and actual threats, concerns and example scenarios need to be analysed together. Sometimes, there is overlap. For example, the cluster of misuse-related situations mapped onto owner concerns of 'other people' misusing their device, and cases of a smart speaker activating without prompting matched what visitors are afraid of when they say they are being 'listened to'. However, most of the scenarios we analysed complement the reported concerns. For example, while many situations involved children being exposed to inappropriate content or traumatised by the device's behaviour, concerns mainly focused on children behaving in a harmful way. Looking at scenarios where non-owners struggled to get the device to fulfil their request, we see a connection to owners' reports of their cohabitants' annoyance with interacting with a smart home device [17,21]. However, this annoyance is not reflected in the explicit concerns reported.
Limitations. In order to ensure a manageable size, we did not search the grey literature or articles written by journalists. While other databases with better coverage of the social sciences could have been included, the databases we chose cover most of the relevant literature in computing (ACM Digital Library), engineering (IEEE Xplore), and related fields from the humanities, sciences, and social sciences (Web of Science). We also included two papers published at SOUPS. Due to space constraints, we only presented high-level findings from our in-depth thematic analysis.
Future Work. We found a clear need for a better understanding of the privacy needs of related and unrelated cohabitants, bystanders, and visitors. We believe that such an understanding is crucial for the development of smart speakers and indeed their survival as a product category. In future work, we plan to conduct a deeper analysis of involved users and interaction types, which will allow us to identify threat vectors and tailor protection mechanisms.