Introduction
Chatbots are becoming ever more pervasive in everyday life. Grand View
Research (2021) estimated the global chatbot market at USD 430.0 million
in 2020 and projected it to grow at an annual rate of 24.9% until 2028. Using
machine learning algorithms to chat with humans and learn from these
interactions, chatbots can be found in a wide range of settings,
including healthcare, retail, travel and tourism. Given the increasing
prevalence of chatbots in commercial and public contexts, it is
important to ask to what extent users understand the context and
limitations of AI-supported conversational interaction, and whether they
have the flexibility to adapt their own interaction patterns to the AI
environment.
Human-chatbot interaction represents a form of social interaction
carried out online, also described as computer-mediated discourse
(Herring, 2004). Like web chat, chatbots are a lean medium of
conversation in that these interactions are deprived of the visual and
auditory cues which are part of face-to-face interaction (Daft & Lengel,
1986). However, the lean properties of chatbot-mediated conversation go
beyond those of ordinary web chat in that the contributions of one
conversational partner are processed, interpreted and responded to by a
computer rather than another human, leading to a number of challenges.
Firstly, understanding user intent can be challenging for bots
because “people are inconsistent, and their lives are disorderly. Their
situations, circumstances and aspirations change. They get tired and
hungry, glad or sad […] And sometimes they have no clue what
they really want” (Hantula et al., 2021). A bot has to respond to
these changing circumstances and understand the user’s intent, despite
the variation in how this intent is expressed. Secondly, the
bot has to generate language that responds correctly to users’
intents, which in most dialogue systems is done through pre-compiled
sentences or templates (Di Lascio et al., 2020). And finally, users need
to support this process by providing repair when the bot does not
understand their intent. As Collins (2018) points out, ‘repair’ is a
fundamental feature by which humans deal with communication that is less
than perfect. As previous studies have shown and this study will
demonstrate, it is also fundamental to human interactions with
conversational AI.
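To make these three challenges more concrete, the following minimal sketch illustrates, in Python, how a template-based dialogue system of the kind described by Di Lascio et al. (2020) might classify a user's intent, respond from a pre-compiled template, and other-initiate repair when no intent is recognised. The function names, keywords and messages are hypothetical illustrations under simplifying assumptions, not a description of any particular system.

    from typing import Optional

    # Pre-compiled response templates keyed by intent label (illustrative only).
    TEMPLATES = {
        "book_appointment": "Sure, I can book that for you. Which day suits you?",
        "cancel_appointment": "No problem, I have cancelled your appointment.",
    }

    def classify_intent(message: str) -> Optional[str]:
        """Toy keyword matcher standing in for a trained intent classifier."""
        text = message.lower()
        if "cancel" in text:
            return "cancel_appointment"
        if "book" in text or "appointment" in text:
            return "book_appointment"
        return None  # intent not recognised

    def respond(message: str) -> str:
        intent = classify_intent(message)
        if intent is None:
            # Other-initiation of repair: the bot signals non-understanding so
            # that the user can self-repair, e.g. by rephrasing the request.
            return "Sorry, I didn't understand that. Could you rephrase?"
        return TEMPLATES[intent]

Even in this toy example, the burden of resolving non-understanding falls on the user, who must reformulate the request until it matches what the system can recognise.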
However, when interacting with bots, humans cannot necessarily rely on
the same models of communication as in face-to-face social
interaction. Research by Luger & Sellen (2016) on users’ expectations
and experience of conversational agents found that participants with
better technical skills were better prepared to adopt new mental models
of interaction than lower-skilled participants, whose expectations did
not change and who were more likely to become frustrated by their
interactions.
With these insights as background, this paper takes a user-centric
perspective to investigate repair, described here as users’ efforts to
address intent interpretation issues in task-oriented chatbot
interactions. These insights have important implications for how users
can be supported to develop the communication skills for conversational
AI and to understand the sociolinguistic environment of these
interactions. These issues will be discussed in the conclusion to this
paper.
Literature review
The term repair is originally derived from conversation analysis.
Repair was first described by Schegloff et al. (1977) as a
“self-righting mechanism for the organization of language use in social
interaction” (p. 381), whereas Seedhouse (2005) defines repair as “the
treatment of trouble occurring in interactive language use” (p. 168).
In ordinary conversation, repair can be used by speakers to address
problems in hearing, speaking and understanding. In conversation with a
text-based chatbot, users perform repair to address issues with the
bot’s interpretation of their intent.
In their typology of repair, Schegloff et al. (1977) distinguish between
self-initiation of repair (repair initiated by the speaker who is the
cause of the trouble source) and other-initiation of repair (repair
initiated by another speaker). They also distinguish self-repair
(repair completed by the speaker who is the cause of the trouble source)
and other-repair (repair completed by another speaker), emphasizing
also that some repairs do not have a successful outcome at all. Speakers
use a variety of means to other-initiate repair, such as signals of
misunderstanding (e.g. Huh?, What?) and question words (who?, when?),
which may be combined with a partial repeat of the trouble source.
Albert & de Ruiter (2018) argue that the notion of repair as introduced
by conversation analysts such as Schegloff et al. (1977) constitutes a
“minimal notion of shared understanding as progressivity” (p. 281)
which consciously does not focus on context. However, they also argue
that observing repair provides rich insights into the sources of the
misunderstanding, which may include “contextual problems of propriety
or transgression” (p. 303). Repair has been the subject of a wide range of
investigations in specific contexts of interpersonal communication, such
as the classroom (Dippold, 2014; Montiegel, 2021) and in workplace
interaction (Oloff, 2018; Tsuchiya & Handford, 2014). In
computer-mediated environments, repair has so far primarily been
investigated in the context of web-chat, focusing for example on gaming
chat (Collister, 2011), German web chat (Schönfeldt & Golato, 2003),
library web chat (Koshik & Okazawa, 2012) and Facebook chat (Meredith
& Stokoe, 2014). These studies found that repair in online chat
is organised differently from ordinary conversation due to
differences in the sequential flow of messages. Moreover, users do not
have access to the same set of resources to accomplish social
interactions as in spoken conversation (e.g., prosody). Users do,
however, compensate with other ways of creating meaning (such as the
asterisk * as a repair morpheme), and general principles of repair from
ordinary conversation
(e.g., the preference for self-repair) still apply.
Repair has also been the subject of research in interactions between
humans and embodied robots as well as chatbots. For example, Beneteau et
al. (2019) investigated communicative breakdowns between Alexa and
family users. They showed that the onus of providing ‘repair’ when
communication broke down lay with users. Users deployed a range of
strategies to perform repair, e.g., prosodic changes, over-articulation,
semantic adjustments or modifications, increased volume, syntactical
adjustments, and repetition.
Research on human interaction with text-based chatbots confirms that the
burden of repair lies primarily with the user. Analysing transcripts of
interactions between users and a task-oriented chatbot, Li et al. (2020)
investigated the relationship between different types of non-progression
and user repair types. They found that bot users were most likely to
abandon the conversation after three instances of non-progress, which
were caused by the bot’s misrecognition of user intents on one hand, and
non-recognition on the other. Users drew on a range of strategies for
dealing with non-progress, including abandoning the bot service,
temporarily quitting the conversation, temporarily changing the subject,
and various forms of reformulating messages (self-repair), e.g.,
rephrasing, adding, repeating or removing words, using the same words,
or introducing new topics.
Ashktorab et al. (2019) investigated user preferences for the repair
strategies used by a banking chatbot in an experimental setting, finding
that users preferred the bot to initiate repair by providing options of
potential user intents. Users also favoured assisted self-repair (e.g.,
explaining which keywords contributed to the bot’s lack of understanding)
over other strategies. However, users’ strategy preferences depended on
other factors such as their social orientation towards chatbots, their
utilitarian orientation, their experience with chatbots and technology
and the repair outcome.
Følstad & Taylor’s (2020) study centred on the bot’s strategies for
initiating repair and asked whether a chatbot expressing uncertainty in
interpretation and suggesting likely alternatives would affect chatbot
dialogues at message, process and outcome level. They found that
initiating repair in this manner substantially reduced responses that
were not relevant to a customer request, as well as fallback responses
offering escalation or explicitly expressing misunderstanding. The
number of relevant responses, however, remained stable across both
conditions.
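As an illustration of the bot-side repair initiation strategies discussed in the two studies above, the sketch below shows one plausible way a bot might express uncertainty and offer candidate intents when classifier confidence is low. The threshold, intent labels and wording are hypothetical assumptions, not details reported by Ashktorab et al. (2019) or Følstad & Taylor (2020).

    from typing import Dict

    CONFIDENCE_THRESHOLD = 0.7  # below this, the bot initiates repair (assumed value)

    def handle_turn(intent_scores: Dict[str, float]) -> str:
        """intent_scores maps candidate intent labels to classifier confidence."""
        ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
        top_intent, top_score = ranked[0]
        if top_score >= CONFIDENCE_THRESHOLD:
            return f"OK, proceeding with: {top_intent}"
        # Express uncertainty and suggest likely alternatives so that the user
        # can complete the repair by choosing an option rather than rephrasing blindly.
        options = ", ".join(label for label, _ in ranked[:3])
        return f"I'm not sure I understood. Did you mean one of these: {options}?"

    # Example: handle_turn({"reschedule": 0.41, "cancel": 0.38, "ask_question": 0.12})
    # -> "I'm not sure I understood. Did you mean one of these: reschedule, cancel, ask_question?"

In such a design, the bot initiates the repair, but its completion is still delegated to the user, who selects or confirms the intended meaning.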
Whilst this literature review shows that there is already a small body
of studies on repair in computer-mediated communication generally and in
human-bot interaction more specifically, users’ strategies for dealing
with repair and working themselves out of bot misunderstanding have not
yet been sufficiently explored, in particular from a primarily
qualitative perspective. Besides Li et al.’s (2020) study on repair
types and non-progress, the only other qualitative evaluation of user
strategies for overcoming problems in interaction with bots focuses on
voice bot interaction. Myers et al. (2018) identified ten different user
tactics, the most frequently used being hyperarticulation (speaking
louder, slower or more clearly), adding more information, using a new
utterance to express the same intent, and simplification.
This study complements and builds on these insights by investigating user
repair strategies in interactions with a text-based chatbot. In doing so, this study will
describe the ‘technicalities’ of user repair and use these insights to
draw conclusions about users’ understanding of the AI context and skills
development for AI more generally.
Objectives
The objective of this paper was to track how users of a task-oriented
chatbot use conversational repair to navigate episodes in which the bot
misunderstands or fails to understand their intents. As
this paper was exploratory, the research question was broad in focus,
with the analysis revealing further questions which could be explored
with a larger dataset gathered ‘in the wild’ rather than a simulated
setting.
Data
Asa, the bot
The data for this paper are drawn from a research project conducted
jointly with the start-up company spryt.com. SPRYT have developed an
intelligent patient scheduling system which allows patients to schedule
medical appointments through WhatsApp via text-to-text interaction.
Patients interact with a digital receptionist – the chatbot called
‘Asa’ – to schedule, reschedule or cancel appointments, respond to a
medical screening questionnaire or ask questions. At the time of data
collection, Asa had been developed to the stage of a ‘minimum viable
product’: it was functional but had not yet been tested with real
patients and had not yet engaged in algorithmic learning from real
patients’ interactions.
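For readers unfamiliar with task-oriented bots, the fragment below sketches the kind of task routing an appointment assistant such as Asa performs, dispatching a recognised intent to a handler and returning the floor to the user when the intent is not recognised. It is a purely hypothetical illustration and is not based on SPRYT's actual implementation.

    # Hypothetical illustration of intent routing for an appointment bot;
    # handler names and messages are assumptions, not SPRYT's code.
    HANDLERS = {
        "schedule": lambda: "Let's find a suitable appointment slot.",
        "reschedule": lambda: "Which appointment would you like to move?",
        "cancel": lambda: "I can cancel that appointment for you.",
        "screening": lambda: "I'll ask you a few screening questions first.",
        "question": lambda: "What would you like to know?",
    }

    def route(intent: str) -> str:
        handler = HANDLERS.get(intent)
        if handler is None:
            # Unrecognised intent: the repair work is handed back to the user.
            return "Sorry, I can only help with appointments and related questions."
        return handler()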
Dataset
The analysis is based on 36 interactions between individual users and
the appointment scheduling bot. These interactions took place in a
simulated setting as part of user research of the system pre-deployment.
Ten of the interactions were created during the first phase of the
project. In this phase, user experience interviews were conducted during
which users interacted with the bot and were asked to talk in detail
about their perceptions of the bot’s speech turns and of the system as a
whole. 26 interactions were created in phase two of the project. In this
phase, users interacted with Asa to complete a booking at a minimum. In
addition, users were also instructed to complete other tasks, such as
rescheduling, cancelling, or asking a question. Subsequent to their
interactions, users reported their opinions about Asa through a
questionnaire. For the purpose of this analysis, only the interactions
themselves will be considered.
Participant recruitment and demographics
Participants were recruited through the researchers’ social media
channels as well as the university’s experimental platform. As a result,
the majority of participants in the interview phase were undergraduate
and postgraduate university students, in addition to two professionals
who took part in the research due to their professional interest in
chatbot development. In the questionnaire stage, the largest group of
participants (45%) were between 18 and 24 years old. Just over 70% of
participants described themselves as White and as native speakers of
English.
Data analysis and results
Analytical approach
Data analysis was exploratory and only loosely theory-guided at the
start of the project. Whilst the researcher was aware of the possible
relevance of repair for chatbot interactions due to her own previous
work (Dippold et al., 2020) and her reading of the literature, the
analysis did not focus on repair at the outset. However, after an
initial reading of the conversational data and exploratory annotations
in a qualitative analysis software programme (NVivo), repair emerged as
a possible focus in the analysis.
Stages of analysis
The analysis took place in four successive stages. These stages were not
pre-determined at the outset; rather, each step was guided by the
previous one and added an additional layer of evidence. Each of these steps
will be discussed in detail below, with examples from the data then
allowing a more detailed exploration of the results.
Step 1: This step focused on the identification of all instances of user
self-repair. In the majority of cases (65), self-repair was
other-initiated: it occurred either after a turn in which the bot
explicitly indicated that there was a problem with the user’s turn, or
after an irrelevant bot response to a user turn. In much fewer cases (7),
the bot would provide no response at all, leading to self-initiated
self-repair.
Step 2: The purpose of the second step was to identify the trouble
sources leading to other-initiated user self-repair. This resulted in
the identification of four different types of trouble sources (Table 1):