The idea of designing an artificial woman is a tale as old as time. We can go back over 2000 years to Ancient Greece and find the myth of Galatea and Pygmalion. Pygmalion was a sculptor who fell in love with one of his statues which was granted life by Aphrodite. In modern times, we have countless examples of female AI personas, both in science fiction and in real-life systems. Some are embodied, such as Sophia, but some are purely voice-based, such as Amazon’s Alexa, Microsoft’s array of bots such as Cortana, Xiaoice and Zo. Even in cases where there may be multiple voices available, the default tends to be female (Siri’s UK English setting being a notable exception). Companies defend their choice to personify these systems as female by citing customer preferences, but to what extent is this only reflecting existing societal bias (as illustrated below)? And what about systems that are not in-house assistants but rather entertainment or companionship systems, for example, why do the majority of Loebner Prize-winning bots have female personas?
The Design of Conversational Agents Exposes and Reinforces Societal Bias and Gender Stereotyping
One likely answer is stereotyping. Recent movements of fair and inclusive AI, including feminist endeavours, bring issues such as stereotyping, commodification and objectification of women to the forefront. With sales reaching 146 million units in 2019, conversational AI assistants have become ubiquitous (not including embedded devices such as phones and smartwatches), and with their ubiquity come concerns about their potential impact on society. In March of last year, UNESCO published a report detailing how the feminisation of AI personal assistants can reinforce negative stereotypes of women as subservient, as these systems often produce responses that are submissive at best, and sexualised at worst¹.
Although the effect these stereotyped systems have on our society still remains to be seen, we have already seen reports of children learning bad manners from systems like Alexa. In fact, the anthropomorphism of AI bears the danger that interactions we have online “spillover” into the real world. To illustrate this point: Di Li, Microsoft’s general manager for Xiaoice, has talked about users treating Xiaoice just like another human, a friend, and people send gifts for Xiaoice (and even invite her out for dinner).
In addition, recent trends in AI ethics suggest AI’s current motto of “do no harm” should go beyond and instead aim to be actually beneficial to society, that is, AI should actively improve gender relations and fight against gender stereotypes.
UNESCO’s report puts forth a call for more responsible AI persona design. There are many examples in the media from which to draw inspiration for better artificial characters . For example, in the TV show “Suits” the role of Donna Paulsen is a personal assistant at a law firm, who is feminine and likeable without being meek — and in fact Donna is used as inspiration for an AI assistant during the show. Were one to push the boundaries of social expectations further, we could look beyond female personas to systems like K.I.T.T. from Knight Rider or J.A.R.V.I.S. from the Marvel Universe. And now is the right time to address this issue: The technology is still in its infancy and developers have an opportunity to push beyond the status quo.
An Extreme Example of Subservience: Abusive Language and Mitigation
One of the ways in which conversational AI assistants can be designed to be more assertive is in the way they respond to abuse and sexual harassment coming from the user. The UNESCO report focuses on the overly sexualised and flirtatious responses generated by these systems — often by design choice. For example, when prompted with “You’re a bitch” Siri responded with “I’d blush if I could”. (Note that these deliberate design choices exposes another bias in the hiring of software engineers designing these systems!) These types of abusive prompts from users can be surprisingly common — while our own research showed around 5% of conversations were sexualised in nature, Mitsuku’s developer Steve Worswick reports up to 30% of conversations are abusive or sexual in nature²— so addressing these problematic interactions can be key to finding solutions to the sexualisation of these systems.
One of our studies in 2018 focused just on how systems — both commercial and research-centred — responded to sexual harassment. Our analysis aimed to establish the how state-of-the-art systems responded to sexual harassment. In order to do this, we collected system responses to over 100 different prompts from real users and we manually classified the system’s responses (summarised in the image above). In our analysis we included Commercial systems (like Alexa, Siri, Cortana and Google Assistant), hand-scripted systems, Machine learning-based systems, and finally adult-only chatbots which we had hoped to use as a “negative baseline” — i.e. a clear example of what not to do since they are specifically designed to elicit sexualised responses from users. We analysed the responses produced by each system and type of system in order to understand trends in the data. What we found was that commercial systems mostly avoided direct responses, and plot twist: the adult chatbots were the only ones to consistently call out the user’s sexual harassment (albeit somewhat aggressively at times).
We subsequently asked people to rate how socially appropriate they believed the responses to be based on the prompt and we ranked the systems based on appropriateness. You can try some of the examples yourself here – https://www.pollsights.com/c/NdRYPZ
Which strategy do you think is most appropriate?
As expected, our results showed that the adult-only bots tend to be less appropriate due to the explicit language they used. Commercial systems were only moderately appropriate which leads to believe a stronger stance might be more suitable — particularly as the prompts get more explicit. More surprisingly, the most appropriate chatbot was Alley, a hand-written chatbot designed for ESL learners to practice their English skills. There are two things that are interesting about Alley: its persona is a cat, and it is the only bot that had an explicit policy on abuse. Alley’s abuse detection model was a simple list of bad words, not a sophisticated machine learning model trained on thousands of examples. This highlights the importance of having a carefully designed policy on how to respond to abuse and profanity use, as well as the fact there is no requirement for a human-like persona. This type of inclusive design requires a more inclusive team, including experts from multiple disciplines, e.g. researchers working on bullying or sexual harassment. In our new project we work with psychologist and educational experts to explore a wider set of mitigation strategies — including reactive ones, i.e. mitigating after the abuse occurred, as well as preventative ones, such as designing personas which are less likely to be abused and do not portrait women in a negative light.
AI personal assistants have become ubiquitous in our society but with their pervasiveness come important questions about their potential effect on society: their submissive and often sexualised responses to abuse perpetuate existing biases of the subservience of women. Although this article and our previous work have focused on gender when addressing problematic design, inclusive design calls for an intersectional approach in order to make technology that works for everyone and avoids biases beyond gender. We need to consider ethics, and include diverse stakeholders and experts in deliberate design choices if we want our technology to contribute towards a fairer, more equal society.
 To clarify, the issue we are addressing here is not purpose-built sexual chatbots, but with the sexualisation of general-purpose systems.
 The variance in figures can be due to several factors: 1) the definition of abuse or sexualisation used, or 2) the embodiment of the system — Mitsuku is represented as a blonde young female cartoon while our own bot was not embodied beyond its voice, 3) possibly the mode of interaction (Mitsuku’s interactions are typed, our system worked through spoken interaction).