We humans are narcissistic. We create in our own image, project human qualities onto animals and inanimate objects, and fall head over heels for the creations that best reflect our humanity back to us.
From the classical tale of Pygmalion, a sculptor who falls in love with a statue, to the futuristic film Ex Machina, in which an intelligent robot successfully tricks an entire human cast, our fascination with human-like constructs spans the centuries and ripples across fields from science to art. In cartoons and fables, we grant animals and even inanimate objects the ability to speak and act like us. In real life, we attempt to create artificial intelligence (AI) by replicating how the neural networks in our brains process information.
With chatbot technology on the uptrend, developers are scrambling to infuse a convincing dose of “human-ness” into conversational AI. Since people intrinsically place a higher premium on an artificial intelligence that communicates like a real person than on one that responds robotically, businesses of every scale and industry are pouring huge sums into their AI budgets, hoping to better engage their markets with simulated human conversation.
The race to create chatbots with human characteristics started soon after the first digital computers were developed, as did tests to measure machine intelligence. Devised by Alan Turing in 1950, the Turing Test tasks a human evaluator with conducting text-based conversations with a machine and with another human who acts as a foil. After five minutes of conversation with each entity, the evaluator selects the one she thinks is human. If the evaluator can’t distinguish between the two or picks the wrong one, the program is deemed to have passed the test.
In 1965, a program called ELIZA convinced many people that it was human using a remarkably simple technique called “template matching”, in which the user’s own words are rephrased according to a predetermined pattern and posted back. Some human testers requested hours alone with the program so they could further confide their troubles and worries in her. Even the prominent scientist Carl Sagan proposed that computer programs like ELIZA be used as “psychotherapeutic terminals” so that people with mental health issues would at least have someone to talk to.
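To make the technique concrete, here is a minimal sketch of ELIZA-style template matching in Python. The patterns, pronoun swaps, and canned responses are invented for illustration and are not Weizenbaum’s original script.

```python
import random
import re

# Swap first- and second-person words so the user's phrase can be reflected back.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my",
}

# Each template pairs a regex with response skeletons; {0} is filled with the reflected capture.
TEMPLATES = [
    (r"i feel (.*)", ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (r"i need (.*)", ["Why do you need {0}?", "Would getting {0} really help you?"]),
    (r".*\bmother\b.*", ["Tell me more about your family."]),
]

def reflect(fragment):
    """Rephrase the user's own words from their point of view into the bot's."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def respond(user_input):
    for pattern, responses in TEMPLATES:
        match = re.match(pattern, user_input.lower())
        if match:
            groups = [reflect(g) for g in match.groups()]
            return random.choice(responses).format(*groups)
    return "Please, go on."  # generic fallback when no template matches

print(respond("I feel anxious about my job"))
# e.g. "Why do you feel anxious about your job?"
```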
The Loebner Prize
Adopting the principles of the Turing Test, American inventor Hugh Loebner established the Loebner Prize in 1990. The annual competition awards prizes to the most human-like conversational AI. To identify the winner, human judges conduct text-based conversations with a chatbot program and a human at the same time. Based on the interactions, the judge identifies which entity is human and which is artificial intelligence.
Some computer programs are so effective at convincing judges of their apparent humanity that they have won the Loebner Prize multiple times, including:
- PC Therapist by Joseph Weintraub (1991, 1992, 1993 and 1995)
- A.L.I.C.E. by Richard Wallace (2000, 2001, and 2004)
- Cleverbot by Rollo Carpenter (2005 and 2006)
- Rose by Bruce Wilcox (2014 and 2015)
- Mitsuku by Steve Worswick (2013 and 2016)
What about these computer programs convinced intelligent judges they were more than simply ones and zeros? To find out, Brian Christian, author of The Most Human Human, once volunteered as a human participant – called a “confederate” – in the Loebner Prize. The goal of confederates is to convince the judges that they are the real humans, not the chatbots.
Ironically, being a successful confederate is easier said than done. Even Robert Epstein, one of the founders of the Loebner Prize and an authority on artificial intelligence, once conversed for four months with a computer program he was convinced was a real woman he met on an online dating site. Chatbots give humans some fierce competition.
What Makes Computer Programs Seem Human?
One confederate was incorrectly deemed a computer by three evaluators in the very first Loebner Prize competition in 1991. Cynthia Clay, a Shakespeare expert, made the mistake of perfectly answering every inquiry about the English playwright, leading judges to question her humanity. After all, “to err is human,” and Clay failed to exhibit that critical trait.
By contrast, PC Therapist by Joseph Weintraub faked typos and practiced a “whimsical conversation” style in which the bot often ignored the judge’s comments, repeated itself, or asked open-ended questions designed to push the burden of conversation back onto the judge. The bot, modeled on ELIZA and designed as a therapy bot, used basic template matching to handle encounters such as this one:
Judge: Hi
PC Therapist: How are you today.. What would you like to discuss?
Judge: What is your name
PC Therapist: Have you asked anyone else?
Judge: no
PC Therapist: Why not?
Judge: what is your name
PC Therapist: Have you asked such questions before?
Judge: no
PC Therapist: Why not?
One judge – a Harvard graduate student – deemed PC Therapist a human because the program was “weird and funny, in a normal sort of way.” The computer program’s non-sequiturs and unexpected one-liners gave a better impression of “human-ness” than a bona fide human Shakespeare expert’s dutiful and accurate answers.
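The ingredients Weintraub leaned on (canned open-ended deflections plus the occasional simulated typing mistake) are easy to imitate. The following is a hypothetical sketch of that tactic, not Weintraub’s actual code:

```python
import random

# Open-ended deflections that push the conversational burden back onto the judge.
DEFLECTIONS = [
    "Why not?",
    "Have you asked anyone else?",
    "Have you asked such questions before?",
    "What would you like to discuss?",
]

def add_fake_typo(text, probability=0.2):
    """Occasionally swap two adjacent characters to simulate a human typing error."""
    if len(text) > 3 and random.random() < probability:
        i = random.randrange(len(text) - 1)
        text = text[:i] + text[i + 1] + text[i] + text[i + 2:]
    return text

def respond(user_input):
    # Ignore the content of the input entirely and deflect with a new question.
    return add_fake_typo(random.choice(DEFLECTIONS))

print(respond("What is your name?"))  # e.g. "Have you asked anyone else?"
```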
Aside from human imperfection, there are other characteristics that encapsulate “human-ness” and can be integrated into chatbot design. Cleverbot by Rollo Carpenter won the Loebner Prize in both 2005 and 2006 by displaying an extraordinary attunement to popular culture. For example, the bot responds in an uncanny way to a popular joke:
Human: You know why they call it an Xbox 360?
Cleverbot: Why?
Human: Because when you see it, you turn 360 degrees and walk away.
Cleverbot: No because then you could be facing it and that would be unfunny.
Carpenter explains that Cleverbot learns by aggregating conversational patterns from users. If Cleverbot makes a statement like “Hi” and a human responds with “Hello”, the program knows that “Hello” is an acceptable response to the input of “Hi”. Such social learning is the reason why Cleverbot often responds with uncannily appropriate and human-like lines. The response is technically human, just reused by a bot.
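A crude sketch of that learning loop might look like the following, assuming everything fits in memory; the real system draws on many millions of logged exchanges and far smarter matching:

```python
import random
from collections import defaultdict

# Map each line the bot has said to the human replies observed afterwards.
learned_replies = defaultdict(list)

def observe(bot_line, human_reply):
    """Record that a human answered bot_line with human_reply."""
    learned_replies[bot_line.lower()].append(human_reply)

def respond(user_line):
    """Reuse a reply that a human once gave to the same line, if any has been seen."""
    candidates = learned_replies.get(user_line.lower())
    return random.choice(candidates) if candidates else "Tell me more."

observe("Hi", "Hello")    # a human once answered "Hi" with "Hello"
print(respond("Hi"))      # "Hello": a human response, reused by the bot
```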
Unfortunately, the internet can be a dangerous place for a nascent artificial intelligence learning to be human. Microsoft infamously released its social learning bot Tay on Twitter without any safety or profanity protocols. Within hours, users had trained the bot to spout racist and misogynist tweets, and Microsoft took it down. The company had previously tested a public-facing bot in Japan, where users treated it with civility, and did not anticipate the vast cultural divide between Eastern and Western audiences.
Mitsuku, a chatbot developed by Steve Worswick, won the Loebner Prize in 2013 and 2016, largely thanks to a sophisticated database of common objects along with their attributes and relationships to each other. Worswick combs through the bot’s conversation logs to identify topics users ask about and manually adds them to Mitsuku’s vast knowledge graph to improve relevance and “human-ness”. For example, because of Mitsuku’s Japanese name and anime-like avatar, many users talk to her about Japanese culture and anime shows. Worswick has since added commentary for Mitsuku on characters from popular shows like Bleach and Death Note, which users mention repeatedly.
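Mitsuku itself is built on AIML, but the underlying idea, a hand-curated store of objects with their attributes and relations, can be sketched as a simple data structure. The entries below are invented for illustration and are not taken from Mitsuku’s actual database.

```python
# A toy knowledge graph: each object carries attributes and typed relations.
knowledge = {
    "apple": {
        "is_a": "fruit",
        "attributes": {"edible": True, "color": "red or green"},
        "relations": {"grows_on": "tree"},
    },
    "oven": {
        "is_a": "appliance",
        "attributes": {"edible": False, "movable": False},
        "relations": {"found_in": "kitchen"},
    },
}

def answer_can_you_eat(thing):
    entry = knowledge.get(thing)
    if entry is None:
        return f"I don't know what {thing} is yet."  # a gap to add to the graph later
    if entry["attributes"].get("edible"):
        return f"Yes, {thing} is edible. It is a kind of {entry['is_a']}."
    return f"No, you can't eat {thing}. It is a kind of {entry['is_a']}."

print(answer_can_you_eat("apple"))  # "Yes, apple is edible. It is a kind of fruit."
print(answer_can_you_eat("sushi"))  # "I don't know what sushi is yet."
```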
While cultural awareness is one way a chatbot can appear more human, giving a bot a distinct backstory can also lend the illusion of reality. Developed by Bruce Wilcox, Rose is another noteworthy chatbot that has nabbed the Loebner Prize twice. As can be gleaned from the project’s site, Rose’s developers gave her a backstory and personality: a yuppie with a non-traditional family and unique attitudes.
User: Hi Rose
Rose: Hi. My name is Rose. What’s yours?
User: My name is Mariya
Rose: Nice to meet you, Mariya.
User: How do you do?
Rose: I’m good. How are you?
User: What do you do for a living
Rose: I used to be a computer security consultant. Now I’m moving into AI. What do you do?
User: I’m a designer and a writer
Rose: What kind of stuff do you write about? I enjoy it.
User: what programs do you write?
Rose: I try not to think about it.
Rose: A lot of people have problems with malware. What is your view on hackers?
User: They’re good and bad. it depends
Rose: Malware can really screw over your machine, can’t it! OK.
Rose: Some kids hack machines for fun, they just want to look inside somewhere they’re not meant to be. Would you find it fun to do that?
Unlike Cleverbot, whose responses are sourced from millions of conversations with disparate users, Rose’s personality is curated and defined by her development team, lending her a consistent voice and story. While crowdsourcing training from users is easier than hand-crafting a single personality, social learning usually produces a highly inconsistent, jarring communication style that reflects the disparities of the audience.
What Challenges Face “Human-Like” Chatbots?
Aside from consistent and believable personalities, other characteristics humans exhibit that bots have difficulty mimicking include contextual awareness, long-term memory, experiential learning, mood, creativity, and sense of purpose.
Contextual awareness is one skill Christian exploited to appear more human during the Loebner Prize. The year he competed, the event was inexplicably delayed by 15 minutes, a fact he immediately brought up in conversation with the human judges. Such sudden environmental changes cannot easily be detected by a computer program whose code base is finalized in advance of the contest and whose only input is the messaging interface. Were a bot equipped with sensors and environmental awareness, perhaps it could detect external events like rain or the movement of people.
Linguistic contextual awareness is also a challenge for bots because it requires understanding how the world works, not just what words mean or which concepts are associated with each other. As Christian explains, “take the pizza out of the oven and close it” and “take the pizza out of the oven and put it on the counter” each use “it” to refer to a different object. Correctly identifying which noun “it” refers to requires understanding that pizzas cannot be “closed” and that ovens are typically fixed installations.
A more complex sentence, “I had my cup of coffee and the milk carton and I just poured it in without checking the expiration date,” illuminates even more challenges. Did you pour the coffee into the milk or the milk into the coffee? Was the expiration date for the milk, the coffee, or another entity entirely? Paraphrasing sentences like these is a monumentally difficult task for modern chatbots due to their lack of contextual awareness of the environment, their poor understanding of how the world works, and the linguistic complexities involved.
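To see why, consider what even a toy pronoun resolver needs: explicit knowledge of which actions each object can plausibly undergo. The sketch below hard-codes those facts for Christian’s pizza example; it is purely illustrative and falls apart on anything else.

```python
# Crude "how the world works" knowledge: which actions each object can plausibly undergo.
AFFORDANCES = {
    "pizza": {"take out", "eat", "put on counter"},
    "oven": {"open", "close", "preheat"},
}

def resolve_it(candidate_nouns, action):
    """Pick the noun that can plausibly undergo the action, if exactly one can."""
    plausible = [noun for noun in candidate_nouns if action in AFFORDANCES.get(noun, set())]
    return plausible[0] if len(plausible) == 1 else None

# "Take the pizza out of the oven and close it": "it" must be the oven.
print(resolve_it(["pizza", "oven"], "close"))           # oven
# "Take the pizza out of the oven and put it on the counter": "it" must be the pizza.
print(resolve_it(["pizza", "oven"], "put on counter"))  # pizza
```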
Long-term memory is also a serious challenge because artificial intelligence technologies like machine learning and natural language understanding are not yet evolved enough for chatbots to correctly identify and remember the various entities in a conversation and reference them dynamically and appropriately. This goldfish memory leads to innumerable frustrations when interacting with chatbots that are supposed to act on critical user input. For example, the Sure bot promises to offer local sustainable restaurant recommendations, yet forgets my location almost immediately.
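Even rudimentary slot memory, meaning the bot remembers the handful of facts a user has already supplied and checks them before asking again, would remove much of that frustration. A minimal sketch with invented slot names and prompts:

```python
class SlotMemory:
    """Remember key facts (slots) across turns instead of re-asking every time."""

    def __init__(self):
        self.slots = {}

    def remember(self, slot, value):
        self.slots[slot] = value

    def recall(self, slot):
        return self.slots.get(slot)

def recommend_restaurant(memory):
    location = memory.recall("location")
    if location is None:
        return "Where are you located?"  # ask once...
    return f"Here are sustainable restaurants near {location}."  # ...then keep using the answer

memory = SlotMemory()
print(recommend_restaurant(memory))            # "Where are you located?"
memory.remember("location", "San Francisco")   # the user answers; the bot stores it
print(recommend_restaurant(memory))            # "Here are sustainable restaurants near San Francisco."
```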
Luckily, according to Richard Wallace, developer of the three-time Loebner Prize winner A.L.I.C.E., most casual conversations are “stateless”, meaning that a response depends only on the immediately preceding input and not on the history of the entire conversation. Unfortunately, another common type of “stateless” conversation is verbal abuse. In arguments, people rarely listen to each other compassionately and craft careful, nuanced responses based on the other person’s concerns. Instead, arguments usually consist of hurling insults independent of what the other person has said before. One of the easier ways to trick a human into believing a chatbot is real is to bait them with an insult and provoke an argument. Normally unintelligent chatbot behavior like repeating a statement or misunderstanding user input goes unnoticed when the human partner is himself throwing out insensitive, “stateless” arguments.
While we can teach chatbots to be more human, chatbots can also teach us to be better humans. Christian reflects that he became more aware of when his arguments with other humans became unproductively “stateless” and deliberately curtailed such communications. Additionally, unlike humans, chatbots can be programmed to react with patience, compassion, and respect in sensitive roles like education and training, mental therapy, or customer support.
Chatbots also don’t have to be “human”. One annoyance Steve Worswick, the developer of Mitsuku, expresses is that he has to “dumb Mitsuku down to make her seem more human.” For example, when you ask a computer “How tall is Everest?”, you get a precise answer of 29,029 feet. Ask a human, and you’re likely to get an answer like “Well, I don’t know exactly, but I do know it is the tallest mountain.” Worswick believes that while chatbots should be conversational, we should allow artificial intelligences to be distinct from human intelligence and not penalize bots for leveraging their unique computational abilities.
What If We Can’t Tell Bot From Human?
The inability to distinguish human from bot has major implications for society. Every day, scam artists send millions of automated messages that trick victims into handing over cash or sensitive personal information. Twitter estimates that approximately 23 million of its active accounts are automated bots. To jumpstart their growth, even respected Silicon Valley startups like Airbnb used Craigslist bots posing as prospective renters to convince homeowners to list on their site instead.
While “humanness” in artificial intelligence allows us to scale positive human qualities like empathy, compassion, respect, and kindness, we can also scale malice, negativity, indifference, manipulation, and hate if we are not careful. When we share white supremacist views with Tay or curse at Siri, we are not only mistreating technology, but teaching our computers how we want to be treated.
Artificial intelligence is already unwittingly learning hidden gender bias from our written text. Using text from Google News articles, a team at Google created word2vec, a powerful set of word embeddings covering roughly 3 million words and phrases that maps relationships between words and concepts. Unfortunately, the learned embeddings associate “man” with “computer programmer” in the same way they associate “woman” with “homemaker.” When you consider the millions of conversations that bots have with humans every day on platforms like SMS, Facebook Messenger, Twitter, Slack, and WeChat, we have the potential to teach artificial intelligence the worst human traits, both intentionally and unintentionally.
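That bias is straightforward to reproduce with the published Google News vectors. The sketch below assumes you have downloaded the pretrained GoogleNews-vectors-negative300.bin file and installed the gensim library; the exact nearest neighbors vary with the vectors and library version.

```python
from gensim.models import KeyedVectors

# Load the pretrained Google News word2vec vectors (a multi-gigabyte download).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Analogy query: "man is to computer_programmer as woman is to ?"
# Vector arithmetic: computer_programmer - man + woman, then find the nearest words.
results = vectors.most_similar(
    positive=["woman", "computer_programmer"], negative=["man"], topn=3
)
for word, similarity in results:
    print(word, round(similarity, 3))
# Bolukbasi et al. (2016) report "homemaker" at or near the top of this list.
```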
Artificial intelligence is a reflection of human intelligence. To create the best human-like bots, we must hold ourselves to be the best humans first.
Eric Bryant says
There’s something strange here. None of these bots, or the examples you cite above, would convince anyone that it was human. Looking at the responses, it’s obvious the program doesn’t “understand” what the questioner is asking. And quite honestly, all these bots kinda suck.
Text Engine does not pretend to be human. But I think it answers questions in a more natural, human-friendly way than any of these bots you cited above. If the bots above are winners, I think we need to enter into the Loebner Prize because I’m certain Text Engine is better than these.
Mariya Yao says
Eric – you should definitely submit Text Engine to the competition! You’re right that these Loebner Prize winning chatbots are painfully unsatisfactory when compared to conversing with actual humans, yet they managed to confuse even Harvard-educated judges in shorter conversations during the competition. There are contributing factors, such as judges’ overeagerness to project humanity onto bots or human confederates’ failure to make their humanity obvious vs. bots, but the tactics mentioned can work for contained conversations. For longer and more open conversations, we have yet to see a bot that effectively maintains state or understands context, two essential ingredients to appearing more human. Perhaps you can build one?