Chatbots & How To Test Them

Hristo Gergov, Musala Soft. 7 years of experience in IT, all of them in QA. Outsourcing company: had the chance to work with companies that are among the market leaders in their domains, such as IBM, Deutsche Telekom, VMware, and Leanplum. Engaged in consulting and pre-sales, which is how my interest in chatbots started.

Agenda

It all started a few decades ago …

The Turing test

The Turing test, developed by Alan Turing in 1950, is a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Turing proposed that a human evaluator would judge natural language conversations between a human and a machine designed to generate human-like responses. The evaluator would be aware that one of the two partners in conversation is a machine, and all participants would be separated from one another. The conversation would be limited to a text-only channel such as a computer keyboard and screen so the result would not depend on the machine's ability to render words as speech.[2] If the evaluator cannot reliably tell the machine from the human, the machine is said to have passed the test. The test results do not depend on the machine's ability to give correct answers to questions, only how closely its answers resemble those a human would give.

ELIZA

ELIZA, often regarded as the very first chatbot, was created by Joseph Weizenbaum in 1966. It used pattern matching and substitution to simulate conversation, and it appeared to pass the Turing test. The program worked by examining a user's typed comments for keywords. If a keyword is found, a rule that transforms the user's comments is applied, and the resulting sentence is returned. If a keyword is not found, ELIZA responds either with a generic riposte or by repeating one of the earlier comments.[27] In addition, Weizenbaum developed ELIZA to replicate the behaviour of a Rogerian psychotherapist, allowing ELIZA to be "free to assume the pose of knowing almost nothing of the real world."[28] With these techniques, Weizenbaum's program was able to fool some people into believing that they were talking to a real person, with some subjects being "very hard to convince that ELIZA [...] is not human."[28] Thus, ELIZA is claimed by some to be one of the programs (perhaps the first) able to pass the Turing test,[28][29] even though this view is highly contentious.
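
The keyword-and-transformation idea described for ELIZA can be sketched in a few lines of Python. This is only an illustration: the rules, the reflection table and the fallback sentence are made up for this example and are not Weizenbaum's original script, and only the "generic riposte" fallback is shown (repeating an earlier comment is left out).

```python
import re

# Hypothetical keyword rules: (pattern, response template).
# These are illustrative and not taken from the original ELIZA script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first and second person so the echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}

# Generic riposte used when no keyword is found.
FALLBACK = "Please go on."


def reflect(fragment: str) -> str:
    """Lowercase the fragment and swap pronouns word by word."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(user_input: str) -> str:
    """Apply the first matching rule, or fall back to the generic riposte."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return FALLBACK


if __name__ == "__main__":
    for line in ["I need my coffee.", "I am sad", "Hello"]:
        print(line, "->", respond(line))
```

Even a rule set this small shows why testing such a bot is tricky: the reply depends on rule order and on how the user phrases the input, not on any understanding of the content.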

… until things rapidly escalated

Travel, BFSI, Government, E-Commerce
Benefits: instant answers, 24-hour service, new communication channels, ease of use. These only make sense if the chatbot is implemented properly.
How many of you have stumbled upon a chatbot in the last couple of months?
In 2018: 300K chatbots on Facebook alone; 5.6x year-over-year growth. By 2022 chatbots will save businesses $8 billion per year. We'll be talking to chatbots more often than we talk to our spouses.
Reasons for the optimism: good customer engagement (more communication channels) and reduced operating costs.

What is a chatbot?

The Business perspective: Chatbots are software solutions that use Artificial Intelligence to simulate how a human would behave as a conversational partner.

The Technology perspective

Quality Aspects The first question we ask: How do you define quality?

User friendly: Media, Buttons, Multi-language support, Multiple deployment channels

Media: Video, Voice, Hyperlinks, Attachments, Emoji

Buttons: often used instead of NLP. Buttons are very important, as they are often used for intent identification.

Multi-language support: Localization, Mixed-language queries

Multiple deployment channels: Mobile, Web, Voice

WhatsApp – Business API, currently in beta and available only to companies such as Uber, Booking, and KLM
Viber – Public Account / bot, webhooks
Skype – Azure Bot Service
Telegram – Bot API

User Friendly – Testing Perspective
Usability testing of the media types
Verification of the button-oriented flows
Localization testing
Interoperability testing for the deployment channels

Engaging: User on-boarding, Intent identification, Proactive vs Reactive approach, Expected vs Unexpected inputs, Responsiveness

User on-boarding: guide the customer through the chatbot's purpose and explain what it can and can't do.

Intent identification: NLP (Sentiment analysis, Tokenization, Named entity recognition, Dependency parsing)

Natural language processing (NLP) is concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.
Sentiment analysis: a basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level, that is, whether the expressed opinion is positive, negative, or neutral.
Tokenization: put simply, tokenization is a method to simplify content prior to the next step of processing. The input text is split into smaller units ("tokens"), such as words and punctuation marks, which the later steps work on.
Named entity recognition: a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
Dependency parsing: identifies the relations between the words in the sentence.
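
To make tokenization and intent identification more concrete, here is a minimal sketch that uses only the Python standard library. The intents and their keyword sets are hypothetical, and keyword overlap is a deliberately naive stand-in for what a real NLP engine does.

```python
import re
from typing import List, Optional

# Hypothetical intents and keywords, for illustration only.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "account", "money"},
    "book_flight": {"book", "flight", "ticket"},
    "human_handover": {"human", "agent", "operator"},
}


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def identify_intent(text: str) -> Optional[str]:
    """Return the intent whose keywords overlap most with the tokens, or None."""
    tokens = set(tokenize(text))
    best_intent, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


if __name__ == "__main__":
    print(tokenize("What's my balance?"))
    print(identify_intent("What's my account balance?"))
    print(identify_intent("asdf qwerty"))
```

Returning `None` for input that matches no intent gives the tester an explicit "not understood" outcome to assert on.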

Proactive vs Reactive approach: who started the conversation? Technically it makes a huge difference.

Expected vs Unexpected inputs: a joker or a confused user?

Responsiveness: immediate responses and reactions.

Engaging – Testing Perspective
Functional tests confirming that the chatbot presents itself properly
Functional tests that cover all possible intents
Test sets for Reactive and Proactive communication
Negative test cases for Unexpected inputs
Performance testing for Responsiveness
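
As an illustration of the last two items, the sketch below runs negative test cases and a simple responsiveness check. The `reply` function is a hypothetical stand-in for the chatbot under test; against a real bot it would send the message through the messaging API. The list of unexpected inputs and the fallback text are assumptions as well.

```python
import time
from typing import Iterable, List

FALLBACK = "Sorry, I did not understand that. I can help with orders and opening hours."


def reply(message: str) -> str:
    """Hypothetical stand-in for the chatbot under test."""
    text = message.strip().lower()
    if "order" in text:
        return "Please give me your order number."
    if "hours" in text or "open" in text:
        return "We are open 9:00 to 18:00."
    return FALLBACK


# Negative test data: inputs a joker or a confused user might send.
UNEXPECTED_INPUTS = ["", "   ", "asdfghjkl", "🙂🙂🙂", "DROP TABLE users;", "a" * 10_000]


def run_negative_cases() -> List[str]:
    """Return the inputs for which the bot did not fall back gracefully."""
    return [msg for msg in UNEXPECTED_INPUTS if reply(msg) != FALLBACK]


def max_latency_seconds(messages: Iterable[str]) -> float:
    """Measure the slowest response time over the given messages."""
    worst = 0.0
    for msg in messages:
        start = time.perf_counter()
        reply(msg)
        worst = max(worst, time.perf_counter() - start)
    return worst


if __name__ == "__main__":
    print("failed negative cases:", run_negative_cases())
    print("max latency (s):", max_latency_seconds(["order status", "opening hours"]))
```

For a deployed chatbot the latency budget would be agreed with the business; the measurement loop stays the same.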

Generic messaging API – input/output message handler
Database – bot entities, conversations

Mature: Self-learning, Human handover, Analytics

Self-learning: Machine Learning, Dialog tracking (interaction history)

Chatbots should be smart and analytical. Smart chatbots are able to drive the conversation forward. They're able to predict what a user might need next (based on the prior conversation) and give suggestions whenever possible. Train the chatbot initially; it trains itself after that.

To capitalize on the power of AI, you need quality data as the input for machine learning. You can either perform data mining on existing call center logs or scrape the social networks and forums for user questions, reviews, and answers. If you are under a strict timeline, just gather data from beta testers and use it as a baseline for future improvements.

Machine learning: given some AI problem that can be described in discrete terms (e.g. out of a particular set of actions, which one is the right one), and given a lot of information about the world, figure out what is the "correct" action, without having the programmer program it in. Typically some outside process is needed to judge whether the action was correct or not. In mathematical terms, it's a function: you feed in some input, and you want it to produce the right output, so the whole problem is simply to build a model of this mathematical function in some automatic way. To draw a distinction with AI, if I can write a very clever program that has human-like behavior, it can be AI, but unless its parameters are automatically learned from data, it's not machine learning.

Human handover: Chatbots should know what they can't do and provide a graceful handover to a human. Also give the customer the ability to trigger the handover manually.
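
A handover rule of this kind could look roughly like the sketch below. The confidence threshold, the trigger phrases and the `BotTurn` structure are all assumptions made for illustration; a real platform would expose its own confidence score and handover mechanism.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.6  # assumed value, to be tuned per chatbot
HANDOVER_PHRASES = ("human", "agent", "real person")  # assumed trigger phrases


@dataclass
class BotTurn:
    text: str
    handed_over: bool


def next_turn(user_message: str, intent: str, confidence: float) -> BotTurn:
    """Decide whether the bot answers or hands the conversation to a human."""
    # Manual handover: the customer explicitly asks for a person.
    if any(phrase in user_message.lower() for phrase in HANDOVER_PHRASES):
        return BotTurn("Connecting you to a colleague now.", handed_over=True)
    # Graceful handover: the bot is not confident it understood the intent.
    if confidence < CONFIDENCE_THRESHOLD:
        return BotTurn(
            "I am not sure I can help with that, so I am handing you over to a colleague.",
            handed_over=True,
        )
    return BotTurn(f"Handling intent: {intent}", handed_over=False)


if __name__ == "__main__":
    print(next_turn("I want to talk to a human", "check_balance", 0.95))
    print(next_turn("blorp my thing", "unknown", 0.2))
    print(next_turn("What is my balance?", "check_balance", 0.9))
```

Both branches (manual request and low confidence) are separate scenarios and deserve their own functional test cases.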

Analytics: Goal completion rate, Goal completion time. Properly measuring and storing those.
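
Both metrics can be computed from stored dialogs. The sketch below assumes a hypothetical `Dialog` record with a start timestamp and an optional completion timestamp (in seconds); the real storage schema will differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Dialog:
    started_at: float              # seconds, e.g. a Unix timestamp
    completed_at: Optional[float]  # None when the user's goal was not reached


def goal_completion_rate(dialogs: List[Dialog]) -> float:
    """Share of dialogs in which the goal was reached (0.0 for no dialogs)."""
    if not dialogs:
        return 0.0
    completed = sum(1 for d in dialogs if d.completed_at is not None)
    return completed / len(dialogs)


def mean_goal_completion_time(dialogs: List[Dialog]) -> Optional[float]:
    """Average seconds from start to goal, over completed dialogs only."""
    durations = [d.completed_at - d.started_at for d in dialogs if d.completed_at is not None]
    return mean(durations) if durations else None


if __name__ == "__main__":
    sample = [Dialog(0.0, 30.0), Dialog(10.0, 60.0), Dialog(5.0, None), Dialog(0.0, None)]
    print(goal_completion_rate(sample))
    print(mean_goal_completion_time(sample))
```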

Mature – Testing Perspective
Accuracy testing
Regression testing (feels like shooting at a constantly moving target)
Beta testing
A/B testing
Data quality
Stored dialogs
Functional test cases for the Human Handover scenarios

AI causes new challenges. AI software differs from conventional software in two significant ways: it generally addresses different and more complex kinds of problems, and it typically works in a different way than conventional software. Conventional software uses rule-based decision-making, whereas AI software typically learns its behavior from data (for example, through machine learning or evolutionary algorithms). On the other hand, AI software has much in common with conventional software: indeed, most of the software in the system will be of the conventional variety (for example, I/O almost always is the largest single component in any system).
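
One way to tame the moving target is to track accuracy on a fixed, labelled test set and fail the build when it drops below the last accepted baseline. The sketch below assumes a hypothetical `toy_classifier` in place of the real intent classifier, and the test set is made up.

```python
from typing import Callable, List, Tuple


def accuracy(classify: Callable[[str], str], labelled: List[Tuple[str, str]]) -> float:
    """Fraction of utterances for which the classifier returns the expected intent."""
    if not labelled:
        raise ValueError("the labelled test set must not be empty")
    correct = sum(1 for text, expected in labelled if classify(text) == expected)
    return correct / len(labelled)


def no_regression(current: float, baseline: float, tolerance: float = 0.0) -> bool:
    """True when accuracy has not dropped below the baseline (minus a tolerance)."""
    return current >= baseline - tolerance


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for the chatbot's intent classifier."""
    return "greeting" if "hello" in text.lower() else "other"


# Made-up labelled utterances: (text, expected intent).
TEST_SET = [
    ("Hello there", "greeting"),
    ("hello!", "greeting"),
    ("What is my balance?", "other"),
    ("Hi", "greeting"),  # the toy classifier gets this one wrong
]

if __name__ == "__main__":
    current = accuracy(toy_classifier, TEST_SET)
    print("accuracy:", current)
    print("no regression vs 0.75 baseline:", no_regression(current, 0.75))
```

The baseline is simply the accuracy of the last accepted version, so the check does not demand 100% accuracy, only that the chatbot does not get worse.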

Omni-Capable: Multi data sources support

Identification of the relevant data sources
Authentication
Data isolation
Especially important for banks

Omni-Capable – Testing Perspective
Security testing
Test cases for a proper mapping between recognized entities and domain objects
Integration testing for the interaction with each Data Provider
Especially important for banks

Test Automation

Test Automation - Architecture

Data Model - Customer

Field name | Field Type | Data Type | Object Concept | Field Concept
Customer ID | Primary Key | String | sc:Customer | sc:CustomerID
Location ID | Foreign Key | | | sc:LocationID
Customer Name | Data Element | | | :Name
Customer Division | | | | sc:Division
Source Link | | | | sc:SourceLink

Data Model - Location

Field name | Field Type | Data Type | Object Concept | Field Concept
Location ID | Primary Key | String | sc:Location | sc:LocationID
Name | Data Element | | | :Name
Location Type | | | | sc:Type
Street | | | | sc:Street
City | | | | sc:City
Postal Code | | | | sc:PostalCode
State-Province | | | | sc:StateProvince
Country | | | | sc:Country
Coordinates | | | | sc:Coordinates
Include in Correlation | | Number | | sc:IncludeCorrelation

Synonyms

Field Concept | Applicable Objects Concept | Part of speech | Field Data Type | Synonyms
:Name sc:Customer NOUN String consumer name client customer sc:CustomerID customer id sc:Division division organization sc:Location sc:Customer sc:Location Location sc:LocationID location id sc:SourceLink source link

Question templates

Columns: Question Template, Field Type, Part of speech
What are <field> for <object> "<id>" (MANY, NOUN, All)
At what time <object> "<id>" is <field> (ONE, ADJ, Date)
When is the <object> "<id>" <field>?
When will <object> "<id>" be <field>
What is the <field> for <object> "<id>"
What is the <field> of <object> "<id>"?
What's the <field> for <object> "<id>"
Are there any details about <field> of <object> "<id>"?
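
The data model, the synonyms and the question templates can be combined to generate test questions automatically. The sketch below is only a guess at how such a generator might work and is not the code from the demo: the synonym-to-concept mapping is one reading of the flattened synonyms slide, and the customer id "C-001" is invented.

```python
from itertools import product
from typing import Dict, List, Tuple

# Two of the question templates from the slide.
TEMPLATES = [
    'What is the <field> for <object> "<id>"',
    'What is the <field> of <object> "<id>"?',
]

# Assumed mapping of concepts to synonyms (one reading of the synonyms slide).
OBJECT_SYNONYMS: Dict[str, List[str]] = {"sc:Customer": ["customer", "client", "consumer"]}
FIELD_SYNONYMS: Dict[str, List[str]] = {
    "sc:Division": ["division", "organization"],
    "sc:SourceLink": ["source link"],
}


def generate_questions(object_concept: str, object_id: str) -> List[Tuple[str, str]]:
    """Return (question, expected field concept) pairs for every combination."""
    questions = []
    for template, object_word in product(TEMPLATES, OBJECT_SYNONYMS[object_concept]):
        for field_concept, field_words in FIELD_SYNONYMS.items():
            for field_word in field_words:
                question = (
                    template.replace("<field>", field_word)
                    .replace("<object>", object_word)
                    .replace("<id>", object_id)
                )
                questions.append((question, field_concept))
    return questions


if __name__ == "__main__":
    for question, expected in generate_questions("sc:Customer", "C-001"):
        print(expected, "|", question)
```

Each generated question carries the field concept the chatbot is expected to resolve, so the same list can drive both the request and the assertion in an automated test.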

DEMO TIME!

Test Automation - alternatives
Zypnos: a quality assurance platform. It provides a tool to automate regression testing called Record and Run, where you can record your test cases and run them to check whether your chatbot is working correctly.
Botium: datasets with questions for different domains, API-level test execution.
Chatbottest: evaluate your chatbot based on 7 categories: personality, onboarding, understanding, answering, navigation, error management, intelligence.

Takeaways

UI elements could overturn the boat
Multi-user support requires scalable message handling
100% accuracy is not achievable; still, degradation is not acceptable
Humans are unpredictable, expect the unexpected
Human handover is important

Thank You! Hristo Gergov

Any questions?

#qachallenge
