Presentation on theme: "MRS Connections, Barbican London, 22-24 March 2006 Using artificial intelligence as a data collection aid Craig Kolb, Ask Afrika & Dr Peter Millican, Hertford."— Presentation transcript:
MRS Connections, Barbican London, March 2006 Using artificial intelligence as a data collection aid Craig Kolb, Ask Afrika & Dr Peter Millican, Hertford College, Oxford University Connecting with eLizabeth
ELIZA in action Weizenbaum (1966) gave a classic example of a rather comical dialogue between ELIZA and a young woman, from which this is an edited excerpt. (ELIZA’s responses are in capitals)
ELIZA in action –Men are all alike. –IN WHAT WAY? –They're always bugging us about something or other. –CAN YOU THINK OF A SPECIFIC EXAMPLE? –Well, my boyfriend made me come here. –YOUR BOYFRIEND MADE YOU COME HERE? –I need some help, that much seems certain. –WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP? –Perhaps I could learn to get along with my mother –TELL ME MORE ABOUT YOUR FAMILY.
ELIZA in action My mother takes care of me. WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU? My father. … WHAT ELSE COMES TO YOUR MIND WHEN YOU THINK ABOUT YOUR FATHER? Bullies. DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE?
ELIZA Two key components underlie ELIZA’s workings: –The process used by the program itself. –The assumptions made by the human.
ELIZA: process Identify keywords (e.g. ‘alike’, ‘mother’) and use canned responses. Exploit context (e.g. remember ‘my’ phrases to use later). Apply transformation rules (e.g. ‘I’ to ‘you’). Where possible, echo back parts of the input. Default responses when no keyword identified.
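The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not Weizenbaum's script: the rule list, the `SWAPS` table, the default "PLEASE GO ON." and the function names `reflect` and `respond` are all assumptions made for the example, and the 'exploit context' step (remembering 'my' phrases for later) is left out for brevity.

```python
import re

# Hypothetical keyword rules, checked in order: (pattern, response template).
# Canned responses have no placeholder; echo rules use {0} for the reflected text.
RULES = [
    (re.compile(r"\balike\b", re.I), "IN WHAT WAY?"),
    (re.compile(r"\b(mother|father|family)\b", re.I), "TELL ME MORE ABOUT YOUR FAMILY."),
    (re.compile(r"\bmy (.+)", re.I), "YOUR {0}?"),
]

# Hypothetical default responses, used when no keyword is identified.
DEFAULTS = ["PLEASE GO ON.", "CAN YOU THINK OF A SPECIFIC EXAMPLE?"]

# Transformation rules: swap first and second person when echoing input back.
SWAPS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment):
    """Apply the pronoun swaps word by word, dropping trailing punctuation."""
    words = fragment.rstrip(".!?").split()
    return " ".join(SWAPS.get(word.lower(), word) for word in words)


def respond(text, turn=0):
    """Return the response of the first matching rule, else a default."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            echoed = [reflect(group) for group in match.groups()]
            return template.format(*echoed).upper()
    # No keyword found: cycle through the default responses.
    return DEFAULTS[turn % len(DEFAULTS)]


print(respond("Men are all alike."))                     # IN WHAT WAY?
print(respond("Well, my boyfriend made me come here."))  # YOUR BOYFRIEND MADE YOU COME HERE?
```

Rule order matters in this sketch: the family keywords are checked before the generic 'my' echo, so a mention of 'my mother' triggers the canned family response rather than an echo.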
ELIZA: human assumptions What do we expect from a psychotherapist? –Enigmatic. –Never answers a question for you. –Inspires you to answer your own problems.
ELIZA: human assumptions ‘This mode of conversation was chosen because the psychiatric interview is one of the few examples of … communication in which one of the participating pair is free to assume the pose of knowing almost nothing of the real world. ’ Weizenbaum
Such assumptions are not confined to chatbot dialogues For example: –Deliberate obscurantism. –Lack of power parity. –Irrational or abstract statements to ‘save face’. –Marketing research interviews.
In general conversation, chatbots quickly betray themselves through lack of commonsense knowledge or ability to respond flexibly ‘off topic’. But in an interview setting this may not be a problem and can even be an advantage.
This suggests that a chatbot might be a useful tool in marketing research
Introducing eLizabeth User-friendly chatbot system developed initially to teach artificial intelligence (A.I.). Rigorously specified and comprehensively documented, with online help etc. Highly flexible and easily programmed. Can save/retrieve information in memory or on disk files, and respond accordingly. All chatbot states and operations can be viewed in real time (e.g. while testing).
Application: Open-ended questions Typically generate superficial responses unless probed. Respondents are more forthcoming when answers are suggested. But this gives risk of ‘leading’ or bias.
Real answers to an unprobed open-ended question Why did you choose mobile telephone network A instead of other providers? Because I like it. Because it was cheap. Because my husband had problems with C and B. Because of upgrading. Three mobile networks were involved in this study, referred to as A, B, and C
A solution We suggest that an adaptation of standard chatbot techniques provides an economical means of probing open-ends. By creating a sense of conversation (rather than a linear questionnaire), the quality of response is improved over self-completion. This is ideal in situations such as CAWI where an interviewer is not necessarily present.
Interview 1 WHY DID YOU CHOOSE YOUR CURRENT CELL PHONE NETWORK OPERATOR? To save money. LIST ALL OF THE THINGS THAT WORRY YOU ABOUT YOUR NETWORK PLEASE. Network coverage and 3g. WHAT ELSE CAN YOU SAY ABOUT CELL PHONE FEATURES? I like my new handset, which is very important to me. I also like the extra ‘Ahas’ – such as emergency services. WAS IT ONLY ABOUT THE CELL PHONE HANDSET? No, but they did deliver it to me for free. I didn't have to go anywhere. LIST ALL OF THE THINGS YOU SEE AS GOOD ABOUT YOUR NETWORK PLEASE. Cheap; additional services such as free delivery; nice shows and stuff. IS SAVING MONEY MORE IMPORTANT THAN OTHER THINGS? Value for money is important, I don't mind paying A BIT more, but then I want to get more.
Interview 1 WHAT DID FAMILY AND FRIENDS HAVE TO SAY ABOUT YOUR CHOICE OF NETWORK? They don't really care, but I also have my mom on network A now. HOW DO YOU FEEL ABOUT FAMILY INVOLVEMENT? I love my family, but we don't really care about each other’s cell phone habits. HOW MUCH OF THE FINAL DECISION ON A PARTICULAR NETWORK WAS REALLY YOURS? 90%, 10% of it was because of the fact that I work on network A projects. WOULD YOU USE THIS NETWORK AGAIN IN FUTURE, IF YOU HAD TO CHOOSE AGAIN? Yes. YES? PLEASE GO INTO A BIT MORE DETAIL. I like the additional services; and I like the network A image. WHY DO YOU LIKE ‘THE ADDITIONAL SERVICES’? They deliver straight to me!
Context: how did we introduce the chatbot? Possibilities: 1. Pretend it is human (high expectations, but temporarily shielded by the assumption it is something it is not)? 2. Say it is an ‘artificial intelligence’ (high expectations, with possible hostility)? 3. Describe it in a way that creates lower expectations, e.g. ‘automatic interviewer’?
Programming eLizabeth Elizabeth was easy to program, using only simple additions to the default behaviour. We defined a ‘welcome’ message, responses for null inputs, keyword sets with associated ‘once-only’ probes, and a list of ‘set questions’ for when no keyword was found. The resulting operation of the main cycle is represented in the following slide …
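A rough Python sketch of such a cycle is given here. It is not Elizabeth script syntax and does not reproduce the study's actual script: the keyword sets, the null-input message and the class and method names are hypothetical, and only the question wordings are borrowed from the interview transcripts.

```python
import re

WELCOME = "WHY DID YOU CHOOSE YOUR CURRENT CELL PHONE NETWORK OPERATOR?"
NULL_RESPONSE = "PLEASE TYPE AN ANSWER."  # hypothetical wording

# Hypothetical keyword sets, each with one associated 'once-only' probe.
KEYWORD_PROBES = {
    ("money", "cheap", "price"): "IS SAVING MONEY MORE IMPORTANT THAN OTHER THINGS?",
    ("handset", "phone"): "WAS IT ONLY ABOUT THE CELL PHONE HANDSET?",
    ("family", "friends", "husband", "mom"): "HOW DO YOU FEEL ABOUT FAMILY INVOLVEMENT?",
}

# 'Set questions', asked in order whenever no unused keyword probe applies.
SET_QUESTIONS = [
    "LIST ALL OF THE THINGS THAT WORRY YOU ABOUT YOUR NETWORK PLEASE.",
    "LIST ALL OF THE THINGS YOU SEE AS GOOD ABOUT YOUR NETWORK PLEASE.",
]


class Interviewer:
    def __init__(self):
        self.used_probes = set()    # probes already asked (once-only)
        self.next_set_question = 0  # index of the next set question

    def welcome(self):
        return WELCOME

    def reply(self, answer):
        """Return the next question, or None when nothing is left to ask."""
        text = answer.strip().lower()
        if not text:
            return NULL_RESPONSE
        words = set(re.findall(r"[a-z']+", text))
        # First choice: an unused probe whose keyword set matches the answer.
        for keywords, probe in KEYWORD_PROBES.items():
            if probe not in self.used_probes and words & set(keywords):
                self.used_probes.add(probe)
                return probe
        # Otherwise fall back to the next set question, if any remain.
        if self.next_set_question < len(SET_QUESTIONS):
            question = SET_QUESTIONS[self.next_set_question]
            self.next_set_question += 1
            return question
        return None
```

In this sketch, `Interviewer().reply("To save money.")` returns the money probe; a second answer mentioning 'cheap' no longer triggers it and falls through to the first set question.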
Performance criteria Alan Turing (1950) proposed a simple test. If a computer in online conversation can fool a human into believing that it is human too, then we should deem it to be intelligent. –Hence the focus over the years on developing chatbots that can ‘appear’ as human as possible. –Also a drive in robotics toward making robots look human (e.g. Actroid in picture).
Performance criteria Turing’s Test isn’t relevant to interviewing, as a different set of societal norms applies: –Interviewer normally controls the dialogue to prevent deviations from the topic (i.e. ‘fixed initiative’). –Interviewer is allowed to assume the pose of ignorance.
So then how do we evaluate a chatbot or A.I. interviewer?
Interviewing-specific performance criteria 1) Relevance of interviewer questions (i.e. questions specified by the researcher that were not previously answered). 2) Avoidance of ‘suggestion’ (i.e. hints or ‘leading’). 3) Relevance of respondents’ answers (i.e. do they answer the question in part or in full?). 4) Maximisation of the volume of attributes elicited (i.e. mobile network operator attributes).
eLizabeth’s Performance
Criteria: Performance
Interviewer relevance: Good, but accidental (see hypotheses for suggestions to control chatbot).
Interviewer suggestion: Good, giving no risk of suggestion.
Respondent relevance: Good, due to co-operation and (relatively crude) control mechanisms.
Maximisation of volume of attributes elicited: Good, relative to self-completion.
What variables might determine chatbot interviewing success? Respondent characteristics (gender, age, education etc). Interview length. Context presented to the respondent, expectations created (e.g. conversation vs. interview, A.I. vs. ‘automatic interviewer’). Chatbot’s own behaviour.
Hypotheses Perceptual hypotheses H1: Subjects with strong vigilance and analytical traits are more likely to detect irrelevant questions. H2a: The longer the exchange on a topic, the more likely the chatbot is to be seen as asking irrelevant questions. H2b: The longer the dialogue overall, the more likely the chatbot is to be seen as asking irrelevant questions.
Hypotheses H3: Subject knowledge of whether they are communicating with an A.I. or chatbot (and how it is described) is likely to moderate the statistical relationships described in H1 and H2a/H2b.
Hypotheses Response hypotheses H4a: Perceptions of irrelevance will increase the likelihood of hostility, dependent on how the context is defined. H4b: Increased hostility will decrease the relevance of the respondent’s answers. H5: Introducing a chatbot using terms that create lower expectations (such as ‘automatic interviewer’) will moderate the irrelevance/hostility relationship.
Hypotheses H6: Limited mimicry (i.e. reflecting back a subject’s statements, with appropriate grammatical alterations) will result in increased volume and improved answer relevance. This hypothesis is based on the chameleon effect, as reported by Poulsen (2005). H7: Whether the human party in a dialogue has prior knowledge that the other is an A.I. or chatbot will moderate the relationships in H4a and H6.
Chatbot strengths Compared with human interviewers, chatbots have some clear advantages: Low cost. Unaffected by pressure. Better memory. Give researcher greater control over interviews. Can ensure consistency and rapid deployment in a changing context.
Chatbot weaknesses The disadvantages of chatbots compared with humans stem from their lack of genuine understanding: –Minimal control over response quality (though better than self-completion). –No ability to assess whether unasked questions have already been implicitly answered earlier in the dialogue (relevance).
Future applications This suggests the following in terms of assisting or replacing human interviewers: Assisting human interviewers Suggesting useful probes and follow-on questions, ensuring greater control. Reducing stress of memory, phrasing, context adjustment (e.g. from structured questioning to open-end probing). Recording and collating results. Semi-automated learning during survey.
Future applications Replacing human interviewers Practical in situations where CAWI has replaced CATI but probing is still desired on open ends. Techniques may be applicable to sentence completion, personification, and thematic apperception tests. More sophisticated structured techniques, such as Kelly’s Repertory Grid.
Applying chatbot technology to your own research To explore chatbots, try: taking you to eLizabeth’s home page (which also links to other chatbot sites). The system is free to download for non-commercial use, and comes with teaching materials and documentation. For advice on any project,