Anthropomorphic vs Non-Anthropomorphic User Interface Feedback for Online Hotel Bookings

Pietro Murano, Anthony Gee
Computing Science and Engineering, University of Salford, Newton Building, Gt. Manchester, M5 4WT

Patrik O’Brian Holt
Interactive Systems Research Group, School of Computing, The Robert Gordon University, St. Andrew Street, Aberdeen, AB25 1HG, Scotland

Keywords: Anthropomorphism, User interface feedback, Evaluation.

Abstract: This paper describes an experiment and its results, part of research that has been under way for a number of years in the area of anthropomorphic user interface feedback. The main aims of the research have been to examine the effectiveness and user satisfaction of anthropomorphic feedback in various domains. The results are of use to all interactive systems designers, particularly when dealing with issues of user interface feedback design. There is currently some disagreement amongst computer scientists concerning the suitability of such types of feedback. This research is working to resolve this disagreement and in turn can help software houses to increase their profits by developing better user interfaces that will promote an increase in sales. The experiment detailed concerns the specific software domain of Online Factual Delivery in the specific context of online hotel bookings. Anthropomorphic feedback was compared against equivalent non-anthropomorphic feedback. Statistically significant results were obtained suggesting that the non-anthropomorphic feedback was more effective. The results for user satisfaction were, however, less clear.

1 INTRODUCTION

User interfaces and the feedback given to users are among the most important aspects of any software system. If the user interface and the feedback given are not usable, users will either give up using the system, be less efficient in using it or simply not enjoy using it.
This in turn can seriously affect the success of a software house and its sales. In addition, the growth and complexity of modern software systems, in particular of the tasks they are able to perform, results in a continual requirement for more usable interfaces. The aim of this research is to aid the improvement of user interfaces, which can in turn promote better sales for a software house. Specific concentration is placed on comparing anthropomorphic and non-anthropomorphic user interfaces to address the issues of effectiveness and user satisfaction.

There are various opinions amongst the computer science community regarding the effectiveness and user approval of anthropomorphic feedback at the user interface. Some researchers are in favour of anthropomorphism, e.g. Koda and Maes (1996), Maes (1994), Laurel (1997), Agarwal (1999), Zue (1999) and Takeuchi and Naito (1995). However, some researchers are not generally in favour of anthropomorphism in most circumstances, e.g. Shneiderman and Plaisant (2005). Each of these researchers tends to base their opinions on various studies conducted in the area. Because the results of these studies are inconclusive, more work is needed in this area to gain a better understanding.

This research continues on from a number of research studies conducted by Murano (2005, 2003, 2002a, 2002b, 2001a, 2001b), aiming eventually to resolve the issues of effectiveness and user satisfaction of anthropomorphic feedback at the user interface. In Murano (2002b) it was shown that in the domain of software for in-depth learning, anthropomorphic feedback was significantly more effective. The results for user satisfaction were not so clear, but participant preferences tended towards the anthropomorphic feedback. This was specifically in the context of English as a Foreign Language pronunciation.
Also in Murano (2002a) it was shown that in the domain of software for online systems usage, anthropomorphic feedback was significantly more effective and preferred by users. This context specifically involved the use of UNIX commands at the UNIX shell.

Specifically related to this paper are the results in Murano (2003). That paper investigated anthropomorphic feedback in the domain of online factual delivery, using direction finding as the specific context. It showed, with statistically significant results, that non-anthropomorphic feedback was more effective. The results for user satisfaction were not so clear, but participant preferences tended towards the non-anthropomorphic feedback.

In Dehn and van Mulken (2000) it was suggested that the context or domain of concern could influence the effectiveness and user approval of anthropomorphic interfaces. This research is beginning to suggest with empirical evidence that this could be the case. However, in order to confirm this, the authors are investigating anthropomorphic feedback in different domains, with the possibility of eventually developing a taxonomy of feedbacks, as suggested in Murano (2005), to help user interface designers in their design decisions.

This paper therefore investigates the domain of online factual delivery further, describing an experiment set in this domain, using the context of online hotel bookings to test the user interface feedback. This context was chosen because it is a fairly common activity for users of all kinds to carry out over the Internet and was therefore considered to be useful and realistic, whilst maintaining the theme of the previous experiment conducted by Murano (2003). As with the previous experiments, effectiveness and user satisfaction were the aspects being investigated.
Effectiveness was defined by the success rate in completing the tasks, a low error rate whilst carrying out the tasks and a low rate of hesitations/frustrations expressed by the participants during the experiment. The user approval aspects concerned the participants’ subjective opinions regarding aspects of the user interface.

For this experiment, the anthropomorphic feedback consisted of an animated character called ‘Merlin’, supplied with MS Agent 2.0 (see Apparatus and Materials section). The non-anthropomorphic feedback consisted of guiding text. This was text of the kind one would expect to see on a ‘real’ online hotel booking site.

2 THE EXPERIMENT – HOTEL BOOKINGS

2.1 Hypotheses

As stated in the previous section, this research concerns determining the effectiveness and user satisfaction of anthropomorphic user interface feedback in various contexts. Hence the following hypotheses were derived:

H0a - There will be no difference in terms of user satisfaction between the anthropomorphic feedback (Merlin) and non-anthropomorphic feedback (guiding text).
H0b - There will be no difference in terms of effectiveness between the anthropomorphic feedback and non-anthropomorphic feedback.

Positive Hypotheses:

H1a - The non-anthropomorphic (guiding text) feedback will be more effective than the anthropomorphic (Merlin) feedback.
H1b - Users will prefer the anthropomorphic (Merlin) feedback rather than the non-anthropomorphic (guiding text) feedback.

2.2 Pilot Testing

Before the main experiment was undertaken, a small pilot test with 4 participants was conducted. The main issues considered in the pilot test were the workings of the prototype developed, the environment to be used in the experiment and the exercise of suitable control over the various variables being tested (see Variables section). The pilot test also helped to determine a suitable amount of time for the experiment and to test the designed tasks.
2.3 Users

The initial recruitment of the participants took place by means of a recruitment questionnaire. The participants were carefully selected so as to have similar profiles, therefore reducing the possibility of collecting invalid data. Initially 40 individuals were selected, but only 20, with similar profiles, were actually used in the experiment. The profiles of the participants used were similar in the following ways:

• All participants had similar computing knowledge. They were neither complete beginners nor ‘power’ users. Complete novice users were not selected as they would have required basic training in the concepts of devices and Windows systems. Experienced participants were not used in the experiment as it was decided that such users would in reality not require feedback of the sort being tested in their everyday usage patterns.
• All the participants were less than 36 years of age with English as their primary language.

2.4 Experimental and Task Design

For this experiment a between-users design was used. Ten of the participants were assigned to Group A, and the remaining ten were assigned to Group B. Group A participants tested the anthropomorphic feedback (MS Merlin) as part of their experiment session. Group B participants tested the non-anthropomorphic feedback (guiding text) as part of their experiment session.

The experiment involved each participant attempting the following tasks:

• Task 1 required participants to make a specific booking for a hotel and theatre performance. Participants would use the prototype online hotel reservation user interface to make the bookings according to specific details supplied.
• Task 2 required participants to cancel the booking they had just made using the hotel reservation user interface.

The tasks outlined are representative of realistic tasks commonly carried out by users booking a hotel or holiday over the Internet.
For tasks 1 and 2 all participants were initially shown a brief tutorial explaining how to book and cancel a hotel using the interface. The content of the tutorials shown was identical regardless of the feedback being given, to ensure there was no bias.

2.5 Variables

The independent variable for the experiment was the method of feedback, which had two levels:

• Animated Microsoft Merlin with speech and text (anthropomorphic).
• Standard guiding text (non-anthropomorphic).

The dependent variables were the participants’ performance in dealing with the hotel bookings and their subjective opinions.

Performance was measured by counting the number of errors incurred, observing whether participants completed the tasks and counting the number of times hesitation and frustration were manifested. These factors were then used in a scoring formula (see Scoring section below for a description of the formula). Specifically, performance was measured in the following manner:

• Tasks carried out correctly with no errors. The participants were given a task sheet with specific instructions regarding the booking they should make (e.g. given dates and number of guests etc.). Deviation from this was considered to be a complete task but with some incorrect details.
• Tasks completed. This refers to the overall successful completion of the two tasks in the experiment.
• Number of times participants showed signs of hesitation. These were only clearly observable participant reactions, such as manifesting a puzzled expression or asking for help.
• Number of times participants showed signs of frustration. These were only clearly observable user reactions, such as making some remark about the user interface which had clearly caused the user some ‘anger’.
• The number of times participants used the feedback help, but still made an error.
These factors were recorded by means of an observation protocol.

The subjective opinions were measured by means of a post-experiment questionnaire. Participants were asked to rate various aspects of the user interface using a Likert scale, where 9 was the most positive score regarding some opinion, and 1 was the most negative score available.

2.6 Apparatus and Materials

The experiment involved the use of ‘standard’ equipment: a laptop with 128 MB RAM, a 20 GB disk and Windows XP. Microsoft Agent 2.0, the “Merlin” character and the Lernout & Hauspie TruVoice Text-to-Speech (TTS) engine (American English) were also used. Supplementary hardware consisted of an external mouse and external speakers. Further, a paper notepad was available for each participant, for use in the experiment (see Procedure section).

Each prototype was developed using Visual Basic 6. The anthropomorphic interface required the use of the Microsoft Agent 2.0 ActiveX component.

2.7 Procedure

The first step was to recruit suitable experiment candidates. This involved using the participant recruitment questionnaire to ask specific questions regarding the participants’ background and experience, to determine whether each participant met the selection criteria. Once all the suitable participants were recruited, they were randomly assigned to either Group A or Group B (see Experimental and Task Design section). Participants were then contacted and asked to meet at a suitable time to take part in the experiment. The experiment itself took approximately 30 minutes to complete.

Each participant was treated in the same way, with the procedure outlined below being identical for each of the participants. All the questionnaires and observation techniques used were also the same for each participant, with the aim of minimizing confounding variables.
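The random assignment of the 20 recruited participants to Group A or Group B can be illustrated with a short sketch. The paper does not say how the randomisation was carried out, so this is only one plausible way of producing a balanced split; the participant identifiers, the `assign_groups` helper and the fixed seed are hypothetical and serve only to make the example reproducible.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into two equal-sized groups, A and B.

    Hypothetical helper: the paper only states that assignment was random.
    """
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of participants is needed for equal groups")
    rng = random.Random(seed)  # local generator, so the global state is untouched
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"A": sorted(ids[:half]), "B": sorted(ids[half:])}


# Hypothetical identifiers P01..P20 standing in for the 20 participants.
participants = [f"P{n:02d}" for n in range(1, 21)]
groups = assign_groups(participants, seed=42)
```

Shuffling the whole list and cutting it in half guarantees the 10/10 split reported for the experiment, which independent coin flips per participant would not.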
The experiment took place in a carefully controlled environment, ensuring that there were no distractions and that the participants felt at ease. Upon entering the room each participant was greeted by the experimenter and was made to feel comfortable and relaxed. To make them feel more at ease, light refreshments were also offered at this time.

The participants received a short verbal introduction to the experiment, explaining the purpose of the study, with reassurance that the software, and not the participant, was the focus of the study. At this time participants were informed that they would be observed by the experimenter, who would be present in the room throughout the experiment.

When the participant felt relaxed, a task sheet was given to them, which contained a brief introduction to the experiment along with Tasks 1 and 2 (see Experimental and Task Design section). Having read through the task sheet, participants were again assured that they were not being examined and were then asked if they had any immediate concerns regarding the tasks. Participants were then told which method of feedback they would be testing.

Once the participant was ready, the program began with a brief tutorial using the relevant method of feedback (anthropomorphic for Group A and non-anthropomorphic for Group B). Both tutorials, regardless of the feedback, were the same in content. The only difference was that the anthropomorphic character referred to itself as ‘I’, while the non-anthropomorphic feedback was neutral in nature. The tutorial informed the participant how to book and cancel a hotel using the prototype. When the tutorial was started, the relevant mode of feedback ‘explained’ how to use each screen and its features. All the screens involved in the tutorial dealt with bookings and the cancellation of bookings.
For the anthropomorphic condition, the character spoke the information, which was concurrently viewable in corresponding speech bubbles. Further, the character moved on the screen and ‘pointed’ with a hand to the features of each screen as they were being ‘described’. For the non-anthropomorphic condition, the same information appeared in text boxes with arrows pointing to the various features of the screens.

Upon completion of the tutorial, participants were asked whether the tasks were clear, and when the participants felt ready the first task began. Upon completion of task 1, participants were asked if they had any immediate comments on the task they had completed, such comments being recorded in the observation notes. The participants were then asked whether they were ready to begin task 2; once they were comfortable, task 2 proceeded. The task was deemed complete when the participants had successfully cancelled the booking they had made during task 1. Following completion of the task, comments and opinions were sought from the participants.

Errors were categorised by recording whether a participant completed the task according to the specifications given on the task sheet. If the participants deviated from the instructions given, e.g. the hotel was booked for the party arriving on the wrong day, or not enough rooms were booked, this was recorded as a participant completing a task but with some incorrect details (see Variables section above). When the participants hesitated as to what they were required to do at a particular point, these hesitations were recorded (see Variables section above).

If at any point during the experiment a participant asked the experimenter present in the room for guidance, no additional help was given. Instead, participants were instructed to consult the feedback integrated into the interface, which was of the same kind as found in the tutorial and matched the condition being tested.
If at any time a participant did consult the feedback and still subsequently made an error regarding the problem they were trying to overcome, this was recorded. However, if the participant did consult the feedback and this solved the problem, this was also recorded. The number of times participants expressed clear frustration was also recorded. Such frustration included occurrences where participants made remarks regarding certain aspects of the interface or feedback that caused them anger.

A particular aspect of the second task was to enter the booking reference supplied when participants made a booking during Task 1, so that the correct booking information could be retrieved to enable the booking to be cancelled. If a participant was unable to remember the booking reference (the software instructed the participant to note the reference during Task 1), having not written it down on the notepad provided, this was treated as an error and resulted in the participant not fully completing Task 2.

Once all tasks had been completed, the experimenter debriefed each participant. This included the completion of the post-experiment questionnaire, elicitation of participants’ immediate comments, and the experimenter informing the participant how the results of the study would be made available if required.

2.8 Scoring

The effectiveness variables described (see Variables section) were carefully recorded for each participant. For each task completed/not completed, a score was assigned for use in the statistical analyses. The score for each task was based on a points system similar to that published in Murano (2002a). For each task, each participant (unknown to them) started on 10 points. Events which caused the score to reduce were observations of the following types:

Signs of frustration (negative physical attitude) or hesitation resulted in 0.5 points being deducted from the score.
If the participant carried out an incorrect action, causing the system to display an error message, 0.5 points were deducted.

If the participant consulted the feedback in a particular situation and, despite the help, continued to make a mistake, 0.5 points were deducted from the running score.

Occurrences when the participant had completed the task but made a mistake in the booking resulted in 1.5 points being deducted from the score.

If the participant was unable to complete the task, 1.5 points were deducted.

Finally, if the participant completed the task with none of the noted penalties, the score would remain at 10. Consequently, at the end of each task the participant obtained a final score. The formula was devised because it was felt that all the factors being measured potentially had a direct effect on overall success.

2.9 Results

The data obtained for this experiment concerned effectiveness and subjective user opinions. The effectiveness data were statistically analysed using a t-test, and the subjective opinions regarding the interface used were analysed through their means and standard deviations.

For the 20 participants, 10 using the anthropomorphic feedback (MS Merlin) and 10 using the non-anthropomorphic feedback (guiding text), data gathered from the first task showed a t-observed of 3.08 and the t-critical (5%) was 2.10. Table 1 below illustrates these statistics:

Table 1: T-test result of text vs Merlin (task 1).

t-Observed        3.09
t-Critical (5%)   2.10

For the second task with 20 participants, 10 using anthropomorphic feedback (MS Merlin) and 10 using the non-anthropomorphic feedback (guiding
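The points system of Section 2.8 and the group comparison of Section 2.9 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the `TaskObservation` record and its field names are hypothetical, each 0.5 deduction is assumed to apply per observed occurrence, and the t-test is assumed to be an independent two-sample test with pooled variance, which the paper does not state explicitly. That assumption is at least consistent with the reported critical value: with 10 + 10 - 2 = 18 degrees of freedom, the two-tailed 5% critical value of t is approximately 2.10.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass
class TaskObservation:
    """Hypothetical record of what the observer noted for one task."""
    frustration_or_hesitation_signs: int = 0
    error_messages: int = 0          # incorrect actions that triggered an error message
    errors_after_help: int = 0       # feedback consulted, mistake still made
    incorrect_booking: bool = False  # task completed, but with wrong details
    task_not_completed: bool = False


def task_score(obs):
    """Start from 10 points and apply the deductions listed in Section 2.8."""
    score = 10.0
    # Assumption: each 0.5 deduction is applied once per observed occurrence.
    score -= 0.5 * obs.frustration_or_hesitation_signs
    score -= 0.5 * obs.error_messages
    score -= 0.5 * obs.errors_after_help
    if obs.incorrect_booking:
        score -= 1.5
    if obs.task_not_completed:
        score -= 1.5
    return score


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (assumed test type)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Illustrative observation only, not data from the study.
example = TaskObservation(frustration_or_hesitation_signs=2, error_messages=1)
example_score = task_score(example)  # 10 - 2*0.5 - 1*0.5
```

In use, `task_score` would be applied to every participant's observation record for a task, and the two resulting lists of ten scores passed to `pooled_t`; an absolute value above the critical value would indicate a significant difference between the groups at the 5% level.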