An Integrative Paper on User Interruptibility

Lavar Askew

 

 

Abstract

 

The goals of this paper are to describe what effects interrupts have on people in the workplace and to present the state of the art in the research and development of interruption management systems.  The following sections present research which quantifies the effect of an interrupt on primary task completion, the social and environmental cues humans use to judge interruptibility, and design considerations for building interruption management systems.

 

1       Introduction

An interrupt is the temporary stopping of a primary task to give attention to another task.  It serves as a mechanism through which information is delivered to an individual.  Interrupts range in importance from warning messages that require an individual's immediate attention to state-based information that does not require the individual's attention at all.  In the workplace, however, interrupts can reduce the productivity of an individual who is already engaged in a task.  This negative effect depends largely on the timing of the interrupt, that is, the moment at which it seeks the person's attention.  In a workplace where meeting deadlines is paramount, minimizing the effect of interrupts on productivity is therefore valuable.

 

While people use environmental and social cues to determine whether or not to interrupt another person, computers and software are not designed to recognize such cues.  This can cause the interrupted person to lose focus on the primary task.  The goal, then, is to design systems which can recognize an opportune time to interrupt the user, so that the effect of the interrupt on the user's primary task is minimized.

 

1.1     How Interrupts Affect Task Performance

Task performance is crucial to meeting deadlines in the workplace.  However, interruptions from both humans and computers can make it difficult to complete a task in a timely manner.  Bailey et al. have shown that a person "spent between 5% and 40% longer on an interrupted [task] than a non-interrupted task." [2]  The difference in completion times for the primary task is directly related to the memory load required by the primary task.  Interrupted tasks with a high memory load, such as reading comprehension, are slowed more than interrupted tasks with a low memory load.

 

Even routine primary tasks are subject to the negative effects of interrupts.  Hess and Detweiler showed that if a user is trained to perform a task for two sessions without interruptions and then asked to perform the same task with interruptions, the interruptions are "highly harmful to performance." [3]

1.2     Social and Environmental Cues for Determining Interruptibility

While a computer may indiscriminately present notifications to its user, humans use social and environmental cues to determine an opportune time to interrupt a co-worker.  An example of an environmental cue may be a co-worker leaving her office door open possibly indicating that she is available for discussion while a closed door may indicate just the opposite.  A social cue example may be when the interruptee makes eye contact with the interrupter.  By making eye contact, the interruptee is acknowledging the presence of the interrupter and is willing to hear what the interrupter has to say while avoiding eye contact may indicate just the opposite.

 

Succinctly, these cues reflect a negotiated policy towards expressing interruptibility, where "negotiated" refers to the interruptee deciding when to accept an interruption.  Edward Cutrell et al. [5] mention in their related work that there are four interruption policies in human-computer interaction.  They are [4]:

 

•      Immediate – requires the user's immediate response.

•      Negotiated – the user chooses when to attend.

•      Mediated – an intelligent agent might determine when best to interrupt.

•      Scheduled – interruptions come at prearranged time intervals.

 

However, McFarlane has found that none of the above-mentioned policies is the single best method for deciding when to interrupt a user.  The immediate interruption policy is what users are faced with now, and research has shown that this type of policy has a negative effect on the user's memory load.  The negotiated interruption policy allows a user to decide when to be interrupted, but it also allows an interrupt to be ignored indefinitely.  The mediated interruption policy seems promising, but its success depends on the criteria that the intelligent agent uses.  Finally, the scheduled interruption policy involves setting aside a prearranged time to present interruptions.  This prearranged time can be situational or can recur at fixed time intervals.
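
The four policies can be summarized as a small decision function.  The following Python sketch is only an illustration of the distinctions drawn above; the names InterruptionPolicy and deliver_now, and all of the parameters, are hypothetical and do not come from the cited work.

```python
from enum import Enum


class InterruptionPolicy(Enum):
    IMMEDIATE = "immediate"
    NEGOTIATED = "negotiated"
    MEDIATED = "mediated"
    SCHEDULED = "scheduled"


def deliver_now(policy, user_ready=False, agent_approves=False,
                seconds_since_last_slot=0, slot_period_s=900):
    """Return True if an interrupt should be presented right now.

    All parameters are illustrative assumptions:
      user_ready              - the user has chosen to attend (negotiated)
      agent_approves          - an intelligent agent judged the moment suitable (mediated)
      seconds_since_last_slot - time since the last prearranged slot (scheduled)
      slot_period_s           - length of the prearranged interval (scheduled)
    """
    if policy is InterruptionPolicy.IMMEDIATE:
        return True                      # always interrupts at once
    if policy is InterruptionPolicy.NEGOTIATED:
        return user_ready                # may wait indefinitely if the user never attends
    if policy is InterruptionPolicy.MEDIATED:
        return agent_approves            # only as good as the agent's criteria
    if policy is InterruptionPolicy.SCHEDULED:
        return seconds_since_last_slot >= slot_period_s
    raise ValueError(f"unknown policy: {policy!r}")
```

The sketch makes the trade-offs visible: the negotiated branch never fires unless the user acts, and the mediated branch simply defers to whatever criteria the agent applies.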

 

2       Design Considerations When Building Interruption Management Systems

When designing interruption management systems, one must consider the importance of the message from the interrupter and the willingness of the interruptee to be interrupted.  This section focuses on the state of the art in research on interruption management systems.  It features research on the negotiated, mediated, and scheduled interruption policies.

2.1     Grapevine

Figure 1 Grapevine e-Business Card [7]

 

IBM has developed a system known as Grapevine which presents a user's aggregate contextual information to colleagues in the form of an e-business card.  The contextual information is provided by a user's computer, mobile device, telephone, and motion detectors.  The goal of Grapevine is to aid those wishing to communicate with the user in deciding when to initiate contact and via which channel [7].

 

2.1.1    "Location Awareness is a Good Thing" [7]

Grapevine is an example of a system implementing the mediated interruption policy.  Potential communicators may view a user's last physical location and computer application activity before initiating contact with the user.  Taking this information together, the interrupter chooses an opportune time to interrupt the user.

 

A user's last location proved to be the most useful feature of the system.  One reason given is that if you want to communicate with a person and learn through Grapevine that the person is out of town, you may choose to communicate with someone else.  Another reason given is that if a user is online early in the morning and then goes offline, a person wishing to communicate with the user may infer that the user is en route to the office and choose to meet him in person rather than call his cell phone.

 

2.1.2    "Computer Application Activity is a Mixed Blessing" [7]

Grapevine is able to detect and collect user activity in applications such as instant messaging clients and productivity suites.  This data is then reported to the central aggregation service.  Potential communicators can then use "this information to make better informed communication decisions." [7]

 

While some users expected their colleagues to check with Grapevine before contacting them, some were not so comfortable with others knowing what they were doing.  For these users, Grapevine provided functionality to block others from viewing this data. 

 

Other issues arose when applications were used for more than one purpose.  This can cause one to mis-infer the interruptibility of a colleague.  For example, Christensen et al. [7] note that Lotus Notes supports a variety of applications and a colleague could not reliably infer that the user was composing an e-mail.

2.1.3    "Lessons Learned" [7]

The Grapevine research project began at IBM in 2001 and concluded in 2005.  Enumerated below are the lessons which should be considered when designing an interruption management system:

 

 

While Grapevine is currently used only by humans to determine the interruptibility of other humans, one can easily imagine software systems that use Grapevine to do the same.  It seems reasonable that an interruption management system could be built into the next generation of instant messaging clients.  Nevertheless, Hudson et al. [8] point out that "a substantial semantic gap exists between the information that low-level sensors and programs can detect [and] the high-level ability and willingness of a person to communicate with someone else."  Fogarty et al. [1], however, provide evidence that this semantic gap may be closing.  This evidence is summarized in the next section.

 

2.2     "Predicting Human Interruptibility with Sensors" [1]

Fogarty et al. use low-level, low-cost sensors to predict how open a user is to being interrupted.  Their work shows that models built on low-level sensors are on par with humans at predicting the interruptibility of another human.

 

2.2.1    Key Elements of the Study

There are three key elements involved in this study:

 

1.     The human subjects whose actions were recorded in an office setting.

2.     The human estimators who studied the recordings of the human subjects to predict the human subjects' interruptibility.

3.     The human coders who were used to simulate sensors.

2.2.2    Human Subjects

Figure 2 Human Subject in Office Setting [1]

 

While the human subjects were being video recorded in an office setting, their computers prompted them for interruptibility self-reports.  These prompts were issued at "random, but controlled, intervals averaging two prompts per hour. Subjects were asked to rate [their] interruptibility on a five-point scale, with 1 corresponding to 'Highly Interruptible' and 5 to 'Highly Non-Interruptible.'  The human subjects were present for 627 of these prompts." [1]

2.2.3    Human Estimators

Figure 3 Human Estimators UI for Determining Human Subject Interruptibility [1]

"[40 human estimators] were shown portions of the records collected from the [video] subjects." [1]  Each estimator was asked to infer the interruptibility of a human subject after watching a video clip, between 15 and 30 seconds long, recorded just before the human subject was prompted for a self-report.

2.2.4    Results of Human Estimators

When rating the human subject's interruptibility on the scale from 1 to 5, the estimators had an overall accuracy of 30.7%.  Counting estimates that were off by at most 1 as correct, their accuracy rose to 65.8%.  When choosing between "Highly Non-Interruptible" and all other choices, their accuracy was 76.9%, which was only slightly better than chance (always choosing "Highly Non-Interruptible") at 70.6%.
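
To make the three accuracy measures concrete, the following Python sketch computes them for a handful of ratings.  The ratings are invented for illustration and are not data from the study; the function names are likewise hypothetical.  The baseline shown is the accuracy of always giving the more common of the two binary answers.

```python
def exact_accuracy(truth, guesses):
    """Share of estimates that match the self-report exactly."""
    return sum(t == g for t, g in zip(truth, guesses)) / len(truth)


def within_one_accuracy(truth, guesses):
    """Share of estimates that differ from the self-report by at most 1."""
    return sum(abs(t - g) <= 1 for t, g in zip(truth, guesses)) / len(truth)


def binary_accuracy(truth, guesses, top=5):
    """Accuracy when only 'Highly Non-Interruptible' (5) versus all else matters."""
    return sum((t == top) == (g == top) for t, g in zip(truth, guesses)) / len(truth)


def constant_guess_baseline(truth, top=5):
    """Accuracy of always giving the more common of the two binary answers."""
    share_top = sum(t == top for t in truth) / len(truth)
    return max(share_top, 1 - share_top)


# Invented example ratings on the 1-5 scale (not data from the study).
self_reports = [5, 5, 3, 1, 4]
estimates = [5, 4, 3, 3, 5]

print(exact_accuracy(self_reports, estimates))       # 0.4
print(within_one_accuracy(self_reports, estimates))  # 0.8
print(binary_accuracy(self_reports, estimates))      # 0.6
print(constant_guess_baseline(self_reports))         # 0.6
```

In this toy example the binary accuracy equals the constant-guess baseline, which illustrates why the estimators' 76.9% has to be read against the 70.6% baseline rather than against 50%.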

2.2.5    Simulated Sensors

 

Figure 4 UI Used by Coders for Simulating Sensors that Infer the Human Subject's Interruptibility [1]

 

Humans were used to simulate sensor behavior, an approach known as the "Wizard of Oz" technique.  The reasoning behind it is to experiment with sensor behavior before actually implementing the sensors.  Using this technique, the human coders identified 24 events while viewing the recordings of the human subjects in an office setting.  The following events were chosen because they were believed to be highly related to interruptibility and because physical sensors could easily be built to capture them:

 

Occupant Related
     Occupant presence
     Speaking, writing, sitting, standing, or on the phone
     Touch of or interaction with: desk, table, file cabinet, food, drink, keyboard, mouse, monitor, and papers

Guest Related
     Number of guests present
     For each guest: sitting, standing, talking, or touching

Environment
     Time of day

Aggregate
     Anybody talk (occupant and guest talking)

Table 1 Simulated Sensors

 

In addition, a set of six derivatives was applied to each sensor.

 

Imm – whether the event occurred in the 15-second interval containing the self-report sample.

All-N – whether the event occurred in every 15-second interval during the N seconds prior to the sample.

Any-N – whether the event occurred in any 15-second interval during the N seconds prior to the sample.

Count-N – the number of intervals in which the event occurred during the N seconds prior to the sample.

Change-N – the number of pairs of consecutive intervals, during the N seconds prior to the sample, in which the event occurred in one interval but not in the other.

Net-N – the difference between the sensor value in the first interval of the N seconds prior to the sample and the sensor value in the interval containing the sample.

 

Table 2 Derivatives
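
The derivatives in Table 2 can be sketched in Python as follows.  This is a minimal illustration and not the authors' code.  It assumes that each sensor is recorded as a list of 0/1 values, one per 15-second interval with the oldest first, that the last value is the interval containing the self-report sample, that the "N seconds prior" window excludes that last interval, and that Net-N is taken as the sample value minus the first window value.  The function name derive is hypothetical.

```python
INTERVAL_SECONDS = 15


def derive(history, n_seconds):
    """Compute the six derivatives for one simulated sensor.

    history   - list of 0/1 values, one per 15-second interval, oldest first;
                the last entry is the interval containing the self-report.
    n_seconds - window length N; assumed to be a multiple of 15.
    """
    k = n_seconds // INTERVAL_SECONDS          # number of intervals in the window
    if k < 1 or len(history) < k + 1:
        raise ValueError("history is too short for the requested window")

    sample = history[-1]                       # interval containing the sample
    window = history[-(k + 1):-1]              # the k intervals before it (assumption)

    return {
        "imm": bool(sample),
        "all": all(window),
        "any": any(window),
        "count": sum(window),
        # adjacent interval pairs whose values differ
        "change": sum(1 for a, b in zip(window, window[1:]) if a != b),
        # sign convention (sample minus first interval) is an assumption
        "net": sample - window[0],
    }


# Example: talking was detected in intervals 2, 3 and 5 of the last five.
print(derive([0, 1, 1, 0, 1], 60))
# {'imm': True, 'all': False, 'any': True, 'count': 2, 'change': 2, 'net': 1}
```

Applying such a function to every sensor for several values of N is what produces the large feature set discussed next.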

 

The combination of sensor types and derivatives defines the feature set of sensors to be simulated.  The goal of Fogarty et al. is to construct an interruptibility model based on this feature set.  With this goal in mind, the question becomes: which features are the most effective at predicting interruptibility?

 

2.2.6    Wrapper-Based Feature Selection Strategy

The model for determining interruptibility was developed using decision trees, a machine learning technique.  A decision tree learning algorithm places the features it finds most significant for determining interruptibility nearest the root of the tree, with the root testing the most significant feature of all.  The features were chosen as follows: starting with the empty set, each candidate feature was added to the set in turn to determine which ones most improved the accuracy of the model.  Features which did not improve the model were removed.  This cycle was repeated until no change improved the accuracy of the model.
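
The selection cycle described above can be sketched as a greedy loop in Python.  This is one plausible reading of the procedure and not the authors' implementation.  The score callback stands in for "train a decision tree on these features and measure its accuracy"; the callback, the function name wrapper_select, and the toy scores below are all hypothetical.

```python
def wrapper_select(candidates, score):
    """Greedy wrapper-based feature selection.

    candidates - iterable of feature names
    score      - function mapping a list of features to model accuracy
                 (in practice, the accuracy of a learner trained on them)
    """
    selected = []
    best = score(selected)
    changed = True
    while changed:
        changed = False

        # Forward step: add the single feature that most improves accuracy.
        gains = [(score(selected + [f]), f) for f in candidates if f not in selected]
        if gains:
            top_score, top_feature = max(gains)
            if top_score > best:
                selected.append(top_feature)
                best = top_score
                changed = True

        # Backward step: drop any feature whose removal improves accuracy.
        for f in list(selected):
            trial = [g for g in selected if g != f]
            trial_score = score(trial)
            if trial_score > best:
                selected, best = trial, trial_score
                changed = True

    return selected, best


# Toy scoring function with invented numbers: two helpful features, one harmful.
EFFECT = {"any_talk_imm": 0.08, "telephone_any_30": 0.03, "mouse_all_120": -0.01}


def toy_score(features):
    return 0.70 + sum(EFFECT[f] for f in features)


features, accuracy = wrapper_select(list(EFFECT), toy_score)
print(features)  # ['any_talk_imm', 'telephone_any_30']
```

Because every accepted change strictly increases the score and there are finitely many feature subsets, the loop always terminates.  Each call to score retrains the learner, which is the source of the slowness noted below.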

 

The benefit of developing the model in this manner is that it helps prevent overfitting of the data, but it can be slow because it requires the "repeat application of a machine learning technique to learn which features are most important." [1]  Decision trees also tend to be shallow relative to the number of features considered.  In practical terms, a simple model is often sufficient: only the 10 features listed in Table 3 were selected out of a possible 367.  The features are listed in order of importance, from greatest to least.

 

Rank   Feature (Derivative)
1      Any Talk (Imm)
2      Telephone (Any-30)
3      Time of Day (Hour Only)
4      Desk (Change-120)
5      Monitor (Any-300)
6      Occupant Talk (Net-120)
7      Writing (Count-30)
8      Writing (Count-60)
9      Papers (Count-300)
10     Mouse (All-120)

Table 3 Features chosen by Wrapper-based feature selection strategy

 

The above table reflects the results of the wrapper-based feature selection strategy.  90% of the data was used for training and 10% for testing.  The model was 82.4% accurate in distinguishing between the human subject being "Highly Non-Interruptible" and all other ratings.  This is a promising result, as the human estimators' accuracy was 76.9% and chance was 70.6%.  Note also that although these low-level sensors were simulated, they should be relatively simple and cheap to build.

 

 

                       Accuracy Within 1     Overall Accuracy
Human Estimators       65.8%                 30.7%
Simulated Sensors      75.1%                 51.5%

Table 4 Comparison of Human Estimators versus Simulated Sensors

 

On the full five-point scale, the simulated sensors were within 1 of the self-report 75.1% of the time and exactly correct 51.5% of the time, compared with 65.8% and 30.7% for the human estimators.

 

2.2.7    What to Learn from the Work of Fogarty et al.

Fogarty et al. have shown that determining interruptibility based on low-level sensors can perform as well as or better than humans, without users explicitly indicating their interruptibility or interacting with calendars.  Based on the features selected for the model, users are most likely "Highly Non-Interruptible" when engaged in a task or a social situation.  In deciding the degree of interruptibility, "estimates of 3 or 4 could be used...to initiate a negotiated interruption with an ambient display" and "estimates of 1 or 2 could be used...to decide to initiate [an interrupt] with a more direct method." [1]
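
The suggested use of the estimates can be written as a small lookup.  The Python sketch below follows the two quoted cases; the handling of an estimate of 5 (deferring the interrupt) is an assumption added for completeness and is not stated in the quoted text, and the function name and return labels are hypothetical.

```python
def choose_delivery(estimate):
    """Map an interruptibility estimate (1 = highly interruptible,
    5 = highly non-interruptible) to a way of delivering an interrupt."""
    if estimate not in (1, 2, 3, 4, 5):
        raise ValueError("estimate must be an integer from 1 to 5")
    if estimate <= 2:
        return "direct"    # quoted case: initiate with a more direct method
    if estimate <= 4:
        return "ambient"   # quoted case: negotiated interruption via ambient display
    return "defer"         # assumption: hold the interrupt for a later moment


print([choose_delivery(e) for e in range(1, 6)])
# ['direct', 'direct', 'ambient', 'ambient', 'defer']
```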

 

2.3     "Reducing the Cost of Interruption Using Gradual Awareness Notifications" [6]

Figure 5 Gradual awareness notification of battery power [6]

 

The final study of interruption management systems can be loosely classified as scheduled.  Gradual awareness notifications request the user's attention gradually, giving the user time to "schedule" a breakpoint in his primary task to deal with the interrupt.  "Gradual awareness techniques require no training, no sensors and no modeling of user behavior." [6]

 

2.3.1    How Do Gradual Awareness Notifications Work?

Gradual awareness notifications work by using a continuously growing display.  As in Figure 5, the notification display starts out small and grows continuously over time.  The importance of the notification determines its rate of growth.
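
A growth rule of this kind might look like the following Python sketch.  The source states only that the display grows continuously and that importance determines the growth rate; the linear form, the direction of the relationship (more important means faster growth), the pixel values, and the function name are all assumptions made for illustration.

```python
def notification_size(elapsed_s, importance,
                      start_px=16, max_px=256, base_rate_px_per_s=2.0):
    """Size of a slow-growth notification after elapsed_s seconds.

    importance scales the growth rate; all defaults are illustrative.
    """
    if elapsed_s < 0 or importance <= 0:
        raise ValueError("elapsed_s must be >= 0 and importance must be > 0")
    grown = start_px + base_rate_px_per_s * importance * elapsed_s
    return min(max_px, grown)   # stop growing once the maximum size is reached


print(notification_size(0, 1))     # 16.0
print(notification_size(10, 1))    # 36.0
print(notification_size(10, 3))    # 76.0
print(notification_size(1000, 1))  # 256
```

A pop-up corresponds to the limiting case in which the notification appears at its full size at once.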

 

2.3.2    Experiment Setup

 

Wilson designed his experiments around three hypotheses:

 

1.      Slow-growth notifications are less disruptive to task execution than those that immediately pop-up on the desktop.

2.     Natural task breaks would become the preferred time in which users dealt with slow-growth notifications.

3.     Slow-growth notifications can improve overall task performance over pop-ups.

 

Figure 6 UI for gradual awareness notification experiment

 

For the experiment, users were asked to type a document while being randomly interrupted by pop-up and slow-growth notifications.  Each of the seven users retyped a document displayed on the left side of the user interface into the right side, as shown in Figure 6.  The notifications appeared randomly in the corners.

2.3.3    Results and Lessons Learned

Four key metrics were used to measure the performance of each user in the study.  The first metric was response time.  "Response time was measured by the difference in milliseconds between when the notification first appeared and when the user pressed F2 to dismiss it." [6]  In this case, users responded to the pop-up notifications more quickly than to the gradual notifications.  This is to be expected, since the pop-up notifications are displayed at full size immediately.

 

The next metric was resume time.  "The resume time was measured as the delay between when the user pressed F2 to dismiss the notification and the next key press. This measurement provides a reasonable estimate of the amount of time it took for users to find their place and resume typing." [6]  The results for this metric show that users resumed typing 39% faster after slow-growth notifications than after pop-ups, supporting the author's first hypothesis that slow-growth notifications are less disruptive than pop-ups.
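
Both timing metrics follow directly from three timestamps.  The Python sketch below assumes a hypothetical log in which each notification records the time it appeared, the time F2 was pressed, and the time of the next key press, all in milliseconds; the function name and the example values are invented.

```python
def timing_metrics(shown_ms, dismissed_ms, next_key_ms):
    """Response and resume time for one notification, in milliseconds."""
    if not shown_ms <= dismissed_ms <= next_key_ms:
        raise ValueError("timestamps must be in chronological order")
    return {
        "response_ms": dismissed_ms - shown_ms,   # appeared -> F2 pressed
        "resume_ms": next_key_ms - dismissed_ms,  # F2 pressed -> next key press
    }


# Invented timestamps for a single notification.
print(timing_metrics(shown_ms=1000, dismissed_ms=3500, next_key_ms=4700))
# {'response_ms': 2500, 'resume_ms': 1200}
```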

 

The third metric was the interruption point.  "The interruption point was measured by recording the text the user had successfully transcribed when they dismissed the notification, and comparing the user's text to the reference text.  For the task of transcribing text, four possible interruption points were identified." [6]

 

The four possible interruption points were:

•      The middle of a word

•      The end of a word

•      The end of a sentence

•      The end of a paragraph

 

Of these, the end of a word, sentence, or paragraph was treated as a natural task break.
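
The comparison of transcribed text against reference text can be sketched in Python as follows.  This is a simple heuristic written for illustration and not the procedure used in the study: it assumes the transcribed text is an exact prefix of the reference, that paragraphs are separated by newlines, and that sentences end in a period, exclamation mark, or question mark.  The function and constant names are hypothetical.

```python
NATURAL_BREAKS = {"end of word", "end of sentence", "end of paragraph"}


def classify_interruption_point(reference, transcribed):
    """Classify where in the reference text the user stopped typing."""
    if not reference.startswith(transcribed):
        raise ValueError("transcribed text must be a prefix of the reference")
    typed = transcribed.rstrip()             # ignore trailing spaces or newlines
    if not typed:
        raise ValueError("nothing has been transcribed yet")

    rest = reference[len(typed):]            # what is still left to type
    if rest == "" or rest.startswith("\n"):
        return "end of paragraph"
    if typed.endswith((".", "!", "?")) and rest[0].isspace():
        return "end of sentence"
    if rest[0].isspace():
        return "end of word"
    return "middle of word"


text = "The cat sat. It ran.\nA new paragraph."
for typed in ["The ca", "The cat", "The cat sat.", "The cat sat. It ran."]:
    point = classify_interruption_point(text, typed)
    print(repr(typed), "->", point, "| natural break:", point in NATURAL_BREAKS)
```

The heuristic would misclassify abbreviations such as "e.g." as sentence ends, so a real analysis would need a more careful notion of sentence boundaries.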

 

Users tended to deal with slow-growth notifications at the end of a word, sentence, or paragraph, while pop-ups were dealt with while the user was in the middle of typing a word.  This result shows that users tended to deal with slow-growth notifications at natural task breaks, while pop-ups were more disruptive.  This evidence supports hypothesis #2.

 

The final metric was page completion time.  "Page completion time was measured as the delay in seconds from when the user pressed the 'Start' button and began transcribing the text to when they pressed the 'Next' button, indicating page completion." [6]  On average, users completed their tasks 5% faster with slow-growth notifications than with pop-ups.  The author did not consider this enough of an improvement to support hypothesis #3.

3       Conclusion

 

While interruptions are a part of life, we have illustrated that they can have a negative effect on productivity in the workplace.  The goal, however, is not to ignore them, but rather to minimize their effect on productivity.  This can be done through a number of software and sensor-based solutions.

 

Although there may be a gap between what low-level sensors and programs can detect and a user's actual willingness to be interrupted, it has been shown that this gap is closing.  Fogarty et al. have shown that by interpreting data received from low-level sensors with a machine learning technique, computers can perform better than humans at determining the interruptibility of another human.  Also, Wilson has shown that gradually presenting notifications, as opposed to having them pop up on a user's screen, can reduce the disruptive nature of a notification and allow the user to deal with it at a natural task break.

 

4       References

 

1.     J. Fogarty, S. E. Hudson, C. G. Atkeson, D. Avrahami, J. Forlizzi, S. Kiesler, J. C. Lee, and J. Yang, 2005, "Predicting Human Interruptibility with Sensors," March 2005.

 

2.     B. P. Bailey, J. A. Konstan, and J. V. Carlis, 2000, "Measuring the Effects of Interruptions on Task Performance in the User Interface," In Proc. IEEE Conf. on Systems, Man, and Cybernetics 2000, 757–762.

 

3.     S. M. Hess and M. C. Detweiler, 1994, "Training to Reduce the Disruptive Effects of Interruptions," Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting.

 

4.     D. McFarlane, 1999, "Coordinating the Interruption of People in Human-Computer Interaction," Human-Computer Interaction – INTERACT '99, IOS Press, Inc., The Netherlands, pp. 295–303.

 

5.     E. Cutrell, M. Czerwinski, and E. Horvitz, 2001, "Notification, Disruption, and Memory: Effects of Messaging Interruptions on Memory and Performance."

 

6.     T. Wilson, "Reducing the Cost of Interruption Using Gradual Awareness Notifications."

 

7.     J. Christensen, J. Sussman, S. Levy, W. E. Bennett, T. V. Wolf, and W. Kellogg, 2006, "Too Much Information," Human-Computer Interaction – '06 Vol. 4, No. 6 – July/August 2006.

 

8.     J. Hudson, J. Christensen, W. A. Kellogg, and T. Erickson, 2002, "I'd be Overwhelmed, but It's Just One More Thing to Do: Availability and Interruption in Research Management."  In Human Factors in Computing Systems, CHI 2002 Proceedings.  New York: ACM Press.