An Integrative Paper on User Interruptibility

Lavar Askew

Abstract

The goals of this paper are to describe the effects interrupts have on people in the workplace and to present the state of the art in the research and development of interruption management systems. The following sections present research that quantifies the effect of an interrupt on primary task completion, the social and environmental cues humans use to judge when to interrupt, and design considerations for building interruption management systems.

1. Introduction

An interrupt is the temporary stopping of a primary task to give attention to another task. It serves as a mechanism through which information is delivered to an individual. Interrupts range in importance from warning messages that require an individual's immediate attention to state-based information that does not require the individual's attention at all. In the workplace, however, interrupts can reduce the productivity of an individual already engaged in a task, and this negative effect depends on when the interrupt seeks the person's attention. Where meeting deadlines is paramount, minimizing the effect of interrupts on productivity is valuable.

People use environmental and social cues to decide whether to interrupt another person, but computers and software are not designed to recognize such cues. As a result, an interrupt from software can cause the interrupted person to lose focus on the primary task. The goal, then, is to design systems that can recognize an opportune time to interrupt the user, minimizing the effect of the interrupt on the user's primary task.

1.1 How Interrupts Affect Task Performance

Task performance is crucial to meeting deadlines in the workplace. However, interruptions from both humans and computers can make completing a task in a timely manner difficult. Bailey et al. have shown that a person “spent between 5% and 40% longer on an interrupted [task] than a non-interrupted task.” [2] The difference in completion times for the primary task is directly related to the memory load required by the primary task. Tasks which have a high memory load, such as reading comprehension, take longer to complete than tasks which have a low memory load.

Even routine primary tasks are subject to the negative effects of interrupts. Hess and Detweiler showed that when users trained on a task for two sessions without interruptions and then performed the same task with interruptions, the interruptions were “highly harmful to performance.”[3]

1.2 Social and Environmental Cues for Determining Interruptibility

While a computer may present notifications to its user indiscriminately, humans use social and environmental cues to determine an opportune time to interrupt a co-worker. An example of an environmental cue is a co-worker leaving her office door open, possibly indicating that she is available for discussion, while a closed door may indicate the opposite. An example of a social cue is eye contact: by making eye contact with the interrupter, the interruptee acknowledges the interrupter's presence and signals a willingness to hear what the interrupter has to say, while avoiding eye contact may indicate the opposite.

Succinctly, these cues reflect a negotiated policy for expressing interruptibility, where “negotiated” means that the interruptee decides when to accept an interruption. Edward Cutrell et al. [5], drawing on McFarlane [4], describe four interruption policies in human-computer interaction:

· Immediate – requiring a user’s immediate response.

· Negotiated – user chooses when to attend.

· Mediated – an intelligent agent might determine when best to interrupt.

· Scheduled – interruptions come at prearranged time intervals.

However, McFarlane [4] found that none of these policies is the best method in every situation for deciding when to interrupt a user. The immediate policy is the one users typically face today, and research has shown that it places a burden on the user's memory. The negotiated policy lets users decide when to be interrupted, but it also allows an interrupt to be ignored indefinitely. The mediated policy seems promising, but its effectiveness depends on the criteria the intelligent agent uses. Finally, the scheduled policy sets aside prearranged times to present interruptions; these times can be tied to the situation or to fixed intervals.
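The four policies differ mainly in who or what decides when a pending interruption reaches the user. The Python sketch below makes that contrast concrete. The names (`Policy`, `Interruption`, `delivery_time`), the stand-in agent function, and the 15-minute schedule period are hypothetical illustrations, not part of McFarlane's work [4].

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Policy(Enum):
    IMMEDIATE = "immediate"
    NEGOTIATED = "negotiated"
    MEDIATED = "mediated"
    SCHEDULED = "scheduled"


@dataclass
class Interruption:
    arrival: float  # seconds since the start of the work session
    message: str


def delivery_time(item: Interruption, policy: Policy, *,
                  user_accepts_at: Optional[float] = None,
                  agent: Optional[Callable[[float], float]] = None,
                  period: float = 900.0) -> Optional[float]:
    """Return when the interruption is presented, or None if it never is."""
    if policy is Policy.IMMEDIATE:
        # Presented the moment it arrives, whatever the user is doing.
        return item.arrival
    if policy is Policy.NEGOTIATED:
        # The user chooses when to attend; None models an interrupt that
        # is ignored indefinitely.
        if user_accepts_at is not None and user_accepts_at < item.arrival:
            raise ValueError("cannot accept an interruption before it arrives")
        return user_accepts_at
    if policy is Policy.MEDIATED:
        # A hypothetical agent decides; the outcome depends on its criteria.
        if agent is None:
            raise ValueError("the mediated policy needs an agent")
        return agent(item.arrival)
    # SCHEDULED: hold the interruption until the next prearranged slot.
    return math.ceil(item.arrival / period) * period
```

For example, an interruption arriving 100 seconds into a session is presented at once under the immediate policy but held until the 900-second mark under the scheduled one.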

2 Design Considerations When Building Interruption Management Systems

When designing interruption management systems, one must consider both the importance of the interrupter's message and the interruptee's willingness to be interrupted. This section surveys the state of the art in research on interruption management systems, featuring work on the negotiated, mediated, and scheduled interruption policies.

2.1 Grapevine

Figure 1 Grapevine e-Business Card [7]

IBM has developed a system known as Grapevine which presents a user’s aggregate contextual information to colleagues in the form of an e-business card. The contextual information is provided by a user’s computer, mobile device, telephone, and motion detectors. The goal of Grapevine is to aid those wishing to communicate with the user in deciding when to initiate contact and via which channel [7].

2.1.1 “Location Awareness is a Good Thing” [7]

Grapevine is an example of a system implementing the mediated interruption policy. Before initiating contact, potential communicators can view a user's last physical location and computer application activity. Taking this information together, the interrupter chooses an opportune time to interrupt the user.

A user's last location proved to be the most useful feature of the system. One reason given is that if you want to communicate with a person and Grapevine shows that the person is out of town, you may choose to contact someone else instead. Another is that if a user is online early in the morning and then goes offline, a colleague may infer that the user is en route to the office and choose to meet with him in person rather than call his cell phone.

2.1.2 “Computer Application Activity is a Mixed Blessing” [7]

Grapevine is able to detect and collect user activity in applications such as instant messaging clients and productivity suites. This data is then reported to the central aggregation service. Potential communicators can then use “this information to make better informed communication decisions.” [7]

While some users expected their colleagues to check Grapevine before contacting them, others were uncomfortable with colleagues knowing what they were doing. For these users, Grapevine provided a way to block others from viewing this data.
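Grapevine's central idea, merging context reports from several devices into one card while letting the owner withhold part of it, can be sketched in a few lines of Python. This is a hypothetical illustration, not IBM's implementation: the `ContextReport` and `ContextCard` names, the report fields, and the rule of keeping only the newest report of each kind are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ContextReport:
    source: str       # e.g. "computer", "phone", "motion_detector"
    timestamp: float  # seconds since the epoch
    kind: str         # e.g. "location" or "activity"
    value: str


@dataclass
class ContextCard:
    owner: str
    latest: Dict[str, ContextReport] = field(default_factory=dict)
    hidden_kinds: Set[str] = field(default_factory=set)

    def report(self, r: ContextReport) -> None:
        """Aggregate reports: keep the newest one of each kind, from any source."""
        current = self.latest.get(r.kind)
        if current is None or r.timestamp >= current.timestamp:
            self.latest[r.kind] = r

    def view(self) -> Dict[str, str]:
        """What colleagues see: every kind of context the owner has not blocked."""
        return {kind: rep.value for kind, rep in self.latest.items()
                if kind not in self.hidden_kinds}
```

A user who is uncomfortable sharing application activity would add `"activity"` to `hidden_kinds`, and colleagues would then see only the remaining context.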

Other issues arose when a single application served more than one purpose, which could lead colleagues to misjudge a user's interruptibility. For example, Christensen et al. [7] note that Lotus Notes supports a variety of applications, so a colleague who saw that the user was active in Lotus Notes could not reliably infer that the user was composing an e-mail.

2.1.3 “Lessons Learned”[7]

The Grapevine research project began at IBM in 2001 and concluded in 2005. Enumerated below are the lessons which should be considered when designing an interruption management system:

· “Do not expect users to do anything extra to provide context. Context that depends on user actions will not be reliable.”[7]

· “There must be a simple, powerful, and intuitive way of giving users peace of mind with respect to the visibility of their context information.”[7]

· “People look to instant messaging for real-time context.”[7]

· “A substantial semantic gap exists between the information that low-level sensors and programs can detect [and] the high-level ability and willingness of a person to communicate with someone else.”[7]

Although Grapevine is currently used only by humans to judge the interruptibility of other humans, one can easily imagine software that uses Grapevine's context data to make the same judgment, and an interruption management system could plausibly be built into the next generation of instant messaging clients. Hudson et al. [8] point out, however, that “a substantial semantic gap exists between the information that low-level sensors and programs can detect [and] the high-level ability and willingness of a person to communicate with someone else.” Fogarty et al. [1] provide evidence that this semantic gap may be closing; this evidence is summarized in the next section.

2.2 “Predicting Human Interruptibility with Sensors”[1]

Fogarty et al. [1] investigate using low-level, low-cost sensors to predict how likely a user is to be open to an interruption. Their work shows that such sensors perform on par with humans at predicting another person's interruptibility.

2.2.1 Key Elements of the Study

There are three key elements involved in this study:

1. The human subjects whose actions were recorded in an office setting.

2. The human estimators who studied the recordings of the human subjects to predict the human subject’s interruptibility.

3. The human coders who were used to simulate sensors.

2.2.2 Human Subjects

Figure 2 Human Subject in Office Setting [1]

While the human subjects were video recorded in their offices, their computers prompted them for interruptibility self-reports. The prompts appeared at “random, but controlled, intervals averaging two prompts per hour. Subjects were asked to rate [their] interruptibility on a five-point scale, with 1 corresponding to ‘Highly Interruptible’ and 5 to ‘Highly Non-Interruptible.’ The human subjects were present for 627 of these prompts.”[1]
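The quoted description does not say how the “random, but controlled” intervals were generated. One simple way to get that behavior is to draw each gap between prompts from a bounded uniform distribution centered on 30 minutes, which averages two prompts per hour. The Python sketch below assumes exactly that; the 20-to-40-minute bounds and the `prompt_times` name are hypothetical.

```python
import random
from typing import List


def prompt_times(session_minutes: float, mean_gap: float = 30.0,
                 jitter: float = 10.0, seed: int = 0) -> List[float]:
    """Minutes into the session at which to show a self-report prompt.

    Gaps are drawn uniformly from [mean_gap - jitter, mean_gap + jitter], so they
    are random but bounded and average mean_gap minutes (two prompts per hour
    with the defaults). The bounds are an assumption, not the study's values."""
    if not 0.0 <= jitter < mean_gap:
        raise ValueError("jitter must be non-negative and smaller than mean_gap")
    rng = random.Random(seed)  # seeded so a schedule can be reproduced
    times: List[float] = []
    t = 0.0
    while True:
        # Each gap is random but bounded, and averages mean_gap minutes.
        t += rng.uniform(mean_gap - jitter, mean_gap + jitter)
        if t > session_minutes:
            return times
        times.append(t)
```

Bounding the gaps keeps prompts from clustering or leaving long unsampled stretches, while the randomness keeps subjects from anticipating them.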

2.2.3 Human Estimators

Figure 3 Human Estimators UI for Determining Human Subject Interruptibility [1]

“[40 human estimators] were shown portions of the records collected from the [video] subjects.”[1] Each estimator was asked to infer a subject's interruptibility after watching 15 to 30 seconds of video recorded just before that subject was prompted for a self-report.

2.2.4 Results of Human Estimators

When rating the human subjects' interruptibility on the full 1-to-5 scale, the estimators had an overall accuracy of 30.7%. Counting answers within one point of the self-report as correct raised their accuracy to 65.8%. When choosing only between “Highly Non-Interruptible” and all other choices, their accuracy was 76.9%, only slightly better than chance (always choosing “Highly Non-Interruptible”) at 70.6%.
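The three accuracy figures above measure different things, so it helps to see them computed side by side. The Python sketch below assumes predictions and self-reports are integers on the 1-to-5 scale, with 5 meaning “Highly Non-Interruptible”; the function names are hypothetical, and `constant_baseline` computes the best score obtainable by always giving the same two-way answer, which is one common way to define a chance baseline.

```python
from typing import Sequence

HIGHLY_NON_INTERRUPTIBLE = 5  # top of the 1-to-5 self-report scale


def _check(pred: Sequence[int], truth: Sequence[int]) -> None:
    if len(pred) != len(truth) or not truth:
        raise ValueError("need equally long, non-empty sequences")


def exact_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of estimates that match the self-report exactly."""
    _check(pred, truth)
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def within_one_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of estimates at most one point away from the self-report."""
    _check(pred, truth)
    return sum(abs(p - t) <= 1 for p, t in zip(pred, truth)) / len(truth)


def binary_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Accuracy on the two-way task: Highly Non-Interruptible vs. all others."""
    _check(pred, truth)
    hits = sum((p == HIGHLY_NON_INTERRUPTIBLE) == (t == HIGHLY_NON_INTERRUPTIBLE)
               for p, t in zip(pred, truth))
    return hits / len(truth)


def constant_baseline(truth: Sequence[int]) -> float:
    """Best two-way accuracy achievable by always giving the same answer."""
    if not truth:
        raise ValueError("need a non-empty sequence")
    share = sum(t == HIGHLY_NON_INTERRUPTIBLE for t in truth) / len(truth)
    return max(share, 1 - share)
```

Because the two-way task collapses four of the five ratings into one class, it is much easier than exact prediction, which is why it is compared against a baseline rather than judged on its raw score.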

2.2.5 Simulated Sensors

Figure 4 UI Used by Coders for Simulating Sensors that Infer the Human Subject's Interruptibility [1]

Humans were used to simulate sensor behavior, an approach known as the “Wizard of Oz” technique. It lets researchers experiment with sensors before actually building them. Using this technique, human coders watching recordings of the subjects in their offices identified 24 events. These events were chosen because they were believed to be strongly related to interruptibility and because physical sensors could easily be built to capture them:

Category	Simulated sensor events
Occupant related	Occupant presence
	Speaking, writing, sitting, standing, or on the phone
	Touch of or interaction with: desk, table, file cabinet, food, drink, keyboard, mouse, monitor, and papers
Guest related	Number of guests present
	For each guest: sitting, standing, talking, or touching
Environment	Time of day
Aggregate	Any talk (occupant or guest talking)

Table 1 Simulated Sensors

Six derivatives were also applied to each sensor:

Imm – whether the event occurred in the 15-second interval containing the self-report sample.

All-N – whether the event occurred in every 15-second interval during the N seconds prior to the sample.

Any-N – whether the event occurred in any 15-second interval during the N seconds prior to the sample.

Count-N – the number of intervals in which the event occurred during the N seconds prior to the sample.

Change-N – the number of pairs of consecutive intervals in which the event occurred in one interval and not in the other during the N seconds prior to the sample.

Net-N – the difference between the sensor's value in the first interval of the N seconds prior to the sample and its value in the interval containing the sample.

Table 2 Derivatives
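Because every simulated sensor reports once per 15-second interval, each derivative reduces to a simple operation on a list of yes/no values. The Python sketch below is one reading of the definitions in Table 2, not Fogarty et al.'s code. It assumes that the “N seconds prior to the sample” are the last N/15 intervals, ending with the interval that contains the sample, and that Net-N is the sample interval's value minus the first interval's value.

```python
from typing import Sequence

INTERVAL_SECONDS = 15  # each entry in `events` covers one 15-second interval


def window(events: Sequence[bool], n_seconds: int) -> Sequence[bool]:
    """The intervals covering the last n_seconds, ending with the interval
    that contains the self-report sample (assumed to be events[-1])."""
    k = n_seconds // INTERVAL_SECONDS
    if not 1 <= k <= len(events):
        raise ValueError("window must span between 1 and len(events) intervals")
    return events[-k:]


def imm(events: Sequence[bool]) -> bool:
    return events[-1]


def all_n(events: Sequence[bool], n: int) -> bool:
    return all(window(events, n))


def any_n(events: Sequence[bool], n: int) -> bool:
    return any(window(events, n))


def count_n(events: Sequence[bool], n: int) -> int:
    return sum(window(events, n))


def change_n(events: Sequence[bool], n: int) -> int:
    # Pairs of consecutive intervals where the event occurred in one but not the other.
    w = window(events, n)
    return sum(1 for a, b in zip(w, w[1:]) if a != b)


def net_n(events: Sequence[bool], n: int) -> int:
    # Sample interval minus first interval of the window: -1, 0, or +1.
    w = window(events, n)
    return int(w[-1]) - int(w[0])
```

For a hypothetical two-minute talk history `talk = [True, False, True, True, False, False, True, True]`, Any-30 is true, All-60 is false, Count-120 is 5, Change-120 is 4, and Net-60 is 1. Applying each derivative at several window sizes N to every sensor is what expands a couple dozen events into hundreds of candidate features.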

The combination of sensor types and derivatives defines the feature set of the simulated sensors. Fogarty et al. aim to construct an interruptibility model from this feature set, so the question becomes: which features are most effective at predicting interruptibility?

2.2.6 Wrapper-Based Feature Selection Strategy

The interruptibility model was built with a decision tree, a machine learning model that makes predictions through a sequence of feature tests. A decision tree learner places the features it judges most significant near the top of the tree, with the most significant feature at the root. Features were chosen with a wrapper-based strategy: starting from an empty set, each candidate feature was tentatively added to determine which most improved the model's accuracy, and features that did not improve the model were removed. This cycle was repeated until no change improved the model's accuracy.
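The add-and-prune cycle described above can be written independently of the learning algorithm, since a wrapper treats the learner as a black box that returns an accuracy for any feature subset. The Python sketch below is a generic version, not Fogarty et al.'s exact procedure; `evaluate` is a hypothetical callback that, in practice, might train a decision tree (for example scikit-learn's `DecisionTreeClassifier`, scored with `cross_val_score`) on only the given features and return its held-out accuracy.

```python
from typing import Callable, FrozenSet, Iterable


def wrapper_select(candidates: Iterable[str],
                   evaluate: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Greedy stepwise (add, then prune) wrapper feature selection."""
    pool = sorted(set(candidates))
    selected: FrozenSet[str] = frozenset()
    best = evaluate(selected)  # accuracy of the model with no features
    while True:
        improved = False
        # Forward step: tentatively add each unused feature; keep the best one
        # if it raises accuracy.
        trials = [(evaluate(selected | {f}), f) for f in pool if f not in selected]
        if trials:
            score, feature = max(trials)
            if score > best:
                selected, best, improved = selected | {feature}, score, True
        # Backward step: drop any feature whose removal raises accuracy.
        for f in sorted(selected):
            score = evaluate(selected - {f})
            if score > best:
                selected, best, improved = selected - {f}, score, True
        # Stop once a full cycle changes nothing.
        if not improved:
            return selected
```

Every step calls `evaluate`, which retrains the model, so the cost grows with the number of candidate features; this is the slowness that comes from repeatedly applying the learner.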

Developing the model in this manner helps prevent overfitting, but it can be slow because it requires the “repeat application of a machine learning technique to learn which features are most important.”[1] The resulting decision trees also tend to be shallow relative to the number of features considered, reflecting a preference for the simplest model that explains the data. As a result, only 10 of a possible 367 features were selected as most important. Table 3 lists them in order of importance, from greatest to least.

1	Any Talk (Imm)
2	Telephone (Any-30)
3	Time of Day (Hour Only)
4	Desk (Change-120)
5	Monitor (Any-300)
6	Occupant Talk (Net-120)
7	Writing (Count-30)
8	Writing (Count-60)
9	Papers (Count-300)
10	Mouse (All-120)

Table 3 Features chosen by the wrapper-based feature selection strategy

Table 3 reflects the results of the wrapper-based feature selection strategy. 90% of the data was used for training and 10% for testing. The model was 82.4% accurate in distinguishing between the human subject being “highly non-interruptible” and all other self-reports, a promising result given that the human estimators' accuracy was 76.9% and chance was 70.6%. Note also that although these low-level sensors were simulated, real versions should be relatively simple and cheap to build.

Predictor	Accuracy Within 1	Overall Accuracy
Human Estimators	65.8%	30.7%
Simulated Sensors	75.1%	51.5%

Table 4 Comparison of Human Estimators versus Simulated Sensors

On the full five-point scale, the simulated sensors reached an overall accuracy of 51.5% and an accuracy within one point of 75.1%, compared with 30.7% and 65.8% for the human estimators.

2.2.7 What to Learn from Fogarty et al. Work

Fogarty et al. have shown that models based on low-level sensors can determine interruptibility as well as or better than humans, without users explicitly indicating their interruptibility or interacting with calendars. The selected features suggest that users are most likely to be “highly non-interruptible” when engaged in a task or a social situation. When acting on the degree of interruptibility, “estimates of 3 or 4 could be used…to initiate a negotiated interruption with an ambient display” and “estimates of 1 or 2 could be used…to decide to initiate [an interrupt] with a more direct method.”[1]
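The quoted guidance amounts to a small decision rule over the model's 1-to-5 estimate. A minimal Python sketch follows; the function name and return labels are hypothetical, and holding the notification for later when the estimate is 5 is an assumption, since the quotation covers only estimates 1 through 4.

```python
def delivery_method(estimate: int) -> str:
    """Choose how to deliver a notification from a 1-to-5 interruptibility estimate.

    1-2 -> interrupt directly; 3-4 -> negotiated interruption via an ambient
    display; 5 (Highly Non-Interruptible) -> defer (an assumed rule)."""
    if estimate not in range(1, 6):
        raise ValueError("estimate must be an integer from 1 to 5")
    if estimate <= 2:
        return "direct"
    if estimate <= 4:
        return "ambient"
    return "defer"
```

Mapping the middle of the scale to an ambient display turns the model's uncertainty into a negotiated interruption: the user, not the system, makes the final call.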

2.3 “Reducing the Cost of Interruption Using Gradual Awareness Notifications”[6]

Figure 5 Gradual awareness notification of battery power[6]

The final system surveyed here can be loosely classified as implementing the scheduled interruption policy. Gradual awareness notifications request the user's attention gradually, giving the user time to “schedule” a breakpoint in the primary task to deal with the interrupt. “Gradual awareness techniques require no training, no sensors and no modeling of user behavior.”[6]

2.3.1 How Do Gradual Awareness Notifications Work?

Gradual awareness notifications use a continuously growing display. As Figure 5 shows, the notification starts out small and grows continuously over time. The notification's importance determines its rate of growth.
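The growth behavior can be captured by a function of elapsed time and importance. The Python sketch below assumes linear growth capped at a maximum size, with importance scaling the growth rate; the starting size, cap, base rate, and function name are hypothetical values chosen for illustration, not parameters from Wilson's study [6].

```python
def notification_size(elapsed_s: float, importance: float,
                      start_px: float = 16.0, max_px: float = 256.0,
                      base_rate_px_per_s: float = 2.0) -> float:
    """Edge length, in pixels, of a slow-growth notification elapsed_s seconds
    after it first appears. More important notifications grow faster."""
    if elapsed_s < 0:
        raise ValueError("elapsed time cannot be negative")
    if importance <= 0:
        raise ValueError("importance must be positive")
    # Linear growth at a rate proportional to importance, capped at max_px.
    return min(max_px, start_px + base_rate_px_per_s * importance * elapsed_s)
```

With these defaults, a notification of importance 1 reaches full size after 120 seconds, while one of importance 3 does so in 40 seconds, so urgent messages claim attention sooner without ever appearing abruptly.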

2.3.2 Experimental Setup

Wilson designed his experiments around three hypotheses:

1. Slow-growth notifications are less disruptive to task execution than those that immediately pop-up on the desktop.

2. Natural task breaks would become the preferred time in which users dealt with slow-growth notifications.

3. Slow-growth notifications can improve overall task performance over pop-ups.

Figure 6 UI for gradual awareness notification experiment

In the experiment, each of the seven users retyped a document shown on the left side of the interface in Figure 6 into the right side, while being randomly interrupted by pop-up and slow-growth notifications that appeared in the corners of the screen.

2.3.3 Results and Lessons Learned

Four key metrics were used to measure each user's performance. The first was response time. “Response time was measured by the difference in milliseconds between when the notification first appeared and when the user pressed F2 to dismiss it.” [6] Users responded to pop-up notifications more quickly than to slow-growth notifications. This is to be expected, since pop-up notifications are displayed in full immediately.

The next metric was resume time. “The resume time was measured as the delay between when the user pressed F2 to dismiss the notification and the next key press. This measurement provides a reasonable estimate of the amount of time it took for users to find their place and resume typing.”[6] Users resumed typing 39% faster after slow-growth notifications than after pop-ups, supporting the author's first hypothesis that slow-growth notifications are less disruptive than pop-ups.
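Both timing metrics are differences between timestamps in an event log. The Python sketch below assumes a hypothetical log format of (millisecond timestamp, event name) pairs in time order; the event names and function names are illustrative, not taken from Wilson's software [6]. The `percent_faster` helper reads “39% faster” as 39% less time, which is one plausible interpretation.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str]  # (timestamp in milliseconds, event name), in time order


def response_and_resume(log: List[Event]) -> Tuple[int, Optional[int]]:
    """Response time and resume time, in ms, for one notification.

    Assumed event names: "shown" when the notification appears, "F2" when the
    user dismisses it, and "key" for any ordinary keystroke."""
    shown = next((t for t, e in log if e == "shown"), None)
    if shown is None:
        raise ValueError("log has no 'shown' event")
    dismissed = next((t for t, e in log if e == "F2" and t >= shown), None)
    if dismissed is None:
        raise ValueError("notification was never dismissed")
    next_key = next((t for t, e in log if e == "key" and t > dismissed), None)
    response = dismissed - shown  # appearance -> F2
    resume = None if next_key is None else next_key - dismissed  # F2 -> next key
    return response, resume


def percent_faster(baseline_ms: float, other_ms: float) -> float:
    """How much less time `other_ms` took than `baseline_ms`, as a percentage."""
    return 100.0 * (baseline_ms - other_ms) / baseline_ms
```

For a log in which the notification appears at 1000 ms, F2 is pressed at 2500 ms, and typing resumes at 3300 ms, the response time is 1500 ms and the resume time is 800 ms.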

The third metric was the interruption point. “The interruption point was measured by recording the text the user had successfully transcribed when they dismissed the notification, and comparing the user’s text to the reference text. For the task of transcribing text, four possible interruption points were identified.”[6] These were:

· The middle of a word

· The end of a word

· The end of a sentence

· The end of a paragraph

Of these, the end of a word, sentence, or paragraph represents a natural task break; the middle of a word does not.

Users tended to deal with slow-growth notifications at the end of a word, sentence, or paragraph, whereas they dealt with pop-ups while in the middle of typing a word. This shows that users handled slow-growth notifications at natural task breaks, while pop-ups were more disruptive, supporting hypothesis #2.
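Classifying an interruption point means looking at the characters on either side of the user's position in the reference text. The Python sketch below assumes the transcription so far is an exact prefix of the reference (real data would need alignment to tolerate typos), that paragraphs are separated by newlines, and that a period, question mark, or exclamation mark ends a sentence; the function names and labels are hypothetical.

```python
SENTENCE_END = {".", "!", "?"}


def interruption_point(typed: str, reference: str) -> str:
    """Classify where the user was in the reference text when they dismissed
    a notification: middle of a word, or end of a word, sentence, or paragraph."""
    if not reference.startswith(typed):
        raise ValueError("typed text must be a prefix of the reference text")
    rest = reference[len(typed):]
    # The start of the text or a newline boundary counts as a paragraph break.
    if typed == "" or typed.endswith("\n") or rest == "" or rest.startswith("\n"):
        return "end of paragraph"
    last_char = typed.rstrip(" ")[-1:]  # last non-space character typed
    if last_char in SENTENCE_END:
        return "end of sentence"
    if typed.endswith(" ") or not rest[0].isalnum():
        return "end of word"
    return "middle of word"


def is_natural_break(point: str) -> bool:
    # Every interruption point except the middle of a word is a natural task break.
    return point != "middle of word"
```

Counting how often `is_natural_break` holds for each notification style gives the comparison used to evaluate the second hypothesis.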

The final metric was page completion time. “Page completion time was measured as the delay in seconds from when the user pressed the “Start” button and began transcribing the text to when they pressed the “Next” button, indicating page completion.”[6] On average, users completed their tasks 5% faster with slow-growth notifications than with pop-ups. The author did not consider this a large enough improvement to support hypothesis #3.

3 Conclusion

While interruptions are a part of life, we have illustrated that they can have a negative effect on productivity in the workplace. The goal, however, is not to ignore them but to minimize their effect on productivity. This can be done through a number of software- and sensor-based solutions.

Although there may be a gap between what low-level sensors and programs can infer and a user's actual willingness to be interrupted, evidence suggests this gap is closing. Fogarty et al. have shown that by interpreting data from low-level sensors with a machine learning technique, computers can perform better than humans at determining the interruptibility of another human. Wilson has shown that simply presenting notifications gradually, rather than having them pop up on a user's screen, can reduce the disruptiveness of a notification and allow the user to deal with it at a natural task break.

4 References

1. J. Fogarty, S. E. Hudson, C. G. Atkeson, D. Avrahami, J. Forlizzi, S. Kiesler, J. C. Lee, and J. Yang, 2005, “Predicting Human Interruptibility with Sensors,” March 2005.

2. B. P. Bailey, J. A. Konstan, and J. V. Carlis, 2000, “Measuring the Effects of Interruptions on Task Performance in the User Interface,” In Proc. IEEE Conf. on Systems, Man, and Cybernetics 2000, pp. 757–762.

3. S. M. Hess and M. C. Detweiler, 1994, “Training to Reduce the Disruptive Effects of Interruptions,” Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting.

4. D. McFarlane, 1999, “Coordinating the Interruption of People in Human-Computer Interaction,” Human-Computer Interaction – INTERACT ’99, IOS Press, Inc., The Netherlands, pp. 295–303.

5. E. Cutrell, M. Czerwinski, and E. Horvitz, 2001, “Notification, Disruption, and Memory: Effects of Messaging Interruptions on Memory and Performance.”

6. T. Wilson, “Reducing the Cost of Interruption Using Gradual Awareness Notifications”

7. J. Christensen, J. Sussman, S. Levy, W. E. Bennett, T. V. Wolf, and W. Kellogg, 2006, “Too Much Information,” Human-Computer Interaction ’06, Vol. 4, No. 6, July/August 2006.

8. J. Hudson, J. Christensen, W. A. Kellogg, and T. Erickson, 2002, “I’d be Overwhelmed, but It’s Just One More Thing to Do: Availability and Interruption in Research Management,” In Human Factors in Computing Systems, CHI 2002 Proceedings. New York: ACM Press.