

Sasa Junuzovic

1. Introduction

With the release of every new or improved computer product (both software and hardware) comes an eighteen-wheeler trailer loaded with claims such as “upgraded performance,” “new features,” and “improved user-friendliness.” These claims, whether they are true or not, can be substantiated only once consumers start using the product. To people’s surprise and disappointment, the claims that ship with a product are often a distant cousin of the actual change. For example, a new version of a software product may be advertised as having improved performance; unfortunately, to the dismay of the users who buy the software, the improvement is often in large part the result of increased minimum hardware requirements for the new version. Even in the cases where the claims are correct, the features they tout usually help only select user groups. This is especially true of the “user-friendly” buzzword when the most diverse and inclusive segment of the population is considered – people with disabilities. What good is the Internet to a person who has no motor capabilities other than controlling a single switch? What good is a cool and sleek computer mouse to people who do not have the fine motor capabilities required to navigate the mouse pointer? And what good is a graphical user interface to a blind computer user?

Thankfully, the need for software and hardware to support these users is becoming recognized, as evidenced by the emergence of screen-readers and improved accessibility options in mainstream operating systems. In addition, course offerings on enabling and assistive technologies are making their way into both graduate and undergraduate computer science programs. In one such class, students created projects that take giant leaps toward solving some of the problems posed above. The Hawking Toolbar, for example, inspired by Stephen Hawking, offers switch-activated Internet browsing. Another project, the Head Typer, enables people with extremely poor motor capabilities to control the mouse pointer through head gestures. And yet another project, JerryFreeChat, is a chat client that relies on no visual interfaces so that it can be used by visually impaired or blind people. All of these projects (and many more not mentioned) are taking steps toward improving enabling technologies. In many ways, they are opening new doors to people with disabilities. Here, I focus on JerryFreeChat, the chat client for blind people.

2. Motivation

There are many applications that need to be made accessible to blind people: email clients, word processors, and Internet browsers are just some of them. With such a wide range of useful applications, some may question my choice of a chat client. However, this selection is justified for the following reasons:

  1. Screen-readers do not work well with chat clients or chat rooms: The main reason screen-readers do not work well in chat rooms is that the volume of new text is very large and the screen-readers cannot keep up. As for chat clients, screen-readers offer no support for multiple conversations because they do not understand the underlying semantics of the applications they support. Thus, they cannot distinguish between different conversations or recognize events such as a user coming online or a new conversation being started in a chat application.
  2. Current work on non-graphical user interfaces is focusing on non-interactive software such as Outlook and Internet Explorer [2]: While this work is of tremendous importance to improving the user interfaces of all applications, I believe that interactive applications, such as a chat client, pose a unique set of problems that need to be solved. An example of a problem unique to a chat client is handling multiple conversations: How are multiple conversations presented in an understandable way without having a graphical interface?
  3. Blind children often have little to no interaction with their friends once they go home: Children today often interact with their friends through various chat clients. Since screen-readers do not work well with chat applications, blind children are effectively cut off from interacting with their friends, both sighted and blind. Hence, providing blind children with a chat client they can use will hopefully offer them a new way of interacting with their friends.
  4. My research is on Distributed Collaboration, and a large part of this field concerns interactive applications: This summer, my internship will involve adding various collaboration awareness features to a co-editor. JerryFreeChat is an application that focuses on providing sound-based collaboration awareness. I want to use the sound-based awareness experience from JerryFreeChat to add to the repertoire of awareness tools available for my summer project.

3. Chat Issues

Current enabling technologies do not work very well with chat applications. While in more traditional applications, such as a word processor, the enabling technology needs to react only to a single user making inputs, this is not the case in chat applications. In a chat application, the local user is interacting with one or more remote users and as a result, the enabling technologies have to solve the following new problems:

  1. Since multiple users are interacting with the local application, the enabling technology must handle external events that occur asynchronously to the events caused by the actions of the local user.
  2. Interactive applications necessarily have multiple threads of execution (e.g., multiple conversations in a chat client). The enabling technology must present these threads of execution in a manner that allows the local user to distinguish between and interact with them.

In a way, the first problem is a more specific instance of the second problem, because multiple threads of execution imply some sort of asynchrony. As a result, a solution to the second problem should solve the first problem, as well.

Since JerryFreeChat targets blind users, non-visual channels of human-computer interaction need to be explored. One such channel is audio, and it was used to make JerryFreeChat accessible to blind users. The solution adopted in JerryFreeChat is that of spatial audio. Before I describe the use of spatial audio, I briefly describe the system architecture and, in particular, its portable design. I then show how audio, and in particular spatial audio, is used to make JerryFreeChat accessible to blind people. After this, I explain browsing the conversation history and the user controls. The document finishes with directions for future work and some of the problems I encountered that need to be considered by anyone extending this work.

4. Architecture

In order to demonstrate the usefulness of the JerryFreeChat audio interface, I chose to write a simple chat client and server program instead of adapting an existing application. I believe that extending an existing application would have distracted from creating a novel audio interface: understanding somebody else’s code and then modifying it is often time-consuming. Since the focus of my project was the user interface (and not the underlying communication), I simply wrote my own chat application.

4.1. The Chat Server

The server I implemented supports only the following operations:

  1. Login and logout
  2. Query remote user information
  3. Query if a remote user is online
  4. Query all remote users that are online

The chat server is centralized, and no caching of remote user information is done on the clients. For example, if a chat client sends two consecutive text messages to the same remote user, both messages first query the remote user’s information (such as host and port) on the server. This does not scale well, but the simple design spared me the overhead of dealing with replication protocols.
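The server's four operations amount to a very small interface. Below is a minimal Python sketch of the centralized design described here; the actual project is written in C#, and all names in the sketch are hypothetical:

```python
class ChatServer:
    """Centralized registry of online users; clients cache nothing."""

    def __init__(self):
        self.online = {}  # user name -> (host, port)

    def login(self, name, host, port):
        self.online[name] = (host, port)

    def logout(self, name):
        self.online.pop(name, None)

    def query_user(self, name):
        # Returns (host, port) for a remote user, or None if offline.
        return self.online.get(name)

    def is_online(self, name):
        return name in self.online

    def query_all_online(self):
        return sorted(self.online)
```

Because every send first calls `query_user`, the clients never hold stale host/port information, at the cost of one extra round trip per message.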

4.2. The Chat Client

The client I implemented allows the following calls to be made by the UI:

  1. Send text message to a remote user
  2. Logout
  3. Get the list of all online users

The client performs a login upon instantiation, which is why it does not expose login functionality. Also, even though the server can reply to single-user status queries, the client (at the moment) does not allow the UI to perform such a call; instead, the UI can ask for the list of all online users.

The client also fires notification events when one of the following occurs:

  1. A text message is received
  2. The status of a conversation changes
  3. The status of a user changes
  4. A reply to the user status query is received

Hence, the UI that interacts with the chat client can register for and catch these events in order to convey them to the user.
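The client's three calls and four notification events form a small event-emitter interface. Below is a self-contained Python sketch; the real client is C#, the `directory` dict stands in for the centralized server, and all names are hypothetical:

```python
class ChatClient:
    """Sketch of the UI-facing client: three calls, four notification events."""

    EVENTS = ("message_received", "conversation_status_changed",
              "user_status_changed", "user_status_reply")

    def __init__(self, name, directory):
        # `directory` stands in for the centralized server: name -> (host, port)
        self.name, self.directory = name, directory
        self.handlers = {e: [] for e in self.EVENTS}
        self.directory[name] = ("localhost", 0)  # login happens on instantiation

    def register(self, event, handler):
        self.handlers[event].append(handler)

    def _fire(self, event, *args):
        for h in self.handlers[event]:
            h(*args)

    def send_text(self, to, text):
        host, port = self.directory[to]  # no caching: query per message
        return (host, port, text)

    def get_online_users(self):
        return sorted(self.directory)

    def logout(self):
        del self.directory[self.name]

    # Called by the network layer when a remote message arrives.
    def on_incoming(self, sender, text):
        self._fire("message_received", sender, text)
```

A UI layer registers a handler per event and is called back whenever the event fires, which matches the register-and-catch pattern described above.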

4.3. The Basic UI

The Basic UI is effectively a text UI for the underlying chat client. It can make calls to the chat client and handle notification events fired by the chat client. The key aspect of the Basic UI is that it is designed in a manner that allows non-text-based UIs to extend and override any of these capabilities. For example, the Audio UI (described below) overrides the chat client notification event handlers of the Basic UI. A diagram representing the three project parts described so far is shown below in Figure 1.

Figure 1. Chat Server, Chat Client, and Basic UI Architecture

The Basic UI is also responsible for maintaining conversation data and status information, as well as the state of the system (mode, current conversation, etc.). Any UI that extends the Basic UI has access to this information.

The Basic UI has two modes: a main menu mode and a chat mode. While in the main menu mode, the user can perform operations such as querying the users currently online, starting a conversation with a remote user, and signing out of the chat server. The user can also switch from the main menu mode into chat mode. While in chat mode, the user can navigate through the ongoing conversations, send text messages, and read the history of the current conversation he or she is in. Of course, the user can switch back into main menu mode as well.
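The override mechanism described above can be sketched as a pair of classes, where the audio layer replaces the text layer's event handlers. (Python sketch; the project itself is C#, and the class and method names are hypothetical.)

```python
class BasicUI:
    """Text UI; maintains mode and conversation state, handles client events."""

    def __init__(self):
        self.mode = "main_menu"
        self.conversations = []  # conversation state shared with subclasses
        self.output = []

    def switch_mode(self):
        # Toggle between the two modes: main menu and chat.
        self.mode = "chat" if self.mode == "main_menu" else "main_menu"

    # Event handler that subclasses may override.
    def on_message(self, sender, text):
        self.output.append(f"{sender} says: {text}")


class AudioUI(BasicUI):
    """Overrides the text handlers to speak instead of print."""

    def on_message(self, sender, text):
        self.speak(f"{sender} says: {text}")

    def speak(self, text):
        self.output.append(("SPEECH", text))  # stand-in for text-to-speech
```

A graphical UI could subclass `BasicUI` in exactly the same way, which is the extension point mentioned in Section 4.4.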

Implementing my own chat client has a further advantage: the Basic UI (and hence the Audio UI) is portable to any chat client in use today. I designed the Basic UI to assume only basic chat functionality in a chat client: login/logout, start/end conversation, send message, and query remote user status. Furthermore, I assumed the chat client could notify the UI of only the following events: login/logout status, incoming text message, conversation status change, and remote user status change. By writing my own chat application, it was easier to stay focused on the UI features that rely only on this subset of chat application capabilities instead of worrying about things such as file transfer notifications and starting audio/video conversations.

Thus, with access to the source code of any chat client, only a wrapper needs to be written that allows my Audio UI to interact with the underlying chat application. Although I have not written such a wrapper, creating one should not be too difficult; in fact, it could be an interesting extension of this project. A diagram of the MSN wrapper's role is shown in Figure 2.

Figure 2. The MSN Wrapper Functionality

4.4. The Audio UI

The Audio UI extends the Basic UI by converting into speech the text messages the Basic UI outputs on events, errors, and operations. The Audio UI also provides a way for the user to know which mode the chat client is currently in, as well as to distinguish between the messages and events belonging to different concurrent conversations. I introduce the Audio UI here simply to show its place in the system architecture; in the next section, I delve into its details. Figure 3 displays the relationship between the Basic UI and the Audio UI. Potentially, a graphical UI could be developed as an extension to the project. This would allow both blind and sighted users to use the application in whichever way is easiest for them.

Figure 3. Audio UI Extension of the Basic UI

5. Audio UI Functionality

There are two key properties of the Audio UI in JerryFreeChat: locality and spatial audio. These two properties allow a user to always know what part of the application is currently active and to distinguish between text messages and notifications belonging to distinct concurrent conversations.

5.1. Locality Audio

One problem a blind user has when using any application is keeping track of which part of the application is currently active. For example, suppose a user decides to listen to a menu of possible options, but while the menu options are being translated into speech, the user is interrupted by a phone call. When the user returns, he or she may have forgotten that a menu option must be selected and may proceed to use the application as if no menu selection needs to be made. This is not inconceivable, as the interruption may last much longer than it takes to translate the menu options into speech. However, if some menu-specific sound is played in the background whenever a menu choice needs to be made, the user will always know that he or she must select a menu option, regardless of how long the interruption actually is.

This is the approach used in JerryFreeChat to help orient the user as to which mode the application is currently in. For example, when the user is in the main menu mode, a music beat plays quietly in the background so that the user can immediately tell that the main menu mode is active. When the user switches into chat mode, the music beat of the main menu mode stops, and the user hears chatter in the background instead. Hence, with this simple use of background audio, the user can always tell which mode the chat application is currently in.
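This mode-to-background-loop mapping can be sketched as a tiny state machine. (Python sketch; actual playback is stubbed out, and the loop file names are hypothetical.)

```python
# Hypothetical background loops, one per mode.
BACKGROUND = {"main_menu": "music_beat.wav", "chat": "chatter.wav"}


class LocalityAudio:
    """Loops a mode-specific background sound so the user always knows the mode."""

    def __init__(self):
        self.playing = None

    def enter_mode(self, mode):
        loop = BACKGROUND[mode]
        if loop != self.playing:
            self.stop()          # stop the old mode's loop...
            self.playing = loop  # ...and start the new one (playback stubbed)

    def stop(self):
        self.playing = None
```

The essential property is that exactly one loop plays at a time and it is determined solely by the current mode, so the sound itself answers the question "where am I?"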

NOTE: At the moment, there is no specific background audio played when the user is in the middle of entering a command. For example, when a user decides to start a conversation with a remote user, the remote user’s name needs to be entered next. A specific sound for “currently starting a conversation” should be played in the background to remind the user that he or she is in the middle of starting a conversation in case they are interrupted and forget. Similarly, other commands, such as querying the online users, also need specific audio to represent them. But even though this functionality does not exist, the client is designed to tolerate errors, so the worst thing that can happen is that the user will hear an error message, from which the user can recover easily.

5.2. Spatial Audio

5.2.1. Speech

The use of spatial audio to help distinguish between events and messages of various concurrent conversations is the novel contribution of this project. The spatial audio is used in the following manner:

  1. The current ongoing conversations are arranged in a logical semicircle in front of the local user.
  2. The conversation the local user is currently in has the center audio channel dedicated to it, and all of its messages, events, and history readings appear to come from in front of the user. This conversation is called the center conversation.
  3. The audio from conversations to the left of the center conversation is played so that, to the local user, the sound appears to come from the left. The further to the left of the center conversation a conversation is, the more to the left its audio appears to come from. The conversation furthest to the left, for example, has its audio played only in the left speaker.
  4. Conversations to the right of the center conversation are handled just like the conversations to the left, except their audio appears to come from the right speaker.
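Rules 1–4 above can be captured by mapping each conversation's offset from the center conversation to a stereo pan value in [-1, 1], where -1 means the left speaker only and +1 the right. (Python sketch; this particular mapping is an assumption, not necessarily the exact one JerryFreeChat uses.)

```python
def pan(index, center, total):
    """Pan for conversation `index` in the semicircle of `total` conversations.

    0.0 is the center channel; the leftmost (rightmost) conversation is
    pushed fully to the left (right) speaker.
    """
    if index == center:
        return 0.0
    if index < center:
        return -(center - index) / center           # conversations to the left
    return (index - center) / (total - 1 - center)  # conversations to the right
```

For example, with three conversations and the middle one as the center, the left and right neighbors land fully in the left and right speakers, matching Figure 4.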

Figures 4 and 5 show a graphical representation of the conversation arrangement and how, as the user changes the center conversation, the direction from which each conversation’s audio comes changes.

Apart from the audio belonging to the various conversations, the user may also be given feedback from the chat client itself when the user performs an operation or a conversation-independent event (such as a user logging on) occurs. We call this feedback control feedback, as it pertains more to the chat client’s functionality than to separating conversation events with spatial sound. The control feedback audio is always provided through the center channel. Hence, while the user is in a conversation, the audio resulting from translating the conversation text messages into sound may conflict with the control channel audio. We describe how we solve this problem below.

Figure 4. Spatial Sound Illustration with the Middle Conversation as the Center Conversation

Figure 5. Spatial Sound Illustration with the Left Conversation as the Center Conversation

5.2.2. Notifications

Audio notifications (of unanswered messages or new conversations) are played from the same audio channel as their corresponding conversation. Unlike speech audio, however, which is usually played only once, a notification sound is repeated until the user switches to the conversation for which the notification is being played. For example, when the user to the right sends a message, the user will hear “User X says: test.” This message is not repeated. However, the notification for the unanswered message will be replayed again and again until the user actually resolves it by making the rightmost user’s conversation the center conversation.

NOTE: During the Maze Day demos, I asked the users (both children and grown-ups) how they felt about repetitive notifications. Some of them liked them, and some of them hated them. As a result, an option needs to be added that allows toggling the repetitive notifications on and off.

Please note that a conversation can have at most one notification. Thus, if a new conversation is started, regardless of how many messages the remote user sends, the local user will hear only the single “new conversation” notification for that conversation. For answered conversations (ones that the local user selected as the center conversation at some time in the past), the interesting notifications are unanswered messages. Again, regardless of how many unanswered messages there are for any given conversation, only one notification will be played for that conversation. By limiting the number of outstanding notifications per conversation to one, unnecessary clutter in the “audio space” is reduced. After all, one notification is enough to inform the local user that his or her attention is needed in a conversation.

Before I proceed to further enhancements to representing conversation events with spatial audio, I offer an example scenario. Since I have no way of providing an actual audio sample in this document, I instead describe a common scenario in order to further explain what the user hears during a chat session.

Example: Suppose the local user, D, is chatting with A, B, and C. Suppose also that the conversations were started in this same order (first with A, then with B, and then with C). Hence, to user D, it appears that users A, B, and C are standing in a semicircle with user A on the left, user B in the middle, and user C on the right. Suppose user D is currently talking to user B. If user A now messages user D, user D will hear “A says: text message” coming from the left speaker. Of course, if user C sends a text message to D, D will hear the readout of C’s message from the right speaker. However, now user D is chatting with user B, but there are outstanding messages (messages user D has not responded to) from both users A and C. As a result, the Audio UI plays notification sounds informing user D that there are unanswered messages: a notification sound plays from the left speaker (A’s message) and from the right speaker (C’s message). Suppose user E now starts a conversation with the local user D. To user D, E now appears at the rightmost point of the conversation semicircle, and D hears a new-conversation notification sound coming from the right speaker (when the conversation started, D also heard “E wants to chat” from the control channel). Since E is now the rightmost user, the notification for user C’s unanswered message will now sound as if it is coming from between the audio for the current conversation with B and the notification sound for the new conversation with E. Hence, the spatial audio is adjusted dynamically as conversations are created and destroyed.

5.2.3 Further Enhancements to Spatial Audio

Using a Variety of Voices

Above, I mentioned that the control channel audio and the current conversation audio can interfere with each other, since they both come from the center channel. This problem also occurs with audio from different conversations, especially if events for more than two conversations occur at the same time.

In order to deal with this problem, two additional techniques, apart from pure spatial sound, are used to differentiate events belonging to multiple conversations:

  1. Each conversation is assigned a unique voice. The voice is randomly generated when the conversation starts: a random speaker and a random rate of speech are picked.
  2. Volume is used to further differentiate the speakers. The center conversation’s audio is played at normal volume, while the further away a conversation is from the center conversation, the lower the volume of its audio. This applies not only to speech, but also to notifications.
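The two techniques above can be sketched as follows: a voice/rate pair is fixed randomly when a conversation starts, and volume falls off with distance from the center conversation. (Python sketch; the voice names, rate range, and falloff factor are assumptions, not the project's actual values.)

```python
import random

VOICES = ["Anna", "Ben", "Clara", "David"]  # hypothetical TTS voice names


def assign_voice(rng=random):
    """Pick a random speaker and speech rate once, when a conversation starts."""
    return rng.choice(VOICES), rng.uniform(0.8, 1.3)


def volume(index, center, falloff=0.6):
    """Full volume at the center; each step away multiplies volume by `falloff`."""
    return falloff ** abs(index - center)
```

Because the voice is fixed per conversation, the same speaker identity persists across all of that conversation's messages, which gives the listener a second cue besides direction.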

These two enhancements appear to make it easier to differentiate audio belonging to different conversations. However, no user studies have been performed, so there has been very little user feedback (I have briefly talked with Jason during and after my in-class demo).

NOTE: During the Maze Day demo, I set up some of the conversations so that the users would hear events for multiple conversations at once. At first, they had difficulty understanding multiple text messages. However, the users who chatted for a while quickly learned how to distinguish between the multiple messages. This offers some evidence that spatial sound with the enhancements described here is an effective way of distinguishing the events of different concurrent conversations.

Random Backoff for Notifications

During the initial tests of the Audio UI, I discovered that at times multiple notifications were difficult to distinguish from each other, especially if they occurred at the same time. For example, if there were outstanding messages for multiple conversations and the corresponding notifications were all played at the same time, it sounded as if only one conversation had an unanswered message. Moreover, it seemed as if this conversation were the center conversation, since the sound appeared to come from the center channel.

To solve this problem, notifications are not repeated at regular intervals. Instead, each notification waits a random 1 to 5 seconds before repeating itself. While this may still result in overlapping notification sounds, most of the time notifications from different conversations will start at different times, making it easier to distinguish between them.
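The backoff can be sketched as scheduling each pending notification's next repetition a uniformly random 1 to 5 seconds in the future, so notifications for different conversations drift apart. (Python sketch; the real implementation is event-driven C#.)

```python
import random


def schedule_repeats(now, count, rng=random):
    """Return the next play time for each of `count` pending notifications.

    Each notification independently waits a uniform 1-5 s before repeating,
    which makes simultaneous replays unlikely.
    """
    return [now + rng.uniform(1.0, 5.0) for _ in range(count)]
```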

5.2.4. Other Uses of Audio

Default Audio Actions

In order to help the user navigate conversations, every time the user switches the center conversation, the user will hear:

  1. Who the new center conversation is with.
  2. The last message in the conversation and the user who entered that message.

This default action of playing the last message when a conversation becomes the center one is based on my own experience of using chat clients: whenever I switch conversations, I read the last couple of messages to remind myself what I said or what question the remote user was asking me.

NOTE: During Maze Day, the users really liked having the last message of a conversation read back to them. It helped inform them of who they were currently talking to in case the computer pronounced the remote user's name incorrectly or unintelligibly.

Main Menu Audio

Apart from hearing the background beat that characterizes the main menu mode, the user is also given feedback for any input.

For example, consider the scenario in which the local user would like to start a conversation with a friend who is online. The user first requests the status of online users, to which the following feedback is given: “Do you want to query the online users?” is read by the control channel voice. If the local user enters ‘n’, the command is cancelled; if the user enters ‘y’, the query is performed and the user shortly hears “The current users who are online are …” Now suppose that Johnny is online and that the local user would like to start a conversation with Johnny. When the local user enters the command to start a conversation, the control channel voice will say “Who do you want to chat with?” When the user enters the name, the control channel voice will say “What greeting do you want to send?” Finally, when the user enters the message to send, the control channel will say “Do you want to send to Johnny?” If the user enters ‘y’, the message will be sent to Johnny. Similarly, when the user is in chat mode and wishes to send a message to the remote user, the local user will hear the same “Do you want to send to …?” prompt.

NOTE: While the commands may seem counter-intuitive, the Maze Day children who had chat experience had no problem picking them up. The only users who had difficulty with the commands (such as starting a conversation) were the children who had never used a chat client before. This offers some evidence that the controls are not too difficult and can probably be learned quickly. The coolest part was that once I showed a child who had never chatted before how to send a message in a conversation and how to switch between multiple conversations, I could simply stand back and watch them chat; they needed no further help.

Stopping the Speech

One of the most important functions any screen-reader provides is the ability to stop all current speech. The Audio UI also allows the user to stop all speech. However, a new feature, possibly unique to chat, is the ability to stop all speech that is not related to the center conversation. Hence, if the user is chatting with someone and other messages start to come in, the user has the ability to silence all of them in order to focus on his or her current center conversation.

NOTE: The Maze Day users really liked the option of stopping all speech from the non-center conversations.

Error Feedback

Since there are actions the local user can attempt that cannot be completed, the local user is given audio feedback for errors as well. For example, if the user attempts to send a message to Sue, and Sue is currently offline, the local user will hear the control voice report the error. Similarly, other commands that can result in errors, such as trying to move past the starting point of the conversation history (see Conversation History), will also result in the control voice explaining to the user what went wrong.

6. Conversation History

Browsing the conversation history is another very interesting aspect of a chat application, for three distinct reasons. 1) First and foremost, how does a blind user read the conversation history at all? 2) A sighted user can use the mouse to quickly jump through the conversation, easily skipping over blocks of no interest; what can be done to make history browsing similarly efficient for blind users? 3) Browsing history invariably requires an audio readout of the messages in the history; what is the best approach to minimizing the disruption of incoming messages while browsing history, yet still allowing the user to know there are outstanding messages?

The answer to the first question requires possibly the least imagination. The obvious solution I adopted is to give the user the ability to go backward and forward in history. Each time the user moves through the history, the current message “block” is read to the user. This idea of a “message block” is also used to answer the second question. Furthermore, when the user reaches the first (last) message in the history and attempts to move further back (forward), the Audio UI notifies the user of the erroneous operation.

As hinted earlier, the “message block” idea is used to give the user a more efficient way of browsing the history. We explain what a message block is via an example.

Example: Consider a hypothetical chat conversation between Johnny and Alexa.

Johnny: Hi Alexa
Alexa: Hi Johnny
Johnny: I am starving.
Johnny: Would you like to go for dinner?
Alexa: Sure, where would you like to go?
Johnny: I'm not sure, do you have a preference.
Alexa: How about Chinese?
Alexa: or Indian?
Alexa: or McDonald's?
Johnny: Indian sounds good to me.

Suppose Johnny is the local user and would now like to browse the conversation history. If Johnny goes back in history, the first message he hears could be “Alexa says: or McDonald’s?” If he goes back further, the next message he hears could be “Alexa says: or Indian?” Clearly there is something wrong. When Johnny goes back, he should really hear something like this: “How about Chinese? Or Indian? Or McDonald’s?”

A “message block” is defined as a consecutive run of messages by one user, uninterrupted by any message from any other user. Hence, there are seven message blocks in the above example, the longest of which is Alexa listing different kinds of restaurants for dinner.

We define a back (forward) move in history as having the effect of reading the previous (next) message block in the history to the user. This approach leverages the fact that consecutive messages by a user are often parts of a single larger message. As a result, it offers the user a more efficient way to go through the history.
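The grouping into message blocks can be expressed directly; applied to the Johnny/Alexa conversation above, it yields the seven blocks mentioned above. (Python sketch.)

```python
from itertools import groupby


def message_blocks(messages):
    """Group consecutive messages by the same sender into blocks.

    `messages` is a list of (sender, text) pairs; each block is a pair
    (sender, [texts]) that is read out as one unit when browsing history.
    """
    return [(sender, [text for _, text in group])
            for sender, group in groupby(messages, key=lambda m: m[0])]


chat = [
    ("Johnny", "Hi Alexa"),
    ("Alexa", "Hi Johnny"),
    ("Johnny", "I am starving."),
    ("Johnny", "Would you like to go for dinner?"),
    ("Alexa", "Sure, where would you like to go?"),
    ("Johnny", "I'm not sure, do you have a preference."),
    ("Alexa", "How about Chinese?"),
    ("Alexa", "or Indian?"),
    ("Alexa", "or McDonald's?"),
    ("Johnny", "Indian sounds good to me."),
]

# Seven blocks; the longest is Alexa's list of restaurants.
blocks = message_blocks(chat)
```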

We finally get to the most interesting question regarding conversation history: what happens if the user is browsing the history of a conversation and the other user in this conversation sends a message or messages? The best way to answer this question would be to offer several solutions and then do a user study to see which one users feel works best. However, in the relatively short period in which the project was conceived, designed, and developed, a single author did not have time to perform user studies (possibly at nearby schools). As a result, I picked and implemented one approach, which I feel is a relatively good solution to interruptions while browsing conversation histories. The key idea behind the solution is “snapback scrolling.”

While a user is browsing a conversation history, we accept incoming messages for that conversation, but instead of translating the messages into speech, an unanswered-message notification sound begins to play for the center conversation. Note that this is the only time during which the user can hear the unanswered-message notification sound for the center conversation. As before, at most one notification plays even though there may be multiple outstanding unanswered messages. Suppose now that the user, even though he or she is browsing the history, wants to jump immediately to the unanswered messages. We offer a way for the user to perform a “snapback” to the unanswered message regardless of where in the history he or she currently is. Hence, in our example, if Alexa sends a new message while Johnny is hearing the readout of “Johnny says: I am starving. Would you like to go for dinner?”, then Johnny can immediately jump to the newest message as soon as he hears the notification sound that a new message has come in. I believe this solution is a further enhancement to efficient history scrolling.
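Snapback scrolling can be sketched as a history cursor with one extra operation that jumps to the newest message and clears the pending notification. (Python sketch; names are hypothetical.)

```python
class HistoryBrowser:
    """Cursor over message blocks, with a snapback to the newest message."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.pos = len(blocks) - 1   # start at the newest block
        self.pending = 0             # unanswered messages that arrived meanwhile

    def back(self):
        if self.pos == 0:
            return None              # the Audio UI reports the erroneous move
        self.pos -= 1
        return self.blocks[self.pos]

    def on_incoming(self, block):
        # While browsing, new messages only raise a notification...
        self.blocks.append(block)
        self.pending += 1

    def snapback(self):
        # ...until the user jumps straight to the newest message.
        self.pos = len(self.blocks) - 1
        self.pending = 0
        return self.blocks[self.pos]
```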

7. Controls

Another difficult aspect of a chat application for blind users is the controls. Any form of user input must be carefully analyzed and implemented, especially when customizable options are exposed to the user. One possibility is a JAWS-like approach to menus. However, the focus of my project was the output more than the input, so a full input solution is left as a further extension of this project. The user input controls here are simplified as much as possible while still allowing the user to perform all of the supported chat functionality.

The controls are shown in Figures 6 and 7 for the main menu and chat modes, respectively. Note that I attempt to reuse as few keys as possible across the two modes: it is a well-known phenomenon that assigning multiple meanings to the same button confuses users, which is what I am trying to avoid. Furthermore, the keys that are used in both modes either perform the same function or a similar type of function. For example, 0 always cycles between the main menu and chat modes, and 5 is always a reset-type action: in chat mode it snaps back to the current message in the conversation if the user is browsing the history, and in main menu mode it cancels an ongoing command and resets the chat client into a stable state from which the user can begin any command again.
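Since only keys 0 and 5 are described in the text (the full bindings live in Figures 6 and 7), a sketch of the two-mode dispatch covers just those; the action strings and class names are illustrative, not the project's code:

```python
MAIN_MENU, CHAT = "main menu", "chat"

class Controls:
    """Sketch of the two-mode key dispatch: key 0 means the same thing
    in both modes, and key 5 is a reset-type action in both."""

    def __init__(self):
        self.mode = MAIN_MENU

    def press(self, key):
        if key == "0":
            # Same function in both modes: cycle between the modes.
            self.mode = CHAT if self.mode == MAIN_MENU else MAIN_MENU
            return "switched to " + self.mode
        if key == "5":
            # A reset-type action in both modes.
            if self.mode == CHAT:
                return "snapback to current message"
            return "cancel command and reset"
        return "other binding (see Figures 6 and 7)"

controls = Controls()
cancel = controls.press("5")   # main menu mode: cancel/reset
controls.press("0")            # cycle into chat mode
snap = controls.press("5")     # chat mode: snapback
```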

Figure 6. Main Menu Mode Controls

Figure 7. Chat Mode Controls

8. Problems Encountered

This section is meant for anyone who would like to continue working on this project. It outlines some of the difficulties I encountered with C# and its libraries, along with the workarounds I created.

ArrayList Object:

Suppose an ArrayList is created to store objects of type MyType, which has two fields: an integer count and a string name. If an instance of MyType stored in the ArrayList is retrieved and its name field is modified, the assignment appears to work at first. However, as soon as control leaves the current function, the new value of name is lost and the old one reappears. This behavior is what one would expect if MyType is a value type (a struct): retrieving it from the ArrayList yields a copy, so changes to that copy are never written back to the list. To get around the issue, I had to modify a copy of the instance, remove the original from the ArrayList, and reinsert the modified copy at its original position.

DirectX Audio Library:

The managed DirectX libraries are still in a preliminary release stage, so it is no surprise that they have some unresolved problems. One such problem concerns the Audio object when it finishes playing the sound file it was given on creation: its state should be set to Stopped and it should fire a Stopping event, but neither happens in the release I was using (April 2005). As a workaround, every time I created an Audio object, I started a system timer set to expire exactly when the Audio object finishes playing. When the timer fired, I manually changed the state of the corresponding Audio object and released the file handle it had used to open and read its sound file.
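The pattern can be sketched in Python as an analogue of the C# workaround (the real code used a system timer and the managed DirectX Audio object; all names here are illustrative):

```python
import os
import tempfile
import threading
import time

class AudioWrapper:
    """Python analogue of the timer workaround: since the Audio object
    never reached the Stopped state or fired its Stopping event, a timer
    set to the clip's duration performs the state change and releases
    the file handle manually."""

    def __init__(self, path, duration_seconds):
        self.state = "Playing"
        self.file = open(path, "rb")  # handle held while "playing"
        self.timer = threading.Timer(duration_seconds, self._on_expired)
        self.timer.start()

    def _on_expired(self):
        # Manually do what the missing Stopping event should have done.
        self.state = "Stopped"
        self.file.close()

# Demo with a temporary stand-in for a sound file and a short "clip."
fd, path = tempfile.mkstemp()
os.close(fd)
clip = AudioWrapper(path, 0.05)
time.sleep(0.2)  # wait past the clip's duration
os.remove(path)
```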

Speech to DirectX Data Flow:

The ideal way to use the output of the text-to-speech engine as input to the DirectX Audio object would be to write the text-to-speech output into a memory buffer that the Audio object reads from. Unfortunately, this did not work: the Audio object threw an invalid-format exception every time it tried to read a memory buffer created by the text-to-speech libraries. As a result, I write the text-to-speech output to an audio file and then use that file as input to the DirectX Audio object. In this particular instance, I am not convinced the error was in the libraries (it could have been my fault as well), but debugging was next to impossible, so I chose the simpler file-based solution.

9. Future Work

Throughout the discussion above, I suggested various directions for future work.

One interesting idea would be to create wrapper classes for existing chat applications such as AIM, MSN, and Gaim, so that the Basic and Audio UIs I developed can be used with them. This is an important step because these existing chat applications have already solved communication problems that my simple chat client ignores: for example, it has no authentication and no ability to traverse firewalls or gateways.

Another extension would be to investigate ways to provide customizable settings to the users of JerryFreeChat. Is a JAWS-like approach the best, or are there more interesting solutions? One possibility is to use voice recognition; the commands would have to be kept simple enough that the voice recognition engines available today would, hopefully, work with any user immediately.

The most important future work is user studies. This application has attempted to solve, in novel ways, many of the issues a chat application poses for blind users, and user studies must be performed to judge the effectiveness of those solutions.

NOTE: The Maze Day children all really liked the chat client. In fact, when I came back upstairs after lunch, there was a lineup of new and repeat users. All of the children and adults (Lee and Donnie) asked when they could use it at home, and all of them also want to be able to use the Audio UI with their favorite chat clients like MSN and AIM. As a result, I will continue to work on this project on and off during the summer. When I return to Chapel Hill for the Fall semester, I will put more time into building the wrappers and improving the robustness of my chat client so that, hopefully, I can ship JerryFreeChat to the kids who wanted to use it at home by Christmas.

10. Using the Application

In this section, I outline the requirements for JerryFreeChat and how to start the chat server and chat clients. The chat clients automatically start together with the Audio UI.

Before being able to run JerryFreeChat, the following need to be installed:

  1. .NET Framework 1.1
  2. DirectX 9.0
  3. DirectX SDK (April 2005 or newer)
  4. SAPI 5.1

When running JerryFreeChat, you first need to start the chat server.

To run the JerryFreeChat chat server, enter the following command (using a command line tool) in the directory that contains the JerryFreeChat executable:

JerryFreeChat server &lt;host IP&gt; &lt;port&gt;

The host IP parameter is the IP address of the machine on which the server is running; hence, it should be the IP of the local machine. The port parameter is the port on which you would like the server to accept incoming connections.

To run the JerryFreeChat chat client, enter the following command (using a command line tool) in the directory that contains the JerryFreeChat executable:

JerryFreeChat client &lt;server IP&gt; &lt;server port&gt; &lt;user name&gt; &lt;client port&gt;

The server IP and server port values are those passed as parameters when starting the chat server. The user name parameter is the login name of the user. Please make sure that each chat client is given a unique login name; if a chat client attempts to log in as a user who is already logged in, the login fails. The client port parameter is the port on which you would like the chat client to listen for incoming messages.

At the moment, the demo runs fairly well, though there are some issues. For some reason, when I used the slow IBM laptop as one of the chat client stations, the application crashed after a while. This seemed to be a problem only during the afternoon part of the demo day; there were no problems in the morning. It may have had something to do with the fact that the old laptop had been running for a long time and had started to behave flakily. Overall, however, the chat application performed very well, judging by the reactions of the users to whom I demoed it.

11. References

1 http://www.kidstogether.org/pep-1st.htm

2 Peter Parente’s work.


The documentation for my project is available at: http://www.cs.unc.edu/~sasa/JerryFreeChat/JerryFreeChat-ProjectReport.doc

The in-class demo presentation for my project is available here: http://www.cs.unc.edu/~sasa/JerryFreeChat/JerryFreeChat-Presentation.ppt

The application code is available at: http://www.cs.unc.edu/~sasa/JerryFreeChat/JerryFreeChatCode.zip

The application is available at: http://www.cs.unc.edu/~sasa/JerryFreeChat/JerryFreeChatApplication.zip