Computer Science Students Association
unc chapel hill

Introduction to doing research
This is an introduction to research for new graduate students.

Many thanks to the authors Michele Weigle and David Ott. Thank you also to Prof. Ketan Mayer-Patel and Mark Lindsey for their suggestions. The original texts are at: reading, experiments and CVS.

Hints on Finding and Reading Academic Papers
Not long after you start your graduate studies at UNC, you'll be asked to read your first academic paper. It might be a course professor or your research advisor who asks you to do it, but it's guaranteed to happen.

While it might seem tough going at first, reading papers is one of the fun things about grad school. It's a chance to learn what's going on with research groups elsewhere, and to keep up with new ideas and approaches. As time goes on, you'll find that there's a core body of important papers in your subfield that provide a basis for newer work going on in the present. Knowing this body of work is key to having a sound understanding of your subfield.

Here are a few hints for the uninitiated on academic papers.

Types of Academic Papers

Papers typically come in one of three flavors: a workshop paper, a conference paper, or a journal paper.

Workshop papers often consist of incomplete ideas or work-in-progress reports. Workshops tend to be small, and sometimes have just the authors and a select, invited crowd who can give helpful advice. Some workshops are nearly as strong as conferences, though. For example, the International Workshop on Quality of Service (IWQoS) is a strong workshop in networking.

Conference papers are where it's at. Nearly everything good has been published at a conference at some point. IEEE and ACM conferences tend to be fairly good. In contrast, regional conferences are often not as good. Such conferences usually have names that include "directional" (e.g., The Southwest Conference on Multimedia) or "geographical" (e.g., Pan-Asian Conference on Multimedia) words in their title. This is not a hard-and-fast rule, of course, since some regional conferences are top notch.

Journal articles have the highest quality and are usually very thorough. But it takes a long time for a paper to appear in a journal, so while the information is thoroughly discussed, the ideas are often no longer new. Journal papers also tend to be fairly long. Nearly every journal article is based on one or more conference papers, so it's often better to find the original conference papers to quickly digest the main ideas.

How do I get a copy of a particular paper?

Life is good. Getting hold of a particular paper in Computer Science nowadays is usually very easy because of the Web.

I can nearly always find a paper by going to http://www.google.com and searching on the title in quotes. For example, if I'm looking for a paper entitled "Architectural Considerations for a New Generation of Protocols" by D. Clark and D. Tennenhouse, I'll search on "architectural considerations for a new generation".

The resulting hits are instructive. A very large number of papers in Computer Science are found in the database at http://citeseer.org, run by the NEC Research Institute in Princeton, New Jersey. This database is great because it not only gives bibliographic information and lets you download the paper in various formats, it also contains information on which papers cite this paper as a reference. This lets you sleuth for related papers and get a sense of the importance of this paper to a larger body of work in the area.

Another common type of hit is the homepage of one of the paper's authors. Most researchers in Computer Science have some form of home page which lists their publications and allows you to download them in PostScript or PDF format. This is great for getting to know the work associated with a particular individual.

Similarly, research groups often have a publications page that makes available various papers published by its members. For the above example, the Advanced Networking Architecture Group at MIT lists this paper on their publications page.

If you're lucky, the journal, conference, or workshop that published the paper put it online for you to download. That's rather lucky, however, as many conferences and journals charge a fee before you can access their papers online.

UNC libraries have a number of journal publications online at their e-journal site. This is a great way to browse papers when you're not sure what you're looking for or you're just trying to stay up-to-date with recent conference proceedings.

Finally, it happens every once in a while that the paper you need to read is so old, it isn't available in electronic format on the Web. In that case, you'll need to walk over to the Brauer Math-Physics Library in Phillips Hall (next door to Sitterson Hall), find the publication and photocopy it. Note that conference and workshop proceedings are usually kept in the stacks, while journals are kept in the reference section.

Which Papers to Read, and How to Read Them

Reading a paper in its entirety is a fairly serious time commitment. The situation seems particularly daunting for new students who feel overwhelmed by the number of papers in their subfield. How does one come up to speed?

Take heart. Reading every paper in your subfield in its entirety isn't necessary.

Professor Ketan Mayer-Patel suggests a series of steps for deciding whether to read a paper, and how much time to commit to it. The idea is to start at the top of the list and proceed to the next step only if the paper warrants it.

  1. Guess about the paper's relevance by the frequency of its citation. This can be done by looking at http://citeseer.org, or observing the bibliographies of other important papers informally. If the paper isn't very relevant, you'd be better off spending your time on one that is.
  2. Check the title, where and when it was published, and who its authors are. Papers by key people and in key conferences and journals give the paper a higher priority. Recent work should take precedence over older, outdated work. (Although sometimes it's important to know seminal papers from the past in your subfield.)
  3. Read the abstract and the first page. Does their problem approach make sense and address the issue you're interested in? Don't waste your time reading a paper in detail if it lacks applicability to your problem and approach.
  4. Read the section headings. What direction do they go with their approach, and what are their main contributions?
  5. Look at the pictures. What do various diagrams and plots actually show?
  6. Skim the paper. Look for the main conclusions and contributions. Avoid spending time on proofs, detailed derivations, etc.

  NOTE: Everything up to this point can be done in less than 10 minutes.

  7. If the paper still seems valuable after the above steps, read it carefully.

Hints for maintaining references to papers
During the course of grad school, you'll read lots of papers. You'll even want to remember something about many of these papers. Starting an annotated bibliography early will help you when you're ready to write a paper, your proposal, the related work section of your dissertation, anything.

You can either start a Word document with references and comments, or use BibTeX. If you have any inkling that you want to write papers and/or your dissertation in LaTeX, use BibTeX for your references. (Here's my BibTeX setup.) If you want to use Word, several folks at UNC use EndNote for organizing bibliographies (it's not free, but is sold at Student Stores).
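
As a rough illustration of what an annotated BibTeX bibliography can look like, the sketch below writes one entry to a file and then searches it from the shell. The entry uses the Clark and Tennenhouse paper mentioned earlier; the citation key, the venue and year (filled in from memory, so check them before citing), the annote text, and the temporary file location are all assumptions made for the example, not part of the original guide.

```shell
#!/bin/sh
# Sketch: keep an annotated bibliography as a plain BibTeX file.
# Hypothetical location; in practice this might be something like ~/papers/refs.bib.
bib=$(mktemp)

# The "annote" field is a standard BibTeX field that most styles ignore,
# which makes it a convenient place for your own notes on a paper.
cat > "$bib" <<'EOF'
@inproceedings{clark90architectural,
  author    = {David D. Clark and David L. Tennenhouse},
  title     = {Architectural Considerations for a New Generation of Protocols},
  booktitle = {Proceedings of ACM SIGCOMM},
  year      = {1990},
  annote    = {Argues for application level framing and integrated layer processing.}
}
EOF

# Because the file is plain text, ordinary tools can search your notes.
entries=$(grep -c '^@' "$bib")          # number of entries in the file
matches=$(grep -i -c 'framing' "$bib")  # lines mentioning a keyword
echo "$entries entry, $matches line(s) mentioning 'framing'"
```

Keeping the notes inside the .bib file means the same file serves both as your reading log and as the reference database for LaTeX.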

Experimental Research Hints

Write It Down!

At some point, you will run experiments and later need to reference them in a paper or progress report (or someone else in your research group will need to understand an experiment). You will be very happy if you logged the details at the time you ran it.

Lab notebooks are valuable, but web pages are great places to record this type of information. Not only can you search for keywords, but you can also refer others to your log and link in other documents (plus, if it's in AFS space, it's backed up). People often put pictures of graphs and other results on their experiment web page. Here's my online experiment log.
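
One low-effort way to keep such a log is a small shell function that appends a timestamped entry to a text file each time you run something. The following is only a sketch: the function name log_experiment, the fields it records, the sample experiment name and notes, and the log location are all hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: append one timestamped entry per experiment run to a plain-text log.
# Hypothetical location; in practice this would be a file in your web or AFS space.
LOG_FILE=$(mktemp)

log_experiment() {
    # $1 = short experiment name, $2 = free-form notes
    {
        printf '== %s ==\n' "$1"
        printf 'date:  %s\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        printf 'host:  %s\n' "$(uname -n)"
        printf 'notes: %s\n\n' "$2"
    } >> "$LOG_FILE"
}

# Example entry (made-up experiment name and notes).
log_experiment "tcp-baseline-01" "10 flows, 100 ms RTT; raw traces in traces/run01/"

# Show the log so far.
cat "$LOG_FILE"
```

Because each entry starts with a recognizable header line, grep can later pull out every run of a given experiment.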

Essential Tools

Archival Data Storage is Plentiful

After you've run lots of experiments, you may find you need additional disk space. ATN offers a mass storage system that uses SAM-FS. After your data has been in the mass storage system for 14 hours, it is backed up to tape. Note that this system is only for long-term storage of seldom-used files.

Recommended Reading

Hints for using the CVS version control software
CVS ("Concurrent Versions System") is a version control system for software development projects. It allows you to keep change histories on individual source files, and to tag a particular snapshot of all files as belonging to the same release. It also supports other operations useful in the context of team programming.

CVS is part of the GNU Project and is freely distributed in the open source community. (The GNU Project also produces many commonly used UNIX utilities such as emacs, make, gcc, and gdb. See their "Free Software Directory" for a complete list.)

Why use CVS?

CVS solves a number of problems in the context of software development.

Some basic commands

Here are a few commands to give you an idea of how to use CVS. This list is hardly complete, but is meant simply to illustrate several typical operations. (See the links in the next section for a more detailed treatment of CVS commands.)

First, set the CVSROOT environment variable to your CVS repository directory. This is where CVS will store source files, differences, version information, etc. In csh or tcsh (sh and bash users would write export CVSROOT=mycvsrootdir instead):

setenv CVSROOT mycvsrootdir

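To create a module in the repository from existing sources, cd to your source directory and use the command below. (If the repository directory itself has not been set up yet, run cvs init once first.)

cvs import -m MESSAGE MODULE VENDOR_TAG RELEASE_TAG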

Once a repository is created, you can cd to any directory and "checkout" the source tree. This will copy it into your directory.

cvs checkout [-r REV] [-d DIR] MODULE

To add source files to a project (the files are only scheduled for addition and enter the repository at the next commit):

cvs add -m MESSAGE file1 file2 file3

Remember that "checkout" in CVS means copying a source tree into your current working directory. Unlike some other source control systems, there is no concept of checking a file out before working on modifications in CVS. Instead, you simply make the modifications, and then "commit" the changes using:

cvs commit

To create a snapshot, you tag all files in the module:

cvs tag TAG_NAME

To look at a file's revision history:

cvs log file1
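
To see how these commands fit together, here is a minimal end-to-end sketch that can be run in a throwaway directory. It is written for sh rather than csh, so it uses export instead of setenv. The module name demo, the tags vendor, start and release-1-0, the directory sandbox, and the file main.c are all made up for the example. It also uses cvs init to set up an empty repository and cvs log to print a file's revision history.

```shell
#!/bin/sh
# Sketch: one pass through a typical CVS workflow in a temporary directory.
# All module, tag, directory, and file names are hypothetical.

cvs_demo() (
    work=$(mktemp -d) || exit 1
    CVSROOT="$work/cvsroot"
    export CVSROOT

    # Create an empty repository at $CVSROOT.
    cvs init || exit 1

    # Import an existing source directory as the module "demo".
    mkdir "$work/src" && cd "$work/src" || exit 1
    echo 'int main(void) { return 0; }' > main.c
    cvs import -m "Initial import" demo vendor start || exit 1

    # Check out a working copy into ./sandbox.
    cd "$work" || exit 1
    cvs checkout -d sandbox demo || exit 1
    cd sandbox || exit 1

    # Edit, commit, tag a snapshot, then show the file's revision history.
    echo '/* first edit */' >> main.c
    cvs commit -m "Add a comment" || exit 1
    cvs tag release-1-0 || exit 1
    cvs log main.c
)

# Only run the demo when cvs is actually installed.
if ! command -v cvs >/dev/null 2>&1; then
    demo_status=skipped
    echo "cvs is not installed; skipping the demo"
elif cvs_demo; then
    demo_status=ok
else
    demo_status=failed
fi
echo "demo status: $demo_status"
```

The function body runs in a subshell, so the cd calls and the CVSROOT setting do not leak into your interactive session.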

Useful CVS Links

GNU CVS manual
One of many online tutorials
CVS FAQ
CVS FAQ page written by UNC's DiRT research group

University of North Carolina at Chapel Hill
Computer Science, Sitterson Hall
Chapel Hill, NC 27599-3175 USA
Server Manager: webmaster@cs.unc.edu
Content Manager: cssa@cs.unc.edu