COMP 790-090 Research Seminar (GNET 713 BCB Module) Spring 2008
Data Mining: Concepts, Algorithms, and Applications

ANNOUNCEMENTS
  • April 16th: The project report deadline for COMP 790-90 is extended to 11:59PM Tuesday, April 29. The project report deadline for GNET 713 remains unchanged.
  • April 16th: The project presentation of COMP 790-90 will take place Monday April 28th 4-8PM in SN 115.
  • March 4th: The paper presentation should contain the following components:
    • Background information
    • Problem studied in the paper
    • Previous approaches to this problem and their pros and cons
    • Proposed method
    • Experiments and applications
    • Discussion: your thoughts, open questions, limitations, novel applications
    • References
  • February 10th: The project proposal should contain the following items:
    • a description of the problem you propose to study
    • existing approaches to this problem, and their pros and cons
    • your approach and its potential advantage
    • expected outcome
    • evaluation plan
  • February 3rd: The deadline for the project proposal is extended to Feb. 12th (Tuesday).
  • February 3rd: Some example projects are available.
  • January 22nd: Recommendations for paper presentation are available.
  • January 10th: The class will be held in SN 011.

    With data being collected at an unprecedented rate in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatically discovering patterns, changes, associations, and anomalies in massive databases. It is a highly interdisciplinary field at the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This seminar will provide an introductory survey of the main topics in data mining and knowledge discovery (including but not limited to classification, regression, clustering, association rules, trend detection, feature selection, similarity search, data cleaning, and privacy and security issues), as well as a wide spectrum of data mining applications such as bioinformatics, e-commerce, environmental study, financial market study, multimedia data processing, network monitoring, and social service analysis.

    The first half of the semester will cover the principles and algorithms of data mining; the second half will emphasize performance issues and applications of data mining. The lectures are based on a collection of journal and conference papers and book chapters. A number of guest lectures by faculty members in other fields or other departments will also be scheduled.

    Each student in COMP 790-90 is expected to present a paper, lead the discussion following the presentation, and do a project on a selected topic. Students in GNET 713 need to attend the first nine lectures and do a project on a selected topic. There will be no homework or exams.


    Credit Hours: 3 for COMP 790-90, 1 for GNET 713
    Location: SN 011
    Time: TR 11AM-12:15PM
    URL: http://www.cs.unc.edu/Courses/comp790-090-s08
    Instructor: Wei Wang
    Office: SN 329
    Email: weiwang@cs.unc.edu
    Voice: 1 (919) 962-1744
    Office Hours: By Appointment

    Prerequisite: None

    References: No required textbook

    Useful Links:

    Schedule

    DATE LECTURE NOTES READING PAPER PRESENTATION PROJECT
    Jan. 10 Introduction [PDF][PPT]; Association Rule (Part I) [PDF][PPT] Association Rules I
    Jan. 15 Association Rule (Part II) [PDF][PPT] Association Rules II
    Jan. 17 Association Rule (Part III) [PDF][PPT] Association Rules III
    Jan. 22 Association Rule (Part IV) [PDF][PPT] Association Rules IV Recommendations
    Jan. 24 Clustering (Part I) [PDF][PPT] Clustering I
    Jan. 29 Clustering (Part II) [PDF][PPT] Clustering II
    Jan. 31 Clustering (Part III) [PDF][PPT] Clustering III Last Day to Select Presentation Paper
    Feb. 5 Classification (Part I) [PDF][PPT] Classification I
    Feb. 7 Classification (Part II) [PDF][PPT] Classification II
    Feb. 12 Sequence Clustering [PDF][PPT] Sequence Clustering Project Proposal Due
    Feb. 14 BiClustering (Part I) [PDF][PPT] Bi-Clustering I
    Feb. 19 BiClustering (Part II) [PDF][PPT] Bi-Clustering II
    Feb. 21 Mining Complex Data (Part I) [PDF][PPT] Mining Complex Data I
    Feb. 26 Mining Complex Data (Part II) [PDF][PPT] Mining Complex Data II
    Feb. 28 Semi-supervised Learning [PDF][PPT]
    Mar. 4 Probabilistic Graphical Models [PDF][PPT]
    Mar. 6 Jens Rantil [PDF] [ODP] [PPT] Tracking multiple topics for finding interesting articles
    Mar. 11 Spring Break!
    Mar. 13 Spring Break!
    Mar. 18 Xin Huang [PDF] [PPT] Detecting anomalous records in categorical datasets
    Mar. 20 Eric La Force [PDF] [PPT] Show me the money!: deriving the pricing power of product features by mining consumer reviews
    Mar. 25 Tao Yu [PDF] [PPT] Detecting time series motifs under uniform scaling
    Mar. 27 Stephan Altmueller [PDF] Webpage understanding: an integrated approach
    Apr. 1 no class
    Apr. 3 Man Lou [PDF] [PPT] Efficient incremental constrained clustering
    Apr. 8 no class
    Apr. 10 no class
    Apr. 15 Ning Jin [PDF] [PPT] Association analysis-based transformations for protein interaction networks: a function prediction case study
    Apr. 17 Vishnu Konda [PDF] [PPT] SCAN: a structural clustering algorithm for networks
    Apr. 22 Ram Kumar [PDF] [PPT] From frequent itemsets to semantically meaningful visual patterns
    Apr. 24 Stephen Olivier [PDF] Parallel Mining of Frequent Closed Patterns: Harnessing Modern Computer Architectures
    Apr. 25 GNET 713 Final Project Due
    Apr. 28 COMP 790-90 Final Project Presentation, 4PM in SN 115: Eric La Force, Jens Rantil, Man Lou, Ning Jin, Ram Kumar, Stephan Altmueller, Stephen Olivier, Tao Yu, Vishnu Konda, Xin Huang
    Apr. 29 COMP 790-90 Final Project Due

    Wei Wang