COMP 790-090 Research Seminar (BCB 713 Module) Spring 2009
Data Mining: Concepts, Algorithms, and Applications

ANNOUNCEMENT
  • April 29th: The 5-minute oral presentations will start at noon. Please set up your poster between 11AM and noon; easels will be available at 11AM. Food and drinks will be provided.
  • April 29th: Final projects are due.
  • February 9th: Some project examples for COMP 790-90 are available. You are also welcome to suggest your own.
  • February 3rd: The deadline for the project proposal is extended to February 17th. Proposals are submitted by email.
  • February 3rd: Last day to select a paper for presentation.
  • January 27th: Recommended papers for presentation are available.
  • January 20th: Today's class is canceled.
  • January 13th: The class will be held in SN 011.

    With data being collected at an unprecedented rate in almost every field of human endeavor, there is a growing economic and scientific need to extract useful information from it. Data mining is the process of automatically discovering patterns, changes, associations, and anomalies in massive databases. It is a highly interdisciplinary field at the confluence of database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This seminar provides an introductory survey of the main topics in data mining and knowledge discovery, including but not limited to classification, regression, clustering, association rules, trend detection, feature selection, similarity search, data cleaning, and privacy and security issues. It also covers a wide spectrum of data mining applications, such as bioinformatics, e-commerce, environmental study, financial market study, multimedia data processing, network monitoring, and social service analysis.

    The first half of the semester will cover the principles and algorithms of data mining; the second half will emphasize performance issues and applications of data mining. The lectures are based on a collection of journal and conference papers and book chapters. A number of guest lectures by faculty members from other fields or departments will also be scheduled.

    Each student in COMP 790-90 is expected to present a paper, lead the discussion following the presentation, and do a project on selected topics. Students in GNET 713 need to attend the first nine lectures and do a project on selected topics. There will be no homework or exams.


    Credit Hours: 3 for COMP 790-90, 1 for BCB 713
    Location: SN 011
    Time: TR 11AM-12:15PM
    URL: http://www.cs.unc.edu/Courses/comp790-090-s09
    Instructor: Wei Wang
    Office: SN 329
    Email: weiwang@cs.unc.edu
    Voice: 1 (919) 962-1744
    Office Hours: By Appointment
    TA: Catherine Welsh
    Office: SN 323
    Email: cwelsh@cs.unc.edu
    Voice: 1 (919) 962-1983
    Office Hours: By Appointment

    Prerequisite: None

    References: No required textbook

    Schedule

    DATE LECTURE NOTES READING PAPER PRESENTATION PROJECT
    Jan. 13 Introduction [PDF][PPT]
            Association Rule (Part I) [PDF][PPT]
            Association Rules I
    Jan. 15 Association Rule (Part II) [PDF][PPT] Association Rules II
    Jan. 20 canceled
    Jan. 22 Association Rule (Part III) [PDF][PPT] Association Rules III
    Jan. 27 Association Rule (Part IV) [PDF][PPT]
            Clustering (Part I) [PDF][PPT]
            Association Rules IV
            Clustering I
            Paper Recommendation
    Jan. 29 Clustering (Part II) [PDF][PPT] Clustering II
    Feb. 3  Clustering (Part III) [PDF][PPT] Clustering III Last Day to Select Presentation Paper
    Feb. 5  Clustering (Part IV) [PDF][PPT] Clustering IV
    Feb. 10 Classification (Part I) [PDF][PPT] Classification I
    Feb. 12 Classification (Part II) [PDF][PPT] Classification II
    Feb. 17 Sequence Clustering [PDF][PPT] Sequence Clustering Project Proposal Due
    Feb. 19 Bi-Clustering I [PDF][PPT] Bi-Clustering I
    Feb. 24 Bi-Clustering II [PDF][PPT] Bi-Clustering II
    Feb. 26 Mining Complex Data I [PDF][PPT] Mining Complex Data I
    Mar. 3  Mining Complex Data II [PDF][PPT] Mining Complex Data II
    Mar. 5  Bacon, Kelli [PPT]
            Neyer, Mark Patrick [PDF]
            Microscopic evolution of social networks
            Influence and correlation in social networks
    Mar. 10 Spring Break!
    Mar. 12 Spring Break!
    Mar. 17 Liu, Yi [PPT]
            Zhang, Zhaojun [PDF]
            Learning classifiers from only positive and unlabeled data
            Structured metric learning for high dimensional problems
    Mar. 19 Desmarais, Bruce Albert [PDF]
            Wen, Xiaoyang [PDF]
            Dynamic Social Network Analysis using Latent Space Models
            The structure of information pathways in a social communication network
    Mar. 24 Hasan, Shaddi Husein [PDF]
            Libonati, Alana Marie [PDF]
            Combinational collaborative filtering for personalized community recommendation
            Can complex network metrics predict the behavior of NBA teams?
    Mar. 26 no class
    Mar. 31 Carter, Jason Lamont [PPT]
            Cochran, Robert Anderson [PDF]
            Examining Task Engagement in Sensor-Based Statistical Models of Human Interruptibility
            Anomaly pattern detection in categorical datasets
    Apr. 2  Semi-Supervised Learning [PDF][PPT]
    Apr. 7  Huang, Shunping [PPT] Simultaneous tensor subspace selection and clustering: the equivalence of high order SVD and k-means clustering
    Apr. 9  Selzo, Chris
            White, Andrew Maxwell
            The cost of privacy: destruction of data-mining utility in anonymized data publishing
            Composition attacks and auxiliary information in data privacy
    Apr. 14 Hopper, Steven Daniel [PPT]
            Mckenzie, Ryan Nicholas [PPT]
            Mining search engine query logs via suggestion sampling
            Finding relevant patterns in bursty sequences
    Apr. 16 Li, Peng [PPT]
            Spensky, Chad Samuel [PPT]
            Anonymizing transaction databases for publication
            Entity categorization over large document collections
    Apr. 21 Bethea, Darrell Joseph [PDF]
            Brian Marco
            Outlier-robust clustering using independent components
            Building semantic kernels for text classification using wikipedia
    Apr. 23 O'meara, Matthew James
            Singh, Darshan
            Mining significant graph patterns by leap search
            FastANOVA: an efficient algorithm for genome-wide association study
    Apr. 29 Project Poster Presentation
            12 noon - 3:30PM
            Final Project Due
