COMP 790-090 Research Seminar (BCB 713 Module) Spring 2009
Data Mining: Concepts, Algorithms, and Applications
ANNOUNCEMENT
April 29th: The 5-minute oral presentations will start at noon. Please set up your poster between 11AM and 12 noon; easels will be available at 11AM. Food and drinks will be provided.
April 29th: Final projects are due.
February 9th: Some project examples for COMP 790-90 are available. You are also welcome to suggest your own.
February 3rd: The deadline for the project proposal is extended to February 17th. Proposals are submitted by email.
February 3rd: Last day to select a paper for presentation.
January 27th: Recommended papers for presentation are available.
January 20th: Today's class is canceled.
January 13th: The class will be held in SN 011.
With data being collected at an unprecedented rate in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatically discovering patterns, changes, associations, and anomalies in massive databases. It is a highly interdisciplinary field at the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This seminar will provide an introductory survey of the main topics in data mining and knowledge discovery (including but not limited to classification, regression, clustering, association rules, trend detection, feature selection, similarity search, data cleaning, and privacy and security issues), as well as a wide spectrum of data mining applications such as bioinformatics, e-commerce, environmental study, financial market study, multimedia data processing, network monitoring, and social service analysis.
The first half of the semester will cover the principles and algorithms of data mining; the second half will emphasize performance issues and applications of data mining. The lectures are based on a collection of journal and conference papers and book chapters. A number of guest lectures by faculty members from other fields or departments will also be scheduled.
Each student in COMP 790-90 is expected to present a paper, lead the discussion following the presentation, and do a project on selected topics. Students in GNET 713 need to attend the first nine lectures and do a project on selected topics. There will be no homework or exams.
Credit Hours: 3 for COMP 790-90, 1 for BCB 713
Location: SN 011
Time: TR 11AM-12:15PM
URL: http://www.cs.unc.edu/Courses/comp790-090-s09
Prerequisite: None
Some background in algorithms, data structures, statistics, machine learning, artificial intelligence, and databases is helpful.
References: No required textbook
[1] Data Mining --- Concepts and Techniques, by Han and Kamber, Morgan Kaufmann, 2006. (ISBN:1-55860-901-6)
[2] Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006. (ISBN:0-321-32136-7)
[3] Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN:0-262-08290-X)
[4] The Elements of Statistical Learning --- Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN:0-387-95284-5)
[5] Mining the Web --- Discovering Knowledge from Hypertext Data, by Chakrabarti, Morgan Kaufmann, 2003. (ISBN:1-55860-754-4)
Schedule
| DATE | LECTURE NOTES | READING | PAPER PRESENTATION | PROJECT |
| Jan. 13 | Introduction [PDF][PPT]; Association Rule (Part I) [PDF][PPT] | Association Rules I | | |
| Jan. 15 | Association Rule (Part II) [PDF][PPT] | Association Rules II | | |
| Jan. 20 | canceled | | | |
| Jan. 22 | Association Rule (Part III) [PDF][PPT] | Association Rules III | | |
| Jan. 27 | Association Rule (Part IV) [PDF][PPT]; Clustering (Part I) [PDF][PPT] | Association Rules IV; Clustering I | Paper Recommendation | |
| Jan. 29 | Clustering (Part II) [PDF][PPT] | Clustering II | | |
| Feb. 3 | Clustering (Part III) [PDF][PPT] | Clustering III | Last Day to Select Presentation Paper | |
| Feb. 5 | Clustering (Part IV) [PDF][PPT] | Clustering IV | | |
| Feb. 10 | Classification (Part I) [PDF][PPT] | Classification I | | |
| Feb. 12 | Classification (Part II) [PDF][PPT] | Classification II | | |
| Feb. 17 | Sequence Clustering [PDF][PPT] | Sequence Clustering | | Project Proposal Due |
| Feb. 19 | Bi-Clustering I [PDF][PPT] | Bi-Clustering I | | |
| Feb. 24 | Bi-Clustering II [PDF][PPT] | Bi-Clustering II | | |
| Feb. 26 | Mining Complex Data I [PDF][PPT] | Mining Complex Data I | | |
| Mar. 3 | Mining Complex Data II [PDF][PPT] | Mining Complex Data II | | |
| Mar. 5 | Bacon, Kelli [PPT]; Neyer, Mark Patrick [PDF] | Microscopic evolution of social networks; Influence and correlation in social networks | | |
| Mar. 10 | Spring Break! | | | |
| Mar. 12 | Spring Break! | | | |
| Mar. 17 | Liu, Yi [PPT]; Zhang, Zhaojun [PDF] | Learning classifiers from only positive and unlabeled data; Structured metric learning for high dimensional problems | | |
| Mar. 19 | Desmarais, Bruce Albert [PDF]; Wen, Xiaoyang [PDF] | Dynamic Social Network Analysis using Latent Space Models; The structure of information pathways in a social communication network | | |
| Mar. 24 | Hasan, Shaddi Husein [PDF]; Libonati, Alana Marie [PDF] | Combinational collaborative filtering for personalized community recommendation; Can complex network metrics predict the behavior of NBA teams? | | |
| Mar. 26 | no class | | | |
| Mar. 31 | Carter, Jason Lamont [PPT]; Cochran, Robert Anderson [PDF] | Examining Task Engagement in Sensor-Based Statistical Models of Human Interruptibility; Anomaly pattern detection in categorical datasets | | |
| Apr. 2 | Semi-Supervised Learning [PDF][PPT] | | | |
| Apr. 7 | Huang, Shunping [PPT] | Simultaneous tensor subspace selection and clustering: the equivalence of high order SVD and K-means clustering | | |
| Apr. 9 | Selzo, Chris; White, Andrew Maxwell | The cost of privacy: destruction of data-mining utility in anonymized data publishing; Composition attacks and auxiliary information in data privacy | | |
| Apr. 14 | Hopper, Steven Daniel [PPT]; Mckenzie, Ryan Nicholas [PPT] | Mining search engine query logs via suggestion sampling; Finding relevant patterns in bursty sequences | | |
| Apr. 16 | Li, Peng [PPT]; Spensky, Chad Samuel [PPT] | Anonymizing transaction databases for publication; Entity categorization over large document collections | | |
| Apr. 21 | Bethea, Darrell Joseph [PDF]; Brian Marco | Outlier-robust clustering using independent components; Building semantic kernels for text classification using Wikipedia | | |
| Apr. 23 | O'meara, Matthew James; Singh, Darshan | Mining significant graph patterns by leap search; FastANOVA: an efficient algorithm for genome-wide association study | | |
| Apr. 29 | Project Poster Presentation, 12 noon - 3:30PM | | | Final Project Due |
Wei Wang