CS635 Web Search and Mining
Spring 2009 Calendar

Week	Date, slides	Summary, papers
1	2009/01/02 Introduction	Course introduction (guest lecture by Ganesh Ramakrishnan)
2	2009/01/06	Tokenization, compound words, stemming Building a compound word dictionary The term-document matrix Generative models for the term-document matrix Multivariate binary, multinomial, Poisson models
2	2009/01/09	Word burstiness, non-parametric models Word burstiness, Dirichlet word generators Modeling multiple topic clusters in the term document matrix (Single cause) mixture model and expectation maximization Limitations of single-cause mixture models
3	2009/01/13	Multiple cause mixture models, aspect model Latent Dirichlet topic model Basic and positional inverted index construction Space taken by dictionary and posting lists Compressing posting lists -- gamma code, skip pointers Compressing the dictionary -- front coding
3	2009/01/16	Recall, (interpolated) precision, F1, break-even Area under ROC curve, pair preference violation MRR, MAP, NDCG, BPREF Vector space model: term frequency, inverse document frequency BM25
4	2009/01/20	Query processing in IR: How to order and scan postings Fagin's threshold algorithm and probabilistic top-k Index caching Text search as translation Probabilistic models for information retrieval Translation models via term-document random walk
4	2009/01/23 Text clustering	Latent semantic indexing (LSI/SVD) Applying PLSI/aspect model to search Nonnegative matrix factorization , Coclustering and cross-associations Cluster hypothesis in IR Clustering and corpus visualization techniques k-Means and self-organizing maps Hierarchical agglomerative clustering, dendrograms, scatter-gather Multidimensional scaling, FastMap About dynamic clustering, Vivisimo, Clusty, Kosmix
5	2009/01/27 Similarity-based query processing	Find-similar search, avoiding the quadratic bottleneck Jaccard similarity over shingles Pairwise and min-wise hash functions and families
5	2009/01/30 Similarity search, continued	Reducing random bits needed to sample permutations Cosine similarity Fast mirror site detection Random projections and find-similar with cosine similarity Locality sensitive hashing
6	2009/02/03 Labeling feature vectors	Labeling tasks: examples and motivation Document representation for labeling --- are corpus models adequate? Art of feature vector design Probabilistic labeling Joint models, naive Bayes Conditional model, logistic regression Entropy model
6	~~2009/02/06~~	Lecture postponed
7	2009/02/10 and ranking	Discriminative labeling Support vector machines Dealing with topic hierarchies Learning to rank -- representation, training, testing Itemwise, pairwise, listwise training paradigms Itemwise: Ordinal regression
7	2009/02/12 Learning to rank	Itermwise: McRank (Algorithm 6) Pairwise: RankSVM and RankNet
7	2009/02/13 Learning to rank	Pairwise: RankBoost Listwise: SoftRank , RankAUC , SVMmap , DORM , SVMndcg, SVMcombo
8	2009/02/15 Midterm	Midterm exam, SIC301, 9:30--12:30. Exam with some solutions.
9	2009/02/24 Social networks	Midterm review Intro to social networks Hyperlinks: the first online social network? And now, Wikipedia, Orkut, Flickr, LinkedIn, Facebook, Twitter, Blogspot, Wordpress, ... Social networks as entity-relationship graph data models
9	2009/02/27 Social networks	Power laws everywhere Centrality and prestige Diameter, hop-plots, eigenvalues Bow-tie
10	2009/03/03 Social networks	Preferential attachment Winner does not take all Copying model
10	2009/03/06 Social networks	Compression and storage of graphs , Connectivity Server Generating realistic social networks PageRank
11	2009/03/13 Random walks	Random walks as a way to sample the Web HenzingerHMN2000sample.ps Accelerating PageRank C-H Theorem Large-scale computation
12	2009/03/17 PageRank	PageRank linearity and decomposition Topic-sensitive and personalized PageRank Page staleness, link spam, trust/mistrust, viral marketing, clickthrough mining
12	2009/03/20 PageRank	Asynchronous push algorithms Alternative decay profiles Score and rank stability of PageRank Maxent view of PageRank
13	2009/03/24 HITS	Bipartite reinforcement: HITS Learning to rank in graph models NetRank
13	2009/03/27	Holiday
14	2009/03/31 HITS, Crawling	Score and rank stability of HITS and related algorithms PHITS and SALSA Using host graph, page content, HTML layout, anchor text Basic crawler plumbing: DNS, frontier, work pool, concurrency, page storage Frontier prioritization issues and considerations
14	2009/04/03 Information extraction	Named entity recognition: HMM, CRF NE disambiguation Open-domain is-a relation harvesting: Hearst patterns , KnowItAll Binary relationship recognition: DIPRE , Snowball , dependency parsing Open-domain binary relation harvesting: TextRunner Extracting records from marked-up sources
15	2009/04/08 Semistructured search	Convergence of DB+XML and IR+Web search WHIRL XIRQL and ELIXIR XQuery, TeXQuery DBXplorer , BANKS , DISCOVER DBPedia, HBase, FreeBase, YAGO and NAGA
WW	2009/04/26 Final	Final exam, SIC301, 9:30--13:00. Exam with some solutions.

CS635 Web Search and Mining Spring 2009 Calendar

CS635 Web Search and Mining
Spring 2009 Calendar