CS610 Spring 2007 Calendar
- 2007-01-05
- Administrative details
- Course overview
- Tokenization, compound words, stemming
- Building a compound word dictionary
- The term-document matrix
- Generative models, perplexity, curse of dimensionality
- Multivariate binary, multinomial, Poisson models
- Word burstiness, non-parametric models
- Word burstiness, Dirichlet hypergenerators
- Text search as translation
- Translation models via term-document random walk
- Modeling inter-term dependencies using subspaces
- Latent semantic indexing (LSI) and principal component analysis (PCA)
- Modeling multiple topic clusters
- (Single cause) mixture model and expectation maximization
- SVD on almost low-rank matrices
- Limitations of single-cause mixture models
- Multiple cause mixture models, aspect model, latent Dirichlet topic model
- Complete LDA model estimation
- Boolean queries and the inverted index
- Inverted index construction, compression, updates
- 2007-01-26
- Republic Day
- 2007-01-30
- Holiday
- Relevance ranking
- Recall, precision, F1, breakeven, NDCG
- Vector space model and TFIDF
- Fast top-k search in the vector space model
- Probabilistic retrieval
- Pairwise similarity search avoiding the quadratic barrier
- Minhash and Jaccard
- Shingling for approximate duplicate detection
TechReport
- Cosine similarity and random hyperplanes
- Applications: find-similar, mirror site detection
- Embedding, visualizing, clustering a document collection
- Multidimensional scaling
- k-means, self-organizing maps
- Bottom-up agglomerative clustering and dendrograms
- Document classification
- Training, testing, cross-validation, feature selection
- Generative probabilistic classifiers via Bayes rule
- System entropy minimization
- Conditional probabilistic classifiers, logistic regression, maximum entropy
- Discriminative classifier: support vector machine
- 2007-02-16
- Holiday
- Detour: Feature selection
KohaviJ1997wrapper
- Discriminative and max-margin classification
Joachims1998textSVM1.ps.gz
- Parameter shrinkage in a topic hierarchy
- SVMs for topic hierarchies
- Transductive SVMs using graph Laplacian
- 2007-02-20
- Midterm week
- 2007-02-23
- Midterm week
- 2007-02-25
- Midterm exam
- 2007-03-02
- Adding some random links
- Existence of bipartite cores and motivation for other models
- The copying model and some basic analysis
- Searching for bipartite communities
- Searching for communities via maxflow/mincut
- Sampling the Web graph using random walks
HenzingerHMN2000
- Relation between link proximity and content proximity
ChakrabartiJPP2002topics
- Pagerank and HITS
- Accelerating Pagerank computation
- Stable HITS variants: PHITS and SALSA
LempelM2000salsa
- Topic-sensitive Pagerank
- Personalized Pagerank
- Linearity and decomposition
- Personalized Pagerank applications: page staleness, link spam and TrustRank, Topical TrustRank, ObjectRank
- Sparse teleports and the push algorithm
- 2007-03-23
- 2007-03-30
- 2007-04-03
- 2007-04-06
- Holiday
- 2007-04-10
- Guest lecture by Manish Gupta and Alekh Agarwal.
- 2007-04-13
- Lecture canceled. Here is a grab-bag of papers to read.
- 2007-04-21
- Final exam
A1 14:30--17:30