- 2006-01-05
- Course overview
- Engineering a large-scale crawler

- 2006-01-09
- Crawl frontier management
- Post-crawl near-duplicate detection
- In-flight mirror site/host detection

- 2006-01-12
- Lecture outline
- Early front-end processing: tokenizing, stemming
- Inverted index construction, compression
- Boolean queries

- 2006-01-16
- Lecture outline
- Motivation for ranking responses
- Vector space model, TF and IDF
- Probabilistic retrieval:

- 2006-01-19
- Using language modeling in search:

- 2006-01-23
- Overview of Expectation maximization
- Language models and random walks,

- 2006-01-26
- Guest lecture by Prabhakar Raghavan (Query incentive networks) Kohli-KReSIT 15:30

- 2006-01-30
- Term-document random walks
- Random walks, eigenvectors and singular value decomposition

- 2006-02-02
- SVD demo
- SVD and tests of word relatedness
- Using SVD for document expansion

- 2006-02-06
- SVD as an approximation to selection

- ~~2006-02-09~~
- Holiday, guest lecture cancelled

- 2006-02-13
- Complete modeling using Dirichlet distribution
- Application of Dirichlet distributions to corpus modeling:

- 2006-02-16
- Latent Dirichlet, continued
- Simpler variants like Canny's

- 2006-02-18
- Midterm exam 14:30--~~16:30~~17:00 A1/A2

- ~~2006-02-20~~
- Midterm week

- ~~2006-02-23~~
- Midterm week

- 2006-02-27
- Midterm review and lecture outline
- Aggregate statistics of the Web graph: power laws galore
- Reachability, connectivity, bow-tie
- The Connectivity Server and link graph compression

- 2006-03-02
- Explaining power-law degree by generative models
- Barabasi-Albert preferential attachment
- Adding

- 2006-03-06
- Existence of bipartite cores and motivation for other models
- The and some basic analysis.

- 2006-03-09
- Sampling the Web using random walks: directed walk,
- Collecting
- Crawling communities using ideas

- 2006-03-13
- HITS and Pagerank
- and personalized Pagerank
- How to spam HITS and Pagerank
- Sensitivity to topology and in the graph

- 2006-03-16
- More about HITS and Pagerank stability
- Coupled random walks and their basic analysis

- 2006-03-20
- 2006-03-23
- Analysis of
- computation

- 2006-03-27
- Pagerank as
- Teleport dependent on query keywords---ObjectRank

- ~~2006-03-30~~
- Holiday

- 2006-04-03
- system overview
- Pagerank personalization via teleport

- 2006-04-04
- SimRank and graph fingerprints (Srivatsa Iyengar)

- 2006-04-06
- Survey of link analysis and concluding remarks
- TFIDF scoring, quit and continue
- Fagin's TA algorithm and

- 2006-04-10
- XML and text search
- Course feedback
- Similarity joins and
- XML basics, querying XML+text:

- 2006-04-12
- Bootstrapping relations from raw text:
- Lecture outline and course summary

- 2006-04-13
- Homework review by TAs (if there is enough interest)

- 2006-04-18
- Final exam 14:30--~~17:30~~18:00 A1/A2