CS728
Organization of Web Information
Spring 2010 Calendar

Week Date, slides Summary, papers
1 2010/01/05
  • Guest lecture on using clicks for community-driven search, by Vijay Krishnan.
  • Course summary and organization
  • Finer-grained artifacts on the Web: markup, token sequences, user activity
  • What forms of structure we want to identify: records, entities, attributes, relations.
  • What techniques to use: rule-based, statistical learning, closed-domain, open-domain, etc.
  • Why do this: use in search.
1 2010/01/08
  • Regular patterns in (1d) tag sequences
  • String edit distance
  • Multiple string alignment
  • Tree edit distance and alignment
2 2010/01/12
2 2010/01/14 (Guest lecture by Prof. Ganesh)
  • Rule-based sequence labeling
3 2010/01/19
  • Feature representation for tokens
  • The B-I-O state trick and features
  • MEMM
  • CRF and training
  • Quick intro to max-margin structured learning
3 2010/01/29 (Guest lecture by Prof. Sunita)
4 2010/02/02 (Guest lecture by Prof. Sunita)
4 2010/02/05 (Guest lecture by Prof. Sunita)
  • Inference and Training with sequential CRFs
  • Forward-backward recurrences
5 2010/02/09 (Guest lecture by Prof. Sunita)
5 2010/02/12 Shivaratri holiday
6 2010/02/20 Midterm exam, SIC301, 14:00–17:00
7 2010/02/23
7 2010/02/26
  • SemTag and Seeker
  • Wikify!
  • Tree kernels
8 2010/03/02
  • Cucerzan's leave-one-out
  • Milne and Witten's anchor method
8 2010/03/05
9 2010/03/09 Postponed
9 2010/03/12 (Guest lecture by Prof. Sunita -- slides)
  • Relationship extraction
  • Clues: surface tokens, POS tags, parse tree, dependency graph
  • Feature-, rule-, and kernel-based methods
10 2010/03/16 Holiday
10 2010/03/19
  • More examples of path kernels
  • Closed-domain binary relationship bootstrapping: DIPRE, Snowball
  • Open-domain binary relation harvesting: TextRunner
11 2010/03/23
  • Coreference resolution: in unstructured text, and of rows in tables
  • Discriminative (active learning) approach
  • Hierarchical hybrid network model
  • Unified graphical model
11 2010/03/26
  • Using entities and types in search
  • Descendant of answer type in proximity to keyword matches: IR4QA, SSQ
  • Extending to multiple answer types
12 2010/03/30
  • Collective Laplacian scoring/ranking
  • Collective snippet labeling using an associative graphical model
12 2010/04/02 Holiday -- Good Friday
13 2010/04/06
  • Proximity-based (positional) language models in IR
  • Probabilistic approaches to expert search
  • Discriminative entity ranking
13 2010/04/09 (Guest lecture by Prof. Sudarshan)
  • Proximity search in graphs
  • DBXplorer, BANKS (demo)
14 2010/04/13
  • Summary of entity/expert search
  • Convergence of DB+XML and IR+Web search
  • WHIRL
  • XIRQL and ELIXIR
  • XQuery, TeXQuery
14 2010/04/16
  • DISCOVER, ObjectRank
  • YAGO and NAGA
15 2010/04/24 Final exam, SIC301, 14:30–17:30