Datasets for Noun Compound Interpretation

 

A noun compound is a sequence of two or more nouns that has a well-defined meaning when written together, for example, orange juice, colon cancer, research paper submission, and paper submission deadline. The relation “juice is made from orange” is implicit in orange juice. Noun compound interpretation deals with uncovering such hidden relations.

Noun compounds are usually interpreted in two ways: labelling and paraphrasing.

  • Labelling involves assigning a semantic relation to a noun compound, e.g., student protest: AGENT, orange juice: MADEOF. These relations come from a predefined taxonomy of semantic relations.
  • Paraphrasing involves rewriting the noun compound as a paraphrase which conveys its meaning explicitly, e.g., orange juice: “juice made from orange” or “juice with orange flavour”.
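
As a minimal sketch of how the two kinds of interpretation differ as data, the snippet below stores a label and a list of paraphrases side by side. The class name NounCompound and its field names are hypothetical and chosen only for illustration; they do not describe the format of the datasets on this page. The example values are the ones used in the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NounCompound:
    """Hypothetical container for one noun compound and its interpretations."""
    modifier: str                      # e.g. "orange"
    head: str                          # e.g. "juice"
    relation: Optional[str] = None     # labelling: one relation from a fixed taxonomy
    paraphrases: List[str] = field(default_factory=list)  # paraphrasing: free text

    @property
    def text(self) -> str:
        # Surface form of the compound: modifier followed by head.
        return f"{self.modifier} {self.head}"


# Labelling and paraphrasing applied to the same compound.
orange_juice = NounCompound(
    modifier="orange",
    head="juice",
    relation="MADEOF",
    paraphrases=["juice made from orange", "juice with orange flavour"],
)

# A compound that carries only a label.
student_protest = NounCompound(modifier="student", head="protest", relation="AGENT")

print(orange_juice.text, "->", orange_juice.relation, orange_juice.paraphrases)
print(student_protest.text, "->", student_protest.relation)
```

A label is a single value drawn from a closed set, whereas a compound may have several acceptable paraphrases, which is why the sketch uses a list for the latter.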
 

FrameNet Based Labelling

We have prepared a dataset with FrameNet-based labelling. The head word (usually the second word) of a noun compound evokes a frame, and the modifier fills a frame element (FE), i.e., a semantic role, in that frame.
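
The annotation scheme can be pictured with a small sketch. It assumes a two-word compound whose head is the second word, which the text says is the usual case but not the only one. The names FrameAnnotation, split_compound, and annotate are hypothetical, and the frame and frame-element names in the example are illustrative placeholders rather than entries taken from the dataset.

```python
from typing import NamedTuple, Tuple


class FrameAnnotation(NamedTuple):
    """Hypothetical record: the head evokes `frame`, the modifier fills `frame_element`."""
    compound: str
    modifier: str
    head: str
    frame: str
    frame_element: str


def split_compound(compound: str) -> Tuple[str, str]:
    """Split a two-word compound into (modifier, head).

    Assumption: the head is the second word. This holds for most
    compounds but not all, so real data needs an explicit head marker.
    """
    tokens = compound.split()
    if len(tokens) != 2:
        raise ValueError(f"expected a two-word compound, got: {compound!r}")
    return tokens[0], tokens[1]


def annotate(compound: str, frame: str, frame_element: str) -> FrameAnnotation:
    """Attach a frame (for the head) and a frame element (for the modifier)."""
    modifier, head = split_compound(compound)
    return FrameAnnotation(compound, modifier, head, frame, frame_element)


# Illustrative placeholder labels, not copied from the dataset.
example = annotate("student protest", frame="Protest", frame_element="Protester")
print(example)
```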

You can download the dataset from here.

Citation:
Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya and Girish K. Palshikar, Towards a Standardized Dataset for Noun Compound Interpretation, LREC 2018, Miyazaki, Japan, May 7-12, 2018.
BibTeX:
@inproceedings{ponkiya2018towards,
  title = {Towards a Standardized Dataset for Noun Compound Interpretation},
  author = {Ponkiya, Girishkumar and Patel, Kevin and Bhattacharyya, Pushpak and Palshikar, Girish K},
  address = {Miyazaki, Japan},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  isbn = {979-10-95546-00-9},
  pages= {3092--3097}
}
 

Prepositional Paraphrasing

Prepositional paraphrasing, i.e., paraphrasing using only prepositions (e.g., orange juice: “juice of orange”), is an important task. We have annotated the dataset of Kim and Baldwin (2005), in which noun compounds are annotated with 20 relations, with the eight prepositions proposed by Lauer (1995).
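
To make the task concrete, the sketch below enumerates the candidate paraphrases of the form “head preposition modifier” for a two-word compound. Lauer's preposition set is commonly given as of, for, in, at, on, from, with, and about. The function name prepositional_candidates is hypothetical, and the sketch only lists candidates; choosing the right one is the actual prediction task and is not shown here.

```python
from typing import List

# The eight prepositions commonly attributed to Lauer (1995).
LAUER_PREPOSITIONS = ("of", "for", "in", "at", "on", "from", "with", "about")


def prepositional_candidates(compound: str) -> List[str]:
    """List every 'head PREP modifier' paraphrase of a two-word compound.

    Assumption: the compound has exactly two words and the head is the
    second one.
    """
    tokens = compound.split()
    if len(tokens) != 2:
        raise ValueError(f"expected a two-word compound, got: {compound!r}")
    modifier, head = tokens
    return [f"{head} {prep} {modifier}" for prep in LAUER_PREPOSITIONS]


candidates = prepositional_candidates("orange juice")
for candidate in candidates:
    print(candidate)
```

With this framing, annotating a compound amounts to picking one of the eight candidates, so the task can be treated as an eight-way classification problem.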

You can download the dataset from here.

Citation:
Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya and Girish K. Palshikar, Treat us like the sequences we are: Prepositional Paraphrasing of Noun Compounds using LSTM, COLING 2018, Santa Fe, New Mexico, USA, August 20-26, 2018.
BibTeX:
@inproceedings{ponkiya2018prepositional,
  title = {Treat us like the sequences we are: Prepositional Paraphrasing of Noun Compounds using {LSTM}},
  author = {Ponkiya, Girishkumar and Patel, Kevin and Bhattacharyya, Pushpak and Palshikar, Girish K},
  booktitle = {The 27th International Conference on Computational Linguistics (COLING 2018)},
  address = {Santa Fe, New Mexico, USA},
  year = {2018},
  pages= {1827--1836}
}
 

References

To be updated..