Our data contains 9,000 high-quality pairs of literal and figurative sentences, each with an entailment or contradiction label and an associated explanation. The benchmark spans four types of figurative language: Sarcasm, Simile, Metaphor, and Idiom.
A noteworthy property of our data is that both the entailment/contradiction labels and the explanations are given with respect to the figurative language expression (i.e., metaphor, simile, idiom) rather than other parts of the sentence.
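
To make the shape of one instance concrete, here is a minimal sketch in Python. The class name, field names, and label strings are assumptions chosen for illustration, not the released schema, and the example sentences are invented rather than taken from the dataset.

```python
from dataclasses import dataclass

# Assumed label and type vocabularies, based only on the description above.
LABELS = {"Entailment", "Contradiction"}
FIGURATIVE_TYPES = {"Sarcasm", "Simile", "Metaphor", "Idiom"}


@dataclass(frozen=True)
class FigurativePair:
    """Hypothetical record for one literal/figurative sentence pair."""
    premise: str      # literal sentence
    hypothesis: str   # sentence containing the figurative expression
    label: str        # entailment or contradiction
    explanation: str  # free-text justification tied to the figurative expression
    fig_type: str     # one of the four figurative language types

    def __post_init__(self):
        # Reject values outside the assumed vocabularies.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        if self.fig_type not in FIGURATIVE_TYPES:
            raise ValueError(f"unknown figurative type: {self.fig_type!r}")


# Invented illustrative instance, NOT drawn from the dataset.
example = FigurativePair(
    premise="I was stuck in traffic for three hours this morning.",
    hypothesis="I just love spending my whole morning stuck in traffic.",
    label="Contradiction",
    explanation=(
        "Being stuck in traffic for hours is frustrating, "
        "so claiming to love it is sarcastic."
    ),
    fig_type="Sarcasm",
)
```

The label and explanation here both target the sarcastic expression itself, which mirrors the property described above.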

Our data is challenging because it inherently requires 1) relational reasoning using background commonsense knowledge, and 2) fine-grained understanding of figurative language. The dataset is constructed through a combination of few-shot prompting with GPT-3 and crowdsourcing on Amazon Mechanical Turk (AMT), after which experts judge and minimally edit the GPT-3 output for quality control.

Links:   [Leaderboard]   [Data Explorer]   [Primary Contact]  


Key Challenges

Why is this problem hard?

First, figurative expressions often compose in non-trivial ways and introduce implicit meaning that takes multiple reasoning steps to interpret. In the example above, the hypothesis is sarcastic; to understand the contradiction between the literal premise and the sarcastic hypothesis, models need to reason about everyday concepts using commonsense knowledge.

Second, models need to be able to comprehend unseen figurative expressions, which can also be non-compositional, as in the Idiom example above.



Template Courtesy: Bill Yuchen Lin