We are happy to announce a new shared task on Understanding Figurative Language as part of the Figurative Language Workshop (FigLang 2022) at EMNLP 2022. In recent years, several benchmarks dedicated to figurative language understanding have framed "understanding" as a recognizing textual entailment task: deciding whether one sentence (the premise) entails or contradicts another (the hypothesis) (Chakrabarty et al., 2021; Stowe et al., 2022). We introduce a new shared task for figurative language understanding built around this textual entailment paradigm, where the hypothesis is a sentence containing a figurative expression (e.g., metaphor, sarcasm, idiom, simile) and the premise is a sentence expressing its literal meaning. Two aspects of this task and the associated dataset are important: 1) the task requires not only predicting the label (entail/contradict) but also generating a plausible explanation for the prediction; 2) both the entail/contradict label and the explanation hinge on the meaning of the figurative expression.

For instance, given the

Premise: He utterly decimated his tribe's most deeply held beliefs.

Hypothesis: He absorbed the knowledge or beliefs of his tribe.

we need an output that consists of:

1) Label = Contradiction

2) Explanation = "Absorbed" typically means to take in or take up something, while "utterly decimated" means to destroy completely.


Evaluation Metrics

To judge the quality of the explanations, we compute the average of BERTScore and BLEURT, which we refer to as the explanation score (between 0 and 100). Instead of reporting only label accuracy for NLI, we report label accuracy at three thresholds of explanation score (0, 50, and 60). Accuracy@0 is equivalent to plain label accuracy, while Accuracy@50 counts as correct only those correctly predicted labels whose explanation score exceeds 50.
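The thresholded metric above can be sketched in a few lines of Python. This is a minimal illustration, assuming per-example BERTScore and BLEURT values have already been computed by an external scorer and scaled to 0-100; the labels and scores below are made-up examples, not data from the task.

```python
def explanation_score(bertscore: float, bleurt: float) -> float:
    """Explanation score: the average of BERTScore and BLEURT,
    both assumed to be scaled to the 0-100 range."""
    return (bertscore + bleurt) / 2


def accuracy_at(gold_labels, pred_labels, expl_scores, threshold):
    """Accuracy@threshold: the fraction of examples whose label is
    predicted correctly AND whose explanation score exceeds the
    threshold. With threshold = 0 this reduces to label accuracy."""
    correct = sum(
        1
        for gold, pred, score in zip(gold_labels, pred_labels, expl_scores)
        if gold == pred and score > threshold
    )
    return correct / len(gold_labels)


# Hypothetical predictions for three examples
gold = ["Entailment", "Contradiction", "Entailment"]
pred = ["Entailment", "Contradiction", "Contradiction"]
scores = [61.0, 45.0, 70.0]

print(accuracy_at(gold, pred, scores, 0))   # Accuracy@0
print(accuracy_at(gold, pred, scores, 50))  # Accuracy@50
print(accuracy_at(gold, pred, scores, 60))  # Accuracy@60
```

In this toy run, the third example is dropped at every threshold because its label is wrong, and the second example is additionally dropped at the 50 and 60 thresholds because its explanation score of 45 is too low.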

For instance, given the

Premise: The place looked impenetrable and inescapable.

Hypothesis: The place looked like a fortress.

1) Gold Label = Entailment and Predicted Label = Entailment

2) Gold Explanation = A fortress is a military stronghold, hence it would be very hard to walk into, or in other words impenetrable and inescapable.

3) Model Explanation = A fortress is a structure that is built to withstand attacks and is impenetrable, making it an impregnable and inescapable place.

4) Explanation score = 61


We will also conduct a human evaluation of the top 5 performing systems. When selecting the top 5 systems, we will consider not only systems with high accuracy but also competitive systems that use smaller models (in terms of number of parameters).


Misc.

Citation