Authors: Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush

Paper reference: https://aclanthology.org/D18-1443.pdf

Contribution

This work proposes a content selection system that extracts the words which should appear in the summary, and demonstrates that the content selector (bottom-up attention) is by itself effective at identifying important words.

While pointer-generator models have the ability to abstract, the use of a copy mechanism causes their summaries to be mostly extractive. Abstractive summarizers can be further improved by using the content selector to modify their copy attention distribution, restricting which words they can copy from the source. The content selection system is also data-efficient and can be applied in low-resource summarization settings.

Details

Content Selection

The paper defines content selection as a sequence-tagging problem, with the objective of identifying the tokens of a document that are copied into its summary. A word $x_i$ in the source document is labeled 1 if it is part of the longest possible sub-sequence of tokens that appears in both the source and the summary (i.e., it is copied), and 0 otherwise.
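As a concrete illustration, the sketch below constructs such labels with a greedy longest-match alignment. The function names are hypothetical and the heuristic is an approximation of the paper's alignment rule, not the authors' code.

```python
# A minimal sketch of deriving 0/1 selection labels by aligning a source
# document to its reference summary: a source token is labeled 1 when it
# falls inside the longest span starting at its position that also occurs
# verbatim in the summary. Greedy approximation for illustration only.
def contains_span(tokens, span):
    """True if `span` occurs as a contiguous sub-sequence of `tokens`."""
    m = len(span)
    return any(tokens[k:k + m] == span for k in range(len(tokens) - m + 1))

def make_selection_labels(source, summary):
    """source, summary: lists of tokens. Returns one 0/1 label per source token."""
    labels = [0] * len(source)
    i = 0
    while i < len(source):
        # Greedily find the longest source span starting at i that the summary contains.
        best = 0
        for j in range(i + 1, len(source) + 1):
            if contains_span(summary, source[i:j]):
                best = j - i
            else:
                break
        if best > 0:
            labels[i:i + best] = [1] * best
        i += max(best, 1)
    return labels

source = "police arrested the suspect on friday after a brief chase".split()
summary = "police arrested the suspect after a chase".split()
print(make_selection_labels(source, summary))
# [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
```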

Bottom-Up Copy Attention

The paper trains a pointer-generator model and a content selector separately on the full dataset. At inference time, the content selector computes a selection probability $q_i$ for each token in the source document. Given a threshold $\epsilon$, the original attention score from the pointer-generator model is kept whenever $q_i > \epsilon$; otherwise the attention score is set to $0$.
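A minimal sketch of this masking step, assuming NumPy and illustrative names (the paper renormalizes the masked scores so they again form a valid distribution; the default $\epsilon$ below is a placeholder, not the tuned value):

```python
import numpy as np

# Inference-time bottom-up masking: zero out copy attention wherever the
# selector's probability falls below epsilon, then renormalize the rest.
def bottom_up_mask(copy_attention, selection_probs, epsilon=0.1):
    """copy_attention, selection_probs: arrays of shape (source_len,)."""
    masked = np.where(selection_probs > epsilon, copy_attention, 0.0)
    total = masked.sum()
    if total == 0.0:          # degenerate case: everything masked out
        return copy_attention
    return masked / total     # renormalize into a valid distribution

attn = np.array([0.4, 0.3, 0.2, 0.1])   # pointer-generator copy attention
q = np.array([0.9, 0.05, 0.8, 0.02])    # selector probabilities q_{1:n}
print(bottom_up_mask(attn, q))           # [0.6667, 0.0, 0.3333, 0.0]
```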

The selection probabilities from the content selector are thus used to modify the copy attention distribution so that it only includes tokens identified by the selector. Experiments find that the performance of the abstractive system drops if it does not have access to the full source document.

Discussion

The content selector is quite effective at finding important words (high ROUGE-1) but less effective at chaining them together (low ROUGE-2). The drop in ROUGE-2 indicates a lack of fluency and grammaticality in the resulting summaries.
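To see why word selection alone can separate these two metrics, consider the toy computation below (a simplified n-gram recall, not the official ROUGE script): the right words in the wrong order preserve unigram overlap but destroy bigram overlap.

```python
from collections import Counter

# Toy n-gram recall: selecting the right words in the wrong order keeps
# unigram overlap high while bigram overlap collapses.
def ngram_recall(candidate, reference, n):
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum(min(c, cand[g]) for g, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

reference = "the senate passed the bill on tuesday".split()
selected  = "on tuesday senate the passed bill the".split()  # same words, scrambled
print(ngram_recall(selected, reference, 1))  # 1.0   -> perfect unigram (ROUGE-1-like) recall
print(ngram_recall(selected, reference, 2))  # ~0.17 -> bigram (ROUGE-2-like) recall collapses
```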

The abstractor exhibits low unigram novelty, which limits the extent of abstraction. The benefit of abstractive models lies less in their ability to produce better paraphrases and more in their ability to create fluent summaries from a mostly extractive process.