Authors: Tiezheng Yu, Zihan Liu, Pascale Fung \ Paper reference: https://aclanthology.org/2021.naacl-main.471.pdf

Contribution

This paper studies domain adaptation for abstractive summarization. Specifically, it investigates adding a second phase of pre-training in a low-resource setting across diverse domains, and shows that applying RecAdam effectively preserves the knowledge the pre-trained model acquired during the first stage of pre-training, alleviating catastrophic forgetting. Finally, it outlines open challenges in low-resource domain adaptation for abstractive summarization as future work.

Pre-training settings

This paper uses BART as the base model.

Source Domain Pre-Training (SDPT): continue pre-training BART on the summarization task with labeled source-domain (news) summarization data. The purpose of this pre-training is to inject task knowledge into the pre-trained language model so that it can quickly adapt to the same task in target domains.
Domain-Adaptive Pre-Training (DAPT): continue pre-training BART using its original pre-training objective function on unlabeled domain-related data.
Task-Adaptive Pre-Training (TAPT): continue pre-training BART with its original objective on the unlabeled documents of the target domain's summarization task. Compared to DAPT, TAPT uses a much smaller but far more task-relevant pre-training corpus. A sketch contrasting the three settings follows below.
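
To make the three settings concrete, here is a minimal sketch assuming a Hugging Face BART checkpoint; the function names and the simplified token-level masking are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the second-phase pre-training objectives (illustrative only).
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def sdpt_step(document, summary):
    """SDPT: supervised summarization loss on a labeled source-domain (news) pair."""
    inputs = tokenizer(document, truncation=True, max_length=1024, return_tensors="pt")
    labels = tokenizer(summary, truncation=True, max_length=128, return_tensors="pt").input_ids
    return model(**inputs, labels=labels).loss

def dapt_or_tapt_step(text, mask_prob=0.3):
    """DAPT / TAPT: BART's denoising objective on unlabeled text.

    DAPT draws `text` from a large domain-related corpus; TAPT uses only the
    unlabeled documents of the target summarization task. The token-level
    masking below is a simplified stand-in for BART's span infilling.
    """
    ids = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt").input_ids
    corrupted = ids.clone()
    mask = torch.rand(ids.shape) < mask_prob
    corrupted[mask] = tokenizer.mask_token_id
    return model(input_ids=corrupted, labels=ids).loss
```

The key difference across the three settings is the data fed to these steps, not the model architecture.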

RecAdam penalizes the loss when the parameters learned during the second-phase pre-training drift far from the original pre-trained parameters; this is why it can preserve the knowledge the model learned in the first-stage pre-training.
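
A minimal sketch of this penalty, assuming an illustrative annealing schedule and hyperparameter names (gamma, k, t0) rather than the official RecAdam implementation:

```python
# Rough sketch (not the official RecAdam optimizer) of the quadratic penalty
# that pulls second-phase parameters back toward the first-stage weights.
import math
import torch

def recadam_style_loss(task_loss, model, pretrained_params, step,
                       gamma=1.0, k=0.1, t0=250):
    """Anneal between recalling the pre-trained weights and learning the new task.

    lam = sigmoid(k * (step - t0)) shifts the objective from the quadratic
    penalty (recall) toward the task loss (learn) as training progresses.
    The hyperparameter values here are illustrative, not the paper's.
    """
    lam = 1.0 / (1.0 + math.exp(-k * (step - t0)))
    penalty = sum(((p - p0.to(p.device)) ** 2).sum()
                  for p, p0 in zip(model.parameters(), pretrained_params))
    return lam * task_loss + (1.0 - lam) * (gamma / 2.0) * penalty

# Frozen snapshot of the first-stage weights, taken once before the
# second-phase pre-training starts:
# pretrained_params = [p.detach().clone() for p in model.parameters()]
```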

Takeaways

  • SDPT and TAPT generally improve summarization performance across all domains. The effectiveness of DAPT depends on the relatedness (measured by vocabulary overlap) between the pre-training data and the target-domain task data.
  • Extensive training data can yield a comparatively large RecAdam penalty, since the model's parameters tend to be modified substantially.
  • Pre-training on relatively short documents and summaries is more effective for SDPT.
  • The paper leaves how to effectively integrate task and domain knowledge (i.e., combine SDPT and DAPT) to future work.