Authors: Tiezheng Yu, Zihan Liu, Pascale Fung \ Paper reference: https://aclanthology.org/2021.naacl-main.471.pdf

Contribution

This paper studies domain adaptation for abstractive summarization. Specifically, it investigates adding a second phase of pre-training in a low-resource setting across diverse domains, and shows that applying RecAdam effectively preserves the knowledge the pre-trained model acquired during the first stage of pre-training, alleviating catastrophic forgetting. Finally, it outlines open challenges in low-resource domain adaptation for abstractive summarization as future work.

Pre-training settings

This paper uses BART as the base model.

Source Domain Pre-Training (SDPT): continue pre-training BART on the summarization task with labeled source-domain (news) summarization data. The purpose of this pre-training is to inject task knowledge into the pre-trained language model so that it can quickly adapt to the same task in target domains.
Domain-Adaptive Pre-Training (DAPT): continue pre-training BART using its original pre-training objective function on unlabeled domain-related data.
Task-Adaptive Pre-Training (TAPT): continue pre-training BART with its original objective on the unlabeled documents of the target domain's summarization task. Compared to DAPT, TAPT uses a much smaller but far more task-relevant pre-training corpus. A sketch contrasting the three settings follows below.
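
To make the three settings concrete, here is a minimal sketch assuming a Hugging Face BART checkpoint; the function names and the simplified token-level masking are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the second-phase pre-training objectives (illustrative only).
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def sdpt_step(document, summary):
    """SDPT: supervised summarization loss on a labeled source-domain (news) pair."""
    inputs = tokenizer(document, truncation=True, max_length=1024, return_tensors="pt")
    labels = tokenizer(summary, truncation=True, max_length=128, return_tensors="pt").input_ids
    return model(**inputs, labels=labels).loss

def dapt_or_tapt_step(text, mask_prob=0.3):
    """DAPT / TAPT: BART's denoising objective on unlabeled text.

    DAPT draws `text` from a large domain-related corpus; TAPT uses only the
    unlabeled documents of the target summarization task. The token-level
    masking below is a simplified stand-in for BART's span infilling.
    """
    ids = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt").input_ids
    corrupted = ids.clone()
    mask = torch.rand(ids.shape) < mask_prob
    corrupted[mask] = tokenizer.mask_token_id
    return model(input_ids=corrupted, labels=ids).loss
```

The key difference across the three settings is the data fed to these steps, not the model architecture.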

RecAdam penalizes the loss when the parameters learned during the second-phase pre-training drift far from the original pre-trained parameters; this is why it can preserve the knowledge the model learned in the first-stage pre-training.
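
A minimal sketch of this penalty, assuming an illustrative annealing schedule and hyperparameter names (gamma, k, t0) rather than the official RecAdam implementation:

```python
# Rough sketch (not the official RecAdam optimizer) of the quadratic penalty
# that pulls second-phase parameters back toward the first-stage weights.
import math
import torch

def recadam_style_loss(task_loss, model, pretrained_params, step,
                       gamma=1.0, k=0.1, t0=250):
    """Anneal between recalling the pre-trained weights and learning the new task.

    lam = sigmoid(k * (step - t0)) shifts the objective from the quadratic
    penalty (recall) toward the task loss (learn) as training progresses.
    The hyperparameter values here are illustrative, not the paper's.
    """
    lam = 1.0 / (1.0 + math.exp(-k * (step - t0)))
    penalty = sum(((p - p0.to(p.device)) ** 2).sum()
                  for p, p0 in zip(model.parameters(), pretrained_params))
    return lam * task_loss + (1.0 - lam) * (gamma / 2.0) * penalty

# Frozen snapshot of the first-stage weights, taken once before the
# second-phase pre-training starts:
# pretrained_params = [p.detach().clone() for p in model.parameters()]
```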

Takeaways

  • SDPT and TAPT generally improve summarization performance across all domains. The effectiveness of DAPT depends on the relatedness (measured by vocabulary overlap) between the pre-training data and the target-domain task data.
  • Extensive training data can yield a comparatively large RecAdam penalty, since the model's parameters tend to be modified substantially.
  • Pre-training on relatively short documents and summaries is more effective for SDPT.
  • The paper leaves how to effectively integrate task and domain knowledge (i.e., combine SDPT and DAPT) to future work.