Authors: Griffin Adams, Emily Alsentzer, Mert Ketenci, Jason Zucker, Noémie Elhadad
Paper reference: https://aclanthology.org/2021.naacl-main.382.pdf

Contribution

This paper introduces a hospital-course summarization task in which the goal is to faithfully and concisely summarize the EHR documentation for a patient’s specific inpatient visit, from admission to discharge. The Brief Hospital Course (BHC) section of the discharge summary serves as a proxy reference.

The paper then provides a comprehensive analysis of the dataset and identifies several implications for future hospital-course summarization research.

Details

Dataset

The paper constructs a large-scale, multi-document summarization dataset, CLINSUM (not public), covering a wide range of reasons for hospitalization. It mainly relies on “Admission”, “Progress”, and “Consult” notes as source documents. It represents an incredibly challenging multi-document summarization task with diverse knowledge requirements.

Dataset Analysis and Implications

Extractiveness vs. abstractiveness

CLINSUM appears very extractive according to widely used metrics. However, 64% of the extractive fragments are unigrams and 25% are bigrams, which indicates a high level of rewriting.
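To make the fragment statistics concrete, here is a minimal sketch of how extractive fragments, and the coverage and density scores built on them, could be computed. It assumes the "widely used metrics" are fragment-based measures in the style of Grusky et al. (2018), and that notes and summaries are already tokenized into lists of strings. The function names are illustrative and are not taken from the paper.

```python
from collections import Counter


def extractive_fragments(source, summary):
    """Greedily match the longest shared token spans between summary and source."""
    fragments = []
    i = 0
    while i < len(summary):
        best = 0
        for j in range(len(source)):
            k = 0
            # Extend the match while both sequences agree token by token.
            while (i + k < len(summary) and j + k < len(source)
                   and summary[i + k] == source[j + k]):
                k += 1
            best = max(best, k)
        if best > 0:
            fragments.append(summary[i:i + best])
            i += best
        else:
            i += 1  # Novel token: not part of any fragment.
    return fragments


def coverage_and_density(source, summary):
    """Coverage: share of summary tokens inside fragments. Density: mean squared fragment length."""
    if not summary:
        return 0.0, 0.0
    frags = extractive_fragments(source, summary)
    coverage = sum(len(f) for f in frags) / len(summary)
    density = sum(len(f) ** 2 for f in frags) / len(summary)
    return coverage, density


def fragment_length_shares(source, summary):
    """Fraction of fragments at each length (1 = unigram, 2 = bigram, ...)."""
    counts = Counter(len(f) for f in extractive_fragments(source, summary))
    total = sum(counts.values())
    return {length: c / total for length, c in sorted(counts.items())}
```

A summary can score high on coverage while still being heavily rewritten: if most fragments are one or two tokens long, the copied material is individual terms rather than copied sentences, which is the pattern reported for CLINSUM.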

With extractive strategies, a single sentence accounts on average for roughly 50% of the overall ROUGE score. After that, the marginal contribution of each additional sentence shrinks. In other words, the summary transitions from extractive to abstractive.
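The diminishing marginal contribution can be illustrated with a greedy oracle that repeatedly adds the source sentence yielding the largest ROUGE gain. This is a sketch under stated assumptions, not the paper's implementation: it uses a simplified unigram-overlap F1 as a stand-in for ROUGE, and the names rouge1_f1 and greedy_oracle are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1 over token lists (no stemming or stopword handling)."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(source_sentences, reference, max_sentences=5):
    """Select sentences one at a time, recording the score gain of each addition."""
    selected, tokens, gains = [], [], []
    score = 0.0
    for _ in range(max_sentences):
        best_idx, best_score = None, score
        for idx, sentence in enumerate(source_sentences):
            if idx in selected:
                continue
            candidate_score = rouge1_f1(tokens + sentence, reference)
            if candidate_score > best_score:
                best_idx, best_score = idx, candidate_score
        if best_idx is None:
            break  # No remaining sentence improves the score.
        selected.append(best_idx)
        tokens = tokens + source_sentences[best_idx]
        gains.append(best_score - score)
        score = best_score
    return selected, gains
```

The ratio gains[0] / sum(gains) is the share of the final score contributed by the first sentence, which is the kind of quantity the 50% figure describes.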

There is a great deal of redundancy in the source notes, but repetition is not indicative of salience in CLINSUM.

Implications:
(1) A better understanding of the relationship between lexical centrality and salience is required.
(2) Dynamic hybrid extraction-abstraction strategies are required.

Comprehensiveness and conciseness

BHC summaries are packed with medical entities, which are well-distributed across the source notes. This difficult task calls for a domain-specific approach to assessing faithfulness.

Summaries are extremely dense with medical entities, and it is necessary to read the entire set of notes to generate the summary despite diminishing marginal returns. Summaries also exhibit frequent, abrupt topic shifts, with few repeated entities.

Implications:
Because entities are so densely packed in summaries, models are more susceptible to factual errors. Fact-based evaluation metrics that encode a deeper knowledge of clinical concepts and their complex semantic and temporal relations should be developed.

Styles and content organization

Clinical texts contain many obscure abbreviations, misspellings, and sentence fragments. Summary sentences are actually longer on average than source sentences. Qualitative analysis confirms that most BHCs are organized around a patient’s disorders.

Retrieval frameworks find that summaries adapt the style and problem-oriented structure of other summaries, but contain patient-specific information from the source notes.

BHC summaries are a silver standard rather than a gold standard. Discharge summaries and their associated BHC sections are frequently missing critical information or contain excessive or erroneous content. These quality issues occur for a number of reasons.

Implications:
(1) One approach is to use the retrieve-rerank-rewrite framework to generate problem-oriented BHC summaries.
(2) Develop heuristics to assess reference quality, or scalable reference-free evaluations.