Authors: Amir Feder, Katherine A. Keith, Emaad Manzoor, Reid Pryzant, Dhanya Sridhar, Zach Wood-Doughty, Jacob Eisenstein, Justin Grimmer, Roi Reichart, Margaret E. Roberts, Brandon M. Stewart, Victor Veitch, Diyi Yang
Paper reference: https://arxiv.org/pdf/2109.00725v1.pdf

Context

As NLP systems are increasingly deployed in challenging and high-stakes scenarios, we cannot assume that training and test data are identically distributed, and we may not be satisfied with uninterpretable black-box predictors. Moreover, increasingly high-capacity neural architectures make no distinction between causes, effects, and confounders, and make no attempt to identify causal relationships. Causality offers a promising path forward: causal inference can potentially improve the performance, robustness, fairness, and interpretability of NLP models.

Learning Robust Predictors

The NLP field has grown increasingly concerned with spurious correlations. This concern has led to several proposals for novel evaluation methodologies to ensure that predictors are not “right for the wrong reasons”.

These evaluations generally take two forms:
(1) invariance tests, which assess whether predictions are affected by perturbations that are causally unrelated to the label;
(2) sensitivity tests, which apply perturbations that should in some sense be the minimal change necessary to flip the true label.

A number of approaches have been proposed for learning predictors that pass tests of sensitivity and invariance. These approaches fall into two main groups: counterfactual data augmentation and causally-motivated distributional criteria.

Data augmentation

Idea: Elicit or construct counterfactual instances, and incorporate them into the training data.

In the case of invariance tests, additional focus can be provided by adding a term to the learning objective to explicitly penalize disagreements in the predictions for counterfactual pairs. In the case of interventions on the label Y, training on label counterfactuals can improve out-of-domain generalization and reduce sensitivity to noise.
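
As an illustration, a counterfactual-consistency term could be added to a standard classification objective roughly as sketched below. The names (`model`, `x_cf`, `lambda_cf`) and the symmetric-KL form of the penalty are illustrative assumptions, not a prescription from the paper.

```python
# Minimal sketch: cross-entropy on original examples plus a penalty on
# disagreement between predictions for each example and its counterfactual.
import torch
import torch.nn.functional as F

def augmented_loss(model, x, y, x_cf, lambda_cf=1.0):
    logits = model(x)        # predictions for the original (encoded) texts
    logits_cf = model(x_cf)  # predictions for their counterfactual versions
    ce = F.cross_entropy(logits, y)
    # Symmetric KL between the two predictive distributions; zero when they agree.
    p, p_cf = F.log_softmax(logits, dim=-1), F.log_softmax(logits_cf, dim=-1)
    consistency = 0.5 * (F.kl_div(p_cf, p.exp(), reduction="batchmean")
                         + F.kl_div(p, p_cf.exp(), reduction="batchmean"))
    return ce + lambda_cf * consistency
```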

Counterfactual examples

Counterfactual examples can be generated in several ways:
(1) Manual post-editing. Manual editing is typically fluent and accurate but relatively expensive.
(2) Heuristic replacement of keywords. This is cheap, but it cannot guarantee fluency or coverage of all labels and covariates of interest (a minimal sketch appears below).
(3) Automated text rewriting. Fully generative approaches could potentially combine the fluency and coverage of manual editing, but these methods are still relatively immature.

Counterfactual examples are a powerful resource because they directly address the missing data issues that are inherent to causal inference. However, in many cases it is difficult for even a fluent human to produce meaningful counterfactuals.
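
For concreteness, here is a minimal sketch of heuristic keyword replacement (method 2 above) for a binary sentiment task. The antonym lexicon and the task are illustrative assumptions, and the weaknesses noted above (fluency, coverage) apply.

```python
# Swap sentiment-bearing words with antonyms to construct label counterfactuals.
ANTONYMS = {
    "great": "terrible", "terrible": "great",
    "love": "hate", "hate": "love",
    "best": "worst", "worst": "best",
}

def heuristic_counterfactual(text: str) -> str:
    """Replace lexicon words with their antonyms; other tokens are untouched."""
    return " ".join(ANTONYMS.get(t.lower(), t) for t in text.split())

print(heuristic_counterfactual("I love this movie , the acting is great"))
# -> "I hate this movie , the acting is terrible"
```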

Distributional Criteria

Way 1: observable signatures of invariance

Derive distributional properties of invariant predictors, and then ensure that these properties are satisfied by the trained model.

It can be shown that any counterfactually invariant predictor will satisfy $f(X) \perp Z \mid Y$ (the prediction f(X) is independent of the covariate Z conditioned on the true label Y). In this fashion, knowledge of the true causal structure of the problem can be used to derive observed-data signatures of counterfactual invariance. Such signatures can be incorporated as regularization terms in the training objective.

This does not guarantee counterfactual invariance, but in practice it increases counterfactual invariance and improves performance in out-of-distribution settings without requiring counterfactual examples.
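
One way such a signature could enter the objective is sketched below: within each true label, the penalty discourages the predictions for the two values of a binary Z from differing on average. This first-moment penalty is only a crude proxy for the full conditional-independence criterion, and all names here are illustrative.

```python
# Regularizer encouraging f(X) ⊥ Z | Y, approximated by matching the mean
# predicted distributions across Z groups within each label class.
import torch

def conditional_independence_penalty(probs, y, z):
    """probs: (N, C) predicted class probabilities; y, z: (N,) integer tensors."""
    penalty = probs.new_zeros(())
    for label in y.unique():
        mask = y == label
        g0, g1 = probs[mask & (z == 0)], probs[mask & (z == 1)]
        if len(g0) and len(g1):
            penalty = penalty + (g0.mean(0) - g1.mean(0)).pow(2).sum()
    return penalty
```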

Way 2: invariance across environments

The training data are viewed as arising from a finite set of environments: each environment is endowed with its own distribution over causes, but the causal relationship between X and Y is invariant across environments (environmental invariance). The goal is to learn a predictor that works well across a set of causally-compatible domains (domain generalization).
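
A common instantiation of this idea is an invariant-risk-minimization-style objective over the labeled environments. The sketch below uses an IRMv1-style gradient penalty, which is one possible choice rather than the survey's prescription; `model`, `env_batches`, and `lambda_irm` are illustrative names.

```python
# Per-environment risk plus a penalty that is zero only when the classifier is
# simultaneously optimal for each environment (IRMv1-style gradient penalty).
import torch
import torch.nn.functional as F

def irm_penalty(logits, y):
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.cross_entropy(logits * scale, y)
    (grad,) = torch.autograd.grad(loss, [scale], create_graph=True)
    return grad.pow(2).sum()

def total_loss(model, env_batches, lambda_irm=1.0):
    """env_batches: list of (x, y) pairs, one per environment."""
    risks, penalties = [], []
    for x, y in env_batches:
        logits = model(x)
        risks.append(F.cross_entropy(logits, y))
        penalties.append(irm_penalty(logits, y))
    return torch.stack(risks).mean() + lambda_irm * torch.stack(penalties).mean()
```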

Way 3: controlling for confounding

Control for confounding by adjusting for the confounder Z, so that the adjusted predictive distribution is $$ \tilde{P}(Y \mid X)=\sum_{z} P(Y \mid X, Z=z)\, P(Z=z). $$
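
A minimal sketch of this adjustment is below: average a conditional model $P(Y \mid X, Z=z)$ over the marginal distribution of Z. The `cond_model` interface and the discrete, known Z are illustrative assumptions.

```python
# Adjusted prediction: sum_z P(Y | X = x, Z = z) * P(Z = z).
import numpy as np

def adjusted_prediction(cond_model, x, z_values, z_probs):
    return sum(p_z * cond_model(x, z) for z, p_z in zip(z_values, z_probs))

# Toy conditional model over two classes with a binary confounder.
toy = lambda x, z: np.array([0.9, 0.1]) if z == 0 else np.array([0.4, 0.6])
print(adjusted_prediction(toy, x="some text", z_values=[0, 1], z_probs=[0.7, 0.3]))
# -> [0.75 0.25]
```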

These distributional approaches require richer training data than in the typical supervised learning setup: either explicit labels Z for the causes of the text that should not influence the prediction, or access to data gathered from multiple labeled environments. Whether obtaining such data is easier than creating counterfactual instances depends on the situation.

Fairness and bias

NLP systems inherit and sometimes amplify undesirable biases that are encoded in text training data. A causal analysis is required to determine whether an observed distribution of data and predictions raises fairness concerns. Counterfactual data augmentation has been applied to reduce bias in text classification and in pre-trained contextualized word embedding models.
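
For instance, a simple form of counterfactual data augmentation for gender bias duplicates each training example with gendered terms swapped, so the model sees both versions with the same label. The tiny word-pair list below is an illustrative subset, not a complete resource, and the function names are assumptions.

```python
# Augment a labeled dataset with gender-swapped copies of each example.
GENDER_PAIRS = {"he": "she", "she": "he", "man": "woman", "woman": "man",
                "himself": "herself", "herself": "himself",
                "actor": "actress", "actress": "actor"}

def gender_swap(text: str) -> str:
    return " ".join(GENDER_PAIRS.get(t.lower(), t) for t in text.split())

def augment(dataset):
    """dataset: iterable of (text, label); yields originals plus swapped copies."""
    for text, label in dataset:
        yield text, label
        yield gender_swap(text), label
```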

Causal Model Interpretations

Both attention-based and perturbation-based methods have important limitations. Attention-based explanations can be misleading and are generally possible only for individual tokens. Existing perturbation-based methods often generate implausible counterfactuals. Neither family allows predictions to be explained in terms of more abstract linguistic or sentence-level concepts.

From a causal inference perspective, a natural approach to explanation is to generate counterfactual examples and then compare the prediction for each example with the prediction for its counterfactual. A complementary approach is to generate counterfactuals with minimal changes that obtain a different model prediction. Such examples serve as explanations because they reveal the changes required to flip a model’s prediction.
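
The first strategy amounts to estimating the effect of a concept on the model's output by averaging prediction differences across example/counterfactual pairs. In the sketch below, `model`, `make_counterfactual`, and the class index are all assumptions for illustration.

```python
# Mean change in the predicted probability of a class when each text is
# replaced by its counterfactual with respect to the concept of interest.
import numpy as np

def average_counterfactual_effect(model, texts, make_counterfactual, class_idx=1):
    diffs = [model(make_counterfactual(t))[class_idx] - model(t)[class_idx]
             for t in texts]
    return float(np.mean(diffs))
```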

Another solution is to identify invariances in a given trained model, rather than enforcing them during training.