Causally informed regression modeling: The dative alternation

corpus linguistics
regression
bias
causal inference
binary data
This blog post describes how assumed causal relations among variables may inform the analysis and interpretation of corpus-linguistic datasets. For illustration, I will rely on Bresnan et al.’s (2007) data on the English dative alternation.
Author: Lukas Sönning
Affiliation: University of Bamberg
Published: January 16, 2026

R setup
library(rethinking)
library(dagitty)
library(languageR)
library(marginaleffects)
library(lattice)
library(gtsummary)
library(uls)

A complication that arises in the analysis of corpus data is the fact that predictor variables often show varying levels of association. These associations, which are usually discussed under the label “(multi-)collinearity”, may reflect causal relations between predictor variables. This blog post is a companion to Sönning (2025), and it describes how to incorporate information about such causal relations into regression modeling.

Data

For illustration, I will use Bresnan et al.’s (2007) data on the English dative alternation, which are available in the R package {languageR} (Baayen and Shafaei-Bajestan 2019). The dative alternation refers to the choice between two realizations of the recipient in a ditransitive construction:

  • As a noun phrase (NP): She gave her parents fresh flowers.
  • As a prepositional phrase (PP): She gave fresh flowers to her parents.

Viewed from a different perspective, this alternation is about the order of the theme and the recipient.

  • Recipient first: She gave her parents fresh flowers.
  • Theme first: She gave fresh flowers to her parents.

The realization of the recipient is the response variable, which we will code as follows:

  • NP = 1: Recipient is realized as a noun phrase (i.e. it appears first)
  • NP = 0: Recipient is realized as a prepositional phrase (i.e. the theme appears first)

In line with earlier research on this alternation, the focus will be on four characteristics of nominal clause constituents, i.e. characteristics of the theme and of the recipient:

  • discourse status: whether the constituent is given or not-given (new or accessible) (binary)
  • pronominality: whether the constituent is realized as pronoun or not (binary)
  • length: in number of words, log-transformed (continuous)
  • definiteness: whether the constituent is definite or not (binary)

To simplify the analysis task, I omit a number of predictors (e.g. animacy). I also reduce the data to the verb give, the most frequent type in the dataset. This yields a subsample of 1,666 tokens. Throughout this blog post, recipients and themes will be treated separately.

prepare data
dative_give <- subset(dative, Verb == "give")

dative_give$NP <- ifelse(dative_give$RealizationOfRecipient == "NP", 1, 0)

dative_give$log_length_thm <- log(dative_give$LengthOfTheme)
dative_give$log_length_rec <- log(dative_give$LengthOfRecipient)

dative_give$given_rec <- ifelse(dative_give$AccessOfRec == "given", 1, 0)
dative_give$given_thm <- ifelse(dative_give$AccessOfTheme == "given", 1, 0)

dative_give$pronoun_thm <- ifelse(dative_give$PronomOfTheme == "pronominal", 1, 0)
dative_give$pronoun_rec <- ifelse(dative_give$PronomOfRec == "pronominal", 1, 0)

dative_give$definite_thm <- ifelse(dative_give$DefinOfTheme == "definite", 1, 0)
dative_give$definite_rec <- ifelse(dative_give$DefinOfRec == "definite", 1, 0)
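
As a quick sanity check (not part of the original code), we can confirm that the subset matches the 1,666 tokens mentioned above and look at the overall share of NP realizations:

nrow(dative_give)     # should return 1666, the subsample size quoted above
mean(dative_give$NP)  # overall proportion of recipients realized as an NP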

DAG

As discussed in more detail in Sönning (2025), the directed acyclic graph (DAG) in Figure 1 represents the causal relations that are assumed to hold between the variables. Note that this causal constellation applies to both the theme and the recipient. A full DAG would therefore also include a horizontally flipped mirror image of the constellation shown in Figure 1, with one half representing characteristics of the theme and the other half (not shown in Figure 1) representing characteristics of the recipient. To keep things simple, I will avoid this redundancy. Note, however, that this means we assume characteristics of the theme to be independent of characteristics of the recipient; this seems reasonable.

draw DAG
dag_dative <- dagitty( "dag{DiscourseStatus -> Pronominality; 
                       Pronominality -> Length; 
                       Length -> Realization; 
                       DiscourseStatus -> Definiteness;
                       DiscourseStatus -> Length;
                       Pronominality -> Realization;
                       Definiteness -> Realization;
                       DiscourseStatus -> Realization}")

coordinates(dag_dative) <- list(
  x=c(DiscourseStatus = 0, Pronominality = 1, Length = 1, Definiteness = 1, Realization = 2),
  y=c(DiscourseStatus = 1, Pronominality = 3, Length = 1.8, Definiteness = 0, Realization = 1)
)

drawdag(dag_dative)
Figure 1: Directed acyclic graph for internal variables characterizing the noun phrases realizing the recipient or the theme in the ditransitive construction.

Naive modeling approach

The typical approach to data of this kind is to fit a regression model and look at the coefficients and standard errors for the individual predictors. Let’s go ahead and fit such a model:

m_full <- glm(
  NP ~ 
    given_thm + log_length_thm + pronoun_thm + definite_thm + 
    given_rec + log_length_rec + pronoun_rec + definite_rec,
  data = dative_give,
  family = binomial)

Information on the regression coefficients is provided in the table below, where the predictors are grouped by semantic role. Reassuringly, the estimates for characteristics of the recipient and the theme are, to a first approximation, mirror images of one another, meaning that they are similar in magnitude but differ in sign. This makes sense, since this alternation is essentially a question about the order of theme and recipient. Since the outcome variable is coded to indicate a noun phrase realization (i.e. recipient first), the following patterns emerge:

  • Given, pronominal, definite, and short recipients forge ahead, favoring the recipient-first order.
  • Given, pronominal, definite, and short themes likewise forge ahead, disfavoring the recipient-first order.
Characteristic         log(OR)   95% CI         p-value
theme: Given            -1.8     -2.7, -0.99    <0.001
theme: Length            1.5      1.1, 1.8      <0.001
theme: Pronoun          -3.2     -4.0, -2.3     <0.001
theme: Definite         -0.85    -1.4, -0.25     0.005
recipient: Given         1.2      0.58, 1.9     <0.001
recipient: Length       -1.5     -1.9, -1.0     <0.001
recipient: Pronoun       2.1      1.3, 2.9      <0.001
recipient: Definite      0.78     0.21, 1.4      0.008
Abbreviation: OR = Odds Ratio
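
For reference, a table of this kind can be produced from the fitted model with the {gtsummary} package loaded in the setup chunk. A minimal sketch is shown below; the grouping of predictors by semantic role and the row labels used above would require additional formatting options.

gtsummary::tbl_regression(m_full)   # log odds ratios with 95% CIs and p-values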

Before studying this regression table in any detail, it is important to bear in mind what the coefficients mean. They reflect the association between predictor and outcome that remains once we adjust for all the other variables in the model. Since these coefficients are blind to the underlying causal structure, however, the DAG in Figure 1 is needed to put them into perspective.

Direct vs. indirect vs. total causal effects

In the model we have fit, the coefficient for discourse status, for instance, along with its standard error, reflects the direct causal effect of this variable. This is to say that the model compares a given and a new recipient (or theme, as the case may be) that are identical on all other variables: They have the same pronominality, definiteness, and length.

Let’s use the {marginaleffects} package (Arel-Bundock, Greifer, and Heiss 2024) to get an estimate of the direct causal effect of recipient discourse status (on the proportion scale). It turns out to be quite small (88% NPs for a given recipient vs. 82% for a recipient that is not given).

marginaleffects::avg_predictions(
  m_full,
  variables = "given_rec"
)

 given_rec Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
         0    0.815    0.01218 66.9   <0.001 Inf 0.791  0.839
         1    0.876    0.00899 97.5   <0.001 Inf 0.859  0.894

Type:  response 

According to Figure 1, however, the predictor discourse status also affects the realization of the recipient indirectly, through other causal channels. Specifically, there are four other pathways:

  • discourse status -> definiteness -> NP
  • discourse status -> length -> NP
  • discourse status -> pronominality -> NP
  • discourse status -> pronominality -> length -> NP
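
These pathways can also be listed programmatically from the DAG object defined above. A minimal sketch using {dagitty}; it should return the direct path together with the four indirect ones:

# enumerate all directed (causal) paths from discourse status to the outcome
dagitty::paths(dag_dative,
               from = "DiscourseStatus", to = "Realization",
               directed = TRUE)$paths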

Aggregating over these four indirect pathways, we obtain the indirect effect of discourse status on the realization of the recipient. Once a DAG is drawn, then, we can distinguish between direct effects and indirect effects. The direct effect is represented by the arrow going from discourse status into NP:

  • discourse status -> NP

The indirect effect, on the other hand, is the sum of the indirect causal pathways. The total effect of discourse status is the sum of the direct and all indirect effects.

Importantly, the coefficients returned by regression models always reflect direct effects. This means that if the model includes predictors that transmit indirect effects, these indirect effects will not be reflected in the coefficients. If we would like a regression coefficient to return an estimate of the total causal effect of a predictor, we must leave all indirect pathways open. This means that we must not adjust for those variables that transmit an indirect effect. According to Figure 1, a regression model that addresses the total causal effect of discourse status must not include any of the other predictors.
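
We can double-check this against the DAG: {dagitty} reports which covariates (if any) need to be adjusted for to identify the total effect of discourse status. For the DAG in Figure 1, the expected answer is the empty set, since discourse status has no causes in the graph. A sketch:

# adjustment set for the TOTAL effect of discourse status
# (expected: the empty set, i.e. no covariates need to be included)
dagitty::adjustmentSets(dag_dative,
                        exposure = "DiscourseStatus",
                        outcome  = "Realization",
                        effect   = "total")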

Let us fit such a model:

m_marginal_given <- glm(
  NP ~ given_rec, 
  family = binomial,
  data = dative_give)

And then we use {marginaleffects} to get an estimate of the total causal effect of the discourse status of the recipient (again on the proportion scale). This association is much larger: 93% NPs for a given recipient vs. 57% NPs for a recipient that is not given.

marginaleffects::avg_predictions(
  m_marginal_given,
  variables = "given_rec"
)

 given_rec Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
         0    0.567    0.02587  21.9   <0.001 351.1 0.516  0.617
         1    0.925    0.00729 126.9   <0.001   Inf 0.911  0.940

Type:  response 

The first key question, then, is whether you are interested in the direct or the total effect of a predictor. Both are usually meaningful (and relevant), and it therefore seems reasonable to consider both. It is arguably only in combination, and alongside a DAG, that these coefficients tell the linguistic story behind the data.

The distinction between direct, indirect, and total effects is the first key insight a DAG can contribute to our interpretation of the data.

Confounding

Another important theme in multifactorial modeling is confounding. Thus, in the data at hand, we observe an overall association between definiteness and NP. This is reflected in a regression model including definiteness as the only predictor:

m_marginal_definite <- glm(
  NP ~ definite_rec, 
  family = binomial,
  data = dative_give)

Using {marginaleffects}, we can get an estimate of the overall association between recipient definiteness and the response variable on the proportion scale: 88% of the definite recipients are realized as an NP, vs. 62% of the indefinite recipients.

marginaleffects::avg_predictions(
  m_marginal_definite,
  variables = "definite_rec"
)

 definite_rec Estimate Std. Error     z Pr(>|z|)     S 2.5 % 97.5 %
            0    0.616    0.03529  17.5   <0.001 224.1 0.547  0.685
            1    0.876    0.00858 102.1   <0.001   Inf 0.859  0.893

Type:  response 

According to Figure 1, this association is at least partly spurious. This is due to the confounder discourse status, i.e. a third variable which has a causal effect both on definiteness and on NP. In such a situation, we must adjust for the confounding variable discourse status to get a more accurate assessment of the effect of definiteness on NP. There are various strategies for “getting rid of” the spurious part of the association between definiteness and NP:

  • Data analysis: We could include the confounder as an adjustment variable into the analysis
  • Study design: We could control the distribution of the confounding variable in the data, so that definite and indefinite constituents do not differ systematically in terms of discourse status; this could be done via matching or blocking/stratification
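
As a quick descriptive check (not part of the original analysis), we can verify that the confounder is indeed associated with definiteness in the data: if discourse status confounds the definiteness–NP association, given and non-given recipients should differ in their rate of definiteness.

# cross-tabulate discourse status and definiteness of the recipient
# (row proportions: share of definite recipients among given vs. non-given)
with(dative_give, prop.table(table(given_rec, definite_rec), margin = 1))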

Let’s pursue the first strategy and include the confounder discourse status as an adjustment variable into the model:

m_total_definite <- glm(
  NP ~ definite_rec + given_rec, 
  family = binomial,
  data = dative_give)

Having adjusted for discourse status, there is essentially no association left between definiteness and the outcome variable:

marginaleffects::avg_predictions(
  m_total_definite,
  variables = "definite_rec"
)

 definite_rec Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
            0    0.855    0.01770 48.3   <0.001 Inf 0.820  0.889
            1    0.844    0.00968 87.1   <0.001 Inf 0.825  0.863

Type:  response 

Another important function of DAGs, then, is that they inform us about spurious patterns of associations in the data. This means that they allow us to recognize whether pairwise, overall associations between variables are due to confounding or whether they reflect a genuine causal relationship.

Difference between overall and adjusted association

To recapitulate, for both discourse status and definiteness, the marginal (overall) association with the response variable differs from the adjusted association:

  • For discourse status, the marginal association is larger than the adjusted one. According to Figure 1, this is because it reflects the total causal effect of this predictor, which includes the indirect pathways.
  • For definiteness, the adjusted association is much smaller than the marginal one. According to Figure 1, this is because the marginal association is confounded by the variable discourse status.

To do causally informed regression modeling, we must learn how to read relevant information from a DAG:

  • Which variable(s) do I need to adjust for to obtain information on a direct causal effect?
  • Which variable(s) do I need to adjust for to obtain information on a total causal effect?
  • Which variable(s) do I need to adjust for to block spurious paths?

However, this only scratches the surface of causal inference. For more background, I can recommend Pearl, Glymour, and Jewell (2016), McElreath (2020), and Cunningham (2021).

Application to the dative alternation

Let us now apply these insights to the dative alternation. For each predictor variable, we will obtain three quantities. All of these are differences between proportions, i.e. measures of association:

  • The first is an estimate of the overall association with the response variable, i.e. not adjusting for any other variables. I will refer to this as the marginal association
  • The second is an estimate of the total causal effect, which requires us to use a model that (i) adjusts for confounders and (ii) does not adjust for variables transmitting relevant indirect effects
  • The third is an estimate of the direct causal effect, which requires us to use a model that adjusts for confounders as well as for the mediators that transmit indirect effects

More generally, the structure of a DAG tells us what kind of model specification will produce a coefficient with the desired interpretation, i.e. a direct or a total effect. In order to arrive at unbiased estimates of both types of effects, so-called backdoor paths, which often reflect confounding, need to be closed. This is usually accomplished by including certain variables into the model. For definiteness, for instance, we need to adjust for discourse status to block confounding paths.

If we want to estimate the total effect of a variable in a situation where there are mediators through which it transmits indirect effects, these causal pathways must remain open, which is to say that no adjustment should take place. For discourse status, for instance, we must not adjust for any variables, since all of them (may) operate as mediators and therefore contribute to its overall causal association with the outcome.
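
The required adjustment sets can again be read off the DAG programmatically. As a sketch, the following {dagitty} queries (using the dag_dative object defined above) should return the adjustment sets that underlie the model specifications listed below: discourse status for the total effects of pronominality and definiteness, discourse status plus pronominality for length, and all three mediators for the direct effect of discourse status.

# total effects: which variables close the backdoor (confounding) paths?
dagitty::adjustmentSets(dag_dative, exposure = "Pronominality",
                        outcome = "Realization", effect = "total")
dagitty::adjustmentSets(dag_dative, exposure = "Definiteness",
                        outcome = "Realization", effect = "total")
dagitty::adjustmentSets(dag_dative, exposure = "Length",
                        outcome = "Realization", effect = "total")

# direct effect of discourse status: the mediators must be adjusted for as well
dagitty::adjustmentSets(dag_dative, exposure = "DiscourseStatus",
                        outcome = "Realization", effect = "direct")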

These quantities are addressed by different model specifications. The marginal associations are straightforward to obtain:

  • Marginal association
    • NP ~ given (Discourse status)
    • NP ~ pronoun (Pronominality)
    • NP ~ definite (Definiteness)
    • NP ~ length (Length)

According to Figure 1, the following model specifications estimate the total causal association of each predictor with the outcome.

  • Total causal effect
    • NP ~ given (Discourse status)
    • NP ~ given + pronoun (Pronominality)
    • NP ~ given + definite (Definiteness)
    • NP ~ given + pronoun + length (Length)

And a regression model including all predictors may serve to address the direct causal effect for each variable:

  • Direct causal effect
    • NP ~ given + pronoun + definite + length (all predictors)

We start by fitting all of these models, separately for predictors tied to the recipient and the theme.

fit models
# Recipient
#-------------------------------------------------------------------------------

# marginal associations
#----------------------

m_marginal_given <- glm(
  NP ~ given_rec, 
  family = binomial,
  data = dative_give)

m_marginal_pronoun <- glm(
  NP ~ pronoun_rec, 
  family = binomial,
  data = dative_give)

m_marginal_length <- glm(
  NP ~ log_length_rec, 
  family = binomial,
  data = dative_give)

m_marginal_definite <- glm(
  NP ~ definite_rec, 
  family = binomial,
  data = dative_give)


# total causal effects
#---------------------

m_total_pronoun <- glm(
  NP ~ pronoun_rec + given_rec, 
  family = binomial,
  data = dative_give)

m_total_length <- glm(
  NP ~ log_length_rec + given_rec + pronoun_rec, 
  family = binomial,
  data = dative_give)

m_total_definite <- glm(
  NP ~ definite_rec + given_rec, 
  family = binomial,
  data = dative_give)



# Theme
#-------------------------------------------------------------------------------

# marginal associations
#----------------------

m_marginal_given_thm <- glm(
  NP ~ given_thm, 
  family = binomial,
  data = dative_give)

m_marginal_pronoun_thm <- glm(
  NP ~ pronoun_thm, 
  family = binomial,
  data = dative_give)

m_marginal_length_thm <- glm(
  NP ~ log_length_thm, 
  family = binomial,
  data = dative_give)

m_marginal_definite_thm <- glm(
  NP ~ definite_thm, 
  family = binomial,
  data = dative_give)


# total causal effects
#---------------------

m_total_pronoun_thm <- glm(
  NP ~ pronoun_thm + given_thm, 
  family = binomial,
  data = dative_give)

m_total_length_thm <- glm(
  NP ~ log_length_thm + given_thm + pronoun_thm, 
  family = binomial,
  data = dative_give)

m_total_definite_thm <- glm(
  NP ~ definite_thm + given_thm, 
  family = binomial,
  data = dative_give)

Then we use {marginaleffects} to express the relevant estimates on the proportion scale.

form predictions
# Recipient
#-------------------------------------------------------------------------------

# given
#------

# direct causal effect
given_direct_preds_rec <- marginaleffects::avg_predictions(
  m_full,
  variables = "given_rec"
)
given_direct_comp_rec <- marginaleffects::avg_comparisons(
  m_full,
  variables = "given_rec"
)

# marginal association
given_marginal_preds_rec <- marginaleffects::avg_predictions(
  m_marginal_given,
  variables = "given_rec"
)
given_marginal_comp_rec <- marginaleffects::avg_comparisons(
  m_marginal_given,
  variables = "given_rec"
)

# total association
given_total_preds_rec <- marginaleffects::avg_predictions(
  m_marginal_given,
  variables = "given_rec"
)
given_total_comp_rec <- marginaleffects::avg_comparisons(
  m_marginal_given,
  variables = "given_rec"
)



# pronoun
#--------

# direct causal effect
pronoun_direct_preds_rec <- marginaleffects::avg_predictions(
  m_full,
  variables = "pronoun_rec"
)
pronoun_direct_comp_rec <- marginaleffects::avg_comparisons(
  m_full,
  variables = "pronoun_rec"
)

# marginal association
pronoun_marginal_preds_rec <- marginaleffects::avg_predictions(
  m_marginal_pronoun,
  variables = "pronoun_rec"
)
pronoun_marginal_comp_rec <- marginaleffects::avg_comparisons(
  m_marginal_pronoun,
  variables = "pronoun_rec"
)

# total association
pronoun_total_preds_rec <- marginaleffects::avg_predictions(
  m_total_pronoun,
  variables = "pronoun_rec"
)
pronoun_total_comp_rec <- marginaleffects::avg_comparisons(
  m_total_pronoun,
  variables = "pronoun_rec"
)


# definite
#---------

# direct causal effect
definite_direct_preds_rec <- marginaleffects::avg_predictions(
  m_full,
  variables = "definite_rec"
)
definite_direct_comp_rec <- marginaleffects::avg_comparisons(
  m_full,
  variables = "definite_rec"
)

# marginal association
definite_marginal_preds_rec <- marginaleffects::avg_predictions(
  m_marginal_definite,
  variables = "definite_rec"
)
definite_marginal_comp_rec <- marginaleffects::avg_comparisons(
  m_marginal_definite,
  variables = "definite_rec"
)

# total association
definite_total_preds_rec <- marginaleffects::avg_predictions(
  m_total_definite,
  variables = "definite_rec"
)
definite_total_comp_rec <- marginaleffects::avg_comparisons(
  m_total_definite,
  variables = "definite_rec"
)


# length
#-------

# direct causal effect
length_direct_preds_rec <- marginaleffects::avg_predictions(
  m_full,
  variables = list(log_length_rec = log(c(1, 4))))

length_direct_comp_rec <- marginaleffects::avg_comparisons(
  m_full,
  variables = list(log_length_rec = log(c(1, 4))))

# marginal association
length_marginal_preds_rec <- marginaleffects::avg_predictions(
  m_marginal_length,
  variables = list(log_length_rec = log(c(1, 4))))

length_marginal_comp_rec <- marginaleffects::avg_comparisons(
  m_marginal_length,
  variables = list(log_length_rec = log(c(1, 4))))

# total association
length_total_preds_rec <- marginaleffects::avg_predictions(
  m_total_length,
  variables = list(log_length_rec = log(c(1, 4))))

length_total_comp_rec <- marginaleffects::avg_comparisons(
  m_total_length,
  variables = list(log_length_rec = log(c(1, 4))))



# Theme
#-------------------------------------------------------------------------------

# given
#------

# direct causal effect
given_direct_preds_thm <- marginaleffects::avg_predictions(
  m_full,
  variables = "given_thm"
)
given_direct_comp_thm <- marginaleffects::avg_comparisons(
  m_full,
  variables = "given_thm"
)

# marginal association
given_marginal_preds_thm <- marginaleffects::avg_predictions(
  m_marginal_given_thm,
  variables = "given_thm"
)
given_marginal_comp_thm <- marginaleffects::avg_comparisons(
  m_marginal_given_thm,
  variables = "given_thm"
)

# total association
given_total_preds_thm <- marginaleffects::avg_predictions(
  m_marginal_given_thm,
  variables = "given_thm"
)
given_total_comp_thm <- marginaleffects::avg_comparisons(
  m_marginal_given_thm,
  variables = "given_thm"
)


# pronoun
#--------

# direct causal effect

pronoun_direct_preds_thm <- marginaleffects::avg_predictions(
  m_full,
  variables = "pronoun_thm"
)
pronoun_direct_comp_thm <- marginaleffects::avg_comparisons(
  m_full,
  variables = "pronoun_thm"
)

# marginal association

pronoun_marginal_preds_thm <- marginaleffects::avg_predictions(
  m_marginal_pronoun_thm,
  variables = "pronoun_thm"
)
pronoun_marginal_comp_thm <- marginaleffects::avg_comparisons(
  m_marginal_pronoun_thm,
  variables = "pronoun_thm"
)

# total association

pronoun_total_preds_thm <- marginaleffects::avg_predictions(
  m_total_pronoun_thm,
  variables = "pronoun_thm"
)
pronoun_total_comp_thm <- marginaleffects::avg_comparisons(
  m_total_pronoun_thm,
  variables = "pronoun_thm"
)


# definite
#---------

# direct causal effect

definite_direct_preds_thm <- marginaleffects::avg_predictions(
  m_full,
  variables = "definite_thm"
)
definite_direct_comp_thm <- marginaleffects::avg_comparisons(
  m_full,
  variables = "definite_thm"
)

# marginal association

definite_marginal_preds_thm <- marginaleffects::avg_predictions(
  m_marginal_definite_thm,
  variables = "definite_thm"
)
definite_marginal_comp_thm <- marginaleffects::avg_comparisons(
  m_marginal_definite_thm,
  variables = "definite_thm"
)

# total association

definite_total_preds_thm <- marginaleffects::avg_predictions(
  m_total_definite_thm,
  variables = "definite_thm"
)
definite_total_comp_thm <- marginaleffects::avg_comparisons(
  m_total_definite_thm,
  variables = "definite_thm"
)


# length
#-------

# direct causal effect

length_direct_preds_thm <- marginaleffects::avg_predictions(
  m_full,
  variables = list(log_length_thm = log(c(1, 4))))

length_direct_comp_thm <- marginaleffects::avg_comparisons(
  m_full,
  variables = list(log_length_thm = log(c(1, 4))))

# marginal association

length_marginal_preds_thm <- marginaleffects::avg_predictions(
  m_marginal_length_thm,
  variables = list(log_length_thm = log(c(1, 4))))

length_marginal_comp_thm <- marginaleffects::avg_comparisons(
  m_marginal_length_thm,
  variables = list(log_length_thm = log(c(1, 4))))


# total association

length_total_preds_thm <- marginaleffects::avg_predictions(
  m_total_length_thm,
  variables = list(log_length_thm = log(c(1, 4))))

length_total_comp_thm <- marginaleffects::avg_comparisons(
  m_total_length_thm,
  variables = list(log_length_thm = log(c(1, 4))))

Figure 2, which graphs these estimates, is quite rich in information. The graph is structured as follows:

  • There are four rows, one per predictor variable
  • The left panel shows estimates for theme-related predictors (right panel: characteristics of the recipient)
  • The plotting symbols show the predicted difference in the share of NP realizations as a measure of association (or effect size)
    • M: marginal association
    • T: total causal effect
    • D: direct causal effect
draw plots
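# p1: dot chart for recipient-related predictors (printed as the right panel of Figure 2)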
p1 <- xyplot(1~1, type = "n", xlim = c(-.05, .55), ylim = c(.5, 4.5),
       par.settings = lattice_ls, #axis = axis_bottom,
       scales = list(y = list(draw = FALSE),
       x = list(at = c(0, .2, .4)),
       labels = c("0", "+20", "+40")),
       xlab = list(label = "Predicted difference in the share of NP realizations (percentage points)                                                            "),
       ylab = NULL,
       panel = function(x,y){
         panel.abline(h = c(1.5, 2.5, 3.5), col = "lightgrey")
         panel.abline(v = 0)
         panel.segments(
           x0 = c(given_total_comp_rec$conf.low,
                  pronoun_total_comp_rec$conf.low,
                  definite_total_comp_rec$conf.low,
                  -length_total_comp_rec$conf.low),
           x1 = c(given_total_comp_rec$conf.high,
                  pronoun_total_comp_rec$conf.high,
                  definite_total_comp_rec$conf.high,
                  -length_total_comp_rec$conf.high),     
           y0 = (4:1), y1 = (4:1), col = "grey"
         )
         panel.segments(
           x0 = c(given_marginal_comp_rec$conf.low,
                  pronoun_marginal_comp_rec$conf.low,
                  definite_marginal_comp_rec$conf.low,
                  -length_marginal_comp_rec$conf.low),
           x1 = c(given_marginal_comp_rec$conf.high,
                  pronoun_marginal_comp_rec$conf.high,
                  definite_marginal_comp_rec$conf.high,
                  -length_marginal_comp_rec$conf.high),     
           y0 = (4:1)+.2, y1 = (4:1)+.2, col = "grey"
         )
         panel.segments(
           x0 = c(given_direct_comp_rec$conf.low,
                  pronoun_direct_comp_rec$conf.low,
                  definite_direct_comp_rec$conf.low,
                  -length_direct_comp_rec$conf.low),
           x1 = c(given_direct_comp_rec$conf.high,
                  pronoun_direct_comp_rec$conf.high,
                  definite_direct_comp_rec$conf.high,
                  -length_direct_comp_rec$conf.high),     
           y0 = (4:1)-.2, y1 = (4:1)-.2, col = "grey"
         )
         
         
         panel.points(x = c(
           given_total_comp_rec$estimate,
           pronoun_total_comp_rec$estimate,
           definite_total_comp_rec$estimate,
           -length_total_comp_rec$estimate
         ), y = (4:1), pch = "T", cex = 2.2, col = "white", fontface = "bold")
         
         panel.points(x = c(
           given_marginal_comp_rec$estimate,
           pronoun_marginal_comp_rec$estimate,
           definite_marginal_comp_rec$estimate,
           -length_marginal_comp_rec$estimate
         ), y = (4:1) + .2, pch = "M", cex = 2.2, col = "white", fontface = "bold")
         
         panel.points(x = c(
           given_direct_comp_rec$estimate,
           pronoun_direct_comp_rec$estimate,
           definite_direct_comp_rec$estimate,
           -length_direct_comp_rec$estimate
         ), y = (4:1) - .2, pch = "D", cex = 2.2, col = "white", fontface = "bold")
         
         
         panel.points(x = c(
           given_total_comp_rec$estimate,
           pronoun_total_comp_rec$estimate,
           definite_total_comp_rec$estimate,
           -length_total_comp_rec$estimate
         ), y = (4:1), pch = "T", cex = 2)
         
         panel.points(x = c(
           given_marginal_comp_rec$estimate,
           pronoun_marginal_comp_rec$estimate,
           definite_marginal_comp_rec$estimate,
           -length_marginal_comp_rec$estimate
         ), y = (4:1) + .2, pch = "M", cex = 2)
         
         panel.points(x = c(
           given_direct_comp_rec$estimate,
           pronoun_direct_comp_rec$estimate,
           definite_direct_comp_rec$estimate,
           -length_direct_comp_rec$estimate
         ), y = (4:1) - .2, pch = "D", cex = 2)
         
         panel.text(x = 0, y = 5, label = "Higher if the RECIPIENT is:", cex = .9, adj = 0)

         
         panel.segments(x0=-.05, x1 = .55, y0 = .5, y1 = .5)
         panel.segments(x0=seq(0, .4, .2), x1=seq(0, .4, .2),
                        y0 = .5, y1 = .4)
       })



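# p2: dot chart for theme-related predictors (printed as the left panel of Figure 2)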
p2 <- xyplot(1~1, type = "n", xlim = c(-.85, .05), ylim = c(.5, 4.5),
       par.settings = lattice_ls, #axis = axis_bottom,
       scales = list(y = list(at = 4:1, labels = NULL),
                     x = list(at = c(-.8, -.6, -.4, -.2, 0)),
       labels = c("\u221280", "\u221260", "\u221240", "\u221220", "0")),
       xlab = list(label = " ", lineheight = .8),
       ylab = NULL,
       panel = function(x,y){
         panel.abline(h = c(1.5, 2.5, 3.5), col = "lightgrey")
         panel.abline(v = 0)
         panel.segments(
           x0 = c(given_total_comp_thm$conf.low,
                  pronoun_total_comp_thm$conf.low,
                  definite_total_comp_thm$conf.low,
                  -length_total_comp_thm$conf.low),
           x1 = c(given_total_comp_thm$conf.high,
                  pronoun_total_comp_thm$conf.high,
                  definite_total_comp_thm$conf.high,
                  -length_total_comp_thm$conf.high),     
           y0 = (4:1), y1 = (4:1), col = "grey"
         )
         panel.segments(
           x0 = c(given_marginal_comp_thm$conf.low,
                  pronoun_marginal_comp_thm$conf.low,
                  definite_marginal_comp_thm$conf.low,
                  -length_marginal_comp_thm$conf.low),
           x1 = c(given_marginal_comp_thm$conf.high,
                  pronoun_marginal_comp_thm$conf.high,
                  definite_marginal_comp_thm$conf.high,
                  -length_marginal_comp_thm$conf.high),     
           y0 = (4:1)+.2, y1 = (4:1)+.2, col = "grey"
         )
         panel.segments(
           x0 = c(given_direct_comp_thm$conf.low,
                  pronoun_direct_comp_thm$conf.low,
                  definite_direct_comp_thm$conf.low,
                  -length_direct_comp_thm$conf.low),
           x1 = c(given_direct_comp_thm$conf.high,
                  pronoun_direct_comp_thm$conf.high,
                  definite_direct_comp_thm$conf.high,
                  -length_direct_comp_thm$conf.high),     
           y0 = (4:1)-.2, y1 = (4:1)-.2, col = "grey"
         )
         
         panel.points(x = c(
           given_total_comp_thm$estimate,
           pronoun_total_comp_thm$estimate,
           definite_total_comp_thm$estimate,
           -length_total_comp_thm$estimate
         ), y = (4:1), pch = "T", cex = 2.2, col = "white", fontface = "bold")
         
         panel.points(x = c(
           given_marginal_comp_thm$estimate,
           pronoun_marginal_comp_thm$estimate,
           definite_marginal_comp_thm$estimate,
           -length_marginal_comp_thm$estimate
         ), y = (4:1) + .2, pch = "M", cex = 2.2, col = "white", fontface = "bold")
         
         panel.points(x = c(
           given_direct_comp_thm$estimate,
           pronoun_direct_comp_thm$estimate,
           definite_direct_comp_thm$estimate,
           -length_direct_comp_thm$estimate
         ), y = (4:1) - .2, pch = "D", cex = 2.2, col = "white", fontface = "bold")
         
         panel.points(x = c(
           given_total_comp_thm$estimate,
           pronoun_total_comp_thm$estimate,
           definite_total_comp_thm$estimate,
           -length_total_comp_thm$estimate
         ), y = (4:1), pch = "T", cex = 2)
         
         panel.points(x = c(
           given_marginal_comp_thm$estimate,
           pronoun_marginal_comp_thm$estimate,
           definite_marginal_comp_thm$estimate,
           -length_marginal_comp_thm$estimate
         ), y = (4:1) + .2, pch = "M", cex = 2)
         
         panel.points(x = c(
           given_direct_comp_thm$estimate,
           pronoun_direct_comp_thm$estimate,
           definite_direct_comp_thm$estimate,
           -length_direct_comp_thm$estimate
         ), y = (4:1) - .2, pch = "D", cex = 2)
         
         panel.text(x = 0, y = 5, label = "Lower if the THEME is:", cex = .9, adj=1)
         panel.text(x = .3, y = 5.5, label = "The percentage of NP realizations is:", cex = .9)
        panel.text(x = .28, y = 4:1, label = c(
           "given\n(vs. other)",
           "pronominal\n(vs. other)",
           "definite\n(vs. indefinite)",
           "shorter\n(1 vs. 4 words)"
         ), cex = .9, lineheight = .8)
         
         panel.segments(x0=-.85, x1 = .05, y0 = .5, y1 = .5)
         panel.segments(x0=seq(-.8, 0, .2), x1=seq(-.8, 0, .2),
                        y0 = .5, y1 = .4)
       })


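# arrange the panels: theme (p2) on the left, recipient (p1) on the right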
print(p2, position=c(-.04,0,.46,.84), more = TRUE)
print(p1, position=c(.63,0,.96,.84))
Figure 2: Marginal association (M), total causal effect (T), and direct causal effect (D) of each predictor, expressed as the predicted difference in the share of NP realizations (percentage points).

Figure 2 allows us to see why there is a difference between the adjusted and the overall (marginal) association of a predictor with the outcome. The variable definiteness, for instance, shows a relatively strong marginal association with the outcome. Overall, there is a 20-point difference in the percentage of NP realizations between definite and indefinite recipients. Almost all of this association, however, is due to the confounder discourse status. Once we adjust for this variable, the association drops to(ward) zero.

Figure 2 compares each predictor’s marginal association (M) with the outcome, its total effect (T), and its direct effect (D). The fact that the constellations resemble mirror images of one another provides reassurance that the causal diagram in Figure 1 is not entirely unreasonable:

  • Discourse status: There is more to discourse status than its direct effect suggests; much of its causal effect (or, in the case of the theme, most of it) flows indirectly.
  • Pronominality: The marginal association of this variable with the outcome conflates a direct effect, an indirect effect (via length), and a confounding relationship with discourse status. Removing the non-causal part that is due to confounding leaves us with a smaller (in the case of the recipient, much smaller) total effect. For recipients, there appears to be little evidence for a direct effect of pronominality, which means that it operates exclusively via the mediator length. For the theme, however, a relatively strong direct effect remains.
  • Definiteness: The marginal association between definiteness and the outcome appears to be largely attributable to confounding; after adjusting for discourse status, there is little evidence for a noticeable total or direct effect.
  • Length: About half of the marginal association seems to be due to the confounders discourse status and pronominality; the total and direct effect are virtually identical since there are no indirect effects flowing from length.

Conclusion

This blog post serves as an online supplement to Sönning (2025), where the topic of causally informed modeling is addressed in more detail, and from a linguistic perspective. The discussion of the dative alternation above was rather selective and left open a number of additional questions that arise in practice. It has shown, however, that we need to be careful not to draw conclusions from regression tables alone, or from the overall (marginal) association between a predictor and the outcome.

References

Arel-Bundock, Vincent, Noah Greifer, and Andrew Heiss. 2024. “How to Interpret Statistical Models Using marginaleffects for R and Python.” Journal of Statistical Software 111 (9): 1–32. https://doi.org/10.18637/jss.v111.i09.
Baayen, R. H., and Elnaz Shafaei-Bajestan. 2019. languageR: Analyzing Linguistic Data: A Practical Introduction to Statistics. https://doi.org/10.32614/CRAN.package.languageR.
Bresnan, Joan, Anna Cueni, Tatiana Nikitina, and R. Harald Baayen. 2007. “Predicting the Dative Alternation.” In Cognitive Foundations of Interpretation, edited by Gerlof Bouma, Irene Krämer, and Joost Zwarts, 69–94. Amsterdam: Royal Netherlands Academy of Science.
Cunningham, Scott. 2021. Causal Inference: The Mixtape. London: Yale University Press.
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman & Hall/CRC. https://doi.org/10.1201/9780429029608.
Pearl, Judea, Madelyn Glymour, and Nicholas P. Jewell. 2016. Causal Inference in Statistics: A Primer. Chichester: Wiley. http://www.vlebooks.com/vleweb/product/openreader?id=LeedsUni&isbn=9781119186854.
Sönning, Lukas. 2025. “Juggling Internal and External Factors When Modeling Natural Language Data.” November 26, 2025. https://osf.io/preprints/psyarxiv/r2p43_v1.

Citation

BibTeX citation:
@online{sönning2026,
  author = {Sönning, Lukas},
  title = {Causally Informed Regression Modeling: {The} Dative
    Alternation},
  date = {2026-01-16},
  url = {https://lsoenning.github.io/posts/2025-10-31_causally_informed_modeling/},
  langid = {en}
}
For attribution, please cite this work as:
Sönning, Lukas. 2026. “Causally Informed Regression Modeling: The Dative Alternation.” January 16, 2026. https://lsoenning.github.io/posts/2025-10-31_causally_informed_modeling/.