SCC Models

library(epicmodel)
library(magrittr)

Sufficient-component cause (SCC) models are the core of the epicmodel package. Creating this package came with interesting discoveries and (re-)interpretations of some underlying concepts. They arise, first and foremost, from the definition of the steplist as well as the model creation process from steplist to SCC model. This vignette tries to summarize what “this thing called SCC model” is to epicmodel.

Why SCC?

Before going into more specific topics, it is worth clarifying what the purpose of a SCC model actually is. As previously described, SCC models are a causal modeling framework, i.e., a specific approach to specifying and structuring causal assumptions. As we learned from Pearl (2009; pp. 38-40), causal inference is only possible based on causal assumptions and, therefore, causal inference methods need to include causal modeling. There are different causal modeling frameworks available, e.g., causal graphs in the form of directed acyclic graphs (DAGs). Different frameworks usually have different perspectives and can complement each other.

SCC models describe a single outcome of interest, but with all of its (known or suspected) causes, which makes them an outcome-focused approach. The SCC framework models the idea that an effect has multiple causes and that only certain combinations of these causes lead to the outcome. The individual causes are called “component causes”, further emphasizing that multiple causes are necessary for the outcome to occur. The combinations of component causes that can lead to the outcome of interest are called “sufficient causes”, emphasizing that each of these sets is enough to cause the outcome. The main modeling task is grouping component causes together to form sufficient causes.

SCC models add an important perspective to causal modeling and therefore to causal inference. Here is, as an example, a quote from an article by Rerknimitr et al. (2017) talking about atopic dermatitis (AD) and filaggrin (FLG):

Although null mutation of the FLG gene poses the strongest risk for AD, 60% of individuals who carry the gene do not have AD symptoms (Irvine et al. 2011). On the contrary, a significant portion of AD patients do not have FLG mutation (Irvine et al. 2011). It is thus evident that additional factors are needed to develop the disease.

Through a SCC lens, the case seems obvious: Null mutations of the FLG gene seem to be a component cause for atopic dermatitis, but they are not part of every sufficient cause. An intuitive understanding of the SCC structure seems especially useful when investigating the effect of interventions. A certain intervention might be very beneficial for certain sufficient causes, i.e., a certain group of affected individuals, but useless for others. Without considering this possibility, the effect of beneficial interventions might be easily missed.
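To make this concrete, here is a minimal base R sketch. The component causes A to D, the two sufficient causes, and the three individuals are all hypothetical and come neither from epicmodel nor from the cited article. Removing component cause B stands in for an intervention, and the sketch compares who develops the outcome before and after.

```r
# Hypothetical sufficient causes, each a set of component causes
sufficient_causes <- list(
  sc1 = c("A", "B"),
  sc2 = c("C", "D")
)

# Hypothetical individuals and the component causes they carry
individuals <- list(
  person1 = c("A", "B"),
  person2 = c("C", "D"),
  person3 = c("A", "B", "C")
)

# The outcome occurs if at least one sufficient cause is completely present
has_outcome <- function(exposure) {
  any(vapply(sufficient_causes, function(sc) all(sc %in% exposure), logical(1)))
}

# Outcome status without intervention
before <- vapply(individuals, has_outcome, logical(1))

# Outcome status after an intervention that removes component cause "B"
after <- vapply(
  individuals,
  function(exposure) has_outcome(setdiff(exposure, "B")),
  logical(1)
)

data.frame(before, after)
```

Under these assumptions, the intervention prevents the outcome for person1 and person3, but not for person2, whose outcome arises through a sufficient cause that does not contain B.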

The purpose of SCC models is therefore to extend causal modeling practices by the SCC perspective, which emphasizes two main points:

The task of SCC model creation can therefore be rephrased as the task of finding the smallest sets of component causes that are sufficient for outcome occurrence, with “smallest” meaning that, within the sufficient set, every single component cause is necessary. In epicmodel, these smallest sets are sometimes explicitly called minimally sufficient, even though sufficiency implies “minimality” by definition.

How can SCC models be used?

The functionalities available in epicmodel show how SCC models might be used in practice. As mentioned above, SCC models are outcome-focused and in theory need to include all causes of an outcome. Their creation is therefore time-consuming, but once created, they should be easily re-usable. epicmodel is built on the assumption that we know enough about many health outcomes to create useful SCC models. A useful SCC model might be able to fulfill the following tasks:

Algorithm for SCC model creation

First, let’s briefly talk about the algorithm for SCC model creation. The details are described in the function documentation for create_scc(). These are the basic steps:

  1. Derive a list of all valid combinations of component causes
  2. Check all valid combinations of component causes for sufficiency (at this point, IFNOT conditions are ignored)
  3. Check if IFNOT conditions influence sufficiency
  4. Reduce list of sufficient causes to minimally sufficient causes
  5. Add unknown causes
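
The following is a rough base R sketch of steps 1, 2, and 4 on a toy problem, not a reproduction of what create_scc() does internally. The component causes A, B, C and the rule in outcome_occurs() are hypothetical, every combination is treated as valid, and IFNOT conditions (step 3) as well as unknown causes (step 5) are left out.

```r
# Hypothetical component causes
causes <- c("A", "B", "C")

# Hypothetical rule: the outcome occurs if A and B, or B and C, are present
outcome_occurs <- function(set) {
  all(c("A", "B") %in% set) || all(c("B", "C") %in% set)
}

# Step 1: derive all non-empty combinations of component causes
combos <- unlist(
  lapply(seq_along(causes), function(k) combn(causes, k, simplify = FALSE)),
  recursive = FALSE
)

# Step 2: keep only the combinations that are sufficient
sufficient <- Filter(outcome_occurs, combos)

# Step 4: a set is minimally sufficient if no smaller sufficient set
# is completely contained in it
is_minimal <- function(set) {
  !any(vapply(
    sufficient,
    function(other) length(other) < length(set) && all(other %in% set),
    logical(1)
  ))
}
minimal <- Filter(is_minimal, sufficient)

minimal
```

In this toy setting, the set containing A, B, and C is sufficient but not minimal, because it contains the smaller sufficient set of A and B. Enumerating all combinations grows exponentially with the number of component causes, so this brute-force approach is only meant for illustration.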

IFNOT conditions, sufficiency status, and the concept of time

While talking about the induction period in SCC models, Rothman et al. (2008; pp. 15-16) write:

There is no way to tell from a pie-chart diagram of a sufficient cause which components affect each other, which components must come before or after others, for which components the temporal order is irrelevant, etc. The crucial information on temporal ordering must come in a separate description of the interrelations among the components of a sufficient cause.

These interrelations among the component causes are available in the steplist, because the steps are based on mechanisms of outcome occurrence, which predefines their temporal ordering, i.e., the IF condition of a step always occurs before the corresponding THEN statement. For IFNOT conditions, however, this is not necessarily the case. The IFNOT condition could be fulfilled before or after the IF condition. Importantly, the temporal order of IF and IFNOT makes a difference. Under the implicit assumption that IF instantly leads to THEN, there are two possible orders of occurrence for steps with both IF and IFNOT conditions fulfilled:

Importantly, the steplist does not contain information on which of these two orders of occurrence is realistic or whether both are possible. Therefore, the algorithm of SCC model creation makes sure that all possible temporal orders are investigated. This is why the algorithm starts by ignoring IFNOT conditions.

It is worth mentioning that it is implicitly assumed that, once a step has occurred, it stays until the end, e.g., if the step “IF Cell A produces cytokine B THEN cytokine B is present” has occurred, cytokine B will be present until the end. If this assumption is unrealistic, an IFNOT condition needs to be added: “IF Cell A produces cytokine B and IFNOT factor C removes cytokine B THEN cytokine B is present”.
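
The effect of the temporal order can be illustrated with a small base R sketch. The function then_occurs() and its arguments are hypothetical and not part of epicmodel. The sketch assumes that IF instantly leads to THEN, that THEN persists once it has occurred, and that simultaneous fulfillment counts as IFNOT coming first.

```r
# Does THEN occur, given the time points at which the IF and IFNOT
# conditions become fulfilled? (Hypothetical helper, not part of epicmodel.)
# t_ifnot = Inf means that the IFNOT condition is never fulfilled.
then_occurs <- function(t_if, t_ifnot = Inf) {
  # THEN occurs only if IF is fulfilled strictly before IFNOT
  t_if < t_ifnot
}

if_first    <- then_occurs(t_if = 1, t_ifnot = 2)  # IF before IFNOT
ifnot_first <- then_occurs(t_if = 2, t_ifnot = 1)  # IFNOT before IF
no_ifnot    <- then_occurs(t_if = 1)               # IFNOT never fulfilled

c(if_first = if_first, ifnot_first = ifnot_first, no_ifnot = no_ifnot)
```

With both conditions fulfilled, THEN only occurs when IF comes first, which is why the order of occurrence can decide whether a set of component causes is sufficient.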

In the quote, Rothman et al. speak of a separate description of the crucial information on temporal ordering in addition to the pie-chart diagram. In epicmodel, this separate description takes the form of the sufficiency status. The sufficiency status describes, for every sufficient cause, whether it is always sufficient or whether sufficiency depends on the order of occurrence of some of its elements. See also ?new_scc for more information.

Let’s look at the built-in steplist_party as an example. It describes the situation of our friend Clara who is wondering under which circumstances her birthday party will be a success. Let’s first load the steplist, check it (after some adjustments), and create the SCC model.

steplist_checked <- steplist_party %>% remove_na() %>% remove_segment("d4") %>% check_steplist()
scc_model <- steplist_checked %>% create_scc()
scc_model
#> 
#> ── Outcome Definitions ──
#> 
#> • Emma is coming and food is fine and Laura is coming and weather is fine
#> 
#> ── SC 1 ──
#> 
#> ✔ Always sufficient
#> Component causes:
#> • Emma is invited
#> • Laura is invited
#> • Birthday party takes place on a weekday
#> • Birthday party takes place at a karaoke bar
#> 
#> Modules
#> • guests: 40% (4/10)
#> • orga: 40% (4/10)
#> • food: 20% (2/10)
#> 
#> ── SC 2 ──
#> 
#> ✔ Always sufficient
#> Component causes:
#> • Ana is invited
#> • Emma is invited
#> • Laura is invited
#> • Birthday party takes place at a restaurant
#> 
#> Modules
#> • guests: 60% (6/10)
#> • orga: 30% (3/10)
#> • food: 10% (1/10)
#> 
#> ── SC 3 ──
#> 
#> ! Sufficiency depends on order of occurrence
#> Component causes:
#> • Ana is invited
#> • Emma is invited
#> • Laura is invited
#> • Birthday party takes place on a weekday
#> • No rain
#> • Birthday party takes place at the beach
#> 
#> Sufficient orders of occurrence:
#> • Ana is invited -> birthday party takes place on a weekday
#> 
#> Modules
#> • guests: 46% (6/13)
#> • orga: 38% (5/13)
#> • food: 15% (2/13)
#> 

In the output, the sufficiency status is displayed as the first element of a sufficient cause (SC). For SC1 and SC2, the status is “Always sufficient”. The status of SC3, however, is reported as “Sufficiency depends on order of occurrence”. The reason is that the mechanism of SC3 contains the following step: IF Ana is invited and IFNOT birthday party takes place on a weekday THEN Ana is coming. We can see from the list of component causes in the output that both IF and IFNOT are fulfilled in SC3. Therefore, the algorithm checks which orders of occurrence are sufficient for outcome occurrence and which are not. In this case, there are only two options:

  1. Ana is invited -> birthday party takes place on a weekday
  2. birthday party takes place on a weekday -> Ana is invited

In the output, below the list of component causes, the sufficient orders of occurrence are listed. Here only option 1 is sufficient, because “Ana is invited” is the IF condition and must occur before the IFNOT condition “birthday party takes place on a weekday”.

You probably noticed that in this example, these orders of occurrence do not make much sense. Even if Ana is invited before the host decides that the party takes place on a weekday, she still wouldn’t go. The problem occurs because the aforementioned assumption that IF instantly leads to THEN is violated. Therefore, you as the user need to evaluate whether the orders of occurrence are plausible. epicmodel is able to notice some implausibilities and will report their presence in the output below the sufficiency status. However, even in this case, all possible orders of occurrence are evaluated and reported, and the user needs to discard the implausible ones. In our birthday party example, we need to discard the only sufficient order of occurrence, which means that SC3 is actually not a sufficient cause! When creating causal pies, we can address this issue by specifying the remove_sc argument of plot().

plot(scc_model, remove_sc = 3)

Unknown causes

The final step of the aforementioned algorithm is called “Add unknown causes”. As described above, a SCC model must, in theory, contain all causes, component causes as well as sufficient causes, of the outcome of interest. In practice, knowing all causes is of course unrealistic. create_scc() therefore adds unknown causes as placeholders. Two types of unknown causes are added (see also ?new_scc):

You can decide to not include unknown causes in all relevant functions by setting unknown = FALSE, for example when plotting causal pies with plot().

Other functions for SCC models

epicmodel offers additional functions to inspect SCC models created by create_scc(). For functions that use or further process SCC models, see “Get started” (i.e., vignette("epicmodel")).

Inspect if sufficient causes contain certain steps

When printing SCC models, the output reports which component causes are part of the sufficient causes. However, sometimes you might want to know if some other step is part of the mechanism that links component causes and outcome for sufficient causes. Use sc_contain_steps() to get the answer. When talking about the sufficiency status above, we were interested in the step IF Ana is invited and IFNOT birthday party takes place on a weekday THEN Ana is coming. So let’s double-check if it is actually only part of sufficient cause 3. The corresponding step ID, as we see from show_steps(), is IFa5d1IFNOTa7d3e3THENa5d5.

scc_model %>% sc_contain_steps("IFa5d1IFNOTa7d3e3THENa5d5")
#> 
#> ── SC 1 ──
#> 
#> Component causes:
#> • Emma is invited
#> • Laura is invited
#> • Birthday party takes place on a weekday
#> • Birthday party takes place at a karaoke bar
#> 
#> ✖ SC1 does not contain step 'IF Ana is invited and IFNOT birthday party takes place on a weekday THEN Ana is coming' (IFa5d1IFNOTa7d3e3THENa5d5)
#> 
#> ── SC 2 ──
#> 
#> Component causes:
#> • Ana is invited
#> • Emma is invited
#> • Laura is invited
#> • Birthday party takes place at a restaurant
#> 
#> ✔ SC2 contains step 'IF Ana is invited and IFNOT birthday party takes place on a weekday THEN Ana is coming' (IFa5d1IFNOTa7d3e3THENa5d5)
#> 
#> ── SC 3 ──
#> 
#> Component causes:
#> • Ana is invited
#> • Emma is invited
#> • Laura is invited
#> • Birthday party takes place on a weekday
#> • No rain
#> • Birthday party takes place at the beach
#> 
#> ✔ SC3 contains step 'IF Ana is invited and IFNOT birthday party takes place on a weekday THEN Ana is coming' (IFa5d1IFNOTa7d3e3THENa5d5)

Actually, the step is part of both SC2 and SC3. This makes complete sense because, in contrast to SC1, Ana is invited both times and therefore the IF condition is fulfilled. Only SC3 has status “Sufficiency depends on order of occurrence” because the IFNOT condition is only fulfilled in SC3 but not in SC2.

Get component causes as list

If you want to retrieve the sets of component causes that form the sufficient causes as a list of vectors, you can use scc_cause_sets(). You can retrieve both step IDs as well as descriptions.

scc_model %>% scc_cause_sets(output = "desc")
#> $cc90
#> [1] "Start: Emma is invited"                            
#> [2] "Start: Laura is invited"                           
#> [3] "Start: Birthday party takes place on a weekday"    
#> [4] "Start: Birthday party takes place at a karaoke bar"
#> 
#> $cc103
#> [1] "Start: Ana is invited"                            
#> [2] "Start: Emma is invited"                           
#> [3] "Start: Laura is invited"                          
#> [4] "Start: Birthday party takes place at a restaurant"
#> 
#> $cc125
#> [1] "Start: Ana is invited"                         
#> [2] "Start: Emma is invited"                        
#> [3] "Start: Laura is invited"                       
#> [4] "Start: Birthday party takes place on a weekday"
#> [5] "Start: No rain"                                
#> [6] "Start: Birthday party takes place at the beach"
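
Because the result is a plain list of character vectors, it can be processed with base R. The sketch below copies the list by hand from the output above so that it runs on its own. In practice you would presumably pass the return value of scc_cause_sets() directly. The object names cause_sets, shared, and set_sizes are chosen for illustration only.

```r
# Cause sets copied from the printed output above
cause_sets <- list(
  cc90 = c(
    "Start: Emma is invited",
    "Start: Laura is invited",
    "Start: Birthday party takes place on a weekday",
    "Start: Birthday party takes place at a karaoke bar"
  ),
  cc103 = c(
    "Start: Ana is invited",
    "Start: Emma is invited",
    "Start: Laura is invited",
    "Start: Birthday party takes place at a restaurant"
  ),
  cc125 = c(
    "Start: Ana is invited",
    "Start: Emma is invited",
    "Start: Laura is invited",
    "Start: Birthday party takes place on a weekday",
    "Start: No rain",
    "Start: Birthday party takes place at the beach"
  )
)

# Component causes that are part of every sufficient cause
shared <- Reduce(intersect, cause_sets)
shared
#> [1] "Start: Emma is invited"  "Start: Laura is invited"

# Number of component causes per sufficient cause
set_sizes <- lengths(cause_sets)
set_sizes
#>  cc90 cc103 cc125 
#>     4     4     6
```

Component causes that appear in every set, here inviting Emma and Laura, are necessary for the outcome within this model.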

Check sufficiency for a set of component causes

Finally, with are_sufficient() you can check for a given SCC model if a certain set of component causes would lead to the outcome of interest, i.e., if any sufficient cause is fulfilled by your provided set. There are two types of output: type = "binary" returns TRUE or FALSE, while type = "status" returns one of “always”, “depends”, or “never”, depending on the sufficiency status of fulfilled sufficient causes. (Without specifying any causes, the function prints a list of all available ones in the console.)

scc_model %>% are_sufficient(c("THENa5d1","THENa4d1","THENa6d1","THENa7d3e4"), type = "status")
#> [1] "always"
scc_model %>% are_sufficient(c("THENa5d1","THENa4d1","THENa6d1","THENa7d3e4"), type = "binary")
#> [1] TRUE
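
The idea behind such a check can be sketched in a few lines of base R. This is not the implementation of are_sufficient(). The objects sc_sets and sc_status and the function check_sufficiency() are hypothetical, and two rules are assumptions made only for this sketch: a fulfilled “always” sufficient cause takes precedence over a fulfilled “depends” one, and the binary output is TRUE as soon as any sufficient cause is fulfilled.

```r
# Hypothetical stand-in for an SCC model: cause sets and their sufficiency status
sc_sets <- list(
  sc1 = c("A", "B"),
  sc2 = c("C", "D")
)
sc_status <- c(sc1 = "always", sc2 = "depends")

# Hypothetical helper, not part of epicmodel
check_sufficiency <- function(causes, type = c("binary", "status")) {
  type <- match.arg(type)

  # A sufficient cause is fulfilled if all of its component causes are provided
  fulfilled <- vapply(sc_sets, function(sc) all(sc %in% causes), logical(1))

  # Assumption: binary output ignores the order of occurrence
  if (type == "binary") {
    return(any(fulfilled))
  }
  if (!any(fulfilled)) {
    return("never")
  }
  # Assumption: "always" takes precedence over "depends"
  if (any(sc_status[fulfilled] == "always")) "always" else "depends"
}

status_ab  <- check_sufficiency(c("A", "B", "C"), type = "status")
status_cd  <- check_sufficiency(c("C", "D"), type = "status")
status_a   <- check_sufficiency("A", type = "status")
binary_a   <- check_sufficiency("A", type = "binary")

c(status_ab, status_cd, status_a)
#> [1] "always"  "depends" "never"
```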

References