DescriptiveRepresentationCalculator is an R package
for measuring descriptive representation in political bodies. It
implements key functions from Gerring, Jerzak, and Oncel (2024),
offering an accessible approach to modeling descriptive representation.
The package provides four main functions:
`ExpectedRepresentation()`
`ObservedRepresentation()`
`SDRepresentation()`
`RelativeRepresentation()`
Each function measures a slightly different concept linked to the ideas in Gerring, Jerzak, and Oncel (2024). In this vignette, we show how to install and use these functions, illustrate a few worked examples, and discuss conceptual underpinnings relevant to descriptive representation.
Installation
To install and load the
DescriptiveRepresentationCalculator package, run:
How well do political bodies reflect the demographic features of the population they serve? This question is at the heart of descriptive representation, and the work of Gerring, Jerzak, and Oncel (2024) offers a systematic way to measure and analyze this phenomenon.
The approach builds on the Rose Index of Proportionality, which captures how far a political body’s group shares deviate from the population’s group shares. Concretely: \[ R = 1 - \frac{1}{2} \sum_{k=1}^{K} |g_{p_k} - G_{b_k}| \] where \(g_{p_k}\) is the population share of group \(k\), \(G_{b_k}\) is that group’s share in the body of interest \(b\), and \(K\) is the number of groups. The index ranges from 0 (no descriptive representation) to 1 (perfect descriptive representation).
Note that a range of other weighting factors is possible in
the equation. In the package, the default parameters are
`a = -0.5` and `b = 1`, which produce the Rose
Index. The user can modify the `a` and `b`
parameters to fit other affine transformations of the underlying
absolute deviations: \[
R = b + a \sum_{k=1}^{K} |g_{p_k} - G_{b_k}|
\] With the default parameters, the index is
bounded between 0 (total representation mismatch) and 1 (complete match
between elite and population group shares).
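To make the role of `a` and `b` concrete, the following base-R sketch evaluates the formula for two share vectors. The helper name `affine_index()` and the example shares are hypothetical and not part of the package; the code only mirrors the equation above.

```r
# Hypothetical helper (not a package function): affine transformation of the
# summed absolute deviations between population and body group shares.
affine_index <- function(pop_shares, body_shares, a = -0.5, b = 1) {
  stopifnot(length(pop_shares) == length(body_shares))
  b + a * sum(abs(pop_shares - body_shares))
}

# Illustrative shares, chosen only for this example.
pop_shares  <- c(A = 0.25, B = 0.50, C = 0.25)
body_shares <- c(A = 0.40, B = 0.40, C = 0.20)

# The default parameters give the Rose Index.
rose <- affine_index(pop_shares, body_shares)

# a = -1, b = 0 returns the negated total absolute deviation instead.
neg_dev <- affine_index(pop_shares, body_shares, a = -1, b = 0)
```

With these shares the absolute deviations sum to 0.30, so the Rose Index is 0.85 and the second variant is -0.30.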
One of the insights of Gerring, Jerzak, and Oncel (2024) is to compare observed descriptive representation to what would be expected under a random sampling model—if individuals (or group shares) were randomly drawn into the political body.
This expected value establishes a baseline: how much shortfall or surplus we might attribute purely to compositional factors (like the body’s size or the population’s group diversity).
Divergences from this random baseline can reveal additional, potentially systematic, sources of under- or over-representation.
`ExpectedRepresentation()`: This gives you the theoretical baseline if seats or positions were allocated by chance in proportion to the population’s group shares.
`ObservedRepresentation()`: This takes actual data on who occupies each seat (or the observed group shares in the body) and compares it to population-level group shares.
`RelativeRepresentation()`: Use this to directly compute the difference between observed and expected representation, optionally standardized by the variability under random sampling.
`SDRepresentation()`: Use this to assess how much the representation score varies under the random sampling model.
The function `ExpectedRepresentation()` computes the
expected level of representation (the “expected Rose Index”) under a
random sampling model:
ExpectedRepresentation(
  PopShares,
  BodyN,
  a = -0.5,
  b = 1
)
Arguments:
`PopShares`: Numeric vector of group-level population proportions (e.g., `c(0.25, 0.5, 0.25)`).
`BodyN`: Integer, the size of the political body in question (e.g., `50L`).
`a`, `b`: (Optional) Affine transformation parameters. By default, `a = -0.5` and `b = 1` (the Rose Index).
Returns: A single numeric value representing the expected representation score.
# Suppose the population is split into 3 groups: 25%, 50%, 25%.
# We have a political body (say, a legislature) of size 50.
PopShares_example <- c(1/4, 2/4, 1/4)
BodySize_example <- 50
ExpectedRep <- ExpectedRepresentation(
  PopShares = PopShares_example,
  BodyN = BodySize_example
)
ExpectedRep
## [1] 0.9227715
In many settings, this expected value serves as the baseline to which we compare actual data. Larger bodies and more homogeneous populations tend to have higher expected representation scores under the random sampling model.
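The baseline can be approximated outside the package under one explicit assumption: that random sampling means a multinomial draw of `BodyN` seats with probabilities `PopShares`. Under that assumption each group's seat count is binomial, so the expected absolute deviation is a finite sum. The name `expected_rose()` is hypothetical, and this is a sketch rather than the package's implementation.

```r
# Hypothetical helper: expected Rose Index when seats are a multinomial draw
# from the population shares (an assumption about the sampling model).
expected_rose <- function(pop_shares, body_n, a = -0.5, b = 1) {
  seats <- 0:body_n
  # Each group's seat count is Binomial(body_n, p), so E|seats/body_n - p|
  # is a finite sum over all possible seat counts.
  expected_abs_dev <- vapply(pop_shares, function(p) {
    sum(dbinom(seats, size = body_n, prob = p) * abs(seats / body_n - p))
  }, numeric(1))
  b + a * sum(expected_abs_dev)
}

pop_shares <- c(0.25, 0.50, 0.25)

# Larger bodies track the population more closely on average.
by_size <- sapply(c(10, 50, 200), function(n) expected_rose(pop_shares, n))

# A more homogeneous two-group population scores higher at the same size.
even_split   <- expected_rose(c(0.5, 0.5), 50)
skewed_split <- expected_rose(c(0.9, 0.1), 50)
```

For a body of 50 seats this sketch lands close to the value of about 0.92 printed above, and the scores in `by_size` rise with body size.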
To compare theory to reality, we compute the observed representation of any group in a political body:
ObservedRepresentation(
  BodyMemberCharacteristics = NULL,
  PopShares,
  BodyShares = NULL,
  a = -0.5,
  b = 1
)
Arguments:
`BodyMemberCharacteristics`: A vector describing group identities for each member of the body. If supplied, the function automatically calculates the group share distribution.
`PopShares`: Numeric vector of population-level group proportions (with names matching those in `BodyMemberCharacteristics`).
`BodyShares`: (Optional) A numeric vector with the same structure as `PopShares` that directly specifies each group’s share in the body. If not `NULL`, overrides `BodyMemberCharacteristics`.
`a`, `b`: Affine transformation parameters, defaulting to `a = -0.5` and `b = 1`.
Returns: A single numeric value for the observed representation score.
# Observed scenario: A 6-seat body with members: "A", "A", "C", "A", "C", "A"
# The population shares are: A=1/4, B=2/4, C=1/4.
ObsRep <- ObservedRepresentation(
  BodyMemberCharacteristics = c("A", "A", "C", "A", "C", "A"),
  PopShares = c("A" = 0.25, "B" = 0.50, "C" = 0.25)
)
ObsRep
## [1] 0.5
Group "B" holds no seats here even though it makes up half of the
population. That gap alone contributes 0.5 to the summed absolute
deviations, which pulls the representation score down to 0.5.
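The observed score in this example can be reproduced by hand in base R. The sketch below is a hypothetical re-computation, not the package's code; the one design point worth noting is that groups with zero seats must stay in the tabulation.

```r
# Hypothetical re-computation of the observed score from raw member labels.
members    <- c("A", "A", "C", "A", "C", "A")
pop_shares <- c(A = 0.25, B = 0.50, C = 0.25)

# Using the population's group names as factor levels keeps groups with
# zero seats (here "B") in the tabulation instead of dropping them.
seat_counts <- table(factor(members, levels = names(pop_shares)))
body_shares <- as.numeric(seat_counts) / length(members)

# Per-group absolute deviations, then the Rose Index (a = -0.5, b = 1).
abs_dev       <- abs(pop_shares - body_shares)
observed_rose <- 1 - 0.5 * sum(abs_dev)
```

The body shares are 4/6, 0, and 2/6, the absolute deviations sum to 1, and the resulting score is 0.5, matching the output above.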
`SDRepresentation()` estimates the extent to
which the observed representation can vary around its expected value
under random sampling. It performs Monte Carlo simulations, drawing
random compositions of the body and re-computing the representation
score each time:
SDRepresentation(
  PopShares,
  BodyN,
  a = -0.5,
  b = 1,
  nMonte = 10000
)
Arguments:
`PopShares`: Numeric vector of group-level population proportions.
`BodyN`: Size of the political body.
`a`, `b`: Affine transformation parameters.
`nMonte`: Number of Monte Carlo draws used to approximate the variance.
Returns: A single numeric value, the standard deviation of the representation score around its expected value under a random selection model.
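The simulation idea can be sketched in a few lines of base R. The sketch assumes that a random body is a multinomial draw from the population shares; `sd_rose()` and its argument names are hypothetical, and the package's internals may differ.

```r
# Hypothetical helper: Monte Carlo standard deviation of the index, assuming
# bodies are multinomial draws from the population shares.
sd_rose <- function(pop_shares, body_n, a = -0.5, b = 1, n_monte = 10000) {
  # rmultinom() returns one simulated body per column (groups in rows).
  seat_draws  <- rmultinom(n_monte, size = body_n, prob = pop_shares)
  share_draws <- seat_draws / body_n
  # pop_shares recycles down each column, matching groups row by row.
  index_draws <- b + a * colSums(abs(share_draws - pop_shares))
  sd(index_draws)
}

set.seed(123)  # arbitrary seed, only for reproducibility
sd_small <- sd_rose(c(0.25, 0.50, 0.25), body_n = 6)
sd_large <- sd_rose(c(0.25, 0.50, 0.25), body_n = 50)
```

Under this assumption the score of a 6-seat body fluctuates far more than that of a 50-seat body, which is why small bodies need a wider margin before a deviation is read as systematic.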
The function RelativeRepresentation() computes the
difference between observed and expected representation, providing a
direct measure of how a body deviates from the random sampling baseline.
It can optionally standardize this difference by the standard deviation
under the random sampling model:
RelativeRepresentation(
  BodyMemberCharacteristics,
  PopShares,
  a = -0.5,
  b = 1,
  standardize = FALSE,
  nMonte = 10000
)
Arguments:
`BodyMemberCharacteristics`: A vector specifying characteristics for each member of a political body.
`PopShares`: Numeric vector of population group proportions. Names must correspond to identities in `BodyMemberCharacteristics`.
`a`, `b`: Affine transformation parameters (default: Rose Index with `a = -0.5`, `b = 1`).
`standardize`: Logical. If `TRUE`, the difference is divided by the standard deviation under random sampling.
`nMonte`: Number of Monte Carlo iterations for estimating the standard deviation when `standardize = TRUE`.
Returns: A scalar giving the difference between observed and expected
representation. If `standardize = TRUE`, this difference is
expressed in standard deviation units.
# Same body and population as before
BodyMembers <- c("A", "A", "C", "A", "C", "A")
PopShares_example <- c("A" = 0.25, "B" = 0.50, "C" = 0.25)
# Compute relative representation (observed - expected)
RelRep <- RelativeRepresentation(
  BodyMemberCharacteristics = BodyMembers,
  PopShares = PopShares_example
)
# Print how much observed representation differs from expected
RelRep
## [1] -0.2735596
# Standardized version (in SD units)
RelRep_std <- RelativeRepresentation(
  BodyMemberCharacteristics = BodyMembers,
  PopShares = PopShares_example,
  standardize = TRUE
)
RelRep_std
## [1] -2.403115
A positive value indicates the body is more representative than expected under random sampling; a negative value indicates less representation than expected. When standardized, values beyond ±2 suggest the observed representation is unlikely under the random sampling model alone.
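To see where these two numbers come from, the comparison can be approximated in base R. The sketch below assumes a multinomial sampling model and estimates both the expected score and its standard deviation from the same simulated bodies. All names are hypothetical, and the package may compute the expectation differently.

```r
# Hypothetical sketch of observed-minus-expected representation, assuming a
# multinomial sampling model; none of these names come from the package.
rose <- function(pop, body) 1 - 0.5 * sum(abs(pop - body))

members <- c("A", "A", "C", "A", "C", "A")
pop     <- c(A = 0.25, B = 0.50, C = 0.25)
body_n  <- length(members)

# Observed score from the actual seat holders.
body     <- as.numeric(table(factor(members, levels = names(pop)))) / body_n
observed <- rose(pop, body)

# Simulate bodies of the same size drawn at random from the population.
set.seed(2024)  # arbitrary seed, only for reproducibility
sim_shares <- rmultinom(10000, size = body_n, prob = pop) / body_n
sim_scores <- apply(sim_shares, 2, function(s) rose(pop, s))

# Raw gap, and the same gap in standard deviation units.
relative     <- observed - mean(sim_scores)
standardized <- relative / sd(sim_scores)
```

Because the baseline is simulated, the results vary slightly with the seed, but they should sit near the values of roughly -0.27 and -2.4 reported above.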
Expected Representation (ExpectedRepresentation()) helps
analysts understand the baseline level of representation when selection
is effectively random.
Conversely, Observed Representation
(ObservedRepresentation()) is the real-world result,
showing how close or far a body’s membership is from the population
distribution.
SDRepresentation (SDRepresentation()) quantifies how
much randomness alone could explain variation in representation,
shedding light on when observed deviations might be plausibly attributed
to other (non-random) factors like institutional rules or
discrimination.
Relative Representation (RelativeRepresentation())
directly compares observed to expected representation, providing a
single measure of deviation from the random sampling baseline. When
standardized, it expresses this deviation in terms of standard
deviations, enabling cross-body or cross-country comparisons on a common
scale.
The DescriptiveRepresentationCalculator package operationalizes
key ideas about descriptive representation from Gerring, Jerzak, and
Oncel (2024). By offering easy-to-use functions for measuring expected,
observed, and residual variance in representation, the package helps
scholars, analysts, and policymakers investigate how factors like body
size and population diversity shape the composition of political bodies
worldwide.
We hope this vignette gets you started! For any questions or feedback, feel free to open an issue on our GitHub repository.
@article{gerring2024composition,
  title={The Composition of Descriptive Representation},
  author={Gerring, John and Jerzak, Connor T. and {\"O}ncel, Erzen},
  journal={American Political Science Review},
  year={2024},
  volume={118},
  number={2},
  pages={784--801}
}