Thank you for your interest in the grand package! This vignette illustrates how to use this package to describe a network following the Guidelines for Reporting About Network Data (GRAND).
The grand package can be cited as:
Neal, Z. P. (2023). grand: An R package for using the Guidelines for Reporting About Network Data. GitHub. https://github.com/zpneal/grand/
If you have questions about the grand package, please contact the maintainer Zachary Neal by email (zpneal@msu.edu) or via Mastodon (@zpneal@mastodon.social). Please report bugs in the grand package at https://github.com/zpneal/grand/issues.
Networks can represent a wide range of social and natural phenomena at many different scales, and so network data is often quite diverse. Additionally, methods for analyzing networks have evolved over several decades across multiple disciplines. As a result, different researchers, from different disciplinary backgrounds, studying networks representing different things, often describe their network data in very different (and sometimes incomplete) ways.
The Guidelines for Reporting About Network Data (GRAND) are an attempt to establish some basic reporting standards that can help facilitate consistent and complete description of networks in research publications, presentations, and data repositories. GRAND aims to be neutral with respect to discipline, method, and content, and therefore focuses only on a limited number of fundamental characteristics that are relevant for all networks: What does it represent? When and where did it come from? How is it measured?
The grand package can be loaded in the usual way:
library(grand)
#> +-------+ grand v0.9.0
#> | GRAND | Cite: Neal, Z. P., (2023). grand: An R package for using the Guidelines for
#> | ~~~~~ | Reporting About Network Data. GitHub. https://github.com/zpneal/grand/
#> | ~~~~~ |
#> | ~~~~~ | Help: type vignette("grand"); email zpneal@msu.edu; github zpneal/grand
#> +-------+ Beta: type devtools::install_github("zpneal/grand", ref = "devel")
Upon successful loading, a startup message will display that shows the version number, citation, ways to get help, and ways to contact us.
The package offers three basic functions:
grand() interactively queries the user about a network, and saves the responses as graph attributes.
grand.text() writes a uniform narrative description of the network.
grand.table() plots a uniform tabular description of the network, in the style of a US Nutrition Label.
Because the goal of GRAND is to bring consistency and uniformity to the description of network data, these functions offer relatively few options. They are designed to provide a minimal uniform description that is suitable for any network and any context, which users can supplement with additional network-specific and context-specific details.
The grand() function applies GRAND to a network stored as an igraph object. It offers an interactive mode that guides the user through GRAND by asking a series of questions, and a non-interactive mode that allows the user to directly specify GRAND attributes. This section illustrates grand() with the example airport data, which is a weighted and directed network of passenger air traffic in the United States in 2019, and which can be loaded using:
data(airport)
To interactively add GRAND attributes, use:
airport <- grand(airport)
This graph already contains a GRAND attribute. Do you want to overwrite (Y/N)?
1: Y
What is the name of this network (enter NA if unnamed)?
1: US Air Traffic Network
What DOI is associated with this network (enter NA if unnamed)?
1: 10.1371/journal.pone.0269137
How were these data collected or generated?
1: Survey
2: Interview
3: Sensor
4: Observation
5: Archival
6: Simulation
7: Other
Selection: 5
In what year were these data collected?
1: 2019
This network contains 382 nodes. What type of entity do these represent (e.g., people)?
1: Airports
This code block illustrates the first several questions, and appropriate responses, as they would appear in the interactive mode.
The first set of interactive questions asks about the data as a whole:
name - What is the name of the network? This should usually be specified ending with the word “network” or “data” (e.g. “Florentine Families Network” or “Airline Traffic Data”).
doi - What is the DOI associated with the network? This could be a DOI for the data itself (e.g., if it is available online), or could be the DOI for a manuscript describing the data.
mode - How were these data collected or generated? Choose one of the available options (Survey, Interview, Sensor, Observation, Archival, or Simulation) or choose Other to enter something else.
year - In what year were the data collected?
The second set of interactive questions asks about the nodes or vertices:
vertex1 (and in bipartite graphs, vertex2) - What type of entity do the nodes/vertices represent? This should be specified as a plural noun (e.g., “People”).
vertex1.total (and in bipartite graphs, vertex2.total) - Networks often have an externally-defined boundary that determines which nodes/vertices should be included, even if some are missing from the network. If the network has a boundary: How many entities are included in the network’s boundary? This is used to compute rates of missingness (e.g., a classroom contained 20 children, but only 18 provided network data, giving 10% node missingness).
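The missingness rate in the classroom example follows from simple arithmetic. The sketch below uses hypothetical variable names (boundary_total, observed_nodes) that are not part of the package. It only illustrates how a vertex1.total value relates to the number of nodes present in the network, and is not necessarily how grand computes the rate internally.

```r
# Hypothetical values taken from the classroom example (not package objects)
boundary_total <- 20   # entities inside the network's boundary (vertex1.total)
observed_nodes <- 18   # nodes actually present in the network

# Share of boundary entities that are missing from the network
missing_rate <- (boundary_total - observed_nodes) / boundary_total

# Report the rate as a percentage
sprintf("%.0f%% node missingness", 100 * missing_rate)
#> [1] "10% node missingness"
```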
The third set of interactive questions asks about the edges:
edge.pos (and in signed graphs, edge.neg) - What type of relationship do the edges represent? This should be specified as a plural noun (e.g., “Friendships”).
weight - What do the edge weights represent? There are four default options: Frequency (how often), Intensity (how strong), Multiplexity (how many), or Valence (positive or negative). Choosing Other prompts the user to enter another option.
measure - How are the edge weights measured? There are four default options: Continuous, Count, Ordinal, or Categorical. Choosing Other prompts the user to enter another option.
The final interactive question asks about relevant topological characteristics. Some topological characteristics are reported by default (depending on the type of network); however, it is possible to request that additional topological characteristics also be reported. The available topological characteristics include:
clustering coefficient - Computed using transitivity(G, type = "localaverage")
degree centralization - Computed using centr_degree(G)$centralization
degree distribution - Computed using fit_power_law(degree(G), implementation = "plfit")
density - Computed using edge_density(G)
diameter - Computed using diameter(G)
efficiency - Computed using global_efficiency(G)
mean degree - Computed using mean(degree(G))
modularity - Computed from a partition generated by cluster_leiden(G, objective_function = "modularity")
number of communities - Computed from a partition generated by cluster_leiden(G, objective_function = "modularity")
number of components - Computed using count_components(G)
transitivity - Computed using transitivity(G, type = "global")
structural balance - Computed using the triangle index
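To see what these calls return, several of them can be run directly with igraph. The sketch below uses Zachary's karate club graph, which ships with igraph, rather than one of the package's example data sets. The object names (G, topology, fit, partition) are illustrative only, and this is not a reproduction of how grand assembles its summaries.

```r
library(igraph)

# Zachary's karate club: a small undirected example graph bundled with igraph
G <- make_graph("Zachary")

# A few of the characteristics listed above, computed directly with igraph
topology <- list(
  clustering_coefficient = transitivity(G, type = "localaverage"),
  degree_centralization  = centr_degree(G)$centralization,
  density                = edge_density(G),
  diameter               = diameter(G),
  mean_degree            = mean(degree(G)),
  components             = count_components(G),
  transitivity           = transitivity(G, type = "global")
)

# Power-law fit to the degree distribution:
# alpha is the fitted exponent, xmin the lower cutoff of the fit
fit <- fit_power_law(degree(G), implementation = "plfit")
topology$power_law_alpha <- fit$alpha
topology$power_law_xmin  <- fit$xmin

# Community structure from a modularity-maximizing Leiden partition
partition <- cluster_leiden(G, objective_function = "modularity")
topology$communities <- length(partition)
topology$modularity  <- modularity(G, membership(partition))

str(topology)
```

Because the Leiden algorithm is randomized, the number of communities and the modularity value can vary slightly between runs unless a seed is set beforehand with set.seed().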
It is also possible to add GRAND attributes directly, using:
airport <- grand(airport, interactive = FALSE, #Apply GRAND non-interactively
                 name = "US Air Traffic Network",
                 doi = "10.1371/journal.pone.0269137",
                 vertex1 = "Airports",
                 vertex1.total = 382,
                 edge.pos = "Routes",
                 weight = "Passengers",
                 measure = "Count",
                 mode = "Archival",
                 year = "2019",
                 topology = c("clustering coefficient", "mean path length", "degree distribution"))
Using the non-interactive mode requires knowing which parameters to specify given the type of network, and knowing what values to supply for each parameter. For example, here we specify the vertex1 parameter but not the vertex2 parameter because this is a unipartite network with only one type of node. Similarly, we supply "Archival" as the value for the mode parameter because this parameter records the mode of data collection. The interactive mode handles these decisions automatically; however, the non-interactive mode offers greater flexibility and the ability to add GRAND attributes when running R scripts.
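One way to decide which parameters a script needs to supply is to inspect the igraph object first. The sketch below uses only igraph functions on a small made-up bipartite graph (not one of the package's example data sets). The mapping from these checks to grand() parameters, noted in the comments, follows the parameter descriptions above and is an assumption about typical usage rather than documented package behavior.

```r
library(igraph)

# A small made-up bipartite graph: 2 nodes of one type, 3 of the other
G <- make_bipartite_graph(types = c(FALSE, FALSE, TRUE, TRUE, TRUE),
                          edges = c(1, 3, 1, 4, 2, 4, 2, 5))

# Two node types -> both vertex1 and vertex2 would need to be specified
needs_vertex2 <- is_bipartite(G)

# Edge weights present -> weight and measure would need to be specified
needs_weight <- is_weighted(G)

c(vertex2 = needs_vertex2, weight = needs_weight)
```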
The grand.text() function writes a complete and uniform narrative description of the network. For example:
grand.text(airport)
#> [1] "The US Air Traffic Network is a directed and weighted network that represents 382 airports connected by 16095 routes. All airports included in the network's boundary are represented as nodes (i.e., no node missingness). The edges are weighted by passengers, which was measured on a count scale. These data were collected in 2019 using archival methods. The network's clustering coefficient is 0.711. The network's mean path length is 2.093. Fitting a power law to this network's degree distribution implies that k^-10.091 for k >= 146. This network is described in 10.1371/journal.pone.0269137."
The grand.table() function plots a complete and uniform tabular description of the network, in the style of a US Nutrition Label. For example:
grand.table(airport)
A table can be exported for use in a paper or presentation using a graphics device. For example:
pdf("grand.pdf", width = 3.5, height = 4)
grand.table(airport)
dev.off()
The example cosponsor data is a bipartite network representing US Senators’ (co-)sponsorship of Senate Bills during the 116th session (2019-2020). It can be loaded, and narrative and tabular summaries can be obtained, using:
data(cosponsor)
grand.text(cosponsor)
#> [1] "The US Senate Co-Sponsorship Network is a undirected and unweighted network that represents 102 senators and 5086 bills connected by 35166 sponsorships. All senators and senators included in the network's boundary are represented as nodes (i.e., no node missingness). These data were collected in 2021 using archival methods. The network's mean degree is 13.557. This network is described in 10.2478/connections-2019.026 and is available from https://osf.io/kjgrz/."
grand.table(cosponsor)
The example senate data is a signed network representing US Senators’ alliances and antagonisms during the 116th session (2019-2020). It can be loaded, and narrative and tabular summaries can be obtained, using:
data(senate)
grand.text(senate)
#> [1] "The US Senate Network is a undirected and signed network that represents 102 senators connected by 1539 alliances and 2339 antagonisms. All senators included in the network's boundary are represented as nodes (i.e., no node missingness). These data were collected in 2021 using backbone methods. The network's degree of balance is 0.94. This network is described in 10.2478/connections-2019.026 and is available from https://osf.io/kjgrz/."
grand.table(senate)
The grand package includes a couple of exported utility functions that are extensions of user input functions in base R. These functions are used by grand() to interactively query the user about the supplied igraph object.
scan2()
The scan2() function is an extension of scan() that allows the specification of a required input format using the type parameter. Currently four input types are allowed: character, numeric, integer, or a vector of allowable responses. For example:
#Requiring an integer input
scan2(prompt = "Enter an integer", type = "integer")
Enter an integer
1: q
Please enter an integer.
1: 2.5
Please enter an integer.
1: 1
[1] 1
#Requiring an input from a vector of possibilities
scan2(prompt = "Do you like this function (Y/N)?", type = c("Y", "N", "y", "n"))
Do you like this function (Y/N)?
1: 3
Please enter one of these options: Y N y n
1: yes
Please enter one of these options: Y N y n
1: Y
[1] "Y"
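The validation that scan2() performs can be pictured with a small non-interactive helper. The function valid_response() below is hypothetical and is not part of grand or base R. It is a sketch of one way to check a typed response against the kinds of type values described above, not the package's actual implementation.

```r
# Hypothetical helper: TRUE if `response` (a string typed by the user)
# satisfies `type`, which is "character", "numeric", "integer", or a
# character vector of allowable responses.
valid_response <- function(response, type) {
  if (identical(type, "character")) return(nzchar(response))
  if (identical(type, "numeric") || identical(type, "integer")) {
    value <- suppressWarnings(as.numeric(response))
    if (is.na(value)) return(FALSE)                          # not a number
    if (identical(type, "integer")) return(value == round(value))
    return(TRUE)
  }
  response %in% type   # otherwise, treat `type` as the set of allowed answers
}

valid_response("q", "integer")                 # FALSE: not a number
valid_response("2.5", "integer")               # FALSE: not a whole number
valid_response("1", "integer")                 # TRUE
valid_response("yes", c("Y", "N", "y", "n"))   # FALSE: not an allowed option
valid_response("Y", c("Y", "N", "y", "n"))     # TRUE
```

An interactive wrapper would call a check like this in a loop, repeating the prompt until the response passes, which matches the behavior shown in the transcripts above.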