This vignette demonstrates how to use the functions provided in the conversation_multidyads.R file to analyze conversations across multiple dyads. These functions let you preprocess conversation data and calculate several similarity measures between conversation participants.
We’ll use the provided dataset “dyad_example_data.Rdata” located in the inst/extdata directory of the package:
data_path <- system.file("extdata", "dyad_example_data.Rdata", package = "conversim")
load(data_path)
# Display the first few rows and structure of the data
head(dyad_example_data)
#> # A tibble: 6 × 3
#> dyad_id speaker_id text
#> <dbl> <chr> <chr>
#> 1 1 A What did you think of the new movie that just came out?
#> 2 1 B I haven’t seen it yet. Which one are you referring to?
#> 3 1 A The latest superhero film. I heard it’s getting great revi…
#> 4 1 B Oh, that one! I’ve been meaning to watch it. Did you enjoy…
#> 5 1 A Yes, I thought it was fantastic. The special effects were …
#> 6 1 B Really? What about the storyline? I heard it’s a bit predi…
str(dyad_example_data)
#> tibble [532 × 3] (S3: tbl_df/tbl/data.frame)
#> $ dyad_id : num [1:532] 1 1 1 1 1 1 1 1 1 1 ...
#> $ speaker_id: chr [1:532] "A" "B" "A" "B" ...
#> $ text : chr [1:532] "What did you think of the new movie that just came out?" "I haven’t seen it yet. Which one are you referring to?" "The latest superhero film. I heard it’s getting great reviews." "Oh, that one! I’ve been meaning to watch it. Did you enjoy it?" ...
Before analyzing the conversations, we need to preprocess the text data:
processed_convs <- preprocess_dyads(dyad_example_data)
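# Note: this prints the original input again, which preprocess_dyads() does not modify in place.
# Inspect processed_convs to see the preprocessed result.
head(dyad_example_data)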
#> # A tibble: 6 × 3
#> dyad_id speaker_id text
#> <dbl> <chr> <chr>
#> 1 1 A What did you think of the new movie that just came out?
#> 2 1 B I haven’t seen it yet. Which one are you referring to?
#> 3 1 A The latest superhero film. I heard it’s getting great revi…
#> 4 1 B Oh, that one! I’ve been meaning to watch it. Did you enjoy…
#> 5 1 A Yes, I thought it was fantastic. The special effects were …
#> 6 1 B Really? What about the storyline? I heard it’s a bit predi…
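To give a sense of what text preprocessing typically involves, here is a minimal base R sketch. It is not the implementation of preprocess_dyads(); the helper name simple_preprocess, the processed_text column, and the toy data are all assumptions made for illustration. It lowercases the text, strips punctuation, collapses whitespace, and drops turns that end up empty.

```r
# Hypothetical stand-in for a preprocessing step; NOT the package's own code.
simple_preprocess <- function(data) {
  cleaned <- tolower(data$text)                  # normalize case
  cleaned <- gsub("[[:punct:]]", "", cleaned)    # strip punctuation
  cleaned <- trimws(gsub("\\s+", " ", cleaned))  # collapse repeated whitespace
  data$processed_text <- cleaned                 # assumed column name
  data[nzchar(cleaned), , drop = FALSE]          # drop turns left empty
}

# Toy data with the same three columns as the example dataset
toy <- data.frame(
  dyad_id = c(1, 1, 2),
  speaker_id = c("A", "B", "A"),
  text = c("What did you THINK?", "  ...  ", "Oh, that one!"),
  stringsAsFactors = FALSE
)

cleaned_toy <- simple_preprocess(toy)
print(cleaned_toy$processed_text)
```

Keeping the original text column alongside the cleaned one makes it easy to check what the cleaning step removed.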
Now, let’s calculate various similarity measures for our preprocessed conversations.
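The plotting code further down refers to objects such as topic_sim and lexical_sim, each of which carries a similarities_by_dyad element. As a rough illustration of that shape, the sketch below computes a simple lexical measure (Jaccard word overlap between consecutive turns) for each dyad in base R. The helpers jaccard and jaccard_by_dyad and the toy data are hypothetical; they are not part of the package and may differ from how it computes lexical similarity.

```r
# Hypothetical helper: Jaccard overlap between the word sets of two strings.
jaccard <- function(a, b) {
  words <- function(x) {
    x <- tolower(gsub("[^[:alnum:][:space:]]", "", x))
    w <- unique(strsplit(x, "\\s+")[[1]])
    w[nzchar(w)]
  }
  wa <- words(a)
  wb <- words(b)
  n_union <- length(union(wa, wb))
  if (n_union == 0) return(NA_real_)  # both turns empty: similarity undefined
  length(intersect(wa, wb)) / n_union
}

# Hypothetical wrapper: similarity between each pair of consecutive turns,
# returned as a named list with one numeric vector per dyad.
jaccard_by_dyad <- function(data) {
  by_dyad <- split(data$text, data$dyad_id)
  sims <- lapply(by_dyad, function(turns) {
    if (length(turns) < 2) return(numeric(0))
    mapply(jaccard, head(turns, -1), tail(turns, -1), USE.NAMES = FALSE)
  })
  list(similarities_by_dyad = sims)
}

# Toy data (illustrative only)
toy <- data.frame(
  dyad_id = c(1, 1, 1, 2, 2),
  speaker_id = c("A", "B", "A", "A", "B"),
  text = c("the new movie", "the new film", "a great film",
           "hello there", "hello again"),
  stringsAsFactors = FALSE
)

lexical_sketch <- jaccard_by_dyad(toy)
str(lexical_sketch$similarities_by_dyad)
```

A list keyed by dyad, holding one similarity sequence per conversation, is the structure the plotting code below unpacks with names(), sapply(), and unlist().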
Let’s visualize the results of our similarity analyses using ggplot2. Here’s an example of how to plot the topic similarity for each dyad:
library(ggplot2)

# One row per similarity value, labelled with its dyad and its position in the sequence
topic_sim_df <- data.frame(
  dyad = rep(names(topic_sim$similarities_by_dyad),
             sapply(topic_sim$similarities_by_dyad, length)),
  similarity = unlist(topic_sim$similarities_by_dyad),
  index = unlist(lapply(topic_sim$similarities_by_dyad, seq_along))
)

# One panel per dyad; the legend is redundant with the facet labels, so hide it
ggplot(topic_sim_df, aes(x = index, y = similarity, color = dyad)) +
  geom_line() +
  geom_point() +
  facet_wrap(~dyad, ncol = 2) +
  labs(title = "Topic Similarity Across Dyads",
       x = "Conversation Sequence",
       y = "Similarity Score") +
  theme_minimal() +
  theme(legend.position = "none")
We can also compare different similarity measures across dyads:
# One row per dyad: sequence-based measures are averaged, single-value measures are unlisted
comparison_df <- data.frame(
  dyad = names(topic_sim$similarities_by_dyad),
  topic = sapply(topic_sim$similarities_by_dyad, mean),
  lexical = sapply(lexical_sim$similarities_by_dyad, mean),
  semantic = sapply(semantic_sim$similarities_by_dyad, mean),
  structural = unlist(structural_sim$similarities_by_dyad),
  stylistic = sapply(stylistic_sim$similarities_by_dyad, mean),
  sentiment = sapply(sentiment_sim$similarities_by_dyad, mean),
  participant = unlist(participant_sim$similarities_by_dyad),
  timing = unlist(timing_sim$similarities_by_dyad)
)

# Reshape to long format: one row per dyad and measure
measure_cols <- setdiff(names(comparison_df), "dyad")
comparison_long <- reshape(comparison_df,
                           varying = list(measure_cols),
                           v.names = "similarity",
                           timevar = "measure",
                           times = measure_cols,
                           # Exactly one row name per output row, so no manual adjustment is needed
                           new.row.names = seq_len(nrow(comparison_df) * length(measure_cols)),
                           direction = "long")
ggplot(comparison_long, aes(x = measure, y = similarity, fill = measure)) +
  geom_boxplot() +
  labs(title = "Comparison of Similarity Measures Across Dyads",
       x = "Similarity Measure",
       y = "Similarity Score") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
This vignette demonstrates how to use the functions in conversation_multidyads.R to analyze conversations across multiple dyads using real-world data. These tools allow researchers to examine various aspects of conversation dynamics, including topic coherence, lexical alignment, semantic similarity, and more.
The visualizations provide insights into how different similarity measures vary across dyads and how they compare to each other. This can help in identifying patterns or trends in conversational dynamics.
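One way to make the comparison between measures more concrete is to correlate the per-dyad averages. The sketch below does this with base R's cor() on a small invented data frame; the values and the name toy_comparison are made up for illustration and are not results from the example dataset.

```r
# Toy per-dyad averages (invented values, for illustration only)
toy_comparison <- data.frame(
  dyad = c("1", "2", "3", "4"),
  topic = c(0.2, 0.4, 0.6, 0.8),
  lexical = c(0.1, 0.2, 0.3, 0.4),
  sentiment = c(0.9, 0.7, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# Pairwise Pearson correlations between the measure columns
measure_cols <- setdiff(names(toy_comparison), "dyad")
measure_cor <- cor(toy_comparison[measure_cols])
print(round(measure_cor, 2))
```

With only a handful of dyads such correlations are unstable, so they are best read as a descriptive summary rather than as evidence of a relationship.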
Remember that the effectiveness of these analyses may depend on the size and nature of your dataset. Always consider the context of your conversations and the limitations of each similarity measure when interpreting the results.
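As a quick check on dataset size, the sketch below counts turns per dyad and flags dyads that fall under a threshold. The helper name flag_short_dyads and the threshold are assumptions chosen for illustration; an appropriate minimum depends on the measure and on any window size it uses.

```r
# Hypothetical helper: IDs of dyads with fewer than `min_turns` turns.
flag_short_dyads <- function(data, min_turns = 10) {
  turns <- table(data$dyad_id)
  names(turns)[as.vector(turns) < min_turns]
}

# Toy data: dyad 1 has three turns, dyad 2 has only one
toy <- data.frame(
  dyad_id = c(1, 1, 1, 2),
  speaker_id = c("A", "B", "A", "A"),
  text = c("hi", "hello", "how are you", "anyone there"),
  stringsAsFactors = FALSE
)

short_dyads <- flag_short_dyads(toy, min_turns = 2)
print(short_dyads)
```

Dyads flagged this way yield few or no similarity values, so they are worth examining before interpreting the plots.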
For further analysis, you might consider: