Tea, Earl Grey, Hot

My submission for #TidyTuesday 2021 week 34. A look at StarTrek TNG voice interactions with the Enterprise’s computer. In this submission I focus on the ‘locate’ command to find someone on the ship.

tidy tuesday
R
RegEx
Author
Published

August 26, 2021

2022-08-12
blog post ported to quarto
2022-08-10
this post was featured on Data Is Plural, a weekly newsletter on open datasets by Jeremy Singer-Vine
2023-11-01
removed the table created with the reactable package to avoid possible loading of a vulnerable package. See this dependabot security alert for details.

The Task

This week’s #TidyTuesday dataset is all about Star Trek The Next Generation.

In particular, the data collected by www.speechinteraction.org/TNG/ is about voice interactions of the characters with the ship’s computer. While the dataset comprises all kinds of voice interactions (questions, commands and other utterances) I focus on the ‘locate-command’ alone.

With it, characters can locate other people on the ship, if they are looking for them.1

The way

Setup of the environment

First let’s load the required packages:

Code
library("tidyverse")
library("ggrepel")
library("ggdark")
library("showtext")
library("reactable")

Then we need to setup the custom fonts for the plot. In this post I do not load the jolly_theme.R2.

The Star Trek related fonts come with the {trekfont}-package.

Code
font <- c("StarNext", "TNGcast")
path <- system.file(paste0("fonts/", font, ".ttf"), package = "trekfont")
for (i in 1:2) font_add(font[i], path[i])
font_add_google("Open Sans")
font_families()
#> [1] "sans"         "serif"        "mono"         "wqy-microhei" "StarNext"    
#> [6] "TNGcast"      "Open Sans"
Code
showtext_auto()

Data preparation

It all begins with the download of the #TidyTuesday dataset from github:

Code
#load the data and store locally for future runs of the code
tuesdata <- tidytuesdayR::tt_load(2021, week = 34)
#> 
#>  Downloading file 1 of 1: `computer.csv`
Code
computer <- tuesdata$computer

Count the voice commands

Counting the number of location-commands is quite easy, as the dataset contains a column specifying who issues the command:

Code
# count how often a character located someone else
searches_by_people <- computer %>%
  # ignore interactions by the computer and ignore the Wake Word "Computer" itself
  filter(sub_domain == "Locate", !str_detect(char, pattern = "[Cc]omputer"), type != "Wake Word")

head(searches_by_people) |> 
  knitr::kable()
name char line direction type pri_type domain sub_domain nv_resp interaction char_type is_fed error value_id
102 Young Ensign You must be new to these galaxy class starships sir. (puts hand on the black surface saying) Tell me the location of Commander Data. At the touch and the words ‘Tell me’ the black surface comes alive with light patterns showing appropriate information. Command Command InfoSeek Locate FALSE Tell me the location of Commander Data. Person TRUE FALSE 421
110 Riker Computer tell me Captain Picard’s location! NA Command Command InfoSeek Locate FALSE Computer tell me Captain Picard’s location! Person TRUE FALSE 326
116 Data Computer where are the captain and Commander Riker? NA Question Question InfoSeek Locate FALSE Computer where are the captain and Commander Riker? Person TRUE FALSE 236
116 Picard (beat thinks) Locate Lieutenant Commander Data. NA Command Command InfoSeek Locate FALSE Locate Lieutenant Commander Data. Person TRUE FALSE 289
126 Ralph Ah… let’s see. (to computer) Ah… I want to go to a… the… ah… (he shrugs; then to himself:) Where would the captain be? To his astonishemnt the computer answers: Question Question InfoSeek Locate FALSE Ah… let’s see. (to computer) Ah… I want to go to a… the… ah… (he shrugs; then to himself:) Where would the captain be? Person TRUE FALSE 367
126 Ralph Ah… let’s see. (to computer) Ah… I want to go to a… the… ah… (he shrugs; then to himself:) Where would the captain be? To his astonishemnt the computer answers: Question Question IoT Locate FALSE Ah… let’s see. (to computer) Ah… I want to go to a… the… ah… (he shrugs; then to himself:) Where would the captain be? Person TRUE FALSE 367
Code
searches_by_people_count <- searches_by_people %>%
  count(char, sort = TRUE) %>%
  mutate(
    char = str_to_lower(char),

    # use last name for later joining
    char = ifelse(char == "beverly", "crusher", char),
    char = ifelse(char == "geordi", "la forge", char)
  )

searches_by_people_count |> 
  knitr::kable()
char n
picard 18
riker 9
data 6
crusher 4
worf 4
mrs. troi 3
la forge 2
ralph 2
troi 2
young ensign 1

Checking, how often someone is being looked for is not as straight forward. Due to time limitations I took a shortcut and compromised possible mis-counts. I basically filter the voice commands for occurrences of the main characters’ names.

As the number of rows with location commands is <100 I skimmed the commands for the names used to locate people and put these in a vector.

Code
# Define People of interest (this is not a complete cast list, but the result of skimming ~90 entries)
people <- str_to_lower(c(
  "data", "picard", "captain", "riker", "pulaski", "Goss", "Tam Elbrun", "Barclay",
  "Dalen Quaice", "Hill and Selar", "Worf", "La Forge", "Vash", "Diana", "Troi", "Crusher", "Ensign Ro",
  "Alexander Rozhenko", "Uhnari", "Morag"
))

# Create a Regex pattern by collapsing the vector with the "or" operator
people_pattern <- paste0(people, collapse = "|")


people_searched <- searches_by_people %>%
  mutate(
    # make the interactions strings to lower case
    interaction_lower = str_to_lower(interaction),

    # reduce the interactions strings to the searched person
    # e.g. from "computer, locate commander riker" --> "riker" is extracted.
    # Caution: This is not the best / generalizable way, but a rather hacky approach
    # due to limited time. It works for this use case / dataset.
    person_of_interest = str_extract(interaction_lower, pattern = people_pattern)
  ) %>%
  select(interaction, person_of_interest) %>%
  filter(!is.na(person_of_interest)) %>%
  count(person_of_interest, sort = TRUE) %>%
  mutate(person_of_interest = ifelse(person_of_interest == "captain", "picard", person_of_interest))

people_searched |> 
  knitr::kable()
person_of_interest n
picard 7
data 5
la forge 5
riker 4
barclay 3
worf 3
alexander rozhenko 2
dalen quaice 2
troi 2
crusher 1
ensign ro 1
goss 1
hill and selar 1
morag 1
pulaski 1
tam elbrun 1
vash 1

Enriching the dataset

I created a csv containing the glyphs used for the characters of interest in the TNGcast-font. In Addition I took the appropriate Federation Uniform Colors from the {trekcolors} package.

Code
relevant <- read_csv2("res/relevant.csv", col_names = TRUE)

relevant |> 
  filter(!is.na(char)) |> 
  knitr::kable()
char char_label char_col
Alexander Rozhenko a #CCCCCC
Crusher c #1A6384
Data d #AD722C
Picard j #5B1414
La Forge l #AD722C
Pulaski p #1A6384
Riker r #5B1414
Troi t #582f5e
Worf w #AD722C
Barclay z #AD722C

As last step before plotting the data is combined:

Code
whereabouts <- searches_by_people_count %>%
  full_join(people_searched, by = c("char" = "person_of_interest")) %>%
  rename(searching = n.x, searched = n.y) %>%
  mutate(char = str_to_title(char)) %>%
  replace_na(list(searching = 0L, searched = 0L)) %>%
  inner_join(relevant, by = "char")

The result

Now, that the data has been prepared the plot can be drawn.

Code
whereabouts %>%
  ggplot(aes(searching, searched)) +
  geom_point(size = 3) +
  geom_label_repel(
    aes(label = char_label, color = char_col),
    box.padding = 0.5,
    label.padding = 0.5,
    max.time = 1,
    max.iter = 100000,
    family = "TNGcast",
    size = 30
  ) +
  labs(
    title = "Where is Captain Picard?",
    subtitle = "How often did Characters in 'StarTrek TNG' ask the computer to locate someone on the Starship Enterprise\nvs. how often are they being located via the computer.\n",
    x = "Times searching someone",
    y = "Times being searched",
    caption = "\n@c_gebhard | #TidyTuesday Week 34 (2021)\nData source: http://www.speechinteraction.org/TNG/"
  ) +
  coord_trans(x = "sqrt", y = "sqrt") +
  scale_x_continuous(breaks = c(0:6, 10, 15, 18)) +
  scale_y_continuous(breaks = c(1:7)) +
  scale_color_identity() +
  dark_theme_minimal() +
  theme(
    plot.title = element_text(
      family = "StarNext",
      face = "bold",
      size = rel(3),
      hjust = 0,
      vjust = 5
    ),
    plot.subtitle = element_text(
      family = "Open Sans",
      size = rel(1.3),
      hjust = 0
    ),
    plot.caption = element_text(
      size = rel(1.1),
      face = "italic",
      hjust = 1
    ),
    plot.caption.position = "plot",
    plot.margin = margin(1.5, 0.4, 0.4, 0.4, unit = "cm"),
    axis.title = element_text(
      face = "bold",
      size = rel(1.3)
    ),
    axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0)),
    axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0), angle = 90),
    axis.text = element_text(
      size = rel(1.3)
    )
  )

Plot title: 'Where is Captain Picard?' A scatterplot showing how often Characters in Star Trek The Next Generation are using a voice command command to find someone via the ship's computer vs how often they are being located by someone else via the computer. The plot shows, that Captain Picard is the character who is searched for most often, but also the one using the locate command most often.

Characters in StarTrek The Next Generatio (TNG) frequently interact with the shep’s computer via voice commands. One of the computer’s functions is to locate a person on the ship. Within the speechinteractions.org/TNG/ dataset, these ‘locate-commands’ were filtered and analysed. The characters are plotted in regard to how often the used the ‘locate-command’ to find someone vs. how often they are being located via the computer.
Code
ggsave("tt21-34_picard.png", dpi = 96, height = 8, width = 10)

Note that the “officially” submitted plot3 differs from the one above. To meet the deadline I submitted a simpler version with a simple scatterplot.

Comments

Being a Star Trek fan I really enjoyed working on the dataset. In this post I shared what I learned in regard to custom fonts and using the {reactable} package. I hope it was informative to read. If there’s something missing, let me know:

Footnotes

  1. Handy, if you really need to find someone at any time, yet kind of creepy if you think about it.↩︎

  2. check out this post↩︎

  3. aka the tweeted version↩︎

Reuse

Citation

BibTeX citation:
@online{gebhard2021,
  author = {Gebhard, Christian},
  title = {Tea, {Earl} {Grey,} {Hot}},
  date = {2021-08-26},
  url = {https://christiangebhard.com/posts/2021-08-22-tea-earl-grey-hot-tidytuesday-2021-week-34/tea-earl-grey-hot-tidytuesday-2021-week-34.html},
  langid = {en}
}
For attribution, please cite this work as:
Gebhard, Christian. 2021. “Tea, Earl Grey, Hot.” August 26, 2021. https://christiangebhard.com/posts/2021-08-22-tea-earl-grey-hot-tidytuesday-2021-week-34/tea-earl-grey-hot-tidytuesday-2021-week-34.html.