Methods for “Who Talks the Most in Steven Universe?”

Reading in the File

The first step is to read the Steven Universe dialogue file into Python and get it into a format we can use later.

Code for reading in the file

# text holds the path to the dialogue file
with open(text, 'r') as f:
    full = f.read()
# Join each "NAME:" line with the line of dialogue that follows it
conds_raw = full.replace(':\n', ': ')
# One list element per line of the merged text
conds_list = conds_raw.split('\n')
*Disclaimer: if you copy and paste this code, make sure all quotation marks and apostrophes are straight quotes (' and "), not curly ones.

This brings the file in and turns it into a list where every line is either: 1) the name of the episode followed by the name of the writers who worked on the episode, or 2) the name of the character speaking in all caps, followed by a colon and then their line of dialogue. Example:

conds_list example
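
To make the format concrete, here is a minimal sketch of the same merge step run on a short in-memory string instead of the real file. It assumes, as the replace call above suggests, that the raw file puts each speaker's name and colon on its own line, with the dialogue on the next line. The episode title, writer names, and dialogue lines are made-up placeholders, not lines from the actual transcript.

```python
# Hypothetical sample text; placeholders stand in for real transcript content.
sample = (
    "Episode Title by Writer One and Writer Two\n"
    "STEVEN:\n"
    "First line of dialogue.\n"
    "PEARL:\n"
    "Second line of dialogue.\n"
)

# Merge each "NAME:" line with the dialogue line that follows it.
merged = sample.replace(':\n', ': ')

# Split into a list; drop the empty string left by the trailing newline.
sample_list = [line for line in merged.split('\n') if line]

for line in sample_list:
    print(line)
```

Each printed line is then either an episode heading or a single "NAME: dialogue" entry, which is the shape the rest of the code expects.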

Splitting the File by Character

After reading in the file, we want the dialogue of individual characters, so we need a function that takes the name of a character as its input and returns all of that character’s dialogue. Here that function is called char_dia() (short for “character dialogue”).

Code for char_dia function

def char_dia(name):
    # Keep the episode title lines plus every line that mentions the character
    dia = [w for w in conds_list if name in w or w in titles]
    dia_raw = '\n'.join(dia)
    # Put the "NAME:" prefix back on its own line so it can be filtered out
    dia_raw_v2 = dia_raw.replace(': ', ':\n')
    dia_list = dia_raw_v2.split('\n')
    colon = [w for w in dia_list if ':' in w]
    char = [w for w in dia_list if name in w and w not in titles]
    dia_list = [w for w in dia_list if w not in char]
    dia_list = [w for w in dia_list if w not in colon]
    # Replace each episode title with the marker \e (written '\\e' in Python)
    newlist = []
    for item in dia_list:
        if item in titles:
            item = '\\e'
        newlist.append(item)
    episodes = newlist
    # Join everything, then split on the marker to get one string per episode
    episodes_raw = ' '.join(episodes)
    ep_split = episodes_raw.split('\\e ')
    # Drop empty strings and leftover markers
    ep_empty = [w for w in ep_split if w == '']
    ep_split = [w for w in ep_split if w not in ep_empty]
    wrong = [w for w in ep_split if w == '\\e']
    ep_split = [w for w in ep_split if w not in wrong]
    return ep_split
*Disclaimer: if you copy and paste this code, make sure all quotation marks and apostrophes are straight quotes (' and "), not curly ones.

More in-depth explanation of the char_dia function here.

When a character’s name is passed to the function, the output is a list in which every element is that character’s dialogue from one episode. With the result for Steven stored in steven_dia, the first element of the list is his dialogue from Gem Glow, the first episode he appeared in. Example:

steven_dia[0]
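
As a cross-check on the shape of that output, here is a simplified single-pass sketch that produces the same kind of list on toy data. It is not the char_dia function above. It assumes every dialogue line has the form "NAME: text" and that titles is a list of the episode heading lines (the original code uses titles without showing how it is built). The helper name dialogue_by_episode and all sample lines are hypothetical.

```python
def dialogue_by_episode(lines, titles, name):
    """Return one string per episode containing all of `name`'s dialogue."""
    episodes = []
    current = None  # dialogue collected for the episode being read
    for line in lines:
        if line in titles:
            # A new episode starts: keep the previous one only if the
            # character actually spoke in it.
            if current:
                episodes.append(' '.join(current))
            current = []
        elif current is not None and line.startswith(name + ': '):
            # Strip the "NAME: " prefix and keep only the spoken text.
            current.append(line[len(name) + 2:])
    if current:
        episodes.append(' '.join(current))
    return episodes


# Hypothetical toy data, not taken from the real transcript.
titles = ["Episode A by Writer One", "Episode B by Writer Two"]
lines = [
    "Episode A by Writer One",
    "STEVEN: Hello there.",
    "PEARL: Good morning.",
    "STEVEN: Nice day.",
    "Episode B by Writer Two",
    "PEARL: Only I speak here.",
]

steven_toy = dialogue_by_episode(lines, titles, "STEVEN")
print(steven_toy[0])
```

Matching on the "NAME: " prefix rather than on the name anywhere in the line means another character merely mentioning the name is not counted, which is one reason this sketch may give slightly different results from char_dia.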

*Note that in Python, ‘0’ indexes the first element in a list, not 1.

Performing the Test

Now that we can quickly split the text by character, we need to figure out how many words are being spoken. Python’s built-in len() function gives the number of elements in a list, so we need a function that takes a list like steven_dia, splits each episode’s dialogue into a list of words (giving a list of lists), and prints the number of words in each one.

Code for printing the lengths

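def find_len(text):
    # Split each episode's dialogue into words on single spaces
    split = [w.split(' ') for w in text]
    for w in split:
        # Consecutive spaces leave empty strings behind; do not count them
        empt = [i for i in w if i == '']
        no_empt = [i for i in w if i not in empt]
        print(len(no_empt))
*Disclaimer: if you copy and paste this code, make sure all quotation marks and apostrophes are straight quotes (' and "), not curly ones.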

Passing steven_dia to this function prints the number of words Steven speaks in each episode he appears in. After repeating this process for each character, we know how many words each character has spoken in each episode.
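
If the counts are needed for further analysis rather than just printed, a small variant can return them instead. This is a sketch, not the code used for the results described here: word_counts is a hypothetical helper and the sample strings are placeholders. It relies on str.split() with no arguments, which splits on any run of whitespace and so never produces empty strings.

```python
def word_counts(episodes):
    """Return the number of whitespace-separated words in each episode string."""
    return [len(episode.split()) for episode in episodes]


# Placeholder strings standing in for a list like steven_dia.
sample_dia = ["Hello there.  Nice day.", "One more line "]

counts = word_counts(sample_dia)
print(counts)       # [4, 3]
print(sum(counts))  # 7
```

Because split() with no arguments also treats tabs and newlines as separators, its counts can differ from find_len on text that contains whitespace other than spaces.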