Generating Hamilton-style Rap Lyrics with Markov Chains

Awesome. Wow.


Since its initial release in 2015, Hamilton has become one of the most popular musicals of all time. The rap lyrics are undeniably catchy and well-written, so why not create some more?

This is my first (real) post. Wishing myself luck.

Markov Chains

Markov chains are a simple statistical model that generates short lines of text from a large dataset of example lines. By learning which words tend to follow which across the entire script of Hamilton, the chain can produce new lyrics in the style of the musical.

Unlike neural networks or other standard machine learning models, Markov chains aren't especially creative, but they're far simpler to work with and easier to implement in Python.

Probability and State Machines

Technically defined, a Markov chain is a state machine with a finite number of states; in this case, each state is an individual word. Each state has a probability distribution over what the next word will be, and these probabilities are estimated by counting which words follow which in the original corpus of text.

Let's look at an example word.

"Alexander"

  • 75%: "Hamilton"
  • 15%: end of line
  • 5%: "I'll"
  • 5%: "come"

These probabilities were created by reading the entire Hamilton text. The model found that most of the time, "Hamilton" comes after the word "Alexander", but there are also small chances for other words like "I'll" or "come". 15 percent of the time, "Alexander" is the end of the line, so there is no following word.

When generating lines, the model chooses randomly from these possibilities, weighted by probability, to pick the next word. That word is itself a state with its own probability map, which determines the word after that, and so on until the end of a line.
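Here's a toy version of that idea in Python, with the probability map above hard-coded (the real model learns a map like this for every word in the corpus; None stands in for the end of a line):

import random

#One hand-written state and its transition probabilities
transitions = {
    "Alexander": [("Hamilton", 0.75), (None, 0.15), ("I'll", 0.05), ("come", 0.05)],
}

def next_word(state):
    #Weighted random choice over the possible next words
    words, weights = zip(*transitions[state])
    return random.choices(words, weights=weights)[0]

print(next_word("Alexander"))  #"Hamilton", most of the time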

Python Usage

I used the markovify Python module, which fortunately takes care of all the training and generating for you. All you have to do is provide it with the Hamilton text. The dataset is in CSV format, so you'll need to clean it up and convert it to a .txt file first:

import csv

ham_lyrics_list = []

#Open csv file and pull out the lyrics column
with open("ham_lyrics.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    line_count = 0
    for line in csv_reader:
        if line_count > 0:  #skip the header row
            ham_lyrics_list.append(line[2])
        line_count += 1

print(f"Processed {line_count} lines.")


#Write the lyrics to a text file, one per line
with open("ham_lyrics.txt", "w") as txt_file:
    txt_file.write("\n".join(ham_lyrics_list))

Read in the file to a string:

with open("ham_lyrics.txt") as file:
    ham_text = file.read()

Next, import markovify and train a NewlineText model on the data. NewlineText treats each line of the input as its own sentence, which is exactly what we want for lyrics.

import markovify
ham_model = markovify.NewlineText(ham_text)

You can generate a single line of text with the following:

print(ham_model.make_sentence())
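One gotcha: make_sentence() returns None when markovify can't build a sentence that differs enough from the original text, so it's worth filtering those out when generating lines in bulk:

#make_sentence() returns None on failure, so skip those attempts
for _ in range(10):
    line = ham_model.make_sentence()
    if line is not None:
        print(line)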

Below are some of my results with this version of the model.

  • You'll remember that night, I just need to know what Angelica said
  • While you were off getting high with the fact that you're willing to wait for it, wait...
  • Da da dat da da da dat dat da da da ya da!
  • Even before we got it made in the air, you can be a new form of government!
  • Troops are waiting in the neck in Quebec

The model likes to combine the beginning of one sentence with the end of another to create a new one, as in the last example above. A quick search of the Hamilton lyrics revealed that it splices together "Troops are waiting in the field for you" and "Until he caught a bullet in the neck in Quebec".

The model also has a tendency to repeat itself a lot, and it often gets a little carried away with the 'da da da' since there are so many instances of that in the text. Most of the sentences are grammatically correct, but they make absolutely no sense. Fortunately, we're not worried about whether they have any meaning or not - rap lyrics just need some sense of flow.
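If the splicing gets too chaotic, markovify's state_size parameter (it defaults to 2) is one knob worth experimenting with: a larger state makes each next word depend on more preceding words, at the cost of output that hews closer to the original lines.

#state_size=3 makes each next word depend on the previous three words
ham_model_3 = markovify.NewlineText(ham_text, state_size=3)

sentence = ham_model_3.make_sentence()
if sentence is not None:
    print(sentence)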

Rhyming Couplets

It took me a while to figure out how to make two different lines rhyme, since a forward Markov chain gives you no control over what a line's final word will be. Sure, you could generate hundreds of lines and wait for two to rhyme by chance, but that wouldn't be very efficient.

The ideal solution is to create two separate models: one trained on the standard Hamilton text and one on a reversed version where each line's words appear in backward order. "My name is Alexander Hamilton" would become "Hamilton Alexander is name My". This makes rhyming couplets feasible:

  1. Generate a normal sentence with the first model.
  2. Pick a word that rhymes with the last word of the sentence.
  3. Generate a second sentence that starts with that rhyming word.

Let's implement this in Python, using the pronouncing module to find rhymes. pronouncing.rhymes(word) returns a list of words that rhyme with word.
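For example (the exact results depend on the CMU Pronouncing Dictionary, which pronouncing wraps):

import pronouncing

#Print a few words that rhyme with "nation"
print(pronouncing.rhymes("nation")[:10])

Putting it all together: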

import random
from string import punctuation

import markovify
import pronouncing

#Create reversed Hamilton text file, with each line's words backwards
with open('ham_lyrics.txt', 'r') as f, open('ham_reversed.txt', 'w') as fout:
    for s in f:
        words = s.strip().strip(punctuation).split()
        if words:
            words.reverse()
            fout.write(' '.join(words) + '\n')

with open("ham_lyrics.txt") as file:
    ham_text = file.read()

with open("ham_reversed.txt") as file:
    ham_r_text = file.read()


#Create models

print("Building Hamilton model... ", end="")
ham_model = markovify.NewlineText(ham_text)
print("Done\n")

print("Building reversed Hamilton model... ", end="")
ham_r_model = markovify.NewlineText(ham_r_text)
print("Done\n")

#Make a rhyming couplet, retrying until every step succeeds:
#make_sentence() can return None, some words have no rhymes, and
#make_sentence_with_start() raises when a word never starts a line
while True:
    sentence1 = ham_model.make_sentence()
    if sentence1 is None:
        continue
    last_word = sentence1.split()[-1].strip(punctuation)
    rhyme_list = pronouncing.rhymes(last_word)
    if not rhyme_list:
        continue
    try:
        sentence2 = ham_r_model.make_sentence_with_start(random.choice(rhyme_list))
    except (KeyError, markovify.text.ParamError):
        continue
    if sentence2 is not None:
        break

print(sentence1)
#The reversed model builds the second line backwards, so flip it around
print(' '.join(reversed(sentence2.split())))

Let's see what the model is spitting out now:

Stand with our young nation
I'll rise above his station

What happens if we don't get a drink
And you are or what you think

Send in your seconds, see if they can never take away
In return, they didn't ask for French aid, I pray

Who has to end
Do you have a friend

Work of genius. I couldn't seem to die
You and I came to say goodbye

Run away with us in New York City
The issue on the British cut the city

Five six seven eight nine!
We have to resign

If I could see
I know that we are free

I was weak, I was so smart
Ev'ryone give it up for the most part

We'll never be truly free
If I could see

Smooth rhymes! Lin-Manuel Miranda would be proud.

Going Further

The same Markov chain technique can be applied to any corpus of text without changing the code at all; just swap the contents of the text file. Let's go back two centuries and use a dataset of Shakespeare's sonnets.
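A minimal sketch of the swap, assuming the sonnets have already been cleaned into newline-delimited files (the filenames here are just placeholders, and the reversed file is built exactly like ham_reversed.txt):

import markovify

with open("sonnets.txt") as file:
    sonnet_text = file.read()

with open("sonnets_reversed.txt") as file:
    sonnet_r_text = file.read()

sonnet_model = markovify.NewlineText(sonnet_text)
sonnet_r_model = markovify.NewlineText(sonnet_r_text)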

After training the Markov chain once on the new text and once on the reversed text, the result is as follows:

Art left the prey of every vulgar paper to rehearse?
This thou perceiv'st, which makes your praises worse

So long as brain and heart are at a frown they in their badness reign.
Those children nursed, delivered from thy heart when mine is slain

As on the kingdom of the shore, where two contracted new,
In all external grace you have bid your servant once adieu

Impressive! Even though the Shakespeare text has fewer lines than the Hamilton text, each line is several words longer on average, which seems to produce better and more grammatically correct results. Some lines even land in proper iambic pentameter. We can go a step further and write a fourteen-line Shakespearean sonnet:

When sometime lofty towers I see barren of new pride?
Which alters when it grows,
By seeing farther than my o'erpressed defence can bide
The worst was this, my love well knows

And steal dead seeming of his great verse,
I do count the clock that tells the story of thy days.
This thou perceiv'st, which makes your praises worse
To him that bears me, tired with my lays

Save that to thee my true love control,
That time of year thou mayst in me each part will be thy defect,
Yet nor the prophetic soul
I cannot blame thee, for my dumb thoughts, speaking in effect

So either by thy beauty being mute,
I grant I never saw that you did impute

It's not perfect and it doesn't make much sense, but to be fair, neither does much of the original Early Modern English anyway. Sorry, Shakespeare.
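In case you're wondering how the fourteen lines above could be assembled: the sonnet's ABAB rhyme scheme falls out of interleaving rhyming couplets. Here's a sketch of one way to do it, where make_couplet is a hypothetical helper wrapping the couplet logic from earlier, and sonnet_model / sonnet_r_model are the forward and reversed Shakespeare models:

import random
from string import punctuation

import markovify
import pronouncing

#Hypothetical helper wrapping the rhyming-couplet logic from earlier
def make_couplet(model, r_model):
    while True:
        line1 = model.make_sentence()
        if line1 is None:
            continue
        rhymes = pronouncing.rhymes(line1.split()[-1].strip(punctuation))
        if not rhymes:
            continue
        try:
            backwards = r_model.make_sentence_with_start(random.choice(rhymes))
        except (KeyError, markovify.text.ParamError):
            continue
        if backwards is not None:
            return line1, ' '.join(reversed(backwards.split()))

#Interleave two couplets into an ABAB quatrain
def make_quatrain(model, r_model):
    a1, a2 = make_couplet(model, r_model)
    b1, b2 = make_couplet(model, r_model)
    return [a1, b1, a2, b2]

#Three ABAB quatrains plus a closing couplet: fourteen lines
sonnet = []
for _ in range(3):
    sonnet += make_quatrain(sonnet_model, sonnet_r_model)
    sonnet.append("")  #blank line between stanzas
sonnet += make_couplet(sonnet_model, sonnet_r_model)
print('\n'.join(sonnet))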

If you'd like to try out your own text, check it out on Replit.
