Search Query Haikus
Introduction
In the fall of 2009, I studied Applied Machine Learning at CMU under Carolyn Rose. As a final project, I analyzed a leaked AOL search query dataset. After that class, I continued working with the data to identify unintentional haikus in users’ search histories.
Machine Learning
My Machine Learning project concerned identifying patterns in search behaviour. While the data set does not contain personally-identifiable information (at least not directly), it does group queries by user. I attempted to build user profiles based on search habits and use these profiles to identify further search sessions. As one might expect, identifying people with such sparse data is very difficult, and my results supported this.
Here’s a poster detailing the machine learning project. (803 KB)
Identifying Haikus
My brother’s immediate response to this project was, “What about haikus?” We could both agree that even nonsense phrases take on extra gravitas when composed in the form of a haiku (one 5-syllable line, one 7-syllable line, one 5-syllable line). Finding unintentional haikus within the dataset is admittedly a far cry from the original goals of the project, but I was interested enough to explore the possibilities on my winter break.
Identifying haikus required counting the syllables in each query. Doing this programmatically is non-trivial, and to my knowledge, there are no publicly available lookup tables. To accomplish the task, I was admittedly a bit rude. I queried the website Haiku with Teeth thousands of times to get syllable counts for the words in each search term. With this data, I was able to identify which sequential queries formed haikus.
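As a rough illustration of that last step, here is a minimal sketch in Python. It is not the code I ran at the time. It assumes the per-word syllable counts have already been collected into a plain dictionary (the name `counts` and both function names are made up for this example), and that one user's queries are available as a list in the order they were searched.

```python
from typing import Dict, List, Optional, Tuple

# Syllable pattern of a haiku: 5, then 7, then 5.
HAIKU_PATTERN = (5, 7, 5)


def query_syllables(query: str, counts: Dict[str, int]) -> Optional[int]:
    """Sum the syllable counts of each word in a query.

    Returns None if any word is missing from the lookup table,
    so queries with unknown words are never matched.
    """
    total = 0
    for word in query.lower().split():
        if word not in counts:
            return None
        total += counts[word]
    return total


def find_haikus(queries: List[str], counts: Dict[str, int]) -> List[Tuple[str, str, str]]:
    """Return every run of three consecutive queries that scans as 5-7-5."""
    totals = [query_syllables(q, counts) for q in queries]
    haikus = []
    for i in range(len(queries) - 2):
        if tuple(totals[i:i + 3]) == HAIKU_PATTERN:
            haikus.append((queries[i], queries[i + 1], queries[i + 2]))
    return haikus


# Hypothetical lookup table, filled in by hand for this example.
counts = {"what": 1, "does": 1, "mean": 1, "meekly": 2,
          "serenading": 4, "halted": 2, "cats": 1}

history = ["cats",
           "what does meekly mean",
           "what does serenading mean",
           "what does halted mean"]

for haiku in find_haikus(history, counts):
    print("\n".join(haiku))
```

The sliding window only looks at queries that sit next to each other in a single user's history, which matches the "sequential queries" idea above. Skipping queries with unknown words is a design choice for the sketch; guessing a count for them would be another option.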
Results
Download results here (14 KB). WARNING: explicit content.
I learned, first and foremost, that there is more to funny haikus than correct syllable counts. Many people have a search pattern which includes repeating previous searches. This led to many haikus with identical first and third lines (not funny). Additionally, while Haiku with Teeth does a great job, it is not perfect. This is particularly true in cases of typos, contractions, and proper nouns.
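The repeated-search problem is easy to filter out after the fact. The snippet below is a small sketch of one way to do it, assuming each candidate haiku is stored as a tuple of three query strings; the function name `drop_repeated_lines` is invented for this example and was not part of the original project.

```python
from typing import List, Tuple

Haiku = Tuple[str, str, str]


def drop_repeated_lines(haikus: List[Haiku]) -> List[Haiku]:
    """Keep only haikus whose first and third lines differ.

    Comparison ignores case and surrounding whitespace, so a repeated
    search typed slightly differently is still treated as a repeat.
    """
    return [
        h for h in haikus
        if h[0].strip().lower() != h[2].strip().lower()
    ]


candidates = [
    ("free music lyrics", "tattoos flowers butterflies", "flowers butterflies"),
    ("patsy cline lyrics", "betty everett song lyrics", "Patsy Cline lyrics"),
]

kept = drop_repeated_lines(candidates)
print(len(kept))
```

This only catches exact repeats of the first line. It does nothing about the syllable-count errors caused by typos, contractions, and proper nouns, which still need a human eye.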
Despite all that, the results contain some interesting haikus:
Stand up tanning bed
gas prices new york city
meanings of roses

Poems of springtime
nash community college
angioplasty

Cats urinating
betty everett lyrics
patsy cline lyrics

Free music lyrics
tattoos flowers butterflies
flowers butterflies

Empire flooring
incident in a small town
and justice for all

What does meekly mean
what does serenading mean
what does halted mean
Fortunately for Haiku with Teeth, I did not search the entire data set. I applied this filter to a small subset as a proof-of-concept. While there may be thousands of interesting haikus left to find, for now, I am happy with my hand-picked few.