This week I released my first bot on Twitter – HaikuNewsBot.

It’s built in Python and pulls headlines from over 50 English-language sources using the free News API.

HaikuNewsBot tweet

It’s not 100% accurate or super useful, but it’s a cute proof of concept for future bots and parsing programs. I’ll add small updates over time as I receive feedback and notice bugs.

Parsing a Haiku

I settled on making this an English-only news bot, so I could use consistent (and familiar) language parsing techniques for every article title.

Most of the syllable counting logic is a combination of the CMU Pronouncing Dictionary from the nltk package, and the textstat package.
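As a rough illustration of the dictionary approach (not the bot's actual code): CMU-style pronunciations mark every vowel phoneme with a stress digit, so counting the phonemes that end in a digit gives the syllable count. The sketch below uses a three-entry sample dictionary in place of the full one; with nltk installed and the corpus downloaded, nltk.corpus.cmudict.dict() returns a mapping of the same shape. The vowel-group fallback is a crude stand-in for textstat, and the names count_syllables and SAMPLE_PRONUNCIATIONS are made up for this example.

```python
import re

# A few hand-copied entries in the CMU Pronouncing Dictionary's format:
# each word maps to a list of pronunciations, and every vowel phoneme
# ends in a stress digit (0, 1, or 2).
SAMPLE_PRONUNCIATIONS = {
    "haiku": [["HH", "AY1", "K", "UW0"]],
    "python": [["P", "AY1", "TH", "AA0", "N"]],
    "news": [["N", "UW1", "Z"]],
}


def count_syllables(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Count syllables via the dictionary, falling back to a rough guess."""
    entries = pronunciations.get(word.lower())
    if entries:
        # Use the first listed pronunciation; one stress digit per syllable.
        return sum(1 for phoneme in entries[0] if phoneme[-1].isdigit())
    # Fallback (an assumption, not textstat's algorithm): count vowel groups.
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))
```

Words found in the dictionary get an exact count, while anything else (names, slang, typos) falls through to the guess, which is one reason a headline bot like this can't be fully accurate.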

I wrote a custom syllable parser for numbers, so that ‘16,000’ is properly counted as four syllables (six-teen-thou-sand). I also wrote a parser for acronyms, discovering that the translation of letters to syllables is wonderfully simple: 3 if letter == 'w' else 1.
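Here is one way those two parsers could look. This is a minimal sketch rather than the bot's real code: the function names and lookup tables are invented for the example, and it assumes numbers are read out in full ("two thousand seventeen"), so years spoken as "twenty seventeen" would be overcounted.

```python
# Syllables in the spoken form of 0-19 ("zero" = 2, "seven" = 2, "eleven" = 3...).
ONES = [2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3, 1, 2, 2, 2, 2, 3, 2, 2]
# Syllables in "twenty" through "ninety", keyed by the tens digit.
TENS = {2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 3, 8: 2, 9: 2}
HUNDRED = 2
# Syllables added per three-digit group: (none), thousand, million, billion.
SCALES = [0, 2, 2, 2]


def _under_thousand(n):
    """Syllables in the spoken form of an integer from 1 to 999."""
    total = 0
    if n >= 100:
        total += ONES[n // 100] + HUNDRED
        n %= 100
    if n >= 20:
        total += TENS[n // 10]
        n %= 10
    if n > 0:
        total += ONES[n]
    return total


def number_syllables(token):
    """Count syllables in a numeric token such as '16,000'."""
    n = int(token.replace(",", ""))
    if n == 0:
        return ONES[0]
    total = 0
    for scale in SCALES:
        n, group = divmod(n, 1000)
        if group:
            total += _under_thousand(group) + scale
        if n == 0:
            return total
    raise ValueError("number too large for this sketch")


def acronym_syllables(token):
    """Every letter name is one syllable except W ('dou-ble-u')."""
    return sum(3 if letter.lower() == "w" else 1 for letter in token)
```

With these tables, number_syllables("16,000") gives 4 and acronym_syllables("FBI") gives 3, matching the examples above.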

I’ll wait while you go through the alphabet in your head now.

Before running words through the parsers, I .split() the headline on whitespace, stripped the non-alphanumeric characters from each piece, and analyzed each resulting text element by itself.

If an element is three characters or fewer and uppercase (e.g. FBI), the bot assumes it’s an acronym and uses the per-letter logic above.

If an element is composed of letters and numbers (e.g. G20 or patio11), then split the alpha part(s) from the numeric part(s), and parse each of those elements separately.
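The splitting and classification rules above can be sketched roughly like this. The tokenize function and its "acronym" / "word" / "number" labels are hypothetical names for illustration, and the sketch assumes ASCII letters and digits only.

```python
import re


def tokenize(headline):
    """Split a headline into (kind, text) elements for syllable counting."""
    elements = []
    for raw in headline.split():
        # Drop punctuation, so "16,000" becomes "16000" and "leak," becomes "leak".
        cleaned = re.sub(r"[^A-Za-z0-9]", "", raw)
        if not cleaned:
            continue
        if cleaned.isalpha() and cleaned.isupper() and len(cleaned) <= 3:
            elements.append(("acronym", cleaned))
        elif cleaned.isalpha():
            elements.append(("word", cleaned))
        elif cleaned.isdigit():
            elements.append(("number", cleaned))
        else:
            # Mixed element such as "G20": split letter runs from digit runs.
            for part in re.findall(r"[A-Za-z]+|[0-9]+", cleaned):
                kind = "number" if part.isdigit() else "word"
                elements.append((kind, part))
    return elements
```

Each element can then be routed to the matching parser by its kind. One known weakness of the three-letter rule is that short words in an all-caps headline (THE, WAR) would be spelled out as acronyms.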

Collecting and Posting Tweets

Each time the bot runs, it picks up about 15 new haiku candidates from the API’s headlines. It stores these in a local SQLite database, ranked by haiku likelihood (its accuracy rate is about 70%).
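A minimal sketch of the storage step, using Python's built-in sqlite3 module. The table layout, the column names, and the idea of using the headline itself as the primary key (so repeat runs skip headlines already stored) are all assumptions made for this example, not the bot's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per candidate headline with its likelihood score.
SCHEMA = """
CREATE TABLE IF NOT EXISTS haikus (
    headline   TEXT PRIMARY KEY,
    score      REAL NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    posted     INTEGER NOT NULL DEFAULT 0
)
"""


def store_candidates(conn, candidates):
    """Insert (headline, score) pairs and return how many rows were new."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO haikus (headline, score) VALUES (?, ?)",
            candidates,
        )
    return conn.total_changes - before
```

For a quick trial, sqlite3.connect(":memory:") gives a throwaway database; the real bot would point at a file on disk so candidates survive between hourly runs.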

Whenever the bot is told to post a haiku, it chooses the highest-rated one in the database from the last day. It tries to post it to Twitter and, at the moment, does nothing if that fails. Eventually I’d like it to email me the error message when a post fails.
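The selection and posting logic might look something like the sketch below. It assumes a hypothetical haikus table with headline, score, created_at, and posted columns; the posted flag, which prevents reposting, is my own addition rather than something the bot is known to do. send_tweet is a placeholder for whatever Twitter client call is in use.

```python
import sqlite3

# Hypothetical table layout assumed by the queries below.
TABLE_SQL = """
CREATE TABLE IF NOT EXISTS haikus (
    headline   TEXT PRIMARY KEY,
    score      REAL NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    posted     INTEGER NOT NULL DEFAULT 0
)
"""


def pick_best(conn):
    """Return the highest-scored unposted headline from the last day, or None."""
    row = conn.execute(
        "SELECT headline FROM haikus "
        "WHERE posted = 0 AND created_at >= datetime('now', '-1 day') "
        "ORDER BY score DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def post_best(conn, send_tweet):
    """Try to post the best candidate; return True only if it went out."""
    headline = pick_best(conn)
    if headline is None:
        return False
    try:
        send_tweet(headline)
    except Exception:
        # Mirrors the current behavior: do nothing on failure.
        # An email alert with the error message could go here later.
        return False
    with conn:
        conn.execute("UPDATE haikus SET posted = 1 WHERE headline = ?", (headline,))
    return True
```

Passing the posting function in as an argument keeps the database logic testable without touching Twitter, and the except branch is the natural place to hook in the planned error email.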

Hosting the Service

The bot is hosted on a $5 Linode VPS with some other toy programs, and runs once an hour as a cron job (0 * * * *).