by Mike McKee
Data Analysis

Does Tweet Length Matter?

Becoming "Twitter Sherlock Holmes" to find a relationship between tweet length and tweet performance.

There are some hooligans in the Money Twitter space shouting, “Size doesn’t matter! Content is king!”

And I believed them. It made sense. But what kind of data analyst would I be if I supported every hypothesis that seemed to make sense when I had no evidence to back it up?

The answer: a bad one.

That’s why we’re here. I’m becoming the Sherlock Holmes of Money Twitter to find out once and for all if there’s a relationship between tweet length and high performing tweets.

Here’s the TL;DR of my journey as a Twitter detective:

  1. The Watson to my Holmes
  2. Prepping the crime scene
  3. Searching for clues
  4. A second look
  5. A problem arises
  6. Solving the mystery
  7. Deducing an answer
  8. Three little mistakes

Let’s begin…

The Watson to my Holmes (tools I used)

No Sherlock Holmes story is great without Dr. John Watson by his side. For this project, my Dr. Watson was none other than a few trusty tools:

  • SQL
  • Tableau
  • Twitter Analytics

If you want to investigate my SQL code as Sherlock Holmes himself would, check out my GitHub page here.

And in case you’re wondering where this data comes from, I exported my Twitter Analytics report from Feb 13th to Mar 13th. So the data here comes from real tweets with a real human writing them (as a writer, I'm not a fan of AI-written content).

Prepping the Crime Scene (data cleaning)

Any great detective would be silly if they let someone contaminate their crime scene before searching for clues.

And any great data analyst knows you can’t dive right into the data before cleaning it.

Most of the cleaning was straightforward. The biggest change I made to the dataset, however, was also the most important one: this analysis depended on grouping tweets by their lengths.

So I created five ranges based on the number of characters in each tweet. Here’s a snippet of what that SQL query looks like.

the SQL query I used to create "tweet length range" column
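
The query itself only appears as a screenshot, so here is a rough sketch of what this kind of bucketing can look like, not the exact query from the project. It uses a CASE expression to map character counts to five ranges. The table name `tweets`, the column `tweet_text`, and the three sample rows are made up for illustration. Only the “113 - 168” and “225 - 280” labels appear in this write-up, so the other three boundaries assume equal 56-character buckets. The SQL runs through Python's built-in sqlite3 module so it can be tried without setting up a database.

```python
import sqlite3

# Hypothetical schema: a single `tweets` table with a `tweet_text` column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (tweet_text TEXT)")
conn.executemany(
    "INSERT INTO tweets (tweet_text) VALUES (?)",
    [("a" * 40,), ("b" * 120,), ("c" * 260,)],  # made-up sample tweets
)

# Assumption: five equal 56-character buckets covering 1 to 280 characters.
BUCKET_QUERY = """
SELECT
    LENGTH(tweet_text) AS tweet_length,
    CASE
        WHEN LENGTH(tweet_text) <= 56  THEN '1 - 56'
        WHEN LENGTH(tweet_text) <= 112 THEN '57 - 112'
        WHEN LENGTH(tweet_text) <= 168 THEN '113 - 168'
        WHEN LENGTH(tweet_text) <= 224 THEN '169 - 224'
        ELSE '225 - 280'
    END AS tweet_length_range
FROM tweets
ORDER BY tweet_length
"""

rows = conn.execute(BUCKET_QUERY).fetchall()
for tweet_length, tweet_length_range in rows:
    print(tweet_length, tweet_length_range)
```

Because the WHEN branches are checked top to bottom, each one only needs an upper bound. Note that `LENGTH` counts raw characters, which may not match how Twitter counts links or emoji.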

Searching For Clues (exploratory analysis)

After prepping my crime scene, I began looking for clues that would show whether length affects a tweet's performance.

So I started by stating the five most important metrics and looking at their averages based on tweet length range.

  • Engagement Rate
  • Profile Clicks
  • Impressions
  • Replies
  • Likes

Average metrics per tweet length range

The answer’s clear here.

The “113 - 168” character range has the highest averages for impressions, replies and profile clicks. It ties the “225 - 280” range for highest average likes. And it comes in second for average engagement rate.

The “113 - 168” range seems like the obvious winner.

But I can’t say the case is solved just yet. There’s more data to look at. And anyone who studied stats in school knows that averages don't tell the whole story.

A Second Look (exploring deeper)

I focused the next part of my analysis on the “113 - 168” and “225 - 280” character ranges since they led in all categories for the averages. But since their average impressions were different, comparing the rest of their raw averages would be unfair: a range that gets seen more will naturally collect more likes and replies.

So I looked at their average metrics per 100 impressions instead.
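
One way to compute a “per 100 impressions” figure is to divide a range's total for a metric by its total impressions and multiply by 100. The sketch below assumes that pooled definition (averaging each tweet's own rate is another reasonable option and can give different numbers). The table, column names, and rows are invented toy values, not data from the real export.

```python
import sqlite3

# Hypothetical table and made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "tweet_length_range TEXT, impressions INTEGER, "
    "engagements INTEGER, likes INTEGER)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        ("113 - 168", 1000, 20, 5),
        ("113 - 168", 3000, 60, 15),
        ("225 - 280", 500, 15, 5),
        ("225 - 280", 1500, 35, 15),
    ],
)

# Assumption: pooled rate = 100 * total metric / total impressions per range.
# Multiplying by 100.0 forces floating-point division in SQLite.
PER_100_QUERY = """
SELECT
    tweet_length_range,
    ROUND(100.0 * SUM(engagements) / SUM(impressions), 2) AS engagements_per_100,
    ROUND(100.0 * SUM(likes) / SUM(impressions), 2) AS likes_per_100
FROM tweets
GROUP BY tweet_length_range
ORDER BY tweet_length_range
"""

per_100 = conn.execute(PER_100_QUERY).fetchall()
for row in per_100:
    print(row)
```

Normalizing this way puts a range with few impressions and a range with many on the same scale before comparing them.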

The results are interesting here.

The “113 - 168” range does not outperform the “225 - 280” range.

The “225 - 280” range has 25% more engagements and 67% more likes per 100 impressions than the “113 - 168” range.

"Per 100 Impressions" metrics per tweet length range

A Problem Arises (crafting new data)

But this still isn’t enough to say “225 - 280” is the best tweet length range.

So we’re in a pickle now. The “113 - 168” range performs better when it comes to overall metric averages. But the “225 - 280” range reigns supreme when looking at “per 100 impressions” stats.

What do we do?

Return to the beginning of the case and remember the purpose of this project: To find if there’s a relationship between tweet length and high performance.

So I put on my detective cap, took out my magnifying glass, and wrote a simple SQL query to find the percentage of high performing tweets in each range.

my "simple" SQL query

For the sake of the analysis, I define a high performing tweet as one that performs above average in all the following 5 categories:

  • Likes
  • Replies
  • Impressions
  • Profile Clicks
  • Engagements
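
For readers who can't see the screenshot, here is a minimal sketch of how a definition like this could be turned into a query. It is not the original query. It assumes "above average" means above the average across all tweets in the dataset (not the average within each range), and the table, column names, and five rows are invented for illustration.

```python
import sqlite3

# Hypothetical table and made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "tweet_length_range TEXT, likes INTEGER, replies INTEGER, "
    "impressions INTEGER, profile_clicks INTEGER, engagements INTEGER)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("113 - 168", 2, 1, 400, 1, 10),
        ("113 - 168", 9, 4, 900, 5, 40),
        ("225 - 280", 8, 5, 800, 4, 35),
        ("225 - 280", 1, 0, 300, 0, 5),
        ("225 - 280", 10, 5, 1000, 5, 50),
    ],
)

# Step 1: overall averages. Step 2: flag tweets that beat all five.
# Step 3: share of flagged tweets within each length range.
HIGH_PERFORMING_QUERY = """
WITH averages AS (
    SELECT
        AVG(likes) AS avg_likes,
        AVG(replies) AS avg_replies,
        AVG(impressions) AS avg_impressions,
        AVG(profile_clicks) AS avg_profile_clicks,
        AVG(engagements) AS avg_engagements
    FROM tweets
),
flagged AS (
    SELECT
        t.tweet_length_range,
        CASE
            WHEN t.likes > a.avg_likes
             AND t.replies > a.avg_replies
             AND t.impressions > a.avg_impressions
             AND t.profile_clicks > a.avg_profile_clicks
             AND t.engagements > a.avg_engagements
            THEN 1 ELSE 0
        END AS is_high_performing
    FROM tweets AS t
    CROSS JOIN averages AS a
)
SELECT
    tweet_length_range,
    ROUND(100.0 * SUM(is_high_performing) / COUNT(*), 1) AS pct_high_performing
FROM flagged
GROUP BY tweet_length_range
ORDER BY tweet_length_range
"""

pct_by_range = conn.execute(HIGH_PERFORMING_QUERY).fetchall()
for row in pct_by_range:
    print(row)
```

Requiring all five conditions at once is a strict bar, so the resulting percentages tend to be small, and changing any one threshold changes which tweets count.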

And here’s what I found…

Percentage of high performing tweets per tweet length range

Solving the Mystery (our question answered)

The “225 - 280” range absolutely dominates. 15% of all tweets in the range are high performing. That’s 114% higher than the “113 - 168” range’s 7% score.

Assuming both tweet length ranges have the same number of tweets, the “225 - 280” range would have roughly 2x the number of high performing tweets.

Even though I’m not creating Twitter content the same way I was back in February/March, I would tell my former self to focus on writing tweets with a high character count.

Deducing an Answer (explaining why)

It’s easy to see my longer tweets performed better than my shorter ones… But why?

That’s a question the data alone won’t tell us.

Luckily, with a bit of intuition and common sense, a (possible) answer is clear.

Longer tweets take up more space on your Twitter feed. So if two of my tweets showed up, the longer one would stand out more. This means that there’s a higher probability you'd notice the longer tweet over the shorter one.

More research would have to be done to prove this right, but it’s a reasonable guess.

Three Little Mistakes (misleading info)

Sherlock Holmes doesn’t solve every case mistake-free, and my analysis here has three flaws that affect the accuracy of my findings.

Let’s take a look at them…

One: Small Sample Size

This project only analyzes the data from my Twitter account over a 28-day period: 194 tweets in total. It’s enough to explain which tweets performed the best for me but not enough to generalize for everyone in the Money Twitter space.

Two: Ignorance

For the analysis I focused solely on tweet length and ignored the quality of each tweet’s content. Great content will almost always outperform bad content.

Since I didn’t measure the quality of each tweet, it’s impossible to judge whether the tweet ranges contained the same quality of content.

Three: Subjective measurements

For the sake of the analysis, I defined high performing tweets as those performing above average in all 5 major metrics we’ve looked at here.

But you may define high performing tweets differently. And your neighbor may define them differently than you too. So depending on how others define success, the data could give different answers.

Let’s Play a Game

As a math major turned marketing fanatic, I love looking at how psychology plays a role in probability and stats. Although my calculations and analysis are accurate here, I decided to tell a story that’s a bit misleading.

You see, there’s a number I shared here that’s actually a psychological persuasion tactic.

It’s a way to exaggerate results to make an audience support your argument. While unethical in some cases, I only used it here to test whether you'd spot the tactic.

Shoot me an email at mike@dorkydata.com if you want to know the answer. I’d love to see whether or not I deceived you.