December 21, 2023
During my first two months as a data analyst, I cared about one thing and one thing only:
Writing SQL queries that worked.
That’s all my boss asked me to do.
Or so I thought…
After two months, I realized performance matters a lot. Unlike most data analysts, I don’t only write one-off SQL queries to pull data into Excel for analysis (yuck, I don’t like the sound of that).
Instead…
I spend much of my time creating/altering SQL Views and Stored Procedures.
My code lives in the database.
It’s used in production.
A few weeks ago, I altered a View and found a nifty way to put a window function inside a subquery inside another subquery.
And my logic worked!
But it took nearly two minutes to execute since the dataset had over one million rows. That’s not good.
Slow SQL queries = Slow reports/websites.
Slow reports/websites = unhappy stakeholders.
Unhappy stakeholders = Not good for me.
Even though my code worked, I couldn’t push it into production. That would have made our website so slow that everyone in the company would have to come pick a fight with my data team.
So, I had to optimize the script to run the same amount of lines faster.
I manipulated the logic to execute the one million rows in two seconds.
2 seconds vs 2 minutes.
Both queries work, but one works better.
To me, being a data analyst isn’t about writing something that works. That’s the first step of the journey. What matters most is performance. No one will care how cool your query is if it runs slow and wastes their time.