How to make your SQL 10x faster (and silence the “AI can’t help with that” crowd)
A 4-week deep dive into schema-aware prompts, query plans, and AI-assisted indexing, built for engineers who give a damn about performance.
This mini-course is part of the Level-up data engineering playlist. Click here to explore the full series.
I used to think AI couldn’t help with real SQL work.
Sure, it could autocomplete a SELECT, but optimizing a 30M-row query? That’s senior engineer territory.
This mini-course proves otherwise.
You’ll learn how to prompt like a systems thinker, diagnose EXPLAIN plans like a pro, and simulate indexing with zero risk.
In the next 4 weeks, you’ll tackle one real-world performance bottleneck. By the end, you won’t just write faster SQL.
You’ll think faster too.
How to work with this mini-course
Reading the article alone will take you over 20 minutes. So:
Bookmark this guide and set a reminder to revisit it weekly.
Skim the entire article once to understand the big picture.
Each week, complete the exercises before applying them to your own projects.
Take your time. Don’t rush to implement everything at once. Master each step before moving to the next.
Also, reading the whole thing and writing the code in one sitting takes about an hour. It’s much easier to spend 15 minutes per week!
Who is this for
This is for the engineer who’s done all the right things and still feels stuck.
You know how to write a solid query.
You’ve skimmed AI blog posts that say “Just use GPT!”
You’ve watched it write garbage SQL and thought: There’s no way this helps in prod.
You’re not wrong.
But also, it’s not the model.
It’s the prompt, the schema, and the way you teach it to reason.
This is for you if:
You write analytical queries that run slow, and want to know why
You’re tired of guessing what EXPLAIN plans mean
You want AI to be your performance copilot and not your crutch
You don’t want to play “SQL roulette” every time a query hits prod
You care about real metrics: buffer hits, index paths, cost deltas
This mini-course won’t make you a DBA.
It’ll make you dangerous in performance reviews, schema design meetings, and tuning sessions.
No magic. Just muscle.
What you’ll learn (by the end of 4 weeks)
You’re not here to get “better at SQL.”
You’re here to turn that skill into firepower, backed by evidence, not vibes.
By the end of this, you’ll know how to:
Prompt LLMs with schema, row counts, and constraints, and catch when they hallucinate
Read an EXPLAIN (ANALYZE) plan like a performance engineer
Use ChatGPT to diagnose query bottlenecks (and challenge its answers with facts)
Simulate indexes with HypoPG before writing a single CREATE INDEX
Rewrite slow queries using window functions, CTEs, and partition-aware patterns
Spot when AI is helpful and when it’s confidently wrong
This isn’t autocomplete training.
It’s optimization thinking with AI as your assistant.
Set up your playground + tools
Before we start optimizing anything, we need a playground worth tuning.
We’re using the Wikipedia Clickstream dataset (Jan 2025) — messy, massive, and built for OLAP-style exploration.
You’ll load it into PostgreSQL on your local machine (Mac only — no Linux or Windows instructions here).
Here’s what we’re building:
A fact table with ~36M rows of user clickstream data
Two dimension tables to help with labeling and grouping
A test-only copy for slow queries you can safely experiment on
Step 1: download the data
Grab the Jan 2025 clickstream dataset:
curl -O https://dumps.wikimedia.org/other/clickstream/2025-01/clickstream-enwiki-2025-01.tsv.gz
gunzip clickstream-enwiki-2025-01.tsv.gz
This file is ~1.3GB uncompressed and uses tab-separated values.
Step 2: install PostgreSQL (if you haven’t yet)
Using Homebrew:
brew install postgresql
brew services start postgresql
Check version:
psql --version
You want PostgreSQL 14 or higher.
Step 3: create the database
From your terminal, create the database and connect to it:
createdb clickstream_ai
psql clickstream_ai
Inside psql, create your base table:
create table clickstream (
prev_title TEXT,
curr_title TEXT,
link_type TEXT,
n BIGINT
);
This captures:
prev_title: the referring page
curr_title: the page that was visited
link_type: the type of transition (link for internal links, external, or other)
n: number of transitions
Step 4: Clean and load the data
Wikipedia data is not in great shape. Let’s do our favourite thing as data engineers and clean it.
awk -F '\t' '
# keep well-formed rows (4 fields) and drop the catch-all "other" link type
NF==4 && $3 != "other" {
    # escape embedded double quotes, then quote every field to produce valid CSV
    gsub(/"/, "\"\"", $1); gsub(/"/, "\"\"", $2); gsub(/"/, "\"\"", $3); gsub(/"/, "\"\"", $4);
    print "\"" $1 "\",\"" $2 "\",\"" $3 "\",\"" $4 "\""
}' clickstream-enwiki-2025-01.tsv > clickstream_cleaned.csv
Now load it:
psql clickstream_ai -c "\\copy clickstream from 'clickstream_cleaned.csv' with (format csv, header false)"
Check count:
select count(*) from clickstream;
You should see around 36 million rows.
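Optionally, run a quick sanity check (assuming the load succeeded): refresh planner statistics and peek at the most-clicked pages:
-- Refresh statistics so the planner knows what it's working with
analyze clickstream;
-- Top 10 most-visited target pages by transition count
select curr_title, sum(n) as total_clicks
from clickstream
group by curr_title
order by total_clicks desc
limit 10;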
Step 5: create dimension tables
We’ll create two lookup tables:
create table dim_page (
title TEXT PRIMARY KEY,
is_disambiguation BOOLEAN DEFAULT FALSE
);
create table dim_link_type (
link_type TEXT PRIMARY KEY,
description TEXT
);
Optional: add a few dim_page entries to use in joins later.
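Here’s a minimal sketch if you want something to join against right away. The titles and descriptions are placeholders, so swap in pages and link types that actually appear in your clickstream table:
-- Placeholder rows for later join exercises (pick titles you can see in your data)
insert into dim_page (title, is_disambiguation) values
    ('Main_Page', false),
    ('Mercury', true);
-- Placeholder descriptions for the link types kept after cleaning
insert into dim_link_type (link_type, description) values
    ('link', 'internal link between two Wikipedia articles'),
    ('external', 'traffic arriving from outside Wikipedia');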
Step 6: create a slow copy for experiments
create table clickstream_slow as select * from clickstream;
You’ll use this to run bad queries on purpose and learn to fix them.
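For example, here’s the kind of deliberately slow baseline you’ll capture later (a sketch only; the page title is just a placeholder):
-- Deliberately unoptimized: full sequential scan plus an aggregate and sort
explain (analyze, buffers)
select prev_title, sum(n) as transitions
from clickstream_slow
where curr_title = 'PostgreSQL'
group by prev_title
order by transitions desc;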
Step 7: tools you’ll use
ChatGPT / Claude / Copilot / Cursor for SQL generation + plan interpretation
HypoPG to simulate indexes without committing to schema changes (see the sketch at the end of this list)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) to see exactly what your query is doing
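As a taste of the HypoPG workflow (a minimal sketch, assuming you’ve already installed the extension for your PostgreSQL version):
-- Enable the extension once per database
create extension if not exists hypopg;
-- Register a hypothetical index: no disk writes, no locks, no risk
select * from hypopg_create_index('create index on clickstream (curr_title)');
-- Plain EXPLAIN (without ANALYZE) will consider the hypothetical index
explain select prev_title, n from clickstream where curr_title = 'PostgreSQL';
-- Remove all hypothetical indexes when you're done
select hypopg_reset();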
Did you miss the announcement? Over the next year, you and I will turn your technical work into profit-driving business systems, without playing politics, changing jobs, or necessarily becoming managers. Subscribe now and secure the next year of growth.