A battle-tested analytics stack for teams who need speed, clarity, and control
If you're building an analytics platform from scratch, here's the exact stack I'd use today. Tools, trade-offs, and practical lessons behind a modular, scalable, and low-maintenance data setup.
Greetings, Data Engineer,
It’s easy to obsess over tools. It’s also easy to get them completely wrong.
The goal of a good analytics stack isn’t to look modern. It’s to work under pressure, under change, and under constant expectation.
After years of building, scaling, and rebuilding data systems, here’s the stack I’d pick today. Every tool here earned its place through lessons, trade-offs, and scars.
If I were starting again, this is how I’d build.
How I think about analytics architecture today
Start with modularity. Build for change.
Most data platforms fail when change shows up. The business pivots. A stakeholder needs new metrics. A team wants to track things differently. You can’t predict any of that.
But you can structure for it.
I think in layers:
Extraction: Isolate from everything else. Make it pluggable.
Loading: Store raw. Don’t transform early.
Transformation: Centralised, versioned, testable.
Orchestration: Flexible, decoupled from infrastructure.
Storage: Scalable and replaceable without panic.
BI: Accessible and trustworthy, especially for non-technical users.
Each layer should operate independently. If I need to swap one, I don’t want to touch five others.
That’s the design principle behind every tool below.
Extraction and loading: Use Singer if you want flexibility over polish
How I think about this layer
I need tools I can run locally, read without a manual, and swap out without touching the rest of the stack.
That usually means plain Python. No fancy wrappers. No platform dependency.
If a job fails at 2am, I want to be able to trace it myself without sending a support ticket or waiting for logs to load in some UI.
This layer breaks often. Source systems change. APIs get rate-limited. Log formats drift. So I pick tools that make failure obvious and fixable.
Why I pick Singer
Singer isn’t perfect, but it gets the shape of the problem right.
Each tap is self-contained. You can fork it, fix it, and run it anywhere. It writes raw JSON, which means you always know what you’re loading.
The big win is control. When something breaks, I can see what happened. I don’t have to reverse-engineer someone else’s logic or depend on a SaaS vendor to debug it.
I also don’t need to redesign my stack every time I add a new data source. I can drop in another tap and move on.
It’s not the fastest. It’s not the easiest. But it’s clear. And when you’re starting out, that matters more than anything.
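To make the "raw JSON you can inspect" point concrete, here's a minimal sketch of what a landing table for a tap's output might look like once it reaches the warehouse. The table and field names are made up, and it assumes a Snowflake-style VARIANT column, which lines up with the warehouse choice later in this post.

```sql
-- Hypothetical raw landing table for a Singer tap's output.
-- Assumes a Snowflake-style VARIANT column; all names are illustrative.
CREATE TABLE IF NOT EXISTS raw.tap_orders (
    record      VARIANT,                                   -- the tap's RECORD payload, stored untouched
    _loaded_at  TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()  -- load metadata, handy when debugging reloads
);

-- Because the payload is stored as-is, you can see exactly what was loaded
-- before any transformation logic touches it.
SELECT
    record:order_id::STRING AS order_id,  -- assumed field names
    record:status::STRING   AS status,
    _loaded_at
FROM raw.tap_orders
ORDER BY _loaded_at DESC
LIMIT 20;
```

The point isn't the exact DDL. It's that the payload stays untouched until you decide to transform it, so debugging starts with a SELECT instead of a support ticket.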
Transformation and testing: Start with dbt unless you have a reason not to
How I think about this layer
This is where people usually overcomplicate things. You don’t need a custom framework. You don’t need to reinvent version control. You need something your whole team can read and ship confidently.
I want transformations to be transparent. Versioned in Git. Easy to test. Easy to roll back. If someone new joins, they should be able to follow the logic without asking ten questions.
SQL is the default language for this work, so the tool needs to treat SQL as the main interface, not a second-class one.
Why I pick dbt
dbt helps the team grow without the logic falling apart. It sets a standard right away: tests live next to the models, docs get generated automatically, models build on each other in a clear way.
You don’t have to create naming conventions from scratch. You don’t have to figure out how to test a join or track a dependency across five tables. The defaults are good. And the community is big enough that you’re rarely stuck alone.
It works well with most warehouses. Works locally. Works in CI. And most importantly, it forces some discipline.
That’s usually what’s missing when teams start to scale: the first few models grow into a mess, and then everyone’s afraid to touch them. dbt slows that mess down a lot.
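To show what that looks like in practice, here's a minimal sketch of a dbt model plus a singular test. The model and column names are invented; generic tests like unique and not_null usually live in a YAML file next to the models, but the pure-SQL flavour below makes the idea easiest to see.

```sql
-- models/marts/fct_orders.sql (hypothetical model; names are illustrative)
-- ref() is how dbt builds the dependency graph and runs models in order.
SELECT
    o.order_id,
    o.customer_id,
    o.ordered_at,
    p.amount
FROM {{ ref('stg_orders') }} AS o
LEFT JOIN {{ ref('stg_payments') }} AS p
    ON o.order_id = p.order_id
```

```sql
-- tests/assert_orders_have_payments.sql (a singular dbt test)
-- A test is just a query: any rows it returns are treated as failures.
SELECT order_id
FROM {{ ref('fct_orders') }}
WHERE amount IS NULL
```

Because ref() declares the dependencies, dbt knows to build the staging models first, and the test runs with everything else in dbt test. That's the discipline: logic, dependencies, and checks all live in the same repo.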
Orchestration: I use Meltano because it pulls everything together without drama
How I think about this layer
This layer is more about glue than power. I’m not trying to build a scheduling empire. I need a way to run pipelines reliably, know when they fail, and avoid duct tape.
If something breaks, I should know where. If I need to swap a component, I shouldn’t have to change my scheduler. And I don’t want to spend a week wiring up monitoring or retries.
It has to run dbt. It has to run Singer. It has to play nice with Airflow, but not force it.
Why I bundle with Meltano
I use Meltano because it wraps the whole thing in a way that’s actually deployable.
It knows how to run Singer taps. It runs dbt cleanly. It keeps environments isolated. And if I need to scale up orchestration, I can wire it into Airflow without rebuilding everything.
You don’t need full DAG tooling when you’re starting out. Meltano gives you a clean CLI, simple scheduling, and fewer moving parts.
If I want to run things on cron for now, that’s fine. If I need a full scheduler later, I can grow into it.
That balance is what makes it work for me.
Mage and Kestra are also strong contenders. They’re more modern in feel, easy to onboard, and give you decent observability out of the box.
But Meltano is the one I know best and the one that gives me the most flexibility.
Warehousing and storage: Snowflake’s my go-to, but the others are strong too
How I think about this layer
Warehousing is where everything converges. If it’s slow, everything downstream gets worse. If it’s expensive, every job hurts. So this layer has to be predictable, flexible, and safe to grow on.
I want to run heavy joins without worrying about concurrency. I want to isolate workloads by team. And I want to know how to dial things up without accidentally tripling my costs.
This layer needs to feel boring in the best way.
Why I usually go with Snowflake
Snowflake, Databricks, and ClickHouse are all solid. You won't make a bad call if you choose based on your team and constraints.
I lean on Snowflake because I've used it a lot. I know how to set clustering keys, tune warehouse sizes, and manage refresh jobs in a way that keeps costs down.
People say Snowflake is expensive. It can be. But once you understand how to schedule loads, batch queries, and isolate compute, it’s shockingly affordable. Especially for the performance.
It also plays well with dbt and BI tools. You don’t need to manage infra. And you get solid concurrency out of the box.
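As a rough illustration of what "isolate compute" means in practice, here's a sketch of per-workload warehouses with aggressive auto-suspend. The warehouse names and sizes are placeholders, not a recommendation for your workload.

```sql
-- Hypothetical setup: one warehouse per workload, so heavy jobs can't slow down BI.
-- AUTO_SUSPEND shuts compute down after 60 seconds idle, which is where most of
-- the cost control comes from; AUTO_RESUME brings it back on the next query.
CREATE WAREHOUSE IF NOT EXISTS loading_wh
  WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE;

CREATE WAREHOUSE IF NOT EXISTS transforming_wh
  WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE;

CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE;

-- Dialing up is a one-line change when a workload genuinely needs it,
-- and just as easy to dial back down.
ALTER WAREHOUSE transforming_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```

Suspended warehouses cost nothing while idle, and keeping workloads on separate warehouses means a long dbt run never blocks the dashboards.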
What about Databricks and ClickHouse?
Databricks is great if you’re doing more compute-heavy work. It’s ideal for teams who care about notebooks. But you’ll need more infra maturity to get the most out of it.
ClickHouse is fast. Like, extremely fast. But it's not for everyone. You should only go there if you need sub-second queries, want to self-host, and have the ops team to back it up.
For most analytics work, Snowflake keeps things simple and powerful. That’s why it’s usually my starting point.
BI tools: This is where trust either builds or breaks
How I think about this layer
This part gets ignored early on. It shouldn’t.
If the people you’re building for can’t explore data or get quick answers without help, you’ll be stuck doing their work for them. Every report becomes a support ticket. Every dashboard becomes a bottleneck.
So I want a BI tool that lets non-technical folks answer their own questions without breaking things. It has to be usable, predictable, and boring in the best way.
It’s not about features. It’s about how easily people can get what they need without second-guessing it.
What I’ve used and what I recommend
Right now, we’re moving to Omni. It’s clean. It’s governed. It works well for growing teams that need both speed and structure.
The big upside is that you can keep metric logic in one place without hiding it behind five layers of abstraction.
Metabase still works great for smaller teams. It's fast to set up, and the learning curve is gentle. If you're early-stage and want something usable on day one, it's a good choice.
Lightdash is great if you’re all-in on dbt. It pulls in your models and makes them visible in a way that lines up with your codebase. That’s powerful if you want full alignment between analysts and engineers.
When I'm working solo, or I don't have to design for non-technical users, I use Evidence. It's markdown-based and feels like writing a doc that happens to be powered by real data.
I can be fast, precise, and in full control without fighting the tool. It’s not for everyone, but it’s great when the audience is technical or already lives in Git.
What I avoid
I steer clear of Superset.
It looks great on paper, but most business users get lost in it. It feels like a dev tool. You end up with only analysts using it, and they still ask for help. That defeats the point.
If your BI tool makes smart people feel dumb, it’s not worth it. You want something people actually enjoy opening and trust when they do.
Final thoughts: Tooling doesn't solve everything, but it sets the tone
If I’ve learned anything, it’s this: your stack doesn’t need to be fancy. It needs to be dependable.
Pick tools that are boring in the right way. Ones you can read. Ones your team can grow into. Ones you don’t have to babysit every time something upstream changes.
The real win is when your platform fades into the background and people get to focus on decisions instead of definitions.
That’s why I care about modularity. That’s why I avoid tools that do too much or lock you in.
And that’s why I’m comfortable recommending the tools in this stack. Not because they’re perfect, but because they’ve helped me get real work done.
You’ll make different trade-offs. That’s fine. What matters is that you understand the trade-offs when you make them.
That’s the difference between building something that scales and building something that breaks the moment the next team shows up.
Cheers,
PS: Are you serious about your career development? My course, the Stakeholder Influence System, will teach you everything you need to become a strategic data partner and get more buy-in.
PPS: Only Paying Data Gibberish members can unlock all the deep dives, mini-courses, and resources in the premium content library. Don’t limit your growth. Upgrade your experience.