Presenting Yuki Kakegawa
Yuki is a Staff Data Engineer, author of the Polars Cookbook, and the founder of Orem Data. He writes about data tools and independent consulting on Substack and LinkedIn.
I asked Yuki what the biggest adjustment was when he became a team of one.
He skipped tooling. He skipped workload. He said this:
It’s a shift in mindset. You’re not a data engineer anymore. You’re a data person helping the business with data and analytics.
That sentence landed harder than I expected, because most people assume the challenge is the technical breadth: wearing all the hats, being the analyst, the engineer, and the data scientist rolled into one.
That part is visible. What nobody talks about is the identity part.
You Were Hired as a Data Engineer. That Is the Problem.
At a large company, your identity is your scope. You do pipeline work, pass it to the analyst, the analyst makes the report, the report goes to the business. You are one node in a chain, and that chain insulates you from the messiness of what the business needs.
When you’re a team of one, that chain is you.
Yuki described his current day-to-day:
talking to the business to understand what they care about
translating that into projects
building the pipelines
building the models
building the reporting layer
delivering insights directly to stakeholders
He skips the handoff entirely.
That scope describes a different job, built around a different identity.
The Selfishness That Kills You
He used a word that surprised me: selfish.
When I was first getting started, I wanted to work on pipelines and I didn’t want to work on reporting. That selfishness kills you in this role. You have to be flexible and you have to care about the business first.
Most data engineers have strong opinions about which work is worth doing. Pipelines are interesting. Reporting is boring. Data modeling is craft. Dashboards are noise.
Those opinions are how you survive in a large org where you can negotiate your scope.
A team of one has no scope to negotiate. The business has no interest in which part of the stack you find meaningful.
The work that needs doing is the work you do, and if you have not made peace with that before taking the role, the first six months will feel like a slow grind against a situation you agreed to.
What Is Broken at Every Startup You Walk Into
I asked Yuki what he finds broken almost every time he walks into a young company.
Tooling is rarely the issue. It’s the processes around the processes that produce the data used for reporting and analytics downstream. And the alignment on what’s important, the definition of metrics, what we want to build.
Two things. The upstream processes that generate data, and whether the business has agreed on what the numbers mean. The stack is almost never the problem.
The Metric Definition Problem
This is the one that is hardest to fix and most commonly ignored. Yuki described the ideal state: every metric the business cares about is written down, defined, and agreed on before the data team touches a model.
When that is true, the data team’s job is implementation. The hard part is already done for you.
The real state at most companies is three departments with three definitions of the same number.
If you use one team’s definition, you put another team in a bad spot. If you try to reconcile them, you are suddenly in a political conversation nobody hired you to have.
I’ve been in that position. That’s the part I hated the most.
Defining core metrics early, before the company grows, before each department builds its own reporting layer, before everyone has opinions baked into their own numbers, is more valuable than any pipeline you will build.
If you’re walking into a company where this work is undone, do it first.
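One lightweight way to start is to write the agreed definitions down in a single versioned artifact the whole business can read, before any modeling work begins. The Python sketch below is only an illustration of that idea, not anything Yuki prescribes; the metric name, the fields, and the "active_customer" rule are hypothetical placeholders.

from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    """One agreed-on definition of a business metric, owned by one team."""
    name: str            # the name everyone uses in reports and dashboards
    owner: str           # the department accountable for this definition
    definition: str      # plain-language rule the business signed off on
    sql_expression: str  # how the data layer implements it, for transparency

# Hypothetical example: one shared definition instead of three departmental ones.
CORE_METRICS = {
    "active_customer": MetricDefinition(
        name="active_customer",
        owner="Finance",
        definition="A customer with at least one paid order in the last 90 days.",
        sql_expression=(
            "COUNT(DISTINCT customer_id) FILTER "
            "(WHERE order_status = 'paid' AND order_date >= CURRENT_DATE - 90)"
        ),
    ),
}

if __name__ == "__main__":
    # The point is the shared, written-down agreement, not the code:
    # every report that says "active customers" points back to this one rule.
    for metric in CORE_METRICS.values():
        print(f"{metric.name} (owner: {metric.owner}): {metric.definition}")

A plain document or a semantic-layer config would serve the same purpose. What matters is that each definition exists in exactly one place, with one owner, before every department bakes its own version into its own reporting.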
How to Handle the Stakeholder Who Thinks Your Work Takes Two Weeks
Yuki described a pattern that every team-of-one data person will recognise. A stakeholder requests a report, thinks it can be done in two weeks, and the data behind it is messy enough that you know it will take four.
I can try, but I can’t promise, because I think these things will be bottlenecks.
Two things are happening in that exchange. Setting expectations is the obvious one. Making the complexity visible is the one that changes the relationship long-term.
When you’re the only data person, the rest of the company defaults to assuming data work is fast. Pull the numbers, build the report, done.
They genuinely have no frame of reference for what it takes to model messy source data into something accurate enough to make decisions with. Your job is to narrate the work before you do it, not to explain it after you miss the deadline.
The Transparency Stack
Yuki added something that makes the whole prioritisation problem easier: making your priority stack visible to the entire business.
Being transparent about what you’re working on and what the priorities are is important. Because if one stakeholder thinks his project is the top priority, but I’m working on a priority item requested by the CEO, then that stakeholder will understand that his project is not at the top.
This works because it shifts who does the priority negotiation. When everyone can see the queue, the conversation moves from “why isn’t my thing done?” to “is my thing in the right place in the queue?”. The second conversation is faster and involves fewer emotions.
Accuracy vs. Speed Is a Trade-Off to Name, Not a Choice to Make Alone
I pushed Yuki on a question that sounds simple: accuracy or speed?
Definitely accuracy. But that’s where your skill comes in. How do you deliver projects fast while ensuring accuracy?
The real answer lives in the communication around the trade-off, not the trade-off itself. If you can deliver in one week at 80% data quality, or in two weeks at 100%, the stakeholder should make that call. They can only make it if you put the options in front of them explicitly.
If you wait two weeks, I can ensure 100% accuracy. If you want it in one week, there might be things that still need to be pinned down.
Yuki’s underlying point is worth sitting with: making decisions on low-quality data defeats the entire point of data-driven decision making. Speed that produces bad numbers is a liability that takes three times longer to fix than the shortcut saved.
On Using AI When You Have Nobody to Think With
One thing Yuki said is worth pulling out directly, because it is the most honest framing of AI use I have heard from anyone working in the data space:
I use AI especially when I want to bounce ideas around. I use it because I don’t have anybody else to talk to, because I’m the only data person.
AI as a rubber duck rather than a replacement for judgment. At a startup where you’re the only data person, you have no peer to sanity check your modeling approach, your architecture choices, or whether your prioritisation call makes sense. AI fills a small part of that gap.
But Yuki also admitted he is not sure AI is making him faster overall. The time he used to spend building the solution, he now spends validating what the AI built. The total may net to zero, with the shape of the work changed and the volume unchanged.
That matches what I have seen. AI handles narrowly scoped, well-defined tasks well: understanding what a table column means by parsing the codebase, generating boilerplate inside a framework you have already defined, drafting initial SQL that you then refactor.
For architectural decisions and anything that requires sustained judgment, the model is an input, and the judgment stays with you.
Connect With Yuki
Yuki is very active online. The best places to find him are his Substack and LinkedIn.
—
Yordan