How I Actually Use AI to Write Code (Without Wrecking the Codebase)

Same trick as everything else: give the model a rulebook and a leash.


AI can write code fast. It can also write a bug fast, with total confidence and a beautifully worded commit message. That's the part the demos leave out.

You've seen the videos. Someone types one sentence, an app appears, the crowd claps. What's never on camera is the next six months: the part where somebody has to read that code, change it, and explain to a customer why it did something nobody intended. Generating code was never the hard part of this job. Living with it is.

I lean on AI every day. Claude Code, Cursor, the usual suspects. But I use it the way you'd use a strong, fast junior engineer who started this morning, has read nothing, remembers nothing tomorrow, and will confidently do exactly the wrong thing if you let them. Handled like that, it's a huge multiplier. Handled like a senior you can hand the keys to, it quietly rots your codebase while everyone's impressed by the velocity.

Here's how I actually run it.

The vibe-coding trap

"Vibe coding" is fun right up to the moment it isn't. You prompt, it produces, it runs, you ship. Speed feels incredible. Then a bug shows up in code you didn't write and don't understand, and now you're debugging a stranger's work with none of the context that would've come from writing it yourself.

That's the trap. The code looking plausible is not the same as the code being correct, and AI is extremely good at plausible. It writes things that pass the eye test and fail the edge case. Confident, well-formatted, subtly wrong. The worst kind of pull request, basically, except it arrives every ninety seconds.

The fix is the same one that works for AI-assisted writing, and honestly the same one that works for junior engineers: you don't hand over a blank canvas and hope. You hand over a rulebook and a leash.

The rulebook lives in the repo

For code, the rulebook isn't a prompt you retype. It's a file that sits in the repository, one the agent reads automatically every time it starts: CLAUDE.md, or AGENTS.md, or your editor's rules file. Same idea whatever the tool calls it.

Think of it as the onboarding doc you wish every new hire actually read, except this one does, every single session, without complaint.

Without it, the agent guesses. It puts files wherever, invents a naming scheme, picks a library you'd never approve, and writes business logic into a controller because nothing told it not to. With it, the guessing stops. Here's the shape of mine:

```markdown
# CLAUDE.md

## Project map
- HTTP controllers live in `src/api`, thin. Business logic in `src/services`.
- Persistence in `src/repositories`. No SQL outside that folder.

## Conventions
- Constructor injection only. No field @Autowired.
- DTOs in/out at the boundary. Never leak entities to the API.

## Commands (run these, don't claim "done" until they pass)
- Build:  ./mvnw -q compile
- Test:   ./mvnw -q test
- Lint:   ./mvnw -q spotless:check

## Do NOT
- Add a dependency without asking.
- Touch the `payments` module.
- Commit anything under `secrets/` or any `.env`.
```

That last block is the most valuable part. Every "do not" is a class of disaster the agent has been warned away from before it writes a line. You write it once and you stop re-explaining it forever.
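The Conventions block is easier to picture as code. What follows is a rough plain-Java sketch with no Spring on the classpath, so it runs on its own. Every name in it (`UserEntity`, `UserResponse`, `UserRepository`, `UserService`, `UserController`) is hypothetical. In a real Spring Boot project the same classes would carry the framework's annotations, and the constructors shown here are what Spring would inject through.

```java
import java.util.Optional;

// Hypothetical persistence entity: stays behind the service boundary.
record UserEntity(long id, String email, String passwordHash) {}

// Hypothetical DTO: the only shape the API layer is allowed to return.
record UserResponse(long id, String email) {}

// Persistence contract; an implementation would live in src/repositories.
interface UserRepository {
    Optional<UserEntity> findById(long id);
}

// Business logic lives here (src/services), not in the controller.
final class UserService {
    private final UserRepository users;

    // Constructor injection: the dependency is explicit and final.
    UserService(UserRepository users) {
        this.users = users;
    }

    Optional<UserResponse> findUser(long id) {
        // Map entity to DTO at the boundary so passwordHash never leaks.
        return users.findById(id)
                .map(u -> new UserResponse(u.id(), u.email()));
    }
}

// Thin controller (src/api): delegates and translates, nothing else.
final class UserController {
    private final UserService service;

    UserController(UserService service) {
        this.service = service;
    }

    String getUser(long id) {
        return service.findUser(id)
                .map(u -> "200 " + u)
                .orElse("404 user not found");
    }
}
```

Because the dependencies arrive through constructors, a test can hand `UserService` a one-line fake repository instead of booting a container, which is most of the reason the rule is worth writing down.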

Tests are the leash

The rulebook keeps the agent pointed in the right direction. Tests are what stop it from lying to you about whether it got there.

This is the single thing that makes AI coding safe past toy size: you don't trust the code, you trust the test that proves the code. So the agent doesn't get to declare victory because the diff looks reasonable. It declares victory when the suite is green, and not one second before.

In practice I either write the failing test first, or I make the agent write it before the implementation and I check that the test is actually testing the real thing (they love a test that asserts true == true and calls it a day). Then the loop is simple: red, write code, green, or back to the agent. The test is the leash. It's exactly as long as your coverage is honest.

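```java
// The contract, written before the code:
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class AccountTest {
    @Test
    void rejectsTransferWhenBalanceTooLow() {
        // money(...) is a small test helper that builds an amount from a string.
        var account = new Account(money("10.00"));
        assertThrows(InsufficientFundsException.class,
            () -> account.transfer(money("25.00")));
    }
}
// Now the agent can write transfer(). It isn't "done" until this is green.
```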
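For illustration, here is one way the loop could end. This is a minimal sketch under stated assumptions, not the implementation from any real codebase: `Account`, `InsufficientFundsException`, and `money` are filled in as guesses, amounts are `BigDecimal`, and `transfer` only debits the source account. The `main` method mirrors the test in plain Java so the block runs without JUnit.

```java
import java.math.BigDecimal;

// Hypothetical exception, named after the one the test expects.
class InsufficientFundsException extends RuntimeException {
    InsufficientFundsException(String message) {
        super(message);
    }
}

// Hypothetical account: just enough behaviour to satisfy the contract.
class Account {
    private BigDecimal balance;

    Account(BigDecimal openingBalance) {
        this.balance = openingBalance;
    }

    // Debits the amount, or throws and leaves the balance untouched.
    void transfer(BigDecimal amount) {
        if (amount.signum() <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (balance.compareTo(amount) < 0) {
            throw new InsufficientFundsException(
                "balance " + balance + " is less than " + amount);
        }
        balance = balance.subtract(amount);
    }

    BigDecimal balance() {
        return balance;
    }
}

class TransferContract {
    // Assumed helper matching money("10.00") in the test.
    static BigDecimal money(String amount) {
        return new BigDecimal(amount);
    }

    public static void main(String[] args) {
        var account = new Account(money("10.00"));
        try {
            account.transfer(money("25.00"));
            throw new AssertionError("expected InsufficientFundsException");
        } catch (InsufficientFundsException expected) {
            System.out.println("green: " + expected.getMessage());
        }
    }
}
```

The check that matters in review is the order inside `transfer`: the balance comparison happens before the subtraction, so a rejected transfer cannot leave the account half-updated.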

No test, no trust. If a change can't be pinned down by a test, that's usually a sign it's the kind of change I shouldn't be handing off in the first place. Which brings up the real question.

Know what to hand over

Not everything is safe to give away, and pretending otherwise is how the velocity demos turn into incident reports.

Picture the work sorted into two columns: hand over on the left, keep on the right. The left column is where AI earns its keep: boilerplate, CRUD endpoints, the fifteenth mapper between two nearly identical shapes, unit tests for code that already exists, mechanical renames across forty files, "explain what this legacy function does." Work that's tedious, fast to generate, and easy to verify. Hand it over and don't feel a thing.

The right column is where I keep my hands on the wheel. System and data architecture. Anything touching auth, security, or money. Concurrency, locking, ordering, the stuff that works fine until it's 2am and a race condition is eating your weekend. And the one thing the model fundamentally can't do for you: figuring out what the actual problem is. The agent can build the thing. It can't tell you you're building the wrong thing.
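To make the concurrency point concrete, here is the kind of code that passes the eye test and can still lose money under load. It is a sketch with hypothetical names (`RacyWallet`, `SafeWallet`, `WalletDemo`), not code from any real system. Both wallets read the same to a quick reviewer; only the second makes the balance check and the debit a single atomic step.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Looks correct, but check and debit are separate steps: two threads can
// both pass the check before either one subtracts.
class RacyWallet {
    private long cents;

    RacyWallet(long cents) { this.cents = cents; }

    boolean withdraw(long amount) {
        if (cents >= amount) {   // check
            cents -= amount;     // act, not atomic with the check
            return true;
        }
        return false;
    }

    long cents() { return cents; }
}

// Same logic, but the check-then-act sequence holds the object's lock.
class SafeWallet {
    private long cents;

    SafeWallet(long cents) { this.cents = cents; }

    synchronized boolean withdraw(long amount) {
        if (cents >= amount) {
            cents -= amount;
            return true;
        }
        return false;
    }

    synchronized long cents() { return cents; }
}

class WalletDemo {
    // Fires `attempts` concurrent 1-cent withdrawals; returns how many succeeded.
    static int hammer(SafeWallet wallet, int attempts) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger succeeded = new AtomicInteger();
        for (int i = 0; i < attempts; i++) {
            pool.execute(() -> {
                try {
                    start.await();  // hold workers back so they contend
                    if (wallet.withdraw(1)) {
                        succeeded.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return succeeded.get();
    }

    public static void main(String[] args) {
        SafeWallet wallet = new SafeWallet(50);
        int ok = hammer(wallet, 200);
        System.out.println(ok + " withdrawals succeeded, " + wallet.cents() + " cents left");
    }
}
```

With `SafeWallet`, 200 attempts against 50 cents produce exactly 50 successes and a zero balance. `RacyWallet` has no such guarantee, and the awkward part is that a single-threaded unit test passes for both, which is why this category stays in the right column.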

The boundary isn't fixed, by the way. As your tests and your rulebook get stronger, more work safely slides left. But it slides because you earned the safety, not because you got tired of reviewing.

What the agent can't do

It can't hold your whole system in its head. It sees the files in front of it and a summary, not the four years of context about why that weird workaround exists and what breaks if you "clean it up."

It can't own the consequences. When the thing it wrote goes sideways in production, it's not on the call. You are. No skin in the game means no judgment about risk, just confident output either way.

And it doesn't know what "good" means for your domain. Good code for a banking ledger and good code for a throwaway internal dashboard are different in ways the model will flatten into the same tidy average unless you tell it otherwise.

So the senior skill in all this isn't writing code anymore. It's reading a diff quickly, smelling what's off, and saying "no, not like that" with a reason. The job shifted from typing to reviewing, and reviewing well is harder than it looks. You're the editor. The agent is a very fast, very forgetful typist.

The honest version

This makes me genuinely faster, and I'm not going to pretend it doesn't. On the roughly 70% of my work that's mechanical, it's a different speed of working entirely. That's real, and it frees up attention for the 30% or so that's actually judgment, which is the part that was always worth my time.

But it is not hands-off, and the people selling it as hands-off are describing a codebase I wouldn't want to inherit. I read every diff. I write the tests, or I check the ones it wrote like I don't trust them, because I don't. The rulebook and the leash aren't bureaucracy. They're the only reason the speed doesn't turn into a slow-motion mess six months out.

The tool will change. It's Claude Code today, something better next year, and your CLAUDE.md and your test suite move over almost untouched, because they're about your codebase, not the model. That's the asset. The model is just the engine you bolt it to.

So: scope it small, write down the rules, make the tests the gate, and review like it's a junior's PR, because that's precisely what it is. Then merge it under your name and mean it.

Really enjoyed this perspective. The "rulebook and leash" analogy is spot on.

We've found that AI delivers the most value when it's used to accelerate implementation, not replace engineering judgment. Clear conventions, strong testing, and good code reviews still matter just as much as before.

The productivity gains are real, but so is the risk of accumulating technical debt if teams start trusting generated code without understanding it.

Kishore K

kishorek.dev is a blog focused on software engineering, AI, backend development, scalable architectures, microservices, cloud, and modern developer workflows. Expect practical insights, production learnings, system design patterns, DevOps strategies, AI engineering content, and real-world experiences from building reliable and scalable systems. Built for developers who value thoughtful engineering over hype.