November 2024: the night I finally got 100% LLM-generated code working in production, I wrote a short, emotional post of raw notes. This is the fuller story and the lessons learned.
The Inflection Point
November 2024. One evening, I started working on a deployment automation tool. By 5-6 AM, I had a functional production system—100% of the code generated by an LLM.
The project: Multi-Project Deployer - orchestrates Pulumi infrastructure deployments across multiple projects with complex dependency management.
- Time to build manually: Estimated 6 months
- Time with LLM: 3 days
- Time saved per use: Weeks of manual configuration → ~1 hour automated
But here’s what made it remarkable: This wasn’t my first attempt to build something substantial with LLMs.
I’d experimented casually when ChatGPT launched in late 2022—and wasn’t impressed with the output. In early 2024, I got serious and tried twice to build real projects with LLMs.
Both failed.
Not because I didn’t know what I was doing. But because the tools weren’t ready yet.
Part 1: Two Failures (Early 2024)
Both attempts in early 2024 failed the same way. I tried different projects with earlier versions of ChatGPT and Claude, and both followed the same destructive pattern:
The Failure Loop:
- Build Feature A → ✓ Works
- Build Feature B → ✓ Works
- Feature A stops working → ✗
- Fix Feature A → ✓ Works
- Feature B breaks → ✗
- Fix Feature B → Feature A breaks again → ✗
- Dead end.
The LLMs would:
- ❌ Undo their own code changes
- ❌ Break existing functionality when adding new features
- ❌ Contradict earlier decisions
- ❌ Lose track of what we were trying to achieve
- ❌ Get stuck in loops of breaking and fixing
The root cause: Instruction following and consistency.
The LLMs couldn’t:
- Stick to constraints reliably
- Remember and respect earlier decisions
- Maintain consistency across conversation turns
- Build incrementally without drift
Why this matters: these consistency failures are what made validation guardrails critical. They taught me that hoping for perfect output is a losing strategy; systematic validation is the only reliable approach.
Part 2: November 2024 - The Tool Catches Up
The Problem That Forced My Hand
I’d rebuilt my Kubernetes cluster three times over the past year. Each time: a few weeks of configuration hell.
The problem (see the sketch after this list):
- Multiple Pulumi projects (infrastructure as code)
- Complex dependencies between projects
- Project A must deploy before Project B
- Project C depends on both A and B
- Manual tracking in notes/spreadsheets
- Error-prone, repetitive, undocumented
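To make that dependency structure concrete, here's a minimal sketch of the graph expressed as data. The names and shape are illustrative only, not the tool's actual configuration format:

```typescript
// Illustrative only: the dependency structure described above, as data.
interface ProjectSpec {
  name: string;
  dependsOn: string[]; // projects that must be deployed first
}

const projects: ProjectSpec[] = [
  { name: "project-a", dependsOn: [] },
  { name: "project-b", dependsOn: ["project-a"] },              // A before B
  { name: "project-c", dependsOn: ["project-a", "project-b"] }, // C needs both
];
```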
After the third rebuild, I was done suffering.
I decided to automate 100% of it. One command to deploy everything, respecting all dependencies.
Claude 3.5 Sonnet: The Difference
The breakthrough capability: Instruction Following
Earlier LLMs would drift. They’d lose context, break constraints, contradict themselves.
Claude 3.5 Sonnet (October 2024 upgraded version) was different.
The tooling: just the web interface at claude.ai. Copy and paste from the browser. No IDE integration. But the breakthrough wasn't the tooling; it was the model's reliability.
It actually followed instructions consistently.
For the first time, the LLM could:
- ✓ Add new features without breaking existing code
- ✓ Stick to earlier decisions across conversation turns
- ✓ Follow specific requirements reliably
- ✓ Build incrementally without corrupting the codebase
Why workflow orchestration works: the Multi-Project Deployer proved a counterintuitive principle I later wrote about. AI shouldn't orchestrate workflows; it should generate the code that does. That architectural decision is why the tool has remained maintainable.
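Here's a minimal sketch of what that principle looks like in practice, reusing the illustrative ProjectSpec shape from the earlier sketch (re-declared so the snippet stands alone). The deploy order comes from plain, deterministic graph code (Kahn's topological sort), with no model in the loop at runtime. This is not the tool's actual implementation, just the flavor of code the LLM was asked to generate:

```typescript
// Deterministic orchestration: a topological sort (Kahn's algorithm) over the
// project dependency graph decides deploy order. Illustrative, not the tool's code.
type ProjectSpec = { name: string; dependsOn: string[] };

function deployOrder(projects: ProjectSpec[]): string[] {
  const inDegree = new Map<string, number>();     // unmet dependencies per project
  const dependents = new Map<string, string[]>(); // who is waiting on each project

  for (const p of projects) {
    inDegree.set(p.name, p.dependsOn.length);
    for (const dep of p.dependsOn) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), p.name]);
    }
  }

  // Start with projects that have no dependencies at all.
  const ready = [...inDegree].filter(([, d]) => d === 0).map(([name]) => name);
  const order: string[] = [];

  while (ready.length > 0) {
    const next = ready.shift()!;
    order.push(next);
    for (const dependent of dependents.get(next) ?? []) {
      const remaining = inDegree.get(dependent)! - 1;
      inDegree.set(dependent, remaining);
      if (remaining === 0) ready.push(dependent);
    }
  }

  if (order.length !== projects.length) {
    throw new Error("Dependency cycle detected");
  }
  return order; // e.g. ["project-a", "project-b", "project-c"]
}
```

Each project in that order can then be deployed with an ordinary pulumi up; nothing about the ordering depends on an LLM at runtime.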
The Moment I Knew
The test that had failed in both early 2024 attempts: Adding Feature B without breaking Feature A.
When Claude 3.5 Sonnet successfully added dependency graph traversal logic to the existing Git integration code, and both kept working, I knew this time was different.
That was around hour 2 of the first evening.
By hour 4, I had the core functionality complete.
I couldn’t stop. The speed. The functional code. The discovery. The magic.
It felt like starting programming for the first time.
I worked from 6 PM until 5-6 AM. Not because I was debugging LLM mistakes, but because it was actually working and I was intoxicated by the pace.
Note: This was November 2024, before Claude Code CLI launched (February 2025). Today’s workflow is even more seamless—but the core breakthrough was instruction-following capability, not IDE integration.
Part 3: Production Reality (15 Months Later)
First Deployment
When I pointed the Multi-Project Deployer at my actual infrastructure:
- ✓ Worked well enough to be useful
- ✗ Had bugs and edge cases
- ✓ Refined iteratively until it did what I needed
Critical lesson: “Well enough to be useful” beats “perfect but never ships.”
I didn’t wait for perfection. I shipped when it solved the core problem, then refined based on real usage.
Real Usage: 3 Cluster Rebuilds
Over the 15 months since building the tool (November 2024 → February 2026), I’ve used it for 3 full cluster rebuilds. Each time, the ROI became more obvious.
Before Multi-Project Deployer:
- Manual process: Few weeks per rebuild
- 15+ steps to track
- Dependency management via notes/memory
- High error rate (missed dependencies, wrong order)
After Multi-Project Deployer:
- Automated process: ~1 hour per rebuild
- One command: npx multi-project-deployer up
- Zero manual dependency tracking
- Consistent, repeatable results
Unexpected benefit: Building the automation forced me to improve my overall infrastructure practices.
To automate deployment, I had to:
- Properly document dependencies
- Standardize configurations
- Eliminate manual steps I’d been tolerating
- Make everything reproducible
The tool paid for itself in the first use. Three rebuilds later, it’s saved months of cumulative time.
Technical deep-dive: For implementation details on the dependency graph logic, Git integration, and Pulumi orchestration, see Multi-Project Deployer: 100% LLM Code.
What Broke in Production
The LLM-generated code wasn’t perfect. Edge cases, race conditions, and configuration gaps surfaced in real usage.
But the fixes were iterative, not structural: each bug I reported, Claude 3.5 Sonnet debugged and resolved. The core design (dependency graphs, Git integration, Pulumi orchestration) remained solid throughout.
The validation lesson: LLM output requires validation. Not because the code was fundamentally broken, but because edge cases don't surface until production. Automated testing caught what prompting alone couldn't.
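As an example of the kind of automated check I mean, here's a minimal test sketch using Node's built-in test runner and the hypothetical deployOrder function from the earlier sketch. It is not the project's actual test suite:

```typescript
// Illustrative test: the computed deploy order must never schedule a project
// before the projects it depends on.
import { test } from "node:test";
import assert from "node:assert/strict";
import { deployOrder } from "./deploy-order"; // hypothetical module holding the earlier sketch

test("deploy order respects dependencies", () => {
  const projects = [
    { name: "project-a", dependsOn: [] as string[] },
    { name: "project-b", dependsOn: ["project-a"] },
    { name: "project-c", dependsOn: ["project-a", "project-b"] },
  ];

  const order = deployOrder(projects);

  for (const p of projects) {
    for (const dep of p.dependsOn) {
      assert.ok(
        order.indexOf(dep) < order.indexOf(p.name),
        `${dep} must deploy before ${p.name}`
      );
    }
  }
});
```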
Conclusion: Programmer^10 Is Real
But it’s not magic. It’s capable tools finally catching up.
The tools (November 2024 → now):
- Instruction following: Poor → Excellent
- Context window: Limited → Massive
- Code understanding: Basic → Deep
- Consistency: Variable → Reliable
The result:
- Built in 3 days what would take 6 months manually
- 100% LLM-generated code
- Production-ready system
- Time saved: weeks → hours
If You Failed Before, Try Again Now
My breakthrough happened in November 2024 with Claude 3.5 Sonnet (October 2024 upgrade). If you tried in early 2024 or before and failed, the tools have caught up.
Context handling, code understanding, and consistency have all improved dramatically.
What wasn’t possible a year ago is routine today.
What’s routine today will seem primitive a year from now.
The question isn’t whether LLMs can build production software. They can.
The question is: Have the tools caught up yet?
For me, November 2024 was that inflection point.
Maybe February 2026 is yours.
Appendix
The Project
Multi-Project Deployer: 100% LLM Code - The original post about the project itself.
My Raw Notes from That Night
Software Development with LLM - What I wrote immediately after the breakthrough, before I understood what had happened.
Related Posts
- Build LLM Guardrails, Not Better Prompts - Why validation is non-negotiable
- Why AI Shouldn’t Orchestrate Workflows - Architecture principles for AI-augmented development
Attribution: This post was written by Claude (Sonnet 4.5) based on ideas, guidance, and editing by Eric Gulatee. Written February 2026.
