November 2024: the night I finally got 100% LLM-generated code working in production, I wrote a short, emotional post of raw notes. This is the fuller story and the lessons learned.
The Inflection Point
November 2024. One evening, I started working on a deployment automation tool. By 5-6 AM, I had a functional production system—100% of the code generated by an LLM.
The project: Multi-Project Deployer - orchestrates Pulumi infrastructure deployments across multiple projects with complex dependency management.
- Time to build manually: Estimated 6 months
- Time with LLM: 3 days
- Time saved per use: Weeks of manual configuration → ~1 hour automated
But here’s what made it remarkable: This wasn’t my first attempt to build something substantial with LLMs.
I’d experimented casually when ChatGPT launched in late 2022—and wasn’t impressed with the output. In early 2024, I got serious and tried twice to build real projects with LLMs.
Both failed.
Not because I didn’t know what I was doing. But because the tools weren’t ready yet.
Part 1: Two Failures (Early 2024)
Both attempts in early 2024 failed the same way. I tried different projects with earlier versions of ChatGPT and Claude, and both followed the same destructive pattern:
The Failure Loop:
- Build Feature A → ✓ Works
- Build Feature B → ✓ Works
- Feature A stops working → ✗
- Fix Feature A → ✓ Works
- Feature B breaks → ✗
- Fix Feature B → Feature A breaks again → ✗
- Dead end.
The LLMs would:
- ❌ Undo their own code changes
- ❌ Break existing functionality when adding new features
- ❌ Contradict earlier decisions
- ❌ Lose track of what we were trying to achieve
- ❌ Get stuck in loops of breaking and fixing
The root cause: Instruction following and consistency.
The LLMs couldn’t:
- Stick to constraints reliably
- Remember and respect earlier decisions
- Maintain consistency across conversation turns
- Build incrementally without drift
Why this matters: these consistency failures are what made validation guardrails critical. They taught me that hoping for perfect output is a losing strategy; systematic validation is the only reliable approach.
Part 2: November 2024 - The Tool Catches Up
The Problem That Forced My Hand
I’d rebuilt my Kubernetes cluster three times over the past year. Each time: a few weeks of configuration hell.
The problem (see the sketch after this list):
- Multiple Pulumi projects (infrastructure as code)
- Complex dependencies between projects
- Project A must deploy before Project B
- Project C depends on both A and B
- Manual tracking in notes/spreadsheets
- Error-prone, repetitive, undocumented
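To make that dependency structure concrete, here's a minimal sketch of the graph expressed as data. The names and shape are illustrative only, not the tool's actual configuration format:

```typescript
// Illustrative only: the dependency structure described above, as data.
interface ProjectSpec {
  name: string;
  dependsOn: string[]; // projects that must be deployed first
}

const projects: ProjectSpec[] = [
  { name: "project-a", dependsOn: [] },
  { name: "project-b", dependsOn: ["project-a"] },              // A before B
  { name: "project-c", dependsOn: ["project-a", "project-b"] }, // C needs both
];
```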
After the third rebuild, I was done suffering.
I decided to automate 100% of it. One command to deploy everything, respecting all dependencies.
Claude 3.5 Sonnet: The Difference
The breakthrough capability: Instruction Following
Earlier LLMs would drift. They’d lose context, break constraints, contradict themselves.
Claude 3.5 Sonnet (October 2024 upgraded version) was different.
The tooling: just the web interface at claude.ai. Copy and paste from the browser. No IDE integration. But the breakthrough wasn't the tooling; it was the model's reliability.
It actually followed instructions consistently.
For the first time, the LLM could:
- ✓ Add new features without breaking existing code
- ✓ Stick to earlier decisions across conversation turns
- ✓ Follow specific requirements reliably
- ✓ Build incrementally without corrupting the codebase
Why workflow orchestration works: the Multi-Project Deployer proved a counterintuitive principle I later wrote about. AI shouldn't orchestrate workflows; it should generate the code that does. That architectural decision is why the tool has remained maintainable.
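Here's a minimal sketch of what that principle looks like in practice, reusing the illustrative ProjectSpec shape from the earlier sketch (re-declared so the snippet stands alone). The deploy order comes from plain, deterministic graph code (Kahn's topological sort), with no model in the loop at runtime. This is not the tool's actual implementation, just the flavor of code the LLM was asked to generate:

```typescript
// Deterministic orchestration: a topological sort (Kahn's algorithm) over the
// project dependency graph decides deploy order. Illustrative, not the tool's code.
type ProjectSpec = { name: string; dependsOn: string[] };

function deployOrder(projects: ProjectSpec[]): string[] {
  const inDegree = new Map<string, number>();     // unmet dependencies per project
  const dependents = new Map<string, string[]>(); // who is waiting on each project

  for (const p of projects) {
    inDegree.set(p.name, p.dependsOn.length);
    for (const dep of p.dependsOn) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), p.name]);
    }
  }

  // Start with projects that have no dependencies at all.
  const ready = [...inDegree].filter(([, d]) => d === 0).map(([name]) => name);
  const order: string[] = [];

  while (ready.length > 0) {
    const next = ready.shift()!;
    order.push(next);
    for (const dependent of dependents.get(next) ?? []) {
      const remaining = inDegree.get(dependent)! - 1;
      inDegree.set(dependent, remaining);
      if (remaining === 0) ready.push(dependent);
    }
  }

  if (order.length !== projects.length) {
    throw new Error("Dependency cycle detected");
  }
  return order; // e.g. ["project-a", "project-b", "project-c"]
}
```

Each project in that order can then be deployed with an ordinary pulumi up; nothing about the ordering depends on an LLM at runtime.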
The Moment I Knew
The test that had failed in both early 2024 attempts: Adding Feature B without breaking Feature A.
When Claude 3.5 Sonnet successfully added dependency graph traversal logic to the existing Git integration code, and both kept working, I knew this time was different.
That was around hour 2 of the first evening.
By hour 4, I had the core functionality complete.
I couldn’t stop. The speed. The functional code. The discovery. The magic.
It felt like starting programming for the first time.
I worked from 6 PM until 5-6 AM. Not because I was debugging LLM mistakes, but because it was actually working and I was intoxicated by the pace.
Note: This was November 2024, before Claude Code CLI launched (February 2025). Today’s workflow is even more seamless—but the core breakthrough was instruction-following capability, not IDE integration.
Part 3: Production Reality (15 Months Later)
First Deployment
When I pointed the Multi-Project Deployer at my actual infrastructure:
- ✓ Worked well enough to be useful
- ✗ Had bugs and edge cases
- ✓ Refined iteratively until it did what I needed
Critical lesson: “Well enough to be useful” beats “perfect but never ships.”
I didn’t wait for perfection. I shipped when it solved the core problem, then refined based on real usage.
Real Usage: 3 Cluster Rebuilds
Over the 15 months since building the tool (November 2024 → February 2026), I’ve used it for 3 full cluster rebuilds. Each time, the ROI became more obvious.
Before Multi-Project Deployer:
- Manual process: Few weeks per rebuild
- 15+ steps to track
- Dependency management via notes/memory
- High error rate (missed dependencies, wrong order)
After Multi-Project Deployer:
- Automated process: ~1 hour per rebuild
- One command: npx multi-project-deployer up
- Zero manual dependency tracking
- Consistent, repeatable results
Unexpected benefit: Building the automation forced me to improve my overall infrastructure practices.
To automate deployment, I had to:
- Properly document dependencies
- Standardize configurations
- Eliminate manual steps I’d been tolerating
- Make everything reproducible
The tool paid for itself in the first use. Three rebuilds later, it’s saved months of cumulative time.
Technical deep-dive: For implementation details on the dependency graph logic, Git integration, and Pulumi orchestration, see Multi-Project Deployer: 100% LLM Code.
What Broke in Production
The LLM-generated code wasn’t perfect. Edge cases, race conditions, and configuration gaps surfaced in real usage.
But the fixes were iterative, not structural: each bug I reported, Claude 3.5 Sonnet debugged and resolved. The core design (dependency graphs, Git integration, Pulumi orchestration) remained solid throughout.
The validation lesson: LLM output requires validation. Not because the code was fundamentally broken, but because edge cases don't surface until production. Automated testing caught what prompting alone couldn't.
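As an example of the kind of automated check I mean, here's a minimal test sketch using Node's built-in test runner and the hypothetical deployOrder function from the earlier sketch. It is not the project's actual test suite:

```typescript
// Illustrative test: the computed deploy order must never schedule a project
// before the projects it depends on.
import { test } from "node:test";
import assert from "node:assert/strict";
import { deployOrder } from "./deploy-order"; // hypothetical module holding the earlier sketch

test("deploy order respects dependencies", () => {
  const projects = [
    { name: "project-a", dependsOn: [] as string[] },
    { name: "project-b", dependsOn: ["project-a"] },
    { name: "project-c", dependsOn: ["project-a", "project-b"] },
  ];

  const order = deployOrder(projects);

  for (const p of projects) {
    for (const dep of p.dependsOn) {
      assert.ok(
        order.indexOf(dep) < order.indexOf(p.name),
        `${dep} must deploy before ${p.name}`
      );
    }
  }
});
```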
Conclusion: Programmer^10 Is Real
But it’s not magic. It’s capable tools finally catching up.
The tools (November 2024 → now):
- Instruction following: Poor → Excellent
- Context window: Limited → Massive
- Code understanding: Basic → Deep
- Consistency: Variable → Reliable
The result:
- Built in 3 days what would take 6 months manually
- 100% LLM-generated code
- Production-ready system
- Time saved: weeks → hours
If You Failed Before, Try Again Now
My breakthrough happened in November 2024 with Claude 3.5 Sonnet (October 2024 upgrade). If you tried in early 2024 or before and failed, the tools have caught up.
Context handling, code understanding, and consistency have all improved dramatically.
What wasn’t possible a year ago is routine today.
What’s routine today will seem primitive a year from now.
The question isn’t whether LLMs can build production software. They can.
The question is: Have the tools caught up yet?
For me, November 2024 was that inflection point.
Maybe February 2026 is yours.
Appendix
The Project
Multi-Project Deployer: 100% LLM Code - The original post about the project itself.
My Raw Notes from That Night
Software Development with LLM - What I wrote immediately after the breakthrough, before I understood what had happened.
Related Posts
- Build LLM Guardrails, Not Better Prompts - Why validation is non-negotiable
- Why AI Shouldn’t Orchestrate Workflows - Architecture principles for AI-augmented development
Attribution: This post was written by Claude (Sonnet 4.5) based on ideas, guidance, and editing by Eric Gulatee. Written February 2026.
