Demos make AI look magical. A polished video shows Claude Code building a full app in ten minutes. What those demos hide is the messy reality of real development: the bugs that emerge only in production, the edge cases that break the happy path, the integration pain with real databases and real services. This guide is a transparent, start-to-finish walkthrough of building a real web application using Claude Code as the primary development tool. Every prompt, every mistake, every fix. The goal is to show what building an app with Claude actually looks like — not the glossy version but the real one, so you know what to expect when you try the same on your own project.
The project: a recipe-sharing app
The sample project: a simple recipe-sharing app where users can post recipes, view others' recipes, search by ingredient, and save favourites. Backend in Node.js with Express, PostgreSQL database, React frontend, authentication via magic links. Deployable on any modern cloud host.
The project is deliberately realistic: it has enough features to exercise multi-file coordination, enough complexity to hit real integration points, and enough surface area that things can (and will) break in interesting ways. Building it by hand would take a solo engineer about a week of focused work. Building it with Claude Code took me about two focused days — not the "fifteen-minute demo" you see in marketing videos, but a meaningful speed-up.
Setup: CLAUDE.md and project skeleton
First step: write a good CLAUDE.md before any coding. Time spent: 30 minutes.
The CLAUDE.md includes: project description (what the app does, who the users are), tech stack (Node, Express, PostgreSQL, React, Tailwind), conventions (TypeScript strict mode, Prettier for formatting, explicit error handling, no throwing from library code), architectural principles (thin controllers, fat services, database access only through repositories), testing requirements (every service method has unit tests, each endpoint has integration tests), and deployment target (single Docker image, environment variables for configuration).
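For concreteness, here is a condensed sketch of what that document covered — abridged and paraphrased, not the project's actual CLAUDE.md:

```markdown
# Recipe-sharing app

Users post recipes, browse others' recipes, search by ingredient, and save favourites.

## Stack
Node.js + Express, PostgreSQL, React, Tailwind. TypeScript strict mode throughout.

## Conventions
- Prettier for formatting; explicit error handling; no throwing from library code.
- Thin controllers, fat services; database access only through repositories.

## Testing
- Every service method has unit tests; every endpoint has integration tests.

## Deployment
- Single Docker image; all configuration via environment variables.
```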
This document is the most valuable thing I wrote on the project. Every Claude Code session starts by reading it, and the output quality is noticeably better than sessions in projects without a good CLAUDE.md.
Next: create the project skeleton. I did this by hand — initialising npm, setting up TypeScript, creating the directory structure. It is standard boilerplate that took about 15 minutes, and having a clean skeleton in place before involving the AI avoided drift in the first agent session.
Task 1: the database schema and repository layer
First Claude Code task: design the database schema and implement the repository layer.
Briefing: "Design the database schema for a recipe-sharing app. We need tables for users (with magic-link authentication tokens), recipes (title, description, ingredients as JSON array, instructions as text, author), favourites (user-recipe link), and search-friendly text indexes. Propose a schema using PostgreSQL syntax. After I approve, implement migration files, the repository layer in TypeScript, and unit tests."
What Claude Code produced: a well-structured schema with appropriate indexes, migration files in standard format, TypeScript repository classes with typed return values, and unit tests using Vitest. Most of the output was usable as-is.
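To make the shape of that output concrete, here is an abridged sketch of the kind of schema that was approved — illustrative DDL only, not the exact migrations Claude Code generated:

```sql
-- Abridged, illustrative schema sketch (table and column names assumed).
CREATE TABLE users (
  id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  email      text NOT NULL UNIQUE,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE login_tokens (
  token      text PRIMARY KEY,          -- single-use magic-link token
  user_id    uuid NOT NULL REFERENCES users(id),
  expires_at timestamptz NOT NULL
);

CREATE TABLE recipes (
  id           uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  author_id    uuid NOT NULL REFERENCES users(id),
  title        text NOT NULL CHECK (char_length(title) <= 200),
  description  text,
  ingredients  jsonb NOT NULL,          -- JSON array of ingredient strings
  instructions text NOT NULL
);

CREATE TABLE favourites (
  user_id   uuid NOT NULL REFERENCES users(id),
  recipe_id uuid NOT NULL REFERENCES recipes(id),
  PRIMARY KEY (user_id, recipe_id)
);

-- Search-friendly index over the ingredients array
CREATE INDEX recipes_ingredients_gin ON recipes USING gin (ingredients);
```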
What needed correction: the initial schema used varchar(255) for recipe titles; I asked to change it to text with a length check constraint, preferring flexibility over an arbitrary limit. The repository methods initially threw errors; I asked for Result-type returns instead. Two small rounds of iteration.
Time: 45 minutes start-to-finish, vs probably 2-3 hours by hand. Good first task.
Task 2: authentication and magic links
Second task: implement magic-link authentication.
Briefing: "Implement magic-link authentication. When a user enters their email, we send a one-time token via email, they click the link, and they are logged in via JWT. The auth middleware should verify JWTs on protected routes. Use best practices for token expiration, CSRF protection, and rate limiting."
What Claude Code produced: a complete auth flow with endpoints for requesting magic links, verifying tokens, and issuing JWTs. Middleware for JWT verification. A decent stub for email sending (using a placeholder service). Tests covering happy paths and token expiration.
What needed correction: the initial JWT verification did not handle token revocation. I asked for an optional revocation list, using Redis. The rate limiting was naive; I asked for a token-bucket implementation with configurable per-endpoint limits. Three rounds of iteration.
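The token-bucket shape I asked for looks roughly like this — a sketch with illustrative names, not the project's implementation:

```typescript
// Minimal per-key token bucket: each key (e.g. an IP or email) gets a
// bucket that holds up to `capacity` tokens and refills continuously
// at `refillPerSecond`. A request spends one token or is rejected.
interface Bucket {
  tokens: number;
  lastRefill: number; // ms timestamp of the last refill calculation
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private capacity: number,        // max burst size
    private refillPerSecond: number  // sustained allowed rate
  ) {}

  // Returns true if the request is allowed, false if rate-limited.
  allow(key: string, now: number = Date.now()): boolean {
    const bucket =
      this.buckets.get(key) ?? { tokens: this.capacity, lastRefill: now };
    const elapsedSec = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSec * this.refillPerSecond
    );
    bucket.lastRefill = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```

The per-endpoint configurability is just a matter of constructing one limiter per route with different capacity and refill values.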
Time: 90 minutes. This was harder because auth is security-sensitive and my review needed to be thorough. Still faster than hand-writing, but not by a huge margin.
Task 3: recipe CRUD endpoints
Third task: the recipe endpoints (create, read, update, delete, search).
Briefing: "Implement the recipe endpoints. Create, read by ID, list by author, update (author-only), delete (author-only), and search by ingredient. Use the repository layer we built. Validate input with Zod. Return structured errors for validation failures. Add integration tests for each endpoint."
What Claude Code produced: five clean endpoints with Zod schemas, thin controllers delegating to services, proper HTTP status codes, and integration tests using supertest. Almost entirely usable.
What needed correction: the search endpoint initially did case-sensitive matching; I asked for case-insensitive. The pagination was offset-based; I asked for cursor-based. Two small rounds.
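The cursor-based approach can be sketched in a few lines — illustrative only, and in-memory rather than SQL, but the shape is the same: the client sends the id of the last item it saw instead of an offset:

```typescript
// Cursor pagination sketch: rows are assumed sorted by id; the cursor
// is the id of the last item the client received (null for page one).
interface Page<T> {
  items: T[];
  nextCursor: string | null; // null means no further pages
}

function paginate<T extends { id: string }>(
  rows: T[],
  cursor: string | null,
  limit: number
): Page<T> {
  // An unknown cursor falls back to the start (findIndex returns -1).
  const start =
    cursor === null ? 0 : rows.findIndex((r) => r.id === cursor) + 1;
  const items = rows.slice(start, start + limit);
  const last = items[items.length - 1];
  const hasMore = start + limit < rows.length;
  return { items, nextCursor: hasMore && last ? last.id : null };
}
```

Unlike offsets, cursors stay stable when rows are inserted or deleted between page requests, which is why I preferred them for a feed.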
Time: 60 minutes. This was the kind of structured work Claude Code excels at. Clear patterns, clear constraints, clear success criteria.
Task 4: the frontend
Fourth task: the React frontend.
Briefing: "Build the React frontend. Pages: login, home (recipe feed), create recipe, edit recipe, search, user profile. Use React Router, TanStack Query for data fetching, Tailwind for styling. The API client should be typed using types inferred from the backend Zod schemas."
What Claude Code produced: a working React app with all the pages, routing set up, data-fetching hooks, and a typed API client. Styling was functional but generic.
What needed correction: the biggest issue was that Claude Code initially created a monolithic component structure — too much logic per page. I asked for more componentisation. The design was plain; I did several rounds of design iteration to improve the visual polish. I also caught a bug where the API client was missing the auth header on some requests.
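The fix for the missing-auth-header bug was to route every request through a single helper that always attaches the token, so individual pages cannot forget it. A sketch, with illustrative names (buildHeaders, apiFetch):

```typescript
// A token source abstracts where the JWT lives (memory, storage, etc.).
type TokenSource = () => string | null;

// One place that builds headers: the Authorization header is attached
// whenever a token exists, so no call site can omit it by accident.
function buildHeaders(
  getToken: TokenSource,
  extra: Record<string, string> = {}
): Record<string, string> {
  const token = getToken();
  return {
    "Content-Type": "application/json",
    ...(token ? { Authorization: `Bearer ${token}` } : {}),
    ...extra,
  };
}

// All data-fetching hooks call this instead of fetch directly.
async function apiFetch<T>(url: string, getToken: TokenSource): Promise<T> {
  const res = await fetch(url, { headers: buildHeaders(getToken) });
  if (!res.ok) throw new Error(`API error ${res.status}`);
  return (await res.json()) as T;
}
```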
Time: 3-4 hours. Frontend work, especially design iteration, is harder for AI than backend structured work. Still faster than hand-coding but by a smaller multiple.
Task 5: deployment
Final task: deployment configuration.
Briefing: "Create a Dockerfile for production builds (backend + frontend), a docker-compose.yml for local development, and a deployment workflow for Fly.io. Include a health check endpoint."
What Claude Code produced: a multi-stage Dockerfile, a clean docker-compose config, a fly.toml with sensible defaults, and a health endpoint. Nearly everything worked on the first try; the only fix needed was a minor issue with environment-variable passing.
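The multi-stage pattern is worth showing because it is the main reason the production image stays small — a sketch with illustrative paths and script names, not the project's actual Dockerfile:

```dockerfile
# Stage 1: build both backend and frontend with dev dependencies present.
FROM node:20-slim AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build   # assumed script: compiles TS and bundles the React app

# Stage 2: ship only the compiled output and production dependencies.
FROM node:20-slim
WORKDIR /app
ENV NODE_ENV=production
COPY --from=build /app/package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/server.js"]
```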
Time: 45 minutes. Deployment configs are a strength for AI because they are highly structured and well-documented patterns.
What worked well
Reflecting on the overall experience.
Structured backend work was fast and high-quality. Repository layers, API endpoints, validation, tests — Claude Code executed these patterns cleanly with minimal iteration.
Boilerplate-heavy setup — Dockerfiles, configs, CI workflows, dependency setup — was trivially handled. Hand-writing these used to consume disproportionate time; AI removes that friction.
Test generation was reliable. For backend logic, the tests Claude Code produced were thorough, covered edge cases I would have missed, and helped me catch my own bugs when the tests surprised me.
Documentation writing was excellent. README, API docs, and inline comments were all drafted cleanly and required only light editing.
What was hard
The friction points.
Frontend design. Turning structural code into a polished UI is slow with Claude Code. The model produces functional designs but rarely great ones. Several iteration rounds were needed for acceptable polish.
Cross-cutting architectural decisions. When the project had to decide between approaches with tradeoffs (cursor vs offset pagination, sync vs async email sending), Claude Code would pick a default but the decision-making felt mechanical. Human judgment was needed to override.
Integration debugging. When things failed between components — frontend calling backend with wrong content-type, Docker network weirdness — diagnosing required more back-and-forth than it should have. Claude Code would propose fixes based on plausible causes, but the real causes sometimes took investigation.
Security review. I reviewed all auth-related code very carefully. Claude Code's auth implementation was reasonable, but trust should be earned, not granted. Being paranoid about auth saved me at least one subtle bug.
The prompting patterns that worked
Looking back, a few patterns were consistently useful.
Plan-then-execute. "Propose a plan, then wait for my approval before making changes." This was invaluable for non-trivial tasks.
Reference-driven. "Use the pattern from X file." Pointing at existing code was always faster than describing patterns in prose.
Explicit constraints. "Do not modify files outside the recipes directory." Without constraints, Claude Code sometimes reached into unrelated files.
Definition-of-done. "Ready when all tests pass and the curl command returns 201." A clear success signal made the loop close cleanly.
Short iterations. "Let's do just the GET endpoint first, then add POST after we review." Keeping changes bounded made review easy and caught drift early.
The total time: an honest accounting
Adding up actual hours. Backend work (schema, auth, CRUD, tests): ~4 hours. Frontend work: ~4 hours. Deployment config: ~1 hour. Iteration and debugging throughout: ~3 hours. Total: ~12 hours over two focused days.
By hand, this app would have taken me ~40-50 hours spread over a week. Claude Code produced roughly a 3-4x speedup on what is an intentionally realistic but simple app. For more complex apps, the speedup varies — sometimes higher (boilerplate-heavy work), sometimes lower (architecturally subtle work).
This is nowhere near the "fifteen-minute demo" speedups you see in marketing. Real work with real review takes real time. But 3-4x is still transformational. Multiply by a career's worth of projects and the impact is enormous.
The uncomfortable moments
A few moments in the project were uncomfortable in ways worth documenting, because they represent the real experience of building with AI, not the polished version.
Moment 1: the silent scope creep. In Task 3, Claude Code "helpfully" added rate limiting middleware that I had not asked for. The code was reasonable but was not in the brief. I noticed during review and asked for it to be removed. Similar drift happened twice more during the project; I learned to scan diffs for unrelated changes specifically.
Moment 2: the wrong mental model. On Task 2 (auth), Claude Code initially implemented magic-link verification using email as the primary identifier instead of email-plus-token. This would have allowed anyone who knew a user's email to log in as them. I caught it on review. The mistake came from a plausible misreading of the spec: "email verification via magic link" was interpreted as "verifying the email" rather than "verifying a token sent to that email." The vulnerability was subtle enough that a casual reviewer might have missed it, and the incident shaped how carefully I reviewed security-sensitive code for the rest of the project.
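Reduced to a sketch, the difference between the flawed and corrected checks is stark (names are illustrative, not the project's code):

```typescript
import { timingSafeEqual } from "node:crypto";

interface PendingLogin {
  email: string;
  token: string;     // random, single-use token emailed to the user
  expiresAt: number; // ms timestamp
}

// FLAWED: trusts the email alone — anyone who knows a user's email
// address can log in as them.
function verifyFlawed(pending: PendingLogin, email: string): boolean {
  return pending.email === email;
}

// FIXED: the presented token must match the stored one (compared in
// constant time) and must not be expired.
function verifyFixed(
  pending: PendingLogin,
  email: string,
  token: string,
  now: number = Date.now()
): boolean {
  if (pending.email !== email || now > pending.expiresAt) return false;
  const stored = Buffer.from(pending.token);
  const given = Buffer.from(token);
  return stored.length === given.length && timingSafeEqual(stored, given);
}
```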
Moment 3: the hallucinated API. In the frontend, Claude Code used a React Query method that does not exist — a plausible-sounding method invented from similar APIs. The TypeScript compiler caught it immediately. Without the compiler, it would have shipped and failed at runtime. Static typing remained a major safety net throughout the project.
These moments are not deal-breakers. They are the normal texture of AI-assisted development, and they are why review discipline matters.
What I would change next time
Lessons for the next project.
Spend more time on CLAUDE.md. The initial document was good, but I kept adding to it as the project went. Writing it more fully up-front would have saved some early iteration.
Use subagents more. I used Claude Code's default mode throughout. A code-reviewer subagent for each task's final review, and an explore subagent for early codebase research, would have added value.
Write the tests first more often. Test-first was faster on tasks where I did it than tasks where I did not.
Accept that frontend is harder. Do not try to drive polished UI design through prompts alone; use design tools first, then have Claude Code implement the agreed design.
Commit more often. I committed every few tasks; more frequent commits would have made reversion easier on the few sessions that went badly.
Who these lessons apply to
The lessons generalise in predictable ways.
Solo developers building CRUD apps: expect 3-5x speedup on a well-briefed project. The tasks Claude Code is strongest at — boilerplate, structured backend work, tests, deployment config — are most of what these apps need.
Startup teams shipping features: similar speedup on feature work, slightly less on the harder product decisions and design iteration.
Large enterprise teams working on existing codebases: speedups are real but variable, depending on how well the existing codebase is documented and how clear patterns are. Poorly documented legacy code is harder for AI than greenfield work.
Research and prototype work: large speedups. Prototypes are forgiving of rough edges, and Claude Code's speed at generating functional-if-imperfect code fits the prototype mindset.
Production readiness versus prototype readiness
One honest caveat about the two-day timeline: the app that resulted was a credible prototype, not production-ready software. Production readiness requires additional work that AI handles more unevenly.
Observability: structured logging, metrics, tracing, alerting. Claude Code can draft these but the tuning and integration are human work. Add a day or two.
Security hardening: penetration testing, dependency audits, secret management, real rate limiting at the network layer. Multiple days of review beyond the initial implementation.
Scaling: real load testing, database indexing tuned for actual query patterns, caching strategies for common reads, connection pooling. Often another week of work.
Operational polish: runbooks for incident response, on-call documentation, backup and disaster recovery procedures. Usually a few more days.
Getting from "working prototype" to "production-grade" adds another week or two to a project of this size — more if every item above applies. AI helps with each of those steps, but the speedup is more modest than on initial feature work. Understanding this distinction helps set realistic expectations for AI-assisted development projects.
You can genuinely ship a small production app with Claude Code in days rather than weeks — but only if you treat it like a fast teammate, not a magic wand. Review carefully, iterate in bounded steps, and respect the honest limits of what AI can do.
The short version
Building a real app with Claude Code genuinely works, but the realistic timeline is days rather than weeks — not the minutes-instead-of-hours speedup that the splashier demos suggest. The friction points are design, architectural decisions, and integration debugging. The wins are structured backend work, tests, boilerplate, and deployment. A good CLAUDE.md is consistently the single highest-leverage investment in any Claude Code project. Careful review of every diff is non-negotiable. Expect 3-4x productivity on realistic projects, not 10x. Multiply that over a year of projects, and the practice is genuinely transformational for what one developer can ship. The ceiling is not where the AI writes all the code; it is where the human uses the AI well enough to double their effective output while maintaining (or improving) quality. Teams that get there early will have a compounding advantage over teams that do not.