Rocket Blogs
Engineering

You already know what you're trying to figure out. Type it. Rocket handles everything after that.
Every week, businesses face the same painful realization. Their website is old and outdated. It was built on WordPress years ago, and now it's stuck in the past. The developer who made it is gone. The design looks nothing like the brand the company has become. And the cost of maintaining it keeps going up.
We asked a different question: what if you could just give us the URL?
No migration checklists. No content extraction spreadsheets. No six-week agency timelines. Just paste a URL, tell us what you want, and get a Next.js application back. Your content, your brand, your identity, none of the baggage.
This is the story of how we built it.
Website migration is one of those problems that sounds simple until you actually try to do it. The naive approach (scrape the HTML, clean it up, ship it) fails almost immediately. Real websites are messy. They have dynamically loaded content, cookie consent banners obscuring the viewport, lazy-loaded images that don't render until you scroll, JavaScript-dependent layouts that look nothing like their source HTML, and years of accumulated CSS specificity wars that make extraction a nightmare.
The existing tools in this space fell into two camps: automated scrapers that produced garbage, and agencies that charged $15,000 and took two months. Neither worked for the vast majority of businesses that just wanted their site to not look like 2016 anymore.
We wanted to build something that actually understood the website. Not just its HTML, but its design intent, its content hierarchy, its brand DNA. And could reconstruct it as a modern codebase.
Early on, we made an important distinction that shaped the entire architecture: we weren't building a website cloner. We were building a website translator.
A clone copies syntax. A translator understands semantics.
When a human designer looks at a website, they don't see <div class="hero-wrapper-v2 legacy-container">. They see a hero section with a bold headline, a supporting subheading, a call-to-action button with a specific visual weight, and an image that anchors the composition. They understand the intent behind the layout.
That's what we needed our system to do. And it turns out, the way to get there is not by parsing HTML harder. It's by looking at the page the same way a human does. Visually.
Here's something we learned early: you can't throw the same approach at every website transformation and expect great results. Migrating a site without changing a visual detail is a very different problem than reimagining it from scratch. Preserving content while overhauling the design requires a different strategy than borrowing a reference site's aesthetic for a completely different business.
Every request is different, and our system recognizes that. Based on what the user asks for, it automatically adapts: what it extracts from the source URL, what it preserves, what it reimagines, and how much creative freedom the generation layer gets. No manual configuration, no picking from a dropdown. Just describe what you want, and the system figures out the right build strategy on its own.
Theory is cheap. Here's what this actually looks like.
Berkshire Hathaway
If you've ever visited berkshirehathaway.com, you know it's the internet's most famous time capsule. Plain HTML, no styling to speak of, a layout that predates CSS itself. Warren Buffett's company is worth nearly a trillion dollars, and their website looks like a homework assignment from a 1998 intro to HTML class.
We gave our system the URL and told it to redesign it as a mobile-first, modern experience.
Before:

After:

Same information hierarchy. Completely transformed visual identity. Responsive out of the box, built in minutes, not weeks.
Freddy Wexler: WordPress to Next.js
A real WordPress site from the WordPress showcase directory. The kind of site businesses run every day.
Before:

After:

Same content, same visual identity, running on Next.js. No WordPress. No plugins. No maintenance headaches. Give us a URL, get back a codebase you actually want to work with.
Here's where it gets interesting.
Traditional web scrapers parse the DOM. They see structure. They see elements. But they don't see design. A scraper can tell you there's a <section> with a background-color: #1a1a2e, but it can't tell you that section creates a sense of depth with a dark backdrop that makes the white typography pop.
Our extraction engine perceives websites on two dimensions simultaneously, structural and visual. It reads the markup and sees what the fully rendered page looks like to an actual visitor. That dual perception is the difference between understanding code and understanding design.
This matters more than you'd think. Many modern websites are basically empty shells that JavaScript fills in after load. Static HTML tells you almost nothing about them. Our extraction engine sees the complete picture: the final rendered state, the visual hierarchy, how elements relate to each other spatially, the way color and type create emphasis. It picks up on things that source code simply can't tell you.
And the real web is messy. Cookie consent banners, lazy-loaded content, modal overlays, JavaScript-dependent layouts. There's always something standing between you and the actual website. Our extraction engine handles all of it. It knows how to get to the real page underneath, regardless of what's in the way.
One of the more interesting architectural decisions was separating the system into two distinct phases: extraction and generation.
For some of these strategies, the two phases are handled by separate specialized agents. The extraction agent focuses purely on understanding the source website and pulling out the right signals. The generation agent takes those signals and writes the code.
This separation gives us two key advantages:
First, each agent can be good at its own job. The extraction agent needs to be thorough and careful. It should never lose information. The generation agent needs to be opinionated about code quality and fluent in modern frontend patterns. These are very different skillsets, and trying to do both in a single pass produces worse results.
Second, the boundary between extraction and generation is a natural validation checkpoint. We can inspect what was extracted before generation begins, catch errors early, and ensure nothing was lost in translation.
If I had to name the single hardest engineering challenge in this system, it would be context engineering.
Each agent in the pipeline needs the right information to do its job. Not more, not less. Too much context and the model gets confused, latches onto irrelevant details, or runs out of room before it finishes. Too little and it fills in the blanks with guesses.
For the extraction agent, context means: the rendered HTML, the visual snapshot, the user's intent, the rules for what to extract, knowledge about common web patterns, and instructions for edge cases. For the generation agent, it's the extracted signals, the project structure, coding conventions, framework constraints, and the design principles behind good frontend code.
Getting this right took a lot of iteration. We built a pipeline that carefully constructs, filters, and routes context to each agent at each step. When tool results get too large, they're offloaded so they don't crowd out the information that actually matters.
The payoff is real: giving each agent exactly what it needs and nothing more produces far better results than a monolithic "do everything" approach.
Hybrid beats pure AI. Our system isn't purely LLM-driven. There are deterministic rules, validation gates, and hard constraints guiding the AI at every step. The LLM brings the intelligence; the engineering brings the guardrails. Pure AI systems drift. Hybrid systems stay on track.
The right approach for the right problem. A single pipeline for every use case produces mediocre results across the board. Dedicated approaches for each use case, even when it means more engineering work, produce far better outcomes.
The system described here is what we've built so far. We're working on multi-page transformation, deeper design intelligence (understanding not just what design decisions were made but why), and tighter iteration loops so users can refine the output conversationally until it's exactly right.
The vision is simple: any website, any state of disrepair, rebuilt as a Next.js app from a single URL. We're not there on every edge case yet, but we're closer than anyone else.
If you have a WordPress site that's been haunting you, a legacy codebase that no one wants to touch, or a design that hasn't been updated since Bootstrap 3 was cool: give us the URL. We'll handle the rest.