Vibe Coding Showdown (2025): Base44 vs. Replit vs. v0 vs. Bolt vs. Lovable vs. Rork

October 30, 2025

Finnian Brown

TLDR: I gave six “vibe coding” tools the same job: build a modern Tumblr-style social app with auth, posts, media uploads, likes, comments, tags, and a dashboard feed.

A lot of prompt-to-app tools demo well. The question is: can they build apps that actually work? For most of them, the answer was no.

Here's the video version:

Scorecard

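Tool | Auth | Posts | Images | Comments | Likes | DB | Frontend
Base44 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Good
Replit | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | Okay
Bolt | ✅ | ❌ | ⚠️ |  |  | ❌ | Okay
Lovable | ✅ | ✅ | ⚠️ |  | ✅ |  | Weak
Vercel v0 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | Weak
Rork | ⚠️ | ⚠️ | ❌ |  | ⚠️ | ❌ | Pretty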

I did all this testing on October 27th, 2025.

The Problem with AI Agents and "Vibe Coding"

The basic problem with all of these highly automated prompt-to-app tools is that more than 90% of the time, the app just doesn't work.

About the only thing I've found these tools can reliably do is make blogs or front-end-only sites.

If I look at my history on Replit or Bolt or Lovable, I see a list of projects that failed to get off the ground.

This is a huge bummer, because one of the promises of the AI age is that it's supposed to be the golden age of the idea guy.

Here's a small sample of projects I spent serious time and money trying to get working on Replit and Lovable that are currently pet rocks:

  • A physically accurate tide simulator showing Earth and Moon positions

  • A token on Solana called Zhom

  • A 2D missile defense simulation game

  • A bank backend in COBOL

  • A Hacker News clone

  • A Factorio blueprint editor

  • A site tracking if Highway 1 is open (this kinda worked)

  • A web version of Verdy du Vernois, 1876 — Beitrag zum Kriegsspiel

But before tackling any of those, I think the rite of passage for any competent coding agent is building a functional social media site. I could have picked Reddit, Hacker News, Instagram, or TikTok, but I think cloning Tumblr is as good as any of those, and if it works, I can imagine having more fun with it.

The Test

I asked each tool to "build Tumblr". The features are well known. You need to be able to sign up. You need to be able to post text, images, quotes, etc. You need to be able to follow different people's blogs. You need to have likes, reblogs, and a comment system.

Here's the prompt I used:

Build a comprehensive web application that meticulously replicates Tumblr's entire functionality and user experience, aiming for a modern 2025, view-for-view clone. This should include core features such as user account creation and management, posting capabilities with multimedia support (images, videos, text), following and interaction systems (likes, reblogs, comments), notifications, and dashboard management. Implement the necessary frontend components, backend architecture, database schemas, and APIs required to achieve feature parity with Tumblr. Additionally, ensure responsive design, scalability, and security best practices so the clone is robust and user-friendly across multiple devices and user loads. Design should be mobile first but responsive and above all simple and minimal.

I used Aqua Voice, which you can try for free, to dictate the prompts and the follow-ups. I highly recommend the "voice prompting" workflow to anyone who hasn't tried it yet.

Tools used: Base44, Replit (Agent v3), Bolt, Lovable, Vercel v0, and Rork.

You can check out the published versions of what each tool produced.

I'm a big fan of Simon Willison's blog and his now-famous test of asking models to create an SVG of a pelican riding a bicycle.

The pelican test isn't a comprehensive benchmark, but it's surprisingly high signal for how simple it is.

I'm hoping "clone Tumblr" can become something like that for zero-to-one coding tools.

Here are the results.

Replit (Agent v3)

Replit Agent v3 is very flexible. It takes a while to run (18 minutes in my case), but it was second only to Base44 in out-of-the-box functionality. When it was done, I had a functioning app with Replit auth, real file uploads that worked on the first try, text posts, and likes.

My two gripes were:

  1. the design kind of sucked

  2. the agent didn't implement comments and just put up a "coming soon..." placeholder

Tumblr clone built with Replit showing an Aqua Voice stats card.

Verdict: If you want control over your stack and are fine waiting a bit longer, Replit is a solid choice, and probably the right one if a TypeScript/React/Node stack isn't what you want.

Bolt

Bolt gave me a login screen and got me signed in, but the minute I tried to post I got a database error. The design wasn’t anything too special, but the deal-breaker was backend reliability.

Worse, when I pushed for images, the tool defaulted to a “paste an image URL” pattern, which is a cheat. Tumblr doesn’t ask for a link to an image that already lives somewhere else. With Bolt you have to hack around things like that, and it's very hard to go from 0 to 1. In fairness, I could probably fix the DB errors with ten more minutes of prompting, but the whole point of this test was out-of-the-box performance. Bolt failed that.

Tumblr clone built with Bolt.

Verdict: So-so frontend, but annoying backend errors.

Base44

I'd never used Base44 before, but it worked better out of the box than any of the other tools. It had functional auth, text posts, image uploads, likes, and comments. The core stuff was all there.

My only gripe is that it built me something a bit closer to Twitter or Instagram than Tumblr, but there's a lot of overlap, and the design was functional if a little uninspired.

Base44 seems to give the model backend primitives, which avoids having to set up things like a Postgres database schema from scratch. That bet seems to have paid off: while it might make scaling out more difficult and does mean some platform lock-in, in my mind the tradeoff is more than worth it for functional code.

Tumblr clone built with Base44.

Verdict: Clear winner, most functional out of the box.

Vercel v0

I tried v0 back when it launched as a front-end-only tool. It has since become a full prompt-to-app platform with Supabase integrations and all that. The interface was very pretty, but the code didn't work at all.

First, a migration script wouldn't run. After clicking the fix-it button a few times, I made it to a signup screen that was broken, and the model couldn't get it working after three tries.

Tumblr clone built with Vercel v0 showing an error.

Verdict: Great vibe coding UI, but the app didn't work at all. Pet rock.

Lovable

I went into the test expecting Lovable to win. I enabled their cloud backend, which is supposed to make backend tasks less brittle. Auth worked out of the box, and so did text posts and likes. When it came to uploading images, it wanted a URL instead of an image file, which was disappointing given that image uploading is kind of the core feature.

The UI was also pretty terrible, which was surprising. For previous tasks, I'd found Lovable pretty good at frontend.

Tumblr clone built with Lovable.

Verdict: Maybe a bit better than Bolt, but only marginally. Most things didn't work.

Rork

Rork made a strong first impression with a clean UI, both in its own vibe coding interface and in the Tumblr clone. It had by far the best frontend, but the backend was a bit of a dumpster fire.

After enabling "backend", I still couldn't get a persistent account system, persistent likes, working posts, or image uploads.

Tumblr clone built with Rork.

Verdict: Best frontend by far, but it stayed non-functional. We call this Boomer Mode (treating a non-functional mockup of an app as the app itself).

Awards

Best Overall: Base44 (most working stuff out of the box)

Runner Up: Replit Agent (lame design but pretty good functionality)

Best Frontend: Rork (great looking UI, but couldn't get it hooked up to anything)

Why is this on the Aqua Voice Blog?

Aqua's mission is to make voice a first-class input method for computers. The more things that can be done with text prompts, the more powerful voice is as a tool. We've put a ton of time and resources into optimizing the Aqua clients for Mac and Windows to be fast, contextually aware, easy to use, and great for technical speech.

The reason we got into the model-training game was to get better at the technical terms other models were ignoring. Avalon is the only ASR model on the planet that can transcribe Supabase correctly.

But for voice to be a "dream interface," the AI agents have to hold up their end of the bargain. If we're honest, many of them do not. As Karpathy pointed out in his recent appearance on the Dwarkesh Podcast, many agent implementations are way over their skis, promising a lot more than they can deliver.

We want Aqua users, most of whom are AI early adopters, to use tools that actually work, so they can go from speech → prompt → working application with minimal AI babysitting.

Yes, I actually tested all of these and wrote this myself.