Testing Claude Code to Build a Lightweight Tool
What if AI could help you build the plugin, tool, or script you actually need to do your work?
Let’s continue discussing what AI coding tools unlock for product designers.
In case you missed it, last week I explored v0 and looked at how it can build UI while staying in context with your design system.
This week: building your own tools.
Not design tools like Figma or Sketch, but a simple Figma plugin, an integration, or a script. Something that existing tools don't offer yet, and that's been a pain point for you.
With AI on the table, you’re no longer stuck waiting for the perfect plugin, platform, or product. You can shape something that actually fits how you work.
But do we really need to build one?
After two hours and $11 in AI API credit using Claude Code, I'm convinced it could be worth the investment. These simple tools you build might save you hours and unlock better ways of working. The return can be huge relative to the effort and cost involved.
In this experiment, I built a lightweight usability testing utility, similar to Maze or Sprig. It tracks time spent, whether the tester completed the task, if they opened a hint, and collects written feedback. It captures signals of confusion or success.
At first, I wondered: don't tools like Maze or Sprig already cover most of what I need? But building something lightweight means I only integrate the functions I actually use. I can even add or tweak features that aren't available out of the box.
More importantly, off-the-shelf tools often come with limitations, such as restricted API access. When you build your own, you get around that. Say you want to automatically summarize usability insights with AI once a certain number of testers complete the flow, then send that summary out. That kind of workflow isn't easy to plug into existing tools.
Maybe you want to get more ambitious: generate design variants with AI, send them to your tester pool, capture the data, and synthesize insights.
But let’s keep it simple for now.
I just want to share this: there’s real potential in AI to help us build what existing tools don’t offer yet.
The Stack I Used
Claude Code (running in VS Code) for AI prototyping
Next.js to handle the full-stack app (frontend + API routes)
shadcn/ui for accessible, pre-built components with clean design defaults
Notion as a simple backend database, integrated via the Notion API
Thoughts on Claude Code
I tried Claude Code this time because I ran into errors using v0 that it couldn't help me fix. I'd heard people say Claude Code is the best for AI coding, so I gave it a shot.
Claude Code seems to do much better than v0 when it comes to vibe coding¹. It feels more agentic²: it can keep going on a task, run the code in the environment, check the results in the console log, and verify things on its own. I can leave it, make my coffee or check on my kids, and come back to see the results.
I like seeing the to-do list it generates, and how it crosses things off when they're done.
Sometimes Claude Code still didn't fully understand my intention. But when I corrected it, one thing really impressed me: Claude debugged like an engineer. It added console.log statements to the code, checked the output, and reran the code to validate that the result matched the expected behavior.
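To give a flavor of what that looked like, here's the kind of temporary instrumentation it added (an illustrative reconstruction, not Claude's exact output; the endpoint and payload are hypothetical):

```ts
// Temporary debug logging of the kind Claude added while verifying the
// tracking endpoint. "/api/track" and the payload shape are hypothetical.
async function verifyTracking(sessionId: string) {
  const res = await fetch("/api/track", {
    method: "POST",
    body: JSON.stringify({ sessionId, outcome: "completed" }),
  });
  console.log("track status:", res.status); // expected: 200
  console.log("track body:", await res.json()); // expected: { ok: true }
}
```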
As you can see in the screenshot above, it also asks for confirmation before doing something, which keeps us in the loop. It gives me a moment to slow down and read the reasoning. If I don't understand something, I drop it into an LLM and ask it to explain it back to me. I don't want it to vibe code all the way through; I want to learn and understand what's going on.
Tip: Ask the LLM to explain it in plain language. That’s been helpful for me.
Another thing I appreciate is how Claude shows what percentage of the context window is left, so I know when performance might start to degrade, costs might climb, and it's time to start a new conversation. The context window is what lets Claude Code remember the entire conversation history within a single chat session, including all the files and prompts you've given it.
Other than that, a small delightful touch for me is the changing copy that shows what Claude is currently doing.
My workflow for creating a usability testing tool
When working with AI coding tools, I follow two key principles:
Be clear on the intention. Before jumping into code, I start with a separate brainstorming chat in an LLM to break things down step by step: what to prompt, how to structure it, and what order to run it in. I’m not a fan of single-shot prompts that try to do everything at once. I’d rather approach things part by part. Treat this early step like writing a mini PRD for yourself.
Focus on getting it to function first, even if it's just the minimal version. If your goal is to track certain metrics by connecting to Notion as a database, start with one metric and make sure that connection works. Once that's solid, you can build on it with more details or additional metrics in the next prompt.
I also thought about the tool conceptually as having three parts: the prototype you want to test, a function to capture metrics, and a dashboard to display the results.
I tackled them one by one, starting with a working prototype: a flow for testers to try out. I added an intro page and a thank you page to complete the experience.
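In code terms, that flow is little more than a step state. Here's a minimal sketch (the component and step names are mine, not the exact boilerplate):

```tsx
// Minimal sketch of the tester flow: intro → task → thank-you.
// Names are illustrative; the real "task" step renders the prototype.
"use client";
import { useState } from "react";

type Step = "intro" | "task" | "thanks";

export default function TestFlow() {
  const [step, setStep] = useState<Step>("intro");

  if (step === "intro")
    return <button onClick={() => setStep("task")}>Start the test</button>;
  if (step === "task")
    return <button onClick={() => setStep("thanks")}>I finished the task</button>;
  return <p>Thanks for participating!</p>;
}
```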
Then I wrote a prompt to capture metrics and record them in Notion.
I connected to the Notion API and focused on capturing one core metric: did the user complete the task, fail, or abandon it?
I use Notion because I already use it a lot, and with its API I don't have to set up a heavy backend.
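Here's a minimal sketch of that first step: a Next.js API route that writes one session record to Notion. The property names ("Session", "Outcome") and environment variable names are assumptions; match them to your own database schema:

```ts
// app/api/track/route.ts: minimal sketch of recording one metric.
// Assumes a Notion database with a "Session" title property and an
// "Outcome" select property; names and env vars are placeholders.
import { Client } from "@notionhq/client";

const notion = new Client({ auth: process.env.NOTION_API_KEY });

export async function POST(req: Request) {
  // outcome: "completed" | "failed" | "abandoned"
  const { sessionId, outcome } = await req.json();

  await notion.pages.create({
    parent: { database_id: process.env.NOTION_DATABASE_ID! },
    properties: {
      Session: { title: [{ text: { content: sessionId } }] },
      Outcome: { select: { name: outcome } },
    },
  });

  return Response.json({ ok: true });
}
```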
Once that was working and data was successfully recorded into Notion, I added the rest of the metrics (see the sketch after this list):
Task Success: Indicates if the user completed the task.
Time on Task: Duration taken to complete the task.
Hint Clicks: Number of times hints were clicked.
Step Views: Number of step-by-step instructions viewed.
Start Time / End Time: Timestamps for when the session began and ended.
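Here's a sketch of how those metrics might map onto Notion property types (the property names mirror the list above; the exact schema is an assumption, not the boilerplate's exact code):

```ts
// Sketch: mapping session metrics onto Notion property types.
interface SessionMetrics {
  taskSuccess: boolean; // Task Success → checkbox
  timeOnTask: number;   // Time on Task → number (seconds)
  hintClicks: number;   // Hint Clicks → number
  stepViews: number;    // Step Views → number
  startTime: string;    // Start Time → date (ISO 8601)
  endTime: string;      // End Time → date (ISO 8601)
}

function toNotionProperties(m: SessionMetrics) {
  return {
    "Task Success": { checkbox: m.taskSuccess },
    "Time on Task": { number: m.timeOnTask },
    "Hint Clicks": { number: m.hintClicks },
    "Step Views": { number: m.stepViews },
    "Start Time": { date: { start: m.startTime } },
    "End Time": { date: { start: m.endTime } },
  };
}
```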
Finally, I built a metrics dashboard that pulls this data from Notion and displays the metrics I want to track.
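The dashboard is the mirror image: query the database and reduce the pages into the aggregates you care about. A minimal sketch, assuming the same schema as the earlier sketches:

```ts
// Sketch: fetch sessions from Notion and compute a completion rate.
// Assumes the "Task Success" checkbox property from the sketch above.
import { Client } from "@notionhq/client";

const notion = new Client({ auth: process.env.NOTION_API_KEY });

export async function getDashboardStats() {
  const res = await notion.databases.query({
    database_id: process.env.NOTION_DATABASE_ID!,
  });

  const pages = res.results as any[]; // loose types keep the sketch short
  const completed = pages.filter(
    (p) => p.properties["Task Success"]?.checkbox === true
  ).length;

  return {
    totalSessions: pages.length,
    completionRate: pages.length ? completed / pages.length : 0,
  };
}
```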
Here's the boilerplate if you'd like to copy my project: GitHub
Here’s the working prototype if you’d like to test it: Prototype | Metrics Dashboard
So, what do you want to build today?
Until next time,
Thomas
Footnotes:
1. Vibe coding is an AI-assisted approach to building software where you prompt what you want, and the AI writes the code. You guide with feedback rather than managing every line. The term was popularized in 2025 by ex-OpenAI researcher Andrej Karpathy, who described it as "forgetting that the code even exists." Wikipedia
2. Agentic describes AI tools that can plan and carry out multi-step tasks on their own, checking their work as they go, rather than waiting for a prompt at every step.