Rembrandt's Workshop
Multi-Agentic Software Development
It is Amsterdam, 1642, a center of industry and culture where newly wealthy merchants are finding themselves in desperate need of something to put on their walls. To that end, we find ourselves in Rembrandt’s workshop. The master stands before an enormous canvas which will become The Night Watch. Around him, a dozen apprentices grind pigments, prepare panels, and work on background elements. Some are skilled journeymen who can paint entire sections from descriptions provided by Rembrandt. Others are newer students, capable, but requiring close supervision. Rembrandt himself focuses on the composition, the dramatic lighting, the faces that will give the work its power.
Three hundred and eighty-three years later, I find myself running a remarkably similar operation. The difference is that my apprentices are made of math instead of meat, they work at the speed of electricity, and I can spin up a dozen of them for the cost of a decent lunch.
This is the best mental model I’ve found for working with the strange new angels of multi-agentic AI in 2025. Like all the best analogies, it’s not perfect, and the places where it breaks down are deep and illuminating in their own right. But I keep returning to it because it captures something essential about the actual workflow that’s crystallizing as tools like Claude Code, Cursor 2.0, and GitHub Code Quality reshape how we build software. We’re not managing perfectly autonomous agents that will replace us. We’re not pair programming with a single AI assistant. We’re running workshops where we maintain architectural vision while orchestrating multiple specialized assistants, each handling different aspects of the work.
The Workshop Floor Is Getting Crowded
In my last post, I talked about how agentic AI shattered the iron economics of software development—how code production went from $150/hour to practically free. What I didn’t fully explore was what happens next: not one developer working with one AI assistant, but one developer orchestrating an entire team of specialized agents, each bringing different capabilities to bear on different aspects of the work.
The last few months have seen a cluster of releases that suggest the industry is converging on this same insight:
GitHub Code Quality just entered public preview. It offers in-context code quality findings generated by specialized agents, available directly in pull requests, including one-click AI fixes. This is GitHub/Microsoft acknowledging reality: AI assistants are going to be generating mountains of code, and we need better tooling to review and improve what they produce. The bottleneck has shifted from generation to verification.
Cursor 2.0 launched with a variety of new features designed specifically for multi-agent interaction. They’ve got a new interface to let different agents work on the same problem simultaneously and pick the best result, handling the details using either git worktrees or remote sandboxes. They’ve reworked their diff interface specifically around the idea that the user will be spending a lot of time reviewing agentic offerings. Even little details like native voice-to-text show where the future is going. A year ago the idea of using voice to run your IDE would have been a punchline: voice was nowhere near responsive and precise enough for development work. Nowadays voice is perfect for firing off small course corrections to agents, something I do dozens of times an hour.
My own experiments with Bad Dave’s Robot Army—a collection of 34+ specialized agents and slash commands for Claude Code—are exploring similar territory. I’ve got agents for security reviews, performance analysis, architecture audits, code history analysis, even one that approaches your codebase with “beginner’s mind” to spot assumptions you’ve stopped questioning. Each one is a specialist, focused on doing one thing extremely well. They are knitted together with tools for managing and documenting the development process into something coherent and replicable (something that human dev orgs often struggle with).
The pattern is unmistakable: the future of programming isn’t one human working with one AI assistant. It’s one human orchestrating multiple specialized agents, each handling different aspects of the work. We’re not just coding with AI anymore. We’re conducting it, and we’re starting to build interfaces that reflect that reality.
Lessons from the Master’s Workshop
In a 17th-century atelier, the master painter had to solve several problems:
Maintaining Coherent Vision: When ten people contribute to one painting, it still needs to look like a unified work. Rembrandt couldn’t have one apprentice painting in a realistic style while another worked in abstract expressionism. Similarly, when multiple AI agents contribute to a codebase, you need architectural consistency, naming conventions, and design patterns that create coherence.
Calibrating Delegation: Not every task required the master’s touch. Rembrandt painted the faces and hands—the emotionally critical elements that required artistic genius. Apprentices handled backgrounds, drapery, and preparatory work. The art was knowing what to delegate and what to do yourself. With AI agents, this becomes: which tasks are routine enough to fully delegate (boilerplate, test generation, refactoring, small-to-medium extensions of existing parts of the codebase) and which require human architectural judgment (security-critical code, complex algorithmic work, integration decisions, etc.).
Iterative Refinement: Work flowed back and forth. Apprentices would complete sections, Rembrandt would review and overpaint, they’d refine further. This wasn’t one-shot delegation—it was collaborative iteration. Current AI coding tools work the same way: you give direction, the agent generates code, you review and correct, it refines. The workflow is inherently conversational.
Quality Control Bottleneck: Rembrandt himself was the limiting factor. He could only review so much work per day. His apprentices could produce volume, but Rembrandt’s review bandwidth determined the workshop’s throughput. This maps directly to the current bottleneck with AI coding: agents can generate code faster than anyone can meaningfully review it. Your verification bandwidth determines system throughput, not the agent’s generation speed.
Conway’s Law Rules Everything Around Me
This review bottleneck connects to something deeper, something that’s been haunting software architecture since 1967: Conway’s Law.
Melvin Conway observed that “organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” Put more directly: your architecture mirrors your org chart. Always has, always will. If you’re building a compiler using four separate teams, you will end up producing a four-phase compiler. When I first heard of Conway’s Law, I assumed it was a joke, albeit one that, like all great jokes, contained a large element of truth.
The reason for this correspondence between org structure and software architecture isn’t mystical: it’s just bandwidth. When two teams can’t communicate frequently, they’re forced to minimize dependencies between their systems. Can’t have daily syncs between Boston and Bangalore? Better make sure those teams aren’t working on tightly coupled components. The organizational structure dictates the technical architecture because communication costs fundamentally shape what’s feasible to build.
With human teams, cross-team communication is expensive. Meetings cost calendar time. Timezone differences add latency. Context switching destroys focus. Distributed memory means things get forgotten or miscommunicated. All of these forces create natural pressure toward modular architectures with clean interfaces and minimal coupling. It also explains why small teams of elite creators living in Mom’s garage and eating ramen can produce incredibly powerful and cohesive inventions like Google, while large organizations can struggle to get incremental extensions to existing products out the door in less than a year.
AI agents theoretically blow this up. Want to give three agents perfect context about your entire codebase? Paste it into their context windows. Want agents to coordinate? They can “communicate” instantaneously, passing information at electronic speed with perfect fidelity. The bandwidth constraints that forced human teams toward modularity just don’t exist the same way.
But here’s where reality reasserts itself: you are still human. Your review bandwidth remains stubbornly biological.
This creates a fascinating inversion. Agent-to-agent bandwidth is effectively infinite. Human-to-agent bandwidth is painfully finite. So you end up with these competing pressures:
Agents could build tightly coupled systems efficiently (because they can coordinate perfectly)
But you need them to build modular, reviewable systems (because you’re the bottleneck)
The solution emerging from practice looks a lot like Rembrandt’s workshop: establish clear architectural constraints upfront (the “Rembrandt style” that ensures coherence), delegate within those constraints, and invest heavily in making review as efficient as possible.
A Day in the Workshop
Let me show you what this actually looks like in practice. This is a composite sketch from recent work, but it’s representative of how I’m building software now:
The day starts with review, not creation. I run /arch-review and /security-review commands from my Robot Army to audit yesterday’s changes. Specialized agents generate reports highlighting architectural drift, security concerns, and code smells. I’m not reading every line of code—I’m reading expert analysis from agents that know what to look for. This takes maybe twenty minutes and would have taken hours of careful manual review before.
When I start work on a new feature, I begin with /create-prd to generate a product requirements document from rough notes and mental models. The agent drafts something coherent. I review, suggest modifications, iterate a few rounds until the PRD actually captures what I want to build. Then /implementation-plan-from-prd breaks it down technically. The agent proposes implementation approaches, identifies dependencies, suggests testing strategies. We iterate on this too. The PRD and plan become architectural constraints, the “Rembrandt style” that will guide the execution.
Here’s where it gets interesting: I use /tasks-from-plan to decompose the implementation into discrete, parallelizable tasks. Then I spin up multiple agents simultaneously, each working on different pieces in separate git worktrees so they don’t collide. One handles database migrations. Another builds the API layer. A third writes tests. A fourth generates documentation. They’re all working from the same architectural spec, but executing independently.
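Mechanically, the fan-out is less exotic than it sounds. Here’s a minimal sketch of its shape in Python, assuming the Claude Code CLI’s headless mode (`claude -p`) and a shared ARCHITECTURE.md spec; the task names and branch scheme are illustrative placeholders, not my actual Robot Army plumbing:

```python
"""
A fan-out sketch: one git worktree per task, one agent per worktree.
Assumes the Claude Code CLI's headless mode (`claude -p "<prompt>"`);
the task list, branch names, and ARCHITECTURE.md file are placeholders.
"""
import subprocess
from pathlib import Path

TASKS = {
    "db-migrations": "Implement the schema changes described in tasks/db.md",
    "api-layer": "Build the endpoints described in tasks/api.md",
    "test-suite": "Write tests for the behaviors described in tasks/tests.md",
}

agents = []
for name, prompt in TASKS.items():
    worktree = Path("../worktrees") / name
    # Each task gets its own branch and working directory, so agents never
    # edit the same checkout and merges happen on my terms later.
    subprocess.run(
        ["git", "worktree", "add", "-b", f"agent/{name}", str(worktree)],
        check=True,
    )
    # Launch the agent in its worktree; every prompt points at the same spec.
    agents.append(subprocess.Popen(
        ["claude", "-p", f"Follow ARCHITECTURE.md exactly. {prompt}"],
        cwd=worktree,
    ))

# The agents run in parallel; review and integration remain human work.
for agent in agents:
    agent.wait()
```

A real setup would want logging, per-task context files, and cleanup of stale worktrees, but the core pattern is just this: isolate, parallelize, then review.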
Review time is where I earn my keep. This is where I’m most like Rembrandt examining his apprentices’ work. I use IntelliJ’s diff window more than I use its edit window nowadays. For routine stuff—formatting inconsistencies, simple bugs, missed edge cases—I just tell Claude to fix it. For anything touching core architecture, security boundaries, or complex business logic, I review carefully and often suggest rewrites. Those rewrite suggestions become codified tasks themselves (GitHub Issues are excellent for this) so that everything becomes traceable and transparent.
The hardest part is integration. Just as the collective noun for a group of owls is a “parliament”, the collective noun for a group of software developers is a “merge conflict”, and modern AI is making this problem worse. The separate agent outputs need to be merged into a coherent whole, and this is unavoidably human work. Making sure the database schema changes align perfectly with the API design. Ensuring error handling is consistent across layers (I’ve got an agent that makes recommendations on nothing but that). Verifying that the pieces actually fit together in ways that make sense. The agents can’t do this because they weren’t coordinating with each other—they were each working from specifications I provided, and those specifications inevitably have gaps.
What’s emerged is a workflow that maximizes parallel execution while keeping me focused on the high-leverage work: architecture, integration, and verification. The agents handle volume. I handle judgment.
Where the Angels Soar and Where They Stumble
After several months of intensive work with these tools, I’ve developed a pretty good sense of what works and what doesn’t. Current multi-agentic programming excels at certain tasks while face-planting on others.
The wins are substantial. Boilerplate and scaffolding—generating CRUD endpoints, database migrations, test fixtures, anything following well-established patterns—is almost too easy now. This is like having apprentices prepare canvases and mix standard paint colors. They can do it faster and more consistently than you can, so why would you?
Parallel exploration is genuinely transformative. Having three agents simultaneously try different approaches to the same problem, then picking the best one, is something that would be prohibitively expensive with human developers but is trivial with AI. The economic barriers to exploration have collapsed.
Specialized analysis has become routine. Security audits, performance reviews, accessibility checks, architecture evaluations—I can get expert-level analysis in domains where I’m competent but not expert, without spending years mastering everything myself. The Robot Army model takes this to its logical conclusion: each agent is a specialist that knows its domain deeply, and each produces reviews at any level of detail in exactly the format I want.
And the tireless iteration—the ability to refactor the same module five times until it’s right, to regenerate a test suite with different assumptions, to keep exploring until you find the elegant solution—removes social friction that normally limits quality. Agents don’t get bored. They don’t get frustrated. They don’t silently resent you for asking them to try another approach.
But the limitations are real and important to understand. Novel architectural decisions—genuinely new design problems where you’re facing tradeoffs you haven’t encountered before—still require human judgment. Current agents mostly pattern-match against things they’ve seen in training. They’re not doing first-principles reasoning the way a senior architect with deep domain knowledge does.
Deep debugging remains surprisingly human. When something is subtly wrong in ways that require careful hypothesis formation and systematic testing, agents often get stuck in loops, trying the same fixes repeatedly or chasing red herrings. They’re excellent at fixing known bug patterns. They’re poor at investigating genuinely confusing failures where the problem isn’t what it appears to be.
Context is a persistent issue. Despite the new, larger context windows, agents lose track of architectural decisions made earlier in long sessions. They exhibit attention dropoff similar to humans reading a very long document—they miss things, they forget constraints, they make assumptions that contradict earlier decisions. I often find I have to ask, “Does the work you’ve just submitted fit what we have specified in ARCHITECTURE_AND_CODING_GUIDELINES.md? If not, please identify the gaps.”
And integration work—making separately-generated pieces fit together cohesively—remains stubbornly human. This requires understanding subtle relationships that weren’t explicitly in any one agent’s instructions, seeing how changes ripple through the system, maintaining consistency across layers. The agents can’t do this yet because they weren’t coordinating with each other, and the implicit knowledge required to integrate well is hard to specify explicitly.
The View From Rembrandt’s Studio
The actual practice of all this, the skill that separates effective use of these tools from chaotic flailing, probably resembles what Rembrandt himself experienced running his workshop.
Rembrandt’s genius wasn’t just his painting ability. It was his capacity to design a system that produced Rembrandt-quality output at scale. He knew:
Which tasks to delegate and which to do himself
How to specify work clearly enough that apprentices could execute it
When to accept an apprentice’s work and when to overpaint it
How to maintain stylistic coherence across multiple hands
When to invest in teaching an apprentice versus just doing it himself
These same skills translate directly:
Knowing What to Delegate: Not everything is equally agent-friendly. Tests? Usually good, often shockingly great (ask your LLM about adding property-based tests sometime, and stand back; there’s a small sketch after this list). Security-critical authentication logic? Maybe write it yourself.
Specification Clarity: Vague instructions produce vague results. The better you specify constraints, context, and desired outcomes, the better the agent output. This needn’t (and shouldn’t) be a “big design handed down from on high”, for all the same reasons that approach rarely works with human teams. You need to iterate your way to clarity and document it obsessively.
Review Calibration: Which generated code needs deep review versus quick acceptance? This is a constantly-evolving judgment call based on your trust in the agent for that particular type of task.
Architectural Coherence: Establishing conventions, patterns, and standards that agents follow. This is your “workshop style”—the thing that makes code from multiple agents feel unified.
Investment Decisions: Sometimes it’s faster to just write something yourself than to explain it well enough for an agent to do it. Sad, but it happens. Knowing when you’re past that threshold is an art.
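Since I keep tossing off that property-based testing aside, here’s a minimal sketch of what one looks like in Python, using the hypothesis library. The run-length encoder is a toy stand-in for whatever an agent just generated; the interesting part is that the tests state invariants and the framework goes hunting for inputs that violate them:

```python
from hypothesis import given, strategies as st

def rle_encode(s: str) -> list[tuple[str, int]]:
    """Toy run-length encoder: 'aaab' -> [('a', 3), ('b', 1)]."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in runs)

@given(st.text())
def test_round_trip(s):
    # Invariant: decoding always undoes encoding, for any string at all.
    assert rle_decode(rle_encode(s)) == s

@given(st.text())
def test_runs_are_maximal(s):
    # Invariant: adjacent runs never share a character.
    runs = rle_encode(s)
    assert all(a[0] != b[0] for a, b in zip(runs, runs[1:]))
```

Agents are remarkably good at proposing these invariants, and the resulting tests do double duty as a quality gate that costs none of my review bandwidth.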
Conway’s Law, Again, Inevitably
Here’s where Conway’s Law reasserts itself: even though agents could theoretically coordinate perfectly, they’re actually coordinating through you. You’re the hub. They’re the spokes.
This means the system architecture still reflects communication structure—it’s just that the relevant communication structure is now human-to-agent rather than human-to-human.
The solutions look familiar:
Modular Task Decomposition: Breaking work into pieces that can be done independently, with clear interfaces. This minimizes integration work.
Architectural Documentation: Writing down the constraints and patterns that all agents should follow. This is your style guide, your workshop rules.
Interface Contracts: Specifying exactly how different pieces should connect, so agents working on separate pieces produce compatible results. Newer programming languages give you a lot of tools to make interfaces more precise and expressive. Use them (a small sketch follows below).
Quality Gates: Automated checks (tests, linters, type checkers) that verify agent output meets standards without requiring your review bandwidth.
These are the same patterns human teams use to manage communication costs. They’re just being applied to manage review costs instead.
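To make “interface contracts” and “quality gates” concrete, here’s a minimal sketch in Python. Everything in it is invented for illustration (the RateLimiter contract, the in-memory implementation); the pattern is that the contract lives in one place that every agent’s spec references, and a type checker plus the test suite act as gates that don’t burn my review bandwidth:

```python
# contracts/rate_limiter.py -- the single source of truth both agents code against.
from typing import Protocol

class RateLimiter(Protocol):
    def allow(self, key: str) -> bool:
        """Return True if the caller identified by `key` may proceed."""
        ...

    def remaining(self, key: str) -> int:
        """How many requests `key` has left in the current window."""
        ...


# storage/memory_limiter.py -- produced by one agent, checked against the contract.
class InMemoryLimiter:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit

    def remaining(self, key: str) -> int:
        return max(0, self.limit - self.counts.get(key, 0))


# A cheap structural gate: mypy flags this call if the implementation
# drifts from the Protocol, and the test catches behavioral drift.
def _conforms(limiter: RateLimiter) -> RateLimiter:
    return limiter

def test_in_memory_limiter_satisfies_contract() -> None:
    limiter = _conforms(InMemoryLimiter(limit=2))
    assert limiter.allow("alice") and limiter.allow("alice")
    assert not limiter.allow("alice")
    assert limiter.remaining("alice") == 0
```

In practice the gate is just mypy and pytest running before anything reaches my diff window; if an agent’s output drifts from the contract, it fails loudly without my involvement.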
The Path Forward
Where is this heading?
The optimistic vision: agents get better at coordination, reducing the integration burden. Tools emerge for hierarchical agent systems—”senior” agents that review and integrate “junior” agent output. The review bottleneck eases as automated verification improves. We move from hub-and-spoke to mesh topologies where agents coordinate directly.
The realistic near-term: more tools like GitHub Code Quality and Cursor 2.0 that make review more efficient. Better specialized agents for different domains (my Robot Army experiments are one tiny example). Gradual accumulation of practices and patterns for effective orchestration.
The certainty: this isn’t going away. The workflow I’ve described—one human directing multiple AI agents—is already mainstream among early adopters. The tools are maturing rapidly. The question isn’t whether this becomes normal, but how quickly and what shape it takes.
Back to Amsterdam, Forward to Everything
Rembrandt’s workshop produced some of history’s greatest paintings. This wasn’t in spite of the collaborative process, but because of it. The workshop model let Rembrandt focus his personal impact where it mattered most—the dramatic lighting, the psychological depth, the moments of human truth—while scaling his output beyond what any individual could produce.
The paintings weren’t perfect. X-rays reveal overpainting, corrections, entire sections where the master refined his apprentices’ work or painted over it completely. But the system worked. It produced masterpieces at a scale and speed that solo painting never could.
We’re in the early days of our own workshop era. The tools are crude, the practices are still crystallizing, we’re figuring out patterns through trial and error and occasional humiliating failures. But the basic shape is clear: human architects directing armies of AI apprentices, maintaining vision and judgment while delegating execution, reviewing and refining and integrating until the whole coheres into something greater than the sum of its parts.
This isn’t the science fiction future where AI replaces developers wholesale and we’re left to toil in their gallium-arsenide mines. It’s not the dystopia where code quality collapses under a tsunami of AI-generated technical debt. It’s something more interesting and more human: a new way of working that amplifies what we’re good at (architecture, integration, and most especially judgment and taste) while offloading what we’re not (tireless execution, parallel exploration, specialized analysis, tolerance for iteration).
The constraint that shaped our entire industry—the brutal economics of $150/hour developers—is dissolving. We’re rebuilding our profession from first principles, discovering what’s actually valuable now that code production itself is no longer the bottleneck. Turns out it’s the same things that made Rembrandt great: knowing what to build, maintaining coherent vision, having the judgment to know good from merely adequate, and the craft to integrate diverse elements into something unified and excellent.
It’s still craft work, fundamentally. And like any craft, it rewards careful attention to what actually works in practice rather than what sounds impressive in theory.
The angels are here. They’re strange and powerful and they’re changing everything about how we build software. But they’re not replacing us, they’re working with us. They’re giving us the ability to build at scales and speeds we couldn’t before, if we develop the skills to orchestrate them effectively.
So: what are you building in your workshop?
And more importantly: are you ready to be Rembrandt?
This is part of an ongoing series about software development in the age of agentic AI. You can find my experiments with multi-agentic systems at Bad Dave’s Robot Army. If you’re navigating these changes yourself, I’d love to hear what’s working (and what’s not) in your practice. Post constructed with the able assistance of Claude Sonnet 4.5, because I’m not some sort of savage.


