
Five Principles for AI-Assisted DDD — Keeping the Human at the Center

1.6.2026 | 19 minutes reading time

Part of the series Domain-Driven Design Meets AI.

The previous post, AI as a Design Partner, established that AI fills three roles inside the Synergetic Blueprint — Drafter, Validator, Provocateur — and that these are positions, not actors. As the Blueprint progresses, the roles stay fixed while who fills them shifts: humans draft at the top where novelty lives, AI drafts in tactical design where the context is rich enough to be reliable, and the actor flips at the boundary in between.

Naming the role tells you what to expect from an AI’s output. It does not yet tell you how to use that output safely — when to trust it, when to challenge it, and how to keep the people who hold the domain knowledge firmly in control. That is the job of the five principles in this post. They are the rules of engagement for AI-assisted modeling and coding sessions: how we prompt AI, what we expect back, and how we let its output into our artifacts.


Why Principles at All?

Because AI gets things wrong convincingly.

An AI is neither a mind nor a database. It is a statistical model of its training data — it captures the patterns in a large corpus and generates text by predicting what comes next (Zhao et al., 2023). It models the distribution of that data, not truth, so when the data falls short on a topic, the model still produces a confident answer the data does not actually support. It may fabricate statistics, cite studies that were never published, or misquote a real source. Researchers do not even agree on whether this is fixable: one line of work argues hallucination is in principle avoidable (Kalai et al., 2025), another that it is an innate, mathematically inevitable limitation (Xu et al., 2024). Avoidable or not, it happens — and any practice that puts AI inside a design process has to assume it will.
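To make "predicting what comes next" concrete, here is a deliberately tiny sketch: a bigram model that only counts which word follows which in a three-sentence corpus, then greedily emits the most frequent continuation. Real LLMs are neural networks over tokens with far longer context, so this illustrates only the prediction loop, not how an LLM works inside. The corpus, the function names, and the Larder claim it produces are all invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Invented training data: no sentence says that Larder boosted retention.
CORPUS = [
    "larder boosted sign-ups",
    "the redesign boosted retention by forty percent",
    "a newsletter boosted retention by forty percent",
]


def train_bigrams(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for every word, which words follow it and how often."""
    follows: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows: Dict[str, Counter], start: str, max_words: int = 8) -> str:
    """Greedily append the most frequent next word. Nothing here checks facts."""
    words = [start]
    while len(words) < max_words and words[-1] in follows:
        next_word, _count = follows[words[-1]].most_common(1)[0]
        words.append(next_word)
    return " ".join(words)


print(generate(train_bigrams(CORPUS), "larder"))
# prints: larder boosted retention by forty percent
```

Every adjacent word pair in that output is well attested in the corpus, so each step looks like a safe bet, yet the sentence as a whole states a statistic about Larder that the data never supports: a fabricated number in miniature.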

That single fact is what the rest of the principles defend against. It also explains where AI is genuinely useful. Software design needs two kinds of work: formalized work, such as implementing an interface that is already precisely specified (in a YAML file, say), and creative work, such as designing a new application around a genuinely new business idea. Hand AI the formalized work and it is a force multiplier. Ask it to originate the creative work and you will be disappointed: it cannot reach beyond what its training captured. Dividing that labor well is exactly what the principles are for. And that need for guidance is not unique to AI; it holds for collaboration between humans too.


The Five Principles at a Glance

1. AI gets things wrong convincingly. The foundational caution. AI output is plausible by construction and correct only where it happens to align with the training distribution. Fluency is not evidence of truth.

2. Experts first. The source of truth for a domain lives in people, not prompts. AI can help access and organize domain knowledge; it cannot replace the experts who hold it. Involve them from the first conversation, and keep them as the validators of whether a design actually makes sense.

3. Artifacts are prompts. Every artifact the Blueprint produces — Domain Story, Visual Glossary, EventStorming board, API Product Canvas — is not just an output of human collaboration but an input for the AI. Output quality is proportional to input quality. This is what Spec-Driven Development formalizes, and an artifact’s commitment level tells you how far to trust what the AI generates from it.

4. Validate before you propagate. No artifact flows into the next one down the Blueprint until a human with domain knowledge has checked it. This is the human-in-the-loop principle, and it has to balance two opposite failure modes: the junior who over-trusts AI output and the senior who reflexively under-trusts it.

5. The Devil’s Advocate in the room. Collaboration means someone has to push back. The Provocateur challenges assumptions midstream, while decisions are still soft — and can be a human or AI, but never the final judge. Which actor fills the role flips along the Blueprint: AI provokes the human-led vision in strategic design; the human provokes AI’s drafts in tactical design.

These are not a checklist to run in order. They combine and recur across every step — “validate before you propagate” applies wherever AI produces something; “the Devil’s Advocate in the room” applies wherever there is a proposal to challenge, whoever made it. The rest of this post takes each in turn.


Experts First

To design software, you need to understand the problem domain and the users’ needs. That knowledge lives in people, not in prompts. AI can help you access and organize it, but it cannot replace the experts who hold it.

So the domain experts come first, always. They are involved from the beginning of the design process, and the design is shaped by listening to them. They are the source of truth for the problem domain and the ones who can validate whether a design actually makes sense. We saw this in the previous post, and it recurs at every step of the Blueprint.

This is also where the most consequential handoff in the whole process happens. The pivot from the strategic to the tactical level of the Synergetic Blueprint is the moment when the actor playing primary Drafter changes from a human to an AI. Before the pivot, expertise is being elicited and shaped — work that only people can do. After it, expertise has been captured in artifacts rich enough for AI to draft against reliably. “Experts first” is what makes that pivot safe: the AI only takes over drafting once the experts have already put enough of themselves into the artifacts.
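A minimal sketch of that pivot, assuming a simplified Blueprint with just a strategic and a tactical level. The names Level, Actor, ROLE_ACTORS and primary_drafter are hypothetical. The roles are fixed keys and only the actors behind them change; AI takes the Drafter seat only in tactical design, and only once the experts' knowledge has been captured in the artifacts.

```python
from enum import Enum


class Level(Enum):
    STRATEGIC = "strategic"
    TACTICAL = "tactical"


class Actor(Enum):
    HUMAN = "human"
    AI = "AI"


# The roles stay fixed; only the actor filling each one differs per level.
ROLE_ACTORS = {
    Level.STRATEGIC: {"Drafter": Actor.HUMAN, "Provocateur": Actor.AI},
    Level.TACTICAL: {"Drafter": Actor.AI, "Provocateur": Actor.HUMAN},
}


def primary_drafter(level: Level, expert_knowledge_captured: bool) -> Actor:
    """Return who drafts. AI drafts only past the pivot AND only once the
    experts' knowledge is already in the artifacts ("experts first")."""
    if not expert_knowledge_captured:
        return Actor.HUMAN
    return ROLE_ACTORS[level]["Drafter"]


print(primary_drafter(Level.TACTICAL, expert_knowledge_captured=True).value)  # prints "AI"
```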


Artifacts Are Prompts

In the first post of this series we introduced the Synergetic Blueprint as a collection of artifacts that capture our understanding of the problem domain and our design decisions. Those artifacts are not only outputs of the design process; they are also inputs for our AI collaborators.

The quality of the output you get from AI is proportional to the quality of the input you give it (Brown et al., 2020; Wei et al., 2022). A prompt is not merely a question — it is a cognitive frame. Its structure, specificity, and embedded context determine which of the model’s latent capabilities surface and which stay inaccessible. Prompt design, seen this way, is less a matter of convenience and more an act of intentional knowledge transfer.

We produce these artifacts in human collaboration — workshop formats like EventStorming (Brandolini, 2013), or one-on-one conversations with domain experts (Evans, 2003). And the artifacts we produce there are the prompts for the AI:

  • A well-structured Domain Story (Hofer & Schwentner, 2021) lets AI generate good event proposals for an EventStorming session.
  • A clear Visual Glossary (Zörner, 2021) lets it generate accurate code snippets for the domain model.
  • A detailed API Product Canvas (Junker & Lazzaretti, 2025) lets it generate a precise API specification.

That relationship — artifacts as prompts — is exactly what Spec-Driven Development formalizes.

Spec-Driven Development

In Spec-Driven Development (SDD), you start with a spec instead of coding first and writing docs later (Böckeler, 2025). An AI coding agent gathers requirements and writes specifications before writing any code. The spec is a contract for how the code should behave, and it becomes the source of truth: agents use it to generate, test, and validate code (Delimarsky, 2025; Tessl, 2025).

Figure 3-1: Spec-Driven Development structure — agents consume specifications (provided as artifacts) together with overarching principles stored in a memory bank, then produce code that humans validate. Adapted from Böckeler (2025)

The artifacts of the Synergetic Blueprint (Domain Stories, EventStorming maps, Visual Glossaries, API Product Canvases, User Stories) are the prompts for the AI roles. When AI is the Provocateur, the Domain Story is the prompt, and the AI checks it against the artifacts already produced along the Blueprint. When AI is the Drafter, the API Product Canvas is the prompt to produce an OpenAPI specification. Most frameworks for AI-assisted design and development, SDD among them, follow this same pattern.
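A minimal sketch of what "artifacts are prompts" can look like in code, assuming artifacts are exported as plain text. Mirroring Figure 3-1, the prompt combines the artifact, its upstream context, and a few standing principles of the kind an SDD tool keeps in its memory bank. Everything here (the Artifact type, the MEMORY_BANK wording, build_prompt) is hypothetical and is not the API of Kiro, spec-kit, or Tessl.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Artifact:
    kind: str     # e.g. "Domain Story", "API Product Canvas"
    content: str  # the artifact exported as plain text


# Overarching principles kept in a "memory bank" (hypothetical wording).
MEMORY_BANK = [
    "Use only terms defined in the Visual Glossary.",
    "Flag anything the artifacts do not support instead of guessing.",
]


def build_prompt(role: str, artifact: Artifact, upstream: List[Artifact],
                 deliverable: str = "") -> str:
    """Turn an artifact, plus its upstream context, into a prompt for one AI role."""
    if role == "Drafter":
        if not deliverable:
            raise ValueError("a Drafter prompt needs a deliverable")
        task = f"Draft {deliverable} from the {artifact.kind} below."
    elif role == "Provocateur":
        task = (f"Check the {artifact.kind} below against the upstream artifacts. "
                "List inconsistencies, gaps, and contradictions as questions. Do not decide.")
    else:
        raise ValueError(f"unknown role: {role}")
    parts = [task, "Principles:", *("- " + p for p in MEMORY_BANK),
             f"{artifact.kind}:", artifact.content]
    for source in upstream:
        parts += [f"Upstream {source.kind}:", source.content]
    return "\n".join(parts)
```

The same artifact can feed either role; what changes is the task line and how much upstream context travels with it. The better the artifact, the less the model has to fill in from its training distribution.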

Spec-Commitment Levels

Not every spec carries the same weight. SDD distinguishes three commitment levels (Böckeler, 2025), and the level you are at determines how much trust you place in the output AI generates from it.

Figure 3-2: Specification commitment levels — spec-first, spec-anchored, spec-as-source. Adapted from Böckeler (2025)

Spec-first — the spec is created for the initial implementation of a feature and then thrown away once the feature ships. When the feature evolves later, a new spec is written. Humans and AI co-create the spec and the code, but the spec is not maintained.

Spec-anchored — the spec is created with the initial implementation and then evolves with it. Both spec and code are co-created and both are maintained over time.

Spec-as-source — the spec is the source of truth. Code is generated from it, and the spec keeps being maintained afterward.
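The three levels can be captured as an ordered type, a minimal sketch with hypothetical names. The ordering encodes "higher commitment, higher stakes", and the two tables restate what happens to the spec at each level and the handling the post recommends for AI output generated from it (explore and discard, integrate and maintain, ship).

```python
from enum import IntEnum


class Commitment(IntEnum):
    """Spec-commitment levels (after Böckeler, 2025); a higher value means higher stakes."""
    SPEC_FIRST = 1
    SPEC_ANCHORED = 2
    SPEC_AS_SOURCE = 3


# Is the spec still maintained after the first implementation ships?
SPEC_MAINTAINED = {
    Commitment.SPEC_FIRST: False,      # thrown away; a new spec is written when the feature evolves
    Commitment.SPEC_ANCHORED: True,    # evolves together with the code
    Commitment.SPEC_AS_SOURCE: True,   # source of truth; code is generated from it
}

# What to do with output that AI generates from a spec at each level.
OUTPUT_HANDLING = {
    Commitment.SPEC_FIRST: "explore and discard",
    Commitment.SPEC_ANCHORED: "integrate and maintain",
    Commitment.SPEC_AS_SOURCE: "ship",
}


def stricter_of(a: Commitment, b: Commitment) -> Commitment:
    """The level whose output deserves the more careful examination."""
    return max(a, b)
```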

Along the Blueprint you meet all three, and which one applies usually depends on how the team uses the artifact:

Figure 3-3: Specification commitment levels and roles along the Synergetic Blueprint

An OpenAPI specification (OpenAPI Initiative, 2024) serving as the published interface of a microservice is exactly the rare artifact that earns spec-as-source: prescriptive rather than descriptive, machine-readable rather than prose, and intentionally frozen at the boundary. Change propagates outward from it, not inward to it.

One catch worth stating plainly: even where an OpenAPI spec qualifies as spec-as-source, you rarely need an AI agent to turn it into code. Standard generators such as OpenAPI Generator (OpenAPI Generator Contributors, 2026) already do that reliably and without hallucination risk. The spec stays the living contract for the bounded context — not a static source from which code is conjured without human involvement.
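Why a spec-as-source needs no AI to become code can be shown with a toy derivation. The sketch below is not OpenAPI Generator and does not reproduce its output; it is a hypothetical stand-in that reads a made-up, minimal OpenAPI 3.1 document for a Larder shopping-list service and derives a route table from it. Because the derivation is a pure function of the spec, the same spec always yields the same table, and a contract change reaches the code only by regenerating from the spec.

```python
from typing import List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

# Path templating requires the {listId} parameter to be declared.
LIST_ID_PARAM = {"name": "listId", "in": "path", "required": True, "schema": {"type": "string"}}

# A hypothetical, minimal OpenAPI 3.1 document for a Larder shopping-list service.
SPEC = {
    "openapi": "3.1.1",
    "info": {"title": "Larder Shopping List API (hypothetical)", "version": "0.1.0"},
    "paths": {
        "/shopping-lists/{listId}": {
            "parameters": [LIST_ID_PARAM],  # path-level key, not an operation
            "get": {
                "operationId": "getShoppingList",
                "responses": {"200": {"description": "The shopping list"}},
            },
        },
        "/shopping-lists/{listId}/items": {
            "parameters": [LIST_ID_PARAM],
            "post": {
                "operationId": "addShoppingItem",
                "responses": {"201": {"description": "Item added"}},
            },
        },
    },
}


def route_table(spec: dict) -> List[Tuple[str, str, str]]:
    """Derive (METHOD, path, operationId) rows from the spec, deterministically."""
    routes = []
    for path, path_item in spec.get("paths", {}).items():
        for key, operation in path_item.items():
            if key in HTTP_METHODS:  # skip "parameters", "summary", and the like
                routes.append((key.upper(), path, operation.get("operationId", "")))
    return sorted(routes)


for row in route_table(SPEC):
    print(row)
```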

Prototypes are the other spec-as-source case: they synthesize the upstream artifacts into something concrete enough to validate the decisions behind them (Junker, 2026).

The takeaway holds across the whole Blueprint: the higher the commitment level, the higher the stakes, and the more carefully the output must be examined before it enters your codebase or your design. Spec-first output you explore and discard; spec-anchored output you integrate and maintain; spec-as-source output you ship. Treating artifacts as prompts gives AI the context it needs to be useful — but a well-prompted AI still produces output that reflects patterns in training data, not truth about your domain. Which is why something has to happen next.


Validate Before You Propagate

The Blueprint produces a lot of artifacts — some by humans, some by AI, many co-created. Every one of them should be validated before it propagates into the next artifact downstream (Boehm & Basili, 2001; Ford et al., 2021).

When AI produces an artifact, a human with the domain knowledge and design expertise to check it against everything upstream has to validate it. AI models the distribution, not truth (Kalai et al., 2025; Xu et al., 2024), and even purpose-built legal research tools based on Retrieval-Augmented Generation hallucinate between 17 and 33 percent of the time (Magesh et al., 2025), in a domain where errors are costly. The human with domain knowledge is the one with access to the ground truth the model structurally lacks.

When humans produce an artifact, it should be validated by other humans (Perry et al., 2023) — and here AI can take the Provocateur seat, challenging assumptions and surfacing inconsistencies, gaps, and contradictions. When both produced it, validate with other humans and with different methods. Workshop artifacts, for instance, can be validated by prototypes generated from the workshop results (Junker, 2026).
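The three cases can be condensed into a small lookup, a sketch with hypothetical names that only restates the rules above: who produced an artifact determines who, or what, validates it.

```python
from enum import Enum
from typing import List


class Producer(Enum):
    HUMAN = "human"
    AI = "AI"
    BOTH = "co-created"


def validation_plan(producer: Producer) -> List[str]:
    """Who (or what) validates an artifact, depending on who produced it."""
    if producer is Producer.AI:
        # The model lacks ground truth; a domain-literate human supplies it.
        return ["human with domain knowledge checks it against everything upstream"]
    if producer is Producer.HUMAN:
        return [
            "other humans review it",
            "AI as Provocateur surfaces inconsistencies, gaps, and contradictions",
        ]
    # Co-created: other humans plus a different method, for instance
    # a prototype generated from the workshop results.
    return ["other humans review it", "a different method cross-checks it"]
```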

This is the human-in-the-loop principle, and it is older than AI. It traces back to Wiener’s cybernetics, where a human operator was treated as a component inside a feedback control loop (Wiener, 1961); the term “human-in-the-loop” is a direct descendant, popularized in machine learning to describe systems built around human feedback and judgment (Holzinger, 2016; Monarch, 2021). The principle requires balanced trust in AI output (Lazaros et al., 2026).

What makes validation hard is that trust is not evenly distributed across a team. A junior developer tends to over-trust AI output — inventing explanations for unclear proposals and clinging to them even when they sense something is off (Buçinca et al., 2021). A senior developer tends to under-trust it. A good validation process mitigates both, with clear criteria for checking artifacts and a culture of critical thinking — pair or mob programming, for example (Emrich & Ade, 2026).

Validation prevents errors and misunderstandings from propagating down the Blueprint, where they would otherwise harden into technical debt and design flaws. It is not a one-time gate but a continuous activity that runs throughout the process as artifacts are produced and refined. Validation is not just testing, nor even controlling. It is collaboration.
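A minimal sketch of "validate before you propagate" as a gate, with hypothetical names: a downstream artifact can only be derived from upstream artifacts that carry a sign-off, and any revision voids the earlier sign-off, which is what keeps validation continuous rather than a one-time gate. The sketch cannot verify that the reviewer really is a domain-literate human; that stays an organizational assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class NotValidatedError(Exception):
    """Raised when an artifact would propagate without a human sign-off."""


@dataclass
class Artifact:
    name: str
    content: str
    upstream: List["Artifact"] = field(default_factory=list)
    validated_by: Optional[str] = None  # the domain-literate reviewer, by name

    def validate(self, reviewer: str) -> None:
        self.validated_by = reviewer

    def revise(self, new_content: str) -> None:
        # Any refinement voids the earlier sign-off: validation is continuous.
        self.content = new_content
        self.validated_by = None


def derive(name: str, content: str, upstream: List[Artifact]) -> Artifact:
    """Create a downstream artifact, refusing any unvalidated input."""
    pending = [a.name for a in upstream if a.validated_by is None]
    if pending:
        raise NotValidatedError("validate before you propagate: " + ", ".join(pending))
    return Artifact(name, content, list(upstream))


story = Artifact("Domain Story", "Shopper adds milk to the shopping list.")
story.validate("domain expert")
board = derive("EventStorming board", "ShoppingItemAdded", [story])
story.revise("Shopper adds milk and eggs to the shopping list.")  # sign-off is now void
```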


The Devil’s Advocate in the Room

Collaboration means someone has to push back. You need a Devil’s Advocate challenging the assumptions and the logic of every proposal — and the value of doing so is not folklore: groups using the technique make measurably higher-quality decisions and re-evaluate their own positions more readily (Schweiger et al., 1986).

In a workshop, a senior developer or even the facilitator can play the part. The facilitator steps out of the neutral-observer role and becomes an active participant, voicing risks and pressing the group toward a decision (Kelle et al., 2024). When AI takes the role, it can challenge the assumptions and logic of a proposal as well (Chiang et al., 2024), with one hard limit: the AI must never become the judge. If people defer to its assessments as authority rather than evaluating decisions themselves, the Devil’s Advocate has quietly become a tyrant (Jong et al., 2025). Whoever fills the role, the Provocateur acts midstream, while decisions are still changeable, not after they have propagated into the next artifact.
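The two rules in this paragraph fit in a few lines, sketched here with hypothetical names: either actor may add a challenge, but only while the proposal has not yet propagated, and only a human may record the verdict. The usage reuses the North Star Metric from the Larder example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proposal:
    text: str
    propagated: bool = False  # has it already flowed into the next artifact?
    challenges: List[str] = field(default_factory=list)
    verdict: Optional[str] = None
    judged_by: Optional[str] = None


def challenge(proposal: Proposal, question: str, by: str) -> None:
    """Human or AI may play Devil's Advocate, but only midstream."""
    if proposal.propagated:
        raise ValueError("too late: the decision has already propagated")
    proposal.challenges.append(f"{by}: {question}")


def decide(proposal: Proposal, verdict: str, by: str) -> None:
    """The AI's challenges are input, never authority: only a human may judge."""
    if by != "human":
        raise PermissionError("the Provocateur may challenge but never judge")
    proposal.verdict = verdict
    proposal.judged_by = by


north_star = Proposal("North Star Metric: number of recipes shared")
challenge(north_star, "How does this metric tie to the business goals?", by="AI")
challenge(north_star, "Does it capture the value delivered to users?", by="AI")
decide(north_star, "revisit the metric with the domain experts", by="human")
```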

This is where the role distribution from the previous post becomes concrete, because which actor plays the Provocateur flips along the Blueprint:

  • In strategic design, humans lead as Drafters and AI’s primary role is Provocateur, challenging a vision it cannot originate. AI can co-draft here too, gathering business requirements, for instance, but the human stays the originator of the vision. In the Larder example, AI can interrogate a North Star Metric of “number of recipes shared” by asking how it ties to the business goals and whether it really captures the value delivered to users.
  • In tactical design, the assignment inverts. AI becomes the Drafter; the human steps into the Provocateur seat, pushing back on AI’s drafts and demanding justification for each decision. Whether ShoppingItem is an entity inside the ShoppingList aggregate or a value object the list merely contains is a domain decision, not a structural one — AI’s proposal is plausible, and plausible is not the same as right.

The role stays the same. Only the actor changes.
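The ShoppingItem question from the tactical-design bullet can be made concrete. The sketch below shows both readings side by side; the names and fields are hypothetical, since the post does not give Larder's actual model. As an entity, an item has an identity of its own and is changed through the ShoppingList aggregate root. As a value object, it is immutable, and two items with the same attributes are interchangeable.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple
from uuid import UUID, uuid4


# Option A: ShoppingItem as an entity inside the ShoppingList aggregate.
@dataclass(eq=False)
class ShoppingItemEntity:
    name: str
    quantity: int
    item_id: UUID = field(default_factory=uuid4)
    checked: bool = False

    # Equality follows identity (item_id), not attributes.
    def __eq__(self, other: object) -> bool:
        return isinstance(other, ShoppingItemEntity) and other.item_id == self.item_id

    def __hash__(self) -> int:
        return hash(self.item_id)


class ShoppingList:
    """Aggregate root: items are changed only through the list, by identity."""

    def __init__(self) -> None:
        self._items: Dict[UUID, ShoppingItemEntity] = {}

    def add(self, name: str, quantity: int) -> UUID:
        item = ShoppingItemEntity(name, quantity)
        self._items[item.item_id] = item
        return item.item_id

    def check_off(self, item_id: UUID) -> None:
        self._items[item_id].checked = True

    def is_checked(self, item_id: UUID) -> bool:
        return self._items[item_id].checked


# Option B: ShoppingItem as an immutable value object the list merely contains.
@dataclass(frozen=True)
class ShoppingItemValue:
    name: str
    quantity: int
    checked: bool = False


def check_off_value(items: Tuple[ShoppingItemValue, ...], index: int) -> Tuple[ShoppingItemValue, ...]:
    """Checking off swaps in a new value; the old one is never mutated."""
    updated = replace(items[index], checked=True)
    return items[:index] + (updated,) + items[index + 1:]


# Same attributes: two distinct entities, but one and the same value.
assert ShoppingItemEntity("milk", 1) != ShoppingItemEntity("milk", 1)
assert ShoppingItemValue("milk", 1) == ShoppingItemValue("milk", 1)
```

Both versions run and both look reasonable. Which one is right depends on a domain question, for instance whether shoppers ever need to tell apart two otherwise identical items on the same list, and that answer has to come from the domain experts, not from the model that drafted the class.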


The Core Argument

Put the Synergetic Blueprint together with these five principles and the spec-commitment levels, and you get something more useful than either “use AI for X” or “AI replaces Y.” You get a discipline that guards against convincingly wrong output and leverages AI’s genuine strengths.

  • AI gets things wrong convincingly is the caution everything else answers to.
  • Experts first keeps the source of truth with the people who hold the domain knowledge, because AI can organize that knowledge but never originate it.
  • Artifacts are prompts re-frames every artifact as the input that shapes AI’s output — and the commitment level tells you whether to explore it, maintain it, or ship it.
  • Validate before you propagate puts a domain-literate human in the loop before anything flows downstream, balancing the junior’s overtrust against the senior’s undertrust.
  • The Devil’s Advocate in the room keeps the design honest, with the Provocateur challenging assumptions while they are still soft.

The principles are not exclusive to a single step. “Validate before you propagate” applies wherever AI produces an artifact, at any commitment level. “The Devil’s Advocate in the room” applies wherever there is a proposal to challenge, no matter who made it. They combine and recur, and which actor fills which role flips as the work moves from ideation to running software.

Figure 3-4: Cheat sheet for AI collaboration in the Synergetic Blueprint — roles, commitment levels, and principles for each phase

From here, the series stops talking about collaborating with AI and starts doing it, step by step along the Blueprint.

Next in this series: North Star and Business Planning with AI — where every product begins: at the discovery edge, before a single bounded context has been drawn, when the question is not yet “how do we build it?” but “what are we building, and why would anyone want it?” We start with the North Star Metric and the business plan — the first artifacts of the Blueprint, and firmly spec-first: explored, challenged, and revised long before anything is committed.

This series grew out of my work on the book DDD Meets AI, forthcoming from Springer Nature.

References

Böckeler, B. (2025). Understanding spec-driven-development: Kiro, spec-kit, and Tessl. martinfowler.com, “Exploring Gen AI” series. https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html

Boehm, B., & Basili, V. R. (2001). Software defect reduction top 10 list. Computer, 34(1), 135–137. https://doi.org/10.1109/2.962984

Brandolini, A. (2013). EventStorming. https://www.eventstorming.org

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

Buçinca, Z., Malaya, M. B., & Gajos, K. Z. (2021). To trust or to think: Cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1), 1–21. https://doi.org/10.1145/3449287

Chiang, C.-W., Lu, Z., Li, Z., & Yin, M. (2024). Enhancing AI-assisted group decision making through LLM-powered devil’s advocate. Proceedings of the 29th International Conference on Intelligent User Interfaces (IUI ’24), 103–119. https://doi.org/10.1145/3640543.3645199

Cohn, M. (2004). User stories applied: For agile software development. Addison-Wesley.

DDD Crew. (2019). Bounded context canvas. GitHub repository, ddd-crew/bounded-context-canvas. https://github.com/ddd-crew/bounded-context-canvas

Delimarsky, D. (2025). Spec-driven development with AI: Get started with a new open source toolkit. The GitHub Blog. https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/

Dilger, M. (2024). Understanding Eventsourcing: Planning and implementing scalable systems with Eventmodeling and Eventsourcing (p. 500). Independently Published. https://leanpub.com/eventmodeling-and-eventsourcing

Ellis, S. (2017). What is a North Star Metric? GrowthHackers Blog. https://blog.growthhackers.com/what-is-a-north-star-metric-b31a8512923f

Emrich, M., & Ade, F. (2026). EXACT-Coding: Beyond vibe coding: AI-assisted development mit TDD und software craft. LeanPub. https://leanpub.com/exact-coding

Evans, E. (2003). Domain-driven design: Tackling complexity in the heart of software. Addison-Wesley.

Ford, N., Richards, M., Sadalage, P., & Dehghani, Z. (2021). Software architecture: The hard parts: Modern trade-off analyses for distributed architectures. O’Reilly Media.

Hofer, S., & Schwentner, H. (2021). Domain storytelling: A collaborative, visual, and agile way to build domain-driven software (1st ed., p. 288). Addison-Wesley.

Holzinger, A. (2016). Interactive machine learning for health informatics: When do we need the human-in-the-loop? Brain Informatics, 3(2), 119–131. https://doi.org/10.1007/s40708-016-0042-6

Jong, S. de, Moberg, R., & Berkel, N. van. (2025). Confirmation bias as a cognitive resource in LLM-supported deliberation. https://arxiv.org/abs/2509.14824

Junker, A. (2026). From domain story to prototype: Specification-driven prototyping in DDD workshops. codecentric AG knowledge hub, “Domain-Driven Design Meets AI” series. https://www.codecentric.de/en/knowledge-hub/blog/from-domain-story-to-prototype

Junker, A., & Lazzaretti, F. (2025). Crafting great APIs with Domain-Driven Design: Collaborative craftsmanship of asynchronous and synchronous APIs. Apress. https://doi.org/10.1007/979-8-8688-1457-0

Kalai, A. T., Nachum, O., Vempala, S. S., & Zhang, E. (2025). Why language models hallucinate. https://arxiv.org/abs/2509.04664

Kelle, E. van, Verschatse, G., & Baas-Schwegler, K. (2024). Collaborative software design: How to facilitate domain modeling decisions (p. 300). Manning Publications.

Lazaros, K., Vrahatis, A. G., & Kotsiantis, S. (2026). Human-in-the-loop artificial intelligence: A systematic review of concepts, methods, and applications. Entropy, 28(4), 377. https://doi.org/10.3390/e28040377

Magesh, V., Surani, F., Dahl, M., Suzgun, M., Manning, C. D., & Ho, D. E. (2025). Hallucination-free? Assessing the reliability of leading AI legal research tools. Journal of Empirical Legal Studies, 22(2), 216–242. https://doi.org/10.1111/jels.12413

Monarch, R. (Munro). (2021). Human-in-the-loop machine learning: Active learning and annotation for human-centered AI. Manning Publications.

OpenAPI Generator Contributors. (2026). Generators list. OpenAPI Generator documentation. https://openapi-generator.tech/docs/generators

OpenAPI Initiative. (2024). OpenAPI specification, version 3.1.1. Linux Foundation. https://spec.openapis.org/oas/v3.1.1.html

Osterwalder, A., & Pigneur, Y. (2010). Business model generation: A handbook for visionaries, game changers, and challengers (p. 288). John Wiley & Sons.

Perry, N., Srivastava, M., Kumar, D., & Boneh, D. (2023). Do users write more insecure code with AI assistants? Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS ’23). https://doi.org/10.1145/3576915.3623157

Piskala, D. B. (2026). Spec-driven development: From code to contract in the age of AI coding assistants. https://doi.org/10.48550/arXiv.2602.00180

Schweiger, D. M., Sandberg, W. R., & Ragan, J. W. (1986). Group approaches for improving strategic decision making: A comparative analysis of dialectical inquiry, devil’s advocacy, and consensus. Academy of Management Journal, 29(1), 51–71. https://doi.org/10.5465/255859

Smart, J. F., & Molak, J. (2023). BDD in action: Behavior-driven development for the whole software lifecycle (2nd ed., p. 488). Manning Publications.

Starke, G., & arc42 Contributors. (2023). Architecture communication canvas. arc42, canvas.arc42.org. https://canvas.arc42.org/

Tessl. (2025). Spec-driven development with tessl. Tessl documentation. https://docs.tessl.io/use/spec-driven-development-with-tessl

Wardley, S. (2022). Wardley maps: Topographical intelligence in business (M. Craddock, Ed.). LeanPub. https://leanpub.com/wardleymaps

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837.

Wiener, N. (1961). Cybernetics: Or control and communication in the animal and the machine (2nd ed.). MIT Press.

Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models. https://arxiv.org/abs/2401.11817

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2023). A survey of large language models. https://arxiv.org/abs/2303.18223

Zörner, S. (2021). Software-architekturen dokumentieren und kommunizieren: Entwürfe, Entscheidungen und Lösungen nachvollziehbar und wirkungsvoll festhalten (3rd ed., p. 309). Carl Hanser Verlag.
