When GenAI in software development comes up, many think first of generating application code – and thus of developers as the primary audience. Yet much of its value lies precisely where the business side and technology intersect: in answering questions that require domain-specific system knowledge but demand technical means.
As a running example, consider the work involved in maintaining a corporate website. A website is never finished – new requirements emerge, content and components are tried out but rarely cleaned up, and even seemingly small changes like adjusting the link structure can entail significant effort in a Content Management System (CMS). The questions that arise are diverse:
- How do you prepare content for new requirements – such as optimization for AI-powered search engines (Generative Engine Optimization, or GEO), when the field is only just taking shape?
- How do you get an overview of which of the dozens of CMS components are actually still in use – and what can be cleaned up?
- How can hundreds or thousands of content items be systematically updated when the website's structure changes?
What these questions have in common: they require domain-specific system knowledge but demand technical means to answer – such as queries against the CMS's API or scripts that systematically evaluate content. A marketing department that owns the content has this knowledge about the system and its content – but typically has no developers on the team. Which CMS components are used on which pages? How many of the configured forms are actually in use? Analyses like these, which would be necessary for informed decisions, often simply don't get done.
GenAI can bridge this gap with purpose-built tooling: business departments can have custom scripts and analyses generated for specific challenges – tools that don't need to be maintained once they've served their purpose. My colleague Goetz Markgraf recently introduced a five-level model of AI-assisted software development (German). The model describes autonomy levels that can be applied beyond software development – wherever GenAI gradually takes on more independence. In the following, we use our own practice to show what four of these levels actually feel like – from pure research to agent-driven mass updates of CMS content.
Spoiler: GenAI delivers not just code, but answers – as a research assistant, as an analysis tool, and as an execution partner for data-driven decisions. This article walks through four autonomy levels and shows how the value deepens with each one.
Four Levels – and What They Require
The five-level model from my colleague describes a progression from pure knowledge work (Level 1) through targeted code optimization (Level 2) and agent-driven analysis (Level 3) to specification-driven mass editing (Level 4). Level 5 – fully autonomous systems – is still largely hypothetical and is deliberately omitted here. This is not about application development, but about problem-solving. The goal of our project is to address concrete challenges in our own areas of work – from research through analysis and quantification to custom, purpose-built tooling. The use of GenAI can be divided into different phases:
- Level 1 stands on its own – purely about understanding a new topic.
- Levels 2 and 3 operate in the analysis phase – quantifying, monitoring, making data-driven decisions.
- Level 4 marks the transition to the execution phase – with write access and production changes.
The analysis levels (2 and 3) implement Fitness Functions: automated, data-driven evaluations that enable quantification of the system. This allows KPIs to be monitored – particularly valuable when the system evolves or undergoes fundamental changes.
The human remains the decision-maker across all levels: What should be researched, analyzed, or changed? How should results be interpreted? What conclusions can be drawn? From Level 2 onward, technical prerequisites come into play that are not relevant for pure research (Level 1):
- The tool: For GenAI to not only suggest code but also execute it directly, a client with so-called tool-calling capability is needed – such as Claude Code, Opencode, or Gemini CLI. Alongside terminal-based clients, more and more desktop apps like Antigravity and Codex are emerging. These clients can independently execute commands generated by the model (e.g., Python scripts) and report back the results. This creates an iterative workflow: generate, execute, review, refine.
- The working environment: Execution takes place in the terminal. The agent uses shell commands like `curl`, `cat`, or `echo` to read files, query APIs, and process results. Those unfamiliar with these commands can have GenAI explain what a command does – Level 1 thus also helps with understanding the higher levels.
- Good context: It is important to provide GenAI with the right context depending on the specific task: a precise problem description, the API documentation (e.g., the GraphQL schema), and ideally an existing query as a starting point.
- API access via token: The CMS provides access keys (so-called access tokens with specific permissions) that grant requests to access content – read-only (read token) or read-write (write token). For the analysis levels, a read token is sufficient; write access only comes into play at Level 4 and accordingly requires greater safeguards and control over what the AI does.
The examples are drawn from our work on our own corporate website, codecentric.de. It comprises roughly 500 content pages in two languages and blog posts in the four-digit range. The pages are composed of over 50 components (also called building blocks) – reusable elements like hero banners, contact forms, or text-image sections. Add to that dozens of forms with complex dependencies and a linking scheme with dynamic links as well as thousands of internal links in body text. At this scale, it is nearly impossible to assess what is actively in use and should be considered for optimization without systematic analysis.
Level 1: Building Orientation – Research on a New Topic
Before technical implementation comes understanding. With the transition to the AI era, a website faces new requirements, raising questions such as how content can be optimized for AI-powered search engines – a field that is just emerging under the term Generative Engine Optimization (GEO). The specific questions are manifold: How does GEO differ from traditional SEO? Which measures make sense for a corporate website? What is llms.txt? What are Grounding Pages?
At Level 1, GenAI (ChatGPT, Claude, Gemini) is used as a research assistant – you ask questions, have concepts explained, compare approaches. All context is provided manually via copy-paste or document upload. In addition to the prompt, most GenAI providers offer a Deep Research feature. This launches an autonomous process that iteratively searches for sources, reads them, evaluates them, and searches again. The result is a comprehensive report on the topic, which can then serve as the basis for follow-up questions within the conversation.
A typical request looks like this:
I manage our corporate website https://codecentric.de/en – we are a mid-sized IT services company based in Germany.
We want to evaluate which Generative Engine Optimization (GEO) measures would be beneficial for our website – that is, how we can prepare our content so that it is better captured and referenced by AI-powered search engines.
Two specific approaches we want to assess:
- llms.txt (llmstxt.org) – a standard for making websites machine-readable in summary form
- Grounding Pages (groundingpage.com) – dedicated pages that serve as reference sources for AI models
Evaluate both approaches for our case and show me concrete content for our corporate website.
Follow-up questions about the research results can then be asked in a targeted manner, and the content can be verified through the linked sources.
The result is a detailed report on a topic, complete with source references, available in minutes rather than days. This makes it possible to quickly build knowledge and discuss follow-up questions related to your own use case. Reviewing the key sources is a mandatory part of responsible use of AI-generated content. The report can serve as the basis for strategic decisions about which GEO measures to implement.
Level 2: Targeted Queries – Having Existing Queries Optimized
Level 2 is about sending targeted queries to the CMS to determine the usage of specific components. Ideally, you already have a working GraphQL query for the CMS – in this example, an existing query that finds forms in use, though it has not been run in a long time. In the meantime, the cloud-based headless CMS has gone through several updates. Additionally, new components have been added that can contain forms. Instead of manually reworking the query, it is handed to GenAI along with the API documentation, and GenAI optimizes it. You first compare the optimized query with the original to understand and evaluate the changes. The query can then be executed manually. No programming skills are required – familiarity with the CMS and an existing query of your own are enough. Anyone who knows the system and understands which question needs to be answered can use GenAI for their tasks.
The Prompt
The following request illustrates how little effort is needed to obtain an improved query. GenAI receives a problem description, a reference to the API documentation, and a query as a starting point:
I need to find out where content of type "Form" is used in our CMS. In the backend, I can see 62 items. They are presumably configured mainly in buttons.
The API documentation can be found here: [Link to API documentation]
Here is a query we previously used to find forms in use – it may be outdated and may not check all locations:
```graphql
{
  pages(first: 1000, stage: PUBLISHED) {
    title
    sections {
      ... on ContentInSection {
        button {
          form {
            title
          }
          url
        }
      }
    }
  }
}
```
From this request, GenAI generates an optimized GraphQL query that accounts for all relevant section types and fully maps form usage. What would have manually required hours of query crafting and trial-and-error in the API explorer is ready to use in minutes. The query can then be executed independently.
The result is a repeatable, data-driven evaluation – a Fitness Function that measures how many of the defined forms are actually in use. This KPI helps identify unused content and reduce it in a targeted manner. The generated query may not work perfectly on the first try; in that case, the iterative workflow helps – feed the error message or change requests back into the conversation as new input, and the query is adjusted.
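Expressed as code, such a Fitness Function can be tiny. A minimal Python sketch – the total of 62 forms comes from the prompt above, while the in-use count of 41 is purely illustrative:

```python
def form_usage_ratio(forms_defined: int, forms_in_use: int) -> float:
    """Fitness Function KPI: share of defined forms actually embedded on pages."""
    if forms_defined == 0:
        return 1.0  # nothing defined, so nothing is orphaned
    return forms_in_use / forms_defined


# 62 forms are visible in the backend; the in-use count of 41 is illustrative.
print(f"{form_usage_ratio(62, 41):.0%} of forms are in use")  # 66% of forms are in use
```

Tracked over time, the ratio shows whether cleanup efforts actually reduce orphaned content.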
Level 3: Autonomous Analysis – The Agent Explores the CMS
In the previous example, a specific query serves as the starting point. At Level 3, you only describe the information need – the agent handles the rest. Input is given in natural language, e.g.: "I want to know which components are used on which pages," and the agent works through the necessary steps independently. Programming skills are not strictly required, but anyone using this level should be familiar with the system and be able to follow what the agent executes – which queries it sends, whether the results are plausible. Anyone who knows the CMS and can contextualize the inputs and outputs can use this level independently.
Using a scripting language like Python is not strictly necessary, but it does make things easier, especially in combination with GraphQL. Alternatively, pre-installed system tools like `curl` can be used if no scripting language is available. The agents know these system tools and how to use them.
The Prompt
The difference from Level 2 is already evident in the prompt: the goal is specified and Introspection is referenced as the entry point – the agent figures out the path on its own:
I want to know which CMS components are actually used on which pages – as a basis for deciding which components need to continue being maintained.
First, use GraphQL Introspection [Link to Introspection documentation] to determine all available section types in the schema. Then query for each type which pages it is used on – in both language versions and accounting for pagination.
The read token for the API is in the `.env` file in the current directory. The repository also contains existing Python code that shows how to interact with the API.

Export the result as a CSV with the columns `component_type, page_slug, language, usage_count`. Components with no usage should be listed with `usage_count = 0` and an empty `page_slug`.
GraphQL APIs offer a built-in self-description capability – known as Schema-Introspection: the API can be queried about its own schema, i.e., which data types, fields, and relationships it knows. This is the key difference from the previous example: the structure does not need to be known in advance – the agent takes over the exploration.
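To illustrate the mechanics: the `__schema` query itself is standard GraphQL Introspection, while the filtering of the result is a sketch – the `Section` naming convention and the response shape shown here are assumptions to be adapted to the actual schema:

```python
# The __schema query below is standard GraphQL introspection.
INTROSPECTION_QUERY = """
{
  __schema {
    types {
      name
      kind
    }
  }
}
"""


def section_type_names(introspection_result: dict) -> list[str]:
    """Extract section-like object types from an introspection response.

    The 'Section' suffix is an assumed naming convention -- adjust the
    filter to the actual schema of your CMS.
    """
    types = introspection_result["data"]["__schema"]["types"]
    return sorted(
        t["name"]
        for t in types
        if t["kind"] == "OBJECT" and t["name"].endswith("Section")
    )


# Illustrative response shape:
fake_result = {"data": {"__schema": {"types": [
    {"name": "HeroSection", "kind": "OBJECT"},
    {"name": "ContactFormSection", "kind": "OBJECT"},
    {"name": "Query", "kind": "OBJECT"},
    {"name": "String", "kind": "SCALAR"},
]}}}
print(section_type_names(fake_result))  # ['ContactFormSection', 'HeroSection']
```

The agent would then iterate over exactly this list of type names and query the usage of each one.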
As a result, the agent delivers not just the raw data but also prepares it – for example, as a summary showing how many components are defined in total, how many are actually embedded on pages, and which ones remain unused. It is immediately apparent where action is needed. The same approach can be applied to forms: only about two-thirds of the defined forms are actually in use. Without analysis, the unused remainder would continue to be maintained. Here too, Fitness Functions emerge – but with a higher degree of automation. The agent can independently create new evaluations when the question changes or when users have made changes to the system. The information need is described, the agent delivers the metric. This enables continuous KPI monitoring. The impact: effort is narrowed down based on data rather than assumptions.
Level 4: Controlled Changes – Systematically Updating Thousands of Content Items
Let us turn to a more complex example: for SEO optimization, blog posts are to be moved from the existing subdomain with the path scheme /year/month/slug to the main domain under /blog/slug. However, the body text of blog posts contains hard-coded links to other blog posts, whether as references or as the next part of a series. These need to be systematically updated in the body text to prevent broken links from the transition.
A modern CMS knows the relationships between content items: if a button references another page, the link remains intact even when the target is moved. The blog post content in our example, however, is stored as Markdown or Rich Text – this gives authors more flexibility and requires no CMS-specific knowledge, but it also means: links in body text are just plain text to the CMS, not managed references. As far as the CMS is concerned, the entire body of a blog post is a single long-text field. The update must therefore happen inside that text itself.
Regular expressions (text-based search patterns, comparable to an advanced find-and-replace function) seem like the obvious solution – but the variety of link patterns makes a naive implementation error-prone:
- Links exist as `http` and `https`, with and without language prefix (`/en/`), with and without trailing slash (`/slug/`).
- Search links use different parameters (`?s=` instead of `?q=`).
- Some URLs contain tracking parameters that need to be removed.
- At the same time, certain URLs must not be rewritten: links to content pages or author profiles.
- The content is Markdown with embedded HTML – code blocks and external links must not be altered.
- UTF-8 and umlauts in slug paths create additional edge cases.
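To make the pattern problem concrete, here is a heavily simplified Python sketch of such a transformation. The domain names are illustrative, and a real script would also need to implement the exceptions from the list above (code blocks, external links, content pages, author profiles):

```python
import re

OLD_HOST = "blog.example-company.com"   # illustrative domain
NEW_PREFIX = "https://www.example-company.com/blog"

# Matches http or https, a year/month path, and an optional trailing slash.
# Deliberately simplified -- no handling of query parameters, code blocks,
# or the "must not be rewritten" exceptions yet.
LINK_PATTERN = re.compile(
    r"https?://" + re.escape(OLD_HOST) + r"/\d{4}/\d{2}/(?P<slug>[^\s/)]+)/?"
)


def rewrite_links(markdown: str) -> str:
    """Rewrite old blog URLs in body text to the new scheme."""
    return LINK_PATTERN.sub(lambda m: f"{NEW_PREFIX}/{m.group('slug')}", markdown)


text = "See [part 1](http://blog.example-company.com/2024/05/my-slug/) for details."
print(rewrite_links(text))
# See [part 1](https://www.example-company.com/blog/my-slug) for details.
```

Even this toy version shows why a naive find-and-replace fails: the pattern already has to absorb the protocol variant and the optional trailing slash.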
The Prompt
Unlike the previous examples, this prompt is just the beginning of a longer conversation. Over its course, the transformation rules are further specified – such as which URL variants need to be recognized, which exceptions apply, and how edge cases are handled. Each response from GenAI delivers new code or tests that are reviewed, corrected, and extended. The specification does not emerge on paper upfront, but in dialogue with the AI based on the code generated at each step:
For SEO optimization, our blog posts are moving from `https://blog.example-company.com/2024/05/my-slug` to `https://www.example-company.com/blog/my-slug`. Internal links within blog post content pointing to other blog posts still use the old scheme and need to be updated.

Write a Python script that retrieves the content of all blog posts via the GraphQL API, identifies internal links in the Markdown body text, and transforms them to the new scheme. The repository contains existing Python code for the API integration and the `.env` file with the read token.

Start with a first draft of the transformation rules and a test file with example URLs. We will then iteratively refine the specification together.
On this basis, GenAI generates a regex-based Python script that recognizes the various URL patterns and transforms them correctly. What is crucial here is the specification-driven approach: extensive specifications define the transformation rules, edge cases are captured as test specifications, tests run against the real API (read token for testing, write token for production), and the generated code undergoes a review before write access is granted. Especially with mass operations on production content, the edge cases are decisive. A purpose-built test blog post bundles all known edge cases and can be repeatedly processed by the script to ensure correctness. Parameterized tests systematically verify each URL variant: what should be transformed, what must remain unchanged.
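The parameterized tests mentioned above can be sketched as a table of (input, expected) pairs – here with plain assertions and a deliberately minimal stand-in transformation; with pytest, the same table would feed `@pytest.mark.parametrize`:

```python
import re

# Deliberately minimal stand-in for the generated transformation.
PATTERN = re.compile(r"https?://blog\.example-company\.com/\d{4}/\d{2}/([^\s/)]+)/?")


def transform(url: str) -> str:
    return PATTERN.sub(r"https://www.example-company.com/blog/\1", url)


# Each case pairs an input with the expected output; URLs that must stay
# untouched simply map to themselves.
CASES = [
    ("http://blog.example-company.com/2024/05/my-slug",
     "https://www.example-company.com/blog/my-slug"),
    ("https://blog.example-company.com/2024/05/my-slug/",
     "https://www.example-company.com/blog/my-slug"),
    # Content pages must not be rewritten:
    ("https://www.example-company.com/en/services",
     "https://www.example-company.com/en/services"),
]

for url, expected in CASES:
    assert transform(url) == expected, f"unexpected result for {url}"
print("all cases passed")
```

Every newly discovered edge case becomes one more row in the table – the test dataset grows alongside the specification.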
Write access to production content requires guidance from software developers – both to understand the generated code and to ensure the right approach: test coverage, code review, controlled execution, to name just a few criteria. The consequences if the automated changes do not match the desired outcome can be severe. From hundreds of broken links to incomplete link texts and "�" characters instead of umlauts to truncated content – many things are possible. The reputational damage, both internal and external, is certain, while restoring the backup from last month also rolls back the work and changes of team colleagues. Experienced developers know that automated processing of several thousand blog posts creates additional server load, and extend the script with a batch function. This allows small batches to be tested first and manually reviewed before larger batches are processed.
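The batch function itself can be as simple as a generator that slices the full list of posts – a sketch with placeholder post IDs:

```python
from typing import Iterator


def batches(items: list, size: int) -> Iterator[list]:
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Process a small first batch, review the result manually, then continue.
post_ids = [f"post-{i}" for i in range(1, 11)]  # placeholder IDs
first_batch, *remaining = batches(post_ids, 3)
print(first_batch)  # ['post-1', 'post-2', 'post-3']
```

The point is less the code than the workflow it enables: a manual checkpoint after the first small batch, before the remaining thousands of posts are touched.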
The result is a script that can reliably correct several thousand blog posts – automated, reproducible, testable. Do I still have concerns before the first production run that something might go wrong? Yes – and as a human, I go through my mental checklist one more time and start with small batches, while the ever-optimistic AI agent claims after each iteration that the script is now ready and wants to run it immediately on all blog posts.
Success Factors – What We Learned
Good context is decisive. The better the provided API documentation and examples, the more useful the result. A working query as a starting point makes a real difference.
Iterative refinement pays off. GenAI's first attempt is rarely perfect – but corrections and extensions come quickly, as they build on what has already been generated. With each prompt, the analysis is gradually refined. It should also be noted that results can vary depending on the GenAI model and version. Course corrections within the conversation are therefore usually necessary and are a normal part of the workflow.
Testing is indispensable. Especially with mass operations on production content, edge cases should be systematically collected and maintained as test data. A dedicated test dataset with known special cases saves considerable time in the long run.
Purpose-built tooling is perfectly fine. The scripts don't need to be production-grade. They serve a concrete, time-limited purpose – and that is exactly where GenAI as an accelerator is ideal.
The human steers, the AI accelerates. Domain knowledge about the system, its history, and its quirks stays with the human. GenAI delivers execution speed, not the content strategy. The intensity of human control changes across the levels: at Level 1, every step is guided; at Level 4, the goal is set and what the agent produces is reviewed – but decision authority remains with the human. The corollary: the more autonomy GenAI is given, the more important safeguards and verification become. With read-only access, the risk is lower. With write access to production content, thorough testing, code review, and controlled execution are indispensable – a guiding principle that my colleague also emphasizes in his blog post.
The levels build on each other. Nobody starts at Level 3 without prior experience. The levels form a learning curve: anyone who hits the limits of their understanding at Level 2 or 3 – for instance with a shell command or an API response – can always fall back to Level 1 and have GenAI explain what is happening. This way, your own understanding grows with each level. At the same time: not every problem requires the highest level. If you just want to clarify a question, you don't need an agent – a targeted research session at Level 1 is sufficient. The right level follows from the specific need, not from what is technically possible.
Closing the knowledge gap. Those who know the system can use GenAI to independently obtain answers for analysis – without having to wait for engineering capacity. However, anyone letting agents act autonomously should always understand what they are executing; beyond a certain point, guidance from developers is essential. But even then, the nature of the collaboration changes: the business side no longer needs to translate its concerns into technical requirements – it can articulate them in its own language, and GenAI handles the translation into the technical domain.
Analysis phase before execution phase. Fitness Functions provide the data foundation for informed decisions. Only once the analysis is in place – once it is clear which components are in use, which forms are orphaned, which dependencies exist – does execution follow.
Conclusion – GenAI as a Bridge Between Business and Technology
The four examples show: GenAI delivers its value not only when writing application code. Even in research, analysis, and data-driven decision-making, it creates tangible benefits. This changes the collaboration between the business side and engineering. With graduated autonomy across levels, domain experts can increasingly obtain their own answers – and even where guidance from developers remains necessary, the conversation shifts: away from technical implementation details, toward describing the business objective.
For the business side, this means concretely: instead of waiting for engineering capacity – which in many business departments simply does not exist – a large portion of the analysis can be carried out independently. The bottleneck shifts from technical execution to enablement in the use of GenAI. And this enablement is significantly easier to acquire than dedicated engineering resources.
If you'd like to go deeper: in our webinar on AI-assisted software modernization (German), we demonstrate with another practical example how GenAI supports modernization efforts. If you want to find out how to enable your own department with GenAI, our workshop on generative AI use cases offers a structured entry point.
What challenges have you already had a tool generated for? Share your experiences – we're curious to hear.
Blog author
Patrick Krings
IT Consultant & Developer
Do you still have questions? Just send me a message.