Ensuring accessibility with AI: what works today (and what doesn't)

2.6.2026 | 11 minutes reading time

Since June 2025, the Barrierefreiheitsstärkungsgesetz (BFSG), Germany's law implementing the European Accessibility Act, has been in effect. Most teams know they should be doing something about it, but in day-to-day work, the topic usually falls by the wayside. Too much effort, too specialized, not enough expertise on the team, and besides, there are more important things to do. At the same time, AI tooling is exploding. The tempting question is obvious: Can AI save us the effort of ensuring accessibility, or are we still far from that? I set out to explore this question. I tried different AI tools and approaches to see how far you can get with AI and accessibility as of May 2026. I'd like to share my findings with you.

AI and accessibility: two sides

In general, AI can support us on the topic of accessibility from two different directions: from the user perspective and from the developer perspective.

From the user perspective, AI could help by interpreting visual content and assisting users in navigating the internet. Examples include Seeing AI, UserWay or Be My AI, which promises:

"No more guessing. Whether it's a menu, a document or a tricky interface, AI helps you navigate with confidence."

However, this comes with certain risks and has its limitations. Descriptions can be inaccurate or even incorrect, and for motor impairments, for example, it doesn't help at all. Moreover, you can't assume that people with impairments will purchase such expensive devices or subscriptions. Therefore, the responsibility remains with us developers to ensure accessibility.

So: How can AI support us on the developer side? Let's take a look.

Where we stand today

Before we talk about tools, a quick reality check. Automated accessibility testing engines like axe-core detect between 20% and 57% of WCAG violations, depending on the study.

WCAG (Web Content Accessibility Guidelines) are international guidelines for making websites and digital content accessible.

The often-cited 57% figure from Deque is based on a modified metric that counts issues by frequency rather than by WCAG criteria. Independent comparative tests arrive at significantly lower numbers. The remaining violations cannot be detected automatically and must be found and assessed manually.

This is because different aspects of accessibility lend themselves to automation to varying degrees. Color contrasts, missing alt texts for images, or incorrect ARIA attributes (additional information for screen readers) can be reliably detected by a tool. Whether a heading hierarchy makes sense in terms of content is already more difficult. And whether a person using a screen reader can actually understand and operate the page would need to be tested manually by a human. Or can AI do it already?
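
Color contrast is a good illustration of why some checks automate so well: the rule is a formula. The following is a minimal sketch of the WCAG 2.x contrast ratio calculation, assuming colors arrive as six-digit hex strings. The function names are made up for this example and are not taken from axe-core or any other tool.

```typescript
// WCAG 2.x contrast ratio between two sRGB colors given as "#rrggbb".
type Rgb = [number, number, number];

function parseHex(hex: string): Rgb {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`Unsupported color format: ${hex}`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Relative luminance as defined in WCAG 2.x.
function relativeLuminance([r, g, b]: Rgb): number {
  const linear = (channel: number): number => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(parseHex(foreground));
  const b = relativeLuminance(parseHex(background));
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG 1.4.3 (Level AA) asks for at least 4.5:1 for normal-size text.
function meetsAaForNormalText(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

A check like `meetsAaForNormalText("#767676", "#ffffff")` needs no understanding of the page, which is exactly why a scanner can run it reliably. Whether a heading hierarchy makes sense has no comparable formula.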

The experiment

I took an application that wasn't fully accessible and tried to improve it using only AI tools. I skipped explicit accessibility checks in E2E tests and didn't test the page manually. Instead, I relied entirely on AI tools. The AI was supposed to understand the use cases from the code and test the page independently. Ideally, it should also test components that weren't directly visible, such as error messages or toasts.

I tried several approaches: a CI scanner that automatically creates issues; Claude Code with and without tools; specialized MCP servers; and a two-agent workflow that ultimately found over twenty real problems. I'd like to share what came out of each approach.

Level 1: The CI scanner: scan and fix

First, the idealistic vision: a tool scans the finished application, finds problems, and ideally fixes them itself. The GitHub AI-powered Accessibility Scanner was supposed to do exactly that. You define URLs, the scanner checks them, takes screenshots along the way, creates a GitHub issue for each problem, and optionally tasks Copilot with creating a fix as a pull request. The idea appealed to me immediately: aside from the CI pipeline configuration, you don't have to touch the code yourself. Everything else runs automatically.

Disclaimer: As of May 2026, the scanner is still in Public Preview and is being actively developed; the limitations described may change.

In practice, I ran into problems. The scanner couldn't log in because it looks for hard-coded labels in the login form: username and password. If the labels have different names or the app isn't in English, it bails out. This could be solved with a workaround: I handled authentication via an API call upfront and passed in the session cookies.

However, even with that, I didn't get real results: the scanner runs the axe scan before React has finished rendering the page, so it checks an empty DOM. It reported errors that didn't actually exist and immediately created more than 15 repetitive issues, one per URL. I couldn't solve the empty DOM problem.
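
The flood of repetitive issues could in principle be tamed with a small post-processing step that groups identical findings across URLs before anything is filed. The sketch below assumes a simplified, hypothetical `Finding` shape; it is not the scanner's real output format, and the scanner itself offers no such hook as far as I know.

```typescript
// Hypothetical finding shape; not the scanner's real output format.
interface Finding {
  url: string;
  ruleId: string; // e.g. an axe rule id such as "color-contrast"
  selector: string; // CSS selector of the affected element
}

interface GroupedFinding {
  ruleId: string;
  selector: string;
  urls: string[];
}

// Collapse findings that share rule and selector into one entry listing all URLs.
function groupFindings(findings: Finding[]): GroupedFinding[] {
  const groups = new Map<string, GroupedFinding>();
  for (const { url, ruleId, selector } of findings) {
    const key = `${ruleId}\u0000${selector}`;
    const group = groups.get(key);
    if (group) {
      if (group.urls.indexOf(url) === -1) group.urls.push(url);
    } else {
      groups.set(key, { ruleId, selector, urls: [url] });
    }
  }
  return Array.from(groups.values());
}
```

With this grouping, the same violation in a shared header would produce one entry with a list of affected URLs instead of one issue per page.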

And ultimately, under the hood it's just an axe scan, which isn't really "AI." It only checks the initially loaded view: no modals, no forms, no keyboard navigation, no VoiceOver, and above all, no assessment of "is it actually usable?" Even if the client-side rendering problem were solved, the setup wouldn't have been worth the effort to me: you can build axe-core into your existing E2E tests faster than going through the whole scanner setup.

Level 2: Claude Code

The next approach was simpler: I gave Claude Code a maximally simple and deliberately general prompt: "Make my app accessible." Without additional tools, Claude looks at the source code and fixes what it finds.

I had expected Claude to only find a few obvious missing aria-label attributes, but it found significantly more, for example:

missing button semantics,
non-focusable elements with click handlers,
contrast issues,
missing ARIA states,
<div> elements acting as buttons,
issues with the heading hierarchy
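
Several of these findings (non-focusable elements with click handlers, `<div>` elements acting as buttons) share one fix pattern. The preferred fix is a native `<button>`. Where that isn't possible, the element needs a role, a place in the tab order, and keyboard handling. The following is a minimal sketch of that fallback, not the code Claude actually generated; `buttonLikeProps` is a hypothetical helper, and the prop names follow React's conventions.

```typescript
// Minimal structural type for the keyboard event fields used below.
interface KeyLikeEvent {
  key: string;
  preventDefault(): void;
}

// Props that make a non-button element behave like a button for keyboard
// and screen reader users. Prefer a native <button> whenever possible.
function buttonLikeProps(onActivate: () => void) {
  return {
    role: "button" as const,
    tabIndex: 0, // puts the element into the tab order
    onClick: () => onActivate(),
    onKeyDown: (event: KeyLikeEvent) => {
      // Simplified: activate on Enter and Space, like a native button.
      if (event.key === "Enter" || event.key === " ") {
        event.preventDefault(); // keeps Space from scrolling the page
        onActivate();
      }
    },
  };
}
```

In a React component, this could be spread onto the element, for example `<div {...buttonLikeProps(save)}>Save</div>`, where `save` stands for whatever the click handler did before.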

With Playwright MCP, an interface through which AI assistants can call external browser tools, the approach can be extended: Claude can open the running app, navigate through pages, find runtime issues, and fix them directly. It discovered things that the static code review couldn't find:

page titles didn't update on route changes
a combobox had aria-label="null" as a string — a runtime bug that looks like a valid variable in the code
clickable table rows were partially unreachable for keyboard users
the login form used only placeholders instead of real labels, hard to spot in the code since an external component library is used.
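
The `aria-label="null"` case shows why such bugs are invisible in a static review. I can't say exactly how the value arose in this app, but one common way is a nullable value being converted to a string. The sketch below illustrates that assumption; both function names are hypothetical.

```typescript
// A nullable value interpolated into a string becomes the text "null".
function naiveAriaLabel(selected: string | null): string {
  return `${selected}`; // yields "null" when nothing is selected
}

// Safer variant: fall back to a meaningful label when the value is
// missing or consists only of whitespace.
function safeAriaLabel(selected: string | null | undefined, fallback: string): string {
  const trimmed = selected?.trim();
  return trimmed ? trimmed : fallback;
}
```

In the source code, `naiveAriaLabel(selection)` looks like a perfectly valid label. Only at runtime, with an empty selection, does a screen reader announce the word "null".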

The page got better as a result, but it was still far from fully accessible.

The problem is the lack of a systematic approach. Claude does reference WCAG rules when it finds problems, but it doesn't work through a checklist. It walks through the app and fixes what catches its eye. Playwright MCP's browser_snapshot would have returned the accessibility tree, but Claude never called it. With it, Claude could have seen which elements have no understandable name for screen readers and which roles are missing or incorrectly set. The landmark structure from a screen reader's perspective would have been visible too. Instead, it relied on what was visually apparent. In the end, the app was better than before, but I had no confidence in what was still missing.
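
To make concrete what an accessibility tree reveals, here is a minimal sketch that walks a tree and reports interactive nodes without an accessible name. The `A11yNode` shape is a simplified assumption for illustration; the actual output format of browser_snapshot differs, and the list of roles is deliberately incomplete.

```typescript
// Hypothetical, simplified node shape; the real snapshot format differs.
interface A11yNode {
  role: string;
  name?: string;
  children?: A11yNode[];
}

// A few roles that need an accessible name to be usable with a screen reader.
const ROLES_NEEDING_NAME = new Set(["button", "link", "textbox", "combobox", "checkbox"]);

// Depth-first walk collecting the path to every such node without a name.
function findUnnamed(node: A11yNode, path: string[] = []): string[] {
  const here = [...path, node.role];
  const hits: string[] = [];
  if (ROLES_NEEDING_NAME.has(node.role) && !(node.name ?? "").trim()) {
    hits.push(here.join(" > "));
  }
  for (const child of node.children ?? []) {
    hits.push(...findUnnamed(child, here));
  }
  return hits;
}
```

An icon-only button that looks fine on screen shows up in such a walk immediately, because the tree contains a button role with an empty name.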

Level 3: MCP servers for accessibility

With MCP, AI assistants can call external tools that are predefined and built for the task at hand. From specialized MCP servers for accessibility, I hoped for two things: a more systematic process and qualitatively better results.

I tried different open-source MCP servers. Unfortunately, when it came to live testing, they all had the same login problem as the GitHub Accessibility Scanner. I managed to solve it with a workaround for one server.

The MCP server MCP Accessibility Scanner runs as a Docker container with its own Playwright browser and comes with over 30 different tools. Besides the usual browser tools (navigate, click, type, screenshot), it has dedicated accessibility tools:

`scan_page`	axe-core scan with configurable WCAG tag filters (WCAG 2.0 to 2.2, Level A to AAA)
`audit_keyboard`	simulates tab navigation, checks skip links, focus visibility, and focus traps
`scan_page_matrix`	repeats the scan automatically across different viewports, zoom levels, and media features (forced colors, reduced motion)
`audit_site`	crawler that follows internal links and scans all pages found
`browser_snapshot`	returns the accessibility tree of the page as a screen reader sees it

The server couldn't log in out of the box either, but with Claude's help and a workaround it managed: the Vite configuration was adjusted, and the login went through native JavaScript setters instead of the built-in browser tools. The result was qualitatively different from the previous levels: a systematic keyboard audit uncovered invisible focus points and focus jumps. It also found a touch target size issue under WCAG 2.2, which a normal axe scan doesn't even check. All results came back as reports with concrete WCAG references.
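
The touch target finding refers to WCAG 2.2 success criterion 2.5.8, Target Size (Minimum), which asks for targets of at least 24 by 24 CSS pixels. The following is a simplified sketch of such a check, not the MCP server's implementation: it ignores the exceptions the criterion allows (for example sufficient spacing or inline links), and the names are hypothetical.

```typescript
interface Box {
  width: number; // CSS pixels
  height: number; // CSS pixels
}

const MIN_TARGET_SIZE = 24; // WCAG 2.2, SC 2.5.8 Target Size (Minimum)

// Simplified check: ignores the spacing, inline, and other exceptions of SC 2.5.8.
function meetsMinimumTargetSize(box: Box): boolean {
  return box.width >= MIN_TARGET_SIZE && box.height >= MIN_TARGET_SIZE;
}

// Returns the names of all measured targets that are too small.
function undersizedTargets(targets: Record<string, Box>): string[] {
  return Object.keys(targets).filter((name) => !meetsMinimumTargetSize(targets[name]));
}
```

The measurement itself is trivial once the rendered size is known. The value of the MCP server lies in collecting those sizes from a real browser across viewports and zoom levels.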

The MCP server itself finds problems and reports them, but it doesn't understand the code and doesn't suggest fixes. It also only tests the specified URLs in their initial state, so dynamic content like dialogs or error messages goes untested. An AI agent could work through the findings and change code, but without expertise in accessibility best practices, the result wouldn't be better than Level 2. The next question was: What happens when you deploy specialized agents that know what to look for?

Level 4: The two-agent workflow

I wanted to go a step further and deploy agents specialized in accessibility. At Awesome GitHub Copilot, you can find various open-source agents. I "hired" an Accessibility Expert and an Accessibility Runtime Tester for the project and ran them in parallel.

The Accessibility Expert analyzes source code against WCAG 2.1/2.2: semantic correctness, keyboard and focus behavior, ARIA roles and states, form labeling, focus management in modals, live regions for dynamic updates. It follows a defined protocol and also checks components that aren't currently visible on screen, which is the key difference from Level 2.

The Accessibility Runtime Tester navigates through critical user flows via keyboard, opens dialogs and checks focus behavior, injects axe-core with WCAG filters, and uses browser_snapshot to see which elements have no name for screen readers. The tester's guiding question is: "Can a keyboard user complete this flow from start to finish?"

The two agents can link their findings, discuss with each other, and help each other out. The Accessibility Expert identified an interactive element without keyboard support in the code, and the Runtime Tester confirmed that it wasn't reachable via keyboard. A static scan delivers only one of those two pieces of information.
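
The agents exchange their findings in natural language, so there is no fixed data format behind this. Still, the linking step can be pictured as a simple join. The sketch below uses hypothetical shapes for both kinds of findings and assumes both agents refer to elements by the same selector.

```typescript
// Hypothetical shapes for the two agents' findings.
interface StaticFinding {
  component: string;
  selector: string;
  issue: string;
}

interface RuntimeFinding {
  selector: string;
  reachableByKeyboard: boolean;
}

interface ConfirmedFinding extends StaticFinding {
  confirmedAtRuntime: boolean;
}

// Mark each static finding as confirmed when the runtime test could not
// reach the same element via keyboard.
function correlate(
  staticFindings: StaticFinding[],
  runtimeFindings: RuntimeFinding[],
): ConfirmedFinding[] {
  const unreachable = new Set(
    runtimeFindings.filter((f) => !f.reachableByKeyboard).map((f) => f.selector),
  );
  return staticFindings.map((f) => ({
    ...f,
    confirmedAtRuntime: unreachable.has(f.selector),
  }));
}
```

A finding that is both suspicious in the code and confirmed in the browser is a much stronger signal than either observation on its own.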

In the end, the agents found significantly more problems than Claude alone in Level 2, presumably thanks to best practices and concrete checklists. Even here, though, I had to emphasize that they should please open ALL popups and modals and please click through EVERYWHERE.

A few errors that were found with the MCP server's tools in Level 3 were missed by the agents, but they additionally tested user flows. What was newly discovered:

interactive elements that work via click but aren't reachable via keyboard because role, tabIndex, or keyboard handlers are missing
focus after closing a dialog: focus jumps to <body> instead of returning to the triggering element
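
The second finding has a well-known fix pattern: remember the element that opened the dialog and return focus to it on close. The following is a minimal sketch of that pattern, not the fix applied in the project. `FocusRestorer` is a hypothetical helper, and `Focusable` is a structural stand-in so the example runs without a DOM.

```typescript
// Minimal structural type; a DOM HTMLElement satisfies it.
interface Focusable {
  focus(): void;
}

// Remembers which element opened a dialog and returns focus to it on close.
class FocusRestorer {
  private trigger: Focusable | null = null;

  // Call when the dialog opens, passing the element that triggered it
  // (in a browser, typically document.activeElement narrowed to HTMLElement).
  remember(trigger: Focusable | null): void {
    this.trigger = trigger;
  }

  // Call after the dialog has closed.
  restore(): void {
    this.trigger?.focus();
    this.trigger = null;
  }
}
```

Without this step, a keyboard user who closes the dialog starts again at the top of the page, which is exactly the jump to `<body>` the Runtime Tester observed.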

The setup effort is the lowest of all tested approaches: drop two Markdown files into the project, done. No npm install, no Docker, no CLI configuration. The result isn't deterministic and isn't suitable as a CI check, but for a one-time structured review from which you can derive targeted E2E tests, I was able to achieve quite convincing results.

Level 5: Combining agents and MCP servers

The logical consequence of everything: deploy the specialized agents from Level 4 together with the MCP server from Level 3. When the Accessibility Runtime Tester has access to the MCP server's tools, it can build them directly into its testing workflow instead of navigating everything manually.

The MCP server provides measurements: contrast ratios, pixel sizes, tab orders. The agent interprets them, links them to the code, and can assess whether a problem actually blocks a user. The tools provide more context, which hopefully means we can expect more precise findings.

The combination of specialized agents and MCP servers delivers more than any approach alone: deterministic measurements from the MCP tools, contextual understanding from the agent, and user flows that a pure scanner would never cover.

Conclusion

Can you build a fully accessible application with AI alone? Based on my experiments: not yet. But you can get significantly further than I expected.

Above all, AI lets you quickly find and fix a lot of obvious problems without specialized knowledge. That makes applications more accessible than before, but not 100% accessible, even if the AI claims otherwise: "The tool says it's okay" doesn't mean "It's okay."

The example of accessibility overlays illustrates this once more: tools like accessiBe promised to make any website accessible via an AI widget. In early 2025, the US Federal Trade Commission ordered the company to pay one million dollars over misleading claims.

The specialized agents partially manage to trigger and check dynamic elements like toasts, error messages, or dialogs, but not all of them. With documented user stories as input, this would presumably be more reliable, provided the documentation already exists. And even then: whether VoiceOver announces a button at the right moment depends on the context and the sequence of interactions. Non-technical requirements that need contextual knowledge can't be evaluated by any tool: Is the page structure logical? Are form elements meaningfully labeled? Is the error handling understandable?

The ecosystem is evolving rapidly. What didn't exist a year ago is now available as an open-source project and works: MCP as a standard, specialized accessibility agents, axe-core directly in the IDE. The tools are still young, but the direction is right. AI is already making accessibility audits significantly more efficient today, but it replaces neither the understanding of what accessibility truly means nor manual validation.


Blog author

Elina Onchul
