Ensuring accessibility with AI: what works today (and what doesn't)

2.6.2026 | 11 minutes reading time

Since June 2025, the Barrierefreiheitsstärkungsgesetz (BFSG), Germany's law implementing the European Accessibility Act, has been in effect. Most teams know they should be doing something about it, but in day-to-day work, the topic usually falls by the wayside. Too much effort, too specialized, not enough expertise on the team, and besides, there are more important things to do. At the same time, AI tooling is exploding. The tempting question is obvious: Can AI save us the effort of ensuring accessibility, or are we still far from that? I set out to explore this question. I tried different AI tools and approaches to see how far you can get with AI and accessibility as of May 2026. I'd like to share my findings with you.

AI and accessibility: two sides

In general, AI can support us on the topic of accessibility from two different directions: from the user perspective and from the developer perspective.

From the user perspective, AI could help by interpreting visual content and assisting users in navigating the internet. Examples include Seeing AI, UserWay or Be My AI, which promises:

"No more guessing. Whether it's a menu, a document or a tricky interface, AI helps you navigate with confidence".

However, this comes with risks and limitations. Descriptions can be inaccurate or even incorrect, and such tools don't help at all with motor impairments, for example. Moreover, you can't assume that people with impairments will purchase such expensive devices or subscriptions. Therefore, the responsibility for ensuring accessibility remains with us developers.

So: How can AI support us on the developer side? Let's take a look.

Where we stand today

Before we talk about tools, a quick reality check. Automated accessibility testing engines like axe-core detect between 20% and 57% of WCAG violations, depending on the study.

WCAG (Web Content Accessibility Guidelines) are international guidelines for making websites and digital content accessible.

The often-cited 57% figure from Deque is based on a modified metric that counts issues by frequency rather than by WCAG criteria. Independent comparative tests arrive at significantly lower numbers. The remaining violations cannot be detected automatically and must be found and assessed manually.

This is because different aspects of accessibility lend themselves to automation to varying degrees. Color contrasts, missing alt texts for images, or incorrect ARIA attributes (additional information for screen readers) can be reliably detected by a tool. Whether a heading hierarchy makes sense in terms of content is already more difficult. And whether a person using a screen reader can actually understand and operate the page would need to be tested manually by a human. Or can AI do it already?
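
To illustrate why color contrast is so easy to automate, here is a minimal sketch of the contrast calculation as WCAG 2.x defines it (relative luminance, then the ratio of the lighter to the darker color). The function names and the restriction to "#rrggbb" colors are assumptions made for this example; real engines such as axe-core additionally have to resolve transparency, background images, and font sizes.

```typescript
type Rgb = [number, number, number];

// Parse "#rrggbb" into channel values 0..255 (illustrative helper).
function parseHex(hex: string): Rgb {
  const digits = hex.replace(/^#/, "");
  if (!/^[0-9a-f]{6}$/i.test(digits)) {
    throw new Error(`Unsupported color format: ${hex}`);
  }
  const value = parseInt(digits, 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Relative luminance as defined in WCAG 2.x for sRGB colors.
function relativeLuminance([r, g, b]: Rgb): number {
  const linear = (channel: number): number => {
    const c = channel / 255;
    // 0.03928 is the threshold used in the WCAG 2.x definition.
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio between two colors, from 1 (identical) up to 21.
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(parseHex(foreground));
  const l2 = relativeLuminance(parseHex(background));
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG 1.4.3 (Level AA) asks for at least 4.5:1 for normal-size text.
function meetsAaForNormalText(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

A check like this is a pure calculation with a fixed threshold, which is exactly the kind of rule a tool can apply reliably. Whether the text is still readable in context is a different question.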

The experiment

I took an application that wasn't fully accessible and tried to improve it using only AI tools. I skipped explicit accessibility checks in E2E tests and didn't test the page manually. Instead, I relied entirely on AI tools. The AI was supposed to understand the use cases from the code and test the page independently. Ideally, it should also test components that weren't directly visible, such as error messages or toasts.

I tried several approaches: a CI scanner that automatically creates issues; Claude Code with and without tools; specialized MCP servers; and a two-agent workflow that ultimately found over twenty real problems. I'd like to share what came out of each approach.

Level 1: The CI scanner: scan and fix

First, the idealistic vision: a tool scans the finished application, finds problems, and ideally fixes them itself. The GitHub AI-powered Accessibility Scanner was supposed to do exactly that. You define URLs, the scanner checks them, takes screenshots along the way, creates a GitHub issue for each problem, and optionally tasks Copilot with creating a fix as a pull request. The idea appealed to me immediately: aside from the CI pipeline configuration, you don't have to touch the code yourself. Everything else runs automatically.

Disclaimer: As of May 2026, the scanner is still in Public Preview and is being actively developed; the limitations described may change.

In practice, I ran into problems. The scanner couldn't log in because it looks for hard-coded labels in the login form: username and password. If the labels have different names or the app isn't in English, it bails out. This could be solved with a workaround: I handled authentication via an API call upfront and passed in the session cookies.

However, even with that, I didn't get real results: the scanner runs the axe scan before React has finished rendering the page, so it checks an empty DOM. It reported errors that didn't actually exist and immediately created more than 15 repetitive issues, one per URL. I couldn't solve the empty DOM problem.
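
The flood of repetitive issues is a reporting problem that can be addressed independently of the scanner. As a rough sketch, findings could be grouped by rule and element before any issue is created, so that the same violation on fifteen URLs becomes one issue listing fifteen URLs. The `Finding` shape below is a hypothetical simplification for this example, not the scanner's actual output format.

```typescript
// Hypothetical shape of a single finding; not the scanner's real schema.
interface Finding {
  ruleId: string;   // e.g. an axe-core rule id such as "color-contrast"
  selector: string; // CSS selector of the offending element
  url: string;      // page on which the finding occurred
}

interface GroupedIssue {
  ruleId: string;
  selector: string;
  urls: string[];
}

// Collapse findings that share rule and selector into one issue per group.
function groupFindings(findings: Finding[]): GroupedIssue[] {
  const groups = new Map<string, GroupedIssue>();
  for (const { ruleId, selector, url } of findings) {
    const key = `${ruleId}|${selector}`;
    let group = groups.get(key);
    if (!group) {
      group = { ruleId, selector, urls: [] };
      groups.set(key, group);
    }
    // Record each URL only once per group.
    if (group.urls.indexOf(url) === -1) {
      group.urls.push(url);
    }
  }
  return Array.from(groups.values());
}
```

Grouping by rule and selector works well for shared layout elements such as navigation or footer. It does not help with false positives from an empty DOM, which have to be fixed at the scanning step itself.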

And ultimately, under the hood it's just an axe scan, which isn't really "AI." It only checks the initially loaded view: no modals, no forms, no keyboard navigation, no VoiceOver, and above all, no assessment of "is it actually usable?" Even if the client-side rendering problem were solved, it wasn't worth the effort to me: you can build axe-core into your existing E2E tests faster than going through the whole setup.

Level 2: Claude Code

The next approach was simpler. I gave Claude Code a deliberately minimal, general prompt: "Make my app accessible." Without additional tools, Claude looks at the source code and fixes what it finds.

I had expected Claude to only find a few obvious missing aria-label attributes, but it found significantly more, for example:

  • missing button semantics,
  • non-focusable elements with click handlers,
  • contrast issues,
  • missing ARIA states,
  • <div> elements acting as buttons,
  • issues with the heading hierarchy
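
The structural part of the last item can be checked mechanically. The sketch below flags places where a heading level is skipped on the way down, for example an h2 followed directly by an h4. The function name and the input format (a list of heading levels in document order) are assumptions for this example, and the check says nothing about whether the headings make sense in terms of content.

```typescript
interface SkippedLevel {
  index: number; // position of the offending heading in document order
  from: number;  // level of the previous heading
  to: number;    // level of the offending heading
}

// Flag headings that jump more than one level deeper than their predecessor.
function findSkippedHeadingLevels(levels: number[]): SkippedLevel[] {
  const skipped: SkippedLevel[] = [];
  for (let i = 1; i < levels.length; i++) {
    const previous = levels[i - 1];
    const current = levels[i];
    // Going back up (e.g. h4 to h2) is fine; only deeper jumps are flagged.
    if (current > previous + 1) {
      skipped.push({ index: i, from: previous, to: current });
    }
  }
  return skipped;
}
```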

With Playwright MCP, an interface through which AI assistants can call external browser tools, the approach can be extended: Claude can open the running app, navigate through pages, find runtime issues, and fix them directly. It discovered things that the static code review couldn't find:

  • page titles didn't update on route changes
  • a combobox had aria-label="null" as a string: a runtime bug that is invisible in the code, where the attribute appears to be bound to a valid variable
  • clickable table rows were partially unreachable for keyboard users
  • the login form used only placeholders instead of real labels, hard to spot in the code since an external component library is used.
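
Findings like the aria-label="null" case only show up in the rendered output, but once you have the rendered attributes, the check itself is simple. The following sketch looks for naming attributes whose value looks like a stringified missing variable. The `RenderedElement` shape and the lists of attributes and placeholder values are assumptions for this example.

```typescript
// Hypothetical, simplified view of an element as rendered in the browser.
interface RenderedElement {
  selector: string;
  attributes: Record<string, string>;
}

interface SuspiciousAttribute {
  selector: string;
  attribute: string;
  value: string;
}

// Attributes that contribute to what a screen reader announces.
const NAME_ATTRIBUTES = ["aria-label", "alt", "title"];

// Values that typically result from stringifying a missing variable.
const STRINGIFIED_PLACEHOLDERS = ["null", "undefined", "NaN", "[object Object]"];

function findStringifiedNulls(elements: RenderedElement[]): SuspiciousAttribute[] {
  const result: SuspiciousAttribute[] = [];
  for (const element of elements) {
    for (const attribute of NAME_ATTRIBUTES) {
      const value = element.attributes[attribute];
      if (value !== undefined && STRINGIFIED_PLACEHOLDERS.indexOf(value.trim()) !== -1) {
        result.push({ selector: element.selector, attribute, value });
      }
    }
  }
  return result;
}
```

An empty alt attribute is deliberately not flagged here, since alt="" is the valid way to mark a decorative image.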

The page got better as a result, but it was still far from fully accessible.

The problem is the lack of a systematic approach. Claude does reference WCAG rules when it finds problems, but it doesn't work through a checklist. It walks through the app and fixes what catches its eye. Playwright MCP's browser_snapshot would have returned the accessibility tree, but it wasn't even used. With it, Claude could have seen which elements have no understandable name for screen readers and which roles are missing or incorrectly set. The landmark structure from a screen reader's perspective would have been visible too. Instead, it relied on what was visually apparent. In the end, the app was better than before, but I had no confidence in what was still missing.
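
To show what a systematic pass over the accessibility tree could look like, here is a small sketch that walks a tree and reports every interactive node without an accessible name. The `A11yNode` shape is a simplified assumption for this example and not the actual output format of browser_snapshot; the list of interactive roles is likewise only a selection.

```typescript
// Simplified, hypothetical node shape; real snapshot formats differ.
interface A11yNode {
  role: string;
  name?: string;
  children?: A11yNode[];
}

// A selection of ARIA roles that a user is expected to operate.
const INTERACTIVE_ROLES = [
  "button", "link", "textbox", "combobox", "checkbox",
  "radio", "menuitem", "tab", "switch", "slider",
];

// Return the role path (e.g. "main > form > textbox") of every
// interactive node that has no non-empty accessible name.
function findUnnamedInteractive(node: A11yNode, path: string[] = []): string[] {
  const here = path.concat(node.role);
  const findings: string[] = [];
  const isInteractive = INTERACTIVE_ROLES.indexOf(node.role) !== -1;
  const hasName = (node.name ?? "").trim().length > 0;
  if (isInteractive && !hasName) {
    findings.push(here.join(" > "));
  }
  for (const child of node.children ?? []) {
    findings.push(...findUnnamedInteractive(child, here));
  }
  return findings;
}
```

The value of such a pass is less the check itself than the coverage: every node is visited, not just the ones that happen to stand out visually.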

Level 3: MCP servers for accessibility

With MCP, AI assistants can call external tools that are specifically predefined and developed for the task at hand. From this step, I hoped to gain more systematic structure with specialized MCP servers for accessibility on one hand, and qualitatively better results on the other.

I tried different open-source MCP servers. Unfortunately, when it came to live testing, they all had the same login problem as the GitHub Accessibility Scanner. I managed to solve it with a workaround for one server.

The MCP server MCP Accessibility Scanner runs as a Docker container with its own Playwright browser and comes with over 30 different tools. Besides the usual browser tools (navigate, click, type, screenshot), it has dedicated accessibility tools:

  • scan_page: axe-core scan with configurable WCAG tag filters (WCAG 2.0 to 2.2, Level A to AAA)
  • audit_keyboard: simulates tab navigation, checks skip links, focus visibility, and focus traps
  • scan_page_matrix: repeats the scan automatically across different viewports, zoom levels, and media features (forced colors, reduced motion)
  • audit_site: crawler that follows internal links and scans all pages found
  • browser_snapshot: returns the accessibility tree of the page as a screen reader sees it

The server couldn't log in out of the box either, but with Claude's help and a workaround, it worked: the Vite configuration was adjusted, and the login ran via native JavaScript setters instead of the built-in tools. The result was qualitatively different from the previous levels: a systematic keyboard audit uncovered invisible focus points and focus jumps. It also found a touch target issue according to WCAG 2.2, which a normal axe scan doesn't even check. All results came back as reports with concrete WCAG references.
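
The touch target finding is a good example of a measurement-based check. WCAG 2.2 Success Criterion 2.5.8 (Target Size, Minimum) asks for targets of at least 24 by 24 CSS pixels. The sketch below applies only that size threshold to already measured elements; the `MeasuredTarget` shape is an assumption for this example, and the criterion's exceptions (for instance sufficient spacing or inline links) are deliberately left out.

```typescript
// Hypothetical measurement of one interactive element, in CSS pixels.
interface MeasuredTarget {
  selector: string;
  width: number;
  height: number;
}

// Minimum edge length from WCAG 2.2 SC 2.5.8 (Level AA).
const MIN_TARGET_SIZE_PX = 24;

// Return all targets that are too small in at least one dimension.
// Note: the exceptions defined by the criterion are not evaluated here.
function findSmallTargets(
  targets: MeasuredTarget[],
  minSize: number = MIN_TARGET_SIZE_PX,
): MeasuredTarget[] {
  return targets.filter(
    (target) => target.width < minSize || target.height < minSize,
  );
}
```

Because the exceptions are ignored, every hit from such a check is a candidate that still needs a look in context rather than a confirmed violation.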

The MCP server itself finds problems and reports them, but it doesn't understand the code and doesn't suggest fixes. It also only tests the specified URLs in their initial state, no dynamic content like dialogs or error messages. An AI agent could work through the findings and change code, but without expertise in accessibility best practices, the result wouldn't be better than Level 2. The next question was: What happens when you deploy specialized agents that know what to look for?

Level 4: The two-agent workflow

I wanted to go a step further and deploy agents specialized in accessibility. At Awesome GitHub Copilot, you can find various open-source agents. I "hired" an Accessibility Expert and an Accessibility Runtime Tester for the project and ran them in parallel.

The Accessibility Expert analyzes source code against WCAG 2.1/2.2: semantic correctness, keyboard and focus behavior, ARIA roles and states, form labeling, focus management in modals, live regions for dynamic updates. It follows a defined protocol and also checks components that aren't currently visible on screen, which is the key difference from Level 2.

The Accessibility Runtime Tester navigates through critical user flows via keyboard, opens dialogs and checks focus behavior, injects axe-core with WCAG filters, and uses browser_snapshot to see which elements have no name for screen readers. The tester's guiding question is: "Can a keyboard user complete this flow from start to finish?"

The two agents can link their findings, discuss with each other, and help each other out. The Accessibility Expert identified an interactive element without keyboard support in the code, and the Runtime Tester confirmed that it wasn't reachable via keyboard. A static scan delivers only one of those two pieces of information.

In the end, the agents found significantly more problems than Claude alone in Level 2, presumably thanks to best practices and concrete checklists. Even so, I had to emphasize here too that they should please open ALL popups and modals and please click through EVERYWHERE.

The agents missed a few errors that the MCP server's tools had found in Level 3, but they additionally tested user flows. What was newly discovered:

  • interactive elements that work via click but aren't reachable via keyboard because role, tabIndex, or keyboard handlers are missing
  • focus after closing a dialog: focus jumps to <body> instead of returning to the triggering element
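
The second finding has a well-known fix: remember which element opened the dialog and hand focus back to it on close. The following is a minimal sketch of that pattern, not the fix the agents produced. `Focusable` is a stand-in interface introduced for this example so the code runs outside a browser; in the browser, any HTMLElement satisfies it.

```typescript
// Minimal stand-in for a DOM element; HTMLElement has a compatible focus().
interface Focusable {
  focus(): void;
}

class DialogFocusManager {
  private trigger: Focusable | null = null;

  // Call when opening: remember the trigger, then move focus into the dialog.
  open(trigger: Focusable, firstFocusableInDialog: Focusable): void {
    this.trigger = trigger;
    firstFocusableInDialog.focus();
  }

  // Call when closing: return focus to whatever opened the dialog.
  close(): void {
    if (this.trigger) {
      this.trigger.focus();
    }
    this.trigger = null;
  }
}
```

In a real application, the trigger would typically be the element that had focus when the dialog was opened, and many component libraries already handle this when their dialog components are used as intended.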

The setup effort is the lowest of all tested approaches: drop two Markdown files into the project, done. No npm install, no Docker, no CLI configuration. The result isn't deterministic and isn't suitable as a CI check, but for a one-time structured review from which you can derive targeted E2E tests, I was able to achieve quite convincing results.

Level 5: Combining agents and MCP servers

The logical consequence of everything: deploy the specialized agents from Level 4 together with the MCP server from Level 3. When the Accessibility Runtime Tester has access to the MCP server's tools, it can build them directly into its testing workflow instead of navigating everything manually.

The MCP server provides measurements: contrast ratios, pixel sizes, tab orders. The agent interprets them, links them to the code, and can assess whether a problem actually blocks a user. The tools provide more context, which hopefully means we can expect more precise findings.

The combination of specialized agents and MCP servers delivers more than any approach alone: deterministic measurements from the MCP tools, contextual understanding from the agent, and user flows that a pure scanner would never cover.

Conclusion

Can you build a fully accessible application with AI alone? Based on my experiments: not yet. But you can get significantly further than I expected.

Above all, it allows you to quickly find and fix a lot of obvious problems without specialized knowledge, making applications more accessible than before, but not 100%, even if the AI claims otherwise. Because: "The tool says it's okay" doesn't mean "It's okay."

The example of accessibility overlays illustrates this once more: tools like accessiBe promised to make any website accessible via an AI widget. The US Federal Trade Commission fined the company one million dollars in early 2025 for misleading claims.

The specialized agents partially manage to trigger and check dynamic elements like toasts, error messages, or dialogs, but not all of them. With documented user stories as input, this would presumably be more reliable, provided the documentation already exists. And even then: whether VoiceOver announces a button at the right moment depends on the context and the sequence of interactions. Non-technical requirements that need contextual knowledge can't be evaluated by any tool: Is the page structure logical? Are form elements meaningfully labeled? Is the error handling understandable?

The ecosystem is evolving rapidly. What didn't exist a year ago is now available as an open-source project and works: MCP as a standard, specialized accessibility agents, axe-core directly in the IDE. The tools are still young, but the direction is right. AI is already making accessibility audits significantly more efficient today, but it replaces neither the understanding of what accessibility truly means nor manual validation.
