Create your own parser with the Nearley Parser

24.10.2018 | 5 minutes reading time

For a project I needed to parse data that is delivered through email. Yes, I know: when was the last time you received production data through email instead of through a clean REST API? It is certainly not an ideal interface, but it is one we have to deal with. We cannot change the source to provide a clean REST API on short notice and, as you may know from experience, the project needs to deliver value as soon as possible. Luckily I came across the Nearley Parser.

Nearley Parser

At first you might be tempted to parse the email and extract the data using regular expressions, or simply by looking for keywords, rows, and columns. However, email clients and mail servers can change the content of a message: messages get forwarded, signatures get added, or plain text gets converted into HTML. So I needed something smarter and more robust to parse these messages.

While searching for a document parser that could do the job, I found something more interesting. Why not explain to the parser how to read your document, and have it return meaningful JSON objects containing your data?

The tool for this job is the Nearley Parser. It is based on the Earley parsing algorithm, which is named after its inventor, Jay Earley.

To really dive into how this parsing algorithm works, I recommend reading the explanation of the algorithm by the author himself.
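To give a rough feel for the algorithm, here is a minimal Earley recognizer in plain JavaScript. It is an illustrative sketch, not nearley's implementation: the grammar format (an object mapping each rule name to a list of alternatives), the function name `earleyRecognize` and the example grammar are assumptions made for this example. It only answers "does the input match?", does not build results, and does not support rules that can match the empty string. It does show the three Earley steps (predict, scan, complete) and that a left-recursive rule such as `sum -> sum "+" digit` is no problem for this kind of parser.

```javascript
// Minimal Earley recognizer (illustrative sketch, not nearley's code).
// Hypothetical grammar format: { ruleName: [[symbol, ...], ...] }.
// Symbols that are not rule names are terminals, matched one character at a time.
// Limitation: rules that can match the empty string are not supported.
function earleyRecognize(grammar, start, input) {
  const isRule = (symbol) => Object.prototype.hasOwnProperty.call(grammar, symbol);
  // chart[i] holds the items (partially matched rules) active at position i
  const chart = Array.from({ length: input.length + 1 }, () => []);
  const seen = Array.from({ length: input.length + 1 }, () => new Set());

  // Add an item { rule, alt, dot, origin } to chart[i] unless it is already there
  function add(i, item) {
    const key = `${item.rule}|${item.alt}|${item.dot}|${item.origin}`;
    if (!seen[i].has(key)) {
      seen[i].add(key);
      chart[i].push(item);
    }
  }

  grammar[start].forEach((_, alt) => add(0, { rule: start, alt, dot: 0, origin: 0 }));

  for (let i = 0; i <= input.length; i++) {
    // chart[i] grows while we process it, so iterate by index
    for (let j = 0; j < chart[i].length; j++) {
      const item = chart[i][j];
      const symbols = grammar[item.rule][item.alt];
      if (item.dot < symbols.length) {
        const next = symbols[item.dot];
        if (isRule(next)) {
          // Predict: the next symbol is a rule, so start all of its alternatives here
          grammar[next].forEach((_, alt) => add(i, { rule: next, alt, dot: 0, origin: i }));
        } else if (input[i] === next) {
          // Scan: the terminal matches the current character
          add(i + 1, { ...item, dot: item.dot + 1 });
        }
      } else {
        // Complete: advance every item that was waiting for this rule at its origin
        for (const waiting of chart[item.origin]) {
          if (grammar[waiting.rule][waiting.alt][waiting.dot] === item.rule) {
            add(i, { ...waiting, dot: waiting.dot + 1 });
          }
        }
      }
    }
  }

  // Accept if a completed start rule spans the whole input
  return chart[input.length].some(
    (item) =>
      item.rule === start &&
      item.origin === 0 &&
      item.dot === grammar[item.rule][item.alt].length
  );
}

// Left-recursive example grammar: sum -> sum "+" digit | digit
const arithmetic = {
  sum: [["sum", "+", "digit"], ["digit"]],
  digit: "0123456789".split("").map((c) => [c]),
};

console.log(earleyRecognize(arithmetic, "sum", "1+2+3")); // true
console.log(earleyRecognize(arithmetic, "sum", "1+"));    // false
```

Nearley does considerably more on top of this, such as collecting every possible parse and running postprocessors, but the chart of items per input position is the core idea.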

We make the algorithm work for us by providing a definition file, called the grammar file. Before we do that, we can experiment with our grammar in an online tool that is, fittingly, called the Nearley Playground. There you enter a test, which is simply the content you need to parse, together with the grammar. The grammar is compiled in real time, and the playground directly shows you the result.

Nearley Parsing Primer

The data we need to parse comes from a sensor and contains, among other things, status information about its battery. This data needs to be read and stored in a document store. So wouldn't it be great if we could feed this data to a parser that returns a JSON object containing it?

The first line of data we are interested in is:


Battery: 51%, 4.01 Volt

The line starts with a word, followed by a colon, whitespace, and then the data.

Without a separate lexer, the Nearley Parser reads the document character by character and matches it against the literal strings and characters in the grammar, which are called “terminals”. When a rule matches, we can use a postprocessor function to do something with the matched data, in this case to construct a JSON object.

So to parse “Battery:”, the grammar is


sentence -> "Battery:"

But our line contains more characters. To allow whitespace after “Battery:”, we can include the built-in grammar file “whitespace.ne” and simply add an underscore to our rule. The underscore rule “_” matches any amount of whitespace, including none; its sibling “__” requires at least one whitespace character. So now our grammar becomes:


sentence -> "Battery:" _

Then we encounter our first value: a percentage. To parse it, we can use the “percentage” rule from the built-in “number.ne” grammar, which also converts the value to a fraction (51% becomes 0.51):


sentence -> "Battery:" _ percentage

This way we can parse the complete line with the following grammar, including the built-in grammar files:


@builtin "whitespace.ne" # `_` means arbitrary amount of whitespace
@builtin "number.ne"     # `int`, `decimal`, and `percentage`

sentence -> "Battery:" _ percentage _ "," _ decimal _ "Volt"
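To make the grammar less magical, the following plain JavaScript sketch does by hand roughly what the rule `sentence -> "Battery:" _ percentage _ "," _ decimal _ "Volt"` does. It matches the literal terminals, skips whitespace wherever `_` appears and pushes `null` for it, turns the percentage into a fraction and reads the decimal. The function name `parseBatteryLine` is hypothetical, and this is a simple left-to-right scanner rather than how nearley works internally: it cannot handle ambiguity and only mirrors this one rule.

```javascript
// Hand-written sketch of the rule
//   sentence -> "Battery:" _ percentage _ "," _ decimal _ "Volt"
// Each matched symbol is pushed onto an array, similar to the array nearley produces.
function parseBatteryLine(input) {
  let pos = 0;
  const result = [];

  // Terminal: match an exact string at the current position
  function literal(text) {
    if (!input.startsWith(text, pos)) {
      throw new Error(`Expected "${text}" at position ${pos}`);
    }
    pos += text.length;
    result.push(text);
  }

  // `_`: zero or more whitespace characters, represented as null
  function whitespace() {
    while (pos < input.length && /\s/.test(input[pos])) pos++;
    result.push(null);
  }

  // Read a decimal number such as 51 or 4.01 (optional leading minus)
  function readDecimal() {
    const match = /^-?[0-9]+(\.[0-9]+)?/.exec(input.slice(pos));
    if (!match) throw new Error(`Expected a number at position ${pos}`);
    pos += match[0].length;
    return parseFloat(match[0]);
  }

  literal("Battery:");
  whitespace();
  // percentage: a decimal followed by "%", divided by 100
  const level = readDecimal();
  if (input[pos] !== "%") throw new Error(`Expected "%" at position ${pos}`);
  pos++;
  result.push(level / 100);
  whitespace();
  literal(",");
  whitespace();
  result.push(readDecimal());
  whitespace();
  literal("Volt");

  if (pos !== input.length) throw new Error(`Unexpected input at position ${pos}`);
  return result;
}

console.log(JSON.stringify(parseBatteryLine("Battery: 51%, 4.01 Volt")));
// ["Battery:",null,0.51,null,",",null,4.01,null,"Volt"]
```

The resulting values line up with the nearley-test output shown further down; the grammar version expresses the same structure in a single declarative line.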

All this can be easily run as an experiment on the Nearley playground website.

Real usage

This is all nice in a playground web application to learn and write your grammar file, but now we need to put this to use in an application.

Nearley consists of two components: the compiler and the parser. The compiler turns your grammar file into a JavaScript module, which the parser then uses to parse your document.

Both components are part of the nearley package and can be installed through npm.

To install the parser in your project:


npm install --save nearley

This adds it as a dependency in your package.json.

To compile your grammar, also install the package globally, which makes the nearleyc compiler command available:


npm install -g nearley

Store the grammar as shown above into a file called grammar.ne and compile it using the following command:


nearleyc grammar.ne -o grammar.js

This compiles the grammar file into a JavaScript module. Now we can try it out with the nearley-test tool, which comes with the same package:


nearley-test ./grammar.js --input "Battery: 51%, 4.01 Volt"

And it will show the results:


Parse results:
[ [ 'Battery:', null, 0.51, null, ',', null, 4.01, null, 'Volt' ] ]

As you can see, it outputs an array containing our data, plus null values for the whitespace in our string. To clean this up and return a JSON object instead, we can add a postprocessor:


sentence -> "Battery:" _ percentage _ "," _ decimal _ "Volt" 
  {% ([,,level,,,,volts]) => 
    ({battery:{percentage: level, value: volts}}) %}
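A postprocessor is just a JavaScript function: it receives the array of values matched by the rule and returns whatever should replace that array in the result. The sketch below applies the same destructuring function, outside of nearley, to the array printed by nearley-test above, to show which positions `level` and `volts` pick up. The names `toBattery` and `matched` are only for illustration.

```javascript
// The postprocessor from the grammar, as a standalone function.
// Holes in the destructuring pattern skip "Battery:", the whitespace nulls and ",".
const toBattery = ([, , level, , , , volts]) => ({
  battery: { percentage: level, value: volts },
});

// The array nearley-test printed before the postprocessor was added
const matched = ["Battery:", null, 0.51, null, ",", null, 4.01, null, "Volt"];

console.log(JSON.stringify(toBattery(matched)));
// {"battery":{"percentage":0.51,"value":4.01}}
```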

When we compile and test this, we get the following result:


Parse results:
[ { battery: { percentage: 0.51, value: 4.01 } } ]

To use this in your code, require nearley and your compiled grammar, create a parser, feed it the data, and read the parse results from its results property, which is an array:


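const nearley = require("nearley");
const grammar = require("./grammar.js");

// Create a parser from the compiled grammar module
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));

// feed() consumes the input and returns the parser itself;
// the parse results are collected in parser.results
parser.feed("Battery: 51%, 4.01 Volt");
console.log(parser.results);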
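In the email scenario the battery line sits somewhere inside a larger message, and a nearley parser instance keeps its state between feed calls. A minimal sketch, assuming one parse attempt per line is good enough, creates a fresh parser for every line, treats an error thrown by feed as "no match", and then looks at results: an empty array means the input ended before the rule was complete, and more than one entry means the grammar is ambiguous. The helper `findFirstParse` and its `createParser` callback are hypothetical. In a real project the callback would be `() => new nearley.Parser(nearley.Grammar.fromCompiled(grammar))`; the stub parser here only exists so the sketch runs without nearley installed.

```javascript
// Try the grammar on each line of a message and return the first parse result.
// createParser must return a fresh object with nearley's Parser surface
// (a feed() method and a results array); parsers keep state, so use one per line.
function findFirstParse(message, createParser) {
  for (const line of message.split(/\r?\n/)) {
    const parser = createParser();
    try {
      // Trim, because the example grammar allows no leading or trailing whitespace
      parser.feed(line.trim());
    } catch (err) {
      continue; // feed() throws as soon as the input cannot match the grammar
    }
    // [] means the line ended before the rule was complete;
    // more than one entry means the grammar is ambiguous (this sketch takes the first)
    if (parser.results.length > 0) {
      return parser.results[0];
    }
  }
  return null;
}

// Hypothetical stand-in for nearley.Parser so this sketch runs on its own
function createStubParser() {
  return {
    results: [],
    feed(chunk) {
      const match = /^Battery: (\d+)%, (\d+\.\d+) Volt$/.exec(chunk);
      if (!match) throw new Error(`Unexpected input: ${chunk}`);
      this.results = [
        { battery: { percentage: Number(match[1]) / 100, value: Number(match[2]) } },
      ];
      return this;
    },
  };
}

const message = ["Hello,", "", "  Battery: 51%, 4.01 Volt", "Regards, the sensor"].join("\n");

console.log(JSON.stringify(findFirstParse(message, createStubParser)));
// {"battery":{"percentage":0.51,"value":4.01}}
```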

Conclusion

I have written many lines of code that parse data in one form or another, both single-line and multiline. So for this problem, too, my first choice was some kind of regular expression to parse each line of the mail message. That could certainly have worked, but knowing that the content of the mail can vary made me look for something that handles this variation cleanly, without a complex, unreadable regular expression or a complicated piece of code. The Nearley Parser grammar, by contrast, gives you clear, semantically readable code.

So luckily my search brought the Nearley Parser to my attention, and although it has a steep learning curve, you can create something usable quite quickly. Yes, the example above parses just one line and could have been done much more quickly with a regular expression. However, this line sits somewhere in the message, its spacing varies, and the message contains many other pieces of data you want to read, so things can get complex quickly. To be fair, I definitely do not consider myself an expert in Nearley's grammar language, but I thought it was worth spreading the word!
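For comparison, this is roughly what the regular expression alternative for the single battery line could look like. It is a sketch, not code from the project: the pattern and the name `parseWithRegex` are assumptions. It copes with the variable spacing and with the line appearing inside a longer message, but every additional field or layout variation has to be folded into patterns like this one.

```javascript
// Regex counterpart of: "Battery:" _ percentage _ "," _ decimal _ "Volt"
const BATTERY_PATTERN = /Battery:\s*(-?\d+(?:\.\d+)?)%\s*,\s*(-?\d+(?:\.\d+)?)\s*Volt/;

function parseWithRegex(text) {
  const match = BATTERY_PATTERN.exec(text);
  if (!match) return null;
  return {
    battery: {
      percentage: parseFloat(match[1]) / 100, // "51" -> 0.51, like number.ne's percentage
      value: parseFloat(match[2]),
    },
  };
}

// The pattern is not anchored, so it also finds the line inside a longer message
console.log(JSON.stringify(parseWithRegex("Hi,\n> Battery: 51%,4.01  Volt\nBye")));
// {"battery":{"percentage":0.51,"value":4.01}}
```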

Blog author

Harald Rietman
