Kofax Transformation Modules: Natural Language Processing, sentiments and entities

6.4.2020 | 8 minutes reading time

Kofax Transformation Modules (KTM) offers several tools for document classification and data extraction. There are some older blog articles about these tools:

– Document classification
– Data extraction with format locators
– Machine Learning

With the release of Service Pack 6.3.1, the current version 6.3 of KTM gained another interesting tool: Natural Language Processing (NLP).

Natural Language Processing analyses the text of a document and tries to understand the relations between words in order to gain information and knowledge.

The Kofax NLP package appears to be based on the Salience engine from Lexalytics.

Because NLP was introduced by a service pack, its documentation is rather rudimentary. The main documentation of KTM doesn’t contain anything about NLP. It will be updated with the next main KTM release. However, parts of the NLP documentation are included in the Readme of the service pack.

This article provides an overview of the NLP installation, the two new locators (extraction tools) related to NLP, and custom extensions for entities and sentiments.

1. NLP package: installation
2. Entities and sentiments
3. Custom entities and sentiments

1. NLP package: installation

The NLP package is not included in the installation sources of KTM 6.3 or Service Pack 6.3.1. However, the service pack includes two new locators that make use of the NLP package.

NLP is a separate installation package, which can be downloaded from delivery.kofax.com. The download consists of three MSI installation files that cover different languages:

KofaxTransformation_Salience6.4_LanguageBundle_western-default.msi
KofaxTransformation_Salience6.4_LanguageBundle_western-extended.msi
KofaxTransformation_Salience6.4_LanguageBundle_extended.msi

The three packages can be installed one after the other.

2. Entities and sentiments

The known tools (format locators, database locators, etc.) enable us to extract a lot of information from documents. But most of it is one-dimensional data such as dates, numbers, amounts or strings.

The new entity locator enables KTM to recognize objects like persons, products, places, URLs, email addresses, schools, organizations, cities and so on. This locator is configurable. The result can be a list of all found entities (no matter which type), or the search can be restricted to a single entity type.

The second new locator (sentiment locator) tries to determine the sentiment of a document. Is there a positive mood within the text, or is the document a complaint with negative wording? Apart from the region settings known from the other locators, the sentiment locator cannot be configured.

2.1 Entity locator

The entity locator may be used in every document class in the KTM class tree, just like every other locator. There is one peculiarity: within the class properties you have to configure the language to be used by the new locator. You can select any of the languages that were installed by the three MSI packages. This new setting can be found at the bottom of the class properties:

The following languages are selectable:

The upcoming examples will use the following letter:

The entity locator offers three different modes:

Mode 1: simple field and no filter on entity type

The result of this mode is a list of all entities found in the document:

Mode 2: simple field and filter on entity type

As expected, you will only get the results which meet the filter on entity type:

The result list of both modes contains only the entity text and not the entity type. This is inconvenient, especially in mode 1 without a filter on entity type.

But there is a third mode:

Mode 3: table field with or without filter on entity type

Before using this mode, you first have to create a simple table model (the same kind of model that is used with the table locator). I named the columns Text, Confidence, EntityType and Sentiment, and the table model is named Entity. This mode delivers the most meaningful results:

The result is a table with values for confidence, value/text, type and sentiment in each row. However, I doubt that the sentiment value is meaningful in this context.

What are the advantages and disadvantages of mode 1/2 and mode 3?

The result of mode 1/2 can be assigned directly to a KTM field (the best alternative). But the field will only contain the value and not the type of the entity. The other alternatives of the locator may be queried by a KTM script as usual.

The results of mode 3 can only be assigned to a KTM table field, as it is a table. You get all alternatives in this table field and you get access to all values (confidence, value, entity type, sentiment) instead of only the entity value in mode 1/2.

Overall, I prefer the table mode of the entity locator because I have access to all result information at a single point.
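
To illustrate why having the entity type next to the text is useful, here is a minimal Python sketch that groups table-mode rows by entity type. It does not use any KTM API: the `EntityRow` class, the `group_by_type` helper and the sample rows are hypothetical and only mirror the four columns (Text, Confidence, EntityType, Sentiment) of the table model described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EntityRow:
    """One row of the (hypothetical) exported Entity table."""
    text: str
    confidence: float
    entity_type: str
    sentiment: float


def group_by_type(rows, min_confidence=0.0):
    """Collect entity texts per entity type, skipping low-confidence rows."""
    grouped = defaultdict(list)
    for row in rows:
        if row.confidence >= min_confidence:
            grouped[row.entity_type].append(row.text)
    return dict(grouped)


# Made-up sample rows, for illustration only.
sample_rows = [
    EntityRow("Jane Doe", 0.9, "Person", 0.0),
    EntityRow("Hamburg", 0.8, "Place", 0.0),
    EntityRow("John Doe", 0.3, "Person", 0.0),
]
people_and_places = group_by_type(sample_rows, min_confidence=0.5)
```

With the plain result list of mode 1/2 this grouping would not be possible, because the entity type is not part of the result.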

2.2 Sentiment locator

This locator tries to find out the basic mood of a document.
The result of this locator is a value between -1.00 and +1.00 and may be assigned to a simple KTM field. Kofax explains the range of values as follows:

Positive: 0.12 to 1.00
Neutral: -0.025 to 0.11
Negative: -0.026 to -1.00
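
If you want to turn the numeric score into a label in downstream processing, the ranges above can be sketched in Python as follows. This is not part of KTM; the function name `classify_sentiment` is hypothetical, and treating the small gaps between the published ranges (for example between 0.11 and 0.12) as neutral is my own assumption.

```python
def classify_sentiment(score: float) -> str:
    """Map a sentiment score in [-1.0, 1.0] to a coarse label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.12:
        return "positive"
    if score <= -0.026:
        return "negative"
    # Everything in between is treated as neutral, including the
    # small gaps that the published ranges leave open (assumption).
    return "neutral"
```

For example, `classify_sentiment(0.227859)` returns `"positive"`.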

The sentiment locator automatically searches for a basic mood and has no configurable settings. But you can restrict the search area by specifying regions as known by other KTM locators.

The properties page of the sentiment locator:

A test with the sample document above gives the following result:

The value of +0.227859 suggests a positive general mood of the document, which is correct in my opinion.

I cannot give a final assessment of this locator yet because I did not have enough documents with ‘moods’ available to me. But since the locator is very easy to use, you can just let it run in various projects and gain experience with it.

3. Custom entities and sentiments

3.1 Custom entities

The NLP of KTM 6.3.1 contains several predefined entities such as persons, places etc. Furthermore, the NLP offers the possibility to define custom, project-specific entities and use them for your document processing.

An example of when to use custom entities:

A company provides products that are broken down into categories, and these categories each have multiple model numbers. For basic correspondence the model number is not needed but the category is always needed. Because of this, you can provide a custom entity file that contains a list of model numbers for each category. When extraction is performed, a model number is recognized but the category is returned in the extraction results.

The procedure for using custom entities:

The project-specific entity file must be located below the path of the KTM project:

…project directory\Custom\SalienceData\en\salience\entities\

If the path doesn’t exist, you have to create it manually.

For other languages you have to replace en with the respective language code (de = German, es = Spanish, etc.).

The name of the subdirectory below entities defines the entity type which will be returned by the entity locator in table mode.

The entity file with the custom content must be placed in the entity directory. The name of the entity file doesn’t matter, but the file extension must be .cdl.

The lines of the cdl-file must have the following structure:

Search Text<tab>Entity Label<tab>Entity Name

‘Entity Label’ is not supported yet, but it can be included in the file. Even if you leave the entity label empty, both <tab> separators must appear in the line.
‘Search Text’ will be searched within the document and ‘Entity Name’ will be returned as the result of the locator in the text column.

For example, suppose you need information about vehicle categories from your documents. First you create an entity file MyEntity.cdl:

VW Tiguan      car         Volkswagen (car)
VW Golf        car         Volkswagen (car)
VW Bulli       truck       Volkswagen (truck)
Skoda Octavia  car         Skoda (car)
Skoda Fabia    car         Skoda (car)
Honda CBR 650  motorbike   Honda (motorbike)
Honda CBR 123  motorbike   Honda (motorbike)

The file is stored in the following path:

…Project Directory\Custom\SalienceData\en\salience\entities\vehicles\MyEntity.cdl

The directory vehicles defines the type of your custom entity.
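
Creating the directory and the file can also be scripted. The following Python sketch builds the path described above and writes one line per entity. It is not a Kofax tool: the helper name `write_entity_file` is hypothetical, and both the tab character as field separator and UTF-8 as file encoding are assumptions on my part that you should check against the service pack Readme.

```python
from pathlib import Path


def write_entity_file(project_dir, language, entity_type, file_name, rows):
    """Write (search text, entity label, entity name) rows to a .cdl file.

    Assumptions: fields are tab-separated and the file is UTF-8 encoded.
    """
    if not file_name.endswith(".cdl"):
        raise ValueError("the entity file must have the extension .cdl")
    target_dir = (Path(project_dir) / "Custom" / "SalienceData" / language
                  / "salience" / "entities" / entity_type)
    # The path usually does not exist yet, so create it if necessary.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    lines = ["\t".join((search, label, name)) for search, label, name in rows]
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

A call such as `write_entity_file(project_dir, "en", "vehicles", "MyEntity.cdl", rows)` with `rows` containing tuples like `("VW Golf", "car", "Volkswagen (car)")` would produce the file shown above, where `project_dir` stands for your own KTM project directory.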

Testing with a suitable document:

gives the following result:

Unfortunately, the custom entities do not appear as an entity type in the entity type filter.

In order to make the new custom entities known to KTM (in Project Builder), the underlying Salience engine must be stopped once. This can be done in the properties of the document class in which the entity locator was defined:

The engine is automatically restarted during the next extraction run.

3.2 Custom sentiments

Similar to the entities, you can also add custom definitions to the sentiment locator. The NLP provides a sentiment definition file for many languages.

The English sentiment file is located at this path:

C:\Program Files (x86)\Common Files\Kofax\Salience6.4\en\salience\sentiment

and its name is general.hsd.

This is an extract from the provided general.hsd:

insolvent -0.501875
insomnia -0.49
inspiration 0.49
inspirational 0.6
inspire 0.6
inspired 0.6
inspiring 0.6
inspirit 0.53737307
inspirited 0.51783335
inspiriting 0.511
inspiritment 0.5266667
instability -0.5228571

Each line consists of the rated phrase, a <tab> as separator, and the rating, a value between -1 and +1.

To change the rating of certain phrases or to add new phrases, you have to copy the file general.hsd into the path of your KTM project:

…Project directory\Custom\SalienceData\en\salience\sentiment

There you can edit the copied file general.hsd and add your custom phrases and ratings.
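
Editing the copied file by hand works fine for a few phrases. For longer lists, a small script can help. The Python sketch below changes existing ratings and appends new phrases in the text of an .hsd file. It is an illustration only: the function name `apply_sentiment_overrides` and the sample phrases in the usage note are hypothetical, and it assumes that each line has the form phrase, tab, rating.

```python
from typing import Mapping


def apply_sentiment_overrides(hsd_text: str, overrides: Mapping[str, float]) -> str:
    """Return hsd_text with ratings replaced or appended from overrides.

    Assumption: each line has the form "phrase<TAB>rating".
    """
    for phrase, score in overrides.items():
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"rating for {phrase!r} must be between -1 and +1")
    remaining = dict(overrides)
    result = []
    for line in hsd_text.splitlines():
        phrase, separator, _old_rating = line.partition("\t")
        if separator and phrase in remaining:
            # Known phrase: keep its position, replace the rating.
            result.append(f"{phrase}\t{remaining.pop(phrase)}")
        else:
            result.append(line)
    # Phrases that were not in the file yet are appended at the end.
    for phrase, score in remaining.items():
        result.append(f"{phrase}\t{score}")
    return "\n".join(result) + "\n"
```

You would read the copied general.hsd from the project path, pass its content together with a mapping such as `{"insomnia": -0.8, "outage": -0.7}` to the function, and write the result back to the same file.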

In order to make the new custom sentiments known to KTM (in Project Builder), the underlying Salience engine must be stopped once, as described under ‘Custom entities’.

Conclusion

The Natural Language Processing package for Kofax Transformation Modules provides an interesting and promising extension for document classification and data extraction. I hope this article was able to provide an initial overview of the possibilities of the NLP and to fill in for the currently missing Kofax documentation.

Blog author

Jürgen Voss
