Spring Batch 2.2 – JavaConfig Part 1: A comparison to XML

1.6.2013 | 10 minutes reading time

This is the first part of a series on Java based configuration in Spring Batch. Spring Batch 2.2 will be out in a few weeks (update: it was released on 6/6), and it will have a Java DSL for Spring Batch, including its own @Enable annotation. In Spring Core I prefer Java based configuration over XML, but Spring Batch has a really good namespace in XML. Is the Java based approach really better? Time to take a deep look into the new features!
In this first post I will introduce the Java DSL and compare it to the XML version, but there’s more to come. In future posts I will talk about JobParameters, ExecutionContexts and StepScope, profiles and environments, job inheritance, modular configurations, and partitioning and multi-threaded steps, all with regard to Java based configuration, of course. You can find the JavaConfig code examples on GitHub. If you want to know when a new blog post is available, just follow me on Twitter (@TobiasFlohre) or Google+.

Back in the days – a simple configuration in XML

Before we start looking at the new Java DSL, I’ll introduce you to the job we’ll translate to Java based configuration. It’s a common use case, not trivial, but simple enough to understand in a reasonable amount of time. It’s the job’s job to import partner data (name, email address, gender) from a file into a database. Each line in the file is one record, and its properties are delimited by commas. We use the FlatFileItemReader to read the data from the file, and we use the JdbcBatchItemWriter to write the data to the database.
We split the configuration in two parts: the infrastructure configuration and the job configuration. It always makes sense to do that, because you may want to switch the infrastructure configuration for different environments (test, production), and you may have more than one job configuration.
An infrastructure configuration in XML for a test environment looks like this:

<context:annotation-config/>

<batch:job-repository/>

<jdbc:embedded-database id="dataSource" type="HSQL">
    <jdbc:script location="classpath:org/springframework/batch/core/schema-hsqldb.sql"/>
    <jdbc:script location="classpath:schema-partner.sql"/>
</jdbc:embedded-database>

<bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="dataSource" />
</bean>

<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="jobRepository" />
</bean>

Note that we create our domain database tables here as well (schema-partner.sql), and note that it’s done in an in-memory database. That’s a perfect scenario for JUnit integration tests.
Now let’s take a look at the job configuration:

<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="classpath:partner-import.csv"/>
    <property name="lineMapper" ref="lineMapper"/>
</bean>
<bean id="lineMapper" class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
    <property name="lineTokenizer">
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
            <property name="names" value="name,email"/>
            <property name="includedFields" value="0,2"/>
        </bean>
    </property>
    <property name="fieldSetMapper">
        <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
            <property name="targetType" value="de.codecentric.batch.domain.Partner"/>
        </bean>
    </property>
</bean>

<bean id="processor" class="de.codecentric.batch.LogItemProcessor"/>

<bean id="writer" class="org.springframework.batch.item.database.JdbcBatchItemWriter">
    <property name="sql" value="INSERT INTO PARTNER (NAME, EMAIL) VALUES (:name,:email)"/>
    <property name="dataSource" ref="dataSource"/>
    <property name="itemSqlParameterSourceProvider">
        <bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider"/>
    </property>
</bean>

<batch:job id="flatfileJob">
    <batch:step id="step">
        <batch:tasklet>
            <batch:chunk reader="reader" processor="processor" writer="writer" commit-interval="3" />
        </batch:tasklet>
    </batch:step>
</batch:job>

Note that we use almost exclusively standard Spring Batch components, with the exception of the LogItemProcessor and, of course, our domain class Partner.
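
To make the tokenizer and chunk settings above more tangible, here is a minimal plain-Java sketch of what they roughly amount to: each line is split on commas, only columns 0 and 2 are kept (includedFields="0,2", mapped to name and email), and the items are collected into chunks of three (commit-interval="3"), each of which would be written in one transaction. This is not Spring Batch code. The class PartnerImportSketch, its methods and the sample lines are made up for illustration, and the sample data assumes that the skipped middle column holds the gender.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the job's behaviour; no Spring Batch involved.
// All names in this class are hypothetical and exist only for illustration.
class PartnerImportSketch {

    // Counterpart of names="name,email" with includedFields="0,2":
    // split on commas and keep only columns 0 and 2.
    static String[] tokenize(String line) {
        String[] columns = line.split(",");
        return new String[] { columns[0].trim(), columns[2].trim() };
    }

    // Counterpart of commit-interval="3": collect items into chunks of
    // chunkSize; each chunk would be written within one transaction.
    static List<List<String[]>> toChunks(List<String> lines, int chunkSize) {
        List<List<String[]>> chunks = new ArrayList<List<String[]>>();
        List<String[]> current = new ArrayList<String[]>();
        for (String line : lines) {
            current.add(tokenize(line));
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new ArrayList<String[]>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current); // last, possibly smaller chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Assumed sample data: column 1 (the skipped one) holds the gender.
        List<String> lines = Arrays.asList(
                "Anna,f,anna@example.com",
                "Bert,m,bert@example.com",
                "Cleo,f,cleo@example.com",
                "Dirk,m,dirk@example.com");
        for (List<String[]> chunk : toChunks(lines, 3)) {
            System.out.println("writing chunk of " + chunk.size() + " item(s)");
        }
    }
}
```

With four input lines and a chunk size of three, the sketch produces one full chunk and one chunk holding the remaining item.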

Java – and only Java

Now it’s time for the Java based configuration style. You can find all the examples used in this blog post series here.

Infrastructure configuration

First, we’ll take a look at the infrastructure configuration. Following one of the patterns I described here, I provide an interface for the InfrastructureConfiguration to make it easier to switch it in different environments:

public interface InfrastructureConfiguration {

    @Bean
    public abstract DataSource dataSource();

}

Our first implementation will be one for testing purposes:

@Configuration
@EnableBatchProcessing
public class StandaloneInfrastructureConfiguration implements InfrastructureConfiguration {

    @Bean
    public DataSource dataSource(){
        EmbeddedDatabaseBuilder embeddedDatabaseBuilder = new EmbeddedDatabaseBuilder();
        return embeddedDatabaseBuilder.addScript("classpath:org/springframework/batch/core/schema-drop-hsqldb.sql")
                .addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
                .addScript("classpath:schema-partner.sql")
                .setType(EmbeddedDatabaseType.HSQL)
                .build();
    }

}

All we need here is our DataSource and the small annotation @EnableBatchProcessing. If you’re familiar with Spring Batch, you know that the minimum for running jobs is a PlatformTransactionManager, a JobRepository and a JobLauncher, plus a DataSource if you want to persist job metadata. All we have right now is a DataSource, so what about the rest? The annotation @EnableBatchProcessing creates those components for us. It takes the DataSource and creates a DataSourceTransactionManager working on it, it creates a JobRepository working with the transaction manager and the DataSource, and it creates a JobLauncher using the JobRepository. In addition, it registers the StepScope for use on batch components and a JobRegistry to find jobs by name.
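
To see what the annotation saves us, here is a rough sketch of how the three core beans could be wired by hand, mirroring the XML infrastructure configuration from the beginning. It is a configuration fragment that needs Spring Batch and Spring JDBC on the classpath. The class name ManualInfrastructureConfiguration is made up, the DataSource is assumed to be defined in another configuration class, and the sketch deliberately does not register the StepScope, the JobRegistry or the builder factories.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

// Hypothetical class: hand-written counterpart of the core infrastructure beans.
// StepScope, JobRegistry and the builder factories are NOT registered here.
@Configuration
public class ManualInfrastructureConfiguration {

    // Assumption: the DataSource bean is defined in another configuration class.
    @Autowired
    private DataSource dataSource;

    @Bean
    public PlatformTransactionManager transactionManager() {
        return new DataSourceTransactionManager(dataSource);
    }

    @Bean
    public JobRepository jobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);
        factory.setTransactionManager(transactionManager());
        factory.afterPropertiesSet();
        return (JobRepository) factory.getObject();
    }

    @Bean
    public JobLauncher jobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }
}
```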
Of course you’re not always happy with a DataSourceTransactionManager, for example when running inside an application server. We’ll cover that in a future post. The usage of the StepScope will be covered in a future post as well.
I left out two new components that are registered in the application context as well: a JobBuilderFactory and a StepBuilderFactory. Of course we may autowire all of those components into other Spring components, and that’s what we’re gonna do now in our job configuration with the JobBuilderFactory and the StepBuilderFactory.

Job configuration

@Configuration
public class FlatfileToDbJobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilders;

    @Autowired
    private StepBuilderFactory stepBuilders;

    @Autowired
    private InfrastructureConfiguration infrastructureConfiguration;

    @Bean
    public Job flatfileToDbJob(){
        return jobBuilders.get("flatfileToDbJob")
                .listener(protocolListener())
                .start(step())
                .build();
    }

    @Bean
    public Step step(){
        return stepBuilders.get("step")
                .<Partner,Partner>chunk(1)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .listener(logProcessListener())
                .build();
    }

    @Bean
    public FlatFileItemReader<Partner> reader(){
        FlatFileItemReader<Partner> itemReader = new FlatFileItemReader<Partner>();
        itemReader.setLineMapper(lineMapper());
        itemReader.setResource(new ClassPathResource("partner-import.csv"));
        return itemReader;
    }

    @Bean
    public LineMapper<Partner> lineMapper(){
        DefaultLineMapper<Partner> lineMapper = new DefaultLineMapper<Partner>();
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setNames(new String[]{"name","email"});
        lineTokenizer.setIncludedFields(new int[]{0,2});
        BeanWrapperFieldSetMapper<Partner> fieldSetMapper = new BeanWrapperFieldSetMapper<Partner>();
        fieldSetMapper.setTargetType(Partner.class);
        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        return lineMapper;
    }

    @Bean
    public ItemProcessor<Partner,Partner> processor(){
        return new LogItemProcessor();
    }

    @Bean
    public ItemWriter<Partner> writer(){
        JdbcBatchItemWriter<Partner> itemWriter = new JdbcBatchItemWriter<Partner>();
        itemWriter.setSql("INSERT INTO PARTNER (NAME, EMAIL) VALUES (:name,:email)");
        itemWriter.setDataSource(infrastructureConfiguration.dataSource());
        itemWriter.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Partner>());
        return itemWriter;
    }

    @Bean
    public ProtocolListener protocolListener(){
        return new ProtocolListener();
    }

    @Bean
    public LogProcessListener logProcessListener(){
        return new LogProcessListener();
    }
}

Looking at the code you’ll find the ItemReader, ItemProcessor and ItemWriter definition identical to the XML version, just done in Java based configuration. I added two listeners to the configuration, the ProtocolListener and the LogProcessListener.
The interesting part is the configuration of the Step and the Job. In the Java DSL we use builders for building Steps and Jobs. Since every Step needs access to the PlatformTransactionManager and the JobRepository, and every Job needs access to the JobRepository, we use the StepBuilderFactory to create a StepBuilder that already uses the configured JobRepository and PlatformTransactionManager, and we use the JobBuilderFactory to create a JobBuilder that already uses the configured JobRepository. Those factories are there for our convenience; it would be totally okay to create the builders ourselves.
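
As a rough illustration of creating the builders without the factories, the following fragment sets the JobRepository and the PlatformTransactionManager on freshly created builders. It assumes the Spring Batch 2.2 builder API (later major versions changed how builders are constructed) and needs Spring Batch on the classpath. The class ManualBuilders and its method names are hypothetical.

```java
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.transaction.PlatformTransactionManager;

// Hypothetical helper that does by hand what the two factories do for us.
public class ManualBuilders {

    private final JobRepository jobRepository;
    private final PlatformTransactionManager transactionManager;

    public ManualBuilders(JobRepository jobRepository,
                          PlatformTransactionManager transactionManager) {
        this.jobRepository = jobRepository;
        this.transactionManager = transactionManager;
    }

    // A JobBuilder only needs the JobRepository.
    public JobBuilder job(String name) {
        return new JobBuilder(name).repository(jobRepository);
    }

    // A StepBuilder needs the JobRepository and the transaction manager.
    public StepBuilder step(String name) {
        return new StepBuilder(name)
                .repository(jobRepository)
                .transactionManager(transactionManager);
    }
}
```

From here on the builders can be used exactly like the ones returned by the factories, which is why the factories are usually the more convenient choice.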
Now that we have a StepBuilder, we can call all kinds of methods on it to configure our Step, from setting the chunk size to adding a reader, processor and writer to registering listeners and much more. Just explore it for yourself. Note that the type of the builder may change in your builder chain according to your needs. For example, when calling the chunk method, you switch from a StepBuilder to a parameterized SimpleStepBuilder, because from now on the builder knows that you want to build a chunk based Step. The StepBuilder doesn’t have methods for adding a reader or writer, but the SimpleStepBuilder has those methods. Because the SimpleStepBuilder is typesafe regarding the item type, you need to parameterize the call to the chunk method, as is done in the example with the item type Partner. Normally you won’t notice the switching of builder types when constructing a builder chain, but it’s good to know how it works.
The same holds for the JobBuilder for configuring Jobs. You can define all kinds of properties important for the Job, and you may define a Step flow with multiple Steps, and again, according to your needs, the type of the builder may change in your builder chain. In our example we define a simple Job with one Step and one JobExecutionListener.
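
The switching of builder types is easier to grasp with a toy example. The following plain-Java sketch mimics the idea only: a generic builder has no reader or writer methods, and calling chunk returns a different, parameterized builder that does. StepSketchBuilder, ChunkSketchBuilder and BuilderSwitchDemo are invented for this illustration and have nothing to do with the real Spring Batch classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical mini-DSL; none of these classes belong to Spring Batch.
class StepSketchBuilder {
    private final String name;

    StepSketchBuilder(String name) {
        this.name = name;
    }

    // chunk() leaves the generic builder and returns a typed one.
    <I, O> ChunkSketchBuilder<I, O> chunk(int chunkSize) {
        return new ChunkSketchBuilder<I, O>(name, chunkSize);
    }
}

// Only this typed builder knows about reader, processor and writer.
class ChunkSketchBuilder<I, O> {
    private final String name;
    private final int chunkSize;
    private Iterator<I> reader;
    private Function<I, O> processor;
    private Consumer<List<O>> writer;

    ChunkSketchBuilder(String name, int chunkSize) {
        this.name = name;
        this.chunkSize = chunkSize;
    }

    ChunkSketchBuilder<I, O> reader(Iterator<I> reader) {
        this.reader = reader;
        return this;
    }

    ChunkSketchBuilder<I, O> processor(Function<I, O> processor) {
        this.processor = processor;
        return this;
    }

    ChunkSketchBuilder<I, O> writer(Consumer<List<O>> writer) {
        this.writer = writer;
        return this;
    }

    // Builds a runnable "step": process item by item, write chunk by chunk.
    Runnable build() {
        return () -> {
            List<O> chunk = new ArrayList<O>();
            while (reader.hasNext()) {
                chunk.add(processor.apply(reader.next()));
                if (chunk.size() == chunkSize) {
                    writer.accept(chunk);
                    chunk = new ArrayList<O>();
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk);
            }
        };
    }
}

class BuilderSwitchDemo {
    // Runs a step that maps strings to their lengths in chunks of two.
    static List<List<Integer>> run() {
        List<List<Integer>> written = new ArrayList<List<Integer>>();
        Runnable step = new StepSketchBuilder("step")
                .<String, Integer>chunk(2) // from here on the builder is typed
                .reader(Arrays.asList("a", "bb", "ccc").iterator())
                .processor(String::length)
                .writer(written::add)
                .build();
        step.run();
        return written;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the type parameters are fixed by the call to chunk, passing a processor that does not map String to Integer would be rejected by the compiler, which is the kind of type-safety the parameterized builder is meant to give.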

Connecting infrastructure and job configuration

One more thing about the job configuration: we need the DataSource in the JdbcBatchItemWriter, but we defined it in the infrastructure configuration. That’s a good thing, because it is very low level, and of course we don’t want to define something like that in the job configuration. So how do we get the DataSource? We know that we’ll start the application context with an infrastructure configuration and one or more job configurations, so one option would be to autowire the DataSource directly into the job configuration. I didn’t do that, because I believe that minimizing autowire magic is one important thing in the enterprise world, and I could do better. Instead of injecting the DataSource I injected the InfrastructureConfiguration itself, getting the DataSource from there. Now it’s a thousand times easier to understand where the DataSource comes from when looking at the job configuration. Note that the InfrastructureConfiguration is an interface and we don’t bind the job configuration to a certain infrastructure configuration. Still there’ll be only two or three implementations, and it’s easy to see which one is used under which circumstances.
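
Starting the application context with one infrastructure configuration and one job configuration could look roughly like the following fragment. It needs Spring Batch on the classpath and assumes that the two configuration classes from this post are in the same package. The class FlatfileToDbJobLauncherSketch is made up for illustration and is not part of the example project.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

// Hypothetical launcher class, not part of the example project.
public class FlatfileToDbJobLauncherSketch {

    public static void main(String[] args) throws Exception {
        // One infrastructure configuration plus one job configuration.
        AnnotationConfigApplicationContext context = new AnnotationConfigApplicationContext(
                StandaloneInfrastructureConfiguration.class,
                FlatfileToDbJobConfiguration.class);
        try {
            JobLauncher jobLauncher = context.getBean(JobLauncher.class);
            Job job = context.getBean("flatfileToDbJob", Job.class);
            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Status: " + execution.getStatus());
        } finally {
            context.close();
        }
    }
}
```

Switching to another environment then only means passing a different implementation of InfrastructureConfiguration to the context.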

Fault-tolerant steps: skipping and retrying items

If you want to use skip and/or retry functionality, you’ll need to activate fault-tolerance on the builder, which is done with the method faultTolerant. As explained above, the builder type switches, this time to FaultTolerantStepBuilder, and a bunch of new methods appear, like skip, skipLimit, retry, retryLimit and so on. A Step configuration may look like this:

@Bean
public Step step(){
    return stepBuilders.get("step")
            .<Partner,Partner>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .listener(logProcessListener())
            .faultTolerant()
            .skipLimit(10)
            .skip(UnknownGenderException.class)
            .listener(logSkipListener())
            .build();
}
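
To give a feeling for what a skip limit means, here is a deliberately simplified plain-Java sketch: failures of one specific exception type are tolerated until the limit is exceeded, after which the whole run fails. The real fault-tolerant Step does considerably more (transactions, rollbacks, listeners), so treat this only as an approximation of the idea. The class SkipSketch and the nested UnknownGenderException are hypothetical stand-ins, not the classes from the example project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of skip semantics; this is not Spring Batch code.
class SkipSketch {

    // Hypothetical stand-in for the UnknownGenderException used above.
    static class UnknownGenderException extends RuntimeException {
        UnknownGenderException(String gender) {
            super("Unknown gender: " + gender);
        }
    }

    // Toy processor: accepts "m" and "f", rejects everything else.
    static String process(String gender) {
        if (!gender.equals("m") && !gender.equals("f")) {
            throw new UnknownGenderException(gender);
        }
        return gender.toUpperCase();
    }

    // Processes all items, tolerating up to skipLimit skippable failures.
    static List<String> processAll(List<String> items, int skipLimit) {
        List<String> result = new ArrayList<String>();
        int skipped = 0;
        for (String item : items) {
            try {
                result.add(process(item));
            } catch (UnknownGenderException e) {
                skipped++;
                if (skipped > skipLimit) {
                    throw new IllegalStateException(
                            "Skip limit of " + skipLimit + " exceeded", e);
                }
                // A skip listener would be notified at this point.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("m", "x", "f"), 1));
    }
}
```

With a limit of one, a single bad item is silently dropped, while a second bad item makes the run fail.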

Conclusion

The Spring Batch XML namespace for configuring jobs and steps is a little bit more concise than its Java counterpart, which is a plus on that side. The Java DSL has the advantage of type-safety and excellent IDE support for refactoring, auto-completion, finding usages etc. So you may say it’s just a matter of taste whether you pick one or the other, but I say it’s more than that.
90 % of all batch applications reside in the enterprise, in big companies like insurers or financial service providers. Batch applications are at the heart of their business, and they are business critical. Every such company using Java for batch processing has its own little framework or library around solutions like Spring Batch to adapt it to its needs. And when it comes to building frameworks and libraries, Java based configuration is way ahead of XML, and here are some of the reasons:

We want to do some basic configurations in the framework. People add a dependency on our framework library and import those configurations according to their needs. If these configurations were written in XML, they would have a hard time opening them to see what they are doing. That’s no problem in Java, and it’s an important point for transparency and maintainability.
There’s no navigability in XML. That may be okay as long as you don’t have too many XML files and all of them are in your workspace, because then you can take advantage of the Spring IDE support. But a framework library usually should not be added as a project to the workspace. When using Java based configuration you can easily jump into framework configuration classes. I will talk more about this subject in a following blog post.
In a framework you often have requirements the user of the library has to fulfil in order to make everything work, for example the need for a DataSource, a PlatformTransactionManager and a thread pool. The implementation doesn’t matter from the perspective of the framework, they just need to be there. In XML you have to write some documentation for the users of the framework, telling them they need to add this and this and this Spring bean under this name to the ApplicationContext. In Java you just write an interface describing that contract, and people using the library implement that interface and add it as a configuration class to the ApplicationContext. That’s what I did with the interface InfrastructureConfiguration above, and I will talk more about it in a future post.

All these advantages become even more important when there’s not only one common library but a hierarchy of libraries, for example one for the basic stuff and then one for a certain division. You really need to be able to navigate through everything to keep it understandable. And Java based configuration makes it possible.

Blog author

Tobias Flohre
