Java Guild · 2026

Spring Batch

A Practical Introduction for Developers

Spring Boot 3.2 · Java 17 · H2 in-memory DB
Jürgen Roos
OCTO

Agenda

1. What is batch processing & why Spring Batch? (~10 min)
2. Core concepts — Jobs, Steps, Chunks, Readers, Processors, Writers (~10 min)
3. Demo 1 — The Basics (CSV → transform → CSV) (~10 min)
4. Demo 2 — Advanced (10 k customers · 2 steps · database · aggregation) (~15 min)
5. Error handling, listeners & testing (~10 min)
6. Live run & Q&A (~5 min)

What is Batch Processing?

Batch vs. Online Processing

Online / Request-Response

  • User triggers a single request
  • Response must come back fast (<1 s)
  • Small payload per call
  • Examples: REST APIs, web pages, mobile apps

Batch Processing

  • No human waiting for a response
  • Large volumes of data processed at once
  • Usually scheduled — overnight, end-of-month
  • Throughput over latency
Rule of thumb: If you're processing more records than a human can review in real time, you need batch.

When do you reach for batch?

  • Bank statement generation (month-end)
  • Payroll calculations
  • ETL — extract, transform, load
  • Sending bulk email / notifications
  • Data migration between systems
  • Nightly report generation
  • Inventory reconciliation
  • Log aggregation & analytics
Today's scenario: A global ecommerce platform exports 10 000+ raw order records every night. A Spring Batch job validates and cleans them, loads them into a reporting database, then aggregates revenue by country — so the sales dashboard is ready by morning.

Why Spring Batch?

You could write a loop. But then you'd also need to build:

Without a framework

  • Manual restart & recovery logic
  • Rolling back partial writes on failure
  • Progress tracking & logging
  • Memory management for huge files
  • Parallel processing plumbing
  • Job history & auditing

With Spring Batch

  • All of the above — built in
  • Clear, testable structure
  • Integrates with the Spring ecosystem
  • Production-proven at scale
  • Built-in I/O for CSV, JSON, JDBC, JPA, Kafka…
  • Declarative, familiar configuration

Core Concepts

The Building Blocks

Job                              // a named unit of batch work
├── Step                         // a discrete phase inside the job
│   └── Chunk<Input, Output>     // reads N items, processes each, writes all N
│       ├── ItemReader           // reads one item at a time
│       ├── ItemProcessor        // transforms / validates (optional)
│       └── ItemWriter           // writes the whole chunk at once
└── Step 2 ...                   // a job can have as many steps as needed
A Job is made of Steps. Each Step processes data in Chunks. The Reader loops one item at a time; the Writer flushes the whole batch — inside one transaction.

Chunk-Oriented Processing

Instead of read-all then write-all, data flows in bounded chunks — each chunk is one database transaction.

source (CSV / DB)
  → Reader (reads 1)
  → Processor (transforms 1)
  → Writer (writes chunk)
  → target (CSV / DB)

ItemReader

Returns one item per call.
Returns null when the source is exhausted — no loop needed.

ItemProcessor

Transforms or validates. Return null to skip an item. Entirely optional.

ItemWriter

Receives the whole Chunk<O>. Writes in bulk for efficiency and atomicity.
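Together these three form the loop that Spring Batch drives for you. The contract can be mimicked in plain Java to see the mechanics — all names below are illustrative, no framework classes involved:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Illustrative plain-Java version of the chunk loop Spring Batch runs internally.
class ChunkLoop {

    // "Reader" contract: one item per call, null when the source is exhausted.
    static <T> T read(Iterator<T> source) {
        return source.hasNext() ? source.next() : null;
    }

    // Read 1, process 1, buffer; flush the whole chunk at once when it is full.
    static <I, O> List<List<O>> run(List<I> source, Function<I, O> processor, int chunkSize) {
        List<List<O>> written = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        Iterator<I> it = source.iterator();
        I item;
        while ((item = read(it)) != null) {
            O out = processor.apply(item);      // "Processor" step
            if (out != null) chunk.add(out);    // null means: skip this item
            if (chunk.size() == chunkSize) {    // "Writer" flushes the whole chunk
                written.add(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) written.add(chunk); // final partial chunk
        return written;
    }
}
```

In the real framework each flushed chunk corresponds to one transaction commit.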

Spring Batch Keeps Its Own History

Every run is automatically stored in a JobRepository — a set of tables in your database.

Table                         Stores
BATCH_JOB_INSTANCE            Unique combination of job name + parameters
BATCH_JOB_EXECUTION           Each run: start time, end time, final status
BATCH_STEP_EXECUTION          Per-step metrics: read / write / skip counts
BATCH_JOB_EXECUTION_PARAMS    Parameters passed to the job
This is how Spring Batch supports restartability — it knows exactly which chunk a failed job reached and can pick up from there. In this demo we use H2 in-memory; in production, point it at Postgres or MySQL.
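Because the history is plain tables, you can inspect recent runs directly. A sketch using a JdbcTemplate against the standard schema (the surrounding bean wiring is omitted):

```java
// Sketch: list recent runs from the standard Spring Batch metadata schema.
jdbcTemplate.query(
        "SELECT JOB_EXECUTION_ID, STATUS, START_TIME, END_TIME " +
        "FROM BATCH_JOB_EXECUTION ORDER BY START_TIME DESC",
        (rs, rowNum) -> rs.getLong("JOB_EXECUTION_ID") + " " + rs.getString("STATUS"))
    .forEach(System.out::println);
```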

The Basics

CSV → uppercase names → CSV

Basics Job — Overview

input (input.csv)
  → Reader (FlatFileItemReader)
  → Processor (PersonProcessor)
  → Writer (FlatFileItemWriter)
  → output (basics-output.csv)

Input — input.csv

firstName,lastName
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doe

Output — basics-output.csv

firstName,lastName
JILL,DOE
JOE,DOE
JUSTIN,DOE
JANE,DOE
JOHN,DOE

Basics — The Reader

basics/config/BasicsJobConfig.java

@Bean
public FlatFileItemReader<Person> basicsReader() {
    return new FlatFileItemReaderBuilder<Person>()
        .name("personItemReader")
        .resource(new ClassPathResource("basics/input.csv"))
        .delimited()
        .names("firstName", "lastName")   // CSV column names
        .targetType(Person.class)          // maps to POJO via reflection
        .build();
}
The reader returns one Person per call, and returns null when the file is exhausted. Spring Batch drives the loop — you never write it yourself.

Basics — The Processor

basics/processing/PersonProcessor.java

public class PersonProcessor implements ItemProcessor<Person, Person> {

    private static final Logger log = LoggerFactory.getLogger(PersonProcessor.class);

    @Override
    public Person process(Person person) {
        String firstName = person.firstName().toUpperCase();
        String lastName  = person.lastName().toUpperCase();

        Person transformed = new Person(firstName, lastName);
        log.info("Converting {} to {}", person, transformed);

        return transformed;   // return null here to SKIP this item entirely
    }
}
The interface is generic: ItemProcessor<Input, Output>. Input and output types can differ — useful when translating between two domain models.
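For example, translating a raw CSV row into a clean domain object changes the type. A plain-Java sketch of that shape (the types here are hypothetical, mirroring the ItemProcessor<I, O> contract without the framework):

```java
// Hypothetical domain types: a raw CSV row in, a clean domain object out.
record RawRow(String firstName, String lastName) {}
record FullNamePerson(String fullName) {}

class RowToPersonProcessor {
    // Same shape as ItemProcessor<RawRow, FullNamePerson>.process:
    // input and output types differ; null still means "skip this item".
    static FullNamePerson process(RawRow row) {
        if (row.firstName() == null || row.firstName().isBlank()) return null;
        return new FullNamePerson(row.firstName().trim() + " " + row.lastName().trim());
    }
}
```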

Basics — Wiring the Step & Job

@Bean
public Step basicsStep(JobRepository repo, PlatformTransactionManager tx,
                       FlatFileItemReader<Person> reader,
                       PersonProcessor processor,
                       FlatFileItemWriter<Person> writer) {

    return new StepBuilder("basicsStep", repo)
        .<Person, Person>chunk(10, tx)   // chunk size = 10 items per transaction
        .reader(reader)
        .processor(processor)
        .writer(writer)
        .build();
}

@Bean
public Job basicsJob(JobRepository repo, Step basicsStep) {
    return new JobBuilder("basicsJob", repo)
        .start(basicsStep)
        .build();
}

Everything assembled with builders. Every piece is a Spring bean — testable, injectable, familiar.

The Advanced Demo

A global ecommerce sales pipeline · 41 markets · nightly batch job

The Scenario

The problem

A global ecommerce platform sells across 41 countries. Every night the ordering system dumps a raw CSV export of the day's transactions. The data is messy — mixed case, missing emails, bad amounts. The sales team needs a clean per-country revenue breakdown ready in their dashboard by 08:00.

The solution

A nightly Spring Batch job runs at 02:00, validates and normalises every order record, loads the clean data into a reporting database, then aggregates it into a per-country summary CSV that the dashboard reads on startup.

Step 1 — validate & load raw orders

nightly export (orders.csv)
  → validate & clean (CustomerProcessor)
  → ~9 200 clean orders (customers table)

Step 2 — aggregate for the dashboard

clean orders (customers table)
  → group by country (CountryStatisticsReader)
  → dashboard feed (country-statistics.csv)

  • 10k+ raw orders in
  • ~9 200 valid orders
  • ~800 rejected
  • 41 markets reported

Real-World Data Quality Problems

The nightly CSV export is generated by multiple regional order systems — they don't agree on formatting conventions.

id,firstName,lastName,email,country,purchaseAmount
1,alex,johnson,alex.j@email.com,south africa,523.45   ← lowercase name & country
2,MARIA,SMITH,,united states,-15.00                   ← guest checkout (no email) + bad amount
3,Wei,Chen,wei@test.com,china,341.20                  ← clean record

Inconsistent casing

Regional systems export names and countries in different formats. Normalised to "Alex" and "SOUTH AFRICA" for consistent grouping.

Guest checkouts

~5% of orders come from guests with no account — no email on file. These can't be attributed to a customer, so they're excluded from the report.

Erroneous amounts

Some regional systems write refund entries as negative purchase amounts instead of separate records. These are rejected to avoid skewing revenue figures.

CustomerProcessor — Filter & Clean

@Override
public Customer process(Customer c) {

    // FILTER — return null to skip this record silently
    if (c.email() == null || c.email().isBlank())             return null;
    if (c.purchaseAmount() == null || c.purchaseAmount() < 0) return null;

    // TRANSFORM
    return new Customer(
        c.id(),
        capitalize(c.firstName()),    // "ALEX" → "Alex"
        capitalize(c.lastName()),
        c.email().toLowerCase(),
        c.country().toUpperCase(),    // "south africa" → "SOUTH AFRICA"
        c.purchaseAmount()
    );
}
Returning null is the idiomatic Spring Batch way to skip a record — no exception, no special configuration, just return null.
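The capitalize() helper isn't shown on the slide; a minimal version, assuming simple single-word names and simplified locale handling, could look like:

```java
// Minimal sketch of a capitalize() helper as used above.
// Assumptions: single-word names, default locale is acceptable.
class Names {
    static String capitalize(String s) {
        if (s == null || s.isBlank()) return s;
        String t = s.trim().toLowerCase();
        return Character.toUpperCase(t.charAt(0)) + t.substring(1);
    }
}
```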

Step 1 — Writing to the Database

@Bean
public JdbcBatchItemWriter<Customer> customerWriter(DataSource ds) {
    return new JdbcBatchItemWriterBuilder<Customer>()
        .dataSource(ds)
        .sql("""
            INSERT INTO customers
                (id, first_name, last_name, email, country, purchase_amount)
            VALUES
                (:id, :firstName, :lastName, :email, :country, :purchaseAmount)
            """)
        .beanMapped()   // maps Java record fields to named SQL parameters
        .build();
}
Chunk = transaction. With chunk size 25, every 25 valid customers are inserted in a single batch inside one transaction. A failure rolls back only that chunk — previously committed chunks are untouched.

Step 2 — Custom Aggregating Reader

Step 2 needs to aggregate rows by country. No built-in reader does this — so we implement ItemReader ourselves.

public class CountryStatisticsReader implements ItemReader<CountryStatistics> {
    private Iterator<CountryStatistics> iterator;

    @Override
    public CountryStatistics read() {
        if (iterator == null) {
            // Called once on the first read — load and aggregate everything
            List<Customer> all = jdbc.query("SELECT * FROM customers", ...);
            Map<String, CountryStatistics> map = new HashMap<>();
            for (Customer c : all)
                map.merge(c.country(), new CountryStatistics(c), CountryStatistics::merge);
            iterator = map.values().iterator();
        }
        return iterator.hasNext() ? iterator.next() : null;  // null = done
    }
}

Step 2 — Country Statistics Output

country-statistics.csv

country,customerCount,totalRevenue,avgPurchase
CANADA,227,116600.54,513.66
VIETNAM,243,122213.53,502.94
TURKEY,212,99352.62,468.64
GERMANY,198,95432.11,481.98
...
ICELAND,1,850.75,850.75
(41 countries total)

CountryStatistics.java

public record CountryStatistics(
    String country,
    long   customerCount,
    double totalRevenue,
    double averagePurchaseAmount
) {}
CountryStatisticsProcessor rounds monetary values to 2 decimal places before the writer flushes.
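The rounding itself can be done with BigDecimal. A sketch of what that processor does (the actual class isn't shown on the slide):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch of the 2-decimal rounding CountryStatisticsProcessor performs
// before the writer flushes each chunk.
class Money {
    static double round2(double value) {
        return BigDecimal.valueOf(value)
                .setScale(2, RoundingMode.HALF_UP)
                .doubleValue();
    }
}
```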

Listeners, Error Handling & Testing

Listeners — Hooks at Every Level

JobExecutionListener

void beforeJob(JobExecution e);
void afterJob(JobExecution e);

Logs job name, start/end time, final status.

StepExecutionListener

void beforeStep(StepExecution e);
ExitStatus afterStep(StepExecution e);

Read count, write count, skip count per step.

ChunkListener

void beforeChunk(ChunkContext c);
void afterChunk(ChunkContext c);
void afterChunkError(ChunkContext c);

Running totals across each chunk.

new StepBuilder("processStep", repo)
    .listener(jobListener).listener(stepListener).listener(chunkListener)
    ...

Error Handling Strategies

return null from the processor
  Silently skips the item — used in this demo for invalid emails and negative amounts.
  Use when: bad data is expected and should be excluded.

.skip(Ex.class).skipLimit(N)
  Skips items that throw a specific exception, up to N times total.
  Use when: occasional bad records in otherwise good data.

.retry(Ex.class).retryLimit(N)
  Retries the chunk when a transient exception occurs.
  Use when: flaky external services, transient DB errors.

Chunk rollback
  Automatic — a failed write rolls back only the current chunk.
  Use when: always on, no configuration needed.
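Skip and retry are switched on through faultTolerant() on the step builder. A sketch (step name, exception choices, and limits are illustrative):

```java
@Bean
public Step importStep(JobRepository repo, PlatformTransactionManager tx,
                       ItemReader<Customer> reader, ItemWriter<Customer> writer) {
    return new StepBuilder("importStep", repo)
        .<Customer, Customer>chunk(25, tx)
        .reader(reader)
        .writer(writer)
        .faultTolerant()                            // unlocks skip/retry config
        .skip(FlatFileParseException.class)         // tolerate malformed lines...
        .skipLimit(10)                              // ...but at most 10 in total
        .retry(TransientDataAccessException.class)  // retry flaky DB writes...
        .retryLimit(3)                              // ...up to 3 attempts
        .build();
}
```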

Testing

Integration test — run the real job

@SpringBatchTest
@SpringBootTest
class BatchIntegrationTest {

    @Autowired JobLauncherTestUtils utils;

    @Test
    void customerJob_completesSuccessfully() throws Exception {
        JobExecution exec = utils.launchJob();

        assertEquals(COMPLETED, exec.getStatus());

        StepExecution step1 = getStep(exec, "processStep");
        assertEquals(13, step1.getWriteCount()); // 2 of 15 skipped
    }
}

Unit test — processor in isolation

class CountryStatisticsProcessorTest {

    @Test
    void roundsToTwoDecimals() {
        var input = new CountryStatistics(
            "CANADA", 10,
            116600.5432, 513.6612
        );
        var result = processor.process(input);

        assertThat(result.totalRevenue())
            .isEqualTo(116600.54);
        assertThat(result.averagePurchaseAmount())
            .isEqualTo(513.66);
    }
}

Live Demo

mvn spring-boot:run

github.com/darthrevanyunka/springbatch-demo

What to watch

In the console

  • Job start banner from DemoJobExecutionListener
  • Chunk progress: "Read 25, Written 25"
  • "Skipping customer …" warnings for invalid rows
  • Step execution summaries with counts
  • Job completion status & total duration

Output files to inspect

  • basics-output.csv — 5 uppercased names
  • country-statistics.csv — 41 countries with revenue totals
Both jobs run sequentially on startup via a CommandLineRunner. Each run passes a unique timestamp as a job parameter so it can run repeatedly without conflicting with previous instances.
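The unique-parameter trick looks roughly like this (a sketch; the parameter name and bean shape are illustrative):

```java
@Bean
public CommandLineRunner runJobs(JobLauncher launcher, Job basicsJob) {
    return args -> {
        JobParameters params = new JobParametersBuilder()
            .addLong("run.timestamp", System.currentTimeMillis()) // unique per run
            .toJobParameters();
        launcher.run(basicsJob, params); // new parameters => new JobInstance
    };
}
```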

Should I use Spring Batch?

Good fit ✓

  • Processing thousands to millions of records
  • Nightly / scheduled jobs
  • ETL pipelines
  • You need restart / resume on failure
  • Data transformation with an audit trail
  • You're already on the Spring stack

Probably overkill ✗

  • One-off migration script (just write a script)
  • Very small datasets (< a few hundred rows)
  • Real-time streams (Kafka Streams, Flink)
  • Simple scheduled tasks (@Scheduled is enough)

Key Takeaways

Questions?

Thanks for your attention

github.com/darthrevanyunka/springbatch-demo
mvn spring-boot:run  ·  docs.spring.io/spring-batch