Java Guild · 2026

Spring Batch

A Practical Introduction for Developers

Spring Boot 3.2 · Java 17 · H2 in-memory DB
Jürgen Roos
OCTO

Agenda

1. What is batch processing & why Spring Batch? (~10 min)
2. Core concepts — Jobs, Steps, Chunks, Readers, Processors, Writers (~10 min)
3. Demo 1 — The Basics (CSV → transform → CSV) (~10 min)
4. Demo 2 — Advanced (10 k customers · 2 steps · database · aggregation) (~15 min)
5. Error handling, listeners & testing (~10 min)
6. Live run & Q&A (~5 min)

What is Batch Processing?

Batch vs. Online Processing

Online / Request-Response

  • User triggers a single request
  • Response must come back fast (<1 s)
  • Small payload per call
  • Examples: REST APIs, web pages, mobile apps

Batch Processing

  • No human waiting for a response
  • Large volumes of data processed at once
  • Usually scheduled — overnight, end-of-month
  • Throughput over latency
Rule of thumb: If you're processing more records than a human can review in real time, you need batch.

When do you reach for batch?

  • Bank statement generation (month-end)
  • Payroll calculations
  • ETL — extract, transform, load
  • Sending bulk email / notifications
  • Data migration between systems
  • Nightly report generation
  • Inventory reconciliation
  • Log aggregation & analytics
Today's scenario: A global ecommerce platform exports 10 000+ raw order records every night. A Spring Batch job validates and cleans them, loads them into a reporting database, then aggregates revenue by country — so the sales dashboard is ready by morning.

Why Spring Batch?

You could write a loop. But then you'd also need to build:

Without a framework

  • Manual restart & recovery logic
  • Rolling back partial writes on failure
  • Progress tracking & logging
  • Memory management for huge files
  • Parallel processing plumbing
  • Job history & auditing

With Spring Batch

  • All of the above — built in
  • Clear, testable structure
  • Integrates with the Spring ecosystem
  • Production-proven at scale
  • Built-in I/O for CSV, JSON, JDBC, JPA, Kafka…
  • Declarative, familiar configuration

Core Concepts

The Building Blocks

Job                              // a named unit of batch work
├── Step                         // a discrete phase inside the job
│   └── Chunk<Input, Output>     // reads N items, processes each, writes all N
│       ├── ItemReader           // reads one item at a time
│       ├── ItemProcessor        // transforms / validates (optional)
│       └── ItemWriter           // writes the whole chunk at once
└── Step 2 ...                   // a job can have as many steps as needed
A Job is made of Steps. Each Step processes data in Chunks. The Reader loops one item at a time; the Writer flushes the whole batch — inside one transaction.

Chunk-Oriented Processing

Instead of read-all then write-all, data flows in bounded chunks — each chunk is one database transaction.

source (CSV / DB)
  → Reader (reads 1)
  → Processor (transforms 1)
  → Writer (writes chunk)
  → target (CSV / DB)

ItemReader

Returns one item per call.
Returns null when the source is exhausted — no loop needed.

ItemProcessor

Transforms or validates. Return null to skip an item. Entirely optional.

ItemWriter

Receives the whole Chunk<O>. Writes in bulk for efficiency and atomicity.
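Together these three form the loop that Spring Batch drives for you. The contract can be mimicked in plain Java to see the mechanics — all names below are illustrative, no framework classes involved:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Illustrative plain-Java version of the chunk loop Spring Batch runs internally.
class ChunkLoop {

    // "Reader" contract: one item per call, null when the source is exhausted.
    static <T> T read(Iterator<T> source) {
        return source.hasNext() ? source.next() : null;
    }

    // Read 1, process 1, buffer; flush the whole chunk at once when it is full.
    static <I, O> List<List<O>> run(List<I> source, Function<I, O> processor, int chunkSize) {
        List<List<O>> written = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        Iterator<I> it = source.iterator();
        I item;
        while ((item = read(it)) != null) {
            O out = processor.apply(item);      // "Processor" step
            if (out != null) chunk.add(out);    // null means: skip this item
            if (chunk.size() == chunkSize) {    // "Writer" flushes the whole chunk
                written.add(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) written.add(chunk); // final partial chunk
        return written;
    }
}
```

In the real framework each flushed chunk corresponds to one transaction commit.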

Spring Batch Keeps Its Own History

Every run is automatically stored in a JobRepository — a set of tables in your database.

Table                         Stores
BATCH_JOB_INSTANCE            Unique combination of job name + parameters
BATCH_JOB_EXECUTION           Each run: start time, end time, final status
BATCH_STEP_EXECUTION          Per-step metrics: read / write / skip counts
BATCH_JOB_EXECUTION_PARAMS    Parameters passed to the job
This is how Spring Batch supports restartability — it knows exactly which chunk a failed job reached and can pick up from there. In this demo we use H2 in-memory; in production, point it at Postgres or MySQL.
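Because the history is plain tables, you can inspect recent runs directly. A sketch using a JdbcTemplate against the standard schema (the surrounding bean wiring is omitted):

```java
// Sketch: list recent runs from the standard Spring Batch metadata schema.
jdbcTemplate.query(
        "SELECT JOB_EXECUTION_ID, STATUS, START_TIME, END_TIME " +
        "FROM BATCH_JOB_EXECUTION ORDER BY START_TIME DESC",
        (rs, rowNum) -> rs.getLong("JOB_EXECUTION_ID") + " " + rs.getString("STATUS"))
    .forEach(System.out::println);
```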

The Basics

CSV → uppercase names → CSV

Basics Job — Overview

input (input.csv)
  → Reader (FlatFileItemReader)
  → Processor (PersonProcessor)
  → Writer (FlatFileItemWriter)
  → output (basics-output.csv)

Input — input.csv

firstName,lastName
Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doe

Output — basics-output.csv

firstName,lastName
JILL,DOE
JOE,DOE
JUSTIN,DOE
JANE,DOE
JOHN,DOE

Basics — The Reader

basics/config/BasicsJobConfig.java

@Bean
public FlatFileItemReader<Person> basicsReader() {
    return new FlatFileItemReaderBuilder<Person>()
        .name("personItemReader")
        .resource(new ClassPathResource("basics/input.csv"))
        .delimited()
        .names("firstName", "lastName")   // CSV column names
        .targetType(Person.class)          // maps to POJO via reflection
        .build();
}
The reader returns one Person per call, and returns null when the file is exhausted. Spring Batch drives the loop — you never write it yourself.

Basics — The Processor

basics/processing/PersonProcessor.java

public class PersonProcessor implements ItemProcessor<Person, Person> {

    private static final Logger log = LoggerFactory.getLogger(PersonProcessor.class);

    @Override
    public Person process(Person person) {
        String firstName = person.firstName().toUpperCase();
        String lastName  = person.lastName().toUpperCase();

        Person transformed = new Person(firstName, lastName);
        log.info("Converting {} to {}", person, transformed);

        return transformed;   // return null here to SKIP this item entirely
    }
}
The interface is generic: ItemProcessor<Input, Output>. Input and output types can differ — useful when translating between two domain models.
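For example, translating a raw CSV row into a clean domain object changes the type. A plain-Java sketch of that shape (the types here are hypothetical, mirroring the ItemProcessor<I, O> contract without the framework):

```java
// Hypothetical domain types: a raw CSV row in, a clean domain object out.
record RawRow(String firstName, String lastName) {}
record FullNamePerson(String fullName) {}

class RowToPersonProcessor {
    // Same shape as ItemProcessor<RawRow, FullNamePerson>.process:
    // input and output types differ; null still means "skip this item".
    static FullNamePerson process(RawRow row) {
        if (row.firstName() == null || row.firstName().isBlank()) return null;
        return new FullNamePerson(row.firstName().trim() + " " + row.lastName().trim());
    }
}
```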

Basics — Wiring the Step & Job

@Bean
public Step basicsStep(JobRepository repo, PlatformTransactionManager tx,
                       FlatFileItemReader<Person> reader,
                       PersonProcessor processor,
                       FlatFileItemWriter<Person> writer) {

    return new StepBuilder("basicsStep", repo)
        .<Person, Person>chunk(10, tx)   // chunk size = 10 items per transaction
        .reader(reader)
        .processor(processor)
        .writer(writer)
        .build();
}

@Bean
public Job basicsJob(JobRepository repo, Step basicsStep) {
    return new JobBuilder("basicsJob", repo)
        .start(basicsStep)
        .build();
}

Everything assembled with builders. Every piece is a Spring bean — testable, injectable, familiar.

The Advanced Demo

A global ecommerce sales pipeline · 41 markets · nightly batch job

The Scenario

The problem

A global ecommerce platform sells across 41 countries. Every night the ordering system dumps a raw CSV export of the day's transactions. The data is messy — mixed case, missing emails, bad amounts. The sales team needs a clean per-country revenue breakdown ready in their dashboard by 08:00.

The solution

A nightly Spring Batch job runs at 02:00, validates and normalises every order record, loads the clean data into a reporting database, then aggregates it into a per-country summary CSV that the dashboard reads on startup.

Step 1 — validate & load raw orders

nightly export (orders.csv)
  → validate & clean (CustomerProcessor)
  → ~9 200 clean orders (customers table)

Step 2 — aggregate for the dashboard

clean orders (customers table)
  → group by country (CountryStatisticsReader)
  → dashboard feed (country-statistics.csv)

  • 10k+ raw orders in
  • ~9 200 valid orders
  • ~800 rejected
  • 41 markets reported

Real-World Data Quality Problems

The nightly CSV export is generated by multiple regional order systems — they don't agree on formatting conventions.

id,firstName,lastName,email,country,purchaseAmount
1,alex,johnson,alex.j@email.com,south africa,523.45   ← lowercase name & country
2,MARIA,SMITH,,united states,-15.00                   ← guest checkout (no email) + bad amount
3,Wei,Chen,wei@test.com,china,341.20                  ← clean record

Inconsistent casing

Regional systems export names and countries in different formats. Normalised to "Alex" and "SOUTH AFRICA" for consistent grouping.

Guest checkouts

~5% of orders come from guests with no account — no email on file. These can't be attributed to a customer, so they're excluded from the report.

Erroneous amounts

Some regional systems write refund entries as negative purchase amounts instead of separate records. These are rejected to avoid skewing revenue figures.

CustomerProcessor — Filter & Clean

@Override
public Customer process(Customer c) {

    // FILTER — return null to skip this record silently
    if (c.email() == null || c.email().isBlank())             return null;
    if (c.purchaseAmount() == null || c.purchaseAmount() < 0) return null;

    // TRANSFORM
    return new Customer(
        c.id(),
        capitalize(c.firstName()),    // "ALEX" → "Alex"
        capitalize(c.lastName()),
        c.email().toLowerCase(),
        c.country().toUpperCase(),    // "south africa" → "SOUTH AFRICA"
        c.purchaseAmount()
    );
}
Returning null is the idiomatic Spring Batch way to skip a record — no exception, no special configuration, just return null.
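The capitalize() helper isn't shown on the slide; a minimal version, assuming simple single-word names and simplified locale handling, could look like:

```java
// Minimal sketch of a capitalize() helper as used above.
// Assumptions: single-word names, default locale is acceptable.
class Names {
    static String capitalize(String s) {
        if (s == null || s.isBlank()) return s;
        String t = s.trim().toLowerCase();
        return Character.toUpperCase(t.charAt(0)) + t.substring(1);
    }
}
```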

Step 1 — Writing to the Database

@Bean
public JdbcBatchItemWriter<Customer> customerWriter(DataSource ds) {
    return new JdbcBatchItemWriterBuilder<Customer>()
        .dataSource(ds)
        .sql("""
            INSERT INTO customers
                (id, first_name, last_name, email, country, purchase_amount)
            VALUES
                (:id, :firstName, :lastName, :email, :country, :purchaseAmount)
            """)
        .beanMapped()   // maps Java record fields to named SQL parameters
        .build();
}
Chunk = transaction. With chunk size 25, every 25 valid customers are inserted in a single batch inside one transaction. A failure rolls back only that chunk — previously committed chunks are untouched.

Step 2 — Custom Aggregating Reader

Step 2 needs to aggregate rows by country. No built-in reader does this — so we implement ItemReader ourselves.

public class CountryStatisticsReader implements ItemReader<CountryStatistics> {
    private Iterator<CountryStatistics> iterator;

    @Override
    public CountryStatistics read() {
        if (iterator == null) {
            // Called once on the first read — load and aggregate everything
            List<Customer> all = jdbc.query("SELECT * FROM customers", ...);
            Map<String, CountryStatistics> map = new HashMap<>();
            for (Customer c : all)
                map.merge(c.country(), new CountryStatistics(c), CountryStatistics::merge);
            iterator = map.values().iterator();
        }
        return iterator.hasNext() ? iterator.next() : null;  // null = done
    }
}

Step 2 — Country Statistics Output

country-statistics.csv

country,customerCount,totalRevenue,avgPurchase
CANADA,227,116600.54,513.66
VIETNAM,243,122213.53,502.94
TURKEY,212,99352.62,468.64
GERMANY,198,95432.11,481.98
...
ICELAND,1,850.75,850.75
(41 countries total)

CountryStatistics.java

public record CountryStatistics(
    String country,
    long   customerCount,
    double totalRevenue,
    double averagePurchaseAmount
) {}
CountryStatisticsProcessor rounds monetary values to 2 decimal places before the writer flushes.
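The rounding itself can be done with BigDecimal. A sketch of what that processor does (the actual class isn't shown on the slide):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch of the 2-decimal rounding CountryStatisticsProcessor performs
// before the writer flushes each chunk.
class Money {
    static double round2(double value) {
        return BigDecimal.valueOf(value)
                .setScale(2, RoundingMode.HALF_UP)
                .doubleValue();
    }
}
```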

Listeners, Error Handling & Testing

Listeners — Hooks at Every Level

JobExecutionListener

void beforeJob(JobExecution e);
void afterJob(JobExecution e);

Logs job name, start/end time, final status.

StepExecutionListener

void beforeStep(StepExecution e);
ExitStatus afterStep(StepExecution e);

Read count, write count, skip count per step.

ChunkListener

void beforeChunk(ChunkContext c);
void afterChunk(ChunkContext c);
void afterChunkError(ChunkContext c);

Running totals across each chunk.

new StepBuilder("processStep", repo)
    .listener(jobListener).listener(stepListener).listener(chunkListener)
    ...

Error Handling Strategies

return null from the processor
  Silently skips the item — used in this demo for invalid emails and negative amounts.
  Use when: bad data is expected and should be excluded.

.skip(Ex.class).skipLimit(N)
  Skips items that throw a specific exception, up to N times total.
  Use when: occasional bad records in otherwise good data.

.retry(Ex.class).retryLimit(N)
  Retries the chunk when a transient exception occurs.
  Use when: flaky external services, transient DB errors.

Chunk rollback
  Automatic — a failed write rolls back only the current chunk.
  Use when: always on, no configuration needed.
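Skip and retry are switched on through faultTolerant() on the step builder. A sketch (step name, exception choices, and limits are illustrative):

```java
@Bean
public Step importStep(JobRepository repo, PlatformTransactionManager tx,
                       ItemReader<Customer> reader, ItemWriter<Customer> writer) {
    return new StepBuilder("importStep", repo)
        .<Customer, Customer>chunk(25, tx)
        .reader(reader)
        .writer(writer)
        .faultTolerant()                            // unlocks skip/retry config
        .skip(FlatFileParseException.class)         // tolerate malformed lines...
        .skipLimit(10)                              // ...but at most 10 in total
        .retry(TransientDataAccessException.class)  // retry flaky DB writes...
        .retryLimit(3)                              // ...up to 3 attempts
        .build();
}
```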

Testing

Integration test — run the real job

@SpringBatchTest
@SpringBootTest
class BatchIntegrationTest {

    @Autowired JobLauncherTestUtils utils;

    @Test
    void customerJob_completesSuccessfully() throws Exception {
        JobExecution exec = utils.launchJob();

        assertEquals(COMPLETED, exec.getStatus());

        StepExecution step1 = getStep(exec, "processStep");
        assertEquals(13, step1.getWriteCount()); // 2 of 15 skipped
    }
}

Unit test — processor in isolation

class CountryStatisticsProcessorTest {

    @Test
    void roundsToTwoDecimals() {
        var input = new CountryStatistics(
            "CANADA", 10,
            116600.5432, 513.6612
        );
        var result = processor.process(input);

        assertThat(result.totalRevenue())
            .isEqualTo(116600.54);
        assertThat(result.averagePurchaseAmount())
            .isEqualTo(513.66);
    }
}

Live Demo

mvn spring-boot:run

github.com/darthrevanyunka/springbatch-demo

What to watch

In the console

  • Job start banner from DemoJobExecutionListener
  • Chunk progress: "Read 25, Written 25"
  • "Skipping customer …" warnings for invalid rows
  • Step execution summaries with counts
  • Job completion status & total duration

Output files to inspect

  • basics-output.csv — 5 uppercased names
  • country-statistics.csv — 41 countries with revenue totals
Both jobs run sequentially on startup via a CommandLineRunner. Each run passes a unique timestamp as a job parameter so it can run repeatedly without conflicting with previous instances.
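The unique-parameter trick looks roughly like this (a sketch; the parameter name and bean shape are illustrative):

```java
@Bean
public CommandLineRunner runJobs(JobLauncher launcher, Job basicsJob) {
    return args -> {
        JobParameters params = new JobParametersBuilder()
            .addLong("run.timestamp", System.currentTimeMillis()) // unique per run
            .toJobParameters();
        launcher.run(basicsJob, params); // new parameters => new JobInstance
    };
}
```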

Should I use Spring Batch?

Good fit ✓

  • Processing thousands to millions of records
  • Nightly / scheduled jobs
  • ETL pipelines
  • You need restart / resume on failure
  • Data transformation with an audit trail
  • You're already on the Spring stack

Probably overkill ✗

  • One-off migration script (just write a script)
  • Very small datasets (< a few hundred rows)
  • Real-time streams (Kafka Streams, Flink)
  • Simple scheduled tasks (@Scheduled is enough)

Key Takeaways

Questions?

Thanks for your attention

github.com/darthrevanyunka/springbatch-demo
mvn spring-boot:run  ·  docs.spring.io/spring-batch