In this section, you will dive into the world of batch processing! As an essential tool in data processing and automation, batch processing involves executing a series of jobs where large volumes of data can be processed in batches. Spring provides a robust and feature-rich batch processing framework for your Java projects, and it's time to learn all about its features and associated terminologies. Sounds exciting? Time to get started!
This lesson offers an introduction to Spring Batch. You will learn what batch processing is, see examples of batch processing in the real world, and meet the fundamental Spring Batch components that manage and execute batch processes in your Spring Boot projects.
What is Batch Processing?
As the name suggests, batch processing refers to running complex tasks on bulk data, in batches, without any user interaction. These tasks can involve heavy computation and run for long periods of time. Very often, your business requirements demand processed output from bulk data: syncing databases, preparing reports, or periodically executing tasks on large data sets. Batch processing helps you perform these tasks with ease.
Why is Batch Processing Useful?
You might wonder why this matters if you haven't had to deal with bulk data processing yet. Suppose you have to report on Apple's stock for the last five years. This report must tell the user how often the intraday price moved past a certain threshold. You can assume this threshold is the previous day's close price. Sounds easy? It is quite easy in fact! All you need to do is pass the intraday data to your service, have an if-else statement that checks the price, and observe if it's above or below the threshold. Right? Well, partly.
Intraday data here means tick-by-tick data for a single day. This amounts to 60 ticks every minute (considering a tick here is a second), which is 60 * 60 = 3,600 ticks every hour and 60 * 60 * 8 ticks every day (considering 8 trading hours in the day). This totals 28,800 ticks for one single day! For one stock! If you do this for five years, imagine the amount of data you will have. How do you manage so much data? You can use Spring Batch. It breaks down your data into manageable batches and performs your required operations on each one. It is quick, reliable, and straightforward to set up!
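To make the arithmetic above concrete, here is a small plain-Java sketch that reproduces the tick counts. The class name TickMath is made up for this illustration, and the figure of 252 trading days per year is an assumption that does not come from the lesson; adjust it to whatever calendar you actually use.

```java
// Worked arithmetic for the tick counts described in the text.
// TRADING_DAYS_PER_YEAR is an assumption for illustration only.
class TickMath {
    static final int TICKS_PER_MINUTE = 60;       // one tick per second
    static final int MINUTES_PER_HOUR = 60;
    static final int HOURS_PER_DAY = 8;           // trading hours assumed in the text
    static final int TRADING_DAYS_PER_YEAR = 252; // assumption, not from the lesson

    static long ticksPerHour() {
        return (long) TICKS_PER_MINUTE * MINUTES_PER_HOUR;
    }

    static long ticksPerDay() {
        return ticksPerHour() * HOURS_PER_DAY;
    }

    static long ticksForYears(int years) {
        return ticksPerDay() * TRADING_DAYS_PER_YEAR * years;
    }

    public static void main(String[] args) {
        System.out.println("Ticks per hour: " + ticksPerHour());     // prints 3600
        System.out.println("Ticks per day: " + ticksPerDay());       // prints 28800
        System.out.println("Ticks in 5 years: " + ticksForYears(5)); // prints 36288000
    }
}
```

Under these assumptions, five years of data for a single stock comes to roughly 36 million ticks, which is why processing it in batches is attractive.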
Real-World Examples of Batch Processing
Every app that deals with huge amounts of data, be it your favorite social networking website, your credit card provider, or the airline company that you love, has batch jobs running on its infrastructure that are essential for supporting the business. Here you will see some of the most common examples that utilize the power of batch processing.
Transaction Management
Transaction management systems heavily depend on batch processing, especially those used in banks. Banks generally process transactions 24/7, and batch processing helps them do this faster and in a very well-defined way.
These processes range from generating the monthly bank statements for every bank customer (you can imagine how big and taxing it can be) to generating the entire bank's balance sheet and account statement. Such automated batch processes also help the bank detect and deter fraud, as well as find faults in their own systems.
Billing
What kind of firms are covered in this topic? Any company that provides subscription-based services, such as Netflix or your phone service provider. They all use batch-processing jobs to process your data and generate various outputs, one of which is surely the monthly bill that you receive in your inbox.
Research
This may not seem as believable as the rest of the examples in this section, but believe it or not, batch processing is extensively used in doing some of the most cutting-edge and complex research on the planet. A researcher would generally be running a batch job to perform complex computations as a part of their research.
Health Care
In the health care sector, batch processing plays a crucial role in managing and analyzing patient data. Hospitals and health care providers use batch jobs to process medical records, lab results, and patient histories to improve patient care and efficiency. Tying in with the example above, batch processing can also be used to research trends in patient admissions and to track the spread of diseases.
Inventory Management
Batch processing is extensively used in the e-commerce industry, particularly for inventory management. Online retailers and marketplaces run batch jobs to update inventory levels, process orders, and manage supply chain logistics.
These are just a few examples of where batch processing is used in real life. Plenty of other industries heavily depend on batch processing, and if all of them were listed here it would be a VERY BIG section! :)
Now on to how this ties in with the Spring Framework.
What is Spring Batch?
Spring Batch is a Java framework for batch processing. It is open-source, lightweight, and comprehensive, and it provides reusable functions that are essential to processing large volumes of data. It also provides other useful features such as logging, tracing, transaction management, job processing statistics, job start/stop/restart/retry/skip, and resource management.
According to the Spring Batch documentation, it provides the following features:
- Transaction Management
- Chunk-based processing
- Declarative I/O
- Start/Stop/Restart
- Retry/Skip
- Web-based administration interface via Spring Cloud Data Flow
Components of Spring Batch Processing
The diagram above shows all the basic components of a typical batch process. However, there are other components as well that are designed for advanced usage. You will get introduced to the essential components first and then dive into the practical usage.
Job
Batch processes are called jobs. A job is the high-level context of what you are going to do. It can be anything like syncing databases, preparing reports, sending emails or messages, cleaning databases or temporary file storage, etc. Simply put, any task that you do in batch processing is either a complete job or part of one. For example, you might define a job that fetches data from a database.
Step
Step is the next critical component in batch processing. While the job describes what you want to do, a step is the modular structure within it. Jobs can consist of one or more steps. A step can be configured to work in "chunks", or groups of data items, so that each chunk is processed together. Continuing the earlier database example, the data fetching job might consist of multiple steps such as connecting to the DB, creating the query, running the query, processing the output, etc. Simply put, these are the "steps" you would take to achieve your goal!
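The relationship between a job and its steps can be sketched in plain Java. This is only an illustration of the idea: SimpleStep, step, and runJob are hypothetical names invented here, not Spring Batch types, and stopping at the first failing step is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Plain-Java sketch; SimpleStep, step, and runJob are hypothetical names,
// not part of the Spring Batch API.
class JobSketch {
    // A step is one named unit of work that reports success or failure.
    interface SimpleStep {
        String name();
        boolean execute();
    }

    // Helper that builds a step from a name and a piece of work.
    static SimpleStep step(String name, BooleanSupplier work) {
        return new SimpleStep() {
            @Override public String name() { return name; }
            @Override public boolean execute() { return work.getAsBoolean(); }
        };
    }

    // A job runs its steps in order and stops at the first failing step.
    // Returns the names of the steps that completed successfully.
    static List<String> runJob(List<SimpleStep> steps) {
        List<String> completed = new ArrayList<>();
        for (SimpleStep s : steps) {
            if (!s.execute()) {
                break;
            }
            completed.add(s.name());
        }
        return completed;
    }

    public static void main(String[] args) {
        // The data fetching job from the text, with placeholder work in each step.
        List<SimpleStep> fetchDataJob = List.of(
            step("connect to the DB", () -> true),
            step("create the query", () -> true),
            step("run the query", () -> true),
            step("process the output", () -> true));
        System.out.println(runJob(fetchDataJob));
    }
}
```

Each step here simply returns true; in a real job, each one would do the actual work and report whether it succeeded.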
ItemReader, ItemProcessor and ItemWriter
ItemReader reads data from an external or internal source (database, file, URL, etc.) and supplies it, item by item and in your desired object format, to an ItemProcessor. The ItemProcessor transforms each item, and the processed items are collected into a list that is handed to an ItemWriter for saving. Based on the same example, if the SQL query returns row objects and you need JSON, the ItemReader could read the rows and pass each one to the ItemProcessor, which converts it to JSON. The ItemWriter could then write the JSON output into a DB or a file, or push it to another location. So, the flow looks like Read -> Process -> Write.
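The Read -> Process -> Write flow can be sketched in plain Java as follows. The Reader, Processor, and Writer interfaces below are simplified stand-ins written for this illustration; they only loosely mirror the roles of ItemReader, ItemProcessor, and ItemWriter and are not the Spring Batch API. The sample prices and the chunk size of 2 are made-up values, and the threshold check reuses the stock example from earlier.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch of chunk-oriented processing; all names are hypothetical.
class ReadProcessWriteSketch {
    interface Reader<I> { I read(); }                  // returns null when input is exhausted
    interface Processor<I, O> { O process(I item); }   // transforms one item
    interface Writer<O> { void write(List<O> chunk); } // saves a whole chunk at once

    // Reads items one by one, processes each, and writes them out in chunks.
    // Returns the number of chunks written.
    static <I, O> int runStep(Reader<I> reader, Processor<I, O> processor,
                              Writer<O> writer, int chunkSize) {
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) {
                writer.write(new ArrayList<>(chunk)); // hand over a copy of the full chunk
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {                       // write the final partial chunk
            writer.write(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        Iterator<Double> prices = List.of(101.0, 99.5, 102.3, 100.0, 98.7).iterator();
        double previousClose = 100.0;                 // threshold from the stock example
        List<List<String>> written = new ArrayList<>();

        Reader<Double> reader = () -> prices.hasNext() ? prices.next() : null;
        Processor<Double, String> processor =
            price -> price > previousClose ? "ABOVE" : "NOT_ABOVE";
        Writer<String> writer = chunk -> written.add(chunk);

        int chunks = runStep(reader, processor, writer, 2);
        System.out.println(chunks + " chunks: " + written);
        // prints: 3 chunks: [[ABOVE, NOT_ABOVE], [ABOVE, NOT_ABOVE], [NOT_ABOVE]]
    }
}
```

Writing a whole chunk at once, rather than item by item, is what lets a batch framework treat each chunk as a unit of work.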
JobLauncher
JobLauncher is the starting point for the execution of any job in the Spring Batch framework. It can be called manually via command line, web interface, or automatically via a scheduler or your Java code.
JobRepository
As the name suggests, JobRepository is where all the information related to your jobs is stored. It holds the statistical information of how many jobs were run, the status of each batch, processed data, skipped data, etc.
JobExplorer
The JobExplorer interface provides an entry point for browsing executions of running or historical jobs and steps. It is helpful for accessing information in the JobRepository.
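To see how these three roles relate, here is a plain-Java sketch with an in-memory store. Launcher, InMemoryRepository, Explorer, and Execution are hypothetical stand-ins invented for this illustration; they are not the Spring Batch classes, and the status strings are simply labels chosen for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch; every type here is a hypothetical stand-in.
class JobTrackingSketch {
    // One record per job run.
    static final class Execution {
        final String jobName;
        final String status; // "COMPLETED" or "FAILED" in this sketch
        Execution(String jobName, String status) {
            this.jobName = jobName;
            this.status = status;
        }
        @Override public String toString() { return jobName + ":" + status; }
    }

    // Plays the role of the JobRepository: stores execution records.
    static final class InMemoryRepository {
        private final List<Execution> executions = new ArrayList<>();
        void save(Execution execution) { executions.add(execution); }
        List<Execution> all() { return new ArrayList<>(executions); }
    }

    // Plays the role of the JobLauncher: runs a job and records the outcome.
    static final class Launcher {
        private final InMemoryRepository repository;
        Launcher(InMemoryRepository repository) { this.repository = repository; }

        Execution launch(String jobName, Runnable job) {
            String status;
            try {
                job.run();
                status = "COMPLETED";
            } catch (RuntimeException e) {
                status = "FAILED";
            }
            Execution execution = new Execution(jobName, status);
            repository.save(execution);
            return execution;
        }
    }

    // Plays the role of the JobExplorer: read-only queries over stored records.
    static final class Explorer {
        private final InMemoryRepository repository;
        Explorer(InMemoryRepository repository) { this.repository = repository; }

        List<Execution> findByJobName(String jobName) {
            List<Execution> matches = new ArrayList<>();
            for (Execution e : repository.all()) {
                if (e.jobName.equals(jobName)) {
                    matches.add(e);
                }
            }
            return matches;
        }
    }

    public static void main(String[] args) {
        InMemoryRepository repository = new InMemoryRepository();
        Launcher launcher = new Launcher(repository);
        Explorer explorer = new Explorer(repository);

        launcher.launch("syncDatabases", () -> { /* placeholder work */ });
        launcher.launch("prepareReport", () -> { throw new IllegalStateException("no data"); });

        System.out.println(explorer.findByJobName("syncDatabases")); // prints [syncDatabases:COMPLETED]
        System.out.println(explorer.findByJobName("prepareReport")); // prints [prepareReport:FAILED]
    }
}
```

The launcher is the only part that writes to the store, while the explorer only reads from it, which mirrors the division of responsibilities described above.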
Summary: Spring Batch
- Batch processing refers to performing complex tasks on bulk data without user interaction.
- Spring Batch is a framework for supporting batch processing in Java. It is open-source, lightweight, and comprehensive.
- Although it looks simple and straightforward, batch processing is one of the most important techniques used in application development.
Spring Batch Components
- A job is the high-level context of what you are going to do.
- A step is the modular structure within a job.
- ItemReader loads data from external or internal sources and supplies it to an ItemProcessor.
- ItemProcessor processes each item, and ItemWriter saves the processed items.
- JobLauncher is the starting point for the execution of any job in the Spring Batch framework.
- JobRepository stores information related to your jobs.
- The JobExplorer interface provides a way to access information about executions of running or historical jobs and steps.
Examples of Batch Processing
- Transaction Management
- Billing
- Research
- Health Care
- Inventory Management
Now let's get to implementing Spring Batch into a project!