Christopher Jeffers August 2012

1 Spring Batch

2 Agenda
- Intro to Spring Batch and Use-Cases
- Spring Batch Technical Explanation
  - Architecture
  - The Batch Job
  - Skipping and Retrying Steps
  - Scaling Features
- Spring Batch Evaluation
  - Solving Use-Cases
  - Benefits
  - Issues
  - Integration Options
- Future Steps

3 Spring Batch Overview
- Lightweight framework designed to enable the development of robust batch applications for enterprise systems
- As part of Spring, it builds on the ease of use of Spring's POJO-based development approach, while making it easy for developers to use more advanced enterprise services when necessary
- Provides reusable functions that are essential for processing large volumes of data (e.g. transaction management, job restart, skip logic, processing statistics)
- Provides scaling features for Spring Batch jobs, including multi-threading and massive parallelism

4 Batch Use-Cases
DataRoomBatch
- Physically delete all rows marked for deletion from a given bucket (DeepSix)
- Rerun user documents through the publishing workflow
- Proactively audit the environment
Public Records Batch Processing
- The user inputs a file with search criteria for many individuals; the program searches the database for changes in their information and returns a report of hits to the user
- Read, Process, and Write sequence
- Satisfies government and corporate requirements

5 Reason for a Spring Batch Proof of Concept (POC)
- The current batch system for public records is not powerful enough to handle very large requests
- Customers have had to be turned away because of this
- A more powerful and flexible batch solution could solve this problem

6 Agenda (repeated from slide 2)

7 Architecture
Layered architecture:
- Application: contains all batch jobs and custom code
- Batch Core: contains the runtime classes necessary to launch and control a batch job
- Batch Infrastructure: contains common readers, writers, and services used by both the application and the core framework

8 The Batch Job
- A Job entity encapsulates an entire batch process
- A Job is composed of Steps, each of which encapsulates one phase of the batch job
- A Step can be as simple or as complex as the developer needs

9 Chunk Processing
Typical Spring Batch Step:
- Read, Process, Write sequence
- Multiple items are read and processed, then written together as one "chunk" in a single transaction
- The chunk size is declared in the configuration (commit-interval)
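
The chunk loop can be sketched in plain Java. This is a conceptual model, not the Spring Batch API. The java.util.function types stand in for ItemReader, ItemProcessor and ItemWriter, and the class name ChunkStep is hypothetical. The sketch assumes contracts that mirror Spring Batch's:
- The reader returns null when input runs out.
- A null from the processor filters the item out.
- Each chunk of up to commit-interval items read is handed to the writer in one call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-Java model of a chunk-oriented step; NOT the Spring Batch API.
// The reader returns null at end of input; a null processor result filters the item.
class ChunkStep {

    /** Runs read -> process -> write and returns the number of chunks written. */
    static <I, O> int run(Supplier<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        int chunks = 0;
        while (true) {
            // 1. Read up to commitInterval items.
            List<I> items = new ArrayList<>();
            I item;
            while (items.size() < commitInterval && (item = reader.get()) != null) {
                items.add(item);
            }
            if (items.isEmpty()) {
                return chunks; // input exhausted
            }
            // 2. Process each item, dropping filtered (null) results.
            List<O> outputs = new ArrayList<>();
            for (I candidate : items) {
                O out = processor.apply(candidate);
                if (out != null) {
                    outputs.add(out);
                }
            }
            // 3. Hand the whole chunk to the writer in one call; in Spring Batch
            //    the chunk's transaction commits after this write.
            writer.accept(outputs);
            chunks++;
        }
    }

    /** Adapts an Iterator into a reader that returns null at the end. */
    static <T> Supplier<T> readerOf(Iterator<T> it) {
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

Take seven input items with a commit interval of 3. The writer is called three times, for 3, 3 and 1 items read, minus any items the processor filtered out.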

10 Step Flow
- Steps can be configured to run sequentially or conditionally, based on the exit status of the previous step
- This allows for fairly complex jobs
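
The following is a minimal plain-Java sketch of conditional flow, assuming each step reports an exit status string. StepFlow and its methods are hypothetical. In Spring Batch the equivalent transitions are declared in the job configuration (e.g. next elements with on/to attributes), not coded like this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java model of conditional step flow; NOT the Spring Batch API.
// Each step returns an exit status; a transition maps (step, status) to the
// next step, with "*" matching any status. No matching transition ends the job.
class StepFlow {
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, String> transitions = new HashMap<>();

    StepFlow step(String name, Supplier<String> body) {
        steps.put(name, body);
        return this;
    }

    StepFlow on(String from, String status, String to) {
        transitions.put(from + ":" + status, to);
        return this;
    }

    /** Runs the job from the start step; returns the executed step names in order. */
    List<String> run(String start) {
        List<String> executed = new ArrayList<>();
        String current = start;
        while (current != null) {
            Supplier<String> body = steps.get(current);
            if (body == null) {
                throw new IllegalStateException("Unknown step: " + current);
            }
            String status = body.get();
            executed.add(current);
            // An exact status match wins over the "*" wildcard.
            String next = transitions.get(current + ":" + status);
            if (next == null) {
                next = transitions.get(current + ":*");
            }
            current = next;
        }
        return executed;
    }
}
```

Checking the exact status before the wildcard mirrors the idea that a specific transition (such as FAILED going to a cleanup step) should take precedence over a catch-all.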

11 Job Repository
- The JobRepository performs CRUD operations on the metadata of Job and Step executions
- Examples: job parameters, job/step status, etc.
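
To make the role of this metadata concrete, here is an in-memory sketch of the create, read and update operations. It is a hypothetical class, not Spring Batch's JobRepository, which normally persists to database tables. The sketch works as follows:
- It identifies a job instance by job name plus parameters.
- It records one execution per launch of that instance.
- Like Spring Batch, it refuses to launch an instance that is still running or has already completed, while a failed instance can be launched again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of job metadata; NOT Spring Batch's JobRepository.
// A job instance is identified by job name + parameters, and every launch of
// that instance is recorded as a separate execution with its own status.
class InMemoryJobRepository {
    enum Status { STARTED, COMPLETED, FAILED }

    static final class Execution {
        final long id;
        Status status = Status.STARTED;

        Execution(long id) {
            this.id = id;
        }
    }

    private final Map<String, List<Execution>> executionsByInstance = new HashMap<>();
    private long nextId = 1;

    // TreeMap sorts the parameters so equal parameter sets give equal keys.
    private static String instanceKey(String jobName, Map<String, String> params) {
        return jobName + new TreeMap<>(params);
    }

    /** Create: records a new execution, refusing running or completed instances. */
    Execution createExecution(String jobName, Map<String, String> params) {
        List<Execution> history = executionsByInstance
                .computeIfAbsent(instanceKey(jobName, params), k -> new ArrayList<>());
        if (!history.isEmpty()) {
            Status last = history.get(history.size() - 1).status;
            if (last != Status.FAILED) {
                throw new IllegalStateException("Instance is running or already complete");
            }
        }
        Execution execution = new Execution(nextId++);
        history.add(execution);
        return execution;
    }

    /** Update: records the outcome of an execution. */
    void update(Execution execution, Status status) {
        execution.status = status;
    }

    /** Read: all executions of one job instance, oldest first. */
    List<Execution> history(String jobName, Map<String, String> params) {
        return Collections.unmodifiableList(executionsByInstance
                .getOrDefault(instanceKey(jobName, params), Collections.emptyList()));
    }
}
```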

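12 Step Skipping
- Within a chunk-oriented step, if an exception listed as skippable in the configuration is thrown, the offending item is skipped rather than failing the step
- Used for exceptions that will be thrown on every attempt with the same item, e.g. parse exceptions such as FlatFileParseException
- A skip limit (skip-limit) caps how many items may be skipped before the step fails
- A SkipListener can be used to log skipped items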
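
The skip rule can be sketched in plain Java as follows. This is an illustrative model, not Spring Batch's implementation, and it only covers failures during processing (Spring Batch can also skip on read and write). The class name is hypothetical, and the onSkip callback stands in for a SkipListener.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Plain-Java model of skip logic; NOT the Spring Batch API.
// An item whose processing throws a skippable exception is dropped and
// reported to a listener; once more than skipLimit items have been skipped,
// the "step" fails. Non-skippable exceptions fail it immediately.
class SkippingProcessor {

    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor,
                                     List<Class<? extends RuntimeException>> skippable,
                                     int skipLimit, BiConsumer<I, RuntimeException> onSkip) {
        List<O> results = new ArrayList<>();
        int skips = 0;
        for (I item : items) {
            try {
                results.add(processor.apply(item));
            } catch (RuntimeException e) {
                if (!matches(e, skippable)) {
                    throw e; // fatal error: fail the step
                }
                if (++skips > skipLimit) {
                    throw new IllegalStateException("Skip limit of " + skipLimit + " exceeded", e);
                }
                onSkip.accept(item, e); // e.g. log the bad record, like a SkipListener
            }
        }
        return results;
    }

    private static boolean matches(RuntimeException e, List<Class<? extends RuntimeException>> types) {
        for (Class<? extends RuntimeException> type : types) {
            if (type.isInstance(e)) {
                return true;
            }
        }
        return false;
    }
}
```

For instance, parsing the strings "1", "x", "3" with Integer::parseInt, with NumberFormatException marked skippable and a skip limit of 1, yields [1, 3] and reports "x" as skipped.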

13 Retrying Steps
- If an exception listed as retryable in the configuration is thrown, the failed operation is attempted again instead of failing the step
- Used for transient exceptions that may not be thrown on every attempt, e.g. ConcurrencyFailureException, DeadlockLoserDataAccessException
- A limit can be set on the number of retries (retry-limit)
- A RetryListener can be used to log retried items
- A RetryTemplate can be used to further customize the retry logic
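
The following is a hedged sketch of the retry idea: a plain-Java model with a hypothetical class name, not Spring's RetryTemplate. It retries only the configured transient exception types, up to a fixed number of attempts. Here maxAttempts counts the first attempt. Any other exception propagates at once, which is what distinguishes a retryable failure from a fatal one.

```java
import java.util.List;
import java.util.function.Supplier;

// Plain-Java model of retry logic; NOT Spring's RetryTemplate API.
// The operation is attempted up to maxAttempts times while it throws a
// retryable (transient) exception; any other exception propagates at once.
class SimpleRetry {

    static <T> T execute(Supplier<T> operation, int maxAttempts,
                         List<Class<? extends RuntimeException>> retryable) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                boolean transientFailure = false;
                for (Class<? extends RuntimeException> type : retryable) {
                    transientFailure |= type.isInstance(e);
                }
                if (!transientFailure) {
                    throw e; // not retryable: fail immediately
                }
                lastFailure = e; // a real policy might also back off before retrying
            }
        }
        throw lastFailure; // retries exhausted
    }
}
```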

14 Scaling Features (Single Process)
- Multi-threaded steps: a Spring TaskExecutor processes chunks on multiple threads
- Parallel steps: split flows plus a TaskExecutor in the job configuration run independent steps at the same time
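
The following is a single-process sketch of a multi-threaded step, assuming a java.util.concurrent thread pool in place of Spring's TaskExecutor. The class name is hypothetical. Each worker takes the next chunk from a shared reader, processes it and writes it. Access to the shared reader is synchronized, and the same concern applies in a real multi-threaded step, since many readers are not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java model of a multi-threaded step; NOT the Spring Batch API.
// A thread pool stands in for Spring's TaskExecutor. Each worker repeatedly
// takes a chunk from the shared reader, processes it and "writes" it.
class MultiThreadedStep {

    static <I, O> List<O> run(Iterator<I> reader, Function<I, O> processor,
                              int commitInterval, int threads) {
        List<O> written = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                workers.add(pool.submit(() -> {
                    while (true) {
                        List<I> chunk = new ArrayList<>();
                        // The shared reader is not thread-safe, so reads are synchronized.
                        synchronized (reader) {
                            while (chunk.size() < commitInterval && reader.hasNext()) {
                                chunk.add(reader.next());
                            }
                        }
                        if (chunk.isEmpty()) {
                            return; // input exhausted
                        }
                        List<O> out = new ArrayList<>();
                        for (I item : chunk) {
                            out.add(processor.apply(item));
                        }
                        written.addAll(out); // "write" the chunk
                    }
                }));
            }
            for (Future<?> worker : workers) {
                worker.get(); // wait, and surface any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return written;
    }
}
```

Output order across chunks is not preserved, which is one trade-off of processing chunks concurrently.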

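15 Scaling Features (Multi-Process)
- Remote Chunking: splits step processing across multiple processes, using middleware to communicate
- A master process reads the items and sends chunks to remote worker processes, which process and write them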
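
The master/worker split can be illustrated in a single JVM. In this sketch, two BlockingQueues stand in for the middleware and threads stand in for remote worker processes. The names are hypothetical, not Spring Batch's remote-chunking classes. The master reads the input and sends chunks, and the workers process and write them and reply with the number of items handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Plain-Java model of remote chunking; NOT Spring Batch's remote-chunking
// support. In-process queues stand in for the middleware (e.g. a JMS queue)
// and threads stand in for worker processes. No failure handling or timeouts.
class RemoteChunking {
    // Marker telling a worker there are no more chunks (compared by identity).
    private static final List<Integer> NO_MORE_CHUNKS = Collections.emptyList();

    /** Returns the number of items the workers acknowledged. */
    static int run(List<Integer> input, int chunkSize, int workers,
                   Function<Integer, Integer> processor, List<Integer> sink) {
        BlockingQueue<List<Integer>> requests = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> replies = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                try {
                    List<Integer> chunk;
                    while ((chunk = requests.take()) != NO_MORE_CHUNKS) {
                        for (Integer item : chunk) {
                            Integer result = processor.apply(item);
                            synchronized (sink) {
                                sink.add(result); // worker-side "write"
                            }
                        }
                        replies.put(chunk.size()); // acknowledge the chunk
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            threads.add(worker);
        }
        try {
            // Master: read the input and send it to the workers in chunks.
            int sent = 0;
            for (int i = 0; i < input.size(); i += chunkSize) {
                requests.put(new ArrayList<>(input.subList(i, Math.min(i + chunkSize, input.size()))));
                sent++;
            }
            // Wait for one reply per chunk, then shut the workers down.
            int acknowledged = 0;
            for (int c = 0; c < sent; c++) {
                acknowledged += replies.take();
            }
            for (int w = 0; w < workers; w++) {
                requests.put(NO_MORE_CHUNKS);
            }
            for (Thread worker : threads) {
                worker.join();
            }
            return acknowledged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted", e);
        }
    }
}
```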

16 Scaling Features (Multi-Process)
- Step Partitioning: splits the input and executes remote steps in parallel
- A PartitionHandler sends StepExecution requests to the remote steps
- A Partitioner generates the input (an ExecutionContext) for each new step execution
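
The following sketches range partitioning, assuming numeric ids. In Spring Batch, the Partitioner's partition(gridSize) method returns a map of named ExecutionContexts. Here plain maps stand in for those contexts, and a local thread pool stands in for a PartitionHandler that would dispatch the partitions to remote steps. The class name and the minId/maxId keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java model of step partitioning; NOT the Spring Batch classes.
// partition() plays the Partitioner's role, with plain maps standing in for
// the ExecutionContexts; handle() is a local stand-in for a PartitionHandler.
class RangePartitioning {

    /** Splits the id range [minId, maxId] into at most gridSize contiguous ranges. */
    static Map<String, Map<String, Long>> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("Need gridSize >= 1 and minId <= maxId");
        }
        long size = (maxId - minId + gridSize) / gridSize; // ceiling of count / gridSize
        Map<String, Map<String, Long>> partitions = new LinkedHashMap<>();
        int index = 0;
        for (long start = minId; start <= maxId; start += size) {
            Map<String, Long> context = new LinkedHashMap<>();
            context.put("minId", start);
            context.put("maxId", Math.min(start + size - 1, maxId));
            partitions.put("partition" + index++, context);
        }
        return partitions;
    }

    /** Runs one "step execution" per partition on a pool and sums their item counts. */
    static long handle(Map<String, Map<String, Long>> partitions,
                       Function<Map<String, Long>, Long> stepExecution, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Map<String, Long> context : partitions.values()) {
                results.add(pool.submit(() -> stepExecution.apply(context)));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Partition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, partition(1, 10, 3) produces the ranges 1-4, 5-8 and 9-10.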

17 Job Flow with Client/Server and Partitioning

18 Agenda (repeated from slide 2)

19 Solving the Use-Cases
DataRoomBatch (DeepSix Example)
- The bucket is read by a JdbcCursorItemReader
- A custom ItemProcessor checks whether each row is marked for deletion and, if so, deletes it
- The ItemWriter could be empty or used to output statistics
- Partitioning is easily done by dividing the rows into ranges, one per partition
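
The DeepSix design above can be sketched in plain Java under explicitly hypothetical assumptions. A NavigableMap keyed by row id stands in for the bucket table, in place of a JdbcCursorItemReader over a database. Row and markedForDeletion are invented names. The processor performs the delete and returns null for unmarked rows, and the writer only gathers statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Sketch of the DeepSix design in plain Java; NOT Spring Batch classes.
// Hypothetical stand-ins: a NavigableMap keyed by row id plays the bucket
// table, and Row/markedForDeletion are illustrative names.
class DeepSixSketch {

    static final class Row {
        final long id;
        final boolean markedForDeletion;

        Row(long id, boolean markedForDeletion) {
            this.id = id;
            this.markedForDeletion = markedForDeletion;
        }
    }

    /** "Processor": deletes a marked row and returns its id; unmarked rows are filtered (null). */
    static Long deleteIfMarked(NavigableMap<Long, Row> bucket, Row row) {
        if (!row.markedForDeletion) {
            return null;
        }
        bucket.remove(row.id);
        return row.id;
    }

    /** Runs one partition [minId, maxId]; the "writer" just collects deleted ids as statistics. */
    static List<Long> runPartition(NavigableMap<Long, Row> bucket, long minId, long maxId) {
        // "Reader": snapshot the rows in range so deletes do not disturb iteration.
        List<Row> rows = new ArrayList<>(bucket.subMap(minId, true, maxId, true).values());
        List<Long> deleted = new ArrayList<>();
        for (Row row : rows) {
            Long id = deleteIfMarked(bucket, row);
            if (id != null) {
                deleted.add(id);
            }
        }
        return deleted;
    }
}
```

Partitions are meant to run one at a time here. A TreeMap is not safe for concurrent deletes, whereas a real database would handle concurrent partitions itself.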

20 Solving the Use-Cases
Public Records Batch Processing
- The input file is read by a FlatFileItemReader
- A custom ItemProcessor searches the database for hits
- A custom ItemWriter compiles a report of the search results
- A following step sends the report to the user
- A Partitioner for the input file is easy to implement
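
The following is a plain-Java sketch of the public-records read/process/write flow. The input format ("name,lastKnownAddress"), the in-memory map standing in for the database, and the class and method names are all hypothetical assumptions. A real job would use a FlatFileItemReader with a line mapper and a database-backed processor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the public-records job in plain Java; NOT Spring Batch classes.
// Hypothetical assumptions: each input line is "name,lastKnownAddress", and
// a Map of name -> current address stands in for the records database.
class PublicRecordsSketch {

    /** "Processor": returns a report line if the person's address changed, else null. */
    static String search(String line, Map<String, String> records) {
        String[] fields = line.split(",", 2); // the "reader" would normally do this parsing
        if (fields.length != 2) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        String name = fields[0].trim();
        String known = fields[1].trim();
        String current = records.get(name);
        if (current == null || current.equals(known)) {
            return null; // no hit
        }
        return name + ": " + known + " -> " + current;
    }

    /** "Writer": compiles the hits into a single report. */
    static String report(List<String> lines, Map<String, String> records) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            String hit = search(line, records);
            if (hit != null) {
                hits.add(hit);
            }
        }
        return hits.size() + " hit(s)\n" + String.join("\n", hits);
    }
}
```

A malformed line throws an exception, which in a real job would be a natural candidate for the skip configuration on slide 12.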

21 Benefits of Spring Batch
- Part of Spring
  - Allows easy integration with other Spring features
  - General simplicity offered by Spring
- Customizable step flow
- Basic item readers and writers are already available
- Features available for monitoring jobs and steps
- Many scaling options available

22 Issues with Spring Batch
- No built-in scheduler
  - Not a big issue: scheduler libraries are easily integrated
- Potentially a lot of XML configuration
  - Business logic spread across Java and XML files can complicate debugging and maintenance
  - Annotations can help
- Anything beyond very basic components must be written as new classes

23 Helpful Integration Options
- Spring Batch Admin
  - Web-based administration console
  - Contains Spring Batch Integration, which allows Spring Integration messages to be used to launch and monitor jobs
- Scheduler (cron, Spring Scheduling, Quartz)
- Clustering framework (Hadoop, GridGain, Terracotta)
  - Ideal for improving horizontal scaling
  - Spring Data Hadoop is a fairly new Spring project that helps integrate Spring with Hadoop

24 Future Steps
- Set up Spring Batch in a clustered environment
- Evaluate performance
- Figure out dynamic load balancing
- Experiment with more features and integration options
  - Spring Batch Admin, manual job restarting, etc.
  - Integrate Spring Batch Admin into the Cobalt GUI?
- Investigate the information stored in the metadata database and how to use it to monitor and manage jobs
- Investigate partitioning and how much work is needed to send partitions to remote machines
- Investigate job/step timeouts

25 Questions?