
High-Performance Batch Processing Framework

It is hard to find a mid-sized or large business today that does not have at least one batch job or process that runs independently of its browser-based web applications. In most businesses, in fact, several batch jobs or processes automate core, high-volume business tasks such as mass-mailing reports to customers, creating nightly reports for all customers, processing and/or transmitting data files sent to or from external partners (interface data processing), and importing or exporting bulk data into and out of web applications and databases.

The characteristics of a typical batch process include:

- It is a long-running process that must occur on a regular schedule (say, at month-end or at midnight each day).
- It runs asynchronously from user interaction; that is, it is not part of a user session in an online system. In most cases, a user does not start it and is not waiting for it to complete. Sometimes a user initiates it by clicking a button or link within an online application. That user does not want to be held up from doing anything else in the application, but wants the high-volume processing task completed as soon as possible and wants to be notified when it completes (real-time or near-real-time asynchronous processing).
- There may be complex logic or calculations to perform on the data.
- The volume of data to be processed is high, usually on the order of tens of thousands to millions of records.
- The process may require a large set of data from an external system that is delivered in sets on a schedule.

The technology to develop, deploy and operate batch jobs for maximum performance has existed for a long time:

- Multi-core processors are now standard in servers.
- Memory has become very inexpensive, and most servers come with ample RAM and disk space.
- Most programming platforms, such as Java and .NET, support building multi-threaded applications that perform tasks in parallel, making use of the full power of the underlying hardware.
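As a minimal illustration of that last point, the following Java sketch splits a simple computation across a fixed thread pool sized to the number of available processor cores. It is not taken from any Tripod product: the class name ParallelSum and the summing workload are assumptions chosen only to keep the example short. It uses the standard java.util.concurrent API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; the name and the workload are illustrative only.
class ParallelSum {

    // Sums the integers 1..n by splitting the range into slices,
    // with roughly one slice per available processor core.
    static long sum(int n) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int slice = Math.max(1, (n + workers - 1) / workers); // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 1; start <= n; start += slice) {
                final int from = start;
                final int to = Math.min(n, start + slice - 1);
                Callable<Long> task = () -> {
                    long partial = 0;
                    for (int i = from; i <= to; i++) {
                        partial += i;
                    }
                    return partial;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that slice is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("parallel sum interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a slice failed", e);
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }
}
```

Calling ParallelSum.sum(100) adds the slices computed on separate threads and returns the same total a sequential loop would. A real batch job would replace the summing loop with its own unit of work.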
2009 Tripod Technologies, LLC

In spite of the availability of the technology to build scalable and reliable batch processes, most businesses have been unable to reap its full benefits, for the following reasons:

- Setting out on the wrong foot: In most projects, batch programs get developed and deployed as a single "process" with no multi-threading support. Only upon deployment into production/UAT (or, if you are fortunate, during the late phases of QA, when performance testing may get done) does it become obvious that the business and performance goals will not be met unless parallelism is built into the batch processing through multi-threading. This leads to unanticipated, costly delays and a lot of rework on the code.
- The "I'm not comfortable developing multi-threaded programs" syndrome: Programming with threads and related technologies (such as JMS and RMI) is complex, tricky and error-prone. Most developers are not experienced in, or comfortable with, building multi-threaded applications. JDK 5.0 made significant improvements in multi-threading support, but even with those improvements the complexities of developing multi-threaded batch jobs have not gone away. Additionally, since the majority of business applications written in the past decade and a half were written for deployment on web servers, developers have relied on those web servers to handle issues like scalability and multi-threading. This reliance has reduced developers' experience with these skills, in some cases to nothing.
- Debugging headaches ("I don't understand why my code does not work"): Unwitting programmers often write code that introduces hard-to-debug errors. Without a strong foundation in operating system principles such as threads, deadlocks, resource pools, CPU utilization and memory management, it is very hard to debug these problems.
- Where does the buck stop?: Tuning batch jobs for maximum performance requires a thorough understanding of the hardware, the operating system and the other software components running on that hardware. Most software service providers and most teams lack the well-coordinated communication and collaboration between developers, architects, network and system engineers, quality engineers and business leaders that is required to "tune" the programs for maximum performance.


- The "tuning is not my responsibility" mindset: Most development shops believe that configuring batch programs for optimum performance is not their responsibility and that it belongs solely to the infrastructure team. Since the infrastructure team has limited knowledge of how the application is architected or built, it is unable to tune the batch jobs for optimum performance.
- Operational challenges:
  o Is it OK to kill a batch job? In most cases, when software upgrades or patches have to be pushed out to production and the running batch jobs have to be brought down, the system/network engineers managing the production systems have no way of knowing whether it is safe to kill a process that is executing a batch job. No visibility exists to see whether the batch job is currently processing business tasks or is just in sleep (wait) mode. This can lead to unnecessary and sometimes costly errors in which a transaction does not get processed completely.
  o No support to adjust the amount of audit information logged: When problems occur in operation (such as processes running slowly, not running at all, or stopping abruptly), system/network engineers and help desk personnel struggle to identify the source of the problem. In most situations, either too much unnecessary information or too little useful information is being written to the application's audit log files. Most developers (or development shops) do not build in support for the infrastructure or help desk team to adjust the level of audit trail information on the fly, which would help triage the issue and resolve it faster.
  o No automated/emailed alerts and notifications to business managers and end-users: Most development shops or developers do not build support within batch jobs for automated alerts or notifications to be sent to business users (via email, SMS, etc.) when problems occur within batch jobs. Keeping business managers and end-users abreast of slowdowns or delays in processing can go a long way in managing end-customer expectations. It also significantly reduces the unnecessary load on the application/IT help desk and on developers, who would otherwise manually handle the large inflow of end-user and management calls reporting or inquiring about processing delays.

  o Lack of consolidated support to perform a full health check or a pulse check: Most development shops or developers do not build support within batch jobs for system/network engineers to check some or all of the vital stats of the batch job, such as memory usage, CPU utilization, number of active threads and processing speed. This leaves the system engineers to rely on numerous, mostly expensive monitoring tools and utilities to check each vital stat separately, making it difficult to see the big picture.

The net result of one or more of the above reasons is that businesses spend significant portions of their operational and development/maintenance budgets on maintaining and managing the numerous batch jobs that automate their core business tasks.

Tripod's High-Performance Batch Processing Framework helps companies reduce the Total Cost of Ownership (TCO) of their batch jobs

Tripod leverages this framework and its Distributed Agile Development Process to address each of the challenges faced in building and managing batch jobs or processes. Tripod's High-Performance Batch Processing Framework is a Java-based framework that encapsulates the collective experience of Tripod's architects, developers, quality assurance engineers, help desk personnel, and systems/network engineers on numerous successfully delivered and/or managed client projects that involved complex, high-volume transactional systems and batch processes. Out of the box, the framework provides the plumbing needed for a batch job to be developed economically and rapidly, and to perform, scale and be operated with ease in production. The framework addresses the following core aspects of any batch processing framework:

- Job control (start and stop; immediate shutdown and graceful shutdown)
- Job partitioning (partition, or break up, your job into smaller chunks of work)
- Parallel processing and distribution (multi-threading at two levels, and distribution of load onto multiple Java virtual machines, Java application servers, and physical/virtual servers)
- Fine-grained transaction control (each chunk of work either completes or does not; work is never left partially done and data does not get corrupted)

- Error handling (understanding when an error occurs and what caused it, error notification, and controlling whether the job chunk continues after an error and whether the overall job stops or continues)
- Job monitoring (mechanisms to check the status of the job, to answer questions such as "Is the job complete?", "What portion of the job is complete?" and "How much more time is needed to complete the job?")
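To make several of the aspects listed above concrete, here is a minimal Java sketch of job partitioning, parallel chunk processing, per-chunk error handling and simple progress monitoring. It is not Tripod's framework or its API: the class ChunkedJob, its methods, and the one-hour wait limit are all hypothetical, and a real implementation would also wrap each chunk in a database transaction and report errors rather than only counting them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the Tripod framework.
class ChunkedJob<T> {
    private final AtomicInteger completed = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();
    private volatile int totalChunks;

    // Job partitioning: break the records into chunks of at most chunkSize.
    static <T> List<List<T>> partition(List<T> records, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += chunkSize) {
            chunks.add(records.subList(i, Math.min(records.size(), i + chunkSize)));
        }
        return chunks;
    }

    // Parallel processing: each chunk runs as one task on a fixed thread pool.
    // Error handling: a failing chunk is counted and the overall job continues.
    void run(List<T> records, int chunkSize, int threads, Consumer<List<T>> chunkWork) {
        List<List<T>> chunks = partition(records, chunkSize);
        totalChunks = chunks.size();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<T> chunk : chunks) {
            pool.execute(() -> {
                try {
                    chunkWork.accept(chunk); // a real job would run this in one transaction
                    completed.incrementAndGet();
                } catch (RuntimeException e) {
                    failed.incrementAndGet(); // a real job would also log and notify
                }
            });
        }
        pool.shutdown(); // no more chunks will be submitted
        try {
            pool.awaitTermination(1, TimeUnit.HOURS); // assumed upper bound for the sketch
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Job monitoring: counters that another thread can poll while the job runs.
    int completedChunks() { return completed.get(); }
    int failedChunks() { return failed.get(); }
    int totalChunks() { return totalChunks; }
}
```

Because the counters are atomic, a monitoring thread can read completedChunks() and totalChunks() at any time to estimate what portion of the job is done. Whether a failed chunk should stop the whole job is a policy decision; this sketch always continues.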

The key benefits that Tripod's Batch Framework offers to its clients are:

- Reduction in development time for new batch jobs: Each time a new batch job needs to be developed, development is limited to programming the specific business logic and not the internal plumbing (multi-threading, memory management, shared memory, resource pooling, scheduling, run-time audit trail logging, performance, etc.) required to execute and manage the batch job.
- Reduction in QA time for new batch jobs: Since Tripod's framework is performance-tested and time-tested, testing a new batch job should essentially be a matter of testing the business features specific to that job. Little time has to be spent on testing plumbing features such as thread parallelism, memory management and CPU management.
- Improved quality of Tripod deliverables: This is a direct benefit of using time-tested, reusable components.
- Improved end-user/client satisfaction levels: This is a direct benefit of keeping bugs out of UAT and production and of clients seeing well-performing, scalable batch jobs.
- Reduced cost of development and deployment: Tripod can develop, quality-assure and deliver high-transaction-volume batch jobs faster and with fewer resources. The cost savings achieved are passed on directly to Tripod's clients.


- Operational efficiencies: In production, the framework offers several benefits:
  1. Better performance and scalability: Overall, the jobs perform better and are more scalable. By adjusting parameters such as the concurrent thread pool size in human-readable, text-format configuration files, the performance of the batch jobs can be tuned dynamically without any code-compile-release cycle.
  2. Better operational visibility: The framework supports Run-time Inquiry, a feature that allows system/network engineers to improve operational visibility by easily integrating with other monitoring services and tools.
  3. Better manageability: Graceful Shutdown and Health Check are standard features that come with the framework for all batch jobs. The framework enables system/network engineers to query the batch job on the fly to find out whether it is busy processing business tasks, and/or to send the batch job a message (signal) to shut down gracefully on its own once it has completed all the tasks it is currently processing, without taking on additional tasks. System/network engineers can use these features out of the box, with no additional coding, to manage the batch jobs in production efficiently and economically.
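The operational features described above can be pictured with a small Java sketch. It shows one plausible way to read a thread pool size from properties-style configuration text, answer an "is the job busy?" query, and shut down gracefully by finishing accepted tasks while refusing new ones. This is an assumption-laden illustration, not the framework's actual design: the class ManagedBatchJob, its methods, and the property key pool.size are all hypothetical, and the pool size is read once at construction rather than re-read while the job runs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; class, method and property names are illustrative assumptions.
class ManagedBatchJob {
    private final ExecutorService pool;
    private final AtomicInteger inFlight = new AtomicInteger(); // queued + running tasks

    // Reads the pool size from human-readable properties text, e.g. "pool.size=4".
    ManagedBatchJob(String configText) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(configText));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable configuration", e);
        }
        int size = Integer.parseInt(config.getProperty("pool.size", "1").trim());
        pool = Executors.newFixedThreadPool(size);
    }

    // Accepts a task unless shutdown has already been requested.
    boolean submit(Runnable task) {
        if (pool.isShutdown()) {
            return false;
        }
        inFlight.incrementAndGet();
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
            return true;
        } catch (RejectedExecutionException e) {
            inFlight.decrementAndGet(); // shutdown raced with this submission
            return false;
        }
    }

    // Health check: is the job holding any business tasks right now?
    boolean isBusy() {
        return inFlight.get() > 0;
    }

    // Graceful shutdown: finish accepted tasks, take on no new ones.
    // Returns true if everything finished within the timeout.
    boolean shutdownGracefully(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

An operator-facing script or management endpoint could call isBusy() before a patch window and shutdownGracefully(...) to drain the job, instead of killing the process and risking a half-processed transaction.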

2009 Tripod Technologies, LLC. ALL RIGHTS RESERVED. Copyright in whole and in part of this document (High Volume Batch Processing Framework) belongs to Tripod Technologies, LLC. This work may not be used, sold, transferred, adopted, abridged, copied or reproduced in whole or in part in any manner or form or in any media without the prior written consent of Tripod Technologies, LLC.
