for decision making. Before the evolution of ETL tools, the ETL process described above was done manually with SQL code written by programmers. This task was tedious and cumbersome, since it involved many resources, complex coding and long work hours; on top of that, maintaining the code posed a great challenge to the programmers. ETL tools eliminate these difficulties: compared with the old method, they are very powerful and offer advantages at every stage of the ETL process, from extraction, data cleansing, data profiling and transformation through debugging and loading into the data warehouse.

There are a number of ETL tools available on the market to carry out the ETL process according to business and technical requirements. The following are some of them:

Pentaho Kettle
Informatica PowerCenter
Inaplex Inaport
Talend

Parameters: the following are some of the parameters used to compare ETL tools.

Total Cost of Ownership
Total Cost of Ownership means the overall cost of a product. This can include the initial ordering, licensing, servicing, support, training, consulting, and any other payments that need to be made before the product is in full use. Commercial open source products are typically free to use; what companies pay for is the support, training and consulting.

Risk
There are always risks with projects, especially big projects. The risks of a project failing are:
going over budget
going over schedule
not meeting the requirements or expectations of the customers
Open source products have much lower risk than commercial ones, since they do not restrict the use of their products with pricey licenses.
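To make concrete what the tools above automate, here is a minimal sketch of a hand-coded ETL step of the kind described earlier. The column names, table name and cleansing rules are illustrative, not taken from any of the tools being compared:

```python
import csv
import io
import sqlite3

# Illustrative flat-file source; in the manual approach every quirk of the
# source (whitespace, missing values, types) had to be handled by hand.
raw_csv = io.StringIO(
    "customer_id,name,amount\n"
    "1, Alice ,100.50\n"
    "2,Bob,\n"          # missing amount: must be cleansed by hand-written code
    "3,Carol,250.00\n"
)

# Extract: read rows from the flat-file source.
rows = list(csv.DictReader(raw_csv))

# Transform: hand-written cleansing -- trim whitespace, drop incomplete rows,
# convert types. Each rule like this had to be coded and maintained manually.
clean = [
    (int(r["customer_id"]), r["name"].strip(), float(r["amount"]))
    for r in rows
    if r["amount"].strip()
]

# Load: insert the cleansed rows into a warehouse table (in-memory SQLite here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (customer_id INTEGER, name TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)

total = db.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(total)  # (2, 350.5)
```

Even this toy example shows why the manual approach was hard to maintain: every new source or cleansing rule means more hand-written code, which is exactly the burden the ETL tools below take on.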
Comparison of the tools by parameter:

Speed
Talend: requires manual tweaking and prior knowledge of the specific data source to reduce network traffic and processing.

Connectivity
Talend: flat files, XML files, Excel files and web services, but it relies on Java drivers to connect to those data sources.
Inaport: can connect to any Windows connection; usually gets its data from Outlook, ACT! and Excel files.
Informatica PowerCenter: wide.

Support
Talend: offers support, but it mainly resides in the US.
Inaport: support mainly resides in the UK.

Space and platform required
Informatica PowerCenter: two CPUs with 1 GB RAM for the Standard Edition Server; runs on Solaris, HP-UX, AIX, Red Hat and SUSE Linux.
Pentaho Kettle: a stand-alone Java engine that can run on any machine that can run Java.
Inaport: one 1 GHz CPU, 512 MB RAM and 50 MB of disk space; can run on any Windows platform with .NET 2.0 installed.
Talend: creates a Java or Perl file that can be run on any machine with very little resource.

Risk and cost
Pentaho Kettle: low risk; medium cost; commercial open-source BI suite.
Talend: low risk; medium cost; open-source data integration tool.
Inaport: medium risk; medium cost.
Informatica PowerCenter: high risk; higher cost than the other tools; commercial data integration suite.
Ease of Use
All of the ETL tools, apart from Inaport, have a GUI to simplify the development process. A good GUI also reduces the time needed to learn and use the tool.

Support
Nowadays all software products come with support, and all of the ETL tool providers offer it.

Speed
The speed of ETL tools depends largely on the amount of data that needs to be transferred over the network and on the processing power available to transform the data.

Data Quality
Data quality is fast becoming the most important feature of any data integration tool.

Connectivity
In most cases, ETL tools transfer data from legacy systems, so their connectivity is very important to their usefulness.

Conclusion: comparing these ETL tools, it can be concluded that Informatica and Pentaho are better than the other tools and offer a wide variety of products. Informatica has a large variety of products handling business processes and is commercially well established in the market, but it is more expensive than Pentaho and carries a higher risk of project failure. MySQL and many other companies have shown through their case studies that Pentaho can handle small- to large-scale systems. Pentaho is quickly gaining momentum with businesses that would not have considered using open source products before.