If the words "extract, transform, load" sound like a foreign language, you're in the right place. Making sense of the business intelligence (BI) software market can be challenging for even the most technical of software buyers. It's complex stuff. It's important, however, to grasp the main concepts and to understand what BI tools do before you dive headfirst into a purchase. You'll want to be literate when discussing a BI purchase with your management team, IT staff, or vendors' sales reps. We're here to help you understand basic BI concepts. In this Beginner's Guide to Business Intelligence, we'll introduce three foundational BI components, explain why and when you would need to use them, and share examples of vendors that offer these capabilities. The three components we'll cover are:
Data Warehouses
Extract, Transform and Load (ETL)
Online Analytical Processing (OLAP)
A data warehouse is organized in a way that is optimized for complex analysis of data from multiple systems, whereas the underlying operational systems are optimized to handle a high volume of transactions specific to their function. What's the difference? An operational system supports daily activities such as entering sales interactions in a sales automation app or expenses in an accounting system. You can run simple queries and analyses on an operational system's database. However, you may impede its performance and slow down the processes it was originally intended for.
For example, picture a line of frustrated customers waiting for your point of sale (POS) system to process a sale while your manager is in the back office analyzing how many yellow T-shirts to order next month for your three retail stores. These activities should not be hitting the same database at the same time. Also, that POS system's operational database probably doesn't have all of your accounting data; that data is in the accounting system. And data from your distribution center is probably in your supply chain management system. So if you are just analyzing the POS database, you won't get a comprehensive view of the whole process.
Data warehouses, on the other hand, are specifically designed to run complex analysis on large volumes of historical data originating from multiple source systems. Data warehouses can be built on standard relational database management systems (RDBMS) or on a database designed specifically for data warehouse applications.
Analysts may refer to this as a "data warehouse database management system," while some vendors (e.g. Netezza, Vertica and Greenplum) use "high-performance data warehouse." A variety of vendors, from large (e.g. IBM, Oracle and Teradata) to small (e.g. illuminate and SAND Technology), offer a data warehouse DBMS.
Extract data from sources such as ERP or CRM applications;
Transform that data into a common format that fits with other data in the warehouse; and
Load the data into the data warehouse for analysis.
The ETL concept sounds easy, but the execution is complex. We're not talking about simple copy-and-paste stuff here. Each step in the process has its challenges. For example, during the extract step, data may come from different source systems (e.g. Oracle, SAP, Microsoft) and in different file formats, such as XML, flat files with delimiters (e.g. CSV), or, worst of all, old legacy systems that store data in arcane formats no one else uses anymore. The transform step may include multiple data manipulations such as splitting, translating, merging, sorting, pivoting and more. For example, a customer name might be split into first and last name, or dates might be changed to the standard ISO format (e.g. from 11-21-11 to 2011-11-21). The final step, load, involves loading the transformed data into the data warehouse. This can be done either in batch processes or row by row, more or less in real time.
ETL tools often come bundled with databases or are sold as bolt-on tools. For example, Microsoft, Oracle and IBM all offer some type of ETL capabilities with their databases. Meanwhile, third-party ETL vendors offer tools that will support a variety of disparate applications and data structures. As a final option, some BI buyers choose to build their own custom ETL tools.
We should mention that despite being a core component of data warehouse environments, ETL is not unique to data warehousing. The concept and the technology have existed in some form for a long time. ETL can be used to move data between databases, transactional systems (e.g. ERP to CRM) and, of course, data warehouses.
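To make the three steps concrete, here is a minimal sketch of a batch ETL run in Python. Everything in it is an assumption made for illustration: the sample CSV, the column names, the `sales` table and the use of an in-memory SQLite database as a stand-in for the warehouse. It only mirrors the examples mentioned above (splitting a customer name and converting 11-21-11 to 2011-11-21); real ETL tools handle far more.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source extract: a delimited flat file with a combined
# customer name and US-style MM-DD-YY dates.
SOURCE_CSV = """customer_name,order_date,amount
Ada Lovelace,11-21-11,19.99
Alan Turing,12-01-11,5.00
"""

def extract(text):
    """Extract: read rows from the delimited flat file."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: split the name and convert dates to ISO format."""
    records = []
    for row in rows:
        first, _, last = row["customer_name"].partition(" ")
        iso_date = datetime.strptime(row["order_date"], "%m-%d-%y").date().isoformat()
        records.append((first, last, iso_date, float(row["amount"])))
    return records

def load(conn, records):
    """Load: insert the transformed records in one batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(first_name TEXT, last_name TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

# An in-memory SQLite database stands in for the data warehouse here.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
rows = conn.execute(
    "SELECT first_name, last_name, order_date FROM sales ORDER BY order_date"
).fetchall()
print(rows)
```

This sketch loads in a single batch; a row-by-row load would call the insert once per record as it arrives instead.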
A fact table is essentially a single table with rows and columns (think spreadsheet) that contains business data. For example, one table could include data about sales:
Columns: product_code, units_sold, customer_code, sales_value
112233, 12345, 100
112234, 12346, 75
112235, 12347, 100
A dimension table contains information that describes the records in the fact table. It contains textual attributes, and those attributes may be descriptive or may provide instructions on how the fact table data should be summarized. Additionally, the information in dimension tables is independent of information in other dimension tables. For example, a product dimension table has information about products while a customer dimension table has information about customers.
Now onto cubes. Cubes are the core components of OLAP systems. They aggregate facts from every level in a dimension provided in a schema. For example, they could take data about products, units sold and sales value, then add them up by month, by store, by month and store, and by all other possible combinations. They're called cubes because the end data structure resembles a cube.
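The idea of adding facts up "by month, by store, by month and store" and so on can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular OLAP product stores its cubes: the fact rows, the three dimension names and the `build_cube` helper are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (month, store, product, units_sold, sales_value).
FACTS = [
    ("2011-01", "A", "tshirt", 10, 100.0),
    ("2011-01", "B", "tshirt", 5, 50.0),
    ("2011-02", "A", "tshirt", 7, 70.0),
    ("2011-02", "A", "hat", 2, 30.0),
]
DIMENSIONS = ("month", "store", "product")

def build_cube(facts):
    """Pre-aggregate units and sales for every combination of dimensions."""
    cube = defaultdict(lambda: [0, 0.0])
    for *dims, units, value in facts:
        # Visit every subset of dimensions, from none (grand total) to all.
        for size in range(len(DIMENSIONS) + 1):
            for idx in combinations(range(len(DIMENSIONS)), size):
                # The key records which dimensions are fixed, and to what.
                key = tuple((DIMENSIONS[i], dims[i]) for i in idx)
                cube[key][0] += units
                cube[key][1] += value
    return cube

cube = build_cube(FACTS)
print(cube[(("store", "A"),)])                       # totals for store A
print(cube[(("month", "2011-02"), ("store", "A"))])  # one month, one store
print(cube[()])                                      # grand total
```

Once the cube is built, each question is answered by a single dictionary lookup rather than a scan of the fact rows. The trade-off is that all of the aggregation work, and the storage for it, is paid up front.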
OLAP systems are able to provide fast responses to queries because of this cube data structure. They essentially already have all the answers to your queries. For example, if you ask for two years of revenue data for store A and for product B, the cube already has this information aggregated and can return it in seconds. Almost all BI vendors offer an OLAP tool or a similar type of analytical tool. Similarly, most BI buyers will need OLAP or a similar tool for analysis and reporting. OLAP tools can run on data warehouses or on transactional databases, so companies can purchase OLAP without having to invest in a complete data warehouse environment. As mentioned earlier, though, running analysis directly against a transactional database can cause performance issues.
Take, for example, the online retailer eBay. It has over 200 million items for sale, separated into 50,000 categories, and bought and sold by 100 million registered users. This amounts to 9 petabytes of data, according to a recent article from The New York Times. eBay is not alone. Google is said to process ~24 petabytes of data per day; AT&T processes 19 petabytes through its networks each day; and the video game World of Warcraft uses 1.3 petabytes of storage. That's a ton of data to store, process and manage. The analyst who can wield this data stands to make some interesting, if not profitable, discoveries. Enterprises are aware of this, and so are software vendors, so several BI vendors are developing technology to meet the demand in this growing market.
In-Memory Processing
A classic case of Moore's Law, in-memory processing is gaining traction because of the reduced costs and increased power of random access memory (RAM). Instead of loading data onto disks (i.e. hard drives) in the form of tables and cubes, data is loaded into RAM. Accessing data in memory is literally millions of times faster than accessing data from disk, suggests BI analyst Cindi Howson. It also provides more flexibility: users don't have to pre-process data and organize it into cubes. They can perform ad hoc queries to make quick, insightful business decisions. Many vendors now offer some type of in-memory solution, including Tibco, Tableau, QlikTech, SAP and more.
This concludes our beginner's guide to business intelligence. If you need additional help with your software research, call us for a free consultation. If there are other tools you'd like to learn about, or if you have an idea for a future report, leave us a comment below or get in touch through Google+.