DATA WAREHOUSING & CLOUD COMPUTING

Govindrajan Satya
RESEARCH PAPER, STRATFORD UNIVERSITY
July 18, 2011

ABSTRACT: This paper discusses cloud computing, a new trend in IT, and explores how it can be combined with data warehousing. It examines how the two can be brought together and the advantages and disadvantages of doing so.

Govindrajan Satya email: psgsatya@gmail.com STRATFORD UNIVERSITY

Data Warehousing and Cloud Computing

Data Warehousing:
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
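The ETL flow can be made concrete with a minimal Python sketch. It assumes that two in-memory SQLite databases stand in for the operational (OLTP) system and for the warehouse. The table names `orders` and `sales_fact` and the name-cleaning rule are hypothetical. Transportation is reduced to handing rows from one connection to the other. The sketch therefore only shows how analysis data is kept apart from the transaction workload, not how a production ETL tool works.

```python
import sqlite3

def build_oltp() -> sqlite3.Connection:
    """Create a toy transaction-processing (OLTP) database with a few orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, "
                 "customer TEXT, qty INTEGER, unit_price REAL)")
    conn.executemany(
        "INSERT INTO orders (order_date, customer, qty, unit_price) VALUES (?, ?, ?, ?)",
        [("2010-03-01", " acme corp ", 10, 2.5),
         ("2010-07-15", "Globex", 4, 10.0),
         ("2011-01-20", "ACME CORP", 2, 2.5)],
    )
    return conn

def extract(oltp):
    """Extract: read raw transaction rows from the operational system."""
    return oltp.execute(
        "SELECT order_date, customer, qty, unit_price FROM orders").fetchall()

def transform(rows):
    """Transform: normalize customer names and derive a revenue measure."""
    return [(d, c.strip().title(), q * p) for d, c, q, p in rows]

def load(warehouse, rows):
    """Load: append the cleaned rows into a separate analytical database."""
    warehouse.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                      "(order_date TEXT, customer TEXT, revenue REAL)")
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    warehouse.commit()

oltp = build_oltp()
warehouse = sqlite3.connect(":memory:")  # separate database: analysis never queries OLTP
load(warehouse, transform(extract(oltp)))
```

After loading, analytical queries such as revenue per customer run only against `warehouse`. The OLTP database is left to serve transactions.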

A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon:

Subject Oriented, Integrated, Nonvolatile, Time Variant

Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
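The sales question above can be written as a single query against a sales-oriented warehouse. The sketch below assumes one hypothetical `sales` table in SQLite and a helper named `best_customer`. Neither comes from any particular product.

```python
import sqlite3

# Hypothetical sales-subject warehouse: one fact table of sales lines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, customer TEXT, item TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2010-02-11", "Acme Corp", "widget", 1200.0),
    ("2010-06-03", "Globex", "widget", 800.0),
    ("2010-09-21", "Globex", "widget", 700.0),
    ("2010-11-30", "Initech", "gadget", 5000.0),
    ("2011-01-05", "Acme Corp", "widget", 9000.0),  # a different year
])

def best_customer(conn, item, year):
    """Return (customer, total) with the largest total spend on `item` in `year`."""
    return conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM sales
        WHERE item = ? AND strftime('%Y', sale_date) = ?
        GROUP BY customer
        ORDER BY total DESC
        LIMIT 1
        """,
        (item, str(year)),
    ).fetchone()

print(best_customer(conn, "widget", 2010))  # → ('Globex', 1500.0)
```

Because the warehouse is organized around the sales subject, the question reduces to one grouped aggregate.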

Integrated
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
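The three properties above can be shown together in a small sketch. It assumes two hypothetical source feeds that disagree on field names (`cust_name` vs. `customer`) and on units (pounds vs. kilograms).

- `integrate` maps both feeds to one convention (integrated).
- Loaded facts are immutable and only ever appended (nonvolatile).
- Each load carries a date, so history can be compared over time (time variant).

The class and field names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date

LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound

# Hypothetical source feeds that disagree on field names and units.
us_feed = [{"cust_name": "Acme Corp", "weight_lb": 220.462}]
eu_feed = [{"customer": "Globex", "weight_kg": 50.0}]

@dataclass(frozen=True)  # frozen: a loaded fact cannot be modified (nonvolatile)
class ShipmentFact:
    load_date: date      # time variant: every fact is stamped with its load date
    customer: str
    weight_kg: float

def integrate(load_date, us_rows, eu_rows):
    """Map both feeds onto one naming convention and one unit of measure (kg)."""
    facts = [ShipmentFact(load_date, r["cust_name"], round(r["weight_lb"] * LB_TO_KG, 3))
             for r in us_rows]
    facts += [ShipmentFact(load_date, r["customer"], r["weight_kg"]) for r in eu_rows]
    return facts

class Warehouse:
    def __init__(self):
        self._facts = []

    def append(self, facts):
        # Nonvolatile: new loads are appended; existing history is never updated.
        self._facts.extend(facts)

    def history(self, customer):
        """Time-variant view: every load for one customer, in load order."""
        return [(f.load_date, f.weight_kg) for f in self._facts if f.customer == customer]

wh = Warehouse()
wh.append(integrate(date(2011, 6, 30), us_feed, eu_feed))
wh.append(integrate(date(2011, 7, 18), [{"cust_name": "Acme Corp", "weight_lb": 110.231}], []))
print(wh.history("Acme Corp"))
```

The second load does not overwrite the first. Both snapshots remain available, which is what lets an analyst see change over time.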

Cloud Computing:
Introduction
"Cloud computing," to put it simply, means "Internet computing." The Internet is commonly depicted as a cloud, hence the term cloud computing for computation done through the Internet. With cloud computing, users can access database resources via the Internet from anywhere, for as long as they need, without worrying about maintaining or managing the actual resources. Moreover, databases in the cloud are highly dynamic and scalable. Cloud computing is distinct from grid computing, utility computing, and autonomic computing; it is a largely independent computing platform. A well-known example is Google Apps, whose applications are accessed through a web browser and are deployed across thousands of computers reached through the Internet.

Key Characteristics
Cloud computing is cost-effective. Costs are greatly reduced because both the initial outlay and the recurring expenses are much lower than with traditional computing. Maintenance costs also fall, because a third party maintains everything from running the cloud to storing the data. The cloud offers platform, location, and device independence, which makes it easy to adopt for businesses of all sizes, in particular small and midsized ones. However, because data is spread redundantly across networked computer and storage systems, the cloud may not always be reliable for data. It does score well on security: providers can offer superior security technology that is now easily available and affordable. Another important characteristic of the cloud is scalability, which is achieved through server virtualization. In a nutshell, cloud computing means getting a high-performing system with the best value for money.
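The cost and scalability claims can be illustrated with simple arithmetic. Every figure below is a hypothetical assumption:

- a 50,000 upfront purchase;
- 2,000 per month of maintenance;
- 0.50 per instance-hour;
- the per-server capacity.

Whether the cloud option comes out cheaper depends entirely on the figures plugged in.

```python
import math

def on_premises_cost(months, upfront=50_000.0, monthly_maintenance=2_000.0):
    """Cumulative cost of owning hardware: a one-off purchase plus upkeep."""
    return upfront + monthly_maintenance * months

def cloud_cost(monthly_hours, rate_per_hour=0.50):
    """Cumulative pay-as-you-go cost: pay only for the instance-hours used each month."""
    return sum(monthly_hours) * rate_per_hour

def vms_needed(load_rps, capacity_per_vm_rps):
    """Scale out via virtualization: identical virtual servers needed for the load."""
    return max(1, math.ceil(load_rps / capacity_per_vm_rps))

# Usage that grows each month ("pay as you grow"), over one year.
hours = [200 + 50 * m for m in range(12)]
print(on_premises_cost(12), cloud_cost(hours))
print(vms_needed(1250, 500))  # three virtual servers for this hypothetical load
```

The point of the sketch is structural. The owned option front-loads its cost, while the pay-as-you-go option tracks actual usage and adds virtual servers only as load grows.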

Different forms of Cloud Computing
Google Apps, Salesforce.com, Zoho Office, and various other online applications use cloud computing in the Software-as-a-Service (SaaS) model. These applications are delivered through a browser, and multiple customers can access them from various locations. This model has become the most common form of cloud computing because it is beneficial and practical for both customers and service providers. Customers make no upfront investment and can pay as they go and pay as they grow, while service providers can grow easily as their customer base grows. Amazon.com, Sun, and IBM offer on-demand storage and computing resources. Web services and APIs let developers use the cloud over the Internet to create large-scale, full-featured applications. The cloud is not limited to providing data storage or computing resources; it can also provide managed services or specific application services through the web.

How it works
Cloud computing pools the processing power of multiple remote computers in "the cloud" to accomplish a task. Examples include data warehousing of hundreds of terabytes, managing and synchronizing multiple documents online, and other computationally intensive work. These tasks would normally be difficult, time consuming, or expensive for a single computer to accomplish. The result of this processing is then served over the Internet to one or more clients working on their local computers. In essence, the heavy lifting of a task is outsourced to an external entity with more resources and expertise. Services such as data storage and processing, as well as software, are provided by the company hosting the remote computers. The clients are responsible only for having a simple computer with an Internet connection in order to send requests to, and receive data from, the cloud. Computation and storage are divided among the remote computers so that large volumes of both can be handled; thus the client need not purchase expensive hardware to handle the task.
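The divide-and-combine idea in the paragraph above can be sketched with Python's standard `concurrent.futures` module. Threads in a local pool stand in for remote machines. This is an assumption made only to keep the example self-contained, since Python threads do not give true CPU parallelism for this kind of work. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split the data set into roughly equal chunks, one per worker node."""
    if not data:
        return []
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def node_task(chunk):
    """Work done on one node: a partial aggregate over its own slice of the data."""
    return sum(chunk), len(chunk)

def cluster_average(data, n_nodes=4):
    """Scatter chunks to the worker 'nodes', then gather and combine partial results."""
    if not data:
        raise ValueError("no data to process")
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_task, partition(data, n_nodes)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(cluster_average(list(range(1, 101))))  # → 50.5
```

Each node returns a (sum, count) pair rather than its own average, because averaging the averages of unequal chunks gives the wrong answer. For example, `cluster_average([1, 2, 3, 4, 5], n_nodes=2)` correctly yields 3.0, not 3.25.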

Cloud Computing Concerns
Security of confidential data (e.g., SSNs or credit card numbers) is a major area of concern, because unauthorized access to it can cause very big problems. Misuse of data can create serious issues; hence, in cloud computing it is very important to know who the data administrators are and how far their data access rights extend. Large organizations dealing with sensitive data often have well-defined regulatory compliance policies; however, these policies should be verified before that data is moved into the cloud. A cloud provider's network may use resources located in another country, or resources that are not fully protected, so appropriate regulatory compliance policies are needed. In cloud computing it is very common to store the data of multiple customers in one common location, so providers should use proper techniques to segregate each customer's data for security and confidentiality. Care must be taken to ensure that one customer's data does not affect another customer's data. In addition, cloud computing providers must be equipped with proper disaster recovery policies to deal with any unfortunate event.
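One common way to segregate customers who share a storage location is to tag every row with a tenant identifier and force every query through that identifier. The sketch below assumes a shared SQLite table and a hypothetical `TenantScopedStore` wrapper. Real providers may combine this with separate schemas, encryption, or access controls, which are not shown.

```python
import sqlite3

class TenantScopedStore:
    """Shared table for many customers; every read and write is pinned to one tenant."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant = tenant_id

    def put(self, key, value):
        # The tenant id comes from the wrapper, never from the caller's query.
        self._conn.execute(
            "INSERT OR REPLACE INTO records (tenant_id, key, value) VALUES (?, ?, ?)",
            (self._tenant, key, value),
        )

    def get(self, key):
        row = self._conn.execute(
            "SELECT value FROM records WHERE tenant_id = ? AND key = ?",
            (self._tenant, key),
        ).fetchone()
        return row[0] if row else None

    def keys(self):
        return [r[0] for r in self._conn.execute(
            "SELECT key FROM records WHERE tenant_id = ? ORDER BY key", (self._tenant,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (tenant_id TEXT, key TEXT, value TEXT, "
             "PRIMARY KEY (tenant_id, key))")

acme = TenantScopedStore(conn, "acme")
globex = TenantScopedStore(conn, "globex")
acme.put("ssn", "xxx-xx-1111")
globex.put("ssn", "xxx-xx-2222")  # same key, different tenant: no collision
```

Both customers live in one table, yet neither wrapper can read or overwrite the other's rows, because the tenant filter is applied inside every method.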

Cloud Computing & Data Warehousing Architecture:


Such is the dynamic landscape of the emerging cloud computing environment. What will be the effect of the encounter between cloud computing and data warehousing? First, data warehousing will do to the cloud what it did to web services: raise the bar. Second, it will push the pendulum back in the direction of data marts. Third, it will deflate the inevitable hype being generated in the press.

First, data warehousing raises the bar on cloud computing. Capabilities such as data aggregation, roll-up, and related query-intensive operations may usefully be exposed at the interface, whether as Excel-like functions or as actual API calls. In this respect cloud computing is the opposite of traditional data warehousing. Cloud computing wants data to be location independent, transparent, and function-shippable, whereas the data warehouse is a centralized, persistent data store. Run-time metadata will be needed so that data sources can be registered, get on the wire, and be accessible as a service. In the race between computing power and the explosion of data, large volumes of data continue to be stuffed behind I/O subsystems with limited bandwidth, and growing data volumes are winning. Still, with cloud computing (as with web services), the service, not the database, is the primary data integration method.
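The following is a rough sketch of exposing roll-up as an API call, together with the run-time metadata registry mentioned above. `REGISTRY`, `register_source`, and `roll_up` are hypothetical names. The function returns a JSON string to mimic what a web service might send back; no HTTP layer is shown.

```python
import json
from collections import defaultdict

# Run-time metadata: data sources register themselves so the service can find them.
REGISTRY = {}

def register_source(name, rows, dimensions, measure):
    """Record where a source's rows are and which fields are dimensions vs. measure."""
    REGISTRY[name] = {"rows": rows, "dimensions": dimensions, "measure": measure}

def roll_up(source, by):
    """Aggregate a registered source's measure up to the dimensions in `by`
    (e.g. from region+city up to region; `by=[]` gives the grand total)."""
    meta = REGISTRY[source]
    unknown = set(by) - set(meta["dimensions"])
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    totals = defaultdict(float)
    for row in meta["rows"]:
        totals[tuple(row[d] for d in by)] += row[meta["measure"]]
    result = [dict(zip(by, key), total=value) for key, value in sorted(totals.items())]
    return json.dumps(result)

register_source("sales", [
    {"region": "East", "city": "Boston", "amount": 100.0},
    {"region": "East", "city": "New York", "amount": 250.0},
    {"region": "West", "city": "Seattle", "amount": 75.0},
], dimensions=["region", "city"], measure="amount")

print(roll_up("sales", ["region"]))
```

The caller asks the service for a level of aggregation rather than reading the underlying rows. This matches the paper's point that the service, not the database, becomes the integration method.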

Second, data warehousing in the cloud will push the pendulum back in the direction of data marts and analytic applications. Why? Because it is hard to imagine anyone moving an existing multiterabyte data warehouse to the cloud. Such databases will instead be exposed to intra-enterprise corporate clouds, so the database will need to be web service friendly. On the other hand, it is easy to imagine setting up a new ad hoc analytic application based on existing infrastructure and a data pull of modest size. This will address the problem of data mart proliferation, since it makes the cost clear and gives the business an incentive to throw a mart away when it is no longer needed.

Third, the inevitable hype around cloud computing will get a good dose of reality when it confronts the realities of data warehousing. Questions that a client surely needs to ask include the following:

- If I want to host the data myself, is there a tool to move it?
- Since this might be a special project, how much does it cost?
- What are the constraints on tariffs (costs)?

The phone company requires regulatory approval to raise your rates, but that is not the case with Amazon, Google, or Layered Technology. Granted, strong incentives exist to exploit network effects (economies of scale and pricing that follows Moore's Law). Still, it is a familiar and proven revenue model to give away the razor and charge a little extra for the razor blades, and the result is technology lock-in. It is an easy prediction that something like this will occur once the computing model has been demonstrated to be scalable, reliable, and popular. In the best case, economies of scale in large data warehousing applications will enable a win-win scenario in which large clients benefit from inexpensive options. However, in an economic downturn, the temptation to raise prices once technology lock-in has occurred will be overwhelming. Since this is a new infrastructure play, it is too soon for anything like that to occur. Indeed, this is precisely the kind of innovation that will help the economy dig itself out of the hole into which the mortgage crisis has landed us. Unfortunately, it will not make houses more affordable.
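Returning to the second point, an ad hoc analytic mart can be sketched in Python. It is built from a modest data pull and discarded when no longer needed. The schema, the `temporary_mart` helper, and the use of an in-memory SQLite database in place of cloud-provisioned storage are all assumptions for illustration.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def temporary_mart(warehouse, region):
    """Pull a modest slice of the warehouse into a throwaway in-memory mart.

    The mart exists only inside the `with` block and is closed afterwards,
    so it cannot linger and proliferate once the analysis is finished."""
    rows = warehouse.execute(
        "SELECT product, amount FROM sales WHERE region = ?", (region,)
    ).fetchall()
    mart = sqlite3.connect(":memory:")
    try:
        mart.execute("CREATE TABLE mart_sales (product TEXT, amount REAL)")
        mart.executemany("INSERT INTO mart_sales VALUES (?, ?)", rows)
        yield mart
    finally:
        mart.close()  # throw the mart away when it is no longer needed

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "widget", 100.0), ("East", "gadget", 40.0),
    ("East", "widget", 60.0), ("West", "widget", 500.0),
])

with temporary_mart(warehouse, "East") as mart:
    top = mart.execute("SELECT product, SUM(amount) FROM mart_sales "
                       "GROUP BY product ORDER BY 2 DESC").fetchall()
print(top)  # → [('widget', 160.0), ('gadget', 40.0)]
```

Tying the mart's lifetime to a `with` block is one way to make disposal the default rather than an afterthought.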
This kind of innovation will, however, enable business executives and information technology departments to do more with less, to work around organizational latency in any department, and to compete with agility in the digital economy. It is simply not credible to assert that any arbitrary cloud computing provider will be able to accommodate a new client who starts out requiring an extra ten terabytes of storage. Granted, the pipeline to the hardware vendors is likely to be a high-priority one. The sweet spot for fast provisioning of data warehousing in the cloud is still small- and medium-sized businesses and applications.

CONCLUSION: This paper has discussed cloud computing, a new trend in IT, and examined how it can be combined with data warehousing, together with the advantages and disadvantages of doing so. In cloud computing it is very common to store the data of multiple customers in one common location. Providers therefore need proper techniques to segregate each customer's data for security and confidentiality, and to ensure that one customer's data cannot affect another's. In addition, cloud computing providers must be equipped with proper disaster recovery policies to deal with any unfortunate event.
