
HOME WORK- 3

CAP 501: DATA WAREHOUSING AND DATA MINING
DOA: 28-Mar-11 DOS: 8-Apr-11

Part A
1. Taking example of Indian Railways, illustrate the significance of Business
value.
Ans:- Indian Railways is one of the world’s largest employers and operates one of the biggest and
busiest rail networks in the world, carrying some 17 million people and more than one million tonnes
of freight daily. Indian Railways is also one of INSEAD’s biggest executive education clients.

► Significance of business value in Indian Railways: -

• The strategy is based on volume. While output has increased threefold, real operating costs have
fallen over the last 25 years. By increasing the capacity of a typical long-distance train from 800
to 2000 passengers, unit costs fell by 45 per cent.

• The time taken to load or unload a freight train was cut from seven days to five, and
systematic changes have helped to rein in corruption.

• Through joint ventures, IR plans to set up cold storage and purchase points in stations, as well as
freezer containers, so that farmers can send agricultural produce around the country and beyond.

• Farmers are to be charged appropriate and reasonable prices. This should raise farmers' incomes,
which in turn lets them buy the things that everyone else is buying.

• IR is in the process of upgrading stations, coaches, tracks, services, safety, and security, and
streamlining its various software management systems including crew scheduling, freight, and
passenger ticketing.

• Crew members will be able to log in using biometric scanners at kiosks, while passengers can
avail themselves of online booking.

• Outdated communication, safety and signaling equipment, which used to contribute to failures
in the system, is being updated with the latest technology.

• A number of train accidents happened on account of a system of manual signals between
stations, so automated signaling is getting a boost at considerable expense.

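
The unit-cost figure in the first point above can be checked with simple arithmetic. If the cost of running a train had stayed the same, raising capacity from 800 to 2000 passengers would by itself cut the cost per passenger by 60 per cent (1 - 800/2000). A reported fall of 45 per cent therefore suggests that the cost per train went up while the cost per passenger went down. The sketch below works this out; it assumes unit cost means cost per train divided by passengers per train, which the source does not state explicitly, and the function name is only illustrative.

```python
def implied_train_cost_ratio(old_capacity, new_capacity, unit_cost_drop):
    """Return new/old cost per train implied by a drop in cost per passenger.

    Assumption: unit cost = cost per train / passengers per train, so
    new_unit / old_unit = (new_cost / old_cost) * (old_capacity / new_capacity).
    """
    unit_ratio = 1.0 - unit_cost_drop
    return unit_ratio * new_capacity / old_capacity


# Figures quoted in the answer: 800 -> 2000 passengers, unit cost down 45 per cent.
ratio = implied_train_cost_ratio(800, 2000, 0.45)
print(f"Implied cost per train: {ratio:.3f} times the old cost")
```

Under that assumption the cost per train comes out at roughly 1.375 times the old figure, which is the volume strategy in one number: each train costs more to run, but the cost is spread over far more passengers.
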
2. Make a comparison of product of other vendors providing Data
Warehousing and Data Mining toolset with Microsoft.
Ans:- Comparison among some of the popular vendors in the market: -
• IBM Cognos.
• Microsoft.
• MicroStrategy.
• Oracle Hyperion.
• QlikTech.
• SAP Business Objects.
• SAS.

►To simplify the selection process, start by understanding how BI products map onto the BI
architecture: -
• Stack Vendors: Provide comprehensive BI solutions that target the entire BI architecture.
• Pure Play Vendors: Primarily focus on providing analytical and reporting solutions.
• Component Vendors: Point solutions for a single component of the BI architecture (e.g. analysis).

Key Selection Criteria

To determine the ratings for vendors in the business intelligence industry, Info-Tech compared vendor
performance in five areas.

[Figure: Product List. The vendors and their products evaluated in this landscape.]
[Figure: Vendor Evaluation. The Scorecard.]

3. Using a diagram, highlight major components of Microsoft Toolset for DW/BI system.

Ans:- Architecture of a Microsoft DW/BI System


■ The Microsoft DW/BI Toolset

The core set of DW/BI tools that Microsoft Corporation sells is Microsoft SQL Server 2005. SQL
Server includes several major components of primary interest for DW/BI projects:

• The relational engine (RDBMS) to manage and store the dimensional data warehouse
database.
• Integration Services to build the extract, transformation, and load (ETL) system.
• An OLAP database in Analysis Services to support users’ queries, particularly ad hoc use.
• Analysis Services data mining to develop statistical data mining models, and also to include
those models in advanced analytic applications.
• Reporting Services to build predefined reports. Most of the Reporting Services features are
most appropriate for the DW/BI team, but you may provide some ad hoc query and report
building functionality with Report Builder.
• Development and management tools, especially SQL Server BI Development Studio and
SQL Server Management Studio, to build and manage your DW/BI system.
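
How the first three components fit together can be sketched in a few lines: an ETL step loads source rows into a dimensional (star-schema) database, and an OLAP-style query then aggregates the facts by a dimension. The sketch uses Python's built-in sqlite3 module purely as a stand-in for the relational engine. It is not SQL Server, Integration Services or Analysis Services code, and the table names, column names and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical rows, as an ETL step might extract them from a source system.
source_rows = [
    ("2011-03-28", "North", "Freight", 120.0),
    ("2011-03-28", "South", "Passenger", 80.0),
    ("2011-03-29", "North", "Passenger", 50.0),
    ("2011-03-29", "North", "Freight", 30.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region_name TEXT UNIQUE);
    CREATE TABLE dim_service (service_key INTEGER PRIMARY KEY, service_name TEXT UNIQUE);
    CREATE TABLE fact_revenue (
        date_text TEXT,
        region_key INTEGER REFERENCES dim_region(region_key),
        service_key INTEGER REFERENCES dim_service(service_key),
        amount REAL
    );
""")


def lookup_key(table, key_col, name_col, name):
    """Transform step: return the surrogate key of a dimension member, adding it if new."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({name_col}) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {name_col} = ?", (name,)
    ).fetchone()
    return row[0]


# Load step: replace names with surrogate keys and insert into the fact table.
for date_text, region, service, amount in source_rows:
    region_key = lookup_key("dim_region", "region_key", "region_name", region)
    service_key = lookup_key("dim_service", "service_key", "service_name", service)
    conn.execute(
        "INSERT INTO fact_revenue VALUES (?, ?, ?, ?)",
        (date_text, region_key, service_key, amount),
    )

# OLAP-style query: total revenue per region.
totals = conn.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM fact_revenue f
    JOIN dim_region r ON r.region_key = f.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()

for region_name, total in totals:
    print(region_name, total)
```

In a real Microsoft DW/BI system the load would be built in Integration Services and the aggregation served by an Analysis Services OLAP database; the sketch only shows the division of labour between the components.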

► The SQL Server product contains the software necessary to build, deploy, populate, manage, and
access your DW/BI system. A second significant set of Microsoft tools is designed for the business
user. These include Microsoft Office, notably Excel, Office Web Components, Data Analyzer, and
SharePoint Services.

Office and SharePoint provide tools that you can use to build end-user applications to access the
data warehouse databases. Many DW/BI systems supplement Microsoft end-user tools with
third-party software.

An increasingly important set of software developed by Microsoft is packaged analytic
applications. Examples of these packaged analytic applications are:

• The analytic functionality that Microsoft is increasingly adding to transaction systems like
Commerce Server and Microsoft Business Systems Great Plains
• Standalone analytic applications, which will be released after this book goes to print

PART B
4. Taking a sample organization, comment on its DW/BI system
configuration.
Ans:- HP and Microsoft data warehousing and business intelligence reference
configurations: -

1) Too much data—not enough information


Organizations face growing concerns about how to simplify their business critical environment, how
to leverage investments for highest return, and how to maintain user service level agreements
(SLAs). Another concern is how to confidently determine the appropriate hardware infrastructure
required to support user workloads while minimizing both costs and the risk to the business.

2) Simplify and consolidate with HP and Microsoft


Based on an analysis of combinations of CPU, memory, servers and storage for a range of DW and
BI implementations and business-critical workloads, these configurations help you:

• Harness and consolidate data across multiple systems and platforms

• Control mounting costs of storing and managing ever-growing volumes of data

• Make better decisions in less time

• Deliver business-critical information to the right people, at the right time, in an easy-to-use format

• Track your organization’s performance against key business goals and financial targets

3) Industry leading products:

• HP StorageWorks storage solutions – reliable, cost-effective data storage and protection
solutions

• HP Systems Insight Manager and HP Integrity Essentials management software – software
tools to set up and manage your HP Integrity server and HP StorageWorks systems

• Microsoft SQL Server 2008 – enterprise-level data management and analysis for demanding
DW/BI workloads

• Microsoft SQL Server Reporting Services – comprehensive, server-based reporting

• Microsoft SQL Server Integration Services – data integration platform that can integrate data
from any source; provides an extract, transformation and load (ETL) platform

• Microsoft SQL Server Analysis Services – unified and integrated view of all business data; the
foundation for all traditional reporting, OLAP analysis, key performance indicator (KPI) scorecards
and data mining

5. Taking example of LPU, decide various systems configuration considerations and justify your
answer.

Ans:- Systems configuration considerations: -

These include the configuration of resource definitions, process forms, approval processes (and
other records that will affect provisioning) within the Oracle Identity Manager Design Console, and
the editing of the relevant configuration files to support the desired functionality within the
Oracle Identity Manager Administrative and User Console. Not all of these settings will be relevant
for all users. Review this section prior to deploying your Oracle Identity Manager Administrative
and User Console to ensure that you have configured the product to function as intended.

The parallel engine's view of your system is determined by the contents of your current
configuration file. This file defines the processing nodes, and the disk space connected to each
node, that you allocate for use by parallel jobs. When invoking a parallel job, the parallel engine
first reads your configuration file to determine what system resources are allocated to it and then
distributes the job to those resources. When you modify the system by adding or removing nodes or
disks, you must modify your configuration file correspondingly. Since the parallel engine reads the
configuration file every time it runs a parallel job, it automatically scales the application to
fit the system without your having to alter the job code.
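
The idea that the job code stays the same while the configuration decides where work runs can be illustrated with a small sketch. Everything here is hypothetical: the dictionary layout is not the real configuration-file syntax of any parallel engine, and `plan_job` is an illustrative helper that simply spreads partitions round-robin over whatever nodes the configuration lists.

```python
# Hypothetical configuration: node name -> disks allocated to that node.
# This is an illustrative layout, not any product's real configuration syntax.
config = {
    "node1": ["/disk1"],
    "node2": ["/disk2", "/disk3"],
}


def plan_job(config, num_partitions):
    """Assign data partitions round-robin to the nodes named in the configuration."""
    nodes = sorted(config)
    if not nodes:
        raise ValueError("configuration defines no processing nodes")
    plan = {node: [] for node in nodes}
    for partition in range(num_partitions):
        plan[nodes[partition % len(nodes)]].append(partition)
    return plan


# The same job (six partitions) planned against the two-node configuration.
plan_two_nodes = plan_job(config, 6)
print(plan_two_nodes)

# A node is added to the system, so only the configuration changes.
config["node3"] = ["/disk4"]
plan_three_nodes = plan_job(config, 6)
print(plan_three_nodes)
```

The call to `plan_job` is identical in both cases; only the configuration differs, which mirrors the point that the engine re-reads the file on every run and rescales the job without changes to the job itself.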

6. Justify the need of RAID and SAN in Data Warehousing and Data
Mining?

Ans:- Data warehousing and data mining applications are important drivers of NAS and SAN
configurations. Data warehousing is built around relational databases. Data mining is the process of
analysing the warehoused information to extract useful patterns. The efficiency of a database depends
directly on the speed at which files can be stored, sorted, indexed and retrieved.

Solid State Disks (SSDs) can significantly improve the performance, efficiency and effectiveness of
an external storage library. They can be attached directly to the library system or to the host
system that is connected to the network.

The degree of performance improvement achieved by using SSDs in NAS and SAN structures depends on
system configuration, application and application workload. In the library-attached mode, the
database index files are stored on the cache SSDs of each disk array, shortening the read access
time by 10% to 80%, to nearly main-memory performance levels.
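
Why the improvement varies so widely with workload can be seen from the usual weighted-average model of a cache: a read served from the SSD costs the SSD latency, and every other read costs the disk latency. The sketch below applies that model. The latencies (0.1 ms for the SSD, 10 ms for the mechanical disk) and the hit ratios are assumed round numbers for illustration, not measurements from the source.

```python
def effective_read_time(hit_ratio, cache_ms, disk_ms):
    """Average read time when a fraction hit_ratio of reads is served from the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


def read_time_reduction(hit_ratio, cache_ms, disk_ms):
    """Fractional reduction in average read time compared with disk-only access."""
    return 1.0 - effective_read_time(hit_ratio, cache_ms, disk_ms) / disk_ms


# Assumed latencies, for illustration only.
SSD_MS = 0.1
HDD_MS = 10.0

for hit_ratio in (0.1, 0.5, 0.8):
    reduction = read_time_reduction(hit_ratio, SSD_MS, HDD_MS)
    print(f"hit ratio {hit_ratio:.0%}: read time reduced by {reduction:.1%}")
```

With these assumed latencies the reduction is close to the hit ratio itself, so a workload whose index reads mostly hit the SSD gains far more than one whose reads are scattered across the disks.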

Need of RAID Applications

RAID subsystems are currently the most widely used storage subsystems because of the high degree of
data integrity and availability they offer. Data protection, accessibility and availability are
the drivers for the success of RAID enterprise storage configurations. However, because of RAID's
built-in redundancy and data integrity algorithms, performance is lower than that of simple HDD
configurations. This can be improved significantly by using SSDs in the disk matrix. In one
configuration, the SSD is used as a resilient write cache or a supplemental read-write cache. In
another, in addition to the supplemental cache function, SSDs are used as matrix disk members.
Theoretically, all disk members of a RAID configuration can be replaced with SSDs. The number of
SSDs actually used is a trade-off between performance and cost.

Depending on the application, SSDs can be used as write cache, parity disk or matrix disk. Using
SSDs to write the parity information greatly improves the overall RAID performance, as parity
generation delays the write operation in an all-mechanical RAID configuration.

RAID systems can be used either in stand-alone computing configurations or in networked
configurations. RAID, especially RAID 5, is nearly standard on configurations larger than desktop
PCs.
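
The parity mentioned above is, in RAID 5, the bytewise XOR of the data blocks in a stripe. That is what lets a lost block be rebuilt from the survivors, and also why each write costs extra work, since the parity has to be recomputed. The sketch below shows the principle on three small in-memory blocks. It is a simplified illustration, not a disk driver, and the block contents and names are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (illustrative contents) plus their parity block.
data_blocks = [b"ware", b"hous", b"ing!"]
parity = xor_blocks(data_blocks)

# Simulate losing the second disk: rebuild its block from the survivors and the parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Because XOR is its own inverse, XOR-ing the surviving blocks with the parity cancels them out and leaves exactly the missing block.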

Need of SAN: -

• A storage area network (SAN) is a high-speed special-purpose network (or subnetwork) that
interconnects different kinds of data storage devices with associated data servers on behalf
of a larger network of users. Typically, a storage area network is part of the overall network
of computing resources for an enterprise. A storage area network is usually clustered in close
proximity to other computing resources such as IBM S/390 mainframes but may also extend
to remote locations for backup and archival storage, using wide area network carrier
technologies such as Asynchronous Transfer Mode (ATM) or Synchronous Optical Networking (SONET).

• A storage area network can use existing communication technology such as IBM's optical
fiber ESCON or it may use the newer Fibre Channel technology. Some SAN system
integrators liken it to the common storage bus (flow of data) in a personal computer that is
shared by different kinds of storage devices such as a hard disk or a CD-ROM player.

• SANs support disk mirroring, backup and restore, archival and retrieval of archived data,
data migration from one storage device to another, and the sharing of data among different
servers in a network. SANs can incorporate subnetworks with network-attached storage (NAS)
systems.
