Business Intelligence helps to manage data by applying different skills, technologies, and practices for security and quality, and it gives a better understanding of the data. Business intelligence can be considered the collective information gathered in a warehouse, used to make predictions about business operations. Business intelligence applications handle sales, financial, production, and other business data. They support better decision making and can also be considered a decision support system.
What is ETL process in data warehousing?
ETL is Extract, Transform, Load. It is the process of fetching data from different sources, converting the data into a consistent and clean form, and loading it into the data warehouse. Different tools are available in the market to perform ETL jobs.
What is ETL process in data warehousing?
ETL stands for extraction, transformation and loading. That means extracting data from different sources such as flat files, databases or XML data, transforming this data depending on the application's needs, and loading it into the data warehouse.
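The extract, transform, load steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL tool: the source records, field names, and cleaning rules are all hypothetical, and an in-memory SQLite table stands in for the warehouse.

```python
# Minimal ETL sketch: extract from a hypothetical source, normalize the
# records, and load them into an in-memory SQLite table (the "warehouse").
import sqlite3

def extract():
    # Pretend these rows came from a flat file or an XML feed.
    return [{"name": " Alice ", "sales": "120"},
            {"name": "bob", "sales": "85"}]

def transform(rows):
    # Clean whitespace, normalize case, cast types to a consistent form.
    return [(r["name"].strip().title(), int(r["sales"])) for r in rows]

def load(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_fact (name TEXT, sales INTEGER)")
    con.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    return con

warehouse = load(transform(extract()))
print(warehouse.execute("SELECT name, sales FROM sales_fact").fetchall())
# → [('Alice', 120), ('Bob', 85)]
```

Real ETL jobs differ mainly in scale and error handling, but they follow this same three-stage pipeline.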
Explain the difference between data mining and data warehousing.
Data warehousing is merely extracting data from different sources, cleaning the data and storing it in the warehouse. Data mining, on the other hand, aims to examine or explore the data using queries fired against the data warehouse. Exploring the data through mining helps in reporting, planning strategies, finding meaningful patterns, etc. E.g. a company's data warehouse stores all the relevant information on projects and employees. Using data mining, one can use this data to generate different reports, such as profits generated.
Explain the difference between data mining and data warehousing.
Data mining is a method for analyzing large amounts of data for the purpose of finding patterns, and is normally used for modeling and forecasting. It is the process of discovering correlations and patterns by sifting through large data repositories using pattern recognition techniques. Data warehousing is the central repository for the data of several business systems in an enterprise. Data from various sources is extracted and organized in the data warehouse selectively for analysis and accessibility.
What is snowflake schema?
A snowflake schema in its simplest form is an arrangement of fact tables and dimension tables. The fact table is usually at the center, surrounded by the dimension tables. Normally in a snowflake schema the dimension tables are further broken down into more dimension tables.
E.g. dimension tables include employee, projects and status. The status table can be further broken into status_weekly and status_monthly.
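The example above can be sketched as SQL DDL executed through Python's sqlite3 module. The table and column names follow the example but are hypothetical; the point is that the status dimension is normalized one level further, so queries need one extra join compared with a star schema.

```python
# Snowflake-schema sketch: a fact table referencing dimension tables, with
# the status dimension further normalized into a weekly sub-dimension.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE status_weekly (week_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE status (status_id INTEGER PRIMARY KEY,
                     week_id INTEGER REFERENCES status_weekly(week_id));
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project_fact (
    proj_id   INTEGER PRIMARY KEY,
    emp_id    INTEGER REFERENCES employee(emp_id),
    status_id INTEGER REFERENCES status(status_id),
    hours     REAL
);
""")
con.execute("INSERT INTO status_weekly VALUES (1, 'on-track')")
con.execute("INSERT INTO status VALUES (10, 1)")
con.execute("INSERT INTO employee VALUES (7, 'Alice')")
con.execute("INSERT INTO project_fact VALUES (100, 7, 10, 12.5)")

# Reaching the weekly status requires joining through the status table --
# the extra join depth is what distinguishes a snowflake from a star.
query = """
SELECT e.name, w.label, f.hours
FROM project_fact f
JOIN employee e      ON f.emp_id = e.emp_id
JOIN status s        ON f.status_id = s.status_id
JOIN status_weekly w ON s.week_id = w.week_id
"""
print(con.execute(query).fetchall())  # → [('Alice', 'on-track', 12.5)]
```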
What is snowflake schema?
The snowflake schema is one of the designs present in database design. It serves the purpose of dimensional modeling in data warehousing. If a dimension table is split into many tables, so that the schema is inclined slightly towards normalization, then the snowflake design is being used. It contains deep joins, because the tables are split further.
What is a surrogate key? Explain it with an example.
Data warehouses commonly use a surrogate key to uniquely identify an entity. A surrogate key is not generated by the user but by the system. A primary difference between a primary key and a surrogate key in some databases is that the PK uniquely identifies a record while the SK uniquely identifies an entity. E.g. an employee may be recruited before the year 2000 while another employee with the same name may be recruited after the year 2000. Here, the primary key will uniquely identify the record, while the surrogate key will be generated by the system (say, a serial number), since the SK is NOT derived from the data.
What is a surrogate key? Explain it with an example.
A surrogate key is a unique identifier in the database, either for an entity in the modeled world or for an object in the database. Application data is not used to derive a surrogate key; it is generated internally by the current system and is invisible to the user. As several objects in the database may correspond to one surrogate, a surrogate key cannot always be utilized as the primary key. For example, a sequential number can be a surrogate key.
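The serial-number example above can be sketched directly: the key comes from an internal sequence, never from the application data, so two employees with the same name still get distinct identifiers. The record structure here is hypothetical.

```python
# Surrogate-key sketch: the key is system-generated (a serial number),
# never derived from application data such as the name.
import itertools

_next_sk = itertools.count(1)   # internal sequence, invisible to users

def new_employee(name, hired_year):
    return {"sk": next(_next_sk), "name": name, "hired": hired_year}

a = new_employee("J. Smith", 1999)   # same name, different people:
b = new_employee("J. Smith", 2003)   # the surrogate keys still differ
print(a["sk"], b["sk"])  # → 1 2
```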
What is a factless fact table?
A tracking process or status collection can be performed by using factless fact tables. The fact table has no aggregate numeric values, hence the name. It holds only the key values referenced by the dimensions from which the status is collected.
Explain the difference between star and snowflake schemas.
A snowflake schema design is usually more complex than a star schema. In a star schema a fact table is surrounded by multiple dimension tables, and a snowflake schema starts the same way. However, in a snowflake schema the dimension tables can be further broken down into sub-dimensions. Hence, data in a snowflake schema is more normalized and standardized than in a star schema. E.g. Star schema: Performance_report is a fact table; its dimension tables include performance_report_employee and performance_report_manager. Snowflake schema: the dimension tables can be broken into performance_report_employee_weekly, monthly, etc.
Explain the difference between star and snowflake schemas.
Star schema: a highly de-normalized design. A star schema has one fact table associated with numerous dimension tables, depicting a star. Snowflake schema: a star schema with normalization principles applied is known as a snowflake schema; every dimension table is associated with sub-dimension tables. Differences:
A dimension table has no parent table in a star schema, whereas in a snowflake schema a dimension table has one or more parent tables.
In a star schema the dimension table itself contains the hierarchies of a dimension, whereas in a snowflake schema the hierarchies are split into separate tables, allowing data to be drilled down from the topmost hierarchy level to the lowermost.
What is a linked cube with reference to data warehouse?
A data cube stores data in a summarized form, which helps in faster analysis of the data. Linked cubes use an existing data cube but are stored on another analysis server. Linking different data cubes reduces the possibility of sparse data. E.g. a data cube may store employee_performance. To know the hours from which this performance was calculated, one can create another cube linked to the root cube (in this case, employee_performance).
What is a linked cube with reference to data warehouse?
A cube is a logical representation of multidimensional data. The edges of the cube represent the dimension members and the body of the cube represents the data values. Linked cubes are cubes that are linked together in order to keep the data consistent.
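The summarization idea behind a data cube can be sketched with a plain dictionary: raw rows are pre-aggregated along the dimensions so that analysis lookups are fast. The dimensions (employee, quarter) and the measure (hours) are hypothetical.

```python
# Data-cube sketch: pre-summarize raw rows along dimensions so that
# analysis queries become simple lookups instead of scans.
from collections import defaultdict

rows = [("Alice", "2024-Q1", 40), ("Alice", "2024-Q2", 35),
        ("Bob",   "2024-Q1", 38)]

# Cube cells keyed by (employee, quarter), with hours summed per cell.
cube = defaultdict(int)
for emp, quarter, hours in rows:
    cube[(emp, quarter)] += hours
    cube[(emp, "ALL")]   += hours   # roll-up along the time dimension

print(cube[("Alice", "ALL")])  # → 75
```

A linked cube would reference these same cells from another analysis server rather than re-storing them, which is what keeps the data consistent and avoids duplicating sparse regions.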
Real time Data Warehouse: in this stage, the data warehouse is updated on a transaction or event basis, with the operational system performing a transaction every time. Integrated Data Warehouse: in this stage, the warehouse generates activity or transactions that are passed back into the operational system, where they are used in the daily activity of the organization.
What is active data warehousing?
Transactional data is captured and stored in the Active Data Warehouse. This repository can be used to find trends and patterns that can inform future decision making.
What is active data warehousing?
An active data warehouse aims to capture data continuously and deliver real-time data. It provides a single integrated view of a customer across multiple business lines, and is associated with Business Intelligence systems.
What is the difference between dependent and independent data warehouse?
A dependent data warehouse stores its data in a central data warehouse. An independent data warehouse, on the other hand, does not make use of a central data warehouse.
What is data modeling and data mining? What are they used for?
Data modeling aims to identify all entities that have data, and then defines the relationships between these entities. Data models can be conceptual, logical or physical. Conceptual models are typically used to explore high-level business concepts with stakeholders, logical models are used to explore domain concepts, while physical models are used to explore database design. Data mining is used to examine or explore the data using queries fired against the data warehouse. Data mining helps in reporting, planning strategies, finding meaningful patterns, etc.; it can be used to convert a large amount of data into a sensible form.
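The reporting example above (generating a profit report from warehouse data by firing queries) can be sketched with an aggregate SQL query through sqlite3. The projects table and its figures are hypothetical.

```python
# Sketch of mining warehouse data with a query: derive profit per project
# and rank the results, surfacing which projects are losing money.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE projects (name TEXT, revenue REAL, cost REAL)")
con.executemany("INSERT INTO projects VALUES (?, ?, ?)",
                [("alpha", 100.0, 60.0), ("beta", 80.0, 90.0)])

report = con.execute(
    "SELECT name, revenue - cost AS profit "
    "FROM projects ORDER BY profit DESC"
).fetchall()
print(report)  # → [('alpha', 40.0), ('beta', -10.0)]
```

Real data mining goes further (clustering, forecasting, pattern recognition), but queries like this are the entry point mentioned in the answer above.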
What is snapshot with reference to data warehouse?
A snapshot of a data warehouse is a report persisted from the catalogue. The report is persisted into a file after being disconnected from the catalogue.
What is snapshot with reference to data warehouse?
A snapshot in a data warehouse can be used to track activities. For example, every time an employee attempts to change his address, the data warehouse can be alerted to take a snapshot. This means each snapshot is taken when some event is fired. A snapshot has three components:
the time when the event occurred, a key to identify the snapshot, and the data that relates to the key.
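The three components listed above map naturally onto a small record type. This is a sketch only; the key format and payload fields are hypothetical.

```python
# Snapshot sketch with the three components above: the event time,
# a key identifying the snapshot, and the data related to that key.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Snapshot:
    key: str        # identifies the snapshot
    data: dict      # data that relates to the key
    taken_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# Taken when an event fires, e.g. an employee changes address:
snap = Snapshot(key="emp-42-address", data={"city": "Pune"})
print(snap.key, snap.data["city"])
```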
What is a degenerate dimension table?
A degenerate dimension does not have its own dimension table; it is derived from the fact table. It is a column (dimension) that is part of the fact table but does not map to any dimension table. E.g. employee_id.
Direct or Faster load: the data is loaded directly, without checking for any constraints.
What is the difference between OLAP and data warehouse?
Data warehouse: data from different data sources is stored in a relational database for end-use analysis. The data is organized in summarized, aggregated, non-volatile and subject-oriented patterns. It supports the analysis of data but does not itself perform online analysis. Online Analytical Processing: analytical queries are used to analyze and evaluate the data in the data warehouse. Data aggregation and summarization are used to organize the data using multidimensional models. OLAP provides speed and flexibility for online data analysis by data analysts in a real-time environment.
What is the difference between OLAP and data warehouse?
A data warehouse serves as a repository of historical data that can be used for analysis. OLAP is Online Analytical Processing, which can be used to analyze and evaluate data in a warehouse. The warehouse has data coming from varied sources; an OLAP tool helps to organize the data in the warehouse using multidimensional models.
Describe the foreign key columns in fact table and dimension table.
The primary keys of entity tables are the foreign keys of dimension tables. The primary keys of dimension tables are the foreign keys of fact tables.
Describe the foreign key columns in fact table and dimension table.
A foreign key of a fact table references the dimension tables. A dimension table, on the other hand, being a referenced table itself, may have foreign key references from one or more other tables.
What are non-additive facts?
Facts that cannot be summed up across the dimensions present in the fact table are called non-additive facts. Such facts can still be useful if there are changes in dimensions. For example, profit margin is a non-additive fact, since it has no meaning to add margins up at the account level or the day level.
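The profit-margin example can be made concrete with a few lines of arithmetic: summing per-row margins gives a meaningless number, while recomputing the margin from the additive facts (profit and revenue) gives the correct answer. The figures are hypothetical.

```python
# Non-additive fact sketch: profit margin cannot be summed across rows;
# it must be recomputed from additive facts (profit and revenue).
rows = [  # (revenue, profit) per day
    (200.0, 50.0),   # margin 25%
    (100.0, 10.0),   # margin 10%
]

wrong = sum(p / r for r, p in rows)                         # "summed" margins
right = sum(p for _, p in rows) / sum(r for r, _ in rows)   # true margin

print(round(wrong, 3), round(right, 3))  # → 0.35 0.2
```

The combined margin is 60 / 300 = 20%, not 35%; that mismatch is exactly why margin is stored as two additive facts rather than as a pre-computed ratio.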
Other tools do not have a forecasting tool. For this reason, SAS is used most in clinical trials and the health care industry.
List out differences between the SAS tool and other tools.
SAS provides more features in comparison to other tools. It supports almost ALL database interfaces and has its own extensive database engine.
What is data cleaning? How can we do that?
Data cleaning, also known as data scrubbing, is a process which ensures that a set of data is correct and accurate. Data accuracy, consistency and integration are checked during data cleaning. Data cleaning can be applied to a single set of records or to multiple sets of data which need to be merged.
Data cleaning is performed by reading all records in a set and verifying their accuracy. Typos and spelling errors are rectified. Mislabeled data, if present, is labeled and filed. Incomplete or missing entries are completed. Unrecoverable records are purged, so that they do not take up space and cause inefficient operations.
What is data cleaning? How can we do that?
Data cleaning is the process of identifying erroneous data. The data is checked for accuracy, consistency, typos, etc. Methods: Parsing - used to detect syntax errors. Data Transformation - confirms that the input data matches the expected format. Duplicate elimination - gets rid of duplicate entries. Statistical methods - values such as mean, standard deviation and range, or clustering algorithms, are used to find erroneous data.
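Three of the methods listed above (parsing, duplicate elimination, and a statistical check) can be sketched together in one small pass over a record set. The records and field names are hypothetical.

```python
# Data-cleaning sketch: parse fields, drop duplicates, then use a simple
# statistical test (distance from the mean) to flag suspect values.
import statistics

records = [
    {"name": "Alice", "age": "34"},
    {"name": "Alice", "age": "34"},   # duplicate entry
    {"name": "Bob",   "age": "abc"},  # parsing/syntax error
    {"name": "Cara",  "age": "29"},
    {"name": "Dan",   "age": "31"},
]

# Parsing + duplicate elimination: keep the first valid copy of each record.
seen, clean = set(), []
for r in records:
    key = (r["name"], r["age"])
    if key in seen or not r["age"].isdigit():
        continue                      # drop duplicates and unparseable ages
    seen.add(key)
    clean.append({"name": r["name"], "age": int(r["age"])})

# Statistical method: flag ages more than two standard deviations from the mean.
ages = [r["age"] for r in clean]
mu, sd = statistics.mean(ages), statistics.stdev(ages)
outliers = [r for r in clean if abs(r["age"] - mu) > 2 * sd]

print(len(clean), outliers)  # → 3 []
```

Production cleaning pipelines apply the same methods with richer parsers and domain-specific rules, but the structure (parse, deduplicate, validate statistically) is the same.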