1) What are the advantages of data mining over tradition...
A) Data mining is used to estimate the future. For example, for a company or business organization, data mining lets us predict the future of the business in terms of revenue, employees, customers, orders, etc. Traditional approaches use simple algorithms to estimate the future, but they do not give results as accurate as data mining.

2) What is the difference between views and materialized views?
2A) View: stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes. Materialized view: stores the results of the SQL in table form in the database. The SQL statement executes only when the materialized view is built or refreshed; after that, every time you run the query, the stored result set is used. Pros include quick query results.
2B) VIEW: this is a pseudo table that is not stored in the database; it is just a query. MATERIALIZED VIEWS: these are similar to a view, but they are permanently stored in the database and often refreshed. They are used in optimization for faster data retrieval and are useful for aggregation and summarization of data.

3) What is the main difference between Inmon and Kimball...
3A) Basically speaking, Inmon professes the snowflake schema while Kimball relies on the star schema.
3B) They differ in the concept of building the data warehouse. Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtained from dimensional modeling at a local departmental level. Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as the need arises. Point-of-sale (POS) data can be added later if management decides it is necessary.

I.e., Kimball: first data marts, then combined into a data warehouse. Inmon: first the data warehouse, later the data marts.

3C) The main difference between the Kimball and Inmon approaches is: Kimball creates data marts first and then combines them to form a data warehouse; Inmon creates the data warehouse first and then the data marts.

3D) Actually, the main difference is: Kimball follows dimensional modeling; Inmon follows ER modeling.
3E) Ralph Kimball follows the bottom-up approach, i.e., he first creates individual data marts from the existing sources and then creates the data warehouse. Bill Inmon follows the top-down approach, i.e., he first creates the data warehouse from the existing sources and then creates individual data marts.
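Going back to question 2, the contrast between a view and a materialized view can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module, SQLite has no materialized views, so a snapshot table (`mv_sales_by_region`, a made-up name) stands in for one, and the `sales` table and its figures are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('East', 100), ('West', 50);

    -- A view stores only the query; it runs again on every access.
    CREATE VIEW v_sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    -- Assumption: a snapshot table stands in for a materialized view,
    -- because SQLite does not provide one.
    CREATE TABLE mv_sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")


def totals(relation):
    """Return {region: total} read from the given view or table name."""
    # The name is interpolated only because this sketch controls it.
    return dict(conn.execute(f"SELECT region, total FROM {relation}"))


def refresh_snapshot():
    """Rebuild the stored result set, as a materialized view refresh would."""
    conn.executescript("""
        DELETE FROM mv_sales_by_region;
        INSERT INTO mv_sales_by_region
            SELECT region, SUM(amount) FROM sales GROUP BY region;
    """)


conn.execute("INSERT INTO sales VALUES ('East', 25)")
view_now = totals("v_sales_by_region")         # re-runs the query, sees the new row
snapshot_stale = totals("mv_sales_by_region")  # still the stored result set
refresh_snapshot()
snapshot_fresh = totals("mv_sales_by_region")  # up to date after the refresh
```

The view reflects the new row immediately, while the snapshot only catches up after a refresh; that is the trade-off behind the "quick query results" mentioned in 2A.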

4) What is a junk dimension? What is the difference between a junk dimension and a degenerate dimension?
4A) A junk dimension is a collection of random transactional codes, flags and text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes.
4B) A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A degenerate dimension, on the other hand, is data that is dimensional in nature but stored in a fact table.
4C) Junk dimension: columns that are rarely or never used are gathered into one dimension, which is called a junk dimension. Degenerative dimension: the columns which we use in the dimension form a degenerative dimension. Example: the EMP table has empno, ename, sal, job, deptno, but we take only the columns empno and ename from the EMP table and form a dimension; this is called a degenerative dimension.

5) What is the definition of normalized and denormalization?
5A) Normalization is the process of removing redundancies. Denormalization is the process of allowing redundancies.

OLTP uses the normalization process, and OLAP/DW uses the denormalized process to capture a greater level of detailed data (each and every transaction).

6) Why is the fact table in normal form?
6A) A fact table consists of measurements of business requirements and foreign keys of dimension tables, as per business rules.
6B) Basically the fact table consists of the index keys of the dimension/lookup tables and the measures. So whenever we have the keys in a table, that itself implies that the table is in normal form.

7) What is the difference between E-R modeling and dimensional modeling?
7A) The basic difference is that E-R modeling will have a logical and a physical model, while a dimensional model will have only a physical model. E-R modeling is used for normalizing the OLTP database design. Dimensional modeling is used for denormalizing the ROLAP/MOLAP design.
7B) E-R modeling revolves around the entities and their relationships to capture the overall process of the system. Dimensional (multi-dimensional) modeling revolves around dimensions (points of analysis) for decision making, and not around capturing the process.

7C) In ER modeling the data is in normalized form, so there are more joins, which may adversely affect system performance. In dimensional modeling the data is denormalized, so there are fewer joins, which improves system performance.

8) What is a conformed fact?
8A) Conformed dimensions are the dimensions which can be used across multiple data marts in combination with multiple fact tables.
8B) Conformed facts are allowed to have the same name in separate tables and can be combined and compared mathematically.
8C) When the relationships between the facts and dimensions are in 3NF and work with any type of join, it is called a conformed schema, and the members of that schema are called conformed as well.
8D) Conformed dimensions are those tables that have a fixed structure. There is no need to change the metadata of these tables, and they can go along with any number of facts in that application without any changes.
8E) A dimension table which is used by more than one fact table is known as a conformed dimension.

9) What are the methodologies of data warehousing?
9A) Every company has a methodology of its own. To name a few, the SDLC methodology and the AIM methodology are commonly used. Other methodologies are AMM, the World Class methodology and many more.

9B) Most of the time, we use Mr. Ralph Kimball's methodologies for data warehousing design. There are two kinds of schema: star and snowflake.
9C) Most probably everyone follows either the star schema or the snowflake schema.
9D) There are 2 methodologies: 1) Kimball: first data marts, then the EDWH; 2) Inmon: first the EDWH, then data marts from the EDWH.

9C) Regarding the methodologies in data warehousing, there are mainly 2 methods: 1. the Ralph Kimball model; 2. the Inmon model. The Kimball model is always structured as a denormalized structure. The Inmon model is structured as a normalized structure. Depending on the requirements of the company, the company's DWH will follow one of the above models.

9D) Data warehousing contains two methods: 1) the top-down method and 2) the bottom-up method. The top-down method first loads the data warehouse and then loads the data marts. The bottom-up method first loads the data marts and then loads the data warehouse.
9E) The top-down approach is first the data warehouse, then the data marts. The bottom-up approach is first the data marts, then the data warehouse.
9F) There are 2 methodologies: 1. Kimball and 2. Inmon; likewise there are 2 schemas: 1. star and 2. snowflake.

9G) There are two approaches in data warehousing, named the top-down approach and the bottom-up approach. Top-down means preparing the individual departments' data (data marts) from the enterprise data warehouse. Bottom-up means first gathering all the departments' data, then cleansing and transforming it, and then loading all the individual departments' data into the enterprise data warehouse.

10) What is a BUS schema?
10A) A BUS schema is composed of a master suite of conformed dimensions and standardized definitions of facts.
10B) A BUS schema or a BUS matrix? A BUS matrix (in the Kimball approach) is used to identify common dimensions across business processes, i.e., it is a way of identifying conforming dimensions.

11) What is a data warehousing hierarchy?
11A) Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure. Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies: one for product categories and one for product suppliers. Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse. When designing hierarchies, you must consider the relationships in business structures, for example a divisional multilevel sales organization.
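The month-to-quarter-to-year aggregation described in 11A can be sketched as a generic roll-up in which each level's values are summed into their parent level. This is a minimal sketch, not a query-tool feature; the sales figures and the helper names `quarter_of` and `roll_up` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical monthly sales keyed by (year, month).
monthly_sales = {
    (2001, 1): 10, (2001, 2): 20, (2001, 3): 30,
    (2001, 4): 40, (2001, 12): 5,
    (2002, 1): 7,
}


def quarter_of(month):
    """Map a month number (1-12) to its parent quarter (1-4)."""
    return (month - 1) // 3 + 1


def roll_up(values, parent_of):
    """Sum child-level values into the parent level given by parent_of(key)."""
    totals = defaultdict(int)
    for key, amount in values.items():
        totals[parent_of(key)] += amount
    return dict(totals)


# month -> quarter, then quarter -> year
quarterly_sales = roll_up(monthly_sales, lambda k: (k[0], quarter_of(k[1])))
yearly_sales = roll_up(quarterly_sales, lambda k: k[0])
```

Because each level is computed from the level directly below it, the yearly totals come from the quarterly totals rather than from the raw monthly rows, which mirrors how values at lower levels aggregate into higher levels.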

Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
Levels: a level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.
Level relationships: level relationships specify the top-to-bottom ordering of levels from most general (the root) to most specific information. They define the parent-child relationship between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex rewrites. For example, the database can aggregate existing sales revenue on a quarterly basis to a yearly aggregation when the dimensional dependencies between quarter and year are known.

12) What are data validation strategies for data mart v...
12A) Data validation is making sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet the validation requirements.

13) What are the data types present in BO? And what happens I...
13A) Three different data types: dimension, measure and detail. A view is nothing but an alias, and it can be used to resolve the loops in the universe.
13B) To my knowledge, these are called object types in Business Objects.

And an alias is different from a view in the universe. A view is at the database level, but an alias is a different name given to the same table to resolve the loops in the universe.
13C) The different data types in Business Objects are: 1. Character, 2. Date, 3. Long text, 4. Number.
13D) Dimension, measure and detail are object types. Data types are character, date and numeric.

14) What is a surrogate key? Where we use it explain WI...
14A) A surrogate key is the primary key for the dimension table.
14B) A surrogate key is a substitution for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. They can use a sequence generator, an Oracle sequence, or SQL Server identity values for the surrogate key. It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, and this makes updates more difficult. Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users), but not only can these change, indexing on a numerical value is probably better, and you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you may display only the AIRPORT_NAME.
2. Adapted from response by Vincent on Thursday, March 13, 2003

Another benefit you can get from surrogate keys (SID) is tracking the SCD (Slowly Changing Dimension). Let me give you a simple, classical example. On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has a turnover allocated to him on Business Unit 'BU1', but on the 2nd of June Employee 'E1' is transferred from Business Unit 'BU1' to Business Unit 'BU2'. All the new turnover has to belong to the new Business Unit 'BU2', but the old turnover should still belong to Business Unit 'BU1'. If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to Business Unit 'BU2', even what actually belongs to 'BU1'. If you use surrogate keys, you could create on the 2nd of June a new record for Employee 'E1' in your Employee Dimension with a new surrogate key. This way, in your fact table, you have your old data (before the 2nd of June) with the SID of Employee 'E1' + 'BU1'. All new data (after the 2nd of June) would take the SID of Employee 'E1' + 'BU2'. You could consider a Slowly Changing Dimension as an enlargement of your natural key: the natural key of the employee was Employee Code 'E1', but for you it becomes Employee Code + Business Unit, i.e. 'E1' + 'BU1' or 'E1' + 'BU2'. The difference from the natural key enlargement process is that you might not have all parts of your new key within your fact table, so you might not be able to join on the new enlarged key, and so you need another id.
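The 'E1' example above can be sketched as follows. It is an illustration only: the turnover amounts, dates within each period, and the column names (`sid`, `employee_code`, `employee_sid`) are assumptions, not part of the original response.

```python
from collections import defaultdict

# Employee dimension: one row per version of the employee, each with its own SID.
employee_dim = [
    {"sid": 1, "employee_code": "E1", "business_unit": "BU1"},  # before 2 June
    {"sid": 2, "employee_code": "E1", "business_unit": "BU2"},  # from 2 June on
]

# Fact rows carry the SID that was current when the turnover was booked.
turnover_facts = [
    {"date": "2002-03-15", "employee_code": "E1", "employee_sid": 1, "turnover": 100},
    {"date": "2002-05-20", "employee_code": "E1", "employee_sid": 1, "turnover": 150},
    {"date": "2002-07-01", "employee_code": "E1", "employee_sid": 2, "turnover": 200},
]


def turnover_by_unit_via_sid(facts, dim):
    """Join facts to the dimension on the surrogate key."""
    unit_of_sid = {row["sid"]: row["business_unit"] for row in dim}
    totals = defaultdict(int)
    for fact in facts:
        totals[unit_of_sid[fact["employee_sid"]]] += fact["turnover"]
    return dict(totals)


def turnover_by_unit_via_natural_key(facts, dim):
    """Join on the natural key; only the latest unit per employee survives."""
    # Later dimension rows overwrite earlier ones for the same employee code.
    unit_of_code = {row["employee_code"]: row["business_unit"] for row in dim}
    totals = defaultdict(int)
    for fact in facts:
        totals[unit_of_code[fact["employee_code"]]] += fact["turnover"]
    return dict(totals)


by_sid = turnover_by_unit_via_sid(turnover_facts, employee_dim)
by_natural_key = turnover_by_unit_via_natural_key(turnover_facts, employee_dim)
```

With the surrogate key, the pre-June turnover stays with 'BU1'; with the natural key alone, all of it lands on 'BU2', which is exactly the misallocation the response warns about.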

14C) When creating a dimension table in a data warehouse, we generally create the table with a system-generated key to uniquely identify a row in the dimension. This key is also known as a surrogate key. The surrogate key is used as the primary key in the dimension table. The surrogate key will also be placed in the fact table, and a foreign key will be defined between the two tables. When you ultimately join the data, it will join just like any other join within the database.
14D) A surrogate key is to a data warehouse what a primary key is to an OLTP source. It is used to uniquely identify a record in dimension tables. It provides the solution for the critical column problem.
14E) A surrogate key is a system-generated artificial primary key value, e.g. any candidate key can be considered as a surrogate key.

14F) A surrogate key is a unique identification key. It is like an artificial or alternative key to the production key, because the production key may be an alphanumeric or composite key, but the surrogate key is always a single numeric key. Assume the production key is an alphanumeric field. If you create an index on this field it will occupy more space, so it is not advisable to join or index on it, because generally all data warehousing fact tables hold historical data and are linked with many dimension tables. If the key is a numerical field, performance is high.
14G) A surrogate key in a data warehouse is more than just a substitute for a natural key. In a data warehouse, a surrogate key is a necessary generalization of the natural production key and is one of the basic elements of data warehouse design.
14F) A surrogate key is a system-generated key. It is mainly used for critical columns in the DWH. Here a critical column is simply a column whose values get updated in the OLTP systems that feed the DWH.
14G) Surrogate keys are the keys that join dimension tables and the fact table.
14H) A surrogate key is the solution for critical column problems. For example, a customer purchases different items in different locations; for this situation we have to maintain historical data.

By using a surrogate key we can introduce a new row in the data warehouse to maintain historical data.

15) What is a linked cube?
15A) A cube can be partitioned in 3 ways: replicate, transparent and linked. In a linked cube the data cells can be linked to another analytical database. If an end-user clicks on a data cell, you are actually linking through to another analytic database.
15B) A linked cube is one in which a subset of the data can be analyzed in great detail. The linking ensures that the data in the cubes remains consistent.

16) Partitioning a cube
16A) Partitioning a cube is mainly used for optimization. For example, you may have 5 GB of data to create a report from. You can specify a cube size of 2 GB, so if the cube exceeds 2 GB a second cube is created automatically to store the data.

17) What is meant by metadata in the context of a data warehouse?
17A) In the context of a data warehouse, metadata means the information about the data. This information is stored in the designer repository.
17B) Metadata is data about data. A business analyst or data modeler usually captures information about data in a data dictionary, a.k.a. metadata: the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.) and the behavior of the data (how it is modified or derived, and its life cycle). Metadata is also present at the data mart level, for subsets, facts and dimensions, the ODS, etc. For a DW user, metadata provides vital information for analysis and DSS.

17C) Metadata is data about data, including things such as name, location and length.

We can store this data as metadata in the data warehouse.

18) What is incremental loading? 2. What i...
18A) Incremental loading means loading the ongoing changes in the OLTP. An aggregate table contains the measure values, aggregated/grouped/summed up to some level of the hierarchy.
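As a rough illustration of 18A, the sketch below loads only the rows that changed since the previous run and then builds a small aggregate at the month level. It assumes an append-only source with an `updated_at` column used as a watermark; that column, the order data, and the function names are hypothetical, and real ETL tools offer their own change-detection mechanisms.

```python
def incremental_load(source_rows, warehouse_rows, last_loaded_at):
    """Append only source rows changed after the previous load's watermark."""
    new_rows = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    warehouse_rows.extend(new_rows)
    # The new watermark is the latest change seen, or the old one if nothing changed.
    return max((r["updated_at"] for r in new_rows), default=last_loaded_at)


def aggregate_by(rows, level, measure):
    """Sum a measure up to one level of the hierarchy (an aggregate table)."""
    totals = {}
    for row in rows:
        totals[row[level]] = totals.get(row[level], 0) + row[measure]
    return totals


# Hypothetical OLTP rows; ISO date strings compare correctly as text.
oltp_orders = [
    {"order_id": 1, "month": "2001-01", "amount": 10, "updated_at": "2001-01-05"},
    {"order_id": 2, "month": "2001-01", "amount": 20, "updated_at": "2001-01-20"},
    {"order_id": 3, "month": "2001-02", "amount": 5, "updated_at": "2001-02-02"},
]

warehouse = []
# First run: only the first two orders exist in the source yet.
watermark = incremental_load(oltp_orders[:2], warehouse, "")
# Second run: the full source is scanned, but only order 3 is new.
watermark = incremental_load(oltp_orders, warehouse, watermark)
monthly_totals = aggregate_by(warehouse, "month", "amount")
```

The second run adds a single row instead of reloading everything, and the aggregate is then derived from the warehouse rows.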

19) What are the possible data marts in retail sales....
19A) Product information, sales information.

20) What is the main difference between schema in RDBMS and schemas in data warehouse....
20A) RDBMS schema:
• Used for OLTP systems
• Traditional and old schema
• Normalized
• Difficult to understand and navigate
• Cannot solve extract and complex problems
• Poorly modeled
DWH schema:
• Used for OLAP systems
• New generation schema
• Denormalized
• Easy to understand and navigate
• Extract and complex problems can be easily solved
• Very good model
20B) A schema is nothing but the systematic arrangement of tables.

In OLTP it will be normalized. In a data warehouse it will be denormalized.
20C) The difference depends on the context. Technically, if Oracle is used, a schema is a "user". In that context there is no difference between the schemas in OLTP and ROLAP. Although denormalized/normalized tables are given as examples above, that is not the difference.
20D) RDBMS: normalized. Data warehouse: denormalized.
20E) Difference between OLTP and OLAP:
OLTP schema:
• Normalized
• More transactions
• Less time for query execution
• More users
• Has insert, delete and update transactions
OLAP (DWH) schema:
• Denormalized
• Fewer transactions
• Fewer users
• More time for query execution
• Will not have many inserts, deletes and updates

21) What are the various ETL tools in the market?
21A) Informatica, Ascential DataStage, Ab Initio.
21B) Various ETL tools used in the market are: Informatica, DataStage, Oracle Warehouse Builder, Ab Initio, Data Junction.
21C) 1. Informatica PowerCenter 2. Ascential DataStage 3. Essbase (Hyperion) 4. Ab Initio 5. BO Data Integrator 6. SAS ETL 7. MS DTS 8. Oracle OWB 9. Pervasive Data Junction 10. Cognos DecisionStream
21D) ETL tools by different vendors: Informatica, Ascential DataStage, Ab Initio.
21E) Informatica, DataStage, MS SQL DTS (Integration Services in 2005), Ab Initio, SQL Loader, Sunopsis, Oracle Warehouse Builder, Data Junction, Data Integrator (Business Objects).
21F) Has anyone come across the ETL tool "Sunopsis"? If not, please check this URL. It is amazing: http://www.sunopsis.com

22) What is dimensional modeling?
22A) In dimensional modeling, data is stored in two kinds of tables: fact tables and dimension tables.

A fact table contains fact data, e.g. sales, revenue, profit, etc. A dimension table contains dimensional data such as product id, product name, product description, etc.
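To make 22A concrete, here is a minimal sketch of one fact table joined to one dimension table, using Python's built-in `sqlite3` module. The table names (`sales_fact`, `product_dim`), the rows, and the figures are invented for illustration; the columns follow the examples named in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive context for the measurements.
    CREATE TABLE product_dim (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        product_description TEXT
    );
    -- Fact table: measures plus a foreign key to each dimension.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim(product_id),
        sales INTEGER,
        revenue INTEGER,
        profit INTEGER
    );
    INSERT INTO product_dim VALUES
        (1, 'Widget', 'Small widget'),
        (2, 'Gadget', 'Large gadget');
    INSERT INTO sales_fact VALUES
        (1, 3, 30, 6),
        (1, 2, 20, 4),
        (2, 1, 50, 10);
""")

# Measures are summed from the fact table; the dimension supplies the labels.
revenue_by_product = conn.execute("""
    SELECT d.product_name, SUM(f.revenue), SUM(f.profit)
    FROM sales_fact AS f
    JOIN product_dim AS d ON d.product_id = f.product_id
    GROUP BY d.product_name
    ORDER BY d.product_name
""").fetchall()
```

A typical analytical query touches the fact table once and joins outward to whichever dimensions it needs for grouping or filtering.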

22B) Dimensional modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables: fact tables and dimension tables. The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e., the dimensions on which the facts are calculated.

23) Why is data modeling important?
Data modeling is probably the most labor-intensive and time-consuming part of the development process. Why bother, especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints. The goal of the data model is to make sure that all the data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end-users. The data model is also detailed enough to be used by the database developers as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.

22C) Steps in building the data model

While the ER model lists and defines the constructs required to build a data model, there is no standard process for doing so. Some methodologies, such as IDEF1X, specify a bottom-up development process where the model is built in stages. Typically, the entities and relationships are modeled first, followed by key attributes, and then the model is finished by adding non-key attributes. Other experts argue that in practice a phased approach is impractical because it requires too many meetings with the end-users. The sequence used for this document is:
1. Identifying data objects and relationships
2. Drafting the initial ER diagram with entities and relationships
3. Refining the ER diagram
4. Adding key attributes to the diagram
5. Adding non-key attributes
6. Diagramming generalization hierarchies
7. Validating the model through normalization
8. Adding business and integrity rules to the model
22D) Dimensional modeling is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions. Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.

22E) It's a process or technique for designing a database model.

22F) A central table, called the fact table, connected to multiple dimension tables is called dimensional modeling or a star schema.

22G) A systematic arrangement of fact and dimension tables is called a schema; designing a schema in a data warehouse or data mart is known as dimensional modeling.
22H) Dimensional modeling is a modeling technique used in OLAP systems. Here there is one fact table surrounded by different dimensions.

23) What is VLDB?
23A) Very large database.
23B) The perception of what constitutes a VLDB continues to grow. A one-terabyte database would normally be considered to be a VLDB.
23C) If a database is too large to back up in the available time frame, then it's a VLDB.
23D) VLDB stands for Very Large Database; any database that is too large (normally more than 1 TB) is considered a VLDB.
23E) Very Large Database (VLDB): the term is sometimes used to describe databases occupying magnetic storage in the terabyte range and containing billions of table rows. Typically, these are decision support systems or transaction processing applications serving large numbers of users.

24) What is real-time data warehousing?
24A) Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything, such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.

24B) A real-time data warehouse provides live data for DSS (it may not be 100% up to the moment; some latency will be there). The data warehouse has access to the OLTP sources, and data is loaded from the source to the target not daily or weekly, but maybe every 10 minutes, through replication, log shipping or something like that. SAP BW provides a real-time DW; with the help of the extended star schema, source data is shared.
24C) In real-time data warehousing, your warehouse contains completely up-to-date data and is synchronized with the source systems that provide the source data. In near-real-time data warehousing, there is a minimal delay between source data being generated and being available in the data warehouse. Therefore, if you want to achieve real-time or near-real-time updates to your data warehouse, you'll need to do three things:
• Reduce or eliminate the time taken to get new and changed data out of your source systems.
• Eliminate, or reduce as much as possible, the time required to cleanse, transform and load your data.
• Reduce as much as possible the time required to update your aggregates.
Starting with version 9i, and continuing with the latest 10g release, Oracle has gradually introduced features into the database to support real-time and near-real-time data warehousing. These features include:
• Change Data Capture
• External tables, table functions, pipelining, and the MERGE command
• Fast refresh materialized views

24D) Real-time data warehousing means combining heterogeneous databases for query and analysis, decision-making and reporting purposes.

25) What is a lookup table?
25A) When a table is used to check for the presence of some data prior to loading some other data, or the same data, into another table, the table is called a lookup table.
25B) When a value for a column in the target table is looked up from another table apart from the source tables, that table is called the lookup table.
25C) It is used when we want to get a related value from some other table based on a particular value. Suppose table A has two columns, emp_id and name, and table B has emp_id and address, and in the target table we want emp_id, name and address. We take table A as the source and table B as the lookup table; by matching on emp_id we get the result as three columns: emp_id, name, address.
25D) A lookup table is nothing but a 'lookup': it gives values to the referencing table (it is a reference). It is used at run time, and it saves joins and space in terms of transformations. For example, a lookup table called states provides the actual state name ('Texas') in place of TX in the output.
25E) Based on responsibility, how do we protect/secure/hide even lookup values such as meaning?
25G) A reference table can also be called a lookup table.
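The emp_id example from 25C and the states example from 25D can be sketched with plain dictionaries acting as lookup tables. The employee names and addresses are made up, and `load_target` is a hypothetical helper, not a function of any ETL tool.

```python
# Source table A: emp_id and name.
table_a = [
    {"emp_id": 1, "name": "Asha"},
    {"emp_id": 2, "name": "Ben"},
    {"emp_id": 3, "name": "Chen"},
]

# Lookup table B, keyed by emp_id, holding the address.
table_b = {1: "12 Main St", 2: "9 Oak Ave"}

# Lookup table mapping state codes to state names, as in 25D.
state_names = {"TX": "Texas"}


def load_target(source_rows, address_by_emp_id):
    """Build target rows (emp_id, name, address) by looking up each emp_id."""
    return [
        # A missing lookup entry yields None rather than dropping the row.
        {**row, "address": address_by_emp_id.get(row["emp_id"])}
        for row in source_rows
    ]


target = load_target(table_a, table_b)
state = state_names.get("TX", "TX")  # fall back to the code if not found
```

Keeping unmatched source rows with an empty address is a design choice of this sketch; a real load might reject or default them instead.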

25H) In DW terminology the dimension table is also called a lookup table (specifically in Business Objects). Since the index key in the fact table references the particular dimension table, it is also called a lookup table.

25I) The lookup table provides the detailed information about the attributes. For example, the lookup table for the quarter attribute would include a list of all the quarters available in the data warehouse, i.e. the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1".

26) What is a general purpose scheduling tool?
26A) The general purpose of a scheduling tool may be cleansing and loading data at a specific given time.
26B) The basic purpose of the scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.

27) What type of indexing mechanism do we need to use for a typical data warehouse?
27A) Bitmap index.
27B) Function index, B-tree index, partition index, hash index, etc.
27C) On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes. To my knowledge, SQL Server does not support bitmap indexes. Only Oracle supports bitmaps.
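To show why bitmap indexes suit columns with few distinct values, here is a toy sketch that stores one bitmap per distinct value as a Python integer and answers a two-column filter with a single bitwise AND. It illustrates the idea only; it is not how any particular database implements bitmap indexes, and the `region` and `channel` columns are invented.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_number, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index


def rows_of(bitmap):
    """Decode a bitmap back into the list of matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two low-cardinality columns of a hypothetical five-row fact table.
region = ["East", "West", "East", "East", "West"]
channel = ["Web", "Web", "Store", "Web", "Store"]

region_index = build_bitmap_index(region)
channel_index = build_bitmap_index(channel)

# WHERE region = 'East' AND channel = 'Web' becomes one bitwise AND.
east_and_web = rows_of(region_index["East"] & channel_index["Web"])
```

Each distinct value costs one bitmap, so the index stays small when a column has only a handful of values and grows quickly when it has many.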

27D) It generally depends on the data you have in the table. If a particular column has few distinct values, you should always build a bitmap index on it rather than another kind. On dimension tables we generally have indexes.
27E) That is based on the requirements and the size of your data mart/data warehouse. Most data warehouses use bitmap indexes.

28) Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you put your TX logs on?
28A) RAID 0: makes several physical hard drives look like one hard drive. No redundancy, but very fast. May be used for temporary spaces where loss of the files will not result in loss of committed data.
RAID 1: mirroring. Each hard drive in the drive array has a twin. Each twin has an exact copy of the other twin's data, so if one hard drive fails, the other is used to pull the data. RAID 1 is half the speed of RAID 0, and the read and write performance is good.
RAID 1/0: striped RAID 0, then mirrored RAID 1. Similar to RAID 1. Sometimes faster than RAID 1. Depends on vendor implementation.
RAID 5: great for read-only systems. Write performance is 1/3rd that of RAID 1, but read is the same as RAID 1. RAID 5 is great for DW but not good for OLTP.
Hard drives are cheap now, so I always recommend RAID 1.

29) What is data warehousing?
29A) Data warehousing is a process of creating, querying and populating a data warehouse. It includes a number of discrete technologies, like identifying sources, the process of ECCD, and ETL, which includes data cleansing, data transforming and data loading to targets.

29B) A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data to enable decision making across a disparate group of users.
29C) A data warehouse is a repository containing a subject-oriented, integrated, time-variant and non-volatile collection of data, used for companies' decision support system requirements.
29D) Data warehousing is a subject-oriented, authoritative, integrated historical database reflective of changes over meaningful time periods in order to facilitate query and analysis for useful management decision making.

29F) A data warehouse contains a collection of historic (history of data), integrated, non-volatile data, which is used for analyzing and developing forecasting reports.

30) What does the level of granularity of a fact table signify?

30A) It describes the amount of space required for a database.

30B) Level of granularity indicates the extent of aggregation that will be permitted to take place on the fact data. More granularity implies more aggregation potential and vice versa.

30C) In simple terms, level of granularity defines the extent of detail. As an example, let us look at the geographical level of granularity. We may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY and STREET. In this case, we say the highest level of granularity is STREET.

30D) Level of granularity means the upper/lower level of the hierarchy up to which we can see/drill the data in the fact table.
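The geography example in 30C can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fact rows, the `LEVELS` list and the `roll_up` helper are hypothetical names invented here (TERRITORY is omitted to keep the rows short), and a real warehouse would do this with SQL GROUP BY rather than in application code.

```python
from collections import defaultdict

# Hypothetical fact rows stored at STREET grain:
# (country, region, city, street, sales_amount)
FACT_ROWS = [
    ("US", "West", "Seattle", "Pine St", 120),
    ("US", "West", "Seattle", "Pike St", 80),
    ("US", "West", "Portland", "Oak St", 50),
    ("US", "East", "Boston", "Elm St", 70),
]

# Hierarchy levels from coarsest to finest (assumed for this sketch).
LEVELS = ["country", "region", "city", "street"]


def roll_up(rows, level):
    """Sum sales at the requested level of the geography hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        key = row[:depth]      # keep only the columns down to this level
        totals[key] += row[-1]  # last column is the measure
    return dict(totals)


if __name__ == "__main__":
    for level in LEVELS:
        print(level, roll_up(FACT_ROWS, level))
```

Because the rows are stored at the finest grain (STREET), every coarser level can be derived from them; the reverse is not possible, which is why the grain has to be chosen before the fact table is loaded.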

30E) Granularity is nothing but the level of representation of measures and metrics. The lowest level is called detailed data and the highest level is called summary data. The significance we extract from the fact table depends on the project.

31) What is data mining?

31A) Data mining is a process of extracting hidden trends within a data warehouse. For example, an insurance data warehouse can be used to mine data for the highest-risk people to insure in a certain geographical area.

31B) In its simplest definition, you can say data mining is a way to discover new meaning in data.

31C) Data mining is the concept of deriving/discovering hidden, unexpected information from existing data.

31D) Data mining is a non-trivial process of identifying valid, potentially useful and ultimately understandable patterns in data.

31E) A data warehouse typically supplies answers to a question like "Who is buying our products?". A data mining approach would seek answers to questions like "Who is NOT buying our products?"

32) What is a degenerate dimension table?

32A) The values of a dimension that are stored in the fact table are called degenerate dimensions. These dimensions do not have their own dimension tables.

32B) An attribute in the fact table that is neither a fact nor a key value.

32C) In simple terms, a column in a fact table that does not map to any dimension and is not a measure column either. For example, Invoice_no and Invoice_line_no in a fact table will be degenerate dimensions (columns), provided you don't have a dimension called invoice.

32D) Degenerate dimensions: if a table contains values that are neither dimensions nor measures, they are called degenerate dimensions. Ex: invoice id, empno.

33) How do you load the time dimension?

33A) In a data warehouse we load the time dimension manually.

33B) Every data warehouse maintains a time dimension. It would be at the most granular level at which the business runs (ex: week day, day of the month and so on). Depending on the data loads, these time dimensions are updated. A weekly process gets updated every week and a monthly process every month.

33C) The time dimension in a DWH must be loaded manually. We load data into the time dimension using PL/SQL scripts.

33D) Generally we load the time dimension using a sequential file as the source stage, together with one passive stage. In the Transformer stage we manually write Month and Year functions to load the time dimension; for the lower level, i.e. Day, there is also a function to load the time dimension.
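The answers above agree that the time dimension is generated by a script rather than extracted from a source system. A minimal sketch of such a generator in Python follows; the column names (`date_key`, `quarter`, and so on) and the `build_time_dimension` function are assumptions made for illustration, not a standard layout, and in practice the rows would be bulk-inserted into the warehouse by PL/SQL or an ETL tool as described in 33C and 33D.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end (inclusive)."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Integer surrogate key in YYYYMMDD form (an assumed convention).
            "date_key": int(current.strftime("%Y%m%d")),
            "full_date": current.isoformat(),
            "day_of_month": current.day,
            # ISO weekday number: Monday = 1 ... Sunday = 7.
            "day_of_week": current.isoweekday(),
            "month": current.month,
            "quarter": (current.month - 1) // 3 + 1,
            "year": current.year,
        })
        current += timedelta(days=1)
    return rows


if __name__ == "__main__":
    for row in build_time_dimension(date(2024, 1, 1), date(2024, 1, 3)):
        print(row)
```

Generating the day, month, quarter and year attributes once at load time means reports can group by any of them with a simple join instead of recomputing date functions in every query.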

34) What is an ER DIAGRAM?

34A) ER stands for entity relationship diagram. It is the first step in the design of a data model, which will later lead to the physical database design of possibly an OLTP or OLAP database.

34B) The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:

It maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.

It is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.

In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.

34C) An ER diagram is an entity relationship diagram that shows the entities along with their attributes.

34D) An E-R diagram (entity relationship diagram) shows how the different database tables relate to each other, what the primary and foreign keys are, and how they are related. Building the E-R diagram is the first step of any database project.

34E) The physical and logical arrangement of the database tables and their relationships is explained by a diagram; that diagram is known as an ER diagram.

34F) An ER diagram is a suitable modeling technique for OLTP systems. It contains one-to-one and many-to-many relationships.

35) Difference between Snowflake and Star Schema. What are situations where Snowflake Schema is better?

35A) Star schema and snowflake both serve the purpose of dimensional modeling when it comes to data warehouses. A star schema is a dimensional model with a fact table (large) and a set of dimension tables (small). The whole set-up is totally denormalized. However, in cases where the dimension tables are split into many tables, that is, where the schema is slightly inclined towards normalization (reducing redundancy and dependency), we get the snowflake schema. The nature/purpose of the data that is to be fed to the model is the key to your question as to which is better.

35B) A star schema contains the dimension tables mapped around one or more fact tables. It is a denormalized model. There is no need to use complicated joins. Queries return results quickly.

Snowflake schema: it is the normalized form of the star schema. It contains in-depth joins, because the tables are split into many pieces. We can easily make modifications directly in the tables. We have to use complicated joins, since we have more tables. There will be some delay in processing the query.

35C) Star schema means a centralized fact table surrounded by different dimensions. Snowflake means that, in the same star schema, dimensions are split into further dimensions. A star schema contains highly denormalized data; a snowflake contains partially normalized data. A star cannot have parent tables, but a snowflake contains parent tables.

Why go for a star schema:
1) It contains fewer joins.
2) The database is simple.
3) It supports drill-up options.

Why go for a snowflake schema:

Sometimes we need to provide separate dimensions from existing dimensions; at that time we go for a snowflake. Disadvantage of snowflake: query performance is very low because there are more joins.

35D) Star schema and snowflake both serve the purpose of dimensional modeling when it comes to data warehouses. A star schema is a dimensional model with a fact table (large) and a set of dimension tables (small). The whole set-up is totally denormalized. However, in cases where the dimension tables are split into many tables, that is, where the schema is slightly inclined towards normalization (reducing redundancy and dependency), we get the snowflake schema. The nature/purpose of the data that is to be fed to the model is the key to your question as to which is better.

36) What is a CUBE in the data warehousing concept?

36A) Cubes are a logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

36B) A cube is a logical schema which contains facts and dimensions.

36C) Cubes are multi-dimensional views of a DW or data marts. A cube is designed in a logical way to drill and slice-n-dice. Every part of the cube is a logical representation of a combination of fact and dimension attributes.

37) What is ODS?

37A) ODS stands for Online Data Storage. It is used to maintain and store the current, up-to-date information and the transactions regarding the source databases taken from the OLTP system.

It is directly connected to the source database systems instead of to the staging area. It is further connected to the data warehouse and moreover can be treated as a part of the data warehouse database.

Edit by Admin: ODS stands for Operational Data Store, not Online Data Storage.

37B) ODS stands for Operational Data Store. It is the final integration point in the ETL process before loading the data into the Data Warehouse.

37C) ODS stands for Operational Data Store. It contains near real time data. In a typical data warehouse architecture, the ODS is sometimes used for analytical reporting as well as being a source for the data warehouse.

37D) Operational Data Services is a hybrid structure that has some aspects of a data warehouse and other aspects of an operational system. It contains integrated data. It can support DSS processing. It can also support high transaction processing. It is placed between the warehouse and the Web to support web users.

37E) The form that a data warehouse takes in the operational environment. Operational data stores can be updated, do provide rapid constant time, and contain only a limited amount of historical data.

37F) An Operational Data Store presents a consistent picture of the current data stored and managed by a transaction processing system. As data is modified in the source system, a copy of the changed data is moved into the ODS. Existing data in the ODS is updated to reflect the current status of the source system.

37G) ODS means Operational Data Store. It is used to store current data coming from transactional web applications, SAP, and MQ Series. Current data means the data for a particular date range. An ODS contains 30-90 days of data.

37H) An Operational Data Store is a collection of data in support of an organization's need for up-to-date, operational, integrated, collective information. An ODS is a purely operational construct to address the operational needs of a corporation. While loading data from staging to the ODS we carry out data scrubbing and data validation.

38) What are conformed dimensions?

38A) They are dimension tables in a star schema data mart that adhere to a common structure, and therefore allow queries to be executed across star schemas. For example, the Calendar dimension is commonly needed in most data marts. By making this Calendar dimension adhere to a single structure, regardless of which data mart in your organization it is used in, you can query by date/time from one data mart to another.

38B) Conformed dimensions are dimensions which are common to the cubes. (Cubes are the schemas that contain fact and dimension tables.) Consider Cube-1, which contains F1, D1, D2, D3, and Cube-2, which contains F2, D1, D2, D4, where F denotes a fact and D a dimension.

Here D1 and D2 are the conformed dimensions.
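The Cube-1/Cube-2 example can be checked with a small set intersection. This is a simplified sketch: the `CUBES` dictionary and the `conformed_dimensions` helper are hypothetical, and the code only compares dimension names, whereas a truly conformed dimension must also share the same structure and values across the schemas (as answer 38A points out).

```python
# Hypothetical star schemas ("cubes"): each has one fact table and a set of
# dimension tables, named as in the example above.
CUBES = {
    "Cube-1": {"fact": "F1", "dimensions": {"D1", "D2", "D3"}},
    "Cube-2": {"fact": "F2", "dimensions": {"D1", "D2", "D4"}},
}


def conformed_dimensions(cubes):
    """Return the dimensions shared by every cube in the mapping."""
    dimension_sets = [cube["dimensions"] for cube in cubes.values()]
    if not dimension_sets:
        return set()
    return set.intersection(*dimension_sets)


if __name__ == "__main__":
    print(sorted(conformed_dimensions(CUBES)))
```

Only the shared dimensions can be used to compare or combine facts from F1 and F2 in one report; D3 and D4 are private to their own cube.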

38C) If a table is used as a dimension table for more than one fact table, then the dimension table is called a conformed dimension.

38D) Conformed dimensions are the dimensions which can be used in multiple star schemas. Correct me if I am wrong.

38E) Conformed dimensions are those that share one or more attributes whose values are drawn from the same domains.

38F) A dimension which is used by more than one fact table is called a conformed dimension.

38G) A conformed dimension is a single, coherent view of the same piece of data throughout the organization. The same dimension is used in all subsequent star schemas defined. This enables reporting across the complete data warehouse in a simple format.

38H) Conformed dimensions are the dimensions which are common to two cubes. Say CUBE-1 contains F1, D1, D2, D3 and CUBE-2 contains F2, D1, D2, D4 as the facts and dimensions; here D1 and D2 are the conformed dimensions.

38I) If the dimension is 100% sharable across the star schemas, then this dimension is called a conformed dimension.

39) What are SCD1, SCD2, and SCD3?

39A) SCD 1: Complete overwrite.
SCD 2: Preserve all history. Add a row.
SCD 3: Preserve some history. Add an additional column for old/new.

39B) SCD Type 1: the attribute value is overwritten with the new value, obliterating the historical attribute values. For example, when the product roll-up changes for a given product, the roll-up attribute is merely updated with the current value.

SCD Type 2: a new record with the new attributes is added to the dimension table. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, the fact table rows will reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.

SCD Type 3: attributes are added to the dimension table to support two simultaneous roll-ups, perhaps the current product roll-up as well as "current version minus one", or current version and original.
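The three behaviours described in 39B can be sketched on an in-memory dimension table. Everything named below (the `PRODUCT_DIM` rows, the column names such as `surrogate_key` and `is_current`, and the three functions) is an assumption made for illustration; real implementations normally live in the ETL tool or in SQL MERGE logic.

```python
from copy import deepcopy
from datetime import date

# Hypothetical product dimension with one current row.
PRODUCT_DIM = [
    {"surrogate_key": 1, "product_id": "P100", "category": "Snacks",
     "start_date": date(2020, 1, 1), "end_date": None, "is_current": True},
]


def scd_type1(dim_rows, product_id, attr, new_value):
    """Type 1: overwrite the attribute in place; history is lost."""
    rows = deepcopy(dim_rows)
    for row in rows:
        if row["product_id"] == product_id:
            row[attr] = new_value
    return rows


def scd_type2(dim_rows, product_id, attr, new_value, change_date):
    """Type 2: expire the current row and add a row with a new surrogate key."""
    rows = deepcopy(dim_rows)
    next_key = max(r["surrogate_key"] for r in rows) + 1
    for row in rows:
        if row["product_id"] == product_id and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
            new_row = dict(row, surrogate_key=next_key, is_current=True,
                           start_date=change_date, end_date=None)
            new_row[attr] = new_value
            rows.append(new_row)
            break  # stop iterating once the new version has been added
    return rows


def scd_type3(dim_rows, product_id, attr, new_value):
    """Type 3: keep the prior value in an extra 'previous_<attr>' column."""
    rows = deepcopy(dim_rows)
    for row in rows:
        if row["product_id"] == product_id:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value
    return rows


if __name__ == "__main__":
    print(scd_type1(PRODUCT_DIM, "P100", "category", "Health Food"))
    print(scd_type2(PRODUCT_DIM, "P100", "category", "Health Food", date(2024, 6, 1)))
    print(scd_type3(PRODUCT_DIM, "P100", "category", "Health Food"))
```

In the Type 2 result, old fact rows would keep pointing at surrogate key 1 and new fact rows at key 2, which is the partitioning of history that 39B describes.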

39C) SCD: dimensions whose values change very rarely are called Slowly Changing Dimensions. There are mainly 3 types:
1) SCD1: Replace the old values; overwrite them with the new values.
2) SCD2: Just create additional records.
3) SCD3: Maintain just the previous and the most recent value.

Within SCD2 there are again 3 variants:
1) Versioning
2) Flag value
3) Effective date range

Versioning: here the updated dimensions are inserted into the target along with a version number. The new dimensions are inserted into the target along with a primary key.

Flag value: the updated dimensions are inserted into the target with the flag 0 and the new dimensions are inserted into the target with the flag 1.

40) What is Normalization, First Normal Form, Second Normal Form, and Third Normal Form?

40A) Normalization: the process of decomposing tables to eliminate data redundancy is called normalization.
1NF: The table should contain scalar or atomic values.
2NF: The table should be in 1NF + no partial functional dependencies.
3NF: The table should be in 2NF + no transitive dependencies.

40B) 2NF - the table should be in 1NF + a non-key attribute should not depend on a subset of the key ({part, supplier}, sup address).
3NF - the table should be in 2NF + a non-key attribute should not depend on another non-key attribute ({part}, warehouse name, warehouse addr).
(Braces {} mark the primary key.)
More... 4NF, 5NF - for multi-valued dependencies (essentially to describe many-to-many relations).

40C) Normalization can be defined as segregating a table into two different tables, so as to avoid duplication of values. Normalization is a step-by-step process of removing redundancies and dependencies of attributes in a data structure. The condition of the data at completion of each step is described as a "normal form". Needs for normalization: it improves database design.

It ensures minimum redundancy of data. It reduces the need to reorganize data when the design is modified or enhanced. It removes anomalies for database activities.

First normal form:
- A table is in first normal form when it contains no repeating groups.
- The repeating columns or fields in an unnormalized table are removed from the table and put into tables of their own.
- Such a table becomes dependent on the parent table from which it is derived.
- The key to this table is called a concatenated key, with the key of the parent table forming a part of it.

Second normal form:
- A table is in second normal form if all its non-key fields are fully dependent on the whole key.
- This means that each field in a table must depend on the entire key.
- Those that do not depend upon the combination key are moved to another table on whose key they depend.
- Structures which do not contain combination keys are automatically in second normal form.

Third normal form:
- A table is said to be in third normal form if all the non-key fields of the table are independent of all other non-key fields of the same table.

40D) Normalization is a process of removing redundancy and inconsistency. There are mainly 3 normal forms:
1st normal form: contains only atomic values.
2nd normal form: the non-key values must depend on the primary key.
3rd normal form: no transitive dependencies.
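The rules for second and third normal form can be illustrated by decomposing a small flat table. The sketch below is hypothetical throughout: the `ORDER_LINES` data, its column names and the `normalize` function are invented for this example. It assumes the key of the flat table is (order_id, product_id), that product_name depends only on product_id and customer_id only on order_id (partial dependencies), and that customer_city depends on customer_id (a transitive dependency).

```python
# Hypothetical flat order-line table; key is (order_id, product_id).
ORDER_LINES = [
    {"order_id": 1, "product_id": "P1", "product_name": "Pen",
     "customer_id": "C1", "customer_city": "Pune", "qty": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Mug",
     "customer_id": "C1", "customer_city": "Pune", "qty": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Pen",
     "customer_id": "C2", "customer_city": "Delhi", "qty": 5},
]


def normalize(order_lines):
    """Split the flat table so every non-key field depends on its whole key."""
    products = {}     # product_id -> product_name   (removes partial dependency)
    orders = {}       # order_id -> customer_id      (removes partial dependency)
    customers = {}    # customer_id -> customer_city (removes transitive dependency)
    order_items = []  # (order_id, product_id, qty): qty depends on the full key
    for line in order_lines:
        products[line["product_id"]] = line["product_name"]
        orders[line["order_id"]] = line["customer_id"]
        customers[line["customer_id"]] = line["customer_city"]
        order_items.append((line["order_id"], line["product_id"], line["qty"]))
    return products, orders, customers, order_items


if __name__ == "__main__":
    for table in normalize(ORDER_LINES):
        print(table)
```

After the split, the name "Pen" and the city "Pune" are each stored once, so changing either requires updating a single row instead of every order line that mentions it.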

40E) Normalization: it is the process of efficiently organizing data in a database. There are 2 goals of the normalization process:
1. Eliminate redundant data.
2. Ensure data dependencies make sense (only storing related data in a table).

First Normal Form: it sets the very basic rules for an organized database.
1. Eliminate duplicate columns from the same table.
2. Create separate tables for each group of related data and identify each row with a unique column or set of columns.

Second Normal Form: further addresses the concept of removing duplicative data.
1. Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
2. Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form:
1. Remove columns that are not dependent upon the primary key.

Fourth Normal Form:
1. A relation is in 4NF if it has no multi-valued dependencies.

These normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.

41) What is ETL?

41A) ETL is extraction, transformation and loading. ETL technology is used for extracting information from the source database and loading it into the target database with the necessary transformations done in between.

41B) ETL is short for Extract, Transform and Load. It is a data integration function that involves extracting the data from outside sources, transforming it to fit business needs and ultimately loading it into a data warehouse.

41C) ETL is an abbreviation for "Extract, Transform and Load". This is the process of extracting data from operational or external data sources, transforming the data (which includes cleansing, aggregation, summarization and integration, as well as basic transformation) and loading the data into some form of the data warehouse.

41D) Extraction, Transformation, Loading.

41E) E: Extraction of data from the homogeneous/heterogeneous sources.

T: Transforming/modifying the source data by applying transformations like Filter, Expression, Router, Joiner, Union or Lookup.
L: Loading the transformed data into the corresponding target tables.

42) What are non-additive facts?

42A) Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. Example: temperature, bill number, etc.

42B) A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.

42C) If the columns of a fact table cannot be aggregated, they are called non-additive facts.

42D) Non-additive: non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

43) How are the Dimension tables designed?

43A) Most dimension tables are designed using normalization principles up to 2NF. In some instances they are further normalized to 3NF.

43B) Find where the data for this dimension is located. Figure out how to extract this data. Determine how to maintain changes to this dimension (see more on this in the next section). Change the fact table and DW population routines.

44) Why should you put your data warehouse on a different system than your OLTP system?

44A) OLTP stands for on-line transaction processing. These systems are used to store only daily transactions, as changes have to be made in as few places as possible. An OLTP system does not hold the historical data of the organization. The data warehouse contains the historical information about the organization.

44B) A data warehouse is a part of OLAP (On-Line Analytical Processing). It is the source from which BI tools fetch data for analytical, reporting or data mining purposes. It generally contains the data for the whole life cycle of the company/product. A DWH contains historical, integrated, denormalized, subject-oriented data. On the other hand, the OLTP system contains data that is generally limited to the last couple of months or a year at most. The nature of data in OLTP is: current, volatile and highly normalized. Since both systems are different in nature and functionality, we should always keep them on different systems.

44C) A DW is typically used most often for intensive querying. Since the primary responsibility of an OLTP system is to faithfully record ongoing transactions (inserts/updates/deletes), these operations would be considerably slowed down by the heavy querying that the DW is subjected to.

45) What is a Fact Table?

45A) A table in a data warehouse whose entries describe data in a fact table. Dimension tables contain the data from which dimensions are created.

45B) A fact table in a data warehouse describes the transaction data. It contains characteristics and key figures.

45C) A fact table is a collection of facts and foreign key relations to the dimensions.

45D) A fact table contains the measurements or metrics or facts of a business process. If your business process is "Sales", then a measurement of this business process such as "monthly sales number" is captured in the fact table. The fact table also contains the foreign keys for the dimension tables.

45E) A fact table contains the transaction data, which has fewer columns and more rows. Among the data it also includes the foreign keys of the dimension tables which are attached to it.

45F) A fact table contains the keys (primary key, foreign key) of the related dimension tables and the measures which are based on those keys.

45G) A fact table will have numeric columns, or values of the columns in the dimension tables.

45H) A fact table represents the measurements as well as the foreign keys of the dimension tables. If I am wrong, please inform me.

46) What are semi-additive and factless facts, and in which scenario will you use such kinds of fact table?

46A) Semi-additive: semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example: Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add it up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up all current balances for a given account for each day of the month does not give us any useful information).

46B) A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn't sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university

47) What is the level of granularity of a fact table?

47A) Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to put the sales data in at transaction level. Level of granularity then means how much detail you are willing to record for each transactional fact: for instance, every individual product sale, or product sales aggregated up to the minute.

47B) It also means that we can have (for example) data aggregated for a year for a given product, and the data can also be drilled down to a monthly, weekly and daily basis. The lowest level is known as the grain. Going down to the details is granularity. The aggregation or calculated value columns go to the fact table and the detail information goes to the dimension table.

48) Which columns go to the fact table and which columns go to the dimension table?

48A) To add on: foreign key elements along with business measures are stored in the fact table. Business measures include sales in $ amount and units (qty sold); a date may also be a business measure in some cases. It also depends on the granularity at which the data is stored.

48B) Values before being broken down into columns go to the fact table; after being broken down they go to the dimensions.
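The split between fact and dimension columns discussed in questions 45 to 48 can be shown with a tiny star schema built in an in-memory SQLite database. The table and column names (`fact_sales`, `dim_product`, `dim_date`, and so on) and the sample rows are hypothetical; the sketch only assumes Python's standard `sqlite3` module.

```python
import sqlite3


def build_star_schema():
    """Create a small in-memory star schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: a key plus descriptive (textual) attributes.
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        CREATE TABLE dim_date (
            date_key  INTEGER PRIMARY KEY,
            full_date TEXT,
            month     INTEGER,
            year      INTEGER
        );
        -- Fact table: foreign keys to the dimensions plus numeric measures.
        CREATE TABLE fact_sales (
            product_key  INTEGER REFERENCES dim_product(product_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            units_sold   INTEGER,
            sales_amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery');
        INSERT INTO dim_product VALUES (2, 'Mug', 'Kitchen');
        INSERT INTO dim_date VALUES (20240101, '2024-01-01', 1, 2024);
        INSERT INTO dim_date VALUES (20240201, '2024-02-01', 2, 2024);
        INSERT INTO fact_sales VALUES (1, 20240101, 10, 15.0);
        INSERT INTO fact_sales VALUES (1, 20240201, 4, 6.0);
        INSERT INTO fact_sales VALUES (2, 20240101, 1, 8.0);
    """)
    return conn


def sales_by_category(conn):
    """Aggregate the additive measures along the product dimension."""
    query = """
        SELECT p.category, SUM(f.units_sold), SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(sales_by_category(build_star_schema()))
```

The measures (`units_sold`, `sales_amount`) are additive, so they can be summed along any dimension; a semi-additive measure such as an account balance would need a different aggregate (for example, the latest value) along the date dimension.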

49) What are the different methods of loading Dimension Tables?

49A) There are two types: insert, if the row is not yet in the dimension, and update, if it already exists.

49B) Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints, and the bad data won't be indexed.

49C) The conventional and direct load methods are applicable only to Oracle. The naming convention is not a general one applicable to other RDBMSs like DB2 or SQL Server.

50) What are Aggregate Tables?

50A) Aggregate tables contain redundant data that is summarized from other data in the warehouse.

50B) These are the tables which contain aggregated/summarized data, e.g. yearly or monthly sales information. These tables are used to reduce the query execution time.

50C) An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and use that instead. This reduces the load on the database server, increases the performance of the query and returns the result very quickly.

51) What is a dimension table?

51A) A dimension table in a data warehouse is one which contains a primary key and attributes. We call the primary keys DIMIDs (dimension IDs).

51B) A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes.

51C) Dimension tables are nothing but master tables, through which you can extract the actual transactions. A dimension table contains more columns and fewer rows.

51D) A dimension table is a table which contains the business dimensions through which we analyze the business metrics.

52) What are the various Reporting tools in the Market?

52A) Cognos, Business Objects, MicroStrategy, Actuate.

52B)
1. MS-Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, Power Play)
4. MicroStrategy
5. MS Reporting Services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. ProClarity

52C) INEA, MS-Excel, Business Objects (Crystal Reports), Cognos (Impromptu, Power Play), MicroStrategy, MS Reporting Services, Informatica Power Analyzer, Actuate, Hyperion (BRIO), Oracle Express OLAP, ProClarity, SAS.

52D) Reporting tools are entirely different from OLAP tools.

OLAP tools are:
1. Cognos
2. Business Objects
3. SAS
4. Microsoft Source Analyzer
5. MSTR
6. Hyperion (BRIO)

53) What is the difference between OLTP and OLAP?

53A)
OLTP:
- Current data
- Short database transactions
- Online update/insert/delete
- Normalization is promoted
- High volume transactions
- Transaction recovery is necessary

OLAP:
- Current and historical data
- Long database transactions
- Batch update/insert/delete
- Denormalization is promoted
- Low volume transactions
- Transaction recovery is not necessary

53B) OLTP is nothing but Online Transaction Processing. It contains normalized tables and online data with frequent inserts/updates/deletes.

But OLAP (Online Analytical Processing) contains the history of the OLTP data, which is non-volatile, acts as a Decision Support System and is used for creating forecasting reports.

53C) OLTP: FEW; OLAP: MANY. JOINS: OLTP: MANY; OLAP: FEW.

54) What is a Star Schema?

54A) A relational database schema organized around a central table (fact table) joined to a few smaller tables (dimension tables) using foreign key references. The fact table contains raw numeric items that represent relevant business facts (price, discount values, number of units sold, dollar value, etc.).

54B) A star schema is a way of organizing the tables such that we can retrieve results from the database easily and quickly in the warehouse environment. Usually a star schema consists of one or more dimension tables around a fact table, which looks like a star; that is how it got its name.

54C) It's a way of organizing the entities such that you can retrieve results from the database easily and very quickly. Usually a star schema will have one or more dimension tables linked around a fact table, so it looks like a star. Hence the name.

54D) A single fact table with 'N' number of dimension tables.

55) Why are OLTP database designs not generally a good idea for a Data Warehouse?

55A) OLTP cannot store historical information about the organization. It is used for storing the details of daily transactions, while a data warehouse is a huge store of historical information obtained from different data marts for making intelligent decisions about the organization.

56) Differences between star and snowflake schemas

56A) The star schema is created when all the dimension tables directly link to the fact table. Since the graphical representation resembles a star, it is called a star schema. It must be noted that the foreign keys in the fact table link to the primary key of the dimension table. This sample provides the star schema for a sales_fact for the year 1998. The dimensions created are Store, Customer, Product_class and time_by_day. The Product table links to the product_class table through the primary key and indirectly to the fact table. The fact table contains foreign keys that link to the dimension tables.

56B) The snowflake schema is a schema in which the fact table is indirectly linked to a number of dimension tables. The dimension tables are normalized to remove redundant data and partitioned into a number of dimension tables for ease of maintenance. An example of the snowflake schema is the splitting of the Product dimension into the product_category dimension and the product_manufacturer dimension.

Read more on this here: http://www.exforsys.com/content/view/1301/332/
This tutorial covers Designing the Dimensional Model, Dimensional Model schemas like Star Schema and Snowflake Schema, Optimizing star schema and Design of the Relational Database, OLAP Cubes and Data mining tools, Security considerations, metadata, and backup and recovery plans.

56C) A star schema uses denormalized dimension tables, but a snowflake schema uses normalized dimensions to avoid redundancy.

56D) Star schema:

A single fact table with N number of dimensions.
Snowflake schema: any dimensions with extended dimensions are known as a snowflake schema.

56E)
Star schema                              Snowflake schema
-----------                              ----------------
Denormalized                             Normalized
Easy to use and understand               End users will get confused
Needs a little effort for maintenance    Easy to maintain
Fast execution of queries                More time for execution because of more joins

56F) Star schema uses denormalized dimension tables, but the snowflake schema uses normalized dimensions to avoid redundancy.

57) What is a Snowflake Schema?

57A) Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.

57B) A normalized form of the star schema is called a snowflake schema.

57C) The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance effort needed due to the increased number of lookup tables.
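The trade-off described in 57A can be sketched with an in-memory SQLite database. This is only an illustration under assumed names: the tables dim_product_star, dim_product_snow, product_category and sales_fact, and all the rows, are made up for the example. It loads the same product data once as a single denormalized dimension and once as a normalized pair of tables, then runs the same report against each.

```python
import sqlite3

# All table names and rows below are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized product dimension (category name repeated per row).
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
-- Snowflake: the same dimension normalized into two tables.
CREATE TABLE product_category (
    category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY, product_name TEXT,
    category_id INTEGER REFERENCES product_category(category_id));
-- One fact table shared by both designs.
CREATE TABLE sales_fact (product_id INTEGER, amount REAL);

INSERT INTO product_category VALUES (1, 'Dairy'), (2, 'Bakery');
INSERT INTO dim_product_snow VALUES (10, 'Milk', 1), (11, 'Cheese', 1), (12, 'Bread', 2);
INSERT INTO dim_product_star VALUES
    (10, 'Milk', 'Dairy'), (11, 'Cheese', 'Dairy'), (12, 'Bread', 'Bakery');
INSERT INTO sales_fact VALUES (10, 5.0), (11, 7.5), (12, 2.0), (10, 5.0);
""")

# Star schema: the fact table joins directly to the dimension (one join).
star_sql = """
SELECT d.category_name, SUM(f.amount)
FROM sales_fact f
JOIN dim_product_star d ON d.product_id = f.product_id
GROUP BY d.category_name ORDER BY d.category_name
"""

# Snowflake schema: the category is reached indirectly (two joins).
snow_sql = """
SELECT c.category_name, SUM(f.amount)
FROM sales_fact f
JOIN dim_product_snow p ON p.product_id = f.product_id
JOIN product_category c ON c.category_id = p.category_id
GROUP BY c.category_name ORDER BY c.category_name
"""

star_rows = conn.execute(star_sql).fetchall()
snow_rows = conn.execute(snow_sql).fetchall()
print(star_rows)
print(snow_rows)
```

Both queries return the same totals per category; the snowflake version stores each category name only once but needs the extra join to reach it.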

57D) Some people consider it a normalized star schema, but it is a partially normalized star schema. By partially normalizing it we may save some disk space.

57E) Star schema: a single fact table with N number of dimensions. Snowflake schema: any dimensions with extended dimensions are known as a snowflake schema. Multiple star (galaxy): if the schema has more than one fact table, then the schema is said to be a multiple star.

58) What are the modeling tools available in the market?

58A)
Modeling Tool      Vendor
-------------      ------
Erwin              Computer Associates
ER/Studio          Embarcadero
Power Designer     Sybase
Oracle Designer    Oracle

58B) These tools are used for data/dimension modeling:
Oracle Designer
Erwin (Entity Relationship for Windows)
Informatica (Cubes/Dimensions)
Embarcadero
Power Designer Sybase

59) What are slowly changing dimensions?

59A) Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time; people change their names for some reason; country and state names may change over time. These are a few examples of Slowly Changing Dimensions, since some changes happen to them over a period of time.

59B) If the data in the dimension table happens to change very rarely, then it is called a slowly changing dimension.

60) What are Data Marts?

60A) A data mart is a small subset of the data warehouse. It contains business division and department level data.

60B) A data mart is a focused subset of a data warehouse that deals with a single area of data (such as one department) and is organized for quick analysis.

60C) Data Marts: a subset of data warehouse data used for a specific business function, whose format may be a star schema, hypercube or statistical sample.

60D) A data mart is a subset of the data warehouse that analyzes the data for one particular department and from one particular point of view.

60E) Data Mart: a data mart is a small data warehouse. In general, a data warehouse is divided into small units according to the business requirements. For example, the data warehouse of an organization may be divided into the following individual data marts. Data marts are used to improve performance during the retrieval of data. E.g.: Data Mart of Sales, Data Mart of Finance, Data Mart of Marketing, Data Mart of HR, etc.
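Returning to the product price example in 59A: when such an attribute changes, the dimension row can either be overwritten or kept as history. The sketch below assumes the history-keeping approach, in which the old row is expired and a new version is appended with a version number, a current flag and an effective date range. The function apply_scd2 and all column names are hypothetical, chosen only for illustration.

```python
from datetime import date

def apply_scd2(rows, key, new_attrs, today):
    """Expire the current row for `key` (if changed) and append a new version.

    `rows` is a list of dicts standing in for a dimension table; the column
    names used here are assumptions made for this sketch.
    """
    current = next((r for r in rows if r["key"] == key and r["is_current"]), None)
    if current is not None:
        if all(current[k] == v for k, v in new_attrs.items()):
            return rows  # nothing changed, keep the current version
        # Close the old version instead of overwriting it.
        current["is_current"] = False
        current["valid_to"] = today
        version = current["version"] + 1
    else:
        version = 1
    rows.append({
        "key": key, **new_attrs,
        "version": version,      # version number
        "is_current": True,      # flag value
        "valid_from": today,     # effective date range
        "valid_to": None,
    })
    return rows

dim_product = []
apply_scd2(dim_product, "P1", {"price": 10.0}, date(2020, 1, 1))
apply_scd2(dim_product, "P1", {"price": 12.0}, date(2021, 6, 1))
for row in dim_product:
    print(row)
```

Overwriting instead would mean replacing the price in the existing row, which keeps the table small but loses the earlier value.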

61) What is the difference between Data warehousing and Business Intelligence?

61A) Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term "business intelligence" is used to encompass OLAP, data visualization, data mining and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business including the back office. The business needs the back office on which to function, but the back office without a business to support makes no sense.

61B) A data warehouse is a relational database designed for analysis rather than transaction processing. A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision-making process. Business intelligence is a collection of data warehousing, data marts and knowledge.

62) What is a snapshot?

62A) You can disconnect the report from the catalog to which it is attached by saving the report with a snapshot of the data. However, you must reconnect to the catalog if you want to refresh the data.

63) Are OLAP databases called decision support systems?

63A) True

64) What is active data warehousing?

64A) An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of those customer touches, which encourages customer loyalty and thus secures an organization's bottom line. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.

64B) An active data warehouse means every user can access the database at any time, 24/7. That is called an active DWH.

64C) Active transformation means the data can change as it passes through.

65) Why is denormalization promoted in Universe Design...

65A) In a relational data model, for normalization purposes, some lookup tables are not merged into a single table. In dimensional data modeling (star schema), these tables would be merged into a single table called a DIMENSION table, for performance and for slicing data. Because the tables are merged into one large dimension table, complex intermediate joins are avoided. Dimension tables are directly joined to fact tables. Though redundancy of data occurs in the DIMENSION table, the size of the DIMENSION table is only 15% of the FACT table. That is why denormalization is promoted in Universe Designing.

65B) In a relational data model, for normalization purposes, some lookup tables are not merged into a single table. In dimensional data modeling (star schema), these tables would be merged into a single table called a DIMENSION table, for performance and for slicing data. Because the tables are merged into one large dimension table, additional intermediate joins are avoided. Dimension tables are directly joined to fact tables. Though redundancy of data occurs in the DIMENSION table, the size of the DIMENSION table is only 15% of the FACT table. That is why denormalization is promoted in Universe Designing.
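The merging of lookup tables into one dimension table, as described in 65A and 65B, can be sketched in plain Python. Everything here is an assumption made for illustration: the function build_dimension, the lookup dictionaries and the sample rows are hypothetical and do not come from any particular tool.

```python
def build_dimension(products, categories, manufacturers):
    """Flatten normalized lookup tables into one denormalized dimension.

    `categories` and `manufacturers` map an id to a name; each product row
    refers to them by id. The result repeats the names on every row.
    """
    dimension = []
    for product in products:
        dimension.append({
            "product_id": product["product_id"],
            "product_name": product["name"],
            # Lookup values are copied in, so no join is needed at query time.
            "category_name": categories[product["category_id"]],
            "manufacturer_name": manufacturers[product["manufacturer_id"]],
        })
    return dimension

# Hypothetical normalized source data.
categories = {1: "Dairy", 2: "Bakery"}
manufacturers = {100: "Acme Foods"}
products = [
    {"product_id": 10, "name": "Milk", "category_id": 1, "manufacturer_id": 100},
    {"product_id": 11, "name": "Cheese", "category_id": 1, "manufacturer_id": 100},
    {"product_id": 12, "name": "Bread", "category_id": 2, "manufacturer_id": 100},
]

dim_product = build_dimension(products, categories, manufacturers)
for row in dim_product:
    print(row)
```

The category and manufacturer names now appear on every product row. That is the redundancy the answers mention, accepted in exchange for a direct join from the fact table to a single dimension table.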

66) Explain in detail about type 1, type...

66A) Type-1: Most recent value. Type-2 (full history): i) Version number ii) Flag iii) Date. Type-3: Current and one previous value.

66B) SCD Type 1: the data is overwritten. Type 2: current, recent and history data should be there. Type 3: current and recent data should be there.

66C) SCD Type 1: the data is overwritten. Type 2: current, recent and history data should be there. Type 3: current and recent data should be there.

66D) SCD means the data in the dimension happens to change very rarely. There are mainly 3 types of SCD:
1) SCD-1: here the previous data is overwritten by the current data, which means only the current data is maintained.
2) SCD-2: here the additional records are simply added. SCD-2 has 3 types: 1) versioning 2) flag value 3) effective date range.
Versioning: the updated records are sent to the target along with a version number. New records are sent to the target along with the primary key.
Flag value: the updated records are sent to the target with 0 and the recent records are sent to the target with 1.
Effective date range: tracks both the previous and the current data.
3) SCD-3: here just the previous and the current data are maintained.

67) What are non-additive facts in detail?

67A) A fact may be a measure, a metric or a dollar value. Measures and metrics are non-additive facts. A dollar value is an additive fact: if we want to find the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount. For a non-additive fact, e.g. measured height(s) for 'citizens by geographical location', when we roll up 'city' data to 'state' level data we should not add the heights of the citizens; rather, we may want to use them to derive a 'count'.

67B) Types of facts. There are three types of facts:
Additive: additive facts are facts that can be summed up through all of the dimensions in the fact table.
Semi-additive: semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Non-additive: non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

67C) Factless fact - similar to non-additive facts ... it can be counted but cannot be measured directly ...

67D) Non-additive: non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
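The roll-up example in 67A can be sketched as follows. The sample rows and the helper names rollup_sales and rollup_heights are hypothetical. The sketch sums the dollar amounts when moving from city to state level, while the heights are only counted and averaged, never summed.

```python
from collections import defaultdict

# Hypothetical city-level rows: (state, city, value).
sales = [("CA", "Los Angeles", 100.0), ("CA", "San Francisco", 50.0), ("NY", "New York", 70.0)]
heights_cm = [("CA", "Los Angeles", 170.0), ("CA", "San Francisco", 180.0), ("NY", "New York", 160.0)]

def rollup_sales(rows):
    """Additive fact: dollar amounts can simply be summed per state."""
    totals = defaultdict(float)
    for state, _city, amount in rows:
        totals[state] += amount
    return dict(totals)

def rollup_heights(rows):
    """Non-additive fact: summing heights is meaningless, so derive a
    count and an average per state instead."""
    grouped = defaultdict(list)
    for state, _city, height in rows:
        grouped[state].append(height)
    return {
        state: {"count": len(values), "avg_height_cm": sum(values) / len(values)}
        for state, values in grouped.items()
    }

state_sales = rollup_sales(sales)
state_heights = rollup_heights(heights_cm)
print(state_sales)
print(state_heights)
```

The internal sum in rollup_heights is used only to compute the average; the reported state-level values are the count and the mean, which remain meaningful after the roll-up.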
