DataStage Ques

1) DataStage architecture: the client components are DS Designer, DS Administrator, DS Manager and DS Director.
2) How do you create a project?
ans>> Through DataStage Administrator. You can also set some properties there at project level.
3) How many projects can you create at most?
ans>> It depends on licensing.
4) How do you create users and give them permissions?
5) What permissions are available in Administrator?
6) Is it possible for an operator to view the full log information?
7) Tell me the types of jobs (active or passive, also ODBC and plug-ins).
ans>> Server jobs, parallel jobs, mainframe jobs and job sequences.
8) How do you look up through a seq file?
ans>> Not possible directly. In a server job the seq file is normally loaded into a hash file first and the lookup is done against the hash file.
9) What is a stage variable?
ans>> An intermediate processing variable that retains its value while rows are read and does not pass the value into a target column.
10) What does a constraint do?
ans>> A constraint is a condition that evaluates to either true or false. It determines which rows flow down an output link.
11) What does a derivation do?
ans>> A derivation is an expression that specifies the value to be passed on to the target column.
12) Tell me the sequence of execution (stage variable, constraint, derivation).
ans>> Stage variables first, then constraints, then derivations.
13) Why do you use a hash file?
ans>> Hash files are used in server jobs only. They are used for lookups and to remove duplicates.
14) Difference between hash file and seq file
ans>>
15) Name some types of seq file.
16) What is the size of your hash file?
ans>> Hash files are either 32-bit or 64-bit. A 32-bit hash file is limited to 2 GB.
17) How do we calculate the size of our hash files?
18) What is a hash algorithm?
19) How many types of hash file are available?
ans>> Static hash file and dynamic hash file.
20) Which type of hash file do you use, and why?
ans>> Dynamic hash file.
21) How do you create a hash file?
ans>> Through Designer, using the Hash File stage.
22) How do you specify the hash file?
23) Is it possible to view the records in a hash file through any editor? If yes, which editor?
24) What is the extension of a hash file?
25) Is it possible to create a hash file that contains all the columns of a normal seq file (without key columns)?
ans>> No.
26) Difference between static and dynamic hash file
ans>> A dynamic hash file resizes itself as the data grows, whereas a static hash file does not grow beyond the size it was created with. In both cases a data file and an overflow file are used. In a static hash file the data file is of fixed size, while in a dynamic hash file the data file grows dynamically.
27) Tell me the different types of stages.
ans>> Basically there are two types of stages in DataStage: active stages and passive stages. In an active stage some kind of processing is done, such as sorting, filtering or aggregation. Passive stages are those in which no processing is done, such as seq file and ODBC.
29) Is it possible to check a constraint at active stages? If yes, how?
ans>> Yes, through the Transformer stage.
30) Where do you define the constraint?
ans>> On the Transformer output link, where you get stage variables, constraints and derivations. Double-click on the constraint and you get a table with Link Name, Constraint, Otherwise and Abort After Rows as column headings. In the Constraint column, specify your condition, which will evaluate to true or false.
31) What is a job parameter, and where do you define it?
ans>> Job parameters are parameters through which we can pass details/values at run time. They are defined through Designer at job level. (Explain with some examples.)
32) What is an environment variable, and where do you define it?
ans>> Environment variables are like global variables that can be used across the project. They are defined in Administrator. (Explain with some examples.)
33) Difference between job parameter, environment variable and stage variable
ans>> An environment variable is one through which you define project-wide defaults.
A job parameter is one through which you override the project-wide defaults; it applies to one particular job.
A stage variable is local to the active stage in which it is evaluated. (Explain with some examples.)
34) While running a job, is it possible to control another job through a stage, not job control coding? If yes, how, and which stages support it?
(to clarify)
35) Have you written job control? What is its use?
ans>> No.
36) How do you attach a job in job control?
(to check)
37) How do you set a job parameter in job control?
(to check)
38) What is a routine?
(to check)
39) Different types of routine
(to check)
40) What is the use of a routine?
(to check)
41) Where are the routines stored?
(to check)
42) How many windows are shown in DS Designer, and what are they?
ans>> Designer window, repository and palette.
44) What is the use of the Merge stage?
ans>> The Merge stage merges two input sets and produces one or more output sets.
Server Merge stage: a passive stage that can have no input links and one or more output links.
Parallel Merge stage: more than one input link (one master link and one or more update links). It can have reject links as well.
45) Is it possible to join more than two seq files using the Merge stage? If not, is there any stage to solve this?
46) Name all the join types.
ans>> Inner join, left outer, right outer and full outer join.
47) How do you extract data from a database?
ans>> Through ODBC and OCI.
48) Name all the update actions.
ans>> There are 8 update actions (found on the input link of the ODBC stage).
49) In job control, which language is used?
ans>> BASIC.
51) Is it possible to run a job in DS Designer? If yes, how?
ans>> Yes.
52) Is it possible to look up a lookup hash file? If yes, how?
ans>> Yes. (to check)
53) What does the Director do?
ans>> Job locks, job resources, job report in XML, scheduling, viewing logs and running the jobs.
54) Have you scheduled a job? How?
ans>> Yes, through DataStage Director.
55) A job is running, but I would like to stop it. What are the ways to stop the job?
ans>> Through Director, and through Cleanup Resources in the Director.
56) What is the use of the log file?
ans>> You can check how the job executed. If the job failed, you can also check for errors.
57) Describe Cleanup Resources and Clear Status File.
ans>> The Cleanup Resources feature applies only to server jobs. The Cleanup Resources command lets you:
View and end job processes
View and release the associated locks
Choose Job > Cleanup Resources from the menu bar. The Job Resources dialog box appears, from which you can view and clean up the resources of the selected job. Cleanup Resources allows you to remove locks or kill the jobs.
When you clear a job's status file you reset the status records associated with all stages in that job. To clear the job status file, choose Job > Clear Status File from the menu bar. The job status changes to Compiled and no evidence will remain that the job has ever run.
58) Situations in which there is a need to clear the status file
59) Is it enabled in DS Director? If not, how do you enable it?
ans>> By default, Cleanup Resources and Clear Status File are not enabled in the Director. You can enable them through Administrator by checking "Enable job administration in Director" on the General tab.
61) How do you find the number of rows per second in DS Director?
ans>> In Designer we can do it by choosing to view performance statistics, but in Director it is through Tools > New Monitor.
62) How do you know the job status?
ans>> Through Director, or from the command line with dsjob (for example dsjob -jobinfo).
63) What is the difference between a warning and a fatal message?
ans>> Warnings do not abort the job, whereas fatal messages abort the job.
65) What is a phantom error, and how do you resolve it?
ans>>
67) What is the difference between running and validating a job?
ans>> Run executes the job, whereas validate checks for errors such as whether files exist, whether ODBC connections work and whether intermediate hash files exist.
69) How do you import and export the project?
ans>> Through DataStage Manager.
70) What is metadata, and where is it stored?
71) How do you write a routine?
ans>> Go to the routine category in DS Manager and select the create routine option. Routines are written in the BASIC language.
72) What is the use of releasing a job?
ans>> Releasing a job cleans up the resources of a job that is locked or idle.
68) What is the use of DS Manager?
ans>> DS Manager is used to edit and manage the contents of the repository, for example to create or edit routines and table definitions, and to export and import jobs or the entire project.
73) What is the use of a table definition in Manager?
74) What is the difference between a local container and a shared container?
ans>> A local container can be used only within the job itself and does not appear in the repository window, whereas shared containers are available throughout the project and appear in the repository window.
75) What are containers?
ans>> Containers are a group of stages and links that can be reused (shared container).
76) Difference between Annotation and Description Annotation
ans>> Annotations are short or long descriptions. We can have multiple annotations in a job and they can be copied to other jobs as well, whereas we can have only one Description Annotation per job and it cannot be copied into other jobs.
77) What are the advantages of a Description Annotation?
ans>> A Description Annotation is automatically reflected in the Manager and Director.
78) What are the various types of compilation and run-time errors you have faced?
ans>>
79) Explain the "allow stage write cache" option for hash files and its implications.
ans>> It caches the hash file in memory. Do not use this option when reading from and writing to the same hash file.
80) What are the caching properties while creating hash files?
81) Where do you specify the size of your hash file?
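The evaluation order given in questions 9 to 12 (stage variables, then constraints, then derivations) can be pictured with a small Python sketch. This is only a toy model of one Transformer output link, not DataStage code: the column names (cust_id, amount), the function run_transformer and the sample rows are all hypothetical.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def run_transformer(rows: Iterable[Row]) -> List[Row]:
    """Toy model of one Transformer output link (all names are hypothetical)."""
    prev_key = None  # plays the role of a stage variable: it survives between rows
    output: List[Row] = []
    for row in rows:
        # 1. Stage variables are evaluated first, once per input row.
        is_new_key = row["cust_id"] != prev_key
        prev_key = row["cust_id"]

        # 2. The constraint (true/false) decides whether the row goes down the link.
        if not (is_new_key and row["amount"] > 0):
            continue

        # 3. Derivations compute the target columns. The stage variable itself
        #    is never written to the target.
        output.append({
            "CUST_ID": row["cust_id"],
            "AMOUNT_CENTS": int(row["amount"] * 100),
        })
    return output


sample = [
    {"cust_id": 1, "amount": 10.5},
    {"cust_id": 1, "amount": 3.0},   # same key as the previous row: filtered out
    {"cust_id": 2, "amount": -1.0},  # fails amount > 0: filtered out
    {"cust_id": 3, "amount": 2.0},
]
result = run_transformer(sample)
print(result)
```

The point of the sketch is only the ordering: the constraint can use a stage variable because the variable has already been evaluated for the current row, and the derivations run only for rows that passed the constraint.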
1) Lookup stage: is it persistent or non-persistent? (What is happening behind the scenes?)
Ans: The Lookup stage is non-persistent.
2) Is pipeline parallelism in PX the same as what the Inter-process (IPC) stage does in Server?
Ans: Yes and no. The IPC stage buffers data so that the next process (or the next stage in the same process) can pick it up. Pipeline parallelism in parallel jobs is much more complete.
Do you understand the relationship between stages and Orchestrate operators? Essentially each stage generates an operator. These (assuming that they don't combine into single processes) can form a pipeline so that, if you examine the generated OSH, it might have the form
Code:
Op1 < DataSet1 | op2 | op3 | op4 | op5 | op6 > DataSet2
Very slick, very fast.
3) How can we maintain the partitioning in the Sort stage?
4) Where do we need partitioning (in processing or somewhere else)?
5) If we use SAME partitioning in the first stage, which partitioning method will it take?
ans>> It will keep the partitions created in the previous stage as they are.
6) What symbol do we get when we use the round robin partitioning method?
7) If we check preserve partitioning in one stage and don't give any partitioning method, which partitioning method will it use?
8) What is Orchestrate?
Ans: Orchestrate was a product from Torrent before Torrent was bought by Ascential. Orchestrate provides the OSH framework, which has a UNIX command line interface.
Hello Pradeep,
Behind the Orchestrate framework, every stage is converted to a corresponding operator within OSH, such as import, transform, copy, export, etc., and the OSH is then executed by the Orchestrate framework to run your ETL processes. You can check the option "Generated OSH visible for Parallel jobs in All projects" in Administrator->Projects->Properties->Parallel tab in order to see the generated OSH code in Designer->Job Properties->Generated OSH tab when you compile your job.
So once you assign an operator in the Generic stage, the stage behaves like the stage that operator corresponds to. Because the Generic stage supports many input links and output links, you can achieve multiple operations in just one stage. But the option names and option values for the operator you assign in the Generic stage always bring design overhead.
Best Regards
Brian
9) Can we give node allocations, i.e. 4 nodes for one stage and 3 nodes for the next stage?
10) What is combinability and non-combinability?
11) What are schema files?
12) Why do we need datasets rather than sequential files?
Ans: A sequential file as a source or target needs to be repartitioned, as it is (as the name suggests) a single sequential stream of data. A dataset can be saved across nodes using the partitioning method selected, so it is always faster when used as a source or target.
13) Does the Lookup stage return multiple rows or a single row?
14) Why do we need the Sort stage when there is also the sort-merge collection method and the perform sort option in the stage advanced properties?
ans>> A sort can be performed on each partition separately, so it is faster, while sort-merge is a collection method.
15) For the Surrogate Key Generator stage, where is the next value stored?
16) When does repartitioning actually occur?
17) Can we give constraints in the Transformer stage?
Ans: Yes, we can.
18) What is a constraint in the Advanced tab?
19) What is the difference between Range and Range Map partitioning?
ans>> Range is one of the partitioning methods, while range map partitioning is
What is the difference between Job Control and Job Sequence?
What is the maximum size of a Data Set stage (PX)?
ans>> No limit?
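The answer to question 14 (sort inside each partition, then use sort-merge only to collect) can be sketched in Python. This is a toy model and not how the parallel engine is implemented: the modulo "hash", the node count of 3 and all function names are assumptions made for illustration.

```python
import heapq
from typing import List


def hash_partition(keys: List[int], nodes: int) -> List[List[int]]:
    """Distribute keys over `nodes` partitions (modulo stands in for a hash)."""
    parts: List[List[int]] = [[] for _ in range(nodes)]
    for k in keys:
        parts[k % nodes].append(k)
    return parts


def sort_each_partition(parts: List[List[int]]) -> List[List[int]]:
    """Each partition is sorted on its own; in a real job this runs in parallel."""
    return [sorted(p) for p in parts]


def sort_merge_collect(sorted_parts: List[List[int]]) -> List[int]:
    """Collector: merge already-sorted partitions into one sorted stream."""
    return list(heapq.merge(*sorted_parts))


keys = [42, 7, 19, 3, 28, 11, 5, 36]
parts = hash_partition(keys, nodes=3)
collected = sort_merge_collect(sort_each_partition(parts))
print(parts)
print(collected)
```

The sort work is spread over the partitions, and the collector only has to merge streams that are already in order, which is why the two features complement each other rather than replace each other.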
***How do you develop an SCD using the LOOKUP stage?
We can implement an SCD by using the LOOKUP stage, but only for SCD1, not for SCD2.
What are the errors you have experienced with DataStage?
ans>> In DataStage, some warnings and some fatal errors appear in the log file. A fatal error means the job aborted. Warnings mean the job does not abort, but we have to handle those warnings as well. The log file should be clear of warnings too.
What are the main differences between a server job and a parallel job in DataStage?
Server jobs: fewer stages, logic-intensive, and they do not use MPP systems.
Why do you need the Modify stage?
The Modify stage is used for changing data types.
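For the SCD question above, a Type 1 change (overwrite, no history) driven by a lookup can be sketched as follows. This is a plain Python illustration under assumptions, not a DataStage job design: the dimension is modelled as a dictionary keyed on a hypothetical business key cust_id with a single hypothetical attribute city. A Type 2 design would additionally need effective dates or a current-row flag, which this sketch does not attempt.

```python
from typing import Dict, List


def apply_scd1(dimension: Dict[int, dict], incoming: List[dict]) -> Dict[int, dict]:
    """Return a new dimension with Type 1 changes applied (names are hypothetical)."""
    result = {key: dict(attrs) for key, attrs in dimension.items()}
    for rec in incoming:
        key = rec["cust_id"]
        attrs = {"city": rec["city"]}
        existing = result.get(key)   # the lookup against the dimension
        if existing is None:
            result[key] = attrs      # key not found: insert a new row
        elif existing != attrs:
            result[key] = attrs      # key found, attributes differ: overwrite
    return result


dim = {1: {"city": "Pune"}}
feed = [
    {"cust_id": 1, "city": "Mumbai"},  # existing customer, changed attribute
    {"cust_id": 2, "city": "Delhi"},   # new customer
]
updated = apply_scd1(dim, feed)
print(updated)
```

After the run the old value for customer 1 is gone, which is exactly the Type 1 behaviour: the dimension always shows the latest value and keeps no history.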
Data warehousing questions:
1) What is a data warehouse?
2) What is an ODS?
3) What is a dimension table?
4) What is a lookup table?
5) Why should you put your data warehouse on a different system than your OLTP system?
6) What are the various reporting tools in the market?
7) What is normalization, First Normal Form, Second Normal Form, Third Normal Form?
8) What is a fact table?
9) What are conformed dimensions?
10) What are the different methods of loading dimension tables?
11) What is a conformed fact?
12) What does the level of granularity of a fact table signify?
13) What is the difference between OLTP and OLAP?
14) What is SCD1, SCD2, SCD3?
15) Why are OLTP database designs not generally a good idea for a data warehouse?
16) What is a BUS schema?
17) What is real-time data warehousing?
18) What are semi-additive and factless facts, and in which scenario will you use such kinds of fact tables?
19) Differences between star and snowflake schemas?
20) What is a star schema?
21) What is a general purpose scheduling tool?
22) What are data marts?
23) How are the dimension tables designed?
24) What are non-additive facts?
25) What type of indexing mechanism do we need to use for a typical data warehouse?
26) What is a snowflake schema?
27) What are aggregate tables?
28) What is dimensional modelling? Why is it important?
29) Why is data modeling important?
30) What is data mining?
31) What is ETL?
32) What is an ER diagram?
33) Which columns go to the fact table and which columns go to the dimension table?
34) What modeling tools are available in the market?
35) How do you load the time dimension?
36) What is a CUBE in the data warehousing concept?
37) What are the data validation strategies for data mart validation after the loading process?
38) What is the datatype of the surrogate key?
39) What is a degenerate dimension table?
40) What are the methodologies of data warehousing?
What is a linked cube?
What is the main difference between the Inmon and Kimball philosophies of data warehousing?
What is a data warehousing hierarchy?
What is the main difference between schemas in an RDBMS and schemas in a data warehouse?
What is a hybrid slowly changing dimension?
What is a junk dimension? What is the difference between a junk dimension and a degenerate dimension?
Can a dimension table contain numeric values?
What is the difference between a view and a materialized view?
What is a surrogate key? Where do we use it? Explain with examples.
As I mentioned earlier, I have limited knowledge of this. One point I can think of is that the resource consumption of each stage can be minimized by combining them all into a single stage and enabling 'CombinableOperator'. It is a behaviour of DataStage to combine operators where possible and compile the combined code so that, at run time, it acts as a single operator that performs the work of all the combined operators. Usually all active operators are combinable. You can optionally avoid this with a couple of options. One simple option is to set "CombinableOperator" to False in the Transformer stage. This is useful for debugging purposes.
Dear Deepak,
So as you mentioned, maybe one stage we see in the Palette is compiled into multiple operators behind the scenes at run time, correct or not?
Orchestrate itself is an ETL tool with extensive parallel processing capabilities, running on the UNIX platform. DataStage used Orchestrate with DataStage XE (beta version of 6.0) to incorporate the parallel processing capabilities. Ascential, the DataStage vendor, has since purchased Orchestrate, integrated it with DataStage XE and released a new version, DataStage 6.0, i.e. Parallel Extender.
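The idea of operator combination described above can be pictured with a small Python analogy. It is only an analogy under assumptions, not how the parallel engine actually combines operators: the three per-row "operators" and the combine helper are hypothetical, and real operators run as separate processes when they are not combined.

```python
from functools import reduce
from typing import Callable, List

Operator = Callable[[dict], dict]


def combine(ops: List[Operator]) -> Operator:
    """Fuse several per-row operators into one that applies them in order."""
    def combined(row: dict) -> dict:
        return reduce(lambda acc, op: op(acc), ops, row)
    return combined


def trim_name(row: dict) -> dict:
    return {**row, "name": row["name"].strip()}


def upper_name(row: dict) -> dict:
    return {**row, "name": row["name"].upper()}


def add_length(row: dict) -> dict:
    return {**row, "name_len": len(row["name"])}


rows = [{"name": "  alice "}, {"name": "bob"}]

# Not combined: each operator makes its own pass over the data and hands
# an intermediate result to the next one.
staged = rows
for op in (trim_name, upper_name, add_length):
    staged = [op(r) for r in staged]

# Combined: one fused operator, one pass, no intermediate hand-off.
fused = combine([trim_name, upper_name, add_length])
combined_rows = [fused(r) for r in rows]
print(combined_rows)
```

Both paths produce the same rows. Keeping the operators separate makes it easier to see which step misbehaves, which matches the remark above that turning combination off is useful for debugging.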
