Course Number:
Duration: 4 days
Overview
This program focuses on strengthening data handling and integration skills. Participants
learn to create ETL jobs that connect to almost any data source; filter, modify, and combine
data; build standalone jobs that run on a schedule or in response to an event; and make jobs
more user-friendly for non-technical users. The course also covers Talend Big Data
integration (Hortonworks distribution).
Prerequisites
Participants should preferably have basic knowledge of a programming language such as Java.
They must be familiar with RDBMS concepts and SQL.
Materials
● Exercise Manual
● Slides
Objectives
Outline
● Overview
o Introduction To Talend
o Why Talend?
o Talend Vs Other Tools
o Logical Architecture
o More On Data Integration Aspects
o Talend Big Data Integration
o Talend Open Studio Walkthrough
o Key Components In Palette
o Conclusion
● Working With Schemas
o Introduction
o Hand-Cranking A Built-In Schema
o Propagating Schema Changes
o Creating A Generic Schema From The Existing Metadata
o Cutting And Pasting Schema Information
o Dropping Schemas To Empty Components
o Creating Schemas From Lists
● Validating Data
o Introduction
o Enabling And Disabling Reject Flows
o Gathering All Rejects Prior To Killing A Job
o Validating Against The Schema
o Rejecting Rows Using tMap
o Checking A Column Against A List Of Allowed Values
o Checking A Column Against A Lookup
o Creating Validation Rules For More Complex Requirements
o Creating Binary Error Codes To Store Multiple Test Results
● Mapping Data
o Introduction
o Simple Mapping And tMap Time Savers
o Creating tMap Expressions
o Using The Ternary Operator For Conditional Logic
o Using Intermediate Variables In tMap
o Filtering Input Rows
o Splitting An Input Row Into Multiple Outputs Based On Input Conditions
o Joining Data Using tMap
o Hierarchical Joins Using tMap
o Using Reload At Each Row To Process Real-Time / Near Real-Time Data
● Using Java In Talend
o Introduction
o Performing One-Off Pieces Of Logic Using tJava
o Setting The Context And globalMap Variables Using tJava
o Adding Complex Logic Into A Flow Using tJavaRow
o Creating Pseudo Components Using tJavaFlex
o Creating Custom Functions Using Code Routines
o Importing JAR Files To Allow Use Of External Java Classes
● Managing Context Variables
o Introduction
o Creating A Context Group
o Adding A Context Group To Your Job
o Adding Contexts To A Context Group
o Using tContextLoad To Load Contexts
o Using Implicit Context Loading To Load Contexts
o Turning Implicit Context Loading On And Off In A Job
o Setting The Context File Location In The Operating System
● Working With Databases
o Introduction
o Setting Up A Database Connection
o Importing The Table Schemas
o Reading From Database Tables
o Using Context And globalMap Variables In SQL Queries
o Printing Your Input Query
o Writing To A Database Table
o Printing Your Output Query
o Managing Database Sessions
o Passing A Session To A Child Job
o Selecting Different Fields And Keys For Insert, Update, And Delete
o Capturing Individual Rejects And Errors
o Database And Table Management
o Managing Surrogate Keys For Parent And Child Tables
o Rewritable Lookups Using An In-Process Database
● Managing Files
o Introduction
o Appending Records To A File
o Reading Rows Using A Regular Expression
o Using Temporary Files
o Storing Intermediate Data In The Memory Using tHashMap
o Reading Headers And Trailers Using tMap
o Reading Headers And Trailers With No Identifiers
o Using The Information In The Header And Trailer
o Adding A Header And Trailer To A File
o Moving, Copying, Renaming, And Deleting Files And Folders
o Capturing File Information
o Processing Multiple Files At Once
o Processing Control/Validation Files
o Creating And Writing Files Depending On The Input Data
● Working With XML, Web Services, And Queues
o Introduction
o Using tXMLMap To Read XML
o Using tXMLMap To Create An XML Document
o Reading Complex Hierarchical XML
o Writing Complex XML
o Calling A SOAP Web Service
o Calling A RESTful Web Service
o Reading And Writing To A Queue
o Ensuring Lossless Queues Using Sessions
● Debugging, Logging, And Testing
o Introduction
o Find The Location Of Compilation Errors Using The Problems Tab
o Locating Execution Errors From The Console Output
o Using The Talend Debug Mode – Row-By-Row Execution
o Using The Java Debugger To Debug Talend Jobs
o Using tLogRow To Show Data In A Row
o Using tJavaRow To Display Row Information
o Using tJava To Display Status Messages And Variables
o Printing Out The Context
o Dumping The Console Output To A File From Within A Job
o Creating Simple Test Data Using tRowGenerator
o Creating Complex Test Data Using tRowGenerator, tFlowToIterate, tMap, And Sequences
o Creating Random Test Data Using Lookups
o Creating Test Data Using Excel
o Testing Logic – The Most-Used Pattern
o Killing A Job From Within tJavaRow
● Deploying And Executing Jobs
o Introduction
o Creating Compiled Executables
o Using A Different Context
o Adding Command-Line Context Parameters
o Managing Job Dependencies
o Capturing And Acting On Different Return Codes
o Returning Codes From A Child Job Without tDie
o Passing Parameters To A Child Job
o Executing Non-Talend Objects And Operating System Commands
● Common Issues And Useful Tips
o Introduction
o My Tab Is Missing
o Finding The Code Routine
o Finding A New Context Variable
o Reloads Going Missing At Each Row Global Variable
o Dragging Component globalMap Variables
o Some Complex Date Formats
o Capturing tMap Rejects
o Adding Job Name, Project Name, And Other Job Specific Information
o Printing tMap Variables
o Stopping Memory Errors In Talend
o Creating A Job
o Adding Components To The Job
o Connecting The Components Together
o Configuring The Components
o Executing The Job
o Various Types Of Big Data Jobs
o Pig Workflow
o Reading And Writing To Hive On Hadoop
o Working With HDFS
o Performing Sqoop
o Using Spark In Talend
o Kafka
● Carparts Project