Commodity hardware
Mahout:-
Hive :-
JobTracker - TaskTracker
Cloudera, Hortonworks
Master Slave
Within IBM InfoSphere DataStage, the user modifies a configuration file to define
multiple processing nodes. These nodes work concurrently to complete each job
quickly and efficiently. ... Parallel processing environments are categorized as
symmetric multiprocessing (SMP) or massively parallel processing (MPP) systems.
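As a rough illustration of the idea (this is not DataStage itself), the sketch below splits records across a configurable number of "nodes" and processes the partitions concurrently. The node count stands in for the nodes defined in the configuration file; `partition`, `process_partition` and `run_job` are hypothetical names made up for this note.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(records, node_count):
    """Round-robin the records into one partition per node."""
    parts = [[] for _ in range(node_count)]
    for i, rec in enumerate(records):
        parts[i % node_count].append(rec)
    return parts

def process_partition(part):
    """Hypothetical per-node work: here, just square each value."""
    return [x * x for x in part]

def run_job(records, node_count):
    """Process all partitions concurrently and collect the results."""
    parts = partition(records, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        results = list(pool.map(process_partition, parts))
    return sorted(x for part in results for x in part)

if __name__ == "__main__":
    print(run_job([1, 2, 3, 4, 5, 6], node_count=3))
```

Changing `node_count` changes the degree of parallelism without touching the per-record logic, which is roughly what editing the node list in a configuration file achieves.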
===============
Data lake systems tend to employ extract, load and transform (ELT) methods for
collecting and integrating data, instead of the extract, transform and load (ETL)
approaches typically used in data warehouses. Data can be extracted and processed
outside of HDFS using MapReduce, Spark and other data processing frameworks.
Hadoop data lakes have come to hold both raw and curated data.
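A minimal sketch of the difference in ordering, using plain Python lists in place of a real warehouse or lake. The sample rows and the names `clean`, `etl` and `elt` are assumptions made up for illustration.

```python
RAW = [
    {"id": 1, "amount": " 10.5 "},
    {"id": 2, "amount": "n/a"},
    {"id": 3, "amount": "7"},
]

def clean(row):
    """Transform step: parse the amount, or return None if it is invalid."""
    try:
        return {"id": row["id"], "amount": float(row["amount"])}
    except ValueError:
        return None

def etl(source):
    """ETL: transform first, then load only curated rows into the warehouse."""
    warehouse = []
    for row in source:
        cleaned = clean(row)
        if cleaned is not None:
            warehouse.append(cleaned)
    return warehouse

def elt(source):
    """ELT: load raw rows into the lake as-is, then transform inside it."""
    lake = {"raw": list(source), "curated": []}
    lake["curated"] = [c for c in map(clean, lake["raw"]) if c is not None]
    return lake
```

With ELT the invalid row is still available in `lake["raw"]` for later reprocessing, whereas the ETL path drops it before loading.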
https://www.ibm.com/blogs/insights-on-business/sap-consulting/enterprise-analytics-reference-architecture/
http://www.ibmbigdatahub.com/blog/ingesting-data-data-value-chain
http://www.datavirtualizationblog.com/logical-architectures-big-data-analytics/
https://www.xenonstack.com/blog/data-engineering/ingestion-processing-data-for-big-data-iot-solutions
Prescriptive vs Predictive
ETL tools like Talend/Pentaho?
Data storage formats (like Parquet/Avro)
Warehouse:- Enterprise data, structured data
Data Lake:- Ingests data of any shape and any type. Not structured data
Strategic reporting: done in the warehouse
==================Types of NoSQL DB==================
Key Value Store
Columnar Store
Graph Database
Document Database
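To make the four models concrete, here is the same toy data laid out in each shape using plain Python structures. The records, keys and helper names are all made up for illustration and do not correspond to any particular product's API.

```python
# Key-value store: an opaque value looked up by a single key.
key_value = {"user:1": '{"name": "Ann", "city": "Pune"}'}

# Columnar (column-family) store: values grouped by column rather than by row.
columnar = {
    "name": {"user:1": "Ann", "user:2": "Raj"},
    "city": {"user:1": "Pune", "user:2": "Delhi"},
}

# Document database: nested, self-describing records that can be queried by field.
documents = [
    {"_id": "user:1", "name": "Ann", "address": {"city": "Pune"}},
    {"_id": "user:2", "name": "Raj", "address": {"city": "Delhi"}},
]

# Graph database: nodes plus explicit edges (relationships).
graph = {
    "nodes": {"user:1", "user:2"},
    "edges": [("user:1", "FOLLOWS", "user:2")],
}

def cities(column_store):
    """Columnar access: read one column without touching the others."""
    return sorted(column_store["city"].values())

def find_by_city(docs, city):
    """Document access: filter on a nested field."""
    return [d["_id"] for d in docs if d["address"]["city"] == city]

def followers_of(g, user):
    """Graph access: follow edges pointing at a node."""
    return [src for src, rel, dst in g["edges"] if rel == "FOLLOWS" and dst == user]
```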
=======================
Time Series DB: used for IoT and industrial data
FastLoad vs MultiLoad - Teradata Community
FastLoad has two phases: an acquisition phase and an application phase. MultiLoad
(mload) has 5 phases. (Note: however, there is no acquisition phase for an mload delete.)
B2B exchange
Data governance
Data migration
Data warehousing
Data replication and synchronization
Integration Competency Centers (ICC)
Master Data Management (MDM)
Service-oriented architectures (SOA) and more.
Informatica PowerCenter is an enterprise data integration platform working as a
unit.
S3
Cassandra
==================
ELT vs ETL
LookUp optimization
Strategic Reporting
Facts and Dimension
In-memory: the cache is huge
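A small sketch tying together facts, dimensions and lookup optimization: the dimension is read once into an in-memory dictionary, so each fact row resolves its key with a cached lookup instead of rescanning the dimension. The table contents and function names are hypothetical.

```python
# Dimension table: descriptive attributes keyed by a surrogate key.
DIM_PRODUCT = [
    {"product_key": 1, "name": "Widget", "category": "Tools"},
    {"product_key": 2, "name": "Gadget", "category": "Toys"},
]

# Fact table: numeric measures plus foreign keys into the dimensions.
FACT_SALES = [
    {"product_key": 1, "qty": 2, "revenue": 20.0},
    {"product_key": 2, "qty": 1, "revenue": 15.0},
    {"product_key": 1, "qty": 3, "revenue": 30.0},
]

def build_lookup_cache(dimension, key):
    """Read the dimension once and keep it in memory, indexed by key."""
    return {row[key]: row for row in dimension}

def revenue_by_category(facts, dimension):
    """Join facts to the cached dimension and total revenue per category."""
    cache = build_lookup_cache(dimension, "product_key")
    totals = {}
    for fact in facts:
        category = cache[fact["product_key"]]["category"]  # cached dictionary lookup
        totals[category] = totals.get(category, 0.0) + fact["revenue"]
    return totals
```

The trade-off is memory: the cache holds the whole dimension, which is fine for small dimensions but can become large.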
=================
Teradata:- Maximum parallelism, much more than others. High-intensity queries are
supported by the primary index
===========================
Pages--Blocks--Segment--->Extent
=========================MapReduce============
==================
Primitive wrapper classes
These classes are conceptually similar to the primitive wrapper classes, such as
Integer and Long found in java.lang. They hold a single primitive value that can
be set either at construction or via a setter method.
BooleanWritable
ByteWritable
DoubleWritable
FloatWritable
IntWritable
LongWritable
VIntWritable – a variable length integer type
VLongWritable – a variable length long type
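A Python sketch of the two ideas in this list: a mutable wrapper whose value is set at construction or via a setter, and a variable-length integer encoding that spends fewer bytes on small values. `IntBox` is a hypothetical analogue, not a Hadoop class, and the varint scheme shown is a generic 7-bits-per-byte encoding chosen for illustration; it is not claimed to match the byte format Hadoop uses for VIntWritable.

```python
import struct

class IntBox:
    """Hypothetical Python analogue of a Writable wrapper around one int."""

    def __init__(self, value=0):
        self.value = value          # set at construction...

    def set(self, value):
        self.value = value          # ...or later via a setter

    def get(self):
        return self.value

    def to_bytes(self):
        """Fixed-width form: always 4 bytes, big-endian, signed."""
        return struct.pack(">i", self.value)

def encode_varint(n):
    """Generic 7-bits-per-byte encoding for non-negative ints (not Hadoop's format)."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data):
    """Inverse of encode_varint."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            break
    return result
```

The fixed-width form always costs 4 bytes, while the variable-length form costs 1 byte for values below 128 and grows only as the value does.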
==================
https://www.hakkalabs.co/articles/cassandra-data-modeling-guide