
Hadoop Architecture

MapR vs. Cloudera distributions; read-ahead log.


MapReduce:- the four phases of a job: splitting, mapping, shuffling, reducing.
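A minimal pure-Python sketch of the four phases (the input lines and the word-count logic are invented for illustration):

from itertools import groupby
from operator import itemgetter

lines = ["big data big", "data hadoop"]   # splitting: records from input splits

# mapping: emit a (key, 1) pair per word
mapped = [(word, 1) for line in lines for word in line.split()]

# shuffling: sort and group all pairs by key
mapped.sort(key=itemgetter(0))
shuffled = [(k, [v for _, v in grp]) for k, grp in groupby(mapped, key=itemgetter(0))]

# reducing: aggregate the values for each key
reduced = {k: sum(vs) for k, vs in shuffled}
print(reduced)   # {'big': 2, 'data': 2, 'hadoop': 1}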
HDFS:- block size vs. input split size; the default block size is 128 MB in Hadoop 2.x (64 MB in 1.x). Setting the reducer count from the Hive CLI: SET mapreduce.job.reduces=12 (mapred.reduce.tasks=12 is the older, deprecated name for the same property).
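A small sketch of how split size determines the number of map tasks (the 1 GB file size is a made-up example):

import math

BLOCK_SIZE = 128 * 1024 * 1024        # Hadoop 2.x default HDFS block size
file_size = 1 * 1024 * 1024 * 1024    # hypothetical 1 GB input file

# By default the input split size equals the block size, so this file
# produces ceil(1024 MB / 128 MB) = 8 splits and therefore 8 map tasks.
num_splits = math.ceil(file_size / BLOCK_SIZE)
print(num_splits)   # 8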
Hive:- architecture, SerDes, internal (managed) vs. external tables, partitioning (choosing the column to partition on), bucketing. ORC with ZLIB (stripe-based layout) vs. Parquet with Snappy; DISTRIBUTE BY, CLUSTERED BY, ORDER BY.
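A sketch of the partitioning/bucketing DDL, issued here through PySpark's Hive support; the table, columns, and location are invented, and this assumes a Hive-enabled Spark build:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-sketch")
         .enableHiveSupport()   # requires Spark built with Hive support
         .getOrCreate())

# External table: dropping it removes only metadata, not the files.
# Partitioned on event_date, bucketed on user_id, stored as ORC with ZLIB.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS events (
        user_id BIGINT,
        action  STRING
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 8 BUCKETS
    STORED AS ORC
    LOCATION '/data/events'
    TBLPROPERTIES ('orc.compress' = 'ZLIB')
""")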
Sqoop:- sqoop eval, HCatalog integration, --split-by, number of mappers (--num-mappers). Incremental imports: --incremental append or lastmodified, together with --check-column and --last-value; --compress for compressed output.
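A sketch of an incremental append import, driven from Python so all the flags sit in one place; the JDBC URL, credentials, table, and last value are placeholders:

import subprocess

# Pull only rows whose id is greater than --last-value from the previous run.
cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/shop",   # placeholder JDBC URL
    "--username", "etl_user",                          # placeholder user
    "--password-file", "/user/etl/.pwd",               # placeholder password file
    "--table", "orders",
    "--split-by", "id",          # column Sqoop uses to divide rows among mappers
    "--num-mappers", "4",        # import parallelism
    "--incremental", "append",   # use lastmodified instead to pick up updated rows
    "--check-column", "id",
    "--last-value", "1000",      # high-water mark from the previous run
    "--compress",                # compress the files written to HDFS
]
subprocess.run(cmd, check=True)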
Spark:- RDD vs. DataFrame APIs; driver node and executor nodes.
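A minimal PySpark contrast of the two APIs (the sample rows are invented; the driver builds the job, the executors run the distributed work):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-vs-df").getOrCreate()
sc = spark.sparkContext   # this script itself runs on the driver

rows = [("big", 2), ("data", 2), ("hadoop", 1)]   # made-up sample data

# RDD: low-level, untyped functional transformations.
rdd = sc.parallelize(rows)
print(rdd.map(lambda kv: kv[1]).sum())   # executors compute the partial sums

# DataFrame: schema-aware and optimized by Catalyst.
df = spark.createDataFrame(rows, ["word", "count"])
df.groupBy().sum("count").show()

spark.stop()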
