
-- Run against the metastore's backing RDBMS (e.g. MySQL), not from the Hive CLI.
SELECT
    DBS.NAME                 AS TABLE_SCHEMA,
    TBLS.TBL_NAME            AS TABLE_NAME,
    TABLE_PARAMS.PARAM_VALUE AS TABLE_DESCRIPTION,
    COLUMNS_V2.COLUMN_NAME   AS COLUMN_NAME,
    COLUMNS_V2.TYPE_NAME     AS COLUMN_DATA_TYPE_DETAILS
FROM DBS
JOIN TBLS       ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS        ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID
-- Restricting TABLE_PARAMS to the 'comment' key gives at most one description per table,
-- so a table that also has other params (e.g. transient_lastDdlTime) no longer yields both
-- a '' row and a comment row for every column. LEFT JOIN keeps tables without a comment.
LEFT JOIN TABLE_PARAMS
       ON TABLE_PARAMS.TBL_ID = TBLS.TBL_ID
      AND TABLE_PARAMS.PARAM_KEY = 'comment'
-- INTEGER_IDX is the column's position in the table definition.
ORDER BY DBS.NAME, TBLS.TBL_NAME, COLUMNS_V2.INTEGER_IDX;
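A minimal sketch to sanity-check the join, assuming Python's built-in sqlite3 as a stand-in for the metastore's real backing database. It builds toy copies of the five metastore tables (only the columns the query touches, with made-up rows) and restricts TABLE_PARAMS to PARAM_KEY = 'comment' in a LEFT JOIN, so each column comes back exactly once and a table without a comment still appears with a NULL description. The helper name build_toy_metastore is hypothetical.

```python
import sqlite3

# Same join shape as the metastore query, with short aliases.
QUERY = """
SELECT d.NAME, t.TBL_NAME, p.PARAM_VALUE, c.COLUMN_NAME, c.TYPE_NAME
FROM DBS d
JOIN TBLS t       ON d.DB_ID = t.DB_ID
JOIN SDS s        ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
LEFT JOIN TABLE_PARAMS p
       ON p.TBL_ID = t.TBL_ID AND p.PARAM_KEY = 'comment'
ORDER BY d.NAME, t.TBL_NAME, c.INTEGER_IDX
"""


def build_toy_metastore():
    """Create an in-memory toy metastore (hypothetical data, real table/column names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DBS (DB_ID INTEGER, NAME TEXT);
        CREATE TABLE TBLS (TBL_ID INTEGER, DB_ID INTEGER, SD_ID INTEGER, TBL_NAME TEXT);
        CREATE TABLE SDS (SD_ID INTEGER, CD_ID INTEGER);
        CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT, INTEGER_IDX INTEGER);
        CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
        INSERT INTO DBS VALUES (1, 'mydb');
        INSERT INTO TBLS VALUES (10, 1, 100, 'customers'), (11, 1, 101, 'orders');
        INSERT INTO SDS VALUES (100, 1000), (101, 1001);
        INSERT INTO COLUMNS_V2 VALUES
            (1000, 'id', 'int', 0), (1000, 'name', 'varchar(50)', 1),
            (1001, 'order_id', 'bigint', 0), (1001, 'total', 'decimal(10,2)', 1);
        -- customers has a comment plus another param; orders has no comment at all.
        INSERT INTO TABLE_PARAMS VALUES
            (10, 'comment', 'Customer master'),
            (10, 'transient_lastDdlTime', '1700000000'),
            (11, 'transient_lastDdlTime', '1700000000');
    """)
    return conn


if __name__ == "__main__":
    for row in build_toy_metastore().execute(QUERY):
        print(row)
```

With this data there are four result rows, one per column, and the orders rows carry None as their description.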

Required output columns:
TABLE SCHEMA
TABLE NAME
TABLE DESCRIPTION
COLUMN NAME
COLUMN DATA TYPE
COLUMN LENGTH
COLUMN PRECISION
COLUMN SCALE
NULL OR NOT NULL
PRIMARY KEY INDICATOR
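COLUMNS_V2.TYPE_NAME stores the full Hive type string (for example int, varchar(50), decimal(10,2)), so COLUMN DATA TYPE, COLUMN LENGTH, COLUMN PRECISION and COLUMN SCALE can be split out of it in post-processing. The sketch below is one way to do that; split_type_name and ColumnType are hypothetical names, and it assumes Hive's convention that a DECIMAL declared with only a precision has scale 0. NULL / NOT NULL and the primary key indicator are not part of TYPE_NAME, so this sketch does not cover them.

```python
import re
from typing import NamedTuple, Optional

# Matches parameterised types such as "varchar(50)" or "decimal(10,2)".
_PARAM_TYPE = re.compile(r"^\s*(\w+)\s*\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)\s*$")


class ColumnType(NamedTuple):
    data_type: str
    length: Optional[int] = None
    precision: Optional[int] = None
    scale: Optional[int] = None


def split_type_name(type_name: str) -> ColumnType:
    """Split a Hive TYPE_NAME string into type, length, precision and scale."""
    m = _PARAM_TYPE.match(type_name)
    if not m:
        # Primitive (int, string, ...) or complex (array<...>, map<...>) type:
        # no length/precision/scale to extract, so return the type as-is.
        return ColumnType(type_name.strip())
    base, first, second = m.group(1).lower(), int(m.group(2)), m.group(3)
    if base in ("varchar", "char"):
        return ColumnType(base, length=first)
    if base == "decimal":
        # decimal(p) is treated as decimal(p, 0).
        return ColumnType(base, precision=first, scale=int(second) if second else 0)
    return ColumnType(base)


if __name__ == "__main__":
    for t in ("int", "varchar(50)", "decimal(10,2)", "array<string>"):
        print(t, split_type_name(t))
```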

Querying metastore tables from Hive throws an error

hive> use mydb;
OK
Time taken: 0.052 seconds
hive> select * from DBS;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'DBS'
hive> select * from TBLS;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'TBLS'

ADWL01 -- DONE
AAC -- DONE
ZMT -- DONE
ABKL01 -- DONE

https://stackoverflow.com/questions/49694901/synchronize-data-lake-with-the-deleted-record

Limitation in Hadoop

This is a common constraint when building a data lake in Hadoop: you can't simply
update or delete records in it.
One approach you can try:
When you add the lastModifiedDate column, also add a status column. When a record is
deleted at the source, mark its status as Deleted.
Then, whenever you query for the latest active records, you can filter the deleted
ones out.
If you perform ACID operations (updates and deletes) on a daily basis, you can instead
use Cassandra or HBase (or another NoSQL database).
If not, the first approach is the better choice for building a data lake in Hadoop.
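The status-column approach can be sketched in plain Python, assuming every change (including a delete) is appended as a new row carrying its lastModifiedDate. The Record fields and the latest_active helper are hypothetical names for illustration; in Hive or Spark the same logic is usually written as a "latest row per key" query followed by a status filter.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    key: str                 # business key of the source row (hypothetical)
    payload: str             # the row's data (hypothetical)
    last_modified: datetime  # the lastModifiedDate column
    status: str              # "Active" or "Deleted"


def latest_active(records):
    """Keep the newest version of each key, then drop keys whose newest version is Deleted."""
    latest = {}
    for rec in records:
        current = latest.get(rec.key)
        if current is None or rec.last_modified > current.last_modified:
            latest[rec.key] = rec
    return sorted(
        (r for r in latest.values() if r.status != "Deleted"),
        key=lambda r: r.key,
    )


if __name__ == "__main__":
    lake = [
        Record("c1", "Alice", datetime(2024, 1, 1), "Active"),
        Record("c2", "Bob", datetime(2024, 1, 1), "Active"),
        Record("c1", "Alice Smith", datetime(2024, 1, 5), "Active"),   # update = new row
        Record("c2", "Bob", datetime(2024, 1, 7), "Deleted"),          # delete = new row
    ]
    for rec in latest_active(lake):
        print(rec.key, rec.payload)
```

Nothing is ever updated in place: the delete arrives as just another appended row, which is what makes this fit an append-only store.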

https://www.kingswaysoft.com/products/ssis-integration-toolkit-for-microsoft-dynamics-365

How many data sources are there, and what is the nature of the data loads (relational or various file formats)?
Size of the data?
One-time load vs. incremental load?
Is semi-structured or unstructured data (social media feeds, log files, etc.) available?
Build a data lake.
Many data streams vs. sources.
An advanced file system that can process data over a large cluster of commodity
hardware.
Hadoop is an Apache project that provides a framework for processing distributed
data using a storage abstraction (HDFS)
and a processing abstraction (MapReduce). (Use Spark for the processing layer.)
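To make the map, shuffle and reduce phases of the processing abstraction concrete, here is a toy, single-process word count in plain Python. It only illustrates the programming model; it does not use Hadoop's or Spark's APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["data lake", "Data warehouse", "lake"]))
```

On a cluster, map tasks run in parallel near the HDFS blocks they read, and the shuffle moves data over the network; here all three phases run in one process.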

In banking terms, only the data of value ends up in the data warehouse, through ETL
processes.
This means that you Extract the needed data into a staging area
(in relational terms, often staging tables or so-called global temporary tables),
segregate it from unwanted data, perform data manipulation (Transform), and
finally Load it into target tables in the data warehouse.
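A minimal sketch of those steps using sqlite3, with hypothetical tables: src_txn as the source, a TEMP staging table stg_txn (playing the role of the temporary staging table), and dw_txn as the warehouse target. The rule that only POSTED transactions are of value, and the assumption that amounts arrive in cents, are made up for illustration.

```python
import sqlite3


def run_etl(conn):
    # Extract: copy the needed columns from the source into a temporary staging table.
    conn.execute("DROP TABLE IF EXISTS stg_txn")
    conn.execute(
        "CREATE TEMP TABLE stg_txn AS "
        "SELECT txn_id, account, amount, status FROM src_txn"
    )
    # Segregate: discard rows that are of no value to the warehouse.
    conn.execute("DELETE FROM stg_txn WHERE status <> 'POSTED'")
    # Transform + Load: normalise account codes, convert cents to units, load the target.
    conn.execute(
        "INSERT INTO dw_txn (txn_id, account, amount) "
        "SELECT txn_id, UPPER(account), amount / 100.0 FROM stg_txn"
    )
    conn.commit()


def demo_connection():
    """Build an in-memory source and target with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_txn (txn_id INTEGER, account TEXT, amount INTEGER, status TEXT)")
    conn.execute("CREATE TABLE dw_txn (txn_id INTEGER PRIMARY KEY, account TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO src_txn VALUES (?, ?, ?, ?)",
        [(1, "acc-01", 12550, "POSTED"), (2, "acc-02", 900, "REVERSED"), (3, "acc-01", 100, "POSTED")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    run_etl(conn)
    print(conn.execute("SELECT * FROM dw_txn ORDER BY txn_id").fetchall())
```

Doing the filtering and transformation in staging keeps the target tables free of rows that never pass validation.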
Analysts then use appropriate BI tools to look at macroscopic trends in the data.
This makes the process of data matching,