SELECT
    DBS.NAME AS TABLE_SCHEMA,
    TBLS.TBL_NAME AS TABLE_NAME,
    TBL_COMMENTS.TBL_COMMENT AS TABLE_DESCRIPTION,
    COLUMNS_V2.COLUMN_NAME AS COLUMN_NAME,
    COLUMNS_V2.TYPE_NAME AS COLUMN_DATA_TYPE_DETAILS
FROM DBS
JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN
(
    -- one row per table: the 'comment' param value if present, else ''
    SELECT TBL_ID, MAX(TBL_COMMENT) AS TBL_COMMENT
    FROM
    (
        SELECT TBLS.TBL_ID AS TBL_ID,
               CASE WHEN TABLE_PARAMS.PARAM_KEY = 'comment'
                    THEN TABLE_PARAMS.PARAM_VALUE ELSE '' END AS TBL_COMMENT
        FROM TBLS
        JOIN TABLE_PARAMS ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID
    ) TBL_COMMENTS_INTERNAL
    GROUP BY TBL_ID
) TBL_COMMENTS
ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID;
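A quick way to sanity-check the join logic is to run it against a mock of the metastore tables in SQLite (a heavily simplified sketch: only the columns used by the query are mocked, and the comment lookup is collapsed to one row per table with MAX/GROUP BY so that tables carrying other params besides 'comment' do not produce duplicate joined rows):

```python
import sqlite3

# Minimal mock of the Hive metastore schema (assumption: simplified columns only).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DBS (DB_ID INTEGER, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER, DB_ID INTEGER, SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER, CD_ID INTEGER);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT);
CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);

INSERT INTO DBS VALUES (1, 'sales');
INSERT INTO TBLS VALUES (10, 1, 100, 'orders');
INSERT INTO SDS VALUES (100, 1000);
INSERT INTO COLUMNS_V2 VALUES (1000, 'order_id', 'bigint');
INSERT INTO COLUMNS_V2 VALUES (1000, 'amount', 'decimal(10,2)');
-- two params: without the MAX/GROUP BY the table would join twice per column
INSERT INTO TABLE_PARAMS VALUES (10, 'comment', 'Daily order facts');
INSERT INTO TABLE_PARAMS VALUES (10, 'transient_lastDdlTime', '1650000000');
""")

QUERY = """
SELECT DBS.NAME, TBLS.TBL_NAME, TBL_COMMENTS.TBL_COMMENT,
       COLUMNS_V2.COLUMN_NAME, COLUMNS_V2.TYPE_NAME
FROM DBS
JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN (
    SELECT TBL_ID, MAX(TBL_COMMENT) AS TBL_COMMENT
    FROM (
        SELECT TBLS.TBL_ID AS TBL_ID,
               CASE WHEN TABLE_PARAMS.PARAM_KEY = 'comment'
                    THEN TABLE_PARAMS.PARAM_VALUE ELSE '' END AS TBL_COMMENT
        FROM TBLS JOIN TABLE_PARAMS ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID
    ) TBL_COMMENTS_INTERNAL
    GROUP BY TBL_ID
) TBL_COMMENTS ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID
"""

rows = con.execute(QUERY).fetchall()
# expect exactly one row per column, each carrying the table comment
print(rows)
```

Against a real metastore the same SQL runs on the backing MySQL/PostgreSQL database; only the mock tables here are invented.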
Target metadata fields:
TABLE_SCHEMA
TABLE_NAME
TABLE_DESCRIPTION
COLUMN_NAME
COLUMN_LENGTH (not a separate metastore column; parsed from TYPE_NAME)
COLUMN_PRECISION (parsed from TYPE_NAME)
COLUMN_SCALE (parsed from TYPE_NAME)
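The metastore does not store length, precision, and scale separately; they are embedded in TYPE_NAME strings such as varchar(50) or decimal(10,2). A small sketch (the function name and return shape are my own, not from these notes) to split them out:

```python
import re

def parse_type_name(type_name):
    """Split a Hive TYPE_NAME such as 'varchar(50)' or 'decimal(10,2)'
    into (base_type, length, precision, scale)."""
    m = re.match(r"^(\w+)\((\d+)(?:,(\d+))?\)$", type_name.strip())
    if not m:
        # no parenthesised arguments, e.g. 'bigint', 'string'
        return type_name.strip(), None, None, None
    base, first, second = m.group(1), int(m.group(2)), m.group(3)
    if second is not None:            # decimal(precision, scale)
        return base, None, first, int(second)
    if base in ("char", "varchar"):   # char/varchar carry a length
        return base, first, None, None
    return base, None, first, None    # e.g. decimal(10) -> precision only

print(parse_type_name("varchar(50)"))    # ('varchar', 50, None, None)
print(parse_type_name("decimal(10,2)"))  # ('decimal', None, 10, 2)
```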
ADWL01 -- DONE
AAC -- DONE
ZMT -- DONE
ABKL01 -- DONE
https://stackoverflow.com/questions/49694901/synchronize-data-lake-with-the-deleted-record
Limitation in Hadoop:
A standing constraint when building a data lake on Hadoop is that you cannot simply update or delete individual records.
One approach you can try: when you add a lastModifiedDate column, also add a status column. When a record is deleted, mark its status as Deleted.
The next time you query for the latest active records, you can filter the deleted ones out.
Alternatively, you can use Cassandra or HBase (any NoSQL database) if you perform ACID operations on a daily basis.
If not, the first approach is the better choice for a data lake on Hadoop.
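The status/lastModifiedDate approach can be sketched in plain Python (field names like id, lastModifiedDate, and status are illustrative assumptions, standing in for what would normally be a Hive or Spark query over the append-only files):

```python
from datetime import date

# Append-only change log: every write appends a new version of the record.
log = [
    {"id": 1, "value": "a",  "lastModifiedDate": date(2024, 1, 1), "status": "Active"},
    {"id": 2, "value": "b",  "lastModifiedDate": date(2024, 1, 1), "status": "Active"},
    {"id": 1, "value": "a2", "lastModifiedDate": date(2024, 2, 1), "status": "Active"},
    {"id": 2, "value": "b",  "lastModifiedDate": date(2024, 3, 1), "status": "Deleted"},
]

def latest_active(records):
    """Keep the newest version of each id, then drop soft-deleted rows."""
    latest = {}
    for rec in records:
        cur = latest.get(rec["id"])
        if cur is None or rec["lastModifiedDate"] > cur["lastModifiedDate"]:
            latest[rec["id"]] = rec
    return [r for r in latest.values() if r["status"] != "Deleted"]

print(latest_active(log))  # only id 1 survives, in its newest version "a2"
```

Nothing is ever updated or deleted in place; reads reconstruct the current state, which is exactly the constraint the note describes.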
https://www.kingswaysoft.com/products/ssis-integration-toolkit-for-microsoft-dynamics-365
Things to assess before you build a data lake:
How many data sources, and the nature of the data loads (relational, or various file formats)?
Size of the data.
One-time load vs. incremental load.
Is semi-structured and unstructured data (social media feeds, log files, etc.) available?
Many data streams vs. a single source.
An advanced file system that can process data over a large cluster of commodity
hardware.
Hadoop was/is an Apache project providing a framework for processing distributed
data, built on a storage abstraction (HDFS)
and a processing abstraction (MapReduce) -- in practice, use Spark for the processing layer.
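The processing abstraction can be illustrated with a word count, the canonical MapReduce example (a plain-Python sketch of the map / shuffle / reduce phases, not Hadoop API code):

```python
from collections import defaultdict

def map_phase(lines):
    # map: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # shuffle: group values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big cluster"])))
print(counts)  # {'big': 2, 'data': 1, 'cluster': 1}
```

Spark exposes the same model (map/reduceByKey) but keeps intermediate data in memory, which is why it has largely replaced classic MapReduce.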
In banking terms, only the data of value ends up in the data warehouse via ETL
processes.
What this means is that you extract the needed data into a staging area
(in relational terms, often staging tables or the so-called global temporary tables),