
Data Engineering at REA Asia

challenges, practices & technology stack

By Ashik M Chowdhury & John Sek


REA Group
Data service team at REA Asia

● Data Engineering Squad: data processing & modelling
● Development Squad: data infrastructure, architecture & pipelines development
● BI Squad: reporting & ETL
● Data Science Squad: machine learning & AI
Tech challenges
● Easily scale across markets
● Large-scale data/log processing from multiple sources
● Real-time event processing, querying & analytics
● Fast reporting for business decisions
● Security & data protection
● Data acquisition and processing
● Low experimentation cost
● Low infrastructure operational effort & fast deployment
Data Acquisition and Processing Architecture
Batch ETL Strategy

Challenges:
● 100+ data pipelines
● 400+ ETL jobs in the legacy system
● Intermittent (on-and-off) ETL failures
Data Structure challenges
● Unifying and standardizing the data structure across different markets
● Unstructured and semi-structured data sources
● Association of data
● Multi-language and alias support for different markets
● Legacy data migration & compatibility/mapping
● Dynamic data fields
Designing Data Structure
● Single, dynamic data structure for Location, Property, POI and Transaction data across all markets
○ The legacy location and property database had 30+ tables and could support only 5 markets, with many limitations
○ The new location and property database has only 9 tables and can handle data for any number of markets without those limitations
● UUIDs for data uniqueness across all markets
● Relational association table, with an association type, for
○ Legacy data mapping
○ Location hierarchy and duplicate data
● Hybrid data structures in the data lake
○ Relational
○ Normalized and denormalized
○ NoSQL
○ Data warehouse: star schema with fact and dimension tables
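
The UUID and association-table ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual schema: the class names, the association type values ("PARENT_OF", "LEGACY_MAPPING") and the legacy identifier are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Association:
    """One row of a generic association table (hypothetical layout)."""
    source_id: str
    target_id: str
    association_type: str  # hypothetical values: "PARENT_OF", "LEGACY_MAPPING", "DUPLICATE_OF"


@dataclass
class AssociationTable:
    rows: list = field(default_factory=list)

    def add(self, source_id: str, target_id: str, association_type: str) -> None:
        self.rows.append(Association(source_id, target_id, association_type))

    def targets(self, source_id: str, association_type: str) -> list:
        """Return every target linked to source_id by the given association type."""
        return [
            r.target_id
            for r in self.rows
            if r.source_id == source_id and r.association_type == association_type
        ]


def new_id() -> str:
    """Random (version 4) UUID, so ids do not collide across markets."""
    return str(uuid.uuid4())


# Two locations, identified only by UUID.
country_id = new_id()
city_id = new_id()

table = AssociationTable()
# Location hierarchy: the country is the parent of the city.
table.add(country_id, city_id, "PARENT_OF")
# Legacy mapping: link the new record to a (hypothetical) legacy primary key.
table.add(city_id, "legacy-location-1024", "LEGACY_MAPPING")
```

Because every relationship is just a typed row, a new kind of association (for example a duplicate link) needs a new type value rather than a new table, which is one way a small, fixed set of tables can serve any number of markets.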
Designing Data Structure (continued)
● Nested data types (RECORD and REPEATED) for
○ Dynamic data fields
○ Multi-language support
○ Alias support
● JSON string metadata field for
○ Structured/semi-structured source data
○ Uncommon data fields
○ Legacy data
● A UTIL_ATTRIBUTE table for all kinds of common attributes/dimensions, distinguished by attribute type
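
As a rough illustration of the nested-type and JSON-metadata points above, the sketch below writes a BigQuery-style schema as plain Python dictionaries and reads a localized name from one row. It assumes a BigQuery-like warehouse (RECORD and REPEATED are BigQuery terms); the field names, the sample row and the `localized_name` helper are hypothetical.

```python
import json

# BigQuery-style JSON schema expressed as Python dicts (field names are hypothetical).
LOCATION_SCHEMA = [
    {"name": "id", "type": "STRING", "mode": "REQUIRED"},
    # REPEATED RECORD: one entry per language, so adding a language needs no new column.
    {"name": "names", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "language", "type": "STRING", "mode": "REQUIRED"},
        {"name": "value", "type": "STRING", "mode": "REQUIRED"},
    ]},
    # REPEATED scalar: any number of aliases per location.
    {"name": "aliases", "type": "STRING", "mode": "REPEATED"},
    # JSON string for uncommon, source-specific or legacy fields.
    {"name": "metadata", "type": "STRING", "mode": "NULLABLE"},
]

# Hypothetical sample row matching the schema above.
row = {
    "id": "00000000-0000-0000-0000-000000000001",
    "names": [
        {"language": "en", "value": "Kuala Lumpur"},
        {"language": "zh", "value": "吉隆坡"},
    ],
    "aliases": ["KL"],
    "metadata": json.dumps({"source": "legacy", "legacy_table": "location_my"}),
}


def localized_name(row: dict, language: str, fallback: str = "en"):
    """Return the name in the requested language, or the fallback language if missing."""
    by_language = {n["language"]: n["value"] for n in row["names"]}
    return by_language.get(language, by_language.get(fallback))


print(localized_name(row, "zh"))
print(localized_name(row, "th"))  # no Thai name stored, falls back to English
```

Keeping rarely used fields in the JSON string avoids schema changes for every market-specific attribute, at the cost of parsing the string at query time.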
Data authentication challenges

● Data authentication for BI users and data analysts
● Role-wise data access
● Dynamically assigning and revoking user access to roles
● GCP and non-GCP user access for BI and the data lake
● Cell-level data filtering based on role
● Single, standard data source for the same kind of dataset
● Fast in-memory data access
● Hidden data authentication layer
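
The role-based requirements above (dynamic assign/revoke, cell-level filtering) can be sketched as a small in-memory policy object. This is a minimal sketch under assumptions, not the architecture the slides describe: the `AccessPolicy` class, the role names, the column names and the sample row are all hypothetical, and cells a user may not see are simply replaced with `None`.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical in-memory policy: roles grant visibility of specific columns."""
    role_columns: dict                                # role -> set of visible column names
    user_roles: dict = field(default_factory=dict)    # user -> set of assigned roles

    def assign(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def revoke(self, user: str, role: str) -> None:
        self.user_roles.get(user, set()).discard(role)

    def visible_columns(self, user: str) -> set:
        """Union of the columns granted by every role the user currently holds."""
        columns = set()
        for role in self.user_roles.get(user, set()):
            columns |= self.role_columns.get(role, set())
        return columns

    def filter_rows(self, user: str, rows: list) -> list:
        """Mask (set to None) every cell the user's roles do not allow."""
        allowed = self.visible_columns(user)
        return [
            {col: (val if col in allowed else None) for col, val in row.items()}
            for row in rows
        ]


# Hypothetical roles and sample data.
policy = AccessPolicy(role_columns={
    "analyst": {"market", "listing_count"},
    "finance": {"market", "revenue"},
})
rows = [{"market": "MY", "listing_count": 120, "revenue": 4500.0}]

policy.assign("alice", "analyst")
analyst_view = policy.filter_rows("alice", rows)   # revenue is masked

policy.revoke("alice", "analyst")
revoked_view = policy.filter_rows("alice", rows)   # every cell is masked
```

Because the filter sits between the caller and the rows, BI tools and analysts can query one standard dataset while the policy decides what each role sees, which is one reading of the "hidden authentication layer" bullet.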
Data Authentication Architecture
Thanks!!!

● Ashik M Chowdhury
○ Email: ashik.m.shuvo@gmail.com , ashik.chowdury@rea-group.com
○ WhatsApp: +60164521252
○ LinkedIn: https://www.linkedin.com/in/ashik-m-shuvo/
● John Sek
○ Email: john.sek@rea-group.com
○ WhatsApp: +60162198172
○ LinkedIn: https://my.linkedin.com/in/sek-jia-sheng-b3a35543
