This document outlines an agenda for a training on big data analytics. It discusses scaling data by parallelizing processes across multiple servers and distributing tasks. It also describes using MapReduce patterns to split work into mapping and reducing functions. Finally, the document outlines exercises that will have trainees access regional data warehouses to retrieve statistics, aggregate results with parallel calls, monitor outputs with dashboards, and simulate issues to study results.
Sales Engineers

Big Data Analytics Overview
- How big is big?
- Response time requirements
- Scalability requirements
- Budget

Big Data By The Numbers
- Data load limit: ~400 MB/sec (commodity server)
- 3 terabyte data load: $180 hard drive, ~7860 sec (~2.2 hrs)
- 1 exabyte: $63 million, 87 years

Big Data Analytics
- The key to Big Data Analytics: PARALLELIZE! (if you want a quick result, that is)

Agenda
- Big Data Analytics Overview
- Parallelization
- Academy Model
- Exercises

Domain Decomposition
- Splitting data over multiple servers
- Domain or functional decomposition
- Academy will concentrate on domain model
- Partitioning
  - By design: Search engine
  - By evolution: Corporate acquisitions (Our example!)

Considerations
- Minimize communication
- Compare to ECP
- Server architecture
- High availability requirements
- Optimal number of threads
- Task distribution

MapReduce Pattern
- Split and delegate task - Map
- Aggregate partial results - Reduce
- Result has same format
- 1 to N
- Aggregation should not be bottleneck

MapReduce Questions
- How much setup is required? Despite what you may read about other technologies, development work is necessary for all implementations.
- Do I need to install additional software? No!
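The MapReduce pattern above (split and delegate the task, then aggregate partial results that share one format) can be illustrated with a minimal Python sketch. This is not InterSystems code: the partition names, the sample rows, and the functions `map_partition`, `merge`, and `map_reduce` are all hypothetical, and threads stand in for separate servers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical partitions: each "server" holds its own slice of (key, units) rows.
PARTITIONS = {
    "server-a": [("apples", 3), ("pears", 1), ("apples", 2)],
    "server-b": [("pears", 4), ("plums", 5)],
    "server-c": [("apples", 1), ("plums", 2)],
}


def map_partition(rows):
    """Map step: summarise one partition locally into {key: total}."""
    partial = {}
    for key, units in rows:
        partial[key] = partial.get(key, 0) + units
    return partial


def merge(left, right):
    """Reduce step: combine two partial results into one of the same format."""
    merged = dict(left)
    for key, units in right.items():
        merged[key] = merged.get(key, 0) + units
    return merged


def map_reduce(partitions):
    # Delegate one map task per partition (1 to N) and run the tasks in parallel.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(map_partition, partitions.values()))
    # Aggregation only touches the small partial results, never the raw rows,
    # which helps keep the reduce step from becoming the bottleneck.
    return reduce(merge, partials, {})


totals = map_reduce(PARTITIONS)
print(totals)
```

Because `merge` returns the same dictionary shape that `map_partition` produces, partial results can be combined in any order or in several stages.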
Academy Scenario
- Multiple web shops
- Regional warehouses: Europe, Asia, Americas
- Big Web Shop: outsources all orders to web shops

The Problem
- Big shop category managers want to know about any of their products being frequently out-of-stock
- Measure of unhappiness: product is out of stock at time of order too often
- Product will still be delivered but might be late

Initial Infrastructure
- Web shop simulator: business service
- Big web shop order distribution: business process and business operation
- Warehouses: data model and pivot table, web service

HoleFoods Web Shop

HoleFoods Data Model
[Diagram: classes Outlet, Country, Region, Product, and Transaction, with fields including Population, City, Name, Type, Actual, Date Of Sale, Channel, AmountOfSale, Units Sold, InStock, Category, Price, SKU]

DeepSee Data Model
- Cubes: define dimensions and measures
- Subject Areas: views on cubes; provide automatic filtering
- KPIs: make more sophisticated computations available to dashboards; can make use of DeepSee, SQL, or custom logic

DeepSee Performance and Scalability
- Multi-level, incremental caching to support large data models (100M+ facts)
- Support for parallel execution of queries to exploit multi-core architectures:
  - Queries are split by # of facts
  - Queries are split by # of cells
  - Subqueries and joins
- Logic for updates to Data Model is streamlined

Academy setup
- Four Ensemble instances: BigData, Asia, Europe, Americas
- Order Distributor
- Web shop Simulator

Exercise 1
In this exercise you will familiarize yourself with a regional warehouse (DeepSee) and use the web shop simulator.

MDX
- MDX (MultiDimensional eXpressions): standard query language for OLAP (online analytical processing)
- Provides standard syntax to execute queries against a cube
- When you create a pivot table, DeepSee generates and uses an MDX query, which you can view directly
- Analyzer provides an option for directly running MDX queries
- You can run MDX queries in the DeepSee shell
- DeepSee provides an API that you can use to run MDX queries on your DeepSee cubes

MDX Example
SELECT NON EMPTY [OUTLET].%TOPMEMBERS ON 0,NON EMPTY [CHANNEL].%TOPMEMBERS ON 1 FROM [SALES] WHERE [MEASURES].[AMOUNT SOLD]
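As a rough illustration of what a two-axis query like this returns, the Python sketch below builds the same kind of cross-tab by hand: outlet members on axis 0 (columns), channel members on axis 1 (rows), and the amount sold summed in each cell. The fact rows and the `cross_tab` helper are invented for illustration, and the sketch assumes the measure aggregates by summation; it does not call DeepSee.

```python
from collections import defaultdict

# Invented fact rows: (outlet member, channel member, amount sold).
FACTS = [
    ("Europe", "Online", 120.0),
    ("Europe", "Retail", 80.0),
    ("Asia", "Online", 50.0),
    ("Europe", "Online", 30.0),
]


def cross_tab(facts):
    """Sum the measure per (channel, outlet) cell, like a two-axis pivot."""
    cells = defaultdict(float)
    for outlet, channel, amount in facts:
        cells[(channel, outlet)] += amount
    # Only members that occur in at least one fact appear on an axis,
    # which mirrors the effect of NON EMPTY.
    columns = sorted({outlet for _, outlet in cells})   # axis 0
    rows = sorted({channel for channel, _ in cells})    # axis 1
    return columns, rows, dict(cells)


columns, rows, cells = cross_tab(FACTS)
print("Channel", *columns, sep="\t")
for channel in rows:
    # A missing cell means no facts exist for that combination.
    values = [cells.get((channel, outlet), "") for outlet in columns]
    print(channel, *values, sep="\t")
```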
Exercise 2
In this exercise you will access your warehouse analytics programmatically, using MDX, and publish the results as a web service.

How to parallelize dynamically
- Ens.CallStructure
  - Holds a request object and a target name
  - Also has a slot for the response
- Ens.Host.SendRequestSyncMultiple
  - Accepts a list of Ens.CallStructure
  - Makes calls in parallel
  - Adds response objects to Ens.CallStructure

    // Build one call structure per target
    set tCall = ##class(Ens.CallStructure).%New()
    set tCall.TargetDispatchName = "MyBusinessHostClass"
    set tCall.Request = ##class(MyRequestClass).%New()

    // Append it to the request list; the top node holds the count
    set tRequestList = $get(tRequestList) + 1
    set tRequestList(tRequestList) = tCall

    // Pass the list by reference; all calls are made in parallel
    set tSC = ..SendRequestSyncMultiple(.tRequestList)

Exercise 3
In this exercise you will retrieve statistics from the relevant regional warehouses, using parallel calls.

Dashboards
- Widgets

Exercise 4
In this exercise you will aggregate the results from Exercise 3 and monitor the aggregated results using a dashboard.

Warehouse problem simulator

Business Rule
- Creates decision point in business process
- Change at runtime

Exercise 5
In this exercise you will force a product category to be out-of-stock and watch the results deteriorate.

Conclusion
- With InterSystems technology: when does big data become big data?
- When distributing data:
  - DeepSee (and perhaps iKnow) on the nodes
  - ECP useful for maintaining code

Questions?

Thank you

Developer Connection
developer.intersystems.com
Your Global Summit Every Day

We want your feedback
We'd love your feedback on the academy you just attended. Go to: intersystems.com/survey
Select the date, time, and academy you attended and complete the short evaluation form.
Thank you

Big Data Analytics
Otto Medin & Louise Parberry
Sales Engineers