
openSAP

Modern Data Warehousing with SAP BW/4HANA


Week 3 Unit 1

00:00:09 Hello and welcome to week three, Modern Trends in Data Management,
00:00:12 Unit one, Overview. In this week we are basically going to talk about
00:00:16 things like data lakes and big data, machine learning and how all this relates
00:00:20 with the data warehouse. So let's have a look at the agenda for today.
00:00:25 First we will talk about the combination of big data and enterprise data
00:00:31 and what this looks like from a technical perspective but also from a real-world example
perspective.
00:00:36 So we will give you some real-world examples which we see with our customers.
00:00:40 We will also talk about the challenges which you have in bringing these two data assets
together,
00:00:44 or these two types of data assets together, and we will show you how you can overcome
00:00:48 these challenges with SAP Data Hub. So what do modern data warehouse landscapes look like?
00:00:57 On this picture you can see we have a data warehouse on the right side and of course
00:01:01 a data lake on the other side. And in this unit we will talk about
00:01:04 the interoperability between both worlds. So the data warehouse world is of course
00:01:11 more or less dominated by structured data coming from different sources
00:01:15 like ERP systems, CRM systems, HR systems. It's more or less a standardized data model,
00:01:22 we have harmonized data or the data are already harmonized.
00:01:27 And on the other side we have a typical data lake architecture
00:01:31 with a massive amount of, let's say, raw data, without any structure,
00:01:37 data is more or less unstructured, we have non-relational data
00:01:41 and we load the data from different sources into this data lake.
00:01:46 Typical data for this data lake are sensor data, web streams, social media data,
00:01:53 data from any devices, mass data in the end.
00:01:57 And of course what you can do or what is possible as well is
00:02:00 to use the data warehouse and archive data into a data lake scenario
00:02:09 with the possibility to report on that data. So this means in that example
00:02:14 we bring structured data from a data warehouse into a data lake, and reporting on it is of course still possible.
00:02:21 It's not just reporting actually it's also doing other processes with this historic data.
00:02:25 Like using it for example for machine learning, that's also one of the big and interesting points

00:02:31 about the data lake, that we have this archive,
00:02:33 and we can grow this archive at very moderate costs but you can still get valuable information
00:02:40 out of this data lake. Exactly.
00:02:41 I mean reporting is one example, but it's also all the other stuff
00:02:44 you can imagine on the data lake side. So we really see that integration in both directions
00:02:49 is key at least on the picture here. Now let's look at a couple of examples
00:02:53 which really illustrate the interoperability and exchange
00:02:59 of data in these two directions. The first example which we have here is
00:03:04 basically an example of extending the data warehouse to unstructured data, right?
00:03:09 So what we see here on the right-hand side in the example is basically,
00:03:13 as we said, a real-world customer example:
00:03:18 in this scenario the customer is basically collecting data from a variety of social media sources

00:03:24 or web sources. And we've listed a couple of them here,
00:03:28 I think in the real example we were talking about like 30 to 35 different sources, right.
00:03:34 So a huge variety of sources with totally different formats, but everything is first collected in a
cloud storage,
00:03:42 just to have a place to store this massive amount of data. And then of course the question is
00:03:47 how do you combine this data, how do you bring this data to a kind of unified format,
00:03:53 and how do you then, the social media data which you probably use
00:03:57 for things like sentiment analysis and stuff in that area,
00:04:00 how do you bring this together with the data from your enterprise data warehouse
00:04:03 to see how the sentiment actually correlates with, for example, your sales?
00:04:07 Right? And you see this in this picture quite nicely
00:04:12 that after putting the data in the cloud storage you actually use Hadoop technologies,
00:04:17 HDFS technologies, big data technologies of any sort
00:04:20 to basically bring these multiple data assets with completely different structures
00:04:26 into a harmonized format. And then basically at some point
00:04:30 you either transfer it into the data warehouse or you access it from the data warehouse in a
virtual way.
00:04:35 So it's really a scenario which basically extends the idea of the data warehouse
00:04:39 of harmonizing data; that's basically what you do here on the big data side as well. So it really extends the reach of the data warehouse
00:04:47 to unstructured sources. That's what we sometimes call
00:04:50 big data warehousing basically. Now the point which you see here is of course
00:04:55 you need this interoperability and the exchange of data,
00:04:57 the provisioning of data from the big data side to the data warehouse.
00:05:04 And of course you also need some sort of orchestration around it
00:05:07 if you really want to load the data from the Hadoop side,
00:05:12 the refined data on the top layer in this picture on the blue side,
00:05:17 if you really want to bring this to the data warehouse in a scheduled manner then of course
00:05:20 you need tools which allow you to schedule this and monitor this,
00:05:24 also monitor ideally the whole process from the cloud storage to the load into the data
warehouse
00:05:31 and so on. So that's basically one of the typical examples
00:05:34 and as I said, based on the real-world scenario which we see
00:05:38 with one of our customers.
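To make the harmonization step a bit more concrete, here is a minimal PySpark sketch of the idea: raw exports from two hypothetical social media sources are read from cloud storage, mapped to one common structure, and written to a refined zone that the data warehouse can then load or access virtually. All paths, column names, and the parquet target are assumptions for illustration only.

```python
# Minimal PySpark sketch: unify raw social media exports into one harmonized format.
# Paths, column names, and the parquet target are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("harmonize_social_media").getOrCreate()

# Source 1: e.g. a Twitter-like export with nested JSON
src_a = (spark.read.json("s3a://raw-zone/source_a/*.json")
         .select(F.col("id").alias("post_id"),
                 F.col("user.screen_name").alias("author"),
                 F.col("text").alias("message"),
                 F.to_date("created_at").alias("post_date"))
         .withColumn("source", F.lit("source_a")))

# Source 2: e.g. a flat CSV export from another channel
src_b = (spark.read.option("header", True).csv("s3a://raw-zone/source_b/*.csv")
         .select(F.col("postId").alias("post_id"),
                 F.col("userName").alias("author"),
                 F.col("body").alias("message"),
                 F.to_date("postedOn").alias("post_date"))
         .withColumn("source", F.lit("source_b")))

# One harmonized structure for all sources; the warehouse can load this
# via its HDFS/cloud interfaces or access it virtually.
harmonized = src_a.unionByName(src_b)
harmonized.write.mode("overwrite").parquet("s3a://refined-zone/social_media_harmonized/")
```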
00:05:43 Another example goes in the other direction: bringing the data from the data warehouse into the Hadoop world.
00:05:47 This is what we can see on that picture here. So in the middle we have a typical data lake
00:05:51 architecture, based on Hadoop or some other cloud-based data store.
00:05:56 And we load the data, machine data, typically via streams into the data lake.
00:06:03 And then a typical use case is that the data scientists are accessing the data and running
00:06:12 or using some tools. You see them on the right-hand side.
00:06:15 Yeah on the right side we have a couple of tools as examples

00:06:18 like TensorFlow, R, OpenCV, to run these tools based on the data
00:06:24 on the unstructured data in the data lake to get a result.
00:06:27 But on the other side it's of course needed to integrate structured data
00:06:33 from an enterprise data warehouse. And now the question is how we can combine
00:06:38 the enterprise data warehouse or how we can access the enterprise data warehouse data
00:06:44 from within the data lake tool-based... ...framework.
00:06:51 Framework here. So that's why
00:06:53 The way it's working. Exactly.
00:06:54 The question is how to provide consistent data of course and then when you have a result then

00:06:59 the question is how we can productize this result and bring the result back into
00:07:07 a structured analysis maybe or like we did it on the BW side.
00:07:13 And maybe we can enrich master data here, we can generate some other actions
00:07:19 It's also always about closing the loop in this scenario, so basically going from the insight
00:07:24 which you get in the data lake here to an action for the business user.
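As a small illustration of this second direction, the following pandas sketch shows how a data scientist might combine an extract of structured warehouse data with sensor readings from the data lake before handing the result to a training tool. The file names and columns are purely hypothetical; in a productized setup the warehouse side would be accessed through the BW/4HANA interfaces rather than a CSV export.

```python
# Sketch: join structured warehouse data with raw sensor data from the lake.
# File locations and column names are hypothetical.
import pandas as pd

# Structured data, e.g. product attributes exported from the warehouse
warehouse_df = pd.read_csv("warehouse_extract_products.csv")        # columns: product_id, product_group, plant

# Semi-structured machine data landed in the data lake
sensor_df = pd.read_json("lake/sensor_readings.json", lines=True)   # columns: product_id, temperature, vibration, ts

# Enrich the sensor readings with the business context from the warehouse
training_df = sensor_df.merge(warehouse_df, on="product_id", how="left")

# From here the data scientist would hand training_df to TensorFlow, R, or any other tool,
# and the resulting scores would eventually be written back for structured analysis in BW.
print(training_df.head())
```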
00:07:30 So in both of these examples you have seen that as long as this is like a one-off effort which
you do once
00:07:36 you probably don't need a lot of tools, right? I mean bringing the data technically
00:07:40 from the Hadoop side into your data warehouse, we have interfaces.
00:07:45 For the other way around we also have interfaces. Like you can use any type of tool,
00:07:48 there's a lot of that out of the box with BW/4HANA and for example HDFS,
00:07:53 there's a multitude of possibilities for this. But the real challenge actually comes from the
situation
00:07:59 when such a scenario gets productized as Gordon just mentioned.
00:08:03 So when you basically turn this into something which is repeatable,
00:08:06 which is part of a large process which is constantly running to give results
00:08:11 and which business relies on to actually do their conclusions and come to actions.
00:08:17 And this picture actually shows nicely how different the two worlds are from a tool perspective.

00:08:24 On the left-hand side in the data lake we have a zoo of tools,
00:08:28 a huge variety of different tools, there are multiple possible cloud storages
00:08:33 and vendors who basically have offerings here. When it comes to the transformation part,
00:08:39 basically everybody uses the kind of tool they like. It could be Spark,
00:08:42 it could be Python, it could be anything else, right?
00:08:46 There's a multitude of different possibilities here. So it's really a lot of variation here,
00:08:55 a lot of fluctuation because new tools come up and new people come up who use other tools,
00:09:00 and you have to be ready for everything here. So it's a very, very flexible world.
00:09:06 And that's one of the core strengths of this. On the other hand we have the data warehouse
00:09:09 where things are pretty much standardized and we are used to automation and
00:09:13 putting standards on top. So how do you basically bring all this together
00:09:18 if you have a process which starts on the left-hand side and ends on the right-hand side?
00:09:23 And that's the key point where SAP positions and introduces a solution called SAP Data Hub.
00:09:30 Yep, that's the Data Hub.
00:09:33 The idea is that on the one side we have, of course, our enterprise systems
00:09:39 as Uli already mentioned. We have our data warehouses
00:09:43 and on the other side we have this data lake architecture. And what we definitely need is the
interoperability

00:09:51 between both worlds. And now the question is who's doing this interoperability?
00:09:56 As you already mentioned we have APIs for that, we have some other tools,
00:10:00 but what is the umbrella term here? And the umbrella term in that case
00:10:06 is the SAP Data Hub. So this means the Data Hub is able to connect
00:10:09 all these different sources under one umbrella. What the Data Hub is able to do is
00:10:17 organize and manage all the data sets, so this means the Data Hub can more or less
00:10:24 read or access the metadata here, and is the owner of the metadata.
00:10:29 So you really can look into each of the connected systems and get a full overview
00:10:33 of what data assets you have in the systems. Like if you connect the BW/4HANA system for
example,
00:10:38 you can get a list of all the InfoProviders, all the queries which you potentially have
00:10:42 and which you can potentially leverage in such a scenario,
00:10:46 to exchange the data with any of the other sources or systems connected to the Data Hub.
00:10:51 Yeah. That's the metadata governance part
00:10:53 and data discovery part. And you can of course orchestrate and monitor
00:10:58 the data processes here. So what's interesting to understand is
00:11:02 that there's no need to rebuild all the logic from the data lake or the data warehouse
00:11:09 once again in the Data Hub. That is not the case here: the Data Hub executes the different tools
00:11:19 or tool sets, in the end the existing assets.
00:11:23 So this means the Data Hub runs the Python scripts in the Hadoop world
00:11:28 and runs process chains on BW/4HANA. The Data Hub can also move data,
00:11:34 but this is not needed in some cases, or not necessary for some of the processes;
00:11:40 the Data Hub owns the logic to run these different tasks here.
00:11:46 That's the idea here that the Data Hub acts more or less as an orchestration tool
00:11:51 and monitoring tool. So maybe coming back to the example
00:11:53 which we just saw on the previous slide with the zoo of tools
00:11:57 like we had some sort of cloud storage with certain interfaces to the HDFS world,
00:12:04 then all the various programming models which you can use on the HDFS side,
00:12:09 you can use Spark, you can use Python, you can use anything else.
00:12:12 If you already have these individual process steps in place,
00:12:16 as Gordon said with Data Hub it's not necessary to re-implement them using some Data Hub
functionality
00:12:22 but you can really use these prebuilt assets, these prebuilt functionalities,
00:12:26 and just use Data Hub to orchestrate them. So basically to put them in the right order
00:12:30 as you for example on the BW side know from a process chain, right?
00:12:34 But you can really leverage all the assets which you already have on the data lake side
00:12:38 and on the data warehouse side with, for example, your process chains,
00:12:43 and bring these under the umbrella of the Data Hub and put them in the right order to, for
example,
00:12:46 orchestrate a scenario as we saw it with the social media example in one of the first slides
00:12:52 so that basically all the... from the ingestion to the transformations on the data lake side,
00:12:59 everything is basically orchestrated and monitored within the Data Hub up to the delivery
00:13:06 of the data to the data warehouse. And that's the strong point of it.
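Conceptually, this orchestration boils down to a small control flow that simply calls the assets which already exist on both sides, in the right order, and stops with an alert if a step fails. The sketch below is not the SAP Data Hub API; run_spark_harmonization and run_bw_process_chain are hypothetical stand-ins for the existing Spark job and the existing BW/4HANA process chain that such a pipeline would trigger and monitor.

```python
# Conceptual sketch only: orchestrate existing assets instead of rebuilding them.
# The two helper functions are hypothetical stand-ins for real interfaces
# (e.g. submitting the existing Spark job, starting the existing BW process chain).
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_spark_harmonization() -> bool:
    """Placeholder: submit the already existing Spark/Python harmonization job."""
    log.info("Triggering harmonization job on the data lake side ...")
    return True  # would return the real job status

def run_bw_process_chain(chain_id: str) -> bool:
    """Placeholder: start the already existing BW/4HANA process chain."""
    log.info("Starting BW process chain %s ...", chain_id)
    return True  # would return the real chain status

def nightly_pipeline() -> None:
    # Step 1: refine the raw cloud-storage data with the existing big data assets
    if not run_spark_harmonization():
        raise RuntimeError("Harmonization failed - stop and alert, do not load the warehouse")
    # Step 2: only then load the refined data into the data warehouse
    if not run_bw_process_chain("ZSOCIAL_LOAD"):
        raise RuntimeError("BW load failed")
    log.info("End-to-end run finished; both steps monitored from one place")

if __name__ == "__main__":
    nightly_pipeline()
```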
00:13:10 So it's basically giving you an overview of all your data assets
00:13:14 and all your data processes. You can create these data processes

00:13:18 but you don't have to reinvent everything. And do you have to get, for example, the data scientists used
00:13:24 to a new environment? No, they can continue to work
00:13:27 with what they're used to. So what are the key takeaways of this lesson
00:13:32 or this unit? First of all we see that it's an important trend
00:13:36 and it's growing in importance to bring enterprise data and big data together.
00:13:42 In the examples like we showed here but there's of course a plenitude of other examples.
00:13:49 One of the key challenges to scalability if you want to productize such scenarios
00:13:54 is the lack of standardization of the data lake. On the other hand, the lack of standardization,
00:13:58 and the flexibility which comes with it,
00:14:01 is also one of the strong points of the data lake. So to overcome these challenges,
00:14:08 we actually position SAP Data Hub because it allows you to leverage
00:14:11 what you have already and combine it in new ways
00:14:15 and in overarching umbrella ways with your whole data landscape.
00:14:21 And with that, it's time for your self-test.

Week 3 Unit 2

00:00:08 Hello and welcome to week three, unit two - SAP Data Hub overview.
00:00:14 What will we cover today? So from a content perspective first,
00:00:19 we would like to highlight the challenges we have right now in data landscapes.
00:00:24 And afterwards getting an insight into SAP Data Hub
00:00:28 and how it can organize your landscape. Let's have a look at the overview about landscapes.
00:00:35 Usually it's growing and growing. You know this, you started with an SAP system,
00:00:41 you have your ERP and maybe some other components
00:00:44 plus over the years, there are some more systems
00:00:47 and data storage coming. You have the cloud storage somewhere from Azure,
00:00:52 you have it from the Google Cloud platform, or maybe from AWS, or even a homegrown
Hadoop environment,
00:00:58 as well, with new connected third-party systems, data is everywhere. And now the key
challenge is bringing this data together
00:01:05 because when you can bring the data together, you can derive from this
00:01:09 information you can use for taking decisions and this is, at the end of the day, what you want.

00:01:14 You would like to get a better insight based on all your data
00:01:17 to make better decisions, especially when it comes
00:01:21 to getting a competitive advantage. So when we started speaking with customers,
00:01:27 there are several kinds of challenges they faced when we have big data in the environment,
00:01:34 as well as enterprise data. So first of all there is a missing link between these two areas,
00:01:39 so this could be from an organizational point of view because different departments are owning

00:01:44 the different sets of data. So, for example, marketing is in lead
00:01:48 for all social media related data, which is maybe stored in a data center
00:01:54 run by Google, for example. So they just directly get their
00:01:58 social media data in there and there are some analytic tools on top
00:02:04 who are working with this data. On the other side you have the IT guys running the ERP
systems.
00:02:10 So these two worlds, they don't know maybe each other
00:02:12 or they don't want to work with each other. This is one thing where we would like
00:02:16 to tackle with the SAP Data Hub to bring these two worlds together,
00:02:20 as well, we have a lack of enterprise readiness for Big Data solutions in a lot of companies,
00:02:26 so they have maybe built up a cluster for Big Data, but they don't know how to operate this
00:02:33 and bring this into a format where they, at the end of the day,
00:02:36 can run an end-to-end process daily, and also can ensure that the life cycle management
00:02:42 is done correctly. So this is also where data governance comes into the picture,
00:02:47 how to deal with the data. Last but not least,
00:02:51 what we also see really often, especially if there is a data scientific
00:02:55 footprint in the companies, that they use a lot of tools.
00:02:58 So think about the data scientists, the data scientist has his environment to work.
00:03:02 Different data scientists prefer different tools, really often coming from the open source world.

00:03:07 So we see now that there's a slew of different tools, which leads to a total cost of ownership
00:03:13 which increases as more and more tools come in, and there we would like to streamline a little bit

00:03:19 that we can embed existing coding into an environment which we call SAP Data Hub.
00:03:26 Besides the challenges we just tackled, there are also, from a technical point of view
00:03:33 and from the systems' side, more and more challenges we see,
00:03:37 especially when you would like to bring different key topics together.
00:03:41 On the one side you have the existing system landscape, ERP, BW, and so on,
00:03:46 but nevertheless there are more and more new technologies coming into the picture
00:03:51 when you think about your end-to-end architecture at your company.
00:03:55 So there is Hadoop, you would like to execute Spark
00:03:58 on the one side for mass processing, then you have cloud storage which would be included.
00:04:02 This could be, for example, something from Amazon or the Google Cloud platform or Azure,
00:04:07 as well, machine learning should be adopted. This is a really hot topic right now,
00:04:11 there's Python code, Spark should be executed,
00:04:15 as well as libraries. For example, TensorFlow for image recognition.
00:04:19 But how can this fit together? And last but not least,
00:04:23 one trend we see in our industry right now, is to containerize software
00:04:28 and that is, for example, Docker as the container foundation and as well,
00:04:32 Kubernetes as the infrastructure to deploy and execute and run it.
00:04:37 So these are all key topics which should be included,
00:04:41 which we tackle also with the SAP Data Hub. So when you think now about the SAP Data Hub

00:04:48 and you transform it into the real world, what could it look like?
00:04:51 When you think about the SAP Data Hub in the real world, it could be like a tower at the
airport.
00:04:56 So the tower at the airport knows which plane is coming,
00:05:00 which plane is taking off. They know which passengers are there,
00:05:05 to which gate the plane should go. They know the weather condition.
00:05:09 Based on weather condition, they decide maybe to change the starting of a plane,
00:05:13 or to stop a plane from a start. Things like that.
00:05:17 All this information, and this is really important,
00:05:20 it's not stored by the tower itself, so the tower is not the master of this data.
00:05:25 The tower derives the data from different data pools,
00:05:28 and based on this you take decisions for the whole end-to-end process of an airplane.
00:05:37 And this is similar to the SAP Data Hub. The first thing which is important -
00:05:41 The SAP Data Hub will not persist any data. It should get the information
00:05:45 where the data in the company is and provide this to you
00:05:49 so that you can work with it at the end of the day.
00:05:52 So when we switch now to the technical view of the data hub,
00:05:55 how it looks like in your environment, what we see is we have
00:05:59 on the one side the existing systems, it could be an SAP ERP system,
00:06:03 it could be an SAP S/4HANA system, it could be BW/4HANA and so on.
00:06:07 On the other side, you have the distributed systems for data, mainly Hadoop, a cloud storage,
or environments for machine learning.
00:06:16 And now the point is that the SAP Data Hub would like to bring these two worlds together and
will be a foundation on top
00:06:23 to orchestrate within your whole landscape. When we look into the SAP Data Hub,
00:06:30 there are three main buckets. The first one is the data discovery

00:06:35 and metadata governance piece. This is there that you get
00:06:38 a holistic data view about your landscape. With the data discovery,
00:06:43 you could jump into a connected system. For example, into a Hadoop system.
00:06:48 You can browse the folders, you can see which files are there,
00:06:52 and you can actually do profiling. Profiling means to get a better understanding
00:06:57 about the rows and the records which are part of exactly this file.
00:07:02 So this means you get information about data quality, are there null values existing,
00:07:08 and so on.
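As a small illustration of what such profiling boils down to, the snippet below inspects a file and reports null values and basic statistics per column; the file path and columns are hypothetical, and in the Data Hub this is of course done through the discovery UI rather than hand-written code.

```python
# Sketch: basic profiling of a file, e.g. one picked up while browsing an HDFS folder.
# Path and columns are hypothetical.
import pandas as pd

df = pd.read_csv("hdfs_export/customer_events.csv")

print(df.shape)            # number of rows and columns
print(df.dtypes)           # data types per column
print(df.isnull().sum())   # null values per column - a first data quality indicator
print(df.describe())       # basic statistics for the numeric columns
```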
00:07:12 When you would like to bring this into the Data Hub, this is called cataloging.
00:07:15 So what we would like to achieve is that we have a metadata catalog
00:07:19 in the Data Hub itself, deriving all metadata
00:07:22 from all connected systems and at the end of the day
00:07:25 you should be able to work with this, so you get a fundamental understanding
00:07:29 what is going on in the different systems, which data is there,
00:07:32 and so on. Metadata information is the only piece
00:07:35 we persist on the SAP Data Hub. In the next stage,
00:07:41 we would like to use exactly this metadata information to build applications.
00:07:45 So this is the future: to combine the cataloging
00:07:49 with a second bucket, the orchestration and pipelining.
00:07:52 So this is the link between these two different pieces.
00:07:56 The Data Hub can orchestrate different systems remotely, for example, communicate with a BW system via its services.
00:08:03 As well, you can build new applications. So this is done with the pipeline modeler we have in
place.
00:08:08 Within a pipeline modeler, you can define end-to-end work processes,
00:08:12 collecting data from a Hadoop system, manipulating the data, writing it into SAP Vora,
00:08:18 and based on this, it goes back to Hadoop and so on. Besides that,
00:08:22 you can also add operators for machine learning, so this could be predefined operators
00:08:26 which are delivered, as well as operators you have written yourself,
00:08:30 and this is really more like a platform concept, where you can bring in whatever code
00:08:34 you need to execute, wrap it as a Docker container,
00:08:37 and execute it in an end-to-end pipeline. To get all these pieces running
00:08:43 of course we also need a strategy for connectivity, integration, and ingestion.
00:08:48 So there we have the possibilities to derive data from several sources,
00:08:51 as well as to define connectivity. And there we have a list of connections available,
00:08:55 but this will improve more and more. If, for example,
00:09:00 the connectivity which is needed for your special scenario is not there,
00:09:04 for example, consuming from MongoDB, it's not an issue.
00:09:08 You would just write your own operator with connectivity to MongoDB
00:09:13 and you can derive the information you need for your special end-to-end processes.
00:09:20 Now a quick look from a deployment perspective.
00:09:23 So if you would like to install the Data Hub, it will run in a Kubernetes cluster.
00:09:28 So this is the first thing which is important. The Kubernetes cluster can be operated
00:09:33 by third-party vendors, so if you have an Amazon, Azure, or Google account
00:09:40 you would like to deploy to, they have Kubernetes services.
00:09:43 You deploy this, you start the service, and on top there will be then

00:09:47 the SAP Data Hub deployed. You can also have a homegrown Kubernetes
00:09:53 or a private offering from companies like Virtustream or Cisco.
00:09:57 There you would, at the end of the day, deploy the data hub.
00:10:00 Everything within the data hub is containerized, so everything can easily be deployed.
00:10:05 As a database within the Data Hub, to store,
00:10:09 for example, our internal data plus the meta-information, we have a HANA environment,
00:10:14 and this HANA environment is containerized as well, so this is the way
00:10:19 how you would work with it. Within this picture here,
00:10:21 you also see the different components we have there. For example, metadata management,
00:10:26 self-service data preparation, pipelining, and so on.
00:10:29 Everything will be executable on the Kubernetes instance
00:10:34 and run independently. Now a little bit more of the meat
00:10:41 we have there with the metadata. So metadata is one really important aspect
00:10:46 where we put a lot of effort in, because we believe this should be
00:10:51 the single environment for all the metadata in your company.
00:10:54 So this means we connect the SAP world. We have four storage providers,
00:10:59 also the possibility to get the data. In the future there should also be APIs
00:11:04 for other repositories to join there as well. When we have all the data in, in the future
00:11:10 we would also like to apply machine learning algorithms there to make it more intuitive and also to give you the possibility
00:11:17 to get greater insights. So how great could it be
00:11:21 if you get, for example, a business partner from the SAP world
00:11:24 and the system automatically tells you there is also meta-information
00:11:28 which could belong to a business partner, but stored in a third-party system like HDFS.
00:11:33 So now we could combine this and get a more sophisticated view
00:11:37 on all this meta-information. Beside that, to create new applications,
00:11:44 we use our pipeline modeler. This is what we also see here on the screens,
00:11:49 so we have one modeling environment for workflows, for structured transforms. Structured transforms are, for example, a join condition
00:11:57 between two different sources: one file is in HDFS, the other is, for example, in HANA,
00:12:03 you can join them, you can bring this together
00:12:06 and write it into a totally different target as well. So this could also be a queue like Kafka and so on,
00:12:12 depending on your needs actually. So this is an environment for modeling in the platform.
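The following is a minimal PySpark sketch of the data flow behind such a structured transform: one input read from a file in HDFS, one read from HANA over JDBC, joined, and written to a Kafka topic. Host names, credentials, table and topic names are hypothetical, and inside SAP Data Hub this would be modeled graphically with operators rather than coded by hand.

```python
# Sketch of a structured transform: HDFS file + HANA table -> join -> Kafka topic.
# Host names, credentials, table and topic names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("structured_transform").getOrCreate()

# Input 1: a CSV file stored in HDFS
orders = spark.read.option("header", True).csv("hdfs:///data/orders.csv")

# Input 2: a table read from HANA via the generic JDBC source
customers = (spark.read.format("jdbc")
             .option("url", "jdbc:sap://hana-host:30015")
             .option("driver", "com.sap.db.jdbc.Driver")
             .option("dbtable", "SALES.CUSTOMERS")
             .option("user", "TECH_USER").option("password", "***")
             .load())

# Join the two sources and serialize each row as JSON for the message value
joined = orders.join(customers, on="customer_id", how="inner")
messages = joined.select(F.to_json(F.struct(*joined.columns)).alias("value"))

# Write to a Kafka topic (requires the spark-sql-kafka package on the classpath)
(messages.write.format("kafka")
 .option("kafka.bootstrap.servers", "kafka-host:9092")
 .option("topic", "enriched-orders")
 .save())
```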
00:12:17 One important point is all our applications are built there and if something from an operator
perspective
00:12:24 is not ready for you, if you need some really special things, build it on your own.
00:12:28 We also would like that partners engage there more and more,
00:12:32 and build their own operator, bring it into this environment
00:12:34 so that the customer gets the best out of it. And important again, the underlying technology is,
00:12:39 at the end of the day, a Docker container where you can design and build
00:12:44 whatever you like. So with this we are at the end of this session.
00:12:50 So the key takeaways: the SAP Data Hub is the product
00:12:54 and tool at SAP to bring all the different systems together. This is the SAP world,
00:13:00 SAP systems, cloud systems, on-premise systems, so it's the glue between them
00:13:04 for sharing data, building new applications,
00:13:08 and bringing especially the Big Data world into it, so that you can now merge and manipulate,

00:13:14 as well as Big Data combined with SAP data, for better decision making.
00:13:20 We have possibilities for metadata repositories where you can deep dive into the different
systems,
00:13:25 get an understanding of what is going on there, actually bring the metadata in
00:13:29 to work with the metadata. Last but not least there is
00:13:31 also a good integration with the BW because especially there when we think
00:13:36 about getting big data, bringing it into a format, manipulating it, and then giving it back to the BW system,
00:13:41 and joining it with the SAP world, with the data we get from an S/4 for example.
00:13:47 At the end of the day, that is really where we can make a big difference, where you can really
have insights
00:13:54 you didn't have before. With this, we are at the end of this session.
00:13:58 Thank you so much and good luck with the self-test
00:14:03 you will do in the next hours or days. Thank you and goodbye.

Week 3 Unit 3

00:00:08 Hello, and welcome to week three, unit three - SAP Data Hub, integration with SAP BW/4HANA.
00:00:17 In this unit, we will cover the integration, or the tight integration, between SAP BW/4HANA,
00:00:24 and the SAP Data Hub. We will show, in demonstration, how the Data Hub
00:00:29 integrates BW, and vice versa, how BW can integrate Data Hub workflows.
00:00:38 So what we see here, is like typical pictures at our customers and also at SAP, so we have
different pools
00:00:43 where different data is stored, and at the end of the day, you would like to bring this together.
00:00:47 This could be like an SAP system, this could be a BW system, you will have, for example,
00:00:52 a standalone HANA database, or, as well, you have cloud storage,
00:00:57 and so this could be Microsoft Azure, you have the Google Cloud Platform,
00:01:00 or something in AWS, and you need to bring everything together,
00:01:04 because when you combine the data, you can make more and better decisions based on the
information
00:01:09 you derive out of this. All right, first of all, I'm going to start
00:01:15 with the integration in BW/4HANA. So now, I will show you how BW can integrate to Data Hub.

00:01:22 So this means, we can start Data Hub task workflows within the BW/4HANA process chain, so
this means,
00:01:29 we have a specific process type here, our process type is called Data Hub Workflow,
00:01:35 and then we can execute the Data Hub Workflow within the BW/4HANA Process Chain.
00:01:40 That's the integration from the BW side. Furthermore, we can move data from the BW/4 side
00:01:48 into the HDFS world. So this means, we can write to HDFS files
00:01:53 via an OpenHub Destination. Connectivity via HTTP is also possible.
00:01:58 Now we will talk about the integration based on the SAP Data Hub side.
00:02:05 Exactly, so this is now one example. So we have seen how this can be triggered from the BW,
00:02:10 but now it's how does a workflow actually look like in the Data Hub modeler.
00:02:15 So this is a pipeline, usually you have a starting point and an end point. What we want to do
here is,
00:02:20 we have on the one side, social media data, and would like to join it with the SAP world
00:02:25 for better decision making. So, we start with the trigger, the trigger gets scheduled,
00:02:30 so there's a scheduler in the background, every day, every night, or at dedicated times,
00:02:35 to trigger it. Then our workflow is split,
00:02:37 because we would like to parallelize. First of all on the top, we have a Union,
00:02:41 and a Cleanse activity. So what we do there is, we are deriving data from an S3 bucket
00:02:46 where we have social media data, and we would like to cleanse it, that we don't have null
values into it.
00:02:52 At the same time, we do a pull request from BW so we can execute a query to get information

00:02:58 and we will temporarily store this in SAP Vora at the end of the day for further and faster
processing
00:03:03 later on. Afterwards, this is joined, the workflow is going,
00:03:07 so every single step has to be executed, and then the flow is going on.
00:03:11 Afterwards, we have a join operation, where we're bringing this together,
00:03:15 and we join the product data with the social media assets,

00:03:18 at the end of the day to get a better insight, and in the last step, we will enrich the BW,
00:03:25 which means, we will write this information back to the BW, and then a normal BW colleague
00:03:31 can work with exactly this data, which is enriched by data coming from the big data, social media world.
00:03:38 That's good. So, we will jump into a system demo right now.
00:03:42 In the system demo we prepared, we will open an existing Data Hub Workflow
00:03:47 for Claims Management. We will explain the whole concept.
00:03:51 We will give an overview about the structure and the main tasks in the Data Hub Workflow,
00:03:56 and we will show you the scheduling options here, so let's jump into the demo.
00:04:02 Here we are, now we are on the Data Hub Overview page. Thank you, so what we see here
00:04:08 is exactly the modeling environment we have in place. So this is an SAP Data Hub modeler
environment.
00:04:13 So on the left side, you see different operators, we can link them together.
00:04:17 There are also operators for BW for the data transfer, for the communication at the end of the
day.
00:04:22 What we have here is a scenario where we work with IoT data, like sensor data,
00:04:29 and at the other side with claims, so we would like to bring this together to understand
00:04:34 whether, for dedicated devices we have in our portfolio, some claims could be triggered in the future.
00:04:41 So it's a mix between getting data from social media, plus getting data out of our BW system,
00:04:47 bringing it together, running machine learning algorithms on top of it to identify whether we will run into issues with a dedicated set of devices.
00:04:56 So what we see here, as well, is we have a trigger, so this can be scheduled,
00:05:00 and we have a terminator at the end. So this is like the shell around.
00:05:03 This can be scheduled now by interval, by time, by certain other conditions,
00:05:09 but what we do here now, is like first, we enrich a device.
00:05:12 So when we jump into it, we see a structured transformation. So this means, we're getting data

00:05:19 from two different sources, so this is for devices and for customers,
00:05:22 and we do a join operation at the end of the day. Why are we doing this?
00:05:26 When we, for example, look into the devices, this is data coming from Google Cloud Storage,
00:05:32 and you see all the meta information as well. This will be joined with data coming from a
customer.
00:05:39 This is, for example, stored in an Amazon environment in an S3 bucket.
00:05:44 So, two different data sources can be combined, even when they're in totally different data
centers
00:05:51 from different vendors. The result is, that we have an enriched device,
00:05:55 which means, in this particular scenario, the device sometimes has null values for the
countries,
00:06:01 and what we do is we enrich with the customer data, where we ensure the country is always filled,
00:06:06 and this result will then be the enriched devices. When we now go back to our Workflow, we will see
00:06:13 that with this information, some additional join operations will be performed.
00:06:17 When we jump into it, then we have the different joins. Besides this, we also have the claims,
00:06:25 and there we derive this information out of the BW system. Last but not least, we join these
two different areas,

00:06:34 and then process some machine learning. This is our self-written machine learning algorithm
00:06:39 to identify exactly what can come up depending on the different circumstances.
00:06:45 So, which IoT sensor, in which country, with which information from legal aspects, and so on,

00:06:54 and based on this we have a definition, yeah, this will lead to issues, and we can bring it back
and load the result itself
00:07:01 to the BW system at the end of the day. Afterwards, when this is done, the BW receives the data,
00:07:07 we get the callback also to the Data Hub and our end-to-end pipeline will be terminated
00:07:12 at the end of the day, and this is a process which now can be scheduled,
00:07:16 depending on your needs. So usually, this can be then run automatically,
00:07:21 at this point in time, or you have it embedded in a process, and then it's, for example,
00:07:26 a batch run every night, because you have mass data you would like to process.
00:07:30 But this is at the end of the day, how we're bringing now, social media data,
00:07:33 which is maybe available in different cloud storages, together via the integration of Data Hub and BW,
00:07:40 and we can merge it, massage it, and at the end of day, you have a better insight,
00:07:45 also in the BW world, about the whole big data you have in your environment.
00:07:51 And this means and, maybe as an outcome, business analysts can now work on the, let's say,

00:07:57 generated streams, or on the generated leads, to let's say, talk to the customers, and so on,
00:08:03 and so forth. So this means, what we are doing here is,
00:08:06 we are simply enriching data with BW data and doing machine learning on top of it,
00:08:13 which is quite cool, without touching any machine learning tool or suite, so we can do it,
00:08:21 just completely in the Data Hub, that's very fine, and we can schedule it, that's good, yeah, all
right.
00:08:28 Yep, with this, we can go back to the slides. Yeah, I will go back to the slides.
00:08:34 So, now we talk about what you have learned in this unit. So we saw, during the last minutes,
00:08:43 that the Data Hub and BW/4HANA are tightly integrated, so this means, we can integrate Data Hub tasks
00:08:52 into BW, and vice versa as well. So this means, the Data Hub, can of course,
00:08:58 integrate BW/4 artifacts or BW/4 data, like queries, without moving data from BW into a Data
Hub,
00:09:08 or something like that. So no persistency is needed, on the Data Hub side,
00:09:13 so we bring the tools to the data, that's the idea here, and, yeah,
00:09:20 I think the demo was very amazing, so, with that, thanks for listening to this unit,
00:09:27 thanks for listening, and good luck with the self-test, and of course, thank you very much
Tobias.
00:09:33 Thank you, and enjoy the rest of the course, thanks.

Week 3 Unit 4

00:00:09 Hello and welcome to Week 3, Unit 4 Machine Learning with SAP BW/4HANA.
00:00:14 The idea of this unit is, of course, not to give you a general introduction to machine learning
00:00:18 for which there's a multitude of online resources by the way, including here on openSAP,
00:00:25 but to basically show you what's possible with the combination of SAP BW/4HANA
00:00:29 and the HANA platform underneath. So what's the content of this unit?
00:00:33 First, we will describe the ideas and concepts behind machine learning.
00:00:37 We will show you how machine learning with SAP Predictive Analytics works,
00:00:41 and we will show you how you can automate machine learning processes and really close the
loop
00:00:45 from insight to action using SAP BW/4HANA and SAP Predictive Analytics.
00:00:52 Typically our customers have a huge amount of data and the idea is to analyze the data
00:01:00 based on statistical models. Therefore, we create models or we train data models
00:01:06 to, let's say, to automate or speed up decision making for the business users.
00:01:12 Once this is done, we of course generate possible actions to improve the quality of the
decision making.
00:01:21 Typically, we have these two different stakeholders here. On the one side, we have the
business analysts
00:01:28 or the business users. Typically, here, we have a lot of users
00:01:34 who know the business and the business challenges very well. But these kinds of users are not skilled to use
00:01:41 or build statistical methods or processes. And on the other side we have the data scientists.
00:01:49 These data scientists have, of course, the statistical skills to improve business processes, but these users
00:01:57 do not have knowledge of the business processes, and now the question is how we can close
the gap
00:02:04 between the business users and data scientists. Let's have a look at the process of machine
learning
00:02:12 and training in a little bit more detail and with a couple of examples.
00:02:15 The basic idea is always that you collect huge amounts, ideally large amounts, of historic data.

00:02:22 It could be customer data. It could be product-related data.
00:02:25 It could be sales transactions. Whatever you have.
00:02:29 Based on this data you basically try to discover patterns. That's what machine learning really is
about.
00:02:34 You have algorithms which are basically trained using these huge amounts of data to basically

00:02:42 detect patterns and thereby also allow you to make predictions for the future.
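To make the training idea tangible, here is a minimal, generic scikit-learn sketch: a classifier is fitted on historic customer records that already carry a churn flag and is then applied to new customers. The input files and column names are hypothetical; SAP Predictive Analytics, discussed later in this unit, automates exactly this kind of step directly on the warehouse data.

```python
# Minimal sketch: train on historic, labeled data and predict for new records.
# Input files and columns are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

history = pd.read_csv("customer_history.csv")   # columns: revenue, tenure_months, complaints, churned
X = history[["revenue", "tenure_months", "complaints"]]
y = history["churned"]

# Hold back part of the history to check how well the learned pattern generalizes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Apply the trained model to new customers to predict who is likely to churn
new_customers = pd.read_csv("customers_current.csv")
new_customers["churn_prediction"] = model.predict(
    new_customers[["revenue", "tenure_months", "complaints"]])
```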
00:02:48 Now, one example could be churn: if you realize that the customers
are running away from you,
00:02:56 then, of course, that insight in itself is maybe interesting, but the real point
00:03:02 is that you want to close the loop. You want to take action based on this.
00:03:04 So you would basically turn this into a marketing campaign. The same as if you realized your
demand is probably going down.
00:03:12 You also want to take actions to mitigate this. And basically, starting with the training of a
model
00:03:22 is basically the first step, but closing the loop to the action of the business user is actually

00:03:28 the second step, which is very important here. And that's basically also the story
behind machine learning
00:03:33 on data warehouse data here on BW/4HANA, that we have the possibility to really close the
loop
00:03:39 from the data to the action of the business user. Okay, now back to the business user.
00:03:47 So what is so difficult about finding statistical processes, or identifying and qualifying leads
00:03:54 based on the huge amount of data? One point is we have a huge variety of data.
00:04:00 So we have data from different sources. We have conversion data.
00:04:05 We have machine data. We have transactional data.
00:04:09 In this unit, we will concentrate on the transactional data: structured data stored on the BW side.
00:04:19 This is one point. The other point is that we have
00:04:22 an analytical skill gap more or less. So, we have a shortage of experts.
00:04:27 We have a shortage of these experts called data scientists. It's hard to find the right people.
00:04:34 It's hard to hire people here. These resources are expensive.
00:04:39 On the other side, as I've already told, we have a lot of data analysts or business users.
00:04:46 These users know the processes and the business challenges very well,
00:04:52 but sometimes these business users can't start with analyzing the data because they think
00:04:59 that it's too complex, we have too much data, we don't have the right tools in place,
00:05:04 and so on and so forth and that's why data scientists come into the game.
00:05:09 Right, so what does this look like from a data scientist's perspective?
00:05:15 The first step is doing a pilot on using machine learning on business data.
00:05:22 That's typically very simple because what's typically happening there is you take the data out
of, for example,
00:05:27 your data warehouse or your operational system. You bring it into the data scientist's
environment,
00:05:31 whatever that is, the kind of tool they prefer to work with. Then, they do some magic there.
00:05:36 They train their models. They come back with data models,
00:05:40 trained models and they can do predictions. But that way only works nicely for a one-off
scenario.
00:05:48 If you want to do this once, right? What typically is important if you get this
00:05:53 into a productive environment is, of course, that you do this again and again, and you basically
00:05:57 industrialize the whole process. So, you basically have to retrain models
00:06:01 on a regular basis because the kind of historic data is changing.
00:06:06 Day by day new transactions come in, and they basically change the way
00:06:10 the model should work and should look like. The parameters will have to be adapted, and so
on.
00:06:17 And if you think about such a process which runs on a regular basis,
00:06:20 then it's clear that some automation is needed because the process of getting the data out,
00:06:25 bringing it to the data scientist's environment, have the data scientists work, bring the data
back.
00:06:30 That's error prone. It's expensive, and as we said, expensive people
00:06:34 are actually focusing on kind of stupid, straightforward and repetitive tasks.
00:06:39 Which is not what you want, right? Therefore, that's the key message of
00:06:45 this unit to a certain degree. The combination and the possibility to do all of this
00:06:49 out of the box on your data in the data warehouse. That's really key for that kind of scenario.
00:06:54 It works really well on structured data as we have it in our data warehouse.

00:07:01 So now this is an overview of the typical traditional approaches versus our SAP Predictive Analytics.
00:07:10 What is typical for this kind of approach is when you have third-party tools in play,
00:07:17 or stuff like that. The idea is that you extract data from the system,
00:07:22 bring the data to another system, do analysis on this third-party tool,
00:07:26 and then load the data back into the original system and do the action there.
00:07:31 So that's the idea here. So this relies on a few highly-skilled people
00:07:36 and certain tools. You have special tools for each business question maybe,
00:07:41 and it's typically that you load data from one system to the other
00:07:45 to do the action and load the data back. This is not the case when you have SAP Predictive
Analytics.
00:07:52 With SAP Predictive Analytics, we bring the right tool to the data,
00:07:58 and not the data to the tool. That's the idea here.
00:08:01 So this means that we are enabling business users to build or consume the end models;
00:08:09 predictive analytics models, machine learning models, and close the gap between build
models and do the action.
00:08:16 This is the closed loop architecture, and it's possible to automate that process.
00:08:22 Automate means that you can run or schedule this machine learning once and provide some
actions
00:08:31 to do further analysis or to do further actions here. That's the idea.
00:08:37 The main idea here is bring the tools to the data. So, here let's have a look at this from
00:08:43 an architecture perspective and how the boxes fit together. The key point is here, and we
basically said it already,
00:08:51 that everything works directly on the data in your SAP BW/4HANA system.
00:08:56 So BW/4HANA stores the data in a HANA database, and Predictive Analytics works right on
this data.
00:09:01 So there's no data duplication. There's no need to export and reimport data.
00:09:05 It all happens inside this single box. This is where the source data is,
00:09:10 which is used for training, and this is where the results come out after the model
00:09:15 has basically been trained and applied to new data. Everything happens in memory.
00:09:22 So you can also rely on the high speed of the HANA database. It supports automated model
creation and consumption,
00:09:29 which is basically an additional aspect of the whole thing you mentioned earlier,
00:09:35 which is geared towards business users to really also enable business users
00:09:39 to do certain steps. It's not only geared towards the data scientists,
00:09:43 but really towards a much wider user group. And, of course, by having all this inside the HANA
platform,
00:09:50 you can also extend the capabilities and the functionalities with other
00:09:54 engines inside of SAP HANA. There are some interesting use cases
00:09:58 to bring in other data as well in other engines. Alright, so what kind of scenarios do we have
going?
00:10:07 In the end, we have different scenarios to cover. So we have different requirements
00:10:11 based, of course, on the business processes. For some of the requirements,
00:10:17 let's say, this data automation or the data predictions
00:10:26 are needed in real time, for others a batch process is sufficient; the question is always
00:10:32 whether the predictive model is good enough. So this means that we offer more than one option
00:10:40 to create a prediction in our tools. As I said, one option is to run a batch job,

00:10:47 generate the predictions, store the predictions in a table and consume that via BW/4HANA,
and on the other side,
00:10:56 we are able to run this prediction in real-time. So this means we call it real-time scoring in the
end.
00:11:03 So this means use data, run the prediction, and generate the action directly.
00:11:08 So this is maybe needed for online campaigns, or for predicting prices for used cars
00:11:15 or stuff like that. Therefore, this is perfect.
00:11:20 Another scenario is, of course, that BW can schedule these predictions.
00:11:27 So this means it is possible via BW/4HANA and the HANA Analysis Process (HAP)
00:11:33 to schedule the predictive run in the predictive tool and consume the data on its own.
00:11:40 So the actual added value of the last scenario is really that all of the process orchestration,
00:11:48 all the scheduling, is then completely done inside BW/4HANA. So once you basically have the
model
00:11:55 you can plug this into a straightforward and kind of standard BW process.
00:11:59 Whereas, for example, the scenario 1 would rely on the scheduling capabilities
00:12:04 of the predictive tool. So let's have a look at the rough architecture
00:12:10 for such an example here. We talk about customer churn.
00:12:14 Of course, this is kind of agnostic, and the same pattern would apply to any other scenario as
well.
00:12:20 It's basically always based on the generated calculation views
00:12:26 which BW/4HANA provides you with. Remember this is what we covered in week two.
00:12:33 It's possible to basically on every level and layer of BW/4HANA generate a calculation view
00:12:38 and consume this data on the HANA side. Now, the SAP Predictive Analytics can also
00:12:45 leverage these calculation views, and use them as a source for training the data model.
00:12:53 The results of this training process can then, for example, in this example here
00:12:59 be written into a table. You see it in the second screenshot
00:13:02 in the little one on the lower part of the screen. We basically produce an output table
00:13:07 where we assign a name, and this output table is then, as Gordon just described,
00:13:10 located somewhere in some HANA schema inside your database, and you can
00:13:15 access this data from BW using Open ODS views, using composite providers if you
00:13:19 put a calculation view here on top, or you can put a data source on top
00:13:23 and actually load the results into BW as well, if that makes sense.
00:13:26 So you have all the options and you benefit really from this integration and the openness of
BW/4HANA,
00:13:32 which we described in basically week two. We have both options here as well;
00:13:37 we can use it in real-time and in a batch mode. The example here is a batch mode,
00:13:41 but real-time will basically work in the same way.
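As a rough illustration of the batch pattern just described (read from a generated calculation view, score, write an output table that BW consumes via an Open ODS view or a DataSource), here is a sketch using the SAP HANA Python client hdbcli together with pandas and scikit-learn. Host, view, and table names are hypothetical, and SAP Predictive Analytics performs training and scoring in-database rather than pulling the data out like this; the sketch only shows the shape of the flow.

```python
# Sketch of the batch pattern: generated calculation view -> train/score -> output table in HANA.
# Host, credentials, view and table names are hypothetical.
import pandas as pd
from hdbcli import dbapi
from sklearn.ensemble import RandomForestClassifier

conn = dbapi.connect(address="hana-host", port=30015, user="TECH_USER", password="***")

# 1. Historic data (with a churn flag) exposed by BW/4HANA as a generated calculation view
history = pd.read_sql('SELECT * FROM "SYSTEM_BW"."CV_CUSTOMER_HISTORY"', conn)
features = ["REVENUE", "TENURE_MONTHS", "COMPLAINTS"]
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(history[features], history["CHURNED"])

# 2. Score the current customers from a second generated view
current = pd.read_sql('SELECT * FROM "SYSTEM_BW"."CV_CUSTOMER_CURRENT"', conn)
current["CHURN_SCORE"] = model.predict_proba(current[features])[:, 1]

# 3. Write the scores into an output table that BW consumes via an Open ODS view or DataSource
cursor = conn.cursor()
cursor.executemany(
    'INSERT INTO "PREDICTIONS"."CUSTOMER_CHURN_SCORES" (CUSTOMER_ID, CHURN_SCORE) VALUES (?, ?)',
    [(int(r.CUSTOMER_ID), float(r.CHURN_SCORE))
     for r in current[["CUSTOMER_ID", "CHURN_SCORE"]].itertuples(index=False)])
conn.commit()
cursor.close()
conn.close()
```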
00:13:48 So, what are the core predictive functions, or machine learning algorithms, you could also say, which are supported by SAP Predictive Analytics?
00:13:55 We have classification algorithms which basically classify, for example,
00:14:00 customers according to certain groups or certain characteristics.
00:14:04 For example, ask for two customer groups: those which are likely to buy, or buy more, and those which are not.
00:14:11 Then you can basically, according to these two
00:14:17 classifications, run an algorithm which puts a certain group of customers
00:14:22 into the first bucket and another group of customers into the second bucket,

00:14:25 and then you can derive actions on this. You can either think about marketing campaigns in
one area,
00:14:31 or additional incentives in another area, or whatever makes sense from your business
perspective.
00:14:36 I'm not a big business person to be honest. Maybe Gordon you have better ideas here.
00:14:41 Regression algorithms. That's basically when it comes to predicting
00:14:44 more key figure-like stuff. So, predicting prices.
00:14:48 Gordon had the example of, for example, used car prices. If you have a huge amount of data
of real-life
00:14:54 car prices from used cars in the past, then of course it makes sense to leverage this data
00:15:00 to make predictions based on characteristics of the car. Like the amount of mileage of the car,

00:15:08 how old the car is, the type of the car; location probably also plays a role.
00:15:18 Basically all of these aspects and all of these parameters could play a role
00:15:21 in the car price, and then you can basically train a model using these characteristics and apply it with
00:15:27 these parameters to new incoming used cars to predict what a decent price for such a car
would be.
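A regression version of the same idea, again only a sketch with hypothetical input files and columns: a model is fitted on historic sale prices and then applied to newly incoming cars.

```python
# Sketch: regression on historic used-car sales to predict prices for new offers.
# Input files and columns are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

history = pd.read_csv("used_car_sales.csv")       # columns: mileage_km, age_years, engine_kw, price
features = ["mileage_km", "age_years", "engine_kw"]

model = GradientBoostingRegressor(random_state=42)
model.fit(history[features], history["price"])

# Predict a decent asking price for newly incoming cars based on the learned relationship
new_cars = pd.read_csv("incoming_cars.csv")
new_cars["predicted_price"] = model.predict(new_cars[features])
print(new_cars[["mileage_km", "age_years", "predicted_price"]].head())
```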
00:15:36 Segmentation and clustering algorithms that's also an interesting topic
00:15:40 where you basically just want to find groups. Not knowing the classification beforehand,
00:15:44 but you basically want to find clusters or buckets of customers with similar behavior, for
example.
00:15:50 We have things like forecasting based on time-series analysis and link analysis as you, for
example,
00:15:55 would know it from social media. So all this is contained in the functionality
00:16:00 of SAP Predictive Analytics and it can leverage it more or less in the way which we described
in these slides.
00:16:06 So, what have you learned in this unit? We talked about the dilemma between business users,

00:16:14 data scientists, the huge amount of data, and how to create the right models
00:16:18 to support a business scenario. That's why we introduced SAP Predictive Analytics
00:16:25 to enable machine learning automation. Automation here means closing the loop between building data models
00:16:33 and using the data models in one closed loop, and hopefully you see that SAP is offering simple tools to close exactly
00:16:47 that gap between the different stakeholders here. And with that.
00:16:53 Get ready for your self-test. That's my clause.
00:16:55 Oh okay.

Week 3 Unit 5

00:00:08 Hello and welcome to Week 3, Unit 5 Data Tiering Optimization with SAP BW/4HANA
00:00:14 In this unit we will give you an overview of SAP Data Tiering Optimization.
00:00:19 We will explain the architecture and implementation steps for DTO. And we will also introduce
you
00:00:25 to the monitoring capabilities which SAP BW/4HANA brings in the Data Tiering Optimization
area.
00:00:31 And finally, of course, we will show you an extensive demo which also shows you all these
aspects.
00:00:40 Data Tiering Optimization - what is it? First I'd like to introduce the different tiers
00:00:48 we have with BW/4HANA and the idea behind the Data Tiering Optimization. So the idea is
that we have one in-memory
00:00:58 database which is our SAP HANA database and besides that we have tiers with cheaper TCO

00:01:07 where we can store the data and move the data out of the hot area into cheaper tiers.
00:01:15 We have of course the hot data tier which is used for mission-critical data
00:01:21 for real-time processing and real-time analytics. The data is stored in the in-memory database,

00:01:27 in the in-memory section of the SAP HANA database. This is what we call the hot memory part, or the hot data.
00:01:35 That's basically, if you don't care about anything like data tiering,
00:01:39 this is exactly what you have: your whole database is hot data.
00:01:43 So basically Data Tiering Optimization is about extending this to less frequently
00:01:48 accessed data in the other tiers. Exactly. Besides that we have the warm data tier.
00:01:52 The warm data tier is dedicated to scale-out scenarios.
00:01:59 Scale-out means you have more than one HANA node, and then you can flag a node as a warm data node.
00:02:08 This means, this data is used with reduced performance SLAs. So this means this is the
typical place
00:02:17 for data which is not that frequently accessed, maybe for historical data
00:02:21 and the data is stored at lower storage cost, but these warm data areas are still part of the HANA platform.
00:02:31 We will show you what that means later on. So from a functional perspective,
00:02:36 you have no restrictions, you can run all the operations just as on the hot data because it's still part
00:02:41 of the online database, but as we said, from a performance perspective
00:02:47 it doesn't give you the same SLAs as you have in the hot area. On the other hand
00:02:50 you can store much more data in these nodes, than in a hot node.
00:02:54 And last but not least, we have the so-called external storage
00:02:57 which is the cold data area. This is more or less for archive data, for data which is only sporadically accessed,
00:03:06 so this means older data with much lower SLAs, but the interesting thing is that this is completely separated
00:03:16 from the SAP HANA database: this could be an IQ database or a Hadoop system, separate from the HANA database,
00:03:26 but you can access the data via the tools in our analytical portfolio seamlessly. So this means
you can run a query
00:03:35 and access data from all the three different tiers simultaneously.
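To illustrate what seamless access means on the consumption side: a query needs no tier-specific syntax, the temperature of the underlying partitions is transparent to the client. A minimal sketch using the SAP HANA Python driver (hdbcli), where host, port, credentials, schema, and view name are placeholders:

    # Minimal sketch: one query, results may come from hot, warm, and cold partitions alike.
    # Connection details and the queried view are placeholders for this example.
    from hdbcli import dbapi

    conn = dbapi.connect(address="<hana-host>", port=30015, user="<user>", password="<password>")
    cur = conn.cursor()
    cur.execute(
        'SELECT "CALYEAR", SUM("AMOUNT") FROM "<schema>"."<adso_reporting_view>" GROUP BY "CALYEAR"'
    )
    for year, amount in cur.fetchall():
        print(year, amount)
    conn.close()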

00:03:40 So let's look a little bit more in detail at the extension node concept.
00:03:43 What's the idea of extension nodes? Basically Gordon described it already,
00:03:47 extension nodes are nothing else but additional nodes in your HANA landscape.
00:03:51 Many of you have probably heard of the scale-out concept in general, scale-out means
00:03:56 you have multiple nodes in your HANA database to really scale horizontally,
00:04:00 to extremely high data volumes. Now if you have such a scale-out landscape with
00:04:04 multiple database nodes, you can actually dedicate certain nodes of this landscape as warm
data nodes
00:04:12 and those will basically keep the warm data. So from a hardware perspective, it's exactly the
same
00:04:18 as the other nodes. The main point is that we allow you to store much more data in the warm
data nodes
00:04:26 than in the hot data nodes, that's what's indicated in the picture on the right-hand side.
00:04:31 You might be aware that in a normal HANA node, be it a single node instance
00:04:37 or a standard scale-out configuration, only 50% of the overall RAM capacity can be used for
data,
00:04:43 the rest has to be reserved for processing purposes and internal storages of HANA.
00:04:50 For the warm data nodes, we have actually relaxed this and we allow you to store up to 200% of the RAM capacity
00:04:57 of such a node on that node. Which basically means that the node is typically not capable,
00:05:02 if you really fill it up to 200%, of keeping all the data it has in memory.
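As a rough illustration of the difference: on a hypothetical node with 2 TB of RAM, a standard hot node can hold about 1 TB of data (50% of RAM, the rest reserved for processing), whereas the same hardware used as a warm extension node may be loaded with up to roughly 4 TB (200% of RAM), with the trade-off that only part of that data fits into memory at any point in time.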
00:05:08 That's basically what we mean with relaxed SLAs. It basically might take some time to load the
data
00:05:15 from disk into memory before it can actually be processed and part of the data always has to
be displaced
00:05:20 so you always have to take into account that accessing this data might require additional time
00:05:25 to actually bring it into memory before it can be used. On the other hand, from a functional
perspective,
00:05:31 you can do anything with this data. You can update it, you can read it at any point in time,
00:05:37 you can do all the SQL stuff with it. It's fully integrated in the life cycle,
00:05:42 it's also integrated in backups, so from all the operations perspective,
00:05:46 it's just part of your database. Now when it comes to the availability
00:05:49 of this extension node concept, it's both available as a HANA native concept on the SQL level.
00:05:59 So if you think about SQL data warehousing, it can be used in that context, starting with HANA
2 SPS03.
00:06:07 With SAP BW/4HANA it's available since HANA 1.0 SPS12. What do the data aging properties
00:06:19 with Data Tiering Optimization look like? We already explained that we have
00:06:25 hot, warm, and cold data. Another question is what is inside the different pillars here. The hot
data as we already said
00:06:34 is in-memory data, for example the data of an Advanced DSO, allocated completely in memory.
00:06:41 You can push the complete aDSO between hot, warm, and cold, or do this for each aDSO partition. On
the Extension Node part,
00:06:54 as we already explained, an extension node is a node with relaxed memory requirements
00:07:01 and we have cold data. Cold data means archiving: an SAP IQ database is possible,
00:07:09 Hadoop (HDFS) storage, as well as SAP Vora on disk. So the idea is that we have a consistent approach

00:07:17 for all of these temperature levels, meaning for hot, warm, and cold data,
00:07:22 and we can allocate the temperature partition-wise. This means it's possible to create a persistency object
00:07:31 like an Advanced Data Store Object, then identify or create different partitions, and then
00:07:36 assign each partition to a different temperature schema.
00:07:42 And this can be done in a completely automated way via APIs we deliver. Now I will give you an
overview,
00:07:52 or we will give you an overview of the needed implementation steps.
00:07:58 First of all, you have to create a data persistency. Here we have an Advanced Data Store
Object
00:08:06 and in the Advanced Data Store Object, in the metadata maintenance, we have a small
section where you can choose
00:08:14 if Data Tiering Optimization is needed and if you like to use hot, warm, cold
00:08:19 or a combination of these three. Once you have flagged this option, you have to select whether you want to assign
00:08:33 the different temperature levels to the complete object or to separate partitions. Once this is
done,
00:08:40 you have to create a partition schema. If you already have data loaded into the Advanced Data
Store Object
00:08:47 and you change the partitions, then repartitioning may be needed in that case. Once this is
done
00:08:56 and the object is activated then we jump into a so-called temperature definition. This is the
plan and execution
00:09:05 path of the process. Here you can assign the different partitions of the persistence object
00:09:14 to a temperature schema. Once this is done, save that and then we come to the
execution part.
00:09:20 The execution part means, run the Data Tiering Optimization job via the Web GUI or via the
SAP GUI.
00:09:29 Here we plan to integrate that operation into our Web GUI with a nice UI5 Fiori-based
interface,
00:09:39 but currently this one's in the SAP GUI. You can run this program and then the system
automatically
00:09:48 moves the partitions from one temperature level-
00:09:54 Storage tier. Or storage tier to the other, so this means
00:09:57 on a physical storage side, we have the hot, warm, and cold storage. Once you change the
partitions,
00:10:05 the program moves exactly that partition between the different tiers. And you can of course
00:10:13 execute that program manually or you can use that in a process chain via a process chain
variant.
00:10:23 Let's come to the monitoring part of it, of course you need to keep track of where your data
resides,
00:10:27 what data volumes you have in these areas, I mean keeping track of the temperatures
assigned to
00:10:33 an object is basically what we saw in the last slide, if you want to see what partition is assigned

00:10:40 to what data temperature, but of course you also need an understanding of how much data
you have in each of these
00:10:44 partitions, especially when it comes to archiving, it makes a lot of sense to understand

00:10:48 how the data is distributed. For that we basically have a new version, a dedicated version of
technical content
00:10:55 which gives you monitors to get a feeling and impression of the size of the individual object.
And that includes also of course
00:11:06 the data temperatures for all the partitions. On the right-hand side what you see is
00:11:11 a query layout, shown here in Analysis for Microsoft Office, based on views
00:11:22 from our new technical content which are built using CDS technology. So we basically
leveraged CDS technology
00:11:29 to really expose the internal data, the administrative data of BW/4HANA in a consumable way
for query purposes
00:11:38 and you can start building your own queries on top of this technical content or even build small
dashboards
00:11:44 which give you an overview of how your system is currently behaving, like here we have a
huge amount of
00:11:48 hot, very little warm, and very little cold. Which is maybe not so typical if you think about
archiving
00:11:55 old data then the typical situation would probably be that you have a rather small part of hot
data,
00:12:05 maybe same amount of warm data and over time the cold data starts growing and growing and
will be certainly the largest
00:12:11 amount of data at some point in time. This is the monitoring side of it and
00:12:18 how you get this kind of information. Now let's jump into the system and do a quick demo.
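Alongside the delivered technical content, a rough low-level cross-check of how the in-memory data is distributed across the HANA hosts (hot nodes versus warm extension nodes) is possible via HANA's generic monitoring views. A minimal sketch with hdbcli, where connection details and the schema filter are placeholders; note that cold data resides outside HANA (IQ or Hadoop) and therefore does not appear in this view:

    # Minimal sketch: column-store footprint per HANA host (hot and warm nodes only;
    # cold data lives in the external store and is not listed in M_CS_TABLES).
    # Connection details and the schema filter are placeholders.
    from hdbcli import dbapi

    conn = dbapi.connect(address="<hana-host>", port=30015, user="<user>", password="<password>")
    cur = conn.cursor()
    cur.execute(
        """SELECT HOST,
                  ROUND(SUM(MEMORY_SIZE_IN_TOTAL) / 1024 / 1024 / 1024, 1) AS SIZE_GB,
                  SUM(RECORD_COUNT) AS RECORDS
           FROM M_CS_TABLES
           WHERE SCHEMA_NAME = '<bw_schema>'
           GROUP BY HOST
           ORDER BY HOST"""
    )
    for host, size_gb, records in cur.fetchall():
        print(host, size_gb, records)
    conn.close()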
00:12:23 The demo will basically cover all the aspects which we mentioned, so it will start in
00:12:27 the maintenance of the Advanced Data Store Object, show you what flags you have to set to
actually activate
00:12:34 Data Tiering Optimization, what you have to do on the partition side. Then we'll jump into the
00:12:41 maintenance UI to assign the individual temperatures to the partitions and then we'll even
execute
00:12:48 the partition movement to complete the demo. Alright Gordon.
00:12:55 Let's go. Here we are, this is the data flow object. I will create
00:13:01 a new Advanced Data Store Object, double-click on it, assign a name,
00:13:09 maybe set openSAP sales, I will use a template I already created.
00:13:21 Advanced data store object, here we are. And here you can see the different Data Tiering
properties
00:13:28 we have. I will flag hot, warm, and cold. And here you can see you can combine it,
00:13:34 you can say okay, hot, warm on object level. With that setting you can move the complete
00:13:41 object between hot and warm. In our case I will use cold as an archive option as well,
00:13:48 and then this is based on partition level, so now the metadata management is done
00:13:54 I created or I maintained the metadata here properly so here we have the details part,
00:14:02 I have to add an InfoObject here. I will add simply a time InfoObject for 0CALYEAR here we
are.
00:14:15 Is this the partitioning criteria-
00:14:17 which you're going to use? I will use that as a partition criteria
00:14:20 therefore I have to... Include it in the key I guess?
00:14:25 Yeah, add this in the key definition, here we are so now we can see on the settings side,
00:14:34 here we have the partitions, now I have to create partitions.
00:14:37 In this state we have only one partition for that object, which doesn't make much sense when we try to move the data,

00:14:44 partition-wise. So I will create, let's say, ten partitions. Simply press Add, I will-
00:14:53 See how quick- As an example, between 2010 and 2020, that's okay,
00:15:02 now we will split that partition into 10 partitions based on one year each, so this is a split based
on year,
00:15:11 so now I have ten partitions and here we are. Now we have ten partitions.
00:15:17 On the Advanced Data Store Object, I will now activate that object first.
00:15:22 So that's basically all the configuration which you have to do within the aDSO,
00:15:26 now we switch to the maintenance where we assign the data temperatures to each of the
partitions.
00:15:33 What we now have is a persistency object, the Advanced Data Store Object with ten partitions,

00:15:37 and now we'll start to assign a temperature to each partition,


00:15:44 therefore we have this button, this Maintain Temperatures here. Now we jump into the SAP
GUI, the embedded SAP GUI,
00:15:54 as I already said, we are working on an integration into the Web UI. So this is data from 2010 to 2013-
00:16:02 So that's really old and-
00:16:03 Probably something which we will assign to cold. I will assign it to cold, exactly,
00:16:07 and for the next let's say 2000- Yeah, that's fine.
00:16:11 Maybe until 2017, this is maybe-
00:16:14 warm, and that's it. Now the partition definition is done.
00:16:20 I have to save that of course, now you can see here in the temperature status, that a
temperature change
00:16:29 is needed and therefore of course, I have to execute a job. This icon basically says that the green lights
00:16:36 in the lower part basically tell you that the hot partitions, these two partitions,
00:16:40 have not changed the- Exactly.
00:16:42 -temperature assignment, for the other partitions the temperature assignment has changed
00:16:44 so something has to be done to really move the data. Yeah, it's a comparison between the
current
00:16:49 and the planned temperature, so in my example I will start and execute the job manually as I
already said
00:16:57 you can of course integrate a job into a process chain and fully automate that step.
00:17:04 So here we are, I will now execute this one, this may take some time-
00:17:10 In our case, we don't have a lot of data in this aDSO so it should not take too long I guess.
00:17:15 It takes a while. Here we are, now it's back, here you can see now we moved
00:17:22 the partitions from hot to warm and from hot to cold and the system executes the partition
movement here.
00:17:32 I mean the data movement was not what was time consuming here, it was basically creating
the
00:17:36 partitions on the right node or in the external database. Which is an IQ installation in our case
or is it Hadoop?
00:17:44 It's an IQ installation in that example. So basically creating the tables in the right
00:17:48 temperature tier was basically what took us some time here. Now even I understand.
00:17:55 So okay, that's it from the demo part. What we learnt is that it's quite easy, let's say,
00:18:02 to move partitions between the different tiers, and we can automate that movement via a process chain
00:18:12 or via other means. Alright, so let's summarize what we saw.
00:18:18 Yeah, what we saw was that Data Tiering Optimization helps to classify data

00:18:25 into hot, warm, and cold and find the right temperature for each of your data and Data Tiering
Optimization
00:18:34 provides a central UI based on Eclipse at the moment but we hardly work on an integration
into the Web.
00:18:41 We work hard! If we work hard-
00:18:43 Not, we hardly work That's right
00:18:47 And the displacement of data could be a simple and periodic housekeeping activity, so this
means
00:18:55 that you can integrate this data movement in your weekly or monthly housekeeping activities
and then it's quite simple
00:19:02 and in the end Data Tiering Optimization helps to streamline your administration and your development,
00:19:10 and it saves some money because you can move the data between different and cheaper tiers.
00:19:17 Alright, I think with that we have reached the end of this unit, and the end of this week,
00:19:21 so it's time for your weekly assignment.

