ETL process using SAP Datasphere and SAP Analytics Cloud
Karolina Górecka, Business Intelligence Consultant
- 7th April 2023
- Trends in SAP
- 5 min
Data is the backbone of any business. Access to company records facilitates key decision-making – for example, increasing production, changing a supplier, or expanding the store network. Most often, we see such statistics when analyzing various types of reports, from which we can quickly draw meaningful conclusions. But before such reports can be built, the data must first be well prepared. In this article, I’ll share more details about this process.
What is the ETL process?
Figure 1: The general ETL process
ETL (Extract, Transform, Load) is the process by which data is extracted, transformed, and loaded to a target destination. We can compare this series of activities to making a cocktail – first, we prepare the necessary ingredients, then mix them and pour them into the right glass. We could consume all the components individually, e.g., eat the sugar and ice, drink the juice, and then the alcohol, but we would quickly become disappointed, because the ingredients taste worse consumed separately than mixed together and poured into a nice glass. The same goes for creating a report – processed data gives better results than the original.
Within a single database, we can only process the data it already contains, and uploading everything else we want to include there would be inefficient – just like mixing the ingredients in the juice carton we used for making the drink. Suitable tools have been developed for this purpose, allowing data from multiple sources to be combined and prepared for analysis.
Data acquisition
The first step in the ETL process is acquiring data from multiple sources.
It’s common for crucial data to be scattered across various locations and stored there in a form that is hard to interpret. For example, instead of a name, a product is identified by a string of characters, such as KDP03545989. When extracting data from databases, we also come across fields that are not relevant to us, such as the number of the machine producing the item.
In order to obtain only the information that is significant to us, we can use database views, which define where in the database the data should be read from and what transformations are applied to it. Data can also come from other sources, such as CSV files, Excel spreadsheets, or external systems. After this selection, we bring the necessary data into the system where the transformation is performed.
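In code, this selection step can be as simple as reading a source and keeping only the fields we care about. Below is a minimal sketch in plain Python; the CSV content is a made-up sample modeled on the tables used later in this article:

```python
import csv
import io

# A hypothetical CSV export from one of the sources; the column
# names mirror the production table shown later in this article.
raw = io.StringIO(
    "TIMESTAMP,MACHINE NO,SERIAL NO,PRODUCT ID,WEIGHT\n"
    "2022-02-03 23:48:03,TX05WM,22020300052476,CHB167938D,100.00\n"
)

# Keep only the fields relevant to the analysis; technical columns
# such as the machine number are dropped at extraction time.
relevant = ["SERIAL NO", "PRODUCT ID", "TIMESTAMP"]
rows = [{k: r[k] for k in relevant} for r in csv.DictReader(raw)]
```

The same idea applies regardless of the source – when the data lives in a database, a view plays the role of this column selection.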
Transformation
At this step, we combine, clean, filter, format, and convert the data according to business requirements. Such transformations are often performed in a data warehouse or a data preparation tool. The table created at this stage serves as the foundation for further analysis.
Data loading
The final step is to upload the prepared data to a target location, such as a separate database, the appropriate place in a data warehouse, or an analytical tool. The data for analysis will then be read from the destination where we uploaded it after the transformation.
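As a rough illustration of the load step, the sketch below bulk-inserts already prepared rows into a target table. The table name and schema are invented for this example, and an in-memory SQLite database stands in for the real destination:

```python
import sqlite3

# Target location - an in-memory SQLite database stands in here
# for the real destination (a warehouse table or analytics tool).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (serial_no TEXT, product TEXT)")

# The prepared (already transformed) rows are loaded in bulk.
prepared = [
    ("22020300052476", 'Milk chocolate "Paradise pleasure" large'),
    ("22020300052477", 'Milk chocolate "Paradise pleasure" large'),
]
conn.executemany("INSERT INTO report VALUES (?, ?)", prepared)
loaded = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]
```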
ETL process using SAP Datasphere and SAP Analytics Cloud tools
SAP has several solutions in its portfolio used in the ETL process, including S/4HANA, BW/4HANA, SAP Datasphere, SAP Analytics Cloud, SAP Lumira, and many others. Which of these are used is entirely up to each enterprise. Although the choice of solution may introduce minor differences in the ETL process, the main steps remain the same. Below, I present a sample ETL process using selected tools.
Figure 2: ETL process using SAP Datasphere and SAC
Imagine that we own a company that produces many kinds of chocolates. We want to know how many chocolates we made in February 2022 and where they were delivered to. The production is on such a large scale that the company uses machines for manufacturing.
Each machine records every bar of chocolate produced in a database table (Table 1). The information in the table includes the unique serial number, the product ID, the production date, the weight, and the number of the machine that made the item. The following example illustrates how information about the produced chocolate is stored in the database.
Table 1 with details of manufactured products
| TIMESTAMP | MACHINE NO | SERIAL NO | PRODUCT ID | WEIGHT |
| --- | --- | --- | --- | --- |
| 2022-02-03 23:48:03 | TX05WM | 22020300052476 | CHB167938D | 100.00 |
| 2022-02-03 23:48:04 | TX05WM | 22020300052477 | CHB167938D | 100.00 |
| 2022-02-03 23:48:04 | TX06WM | 22020300052478 | CHB165958D | 100.00 |
Such data is incomprehensible to us, so we need to process it before creating a report. The first step in this task is to collect the necessary data and send it to a data warehouse – in our case, it is SAP Datasphere. We need the data located in three places: a table with production data (Table 1), a table with the names of the products and their IDs (Table 2) and a table with information on where a particular product was shipped (Table 3).
Below is an example of how these tables might look:
Table 2 with product names
| PRODUCT ID | NAME OF PRODUCT |
| --- | --- |
| CHB167938D | Milk chocolate “Paradise pleasure” large |
| CHB167938M | Milk chocolate “Paradise pleasure” small |
| CHB165958D | Dark chocolate with orange peel large |
Table 3 with product shipping locations
| DATE | SERIAL NO | LOCATION | DESTINATION NAME | TYPE |
| --- | --- | --- | --- | --- |
| 2022-04-14 | 22020300052476 | Warsaw | “Polka” Restaurant | Restaurant |
| 2022-02-03 | 22020300052477 | Kutno | FHU Jan Kowalski | Shop |
| 2022-02-03 | 22020300052478 | Wrocław | “Lemur” Supermarket | Supermarket |
Tables 1 and 2 are located in one database, while Table 3 is stored elsewhere. Before sending the data from Tables 1 and 2, we can process them at the database level using a view. This way, we select and forward only the data we need, reducing the amount of data sent to SAP Datasphere and shortening the transfer time from one location to the other. The table passed to the data warehouse will look as follows:
Table 4 created from tables 1 and 2
| DATE | SERIAL NO | PRODUCT ID | NAME OF PRODUCT |
| --- | --- | --- | --- |
| 2022-02-03 | 22020300052476 | CHB167938D | Milk chocolate “Paradise pleasure” large |
| 2022-02-03 | 22020300052477 | CHB167938D | Milk chocolate “Paradise pleasure” large |
| 2022-02-03 | 22020300052478 | CHB165958D | Dark chocolate with orange peel large |
| 2022-02-03 | 22020300052391 | CHB167938M | Milk chocolate “Paradise pleasure” small |
Let’s call it “Table 4”. We send Tables 3 and 4 to the warehouse, where the transformation is performed.
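The database-level selection described above can be illustrated with a view. The sketch below uses SQLite as a stand-in for the source database; the table and column names are invented for this example, but the idea is the same: join Tables 1 and 2, keep only the needed fields, and derive the date from the timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production (ts TEXT, machine_no TEXT, serial_no TEXT,
                         product_id TEXT, weight REAL);
CREATE TABLE products (product_id TEXT, name TEXT);

-- The view joins the two tables, keeps only the needed fields,
-- and derives the date from the timestamp. Only this reduced
-- result would be transferred to the warehouse.
CREATE VIEW production_named AS
SELECT date(p.ts) AS date, p.serial_no, p.product_id, n.name
FROM production p
JOIN products n ON p.product_id = n.product_id;
""")

conn.execute("INSERT INTO production VALUES (?, ?, ?, ?, ?)",
             ("2022-02-03 23:48:03", "TX05WM", "22020300052476",
              "CHB167938D", 100.0))
conn.execute("INSERT INTO products VALUES (?, ?)",
             ("CHB167938D", 'Milk chocolate "Paradise pleasure" large'))

row = conn.execute("SELECT * FROM production_named").fetchone()
# row is ('2022-02-03', '22020300052476', 'CHB167938D',
#         'Milk chocolate "Paradise pleasure" large')
```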
The second step is the transformation, which we’ll carry out in SAP Datasphere. In our case, we only need to merge data from two tables, but this tool allows much more, such as various calculations, formatting fields, renaming columns, and adding or deleting them.
To combine data from two tables, we must first create a place to store the processed information – a new table is the proper location.
For our analysis, we need the production date, the product name, its serial number, and details of where the product was shipped. To match records for the same product across the two tables, we need a join key: a value present in both tables we want to consolidate and unique for each item. In our case, that value is found in the SERIAL NO columns – a unique number for each chocolate bar. Based on it, we can combine the data from both tables. The table created as a result of this transformation looks as follows:
Table 5 created from tables 3 and 4
| PRODUCTION DATE | SERIAL NO | PRODUCT NAME | LOCATION | DESTINATION NAME | DESTINATION TYPE |
| --- | --- | --- | --- | --- | --- |
| 2022-02-03 | 22020300052476 | Milk chocolate “Paradise pleasure” large | Warsaw | “Polka” Restaurant | Restaurant |
| 2022-02-03 | 22020300052477 | Milk chocolate “Paradise pleasure” large | Kutno | FHU Jan Kowalski | Shop |
| 2022-02-03 | 22020300052478 | Dark chocolate with orange peel large | Wrocław | “Lemur” Supermarket | Supermarket |
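The join on SERIAL NO can be sketched in plain Python; the sample values below are taken from the tables above:

```python
# Table 4 keyed by serial number: (production date, product name).
table4 = {
    "22020300052476": ("2022-02-03", 'Milk chocolate "Paradise pleasure" large'),
    "22020300052478": ("2022-02-03", "Dark chocolate with orange peel large"),
}

# Table 3 rows: (shipping date, serial no, location, destination, type).
table3 = [
    ("2022-04-14", "22020300052476", "Warsaw", '"Polka" Restaurant', "Restaurant"),
    ("2022-02-03", "22020300052478", "Wrocław", '"Lemur" Supermarket', "Supermarket"),
]

# Inner join on the serial number - the key present in both tables
# and unique for every chocolate bar.
table5 = [
    (table4[serial][0], serial, table4[serial][1], loc, dest, typ)
    for _, serial, loc, dest, typ in table3
    if serial in table4
]
```

In SAP Datasphere the same result is achieved graphically or with SQL, but the logic is identical: match rows on the shared unique key and keep the columns needed for the report.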
The final step in the ETL process is to send the processed data to the location where it will be used. Once the data has been imported into SAP Analytics Cloud, we can proceed to analyze it and present the results in charts prepared in this tool.
In conclusion – using ETL tools is a solid practice for working with data. This approach brings guidelines and principles that organize the process and streamline the data flow, making it easier and quicker to gain better-quality insights.