ETL process using SAP Datasphere and SAP Analytics Cloud
Karolina Górecka, Business Intelligence Consultant
- 7th April 2023
- Trends in SAP
- 5 min
Data is the backbone of any business. Access to company records facilitates key decision-making – for example, increasing production, changing a supplier, or expanding the store network. Most often, we see such statistics when analyzing various types of reports, from which we can quickly draw meaningful conclusions. But before such reports can be built, the data must first be well prepared. In this article, I’ll share more details about this process.
What is the ETL process?
Figure 1: The general ETL process
ETL (Extract, Transform, Load) is the process by which data is extracted, transformed, and loaded to a target destination. We can compare this series of activities to making a cocktail – first, we prepare the necessary ingredients, then mix them and pour them into the right glass. We could consume all the components individually, e.g., eat the sugar and ice, drink the juice, and then the alcohol, but we would quickly become disappointed, because the ingredients taste worse consumed separately than mixed together and poured into a nice glass. The same goes for creating a report – processed data gives better results than the original.
Within a single database, we can only process the data it already contains, and uploading everything else we want to include there would be inefficient – just like mixing the ingredients in the juice carton we used for making the drink. Suitable tools have been developed for this purpose, allowing data from multiple sources to be combined and prepared for analysis.
Data acquisition
The first step in the ETL process is acquiring data from multiple sources.
It’s common for crucial data to be scattered across various locations and stored there in a form that is hard to interpret. For example, instead of a name, a product is identified by a string of characters, such as KDP03545989. When extracting data from databases, we also come across fields that are not relevant to us, such as the number of the machine producing the item.
In order to obtain only the information that is significant to us, we can use database views, which define where in the database the data should be read from and what transformations are applied to it. Data can also come from other sources, such as CSV files, Excel spreadsheets, or external systems. After this selection, we bring the necessary data into the system where the transformation is performed.
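In code, this selection step can be as simple as reading a source and keeping only the fields we care about. Below is a minimal sketch in plain Python; the CSV content is a made-up sample modeled on the tables used later in this article:

```python
import csv
import io

# A hypothetical CSV export from one of the sources; the column
# names mirror the production table shown later in this article.
raw = io.StringIO(
    "TIMESTAMP,MACHINE NO,SERIAL NO,PRODUCT ID,WEIGHT\n"
    "2022-02-03 23:48:03,TX05WM,22020300052476,CHB167938D,100.00\n"
)

# Keep only the fields relevant to the analysis; technical columns
# such as the machine number are dropped at extraction time.
relevant = ["SERIAL NO", "PRODUCT ID", "TIMESTAMP"]
rows = [{k: r[k] for k in relevant} for r in csv.DictReader(raw)]
```

The same idea applies regardless of the source – when the data lives in a database, a view plays the role of this column selection.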
Transformation
At this step, we combine, clean, filter, format, and convert the data according to business requirements. Such transformations are often performed in a data warehouse or a data preparation tool. The table created at this stage serves as the foundation for further analysis.
Data loading
The final step is to upload the prepared data to a target location, such as a separate database, the appropriate place in a data warehouse, or an analytical tool. The data for analysis will then be read from the destination where we uploaded it after the transformation.
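As a rough illustration of the load step, the sketch below bulk-inserts already prepared rows into a target table. The table name and schema are invented for this example, and an in-memory SQLite database stands in for the real destination:

```python
import sqlite3

# Target location - an in-memory SQLite database stands in here
# for the real destination (a warehouse table or analytics tool).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (serial_no TEXT, product TEXT)")

# The prepared (already transformed) rows are loaded in bulk.
prepared = [
    ("22020300052476", 'Milk chocolate "Paradise pleasure" large'),
    ("22020300052477", 'Milk chocolate "Paradise pleasure" large'),
]
conn.executemany("INSERT INTO report VALUES (?, ?)", prepared)
loaded = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]
```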
ETL process using SAP Datasphere and SAP Analytics Cloud tools
SAP has several solutions in its portfolio used in the ETL process, including S/4HANA, BW/4HANA, SAP Datasphere, SAP Analytics Cloud, SAP Lumira, and many others. Which of these are used is entirely up to each enterprise. Although the choice of solution may introduce minor differences in the ETL process, the main steps remain the same. Below, I present a sample ETL process using selected tools.
Figure 2: ETL process using SAP Datasphere and SAC
Imagine that we own a company that produces many kinds of chocolates. We want to know how many chocolates we made in February 2022 and where they were delivered to. The production is on such a large scale that the company uses machines for manufacturing.
Each machine records every bar of chocolate produced in a database table (Table 1). The information in the table includes the unique serial number, the product ID, the production date, the weight, and the number of the machine that made the item. The following example illustrates how information about the produced chocolate is stored in the database.
Table 1 with details of manufactured products
| TIMESTAMP | MACHINE NO | SERIAL NO | PRODUCT ID | WEIGHT |
| --- | --- | --- | --- | --- |
| 2022-02-03 23:48:03 | TX05WM | 22020300052476 | CHB167938D | 100.00 |
| 2022-02-03 23:48:04 | TX05WM | 22020300052477 | CHB167938D | 100.00 |
| 2022-02-03 23:48:04 | TX06WM | 22020300052478 | CHB165958D | 100.00 |
Such data is incomprehensible to us, so we need to process it before creating a report. The first step in this task is to collect the necessary data and send it to a data warehouse – in our case, it is SAP Datasphere. We need the data located in three places: a table with production data (Table 1), a table with the names of the products and their IDs (Table 2) and a table with information on where a particular product was shipped (Table 3).
Below is an example of how these tables might look:
Table 2 with product names
| PRODUCT ID | NAME OF PRODUCT |
| --- | --- |
| CHB167938D | Milk chocolate “Paradise pleasure” large |
| CHB167938M | Milk chocolate “Paradise pleasure” small |
| CHB165958D | Dark chocolate with orange peel large |
Table 3 with product shipping locations
| DATE | SERIAL NO | LOCATION | DESTINATION NAME | TYPE |
| --- | --- | --- | --- | --- |
| 2022-04-14 | 22020300052476 | Warsaw | “Polka” Restaurant | Restaurant |
| 2022-02-03 | 22020300052477 | Kutno | FHU Jan Kowalski | Shop |
| 2022-02-03 | 22020300052478 | Wrocław | “Lemur” Supermarket | Supermarket |
Tables 1 and 2 are located in one database, while Table 3 is stored elsewhere. Before sending the data from Tables 1 and 2, we can process them at the database level using a view. This way, we select and forward only the data we need, reducing the amount of data sent to SAP Datasphere and shortening the transfer time from one location to the other. The table passed to the data warehouse will look as follows:
Table 4 created from tables 1 and 2
| DATE | SERIAL NO | PRODUCT ID | NAME OF PRODUCT |
| --- | --- | --- | --- |
| 2022-02-03 | 22020300052476 | CHB167938D | Milk chocolate “Paradise pleasure” large |
| 2022-02-03 | 22020300052477 | CHB167938D | Milk chocolate “Paradise pleasure” large |
| 2022-02-03 | 22020300052478 | CHB165958D | Dark chocolate with orange peel large |
| 2022-02-03 | 22020300052391 | CHB167938M | Milk chocolate “Paradise pleasure” small |
Let’s call it “Table 4”. We send Tables 3 and 4 to the warehouse, where the transformation is performed.
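The database-level selection described above can be illustrated with a view. The sketch below uses SQLite as a stand-in for the source database; the table and column names are invented for this example, but the idea is the same: join Tables 1 and 2, keep only the needed fields, and derive the date from the timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production (ts TEXT, machine_no TEXT, serial_no TEXT,
                         product_id TEXT, weight REAL);
CREATE TABLE products (product_id TEXT, name TEXT);

-- The view joins the two tables, keeps only the needed fields,
-- and derives the date from the timestamp. Only this reduced
-- result would be transferred to the warehouse.
CREATE VIEW production_named AS
SELECT date(p.ts) AS date, p.serial_no, p.product_id, n.name
FROM production p
JOIN products n ON p.product_id = n.product_id;
""")

conn.execute("INSERT INTO production VALUES (?, ?, ?, ?, ?)",
             ("2022-02-03 23:48:03", "TX05WM", "22020300052476",
              "CHB167938D", 100.0))
conn.execute("INSERT INTO products VALUES (?, ?)",
             ("CHB167938D", 'Milk chocolate "Paradise pleasure" large'))

row = conn.execute("SELECT * FROM production_named").fetchone()
# row is ('2022-02-03', '22020300052476', 'CHB167938D',
#         'Milk chocolate "Paradise pleasure" large')
```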
The second step is the transformation, which we’ll carry out in SAP Datasphere. In our case, we only need to merge data from two tables, but this tool allows much more, such as various calculations, formatting fields, renaming columns, and adding or deleting them.
To combine data from two tables, we must first create a place to store the processed information – a new table is the proper location.
For our analysis, we need the production date, the product name, its serial number, and details of where the product was shipped. To match records for the same product across the two tables, we need a join key: a value present in both tables we want to consolidate and unique for each item. In our case, that value is found in the SERIAL NO columns – a unique number for each chocolate bar. Based on it, we can combine the data from both tables. The table created as a result of this transformation looks as follows:
Table 5 created from tables 3 and 4
| PRODUCTION DATE | SERIAL NO | PRODUCT NAME | LOCATION | DESTINATION NAME | DESTINATION TYPE |
| --- | --- | --- | --- | --- | --- |
| 2022-02-03 | 22020300052476 | Milk chocolate “Paradise pleasure” large | Warsaw | “Polka” Restaurant | Restaurant |
| 2022-02-03 | 22020300052477 | Milk chocolate “Paradise pleasure” large | Kutno | FHU Jan Kowalski | Shop |
| 2022-02-03 | 22020300052478 | Dark chocolate with orange peel large | Wrocław | “Lemur” Supermarket | Supermarket |
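The join on SERIAL NO can be sketched in plain Python; the sample values below are taken from the tables above:

```python
# Table 4 keyed by serial number: (production date, product name).
table4 = {
    "22020300052476": ("2022-02-03", 'Milk chocolate "Paradise pleasure" large'),
    "22020300052478": ("2022-02-03", "Dark chocolate with orange peel large"),
}

# Table 3 rows: (shipping date, serial no, location, destination, type).
table3 = [
    ("2022-04-14", "22020300052476", "Warsaw", '"Polka" Restaurant', "Restaurant"),
    ("2022-02-03", "22020300052478", "Wrocław", '"Lemur" Supermarket', "Supermarket"),
]

# Inner join on the serial number - the key present in both tables
# and unique for every chocolate bar.
table5 = [
    (table4[serial][0], serial, table4[serial][1], loc, dest, typ)
    for _, serial, loc, dest, typ in table3
    if serial in table4
]
```

In SAP Datasphere the same result is achieved graphically or with SQL, but the logic is identical: match rows on the shared unique key and keep the columns needed for the report.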
The final step in the ETL process is to send the processed data to the location where it will be used. Once the data has been imported into SAP Analytics Cloud, we can proceed to analyze it and present the results in charts prepared in this tool.
In conclusion – using ETL tools is a solid practice for working with data. This approach brings guidelines and principles that organize the process and streamline the data flow, making it easier and quicker to gain better-quality insights.