
Data Integration (EP 1) – Give me your data

Two pieces of jargon to know: ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform)


Greetings, everyone, and myself … again!!

This is my new series about data management. Let’s go!


I should explain that my main role, data engineer, is to maintain data: all of the data our organisation takes care of, wherever it lives and whatever form it takes. The core idea is to manage it all in a place and format that our customers can access and use with ease.

Normally, these are the main questions we must answer before taking action:

  1. Where is the source?
    e.g. CSV files, Excel files, APIs provided by some websites, or database systems
  2. Where is the destination?
    e.g. data from Excel files will be loaded into our database system
  3. How do we transform the data?
    e.g. we derive a gender field by applying a condition to a form of address
  4. When and how often does it run?
  5. What happens after the process?
    e.g. move the source files to backup folders
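
To make these five questions concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the `JobPlan` class, the file and table names, and the title-to-gender mapping are hypothetical, not part of any tool mentioned in this post.

```python
from dataclasses import dataclass

# Hypothetical mapping from a form of address to a gender value.
TITLE_TO_GENDER = {"mr": "male", "mrs": "female", "ms": "female", "miss": "female"}


def derive_gender(title: str) -> str:
    """Return a gender for a form of address, or 'unknown' if not recognised."""
    key = title.strip().rstrip(".").lower()
    return TITLE_TO_GENDER.get(key, "unknown")


@dataclass
class JobPlan:
    source: str        # 1. where the data comes from
    destination: str   # 2. where the data goes
    transform: str     # 3. how the data is transformed
    schedule: str      # 4. when and how often the job runs
    post_process: str  # 5. what happens after the job


# Example plan; all values are made up.
plan = JobPlan(
    source="customers.xlsx",
    destination="database table 'customers'",
    transform="derive gender from form of address",
    schedule="daily",
    post_process="move source file to a backup folder",
)
```

Writing the plan down like this before building a job is just one way to check that none of the five questions has been skipped.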

There is a piece of jargon for this:

ETL (Extract-Transform-Load)

It means extracting data from the source, transforming (reshaping) it, then loading it into the destination. However, I usually do the following instead:

ELT (Extract-Load-Transform)

The difference is that ELT loads the raw data first, without transformation, and transforms it afterwards inside the destination. This can prevent data loss in some cases, at the cost of more storage space in our system.
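
The two approaches can be compared side by side in a small sketch. It assumes an in-memory SQLite database as a stand-in for the destination, and the sample rows, table names, and gender rule are all made up for illustration.

```python
import sqlite3

# Pretend this is what we extracted from the source: (title, name).
rows = [("Mr", "Somchai"), ("Ms", "Suda"), ("Dr", "Anan")]


def to_gender(title):
    # Hypothetical rule: derive gender from the form of address.
    return {"Mr": "male", "Ms": "female", "Mrs": "female"}.get(title, "unknown")


conn = sqlite3.connect(":memory:")

# ETL: transform in code first, then load only the transformed result.
conn.execute("CREATE TABLE etl_customers (name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO etl_customers VALUES (?, ?)",
    [(name, to_gender(title)) for title, name in rows],
)

# ELT: load the raw rows untouched, then transform inside the destination.
conn.execute("CREATE TABLE raw_customers (title TEXT, name TEXT)")
conn.executemany("INSERT INTO raw_customers VALUES (?, ?)", rows)
conn.execute(
    """
    CREATE TABLE elt_customers AS
    SELECT name,
           CASE title WHEN 'Mr' THEN 'male'
                      WHEN 'Ms' THEN 'female'
                      WHEN 'Mrs' THEN 'female'
                      ELSE 'unknown' END AS gender
    FROM raw_customers
    """
)

etl_result = conn.execute(
    "SELECT name, gender FROM etl_customers ORDER BY name"
).fetchall()
elt_result = conn.execute(
    "SELECT name, gender FROM elt_customers ORDER BY name"
).fetchall()
raw_titles = [r[0] for r in conn.execute("SELECT title FROM raw_customers ORDER BY name")]
```

Both paths end with the same transformed table, but only the ELT path still holds the original titles in `raw_customers`. If the gender rule changes later (say, to handle "Dr" differently), the ELT data can be re-transformed without going back to the source, which is the data-loss protection mentioned above, paid for with the extra raw table.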


Suggested tools

talend logo source: https://commons.wikimedia.org/wiki/File:Talend_logo.svg

Talend is a company that works on data management. One of its products is Talend Open Studio, which can be downloaded via the link below:

Pros: The community version is free of charge. We can turn to the forum if we run into problems.

Cons: It is RAM-hungry because it is based on Java. I recommend 8 GB of RAM as the minimum requirement.


Begin the lesson

Let’s say we have already downloaded the program. Once we open it, it asks us for a project. A project is like a folder for our work.

talend start

For example, we select “Local_Project” and click Finish.

talend welcome page

After choosing the project, we create a new job.

talend new job

For example, we name it “sample_job01”. A package installation window will appear. Those packages are external libraries that the components depend on. We skip it this time; we can install them later.

talend workspace

Yeah, we have finally reached the main window of the program and can start working in it.

In the next episode, we will see how to start a sample job.

See you next time 👋🏼

This post is licensed under CC BY 4.0 by the author.