More examples of Talend. Executing SQL statements constructed at runtime.

After completing the Time dimension with using Talend ETL processes, to review records DWD_TIEMPO table, we see that for some years, the last week of the year was filled with the value 1. The explanation is that Java uses the ISO standard for the number of weeks, and it can never exceed 52. Therefore, for some years, the last week of the year has been recorded with the value 1.

This issue serves as a base to develop our FOLLOWS ETL process that anger encandenado to generate records of the time dimension, which will aim to arrange the records that have been wrong in the database.

The procesor have the following steps:

1) Recovery for each year, the greatest number of weeks recorded in the table: To do this, we execute a SQL statement, using the component group TMySqlInput Databases, Mysql).

The statement executed is:

"SELECT dwd_tiempo.ano_id, max (dwd_tiempo.sem_id) weeks as
FROM dwd_tiempo group by dwd_tiempo.ano_id
order by dwd_tiempo.ano_id " 

This control generates a stream with all the records returned by the SQL statement and each record will do the following:

2) For each year, we execute a SQL statement built at runtime with the data passed by the previous control, to fix the wrong week number (using as many weeks + 1).To do this, use the control tMySqlRow. This control lets you execute a SQL statement for each record in the flow stream and transmitting the job to the next step.

The statement executed is the following (note as we build the SQL statement by concatenating pieces of fixed text with the values ​​of the row variable, which will be passed by the previous sequential component):

"Update in September dwd_tiempo sem_id =" + row2.semana +
"+ 1 WHERE Ano_ =" +
 row2.ano_id + "and sem_id = 1 and Mes_ = 12" 

3) We finished the process of correcting the field under semano_id compound, which is also wrong and it is for the week of the year in the notation YYYY-SS, where YYYY is the year and SS is the week.To do this, we also use tMySqlRow control.

The statement executed is as follows (also built by concatenating pieces of fixed text with the values ​​of the row variable, which will be passed by the previous sequential component).In this case, we are using MySQL functions (CONCAT and CAST) to get an idea of ​​the power of SQL language in combination with the use of Talend variables:

"Update enobi.dwd_tiempo" +
"Set semano_id = CONCAT (CAST (AS CHAR ),'-', Ano_ CAST (AS CHAR sem_id))"
+ "Where Ano_ =" + row2.ano_id + "and Mes_ = 12" 

Steps 2 and 3 could have been made andn a single SQL statement, but we separated for clarity.

The entire process would be:


Full Outline of Talend Job in the settlement of weeks

We continue to advance our project and one of the things that is becoming clear is that knowledge of BI consultant must cover many areas: databases, sql, some programming languages ​​(Java in the case of Talend), tools modeling, ETL tools, theory of multidimensional data modeling and its variants, some statistics for data mining, perception of ERP's, CRM's, etc..That without the knowledge of business, business indicators and the different view that there will be to contribute to the company you realize the project. We again remind ourselves of what he said Jorge Fernandez on his blog (" The BI consultant, that weirdo ").

Then, we will perform the ETL process for the Product dimension.