Pentaho Kettle Solutions: Reviews

Good ..

I just finished reading Pentaho Kettle Solutions. The book is so extensive and interesting that it took me longer to read than I expected, and I was pleasantly surprised by the quality of the content and the diversity of topics covered.
Here is my review:
1) Overview: This is a very enjoyable book to read, which tackles many complex issues while keeping the explanations easy to understand. Its highlight is that it was written by professionals who have worked with PDI (aka Kettle) for many years, have made numerous contributions to it, and have even led its development (in the case of Matt Casters). There is a big gap between knowing what a 'step' is for and knowing why that 'step' was created, and this difference is noticeable throughout the book.
Although Pentaho Kettle Solutions is not intended for beginners, it is ideal for anyone who currently works with PDI or wants to in the future, covering topics that are vital to keep in mind when using this tool.
2) Detailed Review: The book begins with an introduction to ETL and the features an ETL tool should have, then presents Kettle and describes its characteristics, how to install and run it, and how it is designed, i.e. details on transformations, jobs, hops, data types, repository types, parameters, variables, etc.
It then presents examples of taking data from a transactional database and loading a data warehouse, addressing complex issues such as slowly changing dimensions, change detection in the data (CDC, Change Data Capture), denormalization, etc.
It continues with an explanation of the 34 subsystems of the ETL process as defined by Ralph Kimball, and then addresses each subsystem from Kettle's perspective, with emphasis on how Kettle resolves each situation and with an illustration in each case. The topics worth highlighting are:
  • Job execution: backtracking, parallel execution, slave servers (Carte).
  • Running transformations multi-threaded, row distribution, clustering and partition management.
  • Connections to databases: general and advanced options, pooling, clustering, management of connections and transactions.
  • Performance and scalability.
  • Extraction of data, data profiling (using DataCleaner), CDC.
  • Data cleansing, management of different types of errors, auditing, duplicate data, scripting.
  • Key management, loading dimension tables (star and snowflake schemas), implementation of different types of dimension tables.
  • Different types of fact tables, bulk loading, loading and handling.
  • Extract data from multiple OLAP technologies.
  • ETL development lifecycle, good and bad practices, agile development, testing, debugging and documentation.
  • Scheduling (cron, at, xaction, PDS and Pentaho) and monitoring.
  • Using Dynamic Clusters (Amazon EC2).
  • Real-time data integration.
  • Handling complex data formats (non-relational, unstructured).
  • Using web services, with examples of XML, SOAP and RSS.
Toward the end, the book details how to get and compile Kettle, how to use Kettle through its Java API with examples, and how to develop your own plugins to extend Kettle!
Pentaho Kettle Solutions addresses data integration (the 34 subsystems) and systematizes a whole range of concepts, examples, best practices, and design and performance issues, making it a recommended option both for those who belong entirely to the BI world and for those who need to do any kind of data integration.
3) Final Review: This is definitely another essential book, whether you are working with Pentaho BI solutions or implementing transactional systems, because data integration covers many topics and addresses many issues that are present in any company or organization that has transactional systems or BI systems, or that uses a DBMS or simple spreadsheets.