When running an ETL in Pentaho Data Integration (Kettle), logging is normally done in a log file on the local system. When running an ETL through Carte or the BI server, it will add the information to their own separate local log file. These directions will allow the log information to be placed in a centralized database, keeping them all visible in one place, and also enabling PDI’s powerful performance monitoring. This was originally created using Pentaho 3.8 and an Oracle 11g database.
Find the kettle.properties file
You’ll want to modify your kettle.properties file which is in the user folder of both Linux and Windows. Spoon, Carte, and the BI Server will all look in the same place. Find your kettle.properties in these locations depending on your OS:
- Windows: C:\<User>\.kettle\kettle.properties
- Linux: /home/<User>/.kettle/kettle.properties
The .kettle folder might be hidden. Open with a great Text Editor like Notepad++
Setting up the connections to your Database
Ensure that you have full access to do inserts and create tables. It’s recommended you set up your connections on your BI Server and/or Carte as JNDI connections. Ensure you have the proper JDBC drivers locally as well as on the servers.
Make sure that they have the same connection name. See the directions on how to setup a local JNDI connection in Spoon. You could potentially just store the connection information within the Spoon job as well and not setup Server based connections and avoid JNDI altogether. Though that would make it more difficult to pool connections and switch from Development and then production environments.
Adding the Variables:
Because of a bug in Pentaho, you should not fill in any variable related to Schema, especially if it’s implied what the default schema is. Add the following to your kettle.properties on both your server and local machine you use Spoon on:
Testing Logging to Database on Spoon
Save your file locally first in on the specified .kettle folder depending on your OS. Make sure you shut down Spoon if it’s not shut down already. Then reboot the application.
Check to see that your Kettle.properites file was properly loaded. Go to Edit Edit the Kettle Propreties File. You should see variable names on the left correspond to the values on the right you just edited in the file. If you don’t see any values filled out, you’ve got your file in the wrong place. The picture below actually shows a file with empty log variables. You could optionally fill in the values here or go back to your file. Either way, you’ll have to restart Spoon after making your modifications.
Set up your Jobs and Transformations for Logging
Next, open a Job you want to log into your logging tables. Go to Edit and then Settings. Click on the logging tab. There are three sections each representing a different logging table for jobs. Make sure for each section all the appropriate fields are selected and that you fill out all the appropriate environment variables you defined in the kettle.properties file. Most people will pick all of them. If you haven’t done so already, click on the SQL button. It will create all necessary tables for all sections.
Now open a transformation, and go through exactly the same motions as a job. Notice there 4 sections instead of three. Make sure everything is checked and that all tables are created.
Moving your Logging to Carte or BI Server
This assumes that you have created a JNDI connection in Spoon with the same name as the connections on your server. Also, you’ve verified the connections work and have all the appropriate database drivers.
Make sure you move the kettle.properties file you’ve already configured locally to the server in the appropriate user folder and the .kettle folder as illustrated above. You’ll have to shutdown and restart the server since you are adding new environment variables to the server instance.
Now that everything is configured locally, on the sever and we’ve made the appropriate changes to the solution files, we can move the solutions to the server. Simply move them to Carte or for the BI Server, move the solution files with corresponding XACTIONs that execute the solution files. You’ll have to refresh the repository as necessary.
Execute the solutions on the server. All logging should go to the exact same place locally as well as remotely.
Make a template for both Job and Transformation files with all the appropriate logging configuration setup. This way you don’t have to make them over and over again. If you have one job executing other jobs, it will pick up the logging of the children jobs. If executed alone, it will not. The master job will not pick up the transformation logging unless specified in the file itself.
Extra Hot Tip: Kinda a big deal if your tables begin to lock:
Since PDI is designed to be platform agnostic, the logging table mechanism attempts to find a happy medium between all databases. However, there are some database specific things up you must do especially if you’re processing a lot of data to avoid concurrency issues and you may get a table lock. We ran into this problem with Oracle 11g when processing about 8000 rows/s through PDI. Matt Casters, the founder of the Kettle project, has a Pentaho Forum post that shows you how get around the concurrency table lock issue using both Oracle and MySQL on this Pentaho Forum post.