Pentaho SSO with CAS and LDAP: Tips and Notes

After hearing various comments and setting up some more production implementations, I figured I would share some tips beyond the straight setup here. I will reserve the other post for keeping the basic instructions and use this one for any additional notes as I come across them.

Version compatibility

Many people seemed to have trouble with the directions using more recent versions of the software than this was made on.  I have successfully connected Pentaho 4.1 to CAS 3.4.11 following these directions, so they will most likely remain valid until Pentaho implements a more recent version of Spring Security (which would probably be Pentaho 5 at the earliest).  The only part of the install that is absolutely version specific is the spring-security-cas-client-2.0.5.RELEASE.jar file.  A later version of this jar will not work.

The difficulty connecting that many people seem to have most likely comes from applicationContext-spring-security.xml file.  Since the directions were made with a specific version of CAS, all of the URL references in it are made to the “http://localhost:8080/cas-server-webapp-3.4.8/” version of the webapp.  Since you can technically name the webapp anything, these URL references need to be changed to match, such as “http://localhost:8080/cas-server-webapp-3.4.11/” for the current version.

Security

While CAS will complain on its login page about not using SSL, I am as of yet to find a version that will actually not allow pentaho login. This does not mean that certain setups will not run into this problem, and many other CAS clients require SSL connections  Either way, setting up your server to use SSL is always a good idea.  The instructions apache provides (http://tomcat.apache.org/tomcat-6.0-doc/ssl-howto.html for the version of tomcat pentaho uses) should be sufficient for setting this up, and remains mostly consistent between later versions of tomcat.

Again changes need to be made to the applicationContext-spring-security.xml file’s URLs, this time make sure to change the “http://localhost:8080″ to the equivalent secure URL such as “https://localhost:8443″ to make sure it actually uses the SSL.

XML Encoding

This is something that I’m sure some people have run into before, namely that when copying the XML over it sometimes cannot be read by the server.  The primary cause of this is due to the XML file requiring UTF-8 encoding, and many text editing programs (especially in Windows land) tend to automatically save to more modern text encodings.  So When editing these files, make sure to force UTF-8 encoding when you save.

Authentication vs Authorization

CAS like its spelled out name implies is designed to handle authentication, i.e. verifying a valid username and password combination.  This does not include authorization, i.e. checking if a user is actually allowed to access something.  Authorization is handled on the client end, in this case Pentaho.  Any time a user successfully logs in to CAS but does not have any authorization in pentaho, they will be redirected to the default public/casFailed page which does not exist by default.  As usual, this can be changed in the applicationContext-spring-security.xml file.

Normally both are handled in Pentaho, with authentication handled in the applicationContext-spring-security.xml that is modified as part of the CAS integration, and also in another file that is imported through the pentaho-spring-beans.xml file, the default being called applicationContext-spring-security-hibernate.xml. After CAS is integrated, this imported file is mostly ignored, as CAS is bypassing all of the beans in it.

Authorization is handled primarily by another file imported through the pentaho-spring-beans.xml file called applicationContext-pentaho-security-hibernate.xml by default.  Like the similarly named file mentioned previously, there are a number of alternate versions that can be used instead of the default hibernate version, such as -jdbc (checks against a database query), -ldap (checks against an ldap server), -memory (checks against data written directly in the XML file).

I haven’t fully explored all of the possibilities those different authorization options may provide, so it is very possible that there is an elegant solution available in modifying those files.  If the user base for a pentaho deployment is not especially large and/or does not change often, updating user permissions by hand may be a more cost effective solution than spending hours coming up with a custom solution.  When confronted with this same problem with a client who had potentially thousands of users, I went to the Pentaho developer’s multipurpose tool:  PDI.

This ETL is designed to check the USERS table found in the hibernate database that Pentaho CE comes with against an LDAP server and make comparisons for any new users, which it then adds to the Hibernate database with the basic Authenticated role.  The main advantage of this technique is that it is not difficult to implement against any size, and can be scheduled to run regularly to keep the records up to date.  It also allows the Admin Console to still be used to add additional roles to the users.  The main disadvantage is the fact that there will always be a time delay until its next scheduled run or the next time someone manually does so.  It also lacks flexibility, as it grants the same permission to every user it adds, so still requires human intervention to make users an admin or some other role. This could probably be overcome with further development and more precise deployment specifications from a client, and in its current form is still very effective at authorizing large groups of users at once.

 

If there are any other solutions out there, or other comments about these or any other issues, let me know.

Posted in BI Server, Pentaho | Tagged , | Leave a comment

PowerPivot and Production Processes

I am a strong advocate of Microsoft Excel and PowerPivot for the analysis and organization of information.  While it may seem odd to many, Microsoft should be applauded for the newest release of Excel and, in particular, the free add-on of PowerPivot enabling the business intelligence like crunching of information.  I’m impressed.  As with most good things, there’s a caution that comes with this new tool.

I’ve had the implications of poor internal controls beat into me many moons ago when I managed an IT Auditing group for an international firm.  As a result, I can’t shake the butterflies when I see organizations relying on these tools for complex and manually intensive data analysis and reporting.  Splitting large files into chunks of under one million rows for PowerPivot and combining the reporting results on the back can introduce an opportunity for inconsistency.  Combining that with the manual intervention necessary to integrate multiple data sources compounds the risk of inconsistencies; particularly if the individual with all the knowledge of manual manipulation is not available for whatever reason.  Ouch.

To be clear, I am not discouraging the use of Excel and PowerPivot.  They are excellent tools and I use them quite effectively on an ongoing basis.  However, if you need a repeatable, reliable process to generate information used in key decisions, get yourself a production worthy tool.  Spend your time utilizing the information as opposed to manually crunching it.  The number of affordable solutions that can solve your production challenges might surprise you.

Posted in Uncategorized | Leave a comment

How to Use MySQL and Microsoft PowerPivot Together

Overview

This document shows how an Open Source Database like MySQL can be used with Microsoft’s popular Excel plugin PoweverPivot.  Many clients enjoy the familiarity of Excel and ease of use that PowerPivot brings but the clients do not want to married with any one particular vendor in their Business Intelligence process.  Consumers should have the choice and be aware that they can mix and match their solutions if they so choose.  

Technologies Used

Please go to the corresponding web pages to download if you don’t have them.

  • Windows 7 x64
  • Microsoft Excel 2010 
  • Excel PowerPivot Plugin (http://www.powerpivot.com/)
  • MySQL Workbench 5.2 (http://dev.mysql.com/downloads/workbench/)
  • MySQL 5.5.13 (http://www.mysql.com/downloads/mysql/)
  • MySQL 5.1 ODBC Driver (http://dev.mysql.com/downloads/connector/odbc/)

Define your MySQL connection on Windows

If using Windows 7, go to the search bar at the start menu and type in “Data Sources” and then click on the Data Sources (ODBC) icon that appears.  A window will pop up.  Verify your installation of MySQL ODBC Driver has installed by click on the Drivers Tab.  Look for the MySQL ODBC 5.1 Driver listed on the left:


Click on the System DSN tab and then click Add.  The MySQL Connector window will open up.  Fill in the appropriate information.  If you’ve installed it locally, type local host for your Server or the IP Address , or the host of your remote server.  Be sure to test your connection and give your connection a meaningful name and click OK.  


You should see your connection as below (Other connection names erased for confidentiality):

Click OK .

Viewing your MySQL information with MySQL Workbench

The reason I include showing how to use MySQL Workbench is to show those new to SQL or MySQL how to get started easily by visualizing the database.  MySQL includes a very useful tool for seeing the objects in the database.  It also does a variety of other useful features in one place and is somewhat similar to SQL Server Management Studio and the biggest advantage is that it’s free.  

When you first go into the Workbench, you’ll have to define a new connection.  Click new Connection at the bottom left of the screen:


This screen will pop-up: 


Fill out the information like you did for the ODBC driver.  I chose my default schema to be my “scratch” database.

Click on the close button.  Now double click on your new connection you’ve just created:


You’ll see how it organizes all your tables and views very nicely by schema.  You can very easily see the table structure.  This is what I recommend using to build your SQL or explore the database.  The part that malfunctions when using MySQL with PowerPivot is when you try to import whole tables.  However, it’s easy to write simple SQL using Workbench to put into Power Pivot.   Workbench can write a lot of it for you. Not only that, you’ll learn more about your data by seeing it visually.

Connecting MySQL to PowerPivot

Open Excel and then click on PowerPivot tab and then on the PowerPivot Window icon.  You’ll want to then click on the “From Other sources” icon:

Scroll down and then go to the Others(OLEDB/ODBC).  Click on it and then click “Next” and the Table Import Wizard will come up.

Let’s call our connection “Sales.”  Then click on the build button.  What’s a little tricky here is that the window goes to the “Connection” tab first which defaults you to a Microsoft SQL Server driver first.  Click on Provider and then select  Microsoft OLE DB Provider for ODBC” as shown below:


Click “Next” at the bottom.  Find the Local MySQL we defined earlier or your remote MySQL.  Select your default catalog (schema) to use and then test your connection.  Once tested press OK.  

You’ll see an OLE DB connection string created for you.  Now you’ll want to press next:

The first option below will not work because MySQL does not allow brackets in the SQL written to it.  This is the SQL that PowerPivot writes by default.  You can use this option to see what’s available in the database to create a simple Select * form table format.  You can at least the list of tables and be able to use that to get a simple Select * (this is all that this wizard writes for SQL anyway).

Click Back, and go back to the second option “write a query that will specify the data to import.  Write a simple query to get the table you’re looking for.  We will grab the sales table.  You can click the validate button to make sure your SQL is good to go:

Click finish and you’re done!  You’ll be able to import the data right into PowerPivot without any issues. There are a lot of different types of databases that can be imported into PowerPivot because most databases have ODBC connectivity.  The part that makes it tricky is that:

Not all databases allow brackets in their SQL, but this is easily overcome by using your own SQL.  You can still preview the tables to make it easier to know what tables to grab.

That you have to use the Microsoft OLE DB for ODBC.  I wonder if Microsoft was trying to make it harder on you by using the SQL Server driver by default and putting you on the connection tab by default.  . . .

You now can use nearly any database you want and enjoy the ease of use of PowerPivot.  Enjoy!

Using a non-Microsoft OLAP Engine?

You can in fact use an OLAP data source other than Microsoft Analysis Services, but it’s a little harder to implement like Pentaho’s Mondrian OLAP engine.  One easy way to do it is to use  JasperSoft’s ODBC connect.  Unfortunately, it’s not Open Source.  I’m sure there’s a way to to do this since it looks like it works based on XMLA.  I believe that’s how SAP’s OLAP engine works with PowerPivot.   

Posted in Business Intelligence, Microsoft, Mondrian, MySQL, OLAP, Pentaho, PowerPivot, Reporting | Tagged , , , , , , | 3 Comments

Pentaho SSO Setup using CAS and LDAP

Hello.  Today I’m going to be discussing how to go about setting up SSO (Single Sign On) support for Pentaho using CAS (Central Authentication Service) and an LDAP (Lightweight Directory Access Protocol) Server.  By doing this, you can allow Pentaho to utilize the same usernames and passwords as your other programs requiring authentication, removing the burden of having to track and maintain multiple usernames per employee, as well as allow CAS compatible programs to share session data so that end users only have to log in once for all of them.  Continue reading

Posted in Business Intelligence, Pentaho | Tagged , | 19 Comments

Pentaho Analysis Mondrian OLAP Security Setup – Part I

Hello. This is the first Datamensional blog and personally the first blog I’ve ever wrote! Yes, I know I’m a little behind the times but our team aims to add useful and interesting knowledge for both Pentaho and the Business Intelligence communities.

What I have here is a YouTube Video that shows how to setup security roles for Mondrian on the Pentaho BI Server. The BI Server doesn’t come preconfigured to do Mondrian roles and there is actually quite a few ways to configure your Mondrian. I’ll show you the most common way in this video. (I think they should mention this in the next Pentaho Bootcamp. In fact, I’m teaching the next one in July ’11 in San Francisco and I’m going to bring that up)!

Now why on earth would you want to go through all this trouble? (Well, it’s really not too much trouble if you know exactly what to do, you could do it in about 10 minutes). What’s great about using role-based security in Mondrian is you can restrict data within the Mondrian Data Model also known as a cube based on the user who logs in. That user is then associated to a security role, such as California. The classic example is if a manager of sales who is responsible for the California sales territory logs into Pentaho and wants to view their sales cube, they will only see their California sales data. The advantage to the BI Developer is that they can centrally manage all access on the BI Server and only create one cube for the entire installation.

Without further ado, here is my YouTube video which shows you how to configure the BI Server for Role-based security.

I’ll be releasing another video in blog soon to show you how to implement this security in the cube itself. Stay tuned . . .

Posted in Business Intelligence, Mondrian, OLAP, Pentaho | 13 Comments