While Microsoft have great tools for data science (Excel being one of the best front end lately with PowerBI powers), they are also great at shooting themselves in the foot. I mean, connecting to the universally acclaim open source PostgreSQL database (check PostgreSQL for Data Science Pro and cons) should be working right out of the box? Just like any other BI tools, right? You bet …
While they were kind enough to put PostgreSQL in the database category, it just doesn’t work when you select it.
Most tools would have a “Take care of it” button to solve the problem, but here you just have a link to a release page of an unknown project on github with no clue of what to do.
The easy path: Using ODBC
The easier way is to use an ODBC connection (with the ODBC connector in PowerBI and not the PostgreSQL one). ODBC stands for Open DataBase Connectivity. It provide an unified interface that each database provider can implement by providing an ODBC driver that you can install on your computer.
PostgreSQL has a dedicated website for it. You just have to go to the download page and take the last MSI version which has x64 in it (it should match the version of PowerBI which is always 64bits I think).
You can then add a datasource inside the ODBC Source manager (Windows key then type ODBC) where you can add a data source (PostgreSQL UNICODE). Pick the Unicode version as it’s better for text handling.
You can also don’t add a Datasource in the ODBC tool and type all the connection information in PowerBI as done in this blog article. As I use Excel as well, I personally prefer to define setting only once.
In all cases, credential settings are not used, as PowerBI will ask you for them the first time and store them in File -> Option and settings -> Data source settings.
Let’s go back to setting up the PostgreSQL (non ODBC) connection. The github page of Npgsql allows us to install the tool using an MSI installer. When you install it, be careful to add the feature Npgsql GAC Installation. It will not work without it.
After restarting PowerBI, it should now work.
If you use schemas, I recommend that you select “Navigate using full hierarchy” under the advanced options.
If you don’t use the default port of PostgreSQL, you must add it in the server section like “myserver.com:1111” if you use port 1111.
But your pain is not over yet, as you will most likely end up with this error:
PowerBI complains that it is not sure if the remote server is really legit as the SSL certificate is not approved by an authority. It will not propose you to accept this certificate.
The easy way around is to go to ” File -> Option and settings -> Data source settings”, go to the data source, and unselect the “Encrypt connections” option. Obviously not a good solution as every exchange between the server and PowerBI will be unencrypted.
If you have an user account on the PostgreSQL server (if it’s Linux), you can go back to some security by setting up a SSH tunnel with the following command (assuming you have Linux tools on your Windows).
ssh -L 5432:localhost:5432 email@example.com
While the tunnel is up you can connect to the remote database by connecting to your local computer (server = localhost). I stop here, because I’m unlikely to convince you that it is easy or convenient. Just remember that this trick is also useful to connect to a database that is not listening on the network.
SSL and Let’s Encrypt
At this point, you need your PowerBI to accept that the PostgreSQL server certificate is legit.
One way is to add the certificate to your Windows whitelist. You can see this post for the procedure if you use an AWS RDS server. If you are not on Amazon or a big cloud provider, it’s unlikely to be easy to create a chain of certificate to satisfy Windows. You can read the SSL PostgreSQL page if you have some hours to spare (to save you some time, you can use this website to convert PEM files to PKCS#7).
Luckily, there is a way to certify your server from a recognized authority with Let’s Encrypt. For instance, this website is using HTTPS and is recognized secure by your web browser thanks to Let’s Encrypt (like tens of millions of website). Obviously, your server should be reachable from internet (as the authority that certify that you are who you claim use internet to check).
There is a good tutorial from Pavel Evstigneev to help you with that.
Now your server is legit, and it works on PowerBI Desktop but you are not done.
And you sought that the cloud was in the cloud?
Now, you have a proper PostgreSQL connector, your database is using SSL so everyone should be happy. So, you publish your report on PowerBI Service and hit refresh.
And then it fails miserably.
Somehow, for Microsoft, the cloud is a synonym for Microsoft and some partners products. For the rest, you need to set a gateway and act like they are “on premise”.
The PowerBI Gateway is a program that run under a Windows computer. There is two flavor of it, the personal and the enterprise one. If you are provisioning a Windows VM, I see no reason to go for the personal mode.
As we are under a Windows server, we can stay with an ODBC connection. Remind that such ODBC connection by be system wide and not just for the user (by default, it’s the user). I never got Nsql work on PowerBI Service and haven’t tried lately.
Now that the gateway server is set up, you can go back to PowerBI Service and select Settings from your Report/Dataset. Under the Gateway connection, you can check that the gateway work and you need to provide (easy to forget) the mapping from the ODBC connection in the PowerBI Report to the ODBC connection under PowerBI Gateway (most likely the same name).
Congrats, you now have a PowerBI Service working on top of a PostgreSQL database. Quite an adventure, right?
While you are here, if your PostgreSQL is a data warehouse you might want to check Data Brewery. It’s an ETL tool to supercharge your data warehouse.
Let's stay in touch with the newsletter