To use Spark on Windows, you need to install winutils.exe and set some environment variables. Here is a nice fix.
data warehouse, postgresql
Is PostgreSQL a good companion for a data scientist at a startup? At which maturity stage should it be used? Let’s find out!
data manipulation, hadoop
I’ve spent some time lately digging into the Hadoop ecosystem, through both a product survey and some hands-on work. Here are some remarks about […]
data manipulation, data warehouse
My last post discussed SQL queries. Sometimes, however, data comes from different databases. In such cases, it is no longer possible to use SQL. […]
Data manipulation is a big part of a data mining process. Some authors claim it can take 80% of a data mining project. I could […]
data mining, statistics
When it comes to data mining, the tool you use is very important. It seems that people use many software packages (see How many software packages […]
data warehouse, mysql, olap
PS: This post is quite old now and isn’t relevant anymore. MySQL 5.6 introduced hash join, which basically makes it more suitable for a data […]