Data science for data-driven startups

Apache Spark on Windows without winutils.exe

To use Spark on Windows, you normally need to install winutils.exe and set some environment variables. Here is a neat workaround.

PostgreSQL for data science: pros and cons

Is PostgreSQL a good companion for a data scientist at a startup? At what stage of maturity should it be used? Let’s find out!

Hadoop landscape review 2013

I’ve spent some time lately digging into the Hadoop ecosystem, both through a product survey and some hands-on work. Here are some remarks about […]

Data Manipulation Part 2: ETL

My last post discussed SQL queries. Sometimes, however, data comes from different databases. In such cases, it is no longer possible to use SQL. […]

Data Manipulation Part 1: SQL

Data manipulation is a big part of any data mining process. Some authors claim it can take 80% of a data mining project. I could […]

Data mining tools

When it comes to data mining, the tool you use is very important. It seems that people use many software packages (see How many software packages […]

Using MySQL as a Data Warehouse

PS: This post is quite old now and is no longer relevant. MySQL 5.6 introduced hash join, which basically makes it more suitable for a data […]
