Data science for data-driven startups

MySQL/MariaDB for Data Science: Pros and Cons

Is MySQL or MariaDB suited for the analytics workload of data science? Can it be used as a data warehouse? Let’s find out.

Supercharged Excel for startup analytics with PowerBI

You think Excel is not suitable for analytics? Let me convince you that Excel can be your best tool with PowerBI.

Apache Spark on Windows without winutils.exe

To use Spark on Windows, you normally need to install winutils.exe and change some environment variables. Here is a nice fix.

PostgreSQL for data science: pros and cons

Is PostgreSQL a good companion for a data scientist at a startup? At which maturity stage should it be used? Let’s find out!

Hadoop landscape review 2013

I’ve spent some time lately digging into the Hadoop ecosystem, both through a product survey and some hands-on work. Here are some remarks about […]

Data Manipulation Part 2 : ETL

My last post discussed SQL queries. Sometimes, however, data comes from different databases. In such cases, it is no longer possible to use SQL. […]

Data Manipulation Part 1 : SQL

Data manipulation is a big part of the data mining process. Some authors claim it can take 80% of a data mining project. I could […]

Data mining tools

When it comes to data mining, the tool you use is very important. It seems that people use many software packages (see How many software packages […]

Using MySQL as a Data Warehouse

I often use a database not only to store data but also to do some preprocessing before mining and some analysis. I use MySQL as […]
