Lately, I was thinking about the difference between machine learning and simulation (for prediction). Machine learning uses historical inputs and outputs to predict subsequent outputs. Simulation, on the other hand, assumes you already have the knowledge, i.e. the underlying model, so you don't need historical data to learn it. Sometimes you can use both methods to answer a question, sometimes only one is available. After thinking about it, I find that the distinction between them is thinner than I thought.

I think there is a continuum, and it depends on the amount of domain knowledge you add versus the amount of learning. Pure machine learning takes nothing but data; in theory, everything could be learnt from it. At the other end, pure simulation assumes that everything is known and that no noise is present.
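To make the two ends of the continuum concrete, here is a minimal sketch. The projectile-range task, the noise level and the feature choice are my own illustrative assumptions, not anything prescribed: the simulator computes the answer from known physics, while the learner recovers an approximation of the same mapping from historical pairs alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Known physics: ideal projectile range R = v^2 * sin(2*angle) / g.
G = 9.81

def simulate_range(speed, angle_rad=np.pi / 4):
    """Pure simulation: the model is fully known, no historical data needed."""
    return speed ** 2 * np.sin(2 * angle_rad) / G

# Pure machine learning: only historical (input, output) pairs are available.
rng = np.random.default_rng(0)
speeds = rng.uniform(5, 50, size=(200, 1))
ranges = simulate_range(speeds).ravel()          # stand-in for real measurements
ranges += rng.normal(0, 2.0, size=ranges.shape)  # plus measurement noise

# Note: using speed^2 as the feature is itself a bit of injected domain knowledge.
model = LinearRegression().fit(speeds ** 2, ranges)

print(simulate_range(30.0))          # the simulator's answer
print(model.predict([[30.0 ** 2]]))  # the learned answer, close but approximate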

Of course, in real life, it's a bit more complex than that. You have to inject some domain knowledge into your models to help them: post-processing rules, adding some nodes and leaves to a decision tree. For simulation, you often can't model everything, or there is always a parameter you need to estimate from historical data (it's more statistics than machine learning, but you still need data).
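Here is a hypothetical sketch of that "parameter estimated from historical data" case. The model structure (Newton-style cooling) is the domain knowledge; the decay constant `k` has to come from measurements. All the numbers are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed model structure (domain knowledge): exponential cooling,
# temperature(t) = ambient + (start - ambient) * exp(-k * t).
# The decay constant k is unknown and must be estimated from data.
def cooling(t, k, ambient=20.0, start=90.0):
    return ambient + (start - ambient) * np.exp(-k * t)

t_obs = np.array([0, 5, 10, 20, 30], dtype=float)    # minutes
temp_obs = np.array([90.0, 71.2, 57.8, 42.1, 32.9])  # hypothetical measurements

(k_hat,), _ = curve_fit(lambda t, k: cooling(t, k), t_obs, temp_obs, p0=[0.05])
print(f"estimated k = {k_hat:.3f}")
print(f"simulated temperature at t=60: {cooling(60, k_hat):.1f}")
```

The simulator is only usable once the statistics have been done: without data, even a structurally perfect model can't predict.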

Machine learning never gives perfect results because of the learning approximation. Nevertheless, the same applies to simulation, and it's sometimes worse: if you give wrong input data to a simulator, it will produce garbage. Machine learning doesn't suffer much if the error is consistent (for example, all your numerical inputs divided by 2). Moreover, machine learning can update its model faster than simulation when the underlying process changes; too much domain knowledge gets in the way there too. I'm sure you have already built a model with some domain knowledge where removing that knowledge gave better results. As always, there is a tradeoff to make.
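A toy illustration of the "consistent error" point, with an invented sin() relationship standing in for the true process: a learner trained on consistently biased inputs stays accurate as long as new inputs carry the same bias, whereas a simulator's fixed equations assume true units.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Suppose every sensor consistently reports half the true value.
rng = np.random.default_rng(1)
true_x = rng.uniform(0, 10, size=(300, 1))
y = np.sin(true_x).ravel()   # hypothetical true relationship

biased_x = true_x / 2        # consistent measurement error
tree = DecisionTreeRegressor(max_depth=6).fit(biased_x, y)

# The learned model still predicts well, because new inputs carry the same bias:
print(tree.predict([[3.0 / 2]]), np.sin(3.0))
# A physics-based simulator fed the biased reading would instead compute
# sin(1.5) and simply be wrong, since its equations assume true units.
```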

These two tools can both be useful and, to some extent, merged together.
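One way they merge in practice is surrogate modeling (my example, not a specific method from above): a trusted but expensive simulator generates the training data, and a learner provides a fast approximation of it.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def slow_simulator(x):
    # Stand-in for an expensive, physics-based model.
    return np.sin(3 * x) * np.exp(-0.1 * x)

# Run the simulator offline to produce training data...
x_train = np.linspace(0, 10, 500).reshape(-1, 1)
y_train = slow_simulator(x_train).ravel()

# ...then learn a cheap surrogate that can be queried in real time.
surrogate = GradientBoostingRegressor().fit(x_train, y_train)
print(surrogate.predict([[4.2]]), slow_simulator(4.2))
```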

