Lately, I was thinking about the difference between machine learning and simulation (for prediction). Machine learning uses historical inputs and outputs to predict subsequent outputs. Simulation, on the other hand, assumes you already have the knowledge, i.e. the underlying model, so you don't need historical data to learn it. Sometimes both methods can answer a question, sometimes only one is available. After thinking about it, I find that the distinction between them is thinner than I thought.

I think there is a continuum, and where you sit on it depends on the amount of domain knowledge you add versus the amount of learning. Pure machine learning takes nothing but data: in theory, everything could be learnt from it. At the other end, pure simulation considers that everything is known and that no noise is present.
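To make the two ends of the continuum concrete, here is a minimal sketch on a toy free-fall problem (the physical model, the sample heights, and the function names are all illustrative assumptions, not from the original post). The simulator needs no data because it knows the model; the learner recovers the same relation from historical pairs alone.

```python
import math

# Pure simulation: we *know* the model (free fall, no air resistance),
# so no historical data is needed: t = sqrt(2h / g).
G = 9.81  # assumed gravity, m/s^2

def simulate_fall_time(height_m):
    return math.sqrt(2 * height_m / G)

# Pure machine learning: we only see historical (height, time) pairs and
# learn the relation t^2 = k * h by least squares, never knowing g.
history = [(h, simulate_fall_time(h)) for h in (1.0, 5.0, 10.0, 20.0)]
k = sum(h * t**2 for h, t in history) / sum(h * h for h, _ in history)

def learned_fall_time(height_m):
    return math.sqrt(k * height_m)

# On clean, noise-free data the two agree, because the learned k ≈ 2/g.
print(round(simulate_fall_time(50.0), 3), round(learned_fall_time(50.0), 3))
```

With noisy measurements the learned `k` would drift from `2/g`, which is exactly where the continuum between the two approaches starts to matter.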

Of course, in real life, it's a bit more complex than that. You have to inject some domain knowledge into your models to help them: post-processing rules, adding some nodes and leaves to a decision tree. For simulation, often you can't model everything, or there is always a parameter you need to estimate from historical data (that is more statistics than machine learning, but you still need data).
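The "parameter you need to estimate from historical data" case can be sketched like this, using Newton's law of cooling as a stand-in simulator (the scenario, the measured values, and the names are hypothetical): the model form is domain knowledge, but the cooling constant still has to be calibrated from a measurement.

```python
import math

T_ENV = 20.0  # ambient temperature, assumed known domain knowledge

# The simulator: Newton's law of cooling, with an unknown constant k.
def simulate(t, temp_start, k):
    return T_ENV + (temp_start - T_ENV) * math.exp(-k * t)

# One historical measurement: a liquid at 90°C reads 62°C after 5 minutes.
temp_start, t_obs, temp_obs = 90.0, 5.0, 62.0

# Invert the model to calibrate k from data: statistics, not learning,
# but the simulator is still useless without this historical point.
k = -math.log((temp_obs - T_ENV) / (temp_start - T_ENV)) / t_obs

print(round(simulate(10.0, temp_start, k), 1))  # prediction at 10 min: 45.2
```

The calibrated simulator now sits somewhere in the middle of the continuum: mostly knowledge, with a small data-driven part.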

Machine learning never gives perfect results because of the learning approximation. Nevertheless, the same applies to simulation, and it's sometimes worse: if you give wrong input data to a simulator, it will produce garbage. Machine learning doesn't suffer much if the error is consistent (for example, all your numerical inputs divided by 2). Moreover, when the underlying process changes, machine learning can update its model faster than a simulation or a model loaded with too much domain knowledge. I'm sure you have already built a model with some domain knowledge where removing that knowledge gave better results. As always, there is a tradeoff to make.
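The "consistent error" claim can be checked with a toy model (the decision-stump learner, the data, and all names here are my own illustrative assumptions): when every numerical input is divided by 2, the learned threshold halves too, so the predictions don't change.

```python
# A decision stump: pick the threshold with the fewest misclassifications,
# predicting True when x > threshold.
def fit_stump(xs, ys):
    xs_sorted = sorted(xs)
    candidates = [(a + b) / 2 for a, b in zip(xs_sorted, xs_sorted[1:])]
    def errors(th):
        return sum((x > th) != y for x, y in zip(xs, ys))
    return min(candidates, key=errors)

xs = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
ys = [False, False, False, True, True, True]

th = fit_stump(xs, ys)                       # learned on correct inputs
th_half = fit_stump([x / 2 for x in xs], ys)  # learned on inputs / 2

preds = [x > th for x in xs]
preds_half = [x / 2 > th_half for x in xs]
print(preds == preds_half)  # the model absorbed the consistent error
```

A physics simulator fed the same halved inputs would have no way to compensate, because its model assumes the inputs are in the right units.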

These two tools can both be useful and, to some extent, merged together.
