About evaluation

When you deploy a model, monitoring its results is essential. Does it behave as you expected? I’m not talking about pre-production tests, but about following the model throughout its life. I use two kinds of reports for this: preventive reports and corrective reports. As you might expect, the first is created right after the prediction is made, and the second once the consequence of the prediction is known.

Preventive reporting

My latest model predicts five weeks ahead. A result can still be changed within one week; after that, it’s too late. Since these predictions can have dramatic impacts, I want to be sure they won’t cause a mess (the model is retrained each time on an updated dataset). As there are thousands of predictions each time, I cannot check everything by hand (besides, judging a single prediction is sometimes tricky).

I predict not only 5 weeks ahead but also 6, 7 and 8 weeks ahead, so I can watch how each prediction evolves. If the results change a lot from one horizon to the next, it is worth digging deeper. It is also interesting to see whether the results are roughly the same 10 weeks ahead as 5 weeks ahead. If they always are, you could publish the results 5 weeks sooner, which could unlock a business opportunity.
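Here is a minimal sketch of that stability check. It assumes the predictions are stored per item and per horizon in a plain dictionary. The function name `flag_unstable`, the data layout and the 20% threshold are illustrative assumptions, not what I actually run in production.

```python
def flag_unstable(predictions_by_horizon, threshold=0.2):
    """Return items whose prediction moved too much between two horizons.

    predictions_by_horizon: {item_id: {horizon_in_weeks: predicted_value}}
    threshold: largest accepted relative change between consecutive
               horizons (hypothetical default of 20%).
    """
    flagged = {}
    for item_id, by_horizon in predictions_by_horizon.items():
        # Order from the farthest horizon (earliest forecast) to the nearest.
        horizons = sorted(by_horizon, reverse=True)
        values = [by_horizon[h] for h in horizons]
        worst = 0.0
        for earlier, later in zip(values, values[1:]):
            if earlier == 0:
                continue  # relative change is undefined, skip this step
            worst = max(worst, abs(later - earlier) / abs(earlier))
        if worst > threshold:
            flagged[item_id] = worst
    return flagged


# Made-up example: item "b" jumps by 50% between week 6 and week 5.
predictions = {
    "a": {8: 100, 7: 102, 6: 101, 5: 103},
    "b": {8: 100, 7: 100, 6: 100, 5: 150},
}
print(flag_unstable(predictions))
```

Only the flagged items need a manual look, which keeps the review manageable when there are thousands of predictions.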

I also compare with past data, since my predictions are comparable from year to year (though of course not identical). This helps avoid a big mistake on a particular prediction.
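The comparison with past data can be sketched in the same spirit. This assumes each prediction has a key that can be matched to the value observed for the same period one year earlier. The name `flag_against_history` and the 30% tolerance are hypothetical choices for illustration.

```python
def flag_against_history(current, previous_year, tolerance=0.3):
    """Return keys whose prediction is far from last year's observed value.

    current: {key: predicted_value}
    previous_year: {key: value observed for the same period last year}
    tolerance: accepted relative gap (hypothetical default of 30%).
    """
    flagged = []
    for key, value in current.items():
        past = previous_year.get(key)
        if past is None or past == 0:
            continue  # no usable reference for this key
        if abs(value - past) / abs(past) > tolerance:
            flagged.append(key)
    return flagged


# Made-up example: "y" doubles compared with last year, "z" has no history.
current = {"x": 100, "y": 200, "z": 50}
previous_year = {"x": 95, "y": 100}
print(flag_against_history(current, previous_year))
```

A flagged key is not necessarily wrong, since years are never exactly the same, but it is a good candidate for a closer look before the prediction is published.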

Corrective reporting

Corrective reporting takes place once the prediction can be evaluated. At the top of such a report I have one or more dial charts, each giving an aggregated indicator, i.e. whether the error is acceptable or not. If it looks fine, there is nothing more to do. If not, I have many statistics to help find where the error is. Aggregated error indicators such as root mean squared error (RMSE) alone give no hint as to why the error is so large. As I’m making hundreds of numeric predictions each time, I compute the mean deviation from the real values. Usually this value should be around 0%, because over- and under-predictions cancel out. Nevertheless, I sometimes see 5%, i.e. on average each prediction is 5% higher than the value that should have been predicted. This reveals an obvious problem (a training set that is too old or contains mistakes, a change in the process, …). You could also compute RMSE on different subsets of the evaluation set, but it seems quite complicated to get useful insights that way.
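To make the difference between the two indicators concrete, here is a small sketch. It assumes the mean deviation is computed as the average of the signed relative errors, which is one reasonable reading of the indicator described above. The function names and the numbers are made up for illustration.

```python
import math


def mean_percentage_error(actuals, predictions):
    """Average signed relative error: positive means over-prediction."""
    ratios = [(p - a) / a for a, p in zip(actuals, predictions) if a != 0]
    if not ratios:
        raise ValueError("no non-zero actual value to compare against")
    return sum(ratios) / len(ratios)


def rmse(actuals, predictions):
    """Root mean squared error: magnitude of the error, sign is lost."""
    squared = [(p - a) ** 2 for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(squared) / len(squared))


actuals = [100, 200, 400]
biased = [105, 210, 420]    # every prediction is 5% too high
balanced = [110, 180, 400]  # errors in both directions

for name, preds in [("biased", biased), ("balanced", balanced)]:
    mpe = mean_percentage_error(actuals, preds)
    print(name, f"mean deviation={mpe:.1%}", f"RMSE={rmse(actuals, preds):.2f}")
```

Both sets of predictions have an RMSE of about 13, yet only the first one shows a systematic 5% over-prediction. That is the kind of signal the signed mean exposes and RMSE hides.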

In conclusion, I have found that prediction is not the end of the data miner’s job if you want quality. How do you follow your models?
