A new data mining contest is available here. The domain is medical, and there are two tasks. First, we need to predict whether a given patient will be transferred to another hospital. The second task is to predict whether the patient will die (the medical domain definitely lacks fun). For each task, we must rank the patients from the most probable to the least probable. The dataset poses many challenges. In this post, I propose my personal ideas for handling them.
Sequences
Each patient is represented by a sequence of visits (the previous visits and the current one), so a given patient spans several lines in the file. Since the sequences do not have a fixed length, we can’t simply concatenate everything onto one line.
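One way to get a fixed-length row per patient anyway is to aggregate the history instead of concatenating it. Here is a minimal sketch, assuming each line carries the hypothetical fields `patient_id`, `visit_order` and `length_of_stay` (these are not the contest's actual column names) and that the latest visit is the current one:

```python
from collections import defaultdict
from statistics import mean


def sequence_features(visits):
    """Collapse each patient's variable-length visit sequence into fixed-length features.

    Hypothetical field names: patient_id, visit_order, length_of_stay.
    The visit with the highest visit_order is treated as the current one.
    """
    by_patient = defaultdict(list)
    for v in visits:
        by_patient[v["patient_id"]].append(v)

    table = {}
    for pid, seq in by_patient.items():
        seq.sort(key=lambda v: v["visit_order"])
        current, previous = seq[-1], seq[:-1]
        stays = [v["length_of_stay"] for v in previous]
        table[pid] = {
            "current_length_of_stay": current["length_of_stay"],
            "n_previous_visits": len(previous),
            # Aggregates over an empty history default to 0.
            "mean_previous_stay": mean(stays) if stays else 0.0,
            "max_previous_stay": max(stays, default=0),
        }
    return table


# Made-up example data.
visits = [
    {"patient_id": "A", "visit_order": 1, "length_of_stay": 3},
    {"patient_id": "A", "visit_order": 2, "length_of_stay": 5},
    {"patient_id": "A", "visit_order": 3, "length_of_stay": 2},
    {"patient_id": "B", "visit_order": 1, "length_of_stay": 7},
]
print(sequence_features(visits))
```

Whatever the length of the history, every patient ends up with the same four columns; the price is that the order of the previous visits is mostly lost.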
Set-valued attributes
There are also set-valued attributes (attributes whose value is a set). In the data file they are represented by Other-Dx-Code-1, Other-Dx-Code-2, … with Other-Dx-Code-9 often missing. There are also Principal-Dx-Code and Admit-Dx-Code, which I consider part of the same set.
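A simple way to handle such a set is to gather the columns into one set per row and encode it as a bag of codes, with one binary indicator per code seen in the data. A minimal sketch, assuming each row is a dict keyed by the column names above, that missing codes are `None` or empty strings, and with made-up code values in the example. Columns are matched by the `Other-Dx-Code-` prefix so the sketch does not depend on how many of them there are:

```python
def diagnosis_set(row):
    """Gather all diagnosis columns of one row into a single set, skipping missing values."""
    codes = set()
    for column, value in row.items():
        is_dx = column in ("Principal-Dx-Code", "Admit-Dx-Code") or column.startswith(
            "Other-Dx-Code-"
        )
        if is_dx and value not in (None, ""):
            codes.add(value)
    return codes


def bag_of_codes(rows):
    """Encode each row's diagnosis set as a 0/1 indicator vector over all observed codes."""
    sets = [diagnosis_set(r) for r in rows]
    vocabulary = sorted(set().union(*sets))
    vectors = [[1 if code in s else 0 for code in vocabulary] for s in sets]
    return vocabulary, vectors


# Made-up rows; the code values are illustrative only.
rows = [
    {"Principal-Dx-Code": "428", "Admit-Dx-Code": "786",
     "Other-Dx-Code-1": "401", "Other-Dx-Code-2": ""},
    {"Principal-Dx-Code": "401", "Admit-Dx-Code": "401", "Other-Dx-Code-1": None},
]
print(bag_of_codes(rows))
```

This deliberately forgets which column a code came from, including the special role of Principal-Dx-Code; a separate feature for the principal code would restore that information.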
Hierarchical attributes
Some attributes are hierarchical. For instance, Hospital-ID and Region-ID are two levels of a geographical hierarchy. I don’t know how hierarchies can be used in data mining (at least, in a cleverer way than as standard attributes). It could be interesting for generalization and for reducing overfitting.
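One option I would consider, which is a standard smoothing trick rather than anything the contest prescribes, is to estimate a per-hospital outcome rate and shrink it toward the rate of its region, so that hospitals with few patients borrow strength from their region. A sketch, assuming records are `(region, hospital, outcome)` tuples with a binary outcome, hospital IDs that are unique across regions, and a hypothetical pseudo-count `m`:

```python
from collections import defaultdict


def smoothed_hospital_rates(records, m=20.0):
    """Per-hospital outcome rate, shrunk toward the region-level rate.

    records: (region_id, hospital_id, outcome) tuples, outcome in {0, 1}.
    m: hypothetical pseudo-count, i.e. how many "virtual" region-level
       observations each hospital estimate is blended with.
    """
    region_n, region_pos = defaultdict(int), defaultdict(int)
    hosp_n, hosp_pos = defaultdict(int), defaultdict(int)
    hosp_region = {}
    for region, hospital, outcome in records:
        region_n[region] += 1
        region_pos[region] += outcome
        hosp_n[hospital] += 1
        hosp_pos[hospital] += outcome
        hosp_region[hospital] = region

    rates = {}
    for hospital, n in hosp_n.items():
        region = hosp_region[hospital]
        prior = region_pos[region] / region_n[region]
        # Small hospitals stay close to the region rate; large ones follow their own data.
        rates[hospital] = (hosp_pos[hospital] + m * prior) / (n + m)
    return rates


# Made-up records: raw rates are 0.5 for H1 and 0.25 for H2, region rate is 1/3.
records = [
    ("R1", "H1", 1), ("R1", "H1", 0),
    ("R1", "H2", 0), ("R1", "H2", 0), ("R1", "H2", 0), ("R1", "H2", 1),
]
print(smoothed_hospital_rates(records, m=2.0))
```

Both hospital rates move toward the region rate of 1/3, H1 more so because it has fewer records. In practice such rates should be computed out-of-fold, otherwise the outcome leaks into the feature.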
It’s relational
What these three problems have in common is their relational nature. I think it would be madness to use the data directly as a single table; we need to formalize the problem better first. Then we could construct a single table using two features of relational data mining, selection graphs and aggregation (either manually or automatically). Note that the last link is a paper by the contest organiser, Claudia Perlich, so I don’t think I can be far wrong. I don’t know if it’s the best way, but if I do something, it will clearly be in this direction.
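Selection graphs can chain conditions across several tables, which I won't reproduce here. As a drastically simplified stand-in, the sketch below handles a single patient-to-visits link: each condition selects some related visits, and aggregation (a count plus an existence flag) turns that selection into patient-level columns. The row layout, the `patient_id` key, the condition names and the example values are assumptions; only Hospital-ID and Principal-Dx-Code come from the data description above.

```python
from collections import defaultdict


def propositionalize(patients, visits, conditions):
    """Flatten a one-to-many patient -> visits relation into one row per patient.

    conditions: list of (name, predicate) pairs. For each patient, column `name`
    counts the related visits satisfying the predicate (count aggregation) and
    `has_<name>` records whether at least one does (existential test).
    """
    visits_by_patient = defaultdict(list)
    for v in visits:
        visits_by_patient[v["patient_id"]].append(v)

    table = []
    for p in patients:
        related = visits_by_patient[p["patient_id"]]
        row = dict(p)
        for name, predicate in conditions:
            count = sum(1 for v in related if predicate(v))
            row[name] = count
            row[f"has_{name}"] = count > 0
        table.append(row)
    return table


# Made-up example data and hypothetical conditions.
patients = [{"patient_id": "A"}, {"patient_id": "B"}]
visits = [
    {"patient_id": "A", "Hospital-ID": "H1", "Principal-Dx-Code": "428"},
    {"patient_id": "A", "Hospital-ID": "H2", "Principal-Dx-Code": "401"},
    {"patient_id": "B", "Hospital-ID": "H1", "Principal-Dx-Code": "401"},
]
conditions = [
    ("visits_at_H1", lambda v: v["Hospital-ID"] == "H1"),
    ("visits_dx_428", lambda v: v["Principal-Dx-Code"] == "428"),
]
for row in propositionalize(patients, visits, conditions):
    print(row)
```

Choosing the conditions by hand corresponds to the manual route; generating and scoring many candidate conditions automatically is what the relational approaches mentioned above aim to do.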