INFORMS Data Mining Contest Part 1

A new data mining contest is available here. The domain is medical and, more precisely, there are two tasks. First, we need to predict whether a given patient will be transferred to another hospital. The second task is to predict whether the patient will die (the medical domain definitely lacks fun). For each task, we give each patient a score, ranking patients from the most probable to the least. The dataset presents many challenges. In this post, I propose my personal ideas for handling them.

Sequences

Each patient is represented by a sequence of visits (the previous ones and the current one). A given patient therefore spans many lines in the file. The sequence has no fixed length, so we can’t simply concatenate everything onto one line.

Set-valued attributes

There are also some set-valued attributes (attributes whose value is a set). In the data file, such a set is represented by the columns Other-Dx-Code-1, Other-Dx-Code-2, …, with Other-Dx-Code-9 often missing. There are also Principal-Dx-Code and Admit-Dx-Code, which I see as part of the same set.
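
To make the idea concrete, here is a minimal sketch of how the diagnosis columns of one visit could be merged into a single set and then turned into 0/1 indicator columns. The column names come from the data file as described above, but the code values ("A1", "B2", "C3") and the helper names `diagnosis_set` and `multi_hot` are made up for illustration. It is one possible encoding, not necessarily the best one.

```python
# Column names follow the post; everything else is a hypothetical illustration.
DX_COLUMNS = ["Principal-Dx-Code", "Admit-Dx-Code"] + [
    f"Other-Dx-Code-{i}" for i in range(1, 10)
]


def diagnosis_set(row):
    """Collect every non-missing diagnosis code of one visit into a set."""
    return {row[col] for col in DX_COLUMNS if row.get(col)}


def multi_hot(code_sets):
    """Turn a list of code sets into 0/1 vectors over a sorted vocabulary."""
    vocabulary = sorted(set().union(*code_sets))
    vectors = [[int(code in s) for code in vocabulary] for s in code_sets]
    return vocabulary, vectors


# Two made-up visits; missing columns and empty strings count as missing codes.
visits = [
    {"Principal-Dx-Code": "A1", "Admit-Dx-Code": "A1", "Other-Dx-Code-1": "B2"},
    {"Principal-Dx-Code": "C3", "Admit-Dx-Code": "B2", "Other-Dx-Code-1": ""},
]

code_sets = [diagnosis_set(v) for v in visits]
vocabulary, vectors = multi_hot(code_sets)
```

With this encoding, the position of a code (Other-Dx-Code-1 versus Other-Dx-Code-5) no longer matters, only its presence. Whether the principal and admit codes should be kept apart is a separate question that the sketch ignores.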

Hierarchical attributes

Some attributes form a hierarchy. For instance, Hospital-ID and Region-ID are two levels of a geographical hierarchy. I don’t know how a hierarchy can be used in data mining (well, in a cleverer way than as standard attributes). It could be interesting for generalization purposes and for reducing overfitting.

It’s relational

These three problems have their relational nature in common. I think it’s madness to use the data directly as a single table; we need to formalize the problem better first. Then we could construct a single table using techniques from relational data mining, such as selection graphs and aggregation (either manually or automatically). Note that the last link is a paper by the contest organiser, Claudia Perlich, so I think I can’t be too wrong. I don’t know if it’s the best way, but if I do something, it will clearly be in this direction.
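
As a rough sketch of the manual aggregation route, the variable-length sequence of visits of each patient can be collapsed into one fixed-length row of summary features. The column names `Patient-ID` and `Length-Of-Stay`, the sample values, and the choice of aggregates are all assumptions made for illustration; only `Hospital-ID` is taken from the data description above.

```python
from collections import defaultdict

# Hypothetical visit rows, assumed to be in chronological order per patient.
visits = [
    {"Patient-ID": 1, "Length-Of-Stay": 3, "Hospital-ID": "H1"},
    {"Patient-ID": 1, "Length-Of-Stay": 5, "Hospital-ID": "H2"},
    {"Patient-ID": 2, "Length-Of-Stay": 2, "Hospital-ID": "H1"},
]


def aggregate_by_patient(rows):
    """Collapse each patient's visit sequence into one row of aggregates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Patient-ID"]].append(row)

    table = {}
    for patient_id, sequence in groups.items():
        stays = [r["Length-Of-Stay"] for r in sequence]
        table[patient_id] = {
            "n_visits": len(sequence),
            "mean_stay": sum(stays) / len(stays),
            "max_stay": max(stays),
            "n_hospitals": len({r["Hospital-ID"] for r in sequence}),
            "last_stay": stays[-1],  # relies on the chronological order
        }
    return table


patient_table = aggregate_by_patient(visits)
```

Every patient ends up with the same five columns regardless of how many visits they had, which is what a standard single-table learner needs. Automatic approaches would search over such aggregates instead of fixing them by hand.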

Comments

BettyCooper
October 25, 2013 at 10:47

Hello Sébastien Derivaux,
I’m a Data Mining Specialist at Data Entry Outsourced Company (http://www.dataentryoutsourced.com/processing-services/mining.php). I would like to participate in the data mining contest, but the link you gave cannot be opened.
3alphadataentry
October 27, 2014 at 10:04

I, on behalf of 3Alpha Data Entry Services, would like to join this kind of contest. Please share the latest link.
