ROC Curves & Screening

This page is organized (for the moment) as a set of questions and answers.

What is a ROC curve?

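Wikipedia gives very good definitions, as usual. In short, a ROC curve plots the true positive rate against the false positive rate as the decision threshold of a classifier varies.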

Why am I interested in ROC curves?

Because they are deeply rooted in the history of psychological research, especially psychophysics (more specifically, Signal Detection Theory). More importantly, they are a truly multidisciplinary methodology.

In the 1970s they were used to assess diagnostic imaging systems, and became a standard in that truly multidisciplinary field.

I applied them in my Ph.D. thesis to a marketing research problem.

Which research areas dealing with ROC curves am I interested in?

I am interested in the relation between precision-recall (PR) curves and ROC curves. The key paper on this issue is Davis and Goadrich (2006), available on their site:
http://pages.cs.wisc.edu/~richm/articles/davisgoadrichcamera2.pdf
It has been a very influential paper, as you can see from its citations on Google Scholar. The authors even provide a Java program to compute ROC curves from PR curves and vice versa:
http://pages.cs.wisc.edu/~richm/programs/AUC/

Definitely a most interesting site to visit: http://pages.cs.wisc.edu/~richm/
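The relation between PR and ROC curves can be made concrete with a small threshold sweep. The sketch below is my own minimal illustration in Python, not the Java program linked above; the function name `roc_and_pr_points` and the toy data are assumptions made for the example. It computes both sets of points from the same ranked list, which shows that recall in PR space is the same quantity as the true positive rate in ROC space, so each threshold gives one point on each curve.

```python
def roc_and_pr_points(labels, scores):
    """Sweep a decision threshold over the scores; return ROC and PR points.

    labels: list of 0/1 values (1 = positive).
    scores: list of numbers, higher meaning "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    roc = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    pr = []             # (recall, precision)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples tied at the same score cross the threshold together.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        roc.append((fp / neg, tp / pos))
        pr.append((tp / pos, tp / (tp + fp)))
    return roc, pr


# Hypothetical toy data: 3 positives among 8 examples.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
roc, pr = roc_and_pr_points(labels, scores)
for (fpr, tpr), (recall, precision) in zip(roc[1:], pr):
    print(f"FPR={fpr:.2f} TPR={tpr:.2f} | recall={recall:.2f} precision={precision:.2f}")
```

The ROC list starts at (0, 0), which has no PR counterpart because precision is undefined when nothing is predicted positive; this is one reason the two curves are not simple mirror images of each other.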

Among the citing papers listed on Google Scholar you will also find the following:

J Burez, D Van den Poel (2008): Handling class imbalance in customer churn prediction. Expert Systems With Applications, Elsevier.

This is a most interesting paper about the problem of predicting churn; it uses PR and ROC curves to evaluate different algorithms. Now reading it.

AL Garcia-Almanza, EPK Tsang, E Galvan-Lopez ('): Evolving Decision Rules to Discover Patterns in Financial Data Sets

Well, what to say about patterns in financial data sets these days? A curious application…

Data mining with rarity (very few positive examples in comparison with the total number) is still a challenge. Some key references:

Weiss, G. M. (2004): Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, Volume 6, Issue 1, page 7.

S Visa, A Ralescu (2005): Issues in mining imbalanced data sets - A review paper. Proceedings of the Sixteen Midwest Artificial Intelligence. Journal of Artificial Intelligence Research 19:315-354.
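One way to see why rarity is hard, and why PR curves are informative here, is to hold a ROC operating point fixed and vary the fraction of positives. The following is only a sketch under assumed numbers: the function name `precision_from_rates` and the operating point (TPR 0.8, FPR 0.1) are hypothetical, chosen for illustration. It applies the identity precision = TPR·π / (TPR·π + FPR·(1 − π)), where π is the fraction of positive examples.

```python
def precision_from_rates(tpr, fpr, positive_fraction):
    """Precision implied by a ROC point (fpr, tpr) at a given class prevalence."""
    true_pos = tpr * positive_fraction            # share of all examples that are true positives
    false_pos = fpr * (1.0 - positive_fraction)   # share of all examples that are false positives
    if true_pos + false_pos == 0:
        raise ValueError("no predicted positives; precision is undefined")
    return true_pos / (true_pos + false_pos)


# Same (hypothetical) ROC operating point, increasingly rare positives.
for fraction in (0.5, 0.1, 0.01):
    print(fraction, round(precision_from_rates(0.8, 0.1, fraction), 3))
```

The ROC point does not move as positives become rarer, but the implied precision falls sharply, so a classifier that looks good in ROC space can still return mostly false alarms on a highly imbalanced data set.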

Every day I am more surprised by how much the internet offers a scholar. This is an essential book on information retrieval, published in 2008 by Cambridge University Press, and available here:

http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html
