Solution: ROC curve
- We read false positive rates (FPR) for points A, B and C directly from the graph: they are 0.1, 0.3 and 0.5, respectively.
- True positive rates (TPR) are also shown in the graph: they are 0.2, 0.8 and 0.9.
- The probability of treating a healthy hamster equals the false positive rate (she believes the hamster is "positive" when it is actually not). These are 0.1, 0.3 and 0.5, as we read in the first question. The probability of not treating a sick one is equal to the false negative rate. Consider the point A. If Sara has 100 positives (that is, 100 sick hamsters), she would correctly classify 20 as positive; these are the true positives that the previous question asked about. Hence she would incorrectly classify the remaining 80 as negative. The false negative rate is therefore one minus the true positive rate: FNR = 1 - TPR. For points A, B and C, the false negative rates are 1 - 0.2 = 0.8, 1 - 0.8 = 0.2 and 1 - 0.9 = 0.1, respectively.
- We need to multiply the costs of mistakes by their respective probabilities. Not treating a sick hamster costs $1000 and its probability is FNR, as we've seen in the previous answer. Treating a healthy hamster costs $600 and its probability is FPR.
- A: $1000 × 0.8 + $600 × 0.1 = $860
- B: $1000 × 0.2 + $600 × 0.3 = $380
- C: $1000 × 0.1 + $600 × 0.5 = $400
The optimal operating point is B. This answer is correct (point B is indeed optimal), but the computation ignores something important. The explanation is a bit longer; see below.
- The task says that she will "administer the cure to the one which she believes is more likely to be sick". She has to pick one of the two, so she cannot rely on classification: she could classify both as sick or both as healthy, and then what? The task says "she believes is more likely", which translates to "she assigns a higher probability to". We know that the area under the ROC curve has a nice probabilistic interpretation. Say that we are given two data instances and we are told that one is positive and the other is negative. We use the classifier to estimate the probability of being positive for each instance, and decide that the one with the higher probability is positive. The probability that such a decision is correct equals the AUC of this model. The task asks about the opposite probability, the probability of making the wrong decision, which is then 1 - AUC. We must compute the AUC from the graph. Conveniently, there are 100 squares, so we just count the number of squares below the curve: the AUC is 0.755, so the probability of picking the wrong hamster is 1 - 0.755 = 0.245.
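As a cross-check on counting squares, the area can also be computed with the trapezoid rule. This is only a sketch: it assumes the curve is piecewise linear and runs from (0, 0) through A, B and C to (1, 1), which is not stated in the text since the graph is not reproduced here. The names `roc_points` and `trapezoid_auc` are made up for this illustration.

```python
# ROC points as (FPR, TPR) pairs.
# Assumption: the curve is piecewise linear and runs from (0, 0)
# through A, B and C to (1, 1); the endpoints are not given in the text.
roc_points = [(0.0, 0.0), (0.1, 0.2), (0.3, 0.8), (0.5, 0.9), (1.0, 1.0)]


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) pairs sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each segment contributes a trapezoid: width times mean height.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


auc = trapezoid_auc(roc_points)
p_wrong_hamster = 1 - auc  # probability of ranking the pair the wrong way round
print(f"AUC = {auc:.3f}, P(wrong pick) = {p_wrong_hamster:.3f}")
```

Under this assumption the four trapezoids have areas 0.01, 0.1, 0.17 and 0.475, which add up to the same 0.755 obtained by counting squares.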
Correct solution for Question 5
We will go through the computation in Question 5 more systematically. There are four possible events and related probabilities.
| | being sick | being healthy |
| --- | --- | --- |
| predicted sick | predicted sick AND being sick | predicted sick AND being healthy |
| predicted healthy | predicted healthy AND being sick | predicted healthy AND being healthy |
Each event has its associated cost.
| | being sick | being healthy |
| --- | --- | --- |
| predicted sick | $0 | $600 |
| predicted healthy | $1000 | $0 |
In general, the costs may be different; they may even be negative, so that Sara earns something when she treats a hamster that's sick or when she (correctly) diagnoses a healthy hamster. So, in general, we would have
| | being sick | being healthy |
| --- | --- | --- |
| predicted sick | A | B |
| predicted healthy | C | D |
The expected cost is now
A × p(predicted sick AND being sick) + B × p(predicted sick AND being healthy) + 
C × p(predicted healthy AND being sick) + D × p(predicted healthy AND being healthy)
From the ROC curve, we read the probability that Sara predicts that the hamster is healthy WHEN it is actually sick. This is not the same as predicting healthy AND being sick. You may have learned about this in your probability class, but let's repeat.
Say there are 100 hamsters, 20 of which are sick; so p(being sick) = 0.2. Say that p(predicted healthy WHEN being sick) = 0.25. This would mean that she would misclassify 5 (out of 20) sick hamsters. Those are 5 out of all 100 hamsters, so p(predicted healthy AND being sick) = 0.05. What we computed corresponds to this:
p(predicted healthy AND being sick)
= p(predicted healthy WHEN being sick) × p(being sick)
I hope this makes sense; if not, pick up your favourite book on probability because this is important. Oh, yes, in your probability class, you wrote this a bit differently,
p(predicted healthy, being sick) = p(predicted healthy | being sick) × p(being sick)
but we'll stick with our simpler notation. And by the way, you may recall that the "AND" probabilities are called joint probabilities and "WHEN" probabilities are conditional probabilities.
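The 100-hamster example can be checked with a few lines of arithmetic. This is just an illustrative sketch with made-up variable names; it counts hamsters first and then confirms that the count agrees with multiplying the WHEN probability by the prior.

```python
n_hamsters = 100
n_sick = 20

p_sick = n_sick / n_hamsters            # p(being sick)
p_pred_healthy_when_sick = 0.25         # conditional ("WHEN") probability

# Counting: a quarter of the 20 sick hamsters are predicted healthy.
n_missed = p_pred_healthy_when_sick * n_sick

# Joint ("AND") probability, computed in two equivalent ways.
p_joint_by_counting = n_missed / n_hamsters
p_joint_by_product = p_pred_healthy_when_sick * p_sick

print(n_missed, p_joint_by_counting, p_joint_by_product)
```

Both routes give 5 missed hamsters out of 100, that is, a joint probability of 0.05.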
From the ROC curve you read the WHEN probabilities. In computation of costs we need the AND. So, the expected cost in terms of WHENs is
A × p(predicted sick WHEN being sick) × p(being sick)
\+ B × p(predicted sick WHEN being healthy) × p(being healthy)
\+ C × p(predicted healthy WHEN being sick) × p(being sick)
\+ D × p(predicted healthy WHEN being healthy) × p(being healthy)
In terms of the rates we consider in ROC curves, this is
A × TPR × p(being sick) + B × FPR × p(being healthy) + 
C × FNR × p(being sick) + D × TNR × p(being healthy)
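As we've realised in Questions 1-3, we can read FPR and TPR directly from the ROC curve, and we compute the other two as FNR = 1 - TPR and TNR = 1 - FPR. So the expected cost is
A × TPR × p(being sick) + B × FPR × p(being healthy) + 
C × (1 - TPR) × p(being sick) + D × (1 - FPR) × p(being healthy)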
This works in general. Now back to our homework. To compute the costs we need the probability that the hamster that comes to visit Sara is sick. This is called the prior probability, since it is known prior to whatever Sara does to diagnose the hamster. The prior probability is hidden in a sentence that we did not use in the above solution, but we put it in the task's text for any students who would solve the homework strictly, like we're doing now. The homework says "About one half of hamsters she sees have this disease", so p(being sick) = 0.5 and, naturally, p(being healthy) = 0.5.
Costs A and D are zero. What remains is
600 × FPR × p(being healthy) + 1000 × (1 - TPR) × p(being sick)
= 600 × FPR × 0.5 + 1000 × (1 - TPR) × 0.5
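To see what this gives for the three operating points, here is a minimal sketch of the expected-cost formula. The function name `expected_cost` and its parameter names are our own choice for this illustration; the costs, rates and the prior of 0.5 come from the task.

```python
def expected_cost(tpr, fpr, p_sick,
                  cost_tp=0.0, cost_fp=600.0, cost_fn=1000.0, cost_tn=0.0):
    """Expected cost per hamster, using joint = conditional x prior."""
    p_healthy = 1 - p_sick
    return (cost_tp * tpr * p_sick              # predicted sick AND sick
            + cost_fp * fpr * p_healthy         # predicted sick AND healthy
            + cost_fn * (1 - tpr) * p_sick      # predicted healthy AND sick
            + cost_tn * (1 - fpr) * p_healthy)  # predicted healthy AND healthy


# Operating points read from the ROC curve, as (FPR, TPR).
points = {"A": (0.1, 0.2), "B": (0.3, 0.8), "C": (0.5, 0.9)}

costs = {name: expected_cost(tpr, fpr, p_sick=0.5)
         for name, (fpr, tpr) in points.items()}
best = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: ${cost:.0f}")
print("optimal:", best)
```

With equal priors every cost is simply half of what the unweighted computation gives, which is why the ranking of the points, and hence the optimal point B, does not change.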
In general, the difference is important when prior probabilities are not equal. If the probability of being sick is lower (and thus the probability of being healthy is higher), false positives weigh more in the expected cost, so the optimal operating point would move towards points with lower FPR (and, consequently, lower TPR).
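The effect of unequal priors can be tried out numerically. The sketch below reuses the task's costs and operating points, but the priors 0.1 and 0.9 are hypothetical values chosen only for illustration, and the helper names are our own.

```python
def expected_cost(tpr, fpr, p_sick, cost_fp=600.0, cost_fn=1000.0):
    """Expected cost when correct decisions cost nothing."""
    return cost_fp * fpr * (1 - p_sick) + cost_fn * (1 - tpr) * p_sick


# Operating points read from the ROC curve, as (FPR, TPR).
points = {"A": (0.1, 0.2), "B": (0.3, 0.8), "C": (0.5, 0.9)}


def best_point(p_sick):
    """Name of the operating point with the lowest expected cost."""
    return min(points,
               key=lambda name: expected_cost(points[name][1],
                                              points[name][0],
                                              p_sick))


# 0.5 is the prior from the task; 0.1 and 0.9 are hypothetical.
best_by_prior = {p_sick: best_point(p_sick) for p_sick in (0.1, 0.5, 0.9)}
print(best_by_prior)
```

When only one hamster in ten is sick, the cheapest point is A (lowest FPR); with the task's prior of 0.5 it is B; and when nine in ten are sick it is C (highest TPR).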
If you encounter this problem in real life, make sure you use the proper procedure that we presented here. Or consult a mathematician. :)