A university is applying classification methods to identify alumni who may be interested in donating money. The university has a database of 58,205 alumni profiles containing numerous variables. Of these 58,205 alumni, only 576 have donated in the past. The university has oversampled the data and trained a random forest of 100 classification trees. For a cutoff value of 0.5, the following confusion matrix summarizes the performance of the random forest on a validation set:

                       Predicted
Actual           Donation    No Donation
Donation              268             20
No Donation         5,375         23,439

The following table lists some information on individual observations from the validation set:

Observation ID    Actual Class    Probability of Donation    Predicted Class
A                 Donation        0.8                        Donation
B                 No Donation     0.1                        No Donation
C                 No Donation     0.6                        Donation

(a) Choose the correct explanation for how the probability of Donation was computed for the three observations.

(i) The probability of Donation for each observation is the proportion of the 100 individual classification trees that classified the observation as "Donation."
(ii) The probability of Donation for each observation is the proportion of the 100 individual classification trees that classified the observation as "No Donation."
(iii) The probability of Donation for each observation is the ratio of the individual classification trees that classified the observation as "Donation" to those that classified it as "No Donation."
(iv) The probability of Donation for each observation is the ratio of the individual classification trees that classified the observation as "No Donation" to those that classified it as "Donation."

Option (i)

Why were Observations A and C classified as Donation and Observation B classified as No Donation? If required, round your answers to one decimal place.

The probability of Donation for Observation A is 0.8. It is greater than 0.5, so Observation A is classified as Donation by the random forest.
The probability of Donation for Observation B is 0.1. It is less than 0.5, so Observation B is classified as No Donation by the random forest.
The probability of Donation for Observation C is 0.6. It is greater than 0.5, so Observation C is classified as Donation by the random forest.

(b) Compute the values of accuracy, sensitivity, specificity, and precision. Explain why accuracy is a misleading measure to consider in this case. Evaluate the performance of the random forest, particularly commenting on the precision measure.

If required, round your answer to three decimal places.

Accuracy = (268 + 23,439) / 29,102 = 0.815

If required, round your answers to the nearest whole percentage.

Accuracy is not the best measure to use for unbalanced data sets because less than 1% of the alumni in the data have donated (576 / 58,205 is about 0.99%). A classifier that labeled every alumnus as No Donation would be more than 99% accurate without identifying a single donor.

If required, round your answers for Sensitivity and Specificity to three decimal places and round your answer for Precision to four decimal places.

Sensitivity = 268 / (268 + 20) = 0.931
Specificity = 23,439 / (23,439 + 5,375) = 0.813
Precision = 268 / (268 + 5,375) = 0.0475

The value of precision seems disturbingly small. The precision measure represents the percentage of alumni classified by the random forest as Donation who are actually donors. However, comparing the value of precision (about 4.75%) with the proportion of observations corresponding to donations (about 1%), there is a tremendous improvement in the ability to target alumni who may be more likely to donate: the alumni flagged by the random forest are nearly five times as likely to be donors as alumni chosen at random.
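
The arithmetic in parts (a) and (b) can be checked with a short Python sketch. The counts and probabilities come from the problem statement; the function names `classify` and `metrics` are made up for illustration, and treating a probability of exactly 0.5 as No Donation is an assumption, since the problem only covers values above or below the cutoff.

```python
# Validation-set confusion matrix from the problem statement.
TP = 268      # actual Donation, predicted Donation
FN = 20       # actual Donation, predicted No Donation
FP = 5375     # actual No Donation, predicted Donation
TN = 23439    # actual No Donation, predicted No Donation


def classify(prob_donation, cutoff=0.5):
    """Apply the cutoff to a predicted probability of Donation.

    Assumption: a probability exactly equal to the cutoff is labeled
    No Donation; the problem statement does not cover that case.
    """
    return "Donation" if prob_donation > cutoff else "No Donation"


def metrics(tp, fn, fp, tn):
    """Return accuracy, sensitivity, specificity, and precision."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of actual donors caught
        "specificity": tn / (tn + fp),   # share of non-donors correctly rejected
        "precision": tp / (tp + fp),     # share of flagged alumni who are donors
    }


results = metrics(TP, FN, FP, TN)

# Proportion of donors in the full database (576 of 58,205 alumni).
base_rate = 576 / 58205

# Part (a): probabilities of Donation for Observations A, B, and C.
probabilities = {"A": 0.8, "B": 0.1, "C": 0.6}
predictions = {obs: classify(p) for obs, p in probabilities.items()}

if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name}: {value:.4f}")
    print(f"base rate of donors: {base_rate:.4f}")
    print(f"precision / base rate: {results['precision'] / base_rate:.1f}")
    print(predictions)
```

Dividing precision by the base rate gives the lift over contacting alumni at random, which is the comparison the last sentence of part (b) relies on.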
