1)

a) Binomial distribution

b) It describes how the probabilities are spread around the mean value: for each possible value of $S$ from 0 to $N$, it gives the probability of observing $S$ correct predictions in a sample of $N$ independent examples when the true accuracy is $p$.

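As a rough illustration of answer 1 b), the binomial probabilities can be tabulated directly. This is a minimal sketch: the function name `binomial_pmf` and the values `n = 10`, `p = 0.75` are assumptions chosen for illustration, not part of the exercise.

```python
from math import comb

def binomial_pmf(s: int, n: int, p: float) -> float:
    """Probability of exactly s correct predictions out of n independent examples."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)

# Illustrative values (assumed, not from the exercise): 10 examples, true accuracy 0.75.
n, p = 10, 0.75
pmf = [binomial_pmf(s, n, p) for s in range(n + 1)]

# The probabilities over S = 0..N should sum to 1, and the mean should be N * p.
total = sum(pmf)
mean = sum(s * prob for s, prob in enumerate(pmf))
print(f"sum = {total:.6f}, mean = {mean:.4f}")
```

The list `pmf` holds one probability per possible value of $S$, which is the spread around the mean that the answer refers to.
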
2)

a) Accuracy = (150 + 180 + 420) / (150 + 180 + 420 + 30 + 50 + 50 + 40 + 50 + 30) = 750 / 1000 = 0.75

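The same accuracy figure can be reproduced in a few lines. This sketch assumes the three counts in the numerator are the correct predictions and the remaining six are errors. The names `correct_counts`, `error_counts` and `accuracy` are hypothetical.

```python
# Counts copied from the calculation above. Which confusion-matrix cell each
# error count belongs to is not needed to compute accuracy.
correct_counts = [150, 180, 420]
error_counts = [30, 50, 50, 40, 50, 30]

def accuracy(correct, errors):
    """Fraction of all predictions that were correct."""
    total = sum(correct) + sum(errors)
    return sum(correct) / total

acc = accuracy(correct_counts, error_counts)
print(acc)
```
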
# Variance of S $\sigma^2_S = Np(1-p)$
# Std Dev of S $\sigma_S = \sqrt{Np(1-p)}$
# Std Dev of f $\sigma_f = \frac{\sigma_S}{N} = \sqrt{\frac{Np(1-p)}{N^2}} = \sqrt{\frac{p(1-p)}{N}}$

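One way to sanity-check the formula for $\sigma_f$ is a small simulation: draw many samples of $N$ independent trials, record the observed accuracy $S/N$ of each, and compare the spread of those values with $\sqrt{p(1-p)/N}$. The sketch below does this under assumed settings (`n = 200`, `p = 0.75`, 2000 repetitions, fixed seed). The function name `simulate_accuracy_std` is hypothetical.

```python
import random
from math import sqrt
from statistics import pstdev

def simulate_accuracy_std(n, p, repeats, seed=0):
    """Empirical standard deviation of S/N over repeated samples of n trials."""
    rng = random.Random(seed)
    observed = []
    for _ in range(repeats):
        # Each trial succeeds with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        observed.append(successes / n)
    return pstdev(observed)

# Assumed illustrative settings, not taken from the exercise.
n, p = 200, 0.75
empirical = simulate_accuracy_std(n, p, repeats=2000)
theoretical = sqrt(p * (1 - p) / n)
print(f"empirical = {empirical:.4f}, theoretical = {theoretical:.4f}")
```

The two numbers should agree closely but not exactly, since the empirical value is itself an estimate.
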
# Estimate of Predictive Accuracy $\mu_f = \frac{S}{N}$
# Successful Trials $S$
# Number of Trials $N$

750 successes in 1000 trials:

$S = 750$

$N = 1000$

$\mu_f = S/N = 0.75$

$\sigma_f = \sqrt{(0.75 \times 0.25)/1000} = 0.0137$

For confidence $c = 80\%$, each tail holds $(100-80)/2 = 10\%$, so $z = 1.28$.

$\mu_f \pm z \times \sigma_f = 0.75 \pm (1.28 \times 0.0137)$

$= 0.75 \pm 0.0175$

$p$ lies between 73.25% and 76.75%, with 80% confidence.

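The interval calculation above can be sketched in code. This version looks up $z$ from the standard normal distribution with Python's `statistics.NormalDist` instead of a table, so it uses a slightly more precise value than 1.28. The function name `accuracy_confidence_interval` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def accuracy_confidence_interval(successes, trials, confidence):
    """Normal-approximation interval for the true accuracy p."""
    f = successes / trials                 # observed accuracy S / N
    sigma_f = sqrt(f * (1 - f) / trials)   # standard deviation of f
    tail = (1 - confidence) / 2            # probability mass in each tail
    z = NormalDist().inv_cdf(1 - tail)     # about 1.28 for 80% confidence
    margin = z * sigma_f
    return f - margin, f + margin

low, high = accuracy_confidence_interval(750, 1000, 0.80)
print(f"{low:.4f} to {high:.4f}")
```

With 750 successes in 1000 trials, the bounds agree with the hand calculation to within rounding.
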
3)

a) Stratified holdout: the data is split so that the training and test sets have the same distribution of class values.

b) Repeated holdout: training and testing are done several times with different splits. The overall estimate of predictive accuracy is the average of the accuracies obtained in the different iterations.
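
Both procedures can be sketched with the standard library alone. In the sketch below the helper names (`stratified_holdout`, `repeated_holdout`, `majority_class_accuracy`), the toy labels and the 30% test fraction are all assumptions for illustration. The evaluator is a placeholder that predicts the majority training class, standing in for a real classifier. Using stratified splits inside the repeated holdout is a choice made here, not something the answer requires.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction, rng):
    """Return (train_indices, test_indices) with class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test

def repeated_holdout(labels, evaluate, repeats, test_fraction=0.3, seed=0):
    """Average the accuracy returned by evaluate(train, test) over several splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = stratified_holdout(labels, test_fraction, rng)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)

# Toy data (hypothetical): 60 examples of class "a", 40 of class "b".
labels = ["a"] * 60 + ["b"] * 40

def majority_class_accuracy(train, test):
    # Placeholder evaluator: always predict the most common training class.
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test) / len(test)

train, test = stratified_holdout(labels, 0.3, random.Random(1))
mean_accuracy = repeated_holdout(labels, majority_class_accuracy, repeats=5)
print(len(train), len(test), mean_accuracy)
```

Because every split keeps the 60/40 class ratio, the test set always contains 18 examples of class "a" and 12 of class "b".
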