Statistical Modelling
- Using statistical modelling for classification
- Bayesian techniques were adopted by the machine learning community in the 90s
- Opposite of 1R: uses all attributes rather than just one
- Assumes attributes are:
    - Equally important
    - Statistically independent (given the class): knowing the value of one attribute says nothing about the value of another
- The independence assumption is almost never correct, but the scheme works well in practice
Weather Dataset
Bayes' Rule of Conditional Probability
- Probability of event H given evidence E:
Pr[H|E] = \frac{Pr[E|H]\times Pr[H]}{Pr[E]}
- H may be, e.g., Play = Yes
- E may be the particular weather conditions on a new day
- A priori probability of H:
Pr[H]
- The probability of the event before any evidence is seen
- A posteriori probability of H:
Pr[H|E]
- The probability of the event after the evidence is seen
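For example, assuming the standard 14-day weather data (outlook = sunny on 5 of the 14 days, and on 2 of the 9 days with Play = Yes), Bayes' rule with a single piece of evidence gives:

Pr[\text{yes}|\text{sunny}] = \frac{Pr[\text{sunny}|\text{yes}]\times Pr[\text{yes}]}{Pr[\text{sunny}]} = \frac{(2/9)\times(9/14)}{5/14} = \frac{2}{5}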
Naive Bayes for Classification
- Classification learning: what is the probability of the class given an instance?
- Evidence E = the instance
- Event H = the class value for the instance
- Naive assumption: the evidence splits into attributes that are independent of one another given the class
Pr[H|E] = \frac{Pr[E_1|H] \times Pr[E_2|H] \times \cdots \times Pr[E_n|H] \times Pr[H]}{Pr[E]}
- The denominator Pr[E] can be ignored: it cancels out when the scores for all classes are normalised to sum to 1 (see the sketch below)
Weather Data Example
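A minimal Python sketch of this computation on the standard 14-instance weather dataset; the dataset rows, attribute order and the query day below are assumptions reconstructed from the well-known example, not copied from these notes.

```python
from collections import defaultdict

# Standard weather dataset: (outlook, temperature, humidity, windy) -> play
data = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

# Count class frequencies (for Pr[H]) and attribute-value/class frequencies (for Pr[E_i|H])
class_counts = defaultdict(int)
attr_counts = defaultdict(lambda: defaultdict(int))
for *attrs, cls in data:
    class_counts[cls] += 1
    for i, value in enumerate(attrs):
        attr_counts[cls][(i, value)] += 1

def classify(instance):
    """Return normalised class probabilities for one instance."""
    n = len(data)
    scores = {}
    for cls, c_count in class_counts.items():
        score = c_count / n                                   # prior Pr[H]
        for i, value in enumerate(instance):
            score *= attr_counts[cls][(i, value)] / c_count   # Pr[E_i|H]
        scores[cls] = score
    total = sum(scores.values())          # normalisation replaces dividing by Pr[E]
    return {cls: s / total for cls, s in scores.items()}

# New day: sunny, cool, high humidity, windy
print(classify(("sunny", "cool", "high", True)))
# -> roughly {'no': 0.795, 'yes': 0.205}
```

Note that an unseen attribute-value/class combination makes one of the factors 0 and wipes out the whole product, which is exactly the zero-frequency problem addressed next.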
Laplace Estimator
- Zero-frequency problem: if an attribute value never occurs together with a class in the training data, its estimated probability is 0 and the whole product for that class becomes 0
- Remedy: add 1 to the count for every attribute value-class combination (Laplace estimator)
- Result: probabilities are never 0 (this also stabilises the probability estimates)
- This simple remedy is the one most often used in practice when the zero-frequency problem arises (see the snippet below)
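A small sketch of the add-one correction, assuming the outlook-given-yes counts (2, 4 and 3 out of 9, with 3 possible values) that also appear in the next example; the helper name is illustrative.

```python
def laplace_estimate(count, class_total, n_values):
    """Add-one (Laplace) estimate of Pr[E_i|H]: (count + 1) / (class_total + n_values)."""
    return (count + 1) / (class_total + n_values)

# Outlook given Play = yes in the weather data: counts 2 / 4 / 3 out of 9, 3 possible values
for value, count in [("sunny", 2), ("overcast", 4), ("rainy", 3)]:
    print(value, laplace_estimate(count, 9, 3))
# sunny 0.25, overcast ~0.417, rainy ~0.333 -- none of the estimates can ever be 0
```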
Example
Modified Probability Estimates
- Consider the attribute outlook for class yes:
    - Sunny: \frac{2+\frac{1}{3}\mu}{9+\mu}
    - Overcast: \frac{4+\frac{1}{3}\mu}{9+\mu}
    - Rainy: \frac{3+\frac{1}{3}\mu}{9+\mu}
- Each value is treated in the same way
- Prior to seeing the training set, each value is assumed to be equally likely, i.e. its prior probability is \frac{1}{3}
- By deciding to add 1 to each count, we implicitly set \mu to 3
- However, there is no particular reason to add exactly 1: we could just as well increment the counts by 0.1, setting \mu to 0.3
- A large value of \mu means the prior probabilities carry a lot of weight compared to the evidence in the training set (see the sketch below)
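A sketch of this more general estimator in Python; the name m_estimate and the example values of \mu are assumptions for illustration.

```python
def m_estimate(count, class_total, prior, mu):
    """General estimate of Pr[E_i|H]: (count + mu * prior) / (class_total + mu)."""
    return (count + mu * prior) / (class_total + mu)

# Outlook = sunny given Play = yes: count 2 out of 9, uniform prior 1/3
print(m_estimate(2, 9, 1/3, mu=3))    # 0.25   -- identical to the Laplace (add-1) estimate
print(m_estimate(2, 9, 1/3, mu=0.3))  # ~0.226 -- close to the raw 2/9, the prior matters little
print(m_estimate(2, 9, 1/3, mu=30))   # ~0.308 -- pulled strongly towards the prior of 1/3
```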
Fully Bayesian Formulation
- For outlook and class yes, the estimates become:
    - Sunny: \frac{2+\mu p_1}{9+\mu}
    - Overcast: \frac{4+\mu p_2}{9+\mu}
    - Rainy: \frac{3+\mu p_3}{9+\mu}
- Where p_1 + p_2 + p_3 = 1, and p_1, p_2, p_3 are the prior probabilities of outlook being sunny, overcast or rainy before seeing the training set
- In practice, however, it is not clear how these prior probabilities should be assigned
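For illustration only, with made-up priors p_1 = 0.5, p_2 = 0.3, p_3 = 0.2 and \mu = 3, the three estimates work out as:

```python
# Hypothetical, non-uniform priors for sunny / overcast / rainy (must sum to 1)
priors = {"sunny": 0.5, "overcast": 0.3, "rainy": 0.2}
counts = {"sunny": 2, "overcast": 4, "rainy": 3}   # outlook counts for Play = yes
mu = 3

for value in priors:
    estimate = (counts[value] + mu * priors[value]) / (9 + mu)
    print(value, round(estimate, 3))
# sunny 0.292, overcast 0.408, rainy 0.3
```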