# Statistical Modelling
- Using statistical modelling for classification
- Bayesian techniques were adopted by the machine learning community in the 90s
- Opposite of 1R: uses all attributes
- Assume:
    - Attributes are equally important
    - Attributes are statistically independent
- The independence assumption is never correct in reality
- Nevertheless, the scheme works well in practice

# Weather Dataset
![](Pasted%20image%2020241003132609.png)
![](Pasted%20image%2020241003132636.png)

# Bayes' Rule of Conditional Probability
- Probability of event H given evidence E:
# $Pr[H|E] = \frac{Pr[E|H]\times Pr[H]}{Pr[E]}$
- H may be, e.g., *Play = Yes*
- E may be the particular weather conditions for a new day
- A priori probability of H: $Pr[H]$
    - Probability of the event before the evidence is seen
- A posteriori probability of H: $Pr[H|E]$
    - Probability of the event after the evidence is seen

## Naive Bayes for Classification
- Classification learning: what is the probability of the class given an instance?
    - Evidence $E$ = instance
    - Event $H$ = class of the given instance
- Naive assumption: the evidence splits into attributes that are independent
# $Pr[H|E] = \frac{Pr[E_1|H]\times Pr[E_2|H]\times\dots\times Pr[E_n|H]\times Pr[H]}{Pr[E]}$
- The denominator $Pr[E]$ is the same for every class, so it cancels out when the scores are normalised into probabilities

### Weather Data Example
![](Pasted%20image%2020241003133919.png)

# Laplace Estimator
- Remedy to the zero-frequency problem: add 1 to the count of every attribute value–class combination (Laplace estimator)
- Result: probabilities will never be 0 (this also stabilises the probability estimates)
- This simple remedy is the one most often used in practice when the zero-frequency problem arises

## Example
![](Pasted%20image%2020241003134100.png)

# Modified Probability Estimates
- Consider the attribute *outlook* for class *yes*:
# $\frac{2+\frac{1}{3}\mu}{9+\mu}$ Sunny
# $\frac{4+\frac{1}{3}\mu}{9+\mu}$ Overcast
# $\frac{3+\frac{1}{3}\mu}{9+\mu}$ Rainy
- Each value is treated the same way
- Prior to seeing the training set, we assume each value is equally likely, i.e. the prior probability is $\frac{1}{3}$
- By deciding to add 1 to each count, we implicitly set $\mu = 3$
- However, there is no particular reason to add exactly 1; we could increment the counts by 0.1 instead, setting $\mu = 0.3$
- A large value of $\mu$ means the prior probabilities carry a lot of weight compared to the evidence in the training set

## Fully Bayesian Formulation
# $\frac{2+\mu p_1}{9+\mu}$ Sunny
# $\frac{4+\mu p_2}{9+\mu}$ Overcast
# $\frac{3+\mu p_3}{9+\mu}$ Rainy
- Where $p_1 + p_2 + p_3 = 1$
- $p_1, p_2, p_3$ are the prior probabilities of the outlook being sunny, overcast or rainy before seeing the training set
- In practice, however, it is not clear how these prior probabilities should be assigned
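
## Sketch: Naive Bayes on the Weather Data
A minimal Python sketch of the Naive Bayes computation above. The per-class counts are assumed to match the standard 14-day weather dataset in the pasted tables (9 *yes* days, 5 *no* days); the function name `classify` is illustrative only.

```python
# Per-class counts assumed from the standard 14-day weather dataset:
# counts[class][attribute][value] = number of matching training instances
counts = {
    "yes": {"outlook": {"sunny": 2, "overcast": 4, "rainy": 3},
            "temperature": {"hot": 2, "mild": 4, "cool": 3},
            "humidity": {"high": 3, "normal": 6},
            "windy": {True: 3, False: 6}},
    "no":  {"outlook": {"sunny": 3, "overcast": 0, "rainy": 2},
            "temperature": {"hot": 2, "mild": 2, "cool": 1},
            "humidity": {"high": 4, "normal": 1},
            "windy": {True: 3, False: 2}},
}
class_counts = {"yes": 9, "no": 5}
n_total = sum(class_counts.values())

def classify(instance):
    """Return the normalised a posteriori probabilities Pr[H|E]."""
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / n_total                        # a priori Pr[H]
        for attr, value in instance.items():
            score *= counts[cls][attr][value] / n_cls  # Pr[E_i|H]
        scores[cls] = score
    total = sum(scores.values())  # dividing by the sum cancels Pr[E]
    return {cls: s / total for cls, s in scores.items()}

new_day = {"outlook": "sunny", "temperature": "cool",
           "humidity": "high", "windy": True}
print(classify(new_day))  # approx. {'yes': 0.205, 'no': 0.795}
```

Note that the denominator $Pr[E]$ never has to be computed explicitly: dividing each class score by the sum of all scores normalises it away, as claimed above.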
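
## Sketch: Laplace Estimator
The Laplace estimator can be written down directly. A sketch, assuming the same weather counts as above (where *outlook = overcast* never occurs with class *no*, so the raw estimate would be 0 and would wipe out the whole product $Pr[E|H]$):

```python
# Laplace estimator: add 1 to every attribute value-class count and,
# correspondingly, the number of attribute values to the class count,
# so no conditional probability can be exactly 0.
def laplace_prob(value_count, class_count, n_values):
    return (value_count + 1) / (class_count + n_values)

# Outlook under class "yes" (counts 2, 4, 3 out of 9; three possible values)
for name, count in [("sunny", 2), ("overcast", 4), ("rainy", 3)]:
    print(name, laplace_prob(count, 9, 3))  # 3/12, 5/12, 4/12

# outlook=overcast under class "no" (assumed count 0 out of 5):
# the smoothed estimate is 1/8 instead of 0
print("overcast|no:", laplace_prob(0, 5, 3))
```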
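
## Sketch: Modified and Fully Bayesian Estimates
The modified estimates and the fully Bayesian formulation are the same computation with different parameters, a form often called the m-estimate. A sketch with $\mu$ and the prior $p$ as arguments; the function name `m_estimate` is illustrative. With $\mu = 3$ and uniform priors of $\frac{1}{3}$ it reproduces the Laplace estimates above.

```python
# Generalised estimate (count + mu*p) / (class_count + mu), where mu weighs
# the prior p against the training evidence: a large mu lets the prior
# dominate, a small mu trusts the observed counts.
def m_estimate(value_count, class_count, mu, p):
    return (value_count + mu * p) / (class_count + mu)

priors = {"sunny": 1 / 3, "overcast": 1 / 3, "rainy": 1 / 3}  # p1+p2+p3 = 1
for name, count in [("sunny", 2), ("overcast", 4), ("rainy", 3)]:
    print(name,
          m_estimate(count, 9, mu=3, p=priors[name]),    # same as Laplace
          m_estimate(count, 9, mu=0.3, p=priors[name]))  # weaker prior
```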