# Statistical Modelling

- Using statistical modelling for classification
- Bayesian techniques were adopted by the machine learning community in the 90s
- Opposite of 1R: uses *all* attributes
- Assume:
	- Attributes are equally important
	- Attributes are statistically independent given the class
- The independence assumption is never correct
	- But it works well in practice
# Weather Dataset

![[Pasted image 20241016091237.png]]

![[Pasted image 20241016091303.png]]
# Bayes' Rule of Conditional Probability

- Probability of event H given evidence E:

# $Pr[H|E] = \frac{Pr[E|H]\times Pr[H]}{Pr[E]}$

- H may be, for example, *Play = Yes*
- E may be the particular weather conditions for a new day
- A priori probability of H: $Pr[H]$
	- Probability of the event before evidence is seen
- A posteriori probability of H: $Pr[H|E]$
	- Probability of the event after evidence is seen
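As a one-attribute illustration, assuming the counts of the classic weather data (9 of the 14 days are *yes*, 2 of those 9 days are *sunny*, and 5 of all 14 days are *sunny*):

# $Pr[yes|sunny] = \frac{Pr[sunny|yes] \times Pr[yes]}{Pr[sunny]} = \frac{\frac{2}{9} \times \frac{9}{14}}{\frac{5}{14}} = \frac{2}{5} = 0.4$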
## Naive Bayes for Classification

- Classification learning: what is the probability of the class given an instance?
	- Evidence $E$ = instance
	- Event $H$ = class of the given instance
- Naive assumption: the evidence splits into attributes that are conditionally independent given the class

# $Pr[H|E] = \frac{Pr[E_1|H] \times Pr[E_2|H] \times \cdots \times Pr[E_n|H] \times Pr[H]}{Pr[E]}$

- The denominator cancels out when the class scores are normalised to sum to 1
### Weather Data Example

![[Pasted image 20241016091442.png]]
# Laplace Estimator

- Remedy to the zero-frequency problem: add 1 to the count for every attribute value-class combination (the Laplace estimator)
- Result: probabilities will never be 0 (this also stabilises probability estimates)
- This simple remedy is often used in practice when the zero-frequency problem arises
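A small sketch of the remedy, using the weather data's *outlook* counts for class *no* (where *overcast* never occurs, triggering the zero-frequency problem); the function name is just for illustration:

```python
def laplace_probs(counts):
    """Add 1 to every attribute value-class count (Laplace estimator),
    so no conditional probability can come out as exactly 0."""
    smoothed = {value: n + 1 for value, n in counts.items()}
    total = sum(smoothed.values())
    return {value: n / total for value, n in smoothed.items()}

# Outlook counts within the 5 "no" days: overcast has a zero count.
outlook_given_no = {"sunny": 3, "overcast": 0, "rainy": 2}
print(laplace_probs(outlook_given_no))
# {'sunny': 0.5, 'overcast': 0.125, 'rainy': 0.375}
```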
## Example

![[Pasted image 20241016091706.png]]
# Modified Probability Estimates

- Consider attribute *outlook* for class *yes*:
	- Sunny: $\frac{2+\frac{1}{3}\mu}{9+\mu}$
	- Overcast: $\frac{4+\frac{1}{3}\mu}{9+\mu}$
	- Rainy: $\frac{3+\frac{1}{3}\mu}{9+\mu}$
- Each value is treated the same way
- Prior to seeing the training set, assume each value is equally likely, i.e. a prior probability of $\frac{1}{3}$
- When we decided to add 1 to the counts, we implicitly set $\mu$ to 3
- However, there is no particular reason to add 1 to the count; we could increment by 0.1 instead, setting $\mu$ to 0.3
- A large value of $\mu$ indicates that the prior probabilities are very important compared to the evidence in the training set
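A short sketch of how these estimates behave as $\mu$ varies, using the counts (2, 4, 3 out of 9) from the list above; the function name is illustrative:

```python
def modified_estimate(count, total, mu, prior=1/3):
    """Blend the observed relative frequency with a uniform prior,
    weighted by mu: (count + mu*prior) / (total + mu)."""
    return (count + mu * prior) / (total + mu)

for mu in (0.3, 3.0, 30.0):
    sunny, overcast, rainy = (modified_estimate(c, 9, mu) for c in (2, 4, 3))
    print(f"mu={mu:>4}: sunny={sunny:.3f} overcast={overcast:.3f} rainy={rainy:.3f}")

# mu=3 reproduces the Laplace estimator (add 1 to each of the 3 counts);
# as mu grows, all three estimates are pulled towards the prior 1/3.
```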
## Fully Bayesian Formulation

- The uniform prior $\frac{1}{3}$ is replaced by per-value prior probabilities $p_1, p_2, p_3$:
	- Sunny: $\frac{2+\mu p_1}{9+\mu}$
	- Overcast: $\frac{4+\mu p_2}{9+\mu}$
	- Rainy: $\frac{3+\mu p_3}{9+\mu}$
- Where $p_1 + p_2 + p_3 = 1$
- $p_1, p_2, p_3$ are the prior probabilities of *outlook* being *sunny*, *overcast* or *rainy* before seeing the training set
- However, in practice it is not clear how these prior probabilities should be assigned
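Extending the sketch above to the fully Bayesian form; the concrete values of $p_1, p_2, p_3$ here are made up purely for illustration, since the notes point out there is no obvious way to choose them:

```python
def bayesian_estimate(count, total, mu, p):
    """Fully Bayesian estimate: the uniform 1/3 prior is replaced
    by a per-value prior probability p."""
    return (count + mu * p) / (total + mu)

counts = {"sunny": 2, "overcast": 4, "rainy": 3}        # class "yes", 9 days
priors = {"sunny": 0.4, "overcast": 0.3, "rainy": 0.3}  # hypothetical, sum to 1

estimates = {v: bayesian_estimate(c, 9, mu=3, p=priors[v]) for v, c in counts.items()}
print(estimates)                # sunny: (2 + 3*0.4)/12 ~ 0.267, ...
print(sum(estimates.values()))  # ~1.0: a valid distribution because the p_i sum to 1
```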