Lecture 5 - Naive Bayes

Statistical Modelling

  • Using statistical modelling for classification
  • Bayesian techniques adopted by machine learning community in the 90s
  • Opposite of 1R: uses all attributes
  • Assumptions:
    • All attributes are equally important
    • Attributes are statistically independent given the class
  • The independence assumption is almost never correct
  • But the scheme works well in practice

Weather Dataset

Bayes' Rule of Conditional Probability

  • Probability of event H given evidence E:

Pr[H|E] = \frac{Pr[E|H]\times Pr[H]}{Pr[E]}

  • H may be, e.g., Play = Yes
  • E may be the particular weather conditions for a new day
  • A priori probability of H: Pr[H]
    • Probability before evidence
  • A posteriori probability of H: Pr[H|E]
    • Probability after evidence
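  • For example (assuming the standard 14-day weather data: 9 Play = Yes days, 2 of them sunny, and 5 sunny days in total):

Pr[yes|sunny] = \frac{Pr[sunny|yes]\times Pr[yes]}{Pr[sunny]} = \frac{\frac{2}{9}\times\frac{9}{14}}{\frac{5}{14}} = \frac{2}{5}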

Naive Bayes for Classification

  • Classification Learning: what is the probability of class given instance?
    • Evidence E = instance
    • Event H = class for given instance
  • Naive assumption: the evidence splits into attribute values that are conditionally independent given the class

Pr[H|E] = \frac{Pr[E_1|H] \times Pr[E_2|H] \times \dots \times Pr[E_n|H] \times Pr[H]}{Pr[E]}

  • The denominator Pr[E] is not needed: it cancels out when the class scores are normalised into probabilities

Weather Data Example
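
  • A sketch of the calculation for a new day with outlook = sunny, temperature = cool, humidity = high, windy = true (assuming the usual counts from the 14-instance weather data):

Pr[yes|E] \propto \frac{2}{9}\times\frac{3}{9}\times\frac{3}{9}\times\frac{3}{9}\times\frac{9}{14} \approx 0.0053

Pr[no|E] \propto \frac{3}{5}\times\frac{1}{5}\times\frac{4}{5}\times\frac{3}{5}\times\frac{5}{14} \approx 0.0206

  • Normalising so the two values sum to 1 gives roughly 20.5% for yes and 79.5% for no, so the prediction for this day is Play = No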

Laplace Estimator

  • Zero-frequency problem: if an attribute value never occurs with a particular class in the training data (e.g. outlook = overcast with Play = No), its conditional probability estimate is 0 and the whole product for that class becomes 0
  • Remedy: add 1 to the count for every attribute value-class combination (Laplace estimator)
  • Result: probabilities are never 0 (this also stabilises the probability estimates)
  • This simple remedy is the one most often used in practice when the zero-frequency problem arises
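
A minimal Python sketch of Naive Bayes with the add-one (Laplace) correction. The dataset literal is an assumed copy of the standard 14-instance weather data and the function names are illustrative, not from the lecture:

```python
from collections import Counter, defaultdict

# Assumed copy of the standard 14-instance weather data:
# (outlook, temperature, humidity, windy, play)
data = [
    ("sunny",    "hot",  "high",   False, "no"),
    ("sunny",    "hot",  "high",   True,  "no"),
    ("overcast", "hot",  "high",   False, "yes"),
    ("rainy",    "mild", "high",   False, "yes"),
    ("rainy",    "cool", "normal", False, "yes"),
    ("rainy",    "cool", "normal", True,  "no"),
    ("overcast", "cool", "normal", True,  "yes"),
    ("sunny",    "mild", "high",   False, "no"),
    ("sunny",    "cool", "normal", False, "yes"),
    ("rainy",    "mild", "normal", False, "yes"),
    ("sunny",    "mild", "normal", True,  "yes"),
    ("overcast", "mild", "high",   True,  "yes"),
    ("overcast", "hot",  "normal", False, "yes"),
    ("rainy",    "mild", "high",   True,  "no"),
]

def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values = defaultdict(set)             # attribute index -> set of observed values
    for *attrs, cls in rows:
        for i, v in enumerate(attrs):
            value_counts[(i, cls)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def classify(instance, class_counts, value_counts, values, k=1.0):
    """Normalised class probabilities; k is the Laplace correction (k=1 is add-one)."""
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total             # prior Pr[H]
        for i, v in enumerate(instance):
            count = value_counts[(i, cls)][v]
            n_values = len(values[i])     # number of distinct values of this attribute
            score *= (count + k) / (n_cls + k * n_values)   # smoothed Pr[E_i|H]
        scores[cls] = score
    norm = sum(scores.values())           # the denominator Pr[E] cancels here
    return {cls: s / norm for cls, s in scores.items()}

model = train(data)
print(classify(("sunny", "cool", "high", True), *model))
```

Here k is the amount added to each count: for the three-valued outlook attribute, k = 1 corresponds to \mu = 3 and k = 0.1 to \mu = 0.3 in the notation of the next section.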

Example

Modified Probability Estimates

  • Consider attribute outlook for class yes

Sunny: \frac{2+\frac{1}{3}\mu}{9+\mu}

Overcast: \frac{4+\frac{1}{3}\mu}{9+\mu}

Rainy: \frac{3+\frac{1}{3}\mu}{9+\mu}

  • Each value is treated the same way
  • Before seeing the training set, we assume each value is equally likely, i.e. each has prior probability \frac{1}{3}
  • When we decided to add 1 to each count, we implicitly set \mu = 3
  • However, there is no particular reason to add exactly 1: we could add 0.1 to each count instead, setting \mu = 0.3
  • A large value of \mu says that the prior probabilities are very important compared with the evidence from the training set
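  • With \mu = 3, for example, the three estimates above reduce to the familiar add-one counts:

\frac{2+1}{9+3},\qquad \frac{4+1}{9+3},\qquad \frac{3+1}{9+3}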

Fully Bayesian Formulation

Sunny: \frac{2+\mu p_1}{9+\mu}

Overcast: \frac{4+\mu p_2}{9+\mu}

Rainy: \frac{3+\mu p_3}{9+\mu}

  • Where p_1 + p_2 + p_3 = 1
  • p_1, p_2, p_3 are the prior probabilities of the outlook being sunny, overcast, or rainy before seeing the training set. In practice, however, it is not clear how these prior probabilities should be assigned.
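  • In general, writing n_i for the number of times value i occurs with the class and N for the number of training instances of that class, each estimate takes the form below; choosing p_i = \frac{1}{3} recovers the estimates of the previous section.

\frac{n_i+\mu p_i}{N+\mu}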