G4G0-2/AI & Data Mining/Week 6/Lecture 11 - ID3.md
2025-01-30 09:27:31 +00:00


Logarithms

\log_2 X is used when generating decision trees

  • The power to which 2 must be raised to get X
  • Here X is a probability, so 0 ≤ X ≤ 1
  • The log of a probability is never positive: it is negative for 0 < X < 1, zero for X = 1, and undefined for X = 0
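As a quick check of the sign behaviour described above, the following minimal sketch evaluates \log_2 for a few probabilities using Python's standard `math.log2`. The chosen probabilities are arbitrary illustrations.

```python
import math

# log2(p) is the power to which 2 must be raised to obtain p.
probabilities = (1.0, 0.5, 0.25, 0.1)
values = {p: math.log2(p) for p in probabilities}

for p, v in values.items():
    print(f"log2({p}) = {v:.3f}")

# math.log2(0) raises ValueError, so a zero probability needs special handling.
```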

Decision Tree for Contact Lenses

  • Drawn upside down, with the root at the top
  • Ellipse at top = root
  • Edges = branches
  • Rectangles = leaves
  • Leaves assign classification

Strategy

  • Grow the tree from the root
  • Top-down
  • The tree becomes more specific as it grows (general-to-specific)
  • Divide and conquer: split the examples on an attribute, then recurse on each subset
  • Stop if all examples at a node have the same class
  • How is the attribute for each node, starting with the root, selected?
    • Consider how to generate a decision tree for the weather dataset with nominal values only.
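The top-down, divide-and-conquer strategy above can be sketched as a short recursive function. This is a minimal illustration, not a definitive ID3 implementation. It uses information gain (introduced in the following sections) to pick each attribute. The nested-dict tree representation, the majority-class fallback when attributes run out, and the five-example dataset are all assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy after it."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    after = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - after

def build_tree(rows, labels, attributes):
    """Grow a tree top-down; leaves are class labels, internal nodes are dicts."""
    if len(set(labels)) == 1:      # stop: all examples share one class
        return labels[0]
    if not attributes:             # assumption: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    node = {best: {}}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node[best][value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], remaining
        )
    return node

# Hypothetical five-example dataset, NOT the lecture's weather data.
rows = [
    {"outlook": "sunny", "windy": "false"},
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "overcast", "windy": "false"},
    {"outlook": "rainy", "windy": "false"},
    {"outlook": "rainy", "windy": "true"},
]
labels = ["no", "no", "yes", "yes", "no"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

On this toy data the root splits on `outlook`. Only the `rainy` branch is still mixed, so it alone is split again on `windy`.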

Weather Dataset

Criterion for Attribute Selection

  • Which attribute is best?
    • The one that leads to the smallest tree
    • Heuristic: choose the attribute that produces the purest nodes
  • Information gain is a popular criterion for measuring how much a split reduces impurity
    • Increases with the average purity of the subsets
  • Choose the attribute that gives the greatest information gain

Information

  • The expected amount of information needed to specify whether a new example should be classified as yes or no, given that it has reached that node.

Computing Information

  • Measure information in bits
    • Given a probability distribution, the information required to predict an event is the distribution's entropy
    • Entropy gives the information required in bits

I(p_1,p_2,...,p_n)=-p_{1}\log_{2}p_1 -p_{2}\log_{2}p_2 ... -p_{n}\log_{2}p_n

Where n = number of classes and p_1 + p_2 + ... + p_n = 1. The minus signs are included because each \log_2 p_i is negative or zero, so negating the terms makes the result non-negative.
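The entropy formula above translates almost directly into code. The sketch below is one possible rendering. The function name `info` mirrors the notation in these notes, and skipping zero probabilities (treating 0 \log_2 0 as 0) is a common convention assumed here.

```python
import math

def info(*probabilities):
    """Entropy in bits of a discrete distribution p_1, ..., p_n."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 are skipped: 0 * log2(0) is taken to be 0.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(info(0.5, 0.5))                 # two equally likely classes: 1.0 bit
print(round(info(2 / 5, 3 / 5), 3))   # 0.971
```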

Expected Information for Outlook

  • Outlook = Sunny

info([2,3]) = I(\frac{2}{5},\frac{3}{5}) = -\frac{2}{5}\log_2(\frac{2}{5}) - \frac{3}{5}\log_2(\frac{3}{5}) = 0.971 bits

  • Outlook = Overcast

info([4,0]) = I(\frac{4}{4},\frac{0}{4}) = -1\log_2(1) -0\log_2(0) = 0 bits (taking 0\log_2(0) = 0 by convention, since \log_2(0) itself is undefined)

  • Outlook = Rainy

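info([3,2]) = I(\frac{3}{5},\frac{2}{5}) = -\frac{3}{5}\log_2(\frac{3}{5}) - \frac{2}{5}\log_2(\frac{2}{5}) = 0.971 bits

  • Expected information for Outlook (average of the branches, weighted by the fraction of examples in each)

E(Outlook) = \frac{5}{14} \times 0.971 + \frac{4}{14} \times 0 + \frac{5}{14} \times 0.971 = 0.693 bits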
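The per-branch figures can be checked numerically from the [yes, no] counts given above. The sketch below also computes the average weighted by branch size, which is the "information after splitting" on Outlook. The helper name `info_from_counts` is an assumption of this example. The unrounded weighted average is about 0.6935 bits, which is the 0.693 figure used in the gain calculation in these notes.

```python
import math

def info_from_counts(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped: 0 * log2(0) is taken to be 0.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# [yes, no] counts for each value of Outlook, as given in the notes.
outlook = {"sunny": [2, 3], "overcast": [4, 0], "rainy": [3, 2]}
n = sum(sum(counts) for counts in outlook.values())  # 14 examples in total

branch_info = {value: info_from_counts(c) for value, c in outlook.items()}
expected = sum(sum(c) / n * branch_info[value] for value, c in outlook.items())

for value, bits in branch_info.items():
    print(f"info({value}) = {bits:.3f} bits")
print(f"expected information for Outlook = {expected:.4f} bits")
```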

Computing Information Gain

Information gain = information before splitting - information after splitting

gain(Outlook) = info([9,5]) - E(Outlook) = 0.940 - 0.693 = 0.247 bits

Information Gain for Attributes of Weather Data

gain(Outlook) = 0.247 bits

gain(Temperature) = 0.029 bits

gain(Humidity) = 0.152 bits

gain(Windy) = 0.048 bits

Outlook is selected as the root because it gives the greatest information gain.
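The comparison of gains can be reproduced with a few lines of code. The Outlook counts come from these notes. The [yes, no] counts for Temperature, Humidity and Windy do not appear in them and are assumed here from the commonly used 14-example weather dataset. The function names are likewise illustrative.

```python
import math

def info(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(parent, branches):
    """Information before splitting minus the weighted information after."""
    n = sum(parent)
    after = sum(sum(b) / n * info(b) for b in branches)
    return info(parent) - after

parent = [9, 5]  # [yes, no] counts before splitting, from the notes
splits = {
    "Outlook": [[2, 3], [4, 0], [3, 2]],       # from the notes
    "Temperature": [[2, 2], [4, 2], [3, 1]],   # assumed: hot, mild, cool
    "Humidity": [[3, 4], [6, 1]],              # assumed: high, normal
    "Windy": [[6, 2], [3, 3]],                 # assumed: false, true
}

gains = {attr: gain(parent, branches) for attr, branches in splits.items()}
for attr, g in gains.items():
    print(f"gain({attr}) = {g:.3f} bits")

root = max(gains, key=gains.get)
print("attribute chosen for the root:", root)
```

Under these assumed counts the computed gains round to the four values listed above, and `max` picks Outlook.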