# Logarithms
$\log_2 X$ is used when generating decision trees
- The power to which we have to raise 2 to get X
- In this context, X will be a probability between 0 and 1
- The log of a probability is never positive: negative for X < 1 and 0 for X = 1

# Decision Tree for Contact Lenses
![](Pasted%20image%2020241024132411.png)
- Drawn upside down
- Ellipse at top = root
- Edges = branches
- Rectangles = leaves
- Leaves assign the classification

## Strategy
- Grow trees from the root
- Top-down
    - The tree becomes more specific as it grows, described as general-to-specific
- Divide and conquer
- Stop if all examples have the same class
- How is the attribute for the root node selected?
- Consider how to generate a decision tree for the weather dataset with nominal values only.

## Weather Dataset
![](Pasted%20image%2020241024132906.png)

### Criterion for Attribute Selection
- Which attribute is best?
    - The one that gives the smallest tree
    - Heuristic: choose the attribute that produces the purest nodes
- Information gain is a popular criterion, based on measuring impurity
    - Increases with the average purity of the subsets
    - Choose the attribute that gives the greatest information gain

# Information
- Expected amount of information needed to specify whether a new example should be classified as yes or no, given that it reached that node.
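
The top-down, divide-and-conquer strategy from the Strategy section can be sketched as a short recursive function. This is only an illustrative sketch: the names `grow_tree`, `choose_attribute` and `pick_first` are hypothetical, the rows are toy data invented for the example (not the weather dataset in the image), and the attribute selector is deliberately left as a parameter, since how to pick the best attribute is the question these notes go on to answer.

```python
from collections import Counter


def grow_tree(examples, attributes, choose_attribute):
    """Grow a decision tree top-down (hypothetical helper, not a library API).

    examples: list of (features_dict, class_label) pairs
    attributes: names of attributes still available for splitting
    choose_attribute: function(examples, attributes) -> attribute name
    """
    classes = [label for _, label in examples]

    # Stop if all examples have the same class: this node becomes a leaf.
    if len(set(classes)) == 1:
        return classes[0]

    # Assumption: if no attributes are left, fall back to the majority class.
    if not attributes:
        return Counter(classes).most_common(1)[0][0]

    # Select an attribute for this node, then divide the examples by its value.
    best = choose_attribute(examples, attributes)
    remaining = [a for a in attributes if a != best]
    subsets = {}
    for features, label in examples:
        subsets.setdefault(features[best], []).append((features, label))

    # Conquer: grow one subtree per branch (general-to-specific).
    return {
        best: {
            value: grow_tree(subset, remaining, choose_attribute)
            for value, subset in subsets.items()
        }
    }


# Toy rows invented for illustration only.
toy_examples = [
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "overcast", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
]


def pick_first(examples, attributes):
    """Placeholder selector: always take the first available attribute."""
    return attributes[0]


toy_tree = grow_tree(toy_examples, ["outlook", "windy"], pick_first)
print(toy_tree)
```

Swapping `pick_first` for a selector based on information gain would give the attribute-selection heuristic described in these notes.
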
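## Computing Information
- Measure information in bits
- Given a probability distribution, the information required to predict an event is the distribution's entropy
- Entropy gives the information required in bits

$$I(p_1,p_2,\dots,p_n) = -p_1\log_2 p_1 - p_2\log_2 p_2 - \dots - p_n\log_2 p_n$$

Where n = number of classes, and $p_1 + p_2 + \dots + p_n = 1$

The minus signs are included because each $\log_2 p_i$ is negative or zero, so the result is never negative. By convention, $0\log_2 0$ is taken to be 0.

### Expected Information for Outlook
- Outlook = Sunny

$$\text{info}([2,3]) = I\left(\tfrac{2}{5},\tfrac{3}{5}\right) = -\tfrac{2}{5}\log_2\left(\tfrac{2}{5}\right) - \tfrac{3}{5}\log_2\left(\tfrac{3}{5}\right) = 0.971 \text{ bits}$$

- Outlook = Overcast

$$\text{info}([4,0]) = I\left(\tfrac{4}{4},\tfrac{0}{4}\right) = -1\log_2(1) - 0\log_2(0) = 0 \text{ bits}$$

- Outlook = Rainy

$$\text{info}([3,2]) = I\left(\tfrac{3}{5},\tfrac{2}{5}\right) = -\tfrac{3}{5}\log_2\left(\tfrac{3}{5}\right) - \tfrac{2}{5}\log_2\left(\tfrac{2}{5}\right) = 0.971 \text{ bits}$$

- Expected information for Outlook = average of the three values, weighted by the number of examples in each branch (5, 4 and 5 out of 14)

$$E(\text{Outlook}) = \tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971) = 0.693 \text{ bits}$$

### Computing Information Gain
Information gain = information before splitting - information after splitting

$$\text{gain}(\text{Outlook}) = \text{info}([9,5]) - E(\text{Outlook}) = 0.940 - 0.693 = 0.247 \text{ bits}$$

#### Information Gain for Attributes of Weather Data
- gain(Outlook) = 0.247
- gain(Temperature) = 0.029
- gain(Humidity) = 0.152
- gain(Windy) = 0.048

Outlook is selected for the root because it gains the most information.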
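
The Outlook calculation above can be checked numerically. The following is a minimal sketch using only the class counts quoted in these notes ([9,5] before the split; [2,3], [4,0] and [3,2] after it). The function names `info`, `expected_info` and `gain` are chosen here for illustration and are not from any library.

```python
from math import log2


def info(counts):
    """Entropy in bits of a class distribution given as counts, e.g. [2, 3]."""
    total = sum(counts)
    # Empty classes are skipped, which applies the convention 0 * log2(0) = 0.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def expected_info(subsets):
    """Average entropy of the subsets, weighted by subset size."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s) / total * info(s) for s in subsets)


def gain(parent, subsets):
    """Information before splitting minus expected information after splitting."""
    return info(parent) - expected_info(subsets)


# Class counts [yes, no] taken from the notes above.
before_split = [9, 5]
outlook_subsets = [[2, 3], [4, 0], [3, 2]]  # sunny, overcast, rainy

print(f"info([2,3])   = {info([2, 3]):.3f} bits")
print(f"info([9,5])   = {info(before_split):.3f} bits")
print(f"E(Outlook)    = {expected_info(outlook_subsets):.4f} bits")
print(f"gain(Outlook) = {gain(before_split, outlook_subsets):.3f} bits")
```

At full precision the expected information for Outlook is about 0.6935 bits, which these notes quote as 0.693. The gain still comes out at 0.247 bits to three decimal places.
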