# Logarithms
$\log_2 X$ is used when generating decision trees:
- The power to which we have to raise 2 to get X
- In our use, X will be a probability between 0 and 1
- The log of a probability between 0 and 1 is negative (checked in code below)
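
A quick check with Python's standard `math` module:

```python
import math

# log2(X) is the power to which we must raise 2 to get X
print(math.log2(8))     # 3.0, since 2**3 == 8

# For probabilities between 0 and 1, the log is negative
print(math.log2(0.5))   # -1.0
print(math.log2(0.25))  # -2.0
```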
# Decision Tree for Contact Lenses

- Drawn upside down (root at the top)
- Ellipse at the top = root
- Edges = branches
- Rectangles = leaves
- Leaves assign a classification (a representation sketch follows below)
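
As a rough sketch of how such a tree could be represented in Python (the `Node`/`Leaf` names and the example attribute values are illustrative, not taken from the figure):

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    classification: str          # a rectangle: assigns a classification

@dataclass
class Node:
    attribute: str               # an ellipse: tests one attribute
    branches: dict = field(default_factory=dict)  # edge label -> subtree

# A hypothetical fragment: a root with two branches
root = Node("tear production rate",
            branches={"reduced": Leaf("none"),
                      "normal":  Leaf("soft")})
```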
## Strategy
- Grow trees from the root
- Top-down
- The tree becomes more specific as it is grown; this is described as general-to-specific
- Divide and conquer
- Stop if all examples have the same class
- How is the attribute for the root node (and each subsequent node) selected?
- Consider how to generate a decision tree for the weather dataset, which has nominal values only (a sketch of the strategy follows below).
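
A minimal sketch of the divide-and-conquer strategy, assuming examples are dicts with a `"play"` class key and reusing the `Node`/`Leaf` classes from the sketch above; `select_attribute` is filled in under Criterion for Attribute Selection below:

```python
from collections import Counter

def build_tree(examples, attributes, class_key="play"):
    """Grow the tree top-down, from the root."""
    classes = {e[class_key] for e in examples}
    if len(classes) == 1:      # all examples have the same class: stop
        return Leaf(classes.pop())
    if not attributes:         # no attributes left: fall back to the majority class
        return Leaf(Counter(e[class_key] for e in examples).most_common(1)[0][0])
    best = select_attribute(examples, attributes)    # divide ...
    node = Node(best)
    rest = [a for a in attributes if a != best]
    for value in {e[best] for e in examples}:        # ... and conquer each subset
        subset = [e for e in examples if e[best] == value]
        node.branches[value] = build_tree(subset, rest, class_key)
    return node
```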
## Weather Dataset

### Criterion for Attribute Selection
- Which attribute is best?
- We want the smallest tree
- Heuristic: choose the attribute that produces the purest nodes
- Information gain is a popular criterion for measuring impurity
- It increases with the average purity of the subsets
- Choose the attribute that gives the greatest information gain (a one-liner, sketched below)
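
Given a `gain` function (computed in the sections below), the selection rule is just an argmax; this is the `select_attribute` assumed by the strategy sketch above:

```python
def select_attribute(examples, attributes):
    # Pick the attribute with the greatest information gain
    return max(attributes, key=lambda a: gain(examples, a))
```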
# Information
- The expected amount of information needed to specify whether a new example should be classified as yes or no, given that it has reached that node.
## Computing Information
- Measure information in bits
- Given a probability distribution, the information required to predict an event is the distribution's entropy
- Entropy gives the information required in bits
# $I(p_1,p_2,\ldots,p_n) = -p_1\log_2 p_1 - p_2\log_2 p_2 - \ldots - p_n\log_2 p_n$
where $n$ is the number of classes and $p_1 + p_2 + \ldots + p_n = 1$.

The minus signs are included because each $\log_2 p_i$ is negative (or zero), so the result comes out non-negative.
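
A direct translation of the formula into Python, treating $0\log_2 0$ as 0 (the usual convention):

```python
import math

def entropy(*probs):
    """I(p1, ..., pn): expected information in bits."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log2(p) for p in probs if p > 0)
```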
### Expected Information for Outlook
- Outlook = Sunny
# $info([2,3]) = I(\frac{2}{5},\frac{3}{5}) = -\frac{2}{5}\log_2(\frac{2}{5}) - \frac{3}{5}\log_2(\frac{3}{5}) = 0.971\ \text{bits}$
- Outlook = Overcast
# $info([4,0]) = I(\frac{4}{4},\frac{0}{4}) = -1\log_2(1) - 0\log_2(0) = 0\ \text{bits}$

(taking $0\log_2 0 = 0$ by convention)
- Outlook = Rainy
# $info([3,2]) = I(\frac{3}{5},\frac{2}{5}) = -\frac{3}{5}\log_2(\frac{3}{5}) - \frac{2}{5}\log_2(\frac{2}{5}) = 0.971\ \text{bits}$
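
Checking all three values with the `entropy` function above; $info([a,b])$ is shorthand for $I(\frac{a}{a+b},\frac{b}{a+b})$:

```python
def info(counts):
    total = sum(counts)
    return entropy(*[c / total for c in counts])

print(round(info([2, 3]), 3))  # 0.971 (Sunny)
print(round(info([4, 0]), 3))  # 0.0   (Overcast)
print(round(info([3, 2]), 3))  # 0.971 (Rainy)
```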
### Computing Information Gain
Information gain = information before splitting - information after splitting.

The information after splitting, E(Outlook), is the weighted average of the information at each child node, weighted by the fraction of examples reaching it:

E(Outlook) = (5/14) x 0.971 + (4/14) x 0 + (5/14) x 0.971 = 0.693 bits

gain(Outlook) = info([9,5]) - E(Outlook)

= 0.940 - 0.693

= 0.247 bits
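
The same arithmetic in code, using `info` from above (the counts [9,5], [2,3], [4,0], [3,2] are read off the weather dataset):

```python
# Weighted average of the information at each child of the Outlook split
e_outlook = (5/14) * info([2, 3]) + (4/14) * info([4, 0]) + (5/14) * info([3, 2])
gain_outlook = info([9, 5]) - e_outlook
print(round(gain_outlook, 3))  # 0.247
```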
#### Information Gain for Attributes of Weather Data
gain(Outlook) = 0.247 bits

gain(Temperature) = 0.029 bits

gain(Humidity) = 0.152 bits

gain(Windy) = 0.048 bits
Outlook is selected for the root because it gains the most information.
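
To reproduce the whole table, one can compute the gain of each attribute directly from the 14-instance weather dataset (transcribed here in its standard nominal form); this `gain` is the function assumed by `select_attribute` above, and it reuses `info` from earlier:

```python
from collections import Counter

fields = ("outlook", "temperature", "humidity", "windy", "play")
rows = [
    ("sunny",    "hot",  "high",   "false", "no"),
    ("sunny",    "hot",  "high",   "true",  "no"),
    ("overcast", "hot",  "high",   "false", "yes"),
    ("rainy",    "mild", "high",   "false", "yes"),
    ("rainy",    "cool", "normal", "false", "yes"),
    ("rainy",    "cool", "normal", "true",  "no"),
    ("overcast", "cool", "normal", "true",  "yes"),
    ("sunny",    "mild", "high",   "false", "no"),
    ("sunny",    "cool", "normal", "false", "yes"),
    ("rainy",    "mild", "normal", "false", "yes"),
    ("sunny",    "mild", "normal", "true",  "yes"),
    ("overcast", "mild", "high",   "true",  "yes"),
    ("overcast", "hot",  "normal", "false", "yes"),
    ("rainy",    "mild", "high",   "true",  "no"),
]
weather = [dict(zip(fields, r)) for r in rows]

def gain(examples, attr, class_key="play"):
    def class_counts(rows):
        return list(Counter(r[class_key] for r in rows).values())
    before = info(class_counts(examples))   # info([9,5]) = 0.940 at the root
    after = 0.0                             # weighted average over the split
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        after += len(subset) / len(examples) * info(class_counts(subset))
    return before - after

for a in ("outlook", "temperature", "humidity", "windy"):
    print(a, round(gain(weather, a), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048
```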
|