# Logarithms
$\log_2 X$ is used when generating decision trees:
- The power to which we have to raise 2 to get X
- In our use, X will be a probability between 0 and 1
- The log of a probability between 0 and 1 is negative (checked in code below)
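
A quick check with Python's standard `math` module:

```python
import math

# log2(X) is the power to which we must raise 2 to get X
print(math.log2(8))     # 3.0, since 2**3 == 8

# For probabilities between 0 and 1, the log is negative
print(math.log2(0.5))   # -1.0
print(math.log2(0.25))  # -2.0
```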
# Decision Tree for Contact Lenses

- Drawn upside down (root at the top)
- Ellipse at the top = root
- Edges = branches
- Rectangles = leaves
- Leaves assign a classification (a representation sketch follows below)
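
As a rough sketch of how such a tree could be represented in Python (the `Node`/`Leaf` names and the example attribute values are illustrative, not taken from the figure):

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    classification: str          # a rectangle: assigns a classification

@dataclass
class Node:
    attribute: str               # an ellipse: tests one attribute
    branches: dict = field(default_factory=dict)  # edge label -> subtree

# A hypothetical fragment: a root with two branches
root = Node("tear production rate",
            branches={"reduced": Leaf("none"),
                      "normal":  Leaf("soft")})
```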
## Strategy
- Grow trees from the root
- Top-down
- The tree becomes more specific as it is grown; this is described as general-to-specific
- Divide and conquer
- Stop if all examples have the same class
- How is the attribute for the root node (and each subsequent node) selected?
- Consider how to generate a decision tree for the weather dataset, which has nominal values only (a sketch of the strategy follows below).
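
A minimal sketch of the divide-and-conquer strategy, assuming examples are dicts with a `"play"` class key and reusing the `Node`/`Leaf` classes from the sketch above; `select_attribute` is filled in under Criterion for Attribute Selection below:

```python
from collections import Counter

def build_tree(examples, attributes, class_key="play"):
    """Grow the tree top-down, from the root."""
    classes = {e[class_key] for e in examples}
    if len(classes) == 1:      # all examples have the same class: stop
        return Leaf(classes.pop())
    if not attributes:         # no attributes left: fall back to the majority class
        return Leaf(Counter(e[class_key] for e in examples).most_common(1)[0][0])
    best = select_attribute(examples, attributes)    # divide ...
    node = Node(best)
    rest = [a for a in attributes if a != best]
    for value in {e[best] for e in examples}:        # ... and conquer each subset
        subset = [e for e in examples if e[best] == value]
        node.branches[value] = build_tree(subset, rest, class_key)
    return node
```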
## Weather Dataset

### Criterion for Attribute Selection
- Which attribute is best?
- We want the smallest tree
- Heuristic: choose the attribute that produces the purest nodes
- Information gain is a popular criterion for measuring impurity
- It increases with the average purity of the subsets
- Choose the attribute that gives the greatest information gain (a one-liner, sketched below)
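
Given a `gain` function (computed in the sections below), the selection rule is just an argmax; this is the `select_attribute` assumed by the strategy sketch above:

```python
def select_attribute(examples, attributes):
    # Pick the attribute with the greatest information gain
    return max(attributes, key=lambda a: gain(examples, a))
```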
# Information
- The expected amount of information needed to specify whether a new example should be classified as yes or no, given that it has reached that node.
## Computing Information
- Measure information in bits
- Given a probability distribution, the information required to predict an event is the distribution's entropy
- Entropy gives the information required in bits
# $I(p_1,p_2,\ldots,p_n) = -p_1\log_2 p_1 - p_2\log_2 p_2 - \ldots - p_n\log_2 p_n$
where $n$ is the number of classes and $p_1 + p_2 + \ldots + p_n = 1$.

The minus signs are included because each $\log_2 p_i$ is negative (or zero), so the result comes out non-negative.
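
A direct translation of the formula into Python, treating $0\log_2 0$ as 0 (the usual convention):

```python
import math

def entropy(*probs):
    """I(p1, ..., pn): expected information in bits."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log2(p) for p in probs if p > 0)
```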
### Expected Information for Outlook
- Outlook = Sunny
# $info([2,3]) = I(\frac{2}{5},\frac{3}{5}) = -\frac{2}{5}\log_2(\frac{2}{5}) - \frac{3}{5}\log_2(\frac{3}{5}) = 0.971\ \text{bits}$
- Outlook = Overcast
# $info([4,0]) = I(\frac{4}{4},\frac{0}{4}) = -1\log_2(1) - 0\log_2(0) = 0\ \text{bits}$

(taking $0\log_2 0 = 0$ by convention)
- Outlook = Rainy
# $info([3,2]) = I(\frac{3}{5},\frac{2}{5}) = -\frac{3}{5}\log_2(\frac{3}{5}) - \frac{2}{5}\log_2(\frac{2}{5}) = 0.971\ \text{bits}$
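
Checking all three values with the `entropy` function above; $info([a,b])$ is shorthand for $I(\frac{a}{a+b},\frac{b}{a+b})$:

```python
def info(counts):
    total = sum(counts)
    return entropy(*[c / total for c in counts])

print(round(info([2, 3]), 3))  # 0.971 (Sunny)
print(round(info([4, 0]), 3))  # 0.0   (Overcast)
print(round(info([3, 2]), 3))  # 0.971 (Rainy)
```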
### Computing Information Gain
Information gain = information before splitting - information after splitting.

The information after splitting, E(Outlook), is the weighted average of the information at each child node, weighted by the fraction of examples reaching it:

E(Outlook) = (5/14) x 0.971 + (4/14) x 0 + (5/14) x 0.971 = 0.693 bits

gain(Outlook) = info([9,5]) - E(Outlook)

= 0.940 - 0.693

= 0.247 bits
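
The same arithmetic in code, using `info` from above (the counts [9,5], [2,3], [4,0], [3,2] are read off the weather dataset):

```python
# Weighted average of the information at each child of the Outlook split
e_outlook = (5/14) * info([2, 3]) + (4/14) * info([4, 0]) + (5/14) * info([3, 2])
gain_outlook = info([9, 5]) - e_outlook
print(round(gain_outlook, 3))  # 0.247
```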
#### Information Gain for Attributes of Weather Data
gain(Outlook) = 0.247 bits

gain(Temperature) = 0.029 bits

gain(Humidity) = 0.152 bits

gain(Windy) = 0.048 bits
Outlook is selected for the root because it gains the most information.
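
To reproduce the whole table, one can compute the gain of each attribute directly from the 14-instance weather dataset (transcribed here in its standard nominal form); this `gain` is the function assumed by `select_attribute` above, and it reuses `info` from earlier:

```python
from collections import Counter

fields = ("outlook", "temperature", "humidity", "windy", "play")
rows = [
    ("sunny",    "hot",  "high",   "false", "no"),
    ("sunny",    "hot",  "high",   "true",  "no"),
    ("overcast", "hot",  "high",   "false", "yes"),
    ("rainy",    "mild", "high",   "false", "yes"),
    ("rainy",    "cool", "normal", "false", "yes"),
    ("rainy",    "cool", "normal", "true",  "no"),
    ("overcast", "cool", "normal", "true",  "yes"),
    ("sunny",    "mild", "high",   "false", "no"),
    ("sunny",    "cool", "normal", "false", "yes"),
    ("rainy",    "mild", "normal", "false", "yes"),
    ("sunny",    "mild", "normal", "true",  "yes"),
    ("overcast", "mild", "high",   "true",  "yes"),
    ("overcast", "hot",  "normal", "false", "yes"),
    ("rainy",    "mild", "high",   "true",  "no"),
]
weather = [dict(zip(fields, r)) for r in rows]

def gain(examples, attr, class_key="play"):
    def class_counts(rows):
        return list(Counter(r[class_key] for r in rows).values())
    before = info(class_counts(examples))   # info([9,5]) = 0.940 at the root
    after = 0.0                             # weighted average over the split
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        after += len(subset) / len(examples) * info(class_counts(subset))
    return before - after

for a in ("outlook", "temperature", "humidity", "windy"):
    print(a, round(gain(weather, a), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048
```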
|