vault backup: 2024-10-16 09:12:37
112
AI & Data Mining/Week 4/Lecture 7 - Nearest Neighbor.md
Normal file
@@ -0,0 +1,112 @@
- Instance-based
- The solution to a new problem is the solution of the closest example
- Must be able to measure the distance between any pair of examples
- Normally Euclidean distance

# Normalisation of Numeric Attributes

- Attributes are measured on different scales
- Attributes with larger scales have a higher impact on the distance
- Must normalise (transform to the scale [0, 1])

# $a_i = \frac{v_i - minv_i}{maxv_i - minv_i}$

Where:
- $a_i$ is the normalised value for attribute $i$
- $v_i$ is the current value for attribute $i$
- $maxv_i$ is the largest value of attribute $i$
- $minv_i$ is the smallest value of attribute $i$

## Example

# $maxv_{humidity} = 96$
# $minv_{humidity} = 65$
# $v_{humidity} = 80.5$

# $a_{humidity} = \frac{80.5-65}{96-65} = \frac{15.5}{31} = 0.5$

## Example (Transport Dataset)

# $maxv_{doors} = 5$
# $minv_{doors} = 2$
# $v_{doors} = 3$
# $a_{doors} = \frac{3-2}{5-2} = \frac{1}{3}$

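The two worked examples above can be reproduced with a short function. This is a minimal sketch, not part of the lecture material: the name `min_max_normalise` and the handling of a constant attribute are assumptions made for illustration.

```python
def min_max_normalise(value, min_value, max_value):
    """Scale value to [0, 1] using the attribute's smallest and largest values."""
    if max_value == min_value:
        # Assumption: an attribute with a single value carries no distance
        # information, so it is mapped to 0 rather than dividing by zero.
        return 0.0
    return (value - min_value) / (max_value - min_value)


# Humidity example: v = 80.5, min = 65, max = 96
humidity = min_max_normalise(80.5, 65, 96)

# Doors example (transport dataset): v = 3, min = 2, max = 5
doors = min_max_normalise(3, 2, 5)

print(humidity)          # 0.5
print(round(doors, 4))   # 0.3333
```
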
# Nearest Neighbor Applied (Transport Dataset)

- The last row is the new vehicle to be classified
- N denotes a normalised value
- The rightmost column shows the Euclidean distance between each vehicle and the new vehicle
- The new vehicle is closest to the 1st example, a taxi, so NN predicts taxi

# $minv_{doors} = 2$
# $maxv_{doors} = 5$
# $minv_{seats} = 7$
# $maxv_{seats} = 65$

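The procedure above (normalise, measure the distance to every example, pick the closest) can be sketched as follows. The transport table itself is not reproduced in these notes, so the three training vehicles and their labels below are hypothetical; only the door and seat ranges come from the notes.

```python
import math

# Hypothetical training examples: (doors, seats, label).
# These rows are made up for illustration and are NOT the lecture's table.
TRAINING = [
    (4, 7, "taxi"),
    (2, 65, "bus"),
    (5, 16, "minibus"),
]

# Attribute ranges taken from the notes.
DOORS_RANGE = (2, 5)
SEATS_RANGE = (7, 65)


def normalise(value, bounds):
    """Min-max normalise a value given (min, max) bounds."""
    low, high = bounds
    return (value - low) / (high - low)


def to_point(doors, seats):
    """Turn raw attribute values into a normalised 2-D point."""
    return (normalise(doors, DOORS_RANGE), normalise(seats, SEATS_RANGE))


def euclidean(a, b):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour(doors, seats):
    """Return the label of the training example closest to the new vehicle."""
    query = to_point(doors, seats)
    closest = min(TRAINING, key=lambda row: euclidean(query, to_point(row[0], row[1])))
    return closest[2]


print(nearest_neighbour(4, 8))   # taxi
```
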
# Missing Values

## Missing Nominal Values

- Assume a missing feature is maximally different from any other value
- Distance is:
    - 0 if the values are identical and neither is missing
    - 1 otherwise

## Missing Numeric Values

- Distance is 1 if both values are missing
- If only one value is missing, assume the maximum possible distance, the larger of:
    - the (normalised) known value, or
    - 1 - the (normalised) known value

## Example (Weather Data)

- Humidity of one example = 76
    - Normalised = (76 - 65) / (96 - 65) ≈ 0.35
- The other example's humidity is missing
    - Max distance = 1 - 0.35 = 0.65

## Example (Transport Data)

- Number of seats of one example = 16
    - Normalised = (16 - 7) / (65 - 7) = 9/58
- The other example's number of seats is missing
    - Max distance = 1 - 9/58 = 49/58

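The missing-value rules for nominal and numeric attributes can be sketched as two small functions. This is an illustrative sketch: the function names are made up, and representing a missing value as `None` is an assumption rather than something the notes specify.

```python
def nominal_difference(a, b):
    """Difference between two nominal values; None marks a missing value."""
    if a is None or b is None:
        # A missing value is treated as maximally different from anything.
        return 1.0
    return 0.0 if a == b else 1.0


def numeric_difference(a, b):
    """Difference between two normalised numeric values in [0, 1].

    None marks a missing value (an assumption of this sketch).
    """
    if a is None and b is None:
        return 1.0
    if a is None or b is None:
        known = b if a is None else a
        # Assume the missing value lies as far from the known one as possible.
        return max(known, 1.0 - known)
    return abs(a - b)


# Transport example: seats = 16 normalises to 9/58, the other value is missing.
print(numeric_difference(9 / 58, None))   # 49/58, roughly 0.845

# Weather example: humidity = 76 normalises to 11/31, the other value is missing.
print(numeric_difference(11 / 31, None))  # 20/31, roughly 0.645
```
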
## Normalised Transport Data with Missing Values

- The last row is the vehicle to be classified
- N denotes a normalised value
- The rightmost column shows the Euclidean distances

# Definitions of Proximity

## Euclidean Distance

# $\sqrt{(a_1-a_1')^2 + (a_2-a_2')^2 + ... + (a_n-a_n')^2}$

Where $a$ and $a'$ are two examples with $n$ attributes, and $a_i$ and $a_i'$ are their values for attribute $i$

## Manhattan Distance

# $|a_1-a_1'|+|a_2-a_2'|+...+|a_n-a_n'|$

The vertical bars denote absolute value, so negative differences become positive.

Another distance measure could be the cube root of the sum of cubes.
The higher the power, the greater the influence of large differences.
Euclidean distance is generally a good compromise.

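Manhattan, Euclidean and the cube-root measure can all be written as one function with the power as a parameter (this family is commonly called the Minkowski distance, a name the notes do not use). The sketch below assumes absolute differences are taken before raising to the power, so that the cubic case stays non-negative.

```python
def minkowski_distance(a, b, p):
    """p-th root of the sum of |a_i - b_i|^p over all attributes.

    p = 1 gives Manhattan distance, p = 2 gives Euclidean distance,
    p = 3 gives the cube root of the sum of cubed absolute differences.
    """
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a = (0.0, 0.0)
b = (3.0, 4.0)

for power in (1, 2, 3):
    print(power, round(minkowski_distance(a, b, power), 3))
```

With these two points the distance shrinks towards the largest single difference (4) as the power grows, which illustrates the greater influence of large differences at higher powers.
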
# Problems with Nearest Neighbor

- Slow, since every training example must be compared with the new one
- Assumes all attributes are equally important
    - Only use the important attributes to compute distance
    - Weight attributes according to importance
- Does not detect noise
    - Use k-NN: get the k closest examples and take a majority vote on their solutions

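The k-NN remedy for noise can be sketched as below. The training points are hypothetical and already normalised; one of them is deliberately mislabelled to show how a majority vote over k = 3 neighbours can override a single noisy example. The function name `knn_predict` is made up for this sketch.

```python
import math
from collections import Counter


def knn_predict(training, query, k):
    """Majority vote among the k training examples closest to query.

    training is a list of (features, label) pairs with normalised features.
    """
    neighbours = sorted(training, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical data: a cluster of "yes" examples containing one noisy "no".
training = [
    ((0.10, 0.10), "yes"),
    ((0.15, 0.20), "yes"),
    ((0.20, 0.10), "yes"),
    ((0.12, 0.12), "no"),   # noisy example inside the "yes" cluster
    ((0.90, 0.80), "no"),
    ((0.80, 0.90), "no"),
]
query = (0.12, 0.13)

print(knn_predict(training, query, k=1))  # no  (the noisy example is closest)
print(knn_predict(training, query, k=3))  # yes (two of the three closest say yes)
```
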
36
AI & Data Mining/Week 4/Tutorial 4 - Nearest Neighbor.md
Normal file
@@ -0,0 +1,36 @@

## Normalisation Equation
# $a_i = \frac{v_i - minv_i}{maxv_i - minv_i}$
## Euclidean Distance Equation
# $\sqrt{(a_1-a_1')^2 + (a_2-a_2')^2 + ... + (a_n-a_n')^2}$

# $maxv_{temp} = 85$
# $minv_{temp} = 64$

# $a_{temp} = \frac{v_{temp} - 64}{21}$

# $maxv_{humidity} = 96$
# $minv_{humidity} = 65$

# $a_{humidity} = \frac{v_{humidity} - 65}{31}$

| outlook  | temp | NT   | humidity | NH   | windy | play | Euclidean Distance to a' Calculation               | Euclidean Distance |
| -------- | ---- | ---- | -------- | ---- | ----- | ---- | -------------------------------------------------- | ------------------ |
| sunny    | 85   | 1    | 85       | 0.65 | F     | N    | $\sqrt{(85-72)^2 + (85-76)^2 + (2-2)^2 + (0-1)^2}$ | 15.84              |
| sunny    | 80   | 0.76 | 90       | 0.81 | T     | N    | $\sqrt{(80-72)^2 + (90-76)^2 + (2-2)^2 + (1-1)^2}$ | 16.12              |
| overcast | 83   | 0.90 | 68       | 0.10 | F     | Y    | $\sqrt{(83-72)^2 + (68-76)^2 + (1-2)^2 + (0-1)^2}$ | 13.67              |
| rainy    | 70   | 0.29 | 96       | 1    | F     | Y    | $\sqrt{(70-72)^2 + (96-76)^2 + (0-2)^2 + (0-1)^2}$ | 20.22              |
| rainy    | 68   | 0.19 | 80       | 0.48 | F     | Y    | $\sqrt{(68-72)^2 + (80-76)^2 + (0-2)^2 + (0-1)^2}$ | 6.08               |
| rainy    | 65   | 0.05 | 70       | 0.16 | T     | N    | $\sqrt{(65-72)^2 + (70-76)^2 + (0-2)^2 + (1-1)^2}$ |                    |
| overcast | 64   | 0    | 65       | 0    | T     | Y    | $\sqrt{(64-72)^2 + (65-76)^2 + (1-2)^2 + (1-1)^2}$ |                    |
| sunny    | 72   | 0.38 | 95       | 0.97 | F     | N    | $\sqrt{(72-72)^2 + (95-76)^2 + (2-2)^2 + (0-1)^2}$ |                    |
| sunny    | 69   | 0.24 | 70       | 0.16 | F     | Y    | $\sqrt{(69-72)^2 + (70-76)^2 + (2-2)^2 + (0-1)^2}$ |                    |
| rainy    | 75   | 0.52 | 80       | 0.48 | F     | Y    | $\sqrt{(75-72)^2 + (80-76)^2 + (0-2)^2 + (0-1)^2}$ |                    |
| sunny    | 75   | 0.52 | 70       | 0.16 | T     | Y    | $\sqrt{(75-72)^2 + (70-76)^2 + (2-2)^2 + (1-1)^2}$ |                    |
| overcast | 72   | 0.38 | 90       | 0.81 | T     | Y    | $\sqrt{(72-72)^2 + (90-76)^2 + (1-2)^2 + (1-1)^2}$ |                    |
| overcast | 81   | 0.81 | 75       | 0.32 | F     | Y    | $\sqrt{(81-72)^2 + (75-76)^2 + (1-2)^2 + (0-1)^2}$ |                    |
| rainy    | 71   | 0.33 | 91       | 0.84 | T     | N    | $\sqrt{(71-72)^2 + (91-76)^2 + (0-2)^2 + (1-1)^2}$ |                    |
| sunny    | 72   | 0.38 | 76       | 0.35 | T     | ??   |                                                    |                    |
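The calculation column of the table can be evaluated programmatically. The sketch below follows the table as written: it uses the raw (unnormalised) temperature and humidity, and the numeric codes for the nominal attributes are inferred from the expressions in the table (rainy = 0, overcast = 1, sunny = 2; F = 0, T = 1). The names `ROWS`, `QUERY`, `encode` and `distances` are made up for this sketch.

```python
import math

# Numeric codes inferred from the table's calculation column (an assumption).
OUTLOOK = {"rainy": 0, "overcast": 1, "sunny": 2}
WINDY = {"F": 0, "T": 1}

# (outlook, temp, humidity, windy, play) for the 14 training rows of the table.
ROWS = [
    ("sunny", 85, 85, "F", "N"), ("sunny", 80, 90, "T", "N"),
    ("overcast", 83, 68, "F", "Y"), ("rainy", 70, 96, "F", "Y"),
    ("rainy", 68, 80, "F", "Y"), ("rainy", 65, 70, "T", "N"),
    ("overcast", 64, 65, "T", "Y"), ("sunny", 72, 95, "F", "N"),
    ("sunny", 69, 70, "F", "Y"), ("rainy", 75, 80, "F", "Y"),
    ("sunny", 75, 70, "T", "Y"), ("overcast", 72, 90, "T", "Y"),
    ("overcast", 81, 75, "F", "Y"), ("rainy", 71, 91, "T", "N"),
]

# The last row of the table: the example to be classified.
QUERY = ("sunny", 72, 76, "T")


def encode(outlook, temp, humidity, windy):
    """Map one row to the numeric vector used in the table's calculations."""
    return (temp, humidity, OUTLOOK[outlook], WINDY[windy])


def distances():
    """Euclidean distance from every training row to the query row."""
    target = encode(*QUERY)
    return [math.dist(encode(*row[:4]), target) for row in ROWS]


for row, d in zip(ROWS, distances()):
    print(row, round(d, 2))
```

Because the raw values are used here, temperature and humidity dominate the result; substituting the NT and NH columns would apply the normalisation described in the lecture.
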
42
AI & Data Mining/Week 4/Workshop 4 - Nearest Neighbor.md
Normal file
@@ -0,0 +1,42 @@
```
=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 3 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation:     letsPlay
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    user supplied test set:  size unknown (reading incrementally)

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 3 nearest neighbour(s) for classification

Time taken to build model: 0 seconds

=== Predictions on test set ===

    inst#     actual  predicted error prediction
        1      1:yes      1:yes       0.659
        2      1:yes      1:yes       0.659

=== Evaluation on test set ===

Time taken to test model on supplied test set: 0 seconds

=== Summary ===

Correctly Classified Instances           2              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0.3409
Root mean squared error                  0.3409
Relative absolute error                 90.9091 %
Root relative squared error             90.9091 %
Total Number of Instances                2
```