vault backup: 2024-10-16 09:12:37

2024-10-16 09:12:37 +01:00
parent bad31f35c5
commit 124e0b67ef
190 changed files with 192115 additions and 0 deletions
--- a/Neighbor.md
+++ b/Neighbor.md
@@ -0,0 +1,112 @@
+- Instance-based learning
+- The solution to a new problem is the solution of the closest stored example
+- Requires a way to measure the distance between any pair of examples
+- Usually Euclidean distance
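A minimal Python sketch of the idea. It assumes examples are stored as (attribute vector, label) pairs of already-normalised numbers. The function name `nearest_neighbour` and the toy data are hypothetical. Nothing is learned up front; all the work happens when a new example arrives.

```python
import math

def nearest_neighbour(examples, query):
    """Return the label of the stored example closest to `query`.

    `examples` is a list of (attribute_vector, label) pairs. Distance is
    Euclidean (math.dist); on a tie, min() keeps the first example listed.
    """
    _, label = min(examples, key=lambda ex: math.dist(ex[0], query))
    return label

# Hypothetical toy data: two normalised attributes per example.
training = [([0.1, 0.2], "A"), ([0.9, 0.8], "B"), ([0.2, 0.1], "A")]
print(nearest_neighbour(training, [0.8, 0.9]))  # B
```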
+
+# Normalisation of Numeric Attributes
+
+- Attributes are often measured on different scales
+	- Attributes with larger scales have a larger impact on the distance
+	- So each numeric attribute must be normalised (transformed to the range [0, 1])
+
+# $a_i = \frac{v_i - minv_i}{maxv_i - minv_i}$
+
+Where:
+- $a_i$ is the normalised value of attribute $i$
+- $v_i$ is the current value of attribute $i$
+- $maxv_i$ is the largest value of attribute $i$
+- $minv_i$ is the smallest value of attribute $i$
+
+## Example
+
+# $maxv_{humidity} = 96$
+# $minv_{humidity} = 65$
+# $v_{humidity} = 80.5$
+
+# $a_i = \frac{80.5-65}{96-65} = \frac{15.5}{31} = 0.5$
+
+## Example (Transport Dataset)
+
+# $maxv_{doors} = 5$
+# $minv_{doors} = 2$
+# $v_{doors} = 3$
+# $a_i = \frac{3-2}{5-2} = \frac{1}{3}$
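The normalisation formula can be checked against the two examples above with a short sketch. `min_max_normalise` is a hypothetical helper, and the guard for a constant attribute (max = min) is an added assumption, since the formula would otherwise divide by zero.

```python
def min_max_normalise(value, min_value, max_value):
    """Rescale `value` to [0, 1] using a_i = (v_i - minv_i) / (maxv_i - minv_i)."""
    if max_value == min_value:
        # Assumed guard: a constant attribute carries no distance information.
        return 0.0
    return (value - min_value) / (max_value - min_value)

print(min_max_normalise(80.5, 65, 96))  # humidity example: 0.5
print(min_max_normalise(3, 2, 5))       # doors example: 1/3
```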
+
+# Nearest Neighbor Applied (Transport Dataset)
+
+- The last row is the new vehicle to be classified
+- N denotes a normalised attribute
+- The rightmost column shows the Euclidean distance between each vehicle and the new vehicle
+- The new vehicle is closest to the first example, a taxi, so NN predicts taxi
+![](Pasted%20image%2020241010133818.png)
+# $minv_{doors} = 2$
+# $maxv_{doors} = 5$
+# $minv_{seats} = 7$
+# $maxv_{seats} = 65$
+
+# Missing Values
+
+## Missing Nominal Values 
+
+- Assume a missing value is maximally different from any other value
+- Distance is:
+	- 0 if the values are identical and neither is missing
+	- 1 otherwise
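A minimal sketch of the nominal rule, assuming missing values are encoded as `None`. The function name `nominal_distance` is hypothetical. Because "otherwise" includes the case where both values are missing, two missing values also give distance 1.

```python
from typing import Optional

def nominal_distance(x: Optional[str], y: Optional[str]) -> int:
    """0 if both values are present and identical, 1 otherwise.

    Missing values are encoded as None (an assumption); a missing value is
    treated as maximally different from everything, even another missing value.
    """
    if x is None or y is None:
        return 1
    return 0 if x == y else 1

print(nominal_distance("sunny", "sunny"))  # 0
print(nominal_distance("sunny", "rainy"))  # 1
print(nominal_distance(None, "sunny"))     # 1
```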
+
+## Missing Numeric Values
+
+- Distance is 1 if both values are missing
+- If only one value is missing, assume the maximum possible distance, i.e. the larger of:
+	- the (normalised) known value, or
+	- 1 - the (normalised) known value
+
+## Example (Weather Data)
+
+- Humidity of one example = 76
+- Normalised = (76 - 65)/31 ≈ 0.35
+- The other example's value is missing
+- Max distance = 1 - 0.35 = 0.65
+
+## Example (Transport Data)
+
+- Number of seats of one example = 16
+- Normalised = (16 - 7)/58 = 9/58
+- The other example's value is missing
+- Max distance = 1 - 9/58 = 49/58
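The numeric rule can be sketched the same way, assuming known values are already normalised and missing values are `None`. `numeric_distance` is a hypothetical name; when both values are present it simply returns their absolute difference. The usage lines recompute the humidity (76, range 65 to 96) and seats (16, range 7 to 65) examples from their raw values.

```python
from typing import Optional

def numeric_distance(x: Optional[float], y: Optional[float]) -> float:
    """Distance between two normalised numeric values in [0, 1]."""
    if x is None and y is None:
        return 1.0                      # both missing
    if x is None or y is None:
        known = y if x is None else x   # exactly one missing
        return max(known, 1 - known)    # assume the largest possible gap
    return abs(x - y)                   # both present

humidity = (76 - 65) / (96 - 65)   # about 0.355
print(round(numeric_distance(humidity, None), 3))
seats = (16 - 7) / (65 - 7)        # 9/58
print(round(numeric_distance(None, seats), 3))
```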
+
+## Normalised Transport Data with Missing Values
+
+- The last row is to be classified
+- N denotes a normalised attribute
+- The rightmost column shows the Euclidean distances
+![](Pasted%20image%2020241010135130.png)
+
+# Definitions of Proximity
+
+## Euclidean Distance
+
+# $\sqrt{(a_1-a_1')^2 + (a_2-a_2')^2 + ... + (a_n-a_n')^2}$
+
+Where $a$ and $a'$ are two examples with $n$ attributes, and $a_i$ and $a_i'$ are the (normalised) values of attribute $i$ for $a$ and $a'$ respectively
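The formula translates term by term into a short function. `euclidean_distance` is a hypothetical name; Python's standard `math.dist` computes the same quantity and is used in the check.

```python
import math

def euclidean_distance(a, a_prime):
    """sqrt((a_1 - a_1')^2 + ... + (a_n - a_n')^2) for two examples with n attributes."""
    if len(a) != len(a_prime):
        raise ValueError("examples must have the same number of attributes")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, a_prime)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```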
+
+## Manhattan Distance
+
+# $|a_1-a_1'|+|a_2-a_2'|+...+|a_n-a_n'|$
+
+The vertical bars denote absolute value: negative differences become positive.
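A one-line sketch of Manhattan distance (the function name is hypothetical): each attribute contributes its absolute difference, with no squaring.

```python
def manhattan_distance(a, a_prime):
    """|a_1 - a_1'| + |a_2 - a_2'| + ... + |a_n - a_n'|."""
    return sum(abs(ai - bi) for ai, bi in zip(a, a_prime))

print(manhattan_distance([1, 5], [4, 1]))  # 7
```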
+
+Another distance measure is the cube root of the sum of the cubed (absolute) differences.
+The higher the power, the greater the influence of large differences.
+Euclidean distance is generally a good compromise.
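Manhattan, Euclidean and the cube-root measure are all the same formula with a different power p (this general form is usually called Minkowski distance). The sketch below, with hypothetical vectors, compares four small differences against one large one.

```python
def minkowski_distance(a, a_prime, p):
    """(sum |a_i - a_i'|^p)^(1/p): p=1 Manhattan, p=2 Euclidean, p=3 cube root of sum of cubes."""
    return sum(abs(x - y) ** p for x, y in zip(a, a_prime)) ** (1 / p)

origin = [0, 0, 0, 0]
spread = [1, 1, 1, 1]   # four small differences
spike = [3, 0, 0, 0]    # one large difference
for p in (1, 2, 3):
    print(p, round(minkowski_distance(origin, spread, p), 3),
          round(minkowski_distance(origin, spike, p), 3))
```

With p = 1 the spike vector is nearer (3 against 4); with p = 2 and p = 3 the spread vector is nearer, because the higher power lets the single large difference dominate.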
+
+# Problems with Nearest Neighbor
+
+- Slow, since every stored example must be compared with the new one
+- Assumes all attributes are equally important
+	- Only use important attributes to compute the distance, or
+	- Weight attributes according to their importance
+- Does not detect noise: a single noisy example can decide the prediction
+	- Use k-NN: find the k closest examples and take a majority vote on their solutions
+![](Pasted%20image%2020241011131542.png)
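A minimal k-NN sketch covering two of the fixes listed above: a majority vote over the k closest examples, and optional per-attribute weights. The names `knn_predict` and `weighted_euclidean`, the weighting scheme and the toy data are hypothetical. In the example, a single noisy "B" sits among "A" examples: 1-NN follows the noise, 3-NN outvotes it.

```python
import math
from collections import Counter

def weighted_euclidean(x, y, weights=None):
    """Euclidean distance; optional weights scale each attribute's squared difference."""
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def knn_predict(examples, query, k=3, weights=None):
    """Majority vote over the k stored (vector, label) examples closest to `query`."""
    neighbours = sorted(examples, key=lambda ex: weighted_euclidean(ex[0], query, weights))[:k]
    votes = Counter(label for _, label in neighbours)
    # Equal counts keep first-seen order, so a tie goes to the nearer neighbour's label.
    return votes.most_common(1)[0][0]

training = [([0.10], "A"), ([0.12], "A"), ([0.15], "A"), ([0.11], "B")]  # "B" is noise
print(knn_predict(training, [0.11], k=1))  # B
print(knn_predict(training, [0.11], k=3))  # A
```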
+
--- a/Neighbor.md
+++ b/Neighbor.md
@@ -0,0 +1,36 @@
+
+![](Pasted%20image%2020241011131844.png)
+
+## Normalisation Equation
+# $a_i = \frac{v_i - minv_i}{maxv_i - minv_i}$
+## Euclidean Distance Equation
+# $\sqrt{(a_1-a_1')^2 + (a_2-a_2')^2 + ... + (a_n-a_n')^2}$
+
+
+# $maxv_{temp} = 85$
+# $minv_{temp} = 64$
+
+# $a_{temp} = \frac{v_{temp} - 64}{21}$
+
+# $maxv_{humidity} = 96$
+# $minv_{humidity} = 65$
+
+# $a_{humidity} = \frac{v_{humidity} - 65}{31}$
+
+| outlook  | temp | NT   | humidity | NH   | windy | play | Euclidean Distance to a' Calculation (NT, NH, outlook, windy) | Euclidean Distance |
+| -------- | ---- | ---- | -------- | ---- | ----- | ---- | ------------------------------------------------------------- | ------------------ |
+| sunny    | 85   | 1    | 85       | 0.65 | F     | N    | $\sqrt{(1-0.38)^2 + (0.65-0.35)^2 + 0^2 + 1^2}$               | 1.21               |
+| sunny    | 80   | 0.76 | 90       | 0.81 | T     | N    | $\sqrt{(0.76-0.38)^2 + (0.81-0.35)^2 + 0^2 + 0^2}$            | 0.60               |
+| overcast | 83   | 0.90 | 86       | 0.68 | F     | Y    | $\sqrt{(0.90-0.38)^2 + (0.68-0.35)^2 + 1^2 + 1^2}$            | 1.54               |
+| rainy    | 70   | 0.29 | 96       | 1    | F     | Y    | $\sqrt{(0.29-0.38)^2 + (1-0.35)^2 + 1^2 + 1^2}$               | 1.56               |
+| rainy    | 68   | 0.19 | 80       | 0.48 | F     | Y    | $\sqrt{(0.19-0.38)^2 + (0.48-0.35)^2 + 1^2 + 1^2}$            | 1.43               |
+| rainy    | 65   | 0.05 | 70       | 0.16 | T     | N    | $\sqrt{(0.05-0.38)^2 + (0.16-0.35)^2 + 1^2 + 0^2}$            | 1.07               |
+| overcast | 64   | 0    | 65       | 0    | T     | Y    | $\sqrt{(0-0.38)^2 + (0-0.35)^2 + 1^2 + 0^2}$                  | 1.13               |
+| sunny    | 72   | 0.38 | 95       | 0.97 | F     | N    | $\sqrt{(0.38-0.38)^2 + (0.97-0.35)^2 + 0^2 + 1^2}$            | 1.18               |
+| sunny    | 69   | 0.24 | 70       | 0.16 | F     | Y    | $\sqrt{(0.24-0.38)^2 + (0.16-0.35)^2 + 0^2 + 1^2}$            | 1.03               |
+| rainy    | 75   | 0.52 | 80       | 0.48 | F     | Y    | $\sqrt{(0.52-0.38)^2 + (0.48-0.35)^2 + 1^2 + 1^2}$            | 1.43               |
+| sunny    | 75   | 0.52 | 70       | 0.16 | T     | Y    | $\sqrt{(0.52-0.38)^2 + (0.16-0.35)^2 + 0^2 + 0^2}$            | 0.24               |
+| overcast | 72   | 0.38 | 90       | 0.81 | T     | Y    | $\sqrt{(0.38-0.38)^2 + (0.81-0.35)^2 + 1^2 + 0^2}$            | 1.10               |
+| overcast | 81   | 0.81 | 75       | 0.32 | F     | Y    | $\sqrt{(0.81-0.38)^2 + (0.32-0.35)^2 + 1^2 + 1^2}$            | 1.48               |
+| rainy    | 71   | 0.33 | 91       | 0.84 | T     | N    | $\sqrt{(0.33-0.38)^2 + (0.84-0.35)^2 + 1^2 + 0^2}$            | 1.11               |
+| sunny    | 72   | 0.38 | 76       | 0.35 | T     | ??   |                                                               |                    |
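The distances for the "??" row can be reproduced with a short sketch that follows the note's rules: temperature and humidity normalised with the min/max values above, and outlook and windy scored 0 if equal and 1 otherwise. Two assumptions: the overcast/83 row's humidity is taken as 86, which matches its NH of 0.68, and ties in the vote would be broken by the nearer neighbour. The names `DATA`, `QUERY` and `distance` are hypothetical.

```python
import math
from collections import Counter

# (outlook, temperature, humidity, windy, play); humidity 86 for overcast/83 is assumed.
DATA = [
    ("sunny", 85, 85, False, "N"), ("sunny", 80, 90, True, "N"),
    ("overcast", 83, 86, False, "Y"), ("rainy", 70, 96, False, "Y"),
    ("rainy", 68, 80, False, "Y"), ("rainy", 65, 70, True, "N"),
    ("overcast", 64, 65, True, "Y"), ("sunny", 72, 95, False, "N"),
    ("sunny", 69, 70, False, "Y"), ("rainy", 75, 80, False, "Y"),
    ("sunny", 75, 70, True, "Y"), ("overcast", 72, 90, True, "Y"),
    ("overcast", 81, 75, False, "Y"), ("rainy", 71, 91, True, "N"),
]
QUERY = ("sunny", 72, 76, True)  # the row to classify

T_MIN, T_MAX = 64, 85  # temperature range
H_MIN, H_MAX = 65, 96  # humidity range

def normalise(v, lo, hi):
    return (v - lo) / (hi - lo)

def distance(row, query):
    """Euclidean distance on normalised temp/humidity plus 0/1 nominal mismatches."""
    outlook, temp, hum, windy = row[:4]
    q_outlook, q_temp, q_hum, q_windy = query
    d_temp = normalise(temp, T_MIN, T_MAX) - normalise(q_temp, T_MIN, T_MAX)
    d_hum = normalise(hum, H_MIN, H_MAX) - normalise(q_hum, H_MIN, H_MAX)
    d_outlook = 0 if outlook == q_outlook else 1
    d_windy = 0 if windy == q_windy else 1
    return math.sqrt(d_temp ** 2 + d_hum ** 2 + d_outlook ** 2 + d_windy ** 2)

ranked = sorted(DATA, key=lambda row: distance(row, QUERY))
for row in ranked[:3]:
    print(row, round(distance(row, QUERY), 2))

votes = Counter(row[4] for row in ranked[:3])
print("1-NN:", ranked[0][4], " 3-NN:", votes.most_common(1)[0][0])
```

Under these assumptions the three nearest rows are sunny/75/70/T (Y), sunny/80/90/T (N) and sunny/69/70/F (Y), so both 1-NN and 3-NN predict Y. Unrounded normalisation gives slightly different distances from two-decimal values, but the same ranking.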
--- a/Neighbor.md
+++ b/Neighbor.md
@@ -0,0 +1,42 @@
+```
+=== Run information ===
+
+Scheme:       weka.classifiers.lazy.IBk -K 3 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
+Relation:     letsPlay
+Instances:    14
+Attributes:   5
+              outlook
+              temperature
+              humidity
+              windy
+              play
+Test mode:    user supplied test set:  size unknown (reading incrementally)
+
+=== Classifier model (full training set) ===
+
+IB1 instance-based classifier
+using 3 nearest neighbour(s) for classification
+
+Time taken to build model: 0 seconds
+
+=== Predictions on test set ===
+
+    inst#     actual  predicted error prediction
+        1      1:yes      1:yes       0.659 
+        2      1:yes      1:yes       0.659 
+
+=== Evaluation on test set ===
+
+Time taken to test model on supplied test set: 0 seconds
+
+=== Summary ===
+
+Correctly Classified Instances           2              100      %
+Incorrectly Classified Instances         0                0      %
+Kappa statistic                          1     
+Mean absolute error                      0.3409
+Root mean squared error                  0.3409
+Relative absolute error                 90.9091 %
+Root relative squared error             90.9091 %
+Total Number of Instances                2
+```
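The prediction probability of 0.659 (rather than 2/3) is consistent with one assumption about IBk: that it adds a correction of 1/N to every class count before normalising, with N = 14 training instances. The output does not list the neighbours, so the sketch also assumes 2 of the 3 are `yes`. Treat it as a consistency check on the numbers, not a description of Weka's internals.

```python
from fractions import Fraction

n_train = 14                  # "Instances: 14" in the run information
n_classes = 2                 # play = yes / no
votes = {"yes": 2, "no": 1}   # assumed: 2 of the 3 neighbours are "yes"

correction = Fraction(1, n_train)  # assumed per-class correction added by IBk
total = sum(votes.values()) + n_classes * correction
p_yes = (votes["yes"] + correction) / total

print(p_yes, float(p_yes))          # predicted probability of "yes"
print(1 - p_yes, float(1 - p_yes))  # per-instance error when the actual class is yes
```

Exactly, this gives 29/44 ≈ 0.659, and the error 15/44 ≈ 0.3409 matches the reported mean absolute error for two test instances predicted the same way.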