vault backup: 2024-10-16 09:12:37

This commit is contained in:
boris
2024-10-16 09:12:37 +01:00
parent bad31f35c5
commit 124e0b67ef
190 changed files with 192115 additions and 0 deletions

# Statistical Modelling
- Using statistical modelling for classification
- Bayesian techniques adopted by machine learning community in the 90s
- Unlike 1R (which uses a single attribute), uses all attributes
- Assumes:
- Attributes are equally important
- Attributes are statistically independent
- The independence assumption is almost never correct
- Yet the method works well in practice
# Weather Dataset
![](Pasted%20image%2020241003132609.png)
![](Pasted%20image%2020241003132636.png)
# Bayes' Rule of Conditional Probability
- Probability of event H given evidence E:
# $Pr[H|E] = \frac{Pr[E|H]\times Pr[H]}{Pr[E]}$
- H may be, e.g., Play = Yes
- E may be the particular weather conditions for a new day
- A priori probability of H: $Pr[H]$
- Probability before evidence
- A posteriori probability of H: $Pr[H|E]$
- Probability after evidence
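A minimal numeric sketch of the rule (the values of $Pr[E|H]$ and $Pr[E]$ below are assumed purely for illustration; only the 9-out-of-14 prior comes from the weather data):

```python
# Bayes' rule: Pr[H|E] = Pr[E|H] * Pr[H] / Pr[E]
pr_h = 9 / 14         # a priori: Play = Yes on 9 of 14 days
pr_e_given_h = 0.2    # Pr[E|H], assumed for illustration
pr_e = 0.25           # Pr[E], assumed for illustration

pr_h_given_e = pr_e_given_h * pr_h / pr_e   # a posteriori probability
print(round(pr_h_given_e, 4))
```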
## Naive Bayes for Classification
- Classification Learning: what is the probability of class given instance?
- Evidence $E$ = instance
- Event $H$ = class for given instance
- Naive assumption: evidence splits into attributes that are independent
# $Pr[H|E] = \frac{Pr[E_1|H] \times Pr[E_2|H] \times \dots \times Pr[E_n|H] \times Pr[H]}{Pr[E]}$
- The denominator $Pr[E]$ is the same for every class, so it cancels out when the scores are normalised into probabilities
### Weather Data Example
![](Pasted%20image%2020241003133919.png)
# Laplace Estimator
- Remedy to the zero-frequency problem (a single zero count makes the whole product zero): add 1 to the count for every attribute value-class combination (Laplace estimator)
- Result: probabilities will never be 0 (also stabilises probability estimates)
- This simple remedy is often used in practice when the zero-frequency problem arises
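A minimal sketch of the estimator, using the *outlook* counts for class *yes* that appear in the estimates below (2 sunny, 4 overcast, 3 rainy, out of 9):

```python
# Laplace estimator: add 1 to every value count so no estimate is ever 0.
counts = {"sunny": 2, "overcast": 4, "rainy": 3}  # outlook counts, class yes
n = sum(counts.values())                          # 9 yes instances
k = len(counts)                                   # 3 possible outlook values

smoothed = {value: (c + 1) / (n + k) for value, c in counts.items()}
print(smoothed)  # sunny: 3/12, overcast: 5/12, rainy: 4/12
```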
## Example
![](Pasted%20image%2020241003134100.png)
# Modified Probability Estimates
- Consider attribute *outlook* for class *yes*
# $\frac{2+\frac{1}{3}\mu}{9+\mu}$
Sunny
# $\frac{4+\frac{1}{3}\mu}{9+\mu}$
Overcast
# $\frac{3+\frac{1}{3}\mu}{9+\mu}$
Rainy
- Each value treated the same way
- Prior to seeing the training set, assume each value is equally likely, i.e. a prior probability of $\frac{1}{3}$
- By deciding to add 1 to each count, we implicitly set $\mu$ to 3
- However, there is no particular reason to add 1 to the count; we could increment by 0.1 instead, setting $\mu$ to 0.3
- A large value of $\mu$ indicates prior probabilities are very important compared to evidence in training set.
## Fully Bayesian Formulation
# $\frac{2+\mu p_1}{9+\mu}$
Sunny
# $\frac{4+\mu p_2}{9+\mu}$
Overcast
# $\frac{3+\mu p_3}{9+\mu}$
Rainy
- Where $p_1 + p_2 + p_3 = 1$
- $p_1, p_2, p_3$ are prior probabilities of outlook being sunny, overcast or rainy before seeing the training set. However, in practice it is not clear how these prior probabilities should be assigned.
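A sketch of the modified estimate; with $\mu = 3$ and uniform priors $p_i = \frac{1}{3}$ it reduces to the Laplace estimator above:

```python
# Modified estimate: (count + mu * p) / (n + mu), where p is the value's prior.
def modified_estimate(count, n, mu, p):
    return (count + mu * p) / (n + mu)

counts = {"sunny": 2, "overcast": 4, "rainy": 3}  # outlook counts for class yes
for value, c in counts.items():
    # mu = 3 with uniform priors gives exactly Laplace's add-one estimates
    print(value, modified_estimate(c, n=9, mu=3, p=1 / 3))
```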

| Temperature | Skin | Blood Pressure | Blocked Nose | Diagnosis |
| ----------- | ------ | -------------- | ------------ | --------- |
| Low | Pale | Normal | True | N |
| Moderate | Pale | Normal | True | B |
| High | Normal | High | False | N |
| Moderate | Pale | Normal | False | B |
| High | Red | High | False | N |
| High | Red | High | True | N |
| Moderate | Red | High | False | B |
| Low | Normal | High | False | B |
| Low | Pale | Normal | False | B |
| Low | Normal | Normal | False | B |
| High | Normal | Normal | True | B |
| Moderate | Normal | High | True | B |
| Moderate | Red | Normal | False | B |
| Low | Normal | High | True | N |
| | Temperature | | | Skin | | | Pressure | | | Blocked | | Diag | |
| -------- | ----------- | --- | ------ | ---- | --- | ------ | -------- | --- | ----- | ------- | --- | ---- | ---- |
| | N | B | | N | B | | N | B | | N | B | N | B |
| Low | 2 | 3 | Pale | 1 | 3 | Normal | 1 | 6 | True | 3 | 3 | 5 | 9 |
| Moderate | 0 | 5 | Normal | 2 | 4 | High | 4 | 3 | False | 2 | 6 | | |
| High | 3 | 1 | Red | 2 | 2 | | | | | | | | |

| | Temperature | | | Skin | | | Pressure | | | Blocked | | Diag | |
| -------- | ----------- | --- | ------ | ---- | --- | ------ | -------- | --- | ----- | ------- | --- | ---- | ---- |
| | N | B | | N | B | | N | B | | N | B | N | B |
| Low | 2/5 | 3/9 | Pale | 1/5 | 3/9 | Normal | 1/5 | 6/9 | True | 3/5 | 3/9 | 5/14 | 9/14 |
| Moderate | 0/5 | 5/9 | Normal | 2/5 | 4/9 | High | 4/5 | 3/9 | False | 2/5 | 6/9 | | |
| High | 3/5 | 1/9 | Red | 2/5 | 2/9 | | | | | | | | |
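The count table can be rebuilt directly from the 14 rows above; a minimal sketch:

```python
from collections import Counter

# The 14 instances: (Temperature, Skin, BloodPressure, BlockedNose, Diagnosis)
rows = [
    ("Low", "Pale", "Normal", "True", "N"), ("Moderate", "Pale", "Normal", "True", "B"),
    ("High", "Normal", "High", "False", "N"), ("Moderate", "Pale", "Normal", "False", "B"),
    ("High", "Red", "High", "False", "N"), ("High", "Red", "High", "True", "N"),
    ("Moderate", "Red", "High", "False", "B"), ("Low", "Normal", "High", "False", "B"),
    ("Low", "Pale", "Normal", "False", "B"), ("Low", "Normal", "Normal", "False", "B"),
    ("High", "Normal", "Normal", "True", "B"), ("Moderate", "Normal", "High", "True", "B"),
    ("Moderate", "Red", "Normal", "False", "B"), ("Low", "Normal", "High", "True", "N"),
]

# Count every (attribute index, value, class) combination.
counts = Counter()
for *attrs, diag in rows:
    for i, v in enumerate(attrs):
        counts[(i, v, diag)] += 1

print(counts[(0, "Low", "N")])   # Temperature = Low with Diagnosis = N: 2
print(counts[(2, "High", "N")])  # BloodPressure = High with Diagnosis = N: 4
```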
# Problem 1
Classify the instance (Temperature = Low, Skin = Normal, BloodPressure = High, BlockedNose = True):
# $Pr[Diagnosis=N|E] = \frac{2}{5} \times \frac{2}{5} \times \frac{4}{5} \times \frac{3}{5} \times \frac{5}{14} = 0.027428571$
# $Pr[Diagnosis = B|E] = \frac{3}{9} \times \frac{4}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{9}{14} = 0.010582011$
# $p(B) = \frac{0.0106}{0.0106+0.0274} = 0.2789$
# $p(N) = \frac{0.0274}{0.0106+0.0274} = 0.7211$
Diagnosis N is much more likely than Diagnosis B
# Problem 2
Classify the instance (Temperature = Low, Skin = ?, BloodPressure = Normal, BlockedNose = True); the missing *Skin* value is simply omitted from the products:
# $Pr[Diagnosis = N|E] = \frac{2}{5} \times \frac{1}{5} \times \frac{3}{5} \times \frac{5}{14} = 0.0171$
# $Pr[Diagnosis = B|E] = \frac{3}{9} \times \frac{6}{9} \times \frac{3}{9} \times \frac{9}{14} = 0.0476$
# $p(N) = \frac{0.0171}{0.0171+0.0476} = 0.2643$
# $p(B) = \frac{0.0476}{0.0171+0.0476} = 0.7357$
Diagnosis B is much more likely than Diagnosis N
# Problem 3
Classify the instance (Temperature = Moderate, Skin = Normal, BloodPressure = High, BlockedNose = True):
# $Pr[Diagnosis = N|E] = \frac{0}{5} \times \frac{2}{5} \times \frac{4}{5} \times \frac{3}{5} \times \frac{5}{14} = 0$
# $Pr[Diagnosis = B|E] = \frac{5}{9} \times \frac{4}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{9}{14} = 0.018$
Because *Moderate* never occurs with Diagnosis N, the N score is 0 regardless of the other evidence: this is the zero-frequency problem that the Laplace estimator remedies.
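The arithmetic can be checked numerically; a sketch for Problem 1, taking the instance (Low, Normal, High, True) implied by the factors:

```python
# Unnormalised scores: product of conditional probabilities times the prior.
score_n = (2/5) * (2/5) * (4/5) * (3/5) * (5/14)  # Diagnosis = N
score_b = (3/9) * (4/9) * (3/9) * (3/9) * (9/14)  # Diagnosis = B

# Pr[E] cancels out when normalising the two scores into probabilities.
p_n = score_n / (score_n + score_b)
p_b = score_b / (score_n + score_b)
print(round(score_n, 9), round(score_b, 9))  # 0.027428571 0.010582011
print(round(p_n, 4), round(p_b, 4))
```

Note the exact normalised values come out as 0.7216 and 0.2784; the 0.7211/0.2789 above reflect rounding the scores to four decimal places before normalising.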

# Weather Dataset
## Dataset
```
% This is a comment about the data set.
% This data describes examples of whether to play
% a game or not depending on weather conditions.
@relation letsPlay
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
```
## Output
```
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: letsPlay
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute yes no
(0.63) (0.38)
===============================
outlook
sunny 3.0 4.0
overcast 5.0 1.0
rainy 4.0 3.0
[total] 12.0 8.0
temperature
mean 72.9697 74.8364
std. dev. 5.2304 7.384
weight sum 9 5
precision 1.9091 1.9091
humidity
mean 78.8395 86.1111
std. dev. 9.8023 9.2424
weight sum 9 5
precision 3.4444 3.4444
windy
TRUE 4.0 4.0
FALSE 7.0 3.0
[total] 11.0 7.0
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.01 seconds
=== Summary ===
Correctly Classified Instances 13 92.8571 %
Incorrectly Classified Instances 1 7.1429 %
Kappa statistic 0.8372
Mean absolute error 0.2798
Root mean squared error 0.3315
Relative absolute error 60.2576 %
Root relative squared error 69.1352 %
Total Number of Instances 14
```
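For the numeric attributes (`temperature`, `humidity`) a per-class Gaussian is fitted rather than value counts. A sketch of how a temperature likelihood would be read off the model above, using the reported mean and standard deviation for class *yes* (the query value 66 is just an example, and Weka's internal precision adjustment is ignored here):

```python
import math

def gaussian_density(x, mean, std):
    """Normal density, used for numeric attributes in naive Bayes."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# Mean and std. dev. of temperature given play = yes, from the output above.
f = gaussian_density(66, 72.9697, 5.2304)
print(f)  # a density, not a probability; it can exceed 1 for small std
```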
# Medical Dataset
## Dataset
```
@relation medical
@attribute Temperature {Low,Moderate,High}
@attribute Skin {Pale,Normal,Red}
@attribute BloodPressure {Normal,High}
@attribute BlockedNose {True,False}
@attribute Diagnosis {N,B}
@data
Low, Pale, Normal, True, N
Moderate, Pale, Normal, True, B
High, Normal, High, False, N
Moderate, Pale, Normal, False, B
High, Red, High, False, N
High, Red, High, True, N
Moderate, Red, High, False, B
Low, Normal, High, False, B
Low, Pale, Normal, False, B
Low, Normal, Normal, False, B
High, Normal, Normal, True, B
Moderate, Normal, High, True, B
Moderate, Red, Normal, False, B
Low, Normal, High, True, N
```
## Output
```
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: diagnosis
Instances: 14
Attributes: 5
Temperature
Skin
BloodPressure
BlockedNose
Diagnosis
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute N B
(0.38) (0.63)
==============================
Temperature
Low 3.0 4.0
Moderate 1.0 6.0
High 4.0 2.0
[total] 8.0 12.0
Skin
Pale 2.0 4.0
Normal 3.0 5.0
Red 3.0 3.0
[total] 8.0 12.0
BloodPressure
Normal 2.0 7.0
High 5.0 4.0
[total] 7.0 11.0
BlockedNose
True 4.0 4.0
False 3.0 7.0
[total] 7.0 11.0
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 12 85.7143 %
Incorrectly Classified Instances 2 14.2857 %
Kappa statistic 0.6889
Mean absolute error 0.2635
Root mean squared error 0.3272
Relative absolute error 56.7565 %
Root relative squared error 68.2385 %
Total Number of Instances 14
```
# Using Test Data
## Test Data
```
@relation medical
@attribute Temperature {Low,Moderate,High}
@attribute Skin {Pale,Normal,Red}
@attribute BloodPressure {Normal,High}
@attribute BlockedNose {True,False}
@attribute Diagnosis {N,B}
@data
Low,Normal,High,True,N
Low,?,Normal,True,B
Moderate,Normal,High,True,B
```
## Output
```
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: medical
Instances: 14
Attributes: 5
Temperature
Skin
BloodPressure
BlockedNose
Diagnosis
Test mode: user supplied test set: size unknown (reading incrementally)
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute N B
(0.38) (0.63)
==============================
Temperature
Low 3.0 4.0
Moderate 1.0 6.0
High 4.0 2.0
[total] 8.0 12.0
Skin
Pale 2.0 4.0
Normal 3.0 5.0
Red 3.0 3.0
[total] 8.0 12.0
BloodPressure
Normal 2.0 7.0
High 5.0 4.0
[total] 7.0 11.0
BlockedNose
True 4.0 4.0
False 3.0 7.0
[total] 7.0 11.0
Time taken to build model: 0 seconds
=== Predictions on test set ===
inst# actual predicted error prediction
1 1:N 1:N 0.652
2 2:B 2:B 0.677
3 2:B 2:B 0.706
=== Evaluation on test set ===
Time taken to test model on supplied test set: 0 seconds
=== Summary ===
Correctly Classified Instances 3 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.3215
Root mean squared error 0.3223
Relative absolute error 70.1487 %
Root relative squared error 68.0965 %
Total Number of Instances 3
```
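The `prediction` column can be reproduced by hand from the classifier model printout: the model's counts already include Laplace's +1, and the class priors 0.38/0.63 are the smoothed $\frac{5+1}{14+2}$ and $\frac{9+1}{14+2}$. A sketch for test instance 1, (Low, Normal, High, True), assuming the smoothed counts are applied directly:

```python
# Posteriors from the Laplace-smoothed counts in the model printout above.
prior_n, prior_b = 6 / 16, 10 / 16   # (5+1)/(14+2) and (9+1)/(14+2)

# Instance 1: Temperature=Low, Skin=Normal, BloodPressure=High, BlockedNose=True
score_n = prior_n * (3/8) * (3/8) * (5/7) * (4/7)
score_b = prior_b * (4/12) * (5/12) * (4/11) * (4/11)

p_n = score_n / (score_n + score_b)
print(round(p_n, 3))  # 0.652 -- the prediction reported for instance 1
```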