vault backup: 2024-10-16 09:12:37
@@ -0,0 +1,214 @@
# Assessment

## T1

- Exam (50%)

## T2

- Coursework (50%)

# Resources

Data Mining: Practical Machine Learning Tools and Techniques (Witten, Frank, Hall & Pal), 4th Edition, 2016

Scientific Calculator

# Data Vs Information

- Too much data
- Valuable resource
- Raw data is less important on its own; techniques are needed to extract information from it
- Data: recorded facts
- Information: patterns underlying data

# Philosophy

## Cow Culling

- Each cow is described by 700 features
- Problem: selecting which cows to cull
- Data: historical records and farmers' decisions
- Machine learning is used to ascertain which factors farmers take into account, rather than to automate the decision-making process.

# Definition of Data Mining

- The extraction of:
  - Implicit,
  - Previously unknown,
  - Potentially useful information from data
- Programs that detect patterns and regularities are needed
- Strong patterns => good predictions
- Issues:
  - Most patterns not interesting
  - Patterns may be inexact
  - Data may be garbled or missing

# Machine Learning Techniques

- Algorithms for acquiring structural descriptions from examples
- Structural descriptions represent patterns explicitly, so they can be used to:
  - Predict the outcome in a new situation
  - Understand and explain how the prediction is derived
- Methods originate from AI, statistics and research on databases.

# Can Machines Learn?

- By definition, sort of: learning is the ability to obtain knowledge by study, experience or being taught, which is very difficult to measure.
- Does learning imply intention?

# Terminology

- Concept - Thing to be learned
- Example / Instance - An individual, independent example of a concept
- Attributes / Features - Measured aspects of an example / instance
- Concept description (pattern, model, hypothesis) - Output of a data mining algorithm

# Famous Small Datasets

- Will be used in module
- Unrealistically simple

## Weather Dataset - Nominal

Concept: conditions which are suitable for a game.

Reference: Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.

### Attributes

3\*3\*2\*2 = 36 possible combinations of values.

Outlook

- sunny, overcast, rainy

Temperature

- hot, mild, cool

Humidity

- high, normal

Windy

- true, false

Play

- Class
- yes, no

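
The count of 36 can be checked by enumerating the attribute space. This is a minimal sketch: the dictionary name `WEATHER_DOMAINS` and the helper `all_combinations` are made up for illustration, and Windy is written with the `true`/`false` values used in the rule examples later in these notes.

```python
from itertools import product

# Attribute domains for the nominal weather data (class attribute Play excluded).
# Names and value spellings here are illustrative assumptions.
WEATHER_DOMAINS = {
    "outlook": ["sunny", "overcast", "rainy"],
    "temperature": ["hot", "mild", "cool"],
    "humidity": ["high", "normal"],
    "windy": ["true", "false"],
}


def all_combinations(domains):
    """Return every possible instance as a dict of attribute -> value."""
    names = list(domains)
    return [dict(zip(names, values)) for values in product(*domains.values())]


combinations = all_combinations(WEATHER_DOMAINS)
print(len(combinations))  # 3 * 3 * 2 * 2 = 36
```

The real dataset contains far fewer instances than this, which is why a learner has to generalise from the examples it sees.
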

### Dataset

Rules are ordered: a rule higher in the list has higher priority.

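
An ordered rule set can be read as a decision list: rules are tried from the top and the first one that matches decides the class. The sketch below assumes an illustrative rule set in the style of the textbook's weather example (it is not recovered from the missing figure), and the names `DECISION_LIST` and `classify` are hypothetical.

```python
# Each rule is (conditions, predicted_class). An empty condition dict always
# matches, so it acts as the default rule at the bottom of the list.
# The rules themselves are an illustrative assumption.
DECISION_LIST = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"humidity": "normal"}, "yes"),
    ({}, "yes"),
]


def classify(instance, rules):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    return None  # only reached if the list has no default rule


example = {"outlook": "sunny", "temperature": "cool", "humidity": "high", "windy": "false"}
print(classify(example, DECISION_LIST))  # no
```

The example instance also satisfies the default rule, but the first rule wins because it appears earlier. Reordering the list can therefore change predictions.
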

## Weather Dataset - Mixed

## Contact Lenses Dataset

Describes conditions under which an optician might want to prescribe soft, hard or no contact lenses.
Grossly over-simplified.

Reference: Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349–370.

### Attributes

3\*2\*2\*2 = 24 possibilities

Dataset is exhaustive (every combination of attribute values appears), which is unusual.

Age

- young, pre-presbyopic, presbyopic

Spectacle Prescription

- myope (short-sighted), hypermetrope (long-sighted)

Astigmatism

- yes, no

Tear Production Rate

- reduced, normal

Recommended Lenses

- Class
- hard, soft, none

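
"Exhaustive" here means that all 24 combinations of attribute values occur in the data. A rough sketch of such a check follows; `LENS_DOMAINS` and `is_exhaustive` are hypothetical names, and the instances are generated from the domains rather than taken from the real dataset.

```python
from itertools import product

# Attribute domains as listed above (class attribute excluded).
LENS_DOMAINS = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle_prescription": ["myope", "hypermetrope"],
    "astigmatism": ["yes", "no"],
    "tear_production_rate": ["reduced", "normal"],
}


def is_exhaustive(instances, domains):
    """True if every combination of attribute values appears at least once."""
    names = list(domains)
    seen = {tuple(instance[name] for name in names) for instance in instances}
    return set(product(*domains.values())).issubset(seen)


# Stand-in data: one generated instance per combination, not the real dataset.
full = [dict(zip(LENS_DOMAINS, values)) for values in product(*LENS_DOMAINS.values())]
print(len(full))                                # 24
print(is_exhaustive(full, LENS_DOMAINS))        # True
print(is_exhaustive(full[:-1], LENS_DOMAINS))   # False
```
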

### Dataset

## Iris Dataset

Used in many statistical experiments.
Contains numeric attributes of 3 different types of iris.
Published in 1936 by Sir Ronald Fisher.

### Dataset

# Styles of Learning

- Classification Learning: Predicting a **nominal** class
- Numeric Prediction (Regression): Predicting a **numeric** quantity
- Clustering: Grouping similar examples into clusters
- Association Learning: Detecting associations between attributes

## Classification Learning

- Nominal
- Supervised
  - Provided with actual value of the class
- Measure success on fresh data for which class labels are known (test data)

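
Measuring success on test data usually comes down to comparing predictions with the known labels. The sketch below assumes a hypothetical one-rule classifier and four made-up test instances. It is meant only to show the bookkeeping, not any real result.

```python
def accuracy(classifier, test_instances, test_labels):
    """Fraction of test instances whose predicted class equals the known class."""
    correct = sum(
        1 for instance, label in zip(test_instances, test_labels)
        if classifier(instance) == label
    )
    return correct / len(test_labels)


def sunny_humid_rule(instance):
    """Hypothetical classifier: predict 'no' only when it is sunny and humid."""
    if instance["outlook"] == "sunny" and instance["humidity"] == "high":
        return "no"
    return "yes"


# Made-up test data, kept separate from whatever was used to build the classifier.
test_instances = [
    {"outlook": "sunny", "humidity": "high"},
    {"outlook": "overcast", "humidity": "high"},
    {"outlook": "rainy", "humidity": "normal"},
    {"outlook": "rainy", "humidity": "high"},
]
test_labels = ["no", "yes", "yes", "no"]

print(accuracy(sunny_humid_rule, test_instances, test_labels))  # 0.75
```
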

## Numeric Prediction (Regression)

- Numeric
- Supervised
- Success measured on test data

Example uses a linear regression function to provide an estimated performance value based on attributes.

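
As a simplified illustration of numeric prediction, the sketch below fits a straight line to a single attribute using ordinary least squares. The example in the notes uses several attributes. The function name `fit_line` and the toy numbers are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one numeric attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (not taken from any real dataset).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # numeric estimate for an unseen value
print(slope, intercept, prediction)   # 2.0 1.0 11.0
```
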

## Clustering

- Finding similar groups
- Unsupervised
  - Class of example is unknown
- Success measured **subjectively**

## Association Learning

- Applied if no class specified, and any kind of structure is interesting
- Differences from Classification Learning:
  - Predicts any attribute's value, not just the class
  - Can predict more than one attribute's value at a time
  - Far more association rules than classification rules

## Classification Vs Association Rules

Classification Rule:

- Predicts value of a given attribute (class of example)
- ``If outlook = sunny and humidity = high, then play = no``

Association Rule:

- Predicts value of arbitrary attribute / combination

```
If temperature = cool, then humidity = normal
If humidity = normal and windy = false, then play = yes
If outlook = sunny and play = no, then humidity = high
If windy = false and play = no, then outlook = sunny and humidity = high
```
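
Both kinds of rule can be represented as an antecedent and a consequent. The only difference is that an association rule's consequent may name any attribute, or several. The sketch below checks two of the rules above against a handful of illustrative instances. The helper names and the sample rows are assumptions, not the full dataset.

```python
def matches(conditions, instance):
    """True if the instance has every attribute value named in the conditions."""
    return all(instance.get(attr) == value for attr, value in conditions.items())


def rule_accuracy(antecedent, consequent, instances):
    """Fraction of instances covered by the antecedent for which the consequent holds."""
    covered = [inst for inst in instances if matches(antecedent, inst)]
    if not covered:
        return None  # rule never applies to this sample
    return sum(1 for inst in covered if matches(consequent, inst)) / len(covered)


# A few illustrative instances (an assumed sample, not the complete weather data).
sample = [
    {"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": "false", "play": "no"},
    {"outlook": "sunny", "temperature": "mild", "humidity": "high", "windy": "false", "play": "no"},
    {"outlook": "rainy", "temperature": "cool", "humidity": "normal", "windy": "false", "play": "yes"},
    {"outlook": "overcast", "temperature": "cool", "humidity": "normal", "windy": "true", "play": "yes"},
    {"outlook": "rainy", "temperature": "mild", "humidity": "high", "windy": "true", "play": "no"},
]

# Association rule with a single non-class attribute in the consequent.
cool_rule = rule_accuracy({"temperature": "cool"}, {"humidity": "normal"}, sample)

# Association rule predicting two attributes at once.
calm_rule = rule_accuracy(
    {"windy": "false", "play": "no"},
    {"outlook": "sunny", "humidity": "high"},
    sample,
)
print(cool_rule, calm_rule)  # 1.0 1.0
```

Because any attribute combination may appear on the right-hand side, the number of candidate association rules grows much faster than the number of classification rules.
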

# Data Mining and Ethics

- Ethical issues arise in practical applications
- Data mining is often used to discriminate
- Ethical situation depends on application
- Attributes may contain problematic information
- Does ownership of data bestow the right to use it in ways other than those stated when it was originally collected?
- Who is permitted to access the data?
- For what purpose was the data collected?
- What conclusions can sensibly be drawn?
- Caveats must be attached to results
- Purely statistical arguments are never sufficient