G4G0-2/AI & Data Mining/Week 1/Lecture 1 - Introduction to Data Mining.md

# Assessment

## T1

- Exam (50%)

## T2

- Coursework (50%)

# Resources

Data Mining: Practical Machine Learning Tools and Techniques (Witten, Frank, Hall & Pal) 4th Edition 2016

Scientific Calculator

# Data Vs Information

- Too much data
- Valuable resource
- Raw data less important, need to develop techniques to extract information
	- Data: recorded facts
	- Information: patterns underlying data

# Philosophy

## Cow Culling

- Each cow is described by 700 features
- Problem: selecting which cows to cull
- Data: historical records and the farmers' decisions
- Machine learning was used to ascertain which factors the farmers took into account, rather than to automate the decision-making process.

# Definition of Data Mining

- The extraction of:
	- Implicit,
	- Previously unknown,
	- Potentially useful information from data
- Programs that detect patterns and regularities are needed
- Strong patterns => good predictions
	- Issues:
		- Most patterns not interesting
		- Patterns may be inexact
		- Data may be garbled or missing

# Machine Learning Techniques

- Algorithms for acquiring structural descriptions from examples
- Structural descriptions represent patterns, explicitly.
	- Predict outcome in new situation
	- Understand and explain how prediction derived.
- Methods originate from AI, statistics and research on databases.

# Can Machines Learn?

- By dictionary definition, sort of: learning is obtaining knowledge by study, experience or being taught, but whether that has happened is very difficult to measure.
- Does learning imply intention?

# Terminology

- Concept - Thing to be learned
- Example / Instance - An individual, independent example of a concept
- Attributes / Features - Measured aspects of an example / instance
- Concept description (pattern, model, hypothesis) - Output of a data mining algorithm

# Famous Small Datasets

- Will be used in module
- Unrealistically simple

## Weather Dataset - Nominal

Concept: conditions which are suitable for a game.

Reference: Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.

### Attributes

3\*3\*2\*2 = 36 possible combinations of values.

- Outlook
	- sunny, overcast, rainy
- Temperature
	- hot, mild, cool
- Humidity
	- high, normal
- Windy
	- yes, no
- Play (class)
	- yes, no
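
The count of 36 can be checked by enumerating every combination of the four non-class attributes. This is a minimal sketch; the dictionary name `attributes` and the lower-case keys are illustrative choices, not part of the dataset.

```python
from itertools import product

# Attribute values as listed above (the class attribute, Play, is excluded).
attributes = {
    "outlook": ["sunny", "overcast", "rainy"],
    "temperature": ["hot", "mild", "cool"],
    "humidity": ["high", "normal"],
    "windy": ["yes", "no"],
}

# Every possible instance is one choice of value per attribute.
combinations = list(product(*attributes.values()))

print(len(combinations))  # 36
print(combinations[0])    # ('sunny', 'hot', 'high', 'yes')
```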

### Dataset

![](Pasted%20image%2020240919134249.png)
![](Pasted%20image%2020240919134304.png)

The rules are ordered: a rule higher in the list takes priority over the rules below it.
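
One way to read an ordered rule set is "first match wins". The sketch below assumes that reading. The rules in it are hypothetical and chosen only to show the ordering effect; they are not necessarily the rules in the figure, and the function name `classify` is likewise illustrative.

```python
def classify(instance, rules, default):
    """Return the class of the first rule whose conditions all match."""
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    # No rule fired, so fall back to the default class.
    return default


# Hypothetical ordered rules: earlier entries have higher priority.
rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "sunny"}, "yes"),
]

hot_day = {"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": "no"}
mild_day = {"outlook": "sunny", "temperature": "mild", "humidity": "normal", "windy": "no"}

# hot_day matches both rules, but the first one takes priority.
print(classify(hot_day, rules, default="yes"))   # no
print(classify(mild_day, rules, default="yes"))  # yes
```

Swapping the two rules would change the prediction for `hot_day`, which is why the order matters.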

## Weather Dataset - Mixed

![](Pasted%20image%2020240919134526.png)
![](Pasted%20image%2020240919134535.png)

## Contact Lenses Dataset

Describes conditions under which an optician might want to prescribe soft, hard or no contact lenses.
Grossly over-simplified.

Reference: Cendrowska, J. (1987). PRISM: an algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349–370.

### Attributes

3\*2\*2\*2 = 24 possibilities.
Dataset is exhaustive (all 24 combinations appear), which is unusual.

- Age
	- young, pre-presbyopic, presbyopic
- Spectacle Prescription
	- myope (short), hypermetrope (long)
- Astigmatism
	- yes, no
- Tear Production Rate
	- reduced, normal
- Recommended Lenses (class)
	- hard, soft, none

### Dataset

![](Pasted%20image%2020240919134848.png)

## Iris Dataset

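- Used in many statistical experiments
- Contains four numeric attributes (sepal length, sepal width, petal length, petal width) for three types of iris: Iris setosa, Iris versicolor and Iris virginica
- Published in 1936 by Sir Ronald Fisher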

### Dataset

![](Pasted%20image%2020240920130950.png)

# Styles of Learning

- Classification Learning: Predicting a **nominal** class
- Numeric Prediction (Regression): Predicting a **numeric** quantity
- Clustering: Grouping similar examples into clusters
- Association Learning: Detecting associations between attributes

## Classification Learning

- Nominal
- Supervised
	- Provided with actual value of the class
- Measure success on fresh data for which class labels are known (test data)
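
Measuring success on test data can be sketched as follows. The "learner" here is a deliberately trivial baseline that always predicts the most common training class; it stands in for a real classifier. The label lists are made up for illustration, and the names `train_majority` and `accuracy` are assumptions.

```python
from collections import Counter


def train_majority(labels):
    """'Learn' the most common class in the training labels (a trivial baseline)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of test instances whose predicted class equals the known class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical class labels, not taken from any dataset in these notes.
train_labels = ["yes", "yes", "no", "yes", "no", "yes"]
test_labels = ["yes", "no", "yes", "yes"]

majority = train_majority(train_labels)        # "yes"
predictions = [majority] * len(test_labels)    # predict the same class every time

print(accuracy(predictions, test_labels))  # 0.75
```

The key point is that `test_labels` are never seen during training; they are only used to score the predictions.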

## Numeric Prediction (Regression)

- Numeric
- Supervised
- Success measured on test data

![](Pasted%20image%2020240920131244.png)

The example uses a linear regression function to estimate a performance value from the attributes.
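
To make the idea concrete, here is a minimal sketch of fitting a linear function by least squares. It assumes a single numeric attribute, whereas the function in the figure combines several attributes with one weight each. The data points and the name `fit_line` are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1 * x for a single numeric attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


# Hypothetical training data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w0, w1 = fit_line(xs, ys)
prediction = w0 + w1 * 5.0  # numeric prediction for an unseen attribute value

print(w0, w1, prediction)  # 1.0 2.0 11.0
```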

## Clustering

- Finding similar groups
- Unsupervised
	- Class of example is unknown
- Success measured **subjectively**
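
As an illustration of grouping similar examples without class labels, the sketch below uses k-means on a single numeric attribute. k-means is one assumed choice of clustering method; these notes do not name a specific algorithm. The values, starting centres and function name `k_means_1d` are hypothetical.

```python
def k_means_1d(values, centres, iterations=10):
    """Assign each value to its nearest centre, then move each centre to its cluster mean."""
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


# Hypothetical unlabelled values forming two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centres, clusters = k_means_1d(values, centres=[0.0, 15.0])

print(centres)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No class labels are used at any point, so there is no test-set accuracy to report; whether the two groups are useful is a judgement call.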

## Association Learning

- Applied if no class is specified and any kind of structure is considered interesting
- Differences from Classification Learning:
	- Can predict any attribute's value, not just the class
	- Can predict more than one attribute's value at a time
	- Hence there are far more association rules than classification rules

## Classification Vs Association Rules

Classification Rule:

- Predicts value of a given attribute (class of example)
- ``If outlook = sunny and humidity = high, then play = no``

Association Rule:

- Predicts value of arbitrary attribute / combination

```
If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high
```
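
A rule of either kind can be checked against data by counting how many instances match its "if" part and how many of those also satisfy its "then" part. The sketch below assumes a handful of made-up rows in the style of the weather data (not the actual dataset), with windy written as true/false to match the rules above. The function name `rule_stats` is illustrative.

```python
def rule_stats(instances, antecedent, consequent):
    """Count instances matching the 'if' part, and those that also satisfy the 'then' part."""
    def matches(instance, conditions):
        return all(instance[attr] == value for attr, value in conditions.items())

    covered = [inst for inst in instances if matches(inst, antecedent)]
    correct = [inst for inst in covered if matches(inst, consequent)]
    return len(covered), len(correct)


# Hypothetical rows, for illustration only.
instances = [
    {"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": "false", "play": "no"},
    {"outlook": "overcast", "temperature": "cool", "humidity": "normal", "windy": "true", "play": "yes"},
    {"outlook": "rainy", "temperature": "cool", "humidity": "normal", "windy": "false", "play": "yes"},
    {"outlook": "sunny", "temperature": "mild", "humidity": "high", "windy": "false", "play": "no"},
]

# Association rule predicting a non-class attribute.
print(rule_stats(instances, {"temperature": "cool"}, {"humidity": "normal"}))  # (2, 2)

# Association rule predicting two attributes at once.
print(rule_stats(
    instances,
    {"windy": "false", "play": "no"},
    {"outlook": "sunny", "humidity": "high"},
))  # (2, 2)
```

A classification rule is the special case where the consequent is always the class attribute, e.g. `{"play": "no"}`.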

# Data Mining and Ethics

- Ethical Issues arise in practical applications
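- Data mining is often used to discriminate between people (e.g. deciding who receives a loan), which can be unethical depending on the attributes used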
- Ethical situation depends on application
- Attributes may contain problematic information
- Does ownership of data bestow the right to use it in ways other than those stated when it was originally collected?
- Who is permitted to access the data?
- For what purpose was the data collected?
- What conclusions can sensibly be drawn?
- Caveats must be attached to results
- Purely statistical arguments never sufficient