# Query Optimisation

The dominant cost in query processing is secondary-storage (disk) access, so the fewer blocks a query has to read or write, the faster it runs.
A single block usually holds many tuples, so changing one tuple means finding the block, reading it into memory, locating the tuple, editing it and writing the whole block back. This becomes expensive when many different queries each access different blocks.
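As a rough illustration of why block count matters, the sketch below applies the usual textbook estimates to a file of r records with blocking factor bfr (records per block). The file needs b = ⌈r / bfr⌉ blocks. A linear search for a key reads about b/2 blocks on average. A binary search on a file ordered by that key reads about ⌈log₂ b⌉ blocks. The record count and blocking factor are made-up values.

```python
import math


def blocks_needed(num_records: int, blocking_factor: int) -> int:
    """b = ceil(r / bfr): blocks needed when records do not span blocks."""
    return math.ceil(num_records / blocking_factor)


def linear_search_cost(num_blocks: int) -> float:
    """Average block reads to find one record by key with a linear scan (b / 2)."""
    return num_blocks / 2


def binary_search_cost(num_blocks: int) -> int:
    """Block reads for a binary search on a file ordered by the search key."""
    return math.ceil(math.log2(num_blocks))


# Hypothetical file: 30,000 records, 10 records per block.
b = blocks_needed(30_000, 10)
print(b)                      # 3000
print(linear_search_cost(b))  # 1500.0
print(binary_search_cost(b))  # 12
```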

# Types of Index

## Primary Index

The data file is physically ordered on an ordering key field, and the index is built on that same field. Because the field is a key, every tuple has a unique value for it.
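Because the file is sorted on the key, a primary index can be sparse: one entry per block, holding the first key in that block (the block anchor) and the block number. Below is a minimal sketch with blocks modelled as Python lists and hypothetical records. A lookup binary-searches the small index, then reads a single data block.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (key, name) records, physically sorted by key.
data_blocks = [
    [(101, "Ada"), (104, "Ben"), (107, "Cal")],
    [(110, "Dee"), (115, "Eve"), (118, "Fay")],
    [(120, "Gus"), (126, "Hal"), (130, "Ivy")],
]

# Sparse primary index: (anchor key, block number), one entry per block.
primary_index = [(block[0][0], i) for i, block in enumerate(data_blocks)]
anchor_keys = [key for key, _ in primary_index]


def lookup(key):
    """Return the record with this key, or None, reading at most one data block."""
    pos = bisect_right(anchor_keys, key) - 1  # last anchor <= key
    if pos < 0:
        return None  # key is smaller than every anchor
    block = data_blocks[primary_index[pos][1]]
    for record in block:
        if record[0] == key:
            return record
    return None


print(lookup(115))  # (115, 'Eve')
print(lookup(116))  # None
```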

## Clustering Index

The data file is physically ordered on a non-key field, and the index is built on that same non-key field. Because the field is not a key, more than one tuple can share the same indexing-field value.
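A clustering index typically has one entry per distinct value of the clustering field, pointing to the first block that contains that value. Because equal values are stored together, a lookup reads forward from that block until the value changes. This is a minimal sketch with a hypothetical file of (department, name) records sorted on department.

```python
from bisect import bisect_left

# Hypothetical data file, physically ordered on the non-key field "department".
data_blocks = [
    [("Art", "Ada"), ("Art", "Ben"), ("CS", "Cal")],
    [("CS", "Dee"), ("CS", "Eve"), ("CS", "Fay")],
    [("Law", "Gus"), ("Maths", "Hal")],
]

# Clustering index: (distinct value, first block holding it), in sorted order.
clustering_index = []
for block_no, block in enumerate(data_blocks):
    for dept, _ in block:
        if not clustering_index or clustering_index[-1][0] != dept:
            clustering_index.append((dept, block_no))
index_values = [value for value, _ in clustering_index]


def find_all(dept):
    """Return every record with this department, scanning from its first block."""
    pos = bisect_left(index_values, dept)
    if pos == len(index_values) or index_values[pos] != dept:
        return []
    matches = []
    for block in data_blocks[clustering_index[pos][1]:]:
        for record in block:
            if record[0] == dept:
                matches.append(record)
            elif record[0] > dept:
                return matches  # file is sorted, so there are no later matches
    return matches


print(find_all("CS"))  # [('CS', 'Cal'), ('CS', 'Dee'), ('CS', 'Eve'), ('CS', 'Fay')]
```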

## Primary / Clustered Indices

Both depend on the physical order of the data file: the records must be stored sorted on the indexing field.

## Secondary Indices

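Provide an extra lookup structure on a field other than the one the file is ordered by. They do not change how the data is stored, so a table can have several of them.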
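Because the file is not sorted on the indexed field, a secondary index is usually dense: every value gets an entry, and for a non-key field each entry points to all matching records. This sketch models record pointers as hypothetical (block number, slot) pairs. The dict is only a stand-in for the B+-tree a real DBMS would use.

```python
from collections import defaultdict

# Hypothetical data file ordered on student ID (first field), NOT on city.
data_blocks = [
    [(1, "Leeds"), (2, "York")],
    [(3, "Leeds"), (4, "Hull")],
]

# Secondary index on city: value -> list of (block number, slot) record pointers.
city_index = defaultdict(list)
for block_no, block in enumerate(data_blocks):
    for slot, (_, city) in enumerate(block):
        city_index[city].append((block_no, slot))


def students_in(city):
    """Follow each pointer stored under this city to fetch the matching records."""
    return [data_blocks[b][s] for b, s in city_index.get(city, [])]


print(students_in("Leeds"))  # [(1, 'Leeds'), (3, 'Leeds')]
```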

# Index Restrictions

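- A table can have at most one primary index OR one clustering index, because the data file can only be physically ordered on one field.
	- The most frequently looked-up field is often the best choice.
	- Some DBMSs assume the primary key is the primary index, as it is usually used to refer to rows.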
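The restriction can be observed in SQLite through Python's built-in `sqlite3` module. An ordinary SQLite table is stored in a B-tree ordered by its rowid, and an `INTEGER PRIMARY KEY` column becomes an alias for that rowid, so this is the table's single physical ordering. Any number of secondary indexes can still be added with `CREATE INDEX`. The table and index names are hypothetical, and the exact wording of `EXPLAIN QUERY PLAN` output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# stuID is an INTEGER PRIMARY KEY: an alias for the rowid that orders the table's B-tree.
conn.execute("""CREATE TABLE student (
    stuID INTEGER PRIMARY KEY,
    firstName TEXT,
    lastName TEXT,
    city TEXT)""")

# Secondary indexes: separate lookup structures; the table's storage order is unchanged.
conn.execute("CREATE INDEX idx_student_lastname ON student(lastName)")
conn.execute("CREATE INDEX idx_student_city ON student(city)")

# Column 1 of each PRAGMA index_list row is the index name.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(student)"))
print(index_names)  # ['idx_student_city', 'idx_student_lastname']

# Ask the planner how it would evaluate an equality lookup on lastName;
# the last column of each plan row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE lastName = ?", ("Burns",)
).fetchall()
print(plan[-1][-1])  # should mention idx_student_lastname
```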

# Exercise

1. What is a Query Tree?
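A query tree is a tree structure representing a relational algebra expression, i.e. a logical plan for a query. Each leaf node represents an input relation, and each internal node represents a relational algebra operation (e.g. selection, projection, Cartesian product, join) applied to the results of its children. The root produces the result of the query.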
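A query tree maps naturally onto a small data structure. In the minimal sketch below (class names are hypothetical, relations are lists of dicts keyed by qualified attribute names), leaves are relations and internal nodes are σ, π and ×. Calling `eval()` on the root evaluates the tree bottom-up: each internal node runs once its children's results are available.

```python
from dataclasses import dataclass
from typing import Callable, List

Row = dict  # one tuple: qualified attribute name -> value


@dataclass
class Relation:  # leaf node: a stored relation
    name: str
    rows: List[Row]

    def eval(self) -> List[Row]:
        return self.rows


@dataclass
class Select:  # internal node: sigma_predicate(child)
    predicate: Callable[[Row], bool]
    child: object

    def eval(self) -> List[Row]:
        return [row for row in self.child.eval() if self.predicate(row)]


@dataclass
class Project:  # internal node: pi_attributes(child), duplicates removed
    attributes: List[str]
    child: object

    def eval(self) -> List[Row]:
        seen, result = set(), []
        for row in self.child.eval():
            values = tuple(row[a] for a in self.attributes)
            if values not in seen:
                seen.add(values)
                result.append(dict(zip(self.attributes, values)))
        return result


@dataclass
class Product:  # internal node: left x right (attribute names assumed distinct)
    left: object
    right: object

    def eval(self) -> List[Row]:
        return [{**a, **b} for a in self.left.eval() for b in self.right.eval()]


# Hypothetical tree for pi_{staff.name}(sigma_{staff.dept = 'CS'}(staff))
staff = Relation("staff", [{"staff.name": "Ada", "staff.dept": "CS"},
                           {"staff.name": "Ben", "staff.dept": "Art"}])
tree = Project(["staff.name"], Select(lambda row: row["staff.dept"] == "CS", staff))
print(tree.eval())  # [{'staff.name': 'Ada'}]
```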

2. Write a relational algebra expression for the following query

```sql
SELECT lecName, schedule
FROM lecturer, module, enrol, student
WHERE lastName = 'Burns'
AND firstName = 'Edward'
AND module.moduleNumber = enrol.moduleNumber
AND lecturer.lecID = module.lectID
AND student.stuID = enrol.stuID;
```
![](Pasted%20image%2020231121134251.png)

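$$
\pi_{\text{lecName},\,\text{schedule}}\Big(
\sigma_{\text{lastName}=\text{'Burns'}}\big(
\sigma_{\text{firstName}=\text{'Edward'}}(
\sigma_{\text{student.stuID}=\text{enrol.stuID}}(
\sigma_{\text{module.moduleNumber}=\text{enrol.moduleNumber}}(
\sigma_{\text{lecturer.lecID}=\text{module.lectID}}(\text{lecturer}\times\text{module})
\times\text{enrol})
\times\text{student}))
\big)\Big)
$$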
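One way to check a hand-written expression is to evaluate it and the SQL on the same data. The sketch below uses tiny hypothetical tables; which table holds `schedule`, `firstName` and `lastName` is an assumption about the schema. It evaluates the canonical expression for this query with plain Python: Cartesian products at the bottom, each WHERE condition as a selection above them, projection at the root. It then runs the SQL in an in-memory SQLite database and compares the results.

```python
import sqlite3

# Hypothetical rows with qualified "table.column" keys (schema is assumed).
lecturer = [{"lecturer.lecID": 1, "lecturer.lecName": "Smith"},
            {"lecturer.lecID": 2, "lecturer.lecName": "Jones"}]
module = [{"module.moduleNumber": 10, "module.lectID": 1, "module.schedule": "Mon 9am"},
          {"module.moduleNumber": 20, "module.lectID": 2, "module.schedule": "Tue 2pm"}]
enrol = [{"enrol.stuID": 100, "enrol.moduleNumber": 10},
         {"enrol.stuID": 101, "enrol.moduleNumber": 20}]
student = [{"student.stuID": 100, "student.firstName": "Edward", "student.lastName": "Burns"},
           {"student.stuID": 101, "student.firstName": "Ann", "student.lastName": "Lee"}]


def product(r, s):
    """Cartesian product R x S."""
    return [{**a, **b} for a in r for b in s]


def select(predicate, r):
    """Selection sigma_predicate(R)."""
    return [t for t in r if predicate(t)]


def project(attributes, r):
    """Projection pi_attributes(R) as a set of tuples, so duplicates are removed."""
    return {tuple(t[a] for a in attributes) for t in r}


# Canonical expression, evaluated from the innermost product outwards.
step = select(lambda t: t["lecturer.lecID"] == t["module.lectID"], product(lecturer, module))
step = select(lambda t: t["module.moduleNumber"] == t["enrol.moduleNumber"], product(step, enrol))
step = select(lambda t: t["student.stuID"] == t["enrol.stuID"], product(step, student))
step = select(lambda t: t["student.firstName"] == "Edward", step)
step = select(lambda t: t["student.lastName"] == "Burns", step)
algebra_result = project(["lecturer.lecName", "module.schedule"], step)

# The SQL query, run on the same rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lecturer (lecID INTEGER, lecName TEXT);
    CREATE TABLE module (moduleNumber INTEGER, lectID INTEGER, schedule TEXT);
    CREATE TABLE enrol (stuID INTEGER, moduleNumber INTEGER);
    CREATE TABLE student (stuID INTEGER, firstName TEXT, lastName TEXT);
""")
for name, rows in [("lecturer", lecturer), ("module", module),
                   ("enrol", enrol), ("student", student)]:
    placeholders = ", ".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})",
                     [tuple(row.values()) for row in rows])

sql_result = set(conn.execute("""
    SELECT lecName, schedule
    FROM lecturer, module, enrol, student
    WHERE lastName = 'Burns' AND firstName = 'Edward'
      AND module.moduleNumber = enrol.moduleNumber
      AND lecturer.lecID = module.lectID
      AND student.stuID = enrol.stuID""").fetchall())

print(algebra_result == sql_result, algebra_result)  # True {('Smith', 'Mon 9am')}
```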

Draw a Query Tree for this SQL query:

3. Why can we not use SQL for query optimisation?
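SQL is declarative: it states what result is wanted, not the order in which operations are carried out. Relational algebra is procedural, so an expression (or its query tree) fixes a sequence of operations that we can visualise, cost, and rewrite into equivalent but cheaper forms.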

4. List the heuristics that optimisers use to reduce optimisation cost.
- Begin with the initial (canonical) query tree for the SQL query.
- Move SELECT operations as far down the tree as possible.
- Apply the most restrictive SELECT operations first (e.g. equalities before range queries).
- Replace Cartesian products followed by a selection with theta joins (e.g. $\sigma_f(R \times S) \rightarrow R \bowtie_f S$).
- Move PROJECT operations down the query tree (add projections as inputs to the theta joins so only the attributes still needed are kept).
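A quick way to see why these heuristics pay off is to estimate intermediate result sizes. The sketch below uses made-up statistics for the sailors/reservations query in question 5: the table sizes and numbers of distinct values are assumptions, and the selection estimate |σ_{A=c}(R)| = |R| / V(A, R) assumes uniformly distributed values. It compares the product materialised by the initial tree with the pairs a nested-loop join would compare once the selections have been pushed down.

```python
def product_size(r_rows: int, s_rows: int) -> int:
    """|R x S| = |R| * |S|: what the initial tree builds before any selection."""
    return r_rows * s_rows


def selection_size(rows: int, distinct_values: int) -> int:
    """Estimated |sigma_{A = c}(R)| = |R| / V(A, R), assuming a uniform distribution."""
    return rows // distinct_values


# Hypothetical statistics (assumptions, not real data).
SAILORS, RESERVATIONS = 500, 10_000
RATING_VALUES, BOAT_VALUES = 10, 100

initial = product_size(SAILORS, RESERVATIONS)
sailors_7 = selection_size(SAILORS, RATING_VALUES)      # sigma_{rating=7}(sailors)
boat_100 = selection_size(RESERVATIONS, BOAT_VALUES)    # sigma_{bID=100}(reservations)
optimised = product_size(sailors_7, boat_100)           # pairs a nested-loop join compares

print(initial, sailors_7, boat_100, optimised)  # 5000000 50 100 5000
```

Under these assumptions, pushing the selections below the join shrinks the largest intermediate result by a factor of 1,000.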

5. Draw a near optimal query tree for the following SQL query, and write a relational algebra expression for this tree.
```sql
SELECT sailors.name
FROM sailors, reservations
WHERE reservations.sID=sailors.ID
AND reservations.bID=100
AND sailors.rating=7;
```

![](Pasted%20image%2020240123164006.png)
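One relational algebra expression consistent with the heuristics in question 4 pushes the selections to the leaves, replaces the product with a theta join, and keeps only the attributes needed above the join. The drawn tree may differ in detail:

$$
\pi_{\text{sailors.name}}\Big(
\pi_{\text{sailors.ID},\,\text{sailors.name}}\big(\sigma_{\text{sailors.rating}=7}(\text{sailors})\big)
\bowtie_{\text{reservations.sID}=\text{sailors.ID}}
\pi_{\text{reservations.sID}}\big(\sigma_{\text{reservations.bID}=100}(\text{reservations})\big)
\Big)
$$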

# Workshop

Count the number of tuples in each of the following relations:
1. Products
![](Pasted%20image%2020240126101329.png)
2. Suppliers
![](Pasted%20image%2020240126101454.png)

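How many suppliers does each product have?
Each product has one supplier (Products stores a single SupplierID foreign key), but one supplier can supply many products, so Suppliers to Products is one-to-many.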
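The relationship can be checked with `GROUP BY`. This sketch assumes Northwind-style tables in which each `Products` row carries one `SupplierID` foreign key, as the join condition in the next exercise suggests. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT,
                       SupplierID INTEGER REFERENCES Suppliers(SupplierID));
INSERT INTO Suppliers VALUES (1, 'Acme'), (2, 'Bolt');
INSERT INTO Products VALUES (1, 'Tea', 1), (2, 'Jam', 1), (3, 'Rope', 2);
""")

# Suppliers per product: at most one, because each product stores a single SupplierID.
per_product = conn.execute("""
    SELECT ProductName, COUNT(SupplierID) FROM Products
    GROUP BY ProductID ORDER BY ProductID""").fetchall()
print(per_product)  # [('Tea', 1), ('Jam', 1), ('Rope', 1)]

# Products per supplier: can be many (one-to-many).
per_supplier = conn.execute("""
    SELECT s.CompanyName, COUNT(p.ProductID) FROM Suppliers AS s
    LEFT JOIN Products AS p ON p.SupplierID = s.SupplierID
    GROUP BY s.SupplierID ORDER BY s.SupplierID""").fetchall()
print(per_supplier)  # [('Acme', 2), ('Bolt', 1)]
```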

Write SQL queries which count the number of tuples in the relation produced by each of the following relational algebra expressions.
1. The relation created by $\text{Products} \times \text{Suppliers}$
![](Pasted%20image%2020240126103004.png)
2. The relation created by $\text{Products} \bowtie_{\text{Products.SupplierID}=\text{Suppliers.SupplierID}} \text{Suppliers}$
![](Pasted%20image%2020240126102902.png)
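The difference between the two counts can be reproduced on a toy database. This sketch assumes hypothetical Northwind-style `Products` and `Suppliers` tables and counts the base relations, the Cartesian product, and the theta join on `SupplierID`.

```python
import sqlite3

# Hypothetical Northwind-style tables: each product stores one SupplierID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, SupplierID INTEGER);
INSERT INTO Suppliers VALUES (1, 'Acme'), (2, 'Bolt');
INSERT INTO Products VALUES (1, 'Tea', 1), (2, 'Jam', 1), (3, 'Rope', 2);
""")


def count(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]


n_products = count("SELECT COUNT(*) FROM Products")
n_suppliers = count("SELECT COUNT(*) FROM Suppliers")
# Products x Suppliers: every product paired with every supplier.
n_cross = count("SELECT COUNT(*) FROM Products, Suppliers")
# Theta join: only pairs whose SupplierIDs match are kept.
n_join = count("""SELECT COUNT(*) FROM Products
                  JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID""")

print(n_products, n_suppliers, n_cross, n_join)  # 3 2 6 3
```

The product always has |Products| · |Suppliers| tuples. With SupplierID as the key of Suppliers, each product matches at most one supplier, so the join has at most |Products| tuples.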

For each of the following, write a description of the data it will retrieve, then execute a single SQL statement that retrieves this data from the database.

1. This will return the names of all products shipped from Manchester.
2. This will return the same data as the first, but more efficiently.
3. This will return the cities of all employees who live in the same city as a customer.
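The expressions for these three tasks are not reproduced in these notes, so this is only a sketch of SQL statements matching the descriptions. It assumes Northwind-style columns (`Suppliers.City`, `Employees.City`, `Customers.City`) and uses hypothetical rows. Query 2 filters `Suppliers` before the join, mirroring the "selections first" heuristic, although SQLite's own optimiser may well execute queries 1 and 2 the same way. `DISTINCT` in query 3 matches the duplicate-free result of a relational algebra projection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, SupplierID INTEGER);
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, City TEXT);
INSERT INTO Suppliers VALUES (1, 'Manchester'), (2, 'Leeds');
INSERT INTO Products VALUES (1, 'Tea', 1), (2, 'Jam', 1), (3, 'Rope', 2);
INSERT INTO Employees VALUES (1, 'Leeds'), (2, 'York'), (3, 'Leeds');
INSERT INTO Customers VALUES (1, 'Leeds'), (2, 'Hull');
""")

# 1. Join first, then select suppliers in Manchester.
q1 = conn.execute("""
    SELECT ProductName FROM Products, Suppliers
    WHERE Products.SupplierID = Suppliers.SupplierID
      AND Suppliers.City = 'Manchester'
    ORDER BY ProductName""").fetchall()

# 2. Select Manchester suppliers first, then join the smaller relation.
q2 = conn.execute("""
    SELECT ProductName
    FROM Products
    JOIN (SELECT SupplierID FROM Suppliers WHERE City = 'Manchester') AS m
      ON Products.SupplierID = m.SupplierID
    ORDER BY ProductName""").fetchall()

# 3. Cities where an employee lives in the same city as some customer.
q3 = conn.execute("""
    SELECT DISTINCT Employees.City FROM Employees, Customers
    WHERE Employees.City = Customers.City""").fetchall()

print(q1 == q2, q1, q3)  # True [('Jam',), ('Tea',)] [('Leeds',)]
```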