# Query Optimisation

The dominant cost in query processing is secondary storage (disk) access. The fewer blocks a query has to read, the faster it runs.

Many tuples fit into a single block, so changing one tuple means finding the block, finding the tuple within it, editing the tuple, and writing the block back. This is inefficient when many different queries each access different blocks.
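A rough worked estimate shows why the number of block reads matters so much. The figures are hypothetical: a relation of $r = 10\,000$ tuples with a blocking factor of $bfr = 10$ tuples per block, searched for a single key value.

$$
\begin{aligned}
b &= \left\lceil r / bfr \right\rceil = \left\lceil 10\,000 / 10 \right\rceil = 1\,000 \text{ blocks} \\
\text{linear scan for a key value (average)} &\approx b / 2 = 500 \text{ block reads} \\
\text{binary search on a file ordered by that key} &\approx \lceil \log_2 b \rceil = \lceil \log_2 1\,000 \rceil = 10 \text{ block reads}
\end{aligned}
$$

Under these assumptions the ordered file needs roughly fifty times fewer block reads, which is the kind of saving that file ordering and indexes aim for.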
# Types of Index

## Primary Index

The data file is sequentially ordered by an ordering key field, and the index is built on that same key field. Because the indexing field is a key, its value is guaranteed to be unique for each tuple.
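As a rough analogue, assuming SQLite: an ordinary SQLite table is stored as a B-tree ordered by its rowid, and a column declared `INTEGER PRIMARY KEY` becomes an alias for that rowid, so the rows themselves are kept in key order. The `student` table and its columns echo the exercise query later in these notes but are otherwise hypothetical.

```sql
-- Hypothetical table. In SQLite an INTEGER PRIMARY KEY aliases the rowid,
-- so rows are stored in the table B-tree ordered by stuID (primary-index-like).
CREATE TABLE student (
    stuID     INTEGER PRIMARY KEY,
    firstName TEXT NOT NULL,
    lastName  TEXT NOT NULL
);

-- Insertion order does not matter; storage order follows stuID.
INSERT INTO student (stuID, firstName, lastName) VALUES
    (3, 'Edward', 'Burns'),
    (1, 'Ada',    'Lovelace'),
    (2, 'Alan',   'Turing');

-- Key values are unique: inserting another row with stuID = 1 would fail.
-- A lookup on the ordering key can search the B-tree instead of scanning.
SELECT firstName, lastName FROM student WHERE stuID = 3;
```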
## Clustering Index

The data file is sequentially ordered on a non-key field, and the index is built on that same non-key field. Because the field is not a key, more than one tuple can correspond to a single value of the indexing field.
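A sketch of clustering-like storage, again assuming SQLite: a `WITHOUT ROWID` table is stored in a B-tree keyed on its `PRIMARY KEY`, so making a non-unique field the first key column groups all tuples that share its value. SQLite has no separate "clustering index" statement; the composite key (with `stuID` added to make it unique) and the `enrol` table are assumptions for illustration.

```sql
-- Hypothetical enrolment table. WITHOUT ROWID makes SQLite store rows in a
-- B-tree keyed on the PRIMARY KEY, so rows are physically grouped by
-- moduleNumber (not unique on its own) and then ordered by stuID.
CREATE TABLE enrol (
    moduleNumber INTEGER NOT NULL,
    stuID        INTEGER NOT NULL,
    PRIMARY KEY (moduleNumber, stuID)
) WITHOUT ROWID;

INSERT INTO enrol (moduleNumber, stuID) VALUES
    (200, 1), (100, 3), (100, 1), (200, 2), (100, 2);

-- Several tuples share moduleNumber = 100; they are adjacent in the B-tree,
-- so this lookup can read one contiguous range of entries.
SELECT stuID FROM enrol WHERE moduleNumber = 100 ORDER BY stuID;
```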
## Primary / Clustered Indices

Both affect the order in which the data is stored in the file.

## Secondary Indices

Give a separate lookup table into the file, built on a field the file is not ordered by. They do not change the order in which the data is stored.
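In SQLite, `CREATE INDEX` on a rowid table builds a separate B-tree of (indexed value, rowid) entries that point back into the table, which behaves like a secondary index: the table's storage order is untouched, and several such indices can coexist. The table, index names and data below are hypothetical.

```sql
-- Hypothetical table: rows are stored in rowid (stuID) order.
CREATE TABLE student (
    stuID     INTEGER PRIMARY KEY,
    firstName TEXT NOT NULL,
    lastName  TEXT NOT NULL
);

INSERT INTO student (stuID, firstName, lastName) VALUES
    (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing'), (3, 'Edward', 'Burns');

-- Secondary index: a separate lookup structure on a non-ordering field.
CREATE INDEX idx_student_lastName ON student (lastName);

-- A table may carry several secondary indices at once.
CREATE INDEX idx_student_firstName ON student (firstName);

-- Reports whether the planner chooses an index for this lookup.
EXPLAIN QUERY PLAN
SELECT stuID FROM student WHERE lastName = 'Burns';

SELECT stuID FROM student WHERE lastName = 'Burns';
```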
# Index Restrictions

- A table can have at most one primary index or one clustering index (not both), because the data file can only be physically ordered on one field.
- The most frequently looked-up field is often the best choice for that ordering.
- Some DBMSs assume the primary key is the primary index, as it is usually used to refer to rows.
# Exercise

1. What is a query tree?

A query tree is a tree that represents a relational algebra expression for a query. Each leaf node is an input relation, and each internal node is a relational algebra operation (e.g. selection, projection, Cartesian product) applied to the results of its child nodes.

2. Write a relational algebra expression for the following query:

```sql
SELECT lecName, schedule
FROM lecturer, module, enrol, student
WHERE lastName = 'Burns'
  AND firstName = 'Edward'
  AND module.moduleNumber = enrol.moduleNumber
  AND lecturer.lecID = module.lectID
  AND student.stuID = enrol.stuID;
```
$$
\pi_{\text{lecName},\,\text{schedule}}\Big(
  \sigma_{\text{lastName} = \text{'Burns'}}\big(
    \sigma_{\text{firstName} = \text{'Edward'}}(
      \sigma_{\text{student.stuID} = \text{enrol.stuID}}(
        \sigma_{\text{module.moduleNumber} = \text{enrol.moduleNumber}}(
          \sigma_{\text{lecturer.lecID} = \text{module.lectID}}(\text{lecturer} \times \text{module})
          \times \text{enrol}
        ) \times \text{student}
      )
    )
  \big)
\Big)
$$

Draw a query tree for this SQL query:

3. Why can we not use SQL for query optimisation?

SQL is declarative: it states what result is wanted, not how to compute it. A relational algebra expression (and its query tree) fixes an order of operations, which the optimiser can then rearrange using equivalence rules. Relational algebra is also easier to visualise, which makes the query flow easier to reason about.

4. List the heuristics that optimisers use to reduce optimisation cost.

- Begin with the initial query tree for the SQL query.
- Move SELECT operations down the tree.
- Apply the most restrictive SELECT operations first (e.g. equality conditions before range conditions).
- Replace Cartesian products followed by a selection with theta joins (e.g. $\sigma_f(R \times S) \rightarrow R \bowtie_f S$).
- Move PROJECT operations down the query tree (add projections as inputs to the theta joins).
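The heuristics can be illustrated by rewriting the lecturer query from question 2 by hand: the restrictive selections on `student` are applied first, only `stuID` is kept from that step, and each product plus join condition becomes an explicit join. The toy schema and data are assumptions (in particular, that `schedule` lives in `module`). A DBMS optimiser normally performs rewrites like this itself; the point here is only that both forms return the same rows.

```sql
-- Hypothetical toy schema; the placement of `schedule` in module is assumed.
CREATE TABLE lecturer (lecID INTEGER PRIMARY KEY, lecName TEXT);
CREATE TABLE module   (moduleNumber INTEGER PRIMARY KEY, lectID INTEGER, schedule TEXT);
CREATE TABLE student  (stuID INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE enrol    (stuID INTEGER, moduleNumber INTEGER);

INSERT INTO lecturer VALUES (1, 'Dr Smith'), (2, 'Dr Jones');
INSERT INTO module   VALUES (100, 1, 'Mon 9:00'), (200, 2, 'Tue 14:00');
INSERT INTO student  VALUES (1, 'Edward', 'Burns'), (2, 'Ada', 'Lovelace');
INSERT INTO enrol    VALUES (1, 100), (2, 100), (2, 200);

-- Heuristic rewrite:
--   * selections on student are applied before any join,
--   * only stuID is projected out of that step,
--   * products + join conditions are replaced by theta joins.
SELECT l.lecName, m.schedule
FROM (SELECT stuID
      FROM student
      WHERE lastName = 'Burns' AND firstName = 'Edward') AS s
JOIN enrol    AS e ON s.stuID = e.stuID
JOIN module   AS m ON m.moduleNumber = e.moduleNumber
JOIN lecturer AS l ON l.lecID = m.lectID;
```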

5. Draw a near-optimal query tree for the following SQL query, and write a relational algebra expression for this tree.

```sql
SELECT sailors.name
FROM sailors, reservations
WHERE reservations.sID = sailors.ID
  AND reservations.bID = 100
  AND sailors.rating = 7;
```
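One possible near-optimal expression, obtained by applying the heuristics from question 4 (push each selection down to its base relation, replace the Cartesian product and its join condition with a theta join, and project only the attributes still needed). This is a sketch of one valid answer, not the only one:

$$
\pi_{\text{sailors.name}}\Big(
  \pi_{\text{sailors.ID},\,\text{sailors.name}}\big(\sigma_{\text{sailors.rating} = 7}(\text{sailors})\big)
  \bowtie_{\text{sailors.ID} = \text{reservations.sID}}
  \pi_{\text{reservations.sID}}\big(\sigma_{\text{reservations.bID} = 100}(\text{reservations})\big)
\Big)
$$

Read bottom-up, the tree has `sailors` and `reservations` as leaves, each with its selection and then its projection above it, joined by the theta join, with the final projection at the root.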
# Workshop

Count the number of tuples in the following relations:

1. Products
2. Suppliers
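The counts come from `COUNT(*)`. To keep the block self-contained it first builds small, hypothetical stand-ins for `Products` and `Suppliers`; apart from `SupplierID`, which these notes use, the column names are assumptions. Against the real workshop database, only the two `SELECT COUNT(*)` statements are needed.

```sql
-- Hypothetical, cut-down stand-ins for the workshop tables.
CREATE TABLE Suppliers (
    SupplierID  INTEGER PRIMARY KEY,
    CompanyName TEXT NOT NULL,
    City        TEXT
);
CREATE TABLE Products (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    SupplierID  INTEGER REFERENCES Suppliers (SupplierID)
);

INSERT INTO Suppliers VALUES
    (1, 'Supplier A', 'London'),
    (2, 'Supplier B', 'Manchester');
INSERT INTO Products VALUES
    (1, 'Tea', 1), (2, 'Biscuits', 2), (3, 'Scones', 2);

-- Number of tuples in each relation.
SELECT COUNT(*) AS product_count  FROM Products;   -- 3 for this toy data
SELECT COUNT(*) AS supplier_count FROM Suppliers;  -- 2 for this toy data
```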
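
How many suppliers does each product have?

Each product has one supplier: a Products tuple holds a single SupplierID, which refers to Suppliers.SupplierID. A supplier, however, can supply many products, so Suppliers to Products is a one-to-many relationship.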

Write SQL queries that count the number of tuples in each of the following relational algebra expressions:

1. The relation created by $\text{Products} \times \text{Suppliers}$
2. The relation created by $\text{Products} \bowtie_{\text{Products.SupplierID} = \text{Suppliers.SupplierID}} \text{Suppliers}$
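A sketch of both counts on hypothetical toy tables. The Cartesian product pairs every product with every supplier, so it has $|\text{Products}| \times |\text{Suppliers}|$ tuples; the theta join keeps only the pairs whose `SupplierID` values match.

```sql
-- Hypothetical, cut-down stand-ins for the workshop tables.
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Products (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL,
    SupplierID  INTEGER REFERENCES Suppliers (SupplierID)
);
INSERT INTO Suppliers VALUES (1, 'London'), (2, 'Manchester');
INSERT INTO Products VALUES (1, 'Tea', 1), (2, 'Biscuits', 2), (3, 'Scones', 2);

-- 1. Cartesian product: 3 products * 2 suppliers = 6 tuples here.
SELECT COUNT(*) FROM Products, Suppliers;

-- 2. Theta join on SupplierID: 3 tuples here, one per product.
SELECT COUNT(*)
FROM Products
JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID;
```

When every product row carries a SupplierID that matches exactly one supplier, the join has as many tuples as Products, far fewer than the product, which is why replacing a product followed by a selection with a join pays off.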

For each of the following, write a description of the data it will retrieve and execute a single SQL statement that retrieves this data from the database.

1. This will return the product name of all products shipped from Manchester.
2. This will return the same as the first, but more efficiently.
3. This will return the city of all employees that live in the same city as a customer.
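The original statements for this task are not reproduced in these notes, so the following is a hypothetical reconstruction. It assumes "shipped from Manchester" means the product's supplier is located in Manchester, and the tables and columns (`ProductName`, `City`, `Employees`, `Customers`) are assumed stand-ins; against the real database, only the three `SELECT` statements are needed. Whether query 2 is actually faster depends on the DBMS, since many optimisers push the selection down on their own.

```sql
-- Hypothetical, cut-down stand-ins for the workshop tables.
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, ProductName TEXT,
                        SupplierID INTEGER REFERENCES Suppliers (SupplierID));
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, City TEXT);

INSERT INTO Suppliers VALUES (1, 'London'), (2, 'Manchester');
INSERT INTO Products  VALUES (1, 'Tea', 1), (2, 'Biscuits', 2), (3, 'Scones', 2);
INSERT INTO Employees VALUES (1, 'London'), (2, 'Seattle'), (3, 'London');
INSERT INTO Customers VALUES (1, 'London'), (2, 'Berlin');

-- 1. Product names whose supplier is in Manchester:
--    product of the tables, then selection, then projection.
SELECT ProductName
FROM Products, Suppliers
WHERE Products.SupplierID = Suppliers.SupplierID
  AND Suppliers.City = 'Manchester';

-- 2. Same result with the selection pushed down, so only
--    Manchester suppliers reach the join.
SELECT p.ProductName
FROM Products AS p
JOIN (SELECT SupplierID FROM Suppliers WHERE City = 'Manchester') AS s
  ON p.SupplierID = s.SupplierID;

-- 3. City of each employee who lives in the same city as at least one
--    customer (EXISTS avoids duplicates when several customers share a city).
SELECT e.City
FROM Employees AS e
WHERE EXISTS (SELECT 1 FROM Customers AS c WHERE c.City = e.City);
```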