vault backup: 2024-01-25 14:02:02

2024-01-25 14:02:03 +00:00
parent c09a4d605d
commit 14b75ba16c
32 changed files with 19979 additions and 26 deletions

@@ -0,0 +1,78 @@
# Query Optimisation
The dominant cost in query processing is secondary storage access. The fewer blocks a query accesses, the faster it runs.
Many tuples fit into a single block, so to change one tuple the DBMS must find the block, find the tuple within it, edit the tuple, and write the block back. This is inefficient when many different queries access different blocks.
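To make the block-access argument concrete, here is a rough sketch comparing the number of blocks read by a linear scan with a binary search on an ordered file. The file size and tuples-per-block figure are invented, and the cost formulas are the usual textbook approximations rather than measurements.

```python
import math

def blocks_needed(num_tuples: int, tuples_per_block: int) -> int:
    """Number of blocks needed to store the whole file."""
    return math.ceil(num_tuples / tuples_per_block)

def linear_scan_cost(num_blocks: int) -> float:
    """Average blocks read to find one existing tuple in an unordered file."""
    return num_blocks / 2

def binary_search_cost(num_blocks: int) -> int:
    """Approximate blocks read to find one tuple in a file ordered on the search field."""
    return max(1, math.ceil(math.log2(num_blocks)))

# Hypothetical file: 100,000 tuples stored 20 to a block.
b = blocks_needed(100_000, 20)
print(b, linear_scan_cost(b), binary_search_cost(b))
```

Even in this toy setting, the ordered file needs a handful of block reads where the scan needs thousands, which is the gap an index is meant to exploit.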
# Types of Index
## Primary Index
The data file is sequentially ordered on an ordering key field, and the index is built on that same field. The field is guaranteed to have a unique value for each tuple.
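A minimal sketch of a lookup through a primary index. It assumes a sparse index holding the first key of each block, which is a common textbook layout but is not stated in these notes; the data and the `lookup` helper are made up for illustration.

```python
from bisect import bisect_right

# Data file: blocks of (key, value) tuples, physically ordered on the unique key.
blocks = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(8, "d"), (9, "e"), (12, "f")],
    [(15, "g"), (20, "h"), (21, "i")],
]

# Assumed sparse primary index: one entry per block, holding that block's first key.
anchors = [block[0][0] for block in blocks]

def lookup(key):
    """Return the value stored under key, or None. Searches at most one data block."""
    i = bisect_right(anchors, key) - 1  # last block whose first key is <= key
    if i < 0:
        return None  # key is smaller than every key in the file
    for k, v in blocks[i]:
        if k == key:
            return v
    return None

print(lookup(9), lookup(4))
```

Because the file is ordered on the key, the index search narrows the lookup to a single block, so only that block has to be fetched from storage.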
## Clustering Index
The data file is sequentially ordered on a non-key field, and the index is built on that same non-key field. More than one tuple can correspond to a single value of the indexing field.
## Primary / Clustered Indices
Affect the order in which the data is stored in the file.
## Secondary Indices
Provide a lookup table into the file, without changing the order in which the data is stored.
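As an illustration of the lookup-table idea, the sketch below builds a secondary index on a field the file is not ordered on. The rows, the `build_secondary_index` helper, and the (block, slot) pointer format are all hypothetical.

```python
from collections import defaultdict

# Data file stored in id order; each inner list is one block of (id, city) tuples.
blocks = [
    [(1, "Leeds"), (2, "York")],
    [(3, "Leeds"), (4, "Hull")],
    [(5, "York"), (6, "Leeds")],
]

def build_secondary_index(blocks, field):
    """Map each value of the chosen field to the (block, slot) positions holding it."""
    index = defaultdict(list)
    for block_no, block in enumerate(blocks):
        for slot, row in enumerate(block):
            index[row[field]].append((block_no, slot))
    return index

def find(index, value):
    """Fetch every tuple with the given value by following the index pointers."""
    return [blocks[b][s] for b, s in index.get(value, [])]

city_index = build_secondary_index(blocks, field=1)
print(find(city_index, "Leeds"))
```

The stored order of the file is untouched; the index only records where matching tuples live, so matches may be spread across many blocks.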
# Index Restrictions
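- A table can have at most one primary index OR one clustering index, not both, because the file can only be physically ordered on one field.
- The most frequently looked-up field is often the best choice.
- Some DBMSs assume the PK is the primary index, as it is usually used to refer to rows.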
# Exercise
1. What is a Query Tree?
A query tree is a tree structure that represents a relational algebra expression for a query. Each leaf node represents a relation. Each internal node is a relational algebra operation, e.g. selection, projection, product, etc.
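A query tree can be sketched as a small data structure. The `Node` class and its field names are invented for illustration, and the example tree uses generic relations R and S rather than any query from these notes.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                # "project", "select", "product", or "relation"
    arg: str = ""          # attribute list, predicate, or relation name
    children: list = field(default_factory=list)

    def leaves(self):
        """Names of the relations at the leaves, left to right."""
        if not self.children:
            return [self.arg]
        return [name for child in self.children for name in child.leaves()]

    def render(self, depth=0):
        """Indented text view of the tree, root first."""
        line = "  " * depth + f"{self.op} {self.arg}".rstrip()
        return "\n".join([line] + [c.render(depth + 1) for c in self.children])

# pi a ( sigma R.x = S.y ( R x S ) )
tree = Node("project", "a", [
    Node("select", "R.x = S.y", [
        Node("product", children=[Node("relation", "R"), Node("relation", "S")]),
    ]),
])
print(tree.render())
```

Execution runs bottom-up: the leaves supply relations, and each internal node applies its operation to the results of its children.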
2. Write a relational algebra expression for the following query
```sql
SELECT lecName, schedule
FROM lecturer, module, enrol, student
WHERE lastName='Burns'
AND firstName='Edward'
AND module.moduleNumber=enrol.moduleNumber
AND lecturer.lecID=module.lectID
AND student.stuID=enrol.stuID;
```
![](Pasted%20image%2020231121134251.png)
pi lecName, schedule
( sigma lastName = 'Burns'
( sigma firstName = 'Edward'
( sigma student.stuID = enrol.stuID
( sigma module.moduleNumber = enrol.moduleNumber
( sigma lecturer.lecID = module.lectID
( ( ( lecturer x module ) x enrol ) x student )
)
)
)
)
)
Draw a Query Tree for this SQL query:
3. Why can we not use SQL for query optimisation?
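SQL is declarative: it states what result is wanted, not the order in which operations are carried out. Relational algebra makes the order of operations explicit and is easier for us to visualise as a tree, so we can rearrange the expression into equivalent forms and optimise the query flow.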
4. List the heuristics that optimisers use to reduce optimisation cost.
- Begin with the initial query tree for the SQL query.
- Move SELECT operations as far down the tree as possible.
- Apply more restrictive SELECT operations first ( e.g. equalities before range queries ).
- Replace Cartesian products followed by a selection with theta joins ( e.g. *sigma(f) ( R x S )* -> *R theta(f) S* ).
- Move PROJECT operations down the query tree ( add project operations as inputs to theta joins ).
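The effect of moving selections down the tree can be sketched by counting intermediate tuples. The relations R and S, their attributes, and the predicates below are invented toy data; the point is only that both plans return the same result while the second builds a far smaller intermediate relation.

```python
from itertools import product

# Hypothetical toy relations as lists of dicts.
R = [{"id": i, "rating": i % 10} for i in range(100)]
S = [{"rid": i % 100, "bid": 100 + i % 3} for i in range(200)]

def naive_plan():
    """Build the full Cartesian product R x S, then apply every selection."""
    pairs = list(product(R, S))
    result = [(r, s) for r, s in pairs
              if r["id"] == s["rid"] and r["rating"] == 7 and s["bid"] == 100]
    return len(pairs), result

def pushed_down_plan():
    """Apply the single-relation selections first, then combine the reduced inputs."""
    r_sel = [r for r in R if r["rating"] == 7]
    s_sel = [s for s in S if s["bid"] == 100]
    pairs = list(product(r_sel, s_sel))
    result = [(r, s) for r, s in pairs if r["id"] == s["rid"]]
    return len(pairs), result

naive_size, naive_result = naive_plan()
pushed_size, pushed_result = pushed_down_plan()
print(naive_size, pushed_size, naive_result == pushed_result)
```

A real optimiser would go further and replace the remaining product-plus-selection with a join algorithm, but the reduction in intermediate size is already visible here.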
5. Draw a near optimal query tree for the following SQL query, and write a relational algebra expression for this tree.
```sql
SELECT sailors.name
FROM sailors, reservations
WHERE reservations.sID=sailors.ID
AND reservations.bID=100
AND sailors.rating=7;
```
![](Pasted%20image%2020240123164006.png)
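One candidate expression for such a tree, in the notation used above, is *pi name ( ( sigma rating = 7 ( sailors ) ) theta(sailors.ID = reservations.sID) ( sigma bID = 100 ( reservations ) ) )*. The sketch below checks, on invented rows, that evaluating this plan by hand gives the same names as running the SQL query in an in-memory SQLite database. The table contents and column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sailors (ID INTEGER PRIMARY KEY, name TEXT, rating INTEGER);
    CREATE TABLE reservations (sID INTEGER, bID INTEGER);
    INSERT INTO sailors VALUES (1, 'Ann', 7), (2, 'Bob', 7), (3, 'Cy', 9);
    INSERT INTO reservations VALUES (1, 100), (2, 101), (3, 100);
""")

# The SQL query from the exercise, run as written.
sql_names = sorted(row[0] for row in conn.execute("""
    SELECT sailors.name
    FROM sailors, reservations
    WHERE reservations.sID = sailors.ID
      AND reservations.bID = 100
      AND sailors.rating = 7
"""))

# Hand-evaluated plan: select on each relation, join on ID = sID, then project name.
sailors = conn.execute("SELECT ID, name, rating FROM sailors").fetchall()
reservations = conn.execute("SELECT sID, bID FROM reservations").fetchall()
rated_7 = [(sid, name) for sid, name, rating in sailors if rating == 7]
boat_100 = [sid for sid, bid in reservations if bid == 100]
plan_names = sorted(name for sid, name in rated_7 for res_sid in boat_100
                    if sid == res_sid)

print(sql_names, plan_names)
conn.close()
```

Both selections run before the join, so the join only sees sailors rated 7 and reservations for boat 100.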