# Query Optimisation
The dominant cost in query processing is secondary storage access. The fewer blocks accessed, the faster the database queries can be. Many tuples fit into a single block, so to update one tuple a query must: find the block, find the tuple within it, edit the tuple, and write the block back. This is inefficient if many different queries access different blocks.

# Types of Index
## Primary Index
Data file is sequentially ordered by an ordering key field, and the indexing field is built on that ordering key field. The key is guaranteed to have a unique value for each tuple.
## Clustering Index
Data file is sequentially ordered on a non-key field, and the indexing field is built on that same non-key field. There can be more than one tuple corresponding to a value in the indexing field.
## Primary / Clustering Indices
Affect the order in which the data is stored in a file.
## Secondary Indices
Give a lookup table to the file, without changing how the file is ordered.

# Index Restrictions
- A table can have 1 primary index OR 1 clustering index, not both, because the file can only be physically ordered on one field.
- The most frequently looked-up field is often the best choice for it.
- Some DBMSs assume the primary key (PK) is the primary index, as it is usually used to refer to rows.

# Exercise
1. What is a Query Tree?
A query tree is a visual model used to represent the logical structure of a database query. Each leaf node represents a relation. Each internal node is a relational algebra operation, e.g. selection, projection, product, etc.
2. Write a relational algebra expression for the following query
```sql
SELECT lecName, schedule
FROM lecturer, module, enrol, student
WHERE lastName='Burns' AND firstName='Edward'
  AND module.moduleNumber=enrol.moduleNumber
  AND lecturer.lecID=module.lectID
  AND student.stuID=enrol.stuID;
```
![](Pasted%20image%2020231121134251.png)

$$
\pi_{\text{lecName},\,\text{schedule}}\Big(\sigma_{\text{lastName}=\text{'Burns'}}\big(\sigma_{\text{firstName}=\text{'Edward'}}\big(\sigma_{\text{student.stuID}=\text{enrol.stuID}}\big(\sigma_{\text{module.moduleNumber}=\text{enrol.moduleNumber}}\big(\sigma_{\text{lecturer.lecID}=\text{module.lectID}}(\text{lecturer} \times \text{module}) \times \text{enrol}\big) \times \text{student}\big)\big)\big)\Big)
$$

Draw a Query Tree for this SQL query:
3. Why can we not use SQL for query optimisation?
SQL is declarative: it states what data we want, not the order in which the operations are carried out. Relational algebra is less abstracted than SQL and easier for us to visualise as a tree, so we can reorder its operations to optimise the query flow.
4. List the heuristics that optimisers use to reduce optimisation cost.
- Begin with the initial query tree for the SQL query.
- Move SELECT operations down the tree.
- Apply more restrictive SELECT operations first (e.g. equalities before range queries).
- Replace Cartesian products followed by a selection with theta joins (e.g. $\sigma_f(R \times S) \rightarrow R \bowtie_f S$).
- Move PROJECT operations down the query tree (add project operations as inputs to theta joins).
5. Draw a near-optimal query tree for the following SQL query, and write a relational algebra expression for this tree.
```sql
SELECT sailors.name
FROM sailors, reservations
WHERE reservations.sID=sailors.ID
  AND reservations.bID=100
  AND sailors.rating=7;
```
![](Pasted%20image%2020240123164006.png)

# Workshop
Count the number of tuples in the following relations:
1. Products
![](Pasted%20image%2020240126101329.png)
2. Suppliers
![](Pasted%20image%2020240126101454.png)

How many suppliers does each product have?
Many suppliers to many products

Write SQL queries which count the number of tuples in each of the following algebraic statements.
1. The relation created by Products $\times$ Suppliers
![](Pasted%20image%2020240126103004.png)
2. The relation created by Products $\bowtie_{\text{Products.SupplierID}=\text{Suppliers.SupplierID}}$ Suppliers
![](Pasted%20image%2020240126102902.png)

For each of the following, write a description of the data it will retrieve and execute a single SQL statement which retrieves this data from the database.
1. This will return the product name of all products shipped from Manchester.
2. This will return the same as the first, but be more efficient.
3. This will return the city of all employees that live in the same city as a customer.
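
Returning to the workshop's counting exercise, the difference between the Cartesian product and the theta join can be sketched with an in-memory SQLite database. This is only an illustration: the rows, the `CompanyName` and `ProductName` columns, and the `count` helper are invented for the example (the real workshop tables are only shown as screenshots), and only the `SupplierID` join column comes from the notes.

```python
import sqlite3

# Hypothetical sample data: the real workshop tables are not reproduced in these notes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT,
                       SupplierID INTEGER REFERENCES Suppliers(SupplierID));
INSERT INTO Suppliers VALUES (1, 'Acme'), (2, 'Brill'), (3, 'Corvid');
INSERT INTO Products VALUES (1, 'Tea', 1), (2, 'Jam', 1), (3, 'Rice', 2), (4, 'Salt', 3);
""")


def count(sql):
    """Run a COUNT(*) query and return its single integer result."""
    return conn.execute(sql).fetchone()[0]


n_products = count("SELECT COUNT(*) FROM Products")
n_suppliers = count("SELECT COUNT(*) FROM Suppliers")

# Cartesian product: every product row is paired with every supplier row.
n_cross = count("SELECT COUNT(*) FROM Products CROSS JOIN Suppliers")

# Theta join on SupplierID: only the matching pairs survive.
n_join = count("""
    SELECT COUNT(*)
    FROM Products
    JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID
""")

print(n_products, n_suppliers, n_cross, n_join)  # prints: 4 3 12 4
```

With this sample data the product has |Products| × |Suppliers| = 12 tuples while the join keeps only 4, which is why the heuristics replace a Cartesian product followed by a selection with a theta join.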