CS330 Spring 2003 Final Exam Study Guide

Material

The final exam will cover all the material that was discussed in the lectures. Here is a high-level overview list of the material that we covered (note that the material we covered in class is the ultimate guide):

The Entity-Relationship Model
Transactions
The relational model
Mapping ER models to relational tables
Integrity constraints
Relational algebra
SQL
Schema refinement and normal forms
Indexing; hash and tree-structured indexes
Disks and RAID
Physical database design
Database security
Three-tier architectures, HTTP, HTML, HTTP GET versus POST, Cookies, CGI, Application servers, Servlets, JSP, Cookies, JavaScript
XML, XML DTDs, XPath, XSLT
Stored procedures
Decision support and data warehousing
Data mining: Knowledge discovery process, decision trees (split selection, pruning), clustering, association rules
Peer-to-peer systems

A lot of the material, including sample questions can be found in Ramakrishnan and Gehrke: Database Management Systems (3rd edition). Especially helpful will be chapters 1-8, 12, 16, 19-21, 23 and 26. You can find study material for these chapters here: http://www.cs.wisc.edu/~dbbook/openAccess/thirdEdition/supporting_material.htm (look for the solutions material for the exercises; there are exercises for chapter 1-13 available that you can just download without purchasing the book). There are copies of the book available on reserve in the Engineering library.

Sample Questions (Last Update May 7, 2003)

Three-Tier Architectures

Review Question 1

In three sentences, describe the difference between a Servlet and JSP page.

Review Question 2

Describe three different techniques for implementing top-N queries at the middle tier.

Review Question 3

What are the advantages of a three-tier architecture versus a single-tier architecture or a two-tier architecture? What are its disadvantages?

Decision Support and Data Mining

Review Question 1

Why do we need to prune a decision tree? Why should we use a separate pruning dataset instead of pruning the tree with the training database?

Review Question 2

Describe the K-Means algorithm.

Review Question 3

Describe the difference between classification and clustering.

Review Question 4

Consider the relation R(ProductId, LocationId, Sales), where ProductId and LocationId are dimension attributes, and Sales is a measure attribute. Further, consider the following tuples stored in the relation R: (1, 10, 15), (1, 20, 43), (1, 30, 19), (2, 10, 90), (2, 30, 17), (3, 20, 12).

a) Draw the two-dimensional data cube for R.

b) Write down the set of SQL group-by queries on R that can be used to compute the above data cube.

Review Question 5

What is the difference between OLAP and Data Warehousing?

Review Question 6

Describe the differences between ER-modeling and dimensional data modeling.

Queries and Views

Review Question 1

Consider the following schema (keys are underlined):
Product(pid, name, type, mfgr, price), Buys(cid, pid), Customer(cid, cname, age, gender)

a) Write the following query in relational algebra: "Find the names of all customers who have purchased all products manufactured by Sears." (5 points)

b) Write the following query in SQL: "Find the names of all customers who have not purchased the most expensive product." (5 points)

Review Question 2

Illustrate with an example how views can be used to mask changes to the conceptual schema. Your example should include the original conceptual schema, the new conceptual schema, and a view definition that masks this change so that applications do not have to be rewritten.

Review Question 3

Illustrate with an example how views can be used for security. Your example should include a table, a view defined over the table, and a brief description of how the view "hides" some information from an unauthorized user.

Normalization

Review Question 1

Consider a relation R with five attributes ABCDE. Now assume that R is decomposed into two smaller relations ABC and CDE. Define S to be the relation (ABC NaturalJoin CDE). Assume that the above decomposition is lossless join, but not dependency preserving. You do not know any additional information about the decomposition. Which of the following statements can you infer to be always true: (1) R = S, (2) R Í S, (3) R Ì S, (4) R Ê S, (5) R É S, (6) none of the above. List all true statements if more than one statement can be inferred to be true. (4 points)

Review Question 2

Consider a relation R(A, B, C) with the following tuples: (1, 3, 7), (2, 3, 7), (2, 5, 7).

a) List two functional dependencies that do not hold over R.

b) List all non-trivial, non-key functional dependencies that could possibly hold over R (Recall that X à Y is a key functional dependency if XY = R)

Review Question 3

Prove the following inference rule for functional dependencies using only Armstrong's axioms:

If P --> QR and R --> S, then P --> QS

Show the steps of your proof, and indicate which of Armstrong's axioms is applied in each step.

Review Question 4

Consider a relation R with five attribute ABCDE.

a) Give a set of functional dependencies such that the decomposition of R into ABC and CDE is lossless join, but not dependency preserving.

b) Give a set of functional dependencies such that the decomposition of R into ABC and CDE is dependency preserving, but not lossless join.

Review Question 5

Consider a relation R with five attributes ABCDE. You are given the following functional dependencies:

AD --> E, BE --> C, C --> D

a) Find all the keys for the relation R.

b) In what normal form is relation R?