# assignment 1702

Intro to Data Mining

Chapter 6 exercises:

5. Prove Equation 6.3 in the book. (Hint: First, count the number of ways to create an itemset that forms the left hand side of the rule. Next, for each size k itemset selected for the left-hand side, count the number of ways to choose the remaining d âˆ’ k items to form the right-hand side of the rule.)

Chapter 7 exercises:

15. Describe the types of modifications necessary to adapt the frequent subgraph mining algorithm to handle:

(a) Directed graphs

(b) Unlabeled graphs

(c) Acyclic graphs

(d) Disconnected graphs

For each type of graph given above, describe which step of the algorithm will be affected (candidate generation, candidate pruning, and support counting), and any further optimization that can help improve the efficiency of the algorithm.

Chapter 8 exercises:

8. Consider the mean of a cluster of objects from a binary transaction data set. What are the minimum and maximum values of the components of the mean? What is the interpretation of components of the cluster mean? Which components most accurately characterize the objects in the cluster?

11. Total SSE is the sum of the SSE for each separate attribute. What does it mean if the SSE for one variable is low for all clusters? Low for just one cluster? High for all clusters? High for just one cluster? How could you use the per variable SSE information to improve your clustering?

18. Suppose we find K clusters using Ward’s method, bisecting K-means, and ordinary K-means. Which of these solutions represents a local or global minimum? Explain.