Beginning Database Design- P17
Mô tả tài liệu
Beginning Database Design- P17:This book focuses on the relational database model from a beginning perspective. The title is, therefore, Beginning Database Design. A database is a repository for data. In other words, you can store lots of information in a database. A relational database is a special type of database using structures called tables. Tables are linked together using what are called relationships. You can build tables with relationships between those tables, not only to organize your data, but also to allow later retrieval of information from the database....
Tóm tắt nội dung
293 Creating and Refining Tables During the Design Phase ❑ Western Union ❑ Cash ❑ Visa ❑ MasterCard ❑ American Express So, the PAYMENT_METHODS field for a specific listing could be something like this: Cashier’s Check, Western Union, Visa, MasterCard This string is a comma-delimited list. A comma-delimited list is by definition a multi-valued set. A multi-valued set is thus a set, or a single item containing more than one possible value. 4NF demands that comma delimited strings should be split up. In the case of an online auction house, it is likely that the PAYMENT_METHODS field would only be used for online display. Then again, the list could be split in applications. For example, the string value Visa determines that a specific type of credit card is acceptable, perhaps processing payment through an online credit card payment service for Visa credit cards. 4NF would change the OLTP database model in Figure 10-18 to that shown in Figure 10-20. Figure 10-20: Applying 4NF to the OLTP database model. Seller_History Seller seller_id seller popularity_rating join_date address return_policy international seller_history_id buyer_id (FK) seller_id (FK) comment_date comments Buyer_History buyer_history_id seller_id (FK) buyer_id (FK) comment_date comments Buyer buyer_id buyer popularity_rating join_date address Category_Primary primary_id primary secondary Seller_Payment_Methods seller_id (FK) payment method Category_Secondary secondary_id primary_id (FK) tertiary Category_Tertiary tertiary_id secondary_id (FK) Listing seller_id (FK) tertiary_id (FK) secondary_id (FK) buyer_id (FK) description image start_date listing_days currency starting_price reserve_price buy_now_price number_of_bids winning_price Bid bidder_id (FK) listing# (FK) bid_price bid_date listing# 16_574906 ch10.qxd 11/4/05 10:46 AM Page 293 The sensibility of the application of 4NF, as shown in Figure 10-20, depends on applications. Once again, increasing the number of tables in a database model leads to more tables in query joins. The more tables there are in query joins, the more performance is adversely affected. Using the 4NF application shown in Figure 10-20, a seller could allow four payment methods as follows: Cashier’s Check, Western Union, Visa, MasterCard That seller would have four records as shown in Figure 10-21. Figure 10-21: Dividing a comma delimited list into separate records using 4NF. Reading SELLER records using the database model shown in Figure 10-20 would require a two-table join of the SELLER and SELLER_PAYMENT_METHODS tables. On the contrary, without the 4NF application, as for the database model shown in Figure 10-18, only a single table would be read. Querying a single table is better and easier than a two table join; however, two-table joins perform perfectly adequately between a few tables, with no significant effect on performance, unless one of the tables has a huge number of records. The only problem with the database model structure in Figure 10-20 is that the SELLER_PAYMENT_METHODS table potentially has very few records for each SELLER record. Is there any point in dividing up multi-valued strings in this case? Splitting comma-delimited strings in programming languages for applications, is one of the easiest things in the world, and is extremely unlikely to cause performance problems in applications. Doing this type of normalization at the database model level using 4NF, on this scale, is a little overzealous — to say the least! Denormalizing 5NF 5NF can be used, and not necessarily should be used, to eliminate cyclic dependencies. A cyclic dependency is something that depends on one thing, such that the one thing is either directly or indirectly dependent upon itself. Thus, a cyclic dependency is a form of circular dependency, where three pairs result, as a combination of a single three-field composite primary key table. For example, the three pairs could be field 1 with field 2, field 2 with field 3, and field 1 with field 3. In other words, the cyclic dependency means that everything is related to everything else, including itself. There is a combination or a permutation, which excludes repetitions. If tables are joined, again using a three-table join, the resulting records will be the same as that present in the original table. It is a stated requirement of the validity of 5NF that the post-transformation join must match the number of records for a query on the pre-transformation table. Effectively, 5NF is similar to 4NF, in that both attempt to minimize the number of fields in composite keys. Figure 10-18 has no composite primary keys, because surrogate keys are used. At this stage, using 5NF is thus a little pointless; however, take a quick look at Figure 10-5 (earlier in this chapter) where surrogate keys were not yet implemented into the online auction house OLTP database model. The structure of the category tables in Figure 10-5 looks similar to that shown in Figure 10-22. SELLER_ID 1 1 1 1 PAYMENT_METHOD Cashier’s Check Western Union Visa Mastercard 294 Chapter 10 16_574906 ch10.qxd 11/4/05 10:46 AM Page 294 Figure 10-22: 5NF can help to break down composite primary keys. Does the end justify the means? Commercially, probably not! As you can see in Figure 10-22, the 5NF implementation starts to look a little like the hierarchical structure shown on the left of Figure 10-22. Case Study: Backtracking and Refining an OLTP Database Model This is the part where you get to ignore the deep-layer normalization applied in the previous section, and go back to the OLTP database model shown in Figure 10-18. And, yes, the database model in Figure 10-18 can be denormalized. Essentially, there are no rules or any kind of process with respect to performing denormalization. Denormalization is mostly common sense. In this case, common sense is the equivalent of experience. Figure 10-18 is repeated here again, in Figure 10-23, for convenience. 5NF5NF Category_Primary primary secondary Category_Secondary primary (FK) secondary (FK) Category_Tertiary primary (FK) tertiary Primary_Secondary primary secondary Primary_Tertiary primary tertiary Secondary_Tertiary secondary tertiary 295 Creating and Refining Tables During the Design Phase 16_574906 ch10.qxd 11/4/05 10:46 AM Page 295 Figure 10-23: The online auction house OLTP database model normalized to 3NF. What can and should be denormalized in the database model shown in Figure 10-23? ❑ The three category tables should be merged into a single self-joining table. Not only does this make management of categories easier, it also allows any number of layers in the category hierarchy, rather than restricting to the three of primary, secondary, and tertiary categories. ❑ Seller and buyer histories could benefit by being a single table, not only because fields are the same but also because a seller can also be a buyer and visa versa. Merging the two tables could make group search of historical information a little slower; however, proper indexing might even improve performance in general (for all applications). Also, because buyers can be sellers, and sellers can be buyers, it makes no logical sense to store historical records in two separate tables. If sellers and buyers are merged, it might be expedient to remove fields exclusive to the SELLER table, into a 4NF, one-to-one subset table, to remove NULL values from the merged table. These fields are the RETURN_POLICY, INTERNATIONAL, and the PAYMENT_METHODS fields. ❑ Depending on the relative numbers of buyers, sellers, and buyer-sellers (those who do both buying and selling), it might be expedient to even merge the sellers and buyers into a single table, as well as merging histories. Once again, fields are largely the same. The number of buyer-sellers in operation might preempt the merge as well. The resulting OLTP database model could look similar to that shown in Figure 10-24. Seller_History Seller seller_id seller popularity_rating join_date address return_policy international payment_methods seller_history_id buyer_id (FK) seller_id (FK) comment_date comments Buyer_History buyer_history_id seller_id (FK) buyer_id (FK) comment_date comments Buyer buyer_id buyer popularity_rating join_date address Category_Primary primary_id primary secondary Category_Secondary secondary_id primary_id (FK) tertiary Category_Tertiary tertiary_id secondary_id (FK) Listing tertiary_id (FK) secondary_id (FK) buyer_id (FK) seller_id (FK) description image start_date listing_days currency starting_price reserve_price buy_now_price number_of_bids winning_price Bid buyer_id (FK) listing# (FK) bid_price bid_date listing# 296 Chapter 10 16_574906 ch10.qxd 11/4/05 10:46 AM Page 296 Figure 10-24: Denormalizing the online auction house OLTP database model. Denormalization is, in general, far more significant for data warehouse database models than it is for OLTP database models. One of the problems with predicting what and how to denormalize is that in the analysis and design phases of database modeling and design, denormalization is a little like a Shakespearian undiscovered country. If you don’t denormalize beyond 3NF, your system design could meet its maker. And then if you do denormalize an OLTP database model, you could kill the simplicity of the very structure you have just created. In general, denormalization is not quantifiable because no one has really thought up a formal approach for it, like many have devised for normalization. Denormalization, therefore, might be somewhat akin to guesswork. Guesswork is always dangerous, but if analysis is all about expert subconscious knowledge through experience, don’t let the lack of formal methods in denormalization scare you away from it. The biggest problem with denormalization is that it requires extensive application knowledge. Typically, this kind of foresight is available only when a system has been analyzed, designed, implemented, and placed into production. Generally, when in production, any further database modeling changes are not possible. So, when hoping to denormalize a database model for efficiency and ease of use by developers, History User user_id name popularity_rating join_date address Seller user_id (FK) return_policy international payment_methods user_history_id user_id (FK) comment_date comments category Category category_id parent_id Listing category_id (FK) user_id (FK) description image start_date listing_days currency starting_price reserve_price buy_now_price number_of_bids winning_price Bid listing# (FK) user_id (FK) bid_price bid_date listing# 297 Creating and Refining Tables During the Design Phase 16_574906 ch10.qxd 11/4/05 10:46 AM Page 297 298 Chapter 10 try to learn as much about how applications use tables, in terms of record quantities, how many records are accessed at once on GUI screens, how large reports will be, and so on. And do that learning process as part of analysis and design. It might be impossible to rectify in production and even in development. Denormalization requires as much applications knowledge as possible. Example Application Queries The following state the obvious: ❑ The database model is the backbone of any application that uses data of any kind. That data is most likely stored in some kind of database. That database is likely to be a relational database of one form or another. ❑ Better designed database models tend to lend themselves to clearer and easier construction of SQL code queries. The ease of construction of, and the ultimate performance of queries, depends largely on the soundness of the underlying database model. The database model is the backbone of applications. The better the database model design, the better queries are produced, the better applications will ultimately be and the happier your end-users will be. A good application often easily built by programmers is often not also easily usable by end-users. Similar to database modelers, programmers often write code for themselves, in an elegant fashion. Elegant solutions are not always going to produce the most end-user happy-smiley face result. Applications must run fast enough. Applications must not encourage end-users to become frustrated. Do not let elegant modeling and coding ultimately drive away your customers. No customer — no business. No business — no company. No company — no job! And, if your end-user happens to be your boss, well, you know the rest. So, you must be able to build good queries. The soundness of those queries, and ultimately applications, are dependent upon the soundness of the underlying database model. A highly normalized database model is likely to be unsound because there are too many tables, too much complexity, and too many tables in joins. Lots of tables and lots of complex inter-table relationships confuse people, especially the query programmers. Denormalize for successful applications. And preferably perform denormalization of database models in the analysis and design phases, not after the fact in production. Changing database model structure for production systems is generally problematic, extremely expensive, and disruptive to end-users (applications go down for maintenance). After all, the objective is to turn a profit. This means keeping your end-users interested. If the database is an in-house thing, you need to keep your job. Denormalize, denormalize, denormalize! Once again, the efficiency of queries comes down to how many tables are joined in a single query. Figure 10-23 shows the original normalized OLTP database model for the online auction house. In Figure 10-24, the following denormalization has occurred: ❑ Categories — Categories were denormalized from three tables down to a single table. A query against the three category tables would look similar to this: SELECT * FROM CATEGORY_PRIMARY CP JOIN CATEGORY_SECONDARY CS USING (PRIMARY_ID) JOIN CATEGORY_TERTIARY CT USING (SECONDARY_ID); 16_574906 ch10.qxd 11/4/05 10:46 AM Page 298 299 Creating and Refining Tables During the Design Phase A query against the single category table could be constructed as follows: SELECT * FROM CATEGORY; If the single category table was required to display a hierarchy, a self join could be used (some database engines have special syntax for single-table hierarchical queries): SELECT P.CATEGORY, C.CATEGORY FROM CATEGORY P JOIN CATEGORY C ON(P.CATEGORY_ID = C.CATEGORY_ID) ORDER BY P.CATEGORY, C.CATEGORY; Denormalizing categories in this way is probably a very sensible idea for the OLTP database model of the online auction house. ❑ Users — Sellers and buyers were partially denormalized into users, where 4NF normalization was used to separate seller details from buyers. Using the normalized database model in Figure 10-23 to find all listings for a specific seller, the following query applies (joining two tables and applying a WHERE clause to the SELLER table): SELECT * FROM SELLER S JOIN LISTING L USING (SELLER_ID) WHERE S.SELLER = “Joe Soap”; Once again, using the normalized database model in Figure 10-23, the following query finds all existing bids, on all listings, for a particular buyer (joining three tables and applying a WHERE clause to the BUYER table): SELECT * FROM LISTING L JOIN BID BID USING (LISTING#) JOIN BUYER B USING (BUYER_ID) WHERE B.BUYER = “Jim Smith”; Using the denormalized database model in Figure 10-24, this query finds all listings for a spe- cific seller (the SELLER and USER tables are actually normalized): SELECT * FROM USER U JOIN SELLER S USING (SELLER_ID) JOIN LISTING L USING (USER_ID) WHERE U.NAME = “Joe Soap”; This query is actually worse for the denormalized database model because it joins three tables instead of two. And again, using the denormalized database model in Figure 10-24, the follow- ing query finds all existing bids on all listings for a particular buyer: SELECT * FROM LISTING L JOIN BID BID USING (LISTING#) JOIN USER U USING (USER_ID) WHERE U.NAME = “Jim Smith” AND U.USER_ID NOT IN (SELECT USER_ID FROM SELLER); 16_574906 ch10.qxd 11/4/05 10:46 AM Page 299 300 Chapter 10 This query is also worse for the denormalized version because not only does it join three tables, but additionally performs a semi-join (and an anti semi-join at that). An anti semi-join is a nega- tive search. A negative search tries to find what is not in a table, and therefore must read all records in that table. Indexes can’t be used at all and, thus, a full table scan results. Full table scans can be I/O heavy for larger tables. It should be clear to conclude that denormalizing the BUYER and SELLER tables into the USER and normalized SELLER tables (as shown in Figure 10-24) is probably quite a bad idea! At least it appears that way from the perspective of query use; however, an extra field could be added to the USER table to dissimilate between users and buyers, in relation to bids and listings (a person performing both buying and selling will appear in both buyer and seller data sets). The extra field could be used as a base for very efficient indexing or even something as advanced as parti- tioning. Partitioning physically breaks tables into separate physical chunks. If the USER table were partitioned between users and sellers, reading only sellers from the USER table would only perform I/O against a partition containing sellers (not buyers). It is still not really very sensible to denormalize the BUYER and SELLER table into the USER table. ❑ Histories — The two history tables were denormalized into a single table, as shown in Figure 10-24. Executing a query using the normalized database model in Figure 10-23 to find the history for a specific seller, could be performed using a query like the following: SELECT * FROM SELLER S JOIN SELLER_HISTORY SH USING (SELLER_ID) WHERE S.SELLER = “Joe Soap”; Finding a history for a specific seller using the denormalized database model shown in Figure 10-24 could use a query like this: SELECT * FROM USER U JOIN HISTORY H (USER_ID) WHERE U.NAME = “Joe Soap” AND U.USER_ID IN (SELECT USER_ID FROM SELLER); Once again, as with denormalization of SELLER and BUYER tables into the USER table, denormal- izing the SELLER_HISTORY and BUYER HISTORY tables into the HISTORY table, might actually be a bad idea. The first query above joins two tables. The second query also joins two tables, but also executes a semi-join. This semi-join is not as bad as for denormalization of users, which used an anti semi-join; however, this is still effectively a three-way join. So, you have discovered that perhaps the most effective, descriptive, and potentially efficient database model for the OLTP online auction house is as shown in Figure 10-25. The only denormalization making sense at this stage is to merge the three separate category hierarchy tables into the single self-joining CATEGORY table. Buyer, seller, and history information is probably best left in separate tables. 16_574906 ch10.qxd 11/4/05 10:46 AM Page 300 Denormalization is rarely effective for OLTP database models for anything between 1NF and 3NF; however (and this very important), remember that previously in this chapter you read about layers of normalization beyond 3NF (BCNF, 4NF, 5NF and DKNF). None of these intensive Normal Forms have so far been applied to the OLTP database model for the online auction house. As of Figure 10-23, you began to attempt to backtrack on previously performed normalization, by denormalizing. You began with the 3NF database model as shown in Figure 10-23. In other words, any normalization beyond 3NF was simply ignored, having already been proved to be completely superfluous and over the top for this particular database model. Figure 10-25: The online auction house OLTP database model, 3NF, partially denormalized. The only obvious issue still with the database model as shown in Figure 10-25 is that the BUYER_HIS- TORY and SELLER_HISTORY tables have both BUYER_ID and SELLER_ID fields. In other words, both his- tory tables are linked (related) to both of the BUYER and SELLER tables. It therefore could make perfect sense to denormalize not only the category tables, but the history tables as well, leave BUYER and SELLER tables normalized, and separate, as shown in Figure 10-26. Seller_History Seller seller_id seller popularity_rating join_date address return_policy international payment_methods seller_history_id buyer_id (FK) seller_id (FK) comment_date comments Buyer_History buyer_history_id seller_id (FK) buyer_id (FK) comment_date comments Buyer buyer_id buyer popularity_rating join_date address category Category category_id parent_id Listing category_id (FK) buyer_id (FK) seller_id (FK) description image start_date listing_days currency starting_price reserve_price buy_now_price number_of_bids winning_price Bid bidder_id (FK) listing# (FK) bid_price bid_date listing# 301 Creating and Refining Tables During the Design Phase 16_574906 ch10.qxd 11/4/05 10:46 AM Page 301 302 Chapter 10 Figure 10-26: The online auction house OLTP database model, 3NF, slightly further denormalized. The newly denormalized HISTORY table can be accessed efficiently by splitting the history records based on buyers and sellers, using indexing or something hairy fairy and sophisticated like physical partitioning. Try It Out Designing an OLTP Database Model Create a simple design level OLTP database model for a Web site. This Web site allows creation of free classified ads for musicians and bands. Use the simple OLTP database model presented in Figure 10-27 (copied from Figure 9-19, in Chapter 9). Here’s a basic approach: 1. Create surrogate primary keys for all tables. 2. Enforce referential integrity using appropriate primary keys, foreign keys, and inter-table relationships. 3. Refine inter-table relationships properly, according to requirements, as identifying, non-identifying relationships, and also be precise about whether each crow’s foot allows zero. 4. Normalize as much as possible. 5. Denormalize for usability and performance. History Seller seller_id seller popularity_rating join_date address return_policy international payment_methods history_id seller_id (FK) buyer_id (FK) comment_date comments Buyer buyer_id buyer popularity_rating join_date address category Category category_id parent_id Listing category_id (FK) buyer_id (FK) seller_id (FK) description image start_date listing_days currency starting_price reserve_price buy_now_price number_of_bids winning_price Bid bidder_id (FK) listing# (FK) bid_price bid_date listing# 16_574906 ch10.qxd 11/4/05 10:46 AM Page 302 Figure 10-27: Musicians, bands, their online advertisements and some other goodies. How It Works Figure 10-27 shows the analyzed OLTP database model database model, for online musician and band advertisements. The database model in Figure 10-26 has the following basic requirements: ❑ Musicians can play multiple instruments. ❑ Musicians can be multi-skilled. ❑ A band can have multiple genres. ❑ The MEMBERS field in the band table takes into account the one-to-many relationship between BAND and MUSICIAN. In other words, there is more than one musician in a band (usually); how- ever, a musician doesn’t necessarily have to be in a band, a band may be broken up and have no musicians, and both bands and musicians can advertise. ❑ Musicians and bands can place advertisements. ❑ Bands and musicians perform shows. ❑ Bands and musicians sell merchandise, typically online. ❑ Discography contains all released CDs (albums). Instrument instrument Skill skill Genre genre Musician name phone email Band name members founding_date Discography cd_name release_date price Merchandise type price Shows location address directions phone show_date show_times Advertisement ad_date ad_text phone email requirements 303 Creating and Refining Tables During the Design Phase 16_574906 ch10.qxd 11/4/05 10:46 AM Page 303 304 Chapter 10 Figure 10-28 shows the database model in Figure 10-27, with all primary keys as surrogate keys. The SHOWS table is renamed to SHOW (each record equals one show). Figure 10-28: Primary keys as surrogate keys. Instrument instrument instrument_id Skill skill skill_id Genre genre genre_id Musician band_id (FK) name phone email musician_id Band name members founding_date band_id Discography band_id (FK) cd_name release_date price discography_id Merchandise band_id type price merchandise_id Show band_id (FK) location address directions phone show_date show_times show_id Advertisement band_id (FK) musician_id (FK) ad_date ad_text phone email requirements advertisement_id 16_574906 ch10.qxd 11/4/05 10:46 AM Page 304 305 Creating and Refining Tables During the Design Phase Figure 10-29 enforces referential integrity properly, between all primary and foreign keys, for all tables. This stage limited the many-to-many relationships between INSTRUMENT and MUSICIAN, SKILL and MUSICIAN, and GENRE and MUSICIAN. Figure 10-29: Enforcing referential integrity between all primary key and foreign keys. Instrument section_id instrument instrument_id Skill musician_id (FK) skill skill_id Genre parent_id genre genre_id Musician instrument_id (FK) band_id (FK) name phone email musician_id Band genre_id (FK) name members founding_date band_id Discography band_id (FK) cd_name release_date price discography_id Merchandise band_id (FK) type price merchandise_id Show band_id (FK) location address directions phone show_date show_times show_id Advertisement band_id (FK) musician_id (FK) ad_date ad_text phone email requirements advertisement_id 16_574906 ch10.qxd 11/4/05 10:46 AM Page 305 306 Chapter 10 Figure 10-30 refines relationships between tables as identifying, non-identifying, and NULL valued. The INSTRUMENT table contains a hierarchy, with instruments divided into sections (such as a strings section, or a percussion section). The GENRE table also contains a hierarchy of genres and sub-genres. For exam- ple, hard rock music is part of rock music, and rock music is part of modern music. Hard rock music could contain sub-genres as well, such as alternative rock. The INSTRUMENT and GENRE table has self join or fishhook join relationships added, to represent the hierarchies. A musician must play at least one instrument (voice is an instrument). Skills cannot exist without a musician. Figure 10-30: Refining relationships as identifying, non-identifying, and NULL valued. Instrument section_id (FK) instrument instrument_id Skill musician_id (FK) skill Genre parent_id (FK) genre genre_id Musician instrument_id (FK) band_id (FK) name phone email musician_id Band genre_id (FK) name members founding_date band_id Discography band_id (FK) cd_name release_date price discography_id Merchandise band_id (FK) type price merchandise_id Show band_id (FK) location address directions phone show_date show_times show_id Advertisement band_id (FK) musician_id (FK) ad_date ad_text phone email requirements advertisement_id 16_574906 ch10.qxd 11/4/05 10:46 AM Page 306 307 Creating and Refining Tables During the Design Phase Figure 10-31 refines the database model with normalization. This is about as far as this database model can be normalized. The INSTRUMENT and GENRE tables could be normalized if the number of layers in the hierarchies of the two tables is known. For the purposes of this example, it is assumed this is not the case. Some fields have been removed from various tables for being inappropriate. For example, the PHONE, EMAIL, and REQUIREMENTS fields have been removed from the ADVERTISEMENT table, assuming they are included in the AD_TEXT field. Some fields have been renamed. Figure 10-31: Refining the database model with normalization. Skill musician_id (FK) skill Musician_Phone musician_id (FK) phone Venue_Phone venue_id (FK) phone Venue_Directions venue_id (FK) directions Venue_Address venue_id (FK) address Musician_Name musician_id (FK) musician Band_Name band_id (FK) band Musician musician_id instrument_id (FK) band_id (FK) Band band_id genre_id (FK) founding_date Instrument instrument_id section_id (FK) instrument Genre genre_id parent_id (FK) genre Venue venue_id venue Type type_id description Advertisement advertisement_id band_id (FK) musician_id (FK) ad_date ad_text Discography discography_id band_id (FK) cd_name release_date price Merchandise merchandise_id type_id (FK) band_id (FK) price Show show_id venue_id (FK) band_id (FK) date time Musician_Email musician_id (FK) email 16_574906 ch10.qxd 11/4/05 10:46 AM Page 307 Figure 10-32 refines the database model with denormalization. All the nasty detailed Normal Forms are removed. The VENUE is retained since venues are static (dimensional) and shows are dynamic (fact data). Figure 10-32: Refining the database model with denormalization. Case Study: Refining a Data Warehouse Database Model Figure 10-12 shows the most recent version of the data warehouse database model for the online auction house. From an operational perspective, you identified categories, sellers, buyers, locations, times, and a large fact table. Where do you go from here? What you need to do is probably to normalize the fact table somewhat, into separate fact tables. Separate fact tables become separate star schemas, and thus separate data marts. Figure 10-33 shows an initial split of facts into three separate fact tables (see Figure 10-12). Instrument section_id (FK) instrument instrument_id Genre parent_id (FK) genre genre_id Musician instrument_id (FK) band_id (FK) musician phone email skills musician_id Band genre_id (FK) band founding_date band_id Discography band_id (FK) cd_name release_date price discography_id Merchandise band_id (FK) type price merchandise_id Venue location address directions phone venue_id Show venue_id (FK) band_id (FK) date time show_id Advertisement band_id (FK) musician_id (FK) ad_date ad_text advertisement_id 308 Chapter 10 16_574906 ch10.qxd 11/4/05 10:46 AM Page 308 O ne -to -m an y re la tio ns hi p Li st in g Fa ct s Li st in g Fa ct s S el le r se lle r_ id se lle r po pu la rit y_ ra tin g jo in _d at e ad dr es s re tu rn _p ol ic y in te rn at io na l pa ym en t_ m et ho ds Lo ca tio n lo ca tio n_ id re gi on co un tr y st at e ci ty Ti m e tim e_ id m on th qu ar te r ye ar C at eg or y_ H ie ra rc hy ca te go ry _i d pa re nt _i d (F K ) ca te go ry Li st in g fa ct _i d tim e_ id ( FK ) bu ye r_ id ( FK ) lo ca tio n_ id ( FK ) se lle r_ id ( FK ) ca te go ry _i d (F K ) lis tin g# lis tin g_ de sc rip tio n lis tin g_ im ag e lis tin g_ st ar t_ da te lis tin g_ da ys lis tin g_ cu rr en cy lis tin g_ st ar tin g_ pr ic e lis tin g_ re se r v e_ pr ic e lis tin g_ bu y_ no w _p ric e lis tin g_ nu m be r_ of _b id s lis tin g_ w in ni ng _p ric e lis tin g_ w in ne r_ bu ye r B uy er bu ye r_ id bu ye r po pu la rit y_ ra tin g jo in _d at e ad dr es s S el le r se lle r_ id se lle r po pu la rit y_ ra tin g jo in _d at e ad dr es s re tu rn _p ol ic y in te rn at io na l pa ym en t_ m et ho ds Lo ca tio n lo ca tio n_ id re gi on co un tr y st at e ci ty Ti m e tim e_ id m on th qu ar te r ye ar C at eg or y_ H ie ra rc hy ca te go ry _i d pa re n t _i d (F K ) ca te go ry B id fa ct _i d tim e_ id ( FK ) bu ye r_ id ( FK ) lo ca tio n_ id ( FK ) se lle r_ id ( FK ) ca te go ry _i d (F K ) bi dd er bi dd er _p ric e bi dd er _d at e B uy er bu ye r_ id bu ye r po pu la rit y_ ra tin g jo in _d at e ad dr es s B id F ac ts B id F ac ts S el le r se lle r_ id se lle r po pu la rit y_ ra tin g jo in _d at e ad dr es s re tu rn _p ol ic y in te rn at io na l pa ym en t_ m et ho ds Lo ca tio n lo ca tio n_ id re gi on co un tr y st at e ci ty Ti m e tim e_ id m on th qu ar te r ye ar C at eg or y_ H ie ra rc hy ca te go ry _i d pa re nt _i d (F K ) ca te go ry H is to ry fa ct _i d tim e_ id ( FK ) bu ye r_ id ( FK ) lo ca tio n_ id ( FK ) se lle r_ id ( FK ) ca te go ry _i d (F K ) hi st or y_ bu ye r hi st or y_ bu ye r_ co m m en t_ da te hi st or y_ bu ye r_ co m m en ts hi st or y_ se lle r hi st or y_ se lle r_ co m m en t_ da te hi st or y_ se lle r_ co m m en ts B uy er bu ye r_ id bu ye r po pu la rit y_ ra tin g jo in _d at e ad dr es s H is to ry F ac ts H is to ry F ac ts H is to rie s pe rt ai n to se lle r s a nd b uy er s on ly Fi gu re 1 0 -3 3 : D iv id in g th e da ta w ar eh ou se d at ab as e m od el in to s ep ar at e fa ct s. 16_574906 ch10.qxd 11/4/05 10:46 AM Page 309 The problem as shown in Figure 10-33 is that there is essentially an operational requirement for a one-to- many (master-detail) relationship between listing and bid facts. Why? Think about how this data would be used. If you wanted to track listing information alone, then there is no problem. If you wanted to ana- lyze the pattern of bids against particular types of listings, then that LISTING to BID relationship would be required. Establishing a relationship between multiple fact tables causes serious problems. The reason why goes back to the existence of the fact-dimensional data warehouse database model itself. Data warehouse database models were devised to split very small tables, linked in a single-layer hierarchy of dimensions (a star schema), all linked to a single fact table. Fact tables are very large. The most efficient types of query joins are those between one or more small tables (the dimensions) and only a single large table. That great big humungous table is the fact table. And there should also be only one fact table in a star schema. Never relate or join fact tables. You might wait a week for queries to execute! Linking more than one fact table together results in a join between two very large tables, which can be frighteningly inefficient — defeating the very existence of the fact-dimensional data warehouse database model. Don’t do it! A solution is to merge and denormalize the fact tables as shown in Figure 10-34. The HISTORY fact table is not a problem because histories apply to sellers and buyers. In the real world, histories are used by sellers and buyers to assess whether they are dealing with an honest trader or a complete shyster! Figure 10-34: Reducing the three fact tables in Figure 10-33 to two fact tables, based on operational requirements. Listing-Bid FactsListing-Bid Facts Seller seller_id seller popularity_rating join_date address return_policy international payment_methods Location location_id region country state city Time time_id month quarter year Category_Hierarchy category_id parent_id (FK) category Listing_Bids_History fact_id time_id (FK) buyer_id (FK) location_id (FK) seller_id (FK) category_id (FK) listing# listing_description listing_image listing_start_date listing_days listing_currency listing_starting_price listing_reserve_price listing_buy_now_price listing_number_of_bids listing_winning_price listing_winner_buyer bidder bidder_price bidder_date Buyer buyer_id buyer popularity_rating join_date address Seller seller_id seller popularity_rating join_date address return_policy international payment_methods Location location_id region country state city Time time_id month quarter year Category_Hierarchy category_id parent_id (FK) category History fact_id time_id (FK) buyer_id (FK) location_id (FK) seller_id (FK) category_id (FK) history_buyer history_buyer_comment_date history_buyer_comments history_seller history_seller_comment_date history_seller_comments Buyer buyer_id buyer popularity_rating join_date address History FactsHistory Facts Combining the listing and bid facts eliminates the inappropriate relationship 310 Chapter 10 16_574906 ch10.qxd 11/4/05 10:46 AM Page 310 Creating and Refining Tables During the Design Phase A more easily readable form of the star schemas shown in Figure 10-34 is shown in Figure 10-35. If you go back briefly to Chapter 9 and examine Figure 9-23, you can see that this structure was already devised, even in the analysis stage. Figure 10-35: The data warehouse fact table star schemas, for the online auction house. Category Hierarchy Time Seller Location Buyer Listing -Bids Listing -Bids Buyer Category Hierarchy Time Seller Location HistoryHistory 311 16_574906 ch10.qxd 11/4/05 10:46 AM Page 311 312 Chapter 10 Essentially, Figure 10-34 and Figure 10-35 represent the best, most effective, most easily understandable, and usable database model for the online auction house data warehouse. It is possible to normalize further by normalizing the heck out of the dimensions — just don’t normalize the facts. Normalizing facts (other than making operational fact table splits into multiple star schemas, as shown in Figure 10-34) defeats the purpose of the data warehouse dimensional-fact model. Figure 10-36 shows an ERD of the HISTORY fact table snowflake schema, with dimensions normalized up to gazoo! A snowflake schema is a star schema, where dimensions have been normalized. There is no need to detail query examples for the data warehouse database model, as the same concepts apply for SQL coding of query joins for both OLTP and data warehouse databases: ❑ The fewer tables in a join, the better. ❑ It is more efficient to join between a small table and a large table, compared with equally sized tables. Obviously, joining two small tables is most efficient because there isn’t much data (which should be logical by now). Using the snowflake schema shown in Figure 10-36 is not only completely nuts, it will also drive your programmers completely nuts trying to figure it all out. And end-users simply won’t know what the heck is going on in there. End-users, typically in-house type end-users, often write (or at least specify) data warehouse reporting requirements. The obsessively over-normalized data warehouse database model shown in Figure 10-36 is quite simply impractical. The end-users it will probably think it is just scary, and they will probably avoid it. Try It Out Designing a Data Warehouse Database Model Create a simple design level data warehouse database model, for a Web site. This Web site allows creation of free classified ads for musicians and bands. Use the not-so-well-refined data warehouse database model presented in Figure 10-37 (copied from Figure 9-29, in Chapter 9). Here’s a basic approach: 1. Refine dimensions and facts, making sure that dimensions are dimensions and facts are facts. 2. Divide facts into multiple star schemas if multiple, unrelated fact sets exist. 3. Normalize dimensions into a snowflake schema. This can help to identify and quantify dimensions more precisely. 4. Denormalize dimensions into a star schema. The primary purpose of the data warehouse fact-dimensional model is to allow the fastest possible join method between two tables: one large table, and one or more very small tables. 16_574906 ch10.qxd 11/4/05 10:46 AM Page 312 cover.pdf page_c2.pdf page_r01.pdf page_r02.pdf page_r03.pdf page_r04.pdf page_r05.pdf page_r06.pdf page_r07.pdf page_r08.pdf page_r09.pdf page_r10.pdf page_r11.pdf page_r12.pdf page_r13.pdf page_r14.pdf page_r15.pdf page_r16.pdf page_r17.pdf page_r18.pdf page_r19.pdf page_r20.pdf page_r21.pdf page_r22.pdf page_r23.pdf page_r24.pdf page_r25.pdf page_r26.pdf page_z0001.pdf page_z0002.pdf page_z0003.pdf page_z0004.pdf page_z0005.pdf page_z0006.pdf page_z0007.pdf page_z0008.pdf page_z0009.pdf page_z0010.pdf page_z0011.pdf page_z0012.pdf page_z0013.pdf page_z0014.pdf page_z0015.pdf page_z0016.pdf page_z0017.pdf page_z0018.pdf page_z0019.pdf page_z0020.pdf page_z0021.pdf page_z0022.pdf page_z0023.pdf page_z0024.pdf page_z0025.pdf page_z0026.pdf page_z0027.pdf page_z0028.pdf page_z0029.pdf page_z0030.pdf page_z0031.pdf page_z0032.pdf page_z0033.pdf page_z0034.pdf page_z0035.pdf page_z0036.pdf page_z0037.pdf page_z0038.pdf page_z0039.pdf page_z0040.pdf page_z0041.pdf page_z0042.pdf page_z0043.pdf page_z0044.pdf page_z0045.pdf page_z0046.pdf page_z0047.pdf page_z0048.pdf page_z0049.pdf page_z0050.pdf page_z0051.pdf page_z0052.pdf page_z0053.pdf page_z0054.pdf page_z0055.pdf page_z0056.pdf page_z0057.pdf page_z0058.pdf page_z0059.pdf page_z0060.pdf page_z0061.pdf page_z0062.pdf page_z0063.pdf page_z0064.pdf page_z0065.pdf page_z0066.pdf page_z0067.pdf page_z0068.pdf page_z0069.pdf page_z0070.pdf page_z0071.pdf page_z0072.pdf page_z0073.pdf page_z0074.pdf page_z0075.pdf page_z0076.pdf page_z0077.pdf page_z0078.pdf page_z0079.pdf page_z0080.pdf page_z0081.pdf page_z0082.pdf page_z0083.pdf page_z0084.pdf page_z0085.pdf page_z0086.pdf page_z0087.pdf page_z0088.pdf page_z0089.pdf page_z0090.pdf page_z0091.pdf page_z0092.pdf page_z0093.pdf page_z0094.pdf page_z0095.pdf page_z0096.pdf page_z0097.pdf page_z0098.pdf page_z0099.pdf page_z0100.pdf page_z0101.pdf page_z0102.pdf page_z0103.pdf page_z0104.pdf page_z0105.pdf page_z0106.pdf page_z0107.pdf page_z0108.pdf page_z0109.pdf page_z0110.pdf page_z0111.pdf page_z0112.pdf page_z0113.pdf page_z0114.pdf page_z0115.pdf page_z0116.pdf page_z0117.pdf page_z0118.pdf page_z0119.pdf page_z0120.pdf page_z0121.pdf page_z0122.pdf page_z0123.pdf page_z0124.pdf page_z0125.pdf page_z0126.pdf page_z0127.pdf page_z0128.pdf page_z0129.pdf page_z0130.pdf page_z0131.pdf page_z0132.pdf page_z0133.pdf page_z0134.pdf page_z0135.pdf page_z0136.pdf page_z0137.pdf page_z0138.pdf page_z0139.pdf page_z0140.pdf page_z0141.pdf page_z0142.pdf page_z0143.pdf page_z0144.pdf page_z0145.pdf page_z0146.pdf page_z0147.pdf page_z0148.pdf page_z0149.pdf page_z0150.pdf page_z0151.pdf page_z0152.pdf page_z0153.pdf page_z0154.pdf page_z0155.pdf page_z0156.pdf page_z0157.pdf page_z0158.pdf page_z0159.pdf page_z0160.pdf page_z0161.pdf page_z0162.pdf page_z0163.pdf page_z0164.pdf page_z0165.pdf page_z0166.pdf page_z0167.pdf page_z0168.pdf page_z0169.pdf page_z0170.pdf page_z0171.pdf page_z0172.pdf page_z0173.pdf page_z0174.pdf page_z0175.pdf page_z0176.pdf page_z0177.pdf page_z0178.pdf page_z0179.pdf page_z0180.pdf page_z0181.pdf page_z0182.pdf page_z0183.pdf page_z0184.pdf page_z0185.pdf page_z0186.pdf page_z0187.pdf page_z0188.pdf page_z0189.pdf page_z0190.pdf page_z0191.pdf page_z0192.pdf page_z0193.pdf page_z0194.pdf page_z0195.pdf page_z0196.pdf page_z0197.pdf page_z0198.pdf page_z0199.pdf page_z0200.pdf page_z0201.pdf page_z0202.pdf page_z0203.pdf page_z0204.pdf page_z0205.pdf page_z0206.pdf page_z0207.pdf page_z0208.pdf page_z0209.pdf page_z0210.pdf page_z0211.pdf page_z0212.pdf page_z0213.pdf page_z0214.pdf page_z0215.pdf page_z0216.pdf page_z0217.pdf page_z0218.pdf page_z0219.pdf page_z0220.pdf page_z0221.pdf page_z0222.pdf page_z0223.pdf page_z0224.pdf page_z0225.pdf page_z0226.pdf page_z0227.pdf page_z0228.pdf page_z0229.pdf page_z0230.pdf page_z0231.pdf page_z0232.pdf page_z0233.pdf page_z0234.pdf page_z0235.pdf page_z0236.pdf page_z0237.pdf page_z0238.pdf page_z0239.pdf page_z0240.pdf page_z0241.pdf page_z0242.pdf page_z0243.pdf page_z0244.pdf page_z0245.pdf page_z0246.pdf page_z0247.pdf page_z0248.pdf page_z0249.pdf page_z0250.pdf page_z0251.pdf page_z0252.pdf page_z0253.pdf page_z0254.pdf page_z0255.pdf page_z0256.pdf page_z0257.pdf page_z0258.pdf page_z0259.pdf page_z0260.pdf page_z0261.pdf page_z0262.pdf page_z0263.pdf page_z0264.pdf page_z0265.pdf page_z0266.pdf page_z0267.pdf page_z0268.pdf page_z0269.pdf page_z0270.pdf page_z0271.pdf page_z0272.pdf page_z0273.pdf page_z0274.pdf page_z0275.pdf page_z0276.pdf page_z0277.pdf page_z0278.pdf page_z0279.pdf page_z0280.pdf page_z0281.pdf page_z0282.pdf page_z0283.pdf page_z0284.pdf page_z0285.pdf page_z0286.pdf page_z0287.pdf page_z0288.pdf page_z0289.pdf page_z0290.pdf page_z0291.pdf page_z0292.pdf page_z0293.pdf page_z0294.pdf page_z0295.pdf page_z0296.pdf page_z0297.pdf page_z0298.pdf page_z0299.pdf page_z0300.pdf page_z0301.pdf page_z0302.pdf page_z0303.pdf page_z0304.pdf page_z0305.pdf page_z0306.pdf page_z0307.pdf page_z0308.pdf page_z0309.pdf page_z0310.pdf page_z0311.pdf page_z0312.pdf page_z0313.pdf page_z0314.pdf page_z0315.pdf page_z0316.pdf page_z0317.pdf page_z0318.pdf page_z0319.pdf page_z0320.pdf page_z0321.pdf page_z0322.pdf page_z0323.pdf page_z0324.pdf page_z0325.pdf page_z0326.pdf page_z0327.pdf page_z0328.pdf page_z0329.pdf page_z0330.pdf page_z0331.pdf page_z0332.pdf page_z0333.pdf page_z0334.pdf page_z0335.pdf page_z0336.pdf page_z0337.pdf page_z0338.pdf page_z0339.pdf page_z0340.pdf page_z0341.pdf page_z0342.pdf page_z0343.pdf page_z0344.pdf page_z0345.pdf page_z0346.pdf page_z0347.pdf page_z0348.pdf page_z0349.pdf page_z0350.pdf page_z0351.pdf page_z0352.pdf page_z0353.pdf page_z0354.pdf page_z0355.pdf page_z0356.pdf page_z0357.pdf page_z0358.pdf page_z0359.pdf page_z0360.pdf page_z0361.pdf page_z0362.pdf page_z0363.pdf page_z0364.pdf page_z0365.pdf page_z0366.pdf page_z0367.pdf page_z0368.pdf page_z0369.pdf page_z0370.pdf page_z0371.pdf page_z0372.pdf page_z0373.pdf page_z0374.pdf page_z0375.pdf page_z0376.pdf page_z0377.pdf page_z0378.pdf page_z0379.pdf page_z0380.pdf page_z0381.pdf page_z0382.pdf page_z0383.pdf page_z0384.pdf page_z0385.pdf page_z0386.pdf page_z0387.pdf page_z0388.pdf page_z0389.pdf page_z0390.pdf page_z0391.pdf page_z0392.pdf page_z0393.pdf page_z0394.pdf page_z0395.pdf page_z0396.pdf page_z0397.pdf page_z0398.pdf page_z0399.pdf page_z0400.pdf page_z0401.pdf page_z0402.pdf page_z0403.pdf page_z0404.pdf page_z0405.pdf page_z0406.pdf page_z0407.pdf page_z0408.pdf page_z0409.pdf page_z0410.pdf page_z0411.pdf page_z0412.pdf page_z0413.pdf page_z0414.pdf page_z0415.pdf page_z0416.pdf page_z0417.pdf page_z0418.pdf page_z0419.pdf page_z0420.pdf page_z0421.pdf page_z0422.pdf page_z0423.pdf page_z0424.pdf page_z0425.pdf page_z0426.pdf page_z0427.pdf page_z0428.pdf page_z0429.pdf page_z0430.pdf page_z0431.pdf page_z0432.pdf page_z0433.pdf page_z0434.pdf page_z0435.pdf page_z0436.pdf page_z0437.pdf page_z0438.pdf page_z0439.pdf page_z0440.pdf page_z0441.pdf page_z0442.pdf page_z0443.pdf page_z0444.pdf page_z0445.pdf page_z0446.pdf page_z0447.pdf page_z0448.pdf page_z0449.pdf page_z0450.pdf page_z0451.pdf page_z0452.pdf page_z0453.pdf page_z0454.pdf page_z0455.pdf page_z0456.pdf page_z0457.pdf page_z0458.pdf page_z0459.pdf page_z0460.pdf page_z0461.pdf page_z0462.pdf page_z0463.pdf page_z0464.pdf page_z0465.pdf page_z0466.pdf page_z0467.pdf page_z0468.pdf