Beginning Database Design- P17

Beginning Database Design- P17

Thể loại: Cơ sở dữ liệu
Lượt xem: 13,254Lượt tải: 2Số trang: 20

Mô tả tài liệu

Beginning Database Design- P17:This book focuses on the relational database model from a beginning perspective. The title is, therefore, Beginning Database Design. A database is a repository for data. In other words, you can store lots of information in a database. A relational database is a special type of database using structures called tables. Tables are linked together using what are called relationships. You can build tables with relationships between those tables, not only to organize your data, but also to allow later retrieval of information from the database....

Tóm tắt nội dung

and Refining Tables During the Design Phase ❑ Western Union ❑ Cash ❑ Visa ❑ American the field for a specific listing could be something like Check, Western Union, Visa, string is a list. A list is by a set. set is thus a set, or a single item more than one possible value. 4NF demands delimited strings should be split up. In the case of an online auction house, it is likely that field would only be used for online display. Then again, the list could be split For example, the string value Visa that a specific type of credit card is perhaps payment through an online credit card payment service for Visa 4NF would change the OLTP database model in Figure 10-18 to that shown in Figure 10-20. Figure 10-20: Applying 4NF to the OLTP database ch10.qxd 11/4/05 10:46 AM Page of the of 4NF, as shown in Figure 10-20, depends on Once the number of tables in a database model leads to more tables in query joins. The more are in query joins, the more is adversely affected. Using the 4NF shown in Figure 10-20, a seller could allow four payment methods as Check, Western Union, Visa, seller would have four records as shown in Figure 10-21: Dividing a comma delimited list into separate records using SELLER records using the database model shown in Figure 10-20 would require a two-table join of the SELLER and tables. On the contrary, without the 4NF as for the database model shown in Figure 10-18, only a single table would be read. Querying a single table is better and easier than a two table join; however, two-table joins perform perfectly a few tables, with no effect on unless one of the tables has a huge number of records. The only problem with the database model structure in Figure 10-20 is that table has very few records for each SELLER record. Is there any point in dividing up strings in this case? Splitting strings in languages for is one of the easiest things in the world, and is to cause problems in Doing this type of at model level using 4NF, on this scale, is a little — to say the 5NF 5NF can be used, and not should be used, to eliminate cyclic A cyclic is something that depends on one thing, such that the one thing is either directly or upon itself. Thus, a cyclic is a form of circular where three as a of a single composite primary key table. For example, the three pairs could be field 1 with field 2, field 2 with field 3, and field 1 with field 3. In other words, the means that is related to else, including itself. There is a a which excludes If tables are joined, again using a join, records will be the same as that present in the original table. It is a stated of of 5NF that the join must match the number of records for a query on table. 5NF is similar to 4NF, in that both attempt to minimize the number of fields in composite 10-18 has no composite primary keys, because surrogate keys are used. At this stage, using 5NF is thus a little however, take a quick look at Figure 10-5 (earlier in this chapter) where were not yet into the online auction house OLTP database model. The structure of tables in Figure 10-5 looks similar to that shown in Figure ch10.qxd 11/4/05 10:46 AM Page 10-22: 5NF can help to break down composite primary the end justify the means? probably not! As you can see in Figure 10-22, the starts to look a little like the structure shown on the left of Figure Study: and Refining an OLTP Database is the part where you get to ignore the applied in the previous go back to the OLTP database model shown in Figure 10-18. And, yes, the database model in can be there are no rules or any kind of process with respect to is mostly common sense. In this case, common sense is the of 10-18 is repeated here again, in Figure 10-23, for and Refining Tables During the Design ch10.qxd 11/4/05 10:46 AM Page 10-23: The online auction house OLTP database model to can and should be in the database model shown in Figure 10-23? ❑ The three category tables should be merged into a single table. Not only does this make of easier, it also allows any number of layers in the category rather than to the three of primary, and tertiary Seller and buyer histories could benefit by being a single table, not only because fields are the same but also because a seller can also be a buyer and visa versa. Merging the two tables group search of a little slower; however, proper indexing improve in general (for all Also, because buyers can be sellers can be buyers, it makes no logical sense to store records in two If sellers and buyers are merged, it might be expedient to remove fields exclusive to table, into a 4NF, subset table, to remove NULL values from the merged fields are the and the Depending on the relative numbers of buyers, sellers, and (those who do and selling), it might be expedient to even merge the sellers and buyers into a as well as merging Once again, fields are largely the same. The number of in operation might preempt the merge as well. The resulting OLTP database model could look similar to that shown in Figure ch10.qxd 11/4/05 10:46 AM Page 10-24: the online auction house OLTP database is, in general, far more for data warehouse database models than it is for OLTP database models. One of the problems with what and how to is that in the analysis and design phases of database modeling and design, is a little like country. If you don’t beyond 3NF, your system design its maker. And then if you do an OLTP database model, you could kill the the very structure you have just general, is not because no one has really thought up a formal it, like many have devised for might be somewhat akin Guesswork is always but if analysis is all about expert don’t let the lack of formal methods in scare you away from it. The biggest problem with is that it requires extensive kind of foresight is available only when a system has been analyzed, designed, into when in any further database modeling changes are So, when hoping to a database model for and ease of use by and Refining Tables During the Design ch10.qxd 11/4/05 10:46 AM Page 10 try to learn as much about how use tables, in terms of record how many accessed at once on GUI screens, how large reports will be, and so on. And do that learning part of analysis and design. It might be to rectify in and even in requires as much knowledge as following state the obvious: ❑ The database model is the backbone of any that uses data of any kind. That data is most likely stored in some kind of database. That database is likely to be a database of one form or another. ❑ Better designed database models tend to lend to clearer and easier of SQL code queries. The ease of of, and the ultimate of largely on the soundness of the database model. The database model is the backbone of The better the database model design, the better queries are produced, the better will be and the happier your end-users will be. A good often easily built by often not also easily usable by Similar to database modelers, often write code for in an elegant fashion. Elegant solutions are not always going to produce the most face result. must run fast enough. must not encourage become Do not let elegant modeling and coding drive away your — no business. No business — no company. No company — no job! And, if your end-user happens to be your boss, well, you know the you must be able to build good queries. The soundness of those queries, and dependent upon the soundness of the database model. A highly is likely to be unsound because there are too many tables, too much and too in joins. Lots of tables and lots of complex confuse people, the query for And perform database models in the analysis and design phases, not after the fact in model structure for systems is generally extremely and to end-users go down for After all, the objective is to turn a profit. This means keeping your end-users If the database is an in-house thing, you need to keep your job. again, the of queries comes down to how many tables are joined in a single query. shows the original OLTP database model for the online auction house. In Figure following has — were from three tables down to a single table. A the three category tables would look similar to * FROM CP JOIN CS USING CT USING ch10.qxd 11/4/05 10:46 AM Page and Refining Tables During the Design Phase A query against the single category table could be as * FROM the single category table was required to display a a self join could be used engines have special syntax for FROM CATEGORY P JOIN CATEGORY C = BY in this way is probably a very sensible idea for the OLTP of the online auction house. ❑ Users — Sellers and buyers were partially into users, where 4NF used to separate seller details from buyers. Using the database model in to find all listings for a specific seller, the following query applies (joining two tables a WHERE clause to the SELLER * FROM SELLER S JOIN LISTING L USING S.SELLER = “Joe again, using the database model in Figure 10-23, the following query finds bids, on all listings, for a buyer (joining three tables and applying a to the BUYER * FROM LISTING L JOIN BID BID USING BUYER B USING B.BUYER = “Jim the database model in Figure 10-24, this query finds all listings for a seller (the SELLER and USER tables are actually * FROM USER U JOIN SELLER S USING LISTING L USING U.NAME = “Joe query is actually worse for the database model because it joins three of two. And again, using the database model in Figure 10-24, the query finds all existing bids on all listings for a * FROM LISTING L JOIN BID BID USING USER U USING U.NAME = “Jim U.USER_ID NOT IN (SELECT USER_ID FROM ch10.qxd 11/4/05 10:46 AM Page 10 This query is also worse for the version because not only does it join three performs a semi-join (and an anti semi-join at that). An anti semi-join is a search. A negative search tries to find what is not in a table, and therefore must read in that table. Indexes can’t be used at all and, thus, a full table scan results. Full can be I/O heavy for larger tables. It should be clear to conclude that the BUYER and SELLER tables into the USER and SELLER tables (as shown in Figure 10-24) is probably quite a bad idea! At least it appears that way from the of query use; however, an extra field could be added to the USER table to between users and buyers, in relation to bids and listings (a both buying and selling will appear in both buyer and seller data sets). The could be used as a base for very efficient indexing or even something as advanced as breaks tables into separate physical chunks. If the USER between users and sellers, reading only sellers from the USER table would I/O against a partition sellers (not buyers). It is still not really very the BUYER and SELLER table into the USER table. ❑ Histories — The two history tables were into a single table, as shown in Executing a query using the database model in Figure 10-23 to find the history for a specific seller, could be performed using a query like the * FROM SELLER S JOIN SH USING S.SELLER = “Joe a history for a specific seller using the database model shown in could use a query like * FROM USER U JOIN HISTORY H U.NAME = “Joe Soap” AND U.USER_ID IN (SELECT USER_ID FROM again, as with of SELLER and BUYER tables into the USER table, the and BUYER HISTORY tables into the HISTORY table, might a bad idea. The first query above joins two tables. The second query also joins two tables, but also executes a This semi-join is not as bad as for of users, an anti however, this is still a three-way you have that perhaps the most and efficient for the OLTP online auction house is as shown in Figure 10-25. The only at this stage is to merge the three separate category hierarchy tables into the single table. Buyer, seller, and history is probably best left in separate ch10.qxd 11/4/05 10:46 AM Page is rarely effective for OLTP database models for anything between 1NF and 3NF; however (and this very remember that in this chapter you read about layers beyond 3NF (BCNF, 4NF, 5NF and DKNF). None of these intensive Normal Forms have so far been applied to the OLTP database model for the online auction house. As of Figure 10-23, you began to attempt to backtrack on performed by You the 3NF database model as shown in Figure 10-23. In other words, any beyond 3NF was simply ignored, having already been proved to be and over the top for database 10-25: The online auction house OLTP database model, 3NF, partially only obvious issue still with the database model as shown in Figure 10-25 is that the and tables have both BUYER_ID and SELLER_ID fields. In other words, both his- tory tables are linked (related) to both of the BUYER and SELLER tables. It therefore could make to not only the category tables, but the history tables as well, leave BUYER tables and separate, as shown in Figure and Refining Tables During the Design ch10.qxd 11/4/05 10:46 AM Page 10-26: The online auction house OLTP database model, 3NF, slightly further newly HISTORY table can be accessed by splitting the history records based on buyers and sellers, using indexing or something hairy fairy and like physical It Out Designing an OLTP Database a simple design level OLTP database model for a Web site. This Web site allows creation of ads for musicians and bands. Use the simple OLTP database model presented in Figure from Figure 9-19, in Chapter 9). Here’s a basic Create surrogate primary keys for all Enforce integrity using primary keys, foreign keys, and Refine properly, according to as and also be precise about whether each crow’s foot allows zero. 4. Normalize as much as for usability and ch10.qxd 11/4/05 10:46 AM Page 10-27: bands, their online and some other It 10-27 shows the analyzed OLTP database model database model, for online musician and The database model in Figure 10-26 has the following basic Musicians can play multiple Musicians can be A band can have multiple The MEMBERS field in the band table takes into account the and MUSICIAN. In other words, there is more than one musician in a band a musician doesn’t have to be in a band, a band may be broken up and have and both bands and musicians can Musicians and bands can place Bands and musicians perform shows. ❑ Bands and musicians sell typically contains all released CDs and Refining Tables During the Design ch10.qxd 11/4/05 10:46 AM Page 10-28 shows the database model in Figure 10-27, with all primary keys as surrogate keys. The SHOWS table is renamed to SHOW (each record equals one show). Figure 10-28: Primary keys as surrogate ch10.qxd 11/4/05 10:46 AM Page and Refining Tables During the Design 10-29 enforces integrity properly, between all primary and foreign keys, for all stage limited the between and MUSICIAN, SKILL and GENRE and 10-29: Enforcing integrity between all primary key and foreign ch10.qxd 11/4/05 10:46 AM Page 10-30 refines between tables as and NULL valued. table contains a with divided into sections (such as a strings a section). The GENRE table also contains a hierarchy of genres and For hard rock music is part of rock music, and rock music is part of modern music. Hard rock contain as well, such as rock. The and GENRE table has self join or fishhook join added, to represent the A musician must play at least (voice is an Skills cannot exist without a 10-30: Refining as and NULL ch10.qxd 11/4/05 10:46 AM Page and Refining Tables During the Design 10-31 refines the database model with This is about as far as this database model can be The and GENRE tables could be if the number of layers in the of the two tables is known. For the purposes of this example, it is assumed this is not the case. Some fields have been removed from various tables for being For example, EMAIL, and fields have been removed from the table, are included in the AD_TEXT field. Some fields have been 10-31: Refining the database model with ch10.qxd 11/4/05 10:46 AM Page 10-32 refines the database model with All the nasty detailed Normal Forms The VENUE is retained since venues are static and shows are dynamic (fact 10-32: Refining the database model with Study: Refining a Data Warehouse Database 10-12 shows the most recent version of the data warehouse database model for the online From an you sellers, buyers, times, and a large fact table. Where do you go from here? What you need to do is probably to normalize the fact into separate fact tables. Separate fact tables become separate star schemas, and thus marts. Figure 10-33 shows an initial split of facts into three separate fact tables (see Figure ch10.qxd 11/4/05 10:46 AM Page t _i d s a nd th e da ta in to ch10.qxd 11/4/05 10:46 AM Page problem as shown in Figure 10-33 is that there is an for a between listing and bid facts. Why? Think about how this data would be used. If you wanted to track listing alone, then there is no problem. If you wanted to ana- lyze the pattern of bids against types of listings, then that LISTING to BID would be required. a between multiple fact tables causes serious problems. The goes back to the existence of the data warehouse database model itself. Data database models were devised to split very small tables, linked in a hierarchy (a star schema), all linked to a single fact table. Fact tables are very large. The most of query joins are those between one or more small tables (the and only a single That great big humungous table is the fact table. And there should also be only one fact table in a star schema. Never relate or join fact tables. You might wait a week for queries to more than one fact table together results in a join between two very large tables, which can — defeating the very existence of the data warehouse Don’t do it! A solution is to merge and the fact tables as shown in Figure 10-34. The HISTORY fact table is not a problem because histories apply to sellers and buyers. In the real world, histories are used and buyers to assess whether they are dealing with an honest trader or a complete 10-34: Reducing the three fact tables in Figure 10-33 to two fact tables, based on the listing and bid facts ch10.qxd 11/4/05 10:46 AM Page and Refining Tables During the Design Phase A more easily readable form of the star schemas shown in Figure 10-34 is shown in Figure 10-35. If you go back briefly to Chapter 9 and examine Figure 9-23, you can see that this structure was even in the analysis 10-35: The data warehouse fact table star schemas, for the online auction ch10.qxd 11/4/05 10:46 AM Page Figure 10-34 and Figure 10-35 represent the best, most most easily usable database model for the online auction house data It is possible to normalize further by the heck out of the — just don’t normalize the facts. (other than making fact table splits into multiple star schemas, as shown in defeats the purpose of the data warehouse 10-36 shows an ERD of the HISTORY fact table snowflake schema, with to gazoo! A snowflake schema is a star schema, where have been is no need to detail query examples for the data warehouse database model, as the same for SQL coding of query joins for both OLTP and data warehouse The fewer tables in a join, the It is more efficient to join between a small table and a large table, compared with equally joining two small tables is most efficient because there isn’t much data be logical by the snowflake schema shown in Figure 10-36 is not only nuts, it will also drive nuts trying to figure it all out. And end-users simply won’t know what the heck is going on in there. typically in-house type often write (or at least warehouse reporting The data warehouse shown in Figure 10-36 is quite simply The end-users it will probably think it is and they will probably avoid it. Try It Out Designing a Data Warehouse Database a simple design level data warehouse database model, for a Web site. This Web site allows free ads for musicians and bands. Use the data warehouse presented in Figure 10-37 (copied from Figure 9-29, in Chapter 9). Here’s a basic Refine and facts, making sure that are and facts are facts. 2. Divide facts into multiple star schemas if multiple, unrelated fact sets exist. 3. Normalize into a snowflake schema. This can help to identify and quantify into a star primary purpose of the data warehouse model is to allow possible join method between two tables: one large table, and one or more very small ch10.qxd 11/4/05 10:46 AM Page