Summarize the principles of database design-Mysql Tutorial-php.cn

The relationship between the original document and the entity can be one-to-one, one-to-many, or many-to-many. In general, they are a one-to-one relationship: that is, one original document corresponds to and only one entity. In special cases, they may be a one-to-many or many-to-one relationship, that is, one original document corresponds to multiple entities, or multiple original documents correspond to one entity.

1. The relationship between the original document and the entity
It can be a one-to-one, one-to-many, or many-to-many relationship. In general, they are a one-to-one relationship: that is, one original document corresponds to and only one entity.
In special cases, they may be a one-to-many or many-to-one relationship, that is, one original document corresponds to multiple entities, or multiple original documents correspond to one entity.
The entities here can be understood as basic tables. After clarifying this correspondence, it will be of great benefit to us in designing the input interface.

〖Example 1〗: An employee resume information corresponds to three basic tables in the human resources information system: employee basic information table, social relationship table, and work resume table.
　　　This is a typical example of "one original document corresponds to multiple entities".

2. Primary key and foreign key
Generally speaking, an entity cannot have both a primary key and a foreign key. In the E-R diagram, the entity in the leaf position can define a primary key or not (because it has no descendants), but it must have a foreign key (because it has a parent).

The design of primary keys and foreign keys plays an important role in the design of the global database. After the design of the global database was completed, an American database design expert said: "Keys are everywhere, and there is nothing except keys." This is what he said about his database design experience, and it also reflects He developed his highly abstract ideas about the core of information systems (data models). Because: the primary key is a highly abstract entity, and the pairing of the primary key and the foreign key represents the connection between entities.

3. Properties of basic tables

Basic tables are different from intermediate tables and temporary tables because they have the following four characteristics: (1) Atomicity. Fields in the base table are not decomposable.
　 (2) Primitiveness. The records in the base table are records of the original data (base data). 　 (3) Deductive. All output data can be derived from the data in the basic table and code table.
　 (4) Stability. The structure of the basic table is relatively stable, and the records in the table need to be saved for a long time.
After understanding the nature of basic tables, you can distinguish basic tables from intermediate tables and temporary tables when designing a database.

4. Normal Form Standard

The relationship between the basic table and its fields should try to meet the Third Normal Form
. However, a database design that satisfies the third paradigm is often not the best design. In order to improve the operating efficiency of the database, it is often necessary to lower the paradigm standard: appropriately increase redundancy to achieve the purpose of exchanging space for time. 〖Example 2〗: There is a basic table for storing products, as shown in Table 1. The existence of the "amount" field indicates that the design of the table does not meet the third normal form.
Because "amount" can be obtained by multiplying "unit price" by "quantity", it means that "amount" is a redundant field. However, adding the redundant field "Amount" can
improve the speed of querying statistics. This is how to trade space for time.
In Rose 2002, there are two types of columns: data columns and calculated columns. Columns such as "Amount" are called "calculated columns", while columns such as "Unit Price" and "Quantity" are called "data columns".
Table 1 Table structure of the product table
Product name Product model Unit price Quantity amount
TV 29 inches 2,500 40 100,000

5. Understand the three paradigms in a simple way

A common understanding of the three paradigms is of great benefit to database design. In database design, in order to better apply the three paradigms, it is necessary to understand the three paradigms in a popular way (the popular understanding is a sufficient understanding, not the most scientific and accurate understanding):
　First Normal Form: 1NF is an atomicity constraint on attributes, which requires attributes to be atomic and cannot be decomposed;
　
Second Normal Form
: 2NF is a uniqueness constraint on records, which requires The record has a unique identifier, that is, the uniqueness of the entity; Third normal form: 3NF is a constraint on field redundancy, that is, any field cannot be derived from other fields, and it requires that the fields are not redundant.
No redundant database design can do this. However, a database without redundancy may not be the best database. Sometimes in order to improve operating efficiency, it is necessary to lower the paradigm standard and appropriately retain redundant data. The specific approach is: comply with the third paradigm when designing the conceptual data model, and put the work of lowering the paradigm standard into consideration when designing the physical data model. Lowering the normal form means adding fields and allowing redundancy.
6. Be good at identifying and correctly handling many-to-many relationships

If there is a many-to-many relationship between two entities, this relationship should be eliminated. The solution is to add a third entity between the two. In this way, it turns out that a
One many-to-many relationship is now two one-to-many relationships. The attributes of the original two entities should be reasonably allocated to the three entities. The third
entity here is essentially a more complex relationship, which corresponds to a basic table. Generally speaking, database design tools cannot recognize many-to-many relationships, but they can handle many-to-many relationships.

〖Example 3〗: In the "Library Information System", "book" is an entity, and "reader" is also an entity. The relationship between these two entities is a typical many-to-many relationship: a book can be borrowed by multiple readers at different times, and a reader can borrow multiple books. To this end, a third entity should be added between the two. This entity is named "Borrowing and Returning Books". Its attributes are: borrowing and returning time, borrowing and returning flag (0 means borrowing the book, 1 means returning the book). Book), in addition,
It should also have two foreign keys (the primary key of "Book", the primary key of "Reader"), so that it can be connected with "Book" and "Reader".

7. The value method of primary key PK

PK is an inter-table connection tool for programmers. It can be a digital string with no physical meaning, which is automatically added by 1 by the program. to fulfill. It can also be a field name or a combination of field names with physical meaning. But the former is better than the latter. When PK is a combination of field names, it is recommended that the number of fields not be too many. If there are too many fields, not only will the index take up a lot of space, but it will also be slow.

8. Correctly understand data redundancy

The recurrence of primary keys and foreign keys in multiple tables is not data redundancy. This concept must be clear. In fact, many people Not sure yet. The repeated occurrence of non-key fields is data redundancy! And it is a kind of low-level redundancy, that is, repetitive redundancy. High-level redundancy is not the repeated occurrence of a field, but the derived occurrence of a field.
〖Example 4〗: Among the three fields of "unit price, quantity, and amount" in the product, "amount" is derived from "unit price" multiplied by "quantity", which is redundant, And it’s a high-level redundancy. The purpose of redundancy is to increase processing speed. Only low-level redundancy will increase data inconsistency, because the same data can be entered multiple times from different times, locations, and roles. Therefore, we advocate high-level redundancy (derivative redundancy) and oppose low-level redundancy (repetitive redundancy).

9. There is no standard answer to the E--R diagram

There is no standard answer to the E--R diagram of the information system, because its design and drawing method are not unique, as long as it It is feasible if it covers the business scope and functional content of the system requirements. Otherwise, modify the E--R diagram. Although it does not have a single standard answer, it does not mean that it can be designed at will. The standards for a good E-R diagram are:
Clear structure, concise associations, moderate number of entities, reasonable attribute allocation, and no low-level redundancy.

10. View technology is very useful in database design
Unlike basic tables, code tables, and intermediate tables, a view is a virtual table that relies on the real table of the data source. And exist. A view is a window for programmers to use the database. It is a form of base table data synthesis, a method of data processing, and a means of keeping user data confidential. In order to carry out complex processing,
improve computing speed and save storage space, the definition depth of a view should generally not exceed three levels. If the three-layer view is still not enough, you should define a temporary table on the view,
　 and then define the view on the temporary table. By repeatedly overlapping definitions in this way, the depth of the view is not limited.

For some information systems related to national political, economic, technical, military and security interests, the role of views is even more important. After the basic tables of these systems are physically designed, the first-level views are immediately established on the basic tables. The number and structure of this layer of views are exactly the same as the number and structure of the basic tables. It is also stipulated that all programmers are only allowed to operate on views. Only the database administrator, with a "security key" jointly held by multiple people,
, can operate directly on the basic table. Please think about this: Why is this?

11. Intermediate tables, reports and temporary tables

Intermediate tables are tables that store statistical data. They are designed for data warehouses, output reports or query results. Sometimes they There are no primary keys and foreign keys (except for the data warehouse
database). Temporary tables are designed by programmers to store temporary records for personal use. The base table and intermediate table are maintained by the DBA, and the temporary table is automatically maintained by the programmer
himself.

12. Integrity constraints are expressed in three aspects
Domain integrity: Use Check to implement constraints, and define the value range of fields in the database design tool At this time, there is a Check button through which the value of the field is defined. Referential integrity: implemented using PK, FK, and table-level triggers.
User-defined integrity: It is some business rules, implemented using stored procedures and triggers.

13. The way to prevent database design from being patched is the "Three Less Principles"
　 (1) The fewer the number of tables in a database, the better. Only when the number of tables is reduced can it be shown that the E-R diagram of the system is small and precise, removing repeated and redundant
entities, forming a high degree of abstraction of the objective world, carrying out systematic data integration, and preventing Patch design;
　 (2) The fewer the number of combined primary key fields in a table, the better. Because of the role of the primary key, one is to build a primary key index, and the other is to serve as a foreign key for the sub-table, so the number of fields that combine the primary key is reduced, which not only saves running time, but also saves index storage space;
　 (3) The fewer fields in a table, the better. Only when the number of fields is reduced can it mean that there is no duplication of data in the system, and there is very little redundant data. More importantly, readers are urged to learn to "change columns into rows", thus preventing the change of sub-tables. The fields are pulled into the main table, leaving many empty fields in the main table. The so-called "column to row" is to pull out part of the content in the main table and create a separate sub-table. This method is very simple, but some people are just not used to it, don’t adopt it, and don’t implement it.
The practical principle of database design is to find the right balance between data redundancy and processing speed. "Three Young Masters" is an overall concept, a comprehensive view, and
cannot isolate any one principle. This principle is relative, not absolute. The "three more" principle is definitely wrong. Just imagine: if the same functionality of the system is covered, the E-R diagram of one hundred entities (a total of one thousand attributes) will definitely be better than the E-R diagram of two hundred entities (a total of two thousand attributes). , much better.
Promoting the "Three Less" principle is to teach readers to use database design technology for systematic data integration. The steps of data integration are to integrate the file system into an application database, integrate the application database into a subject database, and integrate the subject database into a global comprehensive database. The higher the degree of integration, the stronger the data sharing, and the fewer information islands. The number of entities, the number of primary keys, and the number of attributes in the global ER diagram of the entire enterprise information system. # will be less.
The purpose of advocating the "Three Less" principle is to prevent readers from using patching technology to continuously add, delete, and modify the database, turning the enterprise database into a "garbage heap" for randomly
designing database tables, or database tables. The "large courtyard" eventually resulted in countless basic tables, code tables, intermediate tables, and temporary tables in the database, which caused the information systems of enterprises and institutions to be unable to be maintained and paralyzed.
Anyone can do the "Three More" principle. This principle is the perverse theory of database design using the "patch method". The "Three Less" principle is the principle of less but better
. It requires high database design skills and art. Not everyone can do it, because this principle is to eliminate the use of "patch method"
Theoretical basis for designing databases.

14. Ways to improve the operating efficiency of the database

Under the given system hardware and system software conditions, the way to improve the operating efficiency of the database system is:
　 (1 ) In the physical design of the database, lower the paradigm, increase redundancy, use less triggers, and use more stored procedures.
　 (2) When the calculation is very complex and the number of records is very large (for example, 10 million records), the complex calculation must first be performed outside the database using the file system method
　　After the calculation and processing is completed using the C++ language, Finally, it is added to the database and added to the table. This is the experience of telecommunications billing system design.
　 (3) If it is found that there are too many records in a certain table, for example, more than 10 million records, the table must be split horizontally. The method of horizontal splitting is to use a certain value of the primary key
PK of the table as a boundary to horizontally split the records of the table into two tables. If it is found that there are too many fields in a table, for example, more than eighty, then
split the table vertically and decompose the original table into two tables. 　 (4) System optimization of the database management system DBMS, that is, optimizing various system parameters, such as the number of buffers. 　 (5) When using data-oriented SQL language for programming, try to adopt optimization algorithms.
　In short, to improve the operating efficiency of the database, we must work simultaneously on the three levels of database system-level optimization, database design-level optimization, and program implementation-level optimization.

The above fourteen skills are gradually summarized by many people through a large number of database analysis and design practices. Regarding the application of these experiences, readers should not be mechanically memorized and memorized, but should digest and understand them, seek truth from facts, and grasp them flexibly. And gradually achieve: develop in application, apply in development.

Let’s take a look at other principles:

1) The database design should not be carried out for the entire system, but should be based on the component division in the system architecture, and the database design of the component unit should be carried out according to the business handled by each component; the corresponding database tables between different components should be designed Associations should be minimized as much as possible. If tables between different components require foreign key associations, try not to create foreign key associations. Instead, just record a primary key of the associated table to ensure the independence between the tables corresponding to the components and ensure the system or table structure. Refactoring offers possibilities.

2) Adopt a domain model-driven approach and a top-down approach to database design. First analyze the system business and define objects according to responsibilities. The object must comply with the characteristics of encapsulation, ensuring that the data items related to the responsibility are defined within an object. These data items can fully describe the responsibility, and there will be no missing responsibility description. And an object has and has only one responsibility. If an object is responsible for two or more responsibilities, it should be split.

3) Map database tables according to the established domain model. At this time, you should refer to the second paradigm of database design: all non-keyword attributes in a table depend on the entire keyword. The keyword can be an attribute or a collection of multiple attributes. In either case, the keyword should be guaranteed to be unique. When determining the keyword, it should be ensured that the keyword will not participate in the business and update exceptions will not occur. In this case, the optimal solution is to use an auto-incrementing numeric attribute or a random string as the keyword of the table.

4) Since the database table structure is designed in a domain model-driven manner as mentioned in the first point, each object in the domain model has only one responsibility, so the data items in the object do not have transitive dependencies, so, The database table structure design of this idea satisfies the third normal form from the beginning: a table should satisfy the second normal form, and there is no transitive dependency between attributes.

5) Similarly, due to the unity of object responsibilities and the relationship between objects reflect the relationship between business logic, the objects in the domain model are divided into master objects and slave objects. It is the business logic of the main object from the perspective of 1-N or N-N, so there are no deletion and insertion exceptions in the tables and table associations mapped from objects and object relationships.

6) The database table structure obtained after mapping should be further modified according to the fourth normal form to ensure that there are no multi-valued dependencies. At this time, feedback should be given to the domain model based on reverse engineering ideas. If there are multi-valued dependencies in the table structure, it proves that the objects in the domain model have at least two or more responsibilities, and the design should be revised according to the first article. Fourth normal form: If a table satisfies BCNF, there should be no multi-valued dependencies.

7) After analysis, it is confirmed that all tables meet the second, third, and fourth normal forms. The association between tables should be weak association as much as possible to facilitate the adjustment and adjustment of table fields and table structures. Refactor. Moreover, I think that the table in the database is used to persist the state of an object instance at a specific time and under specific conditions. It is just a storage medium. Therefore, there should be no strong association between tables to express business (between data Consistency), this responsibility should be guaranteed by the logical layer of the system. This approach also ensures the compatibility of the system with incorrect data (dirty data). Of course, from the perspective of the entire system, we still have to do our best to ensure that the system does not generate dirty data. From another perspective, the generation of dirty data is inevitable to a certain extent, and we must also ensure that the system does not generate dirty data. fault tolerance of this situation. This is a compromise.

8) Indexes should be established on the primary keys and foreign keys of all tables, and indexes on combined attributes should be established in a targeted manner (for some large amounts of data and common retrieval methods) to improve retrieval efficiency. Although building an index will consume some system resources, the performance impact caused by searching the data in the entire table during retrieval, especially when the amount of data in the table is large, is compared with the performance impact caused by the sorting operation when there is no index. , this approach is still worth advocating.

9) Use stored procedures as little as possible. There are already many technologies that can replace the functions of stored procedures, such as "object/relational mapping", etc., to ensure data consistency in the database, whether for version control, Development and deployment, as well as database migration will have a great impact. However, it is undeniable that stored procedures have performance advantages. Therefore, when the hardware that can be used by the system will not be improved and performance is a very important quality attribute, stored procedures can be selected through balanced considerations.

10) When the cost of processing association constraints between tables (often the cost of usability) exceeds the cost of ensuring that modification, deletion, and change exceptions will not occur, and data redundancy is also When it is not a major problem, the table design does not need to conform to the four normal forms. The four paradigms ensure that exceptions will not occur, but it may also lead to an overly pure design, making the table structure difficult to use. Therefore, comprehensive judgment is required during design, but first ensure that it conforms to the four paradigms, and then refine and correct it. This is the best approach you can take when you are just entering the field of database design.

11) The designed table must have good usability, mainly reflected in whether it is necessary to associate multiple tables and use complex SQL skills during query.

12) The designed table should reduce data redundancy as much as possible to ensure data accuracy. Effective control of redundancy will help improve database performance. (Of course sometimes the paradigm must be lowered)

[Related recommendations]

1. Special recommendation:"php program "Employee Toolbox" V0.1 version download

2. Free mysql online video tutorial

##3.

Database design those things

The above is the detailed content of Summarize the principles of database design. For more information, please follow other related articles on the PHP Chinese website!