
Let’s talk about database zipper tables


This article brings you relevant knowledge about MySQL, focusing on database zipper tables. A zipper table is a data model used mainly in data warehouse design; it is defined by the way the table stores data, and, as the name suggests, the "zipper" records history. Let's take a look at it together. I hope it will be helpful to everyone.



Zipper table background

When designing data models for a data warehouse, we often run into requirements like these:

1. The volume of data is relatively large;

2. Some fields in the table are updated over time, such as a user's address, a product's description, or an order's status;

3. We need to view a historical snapshot at a certain point in time or over a certain period, for example the status of an order at some moment in the past, or how many times a user's record changed during a given period;

4. The proportion and frequency of changes are small, for example a table of 10 million members with roughly 100,000 new or changed rows per day;

5. If a full copy of the table is kept every day, each full copy stores a large amount of unchanged data, which is a huge waste of storage.

There are several options for this kind of table:

  • Option 1: Keep only the latest copy each day, for example using Sqoop to extract the latest full data into Hive every day.
  • Option 2: Keep a full snapshot of the data every day.
  • Option 3: Use a zipper table.

Comparison of the above options

Option 1

Not much needs to be said about this option; it is very simple to implement: drop the previous day's data and extract the latest full copy again every day.

The advantages are obvious: it saves space, and it is convenient for everyday use, since no time partition needs to be specified when querying the table.

The shortcomings are just as obvious: there is no historical data. The only way to look back at old records is through other means, such as reconstructing them from a transaction (flow) table.

Option 2

Keeping a full snapshot every day is a relatively safe plan, and the historical data is all there.

The disadvantage is that it takes far too much storage. If a full copy of the table is kept every day, each full copy stores a large amount of unchanged data, which is a huge waste of storage. I still feel this pain deeply...

Of course we can make some trade-offs, such as keeping only the last month of snapshots. However, requirements are relentless, and the life cycle of data is not something we can completely control.

Zipper table

The zipper table basically covers all of the needs above.

First, it makes a trade-off on space. Although it does not take as little space as Option 1, its daily increment may be only a thousandth or even a ten-thousandth of Option 2.

It also meets the needs addressed by Option 2: it can return the latest data, and with an extra filter condition it can return historical data as well.

So it is well worth using zipper tables.

Zipper table concept

A zipper table is a data model, defined mainly by the way a table stores data in data warehouse design. As the name suggests, the "zipper" records history: all the changes of an entity from its beginning to its current state. Zipper tables avoid the massive storage cost of keeping every record for every day, and they are also a common way to handle slowly changing dimensions (SCD Type 2).

Baidu Encyclopedia's explanation: a zipper table is a table that maintains both historical states and the latest state of the data. Depending on the zipper granularity, a zipper table is essentially equivalent to a snapshot table, except that it has been optimized by removing the unchanged records; the records as of any zipping date can easily be restored from the zipper table.

Zipper table algorithm

1. Collect the current day's full data into an ND (NewDay) table;

2. Retrieve yesterday's full data from the history table into an OD (OldDay) table;

3. Compare the two tables on all fields. (ND - OD) is the data that is new or changed on the current day, i.e. the day's increment, denoted W_I;

4. Compare the two tables on all fields. (OD - ND) is the data whose status has ended and needs to be closed off; its END_DATE must be updated, denoted W_U;

5. Insert all rows of the W_I table into the history table. These are new records: start_date is the current day, and end_date is the maximum value, which can be set to '9999-12-31';

6. Apply the W_U part to the history table as an update: start_date stays unchanged while end_date is changed to the current day. This is the "chaining" step. The history table (OD) is matched against the W_U table on all fields except START_DATE and END_DATE, with the W_U table taking precedence; for the rows in the intersection, END_DATE is set to the current day, marking those records as expired. A rough SQL sketch of these steps follows below.
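
To make steps 3–6 concrete, here is a minimal MySQL-style sketch. The table and column names (nd for the day's full data, his for the history table, bk for the business key, col1/col2 for the tracked attributes) are hypothetical placeholders, not from the original text; adapt them to your own schema.

-- Minimal sketch of the daily zipper update (hypothetical names: nd = today's full data,
-- his = history/zipper table, bk = business key, col1/col2 = tracked attributes).

-- Steps 3 and 5: rows in today's data with no identical open record in the history
-- table are new or changed; insert them with end_date = '9999-12-31'.
INSERT INTO his (bk, col1, col2, start_date, end_date)
SELECT n.bk, n.col1, n.col2, CURRENT_DATE, '9999-12-31'
FROM nd n
LEFT JOIN his h
       ON h.bk = n.bk AND h.col1 = n.col1 AND h.col2 = n.col2
      AND h.end_date = '9999-12-31'
WHERE h.bk IS NULL;

-- Steps 4 and 6: open history records whose attribute combination no longer appears
-- in today's data are closed off by setting end_date to the current day.
UPDATE his h
LEFT JOIN nd n
       ON n.bk = h.bk AND n.col1 = h.col1 AND n.col2 = h.col2
SET h.end_date = CURRENT_DATE
WHERE h.end_date = '9999-12-31'
  AND n.bk IS NULL;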

Zipper table example 1

As a simple example, suppose there is an order table.

On June 20 there are 3 records:

Order creation date  Order number  Order status
2012-06-20           001           Create order
2012-06-20           002           Create order
2012-06-20           003           Payment completed

By June 21 there are 5 records in the table:

Order creation date  Order number  Order status
2012-06-20           001           Payment completed
2012-06-20           002           Create order
2012-06-20           003           Payment completed
2012-06-21           004           Create order
2012-06-21           005           Create order

By June 22 there are 6 records in the table:

Order creation date  Order number  Order status
2012-06-20           001           Shipped
2012-06-20           002           Create order
2012-06-20           003           Payment completed
2012-06-21           004           Create order
2012-06-21           005           Payment completed
2012-06-22           006           Create order

There are several ways to keep this table in the data warehouse:

1. Keep only one full copy. The data is then the same as the records on June 22, and if we need to check the status of order 001 as of June 21, that cannot be satisfied;

2. Keep one full copy per day. The table in the data warehouse then holds 14 records in total, but many records, such as orders 002 and 004, are saved repeatedly without any change, which wastes a great deal of storage when the data volume is large;

If instead the table is stored in the data warehouse as a historical zipper table, it looks like this:

Order creation date  Order number  Order status       dw_begin_date  dw_end_date
2012-06-20           001           Create order       2012-06-20     2012-06-20
2012-06-20           001           Payment completed  2012-06-21     2012-06-21
2012-06-20           001           Shipped            2012-06-22     9999-12-31
2012-06-20           002           Create order       2012-06-20     9999-12-31
2012-06-20           003           Payment completed  2012-06-20     9999-12-31
2012-06-21           004           Create order       2012-06-21     9999-12-31
2012-06-21           005           Create order       2012-06-21     2012-06-21
2012-06-21           005           Payment completed  2012-06-22     9999-12-31
2012-06-22           006           Create order       2012-06-22     9999-12-31

Notes:

1. dw_begin_date indicates the start of the record's life cycle and dw_end_date indicates the end of its life cycle;

2. dw_end_date = '9999-12-31' indicates that the record is currently valid;

3. To query all currently valid records: select * from order_his where dw_end_date = '9999-12-31';

4. To query the historical snapshot as of 2012-06-21: select * from order_his where dw_begin_date <= '2012-06-21' and dw_end_date >= '2012-06-21'; this statement returns the following records:

Order creation date  Order number  Order status       dw_begin_date  dw_end_date
2012-06-20           001           Payment completed  2012-06-21     2012-06-21
2012-06-20           002           Create order       2012-06-20     9999-12-31
2012-06-20           003           Payment completed  2012-06-20     9999-12-31
2012-06-21           004           Create order       2012-06-21     9999-12-31
2012-06-21           005           Create order       2012-06-21     2012-06-21

This is exactly consistent with the records in the source table on June 21.

As you can see, such a historical zipper table satisfies the need for historical data while saving storage resources to a great extent.

Zipper table example 2

In the history table there may be only a handful of records covering a person's whole life, which avoids the massive storage problem of recording the customer's status day by day:

Name of person  Start Date  End Date  Status
client          19000101    19070901  At home
client          19070901    19130901  A Elementary School
client          19130901    19160901  B Junior High School
client          19160901    19190901  C High School
client          19190901    19230901  D University
client          19230901    19601231  E Company
client          19601231    29991231  Retired at home

Note that in each record the end date is not counted: by 19070901, for example, the client is already at A Elementary School, no longer at home. So except for the last record, whose status has not changed so far, a record is no longer in its stated status on its end date; the interval counts the start date but not the end date.
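
To make the start-inclusive / end-exclusive convention concrete, here is a hedged query sketch. The table name person_his and its column names are assumptions for illustration; dates are assumed to be stored as comparable yyyymmdd strings, as shown above.

-- Which status was the client in on 1915-06-01?
-- start_date is inclusive and end_date is exclusive, so the test is
-- start_date <= day AND day < end_date.
SELECT name, status, start_date, end_date
FROM person_his
WHERE name = 'client'
  AND start_date <= '19150601'
  AND end_date   >  '19150601';
-- Expected to return only the 19130901-19160901 row: "B Junior High School".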

Zipper table implementation method

1. Define two temporary (volatile) tables: one for the current day's full data, the other for the data that needs to be inserted or updated;

CREATE VOLATILE TABLE VT_xxxx_NEW AS xxxx WITH NO DATA ON COMMIT PRESERVE ROWS;
CREATE VOLATILE SET TABLE VT_xxxx_CHG, NO LOG AS xxxx WITH NO DATA ON COMMIT PRESERVE ROWS;

2. Load the current day's full data;

INSERT INTO VT_xxxx_NEW (xx)
SELECT xx, cur_date, max_date FROM xxxx_source;

3. Extract the new or changed data from the xxxx_NEW temporary table into the xxxx_CHG temporary table;

INSERT INTO VT_xxxx_CHG (xx)
SELECT xx FROM VT_xxxx_NEW
WHERE (xx) NOT IN (SELECT xx FROM xxxx_HIS WHERE end_date = 'max_date');

4. Update the end_date of the now-expired records in the history table from the max value to the current day;

UPDATE A1
FROM xxxx_HIS A1, VT_xxxx_CHG A2
SET End_Date = 'current_date'
WHERE A1.xx = A2.xx AND A1.End_Date = 'max_date';

5. Insert the new or changed data into the target (history) table.

INSERT INTO xxxx_HIS SELECT * FROM VT_xxxx_CHG;

Take product data as an example

There is a product table t_product with the following structure:

Column name   Type         Description
goods_id      varchar(50)  Product number
goods_status  varchar(50)  Product status (pending review, for sale, on sale, deleted)
createtime    varchar(50)  Product creation date
modifytime    varchar(50)  Product modification date

The data on December 20, 2019 is as follows:

goods_id  goods_status    createtime  modifytime
001       Pending review  2019-12-18  2019-12-20
002       For sale        2019-12-19  2019-12-20
003       On sale         2019-12-20  2019-12-20
004       Deleted         2019-12-15  2019-12-20

The status of a product changes over time, and we need to keep a record of the full history of those changes.

Option 1: snapshot each day's data into the data warehouse

This option keeps a full copy every day and synchronizes all data to the data warehouse, so many records are saved over and over even though they have not changed.

December 20 (4 rows)

goods_id  goods_status    createtime  modifytime
001       Pending review  2019-12-18  2019-12-20
002       For sale        2019-12-19  2019-12-20
003       On sale         2019-12-20  2019-12-20
004       Deleted         2019-12-15  2019-12-20

December 21 (10 rows)

goods_id           goods_status                                createtime  modifytime
The following is the December 20 snapshot data
001                Pending review                              2019-12-18  2019-12-20
002                For sale                                    2019-12-19  2019-12-20
003                On sale                                     2019-12-20  2019-12-20
004                Deleted                                     2019-12-15  2019-12-20
The following is the December 21 snapshot data
001                For sale (from pending review to for sale)  2019-12-18  2019-12-21
002                For sale                                    2019-12-19  2019-12-20
003                On sale                                     2019-12-20  2019-12-20
004                Deleted                                     2019-12-15  2019-12-20
005 (new product)  Pending review                              2019-12-21  2019-12-21
006 (new product)  Pending review                              2019-12-21  2019-12-21

December 22 (18 rows)

goods_id  goods_status                                createtime  modifytime
The following is the December 20 snapshot data
001       Pending review                              2019-12-18  2019-12-20
002       For sale                                    2019-12-19  2019-12-20
003       On sale                                     2019-12-20  2019-12-20
004       Deleted                                     2019-12-15  2019-12-20
The following is the December 21 snapshot data
001       For sale (from pending review to for sale)  2019-12-18  2019-12-21
002       For sale                                    2019-12-19  2019-12-20
003       On sale                                     2019-12-20  2019-12-20
004       Deleted                                     2019-12-15  2019-12-20
005       Pending review                              2019-12-21  2019-12-21
006       Pending review                              2019-12-21  2019-12-21
The following is the December 22 snapshot data
001       For sale                                    2019-12-18  2019-12-21
002       For sale                                    2019-12-19  2019-12-20
003       Deleted (from on sale to deleted)           2019-12-20  2019-12-22
004       Deleted                                     2019-12-15  2019-12-20
005       Pending review                              2019-12-21  2019-12-21
006       Deleted (from pending review to deleted)    2019-12-21  2019-12-22
007       Pending review                              2019-12-22  2019-12-22
008       Pending review                              2019-12-22  2019-12-22

MySQL data warehouse code implementation

MySQL initialization

In MySQL, the lalian database and the product table serve as the raw data layer:

-- Create the database
create database if not exists lalian;
-- Create the product table
create table if not exists `lalian`.`t_product`(
  goods_id varchar(50),     -- Product number
  goods_status varchar(50), -- Product status
  createtime varchar(50),   -- Product creation time
  modifytime varchar(50)    -- Product modification time
);

Create the ods and dw layers in MySQL to simulate the data warehouse:

-- Create the product table in the ods layer
create table if not exists `lalian`.`ods_t_product`(
  goods_id varchar(50),     -- Product number
  goods_status varchar(50), -- Product status
  createtime varchar(50),   -- Product creation time
  modifytime varchar(50),   -- Product modification time
  cdat varchar(10)          -- Simulated hive partition
)default character set = 'utf8';
-- Create the product table in the dw layer
create table if not exists `lalian`.`dw_t_product`(
  goods_id varchar(50),     -- Product number
  goods_status varchar(50), -- Product status
  createtime varchar(50),   -- Product creation time
  modifytime varchar(50),   -- Product modification time
  cdat varchar(10)          -- Simulated hive partition
)default character set = 'utf8';

Incrementally import the December 20 data

Import the December 20 data (4 rows) into the raw data layer:

insert into `lalian`.`t_product`(goods_id, goods_status, createtime, modifytime) values
('001', '待审核', '2019-12-18', '2019-12-20'),
('002', '待售', '2019-12-19', '2019-12-20'),
('003', '在售', '2019-12-20', '2019-12-20'),
('004', '已删除', '2019-12-15', '2019-12-20');

Note: since MySQL is used here to simulate the data warehouse, the data is imported directly with insert into. In a company, Hive might be used as the warehouse, with Kettle, Sqoop or DataX used to synchronize the data.

# Import from the raw data layer into the ods layer
insert into lalian.ods_t_product
select *, '20191220' from lalian.t_product;
# Synchronize from ods to the dw layer
insert into lalian.dw_t_product
select * from lalian.ods_t_product where cdat = '20191220';

Check the result in the dw layer:

select * from lalian.dw_t_product where cdat='20191220';
goods_id goods_status createtime modifytime cdat
1 待审核 2019/12/18 2019/12/20 20191220
2 待售 2019/12/19 2019/12/20 20191220
3 在售 2019/12/20 2019/12/20 20191220
4 已删除 2019/12/15 2019/12/20 20191220

Incrementally import the December 21 data

Import the December 21 data into the raw data layer (6 rows in total afterwards):

UPDATE `lalian`.`t_product` SET goods_status = '待售', modifytime = '2019-12-21' WHERE goods_id = '001';
INSERT INTO `lalian`.`t_product`(goods_id, goods_status, createtime, modifytime) VALUES
('005', '待审核', '2019-12-21', '2019-12-21'),
('006', '待审核', '2019-12-21', '2019-12-21');

Import the data into the ods and dw layers:

# Import from the raw data layer into the ods layer
insert into lalian.ods_t_product
select *, '20191221' from lalian.t_product;
# Synchronize from ods to the dw layer
insert into lalian.dw_t_product
select * from lalian.ods_t_product where cdat = '20191221';

Check the result in the dw layer:

select * from lalian.dw_t_product where cdat='20191221';
goods_id goods_status createtime modifytime cdat
1 待售 2019/12/18 2019/12/21 20191221
2 待售 2019/12/19 2019/12/20 20191221
3 在售 2019/12/20 2019/12/20 20191221
4 已删除 2019/12/15 2019/12/20 20191221
5 待审核 2019/12/21 2019/12/21 20191221
6 待审核 2019/12/21 2019/12/21 20191221

Incrementally import the December 22 data

Import the December 22 data into the raw data layer (8 rows in total afterwards):

UPDATE `lalian`.`t_product` SET goods_status = '已删除', modifytime = '2019-12-22' WHERE goods_id = '003';
UPDATE `lalian`.`t_product` SET goods_status = '已删除', modifytime = '2019-12-22' WHERE goods_id = '006';
INSERT INTO `lalian`.`t_product`(goods_id, goods_status, createtime, modifytime) VALUES
('007', '待审核', '2019-12-22', '2019-12-22'),
('008', '待审核', '2019-12-22', '2019-12-22');

Import the data into the ods and dw layers:

# Import from the raw data layer into the ods layer
insert into lalian.ods_t_product
select *, '20191222' from lalian.t_product;
# Synchronize from ods to the dw layer
insert into lalian.dw_t_product
select * from lalian.ods_t_product where cdat = '20191222';

Check the result in the dw layer:

select * from lalian.dw_t_product where cdat='20191222';
goods_id goods_status createtime modifytime cdat
1 待售 2019/12/18 2019/12/21 20191222
2 待售 2019/12/19 2019/12/20 20191222
3 已删除 2019/12/20 2019/12/22 20191222
4 已删除 2019/12/15 2019/12/20 20191222
5 待审核 2019/12/21 2019/12/21 20191222
6 已删除 2019/12/21 2019/12/22 20191222
7 待审核 2019/12/22 2019/12/22 20191222
8 待审核 2019/12/22 2019/12/22 20191222

Check the full contents of the dw layer:

select * from lalian.dw_t_product;
goods_id goods_status createtime modifytime cdat
1 待审核 2019/12/18 2019/12/20 20191220
2 待售 2019/12/19 2019/12/20 20191220
3 在售 2019/12/20 2019/12/20 20191220
4 已删除 2019/12/15 2019/12/20 20191220
1 待售 2019/12/18 2019/12/21 20191221
2 待售 2019/12/19 2019/12/20 20191221
3 在售 2019/12/20 2019/12/20 20191221
4 已删除 2019/12/15 2019/12/20 20191221
5 待审核 2019/12/21 2019/12/21 20191221
6 待审核 2019/12/21 2019/12/21 20191221
1 待售 2019/12/18 2019/12/21 20191222
2 待售 2019/12/19 2019/12/20 20191222
3 已删除 2019/12/20 2019/12/22 20191222
4 已删除 2019/12/15 2019/12/20 20191222
5 待审核 2019/12/21 2019/12/21 20191222
6 已删除 2019/12/21 2019/12/22 20191222
7 待审核 2019/12/22 2019/12/22 20191222
8 待审核 2019/12/22 2019/12/22 20191222

From the case above we can see that the table keeps a full copy every day, and each full copy stores a large amount of unchanged information. When the data volume is large, this is a huge waste of storage. The table can instead be designed as a zipper table, which both reflects the historical state of the data and saves storage space to the greatest extent.

Option 2: use a zipper table to keep historical snapshots

A zipper table does not store redundant data; only rows that have changed need to be saved, so compared with full synchronization every time it saves storage space.

It can also answer historical snapshot queries.

Two extra columns (dw_start_date, dw_end_date) are added; they describe the life cycle of each data row.

Product zipper table data on December 20

The December 20 data is a brand-new data set imported into the dw table:

goods_id  goods_status    createtime  modifytime  dw_start_date  dw_end_date
001       Pending review  2019-12-18  2019-12-20  2019-12-20     9999-12-31
002       For sale        2019-12-19  2019-12-20  2019-12-20     9999-12-31
003       On sale         2019-12-20  2019-12-20  2019-12-20     9999-12-31
004       Deleted         2019-12-15  2019-12-20  2019-12-20     9999-12-31

  • dw_start_date indicates the start of a row's life cycle, i.e. the date from which the row is valid (the effective date).
  • dw_end_date indicates the end of a row's life cycle, i.e. the row is valid up to, but not including, that date (the expiration date).
  • dw_end_date = 9999-12-31 means the row is the current, latest version of the data and will not expire until 9999-12-31.

Product zipper table data on December 21

The zipper table stores no redundant data: as long as a row has not changed, it does not need to be synchronized again.

  • The status of product 001 changed (from pending review → for sale), so the dw_end_date of the original row must be changed from 9999-12-31 to 2019-12-21, meaning the pending-review state was valid from 2019-12-20 (inclusive) to 2019-12-21 (exclusive);
  • A new row with the new status is saved for 001: dw_start_date is 2019-12-21 and dw_end_date is 9999-12-31;
  • New products 005 and 006: dw_start_date is 2019-12-21 and dw_end_date is 9999-12-31.

goods_id       goods_status    createtime  modifytime  dw_start_date  dw_end_date
001            Pending review  2019-12-18  2019-12-20  2019-12-20     2019-12-21
002            For sale        2019-12-19  2019-12-20  2019-12-20     9999-12-31
003            On sale         2019-12-20  2019-12-20  2019-12-20     9999-12-31
004            Deleted         2019-12-15  2019-12-20  2019-12-20     9999-12-31
001 (changed)  For sale        2019-12-18  2019-12-21  2019-12-21     9999-12-31
005 (new)      Pending review  2019-12-21  2019-12-21  2019-12-21     9999-12-31
006 (new)      Pending review  2019-12-21  2019-12-21  2019-12-21     9999-12-31
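
For example, to restore the December 21 snapshot from such a zipper table, a query along these lines can be used. This is a sketch: dw_product_zip is a hypothetical name for the conceptual zipper table above, and dw_end_date is treated as exclusive, as described earlier.

-- Restore the snapshot as of 2019-12-21 (hypothetical table name dw_product_zip).
SELECT goods_id, goods_status, createtime, modifytime
FROM dw_product_zip
WHERE dw_start_date <= '2019-12-21'
  AND dw_end_date   >  '2019-12-21';

-- The currently valid rows are simply those whose dw_end_date is still the max value.
SELECT goods_id, goods_status
FROM dw_product_zip
WHERE dw_end_date = '9999-12-31';
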
Product zipper table data on December 22

goods_id       goods_status    createtime  modifytime  dw_start_date  dw_end_date
001            Pending review  2019-12-18  2019-12-20  2019-12-20     2019-12-21
002            For sale        2019-12-19  2019-12-20  2019-12-20     9999-12-31
003            On sale         2019-12-20  2019-12-20  2019-12-20     2019-12-22
004            Deleted         2019-12-15  2019-12-20  2019-12-20     9999-12-31
001            For sale        2019-12-18  2019-12-21  2019-12-21     9999-12-31
005            Pending review  2019-12-21  2019-12-21  2019-12-21     9999-12-31
006            Pending review  2019-12-21  2019-12-21  2019-12-21     9999-12-31
003 (changed)  Deleted         2019-12-20  2019-12-22  2019-12-22     9999-12-31
007 (new)      Pending review  2019-12-22  2019-12-22  2019-12-22     9999-12-31
008 (new)      Pending review  2019-12-22  2019-12-22  2019-12-22     9999-12-31

The zipper table stores no redundant data: as long as a row has not changed, it does not need to be synchronized again.

  • The status of product 003 changed (from on sale → deleted), so the dw_end_date of the original row must be changed from 9999-12-31 to 2019-12-22, meaning the on-sale state was valid from 2019-12-20 (inclusive) to 2019-12-22 (exclusive);
  • A new row with the new status is saved for 003: dw_start_date is 2019-12-22 and dw_end_date is 9999-12-31;
  • New products 007 and 008: dw_start_date is 2019-12-22 and dw_end_date is 9999-12-31.

MySQL implementation of the zipper table snapshots

Operation flow:

  1. Add the two extra columns to the existing dw-layer table
  2. Synchronize only the data modified on the current day to the ods layer
  3. Implement the zipper table algorithm
  4. The zipper table data is: the latest data of the day UNION ALL the historical data

Code implementation

In MySQL, the lalian database and the product table serve as the raw data layer:

-- Create the database
create database if not exists lalian;
-- Create the product table
create table if not exists `lalian`.`t_product2`(
  goods_id varchar(50),     -- Product number
  goods_status varchar(50), -- Product status
  createtime varchar(50),   -- Product creation time
  modifytime varchar(50)    -- Product modification time
)default character set = 'utf8';

Create the ods and dw layers in MySQL to simulate the data warehouse:

-- Create the product table in the ods layer
create table if not exists `lalian`.`ods_t_product2`(
  goods_id varchar(50),     -- Product number
  goods_status varchar(50), -- Product status
  createtime varchar(50),   -- Product creation time
  modifytime varchar(50),   -- Product modification time
  cdat varchar(10)          -- Simulated hive partition
)default character set = 'utf8';
-- Create the product table in the dw layer
create table if not exists `lalian`.`dw_t_product2`(
  goods_id varchar(50),      -- Product number
  goods_status varchar(50),  -- Product status
  createtime varchar(50),    -- Product creation time
  modifytime varchar(50),    -- Product modification time
  dw_start_date varchar(12), -- Effective date
  dw_end_date varchar(12),   -- Expiration date
  cdat varchar(10)           -- Simulated hive partition
)default character set = 'utf8';

Full import of the December 20, 2019 data

Import the December 20 data (4 rows) into the raw data layer:

insert into `lalian`.`t_product2`(goods_id, goods_status, createtime, modifytime) values
('001', '待审核', '2019-12-18', '2019-12-20'),
('002', '待售', '2019-12-19', '2019-12-20'),
('003', '在售', '2019-12-20', '2019-12-20'),
('004', '已删除', '2019-12-15', '2019-12-20');

Import the data into the ods layer of the data warehouse:

insert into lalian.ods_t_product2
select *, '20191220' from lalian.t_product2 where modifytime >= '2019-12-20';

Import the data from the ods layer into the dw layer:

insert into lalian.dw_t_product2
select goods_id, goods_status, createtime, modifytime, modifytime, '9999-12-31', cdat
from lalian.ods_t_product2 where cdat = '20191220';

Incremental import of the December 21, 2019 data

Import the December 21 data into the raw data layer (6 rows in total afterwards):

UPDATE `lalian`.`t_product2` SET goods_status = '待售', modifytime = '2019-12-21' WHERE goods_id = '001';
INSERT INTO `lalian`.`t_product2`(goods_id, goods_status, createtime, modifytime) VALUES
('005', '待审核', '2019-12-21', '2019-12-21'),
('006', '待审核', '2019-12-21', '2019-12-21');

Synchronize from the raw data layer to the ods layer:

insert into lalian.ods_t_product2
select *, '20191221' from lalian.t_product2 where modifytime >= '2019-12-21';

Write the ods-to-dw step that recomputes dw_end_date:

select t1.goods_id,
       t1.goods_status,
       t1.createtime,
       t1.modifytime,
       t1.dw_start_date,
       case when (t2.goods_id is not null and t1.dw_end_date > '2019-12-21')
            then '2019-12-21'
            else t1.dw_end_date
       end as dw_end_date,
       t1.cdat
from lalian.dw_t_product2 t1
left join (select * from lalian.ods_t_product2 where cdat = '20191221') t2
       on t1.goods_id = t2.goods_id
union
select goods_id, goods_status, createtime, modifytime, modifytime, '9999-12-31', cdat
from lalian.ods_t_product2
where cdat = '20191221';

The result is as follows:

goods_id goods_status createtime modifytime dw_start_date dw_end_date cdat
1 待审核 2019-12-18 2019-12-20 2019-12-20 2019-12-21 20191220
2 待售 2019-12-19 2019-12-20 2019-12-20 9999-12-31 20191220
3 在售 2019-12-20 2019-12-20 2019-12-20 9999-12-31 20191220
4 已删除 2019-12-15 2019-12-20 2019-12-20 9999-12-31 20191220
1 待售 2019-12-18 2019-12-21 2019-12-21 9999-12-31 20191221
5 待审核 2019-12-21 2019-12-21 2019-12-21 9999-12-31 20191221
6 待审核 2019-12-21 2019-12-21 2019-12-21 9999-12-31 20191221
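
The statement above only SELECTs the recomputed rows. To actually refresh dw_t_product2 in MySQL, one possible pattern, sketched here under the assumption that a helper staging table tmp_dw_t_product2 may be created, is to materialize the result and then swap it in:

-- Materialize the recomputed zipper data into a staging table, then swap it in.
-- tmp_dw_t_product2 is a hypothetical staging table with the same structure.
CREATE TABLE IF NOT EXISTS lalian.tmp_dw_t_product2 LIKE lalian.dw_t_product2;

TRUNCATE TABLE lalian.tmp_dw_t_product2;

INSERT INTO lalian.tmp_dw_t_product2
SELECT t1.goods_id,
       t1.goods_status,
       t1.createtime,
       t1.modifytime,
       t1.dw_start_date,
       CASE WHEN t2.goods_id IS NOT NULL AND t1.dw_end_date > '2019-12-21'
            THEN '2019-12-21'
            ELSE t1.dw_end_date
       END AS dw_end_date,
       t1.cdat
FROM lalian.dw_t_product2 t1
LEFT JOIN (SELECT * FROM lalian.ods_t_product2 WHERE cdat = '20191221') t2
       ON t1.goods_id = t2.goods_id
UNION
SELECT goods_id, goods_status, createtime, modifytime,
       modifytime, '9999-12-31', cdat
FROM lalian.ods_t_product2
WHERE cdat = '20191221';

-- Replace the old zipper data with the refreshed version.
TRUNCATE TABLE lalian.dw_t_product2;
INSERT INTO lalian.dw_t_product2 SELECT * FROM lalian.tmp_dw_t_product2;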

A historical zipper table both reflects the historical state of the data and saves storage to the greatest extent. When building a zipper table we need to decide on its granularity. For example, a zipper table may record only one state per day; that is, if there are three status changes in one day, we only keep the last one. A day-granularity table like this already solves most problems.



Statement:
This article is reproduced from csdn.net. If there is any infringement, please contact admin@php.cn to have it deleted.