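I have the following requirement.
We need to record page click data. An upstream service pushes the data into Redis;
how it gets there is opaque to us,
so we only need to care about how the data is stored in Redis.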
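The queries we need to support:
1. For a given day and page: the total number of clicks, i.e. total valid clicks plus total invalid clicks.
2. For a given day, page and resolution: the total number of valid clicks and the total number of invalid clicks.
3. For a given day, page and resolution: every coordinate point and its click count.
4. Box selection (effectively a range query): for a given day, page and resolution, the total valid and total invalid clicks of the points inside a range (for example 100<x<1000, 30<y<600).
In addition, valid and invalid click counts are needed broken down by various dimensions.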
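About valid and invalid clicks: in storage we can tell them apart with 0 and 1. How the front end defines valid or invalid is opaque to us.
About resolution: there are three, distinguished by width, for example 1380, 1190 and 1000. In the current implementation the resolution lets us cut the zsets into smaller groups: without it there might be 100,000 zset keys in total, whereas with it a single query covers at most one resolution, perhaps 30,000 zset keys.
About box selection: the user drags the mouse from top-left to bottom-right to draw a box on the page, and we query the data of all points inside that box (for example 100<x<1000, 30<y<600).
About dimensions: the region of the user who clicked the point, and the browser that user was using.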
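Points pushed by the upstream are processed before being stored in Redis.
Both x and y go through
Math.ceil(realx / 4.0) * 4;
Math.ceil(realy / 4.0) * 4;
which effectively merges every 4 points into one point
before it is stored in Redis.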
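For illustration, the rounding above could look like this in Python. This is only a minimal sketch of the same formula; the names `snap` and `GRID` are made up for the example and do not come from the real pipeline.

```python
import math

GRID = 4  # grid step, taken from the formula in the post


def snap(real: float, grid: int = GRID) -> int:
    """Round a raw pixel coordinate up to the next multiple of `grid`."""
    return math.ceil(real / grid) * grid


def snap_point(real_x: float, real_y: float) -> tuple:
    """Apply the same rounding to both axes of a click."""
    return snap(real_x), snap(real_y)
```

Since four consecutive raw values on each axis map to the same stored value, one stored point stands for a 4x4 block of raw pixels.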
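The current implementation uses zsets
to meet the requirement. One zset records the data of a given day, page and resolution:
key: date_pageid_resolution
member: validOrInvalid_browser_region
score: click count
Example:
key: 20140908_0001_1000
member: 0_1_1 (0 means an invalid click, the first 1 is QQ Browser in the browser table, the second 1 is Shanghai in the region table)
score: 10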
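The data of each coordinate point is also recorded in its own zset:
key: date_pageid_resolution_x_y
member: validOrInvalid_browser_region
score: click count
Example:
key: 20140908_0001_1000_23_478
member: 0_1_2 (0 means an invalid click, 1 is QQ Browser in the browser table, 2 is Beijing in the region table)
score: 12
This reads as: at the point (23,478), on 20140908, on the page whose pageid is 0001, at resolution 1000, users from Beijing using QQ Browser made 12 invalid clicks.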
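To make the two layouts concrete, here is a small Python sketch of how the keys and members could be built and parsed. The helper names (`page_key`, `point_key`, `Member`, and so on) are made up for this example; only the string formats come from the description above.

```python
from typing import NamedTuple


class Member(NamedTuple):
    valid: int    # 0 = invalid click, 1 = valid click
    browser: int  # id in the browser table
    region: int   # id in the region table


def page_key(date: str, page_id: str, width: int) -> str:
    """Key of the per-page zset: date_pageid_resolution."""
    return f"{date}_{page_id}_{width}"


def point_key(date: str, page_id: str, width: int, x: int, y: int) -> str:
    """Key of the per-point zset: date_pageid_resolution_x_y."""
    return f"{page_key(date, page_id, width)}_{x}_{y}"


def encode_member(member: Member) -> str:
    return f"{member.valid}_{member.browser}_{member.region}"


def decode_member(raw) -> Member:
    # A Redis client may hand back bytes instead of str.
    if isinstance(raw, bytes):
        raw = raw.decode()
    valid, browser, region = (int(part) for part in raw.split("_"))
    return Member(valid, browser, region)
```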
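Two more zsets assist the range query.
zrangebyscore on each of them returns the keys whose x and whose y fall inside the range (for example
100<x<1000, 30<y<600
); intersecting the two results gives the keys that really need to be queried.
The auxiliary zset for y:
key: date_pageid_resolution_y, e.g. 20140908_0001_1000_y
member: date_pageid_resolution_x_y, e.g. 20140908_0001_1000_23_478
score: the value of the y coordinate, e.g. 478
The auxiliary zset for x:
key: date_pageid_resolution_x, e.g. 20140908_0001_1000_x
member: date_pageid_resolution_x_y, e.g. 20140908_0001_1000_23_478
score: the value of the x coordinate, e.g. 23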
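The lookup through the two auxiliary zsets can be sketched in Python as follows. It assumes `client` is a redis-py style client exposing `zrangebyscore(name, min, max)`; the function name `box_point_keys` is made up for the example. A leading `(` on a bound is Redis syntax for an exclusive bound, which matches the strict inequalities 100<x<1000, 30<y<600.

```python
def box_point_keys(client, date, page_id, width, x_min, x_max, y_min, y_max):
    """Return the per-point keys whose x and y both lie strictly inside the box."""
    base = f"{date}_{page_id}_{width}"
    # Members of both auxiliary zsets are the per-point keys themselves.
    in_x = client.zrangebyscore(f"{base}_x", f"({x_min}", f"({x_max}")
    in_y = client.zrangebyscore(f"{base}_y", f"({y_min}", f"({y_max}")
    # A point is inside the box only if it appears in both results.
    return set(in_x) & set(in_y)
```

Note that each call returns every point in a full vertical or horizontal band, so the two intermediate lists can be much larger than the final intersection.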
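The problem: queries are too slow.
Example: to fetch all the points of a given day, page and resolution in one go,
I may need to look up tens of thousands of keys, e.g. keys("20140908_0001_1000_*");
After getting the key set, I still need zrange(key)
to get the members under each key, and then zscore(key,member)
to get the score for each key and member.
As you can see, this operation
runs serially and is hard to parallelize.
Temporary workaround: run it as an asynchronous task and cache the result to speed up the query,
but it may still cause Redis slow queries.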
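One way to cut the number of round trips in the step above, sketched in Python. It assumes `client` is a redis-py style client: `zrange(..., withscores=True)` returns members together with their scores, which removes the separate zscore call, and `pipeline(transaction=False)` sends many commands in one round trip. The function name `fetch_points` and the batch size are made up for the example.

```python
def fetch_points(client, keys, batch_size=500):
    """Return {key: [(member, score), ...]}, one network round trip per batch."""
    keys = list(keys)
    result = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        pipe = client.pipeline(transaction=False)
        for key in batch:
            # withscores=True returns (member, score) pairs in one reply.
            pipe.zrange(key, 0, -1, withscores=True)
        # execute() returns one reply per queued command, in order.
        for key, pairs in zip(batch, pipe.execute()):
            result[key] = pairs
    return result
```

The key list itself does not have to come from keys(): the members of the x auxiliary zset described above are already the per-point keys, so a zrange over it from 0 to -1 should give the same list without scanning the whole keyspace.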
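Box selection
Example: for a range such as 100<x<1000, 30<y<600,
use
zrangeByScore(key, 100, 1000) and zrangeByScore(key, 30, 600)
on the x and y auxiliary zsets to get the keys whose x and y fall in their respective ranges, then
intersect the two results
to get the final set of keys to query.
After getting the key set, I again need
zrange(key)
to get the members under each key,
and then
zscore(key,member)
to get the score for each key and member.
Drawback: the query range is arbitrary, so the result cannot be cached. When the range is large, i.e. when there are many keys, the query is slow. As with the coordinate query above,
it runs serially, is hard to parallelize, and may cause Redis slow queries.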
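Once the (member, score) pairs of the selected keys are in hand, the valid and invalid totals can be summed on the client. A minimal Python sketch, assuming the member format validOrInvalid_browser_region with 0 meaning invalid as in the examples above, and 1 meaning valid (an assumption); the function name `click_totals` is made up for the example.

```python
def click_totals(pairs):
    """Sum click counts by validity from an iterable of (member, score) pairs."""
    totals = {"valid": 0, "invalid": 0}
    for member, score in pairs:
        # A Redis client may hand back bytes instead of str.
        if isinstance(member, bytes):
            member = member.decode()
        flag = member.split("_", 1)[0]
        # 0 marks an invalid click; anything else is counted as valid.
        totals["invalid" if flag == "0" else "valid"] += int(score)
    return totals
```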
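I would like to know whether anyone has a better optimization strategy
for my current implementation,
or a better design for these query requirements.
This is my first post as a newcomer. Thanks to @暗雨西喧
for the reminder about formatting.
Any advice is welcome.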
You say that querying many keys is slow. Do you mean the last step, where the zsets holding the actual clicks are queried?
Is the number of resolutions known? You could drop the resolution from the zset key and put it in the value instead. That removes a lot of keys. If a search has a resolution condition, filter on it after fetching the values; that should be very fast.
As for letting users draw a search area by hand: could you change this condition so that the whole page is cut into 10 parts (or 100, or 10,000), each part being a square, and a search can only select whole squares rather than an arbitrary box? Then the data in each square can be summarized in advance.
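The fixed-square idea can be sketched in Python like this. Everything here is an assumption made for illustration: the cell size of 100, the function names, and the input format of (x, y, valid, invalid) tuples. The sketch works on in-memory data only and leaves out how the per-square totals would be stored in Redis.

```python
from collections import defaultdict

CELL = 100  # hypothetical side length of one square, in stored coordinates


def cell_of(x, y, cell=CELL):
    """Map a point to the (column, row) of the square containing it."""
    return x // cell, y // cell


def summarize(points, cell=CELL):
    """Pre-aggregate (x, y, valid, invalid) tuples into per-square totals."""
    cells = defaultdict(lambda: [0, 0])
    for x, y, valid, invalid in points:
        totals = cells[cell_of(x, y, cell)]
        totals[0] += valid
        totals[1] += invalid
    return cells


def query_cells(cells, col_min, col_max, row_min, row_max):
    """Sum the totals of every square in an inclusive block of squares."""
    valid = invalid = 0
    for col in range(col_min, col_max + 1):
        for row in range(row_min, row_max + 1):
            v, i = cells.get((col, row), (0, 0))
            valid += v
            invalid += i
    return valid, invalid
```

A query then touches one entry per selected square instead of one entry per point, which is what makes the cost predictable.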
Let's start with these and see whether they help. If you still need to optimize, please revise the query descriptions in the question. Some parts I can fill in by guessing, but I cannot tell whether that is what you mean, so for now I have only given the simpler suggestions. Write the examples out in detail and format the post; as it stands it is very tiring to read.
I am writing this part separately. This is my response after you revised the question.
First of all, you are not using the essence of a zset, which is that it keeps its members automatically sorted and indexed by score. You also do not seem to have understood what I meant above by putting the resolution in the value, so let me give you an example.
Suppose there are 3 resolutions: A, B, C
Saving the key as you said will look like this
20140908_0001_A
20140908_0001_B
20140908_0001_C
The storage method I was talking about is
key:20140908_0001
member:valid OR invalid_browser_region_number of clicks
score:resolution
Searching this way, you only need to fetch page 0001 of the day 20140908 (just 1 key), then range by score to resolution A and read its members. This is not easy to use, though, because it does not really take advantage of the resolution, and using a zset this way has problems.
The above is only an example, so do not actually do this. There is a better way: after you revised the question and I understood the requirements, I came up with a new approach.
zset:data set
key:date-page-resolution
score: coordinates (think about turning x and y into a number)
member: browser-region-number of valid clicks-number of invalid clicks
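One possible way to turn x and y into a single number for the score, sketched in Python. The stride of 100000 is an assumed upper bound on y, and all names here are made up for the example. Redis stores scores as double-precision floats, which represent integers exactly up to 2^53, so values of this size stay exact.

```python
STRIDE = 100_000  # assumed upper bound on y; not from the original post


def encode_xy(x, y, stride=STRIDE):
    """Pack (x, y) into one integer so that scores sort by x, then by y."""
    if not 0 <= y < stride:
        raise ValueError("y must be in [0, stride)")
    return x * stride + y


def decode_xy(score, stride=STRIDE):
    """Inverse of encode_xy; returns (x, y)."""
    return divmod(int(score), stride)


def x_band(x_min, x_max, stride=STRIDE):
    """Inclusive score bounds covering every point with x_min < x < x_max."""
    return encode_xy(x_min + 1, 0, stride), encode_xy(x_max - 1, stride - 1, stride)


def in_box(score, x_min, x_max, y_min, y_max, stride=STRIDE):
    """Client-side check that a decoded score lies strictly inside the box."""
    x, y = decode_xy(score, stride)
    return x_min < x < x_max and y_min < y < y_max
```

A single score range over this encoding selects a band of x values, not a rectangle, so a box query would range by `x_band` and then filter the returned scores with `in_box`.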
If the date can be an optional range, one more set is needed just to store the dates. Call it the date set:
key:page
score:date
member:data set key
The purpose of the date set is to index the data set keys. Your approach of calling keys() is slow because it scans the whole keyspace. Your example is for a single day, so I take it there may be no date range, in which case the date set is unnecessary. Similarly, if there are too many resolutions to enumerate, you can build a set of keys for them in the same way.
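A date set along these lines could be maintained as in the Python sketch below. It assumes `client` is a redis-py style client (version 3 or later, where `zadd` takes a mapping of member to score); the `dates_` key prefix and both function names are made up for the example. Dates written as YYYYMMDD sort chronologically when read as integers.

```python
def register_data_key(client, page_id, date, data_key):
    """Record a data set key in the page's date set, scored by its date."""
    client.zadd(f"dates_{page_id}", {data_key: int(date)})


def data_keys_between(client, page_id, first_date, last_date):
    """Return the data set keys whose date lies in the inclusive range."""
    return client.zrangebyscore(f"dates_{page_id}", int(first_date), int(last_date))
```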
Then there are your two coordinate zsets. I did not look at them carefully; think through how you use zsets there.
You gave 4 query examples. Taking them in order:
A: You said there are 3 resolutions, so append each of the 3 resolutions to the key and fetch everything with range 0 to -1
20150415-page1-1380,20150415-page1-1190,20150415-page1-1000
B: This one is easy. Query just one key and fetch everything with range 0 to -1
20150415-page1-1380
C: Same as the first two. They already return the coordinates as well; you just were not showing them
D: Use your coordinate sets to get the key, then range the data set by coordinate
After I finished writing, I noticed a small problem while checking for typos. Do you need to record the valid and invalid counts for every browser in every region? If not, the member in the data set can record just the valid and invalid counts. If so, the design has to take into account how many regions and browsers there are, which your question does not seem to describe.
Maybe my understanding of Redis is different from the asker's. The way I would approach the requirements above is probably:
write logs and move the data with ETL,
then make the result available for querying.