digoal
2020-12-24
PostgreSQL , multi range , array range , agg , 物联网 , index , gin , gist , sp-gist , 多树 , 全段gist gap
PG一直在进化, 从原子到夸克, 从range到multirange. PostgreSQL 14 新增数据类型multi range, 在一个value中表达多个range 分段, 有点类似range array, 但又不是array.
应用场景想象: 表达火焰图, 表达物联网传感器在一个周期内的波动范围(分段), 表达学习成绩的波动, ... 具体的场景有待大家想象.
目前仅支持类型, gist索引还在路上. 因为一个value中表达了多个分段, 所以索引可能会分为两个阶段来实现:
第一阶段: 将multi range的值扩展到一个bound, 类似几何或多维类型, 使用bound来作为被索引value的放大后的边界. multi range也可以这么做, 在搜索时, 先按大的bound range来搜索, 然后再回表recheck.
例子:
multirange : {[1,10] [20,25]} 的索引中可能存储的是[1,25], 回表再进行recheck.
第二阶段: 类似gin索引, 一个multi range的value在index中存在多个index item, 每个index item都指向对应的heap ctid.
例子:
multirange : row 1 {[1,10] [20,25]} 的索引中分别包含[1,10]和[20,25]两个index item, 分别都指向row 1.
https://git.postgresql.org/pg/commitdiff/6df7a9698bb036610c1e8c6d375e1be38cb26d5f
Multirange datatypes
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.
Since v14, each range type automatically gets a corresponding multirange
datatype. There are both manual and automatic mechanisms for naming multirange
types. Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE. Otherwise, a multirange type name is generated
automatically. If the range type name contains "range" then we change that to
"multirange". Otherwise, we add "_multirange" to the end.
Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids. Altogether this format allows fetching a particular range by its index
in O(n).
Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.
Catversion is bumped.
Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov
以下是depesz的测试
https://www.depesz.com/2020/12/21/waiting-for-postgresql-14-multirange-datatypes/
Unfortunately, at the time of writing this blogpost, I couldn't make gin/gist indexes on multiranges, but I would assume these will happen eventually. Or maybe I was just doing something wrong.
I mean – I can, sure, create btree index, and use it for queries like:
SELECT * FROM test
WHERE ranges = '{[77.7909859996235,177.7909859996235],(1035.84122266822,1135.84122266822],(1000099.99954803,1000199.99954803]}';
but the cooler operators (like @>
) are not indexable, yet.
下面是Alexander Korotkov针对索引计划的回复, 在v14的feature冻结之前, 可能还有机会增加更多的op或者第一阶段的索引支持.
Hubert, thank you for your post.
Committed work implements basic infrastructure for multiranges. Index support will be added later. We still have time before v14 feature freeze. So, I think it’s feasible to commit something in this fields.
The first step is to make range indexes support matching with multiranges. That is support queries like “range_col opr multirange_const”.
The second step is to index multiranges themselves. In the first implementation, we can approximate multiranges as union ranges with no gaps. That would allow to re-use existing indexing schemes for ranges. Ideally, we need to teach gist/spgist to have multiple index tuples per heap tuple (and become gin-like). But that requires much more work and definitely not a subject for v14.
您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.