aggregates.rst
Lines changed: 26 additions & 69 deletions
@@ -14,7 +14,6 @@ Below you'll find a description of all the aggregates that PipelineDB supports.
14
14
.. note:: For the aggregates that have PostgreSQL and PostGIS equivalents, it may be helpful for you to consult the excellent `PostgreSQL aggregates`_ or `PostGIS aggregates`_ documentation.
@@ -44,41 +43,41 @@ See :ref:`bloom-funcs` for functionality that can be used to manipulate Bloom fi
44
43
45
44
.. _cmsketch-aggs:
46
45
47
-
Count-Min Sketch Aggregates
46
+
Frequency Tracking Aggregates
48
47
-----------------------------
49
48
50
-
**cmsketch_agg ( expression )**
49
+
**freq_agg ( expression )**
51
50
52
-
Adds all input values to a :ref:`count-min-sketch`.
51
+
Adds all input values to an internal :ref:`count-min-sketch`, enabling efficient online computation of the frequency of each input expression.
53
52
54
-
**cmsketch_agg ( expression, epsilon, p )**
53
+
**freq_agg ( expression, epsilon, p )**
55
54
56
55
Same as above, but accepts **epsilon** and **p** as parameters for the underlying **cmsketch**. **epsilon** determines the acceptable error rate of the **cmsketch**, and defaults to **0.002** (0.2%). **p** determines the confidence, and defaults to **0.995** (99.5%). A higher **epsilon** (more tolerated error) and a lower **p** will result in smaller **cmsketch** structures, and vice versa.
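As a minimal sketch of how the two forms might be used in a continuous view (the stream :code:`page_views`, its :code:`url` column, and the view names are hypothetical and chosen only for illustration):

.. code-block:: sql

  -- Hypothetical stream and column names; default epsilon and p are used.
  CREATE CONTINUOUS VIEW url_freqs AS
    SELECT freq_agg(url::text) FROM page_views;

  -- Same aggregate with an explicit epsilon (0.001) and p (0.999).
  CREATE CONTINUOUS VIEW url_freqs_tight AS
    SELECT freq_agg(url::text, 0.001, 0.999) FROM page_views;

The second view asks for a tighter error bound and higher confidence than the defaults, at the cost of a larger underlying sketch.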
57
56
58
-
**cmsketch_merge_agg ( count-min sketch )**
57
+
**freq_merge_agg ( count-min sketch )**
59
58
60
59
Merges all input Count-min sketches into a single one containing all of the information of the input Count-min sketches.
61
60
62
61
See :ref:`cmsketch-funcs` for functionality that can be used to manipulate Count-Min sketches.
63
62
64
-
.. _fss-aggs:
63
+
.. _topk-aggs:
65
64
66
-
Filtered-Space Saving Aggregates
65
+
Top-K Aggregates
67
66
--------------------------------------
68
67
69
-
**fss_agg ( expression , k )**
68
+
**topk_agg ( expression , k )**
70
69
71
-
Adds all input values to a :ref:`fss` data structure sized for the given k, incrementing each value's count by 1 each time it is added.
70
+
Tracks the top k input expressions by adding all input values to a :ref:`topk` data structure sized for the given **k**, incrementing each value's count by **1** each time it is added.
72
71
73
-
**fss_agg_weighted (expression, k, weight )**
72
+
**topk_agg (expression, k, weight )**
74
73
75
-
Adds all input values to an FSS sized for the given k, incrementing each value's count by the given weight each time it is added.
74
+
Same as above, but associates the given weight to the input expression (rather than a default weight of 1).
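A minimal sketch of both forms, assuming a hypothetical stream :code:`page_views` with :code:`url` and :code:`bytes` columns, and assuming the weight can be given as an integer expression:

.. code-block:: sql

  -- Track the 10 most frequently seen URLs; each occurrence counts as 1.
  CREATE CONTINUOUS VIEW top_urls AS
    SELECT topk_agg(url::text, 10) FROM page_views;

  -- Track the top 10 URLs weighted by bytes served rather than by hit count.
  -- The integer cast on the weight is an assumption, not taken from the docs.
  CREATE CONTINUOUS VIEW top_urls_by_bytes AS
    SELECT topk_agg(url::text, 10, bytes::integer) FROM page_views;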
76
75
77
-
**fss_merge_agg ( fss )**
76
+
**topk_merge_agg ( topk )**
78
77
79
-
Merges all FSS inputs into a single FSS.
78
+
Merges all **topk** inputs into a single **topk** data structure.
80
79
81
-
See :ref:`fss-funcs` for functionality that can be used to manipulate Filtered-Space Saving objects.
80
+
See :ref:`topk-funcs` for functionality that can be used to manipulate **topk** objects.
82
81
83
82
.. _hll-aggs:
84
83
@@ -101,18 +100,18 @@ See :ref:`hll-funcs` for functionality that can be used to manipulate HyperLogLo
101
100
102
101
.. _tdigest-aggs:
103
102
104
-
T-Digest Aggregates
103
+
Distribution Aggregates
105
104
-------------------------------
106
105
107
-
**tdigest_agg ( expression )**
106
+
**dist_agg ( expression )**
108
107
109
-
Adds all input values to a :ref:`t-digest`.
108
+
Adds all input values to a :ref:`t-digest` in order to track the distribution of all input expressions.
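For instance, a continuous view that tracks the distribution of request latencies might look like the following sketch (the stream :code:`requests`, its :code:`latency` column, and the view name are hypothetical):

.. code-block:: sql

  -- Hypothetical stream and column names, used only for illustration.
  CREATE CONTINUOUS VIEW latency_dist AS
    SELECT dist_agg(latency::float8) FROM requests;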
110
109
111
-
**tidgest_merge_agg ( tdigest )**
110
+
**dist_merge_agg ( tdigest )**
112
111
113
-
Merges all input T-Digest's into a single one representing all of the information contained in the input T-Digests.
112
+
Merges all input **tdigests** into a single one representing all of the information contained in the input **tdigests**.
114
113
115
-
See :ref:`tdigest-funcs` for functionality that can be used to manipulate T-Digest objects.
114
+
See :ref:`tdigest-funcs` for functionality that can be used to manipulate **tdigest** objects.
116
115
117
116
.. _misc-aggs:
118
117
@@ -215,48 +214,6 @@ Let's look at an example:
215
214
216
215
------------------------------
217
216
218
-
CREATE AGGREGATE
219
-
-------------------
220
-
221
-
In addition to PipelineDB's built-in aggregates, user-defined aggregates also work with continuous views. User-defined combinable aggregates can be created with PostgreSQL's `CREATE AGGREGATE`_ command. To make an aggregate combinable, a **combinefunc** must be given. **combineinfunc** and **transoutfunc** are optional:
A function that takes two transition states and returns a single transition state. For example, here's an example of a combine function for an integer :code:`avg` implementation:
238
-
239
-
.. code-block:: pipeline
240
-
241
-
CREATE FUNCTION avg_combine(state integer[], incoming integer[]) RETURNS integer[] AS $$
The transition state is represented as a 2-element array containing the number of elements and their sum, which can be used to compute a final.
249
-
250
-
**combineinfunc ( any )**
251
-
252
-
A function that deserializes the aggregate's transition state from an external to internal representation. **Deserialization is only necessary when the transition state type is not a native type.**
253
-
254
-
**transoutfunc ( stype )**
255
-
256
-
A function that serializes the aggregate's transition state from an internal to external representation that can be stored in a table cell. **Serialization is only necessary when the transition state type is not a native type.**
257
-
258
-
------------------------------
259
-
260
217
General Aggregates
261
218
----------------------
262
219
@@ -418,21 +375,21 @@ Ordered-set Aggregates
418
375
419
376
**ordered-set** aggregates apply ordering to their input in order to obtain their results, so they use the :code:`WITHIN GROUP` clause. Its syntax is as follows:
420
377
421
-
.. code-block:: pipeline
378
+
.. code-block:: sql
422
379
423
380
aggregate_name ( [ expression [ , ... ] ] ) WITHIN GROUP ( order_by_clause )
424
381
425
382
Let's look at a couple of examples.
426
383
427
384
Compute the 99th percentile of **value**:
428
385
429
-
.. code-block:: pipeline
386
+
.. code-block:: sql
430
387
431
388
SELECT percentile_cont(0.99) WITHIN GROUP (ORDER BY value) FROM some_table;
432
389
433
390
Or with a continuous view:
434
391
435
-
.. code-block:: pipeline
392
+
.. code-block:: sql
436
393
437
394
CREATE CONTINUOUS VIEW percentile AS
438
395
SELECT percentile_cont(0.99) WITHIN GROUP (ORDER BY value::float8)
@@ -459,13 +416,13 @@ Hypothetical-set Aggregates
459
416
460
417
The hypothetical-set aggregates use the :code:`WITHIN GROUP` clause to define the input rows. Its syntax is as follows:
461
418
462
-
.. code-block:: pipeline
419
+
.. code-block:: sql
463
420
464
421
aggregate_name ( [ expression [ , ... ] ] ) WITHIN GROUP ( order_by_clause )
465
422
466
423
Here is an example of a hypothetical-set aggregate being used by a continuous view:
467
424
468
-
.. code-block:: pipeline
425
+
.. code-block:: sql
469
426
470
427
CREATE CONTINUOUS VIEW continuous_rank AS
471
428
SELECT rank(42) WITHIN GROUP (ORDER BY value::float8)
@@ -510,6 +467,6 @@ Unsupported Aggregates
510
467
511
468
:(
512
469
513
-
**aggregate_name (DISTINCT expression)**
470
+
**<aggregate_name> (DISTINCT expression)**
514
471
515
472
Only the :code:`count` aggregate function is supported with a :code:`DISTINCT` expression as noted above in the General Aggregates section. In future releases, we might leverage :ref:`bloom-filter` to allow :code:`DISTINCT` expressions for all aggregate functions.
Since PipelineDB objects are represented by standard PostgreSQL objects, backups can be taken using PostgreSQL's `pg_dump`_ and `pg_dumpall`_ tools. Other PostgreSQL backup and restore tooling will work as well, since a PipelineDB database is just a regular PostgreSQL database.
78
7
79
8
Exporting Specific Continuous Views
80
9
-----------------------------------------
@@ -88,11 +17,12 @@ To export a single continuous view, both the continuous view and its associated
88
17
Restoring Continuous Views
89
18
-------------------------------
90
19
91
-
To restore a backup taken with :code:`pipeline-dump`, simply pass its output to the :code:`pipeline` client:
20
+
To restore a backup taken with `pg_dump`_, simply pass its output to the :code:`psql` client: