Commit 8fcaa58

1.0.0 docs
1 parent 818107c commit 8fcaa58

20 files changed, +415 -721 lines changed

aggregates.rst

Lines changed: 26 additions & 69 deletions
@@ -14,7 +14,6 @@ Below you'll find a description of all the aggregates that PipelineDB supports.
 .. note:: For the aggregates that have PostgreSQL and PostGIS equivalents, it may be helpful for you to consult the excellent `PostgreSQL aggregates`_ or `PostGIS aggregates`_ documentation.

 .. _`PostgreSQL aggregates`: http://www.postgresql.org/docs/current/static/functions-aggregate.html
-
 .. _`PostGIS aggregates`: http://postgis.net/docs/manual-1.4/ch08.html#PostGIS_Aggregate_Functions

 ----------------------------

@@ -44,41 +43,41 @@ See :ref:`bloom-funcs` for functionality that can be used to manipulate Bloom fi

 .. _cmsketch-aggs:

-Count-Min Sketch Aggregates
+Frequency Tracking Aggregates
 -----------------------------

-**cmsketch_agg ( expression )**
+**freq_agg ( expression )**

-Adds all input values to a :ref:`count-min-sketch`.
+Adds all input values to an internal :ref:`count-min-sketch`, enabling efficient online computation of the frequency of each input expression.

-**cmsketch_agg ( expression, epsilon, p )**
+**freq_agg ( expression, epsilon, p )**

 Same as above, but accepts **epsilon** and **p** as parameters for the underlying **cmsketch**. **epsilon** determines the acceptable error rate of the **cmsketch**, and defaults to **0.002** (0.2%). **p** determines the confidence, and defaults to **0.995** (99.5%). Lower **epsilon** and **p** will result in smaller **cmsketch** structures, and vice versa.

-**cmsketch_merge_agg ( count-min sketch )**
+**freq_merge_agg ( count-min sketch )**

 Merges all input Count-min sketches into a single one containing all of the information of the input Count-min sketches.

 See :ref:`cmsketch-funcs` for functionality that can be used to manipulate Count-Min sketches.

-.. _fss-aggs:
+.. _topk-aggs:

-Filtered-Space Saving Aggregates
+Top-K Aggregates
 --------------------------------------

-**fss_agg ( expression , k )**
+**topk_agg ( expression , k )**

-Adds all input values to a :ref:`fss` data structure sized for the given k, incrementing each value's count by 1 each time it is added.
+Tracks the top k input expressions by adding all input values to a :ref:`topk` data structure sized for the given **k**, incrementing each value's count by **1** each time it is added.

-**fss_agg_weighted ( expression, k, weight )**
+**topk_agg ( expression, k, weight )**

-Adds all input values to an FSS sized for the given k, incrementing each value's count by the given weight each time it is added.
+Same as above, but associates the given weight with the input expression (rather than a default weight of 1).

-**fss_merge_agg ( fss )**
+**topk_merge_agg ( topk )**

-Merges all FSS inputs into a single FSS.
+Merges all **topk** inputs into a single **topk** data structure.

-See :ref:`fss-funcs` for functionality that can be used to manipulate Filtered-Space Saving objects.
+See :ref:`topk-funcs` for functionality that can be used to manipulate **topk** objects.

 .. _hll-aggs:
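For readers following this rename, here is a hedged usage sketch of the new aggregates in a continuous view; the stream name ``page_views`` and column ``url`` are assumptions for illustration, not part of this commit:

.. code-block:: sql

    -- Track per-URL frequencies and the 10 most frequent URLs.
    CREATE CONTINUOUS VIEW url_stats AS
      SELECT freq_agg(url), topk_agg(url, 10)
      FROM page_views;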

@@ -101,18 +100,18 @@ See :ref:`hll-funcs` for functionality that can be used to manipulate HyperLogLo

 .. _tdigest-aggs:

-T-Digest Aggregates
+Distribution Aggregates
 -------------------------------

-**tdigest_agg ( expression )**
+**dist_agg ( expression )**

-Adds all input values to a :ref:`t-digest`.
+Adds all input values to a :ref:`t-digest` in order to track the distribution of all input expressions.

-**tidgest_merge_agg ( tdigest )**
+**dist_merge_agg ( tdigest )**

-Merges all input T-Digest's into a single one representing all of the information contained in the input T-Digests.
+Merges all input **tdigests** into a single one representing all of the information contained in the input **tdigests**.

-See :ref:`tdigest-funcs` for functionality that can be used to manipulate T-Digest objects.
+See :ref:`tdigest-funcs` for functionality that can be used to manipulate **tdigest** objects.

 .. _misc-aggs:
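A hedged sketch of the renamed distribution aggregate in use; the stream name ``requests`` and column ``latency`` are assumptions for illustration:

.. code-block:: sql

    -- Maintain an online t-digest of request latencies.
    CREATE CONTINUOUS VIEW latency_dist AS
      SELECT dist_agg(latency::float8)
      FROM requests;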

@@ -215,48 +214,6 @@ Let's look at an example:

 ------------------------------

-CREATE AGGREGATE
--------------------
-
-In addition to PipelineDB's built-in aggregates, user-defined aggregates also work with continuous views. User-defined combinable aggregates can be created with PostgreSQL's `CREATE AGGREGATE`_ command. To make an aggregate combinable, a **combinefunc** must be given. **combineinfunc** and **transoutfunc** are optional:
-
-.. code-block:: pipeline
-
-    CREATE AGGREGATE name ( [ argmode ] [ argname ] arg_data_type [ , ... ] ) (
-        ...
-        COMBINEFUNC = combinefunc
-        [ , COMBINEINFUNC = combineinfunc ]
-        [ , TRANSOUTFUNC = transoutfunc ]
-    )
-
-.. _CREATE AGGREGATE: http://www.postgresql.org/docs/current/static/sql-createaggregate.html
-
-**combinefunc ( stype, stype )**
-
-A function that takes two transition states and returns a single transition state. For example, here is a combine function for an integer :code:`avg` implementation:
-
-.. code-block:: pipeline
-
-    CREATE FUNCTION avg_combine(state integer[], incoming integer[]) RETURNS integer[] AS $$
-    BEGIN
-      RETURN ARRAY[state[1] + incoming[1], state[2] + incoming[2]];
-    END;
-    $$
-    LANGUAGE plpgsql;
-
-The transition state is represented as a 2-element array containing the number of elements and their sum, which can be used to compute a final average.
-
-**combineinfunc ( any )**
-
-A function that deserializes the aggregate's transition state from an external to an internal representation. **Deserialization is only necessary when the transition state type is not a native type.**
-
-**transoutfunc ( stype )**
-
-A function that serializes the aggregate's transition state from an internal to an external representation that can be stored in a table cell. **Serialization is only necessary when the transition state type is not a native type.**
-
-------------------------------

 General Aggregates
 ----------------------
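The removed **combinefunc** example keeps its transition state as ``[count, sum]``, so a final average divides the second element by the first. A minimal sketch of such a final function; the name ``avg_final`` is hypothetical:

.. code-block:: sql

    CREATE FUNCTION avg_final(state integer[]) RETURNS float8 AS $$
    BEGIN
      -- state[1] holds the element count, state[2] their sum
      RETURN state[2]::float8 / state[1];
    END;
    $$
    LANGUAGE plpgsql;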

@@ -418,21 +375,21 @@ Ordered-set Aggregates

 **ordered-set** aggregates apply ordering to their input in order to obtain their results, so they use the :code:`WITHIN GROUP` clause. Its syntax is as follows:

-.. code-block:: pipeline
+.. code-block:: sql

     aggregate_name ( [ expression [ , ... ] ] ) WITHIN GROUP ( order_by_clause )

 Let's look at a couple of examples.

 Compute the 99th percentile of **value**:

-.. code-block:: pipeline
+.. code-block:: sql

     SELECT percentile_cont(0.99) WITHIN GROUP (ORDER BY value) FROM some_table;

 Or with a continuous view:

-.. code-block:: pipeline
+.. code-block:: sql

     CREATE CONTINUOUS VIEW percentile AS
       SELECT percentile_cont(0.99) WITHIN GROUP (ORDER BY value::float8)
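As an aside, PostgreSQL's ``percentile_cont`` also accepts an array of fractions, which is handy for tracking several percentiles with one aggregate call; a sketch against the same (assumed) ``some_table``:

.. code-block:: sql

    -- Returns an array of the median, 90th, and 99th percentiles.
    SELECT percentile_cont(ARRAY[0.5, 0.90, 0.99])
      WITHIN GROUP (ORDER BY value) FROM some_table;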
@@ -459,13 +416,13 @@ Hypothetical-set Aggregates

 The hypothetical-set aggregates use the :code:`WITHIN GROUP` clause to define the input rows. Its syntax is as follows:

-.. code-block:: pipeline
+.. code-block:: sql

     aggregate_name ( [ expression [ , ... ] ] ) WITHIN GROUP ( order_by_clause )

 Here is an example of a hypothetical-set aggregate being used by a continuous view:

-.. code-block:: pipeline
+.. code-block:: sql

     CREATE CONTINUOUS VIEW continuous_rank AS
       SELECT rank(42) WITHIN GROUP (ORDER BY value::float8)
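PostgreSQL's other hypothetical-set aggregates (``dense_rank``, ``percent_rank``, ``cume_dist``) follow the same shape; a hedged variant of the above, with the stream name ``some_stream`` assumed:

.. code-block:: sql

    -- Rank a hypothetical value of 42 against all seen values, ignoring gaps.
    CREATE CONTINUOUS VIEW continuous_dense_rank AS
      SELECT dense_rank(42) WITHIN GROUP (ORDER BY value::float8)
      FROM some_stream;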
@@ -510,6 +467,6 @@ Unsupported Aggregates

 :(

-**aggregate_name (DISTINCT expression)**
+**<aggregate_name> (DISTINCT expression)**

 Only the :code:`count` aggregate function is supported with a :code:`DISTINCT` expression as noted above in the General Aggregates section. In future releases, we might leverage :ref:`bloom-filter` to allow :code:`DISTINCT` expressions for all aggregate functions.
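Since only ``count`` supports ``DISTINCT``, here is a hedged example of the one supported form; the stream ``events`` and column ``user_id`` are assumptions for illustration:

.. code-block:: sql

    -- Count distinct users seen in the stream.
    CREATE CONTINUOUS VIEW uniques AS
      SELECT count(DISTINCT user_id) FROM events;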

architecture.rst

Lines changed: 0 additions & 29 deletions
This file was deleted.

backups.rst

Lines changed: 6 additions & 76 deletions
@@ -3,78 +3,7 @@
 Backups
 ==============

-PipelineDB backups can be taken using the :code:`pipeline-dump` or :code:`pipeline-dumpall` tools. They each work identically to
-PostgreSQL's analogous pg_dump_ and pg_dumpall_ tools, with an added capability to export continuous views.
-
-.. _pg_dump: http://www.postgresql.org/docs/current/static/app-pgdump.html
-.. _pg_dumpall: http://www.postgresql.org/docs/current/static/app-pg-dumpall.html
-
-Its usage is as follows:
-
-.. code-block:: pipeline
-
-    Usage:
-      pipeline-dump [OPTION]... [DBNAME]
-
-    General options:
-      -f, --file=FILENAME          output file or directory name
-      -F, --format=c|d|t|p         output file format (custom, directory, tar,
-                                   plain text (default))
-      -j, --jobs=NUM               use this many parallel jobs to dump
-      -v, --verbose                verbose mode
-      -V, --version                output version information, then exit
-      -Z, --compress=0-9           compression level for compressed formats
-      --lock-wait-timeout=TIMEOUT  fail after waiting TIMEOUT for a table lock
-      -?, --help                   show this help, then exit
-
-    Options controlling the output content:
-      -a, --data-only              dump only the data, not the schema
-      -b, --blobs                  include large objects in dump
-      -c, --clean                  clean (drop) database objects before recreating
-      -C, --create                 include commands to create database in dump
-      -E, --encoding=ENCODING      dump the data in encoding ENCODING
-      -n, --schema=SCHEMA          dump the named schema(s) only
-      -N, --exclude-schema=SCHEMA  do NOT dump the named schema(s)
-      -o, --oids                   include OIDs in dump
-      -O, --no-owner               skip restoration of object ownership in
-                                   plain-text format
-      -s, --schema-only            dump only the schema, no data
-      -S, --superuser=NAME         superuser user name to use in plain-text format
-      -t, --table=TABLE            dump the named table(s) only
-      -T, --exclude-table=TABLE    do NOT dump the named table(s)
-      -x, --no-privileges          do not dump privileges (grant/revoke)
-      --binary-upgrade             for use by upgrade utilities only
-      --column-inserts             dump data as INSERT commands with column names
-      --disable-dollar-quoting     disable dollar quoting, use SQL standard quoting
-      --disable-triggers           disable triggers during data-only restore
-      --exclude-table-data=TABLE   do NOT dump data for the named table(s)
-      --if-exists                  use IF EXISTS when dropping objects
-      --inserts                    dump data as INSERT commands, rather than COPY
-      --no-security-labels         do not dump security label assignments
-      --no-synchronized-snapshots  do not use synchronized snapshots in parallel jobs
-      --no-tablespaces             do not dump tablespace assignments
-      --no-unlogged-table-data     do not dump unlogged table data
-      --quote-all-identifiers      quote all identifiers, even if not key words
-      --section=SECTION            dump named section (pre-data, data, or post-data)
-      --serializable-deferrable    wait until the dump can run without anomalies
-      --use-set-session-authorization
-                                   use SET SESSION AUTHORIZATION commands instead of
-                                   ALTER OWNER commands to set ownership
-
-    Connection options:
-      -d, --dbname=DBNAME      database to dump
-      -h, --host=HOSTNAME      database server host or socket directory
-      -p, --port=PORT          database server port number
-      -U, --username=NAME      connect as specified database user
-      -w, --no-password        never prompt for password
-      -W, --password           force password prompt (should happen automatically)
-      --role=ROLENAME          do SET ROLE before dump
-
-    If no database name is supplied, then the PGDATABASE environment
-    variable value is used.
-
-    Report bugs to <[email protected]>.
+Since PipelineDB objects are represented by standard PostgreSQL objects, backups can be taken using PostgreSQL's `pg_dump`_ and `pg_dumpall`_ tools. Other PostgreSQL backup and restore tooling will work as well, since a PipelineDB database is just a regular PostgreSQL database.

 Exporting Specific Continuous Views
 -----------------------------------------
@@ -88,11 +17,12 @@ To export a single continuous view, both the continuous view and its associated

 Restoring Continuous Views
 -------------------------------

-To restore a backup taken with :code:`pipeline-dump`, simply pass its output to the :code:`pipeline` client:
+To restore a backup taken with `pg_dump`_, simply pass its output to the :code:`psql` client:

 .. code-block:: bash

-    pipeline-dump > backup.sql
-    pipeline -f backup.sql
+    pg_dump > backup.sql
+    psql -f backup.sql

-
+.. _pg_dump: http://www.postgresql.org/docs/current/static/app-pgdump.html
+.. _pg_dumpall: http://www.postgresql.org/docs/current/static/app-pg-dumpall.html
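For the single-view export case mentioned above, ``pg_dump``'s ``-t`` flag can select just the relevant relations. A sketch under stated assumptions: the view name ``my_cv``, its materialization table name ``my_cv_mrel``, and the database ``mydb`` are all hypothetical, so check your own catalog for the actual relation names:

.. code-block:: bash

    # Dump one continuous view and its backing materialization table.
    pg_dump -t my_cv -t my_cv_mrel mydb > my_cv_backup.sql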
