-
Notifications
You must be signed in to change notification settings - Fork 25
/
FastDB.htm
4253 lines (3659 loc) · 208 KB
/
FastDB.htm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<HTML>
<HEAD>
<TITLE>FastDB Main Memory Database Management System</TITLE>
<UL>
<LI> <A HREF = "#introduction">Introduction</A>
<LI> <A HREF = "#sql">Query language</A>
<UL>
<LI> <A HREF = "#structure">Structures</A>
<LI> <A HREF = "#array">Arrays</A>
<LI> <A HREF = "#string">Strings</A>
<LI> <A HREF = "#reference">References</A>
<LI> <A HREF = "#rectangle">Rectangle</A>
<LI> <A HREF = "#function">Functions</A>
</UL>
<LI> <A HREF = "#cpp">C++ interface</A>
<UL>
<LI> <A HREF = "#table">Table</A>
<LI> <A HREF = "#query">Query</A>
<LI> <A HREF = "#cursor">Cursor</A>
<LI> <A HREF = "#database">Database</A>
</UL>
<LI> <A HREF = "#cli">CLI - call level interface</A>
<UL>
<LI> <A HREF = "#cli_errors">CLI error codes</A>
<LI> <A HREF = "#cli_types">CLI supported types</A>
<LI> <A HREF = "#cli_open">cli_open</A>
<LI> <A HREF = "#cli_close">cli_close</A>
<LI> <A HREF = "#cli_statement">cli_statement</A>
<LI> <A HREF = "#cli_parameter">cli_parameter</A>
<LI> <A HREF = "#cli_column">cli_colunm</A>
<LI> <A HREF = "#cli_array_column">cli_array_column</A>
<LI> <A HREF = "#cli_fetch">cli_fetch</A>
<LI> <A HREF = "#cli_insert">cli_insert</A>
<LI> <A HREF = "#cli_get_first">cli_get_first</A>
<LI> <A HREF = "#cli_get_last">cli_get_last</A>
<LI> <A HREF = "#cli_get_next">cli_get_next</A>
<LI> <A HREF = "#cli_get_prev">cli_get_prev</A>
<LI> <A HREF = "#cli_get_oid">cli_get_oid</A>
<LI> <A HREF = "#cli_update">cli_update</A>
<LI> <A HREF = "#cli_remove">cli_remove</A>
<LI> <A HREF = "#cli_close_cursor">cli_close_cursor</A>
<LI> <A HREF = "#cli_free">cli_free</A>
<LI> <A HREF = "#cli_commit">cli_commit</A>
<LI> <A HREF = "#cli_abort">cli_abort</A>
<LI> <A HREF = "#cli_show_tables">cli_show_tables</A>
<LI> <A HREF = "#cli_describe">cli_describe</A>
<LI> <A HREF = "#cli_describe_layout">cli_describe_layout</A>
<LI> <A HREF = "#cli_create_table">cli_create_table</A>
<LI> <A HREF = "#cli_alter_table">cli_alter_table</A>
<LI> <A HREF = "#cli_drop_table">cli_drop_table</A>
<LI> <A HREF = "#cli_alter_index">cli_alter_index</A>
<LI> <A HREF = "#cli_freeze">cli_freeze</A>
<LI> <A HREF = "#cli_unfreeze">cli_unfreeze</A>
<LI> <A HREF = "#cli_seek">cli_seek</A>
<LI> <A HREF = "#cli_skip">cli_skip</A>
</UL>
<LI> <A HREF = "#localcli">Local implementation of CLI</A>
<UL>
<LI> <A HREF = "#cli_create">cli_create</A>
<LI> <A HREF = "#cli_create_replication_node">cli_create_replication_node</A>
<LI> <A HREF = "#cli_attach">cli_attach</A>
<LI> <A HREF = "#cli_detach">cli_detach</A>
<LI> <A HREF = "#cli_create_transaction_context">cli_create_transaction_context</A>
<LI> <A HREF = "#cli_join_transaction">cli_join_transaction</A>
<LI> <A HREF = "#cli_remove_transaction_context">cli_remove_transaction_context</A>
<LI> <A HREF = "#cli_get_database_state">cli_get_database_state</A>
<LI> <A HREF = "#cli_prepare_query">cli_prepare_query</A>
<LI> <A HREF = "#cli_execute_query">cli_execute_query</A>
<LI> <A HREF = "#cli_execute_query_ex">cli_execute_query_ex</A>
<LI> <A HREF = "#cli_insert_struct">cli_insert_struct</A>
<LI> <A HREF = "#cli_xml_export">cli_xml_export</A>
<LI> <A HREF = "#cli_xml_import">cli_xml_import</A>
</UL>
<LI> <A HREF = "jnicli/docs/index.html">Native interface to Java language</A>
<LI> <A HREF = "#advanced">Delayed transactions and online backup scheduler</A>
<LI> <A HREF = "#replication">Fault tolerant support</A>
<LI> <A HREF = "#optimization">Query optimization</A>
<UL>
<LI> <A HREF = "#indices">Using indices in queries</A>
<LI> <A HREF = "#inverse">Inverse references</A>
<LI> <A HREF = "#realtime">Realtime issues</A>
<LI> <A HREF = "#par">Parallel query execution</A>
</UL>
<LI> <A HREF = "#gist">Generalized search tree</A>
<LI> <A HREF = "#implementation">FastDB implementation issues</A>
<UL>
<LI> <A HREF = "#memory">Memory allocation</A>
<LI> <A HREF = "#transaction">Transactions</A>
<LI> <A HREF = "#recovery">Recovery</A>
<LI> <A HREF = "#hashtable">Hash table</A>
<LI> <A HREF = "#ttree">T-tree</A>
</UL>
<LI> <A HREF = "#subsql">Interactive SQL</A>
<LI> <A HREF = "#www">API for development Web applications</A>
<LI> <A HREF = "#examples">Examples of FastDB applications</A>
<UL>
<LI> <A HREF = "#guess">Example: game "Guess an animal"</A>
<LI> <A HREF = "#testdb">Example: various types of queries</A>
<LI> <A HREF = "#testperf">Performance test</A>
<LI> <A HREF = "#bugdb">Bug tracking database</A>
<LI> <A HREF = "#clidb">Clients-Managers database</A>
</UL>
<LI> <A HREF = "#quick">Quick start</A>
<LI> <A HREF = "#dbsize">Reducing initial size of the database file</A>
<LI> <A HREF = "#diskless">Diskless configuration</A>
<LI> <A HREF = "#sharing">Sharing of classes between different databases</A>
<LI> <A HREF = "docs/html/index.html">Documentation generated by Doxygen</A>
<LI> <A HREF = "#distribution">Distribution terms</A>
</UL>
<BODY>
<HR>
<H2><A NAME = "introduction">Introduction</A></H2>
FastDB is a highly efficient main memory database system
with realtime capabilities and convenient C++ interface.
FastDB doesn't support a client-server architecture and all applications
using a FastDB database should run at the same host. FastDB is optimized
for applications with dominated read access pattern. High speed of query
execution is provided by the elimination of data transfer overhead and
a very effective locking implementation. The Database file is mapped to the virtual
memory space of each application working with the database. So the query is executed in
the context of the application, requiring no context switching and data transfer.
Synchronization of concurrent database access is implemented in FastDB
by means of atomic instructions, adding almost
no overhead to query processing. FastDB assumes that the whole database is
present in RAM and optimizes the search algorithms and structures according to this
assumption. Moreover, FastDB has no overhead caused by database buffer management
and needs no data transfer between a database file and buffer pool.
That is why FastDB
will work significantly faster than a traditional database with all data cached
in buffers pool.<P>
FastDB supports transactions, online backup and automatic recovery
after system crash. The transaction commit protocol is based on
a shadow root pages algorithm, performing atomic update of the database.
Recovery can be done very fast, providing
high availability for critical applications. Moreover, the elimination
of transaction logs improves the total system performance and leads to a more
effective usage of system resources.<P>
FastDB is an application-oriented database. Database tables are constructed using
information about application classes. FastDB supports automatic scheme
evaluation, allowing you to do changes only in one place - in your
application classes. FastDB provides a flexible and convenient interface
for retrieving data from the database. A SQL-like query language is used
to specify queries. Such post-relational capabilities as non-atomic
fields, nested arrays, user-defined types and methods, direct interobject
references simplifies the design of database applications and makes them more
efficient.<P>
Although FastDB is optimized in the assumption that database as a whole fits
into the physical memory of the computer, it is also possible to use it with databases,
the size of which exceeds the size of the physical memory in the system. In the last
case, standard operating system swapping mechanisms will work. But
all FastDB search algorithms and structures are optimized under the assumption of
residence of all data in memory, so the efficiency for swapped
out data will not be very high.<P>
<H2><A NAME = "sql">Query language</A></H2>
FastDB supports a query language with SQL-like syntax. FastDB uses a notation more
popular for object-oriented programming then for a relational database.
Table rows are considered as object instances, the table is the class of these
objects. Unlike SQL, FastDB is oriented on work with objects instead of SQL
tuples. So the result of each query execution is a set of objects of one
class. The main differences of the FastDB query language from standard SQL are:<P>
<OL>
<LI> There are no joins of several tables and nested subqueries. The query always
returns a set of objects from one table.
<LI> Standard C types are used for atomic table columns.
<LI> There are no NULL values, except null references.
I completely agree with C.J. Date's criticism of three-value logic and his
proposal to use default values instead.
<LI> Structures and arrays can be used as record components. A special
<B>exists</B> quantor is provided for locating elements in arrays.
<LI> Parameterless user methods can be defined for table records (objects) as well as
for record components.
<LI> User functions with (only) one single string or numeric argument can be defined by
the application.
<LI> References between objects are supported including automatic support
for inverse references.
<LI>Construction of <code>start from follow by</code> performs a recursive records
traversal using references.
<LI> Because the query language is deeply integrated into C++ classes, a case
sensitive mode is used for language identifiers as well as for keywords.
<LI> No implicit conversion of integer and floating types to string
representation is done. If such conversion is needed, it must be done explicitly.
</OL><P>
The following rules in BNF-like notation specify the grammar of the
FastDB query language search predicates:<P>
<TABLE BORDER ALIGN="center">
<CAPTION>Grammar conventions</CAPTION>
<TR><TH>Example</TH><TH>Meaning</TH></TR>
<TR><TD><I>expression</I></TD><TD>non-terminals</TD></TR>
<TR><TD><B>not</B></TD><TD>terminals</TD></TR>
<TR><TD ALIGN="center">|</TD><TD>disjoint alternatives</TD></TR>
<TR><TD>[<B>not</B>]</TD><TD>optional part</TD></TR>
<TR><TD>{<B>1</B>..<B>9</B>}</TD><TD>repeat zero or more times</TD></TR>
</TABLE><P>
<PRE>
<I>select-condition</I> ::= [ <I>expression</I> ] [ <I>traverse</I> ] [ <I>order</I> ] [ <I>limit</I> ]
<I>expression</I> ::= <I>disjunction</I>
<I>disjunction</I> ::= <I>conjunction</I>
| <I>conjunction</I> <B>or</B> <I>disjunction</I>
<I>conjunction</I> ::= <I>comparison</I>
| <I>comparison</I> <B>and</B> <I>conjunction</I>
<I>comparison</I> ::= <I>operand</I> <B>=</B> <I>operand</I>
| <I>operand</I> <B>!=</B> <I>operand</I>
| <I>operand</I> <B><></B> <I>operand</I>
| <I>operand</I> <B><</B> <I>operand</I>
| <I>operand</I> <B><=</B> <I>operand</I>
| <I>operand</I> <B>></B> <I>operand</I>
| <I>operand</I> <B>>=</B> <I>operand</I>
| <I>operand</I> [<B>not</B>] <B>like</B> <I>operand</I>
| <I>operand</I> [<B>not</B>] <B>like</B> <I>operand</I> <B>escape</B> <I>string</I>
| <I>operand</I> [<B>not</B>] <B>match</B> <I>operand</I>
| <I>operand</I> [<B>not</B>] <B>in</B> <I>operand</I>
| <I>operand</I> [<B>not</B>] <B>in</B> <I>expressions-list</I>
| <I>operand</I> [<B>not</B>] <B>between</B> <I>operand</I> <B>and</B> <I>operand</I>
| <I>operand</I> <B>is</B> [<B>not</B>] <B>null</B>
<I>operand</I> ::= <I>addition</I>
<I>additions</I> ::= <I>multiplication</I>
| <I>addition</I> <B>+</B> <I>multiplication</I>
| <I>addition</I> <B>||</B> <I>multiplication</I>
| <I>addition</I> <B>-</B> <I>multiplication</I>
<I>multiplication</I> ::= <I>power</I>
| <I>multiplication</I> <B>*</B> <I>power</I>
| <I>multiplication</I> <B>/</B> <I>power</I>
<I>power</I> ::= <I>term</I>
| <I>term</I> <B>^</B> <I>power</I>
<I>term</I> ::= <I>identifier</I> | <I>number</I> | <I>string</I>
| <B>true</B> | <B>false</B> | <B>null</B>
| <B>current</B> | <B>first</B> | <B>last</B>
| <B>(</B> expression <B>)</B>
| <B>not</B> <I>comparison</I>
| <B>-</B> term
| <I>term</I> <B>[</B> expression <B>]</B>
| <I>identifier</I> <B>.</B> <I>term</I>
| <I>function</I> <I>term</I>
| <B>exists</B> <I>identifier</I> <B>:</B> <I>term</I>
<I>function</I> ::= <B>abs</B> | <B>length</B> | <B>lower</B> | <B>upper</B>
| <B>integer</B> | <B>real</B> | <B>string</B> | <I>user-function</I>
<I>string</I> ::= <B>'</B> { { <I>any-character-except-quote</I> } [<B>''</B>] } <B>'</B>
<I>expressions-list</I> ::= <B>(</B> <I>expression</I> { <B>,</B> <I>expression</I> } <B>)</B>
<I>order</I> ::= <B>order by</B> <I>sort-list</I>
<I>sort-list</I> ::= <I>field-order</I> { <B>,</B> <I>field-order</I> }
<I>field-order</I> ::= [<B>length</B>] <I>field</I> (<B>asc</B> | <B>desc</B>)
<I>field</I> ::= <I>identifier</I> { <B>.</B> <I>identifier</I> }
<I>traverse</I> ::= <B>start from</B> <I>field</I> [ <B>follow by</B> <I>fields-list</I> ]
<I>fields-list</I> ::= <I>field</I> { <B>,</B> <I>field</I> }
<I>user-function</I> ::= <I>identifier</I>
<I>limit</I>::= <B>limit</B> [ <I>start-position</I> <B>,</B> ] <I>max-selected</I>
<I>max-selected</I> ::= <B>integer</B> | <I>parameter</I>
<I>start-position</I> ::= <B>integer</B> | <I>parameter</I>
</PRE><P>
Identifiers are case sensitive, begin with a a-z, A-Z, '_' or '$'
character, contain only a-z, A-Z, 0-9, '_' or '$' characters, and
do not duplicate a SQL reserved word.<P>
<TABLE WIDTH=100%>
<CAPTION>List of reserved words</CAPTION>
<TR><TD>abs</TD><TD>and</TD><TD>asc</TD><TD>between</TD><TD>by</TD></TR>
<TR><TD>current</TD><TD>desc</TD><TD>escape</TD><TD>exists</TD><TD>false</TD></TR>
<TR><TD>first</TD><TD>follow</TD><TD>from</TD><TD>in</TD><TD>integer</TD></TR>
<TR><TD>is</TD><TD>length</TD><TD>like</TD><TD>last</TD><TD>lower</TD></TR>
<TR><TD>match</TD><TD>not</TD><TD>null</TD><TD>or</TD><TD>real</TD></TR>
<TR><TD>rectangle</TD><TD>start</TD><TD>string</TD><TD>true</TD><TD>upper</TD></TR>
</TABLE><P>
ANSI-standard comments may also be used. All characters after a double-hyphen up to
the end of the line are ignored.<P>
FastDB extends ANSI standard SQL operations by supporting bit manipulation
operations. Operators <code>and</code>/<code>or</code> can be applied not only
to boolean operands but also to operands of integer type. The result of applying the
<code>and</code>/<code>or</code> operator to integer operands is an integer
value with bits set by the bit-AND/bit-OR operation. Bit operations can be used
for efficient implementation of small sets. Also the rasing to a power
operation (x<B>^</B>y) is supported by FastDB for integer and floating point
types.<P>
<H3><A NAME = "structure">Structures</A></H3>
FastDB accepts structures as components of records. Fields of the structure
can be accessed using the standard dot notation: <code>company.address.city</code>
<P>
Structure fields can be indexed and used in an <code>order by</code>
specification. Structures can contain other structures as their components;
there are no limitations on the nesting level.<P>
The programmer can define methods for structures, which can be used
in queries with the same syntax as normal structure components.
Such a method should have no arguments except a pointer to the object to which
it belongs (the <code>this</code> pointer in C++), and should return
an atomic value (of boolean, numeric, string or reference type).
Also the method should not change the object instance (immutable method).
If the method returns a string, this string should be allocated using the
<code>new char</code> operator, because it will be deleted after copying of
its value.<P>
So user-defined methods can be used for
the creation of <I>virtual</I> components -
components which are not stored in the database,
but instead are calculated
using values of other components.
For example, the FastDB <code>dbDateTime</code>
type contains only integer timestamp components and such methods
as <code>dbDateTime::year()</code>, <code>dbDateTime::month()</code>...
So it is possible to specify queries like: "<code>delivery.year = 1999</code>"
in an application, where the <code>delivery</code> record field has
<code>dbDateTime</code> type. Methods are executed in the context of the
application, where they are defined, and are not available to other
applications and interactive SQL.<P>
<H3><A NAME = "array">Arrays</A></H3>
FastDB accepts arrays with dynamic length as components of records.
Multidimensional arrays are not supported, but it is possible to
define an array of arrays. It is possible to sort records in the result set
by length of array field.
FastDB provides a set of special constructions for dealing with arrays:<P>
<OL>
<LI>It is possible to get the number of elements in the array by
the <code>length()</code> function.
<LI>Array elements can be fetched by the<code>[]</code> operator.
If an index expression is out of array range, an exception will be raised.
<LI>The operator <code>in</code> can be used to check if an array contains
a value specified by the left operand. This operation can be used only for arrays of
atomic type: with boolean, numeric, reference or string components.
<LI>Array can be updated using <code>update</code> method which creates copy of the array and returns
non-constant reference.
<LI>Iteration through array elements is performed by the <code>exists</code>
operator. A variable specified after the <code>exists</code> keyword can be used
as an index in arrays in the expression preceeded by the <code>exists</code>
quantor. This index variable will iterate through all possible array
index values, until the value of the expression will become <code>true</code> or
the index runs out of range. The condition
<PRE>
exists i: (contract[i].company.location = 'US')
</PRE>
will select all details which are shipped by companies
located in 'US', while the query
<PRE>
not exists i: (contract[i].company.location = 'US')
</PRE>
will select all details which are shipped from companies outside 'US'.<P>
Nested <code>exists</code> clauses are allowed. Using nested
<code>exists</code>
quantors is equivalent to nested loops using the correspondent
index variables. For example the query
<PRE>
exists column: (exists row: (matrix[column][row] = 0))
</PRE>
will select all records, containing 0 in elements of a <code>matrix</code>
field, which has type array of array of integer.
This construction is equivalent to the following
two nested loops:
<PRE>
bool result = false;
for (int column = 0; column < matrix.length(); column++) {
for (int row = 0; row < matrix[column].length(); row++) {
if (matrix[column][row] == 0) {
result = true;
break;
}
}
}
</PRE>
The order of using indices is essential!
The result of the following query execution
<PRE>
<code>exists row: (exists column: (matrix[column][row] = 0))</code>
</PRE>
will be completely different from the result of the previous query.
In the last case, the program simply hangs due to an infinite loop
in case of empty matrices.
</OL>
<H3><A NAME = "string">Strings</A></H3>
All strings in FastDB have varying length and the programmer should not
worry about specification of maximal length for character fields.
All operations acceptable for arrays are also applicable to strings.
In addition to them, strings have a set of own operations.
First of all, strings can be compared with each other using standard
relation operators. At present, FastDB supports only the
ASCII character set (corresponds to type <code>char</code> in C) and
byte-by-byte comparison of strings ignoring locality settings.<P>
The operator <code>like</code> can be used for
matching a string with a pattern containing special wildcard characters
'%' and '_'. The character '_' matches any single character,
while the character '%' matches zero or more characters.
An extended form of the <code>like</code> operator together with
the <code>escape</code> keyword can be used to handle the
characters '%' and '_' in the pattern as normal characters if
they are preceded by a special escape character, specified after
the <code>escape</code> keyword.<P>
If you rebuild GigaBASE with USE_REGEX macro, then you can use
<code>match</code> operator implementing standard regular expressions
(based on GNU regex library). Second operand of this operator specified
regular expression to be matched and should be string literal.<P>
It is possible to search substrings within a string by the <code>in</code>
operator. The expression <code>('blue' in color)</code> will be true
for all records which <code>color</code> field contains 'blue'.
If the length of the searched string is greater than some threshold value
(currently 512), a Boyer-Moore substring search algorithm is used instead
of a straightforward search implementation.<P>
Strings can be concatenated by <code>+</code> or <code>||</code> operators.
The last one was added for compatibility with the ANSI SQL standard.
As far as FastDB doesn't support the implicit conversion to string type in
expressions, the semantic of the operator <code>+</code> can be redefined for
strings.<P>
<H3><A NAME = "reference">References</A></H3>
References can be dereferenced using the same dot notation as used for
accessing structure components. For example the following query
<PRE>
company.address.city = 'Chicago'
</PRE>
will access records referenced by the <code>company</code> component of
a <code>Contract</code> record and extract the city component of the
<code>address</code> field of the referenced record from
the <code>Supplier</code> table.<P>
References can be checked for <code>null</code> by <code>is null</code>
or <code>is not null</code> predicates. Also references can be compared for
equality with each other as well as with the special <code>null</code>
keyword. When a null reference is dereferenced, an exception is raised
by FastDB.<P>
There is a special keyword <code>current</code>, which
during a table search can be used to refer to the current record.
Usually , the <code>current</code>
keyword is used for comparison of the current record identifier with
other references or locating it within an array of references.
For example, the following query will search in the <code>Contract</code>
table for all active contracts
(assuming that the field <code>canceledContracts</code> has a
<code>dbArray< dbReference<Contract> ></code> type):
<PRE>
current not in supplier.canceledContracts
</PRE><P>
FastDB provides special operators for recursive traverse of records by
references:
<PRE>
<code>start from</code> <I>root-references</I>
( <code>follow by</code> <I>list-of-reference-fields</I> )
</PRE>
The first part of this construction is used to specify root objects.
The nonterminal <I>root-references</I> should be a variable of reference or
of array of reference type. The two special keywords <code>first</code> and
<code>last</code> can be used here, locating the first/last record in the table
correspondingly.
If you want to check all records
referenced by an array of references or a single reference field
for some condition, then this
construction can be used without the <code>follow by</code> part.<P>
If you specify the follow by part, then FastDB will recursively traverse
the table of records, starting from the root
references and using a
<I>list-of-reference-fields</I> for transition between records.
The <I>list-of-reference-fields</I> should consist of fields of
reference or of array of reference type. The traverse is done in depth first
top-left-right order (first we visit the parent node and then the siblings in
left-to-right order).
The recursion terminates when a null reference is accessed
or an already visited record is referenced. For example the following
query will search a tree of records with weight larger than 1 in TLR order:<P>
<PRE>
"weight > 1 start from first follow by left, right"
</PRE><P>
For the following tree:
<PRE>
A:1.1
B:2.0 C:1.5
D:1.3 E:1.8 F:1.2 G:0.8
</PRE>
the result of the query execution will be:
<PRE>
('A', 1.1), ('B', 2.0), ('D', 1.3), ('E', 1.8), ('C', 1.5), ('F', 1.2)
</PRE><P>
As was already mentioned FastDB always manipulates with objects and doesn't accept joins.
Joins can be implemented using references. Consider the classical
<code>Supplier-Shipment-Detail</code> examples:
<PRE>
struct Detail {
char const* name;
double weight;
TYPE_DESCRIPTOR((KEY(name, INDEXED), FIELD(weight)));
};
struct Supplier {
char const* company;
char const* address;
TYPE_DESCRIPTOR((KEY(company, INDEXED), FIELD(address)));
};
struct Shipment {
dbReference<Detail> detail;
dbReference<Supplier> supplier;
int4 price;
int4 quantity;
dbDateTime delivery;
TYPE_DESCRIPTOR((KEY(detail, HASHED), KEY(supplier, HASHED),
FIELD(price), FIELD(quantity), FIELD(delivery)));
};
</PRE>
We want to get information about delivery of some concrete details from some concrete
suppliers. In relational database this query will be written something like this:
<PRE>
select from Supplier,Shipment,Detail where
Supplier.SID = Shipment.SID and Shipment.DID = Detail.DID
and Supplier.company like ? and Supplier.address like ?
and Detail.name like ?
</PRE>
In FastDB this request should be written as:
<PRE>
dbQuery q = "detail.name like",name,"and supplier.company like",company,
"and supplier.address like",address,"order by price";
</PRE>
FastDB will first perform index search in the table <code>Detail</code> for details
matching the search condition. Then it performs another index search to locate shipment
records referencing selected details. Then sequential search is used to check the rest of
select predicate.
<P>
<H3><A NAME = "rectangle">Rectangle</A></H3>
GigaBASE has builtin support of spatial data. It provides <code>rectangle</code> type.
By default it has dimension 2 and integer coordinate types. It is possible to easily change
dimension or use floating point coordinates, but recompilation of GigaBASE is needed in
this case.<P>
It is possible to use this type not only in geographical system for
representing spatial objects but also in many other cases when date is organized as hyper-cube and queries specify range of values for each dimension, for example:
<PRE>
select * from CAR where price between 20000 and 30000
and producedYear between 1997 and 1999
and mileage between 50000 and 100000;
</PRE>
If <code>%lt;price, producedYear, milage></code> are dimensions of the rectangle,
then this query can be executed using only one indexed search in R-Tree.<P>
Rectangle fields can be indexed - <B>R-Tree</B> index is used in this case
The R-tree provides fast access to spatial data. Basically the idea behind
the R-Tree and the B-Tree are the same:
use a hierarchical structure with a high branching factor to reduce the
number of disk accesses. The R-tree is the extension of the B_tree for a
multidimensional object. A geometric object is represented by its minimum
bounding rectangle (MBR). Non-leaf nodes contain entries of the form
(R,<I>ptr</I>) where <I>ptr</I> is a pointer to a child node in the R-tree; R
is the MBR that covers all rectangles in the child node. Leaf nodes contain
entries of the form (obj-id, R) where
obj-id is a pointer to the object, and R is the MBR of the object. The main
innovation in the R-tree is that the father nodes are allowed to overlap. By
this means the R-tree guarantees at least 50% space utilization and remains
balanced.
The first R-tree implementation was proposed by Guttman. The Gigabase <B>R-Tree</B>
class is based on Guttman's implementation with a quadratic split algorithm.
The quadratic split algorithm is the one that achieves the best trade-off
between splitting time and search performance.<P>
Rectangle class provides method for calculating distance between two rectangle,
rectangle area, checking whether two rectangle overlap or one contains another.
Rectangle is specified by coordinates of two it's vertices. Rectangle class
contains array of coordinates. Coordinates of first vertex are placed at the
beginning of array, and then - coordinates of another vertex.
Each coordinate of first vertex should not be large than correspondent coordinate of
the second vertex. Singular rectangle with one or more coordinates of first vertex equals
to the correspondent coordinate of the second vertex are allowed - this is a way of storing
points in the database.<P>
SubSQL provides some special operators for dealing with rectangle. First of all -
all comparison operators can be used with the following semantic:<P>
<TABLE BORDER ALIGN=CENTER>
<TR><TD><code>a == b</code></TD><TD>Rectangle <code>a</code> is the same as rectangle <code>b</code></TD></TR>
<TR><TD><code>a != b</code></TD><TD>Rectangle <code>a</code> is not the same as rectangle <code>b</code></TD></TR>
<TR><TD><code>a <= b</code></TD><TD>Rectangle <code>b</code> contains rectangle <code>a</code></TD></TR>
<TR><TD><code>a < b</code></TD><TD>Rectangle <code>b</code> contains rectangle <code>a</code> and them are not the same</TD></TR>
<TR><TD><code>a >= b</code></TD><TD>Rectangle <code>a</code> contains rectangle <code>b</code> and are not the same</TD></TR>
<TR><TD><code>a > b</code></TD><TD>Rectangle <code>b</code> contains rectangle <code>a</code> and them are not the same</TD></TR>
</TABLE><P>
Also SubSQL provides <code>overlaps</code> and <code>in</code> operators.
First checks if two rectangles overlap, the second is equivalent to <code><=</code> operator.
Rectangles can be added - result is minimal rectangle containing both rectangles-operands.
It is possible to access rectangle as array of coordinates - using <code>[index]</code>
notation. Coordinates in query are always returned as real numbers.<P>
Optimizer is able to use spatial index for all comparison operators (except <code>!=</code>)
and for <code>overlaps</code> operator.<P>
<H3><A NAME = "function">Functions</A></H3>
<TABLE BORDER ALIGN="center">
<CAPTION>Predefined functions</CAPTION>
<TR><TH>Name</TH><TH>Argument type</TH><TH>Return type</TH><TH>Description</TH></TR>
<TR><TD>abs</TD><TD>integer</TD><TD>integer</TD><TD>absolute value of the argument</TD</TR>
<TR><TD>abs</TD><TD>real</TD><TD>real</TD><TD>absolute value of the argument</TD</TR>
<TR><TD>area</TD><TD>rectangle</TD><TD>real</TD><TD>area of the rectangle</TD></TR>
<TR><TD>integer</TD><TD>real</TD><TD>integer</TD><TD>conversion of real to integer</TD</TR>
<TR><TD>length</TD><TD>array</TD><TD>integer</TD><TD>number of elements in array</TD</TR>
<TR><TD>lower</TD><TD>string</TD><TD>string</TD><TD>lowercase string</TD</TR><TR><TD>real</TD><TD>integer</TD><TD>real</TD><TD>conversion of integer to real</TD</TR>
<TR><TD>string</TD><TD>integer</TD><TD>string</TD><TD>conversion of integer to string</TD</TR>
<TR><TD>string</TD><TD>real</TD><TD>string</TD><TD>conversion of real to string</TD</TR>
<TR><TD>upper</TD><TD>string</TD><TD>string</TD><TD>uppercase string</TD</TR>
</TABLE><P>
FastDB allows user to define its own functions and operators.
Function should have at least one but no more than 3 parameters of string, integer,
boolean, reference or user defined (raw binary) type. It should return value of integer, real, string or
boolean type.<P>
User functions should be registered by the <code>USER_FUNC(f)</code> macro,
which creates a static object of the <code>dbUserFunction</code> class, binding
the function pointer and the function name.<P>
There are two ways of implementing these functions in application.
First can be used only for functions with one argument. This argument should be of <code>int8, real8,
char*</code> types. And the function return type should be <code>int8, real8, char*</code> or <code>bool</code>.
If function has more than one parameters or it can accept parameters of different types (polymorphism)
then parameters should be passed as reference to <code>dbUserFunctionArgument</code> structure.
This structure contains <code>type</code> field, which value can be used in function implementation to
detect type of passed argument and union with argument value.
The following table contains mapping between argument types and where the value should be taken from:<P>
<TABLE BORDER ALIGN=CENTER>
<TR><TH>Argument type</TH><TH>Argument value</TH><TH>Argument value type</TH></TR>
<TR><TD><code>dbUserFunctionArgument::atInteger</code></TD><TD><code>u.intValue</code></TD><TD><code>int8</code></TD></TR>
<TR><TD><code>dbUserFunctionArgument::atBoolean</code></TD><TD><code>u.boolValue</code></TD><TD><code>bool</code></TD></TR>
<TR><TD><code>dbUserFunctionArgument::atString</code></TD><TD><code>u.strValue</code></TD><TD><code>char const*</code></TD></TR>
<TR><TD><code>dbUserFunctionArgument::atReal</code></TD><TD><code>u.realValue</code></TD><TD><code>real8</code></TD></TR>
<TR><TD><code>dbUserFunctionArgument::atReference</code></TD><TD><code>u.oidValue</code></TD><TD><code>oid_t</code></TD></TR>
<TR><TD><code>dbUserFunctionArgument::atRawBinary</code></TD><TD><code>u.rawValue</code></TD><TD><code>void*</code></TD></TR>
</TABLE><P>
For example the following
statements make it possible to use the <code>sin</code> function in SQL
statements:
<PRE>
#include <math.h>
...
USER_FUNC(sin);
</PRE>
Functions can be used only
within the application, where they are defined. Functions are not accessible
from other applications and interactive SQL. If a function returns a string
type , the returned string should be copied by means of the operator
<code>new</code>, because
FastDB will call the destructor after copying the returned value.<P>
In FastDB, the function argument can (but not necessarily must) be enclosed in
parentheses. So both of the following expressions are valid:
<PRE>
'$' + string(abs(x))
length string y
</PRE><P>
Functions with two argument can be also used as operators. Consider the following example,
in which function <code>contains</code> which performs case insensitive search for substring is defined:
<PRE>
bool contains(dbUserFunctionArgument& arg1, dbUserFunctionArgument& arg2) {
assert(arg1.type == dbUserFunctionArgument::atString
&& arg2.type == dbUserFunctionArgument::atString);
return stristr(arg1.u.strValue, arg2.u.strValue) != NULL;
}
USER_FUNC(contains);
dbQuery q1, q2;
q1 = "select * from TestTable where name contains 'xyz'";
q2 = "select * from TestTable where contains(name, 'xyz')";
</PRE>
In this example, queries <code>q1</code> and <code>q2</code> are equivalent.<P>
<H2><A NAME = "cpp">C++ interface</A></H2>
One of the primary goals of FastDB is to provide a flexible and convenient
application language interface. Anyone who has to use
ODBC or similar SQL interfaces will understand what I am speaking about.
In FastDB, a query can be written in C++ in the following way:<P>
<PRE>
dbQuery q;
dbCursor<Contract> contracts;
dbCursor<Supplier> suppliers;
int price, quantity;
q = "(price >=",price,"or quantity >=",quantity,
") and delivery.year=1999";
// input price and quantity values
if (contracts.select(q) != 0) {
do {
printf("%s\n", suppliers.at(contracts->supplier)->company);
} while (contracts.next());
}
</PRE>
<H3><A NAME = "table">Table</A></H3>
Data in FastDB is stored in tables which correspond to C++ classes
whereas the table records correspond to class instances.
The following C++ types are accepted as atomic components of
FastDB records:<P>
<TABLE BORDER ALIGN="center">
<TR><TH>Type</TH><TH>Description</TH></TR>
<TR><TD>bool</TD><TD>boolean type (<code>true,false</code>)</TD></TR>
<TR><TD>int1</TD><TD>one byte signed integer (-128..127)</TD></TR>
<TR><TD>int2</TD><TD>two bytes signed integer (-32768..32767)</TD></TR>
<TR><TD>int4</TD><TD>four bytes signed integer (-2147483648..2147483647)</TD></TR>
<TR><TD>int8</TD><TD>eight bytes signed integer (-2**63..2**63-1)</TD></TR>
<TR><TD>real4</TD><TD>four bytes ANSI floating point type</TD></TR>
<TR><TD>real8</TD><TD>eight bytes ANSI double precision floating point type</TD></TR>
<TR><TD>char const*</TD><TD>zero terminated string</TD></TR>
<TR><TD>dbReference<T></TD><TD>reference to class T</TD></TR>
<TR><TD>dbArray<T></TD><TD>dynamic array of elements of type T</TD></TR>
</TABLE><P>
In addition to types specified in the table above, FastDB records can
also contain nested structures of these components.
FastDB doesn't support unsigned types to simplify the query language,
to eliminate bugs caused by signed/unsigned comparison
and to reduce the size of the database engine.<P>
Unfortunately C++ provides no way
to get metainformation about a class at runtime (RTTI is not supported by all
compilers and also doesn't provide enough information).
Therefore the programmer
has to explicitly enumerate class fields to be included in the database table
(it also makes mapping between classes and tables more flexible).
FastDB provides a set of macros and classes to make such mapping as simple as
possible.<P>
Each C++ class or structure, which will be used in the database, should
contain a special method describing its fields. The macro
<code>TYPE_DESCRIPTOR(</code><I>field_list</I><code>)</code> will construct
this method. The single argument of this macro is - enclosed in parentheses -
a list of class field descriptors.
If you want to define some methods for the class
and make them available for the database, then the macro
<code>CLASS_DESCRIPTOR(</code><I>name, field_list</I><code>)</code>
should be used instead of <code>TYPE_DESCRIPTOR</code>. The class name is
needed to get references to member functions.<P>
The following macros can be used for the construction of field
descriptors:
<DL>
<DT><B>FIELD(</B>name<B>)</B><DD>Non-indexed field with specified name.
<DT><B>KEY(</B>name, index_type<B>)</B><DD>Indexed field. <I>index_type</I>
should be a combination of <code>HASHED</code> and <code>INDEXED</code> flags.
When the <code>HASHED</code> flag is specified, FastDB will create a hash table
for the table using this field as a key. When the <code>INDEXED</code> flag is
specified, FastDB will create a (special kind of index) T-tree for the table
using this field as a key.
<DT><B>UDT(</B>name, index_type, comparator<B>)</B><DD>User defined raw binary type.
Database deals with this type just as with sequence of bytes of specified size.
This field can be used in query (compared with query parameter of the same type),
may be indexed and used in <code>order by</code> clause. Comparison is performed by means of
<code>comparator</code> function provided by programmer. Comparator functions receives three
arguments: two pointers to the compared raw binary objects and size of binary object.
The semantic of <I>index_type</I> is the same as of <code>KEY</code> macro.
<DT><B>RAWKEY(</B>name, index<B>)</B><DD>Raw binary type with predefined comparator.
This macro is just specialized version of <code>UDT</code> macro with <code>memcmp</code>
used as comparator.
<DT><B>RAWFIELD(</B>name<B>)</B><DD>One more specialization of <code>UDT</code> macro
for raw binary fields with predefined comparator <code>memcmp</code> and without indices.
<DT><B>SUPERCLASS(</B>name<B>)</B><DD>Specifies information about the base class
(parent) of the current class.
<DT><B>RELATION(</B>reference, inverse_reference<B>)</B>
<DD>Specifies <I>one-to-one, one-to-many</I> or <I>many-to-many</I>
relationships between classes (tables). Both <I>reference</I>
and <I>inverse_reference</I>
fields should be of reference or of array of reference type.
<code>inverse_reference</code> is a field of the referenced table
containing the inverse reference(s) to the current table. Inverse references
are automatically updated by FastDB and are used for query optimization
(see <A HREF="#inverse">Inverse references</A>).
<DT><B>OWNER(</B>reference, inverse_reference<B>)</B>
<DD>Specifies <I>one-to-many</I> or <I>many-to-many</I>
relationship between classes (tables) of owner-member type.
When owner record is removed all referenced member records are also removed
(cascade delete). If member record has reference to owner class, it should be
declared with RELATION macro.
<DT><B>METHOD(</B>name<B>)</B><DD>Specifies a method of the class.
The method should be a parameterless instance member function
returning a
boolean, numeric, reference or string type. Methods should be specified after
all other attributes of the class.
</DL><P>
Although only atomic fields can be indexed, an index type can be specified
for structures. The index will be created for components of the structure
only if such type of index is specified in the index type mask of the
structure. This allows the programmers to enable or disable indices for
structure fields depending on the role of the structure in the record.<P>
The following example illustrates the creation of a type descriptor
in the header file:<P>
<PRE>
class dbDateTime {
int4 stamp;
public:
int year() {
return localtime((time_t*)&stamp)->tm_year + 1900;
}
...
CLASS_DESCRIPTOR(dbDateTime,
(KEY(stamp,INDEXED|HASHED),
METHOD(year), METHOD(month), METHOD(day),
METHOD(dayOfYear), METHOD(dayOfWeek),
METHOD(hour), METHOD(minute), METHOD(second)));
};
class Detail {
public:
char const* name;
char const* material;
char const* color;
real4 weight;
dbArray< dbReference<Contract> > contracts;
TYPE_DESCRIPTOR((KEY(name, INDEXED|HASHED),
KEY(material, HASHED),
KEY(color, HASHED),
KEY(weight, INDEXED),
RELATION(contracts, detail)));
};
class Contract {
public:
dbDateTime delivery;
int4 quantity;
int8 price;
dbReference<Detail> detail;
dbReference<Supplier> supplier;
TYPE_DESCRIPTOR((KEY(delivery, HASHED|INDEXED),
KEY(quantity, INDEXED),
KEY(price, INDEXED),
RELATION(detail, contracts),
RELATION(supplier, contracts)));
};
</PRE>
Type descriptors should be defined for all classes used in the database.
In addition to defining type descriptors, it is necessary to establish
a mapping between C++ classes and database tables. The macro
<code>REGISTER(</code>name<code>)</code> will do it. Unlike the
<code>TYPE_DESCRIPTOR</code> macro, the <code>REGISTER</code> macro should
be used in the implementation file and not in the header file. It constructs
a descriptor of the table associated with the class. If you are going to work
with multiple databases from one application, it is possible to register
a table in a concrete database by means of the
<code>REGISTER_IN(</code>name,database</code<code>)</code> macro.
The parameter <code>database</code> of this macro should be a pointer to the
<code>dbDatabase</code> object. You can register tables
in the database as follows:<P>
<PRE>
REGISTER(Detail);
REGISTER(Supplier);
REGISTER(Contract);
</PRE>
The table (and correspondent class) can be used only with one database
at each moment of time. When you open a database, FastDB imports into
the database all classes defined in the application.
If a class with the same name already exists
in the database, its descriptor stored in the database is compared with the
descriptor of this class in the application. If the class definitions
differ, FastDB tries to convert records from the table to the new
format. Any kind of conversion between numeric types (integer to
real, real to integer, with extension or truncation) is allowed. Also,
addition of new fields can be easily handled. But removal of fields
is only possible for empty tables (to avoid accidental data destruction).<P>
After loading all class descriptors, FastDB checks if all indices specified
in the application class descriptor are already present in the database,
constructs new indices and
removes indices, which are no more used. Reformatting the table and
adding/removing indices is only possible when no more than one
application accesses the database. So when the first application is attached
to the database, it can perform table conversion. All other applications
can only add new classes to the database.<P>
There is one special internal database <code>Metatable</code>, which
contains information about other tables in the database. C++ programmers
need not access this table, because the format of database tables is specified
by C++ classes. But in an interactive SQL program, it may be necessary to
examine this table to get information about record fields.<P>
Starting from version 2.30 FastDB supports autoincrement fields
(fields unique value to which are assigned automaticaly by database).
To be able to use them you should:<P>
<OL>
<LI>Recompile FastDB and your application with <code>-DAUTOINCREMENT_SUPPROT</code> flags
(add this flag to <code>DEFS</code> variables in FastDB makefile).<BR>
<B>Attention</B>: database files created by FastDB compiled without this option will be incompatible
with FastDB compiled with <code>DAUTOINCREMENT_SUPPORT</code>.
<LI>If you want to use other than 0 initial counter value, you should
asssign value to <code>dbTableDescriptor::initialAutoincrementCount</code>.
It will be shared between all tables, so all table will have the same initial value of
autoincrement counter.
<LI>Autoincrement fields should be of int4 type and should be declared
with <code>AUTOINCREMENT</code> flag:
<PRE>
class Record {
int4 rid;
char const* name;
...
TYPE_DESCRIPTOR((KEY(rid, AUTOINCREMENT|INDEXED), FIELD(name), ...));
}
</PRE>
<LI>When record with autoincrement field is inserted in the database
there is no need to specify value of autoincremented field (it will be
ignored). After successful insertion of record this field will be
assigned unique value (which is guaranteed to be not used before this
table):
<PRE>
Record rec;
// no rec.rid should be specified
rec.name = "John Smith";
insert(rec);
// rec.rid now assigned unique value
int newRecordId = rec.rid; // and can be used to reference this record