DFIO (Feb88) Datafile I/O Interface DFIO (Feb88)
IRAF DATAFILE I/O INTERFACE
Doug Tody
February 1988
1. INTRODUCTION
The DFIO (Datafile I/O) interface is a general purpose interface
used to store and access arbitrary binary data tables maintained in a
machine independent random access binary file on disk. The main
purpose of the present interface is to support the IRAF image
structures, although it may also be considered a prototype for the
planned IRAF DBIO (Database I/O) interface, and could also be used by
applications as an interim interface for catalog output and intermodule
communication.
The capabilities or features of the DFIO prototype include the
following.
o The ability to store multiple binary tables in a single file.
o Data definition via a high level data definition language.
o Runtime mapping of internal stored tables to and from user
defined record structures for full data independence and
programmer convenience.
o Abstract user defined domains, including storage of information
for formatting tabular output.
o Highly efficient storage and access for both very small and
very large tables.
o Efficient symbolic access via a SYMTAB binary hash table.
o A simple sparse indexing scheme for tables stored sorted by a
user defined primary key, providing fast lookup for arbitrarily
large tables provided retrieval is based on the primary key.
o Dynamic updating and insertion, including updating variable
sized records.
o Array valued fields, including variable length arrays.
o Record valued fields.
o The ability to dynamically add fields to existing tables.
o Default values for uninitialized or newly added fields.
o Nested tables (logical hierarchical structures implemented
physically as independent relational base tables).
o A fully self describing data format.
o A fully machine independent external data format.
o Distributed access via IRAF networking.
The major areas NOT dealt with by DFIO include concurrency, crash
recovery (important for large updatable databases, such as those
produced by sophisticated analysis programs), transaction logging,
dense indexes, multiple indexes, support for transparent access to host
databases, and all higher level database functions such as record
selection via a symbolically defined predicate (fundamental for
efficiently querying the database). With the possible exception of
concurrency and crash recovery, these areas are not considered of major
importance to the immediate goal of solving the general problem of IRAF
image storage; they will be dealt with by DBIO.
2. TERMINOLOGY
A DATAFILE may be thought of as a database, or collection of
TABLES, with an associated CATALOG (table of tables), and DATA
DICTIONARY, describing the RELATIONS (record types), ATTRIBUTES (fields
of the records), DOMAINS (user logical datatypes), and MAPPINGS
(mappings of applications structures onto datafile objects) stored in
the database. Physically, a datafile is a single IRAF random access
binary file containing both the data dictionary and data tables. One
or more separate text files containing statements written in a symbolic
DATA DEFINITION LANGUAGE are used to define the data elements to be
stored in the datafile.
Formal Relational Term     Informal Equivalent (Date, 86)
relation                   table
tuple                      row or record
attribute                  column or field
primary key                unique identifier
domain                     pool of legal values
A table is a named instance of a predefined relation. To distinguish
tables from relations or mappings, tables are often referred to as DATA
TABLES or BASE TABLES. A data table may be thought of as a table of
rows and columns, where the rows are the DATA RECORDS making up the
table, and the columns are the FIELDS of the data records. The fields
of a data record correspond to the attributes of the relation used to
construct the data table. Each attribute is defined upon some specific
domain, e.g., "int" or "real". Often one or more of the columns of a
table will be concatenated to form the PRIMARY KEY of the table.
Tables with a primary key are stored sorted by the primary key with an
associated index of key values, allowing efficient record access even
for very large tables.
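The lookup path through a sparse primary key index can be illustrated with a small in-memory model. The sketch below assumes that the index holds one entry per page, namely the smallest key stored on that page, so that a binary search over the index selects the only page which could contain a given key. The class and method names are hypothetical and do not describe the actual DFIO physical schema.

```python
from bisect import bisect_right

class SparseIndex:
    """Hypothetical in-memory model of a sparse primary key index.

    Records are assumed to be stored sorted by key and split into pages;
    the index keeps only the first (smallest) key of each page.
    """

    def __init__(self, records, page_capacity):
        # records: list of (key, value) pairs, already sorted by key
        self.pages = [records[i:i + page_capacity]
                      for i in range(0, len(records), page_capacity)]
        self.first_keys = [page[0][0] for page in self.pages]

    def lookup(self, key):
        # Binary search for the last page whose first key is <= key.
        p = bisect_right(self.first_keys, key) - 1
        if p < 0:
            return None              # key precedes every stored key
        # Scan only that one page (a real datafile would read it from disk).
        for k, value in self.pages[p]:
            if k == key:
                return value
        return None
```

Only one page is scanned per lookup, so the cost is dominated by the binary search over the much smaller index, which is what keeps retrieval by primary key fast even for very large tables.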
To make tables more useful in scientific applications, DFIO makes two
major extensions to the relational model upon which the interface is
based. First, fields may be array valued, with such arrays being
either fixed in size, or variable in size with effectively no upper
bound on the size of the array. Second, the fields of a record may
themselves be records, arrays of records, or other tables (SUBTABLES).
Although this appears at first glance to violate the relational model,
the subtables are independent tables with their own catalog entries,
and the database may still be viewed as a collection of largely
independent tables characterized by a somewhat generalized idea of what
constitutes a field.
3. DATA DEFINITION LANGUAGE
Before any data can be stored in a datafile, the set of possible
object datatypes (relations, attributes, and domains) must be defined.
This is most conveniently done using a high level textual description
of the data objects, expressed in the DFIO data definition language
(DDL). Data objects may also be defined, or old definitions altered,
using the procedural interface outlined in section 4.1.2. It is these
same procedures which are ultimately called when a data definition file
(DDF) is compiled.
3.1 DDL LEXICAL FORM
The lexical form of the DDL is consistent with that of other IRAF
text data formats, i.e., comments may appear anywhere and are marked by
a #, backslash may be used to continue long lines, and blank lines are
ignored. Whitespace and newlines delimit tokens, and strings
containing whitespace or DDL metacharacters must be quoted. The
language keywords should be given in lower case.
3.2 LANGUAGE ELEMENTS
The DFIO data definition language consists of only six statements:
DEFINE, INCLUDE, DOMAIN, RELATION, MAPPING, and SET.
3.2.1 PREPROCESSOR DIRECTIVES
The DDL includes the standard preprocessor directives DEFINE and
INCLUDE, which are virtually identical to those used in the SPP
language.
INCLUDE <fname>
INCLUDE "pathname"
DEFINE macro_name replacement_string
If the include file is specified as <fname> the system libraries are
searched for the named file, otherwise the file should be specified by
an absolute pathname (IRAF VFN), to ensure that the file may be located
reliably regardless of the default directory set at runtime. The
arguments to macro defines are indicated as $1, $2, ... $9. It is
recommended that use of file inclusion be minimized in data definition
files intended to be read frequently at runtime.
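The argument substitution performed by DEFINE can be sketched as a simple text rewrite. The sketch assumes the SPP-style call form NAME(arg1, arg2, ...) with no nested parentheses or commas inside arguments; the function name expand_macros is hypothetical, and a real preprocessor would also have to handle quoting, comments, and recursive expansion.

```python
import re

def expand_macros(text, macros):
    """Expand DEFINE style macros in one line of DDL text (hypothetical sketch).

    macros maps a macro name to its replacement string, which may refer to
    call arguments as $1 .. $9.  Each macro is expanded in a single pass,
    in dictionary order.
    """
    def substitute(body, args):
        # Replace each $n with the n-th argument; missing arguments become "".
        def arg(m):
            n = int(m.group(1))
            return args[n - 1] if n <= len(args) else ""
        return re.sub(r"\$([1-9])", arg, body)

    for name, body in macros.items():
        # A call with an argument list, e.g. ARR(real, 10).
        call = re.compile(r"\b%s\(([^()]*)\)" % re.escape(name))
        text = call.sub(
            lambda m: substitute(body, [a.strip() for a in m.group(1).split(",")]),
            text)
        # A bare reference with no arguments, e.g. SZ_NAME.
        text = re.sub(r"\b%s\b" % re.escape(name), lambda m: body, text)
    return text
```

For example, with macros {"SZ_NAME": "20", "STR": "char*$1"} the line "name: STR(SZ_NAME)" expands to "name: char*20".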
3.2.2 THE DOMAIN STATEMENT
Each attribute of a relation (field of a record, column of a table)
must be defined upon some particular DOMAIN. A domain is similar to a
primitive datatype such as real or integer, but carries additional, more
application dependent meaning.
DOMAIN domain_name [domain_attribute=value [, ... ]]
The possible attributes of a domain are the following:
default Default value to be assigned to fields defined upon this
domain.
format FMTIO style format string (e.g., "%5.2f") to be used to
print the values of fields defined upon this domain.
type The datatype to be used to store values for this domain.
Either a primitive datatype, or the name of another
domain. If another domain is specified, the domain
currently being defined inherits the attributes of the
referenced domain.
units A string to be used in printed output to define the units
for the domain, e.g., "pixels", "degrees", and so on.
Note that inheritance (naming an existing domain as the type of the
domain currently being defined) serves only to set the default values
of the domain attributes. Subsequent DOMAIN_ATTRIBUTE=VALUE assignments
will override these default values, and the resultant domain will have
a primitive datatype.
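The inheritance rule can be modeled by treating each domain as a table of attribute values. In this minimal sketch (the function define_domain and the dict representation are hypothetical), naming an existing domain as the type copies its attributes as defaults, explicit assignments then override them, and the stored domain always ends up with a primitive type. Carrying a string length such as char*8 over to the primitive ch type is an assumption.

```python
# Primitive datatypes, from the table of primitive types.
PRIMITIVES = ("b2", "ch", "i2", "i4", "r4", "r8")

def define_domain(domains, name, **attrs):
    """Define domain `name`, resolving type inheritance (hypothetical sketch)."""
    t = attrs.pop("type")
    base, _, size = t.partition("*")          # e.g. "char*8" -> ("char", "8")
    if base in domains:
        resolved = dict(domains[base])        # inherit the parent's attributes
        if size:                              # apply the string length, if any
            resolved["type"] = resolved["type"].partition("*")[0] + "*" + size
    elif base in PRIMITIVES:
        resolved = {"type": t}
    else:
        raise ValueError("undefined type: " + t)
    resolved.update(attrs)                    # explicit values override defaults
    domains[name] = resolved
    return resolved
```

Because the parent's attributes are copied rather than referenced, the new domain records only a primitive type, as the text requires.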
All domains are ultimately defined in terms of the following set of
primitive datatypes. Note that fixed size character strings, e.g.,
CH*80, are considered a primitive type. The B2 datatype is used to
store binary data in the datafile without it being subjected to any
format conversions, e.g., when storing data which has already been
rendered into a machine independent form by the applications program.
Name          Description
b2            16 bit opaque binary data
ch [ *N ]     character (stored one byte per character)
i2            16 bit signed integer
i4            32 bit signed integer
r4            32 bit floating point
r8            64 bit floating point
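How the primitive types might be rendered into a machine independent external form can be sketched with Python's struct module. The byte order and padding used here are assumptions for illustration only (big-endian IEEE values, strings padded with NUL bytes); this section does not specify the actual DFIO external representation. B2 data is passed through unchanged, as the text describes.

```python
import struct

# Assumed external encodings: big-endian, IEEE floating point.
NUMERIC = {"i2": ">h", "i4": ">i", "r4": ">f", "r8": ">d"}

def _strlen(ptype):
    # "ch" holds one character; "ch*N" is a fixed size string of N characters.
    return int(ptype[3:]) if ptype.startswith("ch*") else 1

def pack_value(ptype, value):
    """Encode one value of a primitive type into external (byte) form."""
    if ptype == "b2":
        # Opaque data: already machine independent, so no conversion at all.
        if len(value) % 2:
            raise ValueError("b2 data must be a whole number of 16 bit words")
        return bytes(value)
    if ptype.split("*")[0] == "ch":
        n = _strlen(ptype)
        # One byte per character, truncated or NUL padded to exactly n bytes.
        return value.encode("ascii")[:n].ljust(n, b"\0")
    return struct.pack(NUMERIC[ptype], value)

def unpack_value(ptype, data):
    """Decode the external form of a primitive type back into a value."""
    if ptype == "b2":
        return bytes(data)
    if ptype.split("*")[0] == "ch":
        return bytes(data).split(b"\0", 1)[0].decode("ascii")
    return struct.unpack(NUMERIC[ptype], data)[0]
```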
In addition, the following set of STANDARD DOMAINS are normally defined.
All user domains should be defined in terms of these standard domains,
to ensure that reasonable defaults are established for the domain
attributes.
Name          Type   Default   Format     Units
char [ *N ]   ch     ""        %s         none
short         i2     0         %6d        none
int, long     i4     0         %11d       none
real          r4     0.0       %14.7g     none
double        r8     0.0       %21.14g    none
opaque        b2     0         %6o        none
stime         i4     0         %5D        seconds
ltime         i4     0         %8D        seconds
sdate         i4     0         %12T       seconds
ldate         i4     0         %22T       seconds
The special domains STIME and LTIME, and SDATE and LDATE, provide a
standard way of representing timestamp information. All times are
stored internally as long integers in IRAF VOS time units (seconds
starting at 1970.0 LST). The time domains differ only in how their
values are formatted for output. Examples of the short and
long time formats are shown below.
stime 12:28
ltime 12:28:36
sdate Feb 20 12:28
ldate Sat 12:28:36 20-Feb-88
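Since the time domains differ only in their output format, one stored value can be rendered four ways. The sketch below takes the epoch stated above at face value and treats it as a naive local clock. The function name format_time and the padding of the day of the month are assumptions, chosen so that the output matches the widths of the %5D, %8D, %12T, and %22T formats.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)   # epoch as stated in the text, local standard time
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def format_time(seconds, domain):
    """Format a stored time value in the style of one of the time domains."""
    t = EPOCH + timedelta(seconds=seconds)
    hm = "%02d:%02d" % (t.hour, t.minute)
    hms = "%s:%02d" % (hm, t.second)
    if domain == "stime":
        return hm
    if domain == "ltime":
        return hms
    if domain == "sdate":
        return "%s %2d %s" % (MONTHS[t.month - 1], t.day, hm)
    if domain == "ldate":
        return "%s %s %02d-%s-%02d" % (DAYS[t.weekday()], hms, t.day,
                                       MONTHS[t.month - 1], t.year % 100)
    raise ValueError("not a time domain: " + domain)
```

Applied to the value for 20 February 1988, 12:28:36, this reproduces the four example strings above.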
Note that domains can only be defined, directly or indirectly, in terms
of the primitive types. Array or record valued domains are not
permitted.
3.2.3 THE RELATION STATEMENT
The RELATION statement is used to define a grouping of zero or more
ATTRIBUTES, each defined upon some particular DOMAIN. Defining a
relation does not in itself add any data to a datafile; to store data
in a datafile, one must create a DATA TABLE, which is a named instance
of some predefined relation, and then write DATA RECORDS to the table.
A relation is a typed object, like a domain, and may be used as an
element in an array, or as an attribute of another relation.
RELATION relation_name {
attribute1 [, alias...] : type [ "[" nelem "]" ], label
...
attributeN [, alias...] : type [ "[" nelem "]" ], label
}
A relation may have any number of attributes. The order in which the
attributes are defined is arbitrary and should be chosen for logical
clarity (tables are stored in a packed format hence there are no
alignment considerations). Note that aliases may be provided for the
attribute name, and that the attribute name is given first, followed by
a colon and then a comma delimited list of the attributes of the
attribute being defined. The attribute TYPE should be a domain name or
the name of a predefined relation. The LABEL is optional, and is used
as a comment or column label to identify the attribute in the data
definition and in formatted output tables.
The attributes of a relation may be scalars, fixed or variable length
arrays of any predefined type, i.e., of any predefined domain or
relation, or actual base tables. The elements of an array must be
fixed in size, hence arrays of variable length records are not
permitted. Array and table valued attributes are denoted as shown in
the examples below. EVENT is a predefined fixed size relation.
x: real # simple scalar field
coeff: real[10] # simple array valued field
name: char*20 # simple string valued field
month: char*3[12] # array of strings
label: char[] # variable length array
event: event # record valued field
event: event[8] # fixed array of records
event: event[] # variable array of records
event: event[*] # distinct base table
Any relation containing variable length arrays is a variable length
relation. A table valued attribute does not make a relation variable
length, since the table is stored separately from the records of the
parent table. Storage for fixed or small variable length arrays is
allocated within a variable length data record, hence small arrays are
the most efficient. Large arrays and table valued attributes are the
least efficient since storage is separately allocated (although
clustering of the records from different tables on the same datafile
page is possible). Array or table valued attributes cannot be used in
indexes. The DF_FLEN function may be used to determine the length of
(number of elements or records in) an array or table valued attribute.
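The type suffixes used in the attribute examples can be classified mechanically, which also makes the variable length rule easy to state in code. In this sketch the functions classify and is_variable_length are hypothetical, and a relation is represented as a list of attribute type strings. Treating a scalar record valued field of a variable length relation as making the parent variable length is an assumption not stated in the text.

```python
import re

# name, optional *N string length, optional [N], [] or [*] suffix
TYPE_RE = re.compile(r"^(\w+)(?:\*(\d+))?(?:\[(\d*|\*)\])?$")

def classify(attr_type):
    """Return (base type, kind) for an attribute type string."""
    m = TYPE_RE.match(attr_type.replace(" ", ""))
    if not m:
        raise ValueError("bad attribute type: " + attr_type)
    base, _strlen, dim = m.groups()
    if dim is None:
        return base, "scalar"
    if dim == "*":
        return base, "table"            # distinct base table, stored separately
    if dim == "":
        return base, "variable array"   # number of elements may differ per record
    return base, "fixed array"

def is_variable_length(attributes, relations):
    """True if a relation (list of attribute type strings) is variable length.

    relations maps other relation names to their attribute lists, so that
    record valued fields can be checked recursively.
    """
    variable = False
    for t in attributes:
        base, kind = classify(t)
        if kind == "table":
            continue                     # a subtable does not affect record length
        nested = base in relations and is_variable_length(relations[base], relations)
        if nested and kind != "scalar":
            raise ValueError("arrays of variable length records are not permitted: " + t)
        if kind == "variable array" or nested:
            variable = True
    return variable
```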
3.2.4 THE MAPPING STATEMENT
The DDL MAPPING statement is used to define a user record structure
(e.g., a dynamically allocated SPP binary data structure), and the
mapping of that structure onto a datafile relation (or base table, and
hence implicitly onto some relation). In the current implementation, a
mapping cannot map onto more than one relation, but there can be any
number of mappings of the same relation.
The syntax of the MAPPING statement is as follows.
MAPPING mapping_name {
field1 : spp_type, struct_offset, attribute [, flags]
...
fieldN : spp_type, struct_offset, attribute [, flags]
}
where FIELDN is the name by which the field will be known in
applications programs, SPP_TYPE is a SPP type name, e.g., "int",
"real", "real[6]", and so on, STRUCT_OFFSET is the zero-indexed struct
offset of the field in the SPP data structure, ATTRIBUTE is the name of
the attribute to which the field is to be mapped, and FLAGS tells DFIO
how to do the mapping, e.g., whether to abort or issue a warning
message if the field cannot be mapped.
Variable length arrays are implemented in SPP structures via a pointer
to a separately allocated array. This construct is indicated in a
mapping by the DFIO variable array declaration, plus a reference to the
LEN intrinsic function to identify the field to be used to hold the
length of the array, e.g.,
ncoeff: int, 0, len(coeff)
coeffp: real[], 1, coeff
The corresponding SPP structure should declare COEFFP as a pointer
variable in the main data structure. In a record retrieval operation,
DFIO will automatically allocate a buffer of the correct size and
initialize the NCOEFF and COEFFP fields of the SPP structure
accordingly. In a record insert or update operation, NCOEFF is
required to pass the length of the array pointed to by COEFFP.
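The bookkeeping for a variable length array field can be sketched by standing in for the SPP structure with a Python list indexed by struct offset, and for the pointer with a reference to a separately allocated list. The helper names fetch_varray and store_varray are hypothetical; the sketch only illustrates how the length field (NCOEFF) and the pointer field (COEFFP) cooperate on retrieval and on insert or update.

```python
def fetch_varray(stored, user_struct, len_off, ptr_off, attribute):
    """Retrieval: allocate a buffer of the right size for a variable length
    array attribute and set the length and pointer fields of the structure."""
    buf = list(stored[attribute])        # newly "allocated" buffer
    user_struct[len_off] = len(buf)      # e.g. NCOEFF
    user_struct[ptr_off] = buf           # e.g. COEFFP
    return user_struct

def store_varray(user_struct, len_off, ptr_off, attribute):
    """Insert or update: the length field says how many elements of the
    buffer referenced by the pointer field are to be written."""
    n, buf = user_struct[len_off], user_struct[ptr_off]
    if n > len(buf):
        raise ValueError("length field exceeds the size of the buffer")
    return {attribute: list(buf[:n])}
```

The buffer may be larger than the length field indicates; only the first NCOEFF elements are written back.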
An applications program may reference either a mapping of the data
table, or the table itself; in general the applications program does
not know the difference, although a mapping must always be used to map
stored records to and from SPP data structures. Since different
applications may have different mappings of the same relations, and the
relations themselves may change with time, a mapping is bound to a
relation at runtime when the data table is opened. A mapping is bound
to a relation by resolving the symbolic field references into physical
datatypes and offsets into the stored record, setting up a binary
transformation table to be used to perform efficient runtime
transformations of records between their internal and external
representations. This is the most efficient way to access entire
records, short of dealing directly with the packed record as stored in
the datafile (as is done by some system code, e.g., when implementing
record selection).
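The binding step can be sketched as building a small table of (struct offset, byte offset, conversion) triples once, then applying it to every record. This minimal sketch assumes scalar attributes only, packed without padding in big-endian order, and simply raises an error for an unmappable field (corresponding to the "abort" choice the FLAGS may select). The names bind and unpack_record are hypothetical, not DFIO procedures.

```python
import struct

# Assumed external formats for scalar primitive types (big-endian, no padding).
FORMATS = {"i2": ">h", "i4": ">i", "r4": ">f", "r8": ">d"}

def bind(mapping, relation):
    """Bind a mapping to a relation, returning a transformation table.

    relation: list of (attribute, primitive type) in stored field order.
    mapping:  list of (field, struct offset, attribute).
    Result:   list of (struct offset, byte offset in record, struct format).
    """
    layout, offset = {}, 0
    for attr, ptype in relation:
        fmt = FORMATS[ptype]
        layout[attr] = (offset, fmt)
        offset += struct.calcsize(fmt)   # packed record: no alignment gaps
    table = []
    for field, soff, attr in mapping:
        if attr not in layout:
            raise KeyError("cannot map field %s onto attribute %s" % (field, attr))
        boff, fmt = layout[attr]
        table.append((soff, boff, fmt))
    return table

def unpack_record(table, record, user_struct):
    """Apply a bound transformation table to one packed record."""
    for soff, boff, fmt in table:
        user_struct[soff] = struct.unpack_from(fmt, record, boff)[0]
    return user_struct
```

Because the symbolic names are resolved only once, at bind time, the per-record work reduces to a fixed sequence of unpack operations. Attributes absent from the mapping, and the order of attributes in the stored record, never reach the applications code.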
The MAPPING construct performs two important functions:
[1] The MAPPING construct contributes significantly to the degree
of data independence provided by DFIO. Applications code is
isolated from knowledge of the logical structure of the
datafile (specific record types and field names, field
ordering, etc.), as well as from the physical structure of the
datafile (external representation of the stored records and
binary datatypes). The logical structure of the datafile can
be changed without having to change applications programs which
access the datafile. Different applications programs (or
versions of the same applications program) can view the same
data in different ways.
[2] The MAPPING construct makes it possible for the applications
programmer to work directly with conventional compiled binary
data structures without having to consider how these are stored
in the datafile.
Although the MAPPING construct is part of the DDL, there is an important
difference between a mapping and the other DDL constructs. A mapping
is logically part of an applications program, whereas domains,
relations, and so on are fundamentally tied to the datafile. Hence,
while it is customary to compile the binary description of the data
definition for a datafile and store it in the datafile itself, it is
unwise to do so for mappings. The data definition for mappings may be
precompiled if desired, but the compiled (and source) mappings should
be stored with the applications program and not with the data. That
way, the mappings can be reliably updated whenever the applications
program is modified. Ideally the data definition should be mechanically
generated from the host language structure definitions at compile time.
3.2.5 THE SET STATEMENT
The DDL SET statement is used to set the default values of
parameters governing the physical configuration of the datafile. All
such parameters have DFIO defined defaults, but generally it will be
desirable to tune the datafile parameters according to the type of data
to be stored in the datafile, and the expected type of access. Note
that this is normally done by the programmer when they design the
datafile, rather than by the end user, who need not be aware of such
details.
SET parameter = value
The set of parameters currently defined includes at least the following
(more will doubtless be added as the interface evolves).
pagesize = 8192 The datafile page size in bytes. The page size
chosen must be large enough to store the
largest possible data record entirely within a
single page. The page size is fixed when the
datafile is created.
percentfull = 80 The percentage of the total space in a page to
be filled before starting a new page, when
writing to the datafile. Extra space is
desirable if frequent updates or insertions are
anticipated. If the datafile will be readonly
once created, the pages should be as full as
possible.
sbufsize = 1024 The default size of the string buffer portion
of the symbol table in bytes. At least this
much space will be reserved, but additional
space will automatically be allocated if
required.
symtabsize = 1024 The default size of the symbol table area in
bytes. At least this much space will be
reserved, but additional space will
automatically be allocated if required.
These parameters are discussed in more detail in section 5, which
describes the PHYSICAL SCHEMA of the datafile.
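The interaction of PAGESIZE and PERCENTFULL when records are written sequentially can be illustrated as follows. In this sketch (assign_pages is a hypothetical name, and page header overhead is ignored) a record is added to the current page only if the page stays within the fill limit. A record larger than the limit still gets a page to itself, while one larger than the whole page is rejected, since a record must fit entirely within a single page.

```python
def assign_pages(record_sizes, pagesize=8192, percentfull=80):
    """Assign records (given by size in bytes) to pages; return a list of
    pages, each a list of record numbers (hypothetical sketch)."""
    limit = pagesize * percentfull // 100
    pages, used = [], 0
    for i, size in enumerate(record_sizes):
        if size > pagesize:
            raise ValueError("record %d (%d bytes) exceeds pagesize" % (i, size))
        if not pages or used + size > limit:
            pages.append([])            # start a new page
            used = 0
        pages[-1].append(i)
        used += size
    return pages
```

With pagesize=1000 and percentfull=80, three 300 byte records occupy two pages; with percentfull=100 they fit on one, the tighter packing suited to a datafile which will be read only once created.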
3.3 EXAMPLE 1: DATA DEFINITION
A complete example of the data definition for a (somewhat
simplified) image header relation is shown below. Note that the data
records of the HISTORY relation will be variable length, and that the
history relation appears as an attribute of the IMHDR relation. The
WCS attribute is implemented as an opaque array, implying that all
management of the contents of the array, e.g., conversions to and from
the external packed format, are being handled at a higher level. This
has the advantage of isolating all knowledge of the WCS structure to
the high level package, but requires that specialized code be used to
access the data.
set pagesize= 2048
domain spptype type=char*8, format=%8s
domain posint type=short, format=%5d
# History record.
relation history {
time: stime, "Event time"
text: char[], "History text"
}
# Image header.
relation imhdr {
naxis, ndim: posint, "Number of axes"
axlen: long[7], "Axis lengths"
pixtype: spptype, "Pixel datatype"
title: char[], "Image title"
history: history[*], "Image history"
wcs: opaque[], "World Coordinates"
}
# Mapping of history record.
mapping m_history {
time: int, 0, time
text: char[256], 1, text
}
# Mapping of image header record.
mapping m_imhdr {
pixtype: int, 0, pixtype
ndim: int, 1, ndim
axlen: long[7], 2, axlen
title: char[130], 10, title
history: char[32], 140, history
}
Note that in the mapping of the history relation, a fixed size text
buffer is used for simplicity, since we don't have to conserve storage,
and we don't expect the history text in each record to be very large.
The same approach is taken for the title string in the image header
mapping. Since the history is stored as a subtable, only the name of
the subtable appears in the parent record, hence in the header mapping
we need only provide a simple static char buffer to receive the name of
the history subtable. The WCS array is not mapped since we may not
need it, and can easily fetch it separately.
4. DFIO APPLICATIONS INTERFACE
The DFIO applications interface, if we ignore the data definition
language which is rarely used in applications programs, is a PROCEDURAL
INTERFACE emphasizing data tables and the retrieval, insertion,
modification, and deletion of records in tables.
4.1 DFIO PROCEDURES
The DFIO interface procedures are logically subdivided into the
following sets of procedures:
GENERAL DATAFILE MANAGEMENT PROCEDURES
Open and close, delete, rename, etc., a datafile.
DATA DEFINITION PROCEDURES
These procedures are used to define the objects to be stored in
the datafile, and the mapping of those objects to and from the
corresponding objects in an applications program. This must be
done before any data can be stored in the datafile. The data
definition is itself stored in the datafile, hence the datafile
is self describing.
TABLE ACCESS PROCEDURES
These procedures make use of the access method procedures to
fetch, insert, update, and delete packed data records as
physically stored in the datafile, using the data definitions
to encode and decode the stored records, and map the records to
and from the representation used by the applications program.
The primary functions of this interface are to pack and unpack
data records, and to provide applications programs with ready
access to these records.
ACCESS METHOD PROCEDURES
The access method procedures have sole responsibility for
managing the area of the datafile used to store data tables.
At this level a data record is just an opaque byte stream. The
access method is responsible for creating and deleting data
tables, maintaining any indexes defined on a table, and
fetching, inserting, updating, or deleting packed data records
in base tables.
The data definition procedures are rarely used directly by applications,
which are expected to rely primarily upon the DDL for data definition.
DFIO makes use of the data definition procedures when it compiles a data
definition expressed in the DDL. Likewise, applications programs should
rarely need to call the access method procedures directly; the higher
level, more data independent table access procedures are used instead.
The main exception to this is when it is necessary to perform record
selection on a table as efficiently as possible, as is sometimes done in
system code (e.g., the POE image kernel will use the access method
directly when selecting photon event records from the photon list
table).
4.1.1 GENERAL DATAFILE MANAGEMENT PROCEDURES
These are the usual top level package initialization, cleanup, and
query procedures, used to open an existing database, create a new one,
and so on.
df = df_open (datafile, mode)
df_close (df)
df_copy (df_in, df_out, what)
df_rename (old_datafile, new_datafile)
df_recover (datafile, out_fd)
df_destroy (datafile)
df_seti (df, param, ival)
ival = df_stati (df, param)
df_info (df, out_fd, what)
df_sync (df)
New datafiles are always empty when created. The data definition
procedures described in the next section must be used to define the
data elements to be stored in the datafile, after which the table
access procedures may be used to create new tables and insert data
records into them.
4.1.2 DATA DEFINITION PROCEDURES
The data definition procedures are used to define the data objects
(domains and relations) to be stored in the database. These procedures
are called either directly in user code, or indirectly during the
compilation of a data definition text file, i.e., with DF_COMPILE. The
DF_INHERIT procedure is used to load a precompiled data definition into
the datafile, e.g., to make a new version of an existing datafile.
df_compile (df, ddf_fname) # compile ddf
df_compileo (df, fd) # compile stream
df_inherit (df, old_df) # inherit ddf
df_setmapdf (df, mf) # set map ddf
The DF_SETMAPDF function is used to identify the mapping data
definitions to be used by the table access procedures. This defines a
second data dictionary (data definition symbol table) to be searched
BEFORE the datafile data dictionary is searched. Note that the
argument MF is a datafile descriptor, hence a separate MAPPING DATAFILE
is used to store the mappings. This is a convenient way to implement a
precompiled, efficient binary representation of the mapping definitions,
yet allows the mappings to be stored separately from the data, allowing
multiple mappings of the same data to exist. The mapping datafile may
be kept open and used to access a sequence of "data" datafiles, or a
single mapping datafile may be shared by several simultaneously open
datafiles.
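The search order established by DF_SETMAPDF behaves like a chained lookup, which Python provides directly as collections.ChainMap. In this sketch the two data dictionaries are modeled as plain dicts mapping object names to definitions (an assumption; the real dictionaries are binary symbol tables), and the function name dictionary_view is hypothetical.

```python
from collections import ChainMap

def dictionary_view(datafile_dd, mapping_dd=None):
    """Combine a datafile's data dictionary with an optional mapping dictionary.

    Lookups search mapping_dd first, then datafile_dd, mirroring the search
    order described for DF_SETMAPDF.  The same mapping_dd object may be
    shared by the views of several open datafiles.
    """
    if mapping_dd is None:
        return ChainMap(datafile_dd)
    return ChainMap(mapping_dd, datafile_dd)
```

Since ChainMap holds references rather than copies, one mapping dictionary shared by two views serves both datafiles, and each view still resolves relation names against its own datafile.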
The DFD and DFR procedures are used to define new domains or relations,
alter existing domain or relation definitions, or examine existing
definitions, including traversing the lists of existing domain and
relation definitions using the NEXT functions. The OPENTB functions are
used to access the domain or relation descriptors for existing base
tables or fields, without having to know the relation or domain type
upon which the table or field is defined.
dp = dfd_open (df, domain, mode) # domains
dp = dfd_opentb (tb, field, mode)
dp = dfd_next (dp)
dfd_close (dp)
dfd_set[is] (dp, param, [is]val)
ival = dfd_stati (dp, param)
nch = dfd_stats (dp, param, outstr, maxch)
rp = dfr_open (df, relation, mode) # relations
rp = dfr_newcopy (df, relation, old_relation)
rp = dfr_opentb (tb, mode)
rp = dfr_next (rp)
dfr_close (rp)
attno= dfr_addattribute (rp, attribute)
nch = dfr_setattribute (rp, attno, attribute, maxch)
dfr_set[is] (rp, param, [is]val)
ival = dfr_stati (rp, param)
nch = dfr_stats (rp, param, outstr, maxch)
For example, to define a new DOMAIN, one opens a new domain with
DFD_OPEN, then sets the attributes of the domain with the DFD_SET
functions. Setting the domain type to the name of another domain causes
the attributes to be initialized to the values defined for the other
domain. Domain attributes not explicitly initialized retain their
default values. To examine an existing domain one opens it read only
by name, then uses the DFD_STAT functions to query the attributes. If
the domain name is the null string the most recently defined domain is
opened, and the DFD_NEXT function may be used to traverse the list of
currently defined domains.
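The traversal of the domain list can be modeled with a cursor over domains kept in definition order. The sketch assumes that DFD_NEXT steps from newer to older definitions, starting from the most recently defined domain when the null string is given; the class DomainList and its methods are hypothetical.

```python
class DomainList:
    """Hypothetical model of the list of domain definitions in a datafile."""

    def __init__(self):
        self.names = []   # domain names in order of definition
        self.attrs = {}   # domain name -> attribute dict

    def define(self, name, **attrs):
        if name not in self.attrs:
            self.names.append(name)
        self.attrs[name] = attrs

    def open(self, name):
        """Return a cursor for a domain; "" selects the most recent one."""
        if name == "":
            return len(self.names) - 1 if self.names else None
        return self.names.index(name)     # ValueError if not defined

    def next(self, cursor):
        """Advance to the next older domain; None marks the end of the list."""
        if cursor is None or cursor == 0:
            return None
        return cursor - 1
```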
The procedure for defining or querying RELATION definitions is similar,
except that a relation definition has an extra dimension - a relation
is a list of attributes, and each attribute is defined by a set of
attributes of its own. A new attribute definition is begun with
DFR_ADDATTRIBUTE, then the DFR_SET functions are used to set the
attributes of the attribute. The procedure for ALTERING an existing
relation definition by adding a new attribute is the same. The
DFR_NEWCOPY function makes a copy of an existing relation definition;
this is used when a new data table is created, to give that table its
own private relation definition (necessary if the table is to
subsequently be altered by the addition of a new field).
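A relation definition, being a growable list of attributes, and the effect of DFR_NEWCOPY can be sketched as follows. This is a minimal sketch under assumptions: the types and the functions `relation_addattribute` and `relation_newcopy` are hypothetical, and attribute numbers are assumed to be 1-based. The point illustrated is that the copy is deep, so adding a field to a table's private relation leaves the shared original untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute and relation descriptors (not the DFIO layout). */
struct attribute {
    char name[32];
    char domain[32];
};

struct relation {
    char name[32];
    int nattr;
    int maxattr;
    struct attribute *attr;
};

/* Append an attribute; returns its 1-based attribute number, or 0 on error. */
int relation_addattribute(struct relation *rp, const char *name, const char *domain)
{
    assert(rp != NULL);
    if (rp->nattr == rp->maxattr) {
        int newmax = rp->maxattr ? 2 * rp->maxattr : 4;
        struct attribute *p = realloc(rp->attr, (size_t)newmax * sizeof *p);
        if (p == NULL)
            return 0;
        rp->attr = p;
        rp->maxattr = newmax;
    }
    struct attribute *ap = &rp->attr[rp->nattr];
    snprintf(ap->name, sizeof ap->name, "%s", name);
    snprintf(ap->domain, sizeof ap->domain, "%s", domain);
    return ++rp->nattr;
}

/* Deep-copy OLD under a new name, so the copy can later be altered
 * without affecting other tables that share OLD. */
struct relation *relation_newcopy(const struct relation *old, const char *name)
{
    struct relation *rp = calloc(1, sizeof *rp);
    if (rp == NULL)
        return NULL;
    snprintf(rp->name, sizeof rp->name, "%s", name);
    for (int i = 0; i < old->nattr; i++) {
        if (relation_addattribute(rp, old->attr[i].name, old->attr[i].domain) == 0) {
            free(rp->attr);
            free(rp);
            return NULL;
        }
    }
    return rp;
}
```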
mp = dfm_open (df, mapping_name, mode)
dfm_mapuser (mp, name, type, offset, df_name, flags)
mp = dfm_next (mp)
dfm_close (mp)
attno= dfm_addattribute (mp, attribute)
nch = dfm_setattribute (mp, attno, attribute, maxch)
dfm_set[is] (mp, param, [is]val)
ival = dfm_stati (mp, param)
nch = dfm_stats (mp, param, outstr, maxch)
The DFM procedures are used to implement MAPPINGS, i.e., to define the
binary record structures used by an applications program, and the
mapping of these structures onto the attributes of relations defined in
the datafile; the relation definitions then define the mapping of the
relation onto the physical record format as stored in the datafile. A
user structure may only be mapped to a single datafile relation, i.e.,
a mapping cannot access fields from several different tables
simultaneously. There can however be any number of different mappings
of user structures onto a single datafile relation.
A general set of functions similar to those used to define relations is
provided for use by the DDL compiler, plus the function DFM_MAPUSER
which is provided to make runtime mapping definition in applications
programs more convenient and concise. Multiple calls to DFM_MAPUSER
are made, one for each field in the mapping being defined. The TYPE
argument specifies the SPP datatype of the struct field as a string,
e.g., "r". Arrays are indicated symbolically, e.g., "r[5]" for a
static array, or "r[]" for a dynamically allocated (variable length)
array. Anything else is the name of a previously defined user
structure, allowing nested substructures to be defined.
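The TYPE strings accepted by DFM_MAPUSER follow a small grammar: a single SPP type code, a code followed by a bracketed element count, a code followed by empty brackets, or otherwise a structure name. A possible classifier is sketched below in C. The set of type codes (`bcsilrdx`), the enum names, and the function `parse_fieldtype` are assumptions made for illustration, not part of the interface.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Classification of a DFM_MAPUSER-style TYPE string (hypothetical names). */
enum fieldkind { KIND_SCALAR, KIND_STATIC_ARRAY, KIND_DYNAMIC_ARRAY, KIND_STRUCT, KIND_BAD };

struct fieldtype {
    enum fieldkind kind;
    char base;      /* SPP type code for scalar/array kinds, e.g. 'r' */
    int nelem;      /* element count for static arrays, else 0 */
};

/* SPP datatype codes assumed here: b c s i l r d x. */
static int is_spp_type(char c)
{
    return c != '\0' && strchr("bcsilrdx", c) != NULL;
}

struct fieldtype parse_fieldtype(const char *type)
{
    assert(type != NULL);
    struct fieldtype ft = { KIND_BAD, '\0', 0 };
    size_t len = strlen(type);

    if (len == 1 && is_spp_type(type[0])) {                 /* "r" */
        ft.kind = KIND_SCALAR;
        ft.base = type[0];
    } else if (len >= 3 && is_spp_type(type[0]) &&
               type[1] == '[' && type[len - 1] == ']') {
        if (len == 3) {                                     /* "r[]" */
            ft.kind = KIND_DYNAMIC_ARRAY;
            ft.base = type[0];
        } else if (isdigit((unsigned char)type[2])) {       /* "r[5]" */
            char *end;
            long n = strtol(type + 2, &end, 10);
            if (end == type + len - 1 && n > 0 && n <= INT_MAX) {
                ft.kind = KIND_STATIC_ARRAY;
                ft.base = type[0];
                ft.nelem = (int)n;
            }
        }
    } else if (len > 0) {                                   /* struct name */
        ft.kind = KIND_STRUCT;
    }
    return ft;
}
```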
Note that the name of the relation or data table onto which the user
structure is to be mapped is not specified until the mapping is bound
to a specific relation at runtime in the DF_OPENTB (open table) call.
Obviously the attribute names DF_NAME specified when the mapping is
defined must match those of the target relation, and the implied type
conversions must be legal, but as long as these requirements are met
the mapping may be bound to any suitable relation.
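The binding rule above (every DF_NAME must exist in the target relation, and each implied conversion must be legal) amounts to a check performed when the mapping is bound in DF_OPENTB. The sketch below assumes a deliberately simple conversion rule, numeric to numeric and character to character only; the real rules are not specified here, and `check_binding` and its descriptor structs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified descriptors used only for this illustration. */
struct rel_attr { const char *name; char type; };      /* relation attribute */
struct map_field { const char *df_name; char type; };  /* user mapping field */

static int is_numeric(char t) { return t != '\0' && strchr("silrd", t) != NULL; }

/* Assumed conversion rule: numeric types convert freely among themselves,
 * and character data maps only to character data. */
static int conversion_ok(char from, char to)
{
    return (is_numeric(from) && is_numeric(to)) || (from == 'c' && to == 'c');
}

/* Return the index of the first mapping field that cannot be bound to REL,
 * or -1 if every field names an existing attribute with a legal conversion. */
int check_binding(const struct map_field *map, size_t nmap,
                  const struct rel_attr *rel, size_t nrel)
{
    assert(map != NULL || nmap == 0);
    for (size_t i = 0; i < nmap; i++) {
        int found = 0;   /* 0 = no such attribute, 1 = ok, -1 = bad type */
        for (size_t j = 0; j < nrel && found == 0; j++)
            if (strcmp(map[i].df_name, rel[j].name) == 0)
                found = conversion_ok(rel[j].type, map[i].type) ? 1 : -1;
        if (found != 1)
            return (int)i;
    }
    return -1;
}
```

Because the check depends only on attribute names and types, the same mapping passes for any relation that supplies those attributes, which is what allows it to be bound to "any suitable relation".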
4.1.3 TABLE ACCESS PROCEDURES
This portion of the interface combines knowledge of the data
definition subsystem with the facilities provided by the raw file
access subsystem to provide an applications-level interface for record
storage. The main functions of this code are to manage the *contents*
of the data records, and to transform data between the internal,
machine-independent packed representation and the representation seen
by the applications program.
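One common way to obtain a machine-independent representation is to store every value in a fixed byte order and a fixed floating-point format. The packed format actually used by DFIO is not described in this section; the C sketch below merely illustrates the idea, assuming big-endian byte order and a host `float` that is IEEE 754 binary32. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store 32 bits most significant byte first, independent of host order. */
static void pack_u32(unsigned char *buf, uint32_t u)
{
    buf[0] = (unsigned char)(u >> 24);
    buf[1] = (unsigned char)(u >> 16);
    buf[2] = (unsigned char)(u >> 8);
    buf[3] = (unsigned char)u;
}

static uint32_t unpack_u32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

void pack_i32(unsigned char *buf, int32_t v)
{
    pack_u32(buf, (uint32_t)v);   /* two's complement bit pattern */
}

int32_t unpack_i32(const unsigned char *buf)
{
    uint32_t u = unpack_u32(buf);
    /* Map back to a signed value without relying on an out-of-range cast. */
    return u <= INT32_MAX ? (int32_t)u : -(int32_t)(UINT32_MAX - u) - 1;
}

/* Assumes the host float is IEEE 754 binary32. */
void pack_r32(unsigned char *buf, float x)
{
    uint32_t u;
    assert(sizeof x == sizeof u);
    memcpy(&u, &x, sizeof u);
    pack_u32(buf, u);
}

float unpack_r32(const unsigned char *buf)
{
    uint32_t u = unpack_u32(buf);
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}
```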
The data definition procedures must have been used prior to calling
these routines to define the datafile and user data formats (relations
and domains), and define the mappings used to establish a
transformation between the fields of the user record and those of the
base table. The user sees only conventional, dynamically allocated
binary data structures, indicated as URECPTR in the figure. The
argument UREC_TYPE is the name of the mapping to be used; if this is
null the data table is accessed directly.
df_droptb (df, table)
df_renametb (df, old_name, new_name)
tb = df_createtb (df, table, urec_type, keyfields, flags)
tb = df_opentb (df, table, urec_type, mode)
nrecords = df_tlen (tb)
df_closetb (tb)
df_setkey (tb, keyfields) # select index
nkeys = df_getkey (tb, keyfields, maxch) # get keylist string
urecptr = df_mkurec (tb, urecptr) # make/copy urec
df_rmurec (tb, urecptr) # deallocate urec
df_seek (tb, urecptr, keylen, flag)
df_seek[cird] (tb, key[cird], flag)
df_fetch[cird] (tb, key[cird], urecptr, flag)
rid|EOF = df_fetch (tb, urecptr, flag)
df_insert (tb, urecptr)
df_update (tb, urecptr) # update record by key
df_upcur (tb, urecptr) # update current record
df_uprec (tb, rid, urecptr) # update by RID
df_delete (tb, urecptr) # composite key
df_delete[cird] (tb, key[cird]) # simple key
df_dlcur (tb) # delete current record
df_dlrec (tb, rid) # delete by RID
In a fetch operation the user defined data structure is allocated and
filled in by DFIO, to be reused in subsequent fetches, and deallocated
when the table is closed. A private copy of the structure may be
generated with DF_MKUREC if desired. In an insert, update, or delete
operation the structure pointed to by URECPTR is provided by the user;
the structure returned by the fetch operation may be passed in if
desired.
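The buffer ownership rule for fetches (one table-owned buffer that each fetch overwrites, with DF_MKUREC producing a copy the caller owns) can be illustrated with a toy table. Everything in this C sketch, including `table_fetch_next`, `table_mkurec`, and the in-memory `rows` array standing in for stored records, is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user record and table handle. */
struct urec { int id; float x1, x2; };

struct table {
    const struct urec *rows;  /* stand-in for the stored records */
    int nrows;
    int cur;                  /* current record pointer */
    struct urec *buf;         /* single fetch buffer owned by the table */
};

/* Fetch the next record into the table's own buffer and return a pointer
 * to it; the same buffer is overwritten by every subsequent fetch. */
struct urec *table_fetch_next(struct table *tb)
{
    assert(tb != NULL);
    if (tb->cur >= tb->nrows)
        return NULL;
    if (tb->buf == NULL && (tb->buf = malloc(sizeof *tb->buf)) == NULL)
        return NULL;
    *tb->buf = tb->rows[tb->cur++];
    return tb->buf;
}

/* Make a private copy that the caller owns and must free (cf. DF_MKUREC). */
struct urec *table_mkurec(const struct urec *src)
{
    struct urec *p = malloc(sizeof *p);
    if (p != NULL)
        memcpy(p, src, sizeof *p);
    return p;
}

/* Closing the table frees the shared fetch buffer. */
void table_close(struct table *tb)
{
    free(tb->buf);
    tb->buf = NULL;
}
```

A caller that needs a fetched record to survive the next fetch must therefore copy it first; any pointer returned by a fetch is invalid once the table is closed.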
Access may be either random or sequential. If the table has an index
then a key value may be specified in a DF_SEEK call to position the
CURRENT RECORD POINTER to the desired record. A DF_FETCH call with FLAG
set to DF_CURRENT will fetch the desired record. In the case of a
simple integer, real, or string valued key the DF_FETCH[CIRD] call may
be used to search the index and fetch the record all in one call.
Subsequent fetches with FLAG set to DF_NEXT will fetch records
sequentially. If the internal record id (RID) is known, DF_FETCHI can
be used with FLAG set to DF_RECORD to fetch the indicated record
directly without searching the index.
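Positioning the current record pointer with a keyed seek and then fetching with DF_CURRENT or DF_NEXT semantics can be modeled with a binary search over records held in key order. This C sketch is an assumption-laden illustration: the names `itable_seek`, `itable_fetch`, `FETCH_CURRENT`, and `FETCH_NEXT` are hypothetical, the key is a plain integer, and a seek is assumed to position at the first record whose key is greater than or equal to the one requested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical indexed table: records held in ascending order of an
 * integer primary key, plus a current record pointer. */
struct rec { int key; const char *data; };
struct itable { const struct rec *recs; int n; int cur; };

enum { FETCH_CURRENT, FETCH_NEXT };

/* Position the current record pointer at the first record whose key is
 * >= KEY (binary search); returns 1 if an exact match was found. */
int itable_seek(struct itable *tb, int key)
{
    assert(tb != NULL);
    int lo = 0, hi = tb->n;            /* search the half-open range [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (tb->recs[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    tb->cur = lo;
    return lo < tb->n && tb->recs[lo].key == key;
}

/* FETCH_CURRENT returns the record at the pointer; FETCH_NEXT advances
 * the pointer first.  Returns NULL when positioned past the end. */
const struct rec *itable_fetch(struct itable *tb, int flag)
{
    if (flag == FETCH_NEXT && tb->cur < tb->n)
        tb->cur++;
    return (tb->cur >= 0 && tb->cur < tb->n) ? &tb->recs[tb->cur] : NULL;
}
```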
The record INSERT and UPDATE operations are slightly different because
if the table has an index, the primary key value determines the point
at which the record must be inserted or updated in the table (indexed
tables are maintained in sort order based on the primary key). Of
course, if the key values increase monotonically as records are
inserted, then the new records will simply be appended to the table;
this is the most efficient insertion mode. If the table has no index,
new records are always inserted at the end of the file. In a
non-indexed table, updates may occur at any point, as specified by the
RID or the current record pointer. Deletion is similar.
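Keeping an indexed table in primary-key order, and the reason monotonically increasing keys are the cheapest case, can be seen in a simple sorted-array model: an in-order key is appended without moving anything, while an out-of-order key forces later records to shift. The fixed capacity and the names below are hypothetical; a real datafile would use an index structure rather than a flat array.

```c
#include <assert.h>
#include <string.h>

#define MAXREC 100   /* arbitrary capacity for the sketch */

/* Hypothetical indexed table of integer primary keys kept in sort order. */
struct keytable { int keys[MAXREC]; int n; int appends; };

/* Insert KEY, preserving ascending order.  When KEY is not less than the
 * last key (monotonically increasing input) the record is simply appended,
 * which avoids moving any existing records.  Returns the insert position,
 * or -1 if the table is full. */
int keytable_insert(struct keytable *tb, int key)
{
    assert(tb != NULL);
    if (tb->n >= MAXREC)
        return -1;
    if (tb->n == 0 || key >= tb->keys[tb->n - 1]) {   /* fast path: append */
        tb->keys[tb->n] = key;
        tb->appends++;
        return tb->n++;
    }
    int lo = 0, hi = tb->n;          /* find first position with key > KEY */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (tb->keys[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* Shift the tail up one slot to open a gap at LO. */
    memmove(&tb->keys[lo + 1], &tb->keys[lo],
            (size_t)(tb->n - lo) * sizeof tb->keys[0]);
    tb->keys[lo] = key;
    tb->n++;
    return lo;
}
```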
rp = df_access[cird] (tb, key[cird], flag) # get record by key
rp|NULL = df_access (tb, flag) # get next record
df_release (rp, flag) # free+ins|upd|del
val = df_get[silrd] (rp, field) # access fields
df_put[silrd] (rp, field, val)
nch = df_gstr (rp, field, outstr, maxch)
df_pstr (rp, field, sval)
df_[gp]rec (rp, field, urecptr, urec_type)
nelem = df_flen (rp, field)
nelem = df_gvec[silrd] (rp, field, vector, first, maxelem)
df_[pu]vec[silrd] (rp, field, vector, first, nelem)
A second set of functions, shown above, is provided for operating
piecemeal upon data records without having to deal with binary data
structures in user code. These functions operate directly upon the
individual fields of a record. The record to be accessed is loaded
into an internal buffer by DF_ACCESS, and the record is updated or
inserted (if modified) and the buffer freed by a subsequent call to
DF_RELEASE, after accessing the record with the GET and PUT calls
shown. If the table has a simple key, or access is by RID, the
DF_ACCESS[CIRD] routines may be used to directly fetch the record to be
accessed, otherwise a DF_SEEK call must be made first to position the
current record pointer. Sequential access is indicated by calling
DF_ACCESS with FLAG set to DF_NEXT. Multiple records may be
simultaneously accessed by making multiple ACCESS calls without
RELEASEing the accessed records.
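The DF_ACCESS / DF_RELEASE protocol (copy a record into a private buffer, get and put fields by name, write back on release only if something changed) is sketched below. The record layout, the field names "x1" and "x2", and the functions `rec_access`, `rec_getr`, `rec_putr`, and `rec_release` are hypothetical stand-ins for the DFIO calls, chosen only to show the modified-flag logic and that several records may be accessed at once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a stored record with two real-valued fields. */
struct stored { float x1, x2; };

/* Hypothetical access descriptor: a working copy plus a modified flag. */
struct access {
    struct stored *home;   /* where the record lives in the table */
    struct stored buf;     /* working copy */
    int modified;
};

static float *field_addr(struct stored *s, const char *field)
{
    if (strcmp(field, "x1") == 0) return &s->x1;
    if (strcmp(field, "x2") == 0) return &s->x2;
    return NULL;
}

/* Load the record into a private buffer. */
struct access *rec_access(struct stored *home)
{
    struct access *rp = malloc(sizeof *rp);
    if (rp == NULL)
        return NULL;
    rp->home = home;
    rp->buf = *home;
    rp->modified = 0;
    return rp;
}

float rec_getr(struct access *rp, const char *field)
{
    float *p = field_addr(&rp->buf, field);
    assert(p != NULL);                  /* unknown field name */
    return *p;
}

void rec_putr(struct access *rp, const char *field, float val)
{
    float *p = field_addr(&rp->buf, field);
    assert(p != NULL);
    *p = val;
    rp->modified = 1;                   /* remember to write back */
}

/* Write the buffer back if modified, then free it; returns 1 if written. */
int rec_release(struct access *rp)
{
    int written = rp->modified;
    if (written)
        *rp->home = rp->buf;
    free(rp);
    return written;
}
```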
SUBTABLES may be accessed by obtaining the table name from the parent
record and opening the table independently with DF_OPENTB; if a record
has a subtable, the subtable field is string valued and contains only
the name of the subtable. Records stored directly within the parent
record may be accessed with the DF_[GP]REC functions, or mapped
directly onto the user structure with a mapping. The individual fields
of an embedded record may also be accessed directly by name in a put or
get call, e.g., "cvfit.ncoeff", if CVFIT is a record embedded within
the record currently being accessed.
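Resolving a qualified name such as "cvfit.ncoeff" reduces to splitting off the leading record name and looking the remainder up within that embedded record, repeating for deeper nesting. The helper `split_field` below is a hypothetical sketch of only the splitting step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a qualified field name such as "cvfit.ncoeff" into the name of the
 * embedded record ("cvfit") and the remaining field path ("ncoeff").
 * Returns 1 if the name was qualified, 0 for a plain field name. */
int split_field(const char *name, char *rec, size_t recsz, const char **rest)
{
    assert(name != NULL && rec != NULL && recsz > 0 && rest != NULL);
    const char *dot = strchr(name, '.');
    if (dot == NULL) {
        rec[0] = '\0';            /* field of the record being accessed */
        *rest = name;
        return 0;
    }
    size_t len = (size_t)(dot - name);
    if (len >= recsz)
        len = recsz - 1;          /* truncate an overly long record name */
    memcpy(rec, name, len);
    rec[len] = '\0';
    *rest = dot + 1;              /* may itself be qualified, e.g. "b.c" */
    return 1;
}
```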
VECTOR FIELDS of arbitrary size may be read or written with the
DF_[GPU]VEC calls. When writing to a stored vector, DF_PVEC may change
the length of the stored vector: a put may make it shorter, and either a
put or an update may be used to make it longer. The DF_UVEC call
(update vector) should be used to rewrite segments of an existing stored
vector without changing its length. The length of a vector may be
queried with DF_FLEN. Variable-length vectors have zero length until
they are first written.
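The length rules for vector fields can be captured in a few lines, under the interpretation assumed here: a put sets the stored length to the end of the segment written (so it may shorten or lengthen the vector), an update extends the vector if it writes past the end but never shortens it, and elements are indexed from 0 as in C rather than from 1 as in SPP. The `vecfield` type and the `vec_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stored variable-length vector of reals. */
struct vecfield { float *data; int len; };

/* Copy NELEM values to position FIRST, growing storage if needed.
 * Returns the end of the written segment, or -1 on error. */
static int vec_store(struct vecfield *v, const float *in, int first, int nelem)
{
    if (first < 0 || nelem < 0 || first > v->len)
        return -1;                      /* no holes allowed in this sketch */
    int end = first + nelem;
    if (end > v->len) {
        float *p = realloc(v->data, (size_t)end * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
    }
    if (nelem > 0)
        memcpy(v->data + first, in, (size_t)nelem * sizeof *in);
    return end;
}

int vec_flen(const struct vecfield *v) { return v->len; }

/* Put: the written segment defines the new length (may shrink). */
int vec_put(struct vecfield *v, const float *in, int first, int nelem)
{
    int end = vec_store(v, in, first, nelem);
    if (end >= 0)
        v->len = end;
    return end;
}

/* Update: rewrite in place; only ever extends the length. */
int vec_update(struct vecfield *v, const float *in, int first, int nelem)
{
    int end = vec_store(v, in, first, nelem);
    if (end > v->len)
        v->len = end;
    return end;
}

/* Get: copy at most MAXELEM values starting at FIRST; returns the count. */
int vec_get(const struct vecfield *v, float *out, int first, int maxelem)
{
    if (first < 0 || first >= v->len || maxelem <= 0)
        return 0;
    int n = v->len - first < maxelem ? v->len - first : maxelem;
    memcpy(out, v->data + first, (size_t)n * sizeof *out);
    return n;
}
```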
4.2 EXAMPLE 2: SIMPLE TABLE ACCESS
The following examples are based upon the simple data and user
structure definitions below, assumed to be stored in the files
"pkg$curve_d.ddf" and "pkg$curve_m.ddf". These could just as well be
entries in a larger DDF containing definitions for a number of other
objects. The code has been simplified for clarity.
# CURVE -- Relation used to describe a simple 1-Dim fitted curve.
relation curve {
recno: short, "Record number"
functype: char*8, "Fitting function"
x1: real, "Minimum input X value"
x2: real, "Maximum input X value"
coeff: real[], "Coefficient array"
}
# CVFIT -- Applications mapping of the curve relation.
mapping cvfit {
id: int, 0, recno
x1: real, 1, x1
x2: real, 2, x2
ncoeff: int, 3, len(coeff)
coeffp: real[], 4, coeff
function: char[9], 10, functype
}
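For readers more familiar with C, the CURVE record and the CVFIT user structure might be rendered as the two structs below, with `map_cvfit` applying the mapping by hand: `id` is widened from the short `recno`, `ncoeff` is derived from the stored length of `coeff`, and the eight-character `functype` is copied into a nine-element, NUL-terminated buffer. This is a hypothetical sketch: the C layout does not reproduce the struct offsets given in the mapping, and the trimming of blank padding is an assumption about how `char*8` values are stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C rendering of the stored CURVE record. */
struct curve_rec {
    short recno;
    char functype[8];        /* char*8: fixed width, not NUL-terminated */
    float x1, x2;
    const float *coeff;      /* variable-length vector */
    int coeff_len;
};

/* Hypothetical C rendering of the CVFIT user structure. */
struct cvfit {
    int id;
    float x1, x2;
    int ncoeff;
    float *coeffp;
    char function[9];        /* 8 characters plus terminating NUL */
};

/* Apply the CVFIT mapping: each user field is filled from the relation
 * attribute named in the mapping, with the implied type conversions. */
int map_cvfit(const struct curve_rec *r, struct cvfit *u)
{
    assert(r != NULL && u != NULL);
    u->id = r->recno;                        /* short -> int */
    u->x1 = r->x1;
    u->x2 = r->x2;
    u->ncoeff = r->coeff_len;                /* len(coeff) */
    u->coeffp = NULL;
    if (r->coeff_len > 0) {
        u->coeffp = malloc((size_t)r->coeff_len * sizeof *u->coeffp);
        if (u->coeffp == NULL)
            return -1;
        memcpy(u->coeffp, r->coeff, (size_t)r->coeff_len * sizeof *u->coeffp);
    }
    memcpy(u->function, r->functype, sizeof r->functype);
    u->function[sizeof r->functype] = '\0';
    /* Trim trailing blank padding (assumed storage convention). */
    for (int i = (int)sizeof r->functype - 1; i >= 0 && u->function[i] == ' '; i--)
        u->function[i] = '\0';
    return 0;
}
```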
The CURVE relation defines the record structure of a curve record as