Skip to content
43615 edited this page Dec 19, 2023 · 308 revisions

Full reference manual

This page details the behavior of all available language features. For command line options, run dcim -h.

This is not a formal binding language specification. Other implementations of dc:im may have different technical restrictions and quirks but should maintain the original intent if reasonably possible.

Optional notes and musings are in italic.

Look here for advanced examples.

Introduction, core ideas and working principles

  • dc:im (or just dcim) is an improved rewrite/successor of GNU dc, itself based on a "desk calculator" program for the PDP-11 from around 1970-72 (predating C!).
    • This project was born out of amazement by the elegance of dc (and RPN in general) and frustration about dc's very apparent limitations.
  • English pronunciations: dee-see-im, dee-sim, de-sim (like "decimal", pure coincidence)
  • It's a purely procedural, stack-based interpreted language focused on arithmetic and string operations. Applications range from a powerful interactive RPN calculator to a compact algorithm sketchbook.
  • Most (technically all) commands are single ASCII characters, which results in highly terse scripts.
  • The main working area is a (theoretically unlimited) stack of objects. New objects are pushed to the top/end of the stack, commands pop as many as they need and push their results.
  • All objects are either numbers or strings. Some commands can work with both.
  • Objects may also be stored in numbered registers in several different ways.
  • Strings consisting of dc:im commands may be executed as macros. Together with the conditional execution commands, this makes dc:im a Turing-complete language in which all code is just data.

Technical details

  • Despite being an interpreted language, this implementation is quite fast due to being completely linear (no lookaround). Macros are effectively esoteric bytecode.
    • The main parsing loop processes all commands as individual chars. The object input features enter command-consuming loops of their own, ! sets a flag that is reset after the next command.
  • Numbers are stored in a binary floating-point format with variable precision, based on IEEE 754:
    • (signed mantissa 0.5≤|m|<1 with length W) * 2 ^ (exponent in range ±1'073'741'823)
  • This corresponds to an allowed OoM range of ≈10^±323'228'496. Numbers that fall outside this range become 0 or ±∞ respectively. There is also a special "Not a number" value, generated when performing computations on ±∞. NaN and ±∞ may cause unhandled issues, commands that need integers implicitly convert them to 0.
  • Negative zero also exists for compliance with IEEE 754. It has no practical differences from positive zero.
  • For accurate storage of integers, W can be considered the maximum length in binary. W=16 is unsigned short, W=32 is unsigned int and so on.
  • Strings are stored in the UTF-8 format internally, but manipulation commands index them by characters.
  • Nested macro execution is accomplished without "real" recursion and is only limited by how many macro strings can be stored in (heap) memory. Tail calls of macros (...x]/...=a]) are optimized, looping macros have constant memory usage.
  • Registers are stored in a hashmap with bigint indices, and are allocated when first accessed.

Performance/complexity analysis

  • The baseline command parsing loop is perfectly linear.
  • Generally, anything that processes numbers or strings is proportional to the input length.
  • Anything involving extending or reshaping arrays (D, R, !:...) is O(n), no shallow copies. Truncations (C, !;, Q...) are O(1).
  • I can't tell you a lot about the performance of the mathematical operations, check the documentation of GNU GMP (integers, mpz) and GNU MPFR (floats). As they are quite mature libraries, most operations should be optimal.
  • Custom additions/modifications of number I/O use the minimum possible amount of O(n) operations unless I missed something. Looks and runs like a cat on fire.
  • Strings are stored as UTF-8 bytes, but indexed by chars. Unfortunately, indexing the nth char therefore takes O(n) time (most manipulation commands).
  • Repeated macros (X) need O(B) time (duh) and O(log B) memory (repetition counter is a bigint, basically O(1) for all practical purposes).
  • Forced global reallocation (!c) is proportional to the amount of allocated registers, plus some unpredictable overhead from the hashmap (possibly quadratic?).
  • Indexing into register arrays may implicitly extend them, O(index).
  • The regex engine used by !^ and !| is, at worst, proportional to the lengths of both inputs (O(A*B)). Regexes are compiled and cached in a hashmap (scope: one instance of exec(), not in state storage).
  • m is expensive since it copies most of the parent's state. Use !m when possible.

Notation conventions for command signatures

  • Argument type abbreviations: A: any type, [B]: string, 'C: number, 'D.: integer, 'E.1..10: integer in range 1-10 (inclusive).
  • Commands always try to pop exactly as many objects of correct types as specified. If not enough objects are available, nothing is popped; if the types don't match or a semantic error occurs, the popped objects are returned.
  • Similar commands with identical signatures are documented together for brevity.
  • Commands that expect integers always implicitly round numbers "towards zero" (discarding the fractional part).
  • █R means that command takes a register name (next character) or uses the register pointer if it's set.
  • The part after -> is the output which is pushed to the stack.

Number input (standard)

"Pangram" example: '_123.abcZYX@_987 (valid in base 36)

  • Because - is the subtraction command, _ is used as the negative sign (only in standard number i/o).
  • The input base can be changed with i, see below.
  • Exponential notation is available and uses @ instead of the traditional e/E to work with all input bases unambiguously.
    • The exponent must be a decimal integer regardless of input base. It is always applied to the current input base (1@20 in base 2 = 1048576 in base 10).
    • If no mantissa is given before @, 1 is implied.
  • Digits above 9 (a-z or A-Z) would normally be processed as commands and impossible to input. To force letters to be part of a number, it needs to be prefixed with '. This will affect all encountered letters until the number ends (on whitespace or a non-letter command).
    • This might seem inelegant since GNU dc just uses capital A-F for hexadecimal digits. However, escaping allows for bases up to 36 and allows A-F to be used as new commands (which they are).
  • "Empty" numbers like ., _ and ' default to 0.
  • Digits that are too high for the current input base, such as 'a when I=10, are not allowed.

String input

Strings are sequences of Unicode characters. They are input [between square brackets]. Nesting is possible (and in fact required for serious programming!), the string only ends at the matching ]. Closing brackets may be omitted at the end of input/macros (discouraged due to potential confusion). There are no escape sequences; unmatched literal brackets and other problematic characters like line feeds can be generated with a (see below).

Any-base input and output

  • Numbers can¹ be input and output in bases over 36 using a special format.
  • Any-base numbers consist of parentheses containing individual digit values (decimal integers) separated by spaces, with an optional negative sign at the beginning and an optional fractional separator . between two digits, as well as an optional exponential part.
  • "Pangram" example: (-123 456.789@-4321) (valid in base 790 and higher)
  • As this format doesn't interfere with command processing, both - and _ may be used as negative signs. Output uses -.
  • The exponential part may start with either @, e or E. @ is recommended for visual clarity and consistency.
  • Like with normal number input, the exponent must be a decimal integer and is always applied to the current input base.
  • Digits equal to or greater than the input base are not allowed.
  • "Empty" digits (like two spaces in a row) are implicitly converted to 0.
  • The closing parenthesis may be omitted at the end of input/macros.
  • ¹The math library I'm using does not provide functions for handling bases over 36 and the ones I wrote don't have full access to the raw underlying data. Any-base output is therefore limited in the amount of digits that can be printed and may be inaccurate in the final digits of fractional values. Any-base input is almost certainly accurate.

Printing

For disambiguation, all niladic printing commands (intended for interactive/diagnostic use) print brackets around strings. As n and P are intended for script output, they print strings "raw".

Printing of numbers is always controlled by the parameters K and O. Negative numbers are printed using _ to match the input syntax.

  • p is the most basic printing command. It prints the top-of-stack with a newline, without altering anything.
  • f prints the entire (full) main stack (top object first) without altering it. Useful as a learning/diagnostic tool.
  • A n pops one object and prints it with no newline.
  • A P pops one object and prints it with a newline.
  • FR prints the entire (full) register R, including arrays with indices.

Arithmetic

  • 'A 'B + -> 'Z adds two numbers.
  • 'A 'B - -> 'Z subtracts B from A.
  • 'A 'B * -> 'Z multiplies two numbers.
  • 'A 'B / -> 'Z divides A by B.
  • 'A. 'B. % -> 'Z. computes A modulo B.
  • 'A. 'B. ~ -> 'Y. 'Z.0.. performs Euclidean division of A by B, pushing the floor quotient Y and the remainder Z (always positive).
  • 'A 'B ^ -> 'Z raises A to the Bth power.
  • 'A. 'B. 'C. | -> 'Z.0.. computes A^B mod C efficiently and with higher limits.
  • 'A v -> 'Z computes the positive square root () of A.
  • 'A 'B V -> 'Z computes the principal Bth root () of A. Exactly equivalent to A^(1/B).
  • 'A g -> 'Z computes the natural logarithm of A.
  • 'A 'B G -> 'Z computes the Bth logarithm of A. Exactly equivalent to ln(A)/ln(B).
  • 'A 'B.-6..6 t -> 'Z: Trigonometric and hyperbolic functions, angles in radians. The desired function is selected by B:
    • 1 to 6: sin, cos, tan, sinh, cosh, tanh
    • -1 to -6: asin, acos, atan, asinh, acosh, atanh
    • 0: degree to radian conversion shorthand
  • 'A.1.. N -> 'Z. generates a random integer 0≤Z<A. Mersenne twister with high-entropy seeds, can't vouch for cryptographic security.
  • [A] " -> 'Z gets the constant or conversion factor with name A if it exists (list+explanation).

String manipulation

These are "overloaded" versions of arithmetic commands with the same adicity, but different types. The absolute value of all indices is limited to a pointer-sized integer (Rust primitive usize, 32 or 64 bits depending on system architecture). This doesn't matter for practical purposes since nothing exceeding that length can even exist. Just don't use arbitrarily high numbers to mean "all".

  • [A] [B] + -> [Z] concatenates two strings.
  • [A] 'B. - -> [Z] removes |B| characters from string A: from the back if B>0 or from the front if B<0.
  • [A] 'B. * -> [Z] repeats string A |B| times, reversing it if B<0.
  • [A] 'B. / -> [Z] truncates string A to length |B|, removing characters from the back if B>0 or from the front if B<0.
  • [A] 'B.0.. % -> [Z] isolates the Bth character of A (starting at 0).
  • [A] 'B.0.. ~ -> [Y] [Z] splits string A at the Bth character, pushing the left side Y (B characters long) and the right side Z.
  • [A] [B] ^ -> 'Z.-1.. searches A for the first occurence of B, pushing the index of the first character (-1 if not found).
    • !^ searches A for the first match of regex pattern B (syntax reference). Unlike string input, the regex engine does recognize some familiar character escapes.
    • Empty B matches anything and always returns 0.
  • [A] [B] [C] | -> [Z] searches A for occurences of B, replacing them with C.
    • !| searches for matches of regex pattern B and replaces them with string C (replacement syntax).
    • Empty B matches everywhere and results in C being inserted between all characters ([abc][][-]| -> [-a-b-c-]).
  • [A] g -> 'Z.0.. pushes the length of string A.
  • 'A " -> [Z] converts number A to its string representation as if it was printed.

Stack manipulation

These commands have the same restriction on indices as string commands.

  • c clears the entire stack.
    • !c optimizes memory usage by reallocating growable memory areas to fit their current contents (except register arrays, see the memory model diagram). Use sparingly after completing memory-hungry operations.
  • 'A.0.. C clears the top A objects from the stack.
  • d duplicates the top-of-stack.
  • 'A.0.. D duplicates the top A objects of the stack.
  • r swaps (reverses) the top 2 objects of the stack.
  • 'A. R rotates the top |A| objects of the stack: upwards if A>0, downwards if A<0.
    • 1 2 3 4 3R -> 1 4 2 3
    • 'A. !R flips the top |A| objects.
  • z -> 'Z.0.. pushes the depth of the main stack (amount of objects).

Parameters

These control how numbers are handled. The lowercase commands (kiow) alter them, the uppercase commands (KIOW) push them to the stack.

  • 'A.0.. k: Output precision ("skale"), amount of significant digits. If set to 0, as many digits are printed as is necessary to reconstruct the number exactly. Default = 0.
    • Inconsistent with GNU dc, where K sets the amount of fractional digits in computations. That wouldn't make much sense with binary floats.
  • 'A.2.. io: Input and output bases. Default = 10. Values over 36 force a special "any-base" format.
  • 'A.1..4294967295 w: Working precision, mantissa size in bits of all newly created numbers. Default = 64.
  • KIOW -> 'Z. pushes the corresponding parameter to the stack.
  • Sets of the parameters K, I and O are stored on a "parameter stack", which permits temporary usage of a different parameter context while preserving the original. All relevant operations always use the last/top entry of the stack. To avoid accidentally changing the precision of existing numbers, working precision is not stored on this stack and can only be altered with w.
  • { creates a new parameter context with defaults 0k 10i 10o.
  • } returns to the previous context or resets the parameters if no previous context exists.
  • Curly braces serve as a visual metaphor for a "block of normality".

Registers

  • An unlimited amount of integer-numbered registers is available for storing data.
  • Registers are themselves stacks, but the objects stored on them are more complex: Each "register object" also contains an unlimited array of normal objects. See the memory model diagram and the next section.
  • Commands that access registers need a register number. They usually "steal" the next command character and use its Unicode codepoint, but will instead use the register pointer if it's set.
  • A sR saves an object to register R, overwriting the top object if one already exists.
  • A SR pushes an object to register R.
  • lR -> Z loads the top object of register R without removing it.
  • LR -> Z pops an object from register R.
  • ZR -> 'Z.0.. pushes the depth of register R.
  • Suggested register usage conventions, for script readability:
    • 0-9: Temporary data/macros within functions
    • a-z, A-Z: Functions within scripts
    • Multi-character ([example],): User-facing functions

Advanced register features

  • The register pointer (rptr) is set by , and becomes disabled after one use by any register manipulation command (even on semantic errors).
    • This single-use behavior was chosen to introduce only one new command.
  • 'A. , or [A] , writes to the rptr. If the argument is a string, it is first converted to a numerical representation (like A, but not dependent on working precision).
    • For example, sa, [a],s and 97,s are equivalent assuming I=10.
    • Importantly for syntax, this command is infallible unless the stack is empty.
  • A 'B.0.. :R saves an object to the top-of-R array at index B.
    • Same index restriction as string commands.
    • !:R extends the array to length B using copies of A. Truncates like !;R if the given length is smaller than the current one.
  • 'A.0.. ;R -> Z loads an object from the top-of-R array at index A.
    • Likewise.
    • !;R truncates the array to length A, removing extraneous objects. No effect if the given length is greater than the current one. Automatically reallocates the array to fit (like !c).
  • !ZR -> 'Z.0.. pushes the length of the top-of-register's array.
  • A buffer for one register object is available to allow easy moving between registers.
    • bR is like LR, but puts the RegObj into the buffer (overwriting it).
    • BR is like SR, but gets the RegObj from the buffer (preserving it).

Bulk movement

* means "all".

  • * !sR overwrites R with the main stack.
    • Creates empty arrays.
  • * !SR appends the main stack to R.
    • Likewise.
  • !lR -> * copies R to the main stack.
    • Discards arrays.
  • !LR -> * appends R to the main stack.
    • Likewise.
  • !bR -> * pops the top-of-R array and appends it to the main stack (discarding the principal object).
    • Index 0 = Bottom of stack.
  • * !BR appends the main stack to the top-of-R array (pushing a default RegObj if needed).
    • Likewise.

Macros and string conversion

  • A x assumes A is a string and executes it as a series of dc:im commands. If A is a number, it's pushed back to the stack (no-op).
    • The no-op behavior is a holdover from GNU dc.
  • [A] 'B.0.. X executes macro A B times.
  • ! inverts comparisons and switches some other commands to alternative modes. Only applies to the immediately following command (effectively a digraph), no effect on commands that don't use it.
  • A B =<>R: Comparison operators, compare two objects of the same type and execute the top-of-register (like lRx) if the comparison is true. Including !, the valid operators are = < > != !< !>. A and B are swapped relative to the traditional direction of the operators (B is on the "left side", 1 2>R would execute R).
    • Strings can also be compared. Equality is obvious; < and > work by lexicographic order: First, the comparison is applied to the lengths. If the lengths are equal, the numerical values of the characters are compared from left to right (like numbers, but each digit is a code point).
    • Examples: [foo][bar]=: false, [a][bc]>: true, [ab][ac]<: false
  • A yR executes the top-of-reg (like lRx) if A is a string. Negated with !.
  • q signals to quit dc:im. The way this is handled depends on the wrapper implementation (this one exits the process). If the register pointer is set, its value is returned as the exit code.
    • Does not terminate the main process if used in a child thread.
  • 'A.0.. Q quits A levels of nested macro execution ("breaks" A levels). Instead of signalling to exit completely like q, it simply discards the affected pending commands.
    • Breaks repeated macros (X) correctly (all remaining repetitions are discarded at once). Keep in mind that tail calls are optimized (the calling macro is discarded if it's exhausted).
    • 1Q cancels the rest of the macro it's in, 2Q breaks the macro that called it, 3Q breaks the macro that called the macro that called it, and so on.
  • ? prompts for one line of input and executes it as a macro.
    • !? -> [Z] pushes the input to the stack instead of executing it.
    • For every call of the command, input is read until a line feed character (0x0A/\n) is encountered or input ends (if piping is used). All line feeds and carriage returns (0x0D/\r) at the end are removed.
  • [A] a -> 'Z.0..1114111 or 'A.0..1114111 a -> [Z] converts the first character of string A to its numerical value or reverses the conversion.
    • UTF-16 surrogates (0xD800-0xDFFF/55296-57343) are not permitted. Noncharacters are fine.
    • Only the lowest 32 bits of 'A are considered (implicit wrapping conversion).
  • [A] A -> 'Z.0.. or 'A. A -> [Z] reads string A as the underlying UTF-8 byte sequence and converts it to an integer (first char -> most significant byte) or reverses the conversion. Invalid UTF-8 sequences cause an error.
    • Mind the working precision!
    • [example]A -> 65 78 61 6d 70 6c 65 = 28561332491021413, 43615A -> aa 5f = invalid UTF-8
    • Negativity is ignored, otherwise bijective assuming valid UTF-8.
    • This is the only command that deals with strings as raw bytes. Chars take up 32 bits internally, which would be unwieldy as numbers.

Multithreading

  • Macros can be executed in child threads to offload or split intense computations. Every child is linked to (and identified by) a register in the main thread. The child can be joined when its macro is completed, which saves its main stack to its corresponding register.
  • Child threads can spawn (grand-)*child threads of their own as long as system resources allow. Thread reservations in the parent thread become irrelevant and aren't carried over.
  • Printing commands and ? do nothing in child threads to avoid confusion. The intended purpose is to run autonomous computations.
  • Child threads cannot be forcibly terminated. Ensure that thread macros don't run forever.
    • q can be used safely since it doesn't affect the parent thread.
    • This is to avoid overhead from inter-thread signalling and synchronization problems.
  • [A] mR (spawn): Reserves register R and spawns a child thread that executes macro A. The thread is given a copy of the whole current state:
    • main stack
    • all register contents, but with thread reservations removed
    • the buffer used by b and B
    • parameter stack (K, I, O) and W
  • This copying may be undesirable for performance reasons. If "negated" (!m), the thread receives a blank state instead.
  • 'A. MR (join): Waits until the thread on R finishes or A milliseconds elapse (checking the thread's status every millisecond). If the thread is found to have finished, the contents of its main stack are pushed onto R. All other thread state is discarded.
    • The technically possible range for A is 0 to 2⁶⁴-1. Values outside this range, such as -1, are accepted and default to the maximum. 2⁶⁴-1 milliseconds is over half a billion years.
    • If given 0, performs the check once and without wasting a millisecond.

Special

  • 'A.0..2⁶⁴-1 T waits A milliseconds (idle).
    • May wait slightly longer. Accuracy depends on system architecture and OS.

The following 3 commands may be disabled using the safe option of exec() (-s|--safe cli flag in this implementation). This safety flag cannot be altered by any in-language command; child threads always have it set.

  • [A] & executes the file with name A as a macro script if it's accessible.
    • Removes comments like --file mode.
    • [A] !& -> [Z] pushes the contents to the stack instead.
  • [A] $ -> [Z] pushes the environment variable with name A if it exists.
  • [A] \ executes A as an OS command (arguments separated by ). The command may also be of the form var=val, which sets an environment variable (scope: this dc:im instance and child processes created via \).
    • The syntax is very basic, this is not supposed to be a shell.
    • Keep in mind that in Windows, many commands are not binaries and must be called via cmd. For example, [cmd /c cls]\ must be used to clear the screen.
  • # marks the beginning of a comment. If a macro string contains it, everything after it is ignored. In file mode, all comments are removed (until the end of the line) before executing the script.
  • Any other characters except for whitespace and NUL cause an "invalid command" message when being interpreted as commands.

Memory model diagram

... means that no limit is hard-coded by me. The ultimate limit is your available system memory, but you may encounter smaller ones depending on your system architecture and OS. Everything marked as such - except register arrays - can be shrunk to fit using !c.

Basic object "Obj":
┌────────┐
│ String │
├── or ──┤
│ Number │
└────────┘

Main stack:
┌─────┬─────┬─────┬────
│ Obj │ Obj │ Obj │ ...
└─────┴─────┴─────┴────

Register object "RegObj":
┌─────┐
│ Obj │ principal object (accessed by s/l)
├─────┴─┬───────┬───────┬────
│ Obj 0 │ Obj 1 │ Obj 2 │ ... array of objects (accessed by :/;)
└───────┴───────┴───────┴────
Each RegObj contains one principal Obj and a dynamically-sized array of Objs.
Arrays are contiguous; interacting with an uninitialized array element initializes all previously nonexistent Objs as empty strings.
!: automatically extends the array up to a specified length, !; truncates and reallocates it.
Forced global reallocation (!c) does not process arrays because they can only lose elements by manual truncation, which reallocates anyway.
!b and !B move arrays to and from the main stack.

Register:
┌────────┬────────┬────────┬────
│ RegObj │ RegObj │ RegObj │ ... accessed by S/L
├────────┴────┬───┴────────┴────
│ Thread lock │
└─────────────┘
S and L create and destroy entire RegObjs.
b and B move entire RegObjs.
!s, !S, !l, and !L also operate at this level and don't care about arrays.
The thread lock optionally contains a handle to a thread spawned by m. If present, no further threads referencing this register can be spawned.
M extracts the thread's results when it's finished, removing the lock.

RegObj buffer:
┌────────┐
│ RegObj │ initialized with empty string and 0-sized array
└────────┘
With just the usual dc commands, easy handling of entire arrays is impossible since l ignores arrays and L wastes them.
To solve this, a buffer for one RegObj is available. It can be written to with b and read from with B.

Hashmap of registers, arbitrary bigint keys:
┌────────────┬────────────┬────────────┬────
│ Register X │ Register Y │ Register Z │ ...
└────────────┴────────────┴────────────┴────
Forced global reallocation (!c) removes all unused registers (no RegObjs, no thread lock).

Parameter context stack:
┌────┬────┬────┬────
│ K0 │ K1 │ K2 │
│ I0 │ I1 │ I2 │ ...
│ O0 │ O1 │ O2 │
└────┴────┴────┴────
All relevant commands (kioKIOpfnPF" and number input) always use the last/top context. New contexts are created by { with the values (0,10,10).
} destroys the top context, creating a default one if the stack becomes empty.

Executed macro:
┌────────┬────────┬────────┬────
│ char 0 │ char 1 │ char 2 │ ...
├───────┬┴────────┴────────┴────
│ index │
├───────┴──────┐
│ repeat count │
└──────────────┘
The upfront cost of splitting the UTF-8 stream into immediate chars is compensated by O(1) seeking during execution.
If the repeat count isn't 0 when the end is reached, it's decremented and the index is reset. Otherwise, the macro is discarded.

Macro stack:
┌───────┬───────┬───────┬────
│ Macro │ Macro │ Macro │ ...
└───────┴───────┴───────┴────
Newly executed macros are pushed to this stack, the parsing loop always processes the top one.
If the calling macro is fully exhausted, it's removed before the new one is added (tail call optimization).
Q removes A entries from the top.