Skip to content

Commit

Permalink
Basic types and values.
Browse files Browse the repository at this point in the history
  • Loading branch information
lerno committed Mar 5, 2024
1 parent f45a0f4 commit 1d2d39b
Show file tree
Hide file tree
Showing 2 changed files with 249 additions and 3 deletions.
246 changes: 246 additions & 0 deletions src/content/docs/guide/basic-types-and-values.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,246 @@
---
title: Basic types and values
description: Get an overview of C3's basic types and values
sidebar:
order: 4
---

C3 provides a similar set of fundamental data types as C: integers, floats, arrays and pointer. On top of this it
expands on this set by adding slices and vectors, as well as the `any` and `typeid` types.

## Integers

C3 has signed and unsigned integer types. The built-in signed integer types are `ichar`, `short`, `int`, `long`,
`int128`, `iptr` and `isz`. `ichar` to `int128` have all well-defined power-of-two bit sizes, whereas `iptr`
has the same number of bits as a `void*` and `isz` has the same number of bits as the maximum difference
between two pointer. For each signed integer type there is a corresponding unsigned integer type: `char`,
`ushort`, `uint`, `ulong`, `uint128`, `uptr` and `usz`.

| type | signed? | min | max | bits |
|---------|---------|--------|-----------|--------|
| ichar | yes | -128 | 128 | 8 |
| short | yes | -32768 | 32767 | 16 |
| int | yes | -2^31 | 2^31 - 1 | 32 |
| long | yes | -2^63 | 2^63 - 1 | 64 |
| int128 | yes | -2^127 | 2^127 - 1 | 128 |
| iptr | yes | varies | varies | varies |
| isz | yes | varies | varies | varies |
| char | no | 0 | 255 | 8 |
| ushort | no | 0 | 65535 | 16 |
| uint | no | 0 | 2^32 - 1 | 32 |
| ulong | no | 0 | 2^64 - 1 | 64 |
| uint128 | no | 0 | 2^128 - 1 | 128 |
| uptr | no | 0 | varies | varies |
| usz | no | 0 | varies | varies |

On 64-bit machines `iptr`/`uptr` and `isz`/`usz` are usually 64-bits, like `long`/`ulong`.
On 32-bit machines on the other hand they are generally `int`/`uint`.

### Integer constants

Numeric constants typically use decimal, e.g. `234`, but may also use hexadecimal (base 16) numbers by prefixing
the number with `0x` or `0X`, e.g. `int a = 0x42edaa02;`. There is also octal (base 8) using the
`0o` or `0O` prefix, and `0b` for binary (base 2) numbers:

Numbers may also insert underscore `_` between digits to improve readability, e.g. `1_000_000`.

```c3
a = -2_000;
b = 0o770;
c = 0x7f7f7f;
```

For decimal numbers, the value is assumed to be a signed `int`, unless the number doesn't fit in an
`int`, in which case it is assumed to be the smallest signed type it *does* fit in (`long` or `int128`).

For hexadecimal, octal and binary, the type is assumed to be unsigned.

A integer literal can *implicitly* convert to a floating point literal, or an integer of
a different type provided the number fits in the type.

### Constant suffixes

If you want to ensure that a constant is of a certain type, you can either add an explicit cast
like: `(ushort)345`, or use an integer suffix: `345u16`.

The following integer suffixes are available:

| suffix | type |
|--------|--------:|
| i8 | ichar |
| i16 | short |
| i32 | int |
| i64 | long |
| i128 | int128 |
| u8 | char |
| u16 | ushort |
| u32 | uint |
| u | uint |
| u64 | ulong |
| u128 | uint128 |

Note how `uint` also has the `u` suffix.

### Character literals

A character literal is a value enclosed in `'``'`. Its value is intepreted as being its
ASCII value for a single character.

It is also possible to use 2, 4 or 8 character wide character literals. Such are interpreted
as `ushort`, `uint` and `ulong` respectively and are laid out in memory from left to right.
This means that the actual value depends on the [endianess](https://en.wikipedia.org/wiki/Endianness)
of the target.

- 2 character literals, e.g. `'C3'`, would convert to an ushort.
- 4 character literals, e.g. `'TEST'`, converts to an uint.
- 8 character literals, e.g. `'FOOBAR11'` converts to an ulong.

The 4 character literals correspond to the layout of [FourCC](https://en.wikipedia.org/wiki/FourCC)
codes. It will also correctly arrange unicode characters in memory. E.g. `Char32 smiley = '\u1F603'`

## Floating point types

As is common, C3 has two floating point types: `float` and `double`. `float` is the 32 bit floating
point type and `double` is 64 bits.

### Floating point constants

Floating point constants will *at least* use 64 bit precision.
Just like for integer constants, it is possible to use `_` to improve
readability, but it may not occur immediately before or after a dot or an exponential.

C3 supports floating points values either written in decimal or hexadecimal formats.
For decimal, the exponential symbol is e (or E, both are acceptable),
for hexadecimal p (or P) is used: `-2.22e-21` `-0x21.93p-10`

While floating point numbers default to `double` it is possible to type a
floating point by adding a suffix:

| Suffix | type |
| ------------ | --------:|
| f32 *or f* | float |
| f64 | double |

## Arrays

Arrays have the format `Type[size]`, so for example: `int[4]`. An array is a type consisting
of the same element repeated a number of times. Our `int[4]` is essentially four `int` values
packed together.

For initialization it's sometimes convenient to use the wildcard `Type[*]` declaration, which
infers the length from the number of elements:

```c3
int[3] abc = { 1, 2, 3 }; // Explicit int[3]
int[*] bcd = { 1, 2, 3 }; // Implicit int[3]
```

Read more about initializing arrays in [the chapter on arrays](/references/docs/arrays/).

## Slices

Slices have the format `Type[]`. Unlike the array, a slice does not hold the values themselves
but instead presents a view of some underlying array or vector.

Slices have two properties: `.ptr`, which retrieves the array it points to, and `.len` which
is the length of the slice - that is, the number of elements it is possible to index into.

Usually we can get a slice by taking the address of an array:

```c3
int[3] abc = { 1, 2, 3 };
int[] slice = &abc; // A slice pointing to abc with length 3
```

Because indexing into slices is range checked in safe mode, slices are vastly more safe
providing pointer + length separately.

## Vectors

Similar to arrays, vectors use the format `Type[<size>]`, with the restriction that vectors may only form out
of integers, floats and booleans. Similar to arrays, wildcard can be used to infer the size of a vector:

```c3
int[<*>] a = { 1, 2 };
```

Vectors are based on hardware SIMD vectors, and supports many different operations that work
on all elements in parallel, including arithmetics:

```c3
int[<2>] b = { 3, 8 };
int[<2>] c = { 7, 2 };
int[<2>] d = b * c; // d is { 21, 16 }
```

Vector initialization and literals work the same way as arrays, using `{ ... }`

## String literals

Like C, string literals is text enclosed in `" ... "` just like in C. These support
escape sequences like `\n` for line break and need to use `\"` for any `"` inside of the
string.

C3 also offers *raw strings* which are enclosed in \` \`.
Inside of a raw string, no escapes are available, and to write a \`, simply double the character:

```c3
char* foo = `C:\foo\bar.dll`;
char* bar = `"Say ``hello``"`;
// Same as
char* foo = "C:\\foo\\bar.dll";
char* bar = "\"Say `hello`\"";
```

String literals are special in that they can convert to several different types: `String`,
`char` and `ichar` arrays and slices and finally `ichar*` and `char*`.

## Base64 and hex data literals

Base64 literals are strings prefixed with `b64` to containing
[Base64 encoded](https://en.wikipedia.org/wiki/Base64) data, which
is converted into a char array at compile time:

```c3
// The array below contains the characters "Hello World!"
char[*] hello_world_base64 = b64"SGVsbG8gV29ybGQh";
```

The corresponding hex data literals convert a hexadecimal string rather than Base64:

```c3
// The array below contains the characters "Hello World!"
char[*] hello_world_hex = x"4865 6c6c 6f20 776f 726c 6421";
```
## Pointer types

Pointers have the syntax `Type*`. A pointer is a memory address where one or possibly more
elements of the underlying address is stored. Pointers can be stacked: `Foo*` is a pointer to a `Foo`
while `Foo**` is a pointer to a pointer to `Foo`.

The pointer type has a special literal called `null`, which is an invalid, empty pointer.

### `void*`

The `void*` type is a special pointer which implicitly converts to any other pointer. It is not "a pointer to void",
but rather a wildcard pointer which matches any other pointer.

### The `typeid` type

C3 retains some type information at runtime, in particular, each type has an identifier that uniquely
defines it. Typicallu `Type.typeid` is used to convert a type to its unique runtime id, e.g.
`typeid a = Foo.typeid;`. The type itself is pointer-sized.

### The `any` type

`any` can be seen as a typed counterpart to `void*`, it is essentially a struct a `typeid` plus a `void*` pointer
to a value. The `any` may *unsafely* be cast to any pointer.

```c3
int x;
any y = &x;
int* w = (int*)y; // Returns the pointer to x
```

`any.type` returns the underlying pointee typeid of the contained value. `any.ptr` returns
the raw `void*` pointer.
6 changes: 3 additions & 3 deletions src/content/docs/references/docs/types.md
Original file line number Diff line number Diff line change
Expand Up @@ -186,16 +186,16 @@ Pointers mirror C: `Foo*` is a pointer to a `Foo`, while `Foo**` is a pointer to
The `typeid` can hold a runtime identifier for a type. Using `<typename>.typeid` a type may be converted to its unique runtime id,
e.g. `typeid a = Foo.typeid;`. This value is pointer-sized.

### The `any*` type
### The `any` type

C3 contains a built-in variant type, which is essentially struct containing a `typeid` plus a `void*` pointer to a value.
It is possible to cast the any pointer to any pointer type, which will return `null` if the types don't match,
or the pointer value otherwise.

int x;
any* y = &x;
any y = &x;
double *z = (double*)y; // Returns null
int* w = (int*)x; // Returns the pointer to x
int* w = (int*)y; // Returns the pointer to x

Switching over the `any` type is another method to unwrap the pointer inside:

Expand Down

0 comments on commit 1d2d39b

Please sign in to comment.