Byte Strings #34

Open
lfkeitel opened this issue Feb 16, 2019 · 0 comments
Labels
Language change Changes to the language syntax or runtime

@lfkeitel
Collaborator

Strings are implemented as a slice of Unicode runes, meaning arbitrary byte sequences are not allowed, or at the very least not guaranteed. There needs to be a way to manipulate arbitrary byte data.

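Other languages handle this a little differently. Some, like Python 3 and Rust, have separate Unicode text strings and byte strings. Others, like PHP, have a single string type that is just bytes; JavaScript also has a single string type, though its elements are UTF-16 code units rather than bytes.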

The runtime could be modified to store a String as a byte slice instead of a rune slice. This would require conversions for indexing and for the string manipulation functions, but it would allow a single type to serve both purposes.
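A rough sketch of what the single-type option might look like, assuming a hypothetical `ByteBackedString` type (the name and methods are made up for illustration). The point is the conversion cost: indexing by code point has to decode from the start, while indexing by byte stays constant time.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ByteBackedString is a hypothetical single string type stored as raw bytes.
type ByteBackedString struct {
	data []byte
}

// RuneAt returns the i-th code point by decoding from the start of the data.
// This is O(n) rather than O(1); invalid bytes decode to U+FFFD.
func (s ByteBackedString) RuneAt(i int) (rune, bool) {
	if i < 0 {
		return 0, false
	}
	rest := s.data
	for len(rest) > 0 {
		r, size := utf8.DecodeRune(rest)
		if i == 0 {
			return r, true
		}
		rest = rest[size:]
		i--
	}
	return 0, false
}

// ByteAt returns the i-th raw byte in constant time.
func (s ByteBackedString) ByteAt(i int) (byte, bool) {
	if i < 0 || i >= len(s.data) {
		return 0, false
	}
	return s.data[i], true
}

func main() {
	s := ByteBackedString{data: []byte("héllo")}
	r, _ := s.RuneAt(1)
	b, _ := s.ByteAt(1)
	fmt.Printf("rune 1: %c, byte 1: 0x%02X\n", r, b) // rune 1: é, byte 1: 0xC3
}
```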

However, there's value in distinguishing between the two string types, as they serve different purposes. A normal string is guaranteed to be a valid UTF-8 string, while a byte string would be nothing more than bytes that may or may not mean anything. Keeping them separate would also ensure a byte string is never accidentally used in place of a normal string. There would be conversion functions between the two if needed.

Syntax Notes

As for syntax, maybe borrow Python's convention of using the prefix b to denote that the following literal is a byte string. That should be easy to parse. Byte strings would not be allowed where valid strings are needed in the existing syntax, for example import statements, the isDefined function, etc.

Will also need to add support for hex escapes (\xHH) inside quotes.

b"\xDE\xAD\xBE\xEF"
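One way the lexer might decode the body of such a literal, sketched in Go. The `decodeByteLiteral` function and the set of supported escapes (`\xHH`, `\\`, `\"`) are assumptions made for illustration; the real escape set would have to match whatever regular strings already accept.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeByteLiteral (hypothetical) turns the text between the quotes of a
// b"..." literal into raw bytes. Only \xHH, \\ and \" are handled here.
func decodeByteLiteral(body string) ([]byte, error) {
	out := make([]byte, 0, len(body))
	for i := 0; i < len(body); i++ {
		c := body[i]
		if c != '\\' {
			out = append(out, c) // ordinary byte, copied through unchanged
			continue
		}
		if i+1 >= len(body) {
			return nil, fmt.Errorf("trailing backslash at offset %d", i)
		}
		switch body[i+1] {
		case 'x':
			if i+3 >= len(body) {
				return nil, fmt.Errorf("incomplete \\x escape at offset %d", i)
			}
			v, err := strconv.ParseUint(body[i+2:i+4], 16, 8)
			if err != nil {
				return nil, fmt.Errorf("bad hex escape at offset %d: %w", i, err)
			}
			out = append(out, byte(v))
			i += 3 // skip 'x' and the two hex digits
		case '\\', '"':
			out = append(out, body[i+1])
			i++ // skip the escaped character
		default:
			return nil, fmt.Errorf("unknown escape \\%c at offset %d", body[i+1], i)
		}
	}
	return out, nil
}

func main() {
	data, err := decodeByteLiteral(`\xDE\xAD\xBE\xEF`)
	fmt.Printf("% X %v\n", data, err) // DE AD BE EF <nil>
}
```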

Implementation Notes

A new token to distinguish a byte string from a regular string. A new AST node using a byte slice instead of a rune slice. A new Object type with the same change. A toBytes function to convert a UTF-8 string to a byte string; toString would be modified to allow the reverse. Byte strings can be concatenated together as well as indexed.
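A minimal sketch of what the runtime object could look like, assuming a hypothetical `ByteString` struct. The method names are illustrative and do not reflect the runtime's actual Object interface. Indexing yields a number in the range 0-255, and concatenation copies into a fresh slice so neither operand is mutated.

```go
package main

import (
	"fmt"
	"strings"
)

// ByteString is a hypothetical runtime object backed by a byte slice.
type ByteString struct {
	Value []byte
}

// Concat returns a new byte string without modifying either operand.
func (b *ByteString) Concat(other *ByteString) *ByteString {
	out := make([]byte, 0, len(b.Value)+len(other.Value))
	out = append(out, b.Value...)
	out = append(out, other.Value...)
	return &ByteString{Value: out}
}

// Index returns the byte at position i as an integer in the range 0-255.
func (b *ByteString) Index(i int) (int64, error) {
	if i < 0 || i >= len(b.Value) {
		return 0, fmt.Errorf("index %d out of range (length %d)", i, len(b.Value))
	}
	return int64(b.Value[i]), nil
}

// Inspect renders the value in the proposed literal syntax.
func (b *ByteString) Inspect() string {
	var sb strings.Builder
	sb.WriteString(`b"`)
	for _, c := range b.Value {
		fmt.Fprintf(&sb, `\x%02X`, c)
	}
	sb.WriteByte('"')
	return sb.String()
}

func main() {
	a := &ByteString{Value: []byte{0xDE, 0xAD}}
	c := a.Concat(&ByteString{Value: []byte{0xBE, 0xEF}})
	fmt.Println(c.Inspect()) // b"\xDE\xAD\xBE\xEF"
}
```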

I'm not sure about any utility functions like the string ones. Byte strings have a particular usage where replace, find, etc. would not be all that useful. Perhaps start without them and add them later if needed. Maybe allow toBytes to take an array of numbers and convert them to a byte string. That would make generation a bit easier for the programmer.
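The array form of toBytes could be backed by something like the following Go sketch. The `bytesFromInts` helper is hypothetical, and rejecting out-of-range values (rather than truncating them) is an assumption about the desired behavior.

```go
package main

import "fmt"

// bytesFromInts (hypothetical) converts an array of numbers into bytes,
// rejecting any element that does not fit in a single byte.
func bytesFromInts(nums []int64) ([]byte, error) {
	out := make([]byte, len(nums))
	for i, n := range nums {
		if n < 0 || n > 255 {
			return nil, fmt.Errorf("element %d: %d is outside the byte range 0-255", i, n)
		}
		out[i] = byte(n)
	}
	return out, nil
}

func main() {
	data, err := bytesFromInts([]int64{222, 173, 190, 239})
	fmt.Printf("% X %v\n", data, err) // DE AD BE EF <nil>

	_, err = bytesFromInts([]int64{1, 256})
	fmt.Println(err)
}
```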

@lfkeitel lfkeitel added the Language change Changes to the language syntax or runtime label Feb 16, 2019