Phobos feels like a very detached standard library #84

Open
Connor-GH opened this issue Nov 13, 2024 · 32 comments

Comments

@Connor-GH

As the title says.

Do you get this feeling too? It feels closer to boost than to the (older) C++ STL types. And just like boost, there are many components I don't feel comfortable touching because of the very expensive runtime or compile-time cost associated with them. C is fast because the man pages document every function that allocates, and I can count them on my fingers (strdup, and printf("%n") is all I could think of). Zig is fast because any allocation is explicit due to the need to pass in an allocator everywhere. C++ is (sometimes) fast because it can allow you to pass in an allocator as a template argument, but this doesn't help the fact that it is optional and that the allocation characteristics of data structures such as std::vector are unspecified.

I believe that, in order for openD to start on the right foot, there needs to be some way of specifying an allocator outside of the GC interface. Also, if we were to start anew, memory allocations should be documented.

This issue was sparked by my experimentation with SumType versus my own Result, which led me to conclude that my Result and my .match compile over 2 times faster than SumType and its .match.* Here is a practical example:

import result;
import core.stdc.stdio : printf;
import std.sumtype : SumType, match;

auto my_func(int x) {
	alias R = Result!(int, string);
	if (x == 6)
		return R.ok(0);
	else
		return R.err("uh oh!");
}


auto my_func2(int x) {
	alias R2 = SumType!(int, string);
	if (x == 6)
		return R2(0);
	else
		return R2("uh oh!");
}
/* ideal version:
Result!(int, string) my_func3(int x) {
	if (x == 6)
		return 0; // or ok(0)
	else
		return "uh oh!"; // or err("uh oh!")
}
*/
extern(C) int main() {
	Result!(int, string) x = my_func(6);

	printf(result.match(x,
		(int i) => "yay!",
		(string s) => "bad."
	).ptr);

	SumType!(int, string) x2 = my_func2(6);

	printf(x2.match!(
		(int i) => "yay!",
		(string s) => "bad."
		).ptr);

	return 0;
}
* disclaimer: my match only handles two "arguments" (functions) and does not do exhaustive checking. It is not a true "discriminated union" like SumType is, and is specialized for my kernel uses.
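
The `result` module imported above is not shown in the thread. As a rough illustration only, a two-armed Result with a non-exhaustive match could be sketched as below; the field names `value`, `error` and `isOk` are assumptions, not the author's actual implementation.

```d
// Hypothetical stand-in for the `result` module, which the thread does not show.
struct Result(T, E)
{
    private union
    {
        T value; // valid only when isOk is true
        E error; // valid only when isOk is false
    }
    private bool isOk;

    static Result ok(T v)
    {
        Result r;
        r.value = v;
        r.isOk = true;
        return r;
    }

    static Result err(E e)
    {
        Result r;
        r.error = e;
        r.isOk = false;
        return r;
    }
}

// Two-armed match: calls exactly one handler, chosen by the tag.
// No exhaustiveness checking, mirroring the disclaimer above.
auto match(T, E, OkFn, ErrFn)(Result!(T, E) r, OkFn onOk, ErrFn onErr)
{
    return r.isOk ? onOk(r.value) : onErr(r.error);
}

void main()
{
    alias R = Result!(int, string);
    assert(match(R.ok(0), (int i) => "yay!", (string s) => "bad.") == "yay!");
    assert(match(R.err("uh oh!"), (int i) => "yay!", (string s) => "bad.") == "bad.");
}
```

Because match here is a plain function template with two handler parameters, the compiler instantiates far less than a general N-way SumType match would need, which is one plausible reason for the compile-time difference reported above.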
@adamdruppe
Contributor

Your "ideal version" thing should be doable in opend fyi.

but generally i consider phobos to be quasi-deprecated abandonware..... don't want to change it in the name of compatibility but don't want to use it cuz it is slow. parts of it are fine but idk i haven't really decided.

in any case, nobody has proposed anything specific regardless.

@Connor-GH
Author

Your "ideal version" thing should be doable in opend fyi.

could you explain? is there a specific feature that allows this?

@adamdruppe
Contributor

@implicit this(T t) - return is a case of implicit construction.

@Connor-GH
Author

@implicit this(T t) - return is a case of implicit construction.

That'd be great! (If I could get opend's ldc to build....the opend dmd does not support it, it seems)

@adamdruppe
Contributor

just use the binary download... so much easier.

and don't forget to import core.attribute; for the implicit symbol

@Connor-GH
Author

Connor-GH commented Nov 14, 2024

I'm sorry, I'm sure it works for you, but it just doesn't for me. When I import core.attribute, even making sure it's the right one in case the filename is hardcoded or something, ldc still segfaults. return 0; still gives the implicit conversion error.

Just to get this right, 0 should convert to Result(0) or whatever?

It seems to work on just basic types like MyType x = 6; but does not work for functions. The problem is, how is the compiler supposed to figure this out? It tries to implicitly convert int to MyType but obviously fails.

@adamdruppe
Contributor

What segfault? What compiler build? What OS? What sample code?

$ opend ldmd2 --version | head
LDC - the LLVM Open D compiler (1.36.0 - OpenD Nov  1 2024 00:35:56):
  based on DMD v2.106.1 and LLVM 17.0.6
module b0rk;

import core.attribute;
struct A {
        int a;
        @implicit this(int a) { this.a = a; }
}

A foo() {
        return 5; // works
}

void wtf(A a) {}

void main() {
        wtf(5); // works
        assert(foo().a == 5);
}

@Connor-GH
Author

Your example just doesn't work for me. I downloaded the preview for my platform and ran the code as ./bin/ldc2 bork.d (i renamed the module to "bork"). I see that this version was built with llvm 17, and I use llvm 18. So I installed llvm 17 and the same thing still happens.

@adamdruppe
Contributor

"the preview"......... the old one from February? Get the CI Automatic Build, that's always the up to date one (I'd delete the other if it'd let me!)

The --version should show it was built in November, just like mine.

@Connor-GH
Author

The CI artifacts are "expired". https://github.com/opendlang/opend/actions/runs/11621956754#artifacts

@adamdruppe
Contributor

Works fine from the homepage link: https://github.com/opendlang/opend/releases/tag/CI

@Connor-GH
Author

Tested, it does (but you should do more real releases!)

Anyways, back to the topic at hand: the "new generation" D library you started has nearly nothing in it, do you have any data structures you would like to see or do, but don't have the time for? Make a wishlist and I will try to get to them. I am no metaprogramming wizard and cannot do exhaustive pattern matching with templates, but I don't think the library should have much of that, personally. The general rule seems to be that the harder it is for a human to figure out the templates by hand, the longer it takes to compile.

By the way, @implicit this is probably never making it upstream due to Walter's thoughts on it. Implicit construction can be abused but I like that it is explicit that you have to be implicit about it. Although there is no information about it outside of the type. A function annotation would be nice. Could you make it where the @implicit attribute is required on functions that use implicit instantiation?

@adamdruppe
Contributor

oh every time I push to master is a real release!

but yeah, that D library thing is something Grim wanted to do and I'm not really sure what he has in mind for it. I really have no need for much of anything like that; the arsd namespace has most everything I need. there's some embedded data structures there too - a circular buffer as part of the terminal scrollback history, a stack as part of dom tree traversal, a double linked list as part of async io tracking, etc., but i tend to just do them ad hoc as i need them where i need them. tbh they're usually not hard anyway. So my preference is to leave that lib to people who do actually use it, so it reflects reality instead of my weird, twisted mind.

That said... idk if you have ideas we can talk through it. A lot of people say phobos needs containers and it is an embarrassment that it doesn't have them but like.... then they never follow up saying what, specifically, they have trouble with. so part of why that repo was made is to encourage people who talk about it to go ahead and put something in too.

re implicit, of course upstream isn't going to do it. if i had any faith in upstream's leadership, even just a tiny inkling that might indicate a seed of faith, this fork wouldn't exist. but putting implicit on the function itself... is that really necessary? Lots of functions can be called with implicit conversions and constructions:

void foo(long a) {}
foo(4); // this is an implicit conversion from int to long

Should that be marked too?

@Connor-GH
Author

re implicit, of course upstream isn't going to do it. if i had any faith in upstream's leadership, even just a tiny inkling that might indicate a seed of faith, this fork wouldn't exist. but putting implicit on the function itself... is that really necessary? Lots of functions can be called with implicit conversions and constructions:

void foo(long a) {}
foo(4); // this is an implicit conversion from int to long

Should that be marked too?

I would say no, of course, because that's how it's always been. Also, it almost never causes problems (the exception would be that you thought it was a long, but it's an int). On the other hand, I have looked at C++ code in the past that used std::optional (very similar to my Result with one type being a None) and, on seeing return {}, went "what on earth does that do?". Well, it turns out, it constructs an object using the default constructor for it, but that is documented exactly nowhere. I believe that the code should be as self-documenting (to a reasonable extent) as possible, and D has done a fantastic job of that so far. I can look at any given API signature and know exactly what it does, given all the scope, return scope, pure, const, etc attributes that can be applied to variables and functions (yes, i know pure and const are only for functions).

First thing anyone looks at when going into a function is its signature. If i were to see @implicit MyObject func(scope int *x), you can infer a lot about what you're going to see inside of that function. Implicit declarations? Probably, but not guaranteed (the compiler should not need to do exhaustiveness checking for this if it is too much of a performance cost). But we know that we will be constructing a MyObject and consuming x. I understand that someone might stick @implicit on main and just go back to C++ bad behavior but this is a case for exhaustiveness checking. Better yet, disallow @implicit on main. I would write the tests for these if you're willing to compromise with me on this.

@adamdruppe
Contributor

What would implicit on main ever do? That doesn't make any sense, main's arguments are required to be empty or string[], no user defined types there at all.

@Connor-GH
Author

What would implicit on main ever do? That doesn't make any sense, main's arguments are required to be empty or string[], no user defined types there at all.

@implicit main would work the same as @nogc main: it trickles down. @implicit on functions would be an annotation that "this function uses implicit construction".

@adamdruppe
Contributor

Implicit construction only ever happens in two places: function calls and return values, and even then, only if it is the only option the type has opted into. Ambiguities are still errors so you won't get a surprising overload resolution.

C++ does some weird things but I think we've avoided that with this more conservative approach.

I'm not in favor of additional attributes at the usage site, that removes the whole point.

@Connor-GH
Author

Hmm, the tests I performed confirmed this. Unfortunately, you can still implicitly and explicitly construct through the same constructor. Also, implicit constructions have a performance impact, and that should probably be obvious to the person using it, especially if they happen to stumble onto that API and did not create it themselves. People reading the code would immediately see that implicit conversion is being used. The annotation does not have to tell you what object does it, but it should be a correct annotation. Also it would be easier for serve-d to figure out what's going on (if you intend to contribute opend options to it).

@Connor-GH
Author

Also, opAssign should probably be considered harmful.

@adamdruppe
Contributor

Unfortunately, you can still implicitly and explicitly construct through the same constructor.

why is this bad? it is still a constructor, after all.

But implicit conversions and potentially expensive function calls already happen all over the place. Yes, opAssign and other overloaded operators, alias this, opDispatch, properties, tons of stuff. At some point you just have to hope people don't do crazy things.

@Connor-GH
Author

Connor-GH commented Nov 18, 2024

I've kind of dropped the idea of implicit construction at this point. For now I'll make everything explicit because having a kernel depend on a forked language that already has a small community is just asking for it to never be used. I have acquired hardware and plan to give it a test run and write drivers so it can actually be used on real hardware. So my main concerns are accessibility to compile (compile time, requirements, etc.) and code readability.

Here's a suggestion that's more on-topic: how would you feel about creating a new standard library that takes in allocator arguments, therefore making gc optional?

I have quite a few data structures that I use in my kernel and I am constantly adding more, but of course they could always be more modularized.

@adamdruppe
Contributor

Implicit construction is supposed to be pretty niche. I do use it for something like a sumtype (and note that you can offer support for it in library code and still be compatible with old D compilers, just polyfill the attribute, just you can't actually use it in those), but if you're unsure if it is a good fit, probably better to leave it out.
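
One way the "polyfill the attribute" idea might look is sketched below. It assumes the OpenD symbol is `core.attribute.implicit`, as the earlier example in this thread suggests, and falls back to an inert user-defined attribute elsewhere. On a compiler without the feature, only explicit construction is available.

```d
// Use the real attribute when the compiler's druntime provides it (assumed: OpenD).
// Otherwise declare an inert stand-in so that `@implicit` still parses as a UDA.
static if (__traits(compiles, { import core.attribute : implicit; }))
    import core.attribute : implicit;
else
    enum implicit;

struct A
{
    int a;
    @implicit this(int a) { this.a = a; }
}

void main()
{
    // Explicit construction is valid on either compiler.
    auto x = A(5);
    assert(x.a == 5);
}
```

Library code written this way keeps building on older compilers, while callers on a compiler that supports the attribute can additionally write `A foo() { return 5; }`.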

But re allocators, allocators are already passed in, just globally and unable to override. I have a plan to make them overrideable... then you don't have to rewrite any library, existing things will just work. If that is working, I suspect much of this will be obsoleted.

I also have this idea kicking around of an allocator interface that is explicitly passed and it redefines nogc slightly to mean "does not use the implicit allocator". So:

// from the library:
interface Allocator { /* all methods marked nogc */ }
Allocator gcAllocator(); // note this is NOT marked nogc
Allocator kernelAllocator() @nogc; 

void foo(Allocator a) @nogc {
   // all this stuff is allowed

   int[] stuff = a.New!(int[])(5);

   doOtherFunction(stuff, a); // can pass the allocator down like any other object

   a.Delete(stuff);
}

void main() { // note NOT nogc
   foo(gcAllocator()); // OK, you explicitly passed the GC from a GC-ok function, so it is allowed to use it
}

void main() @nogc {
   foo(gcAllocator()); // error: nogc function cannot call non-nogc function gcAllocator
}

so the yes-GC world is allowed to pass the GC down explicitly as an allocator when desired, but the nogc world can't initiate it, but is allowed to use it if it got it from a yesgc parent.

This is all doable as library code today (tho the gcAllocator needs some casting internally to adhere to the interface), and passing it as an interface gives a decent amount of flexibility with minimum hassle (well, not as minimum as using the implicit global allocator, but as close as you can get next to that).
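
As a concrete, compilable variation on the idea above, here is a minimal sketch with a malloc-backed allocator. The method names `allocate` and `deallocate`, and the `MallocAllocator` class, are assumptions made for illustration; they are not the `New`/`Delete` API described in the comment.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical interface; every method is @nogc so that @nogc code may call it.
interface Allocator
{
    void[] allocate(size_t bytes) @nogc nothrow;
    void deallocate(void[] block) @nogc nothrow;
}

// One possible backend, built on the C heap.
final class MallocAllocator : Allocator
{
    void[] allocate(size_t bytes) @nogc nothrow
    {
        auto p = malloc(bytes);
        return p is null ? null : p[0 .. bytes];
    }

    void deallocate(void[] block) @nogc nothrow
    {
        free(block.ptr);
    }
}

// @nogc code that allocates only through the allocator it was handed.
int sumOfSquares(Allocator a, size_t n) @nogc nothrow
{
    void[] raw = a.allocate(n * int.sizeof);
    if (raw.ptr is null)
        return -1; // allocation failed
    scope (exit) a.deallocate(raw);

    int[] buf = (cast(int*) raw.ptr)[0 .. n];
    int total = 0;
    foreach (i, ref x; buf)
    {
        x = cast(int) (i * i);
        total += x;
    }
    return total;
}

void main()
{
    Allocator a = new MallocAllocator; // the GC is used here, outside @nogc
    assert(sumOfSquares(a, 4) == 14);  // 0 + 1 + 4 + 9
}
```

The allocator object itself is created with `new` in a non-@nogc `main`, which matches the rule described above: GC-aware code may set things up and hand the allocator down, while the @nogc callee only sees the interface.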

The nogc annotation can't work with a runtime switch though. tbh I don't think nogc annotation is particularly important, but regardless, that does work with an explicit argument. And I think an explicit argument with a default might just work, even with the annotation, but it needs a fix:

Allocator GC() @trusted {
        __gshared obj = new GcAllocator;
        return obj;
}

void test(Allocator allocator = GC()) @nogc {

}

void main() @nogc {
        test(); // this SHOULD be a compile error but isn't, since the default arg passes that GC object.
        test(GC()); // this correctly is
}

So I don't know why that doesn't work but if i can figure this out, we might have the solution to all this divide - the stdlib functions are all marked nogc and take an explicit allocator, which defaults to GC. If you call it from another nogc context, it errors and makes you pass something, and if you don't care and want it to Just Work, it does.

@adamdruppe
Contributor

adamdruppe commented Nov 18, 2024

I think the default arg there is CTFE'd, which is fine (the gshared thing there is a ctfe'd static instance anyway) but like it defeats my check lol. still need to investigate

edit: nope this is false, it is run at runtime. so this is an existing accepts invalid bug.

@GrimMaple
Contributor

I am going to chip in since I was mentioned here :) (seriously Adam, just tag me if some info is needed)

Do you get this feeling too?

Personally, I do. Not only is phobos "detached", it's also poorly designed, poorly connected within itself and feels more like a collection of random code than a standard library. There isn't much to do about it though.

but yeah, that D library thing is something Grim wanted to do and I'm not really sure what he has in mind for it.

I don't really have anything particular in mind; I just had a working serialization thing I wanted to share with the world so I did. I'm unsure if there's anything particularly useful for D in my current codebase, so I don't have much to add. I wanted to combine most useful code into it, but since @adamdruppe made pretty much all useful D code available from the get go, I see very little reason to combine it all in one d package. Maybe later, if we decide to change something :)

I believe that, in order for openD to start on the right foot, there needs to be some way of specifying an allocator outside of the GC interface. Also, if we were to start anew, memory allocations should be documented.

I don't want to be rude, but, to be super honest with you all, I'm eternally sick of this allocator talk. The thing is, allocators aren't actually that useful in the real world. I have experience programming everything known to man - from industrial grade web services and websites, to games and now even embedded stuff. Never have I ever felt the need to use an allocator, and even when I did, I ended up either using malloc because it's just faster, or having memory statically allocated and never reallocated again.

What allocators do bring in though is a huge spike in unnecessary code complexity, because what you can already do in, say, 10 lines of code (seriously, implementing a Stack or a Queue in D is just this easy), allocators would turn into a nightmare to implement, mostly because you need to invent reallocation strategies in cases when allocators of your types don't match. Let's say you have two Stacks, one using gcAllocator, the other one uses mallocAllocator. What do you do when:

Stack!(gcAllocator, int) a;
Stack!(mallocAllocator, int) b;

a = b; // ???

In other words, if time has come for you to use an allocator, you are probably better off with having a custom written solution anyway, so you can implement allocators there.
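
For reference, the "10 lines of code" claim for a GC-backed stack could look roughly like the sketch below. This is an illustration, not code from any library mentioned in the thread.

```d
struct Stack(T)
{
    private T[] items; // GC-managed storage

    void push(T value) { items ~= value; }

    T pop()
    {
        assert(items.length > 0, "pop from an empty stack");
        T top = items[$ - 1];
        items = items[0 .. $ - 1];
        return top;
    }

    bool empty() const { return items.length == 0; }
}

void main()
{
    Stack!int s;
    s.push(1);
    s.push(2);
    assert(s.pop() == 2);
    assert(s.pop() == 1);
    assert(s.empty);
}
```

Assigning one `Stack!int` to another just copies the slice, so the `a = b` question above never comes up; it only appears once the allocator becomes part of the type.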

Actually, one thing why phobos sucks and feels so detached is exactly because it can't decide if it wants to be low level or high level. If phobos just went ahead with "GC only", it could've had so much more in it already. But alas, here we are.

Also-also, D's GC is already good enough to beat C++'s shared_ptr, and you can control the GC with @nogc attribute. There is no need to force unnecessary allocators to things that work perfectly fine without them.

As far as allocation strategies go, my thoughts were having a dedicated nogc submodule, eg import d.nogc; would import all the stuff that can be used without the gc. Otherwise, embrace GC by default for development speed.

@crazymonkyyy

@Connor-GH are you looking to get involved with a new std?

@Connor-GH
Author

I'm kind of back-and-forth about it. I have no prior experience in libc or otherwise standard library work that has been vetted by others. Lately, I have been involved with projects such as language development and osdev, so I haven't thought about openD recently. I tried integrating openD into my fork of xv6, but it would be cruel to require users to find the one page that lists a working release of an openD compiler. I like my projects being obscure but not that obscure. I ended up replacing all of my D infrastructure with Rust because I have been meaning to learn it, but I should mention that this does not take away from my liking of D.

Adam wanted implicit construction, so he added it. We're stuck with it now (unless Adam wants to replace the one instance of it in arsd....). That means that projects are bound to pay for costs they didn't know were there. We really don't want to be another C++. I have used all kinds of higher-level languages from D, Rust, Zig, Java, and C++, and C++ is just the worst ergonomically, at least for me. If you want something more objective, the compile times are about the same or slower than Rust, but without the safety guarantees.

If we are looking for reference standard libraries, I'd say avoid the C standard library, except where it makes sense. string.h and friends are terrible and error-prone. math.h is a fine reference however. The C++ standard library circa 1998 would probably be a decent reference, because that's before they started doing all kinds of operator overloading stuff "to make the library easier to work with". Boost even overrides operator, to add multiple things to a vector: vec += a, b, c, d, e;. Awful! But Rust does their library in a very interesting way. If we have the fluent interfaces that Rust has, it almost eliminates the need for UFCS! "foo".writeln is cool, but it doesn't make a ton of sense. It seems like a cheap* hack that was originally thought of to avoid having functions return this.

Bottom line is I'd want (and implement)

  • stacks, queues, linked lists, vectors, etc.
  • Option types (and Result types and the generic SumType)
  • "modern" printing (print("hello {} {}", "world", 1) and not print("Hello", " world", 1)). constants in the print function would be evaluated into a compile-time string ("{}", 6 => "6")
  • net decrease on compile times in comparison to phobos using the same features
  • string API similar to my libcstring maybe? obviously more features though. also, notice how my strings in this library have a small string optimization. the same thing is seen in Facebook's folly library, and yields performance benefits over normal heap strings. run some benchmarks, my strings are sometimes faster than native D strings.
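
To make the "modern" printing item concrete, here is a small sketch of the call shape only. The helper name `fmt` is hypothetical, and this version does the substitution at runtime with Phobos and the GC, so it shows the interface rather than the compile-time evaluation or @nogc behaviour asked for above.

```d
import std.array : appender;
import std.conv : to;
import std.string : indexOf;

// Hypothetical helper: replace each "{}" in spec with the next argument.
string fmt(Args...)(string spec, Args args)
{
    auto result = appender!string();
    size_t pos = 0;
    foreach (arg; args) // unrolled at compile time, one step per argument
    {
        immutable hit = spec[pos .. $].indexOf("{}");
        assert(hit >= 0, "more arguments than {} placeholders");
        result.put(spec[pos .. pos + hit]);
        result.put(arg.to!string);
        pos += hit + 2;
    }
    result.put(spec[pos .. $]); // trailing text after the last placeholder
    return result.data;
}

void main()
{
    assert(fmt("hello {} {}", "world", 1) == "hello world 1");
    assert(fmt("no placeholders") == "no placeholders");
}
```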

Stuff I'd want but probably not implement myself:

  • .parallel should stay!
  • some dead-simple functions for iterating? (underdeveloped thought)

* cheap in idea, but probably not in compile time and compiler complexity. it's more stuff to check.

@Connor-GH
Author

oh, and one last thing: all @safe, preferably @nogc.

@GrimMaple
Contributor

The C++ standard library circa 1998 would probably be a decent reference

Have you ever checked C# standard library? It eats C++ std for breakfast :) I haven't seen a single language that has better standard library than C# in general.

"modern" printing (print("hello {} {}", "world", 1) and not print("Hello", " world", 1)). constants in the print function would be evaluated into a compile-time string ("{}", 6 => "6")

That's already implemented with string interpolation though.

net decrease on compile times in comparison to phobos using the same features

That's pretty easy! Just get rid of templates.

oh, and one last thing: all @safe, preferably @nogc.

Let's say we have a @nogc Stack class (struct/we). If I am a gc user and want to pop from such stack, what happens?

@GrimMaple
Contributor

Let's say we have a @nogc Stack class (struct/we). If I am a gc user and want to pop from such stack, what happens?

Forget pop. How should stack.push(new Object()) work to begin with?

@adamdruppe
Contributor

I tried integrating openD into my fork of xv6, but it would be cruel to require users to find the one page that lists a working release of an openD compiler.

You go to the website, click "Get started" then follow the download link. Or from github, click the "releases" link and it is right there. You can also direct link to it from your own stuff.

How would you make it better?

Let's say we have a @nogc Stack class (struct/we). If I am a gc user and want to pop from such stack, what happens?

A stack is trivial; it probably wouldn't allocate anyway, but if it does, you can always pass it an argument (and the default can be the simple gc appender.)

@GrimMaple
Contributor

How would you make it better?

I'll PR later.

Let's say we have a @nogc Stack class (struct/we). If I am a gc user and want to pop from such stack, what happens?

A stack is trivial; it probably wouldn't allocate anyway, but if it does, you can always pass it an argument (and the default can be the simple gc appender.)

It isn't trivial at all.
I assume that this desire for @nogc comes from the performance standpoint, but the simple truth is, gc and nogc code doesn't mix well.

Let's say that our Stack is @nogc. When you push to it, there are two options:

  1. Pretend that it now owns the memory region pushed to it, and free it on exit
  2. Pretend that it doesn't own the memory region and leave cleanup to the user

Either way, you get problems - first way will corrupt the GC memory, the second way you get memory leaks. So to make it consistent, you'd have to employ a third (and very undesirable) option:

  • On push, copy the object being pushed

Hold on, as it gets worse. Because you can't freely assume the memory owner, when you then pop the object from the stack, you are faced with a similar issue that leads to a similar conclusion:

  • On pop, copy the object

Even worse, I can't think of a decent way to make pop universally work in this circumstance, so, besides Allocator, you'd have to add a new template parameter, that would control how the objects are being popped (ideally - how they are being pushed too).

To summarize, allocators aren't the "big problem", the problem is memory ownership, to clean it up properly.

When you use custom allocators, you kinda have to build your code around that allocator, so you can work out memory ownership. When you have more than 1 allocation strategy, you get problems, and such code doesn't generalize well.

@crazymonkyyy

crazymonkyyy commented Dec 25, 2024

allocator debate

let me settle this, I wont be implementing allocators, upstream has spent a decade on the idea and done nothing, my data structure lib took a week to start

however the idea of allocators is nogc, and 60% of the datastructures generated rn should be nogc; I wont do safe or any function coloring ever; but I could maintain that ratio

Let's say we have a @nogc Stack class (struct/we). If I am a gc user and want to pop from such stack, what happens?

if? https://github.com/crazymonkyyy/d/blob/d8c774537dd64c9841d803a6074f9df9217e82c2/source/odc/datastructures.d#L255
as written a nogc user would have to pass a size, then upstream a gc user may get overflow errors that an allocating stack wouldn't

It aint no allocator, or rust meme... by design, but it is a nonallocating stack
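
For readers who don't follow the link, the general shape of a non-allocating stack with a caller-supplied size is sketched below. This is an independent illustration, not the linked implementation; the name `FixedStack` and the bool-returning API are assumptions.

```d
struct FixedStack(T, size_t capacity)
{
    private T[capacity] items; // inline storage, no heap involved
    private size_t count;

    // Returns false instead of growing when the stack is full.
    bool push(T value) @nogc nothrow
    {
        if (count == capacity)
            return false;
        items[count++] = value;
        return true;
    }

    // Returns false when there is nothing to pop.
    bool pop(out T value) @nogc nothrow
    {
        if (count == 0)
            return false;
        value = items[--count];
        return true;
    }
}

void main() @nogc nothrow
{
    FixedStack!(int, 2) s;
    assert(s.push(1));
    assert(s.push(2));
    assert(!s.push(3)); // full: an allocating stack would have grown here

    int top;
    assert(s.pop(top) && top == 2);
    assert(s.pop(top) && top == 1);
    assert(!s.pop(top));
}
```

The failed third push is the trade-off mentioned above: the caller has to pick a size up front, and code that would silently grow with an allocating stack now has to handle overflow.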


4 participants