Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cpp] Generate Typed Functions #11151

Open
wants to merge 44 commits into
base: development
Choose a base branch
from

Conversation

Aidan63
Copy link
Contributor

@Aidan63 Aidan63 commented Apr 15, 2023

Corresponding hxcpp pr

Something I started last year but just now decided to come back to. This attempts to have all functions be implemented as a strongly typed callable object instead of using ::Dynamic internally.
Main two reasons for this are to stop primitive types becoming boxed when used as closures arguments or with class function closures, and to allow you to have a typed function object to use when writing interop code in C++.

Key Changes

Callable Object

There is a new templated hx::Object subclass called hx::Callable_obj to represent anything which holds invokable code. Since it uses template pack parameters for arguments several of the hx::Object functions can be overridden and marked as final in this base type to save the user or the haxe compiler needing to generate those functions. It also contains a pure virtual _hx_run function to invoke the code.

template<class TReturn, class... TArgs>
struct Callable_obj;

template<class TReturn, class... TArgs>
struct Callable_obj<TReturn(TArgs...)> : public hx::Object
{
    int __GetType() const override { return vtFunction; }
    int __ArgCount() const override { return sizeof...(TArgs); }

    virtual TReturn _hx_run(TArgs... args) = 0;
};

A corresponding hx::Callable hx::ObjectPtr subclass has also been created. This class overrides the () operator for nice function calling and more importantly handles generating wrapper callables to deal with all the dynamic unification haxe allows with function signatures.

Dynamic Calling

The next big change is surrounding dynamic calls with __run and __Run. __run is now non-virtual and implemented as a variadic template function, using some C++ template magic we can automatically generate an Array<::Dynamic> for passing into the still virtual __Run.

template<class ...TArgs>
Dynamic hx::Object::__run(const TArgs& ...args)
{
    using unused = int[];

    auto arr = Array_obj<::Dynamic>::__new(0, sizeof...(args));

    (void)unused {
        0, (arr->push(invoker::wrap::toDynamic(args)), 0)...
    };

    return __Run(arr);
}

This saves classes needing to implement (either by the user or the compiler) a specific __run overload for dynamic calling to work correctly (albeit at the expense of an extra call / array allocation now). invoker::wrap::toDynamic uses type traits to check at compile time the argument types and generate code for wrapping native pointers and structs in the relevant hxcpp objects for dynamic compatibility, again something which use to have to be done by the user or the compiler for objects you wanted to use as a function.

Callable objects also implement and mark final __Run, since arguments are part of the callable signature at compile time we can again use some C++ template magic to automatically unpack the Array<::Dynamic> to the _hx_run call using index sequences.

Due to the implementation of the above C++11 is now required for hxcpp, so the HXCPP_CPP_11 and NO variety defines could probably be cleaned up / removed.

Further Work

GC pressure could be reduced even more if the closures for stuff like class functions were cached or maybe even pre-allocated. Could even be something like the small number boxing optimisation hxcpp currently does, where boxed integers < 255 are only cached between collections.

Some unification / cleanup could probably be done around the cpp.Callable type, the name alone might cause some confusion, but now there's an actual callable type this pretend one which is intercepted by gencpp might not need to exist or its functionality rolled into the new actual callable type.

I have not updated anything surrounding cffi / calling functions from dynamic libraries. This is all still dynamic under the hood but could probably be updated to not be. I don't think I've ever used these methods of calling native code in hxcpp so I haven't done it for now.

@Aidan63 Aidan63 marked this pull request as draft April 15, 2023 22:49
@Aidan63 Aidan63 changed the title Draft: [cpp] Generate Typed Functions [cpp] Generate Typed Functions Apr 15, 2023
@skial skial mentioned this pull request Apr 16, 2023
1 task
@Simn
Copy link
Member

Simn commented Apr 16, 2023

This is awesome, and I think it's a worthwhile endeavor. It's also quite neat that you can make a change like this with a relatively small PR. Looking forward to seeing this completed!

@Simn
Copy link
Member

Simn commented Apr 17, 2023

I just rediscovered https://github.com/HaxeFoundation/haxe/tree/development/tests/benchs while cleaning up. It's a generally useless micro-benchmark, but for something like this it's quite nice to see a direct comparison between different call kinds:

  Suite: call with 1 arg
              anon: 10,861,172 ( 61.39%)
     class-to-anon:  9,442,624 ( 53.37%)
           dynamic:  9,631,823 ( 54.44%)
     field closure: 13,358,008 ( 75.51%)
             final: 17,203,872 ( 97.25%)
          instance: 17,689,460 (100.00%)
         interface: 16,642,138 ( 94.07%)
     local closure: 13,736,624 ( 77.65%)
    local function: 13,658,467 ( 77.21%)
          override: 17,132,126 ( 96.84%)
            static: 17,571,791 ( 99.33%)
    static closure: 13,525,721 ( 76.46%)

Will be nice to see if there's a before/after difference for your changes here.

@Aidan63
Copy link
Contributor Author

Aidan63 commented Apr 17, 2023

Just gave it a quick go, some pretty good results (even if it is just a micro benchmark).

This is what I got with haxe 4.3 and latest hxcpp
image

and this is what I get with this branch and my hxcpp changes.
image
Number of arguments with closures or local functions no longer make any meaningful difference to call time. The calls benchmark doesn't seem to have anything for functions with the dynamic modifier, so I'll probably add that to get a difference for those once I implement them.

The final function benchmark reminds me I should dig out my branch where I added final to the generated classes and functions in C++ if it was final in haxe, would allow compilers to do some devirtualisation micro optimisations.

@Simn
Copy link
Member

Simn commented Apr 17, 2023

Looks good! I suppose there would be an even bigger difference if it used integers instead of strings, because then we're in boxing land.

Regarding native final, there's a headache-inducing discussion in #8263.

@Aidan63
Copy link
Contributor Author

Aidan63 commented Apr 19, 2023

I saw the face of true evil this evening, it took the form of haxe optional arguments.

I've cleaned up the code handling optional arguments a bit, but my head is spinning as a result. I've now managed to apply the ::hx::Null<T> optimisation to arguments for closures like final f = (i = 12) -> {}. Unfortunately there are still some edge cases surrounding mixing optional / null arguments of similar functions where you may get an extra wrapper function, I don't think the optional bool of TFun is enough to be able to reliably apply the optimisation, we need to know the the expression of the default value which only comes with tfun. Thankfully though I haven't seen any cases where unintentional boxing occurs.

Seems like a lot of the native final arguments are about class variables, might be able to get away with it for functions.

@Aidan63
Copy link
Contributor Author

Aidan63 commented Jun 12, 2023

Fixed the issue with self referencing typedefs, not the most confident in my fix, but all the tests pass now.

I've started updating the hxcpp implemented classes to use the new callable type. So now class closures of maths and string functions are fully typed and don't use dynamic. Arrays are the main one left, going to be a bit trickier due to the BaseArray, Array, and VirtualArray structure. The array map function will be trickier still as as C++ can't infer template types, so gencpp will need to be updated to generate extra code for arrays map callable.

Cppia isn't working in some cases. The below code snippet gives a runtime error of unknown array type:Dynamic.

class Main {
	static function main() {
		final a = []; //new Array<Dynamic>();
		final f = () -> trace('hello, world!');

		a.push(f);

		a[0]();
	}
}

Interestingly if you uncomment and use that new Array<Dynamic>(); instead of [] everything works fine! I don't have much knowledge of cppia's internals but I've found some debug logging which will hopefully point me in the right direction.

I've also been experimenting with the dynamic run functions of hxcpp's internals. Dynamic calling is handled by a series of virtual functions on the base ::hx::Object type.

__Run(const Array<::Dynamic> args)

and

__run(const Dynamic& args0);
__run(const Dynamic& args0, const Dynamic& arg1);
// etc, etc...

The very large number of virtual __run overloads have been removed in favour of a non virtual __run which takes in templated pack parameters, a fold macro is then used to bundle them into an Array<Dynamic> for forwarding into that (still) virtual __Run function.

// defined at the bottom of Array.h due to header include order...
template<class ...TArgs>
Dynamic hx::Object::__run(const TArgs& ...args)
{
    auto arr = Array_obj<::Dynamic>::__new(0, sizeof...(args));

    (arr->push(Dynamic(args)), ...);

    return __Run(arr);
}

The callable object also now marks it's __Run as final due to it using an index sequence to automatically unpack the arrays contents into the _hx_run call, this way the compiler no longer needs to generate code for the __Run functions and any hand written callables in hxcpp also don't need to have __Run defined.

Dynamic __Run(const Array<Dynamic>& inArgs) override final
{
    return apply(inArgs, std::index_sequence_for<TArgs...>());
}

template<size_t... I>
Dynamic apply(const Array<Dynamic>& inArgs, std::index_sequence<I...>)
{
    if constexpr (std::is_void<TReturn>())
    {
        _hx_run(inArgs[I] ...);

        return null();
    }
    else
    {
        return _hx_run(inArgs[I] ...);
    }
}

virtual TReturn _hx_run(TArgs... args) = 0;

@skial skial mentioned this pull request Jun 14, 2023
1 task
@Aidan63
Copy link
Contributor Author

Aidan63 commented Jul 26, 2023

Managed to reduce the required C++ version down from 17 to 11, also gone through and updated all the _dyn functions in the built in hxcpp classes (String, Maths, Array, VirtualArray) to return callables instead of dynamic. The array map_dyn function was a bit of a bugger, but I managed to get it working.

I.e.

function main() {
	final myArr = [ 1, 2, 5 ];
	final myMap = myArr.map;

	indirect1(myMap);
	indirect2(myArr.map);
}

function indirect1(func : (f : Int->String) -> Array<String>) {
	trace(func(i -> Std.string(i)));
}

function indirect2(func : (f : Int->Float) -> Array<Float>) {
	trace(func(i -> i + 0.5));
}

Now produces

void Main_Fields__obj::main() {
	::Array< int > myArr = ::Array_obj< int >::fromData( _hx_array_data_1a48cc76_1,3);
	::hx::Callable< ::Array< ::String >(::hx::Callable< ::String(int)>)> myMap = myArr->map_dyn< ::String >();
	::_Main::Main_Fields__obj::indirect1(myMap);
	::_Main::Main_Fields__obj::indirect2(myArr->map_dyn< Float >());
}

where the templated map_dyn function allows everything to be typed with no dynamic. This appears to be the only function out of all the hxcpp implemented classes which needs this special template handling to prevent dynamic cropping up.

To get cppia working I had to disable the MemReference setting for objects, the reason for this is that every gets downcasted to hx::Object* and you can have a cppia script which replaces a hx::Callable's pointer with a pointer to a non callable_obj. Usually the constructors and implicit conversions take care of this, but MemReference directly re-assigns pointer addresses.

With that I think everything is good to go, the haxe repo tests pass with cpp and cppia, and the hxcpp repo tests seem fine as well. I'll update the main post with the core changes made instead of it being spread across multiple comments as well as some things which might want to be done for this merge or later down the line as nice improvements.

@Aidan63 Aidan63 marked this pull request as ready for review July 26, 2023 17:13
@skial skial mentioned this pull request Jul 27, 2023
1 task
@Aidan63
Copy link
Contributor Author

Aidan63 commented Jan 8, 2024

Updated both merges to latest changes, also started writing a post for my blog about these changes if and when they get merged. As part of that I got out the old excel to graph out those call benchmark results, I modified the benchmarks a bit to include up to four arguments and to use ints instead of strings (using int values >255 to avoid hxcpp's small int boxing cache) for maximum boxing comparison which showed some interesting results.

image

Dynamic calling is now slower than before but I'm not too concerned as if you're explicitly using dynamic I don't think anyone gets to complain about performance. Whats more interesting is the "anon" and "class to anon" cases are still using dynamic calling. This appears to be because gencpp has some special cases for function calling on cpp.Variant (what anon types return) which goes through dynamic calling.
Going to see if I can update that to cast the variant to a callable before function calling so these two cases will then be up to the constant speed line of the others leaving just explicit dynamic calling as slow.

@Simn
Copy link
Member

Simn commented Jan 8, 2024

I agree with that assessment and did the same on genjvm. Dynamic calls are comparatively slow there, but anon and class to anon are nicely optimized. Some of that is more about the implementation of anons itself rather than functions, because "true" anons actually end up as classes and access to them is essentially optimized as (expr is TrueAnon) ? (expr : TrueAnon).field : slowDynamicThing(expr, field). This might be a bit out-of-scope here though!

Edit: Just ran the benchmarks again on JVM, I don't remember the difference being this extreme:

Case: Calls
  Suite: call with 0 args
              anon: 107,172,087 ( 88.03%)
     class-to-anon: 119,446,709 ( 98.11%)
           dynamic:   2,874,844 (  2.36%)
     field closure: 115,100,667 ( 94.54%)
             final: 120,627,932 ( 99.08%)
          instance: 114,472,977 ( 94.03%)
         interface: 120,076,398 ( 98.63%)
     local closure: 121,716,550 ( 99.98%)
    local function: 120,335,417 ( 98.84%)
          override: 119,621,562 ( 98.26%)
            static: 111,061,004 ( 91.22%)
    static closure: 121,738,020 (100.00%)

@Aidan63
Copy link
Contributor Author

Aidan63 commented Jan 8, 2024

Anon objects on hxcpp are a weird type, unless you annotate the instantiation with @:fixed it won't be a class with fields, but instead will be a hybrid of string hash and array. Where fields in the anon types instantiation are stored contiguously and any fields added after the fact (reflection, casting to another anon with extra fields, etc) are stored in the hash map, so there will always be some dynamic-ness. But that shouldn't need to be extended to the function call in this case.

Just ran the jvm calling bench on my machine. Dynamic calling is slower than cpp, but not ~2% slow.

Case: Calls
  Suite: call with 0 args
              anon: 16,773,618 ( 97.71%)
     class-to-anon: 16,559,562 ( 96.47%)
           dynamic:  3,148,377 ( 18.34%)
     field closure: 17,165,295 (100.00%)
             final: 16,894,833 ( 98.42%)
          instance: 16,920,759 ( 98.57%)
         interface: 16,643,065 ( 96.95%)
     local closure: 16,871,213 ( 98.28%)
    local function: 16,831,601 ( 98.05%)
          override: 17,042,208 ( 99.28%)
            static: 16,646,746 ( 96.97%)
    static closure: 16,943,954 ( 98.71%)

@Aidan63
Copy link
Contributor Author

Aidan63 commented Jan 9, 2024

Had a play around with that this evening and got some interesting results. I added an extra case for TCppVariant in gencpp to cast the result to a callable before invoking it, to my surprise this made zero difference!
This is because I insert the "cast to callable" derrived from the argument expression types and expected expression return (TCppVariant holds no knowledge of the original anon field...). In the anon benchmark the return value is not assigned so it attempts to cast the function to stuff like Int->Int->Int->Int->Void instead of Int->Int->Int->Int->Int which means a wrapper callable is generated which adds the overhead.
Assigning the result of the call in the benchmark to an int variable then gets it to cast to the correct type but performance was still just shy of 60%, not the near 100% I want. Turns out the limit is now due to anon object lookup being slow on hxcpp and this happens on each of the benchmark iterations before invoking the function. Caching the result of the anon field into local variable and calling that in the benchmark iteration gets us up to the >95% speed!

I'm going to look into having TCppVariant hold information about the field it was derrived from so I can make accurate casts instead of guessing from the expressions being passed into the call, this should solve the first part nicely. The second one I'm not massively concerned about as improving anon objects is out of scope for this and this issue probably won't come up in situations outside of benchmarks like this.

@skial skial mentioned this pull request Jan 10, 2024
1 task
@Aidan63
Copy link
Contributor Author

Aidan63 commented Jan 10, 2024

TCppVariants derrived from anon field access now stores the type of the field with it so I can generate correct casts.

Plotting out the graph again shows that only the dynamic case now decreases with argument count, "class to anon" and "anon" is still significantly below the others due to it performing an anon field lookup in each iteration. The right graph shows the results again but I cache the looked up anon field before the iterate to "hide" that cost and have the benchmark be entirely the call speed. In this case they go up to >95% which is what I want.

image

@Simn
Copy link
Member

Simn commented Jan 10, 2024

Nice! I like seeing these graphs, it's always good to visualize progress like that.

I'd like to run this PR on https://benchs.haxe.org/ too, but it would be better if CI wasn't failing like it currently does. Can we get to a state where it at least doesn't fail with Error: Hxcpp is out of date - please update?

@Aidan63
Copy link
Contributor Author

Aidan63 commented Jan 10, 2024

For the cpp tests to run it will need the hxcpp pr linked in the description merged in first since this haxe merge sets the hxcpp API level to 500 which so far only exists in my hxcpp branch. The hxcpp merge ci is passing (with the exception of the unrelated flakey mac ci) which means I haven't broken backwards compatibility with my changes, so if @hughsando is happy with the overall approach and doesn't spot any issues then that should be good to merge.

After that if we re-run the ci on this branch it should start to run and hopefully get all green for the first time (cpp and cppia tests all pass for me locally when using my hxcpp branch).

@Aidan63
Copy link
Contributor Author

Aidan63 commented Aug 30, 2024

Updated this branch and the matching hxcpp one to dev / master.

Also managed to find a better solution to the template and partial declaration issue I came across. Since the templated __run function on the base hx::Object type now just wrangles the parameters into an array and forwards that into the non template __Run I just removed __run from hx::Object.
The functionality of the old __run is now gained using hx::invoker::invoke, this means the previous work around with an extra intermediate Closure type has been removed which cleans things up nicely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants