-
-
Notifications
You must be signed in to change notification settings - Fork 40.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization for scanning less layers. #8311
Conversation
One possible change is renaming it to |
Any performance metrics on if this is actually real world faster? |
Lucky there's Travis - Not sure why the #define failed on that other keyboard. From memory (like last time lol) it gave a small but definitely measurable improvement. Again the case that i'm targeting is pressing "zxcv" at the same time. From memory with the sym_g algorithm back in 2018, pressing zxcv would cause no chatter on entire keyboard (lets say 15ms), then finally it would know that there are zxcv all pressed. In TMK 2018, it would then do the following (with QMK_KEYS_PER_SCAN 4):
Process involves checking layers, then finally enacting an action.c. Then queuing it. Note that QMK does not currently support multiple keys per USB message, so the absolute best case scenario is getting z,x,c,v one ms apart from each other. #4904. This would mean current qmk does this (with QMK_KEYS_PER_SCAN 4)
In 2018 this bug wasn't there, so (again, from long term memory), it was sending all zxcv in one packet, which was good, but it was taking 2-3 milliseconds to process all the keys. adding latency. Adding the scan from layer 1 (i only use two layers) sped this up significantly. The effect is most compounded when using sym_g + a slow scan rate in both qmk 2020(yikes) and tmk 2018 (little yikes). And even more so if you press 6-8 keys. Imagine a rhythm game where you have to press multi's 6-8ms earlier than singles. Yeah that's yikes. Anyway, I will definitely benchmark this. It might not give real world performance yet due to #4904 being a thing. E.g. if it cuts half a millisecond/press but we had .9 milliseconds in the window before the 1000hz poll, there would be no realworld performance. If you're using sym_g it would help, but eager_pk alleviates it significantly since you're more likely to press 8 keys over a 4ms window than a 0ms window |
A good question would be to see if there is way to get the number of layers automatically, and handle stuff based on that. I'm not sure if that would help, or be faster. Or a way to just skip layers not set. |
Yep, that was discussed in TMK land too, and i couldn't find a clean solution for automatically determining number of layers at compile time. My "best" solution was a sane default and/or a shell script. QMK is a bit ahead of the curve thanks to one of your commits (or was it zvecr) since it already defines how many bits layer_state_t is, and then you can at least use 8/16/32 as a default instead of just 32. We could do something at runtime. Like in keyboard_init(), literally check if the entire layer is full of KC_TRANS, right? Then iterate from layer max_layer_for_bits() down to 0 and stop when we find a non-KC_TRANS. I don't like this approach since it's quite ugly, and adds a lot of bytes. Though once I know performance numbers we can go from there. |
ROFLMAO. #3637 :) And yeah, I suspected that it was something like that. Having a shell script that ran during make, and checked the keymap.c/json file could work, but it would be complicated, and make is already really complicated. Though, @yanfali this is something that the configurator could do automatically, since it's a known quantity that has to be processed anyways. |
Alright, i've done my testing: So i simply modified action_layer.c, adding a loop to do the layer check multiple times (just 20). uint8_t layer_switch_get_layer(keypos_t key) {
#ifndef NO_ACTION_LAYER
action_t action;
action.code = ACTION_TRANSPARENT;
uint16_t timer = timer_read();
layer_state_t layers = layer_state | default_layer_state;
/* check top layer first */
uint8_t result = 0;
for (uint8_t t = 0; t < 20; t++) {
for (int8_t i = 2; i >= 0; i--) {
if (layers & (1UL << i)) {
action = action_for_key(i, key);
if (action.code != ACTION_TRANSPARENT) {
result = i;
break;
}
}
}
}
uint16_t timer2 = timer_read();
if (timer2 != timer && timer2 != timer + 1) {
print("get_layer:\n");
print_dec(timer2-timer);
print("\n");
}
return result;
#else
return get_highest_layer(default_layer_state);
#endif
} Finally, i just pressed the letter "a". It's in layer 0.
The surprising thing is that it calls I'm simplifying the explanation to :
Is 0.75ms a significant gain? If your scan() + process_action() is above Basically this easy fix (assuming 2-4 layers) would reduce latency by 1ms 75% of the time. Pretty good for a 4-5 line code change! Combined with eager_pk (5ms improvement) and optimized matrix scanning (1-3ms improvement) you can see I'm really trying to whittle down input latency. |
Does |
Good idea! Which file would we put it in though? |
Was more referring to this section:
I don't know specifics as far as where needs changing, just that if the intention was to determine the highest available layer on the keymap, I was curious if it'd suffice. |
Right, I do like using some form of sizeof to get max layer. I guess also people who use layers 1…4 then say 10 would get rekt. Not sure if the sizeof() compiles since it's have to be in layers.c to be shared which doesn't have access to the specific keyboards mapping? So the question for me is where to implement it. But great idea. |
Yeah it's annoying that it complains that it's an incomplete type if you try to take sizeof from another compilation unit. |
How about something like this in the init code somewhere: int max_layer = 0;
for(int l = 0; l < MAX_LAYER; ++l) {
for(int r = 0; l != max_layer && r < MATRIX_ROWS; ++r) {
for(int c = 0; l != max_layer && c < MATRIX_COLS; ++c) {
if(keymaps[l][r][c] != KC_TRNS || keymaps[l][r][c] != KC_NO) {
max_layer = l;
break;
}
}
}
} Probably needs re-execution if VIA is enabled, too. |
For via, there is: qmk_firmware/quantum/dynamic_keymap.c Line 26 in c6b6676
|
Theoretically though, my STM32L082 could support nearly 50 layers in VIA, if I were to use the full EEPROM. The hard-coded max is fine for an upper cap, it'd still be preferable to make sure the max is detected. Given that it's likely only to occur once on startup, or once during VIA update, it's not likely to be noticeable. |
What is VIA? Also that implementation seems fine. I think maybe we should update documentation and tell people to use contiguous layers if possible also; though iirc not many people would use layers [0, 1, 2, 3, 9]. |
VIA is a configurable item The dynamically updatable bit is where we'd need to hook in, in order to be able to correctly identify the highest layer used by the user's VIA keymap definition. |
In this context, though, it's the "dynamic keymap" feature that stores and reads the keymap from EEPROM rather than progmem. |
I'm confused, why would this work? If higher layers are not assigned in the keymap, then sizeof() of the array will be smaller, and addressing an inexistent layer would overflow the layer and would read something else from progmem, which is undefined, and depends on what order the linker places sections in. Or is there some kind of hack somewhere I'm not aware of to fill the following memory with KC_TRANS/KC_NO codes? I can think of the following ways to count the number of layers:
I think we should do 4) and in the longterm switch to 3) with a breaking change. All of the above that I have written, I don't really know how it would apply to VIA, as I have not yet investigated using it. |
Good catch. Right. It'd have to be changed to code similar to how I think the existing solution (opt-in user defined MAX_LAYER) is good. The only "clean" way forward is:
The solutions you suggested are basically writing a hack because we want to make it easy for users with But in the end i do agree; we should eventually just make a breaking change. |
I'm not sure why you care about contiguous/non-contiguous layers, See this example:
If I change it, and use only layers 0, 1, and 3, this is in the output of objdump:
While the .elf doesn't specify what to fill the hole with, the hole must always be there. This is how .c arrays work (and by the way, looking in the .hex file for AVR, the hole is filled with zeros, but that may not be true for all architectures) In the second case I have looked at the value of So effectively the hole is treated as if there was a layer there, that just isn't in use ever. Of course it's inefficient to have holes in your keymap, but I don't see the reason to force the user to not have holes.
Yes, I agree, it's better then what's currently in there. But let me try to propose yet another solution that may be a little better:
And in some header file you should have:
And in action_layer.h:
This solution occupies 1 extra byte of RAM (which is very scarce on AVR), so another option would be:
And in some header file you should have:
And in action_layer.h:
So the advantage of the above solutions, is that the user can use the following in their keymap.c, to further optimize their keyboard:
This way they don't have to be aware of holes in their keymap, and they don't have to count the number of layers they have to redefine MAX_LAYER. Also we could put the above line in the default keymap template that is used when you run the new keymap script, so a lot of new keyboards would have this by default. We don't need to make a breaking change immediately to force people to give us the keymap size, but it will be available. And we could also do a massive commit eventually to add it into all existing keymaps. |
Because, it seems to increase the firmware size. And people that are counting the bytes (like myself), it can be a problem. Unless I'm misinterpreting things. |
I agree it's inefficient performance-wise, and firmware size-wise to have holes in your keymap, but why force the user to not insert holes? That's what I meant. |
The reasons you stated were the reason I would force it. One of the general complaints about qmk is firmware bloat and speed reduction over time. But gaps do lead to ease of use in terms of programming your layouts. Hackerman layer renumbering, or making people use enums for layers can help keep layers contiguous, and is cleaner anyway. Maybe forcing is unnecessary, this thread is more about automatically determining max layer after all, which should account for gaps. Your solution looks good; it's using the sizeof method but optionally. A further improvement would be if the end-user could just #define auto layers or something, or the number was 100% automatic with no user intervention, since users may be confused by pasting something as verbose as
I realise that sizeof needs to be on the keyboard side; just wondering if there's an even cleaner way. The dumb count method on startup is good because it gets the layercount without users having to define anything, speeds up 100% of keyboards, but it's bad because it's size bloaty and kludgy and completely not succinct |
Thanks! |
The name that was chosen in the finally merged version was really unfortunate — I would expect
That code works only if nobody tries to activate a nonexistent layer. If somehow a bit for a nonexistent layer gets set in
This looks nice; however, avr-gcc would probably be unable to optimize it to a real constant even with LTO (because of |
* Optimization for scanning less layers. * Rename NUM_LAYERS to MAX_LAYER.
* Optimization for scanning less layers. * Rename NUM_LAYERS to MAX_LAYER.
* Optimization for scanning less layers. * Rename NUM_LAYERS to MAX_LAYER.
* Optimization for scanning less layers. * Rename NUM_LAYERS to MAX_LAYER.
* Optimization for scanning less layers. * Rename NUM_LAYERS to MAX_LAYER.
* Optimization for scanning less layers. * Rename NUM_LAYERS to MAX_LAYER.
* Optimization for scanning less layers. * Rename NUM_LAYERS to MAX_LAYER.
* Optimization for scanning less layers. * Rename NUM_LAYERS to MAX_LAYER.
This is a quick cleanup of layer_state_t that allows users to input their maximum layer number.
Description
Generally, most users have 2-4 layers. Scanning from 31 down to 0 and checking KC_NO /KC_TRANS 32 times before hitting layer one takes quite a bit of time.
This speeds it up by scanning from NUM_LAYERS, which is automatically defined if you've used
#define LAYER_STATE_8BIT
, or you can define it in your config.h.I've tested with LAYER_STATE_8BIT + NUM_LAYERS = 2, on my 2 layer keyboard.
One caveat is you have to put in your maximum layer number + 1 (e.g. [0] and [1] will need NUM_LAYERS = 2. [0] [1] [4] will require NUM_LAYERS = 5)
Types of Changes
Checklist