Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization for scanning less layers. #8311

Merged
merged 2 commits into from
May 11, 2020
Merged

Conversation

alex-ong
Copy link
Contributor

@alex-ong alex-ong commented Mar 4, 2020

This is a quick cleanup of layer_state_t that allows users to input their maximum layer number.

Description

Generally, most users have 2-4 layers. Scanning from 31 down to 0 and checking KC_NO /KC_TRANS 32 times before hitting layer one takes quite a bit of time.
This speeds it up by scanning from NUM_LAYERS, which is automatically defined if you've used #define LAYER_STATE_8BIT, or you can define it in your config.h.

  • Added auto define for NUM_LAYERS
  • Changed scanning code from highest_layer (32) to NUM_LAYERS

I've tested with LAYER_STATE_8BIT + NUM_LAYERS = 2, on my 2 layer keyboard.
One caveat is you have to put in your maximum layer number + 1 (e.g. [0] and [1] will need NUM_LAYERS = 2. [0] [1] [4] will require NUM_LAYERS = 5)

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Keyboard (addition or update)
  • Keymap/layout/userspace (addition or update)
  • Documentation

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

@alex-ong
Copy link
Contributor Author

alex-ong commented Mar 4, 2020

One possible change is renaming it to MAX_LAYER_INDEX and removing the -1 from the scanning code..

@zvecr
Copy link
Member

zvecr commented Mar 4, 2020

Any performance metrics on if this is actually real world faster?

@alex-ong
Copy link
Contributor Author

alex-ong commented Mar 5, 2020

Lucky there's Travis - Not sure why the #define failed on that other keyboard.
Yes i'll try and get performance metrics this weekend.

From memory (like last time lol) it gave a small but definitely measurable improvement. Again the case that i'm targeting is pressing "zxcv" at the same time. From memory with the sym_g algorithm back in 2018, pressing zxcv would cause no chatter on entire keyboard (lets say 15ms), then finally it would know that there are zxcv all pressed.

In TMK 2018, it would then do the following (with QMK_KEYS_PER_SCAN 4):

Process Z, queue Z
Process X, queue X
Process C, queue C
Process V, queue V
send zxcv

Process involves checking layers, then finally enacting an action.c. Then queuing it. Note that QMK does not currently support multiple keys per USB message, so the absolute best case scenario is getting z,x,c,v one ms apart from each other. #4904.

This would mean current qmk does this (with QMK_KEYS_PER_SCAN 4)

scan
Process Z, send Z (blocking)
Process X, send X (blocking)
Process C, send C (blocking)
Process V, send V (blocking)
scan

In 2018 this bug wasn't there, so (again, from long term memory), it was sending all zxcv in one packet, which was good, but it was taking 2-3 milliseconds to process all the keys. adding latency. Adding the scan from layer 1 (i only use two layers) sped this up significantly. The effect is most compounded when using sym_g + a slow scan rate in both qmk 2020(yikes) and tmk 2018 (little yikes). And even more so if you press 6-8 keys. Imagine a rhythm game where you have to press multi's 6-8ms earlier than singles. Yeah that's yikes.

Anyway, I will definitely benchmark this. It might not give real world performance yet due to #4904 being a thing.

E.g. if it cuts half a millisecond/press but we had .9 milliseconds in the window before the 1000hz poll, there would be no realworld performance. If you're using sym_g it would help, but eager_pk alleviates it significantly since you're more likely to press 8 keys over a 4ms window than a 0ms window

@drashna
Copy link
Member

drashna commented Mar 5, 2020

A good question would be to see if there is way to get the number of layers automatically, and handle stuff based on that. I'm not sure if that would help, or be faster. Or a way to just skip layers not set.

@alex-ong
Copy link
Contributor Author

alex-ong commented Mar 5, 2020

Yep, that was discussed in TMK land too, and i couldn't find a clean solution for automatically determining number of layers at compile time. My "best" solution was a sane default and/or a shell script. QMK is a bit ahead of the curve thanks to one of your commits (or was it zvecr) since it already defines how many bits layer_state_t is, and then you can at least use 8/16/32 as a default instead of just 32.

We could do something at runtime. Like in keyboard_init(), literally check if the entire layer is full of KC_TRANS, right? Then iterate from layer max_layer_for_bits() down to 0 and stop when we find a non-KC_TRANS. I don't like this approach since it's quite ugly, and adds a lot of bytes. Though once I know performance numbers we can go from there.

tmk/tmk_keyboard#523

@drashna
Copy link
Member

drashna commented Mar 6, 2020

QMK is a bit ahead of the curve thanks to one of your commits (or was it zvecr) since it already defines how many bits layer_state_t is, and then you can at least use 8/16/32 as a default instead of just 32.

ROFLMAO. #3637 :)

And yeah, I suspected that it was something like that. Having a shell script that ran during make, and checked the keymap.c/json file could work, but it would be complicated, and make is already really complicated.

Though, @yanfali this is something that the configurator could do automatically, since it's a known quantity that has to be processed anyways.

@alex-ong
Copy link
Contributor Author

alex-ong commented Mar 7, 2020

Alright, i've done my testing:
Keyboard: xealousbrown
scan_rate: 7700~

So i simply modified action_layer.c, adding a loop to do the layer check multiple times (just 20).

uint8_t layer_switch_get_layer(keypos_t key) {
#ifndef NO_ACTION_LAYER
    action_t action;
    action.code = ACTION_TRANSPARENT;
    uint16_t      timer  = timer_read();
    layer_state_t layers = layer_state | default_layer_state;
    /* check top layer first */
    uint8_t result = 0;
    for (uint8_t t = 0; t < 20; t++) {
        for (int8_t i = 2; i >= 0; i--) {
            if (layers & (1UL << i)) {
                action = action_for_key(i, key);
                if (action.code != ACTION_TRANSPARENT) {
                    result = i;
                    break;
                }
            }
        }
    }
    uint16_t timer2 = timer_read();
    if (timer2 != timer && timer2 != timer + 1) {
        print("get_layer:\n");
        print_dec(timer2-timer);
        print("\n");
    }
    return result;

#else
    return get_highest_layer(default_layer_state);
#endif
}

Finally, i just pressed the letter "a". It's in layer 0.

Layers Output TotalMs Divided by 20 Divided by 60
32 get_layer: 5 get_layer: 5 get_layer: 5 15 0.75 0.25
16 get_layer: 2 get_layer: 2 get_layer: 2 6 0.3 0.1
2 (no output) 0 0 0

The surprising thing is that it calls layer_switch_get_layer three times when i press the key. From memory it used to do this once, but maybe i remembered wrong. Note that obviously 2 layers doesn't take no time, it is simply rounded. Likewise 32 and 16 layers times are approximated.

I'm simplifying the explanation to :

  • scan()
  • layer_switch_get_layer()
  • process_action()

Is 0.75ms a significant gain? If your scan() + process_action() is above x.25 ms, it bumps your send to the next millisecond. With 16 layers, your scan() + process_action() has to be above x.70 ms. etc etc.

Basically this easy fix (assuming 2-4 layers) would reduce latency by 1ms 75% of the time. Pretty good for a 4-5 line code change! Combined with eager_pk (5ms improvement) and optimized matrix scanning (1-3ms improvement) you can see I'm really trying to whittle down input latency.

@tzarc tzarc requested a review from a team March 8, 2020 04:59
@tzarc
Copy link
Member

tzarc commented Mar 8, 2020

Does sizeof(keymaps) / (NUM_ROWS*NUM_COLS*sizeof(uint8_t)) work for determining the number of keymaps? Or is it treated like a pointer despite the extern and []?

@alex-ong
Copy link
Contributor Author

alex-ong commented Mar 8, 2020

Good idea! Which file would we put it in though?

@tzarc
Copy link
Member

tzarc commented Mar 8, 2020

Was more referring to this section:

We could do something at runtime. Like in keyboard_init(), literally check if the entire layer is full of KC_TRANS, right? Then iterate from layer max_layer_for_bits() down to 0 and stop when we find a non-KC_TRANS. I don't like this approach since it's quite ugly, and adds a lot of bytes. Though once I know performance numbers we can go from there.

I don't know specifics as far as where needs changing, just that if the intention was to determine the highest available layer on the keymap, I was curious if it'd suffice.

@alex-ong
Copy link
Contributor Author

alex-ong commented Mar 8, 2020

Right, I do like using some form of sizeof to get max layer. I guess also people who use layers 1…4 then say 10 would get rekt. Not sure if the sizeof() compiles since it's have to be in layers.c to be shared which doesn't have access to the specific keyboards mapping? So the question for me is where to implement it. But great idea.

@tzarc
Copy link
Member

tzarc commented Mar 12, 2020

Yeah it's annoying that it complains that it's an incomplete type if you try to take sizeof from another compilation unit.
I think the remaining option would be to perform the initial iteration through the layers and cache the result. Could be done as one of the first things executed, even before USB is attempted to be connected. Unsure how long it'd take...

@tzarc
Copy link
Member

tzarc commented Mar 21, 2020

How about something like this in the init code somewhere:

int max_layer = 0;
for(int l = 0; l < MAX_LAYER; ++l) {
    for(int r = 0; l != max_layer && r < MATRIX_ROWS; ++r) {
        for(int c = 0; l != max_layer && c < MATRIX_COLS; ++c) {
            if(keymaps[l][r][c] != KC_TRNS || keymaps[l][r][c] != KC_NO) {
                max_layer = l;
                break;
            }
        }
    }
}

Probably needs re-execution if VIA is enabled, too.

@drashna
Copy link
Member

drashna commented Mar 21, 2020

For via, there is:

# define DYNAMIC_KEYMAP_LAYER_COUNT 4

@tzarc
Copy link
Member

tzarc commented Mar 21, 2020

For via, there is:

# define DYNAMIC_KEYMAP_LAYER_COUNT 4

Theoretically though, my STM32L082 could support nearly 50 layers in VIA, if I were to use the full EEPROM. The hard-coded max is fine for an upper cap, it'd still be preferable to make sure the max is detected.

Given that it's likely only to occur once on startup, or once during VIA update, it's not likely to be noticeable.

@alex-ong
Copy link
Contributor Author

What is VIA? Also that implementation seems fine. I think maybe we should update documentation and tell people to use contiguous layers if possible also; though iirc not many people would use layers [0, 1, 2, 3, 9].

@tzarc
Copy link
Member

tzarc commented Mar 22, 2020

VIA is a configurable item VIA_ENABLE = yes which allows use of the VIA client (http://caniusevia.com), and allows configuration of layers at runtime, updating EEPROM with new keymaps without requiring reflashing a new firmware.

The dynamically updatable bit is where we'd need to hook in, in order to be able to correctly identify the highest layer used by the user's VIA keymap definition.

@drashna
Copy link
Member

drashna commented Mar 23, 2020

In this context, though, it's the "dynamic keymap" feature that stores and reads the keymap from EEPROM rather than progmem.

@purdeaandrei
Copy link
Contributor

purdeaandrei commented Apr 11, 2020

@tzarc

How about something like this in the init code somewhere:

int max_layer = 0;
for(int l = 0; l < MAX_LAYER; ++l) {
    for(int r = 0; l != max_layer && r < MATRIX_ROWS; ++r) {
        for(int c = 0; l != max_layer && c < MATRIX_COLS; ++c) {
            if(keymaps[l][r][c] != KC_TRNS || keymaps[l][r][c] != KC_NO) {
                max_layer = l;
                break;
            }
        }
    }
}

I'm confused, why would this work? If higher layers are not assigned in the keymap, then sizeof() of the array will be smaller, and addressing an inexistent layer would overflow the layer and would read something else from progmem, which is undefined, and depends on what order the linker places sections in. Or is there some kind of hack somewhere I'm not aware of to fill the following memory with KC_TRANS/KC_NO codes?

I can think of the following ways to count the number of layers:

  1. GCC is already configured to output a separate section for everything (that will be garbage-collected at linking time) (-fdata-sections). So for example on AVR, the keymap will be in an input object file section called ".progmem.data.keymaps". One could write a custom ldscript for the linker, to generate a symbol at the address immediately following .progmem.data.keymaps section.
    Here I added the default AVR ldscript (which is in avr-gcc), and added on top of that a modification, that I am suggesting: https://gist.github.com/purdeaandrei/38763cd30574cf6fe8616ffe4933d09c/revisions

    The in C code you could do something like:

    extern char PROGMEM END_OF_KEYMAPS;
    

    then you could do a runtime calculation such as

    layers_count = (((uint16_t)(&END_OF_KEYMAPS)) - (uint16_t)(&keymaps)) / sizeof(keymaps[0]);
    

    The downside is that you'd have to do this for every architecture QMK builds on, and it may be a little different on every architecture. I have not idea what the sections, and default ldscripts look like on ARM for example, but I think it's likely the keymaps section will likely not be called ".progmem.data.keymaps", but something else.

  2. some shell script, to take the compiled keymap.o file, extract the size of the keymaps, and generate another .c file that will contain something like "const int layers = 123;". I don't suggest generating an .h file because then it will be a nightmare to determine order of compilation of things. Generating a .c file the only order you have to maintain is keymap.o -> kemapsize.c -> keymapsize.o -> final_full_keyboard_firmware.elf

  3. All of the above seems to some degree hackish. I think long-term we should do a breaking change, that will force all keymaps to include a line like "const char keymap_layers = sizeof(keymaps)/sizeof(keymaps[0]);"

  4. Another (slightly hackish) change but may be easier to do would be to:

    cat (...)/keymap.c > .build/keymap_generated.c
    echo "const char keymap_layers = sizeof(keymaps)/sizeof(keymaps[0]);" >> .build/keymap_generated.c
    

    And then including keymap_generated.c in the build, but excluding the original keymap.c.
    This would work the same for all architectures, but the disadvantage compared to 3) is that you'd have to always name the file that contains your keymap, as keymap.c

I think we should do 4) and in the longterm switch to 3) with a breaking change.

All of the above that I have written, I don't really know how it would apply to VIA, as I have not yet investigated using it.

@alex-ong
Copy link
Contributor Author

alex-ong commented Apr 12, 2020

I'm confused, why would this work? If higher layers are not assigned in the keymap, then sizeof() of the array will be smaller, and addressing an inexistent layer would overflow the layer and would read something else from progmem

Good catch. Right. It'd have to be changed to code similar to how action_layer.c works, which uses get_action_for_key(layer,key). I'm assuming this function is safe, since keyboard scan from 32 downwards already :D

I think the existing solution (opt-in user defined MAX_LAYER) is good. The only "clean" way forward is:

  1. force people to use contiguous layers from 0
  2. use sizeof, since its now contiguous.

The solutions you suggested are basically writing a hack because we want to make it easy for users with gaps in their maps. I had the same thoughts. My original solution would be hackerman shell script such as (2) or (4). In terms of cleanliness, its probably better not to have hackerman shells and instead do the long and boring "scan full keymap" method described above but use get_action_for_key for safety.

But in the end i do agree; we should eventually just make a breaking change.

@purdeaandrei
Copy link
Contributor

purdeaandrei commented Apr 12, 2020

@alex-ong

  1. force people to use contiguous layers from 0

I'm not sure why you care about contiguous/non-contiguous layers, See this example:
I have a 8-key keyboard with layers 0, 1, and 2, and this gets compiled as:

0000012a <keymaps>:
     12a:       01 51 04 00 05 00 06 00 07 00 08 00 09 00 2c 00     .Q............,.
     13a:       01 00 1e 00 1f 00 20 00 21 00 22 00 23 00 02 51     ...... .!.".#..Q
     14a:       01 00 0a 00 0b 00 00 5c 0d 00 0e 00 01 00 01 00     .......\........

If I change it, and use only layers 0, 1, and 3, this is in the output of objdump:

0000012a <keymaps>:
     12a:       01 51 04 00 05 00 06 00 07 00 08 00 09 00 2c 00     .Q............,.
     13a:       01 00 1e 00 1f 00 20 00 21 00 22 00 23 00 03 51     ...... .!.".#..Q
        ...
     15a:       01 00 0a 00 0b 00 00 5c 0d 00 0e 00 01 00 01 00     .......\........

While the .elf doesn't specify what to fill the hole with, the hole must always be there. This is how .c arrays work (and by the way, looking in the .hex file for AVR, the hole is filled with zeros, but that may not be true for all architectures)

In the second case I have looked at the value of sizeof(keymaps)/sizeof(keymaps[0]), and it's printed as "4".

So effectively the hole is treated as if there was a layer there, that just isn't in use ever. Of course it's inefficient to have holes in your keymap, but I don't see the reason to force the user to not have holes.

I think the existing solution (opt-in user defined MAX_LAYER) is good.

Yes, I agree, it's better then what's currently in there. But let me try to propose yet another solution that may be a little better:
What if we put this in some .c file?

#if defined(LAYER_STATE_8BIT)
const unsigned char keymaps_size __attribute__((weak)) = 8;
#elif defined(LAYER_STATE_16BIT)
const unsigned char keymaps_size __attribute__((weak)) = 16;
... (and so on)

And in some header file you should have:

extern const unsigned char keymaps_size;

And in action_layer.h:

#define MAX_LAYER keymaps_size

This solution occupies 1 extra byte of RAM (which is very scarce on AVR), so another option would be:
in some .c file:

#if defined(LAYER_STATE_8BIT)
const unsigned char PROGMEM keymaps_size __attribute__((weak)) = 8;
#elif defined(LAYER_STATE_16BIT)
const unsigned char PROGMEM keymaps_size __attribute__((weak)) = 16;
... (and so on)

And in some header file you should have:

extern const unsigned char keymaps_size;

And in action_layer.h:

#define MAX_LAYER pgm_read_byte(&keymaps_size)

So the advantage of the above solutions, is that the user can use the following in their keymap.c, to further optimize their keyboard:

const unsigned char PROGMEM keymaps_size = sizeof(keymaps)/sizeof(keymaps[0]);

This way they don't have to be aware of holes in their keymap, and they don't have to count the number of layers they have to redefine MAX_LAYER. Also we could put the above line in the default keymap template that is used when you run the new keymap script, so a lot of new keyboards would have this by default. We don't need to make a breaking change immediately to force people to give us the keymap size, but it will be available. And we could also do a massive commit eventually to add it into all existing keymaps.

@drashna
Copy link
Member

drashna commented Apr 12, 2020

I'm not sure why you care about contiguous/non-contiguous layers, See this example:

Because, it seems to increase the firmware size. And people that are counting the bytes (like myself), it can be a problem.

Unless I'm misinterpreting things.

@purdeaandrei
Copy link
Contributor

purdeaandrei commented Apr 12, 2020

@drashna

I'm not sure why you care about contiguous/non-contiguous layers, See this example:

Because, it seems to increase the firmware size. And people that are counting the bytes (like myself), it can be a problem.

I agree it's inefficient performance-wise, and firmware size-wise to have holes in your keymap, but why force the user to not insert holes? That's what I meant.

@alex-ong
Copy link
Contributor Author

The reasons you stated were the reason I would force it. One of the general complaints about qmk is firmware bloat and speed reduction over time. But gaps do lead to ease of use in terms of programming your layouts. Hackerman layer renumbering, or making people use enums for layers can help keep layers contiguous, and is cleaner anyway.

Maybe forcing is unnecessary, this thread is more about automatically determining max layer after all, which should account for gaps.

Your solution looks good; it's using the sizeof method but optionally. A further improvement would be if the end-user could just #define auto layers or something, or the number was 100% automatic with no user intervention, since users may be confused by pasting something as verbose as

const unsigned char PROGMEM keymaps_size = sizeof(keymaps)/sizeof(keymaps[0]);

I realise that sizeof needs to be on the keyboard side; just wondering if there's an even cleaner way.

The dumb count method on startup is good because it gets the layercount without users having to define anything, speeds up 100% of keyboards, but it's bad because it's size bloaty and kludgy and completely not succinct

@zvecr zvecr merged commit 361ac2f into qmk:master May 11, 2020
@zvecr
Copy link
Member

zvecr commented May 11, 2020

Thanks!

@sigprof
Copy link
Contributor

sigprof commented May 11, 2020

The name that was chosen in the finally merged version was really unfortunate — I would expect MAX_LAYER to be the index of the last layer, not the number of layers.

Good catch. Right. It'd have to be changed to code similar to how action_layer.c works, which uses get_action_for_key(layer,key). I'm assuming this function is safe, since keyboard scan from 32 downwards already :D

That code works only if nobody tries to activate a nonexistent layer. If somehow a bit for a nonexistent layer gets set in layer_state or default_layer_state, the code would still try to read beyond the initialized part of the keymaps array.

const unsigned char PROGMEM keymaps_size = sizeof(keymaps)/sizeof(keymaps[0]);

This looks nice; however, avr-gcc would probably be unable to optimize it to a real constant even with LTO (because of pgm_read_byte). And defining a rough limit (like LAYER_STATE_8BIT) would still need to be done manually.

bitherder pushed a commit to bitherder/qmk_firmware that referenced this pull request May 15, 2020
* Optimization for scanning less layers.

* Rename NUM_LAYERS to MAX_LAYER.
drashna pushed a commit to zsa/qmk_firmware that referenced this pull request May 24, 2020
* Optimization for scanning less layers.

* Rename NUM_LAYERS to MAX_LAYER.
sowbug pushed a commit to sowbug/qmk_firmware that referenced this pull request May 24, 2020
* Optimization for scanning less layers.

* Rename NUM_LAYERS to MAX_LAYER.
fdidron pushed a commit to zsa/qmk_firmware that referenced this pull request Jun 12, 2020
* Optimization for scanning less layers.

* Rename NUM_LAYERS to MAX_LAYER.
turky pushed a commit to turky/qmk_firmware that referenced this pull request Jun 13, 2020
* Optimization for scanning less layers.

* Rename NUM_LAYERS to MAX_LAYER.
jakobaa pushed a commit to jakobaa/qmk_firmware that referenced this pull request Jul 7, 2020
* Optimization for scanning less layers.

* Rename NUM_LAYERS to MAX_LAYER.
sjmacneil pushed a commit to sjmacneil/qmk_firmware that referenced this pull request Feb 19, 2021
* Optimization for scanning less layers.

* Rename NUM_LAYERS to MAX_LAYER.
BorisTestov pushed a commit to BorisTestov/qmk_firmware that referenced this pull request May 23, 2024
* Optimization for scanning less layers.

* Rename NUM_LAYERS to MAX_LAYER.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants