Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fw_update: check for existing candidate image before attempting download #743

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

hasheddan
Copy link
Contributor

@hasheddan hasheddan commented Feb 3, 2025

Refactors fw_update logic to check whether the existing image in the candidate slot matches the desired update image in a received manifest. If so, we attempt to boot into the image without downloading it again.

Implementing involved some cleanup of the interface to fw_update platform ports. Namely, fw_update_post_download now returns a status, and fw_update_validate receives an image hash and size. The former is to allow for ports to return an error if they are unable to "finalize" a download. This is particularly relevant for esp_idf, as completing the download also implicitly performs image validation. The latter change is to allow for validation to be performed on a candidate image outside the context of the download of that image, which may have taken place prior to the most recent reboot.

Because ports can perform custom validation in fw_update_post_download and we implement hash validation in fw_update for downloaded images, the fw_update_validate implementation for a port is no longer invoked after download. Instead, it is now only invoked on receiving a new manifest to check whether the desired update image is already present. The primary reason for not also invoking fw_update_validate post download is that it is not implemented for all ports at this time, though it is also somewhat redundant with our existing hash check. There are two options that would allow for calling fw_update_validate post download and prior to download:

  1. Provide implementations for all ports.
  2. Allow for differentiating between "not supported" and "validation failed".

(1) feels like the right path long term, and given that we do not lose any functionality today by removing the post download call (i.e. ports previously just had no-op implementations), moving forward with the fw_update_post_download + fw_update's hash check seems like a reasonable course of action.


Work items that should follow this implementation:

  • We should update our OTA HIL testing to include downloading an image that will never connect to Golioth, then ensure that when we rollback and see the manifest again that we attempt to boot into it without download. The current tests effectively just validate that fw_update_validate always fails.
  • This implementation avoids downloading the same image over and over again, but we should also implement some sort of upgrade backoff to avoid booting into the new image over and over again. Continuously doing so results in flash wear from swapping the candidate image into the primary slot on upgrade and revert.

Updates fw_update_post_download to return a status and updates platform
port implementations to match. This is in service of subsequently moving
logic currently in fw_update_validate in some ports to
fw_update_post_download.

Signed-off-by: Daniel Mangum <[email protected]>
Moves the fw_update_post_download invocation out of the block download
callback. This makes it more obvious when and if post download operations
are invoked.

Signed-off-by: Daniel Mangum <[email protected]>
Updates the esp_idf port by moving the OTA completion logic from validation
into post download. While esp_idf OTA support does perform some validation
when completing a download, the primary purpose is closing the OTA download
context. This frees up fw_update_validate to be modified to allow for
calling outside the context of a specific download.

Signed-off-by: Daniel Mangum <[email protected]>
Copy link

github-actions bot commented Feb 3, 2025

Visit the preview URL for this PR (updated for commit bea64d7):

https://golioth-firmware-sdk-doxygen-dev--pr743-fw-update-chec-de1cpdoh.web.app

(expires Thu, 13 Feb 2025 22:56:54 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: a9993e61697a3983f3479e468bcb0b616f9a0578

@hasheddan hasheddan force-pushed the fw_update/check-pending-image branch from 1d7a337 to 0a23494 Compare February 3, 2025 14:27
Copy link

github-actions bot commented Feb 3, 2025

Code Coverage

Code Coverage

Package Line Rate Branch Rate Health
include.golioth 75% 50%
port.linux 62% 34%
port.utils 58% 46%
port.zephyr 53% 22%
src 67% 30%
Summary 66% (2719 / 4128) 30% (1126 / 3748)

Copy link

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 0% with 64 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
port/zephyr/golioth_fw_zephyr.c 0.00% 33 Missing ⚠️
src/fw_update.c 0.00% 27 Missing ⚠️
port/linux/fw_update_linux.c 0.00% 4 Missing ⚠️
Files with missing lines Coverage Δ
port/linux/fw_update_linux.c 0.00% <0.00%> (ø)
src/fw_update.c 0.00% <0.00%> (ø)
port/zephyr/golioth_fw_zephyr.c 0.00% <0.00%> (ø)

... and 5 files with indirect coverage changes

@hasheddan hasheddan force-pushed the fw_update/check-pending-image branch from f297bfd to b352821 Compare February 5, 2025 18:45
Comment on lines +264 to +267
/*
* @note This is similar to Zephyr's flash_img_check() but uses the Golioth
* port's sha256 API.
*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to being consistent with our sha256 implementation, this is also necessary because building with flash_img_check enabled with mbedtls is currently broken on nordic platforms due to zephyrproject-rtos/zephyr#85238

@hasheddan hasheddan force-pushed the fw_update/check-pending-image branch 2 times, most recently from 9c6508e to 1f0926e Compare February 5, 2025 18:58
@hasheddan hasheddan marked this pull request as ready for review February 5, 2025 19:29
Copy link
Contributor

@szczys szczys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @hasheddan!

{
return GOLIOTH_OK;
// TODO(hasheddan): support validating candidate firmware.
return GOLIOTH_ERR_FAIL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return GOLIOTH_ERR_FAIL;
return GOLIOTH_ERR_NOT_IMPLEMENTED;

// Nothing to do
return GOLIOTH_OK;
// TODO(hasheddan): support validating candidate firmware.
return GOLIOTH_ERR_FAIL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return GOLIOTH_ERR_FAIL;
return GOLIOTH_ERR_NOT_IMPLEMENTED;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice! I didn't realize we already had this status. In theory this would allow us to also call fw_update_validate post download and just ignore result if GOLIOTH_ERR_NOT_IMPLEMENTED. However, I think I'd lean towards not doing that given our existing hash check post download (as mentioned in the PR body). Regardless, should definitely be using the status here! 👍🏻


golioth_sys_sha256_t hash_ctx = golioth_sys_sha256_create();

buf_size = sizeof(_flash_img_context.buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for other reviewers: This buf size defaults to CONFIG_IMG_BLOCK_BUF_SIZE which defaults to 512.

{
return GOLIOTH_ERR_FAIL;
}
if (memcmp(hash, data.image_digest, sizeof(data.image_digest)) != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to verify the length of hash is the same as the length of data.image_digest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know the length of hash here -- it is somewhat implicit in the port API contract as implemented, so it may be worth making it more explicit that a 32 byte sha256 hash is being passed (at the very least in the documentation on fw_update_validate). In practice, because data.image_digest is hard-coded to 32 bytes and we always pass a GOLIOTH_OTA_COMPONENT_BIN_HASH_LEN (32) then they should always be same length.

I am still in process of testing the esp-idf fw_update_validate implementation though, so there may need to be some further changes made here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the background.

In practice, because data.image_digest is hard-coded to 32 bytes and we always pass a GOLIOTH_OTA_COMPONENT_BIN_HASH_LEN (32) then they should always be same length.

Would it make sense to change the function as below?

enum golioth_status fw_update_validate(const uint8_t hash[GOLIOTH_OTA_COMPONENT_BIN_HASH_LEN],
                                       const size_t size)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typically shy away from using arrays as function parameters in C because in practice they decay to a pointers. However, they can give the impression that, for example, sizeof(hash) in a function implementation would return the size of the array, but in reality it would return the size of the pointer. Some folks certainly argue the other side of this, so I'm open to discussion 🙂

.offset = _update_partition->address,
.size = _update_partition->size,
};
if (esp_image_verify(ESP_IMAGE_VERIFY, &part_pos, &data) != ESP_OK)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be changing this in a moment because esp_image_verify will use the appended digest of the image, rather than the actual hash of the bytes in the partition, so we'll end up with a mismatch with the hash in the manifest. This is similar to what we are avoiding in the Zephyr port with mcuboot.

ref: https://github.com/espressif/esp-idf/blob/c5865270b50529cd32353f588d8a917d89f3dba4/components/bootloader_support/src/esp_image_format.c#L311

@hasheddan hasheddan force-pushed the fw_update/check-pending-image branch from 1f0926e to c9f0a69 Compare February 6, 2025 21:07
Update fw_update_validate to take an image hash and size such that
validation is not coupled to any state maintained in ports. Ports should be
able to validate an image in a candidate slot without needing to have
downloaded since last reboot.

Signed-off-by: Daniel Mangum <[email protected]>
Validating candidate firmware images is not yet supported on linux.

Signed-off-by: Daniel Mangum <[email protected]>
Validating candidate firmware images is not yet supported on modus_toolbox.

Signed-off-by: Daniel Mangum <[email protected]>
Adds support for validating candidate firmware images by hash and size to
the Zephyr firmware update port.

Signed-off-by: Daniel Mangum <[email protected]>
Adds support for validating candidate firmware images by hash and size to
the esp_idf firmware update port.

Signed-off-by: Daniel Mangum <[email protected]>
Update fw_update thread to check whether the requested firmware update
image is already present in candidate slot. If so, download is skipped and
we attempt to reboot into existing candidate slot image.

Note that the call to fw_update_validate is now removed from the post
download flow. fw_update_validate was previously a no-op for most platform
ports, and for those that did implement checks (esp_idf), the functionality
was moved to fw_update_post_download, which is still called following an
image download.

Signed-off-by: Daniel Mangum <[email protected]>
@hasheddan hasheddan force-pushed the fw_update/check-pending-image branch from c9f0a69 to bea64d7 Compare February 6, 2025 22:56
@hasheddan hasheddan requested a review from szczys February 7, 2025 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants