Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] OTA Block size check wrong #503

Closed
BogdanTheGeek opened this issue Dec 13, 2023 · 5 comments · Fixed by #504
Closed

[BUG] OTA Block size check wrong #503

BogdanTheGeek opened this issue Dec 13, 2023 · 5 comments · Fixed by #504
Labels
bug Something isn't working

Comments

@BogdanTheGeek
Copy link
Contributor

If a network disconnect occurs and the ota task is suspended and resumed, when the job description comes back, the library checks the pFileContext->blocksRemaining against OTA_MAX_BLOCK_BITMAP_SIZE (128) and silently stops the ota, without updating the job. The same issue doesn't happen if the device reboots instead of resuming the ota task. The issue also never manifests if the ota job is not suspended in the middle of job, so the number of blocks is not checked unless we get a new job document while already having one?
The issue is 2 fold, the check is wrong, it should be(ota.c:2404):

    else if( pFileContext->blocksRemaining > (OTA_MAX_BLOCK_BITMAP_SIZE * BITS_PER_BYTE) )

Applying the change above fixes the issue. I can create a PR with the change if you can confirm that my understanding of the issue is correct.

The other is that the ota library fails silently and the job will not resume until the device reboots.

This is the log after a network reconnection:

I (105814) hota: Received job document: {...}
W (105074) awsota: Index: 3. OTA event id: 3
W (105080) awsota: OTA size (266 blocks) greater than can be tracked. Increase `OTA_MAX_BLOCK_BITMAP_SIZE`(128)
I (105090) awsota: Unable to initialize Job Parsing: OtaJobParseErr_t=OtaJobParseErrBadModelInitParams
I (105100) awsota: otaPal_GetPlatformImageState
E (105106) awsota: Failed to execute state transition handler: Handler returned error: OtaErr_t=OtaErrJobParserError
E (105117) awsota: Current State=[WaitingForJob], Event=[ReceivedJobDocument], New state=[CreatingFile]

This is all on the main branches of ota-for-aws-iot-embedded-sdk, coreMQTT, jobs etc.

@BogdanTheGeek BogdanTheGeek added the bug Something isn't working label Dec 13, 2023
@BogdanTheGeek
Copy link
Contributor Author

the buggy check was introduce in #477, its also not doing what the PR is intending it to do.

@bradleysmith23
Copy link

Hello @BogdanTheGeek and thank you for bringing this to our attention. You are indeed correct, the existing check is buggy. If you would like to create a PR to address that would be great! We will review it once it is created. As for the second part of the issue, we are continuing to investigate this and will update you with our findings.

@BogdanTheGeek
Copy link
Contributor Author

@bradleysmith23 i've had a chance to look though the rest of the ota code.

The check is only happening in parseJobDoc().
I don't think pFileContext->blocksRemaining is initialised when the check is being done.
From what i can see the call stack is this:

  • processJobHandler -> pOtaFileContext == NULL
  • getFileContextFromJob -> pOtaFileContext == NULL
  • parseJobDoc -> pOtaFileContext == pFileContext == NULL
  • initDocModel -> Initialises the fileContext?
  • returns and now checks the blocksRemaining
    But, blocksRequired is not yet parsed from the model, so it will be 0 until the job has already been parsed once. This is why the check only runs after a suspend/resume cycle, because the job doc is requested on resume.
    The only place where blocksRemainng actually gets set is after parseJobDoc returns into getFileContextFromJob: ota.c:2538

I would suggest that a better place to check if the number of blocks in the ota is valid would be in validateAndStartJob.

@bradleysmith23
Copy link

@BogdanTheGeek thank you for your continued effort! I agree, you are correct that pfileContext->blocksRemaining is not initialized when the check is being done. validateAndStartJob() does seem to be an appropriate place for this check, thank you for the suggestion! If you would like to amend your PR (#504) with this change, I will review it. If not, I am happy to raise a PR for this issue as well.

@BogdanTheGeek
Copy link
Contributor Author

@bradleysmith23 I will move the check the into validateAndStartJob()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants