
Add pallet-verifier #2234

Merged (4 commits) Mar 5, 2024
Conversation

@davidsemakula (Contributor) commented Feb 26, 2024

Project Abstract

This application is a response to the RFP for Static Analysis for Runtime Pallets.

pallet-verifier is a tool for static analysis of FRAME runtime pallets that are used for building Substrate-based blockchains.

pallet-verifier aims to detect common security vulnerabilities found in FRAME pallets using static program analysis techniques like data-flow analysis, abstract interpretation and symbolic execution.
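To make the abstract concrete, here is a hypothetical toy example (the function name and scenario are mine, not taken from pallet-verifier or the proposal) of the kind of reachable-panic bug such static analysis aims to surface: a division whose divisor is caller-controlled panics when the divisor is zero, while the guarded variant surfaces the problem as an error instead.

```rust
// Hypothetical example (not from pallet-verifier): a reachable panic
// of the kind static analysis of pallets aims to detect.
// `total / shares` panics when `shares == 0`, so the guarded version
// rejects that input explicitly and returns an error instead.
fn payout_per_share(total: u64, shares: u64) -> Result<u64, &'static str> {
    if shares == 0 {
        return Err("shares must be non-zero");
    }
    Ok(total / shares)
}

fn main() {
    assert_eq!(payout_per_share(100, 4), Ok(25));
    assert!(payout_per_share(100, 0).is_err());
    println!("ok");
}
```

In a pallet, an equivalent unguarded division on extrinsic input would be a denial-of-service vector, since any caller could trigger the panic.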

Grant level

  • Level 1: Up to $10,000, 2 approvals
  • Level 2: Up to $30,000, 3 approvals
  • Level 3: Unlimited, 5 approvals (for >$100k: Web3 Foundation Council approval)

Application Checklist

  • The application template has been copied and aptly renamed (project_name.md).
  • I have read the application guidelines.
  • Payment details have been provided (Polkadot AssetHub (DOT, USDC & USDT) address in the application and bank details via email, if applicable).
  • I am aware that, in order to receive a grant, I (and the entity I represent) have to successfully complete a KYC/KYB check.
  • The software delivered for this grant will be released under an open-source license specified in the application.
  • The initial PR contains only one commit (squash and force-push if needed).
  • The grant will only be announced once the first milestone has been accepted (see the announcement guidelines).
  • I prefer the discussion of this application to take place in a private Element/Matrix channel. My username is: @_______:matrix.org (change the homeserver if you use a different one)

@github-actions github-actions bot added the admin-review This application requires a review from an admin. label Feb 26, 2024
@davidsemakula davidsemakula force-pushed the pallet-verifier branch 2 times, most recently from aa57ea2 to a042d81 Compare February 27, 2024 09:07
@bhargavbh (Contributor) left a comment

Thanks for showing interest in the RFP. The proposal has sufficient technical depth; I appreciate it. I have a few comments/questions:

  • You rightly mention some of MIRAI's limitations when generics and macros are used. What is your approach to tackling this? Are you looking at modifying (simplifying) the complexity at the MIR level? Could you please elaborate?

  • My personal experience with MIRAI for arithmetic overflows/underflows was that it produced a lot of false positives, specifically because of "possible incomplete analysis of call because of a nested call to a function without a MIR body". Have you looked into this issue? Static analysers inherently generate false positives because they over-approximate the program, and in general this is a very hard problem. I would also argue that a false-positive rate of 50-60% is still decent (as a developer, I would be very happy to find one real bug among the 2-3 warnings that pallet-verifier generates). However, it gets tricky if the rate is higher. Can you include a small benchmark to quantify the accuracy of the tool?

@davidsemakula (Contributor, Author) commented Mar 1, 2024

> Thanks for showing interest in the RFP. The proposal has sufficient technical depth; I appreciate it.

Hi @bhargavbh, thanks for taking the time to review, I appreciate the feedback 🙂

> You rightly mention some of MIRAI's limitations when generics and macros are used. What is your approach to tackling this? Are you looking at modifying (simplifying) the complexity at the MIR level? Could you please elaborate?

The short answer is yes; I describe the approach in detail in the "Finding tractable entry points" and "Generating tractable entry points automatically" subsections.
In general, the whole "Identifying (and generating) tractable entry points" section is essentially dedicated to this problem.
The first 3 subsections (i.e. "The generics challenge", "Too many macro-generated entry points", and "The TestExternalities and SubstrateHostFunctions challenge") outline the problems/challenges (as you noted here about generics and macros), while the last 2 subsections (i.e. "Finding tractable entry points" and "Generating tractable entry points automatically" - the ones linked above) detail the solution.
LMK if I should clarify anything from those subsections. 🙂

> My personal experience with MIRAI for arithmetic overflows/underflows was that it produced a lot of false positives, specifically because of "possible incomplete analysis of call because of a nested call to a function without a MIR body". Have you looked into this issue? [...] Can you include a small benchmark to quantify the accuracy of the tool?

Yes; in general, MIRAI emits a lot of warnings from the standard library and system calls when run at the paranoid diagnostic level. This is mostly because the paranoid level is maximally strict: it warns about "any issue that may be an error" by essentially disabling some of MIRAI's "normal" mechanisms for reducing false positives.
One of these mechanisms is "suppressing any diagnostics that arise within the standard library or third party code [... i.e. diagnostics that are [not] tied to source locations within the [current] crate]".

Something I've noticed is that most false positives only show up at the paranoid level, precisely because this "suppression mechanism" is disabled. At the same time, due to some complexity arising from indirection and macro call-site information in our target domain, we most likely have to use the paranoid level to get MIRAI to emit/buffer our desired warnings, because they "seem" (to MIRAI) to arise from third-party code (i.e. frame_support), so MIRAI suppresses them at lower diagnostic levels (i.e. --diag=default|verify|library).

So I've detailed a hybrid approach in the "Using MIRAI as a library" section that essentially allows us to access MIRAI's diagnostics buffer directly and decide which diagnostics to "suppress" or "promote" based on our domain knowledge and the reported primary and related locations for each diagnostic (the aforementioned section has some more details 🙂).

Beyond that, another lever at our disposal, specifically for the "function without a MIR body" issue, is to add "summaries/foreign contracts" for these missing "foreign functions" (e.g. standard library functions that aren't inlined, and system calls).
This part of MIRAI is not well documented, but it's not hard to figure out how to add these from the existing examples and related utilities, which show that these "summaries/foreign contracts" can be defined outside MIRAI. This gives us more flexibility without having to modify MIRAI itself.

Having said that, with the approach of bypassing TestExternalities and HostFunctions described in detail in the proposal (see links to relevant sections in the answer to the first question), combined with the ability to "suppress" and "promote" warnings as described above, I don't expect that we'll have to pull this lever that much.

Finally, when it comes to panic and arithmetic overflow/underflow detection itself, MIRAI seems quite accurate for simple examples/functions, likely because Rust's type system and the relative simplicity of these checks in MIR make these analyses quite tractable compared to other languages.
So if we can effectively suppress unrelated warnings from "foreign functions" with missing MIR bodies (e.g. standard library functions that aren't inlined, and system calls), or provide "summaries/foreign contracts" for the most problematic ones, we should achieve relatively high accuracy.
I'd personally expect us to do a lot better than 60% TBH 🙂

It's a bit late over here, so I'll think about the methodology for benchmarking the accuracy of the tool and share that tomorrow.
In the meantime, LMK if these responses and the details in the linked sections make sense. Thanks!
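As a concrete illustration of the overflow discussion above (a hedged sketch; the function names are mine and nothing here is MIRAI output or pallet-verifier code), these are the arithmetic idioms in play: plain `+`, which panics on overflow in debug builds and wraps in release builds, versus the `saturating_*`/`checked_*` variants that remove the panic path an analyser would otherwise have to reason about.

```rust
// Illustrative sketch (names are hypothetical, not from the proposal):
// the overflow pattern a static analyser must reason about, and two
// idioms that eliminate the panic/wrap path entirely.
fn unchecked_increment(counter: u64, step: u64) -> u64 {
    // Debug builds panic on overflow; release builds wrap. An analyser
    // flags this unless it can prove the sum stays within u64::MAX.
    counter + step
}

fn saturating_increment(counter: u64, step: u64) -> u64 {
    // Clamps at u64::MAX instead of panicking or wrapping.
    counter.saturating_add(step)
}

fn checked_increment(counter: u64, step: u64) -> Option<u64> {
    // Surfaces overflow as None so the caller must handle it.
    counter.checked_add(step)
}

fn main() {
    assert_eq!(unchecked_increment(1, 2), 3);
    assert_eq!(saturating_increment(u64::MAX, 1), u64::MAX);
    assert_eq!(checked_increment(u64::MAX, 1), None);
    println!("ok");
}
```

A warning on the first function is a true positive only when the operands are actually unbounded; proving bounds (or rewriting to the latter two forms) is what separates accurate analysis from noise.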

@bhargavbh (Contributor) commented:

Thanks, @davidsemakula. I like your approach of adding summaries/post-conditions (contracts) to frequently occurring, problematic external calls to address the false-positive rate.
My only suggestion would be adding a benchmark (it need not be comprehensive) as part of the deliverables, just to showcase the performance and accuracy of the tool on a few production pallets. I'm happy to approve the proposal after this change.

@davidsemakula (Contributor, Author) commented Mar 3, 2024

Thanks @bhargavbh ... I added the benchmark to the deliverables 🙂

@bhargavbh (Contributor) left a comment

The proposal has sufficient technical depth. The main challenges, and potential approaches to tackling them, have been described. A developer-productivity tool like this is needed in the ecosystem. The suggested changes have been incorporated. Happy to approve.

@davidsemakula (Contributor, Author) commented Mar 4, 2024

@bhargavbh Awesome! Thanks for the swift review/approval, and for the positive feedback and recommended improvements 🙂
cc @Noc2

@Noc2 (Collaborator) commented Mar 4, 2024

I will share it again with the rest of the team.

@Noc2 Noc2 added the ready for review The project is ready to be reviewed by the committee members. label Mar 4, 2024
@semuelle semuelle self-requested a review March 4, 2024 22:52
@keeganquigley (Contributor) left a comment

Thanks for the review, @bhargavbh. Based on this, and also your previous work, @davidsemakula, I'm happy to approve.

@takahser (Collaborator) left a comment

LGTM.

@semuelle (Member) commented Mar 5, 2024

Hey @davidsemakula, are you aware that grants are now 30% in DOT? Just asking because you only provided one payment address.

In the meantime, could you fill out the KYC form (assuming you are applying as an individual, otherwise please use this form)?

@davidsemakula (Contributor, Author) commented:
Hi @semuelle

> are you aware that grants are now 30% in DOT? Just asking because you only provided one payment address.

Yes, I'm aware of the 30% split, but I assumed the same address could be used for both (DOT & USDT); LMK if that's not the case.

> In the meantime, could you fill out the KYC form

I've completed the KYC form (for individuals). Thanks! 🙂

@keeganquigley keeganquigley removed the admin-review This application requires a review from an admin. label Mar 5, 2024
@Noc2 Noc2 merged commit 68ef847 into w3f:master Mar 5, 2024
10 of 12 checks passed
github-actions bot commented Mar 5, 2024

Congratulations and welcome to the Web3 Foundation Grants Program! Please refer to our Milestone Delivery repository for instructions on how to submit milestones and invoices, our FAQ for frequently asked questions and the support section of our README for more ways to find answers to your questions.

Before you start, take a moment to read through our announcement guidelines for all communications related to the grant or make them known to the right person in your organisation. In particular, please don't announce the grant publicly before at least the first milestone of your project has been approved. At that point or shortly before, you can get in touch with us at [email protected] and we'll be happy to collaborate on an announcement about the work you’re doing.

Lastly, please remember to let us know in case you run into any delays or deviate from the deliverables in your application. You can either leave a comment here or directly request to amend your application via PR. We wish you luck with your project! 🚀

@davidsemakula davidsemakula deleted the pallet-verifier branch March 5, 2024 20:41
@davidsemakula (Contributor, Author) commented:

Minor update: Should be able to wrap this up in the next 1-2 weeks!

@keeganquigley (Contributor) commented:

Hi @davidsemakula, I don't believe we ever received any deliveries. Are you still planning to deliver these milestones?

@davidsemakula (Contributor, Author) commented:

Hi @keeganquigley
I'll make the deliveries next week; current progress is at https://github.com/davidsemakula/pallet-verifier
Mostly docs and tests (and maybe a few std contracts) left. 🙂

@keeganquigley (Contributor) commented:

Hi @davidsemakula just checking in.

@davidsemakula (Contributor, Author) commented:

Hi @keeganquigley
I just opened a PR for the delivery of both milestones 🙂
