The advisory file format is going to change a bit. #139

briandfoy · 2024-02-17T20:23:06Z

briandfoy
Feb 17, 2024
Maintainer

The current cpansa YAML format doesn't have a way to specify information, such as the distribution name, that applies to all advisories in the file. As such, a new format will move all the advisory hashes into the advisories key as an array, and the top level will be a YAML dict.

With this sort of breaking change, there's a chance to add other data or features to the format, which we can talk about in this discussion thread.

At the moment I'm working in the briandfoy/github-advisory-database branch (this branch started as a way to add a new field, and expanded scope), and you can look at an example file and also look at the mess that is the Rx schema experiment in t/validate.t.

As part of this, I've had to give up some of the nice formatting of the files because a lot of the work and future benefits come from loading the files with YAML::XS and re-dumping them.

Current (v1) format

---
- id: ...
  distribution: ...
  cves: [ ... ]
  ...
- id: ...
  distribution: ...
  cves: [ ... ]
  ...

New (v2) format

The new format will allow for top-level information about all of the advisories in the file, and will look something like this in structure. The cpansa_version specifies the format (and I'm working on an Rx schema), where a missing cpansa_version indicates the original format:

---
distribution: Some-Package
darkpan: false
metacpan: ...
cpansa_version: 2
advisories:
  - id: ...
    cves: [ ... ]
    github_advisory_database: [ ... ]
    ...
  - id: ...
    cves: [ ... ]
    github_advisory_database: [ ... ]
    ...
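
As a rough illustration of the restructuring, the sketch below converts already-loaded v1 data (a list of advisory dicts) into the v2 layout. It is written in Python rather than the project's Perl tooling. The function name `upgrade_to_v2` and the rule that every advisory in a file must share one `distribution` are assumptions for the example, not the branch's actual code.

```python
def upgrade_to_v2(data):
    """Return v2-shaped data from loaded v1 or v2 advisory data (illustrative only)."""
    # A dict carrying cpansa_version is already in the new format.
    if isinstance(data, dict) and "cpansa_version" in data:
        return data
    if not isinstance(data, list):
        raise ValueError("expected a v1 top-level array of advisories")

    # Assumption: all advisories in one file describe the same distribution.
    names = {adv.get("distribution") for adv in data}
    if len(names) != 1:
        found = sorted(map(str, names))
        raise ValueError(f"expected exactly one distribution, found {found}")

    # Hoist the shared field to the top level and drop it from each advisory.
    advisories = [
        {key: value for key, value in adv.items() if key != "distribution"}
        for adv in data
    ]
    return {
        "distribution": names.pop(),
        "cpansa_version": 2,
        "advisories": advisories,
    }
```

Other top-level keys from the example, such as darkpan and metacpan, are left out here because v1 data has nothing to derive them from.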

External reports

This change mirrors parts of the format used by the external_reports YAML files, where we are tracking vulnerabilities in third-party (non-Perl) libraries that some CPAN dists use. Those reports have an advisories key as well as top-level basic information. With the new format, tools no longer have to work out which report type they have in order to grab the advisories (top-level array or hash key value).
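
The branching that tools need while both layouts exist looks roughly like the following sketch (Python for illustration; `get_advisories` is a hypothetical helper name, and the data is assumed to be already parsed from YAML). Once every file uses the dict layout, only the dict branch remains.

```python
def get_advisories(report):
    """Return the advisory list from either layout (illustrative only)."""
    if isinstance(report, list):
        # v1: the whole document is the array of advisories.
        return report
    if isinstance(report, dict):
        # v2 and external_reports: advisories live under one key.
        return report.get("advisories", [])
    raise TypeError(f"unsupported report type: {type(report).__name__}")
```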

Better data validation

A new format doesn't help with this, but making a new format forces me to check the data for format and consistency problems.

Along the way I've started using Data::Rx, a Perl implementation of the Rx schema validation for data structures (mostly thinking about JSON and YAML, but it works on the Perl representation of those). As I've started to figure out how this lightly documented thing works, I've been able to clean up quite a bit of the data so things are consistent.
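
To give a sense of the kind of consistency check involved, here is a hand-written structural check in Python. It is a stand-in for illustration and does not use Data::Rx or the Rx schema language. The choice of required keys (distribution, cpansa_version, advisories, and an id per advisory) is an assumption based on the v2 example earlier in this post, not the schema in t/validate.t.

```python
def check_v2(data):
    """Return a list of problems in v2-shaped data; an empty list means none found."""
    if not isinstance(data, dict):
        return ["top level must be a dict"]

    problems = []
    if data.get("cpansa_version") != 2:
        problems.append("cpansa_version must be 2")
    if not isinstance(data.get("distribution"), str):
        problems.append("distribution must be a string")

    advisories = data.get("advisories")
    if not isinstance(advisories, list):
        problems.append("advisories must be an array")
        return problems

    # Assumed per-advisory rules: a dict with an id, and cves (if present) is an array.
    for i, adv in enumerate(advisories):
        if not isinstance(adv, dict) or "id" not in adv:
            problems.append(f"advisories[{i}] must be a dict with an id")
        elif not isinstance(adv.get("cves", []), list):
            problems.append(f"advisories[{i}].cves must be an array")
    return problems
```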

Tux · 2024-02-18T10:14:26Z

Tux
Feb 18, 2024

I think I can amend Test::CVE to allow both. Can I get a snapshot of the new datafile to verify?

2 replies

briandfoy Feb 18, 2024
Maintainer Author

See the branch I linked to. This should be close to the final form, but we can also adjust things. However, I think you take a JSON feed, so even though the source files are different, we can make the same feed if that's easier for you. Or, we can make a new feed that we lock down and Test::CVE people use. I'm not going to change CPAN::Audit::DB, but I do need to adjust how I generate it.

Tux Feb 19, 2024

I'm happy with both approaches. Whatever you think is the best one
