Dealing with NPM v7 Audit Changes

A bit more than one month ago, the 15th version of NodeJS was released. As an early adopter of everything, I moved my project to this new major version of NodeJS, and most things continued working smoothly. However, I also noticed that the known vulnerabilities of my dependencies suddenly dropped to zero.

Screenshot 2020-12-04 at 13.51.42.png

In the past, I wrote two other blog posts about capturing an NPM audit report in Jenkins, for both the --parseable and --json formats. As I was aware of the implementation details, I quickly checked the logs from my Jenkins and saw that the NPM audit parser was missing some important data and failing to process the output of NPM. Then I decided to run the human-readable version of the npm audit command locally, and instead of the familiar output:

# Run  npm update bl --depth 4  to resolve 1 vulnerability
┌───────────────┬─────────────────────────────────────────────────┐
│ High          │ Remote Memory Exposure                          │
├───────────────┼─────────────────────────────────────────────────┤
│ Package       │ bl                                              │
├───────────────┼─────────────────────────────────────────────────┤
│ Dependency of │ exceljs                                         │
├───────────────┼─────────────────────────────────────────────────┤
│ Path          │ exceljs > archiver > tar-stream > bl            │
├───────────────┼─────────────────────────────────────────────────┤
│ More info     │ https://npmjs.com/advisories/1555               │
└───────────────┴─────────────────────────────────────────────────┘

I saw this thing:

bl  <=1.2.2 || 2.0.1 - 2.2.0 || 3.0.0 || 4.0.0 - 4.0.2
Severity: high
Remote Memory Exposure - https://npmjs.com/advisories/1555
fix available via `npm audit fix`
node_modules/bl

Which is more concise without a doubt. But human-readable? For example, I can get that the bl package has a high-severity vulnerability. But I don't depend on that package, right? So it was useful that the old output had such info as

Dependency of: exceljs

or even

Path: exceljs > archiver > tar-stream > bl

Besides, the old format told me that the issue could be fixed with the npm update bl --depth 4 command, and now the only option that I have is to run npm audit fix blindly.

But hey! Let's cool down. All changes are tough. They break our routines. They throw us out of our comfort zone. But this is how the world works: it's constantly changing. And if we don't adapt, we become obsolete. Let's take one step back and look with a calm mind at the changes that took place, analyze what they mean, come up with a transition strategy, and only then reflect on how we feel about that.

Communication

The change of the NPM audit format came to me as a surprise. And for me, this happens with many changes because I'm too lazy to read all changelogs. Let's take a look at the official release notes to see if we can spot anything that could have warned me about the NPM audit changes. In fact, if we carefully read the announcement post, we can spot a mention of breaking changes in the npm section:

Node.js 15 comes with a new major release of npm, npm 7. npm 7 comes with many new features — including npm workspaces and a new package-lock.json format. npm 7 also includes yarn.lock file support. One of the big changes in npm 7 is that peer dependencies are now installed by default. For more information on the npm 7 release, including details of the breaking changes, check out the GitHub blog.

If we follow up to the referenced GitHub blog, we can spot the NPM audit mention in the breaking changes section:

Breaking changes in npm 7.0.0 include:

• The output of npm audit has significantly changed both in the human-readable and --json output styles.

To learn more about the breaking changes in npm 7.0.0, please check out our in-depth post on the npmjs.com blog.

So if we take an additional step and go to the in-depth post on the npmjs.com blog, we can read a dedicated section about npm audit:

Output and data structure is significantly refactored to call attention to issues, identify classes of fixes not previously available, and remove extraneous data not used for any purpose.

BREAKING CHANGE: Any tools consuming the npm audit output will almost certainly need to be updated, as this has changed significantly, both in the readable and --json output styles.

Yeah, it is important information. And as we already saw, the readable output style changed indeed and became unreadable. Oh wait, I had to cool down. Why did I get frustrated again, though? Because there is no explanation whatsoever on what changed and why… But let's google for "npm audit", there surely has to be some documentation.

You can learn about the npm audit command in the official npm documentation on their website. The documentation for the 7th version of NPM discusses implementation details and how they use different endpoints to make more efficient queries. But there is no explanation on how to treat the NPM Audit output… In fact, if you check the documentation for the 6th version, there is no explanation about the output format as well.

Is this a sufficient level of communication? Kind of… I mean, I'd like to have a louder mention of the NPM audit changes. But if we are honest with each other, auditing is not the central feature of NodeJS and NPM. If someone creates a tool that relies on the audit output format, they will check the changelog for the latest NPM version and notice the breaking changes. On the other hand, there was, and still is, no explanation of the NPM audit output format, and I find this confusing. Especially now, when the format has changed. But if there is no documentation, we have to figure out the formats by ourselves, don't we?

The Vulnerability Concept in v6

Let's start by defining the concept of a vulnerability, as the NPM v6 audit reports it. To do this, we need a simple example of dependent packages.

vulnerabilities.png

In essence, our project depends on some x-client package. That package, in turn, depends on other packages. In a certain part of the dependency subtree, the packages depend on the hoek package with a reported vulnerability. When you run npm audit for this repository, it will notice that you have the vulnerable version of the hoek package and report all the dependency paths as a vulnerability. To better visualize it, I painted the dependency paths to the hoek package in yellow:

vulnerabilitiesv6overlay.png

As you can see, there are 4 possible paths. And the human-readable output of the npm audit v6 will report each path as a separate vulnerability:

┌───────────────┬─────────────────────────────────────────────────────┐
 Moderate       Prototype Pollution                                 
├───────────────┼─────────────────────────────────────────────────────┤
 Package        hoek                                                
├───────────────┼─────────────────────────────────────────────────────┤
 Patched in     > 4.2.0 < 5.0.0 || >= 5.0.3                         
├───────────────┼─────────────────────────────────────────────────────┤
 Dependency of  x-client                                            
├───────────────┼─────────────────────────────────────────────────────┤
 Path           x-client > request > hawk > boom > hoek             
├───────────────┼─────────────────────────────────────────────────────┤
 More info      https://npmjs.com/advisories/566                    
└───────────────┴─────────────────────────────────────────────────────┘
┌───────────────┬─────────────────────────────────────────────────────┐
 Moderate       Prototype Pollution                                 
├───────────────┼─────────────────────────────────────────────────────┤
 Package        hoek                                                
├───────────────┼─────────────────────────────────────────────────────┤
 Patched in     > 4.2.0 < 5.0.0 || >= 5.0.3                         
├───────────────┼─────────────────────────────────────────────────────┤
 Dependency of  x-client                                            
├───────────────┼─────────────────────────────────────────────────────┤
 Path           x-client > request > hawk > cryptiles > boom > hoek 
├───────────────┼─────────────────────────────────────────────────────┤
 More info      https://npmjs.com/advisories/566                    
└───────────────┴─────────────────────────────────────────────────────┘
┌───────────────┬─────────────────────────────────────────────────────┐
 Moderate       Prototype Pollution                                 
├───────────────┼─────────────────────────────────────────────────────┤
 Package        hoek                                                
├───────────────┼─────────────────────────────────────────────────────┤
 Patched in     > 4.2.0 < 5.0.0 || >= 5.0.3                         
├───────────────┼─────────────────────────────────────────────────────┤
 Dependency of  x-client                                            
├───────────────┼─────────────────────────────────────────────────────┤
 Path           x-client > request > hawk > hoek                    
├───────────────┼─────────────────────────────────────────────────────┤
 More info      https://npmjs.com/advisories/566                    
└───────────────┴─────────────────────────────────────────────────────┘
┌───────────────┬─────────────────────────────────────────────────────┐
 Moderate       Prototype Pollution                                 
├───────────────┼─────────────────────────────────────────────────────┤
 Package        hoek                                                
├───────────────┼─────────────────────────────────────────────────────┤
 Patched in     > 4.2.0 < 5.0.0 || >= 5.0.3                         
├───────────────┼─────────────────────────────────────────────────────┤
 Dependency of  x-client                                            
├───────────────┼─────────────────────────────────────────────────────┤
 Path           x-client > request > hawk > sntp > hoek             
├───────────────┼─────────────────────────────────────────────────────┤
 More info      https://npmjs.com/advisories/566                    
└───────────────┴─────────────────────────────────────────────────────┘

In my humble opinion, that's three vulnerabilities too many, and it's enough just to know that I have a single vulnerability brought to me by the hoek package, which is a dependency of the x-client package that I explicitly depend on. At the same time, I understand that if I have to patch only the hoek package, then I have to deal with four dependency constraints (out of which three are direct ones). Whether you like the idea of having four vulnerabilities or prefer to have only one, the NPM development team would tell you, "hold my beer!"

The Vulnerability Concept in v7

At first glance, if you run the npm audit command with the 7th version, the human-readable output looks pretty good:

hoek  <=4.2.0 || 5.0.0 - 5.0.2
Severity: moderate
Prototype Pollution - https://npmjs.com/advisories/566
No fix available
node_modules/hoek
  boom  <=3.1.2
  Depends on vulnerable versions of hoek
  node_modules/boom
    cryptiles  <=4.1.1
    Depends on vulnerable versions of boom
    node_modules/cryptiles
      hawk  0.0.6 - 6.0.2
      Depends on vulnerable versions of boom
      Depends on vulnerable versions of cryptiles
      Depends on vulnerable versions of hoek
      Depends on vulnerable versions of sntp
      node_modules/hawk
        request  2.16.0 - 2.83.0 || 2.85.0 - 2.86.0
        Depends on vulnerable versions of hawk
        node_modules/x-client/node_modules/request
          x-client  *
          Depends on vulnerable versions of request
          node_modules/x-client
  sntp  0.0.0 || 0.1.1 - 2.0.0
  Depends on vulnerable versions of hoek
  node_modules/sntp

7 moderate severity vulnerabilities

Starting from the top, you can immediately see that there is one issue with the hoek package. You can also see the severity, the group (i.e., Prototype Pollution), a link to the page with a detailed description, and whether the vulnerability can be fixed automatically. Then you have short info about the dependency tree, where each dependency level is indented. Conveniently, there is an asterisk next to the x-client package, which is our top-level dependency. Then you continue to the bottom of the audit report and WAIT, WHAT?!! There are seven vulnerabilities? To help you understand where these vulnerabilities come from, I've circled each one of them in yellow on our diagram:

vulnerabilitiesv7overlay.png

Yes, each package that depends on a package with a reported vulnerability is considered a vulnerability. How can you figure that out from a "human-readable" output? Just count all the package names mentioned in the audit (the original output tends to emphasize the package names with a bold font). Are you confused yet? Don't worry: it can be even worse. Imagine that the cryptiles package also got a report of a high-severity vulnerability. In the textual output, it will add this information:

cryptiles  <=4.1.1
Severity: high
Insufficient Entropy - https://npmjs.com/advisories/1464
Depends on vulnerable versions of boom
No fix available
node_modules/cryptiles
  hawk  0.0.6 - 6.0.2
  Depends on vulnerable versions of boom
  Depends on vulnerable versions of cryptiles
  Depends on vulnerable versions of hoek
  Depends on vulnerable versions of sntp
  node_modules/hawk
    request  2.16.0 - 2.83.0 || 2.85.0 - 2.86.0
    Depends on vulnerable versions of hawk
    node_modules/x-client/node_modules/request
      x-client  *
      Depends on vulnerable versions of request
      node_modules/x-client

Nothing special about this report, just another vulnerable package: the severity is high, and some packages depend on it. But also, you will see at the bottom the following summary:

7 vulnerabilities (3 moderate, 4 high)

Congrats! You got four additional high-severity vulnerabilities. But the number of moderate vulnerabilities decreased by four as well. Let's depict this on our diagram:

vulnerabilitiesv7overlayx.png

As we already mentioned before, each package is considered a single vulnerability. And as all of our previous packages were already reported as vulnerabilities, we already reached the maximum possible number. The only thing that changed now is that the cryptiles package and its parent-packages are considered to have two "vulnerability reasons," and the more severe is considered for the final summary.
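The counting rule can be reproduced with a small sketch. This is my reconstruction from the observed output, not npm's actual implementation: the dependency map and the advisory table are typed in by hand from the diagram, the graph is assumed to be free of cycles, and names such as effectiveSeverity are made up for illustration.

```javascript
// Severity levels ordered from least to most severe.
const LEVELS = ['info', 'low', 'moderate', 'high', 'critical'];
const rank = (severity) => LEVELS.indexOf(severity);

// Hand-written model of the example: package -> packages it depends on.
const dependsOn = {
    'x-client': ['request'],
    request: ['hawk'],
    hawk: ['boom', 'cryptiles', 'hoek', 'sntp'],
    cryptiles: ['boom'],
    boom: ['hoek'],
    sntp: ['hoek'],
    hoek: []
};

// Packages that have an advisory of their own.
const advisories = { hoek: 'moderate', cryptiles: 'high' };

// Worst severity among the package's own advisory and everything it
// depends on (directly or transitively); null if nothing applies.
function effectiveSeverity(name, memo = new Map()) {
    if (memo.has(name)) return memo.get(name);
    let worst = advisories[name] ?? null;
    for (const dep of dependsOn[name]) {
        const inherited = effectiveSeverity(dep, memo);
        if (inherited !== null && (worst === null || rank(inherited) > rank(worst))) {
            worst = inherited;
        }
    }
    memo.set(name, worst);
    return worst;
}

// Every affected package counts exactly once, under its worst severity.
const summary = {};
for (const name of Object.keys(dependsOn)) {
    const severity = effectiveSeverity(name);
    if (severity !== null) summary[severity] = (summary[severity] || 0) + 1;
}

console.log(summary); // { high: 4, moderate: 3 }
```

With only the hoek advisory in the table, the same loop yields seven moderate entries, which matches the first summary.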

I have no clue how you can understand what is happening in your system by looking at the human-readable audit report (and without checking out the diagram that I made by hand). But let's take a look at the other output formats as they could be more informative.

The Parseable Format

The 6th version of NPM had a --parseable flag that would output the detected vulnerabilities in a tab-separated table format. You can read my analysis of this kind of output in my original "NPM Audit + Jenkins Warnings" post. The parseable format, in any case, suffered from the previously discussed lack of documentation. I even opened an issue in the official NPM audit repository asking for an explanation of the data in the last column, but no one reacted to that 😣.

I don't think that the tab-separated table is a good format, but it served as a simple way to consume audits from bash scripts (or other trivial programming approaches). There was even an example with awk in the official NPM documentation.

In any case, the --parseable output is not available in NPM v7. So if you want to consume the audit output programmatically, JSON is your only sane option.
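For reference, here is a minimal sketch of how the JSON report could be read from a Node script. The function name readAuditReport and the injectable exec parameter are my own inventions for illustration, and the sketch assumes a Unix-like environment where the npm executable is on the PATH. The one thing to watch for is that npm audit exits with a non-zero code when it finds vulnerabilities, so the report has to be picked up from the thrown error.

```javascript
const { execFileSync } = require('child_process');

// Runs `npm audit --json` in the given directory and returns the parsed report.
// `exec` is injectable (hypothetical design choice) so the function can be
// exercised without actually calling npm.
function readAuditReport(cwd, exec = execFileSync) {
    let stdout;
    try {
        stdout = exec('npm', ['audit', '--json'], {
            cwd,
            encoding: 'utf8',
            maxBuffer: 64 * 1024 * 1024 // reports for big projects can be large
        });
    } catch (error) {
        // A non-zero exit code lands here; the JSON report is still on stdout.
        if (typeof error.stdout !== 'string' || error.stdout.trim() === '') {
            throw error; // a real failure, e.g. npm is not installed
        }
        stdout = error.stdout;
    }
    return JSON.parse(stdout);
}

// Usage (commented out so that the sketch does not invoke npm by itself):
// const report = readAuditReport(process.cwd());
// console.log(Object.keys(report.vulnerabilities));
```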

The JSON Format

As with the parseable format, there is a blog post of mine with an analysis of the npm audit --json command output for the 6th version of NPM. I have to say that I liked that JSON structure. It had some duplication, and you had to put some effort into composing vulnerabilities from the obtained data. But the information you were getting was clearly divided into two groups: vulnerability reports and vulnerability cases in your repo.

‼️ The vulnerability reports on the NPM website are called security advisories, or simply 🔸advisories🔸. We are going to use this terminology throughout the rest of this post.

With the 7th version of NPM, you immediately get the vulnerabilities in your hand, as the obtained JSON output has the following structure:

{
  "auditReportVersion": 2,
  "vulnerabilities": {
    "boom": {...},
    "cryptiles": {...},
    ...
  },
  "metadata": {...}
}

First of all, let's mention that the metadata property holds information about the total number of packages for prod, dev, etc., as well as the number of vulnerabilities of each severity. Now let's focus on the vulnerabilities object, as this is the thing we came for. As you can see from the example above, this object has package names as keys and values that describe each vulnerability in detail. Here is an example of such a vulnerability object for the cryptiles package:

{
  "name": "cryptiles",
  "severity": "high",
  "via": [
    {
      "source": 1464,
      "name": "cryptiles",
      "dependency": "cryptiles",
      "title": "Insufficient Entropy",
      "url": "https://npmjs.com/advisories/1464",
      "severity": "high",
      "range": "<4.1.2"
    },
    "boom"
  ],
  "effects": [
    "hawk"
  ],
  "range": "<=4.1.1",
  "nodes": [
    "node_modules/cryptiles"
  ],
  "fixAvailable": false
}

Please recall that cryptiles is a special vulnerability: it has an explicit security advisory, but also it depends on the vulnerable package boom.

Each vulnerability has some straightforward properties:

  • name: a package name
  • severity: the severity level
  • range: affected versions of the package
  • fixAvailable: if it's possible to auto-fix it

Then there are two relational properties: effects and via. The former lists the names of the dependents, as they inherit the vulnerability. The latter, via, specifies the source of the vulnerability. It is also a list, and it may contain package names or objects representing an advisory. The advisory objects also have the package name, the severity, and the range of affected versions. Additionally, these objects provide the type of vulnerability (e.g., Insufficient Entropy) and the link to the page with a detailed description of the issue.

To get a better understanding, let's look again at the package diagram and analyze how the effects and via properties will work for various nodes.

vulnerabilitiesv7overlayx.png

  • For cryptiles effects will have only one package: hawk;
  • for hawk effects will also have one package: request (and so on);
  • for x-client the effects array is going to be empty;
  • on the other hand, for boom the effects array will have 2 package names: cryptiles and hawk;
  • similarly, for hoek the effects array will have the following package names: boom, hawk, sntp;
  • cryptiles will have the unique via array with the boom package name and the npmjs.com/advisories/1464 advisory;
  • packages like request will have a via array with a single package: in this case hawk;
  • the hawk package has the biggest fan-out and thus will have a via array with four package names: cryptiles, boom, hoek, sntp;
  • hoek is the second package with a real advisory npmjs.com/advisories/566, and it will be the only element of its via array.
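The relations listed above can be walked programmatically. Below is a sketch that follows the via links from any package down to the advisories that ultimately cause its vulnerability. The sample object is trimmed by hand to the fields that matter here (it is not real npm output), and rootAdvisories is a hypothetical helper name.

```javascript
// Trimmed-down sample in the shape of the v7 `vulnerabilities` object.
const vulnerabilities = {
    hoek: { name: 'hoek', via: [{ source: 566, title: 'Prototype Pollution' }], effects: ['boom', 'hawk', 'sntp'] },
    boom: { name: 'boom', via: ['hoek'], effects: ['cryptiles', 'hawk'] },
    sntp: { name: 'sntp', via: ['hoek'], effects: ['hawk'] },
    cryptiles: { name: 'cryptiles', via: [{ source: 1464, title: 'Insufficient Entropy' }, 'boom'], effects: ['hawk'] },
    hawk: { name: 'hawk', via: ['boom', 'cryptiles', 'hoek', 'sntp'], effects: ['request'] },
    request: { name: 'request', via: ['hawk'], effects: ['x-client'] },
    'x-client': { name: 'x-client', via: ['request'], effects: [] }
};

// Follows `via` links until advisory objects are reached and collects them,
// de-duplicated by their `source` id.
function rootAdvisories(name, found = new Map(), visited = new Set()) {
    const entry = vulnerabilities[name];
    if (!entry || visited.has(name)) return found;
    visited.add(name);
    for (const item of entry.via) {
        if (typeof item === 'string') {
            rootAdvisories(item, found, visited); // another vulnerable package
        } else {
            found.set(item.source, item); // an advisory object
        }
    }
    return found;
}

const titles = (name) => [...rootAdvisories(name).values()].map((adv) => adv.title);

console.log(titles('boom'));     // [ 'Prototype Pollution' ]
console.log(titles('x-client')); // [ 'Prototype Pollution', 'Insufficient Entropy' ]
```

Note the typeof check inside the loop: it is unavoidable because via mixes strings and objects in one array.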

Porting the Jenkins Warnings

I have a concrete use case related to NPM's auditing functionality. My goal is to capture the vulnerabilities for each build with the standard Jenkins Warnings NG plugin. To learn more about the main ideas, please follow my previous blog post. In this section, we will dive straight into generating plugin-compatible issues from the npm audit --json output.

‼️ While the vulnerability count reported by the official tool may be illogical and unintuitive, to avoid any kind of confusion, we will ensure that the Jenkins warnings perfectly mirror the vulnerabilities of the official tool.

The most basic way of deriving some data from the audit JSON is relatively simple:

const issues = Object.values(auditJSON.vulnerabilities).map((vulnerability) => {
    return {
        fileName: vulnerability.name,
        type: vulnerability.fixAvailable ? 'autofixable' : 'non-autofixable',
        severity: warningsNGSeverity(vulnerability.severity)
    };
});

The only three properties that we specify are:

  1. fileName: which is the main name of a Jenkins warning. It will be unique, as each of the vulnerabilities represents a distinct package.
  2. type: in the previous version, we were specifying the resolution type, i.e., whether it's an update, a review, or some other action. As in this version we don't have such detailed information, we are just specifying whether it's possible to fix the issue automatically.
  3. severity: this is the only thing that didn't change from the previous version.
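The snippet above calls warningsNGSeverity without showing it. A possible shape of that helper is sketched below. Treat the whole mapping as an assumption: the target names (ERROR, HIGH, NORMAL, LOW) are the severity names I expect the Warnings NG plugin to accept, so verify them against the plugin version you run, and adjust the pairing of levels to your own taste.

```javascript
// Hypothetical mapping from npm severities to Warnings NG severity names.
const SEVERITY_MAP = new Map([
    ['critical', 'ERROR'],
    ['high', 'HIGH'],
    ['moderate', 'NORMAL'],
    ['low', 'LOW'],
    ['info', 'LOW']
]);

function warningsNGSeverity(npmSeverity) {
    // Unknown values fall back to NORMAL instead of breaking the report.
    return SEVERITY_MAP.get(npmSeverity) ?? 'NORMAL';
}

console.log(warningsNGSeverity('high'));     // HIGH
console.log(warningsNGSeverity('moderate')); // NORMAL
```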

Then I started to think about whether we can extract some more info and group vulnerabilities into reasonable categories. In the previous version, we were grouping by the "category" of the advisory ("Insufficient Entropy," "Prototype Pollution," etc.). I thought that we could traverse the via property of each vulnerability until we reach the leaves and extract this information. But it's not uncommon for a single vulnerability to have multiple advisory leaves. Thus we may end up with categories like "A", "B", "C", "A+B", "A+C",… and this won't be truly helpful. In the same way, we could group vulnerabilities by the top-level dependency packages, but again a single vulnerability can have several of those. Thus I simply don't get how one should reason about vulnerabilities based on how they are modeled.

Ultimately I decided on a workaround: group the vulnerabilities (via packageName) by whether they are advisories or just derivatives. To do this, I had to check if there are objects in the via property (as opposed to only package names). Then, for advisory vulnerabilities, we can concatenate all the titles and use that as the message, and we can also put the concatenated URLs into the description. This way, one can filter out just the advisories and look up what exactly is going on. Here is the final script:

const issues = Object.values(auditJSON.vulnerabilities).map((vulnerability) => {
    const advisories = vulnerability.via.filter((item) => typeof item === 'object');
    const isAdvisory = advisories.length > 0;

    return {
        fileName: vulnerability.name,
        packageName: isAdvisory ? 'advisory' : 'derived',
        type: vulnerability.fixAvailable ? 'autofixable' : 'non-autofixable',
        message: isAdvisory ? advisories.map((adv) => adv.title).join(' and ') : '',
        description: isAdvisory ? `See: ${advisories.map((adv) => adv.url).join(' and ')}` : '',
        severity: warningsNGSeverity(vulnerability.severity)
    };
});

This solution is far from being perfect, but it's still usable. You can go to the issues stored in Jenkins, select the advisory ones, see which are auto-fixable, check their severity, read more about their class, or follow up to the official web page.

Screenshot 2020-12-04 at 20.44.34.png

Conclusion

You know, I could say that this new audit format is ok. And that they wrote about a breaking change on their blog. And that you can still figure out which packages in your repo have security vulnerabilities and why. And also that this is a fresh version, and they can still tweak things a bit in the future.

But I won't do that. The new auditing functionality sucks. And here are a few points that summarise all the issues with it:

  1. It's not clear what they are doing. They made a big change without explaining it, and I had to reverse-engineer their concepts and the resulting formats.
  2. The approach "all the dependent packages are vulnerable" overwhelms an average user with not-so-important data. The dependency tree has its own set of important features. Still, when used as a core model of a vulnerability report, it detracts focus from vulnerable packages and brings dependency confusion.
  3. The multi-root multi-leaf graph makes it hard to reason about the vulnerabilities. In the past, one could reason: "there are these advisories that apply to my project, and I have that set of actions to perform with my dependencies." With the new model, you are just drowning in a high number of vulnerabilities and hoping that a magical auto-fix will aid you.
  4. There is less accompanying data. Of course, in the past, the audit provided many rarely needed details, such as the advisory author's name. However, some details could be used to immediately understand how a certain vulnerability manifests or when it was discovered. Now you have to follow the provided link to look up all the details.
  5. You need to type-check the obtained JSON. Do you remember the via array? When you want to extract some useful data, you have to check if an element is a string or an object. In my experience, such code/data models are mostly written by students or junior developers and don't convey much confidence.

In the end, I'm left with a simple question: why? Why, oh, why did you do this, NPM developers? The previous model was much more elaborate. The real-life entities were clearly modeled as dedicated advisories or actions. There was no type-checking upon data consumption. And ultimately, the number of boxes in the output matched the number of vulnerabilities in the summary.

I get the impression that a long time ago, enlightened sirs and ladies implemented the previous version; the new version, however, was overwritten by a bunch of monkeys.

Sorry, I know that my previous sentence is offensive, but it's based on all the 3.5+ thousand words that came before it in this post.