npm Security Insights API Preview Part 3: Behavioral Analysis

This is the third in a series of blog posts we’re running to preview and gather input on the new security insights API we’re developing.

Today’s topic: Behavioral Analysis

A lot of stuff happens when you install an npm package. npm downloads and extracts dependencies, but it also runs install hooks. These install hooks can have all kinds of side effects from compiling binaries, downloading other dependencies or maybe something a bit more malicious in nature. Post-install scripts are the most popular malware infection method right now.

In an effort to understand this further and to make side effects (malicious or not) transparent, the npm security team has been hard at work building infrastructure to do behavioral analysis of npm packages at scale.

What is behavioral analysis?

For every package published to the registry, we run the package installation process inside a controlled and instrumented environment. Anything that goes in or out, gets added or deleted is captured and stored. This type of analysis is especially useful in situations with obfuscated source code or binary artifacts where reverse engineering or static analysis is particularly difficult. It does however product a mountain of data–data that you’ll be able to sift through effectively with the security insights API

What will the security insights API have available?

We’re going to initially launch with three pieces of behavioral metadata exposed. Through this analysis, we will have a lot more data than you are ever going to want to sift through. To make it useful, we’re going to pare back what we’re going to make initially available and, as our capabilities grow, we’ll continue to expose more data.

The initial release of the API will likely contain the following info from our behavioral analysis on a package:

DNS requests (hosts contacted / looked up)
processes executed (including command lines and arguments)
network traffic summary

Real world example: cryptocurrency malware

Let’s take a look at a simple cryptocurrency malware sample. This sample runs coin mining software on install similar to what we have seen in the past.

Figure 1: Request

Figure 2: Response

In this example, we’re going to query for the answers returned by the dns requests made. It very obviously shows a site that is out of place for a typical npm install.

Figure 3: Request

Figure 4: Response

Investigating further by requesting the command line and its arguments for this package shows a suspicious binary being executed, extremely useful information when trying to understand what might be going on with a package without installing or running it yourself.

How would you use it?

How cool would it be to know if a package is making insecure network requests before you install it or be able to diff two versions to see what might have changed? We’ve seen a lot of interesting ideas from the previous insights we’ve shared.

Update: Signups for the private beta are now closed.