OpenSSF Package Analysis

What it is

Package Analysis is an Open Source Security Foundation project that automatically analyses the behaviour of new package releases on public open-source registries. It feeds a public dataset of behavioural signals intended to surface malicious packages and to inform research on the open-source ecosystem.

The pipeline is structured as four components: a package feeds subscriber that watches upstream registries for new releases, a scheduler that assigns each release to an analysis worker, the worker itself, and a loader that pushes the results into a public BigQuery dataset. Workers run each package in an isolated gVisor sandbox to keep the analysis safe from the package under test.

Five ecosystems are covered: PyPI, npm, RubyGems, Packagist, and crates.io. Go modules are not in scope.

Analysis model

Each package release is subjected to two analysis passes.

A dynamic analysis pass installs the package and imports it inside the sandbox, recording every file accessed (read/write/delete), every network socket opened (address, port, hostname), every DNS query resolved, and every external command executed. The output is essentially a behavioural trace of what the package did during installation and first import.
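As a rough illustration of the shape of such a trace, the sketch below models one analysis phase as a small Python data structure. The class and field names (PhaseTrace, FileAccess, SocketEvent, has_network_or_exec) are hypothetical and chosen only to mirror the categories listed above; they are not the schema that package-analysis actually emits.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileAccess:
    """One file touched during the phase (hypothetical record shape)."""
    path: str
    read: bool = False
    write: bool = False
    delete: bool = False


@dataclass(frozen=True)
class SocketEvent:
    """One network socket opened during the phase."""
    address: str
    port: int
    hostname: str = ""


@dataclass
class PhaseTrace:
    """Everything observed during a single phase: "install" or "import"."""
    phase: str
    files: list = field(default_factory=list)        # FileAccess entries
    sockets: list = field(default_factory=list)      # SocketEvent entries
    dns_queries: list = field(default_factory=list)  # queried hostnames
    commands: list = field(default_factory=list)     # argv lists

    def has_network_or_exec(self) -> bool:
        """True if the phase opened a socket, resolved DNS, or ran a command."""
        return bool(self.sockets or self.dns_queries or self.commands)


install = PhaseTrace(phase="install")
install.files.append(FileAccess("/tmp/build/setup.py", read=True))
print(install.has_network_or_exec())  # file reads alone do not count here
```

Keeping install and import as separate phase records makes it possible to tell install-time behaviour apart from import-time behaviour when reading a trace.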

A static analysis pass extracts basic file information (size, type, hash), parses source files into an AST, and runs detection rules over the parsed code and raw text looking for suspicious signals such as high-entropy strings, base64-encoded blobs, and obfuscation patterns.
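To make the kind of signal concrete, here is a minimal sketch of two such heuristics applied to Python source: Shannon entropy and a base64 check over string literals found by walking the AST. It uses only the Python standard library. The function names and the thresholds (4.5 bits per character, minimum length 24) are illustrative assumptions, not the detection rules that package-analysis ships.

```python
import ast
import base64
import binascii
import math
import re
from collections import Counter

BASE64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def shannon_entropy(text: str) -> float:
    """Entropy in bits per character of the character distribution."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(text).values())


def looks_like_base64(text: str, min_length: int = 24) -> bool:
    """True if the string is long enough and decodes as strict base64."""
    if len(text) < min_length or len(text) % 4 != 0:
        return False
    if not BASE64_RE.fullmatch(text):
        return False
    try:
        base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def suspicious_strings(source: str, entropy_threshold: float = 4.5,
                       min_length: int = 24):
    """Return (line number, reasons) for each flagged string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        value = node.value
        if len(value) < min_length:
            continue  # short strings produce too much noise
        reasons = []
        if shannon_entropy(value) >= entropy_threshold:
            reasons.append("high-entropy")
        if looks_like_base64(value, min_length):
            reasons.append("base64")
        if reasons:
            findings.append((node.lineno, reasons))
    return findings
```

Heuristics like these flag candidates for a closer look rather than proving malice: legitimate packages embed hashes, keys, and encoded assets that trip the same checks.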

The temporal scope of dynamic analysis is narrow, and that matters when interpreting the results. package-analysis observes what happens during install and first import, not what happens at runtime in a production environment. A package whose payload is gated on an environment variable, triggered by a specific function call, or scheduled for a later date will appear behaviourally clean during analysis and misbehave only later. The signals are useful precisely because the install/import window is where many real attacks land (postinstall scripts, import-time exfiltration), but they are not a complete behavioural profile.

Workflow

Producer side: the producers are the OpenSSF infrastructure and any third party running its own copy of the pipeline. New releases on supported registries are detected, queued, and analysed without user involvement. There is no per-release opt-in or opt-out.

Consumer side: results are published to a public BigQuery dataset, queryable by anyone. The same data is also exposed through downstream tooling such as deps.dev. A consumer interested in a specific package queries the dataset directly; there is no “verify my dependencies against package-analysis” CLI.

Comparison to OpenVet

OpenVet and package-analysis answer different questions about a dependency. OpenVet asks “has a trusted reviewer asserted specific properties of this code?”, a question that requires a human to read code and sign claims. package-analysis asks “what does this package do when it is installed and imported?”, a question that an automated sandbox can answer at registry scale. Both questions are useful and they catch different things.

The structural differences:

Operating model. package-analysis is fully automated: every new release on a covered registry is analysed without user involvement. OpenVet audits are human-authored: an auditor reads the code and signs claims about specific properties. OpenVet itself does not run any checks; it only provides the tooling and the hosting for audits.

Output shape. package-analysis produces a behavioural trace plus heuristic signals: lists of files touched, hostnames contacted, commands run, suspicious strings detected. OpenVet auditors produce structured claims (impl-crypto, uses-network) that consumers compose into requirements.

Coverage trigger. package-analysis covers every new release on every supported registry, automatically. OpenVet covers dependencies that a human has chosen to audit; coverage grows with effort rather than automatically.

Consumer interface. package-analysis output is queried from a public BigQuery dataset. OpenVet output is delivered as content-addressed audit trees fetched against the consumer’s lockfile by the OpenVet CLI.

Ecosystem. package-analysis covers PyPI, npm, RubyGems, Packagist, and crates.io. OpenVet covers Cargo, npm, PyPI, Go modules, and RubyGems (no Packagist; adds Go).

Using alongside OpenVet

package-analysis cannot tell a consumer “this package is safe” or “this package is not safe”, but the data is useful in two ways alongside OpenVet.

For a consumer using a dependency that has not been audited (or has not been audited by a publisher the consumer trusts), the package-analysis signals are a stop-gap. A release that opens new network sockets, drops a binary, or executes a shell command gives the consumer a concrete reason to investigate further before continuing to depend on it. The signals do not replace a review, but they are better than no signal at all on a package that no trusted reviewer has looked at.

For an auditor producing a new audit, package-analysis output can be a direct input to the review. The audit should account for the observed behaviour, especially where it changes between releases: a release that introduces new network or filesystem activity calls for the auditor to verify whether the activity is legitimate and either explain it in the audit or surface it as a finding. Most consumers should not be reading behavioural traces directly; the auditing flow is where the data becomes actionable.
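The release-to-release comparison can be sketched as a set difference over the observed behaviour of two versions. The example below assumes the auditor has already pulled the relevant hostnames, commands, and written files out of the dataset into plain Python sets. The ReleaseBehaviour type, the new_activity helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseBehaviour:
    """Observed behaviour of one release, reduced to comparable sets."""
    version: str
    hostnames: frozenset = frozenset()
    commands: frozenset = frozenset()
    files_written: frozenset = frozenset()


def new_activity(previous: ReleaseBehaviour, current: ReleaseBehaviour) -> dict:
    """Return behaviour seen in `current` that was absent from `previous`."""
    delta = {
        "hostnames": current.hostnames - previous.hostnames,
        "commands": current.commands - previous.commands,
        "files_written": current.files_written - previous.files_written,
    }
    # Keep only the categories that actually changed, in a stable order.
    return {kind: sorted(values) for kind, values in delta.items() if values}


old = ReleaseBehaviour("1.4.0", hostnames=frozenset({"pypi.org"}))
new = ReleaseBehaviour(
    "1.4.1",
    hostnames=frozenset({"pypi.org", "example.invalid"}),
    commands=frozenset({"sh -c id"}),
)
print(new_activity(old, new))
```

Each entry in the result is something the audit would need to either explain or report as a finding. An empty result means only that the install/import window looked the same in both releases.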
