Source Code and Data

Overview

This section gives an overview of the data and the notebooks (code + documentation) for the analysis that is part of this guide:

  • Running the Code - the environments in which it can be run
  • Data Sources - the public data sources used
  • Analysis - the code for the analysis, the output of which is used in this guide

Running the Code

The source code provided with this guide is available as Jupyter Notebooks.

These can be run:

  • locally or offline (requires a Jupyter Notebook environment to be set up)
  • online in your browser via Colab (no environment setup required)

Quote

Colab, or "Colaboratory", allows you to write and execute Python in your browser, with

  • Zero configuration required
  • Access to GPUs free of charge
  • Easy sharing

https://colab.research.google.com/

Data Sources

Data Source | Detail | Approx. CVE count (thousands)
CISA KEV | Active Exploitation | 1
EPSS | Predictor of Exploitation | 220
Metasploit modules | Weaponized Exploit | 3
Nuclei templates | Weaponized Exploit | 2
ExploitDB | Published Exploit Code | 25
NVD CVE Data | NVD CVEs | 220
Qualys TruRisk Report | The 2023 Qualys TruRisk research report lists 190 CVEs from 2022 with QVS scores | 0.2
Microsoft Security Response Center (MSRC) | CVEs Exploited and with Exploitability Assessment | 0.2

Analysis

See analysis directory for these files.

  1. enrich_cves.ipynb
    1. Take the data sources from data_in/
    2. Enrich the CVE data from NVD with the other data sources
    3. Add an "Exploit" column to indicate the source of the exploitability (used later to set colors of CVE data in plots)
    4. Store the output in data_out/nvd_cves_v3_enriched.csv.gz
  2. kev_epss_cvss.ipynb
    1. Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
    2. Read the data from CISA KEV alert reports in ./data_in/cisa_kev/
    3. Plot CISA KEV datasets showing EPSS, CVSS by source of the exploitability
    4. Write data_out/cisa_kev/csa/csa.csv.gz which is the CISA KEV CyberSecurity Alerts (CSA) subset with EPSS and other data
  3. qualys.ipynb
    1. Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
    2. Read the data from ./data_in/qualys
    3. Plot Qualys dataset showing EPSS, CVSS by source of the exploitability
    4. Write data_out/qualys/qualys.csv.gz which is the Qualys data with EPSS and other data
  4. msrc.ipynb
    1. Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
    2. Read the data from ./data_in/msrc
    3. Plot Microsoft Exploitability Index dataset showing EPSS, CVSS by source of the exploitability
    4. Write data_out/msrc/msrc.csv.gz which is the MSEI data with EPSS and other data
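
The enrichment step in enrich_cves.ipynb can be pictured as a left join of the NVD CVE records against each of the other sources, followed by a label that records where the exploitation evidence came from. The following is a minimal sketch using only the Python standard library, not the notebook's actual code. The CVE IDs, field names, scores, and the precedence order of the sources are all assumptions made for illustration.

```python
# Hypothetical sample data; the real notebook reads its inputs from data_in/.
nvd_cves = [
    {"cve": "CVE-0000-0001", "cvss": 9.8},
    {"cve": "CVE-0000-0002", "cvss": 7.5},
    {"cve": "CVE-0000-0003", "cvss": 5.3},
    {"cve": "CVE-0000-0004", "cvss": 8.1},
]
epss_scores = {"CVE-0000-0001": 0.97, "CVE-0000-0002": 0.12, "CVE-0000-0004": 0.40}

# Each exploitation source is modeled as a set of CVE IDs, listed in an
# assumed precedence order (strongest evidence first).
exploit_sources = [
    ("CISA KEV", {"CVE-0000-0001"}),
    ("Metasploit", {"CVE-0000-0001", "CVE-0000-0002"}),
    ("Nuclei", set()),
    ("ExploitDB", {"CVE-0000-0004"}),
]


def exploit_label(cve_id):
    """Return the first (highest-precedence) source that lists the CVE."""
    for name, cves in exploit_sources:
        if cve_id in cves:
            return name
    return "None"


def enrich(rows):
    """Left-join EPSS and add an 'Exploit' column to every NVD row."""
    enriched = []
    for row in rows:
        new_row = dict(row)
        new_row["epss"] = epss_scores.get(row["cve"])  # None if no score
        new_row["Exploit"] = exploit_label(row["cve"])
        enriched.append(new_row)
    return enriched


enriched_rows = enrich(nvd_cves)
for r in enriched_rows:
    print(r["cve"], r["cvss"], r["epss"], r["Exploit"])
```

Keeping every NVD row, including those with no match, means the later plots can show unexploited CVEs alongside exploited ones, with the "Exploit" value available for coloring.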

CISA SSVC Decision Trees

See cisa_ssvc_dt directory for these files.

CISA SSVC Decision Tree From Scratch Example Implementation

DT_from_scratch.ipynb

  1. Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
  2. Read the Decision Tree definition cisa_ssvc_dt/DT_rbp.csv
  3. Define the Decision Logic for the Decision Nodes
  4. Calculate the Decision Node Values for all CVEs
  5. Do some Exploratory Data Analysis with Venn Diagrams to understand our data
  6. Calculate the Output Decision from the Decision Node Values
  7. Plot the flow of all CVEs across the Decision Tree (a Sankey diagram)
  8. Read the Sankey Diagram template definition cisa_ssvc_dt/DT_sankey.csv
  9. Triage some CVEs
  10. Read a list of CVEs to triage cisa_ssvc_dt/triage/cves2triage.csv
  11. Get Decisions
  12. Plot
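
The core of these steps, going from decision node values to an output decision, amounts to a table lookup. The sketch below illustrates the idea under stated assumptions and is not the notebook's implementation. The CSV rows are a made-up subset and are not the contents of DT_rbp.csv, the Mission & Well-being node is omitted for brevity, and the rule used to derive the Exploitation value is hypothetical.

```python
import csv
import io

# Illustrative decision-tree rows in CSV form. These rows are NOT the
# contents of DT_rbp.csv; they are a made-up subset for demonstration.
DT_CSV = """exploitation,automatable,technical_impact,decision
active,yes,total,Act
active,no,total,Attend
poc,yes,total,Attend
poc,no,partial,Track
none,no,partial,Track
"""


def load_tree(text):
    """Map each tuple of decision-node values to its output decision."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        (r["exploitation"], r["automatable"], r["technical_impact"]): r["decision"]
        for r in reader
    }


def exploitation_value(cve):
    """Assumed decision logic: known exploitation beats public exploit code."""
    if cve["in_kev"]:
        return "active"
    if cve["has_exploit_code"]:
        return "poc"
    return "none"


def decide(cve, tree):
    """Compute node values for one CVE, then look up the decision."""
    key = (exploitation_value(cve), cve["automatable"], cve["technical_impact"])
    return tree.get(key, "Undefined")  # combination absent from the sketch table


tree = load_tree(DT_CSV)
cves_to_triage = [
    {"cve": "CVE-0000-0001", "in_kev": True, "has_exploit_code": True,
     "automatable": "yes", "technical_impact": "total"},
    {"cve": "CVE-0000-0002", "in_kev": False, "has_exploit_code": True,
     "automatable": "no", "technical_impact": "partial"},
]
decisions = {c["cve"]: decide(c, tree) for c in cves_to_triage}
print(decisions)
```

Separating the node logic (exploitation_value) from the lookup (decide) means the tree definition stays in a data file and can be changed without touching the code.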

CISA SSVC Decision Tree Analysis for Feature Importance

DT_analysis.ipynb

  1. Read the Decision Tree definition cisa_ssvc_dt/DT_rbp.csv
  2. Perform Feature Importance using 2 methods
    1. Permutation Importance
    2. Drop-column Importance
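
To show how the two methods differ, here is a minimal, standard-library sketch run on a made-up decision table. It is not the code in DT_analysis.ipynb. The feature names, the labeling rule, and the simple lookup "model" are assumptions chosen so the example stays self-contained. Permutation importance shuffles one column and re-scores the same model; drop-column importance retrains the model without that column.

```python
import random
from collections import Counter, defaultdict
from itertools import product

FEATURES = ["exploitation", "automatable", "extra"]


def label(exploitation, automatable, extra):
    """Made-up decision rule; 'extra' is deliberately ignored."""
    if exploitation == "active":
        return "Act" if automatable == "yes" else "Attend"
    if exploitation == "poc" and automatable == "yes":
        return "Attend"
    return "Track"


# Enumerate every combination of node values, as a decision table would.
X = [list(row) for row in product(["none", "poc", "active"], ["no", "yes"], ["low", "high"])]
y = [label(*row) for row in X]


def fit(rows, labels):
    """'Train' a lookup model: majority label for each feature tuple."""
    votes = defaultdict(Counter)
    for row, lab in zip(rows, labels):
        votes[tuple(row)][lab] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def accuracy(model, rows, labels):
    """Fraction of rows whose looked-up label matches the true label."""
    hits = sum(model.get(tuple(r)) == lab for r, lab in zip(rows, labels))
    return hits / len(labels)


def permutation_importance(rows, labels, seed=0):
    """Accuracy drop when one column is shuffled; the model is not retrained."""
    rng = random.Random(seed)
    model = fit(rows, labels)
    base = accuracy(model, rows, labels)
    result = {}
    for i, name in enumerate(FEATURES):
        column = [r[i] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:i] + [v] + r[i + 1:] for r, v in zip(rows, column)]
        result[name] = base - accuracy(model, shuffled, labels)
    return result


def drop_column_importance(rows, labels):
    """Accuracy drop when the model is retrained without one column."""
    base = accuracy(fit(rows, labels), rows, labels)
    result = {}
    for i, name in enumerate(FEATURES):
        reduced = [r[:i] + r[i + 1:] for r in rows]
        result[name] = base - accuracy(fit(reduced, labels), reduced, labels)
    return result


perm_imp = permutation_importance(X, y)
drop_imp = drop_column_importance(X, y)
print("permutation:", perm_imp)
print("drop-column:", drop_imp)
```

In this toy table the ignored "extra" column scores zero under both methods, while removing "exploitation" costs the most accuracy. Permutation scores depend on the random shuffle, so they are usually averaged over several repeats.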

See https://github.com/CERTCC/SSVC/issues/309 for the suggestion to add drop column importance to CISA SSVC.

Getting Data from Data Sources

A snapshot of the data used for this guide is included with the source in data_in.

  • Each data folder includes a date.txt file containing the download date.

You can also download current data as described here.

  • get_data.sh gets the data that can be downloaded automatically and used as-is.
  • Other data must be downloaded manually - see instructions below.
    • MSRC
    • ExploitDB
    • GPZ
  • Larger files are gzip'd
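
Gzip'd CSV files can be read directly in Python without decompressing them first. The sketch below uses only the standard library and runs against a temporary folder with made-up contents. The file name example.csv.gz, the column names, and the single-line date.txt format are assumptions for illustration, not the layout of the actual data folders.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_gzipped_csv(path):
    """Read a gzip-compressed CSV file into a list of dicts."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def read_download_date(folder):
    """Return the contents of the folder's date.txt, or None if absent."""
    date_file = Path(folder) / "date.txt"
    if not date_file.exists():
        return None
    return date_file.read_text(encoding="utf-8").strip()


# Demonstration on a temporary folder with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    with gzip.open(folder / "example.csv.gz", mode="wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerows([["cve", "epss"], ["CVE-0000-0001", "0.97"]])
    (folder / "date.txt").write_text("2023-01-01\n", encoding="utf-8")

    rows = read_gzipped_csv(folder / "example.csv.gz")
    downloaded = read_download_date(folder)

print(rows, downloaded)
```

If pandas is installed, its read_csv function can also read .csv.gz files directly, inferring the compression from the file extension.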

National Vulnerability Database (NVD)

Get NVD data automatically

  • A notebook or script in nvd downloads the NVD data.
  • The data is output to data_out/CVSSData.csv.gz
  • Note: The download method used is due to be deprecated some time after December 2023, per https://nvd.nist.gov/vuln/data-feeds

Google Project Zero (GPZ)

See the 0day "In the Wild" Google Sheet.

  • Select the "All" tab.
  • File - Download as CSV

Microsoft Security Response Center (MSRC)

Qualys TruRisk Report

The CVE data was extracted from the Qualys TruRisk Report PDF. This data is static, so a date.txt file is not included.

ExploitDB

Other Vulnerability Data Sources

See other vulnerability data sources that are not currently used here.