Source Code and Data
Overview
This section gives an overview of the data and the notebooks (code and documentation) behind the analysis in this guide:
- Running the Code - the environments in which it can be run
- Data Sources - the public data sources used
- Analysis - the code for the analysis, the output of which is used in this guide
Running the Code
The source code provided with this guide is available as Jupyter Notebooks.
These can be run:
- locally or offline (requires a Jupyter Notebook environment to be set up)
- online in your browser via Colab (no environment setup required)
Quote
Colab, or "Colaboratory", allows you to write and execute Python in your browser, with
- Zero configuration required
- Access to GPUs free of charge
- Easy sharing
Data Sources
Data Source | Detail | Approx. CVE count (thousands) |
---|---|---|
CISA KEV | Active Exploitation | 1 |
EPSS | Predictor of Exploitation | 220 |
Metasploit modules | Weaponized Exploit | 3 |
Nuclei templates | Weaponized Exploit | 2 |
ExploitDB | Published Exploit Code | 25 |
NVD CVE Data | NVD CVEs | 220 |
Qualys TruRisk Report | The 2023 Qualys TruRisk research report lists 190 CVEs from 2022 with QVS scores | 0.2 |
Microsoft Security Response Center (MSRC) | CVEs Exploited and with Exploitability Assessment | 0.2 |
Analysis
See the analysis directory for these files.
- enrich_cves.ipynb
    - Take the data sources from data_in/
    - Enrich the CVE data from NVD with the other data sources
    - Add an "Exploit" column to indicate the source of the exploitability (used later to set the colors of CVE data in plots)
    - Store the output in data_out/nvd_cves_v3_enriched.csv.gz
- kev_epss_cvss.ipynb
    - Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
    - Read the data from CISA KEV alert reports in ./data_in/cisa_kev/
    - Plot the CISA KEV datasets showing EPSS and CVSS by source of the exploitability
    - Write data_out/cisa_kev/csa/csa.csv.gz, which is the CISA KEV CyberSecurity Alerts (CSA) subset with EPSS and other data
- qualys.ipynb
    - Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
    - Read the data from ./data_in/qualys
    - Plot the Qualys dataset showing EPSS and CVSS by source of the exploitability
    - Write data_out/qualys/qualys.csv.gz, which is the Qualys data with EPSS and other data
- msrc.ipynb
    - Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
    - Read the data from ./data_in/msrc
    - Plot the Microsoft Exploitability Index (MSEI) dataset showing EPSS and CVSS by source of the exploitability
    - Write data_out/msrc/msrc.csv.gz, which is the MSEI data with EPSS and other data
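The enrichment step described for enrich_cves.ipynb can be pictured roughly as follows. This is a minimal sketch, not the notebook's code: the function name, the source labels, the precedence order, and the CVE IDs are all assumptions made up for illustration.

```python
# Hypothetical sketch: label each CVE with the strongest exploitation evidence found.
# The precedence order (strongest first) is an assumption, not taken from the notebook.
PRECEDENCE = ["cisa_kev", "metasploit", "nuclei", "exploitdb"]


def add_exploit_column(nvd_rows, sources):
    """Return copies of nvd_rows with an added "Exploit" key.

    nvd_rows: list of dicts, each with a "cve" key.
    sources:  dict mapping a source label to the set of CVE IDs it lists.
    """
    enriched = []
    for row in nvd_rows:
        label = "none"
        for name in PRECEDENCE:
            if row["cve"] in sources.get(name, set()):
                label = name
                break
        enriched.append({**row, "Exploit": label})
    return enriched


# Placeholder data: these CVE IDs and scores are invented.
nvd_rows = [
    {"cve": "CVE-0000-0001", "cvss": 9.8},
    {"cve": "CVE-0000-0002", "cvss": 7.5},
    {"cve": "CVE-0000-0003", "cvss": 5.3},
]
sources = {
    "cisa_kev": {"CVE-0000-0001"},
    "exploitdb": {"CVE-0000-0001", "CVE-0000-0002"},
}

enriched = add_exploit_column(nvd_rows, sources)
for row in enriched:
    print(row["cve"], row["Exploit"])
```

Keeping a single label per CVE makes it straightforward to map that label to one plot color; a CVE listed by several sources takes the label that comes first in the assumed precedence list.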
CISA SSVC Decision Trees
See the cisa_ssvc_dt directory for these files.
CISA SSVC Decision Tree From Scratch Example Implementation
- Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
- Read the Decision Tree definition cisa_ssvc_dt/DT_rbp.csv
- Define the Decision Logic for the Decision Nodes
- Calculate the Decision Node Values for all CVEs
- Do some Exploratory Data Analysis with Venn Diagrams to understand the data
- Calculate the Output Decision from the Decision Node Values
- Plot the flow of all CVEs across the Decision Tree as a Sankey diagram
    - Read the Sankey Diagram template definition cisa_ssvc_dt/DT_sankey.csv
- Triage some CVEs
    - Read a list of CVEs to triage from cisa_ssvc_dt/triage/cves2triage.csv
    - Get Decisions
    - Plot
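The core of the steps above (read a decision tree definition, compute decision node values, look up the output decision) can be sketched as below. This is an illustration under stated assumptions: the toy table has only two decision nodes, its column names and rows do not reproduce cisa_ssvc_dt/DT_rbp.csv, and the rule that derives the Exploitation value is a guess at what such decision logic might look like.

```python
import csv
import io

# Toy decision table in CSV form. Column names and rows are illustrative
# assumptions; they do not reproduce cisa_ssvc_dt/DT_rbp.csv.
TOY_TREE_CSV = """Exploitation,Automatable,Decision
none,no,Track
none,yes,Track
PoC,no,Track
PoC,yes,Attend
active,no,Attend
active,yes,Act
"""


def load_tree(csv_text):
    """Map each tuple of decision-node values to its output decision."""
    reader = csv.DictReader(io.StringIO(csv_text))
    nodes = [name for name in reader.fieldnames if name != "Decision"]
    table = {tuple(row[n] for n in nodes): row["Decision"] for row in reader}
    return nodes, table


def exploitation_value(cve):
    """Assumed logic: a KEV listing means active, any public exploit means PoC."""
    if cve["in_kev"]:
        return "active"
    if cve["has_exploit"]:
        return "PoC"
    return "none"


def decide(cve, nodes, table):
    """Compute the decision-node values for one CVE and look up the decision."""
    values = {
        "Exploitation": exploitation_value(cve),
        "Automatable": "yes" if cve["automatable"] else "no",
    }
    return table[tuple(values[n] for n in nodes)]


nodes, table = load_tree(TOY_TREE_CSV)
cve = {"in_kev": False, "has_exploit": True, "automatable": True}  # hypothetical CVE
decision = decide(cve, nodes, table)
print(decision)
```

Treating the tree as a lookup table keyed by node values keeps the decision logic (how each node value is derived) separate from the tree definition (which combination maps to which decision), so either can change without touching the other.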
CISA SSVC Decision Tree Analysis for Feature Importance
- Read the Decision Tree definition cisa_ssvc_dt/DT_rbp.csv
- Compute feature importance using two methods:
    - Permutation Importance
    - Drop-column Importance
See https://github.com/CERTCC/SSVC/issues/309 for the suggestion to add drop column importance to CISA SSVC.
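To make the two methods concrete, here is a small standard-library sketch applied to a toy decision table. It is not the notebook's implementation: the table, the feature names (including a deliberately irrelevant one), and the majority-vote "refit" used for drop-column importance are assumptions, and accuracy is measured on the table itself rather than on held-out data.

```python
import itertools
import random
from collections import Counter, defaultdict

# Toy features; "Irrelevant" never affects the decision, so its importance should be 0.
FEATURES = ["Exploitation", "Automatable", "Irrelevant"]
LEVELS = [["none", "PoC", "active"], ["no", "yes"], ["a", "b"]]


def toy_decision(exploitation, automatable, _irrelevant):
    """Made-up decision rule used to label every row of the toy table."""
    score = {"none": 0, "PoC": 1, "active": 2}[exploitation] + (automatable == "yes")
    return ["Track", "Track", "Attend", "Act"][score]


ROWS = list(itertools.product(*LEVELS))
LABELS = [toy_decision(*row) for row in ROWS]
TABLE = dict(zip(ROWS, LABELS))  # the "model" is a plain lookup


def accuracy(predicted):
    return sum(p == y for p, y in zip(predicted, LABELS)) / len(LABELS)


def permutation_importance(col, repeats=200, seed=0):
    """Mean accuracy drop when column `col` is shuffled and the table is re-queried."""
    rng = random.Random(seed)
    drops = []
    for _ in range(repeats):
        shuffled = [row[col] for row in ROWS]
        rng.shuffle(shuffled)
        permuted = [row[:col] + (v,) + row[col + 1:] for row, v in zip(ROWS, shuffled)]
        drops.append(1.0 - accuracy([TABLE[r] for r in permuted]))
    return sum(drops) / repeats


def drop_column_importance(col):
    """Accuracy drop when `col` is removed and a majority-vote lookup is refit."""
    votes = defaultdict(Counter)
    for row, label in zip(ROWS, LABELS):
        votes[row[:col] + row[col + 1:]][label] += 1
    refit = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    predicted = [refit[row[:col] + row[col + 1:]] for row in ROWS]
    return 1.0 - accuracy(predicted)


perm = {name: permutation_importance(i) for i, name in enumerate(FEATURES)}
drop = {name: drop_column_importance(i) for i, name in enumerate(FEATURES)}
print(perm)
print(drop)
```

Permutation importance keeps the model fixed and scrambles one input, while drop-column importance rebuilds the model without that input; both should assign zero importance to a feature that never changes the decision.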
Getting Data from Data Sources
A snapshot of the data used for this guide is available with the source in data_in.
- Each data folder includes a date.txt file containing the date of download.
You can also download current data as described here.
- get_data.sh gets the data that can be downloaded automatically and used as-is.
- Other data is downloaded manually (see the instructions below):
    - MSRC
    - ExploitDB
    - GPZ
- Larger files are gzip'd.
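When working with a snapshot, it can help to check each folder's download date and to read the gzip'd files directly. The sketch below assumes one date.txt per folder directly under the data root and treats its contents as an opaque string; the helper names, folder name, date, and CSV rows are placeholders, not values from the repository.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def snapshot_dates(data_root):
    """Return {folder name: contents of date.txt} for each folder under data_root."""
    return {
        f.parent.name: f.read_text().strip()
        for f in sorted(Path(data_root).glob("*/date.txt"))
    }


def read_gzipped_csv(path):
    """Read a gzip-compressed CSV file into a list of dicts."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle))


# Demo on a throwaway directory; the folder name, date, and rows are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "example_source"
    folder.mkdir()
    (folder / "date.txt").write_text("2000-01-01\n")
    with gzip.open(folder / "data.csv.gz", mode="wt", newline="") as handle:
        handle.write("cve,score\nCVE-0000-0001,0.5\n")

    dates = snapshot_dates(tmp)
    rows = read_gzipped_csv(folder / "data.csv.gz")

print(dates)
print(rows)
```

In a checkout of the source, calling `snapshot_dates("data_in")` would be the analogous use, assuming the folder layout matches the assumption above.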
National Vulnerability Database (NVD)
Get NVD data automatically
- A notebook or script in nvd downloads the NVD data.
- The data is output to data_out/CVSSData.csv.gz
- Note: The download method used will be deprecated some time after Dec 2023 per https://nvd.nist.gov/vuln/data-feeds
Google Project Zero (GPZ)
See the 0day "In the Wild" Google Sheet.
- Select the "All" tab.
- File - Download as CSV
Microsoft Security Response Center (MSRC)
- Go to https://msrc.microsoft.com/update-guide/vulnerability
- Edit columns - ensure that the "Exploitability Assessment" and "Exploited" columns are selected
- Download
Qualys TruRisk Report
The CVE data was extracted from the Qualys TruRisk Report PDF. This data is static, so a date.txt is not included.
ExploitDB
- Download https://gitlab.com/exploit-database/exploitdb/-/blob/main/files_exploits.csv (manually for now - credentials required for automation)
- Extract the CVEs using the script in the directory. Some entries have no CVE and list only Open Source Vulnerability Database (OSVDB) entries instead.
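The extraction step might look something like the following. This is a sketch, not the script from the directory: it assumes the identifiers live in a column named `codes` with entries separated by `;`, and the sample rows and IDs are invented.

```python
import csv
import io
import re

# Matches CVE identifiers such as CVE-2021-12345 (four-digit year, 4+ digit sequence).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")

# Made-up sample rows. The "codes" column name and ";" separator are assumptions
# about the layout of files_exploits.csv.
SAMPLE_CSV = """id,description,codes
1,Example A,CVE-0000-0001;OSVDB-11111
2,Example B,OSVDB-22222
3,Example C,CVE-0000-0002;CVE-0000-0003
"""


def extract_cves(csv_text, column="codes"):
    """Return the sorted, de-duplicated CVE IDs found in the given column."""
    cves = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        cves.update(CVE_PATTERN.findall(row.get(column) or ""))
    return sorted(cves)


exploitdb_cves = extract_cves(SAMPLE_CSV)
print(exploitdb_cves)
```

Rows that carry only OSVDB identifiers (row 2 in the sample) simply contribute nothing, which is how the entries without CVEs drop out.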
Other Vulnerability Data Sources
See other vulnerability data sources that are not currently used here.