Introduction
In today’s threat landscape, organizations face a deluge of Common Vulnerabilities and Exposures (CVEs) published daily. For executive risk and operations teams, the prevailing “patch everything” mentality is no longer viable: it depletes critical IT resources, induces alert fatigue, and fails to meaningfully reduce material business risk. Despite the volume of reported vulnerabilities, existing data indicates that only a small fraction are ever successfully exploited by threat actors in the wild.
This analysis bridges the gap between raw vulnerability data and actionable risk intelligence by applying machine learning, specifically a Random Forest classification model. By synthesizing thousands of data points from standard government and industry sources, the model predicts the likelihood that a vulnerability will be exploited. This predictive capability empowers security operations to shift from a reactive, compliance-driven posture to a proactive, risk-based prioritization strategy, ensuring that resources are allocated to the threats that pose the greatest danger to business continuity.
While thousands of Common Vulnerabilities and Exposures (CVEs) are published annually, only a fraction are actively exploited in the wild. This analysis uses machine learning (a Random Forest classifier) to predict which vulnerabilities pose a genuine threat, allowing security teams to shift from a “patch everything” mentality to a risk-based prioritization model.
library(ggplot2)
library(gt)
library(httr2)
library(janitor)
library(jsonlite)
library(kableExtra)
library(lubridate)
library(plotly)
library(skimr)
library(themis)
library(tidymodels)
library(tidyverse)
library(vip)
readLiveData <- FALSE # If TRUE, read EPSS, KEV & NVD data live via API; if FALSE, read pre-loaded data from the "data" folder
Data Acquisition
The foundation of any strong predictive risk model is the quality, timeliness, and diversity of its underlying data. In this phase, we programmatically ingest and aggregate threat intelligence from three premier, real-time cybersecurity sources: NIST’s National Vulnerability Database (NVD) for core vulnerability characteristics, CISA’s Known Exploited Vulnerabilities (KEV) catalog for confirmed exploitation data, and the Exploit Prediction Scoring System (EPSS) for probabilistic threat assessments. Joining these sources into a single dataset provides a comprehensive, multi-dimensional view of the threat landscape, allowing the model to detect complex patterns that a human analyst might miss.
1. Download CISA Known Exploited Vulnerabilities (KEV) Catalog data
The CISA Known Exploited Vulnerabilities (KEV) Catalog is a dynamic list maintained by the U.S. Cybersecurity and Infrastructure Security Agency. It aggregates CVEs confirmed to be actively exploited in the wild, shifting focus from theoretical risk to real-world threats. This authoritative resource helps organizations prioritize patching effectively. While mandatory for U.S. federal agencies under Binding Operational Directive 22-01, the catalog is an essential tool for any organization seeking to reduce its attack surface against active adversaries. For more information, see the CISA Known Exploited Vulnerabilities Catalog.
if (readLiveData == TRUE) {
kev_url <- "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
kev_raw <- fromJSON(kev_url)
kev_data <- kev_raw$vulnerabilities |>
clean_names() |>
select(cve_id, date_added, due_date, known_ransomware_campaign_use) |>
mutate(is_exploited = TRUE)
} else {kev_data <- read_csv("data/kev_data.csv")
}
tbl_data <- kev_data |>
slice(1:6) |>
rename("CVE ID" = cve_id,
"Date Added" = date_added,
"Due Date" = due_date,
"Known Ransomware Campaign Use" = known_ransomware_campaign_use,
"Is Exploited" = is_exploited)
kable(tbl_data,
caption = "CISA Known Exploited Vulnerabilities (KEV) Catalog: First Six Rows",
format = "html") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 7, full_width = FALSE)
CISA Known Exploited Vulnerabilities (KEV) Catalog: First Six Rows
| CVE ID | Date Added | Due Date | Known Ransomware Campaign Use | Is Exploited |
|---|---|---|---|---|
| CVE-2018-14634 | 2026-01-26 | 2026-02-16 | Unknown | TRUE |
| CVE-2025-52691 | 2026-01-26 | 2026-02-16 | Unknown | TRUE |
| CVE-2026-23760 | 2026-01-26 | 2026-02-16 | Unknown | TRUE |
| CVE-2026-24061 | 2026-01-26 | 2026-02-16 | Unknown | TRUE |
| CVE-2026-21509 | 2026-01-26 | 2026-02-16 | Unknown | TRUE |
| CVE-2024-37079 | 2026-01-23 | 2026-02-13 | Unknown | TRUE |
2. Download NIST National Vulnerability Database (NVD)
The National Vulnerability Database (NVD) is the U.S. government’s central repository for standards-based vulnerability management data, maintained by the National Institute of Standards and Technology (NIST). It enriches the MITRE CVE list with detailed analysis, including CVSS severity scores and affected product configurations, enabling automation in vulnerability management. While the CISA KEV catalog identifies only those threats with confirmed active exploitation, the NVD functions as an exhaustive encyclopedia containing every reported software vulnerability, regardless of its immediate real-world threat status. For more information on the NIST National Vulnerabilities Database: NIST National Vulnerability Database
### 1.2 FETCH and WRANGLE NVD VULNERABILITY DATA (PAST 360 DAYS via PAGINATION)
if(readLiveData == TRUE) {
date_intervals <- tibble(
start = c(Sys.Date() - 360, Sys.Date() - 270, Sys.Date() - 180, Sys.Date() - 90),
end = c(Sys.Date() - 271, Sys.Date() - 181, Sys.Date() - 91, Sys.Date())
)
fetch_nvd_chunk <- function(start_date, end_date) {
nist_start <- paste0(start_date, "T00:00:00.000")
nist_end <- paste0(end_date, "T23:59:59.000")
req <- request("https://services.nvd.nist.gov/rest/json/cves/2.0") |>
req_url_query(pubStartDate = nist_start, pubEndDate = nist_end) |>
req_headers(apiKey = Sys.getenv("NIST_API_KEY")) |>
req_retry(max_tries = 3) |>
req_throttle(rate = 50 / 30)
resp <- req_perform(req)
resp_body_json(resp)
}
nvd_raw_list <- map2(date_intervals$start, date_intervals$end, fetch_nvd_chunk)
extract_nvd_features <- function(item) {
# Some CVEs in NVD don't have CVSS v3.1 metrics yet; skip them safely
if (is.null(item$cve$metrics$cvssMetricV31)) return(NULL)
metrics <- item$cve$metrics$cvssMetricV31[[1]]$cvssData
tibble(
cve_id = item$cve$id,
published_date = as.Date(item$cve$published),
base_score = metrics$baseScore,
base_severity = metrics$baseSeverity,
attack_vector = metrics$attackVector,
attack_complexity = metrics$attackComplexity,
privileges_required = metrics$privilegesRequired,
user_interaction = metrics$userInteraction
)
}
# Apply the function to the list of chunks
nvd_flat <- map_df(nvd_raw_list, ~ map_df(.x$vulnerabilities, extract_nvd_features))
} else {nvd_flat <- read_csv("data/nvd_flat.csv")
}
tbl_data <- nvd_flat |>
slice(1:6) |>
rename("CVE ID" = cve_id,
"Published Date" = published_date,
"Base Score" = base_score,
"Base Severity" = base_severity,
"Attack Vector" = attack_vector,
"Attack Complexity" = attack_complexity,
"Privileges Required" = privileges_required,
"User Interaction" = user_interaction)
kable(tbl_data,
caption = "NIST National Vulnerability Database (NVD): First Six Rows",
format = "html") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
NIST National Vulnerability Database (NVD): First Six Rows
| CVE ID | Published Date | Base Score | Base Severity | Attack Vector | Attack Complexity | Privileges Required | User Interaction |
|---|---|---|---|---|---|---|---|
| CVE-2024-11780 | 2025-02-01 | 6.4 | MEDIUM | NETWORK | LOW | LOW | NONE |
| CVE-2024-12171 | 2025-02-01 | 8.8 | HIGH | NETWORK | LOW | LOW | NONE |
| CVE-2024-12184 | 2025-02-01 | 5.3 | MEDIUM | NETWORK | LOW | NONE | NONE |
| CVE-2024-12620 | 2025-02-01 | 5.3 | MEDIUM | NETWORK | LOW | NONE | NONE |
| CVE-2024-13343 | 2025-02-01 | 8.8 | HIGH | NETWORK | LOW | LOW | NONE |
| CVE-2024-13547 | 2025-02-01 | 6.4 | MEDIUM | NETWORK | LOW | LOW | NONE |
3. Download Exploit Prediction Scoring System (EPSS) data
The Exploit Prediction Scoring System (EPSS) is a data-driven effort to estimate the likelihood (probability) that a software vulnerability will be exploited in the wild. While other industry standards are useful for capturing the innate characteristics of a vulnerability and provide measures of severity, they are limited in their ability to assess threat. EPSS fills that gap by combining current threat information about each CVE with real-world exploit data. The EPSS model produces a probability score between 0 and 1 (0% to 100%); the higher the score, the greater the probability that a vulnerability will be exploited. For more information, see the EPSS Scoring System.
if (readLiveData == TRUE) {
epss_url <- paste0("https://epss.empiricalsecurity.com/epss_scores-", Sys.Date(), ".csv.gz")
epss_data <- read_csv(epss_url, comment = "#", show_col_types = FALSE) |>
clean_names() |>
rename(epss_score = epss)
} else {epss_data <- read_csv("data/epss_data.csv")
}
tbl_data <- epss_data |>
slice(1:6) |>
rename("CVE ID" = cve,
"EPSS Score" = epss_score,
"Percentile" = percentile)
kable(tbl_data,
caption = "EPSS Scores: First Six Rows",
format = "html") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 7, full_width = FALSE)
EPSS Scores: First Six Rows
| CVE ID | EPSS Score | Percentile |
|---|---|---|
| CVE-1999-0001 | 0.01151 | 0.78081 |
| CVE-1999-0002 | 0.09123 | 0.92445 |
| CVE-1999-0003 | 0.89352 | 0.99527 |
| CVE-1999-0004 | 0.03037 | 0.86289 |
| CVE-1999-0005 | 0.13652 | 0.94053 |
| CVE-1999-0006 | 0.08244 | 0.91994 |
4. Create the Comprehensive Dataset from the Three Disparate Datasets
ml_dataset <- nvd_flat |>
left_join(epss_data, by = c("cve_id" = "cve")) |>
left_join(kev_data, by = "cve_id") |>
mutate(
is_exploited = replace_na(is_exploited, FALSE),
days_since_pub = as.numeric(Sys.Date() - published_date)
)
tbl_data <- ml_dataset |>
slice(1:6) |>
rename ("CVE ID" = cve_id,
"Published Date" = published_date,
"Base Score" = base_score,
"Base Severity" = base_severity,
"Attack Vector" = attack_vector,
"Attack Complexity" = attack_complexity,
"Privileges Required" = privileges_required,
"User Interaction" = user_interaction,
"EPSS Score" = epss_score,
"Percentile" = percentile,
"Date Added" = date_added,
"Due Date" = due_date,
"Known Ransomware Campaign Use" = known_ransomware_campaign_use,
"Is Exploited" = is_exploited,
"Days Since Published" = days_since_pub)
kable(tbl_data,
caption = "Comprehensive Dataset: First Six Rows",
format = "html") |>
kable_styling(bootstrap_options = c("striped", "hover", "responsive"), full_width = TRUE)
Comprehensive Dataset: First Six Rows
| CVE ID | Published Date | Base Score | Base Severity | Attack Vector | Attack Complexity | Privileges Required | User Interaction | EPSS Score | Percentile | Date Added | Due Date | Known Ransomware Campaign Use | Is Exploited | Days Since Published |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CVE-2024-11780 | 2025-02-01 | 6.4 | MEDIUM | NETWORK | LOW | LOW | NONE | 0.00077 | 0.23040 | NA | NA | NA | FALSE | 453 |
| CVE-2024-12171 | 2025-02-01 | 8.8 | HIGH | NETWORK | LOW | LOW | NONE | 0.00208 | 0.43092 | NA | NA | NA | FALSE | 453 |
| CVE-2024-12184 | 2025-02-01 | 5.3 | MEDIUM | NETWORK | LOW | NONE | NONE | 0.00328 | 0.55221 | NA | NA | NA | FALSE | 453 |
| CVE-2024-12620 | 2025-02-01 | 5.3 | MEDIUM | NETWORK | LOW | NONE | NONE | 0.00379 | 0.58824 | NA | NA | NA | FALSE | 453 |
| CVE-2024-13343 | 2025-02-01 | 8.8 | HIGH | NETWORK | LOW | LOW | NONE | 0.00176 | 0.39195 | NA | NA | NA | FALSE | 453 |
| CVE-2024-13547 | 2025-02-01 | 6.4 | MEDIUM | NETWORK | LOW | LOW | NONE | 0.00077 | 0.23040 | NA | NA | NA | FALSE | 453 |
Feature Engineering & Exploratory Data Analysis
Raw threat data is rarely ready for advanced modeling. In Phase 2, we perform Feature Engineering and Exploratory Data Analysis (EDA) to transform disparate metrics into high-quality predictive signals. We clean inconsistencies and impute missing values to ensure integrity. Most importantly, we visualize the critical “class imbalance”—the reality that while thousands of vulnerabilities exist, only a tiny fraction are actually exploited or weaponized. Understanding this disparity is vital for tuning the model to detect rare, high-impact threats without generating excessive false alarms.
# 1. FEATURE ENGINEERING & DATA TYPING
cve_features <- ml_dataset |>
# Filter out any malformed data (e.g., CVEs without base scores)
filter(!is.na(base_score)) |>
mutate(
# Ensure dates are properly formatted
published_date = as.Date(published_date),
# Feature 1: Time Decay (older CVEs may be less likely to be newly exploited)
days_since_pub = as.numeric(Sys.Date() - published_date),
# Feature 2: Is the EPSS score missing? (If so, impute with 0 or mean)
epss_score = replace_na(epss_score, 0),
# Convert character strings into categorical Factors for Machine Learning
base_severity = factor(base_severity, levels = c("LOW", "MEDIUM", "HIGH", "CRITICAL")),
attack_vector = as.factor(attack_vector),
attack_complexity = as.factor(attack_complexity),
privileges_required = as.factor(privileges_required),
user_interaction = as.factor(user_interaction),
# Ensure Target Variable is a factor for classification
is_exploited = as.factor(is_exploited)
) |>
# Drop redundant or non-predictive columns for the model
select(-cve_id, -date_added, -due_date, -known_ransomware_campaign_use)
The table below functions as a comprehensive “health check” for our dataset before advanced modeling begins. It provides a transparent inventory of every variable, categorizing them by type (e.g., numeric scores, logical indicators, or categories) and calculating key statistics like averages and distributions.
Crucially, this summary highlights missing values (n_missing) and the completion rate, allowing us to verify that our prior data cleaning processes successfully resolved any gaps or errors. For stakeholders, this step validates the integrity of the raw materials used in our analysis. Just as a financial audit ensures accurate accounting, this summary confirms our risk model is built upon a foundation of complete, high-quality intelligence.
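As a quick sanity check on those missing-value counts, the same n_missing figures can be reproduced directly with dplyr. The sketch below uses a small hypothetical data frame in place of the full cve_features table (names and values are illustrative only):

```r
library(dplyr)

# Hypothetical stand-in for cve_features (illustration only)
toy <- tibble(
  base_score = c(7.5, NA, 9.8),
  epss_score = c(0.10, 0.20, NA)
)

# Count NAs per column -- mirrors skimr's n_missing statistic
toy |> summarise(across(everything(), ~ sum(is.na(.x))))
```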
# 2. EXPLORATORY DATA ANALYSIS (EDA) & CLASS IMBALANCE CHECK
# In Quarto, 'skimr' creates a readable summary table of the data.
skim(cve_features)
Data summary

| Name | cve_features |
|---|---|
| Number of rows | 7150 |
| Number of columns | 11 |
| Column type frequency: | |
| Date | 1 |
| factor | 6 |
| numeric | 4 |
| Group variables | None |

Variable type: Date

| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| published_date | 0 | 1 | 2025-02-01 | 2025-11-12 | 2025-05-15 | 63 |

Variable type: factor

| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| base_severity | 1 | 1 | FALSE | 4 | MED: 3493, HIG: 2666, CRI: 628, LOW: 362 |
| attack_vector | 0 | 1 | FALSE | 4 | NET: 5227, LOC: 1641, ADJ: 208, PHY: 74 |
| attack_complexity | 0 | 1 | FALSE | 2 | LOW: 6318, HIG: 832 |
| privileges_required | 0 | 1 | FALSE | 3 | NON: 3602, LOW: 2760, HIG: 788 |
| user_interaction | 0 | 1 | FALSE | 2 | NON: 4930, REQ: 2220 |
| is_exploited | 0 | 1 | FALSE | 2 | FAL: 7113, TRU: 37 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| base_score | 0 | 1 | 6.66 | 1.69 | 0 | 5.40 | 6.50 | 7.80 | 10.00 | ▁▁▅▇▃ |
| epss_score | 0 | 1 | 0.01 | 0.06 | 0 | 0.00 | 0.00 | 0.00 | 0.94 | ▇▁▁▁▁ |
| percentile | 0 | 1 | 0.24 | 0.22 | 0 | 0.08 | 0.17 | 0.31 | 1.00 | ▇▃▁▁▁ |
| days_since_pub | 0 | 1 | 311.85 | 100.03 | 169 | 258.00 | 350.00 | 436.00 | 453.00 | ▇▇▁▇▇ |
The bar chart below visually demonstrates the core challenge in vulnerability management: while thousands of software flaws exist, only a tiny fraction are ever weaponized by attackers. The massive disparity between the tall “Non-Exploited” bar and the small “Exploited” sliver shows that a “patch everything” strategy is inefficient. This data validates our need for a targeted AI model to pinpoint the few critical threats hiding within the noise. Note: the y-axis is log-scaled so that the viewer can better judge the counts.
# 3. VISUALIZING THE RISK IMBALANCE
# It is critical to understand that most vulnerabilities are NOT exploited.
# This visualization proves the need for our ML model.
eda_plot <- ggplot(cve_features, aes(x = is_exploited, fill = is_exploited)) +
geom_bar(alpha = 0.8) +
scale_fill_manual(values = c("#2c3e50", "#e74c3c")) + # Professional color palette
geom_text(
stat = "count",
aes(label = scales::comma(after_stat(count))),
vjust = 2.0,
size = 3.5,
color = "white", # Change font color to white
) +
scale_y_continuous(trans='log10', labels = scales::comma) +
labs(
title = "Class Imbalance: Exploited vs. Non-Exploited Vulnerabilities",
subtitle = "The vast majority (99.5%) of NVD vulnerabilities are never exploited in the wild.",
x = "Is Exploited (per CISA KEV)",
y = "Count of CVEs (log scale)",
fill = "Exploited?"
) +
theme_minimal() +
theme(plot.title = element_text(face = "bold"))
eda_plot
Machine Learning Modeling
In Phase 3, we transition from preparation to predictive modeling. One of the challenges in cybersecurity risk analysis is the extreme rarity of actual exploitation. To overcome this, we implement the Synthetic Minority Oversampling Technique (SMOTE), which balances the training data so the algorithm is forced to learn the nuanced characteristics of true threats. We then train a Random Forest classifier, a robust ensemble learning method capable of detecting complex, non-linear patterns across our features. This training process ensures the model is not just memorizing data but learning to generalize risk.
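To make the SMOTE step concrete before it appears inside the full recipe, the sketch below applies themis::step_smote to a small, deliberately imbalanced toy dataset (the column names and sizes are illustrative, not from the real pipeline) and shows that the minority class is synthetically oversampled until the two classes match:

```r
library(recipes)
library(themis)

set.seed(42)
# Hypothetical imbalanced toy data: 90 negatives, 10 positives
toy <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  class = factor(c(rep("FALSE", 90), rep("TRUE", 10)))
)

balanced <- recipe(class ~ ., data = toy) |>
  step_smote(class) |>   # synthesize minority rows until classes match
  prep() |>
  bake(new_data = NULL)

table(balanced$class)    # both classes now have 90 rows
```

The default over_ratio of 1 upsamples the minority class to the size of the majority class, which is exactly what the step_smote(is_exploited) line in the recipe below does for the training folds.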
## 1. Data Splitting - Train / Test
set.seed(42) # Set a seed for reproducibility
# Split the 'cve_features' dataframe from Phase 2 (80% train, 20% test)
# Stratify by our target variable to maintain the same ratio of exploited CVEs
cve_split <- initial_split(cve_features, prop = 0.80, strata = is_exploited)
cve_train <- training(cve_split)
cve_test <- testing(cve_split)
# Create 10-fold cross-validation folds for model evaluation
cve_folds <- vfold_cv(cve_train, v = 10, strata = is_exploited)
print(paste("Training set:", nrow(cve_train), "CVEs. Testing set:", nrow(cve_test), "CVEs."))
#> [1] "Training set: 5720 CVEs. Testing set: 1430 CVEs."
# 2. MODEL SPECIFICATION (Random Forest)
rf_spec <- rand_forest(trees = 500) |>
set_engine("ranger", importance = "impurity") |>
set_mode("classification")
# 3. RECIPE DEFINITION (With Date Removal)
cve_recipe <- recipe(is_exploited ~ ., data = cve_train) |>
# Remove the Date column since we already have 'days_since_pub'
step_rm(published_date) |>
# Treat NAs in base_severity as a new category called "unknown"
step_unknown(base_severity) |>
# Handle any new factor levels that might appear in future data
step_novel(all_nominal_predictors()) |>
# One-Hot Encoding: Convert categorical factors into dummy variables
step_dummy(all_nominal_predictors()) |>
# Remove any variables that have zero variance (no predictive value)
step_zv(all_predictors()) |>
# Normalize numeric features (epss_score, base_score, days_since_pub)
step_normalize(all_numeric_predictors()) |>
# Address class imbalance by oversampling the exploited class
step_smote(is_exploited)
# 4. WORKFLOW & TRAINING
# Combine model and recipe into a single workflow
cve_workflow <- workflow() |>
add_model(rf_spec) |>
add_recipe(cve_recipe)
# Train and evaluate the model using K-fold cross-validation
rf_resamples <- fit_resamples(
cve_workflow,
resamples = cve_folds,
control = control_resamples(save_pred = TRUE)
)
The table below serves as the model’s internal “report card” generated during the training phase. Rather than relying on a single test, these results represent the average performance across ten separate simulations to ensure reliability. Key metrics include Accuracy, which measures the percentage of correct predictions, and ROC_AUC, which scores the model’s ability to clearly distinguish between harmless and dangerous vulnerabilities. High values here confirm the model is consistent, robust, and ready for real-world application.
# Show the performance metrics
collect_metrics(rf_resamples)
| Metric | Estimator | Mean | n | Std Error | Configuration |
|---|---|---|---|---|---|
| accuracy | binary | 0.9949301 | 10 | 0.0008819 | pre0_mod0_post0 |
| brier_class | binary | 0.0046705 | 10 | 0.0005472 | pre0_mod0_post0 |
| roc_auc | binary | 0.9909070 | 10 | 0.0027999 | pre0_mod0_post0 |
Evaluation & Business Insights
In this final phase, we transition from theoretical training to real-world validation. We test the model against a “holdout” dataset—vulnerabilities the system has never seen before—to prove its reliability in a live environment. Beyond abstract accuracy scores, we visualize the critical trade-off between “false alarms” (which waste resources) and “missed threats” (which introduce risk). This evaluation confirms the model is not only statistically robust but also operationally transparent and ready for deployment.
The table below represents the model’s “final exam” results, tested against a holdout dataset of vulnerabilities it had never encountered during training. The two critical metrics here are Accuracy and ROC_AUC. Accuracy measures the raw percentage of correct predictions, while ROC_AUC serves as a reliability score, indicating how well the model separates legitimate threats from false alarms. The high percentages for both confirm that the model hasn’t just memorized historical data but has learned to accurately forecast risk in a live, dynamic environment.
# 1. FINAL FIT (Evaluate on the Test Set)
# print("Fitting final model and predicting on the holdout test set...")
# 'last_fit' fits on the training data and evaluates on the test data defined in 'cve_split'
final_fit <- last_fit(cve_workflow, split = cve_split)
# View standard performance metrics (Accuracy, ROC_AUC) on the test data
final_metrics <- collect_metrics(final_fit)
tbl_data <- final_metrics |>
rename(Metric = .metric,
Estimator = .estimator,
estimate = .estimate,
Configuration = .config)
tbl_data$Estimate = paste0(round(tbl_data$estimate, 4) * 100, "%")
tbl_data <- tbl_data |>
select(Metric, Estimator, Estimate, Configuration)
kable(tbl_data,
caption = "Final Metrics",
format = "html") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), font_size = 7, full_width = FALSE)
Final Metrics
| Metric | Estimator | Estimate | Configuration |
|---|---|---|---|
| accuracy | binary | 99.44% | pre0_mod0_post0 |
| roc_auc | binary | 99.32% | pre0_mod0_post0 |
| brier_class | binary | 0.44% | pre0_mod0_post0 |
The heatmap below visualizes the operational reality of deploying the model. It compares our predictions against actual outcomes to reveal the cost of errors. The critical area for risk management is the False Negatives (missed threats), which represent exploited vulnerabilities that slipped through the cracks. Conversely, False Positives represent “false alarms” that waste remediation resources. This view helps leadership decide if the model is calibrated correctly to balance safety against efficiency.
# 2. THE CONFUSION MATRIX
# Extract the predictions
test_predictions <- collect_predictions(final_fit)
# Generate and plot the Confusion Matrix
conf_matrix_plot <- test_predictions |>
conf_mat(truth = is_exploited, estimate = .pred_class) |>
autoplot(type = "heatmap") +
labs(
title = "Confusion Matrix: Test Set Results",
subtitle = "Assessing False Positives (Wasted Effort) vs. False Negatives (Missed Threats)"
) +
theme_minimal() +
theme(plot.title = element_text(face = "bold"))
# Render Confusion Matrix
print(conf_matrix_plot)
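The same predictions behind the heatmap can be reduced to the two numbers this trade-off turns on: sensitivity (the share of true threats caught) and specificity (the share of harmless CVEs correctly cleared). Below is a minimal sketch using a hypothetical toy prediction frame in place of test_predictions, just to illustrate the yardstick calls:

```r
library(dplyr)
library(yardstick)

# Hypothetical predictions (illustration only; the analysis uses test_predictions)
toy_preds <- tibble(
  is_exploited = factor(c("TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE"),
                        levels = c("FALSE", "TRUE")),
  .pred_class  = factor(c("TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE"),
                        levels = c("FALSE", "TRUE"))
)

toy_preds |>
  conf_mat(truth = is_exploited, estimate = .pred_class) |>
  summary(event_level = "second") |>   # treat TRUE (exploited) as the event
  filter(.metric %in% c("sens", "spec"))
# sens = 0.50 (one missed threat), spec = 0.75 (one false alarm)
```

Note event_level = "second": with factor levels ordered FALSE, TRUE, yardstick treats the first level as the event by default, so the exploited class must be flagged explicitly.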
The Receiver Operating Characteristic (ROC) curve illustrates the model’s overall predictive power across different thresholds. It visualizes the trade-off between “catching true threats” (Sensitivity) and “avoiding false alarms” (Specificity). A curve that hugs the top-left corner indicates a superior model that successfully separates dangerous vulnerabilities from harmless ones. The Area Under the Curve (AUC) serves as a single quality score—the closer to 1.0, the more reliable our strategic risk predictions are.
# 3. RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
# Visualizes the trade-off between sensitivity and specificity
roc_plot <- test_predictions |>
roc_curve(is_exploited, .pred_TRUE, event_level = "second") |> # TRUE (second factor level) is the event of interest
autoplot() +
labs(
title = "ROC Curve: Predictive Performance",
subtitle = "A curve closer to the top-left indicates superior classification capability."
) +
theme_minimal()
# Render ROC Curve
print(roc_plot)
The chart below offers a transparent look inside the model’s decision-making process. It ranks the top specific features—such as EPSS scores, base severity ratings, or attack vectors—that the algorithm found most valuable when predicting exploitation. What’s most interesting is that the EPSS score and Percentile are, by far, the leading drivers of exploitation risk, whereas the actual risk score / severity grades (Critical, High, Medium & Low) don’t seem to have much effect on predicting exploitation risk within the current model.
For leadership, this visualization is critical because it moves beyond a simple “risk score” to explain why a vulnerability is flagged. By identifying these primary risk drivers, security teams can understand the root causes of threats and tailor their defense strategies to focus on the specific characteristics that matter most in the wild.
# 4. VARIABLE IMPORTANCE (VIP)
#| label: phase-4-vip
# Extract the fitted model from the workflow
final_tree <- extract_fit_parsnip(final_fit)
# 5. Extract the raw importance scores into a dataframe
importance_scores <- vi(final_tree) |>
slice_max(Importance, n = 10) |>
mutate(Variable01 = c("EPSS Score",
"Percentile",
"Severity: MEDIUM",
"Base Score",
"User Interaction: Required",
"Severity: CRITICAL",
"Days Since Published",
"Privileges Required: NONE",
"Severity: HIGH",
"Severity: LOW")) |>
select(Variable01, Importance) |>
rename(Variable = Variable01)
# 6. Plot top 10 features using native ggplot
ggplot(importance_scores |> slice_max(Importance, n = 10),
aes(x = Importance, y = reorder(Variable, Importance))) +
# Create the bars with the professional dark blue color
geom_col(fill = "#2c3e50", alpha = 0.9) +
# Add white text labels inside the bars
geom_text(
aes(label = round(Importance, 2)),
hjust = -0.25, # Place labels just past the right edge of each bar
color = "black", # Black label text
size = 3
) +
# Formatting and Labels
labs(
title = "Drivers of Exploitation Risk",
subtitle = "Top 10 features utilized by the model to predict exploitation",
y = "Vulnerability Feature",
x = "Importance (Impurity Reduction)"
) +
xlim(0, 2500) +
theme_minimal() +
theme(panel.grid.major.y = element_blank() # Remove distracting horizontal grid lines
)
Insights & Conclusion
The analysis presented here helps move vulnerability management from a volume-based compliance task to a precision-based risk operation. By integrating historical exploitation data with predictive modeling, we have demonstrated that it is possible to identify the small percentage of vulnerabilities that pose a genuine threat to the organization. This approach does not merely add another tool to the security stack; it helps change how resources are allocated. Instead of diluting engineering efforts across thousands of theoretical risks, teams can focus their remediation cycles on the verified threats identified by this model.
While traditional scoring methods often overestimate risk, this model utilizes real-world signals—such as active exploitation and attack complexity—to refine those assessments. This distinction is critical for leadership. It means that “high severity” no longer automatically equals “high priority.” By adopting this data-driven framework, the organization can reduce alert fatigue and operational friction. Ultimately, this ensures that limited security budgets and engineering hours are better directed toward the vulnerabilities that have the most potential to disrupt business operations, providing a clearer return on investment for the cybersecurity program.