Predicting Vulnerability Exploitation: A Machine Learning Approach

Cybersecurity Risk Analysis using EPSS, CISA KEV & NIST NVD Data

Author

Patrick Lefler

Published

February 3, 2026

Abstract
Security teams face a paradox: thousands of vulnerabilities are disclosed annually, yet only a fraction are ever exploited in the wild. Treating every CVE as an equal priority is not a risk strategy — it is the absence of one. This analysis constructs a Random Forest classification model to predict which vulnerabilities will be actively weaponized, drawing on three real-time data sources: NIST’s National Vulnerability Database, CISA’s Known Exploited Vulnerabilities catalog, and the Exploit Prediction Scoring System. The dataset spans 7,150 CVEs across an extreme class imbalance — 37 confirmed exploits against 7,113 non-exploited entries — addressed through SMOTE resampling during training. The final model achieves 99.44% accuracy and a 99.32% ROC-AUC on held-out data. A critical finding: EPSS score and percentile rank predict exploitation far more reliably than CVSS severity grades. The practical implication is direct — organizations that prioritize by severity score are solving the wrong problem.

Introduction

Organizations face a deluge of Common Vulnerabilities and Exposures (CVEs) published daily. For executive risk and operations teams, the prevailing “patch everything” mentality is no longer viable: it depletes critical IT resources, induces alert fatigue, and fails to meaningfully reduce material business risk. Despite the volume of reported vulnerabilities, existing data indicates that only a small fraction are ever successfully exploited by threat actors in the wild.

This analysis bridges the gap between raw theoretical data and actionable risk intelligence by utilizing innovative machine learning techniques, specifically a Random Forest classification model. By synthesizing thousands of data points from standard government and industry sources, this model accurately predicts the likelihood of a vulnerability being exploited. This prognostic capability empowers security operations to shift from a reactive, compliance-driven posture to a proactive, risk-based prioritization strategy. Ultimately, this ensures that resources are allocated specifically to the threats that pose the most danger to business continuity.

While thousands of Common Vulnerabilities and Exposures (CVEs) are published annually, only a fraction are actively exploited in the wild. This analysis applies machine learning (a Random Forest classifier) to predict which vulnerabilities pose a genuine threat, allowing security teams to shift from a “patch everything” mentality to a risk-based prioritization model.

Display code
library(ggplot2)
library(gt)
library(httr2)
library(janitor)
library(jsonlite)
library(kableExtra)
library(lubridate)
library(plotly)
library(skimr)
library(themis)
library(tidymodels)
library(tidyverse)
library(vip)

readLiveData <- FALSE # If TRUE, read EPSS, KEV & NVD data live via API; if FALSE, read pre-loaded data from the "data" folder

Data Acquisition

The foundation of any strong predictive risk model is the quality, timeliness, and diversity of its underlying data. In this phase, we programmatically ingest and aggregate threat intelligence from three real-time cybersecurity sources: NIST’s National Vulnerability Database (NVD) for core vulnerability characteristics, CISA’s Known Exploited Vulnerabilities (KEV) catalog for confirmed exploitation data, and the Exploit Prediction Scoring System (EPSS) for probabilistic threat assessments. Merging these sources into a single dataset provides a comprehensive, multi-dimensional view of the threat landscape, allowing the model to detect complex patterns that a human analyst might miss.

1. Download CISA Known Exploited Vulnerabilities (KEV) Catalog data
Note: CISA Known Exploited Vulnerabilities (KEV) Catalog

The CISA Known Exploited Vulnerabilities (KEV) Catalog is a dynamic list maintained by the U.S. Cybersecurity and Infrastructure Security Agency. It aggregates CVEs confirmed to be actively exploited in the wild, shifting focus from theoretical risk to real-world threats. This authoritative resource helps organizations prioritize patching effectively. While mandatory for U.S. federal agencies under Binding Operational Directive 22-01, the catalog is an essential tool for any organization seeking to reduce its attack surface against active adversaries. For more information, see the CISA Known Exploited Vulnerabilities Catalog.
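Conceptually, the KEV catalog acts as a ground-truth lookup: a CVE is labeled exploited if and only if it appears in the catalog. A minimal base-R sketch of that flagging step, using two IDs that appear in the table below plus one hypothetical placeholder ID:

```r
# Toy KEV-style lookup; this is an illustration, not the live feed
kev_ids <- c("CVE-2018-14634", "CVE-2024-37079")

# CVE-2025-99999 is a hypothetical placeholder for a CVE not in the catalog
my_cves <- c("CVE-2024-37079", "CVE-2025-99999")

# Membership test: TRUE means the CVE has confirmed in-the-wild exploitation
flags <- my_cves %in% kev_ids
data.frame(cve_id = my_cves, is_exploited = flags)
```

The full pipeline below does the same thing via a `left_join()` on `cve_id` followed by `replace_na()`.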

Display code
if (readLiveData == TRUE) {

kev_url <- "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"

kev_raw <- fromJSON(kev_url)
kev_data <- kev_raw$vulnerabilities |> 
  clean_names() |> 
  select(cve_id, date_added, due_date, known_ransomware_campaign_use) |> 
  mutate(is_exploited = TRUE) 
} else {kev_data <- read_csv("data/kev_data.csv")
}
  

tbl_data <- kev_data |>
  slice(1:6) |>
  rename("CVE ID" = cve_id,
        "Date Added" = date_added,
        "Due Date" = due_date,
        "Known Ransomware Campaign Use" = known_ransomware_campaign_use,
        "Is Exploited" = is_exploited)
         
kable(tbl_data, 
      caption = "CISA Known Exploited Vulnerabilities (KEV) Catalog: First Six Rows",
      format = "html") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                font_size = 7, full_width = FALSE)
CISA Known Exploited Vulnerabilities (KEV) Catalog: First Six Rows
CVE ID Date Added Due Date Known Ransomware Campaign Use Is Exploited
CVE-2018-14634 2026-01-26 2026-02-16 Unknown TRUE
CVE-2025-52691 2026-01-26 2026-02-16 Unknown TRUE
CVE-2026-23760 2026-01-26 2026-02-16 Unknown TRUE
CVE-2026-24061 2026-01-26 2026-02-16 Unknown TRUE
CVE-2026-21509 2026-01-26 2026-02-16 Unknown TRUE
CVE-2024-37079 2026-01-23 2026-02-13 Unknown TRUE


2. Download NIST National Vulnerability Database (NVD)
Note: NIST National Vulnerability Database (NVD)

The National Vulnerability Database (NVD) is the U.S. government’s central repository for standards-based vulnerability management data, maintained by the National Institute of Standards and Technology (NIST). It enriches the MITRE CVE list with detailed analysis, including CVSS severity scores and affected product configurations, enabling automation in vulnerability management. While the CISA KEV catalog identifies only those threats with confirmed active exploitation, the NVD functions as an exhaustive encyclopedia containing every reported software vulnerability, regardless of its immediate real-world threat status. For more information, see the NIST National Vulnerability Database.

Display code
### 1.2 FETCH and WRANGLE NVD VULNERABILITY DATA (~360 DAYS, CHUNKED TO RESPECT NVD'S 120-DAY DATE-RANGE LIMIT)

if(readLiveData == TRUE) {

date_intervals <- tibble(
  start = c(Sys.Date() - 360, Sys.Date() - 270, Sys.Date() - 180, Sys.Date() - 90),
  end   = c(Sys.Date() - 271, Sys.Date() - 181, Sys.Date() - 91,  Sys.Date())
)

fetch_nvd_chunk <- function(start_date, end_date) {
  nist_start <- paste0(start_date, "T00:00:00.000")
  nist_end <- paste0(end_date, "T23:59:59.000")
  
  req <- request("https://services.nvd.nist.gov/rest/json/cves/2.0") |> 
    req_url_query(pubStartDate = nist_start, pubEndDate = nist_end) |> 
    req_headers(apiKey = Sys.getenv("NIST_API_KEY")) |> 
    req_retry(max_tries = 3) |> 
    req_throttle(rate = 50 / 30) 
  
  resp <- req_perform(req)
  resp_body_json(resp)
}

nvd_raw_list <- map2(date_intervals$start, date_intervals$end, fetch_nvd_chunk)

extract_nvd_features <- function(item) {
  # Some CVEs in NVD don't have CVSS v3.1 metrics yet; skip those safely
  # (map_df()/bind_rows() silently drops NULL results).
  if (is.null(item$cve$metrics$cvssMetricV31)) return(NULL)
  metrics <- item$cve$metrics$cvssMetricV31[[1]]$cvssData
  
  tibble(
    cve_id = item$cve$id,
    published_date = as.Date(item$cve$published),
    base_score = metrics$baseScore,
    base_severity = metrics$baseSeverity,
    attack_vector = metrics$attackVector,
    attack_complexity = metrics$attackComplexity,
    privileges_required = metrics$privilegesRequired,
    user_interaction = metrics$userInteraction
  )
}

# Apply the function to the list of chunks
nvd_flat <- map_df(nvd_raw_list, ~ map_df(.x$vulnerabilities, extract_nvd_features))
} else {nvd_flat <- read_csv("data/nvd_flat.csv")
}


tbl_data <- nvd_flat |>
  slice(1:6) |>
  rename("CVE ID" = cve_id,
         "Published Date" = published_date,
         "Base Score" = base_score,
         "Base Severity" = base_severity,
         "Attack Vector" = attack_vector,
         "Attack Complexity" = attack_complexity,
         "Privileges Required" = privileges_required,
         "User Interaction" = user_interaction)

kable(tbl_data, 
      caption = "NIST National Vulnerability Database (NVD): First Six Rows",
      format = "html") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
NIST National Vulnerability Database (NVD): First Six Rows
CVE ID Published Date Base Score Base Severity Attack Vector Attack Complexity Privileges Required User Interaction
CVE-2024-11780 2025-02-01 6.4 MEDIUM NETWORK LOW LOW NONE
CVE-2024-12171 2025-02-01 8.8 HIGH NETWORK LOW LOW NONE
CVE-2024-12184 2025-02-01 5.3 MEDIUM NETWORK LOW NONE NONE
CVE-2024-12620 2025-02-01 5.3 MEDIUM NETWORK LOW NONE NONE
CVE-2024-13343 2025-02-01 8.8 HIGH NETWORK LOW LOW NONE
CVE-2024-13547 2025-02-01 6.4 MEDIUM NETWORK LOW LOW NONE


3. Download Exploit Prediction Scoring System (EPSS) data
Note: Exploit Prediction Scoring System (EPSS)

The Exploit Prediction Scoring System (EPSS) is a data-driven effort to estimate the likelihood (probability) that a software vulnerability will be exploited in the wild. While other industry standards usefully capture the innate characteristics of a vulnerability and provide measures of severity, they are limited in their ability to assess threat. EPSS fills that gap by combining current CVE information with real-world exploit data. The EPSS model produces a probability score between 0 and 1 (0% to 100%): the higher the score, the greater the probability that a vulnerability will be exploited. For more information, see the EPSS Scoring System documentation.
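The `percentile` column published alongside each score is simply the score's rank among all scored CVEs. FIRST computes it over the full CVE population; the base-R sketch below illustrates the idea on the six scores shown in the table that follows (so the toy percentiles differ from the published ones):

```r
# Empirical percentile rank: the fraction of scores at or below each score.
# Illustrative only; FIRST computes percentiles over every scored CVE.
epss <- c(0.01151, 0.09123, 0.89352, 0.03037, 0.13652, 0.08244)

pct_rank <- rank(epss, ties.method = "max") / length(epss)
round(pct_rank, 3)
#> [1] 0.167 0.667 1.000 0.333 0.833 0.500
```

Note how the highest score (0.89352) lands at the 100th percentile of this toy sample, mirroring its 0.99527 percentile in the full population.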

Display code
if (readLiveData == TRUE) {

epss_url <- paste0("https://epss.empiricalsecurity.com/epss_scores-", Sys.Date(), ".csv.gz")

epss_data <- read_csv(epss_url, comment = "#", show_col_types = FALSE) |> 
  clean_names() |> 
  rename(epss_score = epss)
} else {epss_data <- read_csv("data/epss_data.csv")
}

tbl_data <- epss_data |>
  slice(1:6) |>
  rename("CVE ID" = cve, 
         "EPSS Score" = epss_score, 
         "Percentile" = percentile) 

kable(tbl_data, 
      caption = "EPSS Scores: First Six Rows",
      format = "html") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                font_size = 7, full_width = FALSE)
EPSS Scores: First Six Rows
CVE ID EPSS Score Percentile
CVE-1999-0001 0.01151 0.78081
CVE-1999-0002 0.09123 0.92445
CVE-1999-0003 0.89352 0.99527
CVE-1999-0004 0.03037 0.86289
CVE-1999-0005 0.13652 0.94053
CVE-1999-0006 0.08244 0.91994


4. Create the Comprehensive Dataset from the Three Disparate Datasets
Display code
ml_dataset <- nvd_flat |> 
  left_join(epss_data, by = c("cve_id" = "cve")) |> 
  left_join(kev_data, by = "cve_id") |> 
  mutate(
    is_exploited = replace_na(is_exploited, FALSE),
    days_since_pub = as.numeric(Sys.Date() - published_date)
  )

tbl_data <- ml_dataset |>
  slice(1:6) |>
  rename("CVE ID" = cve_id,
    "Published Date" = published_date,
    "Base Score" = base_score,
    "Base Severity" = base_severity,
    "Attack Vector" = attack_vector,
    "Attack Complexity" = attack_complexity,
    "Privileges Required" = privileges_required,
    "User Interaction" = user_interaction,
    "EPSS Score" = epss_score,
    "Percentile" = percentile,
    "Date Added" = date_added,
    "Due Date" = due_date,
    "Known Ransomware Campaign Use" = known_ransomware_campaign_use,
    "Is Exploited" = is_exploited,
    "Days Since Published" = days_since_pub)
         
  kable(tbl_data, 
      caption = "Comprehensive Dataset: First Six Rows",
      format = "html") |>
      kable_styling(bootstrap_options = c("striped", "hover", "responsive", full_width = T))
Comprehensive Dataset: First Six Rows
CVE ID Published Date Base Score Base Severity Attack Vector Attack Complexity Privileges Required User Interaction EPSS Score Percentile Date Added Due Date Known Ransomware Campaign Use Is Exploited Days Since Published
CVE-2024-11780 2025-02-01 6.4 MEDIUM NETWORK LOW LOW NONE 0.00077 0.23040 NA NA NA FALSE 453
CVE-2024-12171 2025-02-01 8.8 HIGH NETWORK LOW LOW NONE 0.00208 0.43092 NA NA NA FALSE 453
CVE-2024-12184 2025-02-01 5.3 MEDIUM NETWORK LOW NONE NONE 0.00328 0.55221 NA NA NA FALSE 453
CVE-2024-12620 2025-02-01 5.3 MEDIUM NETWORK LOW NONE NONE 0.00379 0.58824 NA NA NA FALSE 453
CVE-2024-13343 2025-02-01 8.8 HIGH NETWORK LOW LOW NONE 0.00176 0.39195 NA NA NA FALSE 453
CVE-2024-13547 2025-02-01 6.4 MEDIUM NETWORK LOW LOW NONE 0.00077 0.23040 NA NA NA FALSE 453

Feature Engineering & Exploratory Data Analysis

Raw threat data is rarely ready for advanced modeling. In Phase 2, we perform Feature Engineering and Exploratory Data Analysis (EDA) to transform disparate metrics into high-quality predictive signals. We clean inconsistencies and impute missing values to ensure integrity. Most importantly, we visualize the critical “class imbalance”—the reality that while thousands of vulnerabilities exist, only a tiny fraction are actually exploited or weaponized. Understanding this disparity is vital for tuning the model to detect rare, high-impact threats without generating excessive false alarms.

Display code
# 1. FEATURE ENGINEERING & DATA TYPING

cve_features <- ml_dataset |> 
  # Filter out any malformed data (e.g., CVEs without base scores)
  filter(!is.na(base_score)) |> 
  mutate(
    # Ensure dates are properly formatted
    published_date = as.Date(published_date),
    
    # Feature 1: Time Decay (older CVEs may be less likely to be newly exploited)
    days_since_pub = as.numeric(Sys.Date() - published_date),
    
    # Feature 2: Is the EPSS score missing? (If so, impute with 0 or mean)
    epss_score = replace_na(epss_score, 0),
    
    # Convert character strings into categorical Factors for Machine Learning
    base_severity = factor(base_severity, levels = c("LOW", "MEDIUM", "HIGH", "CRITICAL")),
    attack_vector = as.factor(attack_vector),
    attack_complexity = as.factor(attack_complexity),
    privileges_required = as.factor(privileges_required),
    user_interaction = as.factor(user_interaction),
    
    # Ensure Target Variable is a factor for classification
    is_exploited = as.factor(is_exploited)
  ) |> 
  # Drop redundant or non-predictive columns for the model
  select(-cve_id, -date_added, -due_date, -known_ransomware_campaign_use)
Note: Data Summary Table

The table below functions as a comprehensive “health check” for our dataset before advanced modeling begins. It provides a transparent inventory of every variable, categorizing them by type (e.g., numeric scores, logical indicators, or categories) and calculating key statistics like averages and distributions.

Crucially, this summary reports missingness for each variable (the n_missing and complete_rate columns), allowing us to verify that our prior data cleaning successfully resolved any gaps or errors. For stakeholders, this step validates the integrity of the raw materials used in our analysis. Just as a financial audit ensures accurate accounting, this summary confirms our risk model is built upon a foundation of complete, high-quality intelligence.

Display code
# 2. EXPLORATORY DATA ANALYSIS (EDA) & CLASS IMBALANCE CHECK

# In Quarto, 'skimr' creates a readable summary table of the data.

skim(cve_features)
Data summary
Name cve_features
Number of rows 7150
Number of columns 11
_______________________
Column type frequency:
Date 1
factor 6
numeric 4
________________________
Group variables None

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
published_date 0 1 2025-02-01 2025-11-12 2025-05-15 63

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
base_severity 1 1 FALSE 4 MED: 3493, HIG: 2666, CRI: 628, LOW: 362
attack_vector 0 1 FALSE 4 NET: 5227, LOC: 1641, ADJ: 208, PHY: 74
attack_complexity 0 1 FALSE 2 LOW: 6318, HIG: 832
privileges_required 0 1 FALSE 3 NON: 3602, LOW: 2760, HIG: 788
user_interaction 0 1 FALSE 2 NON: 4930, REQ: 2220
is_exploited 0 1 FALSE 2 FAL: 7113, TRU: 37

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
base_score 0 1 6.66 1.69 0 5.40 6.50 7.80 10.00 ▁▁▅▇▃
epss_score 0 1 0.01 0.06 0 0.00 0.00 0.00 0.94 ▇▁▁▁▁
percentile 0 1 0.24 0.22 0 0.08 0.17 0.31 1.00 ▇▃▁▁▁
days_since_pub 0 1 311.85 100.03 169 258.00 350.00 436.00 453.00 ▇▇▁▇▇
Note: The Class Imbalance Plot

The bar chart below visually demonstrates the core challenge in vulnerability management: while thousands of software flaws exist, only a tiny fraction are ever weaponized by attackers. The massive disparity between the tall “Non-Exploited” bar and the small “Exploited” sliver proves that a “patch everything” strategy is inefficient. This data validates our need for a targeted machine learning model to pinpoint the few critical threats hiding within the noise. Note: the y-axis is log-scaled so that the viewer can read both counts despite the imbalance.

Display code
# 3. VISUALIZING THE RISK IMBALANCE

# It is critical to understand that most vulnerabilities are NOT exploited.
# This visualization proves the need for our ML model.

eda_plot <- ggplot(cve_features, aes(x = is_exploited, fill = is_exploited)) +
  geom_bar(alpha = 0.8) +
  scale_fill_manual(values = c("#2c3e50", "#e74c3c")) + # Professional color palette
  geom_text(
    stat = "count", 
    aes(label = scales::comma(after_stat(count))), 
    vjust = 2.0, 
    size = 3.5,
    color = "white" # White labels for contrast against the dark bars
  ) +
  scale_y_continuous(trans='log10', labels = scales::comma) +
  labs(
    title = "Class Imbalance: Exploited vs. Non-Exploited Vulnerabilities",
    subtitle = "The vast majority (99.5%) of NVD vulnerabilities are never exploited in the wild.",
    x = "Is Exploited (per CISA KEV)",
    y = "Count of CVEs (log scale)",
    fill = "Exploited?"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

eda_plot
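The raw counts behind the chart put the imbalance in perspective (37 and 7,113 come from the skim summary above):

```r
# Class counts from the current data snapshot
n_exploited     <- 37
n_not_exploited <- 7113

# Roughly 192 non-exploited CVEs for every exploited one
round(n_not_exploited / n_exploited, 1)
#> [1] 192.2

# The positive class is about half a percent of the data
round(n_exploited / (n_exploited + n_not_exploited) * 100, 2)
#> [1] 0.52
```

At a ratio this skewed, a model that always predicts "not exploited" would already score roughly 99.5% accuracy, which is exactly why the resampling step in the next phase matters.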

Machine Learning Modeling

In Phase 3, we transition from preparation to predictive modeling. One of the central challenges in cybersecurity risk analysis is the extreme rarity of actual exploitation. To overcome this, we implement the Synthetic Minority Oversampling Technique (SMOTE), which mathematically balances the training data, forcing the algorithm to learn the nuanced characteristics of true threats. We then train a Random Forest classifier, a robust ensemble learning method capable of detecting complex, non-linear patterns across our features. This training process ensures the model is not just memorizing data, but learning to generalize risk.
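In the recipe below, `step_smote()` from the themis package handles the resampling. Conceptually, each synthetic minority sample is an interpolation between a real exploited CVE and one of its nearest exploited neighbours in feature space. A minimal base-R sketch of that interpolation step, using invented feature values:

```r
set.seed(42)

# Two nearby minority-class (exploited) points in feature space; toy values
x        <- c(epss_score = 0.85, base_score = 9.8)
neighbor <- c(epss_score = 0.91, base_score = 8.8)

# SMOTE places a synthetic point at a random position along the segment
gap <- runif(1)
synthetic <- x + gap * (neighbor - x)

# The synthetic sample always lies between the two real minority points
all(synthetic >= pmin(x, neighbor) & synthetic <= pmax(x, neighbor))
#> [1] TRUE
```

Because synthetic points interpolate rather than duplicate, the forest sees varied minority examples instead of 192 copies of the same 37 rows.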

Display code
## 1. Data Splitting - Train / Test

set.seed(42)  # Set a seed for reproducibility

# Split the 'cve_features' dataframe from Phase 2 (80% train, 20% test)
# Stratify by our target variable to maintain the same ratio of exploited CVEs
cve_split <- initial_split(cve_features, prop = 0.80, strata = is_exploited)

cve_train <- training(cve_split)
cve_test  <- testing(cve_split)

# Create 10-fold cross-validation folds for model evaluation
cve_folds <- vfold_cv(cve_train, v = 10, strata = is_exploited)

print(paste("Training set:", nrow(cve_train), "CVEs. Testing set:", nrow(cve_test), "CVEs."))
#> [1] "Training set: 5720 CVEs. Testing set: 1430 CVEs."
Display code
# 2. MODEL SPECIFICATION (Random Forest)

rf_spec <- rand_forest(trees = 500) |> 
  set_engine("ranger", importance = "impurity") |> 
  set_mode("classification")

# 3. RECIPE DEFINITION (With Date Removal)

cve_recipe <- recipe(is_exploited ~ ., data = cve_train) |> 
  
  # Remove the Date column since we already have 'days_since_pub'
  step_rm(published_date) |>
  
  # Treat NAs in base_severity as a new category called "unknown"
  step_unknown(base_severity) |> 
  
  # Handle any new factor levels that might appear in future data
  step_novel(all_nominal_predictors()) |> 
  
  # One-Hot Encoding: Convert categorical factors into dummy variables
  step_dummy(all_nominal_predictors()) |> 
  
  # Remove any variables that have zero variance (no predictive value)
  step_zv(all_predictors()) |> 
  
  # Normalize numeric features (epss_score, base_score, days_since_pub)
  step_normalize(all_numeric_predictors()) |> 
  
  # Address class imbalance by oversampling the exploited class
  step_smote(is_exploited)


# 4.  WORKFLOW & TRAINING

# Combine model and recipe into a single workflow
cve_workflow <- workflow() |> 
  add_model(rf_spec) |> 
  add_recipe(cve_recipe)

# Train and evaluate the model using K-fold cross-validation
rf_resamples <- fit_resamples(
  cve_workflow,
  resamples = cve_folds,
  control = control_resamples(save_pred = TRUE)
)
Note: The Summary Metrics Table

The table below serves as the model’s internal “report card” generated during the training phase. Rather than relying on a single test, these results represent the average performance across ten cross-validation folds to ensure reliability. Key metrics include Accuracy, which measures the percentage of correct predictions, and ROC_AUC, which scores the model’s ability to clearly distinguish between harmless and dangerous vulnerabilities. High values here confirm the model is consistent, robust, and ready for real-world application.

Display code
# Show the performance metrics
collect_metrics(rf_resamples)
.metric .estimator mean n std_err .config
accuracy binary 0.9949301 10 0.0008819 pre0_mod0_post0
brier_class binary 0.0046705 10 0.0005472 pre0_mod0_post0
roc_auc binary 0.9909070 10 0.0027999 pre0_mod0_post0
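The `mean` and `std_err` columns reported by `collect_metrics()` are just summary statistics over the per-fold estimates. The calculation, shown on hypothetical fold accuracies (not the actual fold-level output):

```r
# Hypothetical per-fold accuracies, for illustration only
fold_acc <- c(0.995, 0.994, 0.996, 0.995, 0.994,
              0.996, 0.995, 0.994, 0.995, 0.996)

mean(fold_acc)                          # the 'mean' column
#> [1] 0.995

sd(fold_acc) / sqrt(length(fold_acc))   # the 'std_err' column
```

A small standard error across folds is the evidence that the model's performance is stable rather than an artifact of one lucky split.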

Evaluation & Business Insights

In this final phase, we transition from theoretical training to real-world validation. We test the model against a “holdout” dataset—vulnerabilities the system has never seen before—to prove its reliability in a live environment. Beyond abstract accuracy scores, we visualize the critical trade-off between “false alarms” (which waste resources) and “missed threats” (which introduce risk). This evaluation confirms the model is not only statistically robust but also operationally transparent and ready for deployment.

Note: The Final Metrics Table

The table below represents the model’s “final exam” results, tested against a holdout dataset of vulnerabilities it had never encountered during training. The two critical metrics here are Accuracy and ROC_AUC. Accuracy measures the raw percentage of correct predictions, while ROC_AUC serves as a reliability score, indicating how well the model separates legitimate threats from false alarms. The high values for both confirm that the model hasn’t just memorized historical data but has learned to accurately forecast risk in a live, dynamic environment.

Display code
# 1. FINAL FIT (Evaluate on the Test Set)

# print("Fitting final model and predicting on the holdout test set...")

# 'last_fit' fits on the training data and evaluates on the test data defined in 'cve_split'
final_fit <- last_fit(cve_workflow, split = cve_split)

# View standard performance metrics (Accuracy, ROC_AUC) on the test data
final_metrics <- collect_metrics(final_fit)

tbl_data <- final_metrics |>
  rename(Metric = .metric,
         Estimator = .estimator,
         estimate = .estimate,
         Configuration = .config) |>
  mutate(Estimate = paste0(round(estimate, 4) * 100, "%")) |>
  select(Metric, Estimator, Estimate, Configuration)
  

kable(tbl_data, 
      caption = "Final Metrics",
      format = "html") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                font_size = 7, full_width = FALSE)
Final Metrics
Metric Estimator Estimate Configuration
accuracy binary 99.44% pre0_mod0_post0
roc_auc binary 99.32% pre0_mod0_post0
brier_class binary 0.44% pre0_mod0_post0
Note: The Confusion Matrix Summary

The heatmap below visualizes the operational reality of deploying the model. It compares our predictions against actual outcomes to reveal the cost of errors. The critical area for risk management is the False Negatives (missed threats), which represent exploited vulnerabilities that slipped through the cracks. Conversely, False Positives represent “false alarms” that waste remediation resources. This view helps leadership decide if the model is calibrated correctly to balance safety against efficiency.

Display code
# 2.  THE CONFUSION MATRIX

# Extract the predictions
test_predictions <- collect_predictions(final_fit)

# Generate and plot the Confusion Matrix
conf_matrix_plot <- test_predictions |> 
  conf_mat(truth = is_exploited, estimate = .pred_class) |> 
  autoplot(type = "heatmap") +
  labs(
    title = "Confusion Matrix: Test Set Results",
    subtitle = "Assessing False Positives (Wasted Effort) vs. False Negatives (Missed Threats)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

# Render Confusion Matrix
print(conf_matrix_plot)
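From the four cells of a confusion matrix, the operational trade-off can be quantified directly. The counts below are hypothetical, chosen only to demonstrate the arithmetic on a 1,430-row test set; they are not the model's actual results:

```r
# Hypothetical confusion-matrix cells (NOT the model's actual test output)
tp <- 5     # exploited, correctly flagged
fn <- 2     # exploited, missed: the costly errors
fp <- 6     # not exploited, flagged: wasted remediation effort
tn <- 1417  # not exploited, correctly ignored

accuracy  <- (tp + tn) / (tp + tn + fp + fn)
recall    <- tp / (tp + fn)   # share of true threats caught (sensitivity)
precision <- tp / (tp + fp)   # share of alerts that were real threats

round(c(accuracy = accuracy, recall = recall, precision = precision), 3)
```

With these toy numbers, accuracy is 0.994 while recall is only 0.714, which is why the confusion matrix, not accuracy alone, is the right lens for an imbalanced problem like this one.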

Note: The Receiver Operating Characteristic (ROC) Curve

The Receiver Operating Characteristic (ROC) curve illustrates the model’s overall predictive power across different thresholds. It visualizes the trade-off between “catching true threats” (Sensitivity) and “avoiding false alarms” (Specificity). A curve that hugs the top-left corner indicates a superior model that successfully separates dangerous vulnerabilities from harmless ones. The Area Under the Curve (AUC) serves as a single quality score—the closer to 1.0, the more reliable our strategic risk predictions are.

Display code
# 3. RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE

# Visualizes the trade-off between sensitivity and specificity
roc_plot <- test_predictions |> 
  roc_curve(is_exploited, .pred_TRUE) |> 
  autoplot() +
  labs(
    title = "ROC Curve: Predictive Performance",
    subtitle = "A curve closer to the top-left indicates superior classification capability."
  ) +
  theme_minimal()

# Render ROC Curve
print(roc_plot)
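AUC also has a direct probabilistic reading: it is the chance that a randomly chosen exploited CVE receives a higher predicted probability than a randomly chosen non-exploited one. A base-R sketch using the Mann-Whitney rank formula on toy scores (the probabilities are invented):

```r
# Toy predicted probabilities, invented for illustration
pos <- c(0.90, 0.80, 0.35)         # scores for truly exploited CVEs
neg <- c(0.40, 0.10, 0.05, 0.02)   # scores for non-exploited CVEs

# AUC via the Mann-Whitney U statistic over the pooled score ranks
auc <- function(pos, neg) {
  r <- rank(c(pos, neg))
  n_pos <- length(pos); n_neg <- length(neg)
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc(pos, neg)
#> [1] 0.9166667
```

Here 11 of the 12 possible exploited/non-exploited score pairs are correctly ordered, giving 11/12. An AUC of 0.9932 means the model orders virtually every such pair correctly.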

Note: Understanding the Drivers of Exploitation Risk

The chart below offers a transparent look inside the model’s decision-making process. It ranks the specific features—such as EPSS scores, base severity ratings, or attack vectors—that the algorithm found most valuable when predicting exploitation. Most strikingly, the EPSS score and percentile are, by far, the leading drivers of exploitation risk, whereas the CVSS severity grades (Critical, High, Medium & Low) contribute comparatively little to predicting exploitation within the current model.

For leadership, this visualization is critical because it moves beyond a simple “risk score” to explain why a vulnerability is flagged. By identifying these primary risk drivers, security teams can understand the root causes of threats and tailor their defense strategies to focus on the specific characteristics that matter most in the wild.

Display code
# 4. VARIABLE IMPORTANCE (VIP)


# Extract the fitted model from the workflow
final_tree <- extract_fit_parsnip(final_fit)

# 5. Extract the raw importance scores into a dataframe
importance_scores <- vi(final_tree) |>
  slice_max(Importance, n = 10) |>
  # NOTE: these readable labels are hard-coded to match the current top-10
  # ordering; re-verify them whenever the model or data is refreshed.
  mutate(Variable01 = c("EPSS Score",
                        "Percentile",
                        "Severity: MEDIUM",
                        "Base Score",
                        "User Interaction: Required",
                        "Severity: CRITICAL",
                        "Days Since Published",
                        "Privileges Required: NONE",
                        "Severity: HIGH",
                        "Severity: LOW")) |>
  select(Variable01, Importance) |>
  rename(Variable = Variable01)

# 6. Plot top 10 features using native ggplot
ggplot(importance_scores, # already limited to the top 10 features above
       aes(x = Importance, y = reorder(Variable, Importance))) +
  
  # Create the bars with the professional dark blue color
  geom_col(fill = "#2c3e50", alpha = 0.9) +
  
  # Add value labels just past the end of each bar
  geom_text(
    aes(label = round(Importance, 2)), 
    hjust = -0.25,     # Nudge labels just outside the right edge of the bar
    color = "black",
    size = 3
  ) +
  
  # Formatting and Labels
  labs(
    title = "Drivers of Exploitation Risk",
    subtitle = "Top 10 features utilized by the model to predict exploitation",
    y = "Vulnerability Feature",
    x = "Importance (Impurity Reduction)"
  ) +
  xlim(0, 2500) +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank() # Remove distracting horizontal grid lines
  )

Insights & Conclusion

The analysis presented here helps move vulnerability management from a volume-based compliance task to a precision-based risk operation. By integrating historical exploitation data with predictive modeling, we have demonstrated that it is possible to identify the small percentage of vulnerabilities that pose a genuine threat to the organization. This approach does not merely add another tool to the security stack; it helps change how resources are allocated. Instead of diluting engineering efforts across thousands of theoretical risks, teams can focus their remediation cycles on the verified threats identified by this model.

While traditional scoring methods often overestimate risk, this model utilizes real-world signals—such as active exploitation and attack complexity—to refine those assessments. This distinction is critical for leadership. It means that “high severity” no longer automatically equals “high priority.” By adopting this data-driven framework, the organization can reduce alert fatigue and operational friction. Ultimately, this ensures that limited security budgets and engineering hours are better directed toward the vulnerabilities that have the most potential to disrupt business operations, providing a clearer return on investment for the cybersecurity program.

R Session Information

#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.2 (2025-10-31)
#>  os       macOS Tahoe 26.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2026-04-30
#>  pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>  quarto   1.8.26 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  archive        1.1.12     2025-03-20 [1] CRAN (R 4.5.0)
#>  backports      1.5.0      2024-05-23 [1] CRAN (R 4.5.0)
#>  base64enc      0.1-3      2015-07-28 [1] CRAN (R 4.5.0)
#>  bit            4.6.0      2025-03-06 [1] CRAN (R 4.5.0)
#>  bit64          4.6.0-1    2025-01-16 [1] CRAN (R 4.5.0)
#>  broom        * 1.0.10     2025-09-13 [1] CRAN (R 4.5.0)
#>  class          7.3-23     2025-01-01 [1] CRAN (R 4.5.2)
#>  cli            3.6.5      2025-04-23 [1] CRAN (R 4.5.0)
#>  codetools      0.2-20     2024-03-31 [1] CRAN (R 4.5.2)
#>  crayon         1.5.3      2024-06-20 [1] CRAN (R 4.5.0)
#>  data.table     1.17.8     2025-07-10 [1] CRAN (R 4.5.0)
#>  dials        * 1.4.2      2025-09-04 [1] CRAN (R 4.5.0)
#>  DiceDesign     1.10       2023-12-07 [1] CRAN (R 4.5.0)
#>  digest         0.6.39     2025-11-19 [1] CRAN (R 4.5.2)
#>  dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.5.0)
#>  evaluate       1.0.5      2025-08-27 [1] CRAN (R 4.5.0)
#>  farver         2.1.2      2024-05-13 [1] CRAN (R 4.5.0)
#>  fastmap        1.2.0      2024-05-15 [1] CRAN (R 4.5.0)
#>  forcats      * 1.0.1      2025-09-25 [1] CRAN (R 4.5.0)
#>  foreach        1.5.2      2022-02-02 [1] CRAN (R 4.5.0)
#>  fs             1.6.6      2025-04-12 [1] CRAN (R 4.5.0)
#>  furrr          0.3.1      2022-08-15 [1] CRAN (R 4.5.0)
#>  future         1.68.0     2025-11-17 [1] CRAN (R 4.5.2)
#>  future.apply   1.20.0     2025-06-06 [1] CRAN (R 4.5.0)
#>  generics       0.1.4      2025-05-09 [1] CRAN (R 4.5.0)
#>  ggplot2      * 4.0.2      2026-02-03 [1] CRAN (R 4.5.2)
#>  globals        0.18.0     2025-05-08 [1] CRAN (R 4.5.0)
#>  glue           1.8.0      2024-09-30 [1] CRAN (R 4.5.0)
#>  gower          1.0.2      2024-12-17 [1] CRAN (R 4.5.0)
#>  GPfit          1.0-9      2025-04-12 [1] CRAN (R 4.5.0)
#>  gt           * 1.3.0      2026-01-22 [1] CRAN (R 4.5.2)
#>  gtable         0.3.6      2024-10-25 [1] CRAN (R 4.5.0)
#>  hardhat        1.4.2      2025-08-20 [1] CRAN (R 4.5.0)
#>  hms            1.1.4      2025-10-17 [1] CRAN (R 4.5.0)
#>  htmltools      0.5.8.1    2024-04-04 [1] CRAN (R 4.5.0)
#>  htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.5.0)
#>  httr           1.4.7      2023-08-15 [1] CRAN (R 4.5.0)
#>  httr2        * 1.2.2      2025-12-08 [1] CRAN (R 4.5.2)
#>  infer        * 1.0.9      2025-06-26 [1] CRAN (R 4.5.0)
#>  ipred          0.9-15     2024-07-18 [1] CRAN (R 4.5.0)
#>  iterators      1.0.14     2022-02-05 [1] CRAN (R 4.5.0)
#>  janitor      * 2.2.1      2024-12-22 [1] CRAN (R 4.5.0)
#>  jsonlite     * 2.0.0      2025-03-27 [1] CRAN (R 4.5.0)
#>  kableExtra   * 1.4.0      2024-01-24 [1] CRAN (R 4.5.0)
#>  knitr          1.50       2025-03-16 [1] CRAN (R 4.5.0)
#>  labeling       0.4.3      2023-08-29 [1] CRAN (R 4.5.0)
#>  lattice        0.22-7     2025-04-02 [1] CRAN (R 4.5.2)
#>  lava           1.8.2      2025-10-30 [1] CRAN (R 4.5.0)
#>  lazyeval       0.2.2      2019-03-15 [1] CRAN (R 4.5.0)
#>  lhs            1.2.0      2024-06-30 [1] CRAN (R 4.5.0)
#>  lifecycle      1.0.5      2026-01-08 [1] CRAN (R 4.5.2)
#>  listenv        0.10.0     2025-11-02 [1] CRAN (R 4.5.0)
#>  lubridate    * 1.9.4      2024-12-08 [1] CRAN (R 4.5.0)
#>  magrittr       2.0.4      2025-09-12 [1] CRAN (R 4.5.0)
#>  MASS           7.3-65     2025-02-28 [1] CRAN (R 4.5.2)
#>  Matrix         1.7-4      2025-08-28 [1] CRAN (R 4.5.2)
#>  modeldata    * 1.5.1      2025-08-22 [1] CRAN (R 4.5.0)
#>  nnet           7.3-20     2025-01-01 [1] CRAN (R 4.5.2)
#>  otel           0.2.0      2025-08-29 [1] CRAN (R 4.5.0)
#>  parallelly     1.45.1     2025-07-24 [1] CRAN (R 4.5.0)
#>  parsnip      * 1.3.3      2025-08-31 [1] CRAN (R 4.5.0)
#>  pillar         1.11.1     2025-09-17 [1] CRAN (R 4.5.0)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.5.0)
#>  plotly       * 4.11.0     2025-06-19 [1] CRAN (R 4.5.0)
#>  prodlim        2025.04.28 2025-04-28 [1] CRAN (R 4.5.0)
#>  purrr        * 1.2.0      2025-11-04 [1] CRAN (R 4.5.0)
#>  R6             2.6.1      2025-02-15 [1] CRAN (R 4.5.0)
#>  ranger         0.17.0     2024-11-08 [1] CRAN (R 4.5.0)
#>  RANN           2.6.2      2024-08-25 [1] CRAN (R 4.5.0)
#>  rappdirs       0.3.3      2021-01-31 [1] CRAN (R 4.5.0)
#>  RColorBrewer   1.1-3      2022-04-03 [1] CRAN (R 4.5.0)
#>  Rcpp           1.1.0      2025-07-02 [1] CRAN (R 4.5.0)
#>  readr        * 2.1.5      2024-01-10 [1] CRAN (R 4.5.0)
#>  recipes      * 1.3.1      2025-05-21 [1] CRAN (R 4.5.0)
#>  repr           1.1.7      2024-03-22 [1] CRAN (R 4.5.0)
#>  rlang          1.1.7      2026-01-09 [1] CRAN (R 4.5.2)
#>  rmarkdown      2.30       2025-09-28 [1] CRAN (R 4.5.0)
#>  ROSE           0.0-4      2021-06-14 [1] CRAN (R 4.5.0)
#>  rpart          4.1.24     2025-01-07 [1] CRAN (R 4.5.2)
#>  rsample      * 1.3.1      2025-07-29 [1] CRAN (R 4.5.0)
#>  rstudioapi     0.17.1     2024-10-22 [1] CRAN (R 4.5.0)
#>  S7             0.2.1      2025-11-14 [1] CRAN (R 4.5.2)
#>  scales       * 1.4.0      2025-04-24 [1] CRAN (R 4.5.0)
#>  sessioninfo    1.2.3      2025-02-05 [1] CRAN (R 4.5.0)
#>  skimr        * 2.2.2      2026-01-10 [1] CRAN (R 4.5.2)
#>  snakecase      0.11.1     2023-08-27 [1] CRAN (R 4.5.0)
#>  sparsevctrs    0.3.4      2025-05-25 [1] CRAN (R 4.5.0)
#>  stringi        1.8.7      2025-03-27 [1] CRAN (R 4.5.0)
#>  stringr      * 1.6.0      2025-11-04 [1] CRAN (R 4.5.0)
#>  survival       3.8-6      2026-01-16 [1] CRAN (R 4.5.2)
#>  svglite        2.2.2      2025-10-21 [1] CRAN (R 4.5.0)
#>  systemfonts    1.3.1      2025-10-01 [1] CRAN (R 4.5.0)
#>  tailor       * 0.1.0      2025-08-25 [1] CRAN (R 4.5.0)
#>  textshaping    1.0.4      2025-10-10 [1] CRAN (R 4.5.0)
#>  themis       * 1.0.3      2025-01-23 [1] CRAN (R 4.5.0)
#>  tibble       * 3.3.0      2025-06-08 [1] CRAN (R 4.5.0)
#>  tidymodels   * 1.4.1      2025-09-08 [1] CRAN (R 4.5.0)
#>  tidyr        * 1.3.1      2024-01-24 [1] CRAN (R 4.5.0)
#>  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.5.0)
#>  tidyverse    * 2.0.0      2023-02-22 [1] CRAN (R 4.5.0)
#>  timechange     0.3.0      2024-01-18 [1] CRAN (R 4.5.0)
#>  timeDate       4051.111   2025-10-17 [1] CRAN (R 4.5.0)
#>  tune         * 2.0.1      2025-10-17 [1] CRAN (R 4.5.0)
#>  tzdb           0.5.0      2025-03-15 [1] CRAN (R 4.5.0)
#>  vctrs          0.7.1      2026-01-23 [1] CRAN (R 4.5.2)
#>  vip          * 0.4.5      2025-12-12 [1] CRAN (R 4.5.2)
#>  viridisLite    0.4.3      2026-02-04 [1] CRAN (R 4.5.2)
#>  vroom          1.6.6      2025-09-19 [1] CRAN (R 4.5.0)
#>  withr          3.0.2      2024-10-28 [1] CRAN (R 4.5.0)
#>  workflows    * 1.3.0      2025-08-27 [1] CRAN (R 4.5.0)
#>  workflowsets * 1.1.1      2025-05-27 [1] CRAN (R 4.5.0)
#>  xfun           0.54       2025-10-30 [1] CRAN (R 4.5.0)
#>  xml2           1.4.1      2025-10-27 [1] CRAN (R 4.5.0)
#>  yaml           2.3.10     2024-07-26 [1] CRAN (R 4.5.0)
#>  yardstick    * 1.3.2      2025-01-22 [1] CRAN (R 4.5.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────