
crowdvector/cyber-report-published


cyber-report

A local SQLite database of data breaches spanning 1984 to 2026. It combines manually curated historical incidents, live disclosures pulled from the SEC EDGAR database, and records imported from databreach.com.


Database

File: db/breaches.db (SQLite)
Total records: 758 — 72 curated historical + 19 from SEC EDGAR + 667 from databreach.com
Tickers assigned: 100 publicly listed companies
Date range: 1984 – June 2026

The committed db/breaches.db file is the frozen reproducibility snapshot used for this report.

Schema (breaches table)

| Column | Type | Description |
|---|---|---|
| company | TEXT | Affected company name |
| date_discovered | TEXT | ISO date or partial date (YYYY / YYYY-MM / YYYY-MM-DD) |
| date_reported | TEXT | When the breach was publicly disclosed |
| attack_type | TEXT | e.g. Ransomware, SQL injection, Social engineering |
| attack_vector | TEXT | Entry point, e.g. VPN credentials, third-party vendor |
| records_affected | TEXT | Number and type of records |
| financial_loss | TEXT | Free-form: "$X million", "£X million" |
| financial_loss_usd | REAL | Numeric USD equivalent for sorting (NULL if unknown) |
| description | TEXT | Incident summary |
| source_url | TEXT | URL of the primary source |
| source_name | TEXT | Human-readable source label |
| data_types | TEXT | Comma-separated: SSN, email, credit card, etc. |
| sector | TEXT | Industry sector |
| threat_actor | TEXT | Known attacker / group |
| is_sec_filing | INTEGER | 1 if sourced from SEC EDGAR 8-K Item 1.05 |
| sec_accession | TEXT | SEC accession number |
| sec_cik | TEXT | SEC Central Index Key |
| sec_filing_url | TEXT | Direct link to the SEC filing |
| ticker | TEXT | Stock exchange ticker (e.g. EFX, TGT, JPM) |
| synth_catt | REAL | Cumulative average treatment effect on the treated (CATT) from the SCM analysis, in log-return units |
| synth_p_value | REAL | Permutation-test p-value for the SCM effect |
| synth_market_cap_loss_usd | REAL | Estimated market cap loss in USD from the SCM analysis |
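The schema is easiest to see in use. Below is a minimal sketch using Python's built-in sqlite3 module, assuming only the column names listed above. It creates an in-memory table with a subset of the columns, loads three illustrative rows whose values are copied from the timeline further down (not read from db/breaches.db), and ranks incidents by financial_loss_usd. Taking substr(date_discovered, 1, 4) yields the year for all three partial-date formats.

```python
import sqlite3

# Subset of the breaches schema; the real table has more columns.
SCHEMA = """
CREATE TABLE breaches (
    company TEXT,
    date_discovered TEXT,       -- YYYY, YYYY-MM or YYYY-MM-DD
    financial_loss TEXT,        -- free-form text
    financial_loss_usd REAL,    -- NULL if unknown
    ticker TEXT
)
"""

# Illustrative rows based on this README's timeline, not on db/breaches.db.
ROWS = [
    ("Equifax", "2017-07-29", "~$1.38B", 1.38e9, "EFX"),
    ("Target", "2013-12", "~$202M net", 2.02e8, "TGT"),
    ("eBay", "2014-02", None, None, "EBAY"),
]

TOP_LOSSES = """
SELECT company,
       substr(date_discovered, 1, 4) AS year,  -- year works for every partial-date format
       financial_loss_usd
FROM breaches
WHERE financial_loss_usd IS NOT NULL          -- skip incidents with unknown losses
ORDER BY financial_loss_usd DESC
LIMIT ?
"""


def top_losses(conn: sqlite3.Connection, limit: int = 10) -> list[tuple]:
    """Return (company, year, loss_usd) rows sorted by numeric USD loss."""
    return conn.execute(TOP_LOSSES, (limit,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use "db/breaches.db" for the real data
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO breaches VALUES (?, ?, ?, ?, ?)", ROWS)
    for row in top_losses(conn):
        print(row)
```

Because the query touches only documented columns, pointing sqlite3.connect at db/breaches.db should run it unchanged against the committed snapshot.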

Scripts

| File | Purpose |
|---|---|
| db/create_db.py | Rebuilds the database from scratch (historical data embedded + SEC scan) |
| db/add_tickers.py | Migration: adds ticker + SCM result columns to an existing breaches.db |
| db/query_examples.py | Sample queries: top losses, SEC filings, by sector, by attack type |
| analysis/synth_control.py | Synthetic Control Method pipeline: causal market impact estimation |
| analysis/car_check.py | Short-window CAPM event study (CAR robustness check) |
| analysis/donor_pool.py | Dynamic donor pool builder via SEC EDGAR SIC codes |
| analysis/event_study.py | Event-time SCM gap plots for individual and pooled trajectories |
| analysis/robustness.py | Multi-window robustness checks at 6, 12, and 24 months |
| analysis/prefetch.py | Optional market-price cache regeneration helper |
| requirements.txt | Python dependencies for the analysis pipeline |

Data Sources

Curated historical breaches (72 records, 1984–2026)

Manually curated from open public sources.

SEC EDGAR — 8-K Item 1.05 filings (19 records, Dec 2023–Jun 2026)

The SEC's cybersecurity disclosure rule requires public companies to report material cybersecurity incidents on Form 8-K, Item 1.05, within four business days of determining that an incident is material. Item 1.05 reporting began on December 18, 2023.

Scanning method: the EDGAR full-text search API (efts.sec.gov) does not accept programmatic requests from every IP address. Instead, the data.sec.gov/submissions/CIK{10-digit}.json endpoint was queried for each of the 10,433 publicly listed companies, and each filing's entry in filings.recent.items was matched against the regex pattern 1\.05 to identify qualifying 8-K disclosures.
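The scan can be sketched with the standard library alone. This is an illustrative reconstruction, not the code in db/create_db.py. The User-Agent string is a placeholder (SEC asks automated clients to identify themselves). The regex anchors 1.05 as a whole comma-separated item instead of using the bare 1\.05 pattern, and counting 8-K/A amendments is an assumption. Note that filings.recent holds only the most recent filings; older ones are paged into the files listed under filings.files.

```python
import json
import re
import urllib.request

# Placeholder contact string; SEC expects a real name/email in the User-Agent.
USER_AGENT = "cyber-report example contact@example.com"
# Match "1.05" as a complete item in a list such as "1.05,9.01".
ITEM_105 = re.compile(r"(^|,)\s*1\.05\s*(,|$)")


def fetch_submissions(cik: int) -> dict:
    """Download data.sec.gov/submissions/CIK##########.json for one company.

    When looping over thousands of CIKs, throttle requests to respect
    SEC's fair-access guidance (roughly 10 requests per second).
    """
    url = f"https://data.sec.gov/submissions/CIK{cik:010d}.json"
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def item_105_filings(submissions: dict) -> list[dict]:
    """Return 8-K / 8-K/A filings whose items list includes Item 1.05."""
    recent = submissions["filings"]["recent"]  # parallel arrays, one entry per filing
    cik = int(submissions["cik"])
    hits = []
    for form, items, acc, filed, doc in zip(
        recent["form"], recent["items"], recent["accessionNumber"],
        recent["filingDate"], recent["primaryDocument"],
    ):
        if form.startswith("8-K") and ITEM_105.search(items or ""):
            hits.append({
                "sec_accession": acc,
                "filing_date": filed,
                # Usual EDGAR archive path: CIK / accession without dashes / document
                "sec_filing_url": (
                    f"https://www.sec.gov/Archives/edgar/data/{cik}/"
                    f"{acc.replace('-', '')}/{doc}"
                ),
            })
    return hits
```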

Source: SEC EDGAR – data.sec.gov

SEC Disclosures Found

| Date of Incident | Company | Notes |
|---|---|---|
| 2023-12-15 | VF Corp | Unauthorized access detected Dec 13, 2023 |
| 2023-12-20 | First American Financial Corp | Systems isolated from internet |
| 2024-01-17 | Microsoft Corp | Unauthorized access to personal information |
| 2024-01-19 | Hewlett Packard Enterprise | Cloud email environment breach (HPE notified Dec 12, 2023) |
| 2024-02-06 | SouthState Bank | Incident response initiated |
| 2024-02-12 | Prudential Financial | Unauthorized system access |
| 2024-02-21 | UnitedHealth Group | Change Healthcare ransomware (ALPHV/BlackCat) |
| 2024-02-21 | Cencora, Inc. | Data exfiltrated from IT systems |
| 2024-04-29 | Dropbox | Unauthorized access to Dropbox Sign production environment |
| 2024-05-06 | AT&T Inc. | Cybersecurity incident |
| 2024-08-30 | Halliburton Co | Became aware Aug 21, 2024 |
| 2025-04-06 | Sensata Technologies | Cybersecurity incident |
| 2025-05-13 | Nucor Corp | Cybersecurity incident |
| 2025-05-14 | Coinbase Global | Bribed insider data theft |
| 2025-06-21 | United Natural Foods | Follow-up disclosure |
| 2025-10-15 | F5, Inc. | High-severity incident (discovered Aug 9, 2025) |
| 2025-12-15 | Coupang, Inc. | Korean subsidiary account takeover (Nov 18, 2025) |
| 2026-05-07 | West Pharmaceutical Services | Incident May 7, 2026 |
| 2026-06-10 | iRhythm Holdings | Unauthorized access June 8, 2026 |

Historical Breach Timeline

1980s — Dawn of Cyber Incidents

| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 1984 | TRW / Experian | Unauthorized intrusion | 90 million | Unknown |
| 1986 | Canada Revenue Agency | Network intrusion | 16 million | Unknown |
| 1988-11-02 | Internet (Morris Worm) | Self-replicating worm | ~6,000 systems | First major internet outage |

2000s — Mass Credential Theft

| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 2005 | CardSystems Solutions | SQL injection | 40M card records | Company shut down |
| 2007-01-17 | TJX Companies | WEP Wi-Fi exploit | 94M card records | ~$256M |
| 2008 | Heartland Payment Systems | SQL injection / POS malware | 130M card records | ~$140M |

2010s — Nation-States and Mega Breaches

| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 2011-04-20 | Sony PlayStation Network | Network intrusion | 77M accounts | ~$171M |
| 2013 | Yahoo | Nation-state intrusion | 3 billion accounts | $117.5M settlement |
| 2013-10 | Adobe Systems | Intrusion + source code theft | 153M records | ~$1.1M settlement |
| 2013-12 | Target | POS malware via HVAC vendor | 110M customers | ~$202M net |
| 2014 | Yahoo (second breach) | Forged authentication cookies | 500M accounts | (included above) |
| 2014-02 | eBay | Compromised employee credentials | 145M users | |
| 2014-11-24 | Sony Pictures | Destructive wipe (North Korea) | 47K employee SSNs + unreleased films | ~$100M |
| 2016-10 | Uber | AWS credentials on GitHub | 57M riders/drivers | $148M settlement |
| 2017-07-29 | Equifax | Apache Struts CVE-2017-5638 | 147.9M Americans | ~$1.38B |
| 2018-11-19 | Marriott / Starwood | Nation-state (China, since 2014) | 500M guests | ~$72M GDPR fine |
| 2019-07-19 | Capital One | AWS WAF misconfiguration / SSRF | 106M customers | $190M settlement |

2020s — Ransomware Era

| Date | Company | Attack | Impact | Financial Loss |
|---|---|---|---|---|
| 2020-12-13 | SolarWinds (Orion) | Supply chain backdoor (Russia/SVR) | 18,000+ orgs incl. US govt | ~$40M direct |
| 2021-05-07 | Colonial Pipeline | DarkSide ransomware | 5-day US East Coast fuel outage | $4.4M ransom paid |
| 2021-05-30 | JBS Foods | REvil ransomware | Global meat processing halted | $11M ransom paid |
| 2021-12-09 | Multiple (Log4Shell) | Log4j zero-day RCE (CVE-2021-44228) | Millions of enterprise systems | ~$100B+ remediation |
| 2022-08 | LastPass | Dev environment → vault exfiltration | Millions of encrypted vaults | $24.5M class action (2025) |
| 2023-01 | T-Mobile | API abuse | 37M accounts | $350M settlement |
| 2023-09 | Caesars Entertainment | Social engineering (Scattered Spider) | Loyalty program DB | ~$15M ransom paid |
| 2023-09-10 | MGM Resorts | Vishing + ransomware (Scattered Spider) | 10-day casino/hotel outage | ~$100M |
| 2024-02-21 | Change Healthcare (UnitedHealth) | ALPHV/BlackCat ransomware via a Citrix portal without MFA | 190M patient records | ~$2.87B |

2025

| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 2025-01 | Insight Partners | Social engineering via cloud CRM | | |
| 2025-03 | Yale New Haven Health | Network server intrusion | 5.6M patients | |
| 2025-04 | Marks & Spencer | DragonForce ransomware | Core IT crippled for weeks | ~£300M revenue impact |
| 2025-07 | Ingram Micro | SafePay ransomware | 3.5 TB exfiltrated | ~$136M/day during outage |
| 2025-08 | Bouygues Telecom | Network intrusion | 6.4M customers (incl. IBANs) | |
| 2025-08 | Marquis Software Solutions | Ransomware | 74 banks & credit unions affected | |
| 2025-10 | Substack | Account compromise | 663K accounts | |
| 2025-12 | Raaga | Data exfiltration | 10M email addresses | |

2026 (through June)

| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 2026-01 | Navia (benefits admin) | System intrusion | 2.7M people | |
| 2026-01 | Oncology Institute | Network intrusion | 1.8M patients | |
| 2026-03-12 | Stryker | Cyberattack (Handala / Iran-linked) | Operations disrupted | |
| 2026-03 | Aura | Voice phishing → account takeover | 900K records | |
| 2026-04 | Canvas / Instructure | LMS platform attack | Millions of students | |
| 2026-04 | Vercel | Over-permissioned third-party AI tool | Workspace compromise | |

Quick Stats

  • Largest by records: Yahoo (2013), 3 billion accounts
  • Largest single-company financial loss: Change Healthcare (2024), ~$2.87 billion
  • Largest settlement: Equifax (2017), ~$1.38 billion total
  • Largest ransom paid: Caesars Entertainment (2023), ~$15 million (JBS Foods: $11M; Colonial Pipeline: $4.4M, partially recovered)
  • Widest impact: Log4Shell (2021), millions of systems globally, ~$100B+ remediation
  • Longest undetected: Marriott/Starwood, 4 years (2014–2018)
  • Earliest incident in the database: TRW/Experian (1984)

Market Impact Analysis

A Synthetic Control Method (SCM) pipeline estimates the causal effect of a data breach on a company's stock market value.

Why Synthetic Control?

| Method | Counterfactual | Best for | Inference |
|---|---|---|---|
| Event Study (CAR) | CAPM factor model | Short windows (days) | t-test on abnormal returns |
| Synthetic Control | Weighted average of peer companies | Medium/long windows (months–years) | Permutation tests (non-parametric) |

SCM is preferred here because breaches have prolonged effects (litigation, regulatory fines, reputational damage) that compound over one to two years. SCM builds a synthetic twin as a weighted average of sector peers fitted to the breached company's pre-breach returns, then measures how the breached company's returns diverged from that twin after the incident.
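The README does not say which optimiser synth_control.py uses for its constrained weights. The sketch below shows one standard way to impose the usual SCM constraints (nonnegative donor weights that sum to 1): projected gradient descent on the pre-treatment squared error, projecting onto the probability simplex after every step. It matches only the pre-breach outcome path and omits the covariate weighting of the original Abadie-style formulation. A production pipeline would more likely call a numerical solver such as scipy.optimize.minimize.

```python
def project_to_simplex(v: list[float]) -> list[float]:
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:  # condition holds for a prefix; keep the last (largest) one
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_synth_weights(treated: list[float], donors: list[list[float]],
                      iters: int = 5000) -> list[float]:
    """Fit donor weights on the pre-treatment window.

    treated: pre-period outcomes of the breached company (length T)
    donors:  T rows x J columns of donor outcomes over the same dates
    """
    n_donors = len(donors[0])
    w = [1.0 / n_donors] * n_donors  # start from equal weights
    fro2 = sum(x * x for row in donors for x in row)
    step = 1.0 / (2.0 * fro2)  # safe step: 2 * ||X||_F^2 bounds the gradient's Lipschitz constant
    for _ in range(iters):
        resid = [y - sum(wj * x for wj, x in zip(w, row)) for y, row in zip(treated, donors)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(resid, donors)) for j in range(n_donors)]
        w = project_to_simplex([wj - step * g for wj, g in zip(w, grad)])
    return w
```

With the weights in hand, the synthetic series over any window is the weighted sum of donor outcomes, and the post-breach gap is actual minus synthetic.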

Setup

```bash
pip install -r requirements.txt
python db/add_tickers.py        # adds ticker + SCM result columns to breaches.db
```

Running the Analysis

```bash
# Single company: Equifax breach (canonical validation case)
python analysis/synth_control.py --ticker EFX

# All publicly-listed companies in the database
python analysis/synth_control.py --all

# Short-window CAR robustness check (±30 days vs S&P 500)
python analysis/car_check.py --ticker EFX
python analysis/car_check.py              # all companies
```
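Below is a rough sketch of the short-window CAR check, assuming the stock and S&P 500 daily returns are already aligned by date. It fits a market model by OLS (intercept plus beta, a common variant of the CAPM benchmark) on an estimation window, sums abnormal returns over the event window, and forms a t-statistic from the residual standard deviation. The estimation-window length and the exact test statistic in car_check.py are not documented here, so treat both as assumptions.

```python
import math
import statistics


def ols_alpha_beta(stock: list[float], market: list[float]) -> tuple[float, float, float]:
    """Market-model OLS: stock = alpha + beta * market + e. Returns (alpha, beta, resid_sd)."""
    mx, my = statistics.fmean(market), statistics.fmean(stock)
    sxx = sum((x - mx) ** 2 for x in market)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market, stock))
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = [y - alpha - beta * x for x, y in zip(market, stock)]
    resid_sd = math.sqrt(sum(e * e for e in resid) / (len(resid) - 2))  # n - 2 degrees of freedom
    return alpha, beta, resid_sd


def car_test(est_stock, est_market, evt_stock, evt_market):
    """Cumulative abnormal return over the event window, its t-statistic, and beta."""
    alpha, beta, sd = ols_alpha_beta(est_stock, est_market)
    abnormal = [y - (alpha + beta * x) for x, y in zip(evt_market, evt_stock)]
    car = sum(abnormal)
    t_stat = car / (sd * math.sqrt(len(abnormal)))  # assumes independent daily residuals
    return car, t_stat, beta
```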

Output Files (analysis/results/)

| File | Description |
|---|---|
| {TICKER}_comparison.png | 4-panel comparison: SCM actual vs synthetic, CAR event window, permutation test, effect-size bar chart (SCM causal vs CAR naive) |
| {TICKER}_synth_plot.png | SCM: actual vs synthetic cumulative return (full window) |
| {TICKER}_placebo_plot.png | SCM permutation: treated gap vs all donor-company placebo gaps |
| results_summary.csv | Merged table: SCM CATT + p-value + RMSPE ratio AND CAR + t-stat + beta side-by-side |
| car_results.csv | Standalone CAR results (from car_check.py) |

Methodology Details

Donor pool (curated per sector): sector peers are selected from analysis/synth_control.py::CURATED_DONORS. For dynamic, SIC-based construction, see analysis/donor_pool.py.
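The SEC submissions JSON used for the 8-K scan also carries each company's sic code, which makes an SIC-based donor pool cheap to build. Below is a hypothetical sketch of the filtering step, assuming SIC codes and market caps have already been fetched. The two-digit SIC prefix and the 0.1x to 10x market-cap band are illustrative defaults, not values taken from donor_pool.py.

```python
def sic_donor_pool(treated_ticker: str,
                   companies: dict[str, tuple[str, float]],
                   min_ratio: float = 0.1,
                   max_ratio: float = 10.0,
                   sic_digits: int = 2) -> list[str]:
    """Pick donors sharing the treated firm's SIC prefix within a market-cap band.

    companies: {ticker: (sic_code, market_cap_usd)}; prefix length and band
    are hypothetical defaults.
    """
    sic, cap = companies[treated_ticker]
    prefix = sic[:sic_digits]
    return sorted(
        t for t, (s, c) in companies.items()
        if t != treated_ticker
        and s[:sic_digits] == prefix               # same industry group
        and min_ratio * cap <= c <= max_ratio * cap  # comparable size
    )
```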

Comparison plot ({TICKER}_comparison.png): Each run automatically produces a 4-panel figure:

  • Top-left — SCM: actual vs synthetic cumulative log return (full 3+2 year window)
  • Top-right — CAR: daily abnormal returns and cumulative CAR over the ±35-day event window
  • Bottom-left — Permutation test: treated company's gap vs all donor-company placebo gaps
  • Bottom-right — Side-by-side bar chart of SCM effect (causal, long-term) vs CAR effect (classical, short-term) with significance stars

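Outcome variable: cumulative log returns, indexed to 0 at the breach disclosure date. Log returns are additive, so the cumulative value on any date is simply log(P_t / P_disclosure); cumulative returns of this kind are standard in event studies.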

Study window: 3 years pre-treatment (756 trading days) + 2 years post-treatment (504 trading days).
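A minimal sketch of how the outcome series and the study window fit together, assuming a list of daily closing prices and the index of the disclosure day. The 756 and 504 figures follow from 252 trading days per year. The README does not say whether the disclosure day belongs to the pre- or post-period, so this sketch keeps it as one extra point between the two windows (an assumption).

```python
import math

TRADING_DAYS_PER_YEAR = 252  # 3 * 252 = 756 pre-treatment, 2 * 252 = 504 post-treatment
PRE_DAYS, POST_DAYS = 3 * TRADING_DAYS_PER_YEAR, 2 * TRADING_DAYS_PER_YEAR


def cumulative_log_returns(prices: list[float], event_idx: int) -> list[float]:
    """log(P_t / P_event) for every day: the summed daily log returns, 0 on the event day."""
    p0 = prices[event_idx]
    return [math.log(p / p0) for p in prices]


def study_window(series: list[float], event_idx: int,
                 pre: int = PRE_DAYS, post: int = POST_DAYS) -> list[float]:
    """Slice `pre` days before the event, the event day, and `post` days after it."""
    if event_idx < pre or event_idx + post >= len(series):
        raise ValueError("not enough price history around the event")
    return series[event_idx - pre: event_idx + post + 1]
```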

Pre-treatment fit: measured by the root mean squared prediction error (RMSPE) between the actual and synthetic series over the pre-treatment window. Results with pre-RMSPE > 0.01 should be interpreted cautiously.
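One way to compute this diagnostic, together with the RMSPE ratio reported in results_summary.csv, is sketched below. Defining the ratio as post-period RMSPE divided by pre-period RMSPE follows the usual SCM convention and is an assumption about how the pipeline computes it. Counting the disclosure day as part of the post-period is also an assumption.

```python
import math


def rmspe(actual: list[float], synthetic: list[float]) -> float:
    """Root mean squared prediction error between two aligned series."""
    return math.sqrt(sum((a - s) ** 2 for a, s in zip(actual, synthetic)) / len(actual))


def fit_diagnostics(actual: list[float], synthetic: list[float],
                    event_idx: int, threshold: float = 0.01) -> dict:
    """Pre-RMSPE, post/pre RMSPE ratio, and a flag for poor pre-treatment fit."""
    pre = rmspe(actual[:event_idx], synthetic[:event_idx])
    post = rmspe(actual[event_idx:], synthetic[event_idx:])
    return {
        "pre_rmspe": pre,
        "rmspe_ratio": post / pre if pre > 0 else math.inf,
        "poor_fit": pre > threshold,  # README: interpret cautiously above 0.01
    }
```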

Inference: permutation test. SCM is re-run with each donor company as a placebo-treated unit, and the p-value is the fraction of placebo units whose |CATT| is greater than or equal to the treated company's |CATT|.
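The p-value rule amounts to a few lines. A sketch, taking the treated CATT and the placebo CATTs as plain floats: with N placebos the p-value moves in steps of 1/N, which is one reason the caveats below ask for at least 30 donors. Some implementations also count the treated unit, using (1 + k) / (1 + N); this version follows the description above.

```python
def permutation_p_value(treated_catt: float, placebo_catts: list[float]) -> float:
    """Share of placebo units whose |CATT| is at least as large as the treated unit's."""
    if not placebo_catts:
        raise ValueError("need at least one placebo unit")
    extreme = sum(abs(c) >= abs(treated_catt) for c in placebo_catts)
    return extreme / len(placebo_catts)
```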

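Market cap loss estimate: (exp(CATT) − 1) × pre-breach market cap. Because CATT is measured in log-return units, exp(CATT) − 1 converts it to a simple percentage gap, so the product approximates the dollar value of the gap between the actual and counterfactual paths.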
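A worked version of the formula with illustrative inputs (a CATT of -0.10 and a $10 billion pre-breach market cap, not figures from the database). math.expm1 computes exp(x) − 1 directly. Whether synth_market_cap_loss_usd stores the result as a negative number or as a positive loss is not documented; this sketch keeps the sign.

```python
import math


def market_cap_loss_usd(catt: float, pre_breach_market_cap: float) -> float:
    """(exp(CATT) - 1) * market cap; negative means value lost relative to the synthetic twin."""
    return math.expm1(catt) * pre_breach_market_cap


if __name__ == "__main__":
    # Illustrative inputs only.
    print(f"{market_cap_loss_usd(-0.10, 10e9) / 1e6:.1f} million USD")
```

For those inputs the estimate is about -$951.6 million, a gap of roughly 9.5% of pre-breach market value.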

Donor contamination exclusion: Companies with their own breach during the study window (as recorded in breaches.db) are excluded from the donor pool.
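The README does not say which date column the exclusion uses or how partial dates are handled. Below is a conservative sketch, assuming the YYYY / YYYY-MM / YYYY-MM-DD format of date_discovered: each partial date is expanded to the full span it could cover, and a donor is dropped if any of its recorded breaches could overlap the study window. Ticker names in the checks are hypothetical.

```python
import calendar
from datetime import date


def date_bounds(s: str) -> tuple[date, date]:
    """Earliest and latest calendar dates covered by YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [int(p) for p in s.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        y, m = parts
        return date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1])
    d = date(*parts)
    return d, d


def exclude_contaminated(donors: list[str], breach_dates: dict[str, list],
                         window_start: date, window_end: date) -> list[str]:
    """Keep donors with no recorded breach whose date span overlaps the study window."""
    def overlaps(s: str) -> bool:
        lo, hi = date_bounds(s)
        return lo <= window_end and hi >= window_start

    return [
        t for t in donors
        if not any(overlaps(s) for s in breach_dates.get(t, []) if s)  # skip NULL dates
    ]
```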

SCM Validity Caveats

  • Requires ≥ 30 donor companies with pre-treatment data for reliable permutation tests
  • Low pre-RMSPE (good synthetic fit) is a prerequisite for valid causal inference
  • Non-US / delisted tickers may lack Yahoo Finance history
  • Yahoo Finance applies IP-based rate limiting — the scripts include exponential backoff (4 retries); if rate-limited, wait 5–10 minutes before re-running
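Below is a generic sketch of the retry pattern in the last caveat, not the scripts' actual code. It reads "4 retries" as four retries after the first attempt. The base delay, cap, and jitter are assumptions, and the sleep function is injectable so the logic can be tested without waiting. In real use, retry_on should be narrowed to whatever exception the download library raises when rate-limited.

```python
import random
import time


def with_backoff(fetch, retries=4, base_delay=2.0, max_delay=60.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call fetch(); on a retryable error wait base_delay * 2**attempt (jittered, capped) and retry."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out concurrent retries
```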

Files

| File | Purpose |
|---|---|
| db/add_tickers.py | Migration: adds ticker, synth_catt, synth_p_value, synth_market_cap_loss_usd columns |
| analysis/synth_control.py | Main SCM pipeline: curated donor pools, constrained weight optimisation, permutation tests, plots |
| analysis/car_check.py | Short-window CAPM event study: OLS beta estimation, CAR computation, t-test |
| analysis/donor_pool.py | Dynamic donor pool builder: SIC lookup via SEC EDGAR + market-cap filter |
| requirements.txt | Python dependencies |
