A local SQLite database of data breaches spanning 1984 to 2026. It combines manually curated historical incidents, live disclosures pulled from the SEC EDGAR database, and records from databreach.com.
- File: db/breaches.db (SQLite)
- Total records: 758 (72 curated historical + 19 from SEC EDGAR + 667 from databreach.com)
- Tickers assigned: 100 publicly listed companies
- Date range: 1984 to June 2026
The committed db/breaches.db file is the frozen reproducibility snapshot used for this report.
| Column | Type | Description |
|---|---|---|
| `company` | TEXT | Affected company name |
| `date_discovered` | TEXT | ISO date or partial (YYYY / YYYY-MM / YYYY-MM-DD) |
| `date_reported` | TEXT | When publicly disclosed |
| `attack_type` | TEXT | e.g. Ransomware, SQL injection, Social engineering |
| `attack_vector` | TEXT | Entry point, e.g. VPN credentials, third-party vendor |
| `records_affected` | TEXT | Number and type of records |
| `financial_loss` | TEXT | Free-form: "$X million", "£X million" |
| `financial_loss_usd` | REAL | Numeric USD equivalent for sorting (NULL if unknown) |
| `description` | TEXT | Incident summary |
| `source_url` | TEXT | URL of the primary source |
| `source_name` | TEXT | Human-readable source label |
| `data_types` | TEXT | Comma-separated: SSN, email, credit card, etc. |
| `sector` | TEXT | Industry sector |
| `threat_actor` | TEXT | Known attacker / group |
| `is_sec_filing` | INTEGER | 1 if sourced from SEC EDGAR 8-K Item 1.05 |
| `sec_accession` | TEXT | SEC accession number |
| `sec_cik` | TEXT | SEC Central Index Key |
| `sec_filing_url` | TEXT | Direct link to the SEC filing |
| `ticker` | TEXT | Stock exchange ticker (e.g. EFX, TGT, JPM) |
| `synth_catt` | REAL | Cumulative ATT from SCM analysis (log return units) |
| `synth_p_value` | REAL | Permutation test p-value for SCM effect |
| `synth_market_cap_loss_usd` | REAL | Estimated market cap loss in USD from SCM |
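
As a minimal sketch of how the numeric `financial_loss_usd` column supports sorting, the query below lists the largest known losses. The table name `breaches` and the helper `top_losses` are assumptions made for illustration; the schema above does not state the table name, so adjust it to match the actual database.

```python
import sqlite3


def top_losses(conn, limit=5):
    """Return (company, date_discovered, financial_loss_usd) rows, largest loss first."""
    sql = (
        "SELECT company, date_discovered, financial_loss_usd "
        "FROM breaches "  # assumed table name, not confirmed by the schema
        "WHERE financial_loss_usd IS NOT NULL "  # skip incidents with unknown loss
        "ORDER BY financial_loss_usd DESC "
        "LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


# Usage (assumes the snapshot exists at db/breaches.db):
#   conn = sqlite3.connect("db/breaches.db")
#   for company, discovered, loss in top_losses(conn):
#       print(f"{(discovered or '?'):<12}{company:<35}${loss:,.0f}")
```

Sorting on the free-form `financial_loss` text column would order values lexically, which is why the REAL column is used here.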
| File | Purpose |
|---|---|
| `db/create_db.py` | Rebuilds the database from scratch (historical data embedded + SEC scan) |
| `db/add_tickers.py` | Migration: adds ticker + SCM result columns to an existing breaches.db |
| `db/query_examples.py` | Sample queries: top losses, SEC filings, by sector, by attack type |
| `analysis/synth_control.py` | Synthetic Control Method pipeline: causal market impact estimation |
| `analysis/car_check.py` | Short-window CAPM event study (CAR robustness check) |
| `analysis/donor_pool.py` | Dynamic donor pool builder via SEC EDGAR SIC codes |
| `analysis/event_study.py` | Event-time SCM gap plots for individual and pooled trajectories |
| `analysis/robustness.py` | Multi-window robustness checks at 6, 12, and 24 months |
| `analysis/prefetch.py` | Optional market-price cache regeneration helper |
| `requirements.txt` | Python dependencies for the analysis pipeline |
Manually curated from open public sources:
- Digital Guardian – History of Data Breaches
- UpGuard – Biggest Data Breaches in US History
- Keepnet – Top 15 Data Breaches of 2025 and Their Financial Impacts
- Huntress – 27 Biggest Data Breaches in History
- TrollEye Security – Top 20 Worst Data Breaches Since 2000
- CM-Alliance – Biggest Cyber Attacks of 2025
- CM-Alliance – Biggest Cyber Attacks of March 2026
- Bright Defense – Recent Data Breaches 2026
- Tech.co – Data Breaches 2026 Update
- Have I Been Pwned
- BreachSense – Marriott Data Breach Case Study
- Wikipedia – 2017 Equifax data breach
- Wikipedia – Colonial Pipeline ransomware attack
- Wikipedia – LastPass 2022 data breach
- Wikipedia – Aura data breach
- Wikipedia – 2026 Canvas data breach
- Monroe University – Cybersecurity History
- FTC – Log4j Security Vulnerability Warning
The SEC's cybersecurity disclosure rule (effective December 15, 2023) requires public companies to report material cybersecurity incidents on Form 8-K, Item 1.05 within four business days.
Scanning method: The EDGAR full-text search API (efts.sec.gov) blocks programmatic access from some IPs, so it was not used. Instead, all 10,433 publicly listed companies were scanned via the data.sec.gov/submissions/CIK{10-digit}.json endpoint. For each company, the filings.recent.items field was matched against the regex pattern 1\.05 to identify qualifying 8-K disclosures.
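
The matching step can be sketched as follows. This is an illustration, not the code in `db/create_db.py`: the helper names are hypothetical, the restriction to forms starting with `8-K` is an added assumption, and the sibling arrays `form`, `accessionNumber`, and `filingDate` are assumed to sit alongside `items` in `filings.recent` as parallel lists.

```python
import re

ITEM_105 = re.compile(r"1\.05")  # the pattern quoted above


def submissions_url(cik):
    """Build the submissions URL for a CIK, zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def item_105_filings(submissions):
    """Yield (accessionNumber, filingDate) for 8-K filings whose items mention 1.05.

    `submissions` is the already-parsed JSON document for one company.
    """
    recent = submissions.get("filings", {}).get("recent", {})
    rows = zip(
        recent.get("form", []),
        recent.get("items", []),
        recent.get("accessionNumber", []),
        recent.get("filingDate", []),
    )
    for form, items, accession, filed in rows:
        # Assumption: only 8-K (and 8-K/A) forms are of interest.
        if form.startswith("8-K") and ITEM_105.search(items or ""):
            yield accession, filed
```

Fetching the JSON itself is left out on purpose; data.sec.gov expects a descriptive `User-Agent` header and modest request rates, which any real scan over thousands of CIKs would need to respect.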
Source: SEC EDGAR – data.sec.gov
| Date of Incident | Company | Notes |
|---|---|---|
| 2023-12-15 | VF Corp | Unauthorized access detected Dec 13, 2023 |
| 2023-12-20 | First American Financial Corp | Systems isolated from internet |
| 2024-01-17 | Microsoft Corp | Unauthorized access to personal information |
| 2024-01-19 | Hewlett Packard Enterprise | Cloud email environment breach (started Dec 12, 2023) |
| 2024-02-06 | SouthState Bank | Incident response initiated |
| 2024-02-12 | Prudential Financial | Unauthorized system access |
| 2024-02-21 | UnitedHealth Group | Change Healthcare ransomware (ALPHV/BlackCat) |
| 2024-02-21 | Cencora, Inc. | Data exfiltrated from IT systems |
| 2024-04-29 | Dropbox | Unauthorized access to Dropbox Sign production environment |
| 2024-05-06 | AT&T Inc. | Cybersecurity incident |
| 2024-08-30 | Halliburton Co | Became aware Aug 21, 2024 |
| 2025-04-06 | Sensata Technologies | Cybersecurity incident |
| 2025-05-13 | Nucor Corp | Cybersecurity incident |
| 2025-05-14 | Coinbase Global | Bribed insider data theft |
| 2025-06-21 | United Natural Foods | Follow-up disclosure |
| 2025-10-15 | F5, Inc. | High-severity incident (discovered Aug 9, 2025) |
| 2025-12-15 | Coupang, Inc. | Korean subsidiary account takeover (Nov 18, 2025) |
| 2026-05-07 | West Pharmaceutical Services | Incident May 7, 2026 |
| 2026-06-10 | iRhythm Holdings | Unauthorized access June 8, 2026 |
| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 1984 | TRW / Experian | Unauthorized intrusion | 90 million | Unknown |
| 1986 | Canada Revenue Agency | Network intrusion | 16 million | Unknown |
| 1988-11-02 | Internet (Morris Worm) | Self-replicating worm | ~6,000 systems | First major internet outage |
| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 2005 | CardSystems Solutions | SQL injection | 40M card records | Company shut down |
| 2007-01-17 | TJX Companies | WEP Wi-Fi exploit | 94M card records | ~$256M |
| 2008 | Heartland Payment Systems | SQL injection / POS malware | 130M card records | ~$140M |
| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 2011-04-20 | Sony PlayStation Network | Network intrusion | 77M accounts | ~$171M |
| 2013 | Yahoo | Nation-state intrusion | 3 billion accounts | $117.5M settlement |
| 2013-10 | Adobe Systems | Intrusion + source code theft | 153M records | ~$1.1M settlement |
| 2013-12 | Target | POS malware via HVAC vendor | 110M customers | ~$202M net |
| 2014 | Yahoo (second breach) | Forged authentication cookies | 500M accounts | (included above) |
| 2014-02 | eBay | Compromised employee credentials | 145M users | — |
| 2014-11-24 | Sony Pictures | Destructive wipe (North Korea) | 47K employee SSNs + unreleased films | ~$100M |
| 2016-10 | Uber | AWS credentials on GitHub | 57M riders/drivers | $148M settlement |
| 2017-07-29 | Equifax | Apache Struts CVE-2017-5638 | 147.9M Americans | ~$1.38B |
| 2018-11-19 | Marriott / Starwood | Nation-state (China, since 2014) | 500M guests | ~$72M GDPR fine |
| 2019-07-19 | Capital One | AWS WAF misconfiguration / SSRF | 106M customers | $190M settlement |
| Date | Company | Attack | Impact | Financial Loss |
|---|---|---|---|---|
| 2020-12-13 | SolarWinds (Orion) | Supply chain backdoor (Russia/SVR) | 18,000+ orgs incl. US govt | ~$40M direct |
| 2021-05-07 | Colonial Pipeline | DarkSide ransomware | 5-day US East Coast fuel outage | $4.4M ransom paid |
| 2021-05-30 | JBS Foods | REvil ransomware | Global meat processing halted | $11M ransom paid |
| 2021-12-09 | Multiple (Log4Shell) | Log4j zero-day RCE (CVE-2021-44228) | Millions of enterprise systems | ~$100B+ remediation |
| 2022-08 | LastPass | Dev environment → vault exfiltration | Millions of encrypted vaults | $24.5M class action (2025) |
| 2023-01 | T-Mobile | API abuse | 37M accounts | $350M settlement |
| 2023-09 | Caesars Entertainment | Social engineering (Scattered Spider) | Loyalty program DB | ~$15M ransom paid |
| 2023-09-10 | MGM Resorts | Vishing + ransomware (Scattered Spider) | 10-day casino/hotel outage | ~$100M |
| 2024-02-21 | Change Healthcare (UnitedHealth) | ALPHV/BlackCat ransomware, no-MFA Citrix | 190M patient records | ~$2.87B |
| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 2025-01 | Insight Partners | Social engineering via cloud CRM | — | — |
| 2025-03 | Yale New Haven Health | Network server intrusion | 5.6M patients | — |
| 2025-04 | Marks & Spencer | DragonForce ransomware | Core IT crippled for weeks | ~£300M revenue impact |
| 2025-07 | Ingram Micro | SafePay ransomware | 3.5 TB exfiltrated | ~$136M/day during outage |
| 2025-08 | Bouygues Telecom | Network intrusion | 6.4M customers (incl. IBANs) | — |
| 2025-08 | Marquis Software Solutions | Ransomware | 74 banks & credit unions affected | — |
| 2025-10 | Substack | Account compromise | 663K accounts | — |
| 2025-12 | Raaga | Data exfiltration | 10M email addresses | — |
| Date | Company | Attack | Records | Financial Loss |
|---|---|---|---|---|
| 2026-01 | Navia (benefits admin) | System intrusion | 2.7M people | — |
| 2026-01 | Oncology Institute | Network intrusion | 1.8M patients | — |
| 2026-03-12 | Stryker | Cyberattack (Handala / Iran-linked) | Operations disrupted | — |
| 2026-03 | Aura | Voice phishing → account takeover | 900K records | — |
| 2026-04 | Canvas / Instructure | LMS platform attack | Millions of students | — |
| 2026-04 | Vercel | Over-permissioned third-party AI tool | Workspace compromise | — |
- Largest by records: Yahoo 2013 — 3 billion accounts
- Largest by financial loss: Change Healthcare 2024 — ~$2.87 billion
- Largest settlement: Equifax 2017 — ~$1.38 billion total
- Largest ransom paid: JBS Foods 2021 — $11 million (Colonial: $4.4M, partial recovery)
- Widest impact: Log4Shell 2021 — millions of systems globally, ~$100B+ remediation
- Longest undetected: Marriott/Starwood — 4 years (2014–2018)
- Earliest recorded: TRW/Experian — 1984
A Synthetic Control Method (SCM) pipeline measures the causal effect of a data breach on a company's stock market value.
| Method | Counterfactual | Best for | Inference |
|---|---|---|---|
| Event Study (CAR) | CAPM factor model | Short windows (days) | t-test on abnormal returns |
| Synthetic Control | Weighted average of peer companies | Medium/long windows (months–years) | Permutation tests (non-parametric) |
SCM is preferred here because breaches have prolonged effects (litigation, regulatory fines, reputational damage) that compound over 1–2 years. SCM constructs a synthetic twin from sector peers, then measures how the breached company diverged post-incident.
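
To make the "synthetic twin" idea concrete, here is a minimal sketch of fitting donor weights that are non-negative and sum to one, so that the weighted donor series tracks the breached company before the incident. It uses plain projected gradient descent, chosen here only to keep the example dependency-free; the optimiser actually used in `analysis/synth_control.py` is not described in this document, and the function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def fit_scm_weights(treated_pre, donors_pre, iterations=5000):
    """Find donor weights minimising the pre-treatment squared error.

    treated_pre: list of T pre-treatment outcomes for the breached company.
    donors_pre:  list of J donor series, each of length T.
    """
    n_donors, n_obs = len(donors_pre), len(treated_pre)
    weights = [1.0 / n_donors] * n_donors  # start from equal weights
    # Conservative step size: 1 / (2 * sum of squared donor values).
    step = 1.0 / (2.0 * sum(x * x for series in donors_pre for x in series))
    for _ in range(iterations):
        # Residual of the synthetic series against the treated series.
        residual = [
            sum(w * series[t] for w, series in zip(weights, donors_pre)) - treated_pre[t]
            for t in range(n_obs)
        ]
        gradient = [2.0 * sum(r * x for r, x in zip(residual, series)) for series in donors_pre]
        weights = project_to_simplex([w - step * g for w, g in zip(weights, gradient)])
    return weights
```

The post-incident effect is then the gap between the company's actual outcome and the same weighted combination of donors evaluated after the breach date.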

    pip install -r requirements.txt
    python db/add_tickers.py   # adds ticker column to breaches.db

    # Single company: Equifax breach (canonical validation case)
    python analysis/synth_control.py --ticker EFX

    # All publicly-listed companies in the database
    python analysis/synth_control.py --all

    # Short-window CAR robustness check (±30 days vs S&P 500)
    python analysis/car_check.py --ticker EFX
    python analysis/car_check.py   # all companies

| File | Description |
|---|---|
| `{TICKER}_comparison.png` | 4-panel comparison: SCM actual vs synthetic, CAR event window, permutation test, effect-size bar chart (SCM causal vs CAR naive) |
| `{TICKER}_synth_plot.png` | SCM: actual vs synthetic cumulative return (full window) |
| `{TICKER}_placebo_plot.png` | SCM permutation: treated gap vs all donor-company placebo gaps |
| `results_summary.csv` | Merged table: SCM CATT + p-value + RMSPE ratio AND CAR + t-stat + beta side-by-side |
| `car_results.csv` | Standalone CAR results (from car_check.py) |
Donor pool (curated per sector): Sector peers selected from analysis/synth_control.py::CURATED_DONORS. For dynamic SIC-based construction see analysis/donor_pool.py.
Comparison plot ({TICKER}_comparison.png): Each run automatically produces a 4-panel figure:
- Top-left — SCM: actual vs synthetic cumulative log return (full 3+2 year window)
- Top-right — CAR: daily abnormal returns and cumulative CAR over the ±35-day event window
- Bottom-left — Permutation test: treated company's gap vs all donor-company placebo gaps
- Bottom-right — Side-by-side bar chart of SCM effect (causal, long-term) vs CAR effect (classical, short-term) with significance stars
Outcome variable: Cumulative log returns indexed to 0 at breach disclosure date (stationary; additive; standard in event studies).
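
A minimal sketch of this outcome variable, assuming a list of daily closing prices and the index of the disclosure date within it (the function name is illustrative):

```python
import math


def cumulative_log_returns(prices, event_index):
    """Cumulative log return of each price relative to the price at event_index.

    The value at event_index is exactly 0; earlier and later values are
    log(P_t / P_event), i.e. the running sum of daily log returns re-based
    to the disclosure date.
    """
    base = prices[event_index]
    return [math.log(p / base) for p in prices]
```

Because log returns add across days, the difference between any two points of this series is the log return over the interval between them.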
Study window: 3 years pre-treatment (756 trading days) + 2 years post-treatment (504 trading days).
Pre-treatment fit: Root Mean Squared Prediction Error (RMSPE). Models with pre-RMSPE > 0.01 should be interpreted cautiously.
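
The fit metric can be computed as below. This is a sketch with hypothetical names; the 0.01 threshold is the one quoted above and is expressed in log return units.

```python
import math

PRE_RMSPE_CAUTION = 0.01  # threshold quoted in the text above


def rmspe(actual, synthetic):
    """Root mean squared prediction error between two equal-length series."""
    gaps = [a - s for a, s in zip(actual, synthetic)]
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


def needs_caution(actual_pre, synthetic_pre):
    """True when the pre-treatment fit is too loose to read the effect at face value."""
    return rmspe(actual_pre, synthetic_pre) > PRE_RMSPE_CAUTION
```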
Inference: Permutation test. SCM is applied to each donor company in turn as a placebo-treated unit. The p-value is the fraction of placebo units whose absolute gap is at least as large as the treated company's |CATT|.
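
A sketch of that p-value, assuming the placebo effects have already been computed by re-running SCM with each donor as the treated unit (the function name is illustrative):

```python
def permutation_p_value(treated_catt, placebo_catts):
    """Fraction of placebo effects at least as large in magnitude as the treated effect."""
    extreme = sum(1 for gap in placebo_catts if abs(gap) >= abs(treated_catt))
    return extreme / len(placebo_catts)
```

With J donors the p-value can only move in steps of 1/J, which is one reason the size of the donor pool matters. Some implementations also count the treated unit in both numerator and denominator; that variant is not what the text above describes.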
Market cap loss estimate: (exp(CATT) − 1) × pre-breach market cap — approximates the dollar value of the counterfactual gap.
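
As a worked sketch of this formula (function name and input figures are hypothetical, not taken from the results):

```python
import math


def market_cap_loss_usd(catt, pre_breach_market_cap_usd):
    """Convert a cumulative log-return gap into a dollar amount.

    exp(catt) - 1 turns the log-return gap into a simple percentage change,
    which is then applied to the pre-breach market capitalisation.
    A negative result is a loss.
    """
    return (math.exp(catt) - 1.0) * pre_breach_market_cap_usd


# Hypothetical example: a CATT of -0.10 on a $10 billion company
# corresponds to a change of roughly -$0.95 billion.
example = market_cap_loss_usd(-0.10, 10e9)
```

Note that a log-return gap of -0.10 is a 9.5% decline rather than 10%, which is why the exponential is applied before scaling.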
Donor contamination exclusion: Companies with their own breach during the study window (as recorded in breaches.db) are excluded from the donor pool.
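
The exclusion rule might look like the sketch below. The function names and the mapping of tickers to breach dates are hypothetical. Partial dates (YYYY or YYYY-MM) are resolved to the first day of the period, which is a simplifying assumption: a year-only breach date just before the window starts would not be caught.

```python
from datetime import date


def parse_partial_date(text):
    """Parse YYYY, YYYY-MM or YYYY-MM-DD, defaulting missing parts to 1."""
    parts = [int(p) for p in text.split("-")]
    parts += [1] * (3 - len(parts))
    return date(*parts)


def clean_donors(candidates, breach_dates_by_ticker, window_start, window_end):
    """Drop candidate donors that have a recorded breach inside the study window."""
    clean = []
    for ticker in candidates:
        dates = breach_dates_by_ticker.get(ticker, [])
        if any(window_start <= parse_partial_date(d) <= window_end for d in dates):
            continue  # contaminated: this donor was itself treated in the window
        clean.append(ticker)
    return clean
```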
- Requires ≥ 30 donor companies with pre-treatment data for reliable permutation tests
- Low pre-RMSPE (good synthetic fit) is a prerequisite for valid causal inference
- Non-US / delisted tickers may lack Yahoo Finance history
- Yahoo Finance applies IP-based rate limiting — the scripts include exponential backoff (4 retries); if rate-limited, wait 5–10 minutes before re-running
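
For reference, exponential backoff with 4 retries can be sketched as below. This is not the code in the scripts: the helper name, the 1-second base delay, and the broad `except Exception` are assumptions, and a real downloader would catch only the specific rate-limit error.

```python
import time


def with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**attempt, then retry.

    Makes at most retries + 1 calls and re-raises the last error if all fail.
    The `sleep` parameter exists so tests can substitute a fake.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

With the defaults shown, the waits between attempts are 1, 2, 4, and 8 seconds.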
| File | Purpose |
|---|---|
| `db/add_tickers.py` | Migration: adds ticker, synth_catt, synth_p_value, synth_market_cap_loss_usd columns |
| `analysis/synth_control.py` | Main SCM pipeline: curated donor pools, constrained weight optimisation, permutation tests, plots |
| `analysis/car_check.py` | Short-window CAPM event study: OLS beta estimation, CAR computation, t-test |
| `analysis/donor_pool.py` | Dynamic donor pool builder: SIC lookup via SEC EDGAR + market-cap filter |
| `requirements.txt` | Python dependencies |