epstein-forensic-finance

Epstein Financial Forensics

Forensic financial reconstruction from 1.48 million DOJ EFTA documents + 503K cataloged media items

License: CC BY 4.0

Questions related to the project

project@isipp.com

What This Is

I took 1.48 million documents the DOJ released under the Epstein Files Transparency Act and built a forensic financial database from scratch. Wrote all the extraction code. Designed the schema. Built the classification pipeline. Ran the analysis. Solo project, start to finish. AI tools helped me write code faster — same way you’d use a calculator. The analytical calls are mine.

My background is multi-affiliate financial reconciliation, budget variance analysis, and automated exception reporting at institutional scale. I applied those same methods here.

Far as I can tell, nobody else has tried to reconstruct the complete financial infrastructure in the EFTA corpus using quantitative forensic methods. Plenty of good narrative work out there. Plenty of search engines. This is the first attempt to model the full network — fund flows, entity relationships, shell trust hierarchies — at scale.

For the girls.


📌 Start Here

Two data narratives reconstruct how $2.146 billion moved through 14 shell entities across 8+ banking institutions. Every claim is anchored to specific court exhibits and Bates stamps.

Read the Data Narratives · The Verification Wall (N20) · Blueprint of a Financial Machine (N19) · Explore the Interactive Network · View the Forensic Workbook

| # | Narrative | Key Finding |
|---|---|---|
| 19 | Blueprint of a Financial Machine | $2.146B, 123 nodes, 313 edges. Every bank, shell, operator, and key person mapped. Visualization |
| 20 | The Verification Wall | **Every document has a Bates number; the wall tests what’s behind it. 8 noise POIs ($144.4M, $0 bank docs) vs. Leon Black ($310.5M, 42 verified wires, 15 bank docs). NLP phantom autopsy with clickable EFTA source documents.** |

The Database

**8.64 GB · 43 tables · 26.7 million rows · 19 datasets**
Metric This Project Largest Narrative Repo Largest Search Platform Others
Total files indexed 1,476,377 + 503K media 1,380,937 1,120,000 < 20,000
Datasets covered 19 (DS1-12 + DS98-104) 12 12 1-3
Extracted text records 2.87M (page-level) 993,406 pages
Entity extraction (NLP) 11.4M entities ~4,000 curated 1,589 manual < 500
Unique persons identified 734,125 1,536 registry 1,589
Financial transactions modeled 81,451 (tiered) + 23,832 (directional) ~186 normalized 0 0
Directional fund flows (A→B) 23,832 qualitative 0 0
Wire transfers in master ledger 481 (Phase 5I audited) 0 0 0
Relational database tables 43 3-4
Confidence-tiered scoring ✅ 5-axis
Redaction proximity analysis ✅ (different method)
SAR cross-validation 104.4%
Multi-phase dedup pipeline ✅ 3-stage evolution
Shell hierarchy mapping ✅ 4-tier

Note: The largest narrative repo counts individual pages as records — their unique PDF file count is ~519,548. My 1,476,377 are unique files, each with a distinct DOJ URL or registered serial. The 503,154 media items are separately cataloged from DS10 evidence photos and videos. Other projects in this space are doing solid work — narrative forensic reporting, searchable archives, community preservation. My lane is systematic financial reconstruction at scale.


Headline Results

⚠️ All findings are navigational tools derived from automated extraction. Not independently verified. Not established fact. See COMPLIANCE.md for full professional standards disclaimers.

| Metric | Value |
|---|---|
| Publication Ledger Total | $2,146,000,000 (10,964 unique transactions) |
| FinCEN SAR Benchmark | $1,878,000,000 |
| T1–T3 Coverage of SAR | 104.4% ($1,960,600,000) |
| Payment Types Classified | 10 |
| Wire Transfers in Master Ledger | 481 |
| Unique Entities (Entity-Resolved) | 228 |
| Bank Coverage | 14 banks |
| Bates Number Coverage | 51% |
| Shell-to-Shell Transfers Identified | 43 |
| Shell Trust Hierarchy Tiers Mapped | 4 |
| Contamination Bugs Caught & Fixed | 9 |

Four-Tier GAGAS-Aligned Confidence Framework

| Tier | Classification | Amount | % of Total |
|---|---|---|---|
| T1 | Epstein-Controlled Entities | $1,610,000,000 | 75.0% |
| T2 | Known Associates | $343,000,000 | 16.0% |
| T3 | Extended Network | $7,600,000 | 0.4% |
| T4 | Unclassified | $185,000,000 | 8.6% |
| T1–T3 | Auditable Subtotal | $1,960,600,000 | 104.4% of SAR |
| Total | Publication Ledger | $2,146,000,000 | |

Why T1–T3 Exceeds 100%

The SAR benchmark ($1.878B) only counts transactions banks flagged as suspicious. The EFTA corpus has the complete financial record — including legitimate stuff like Sotheby’s auction proceeds ($11.2M), Tudor Futures returns ($12.8M), Kellerhals law firm settlements ($23M), and Blockchain Capital VC investments ($10.5M). Total financial flows should exceed the suspicious subset. That’s just how it works: SAR ⊂ Total Financial Activity. T4 (Unclassified) gets excluded from the SAR comparison because those transactions don’t have enough entity resolution to classify.
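The 104.4% figure can be reproduced from the tier amounts and the SAR benchmark listed in this README. A minimal sketch of that arithmetic follows; the function and variable names are illustrative and not taken from the project's code.

```python
# Tier amounts and SAR benchmark as published in this README
# (all are unverified automated extractions).
TIERS = {
    "T1": 1_610_000_000,
    "T2": 343_000_000,
    "T3": 7_600_000,
    "T4": 185_000_000,
}
SAR_BENCHMARK = 1_878_000_000


def sar_coverage(tiers, benchmark, excluded=("T4",)):
    """Return (auditable subtotal, coverage ratio), leaving out excluded tiers."""
    subtotal = sum(amount for tier, amount in tiers.items() if tier not in excluded)
    return subtotal, subtotal / benchmark


subtotal, ratio = sar_coverage(TIERS, SAR_BENCHMARK)
print(f"T1-T3 subtotal: ${subtotal:,}")
print(f"Coverage of SAR benchmark: {ratio:.1%}")
```

Adding T4 back gives $2,145,600,000, which rounds to the $2.146B publication ledger total.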


The Money Circuit: 4-Tier Trust Hierarchy

Full annotated flow diagram: NETWORK.md

TIER 1 — HOLDING TRUSTS (received external deposits)
  Southern Trust Company Inc.        $151.5M in  ← Black, Rothschild, Narrow Holdings
  The 2017 Caterpillar Trust          $15.0M in  ← Blockchain Capital

TIER 2 — DISTRIBUTION TRUSTS (redistributed internally)
  The Haze Trust (DBAGNY)             $49.7M out → Southern Financial, Southern Trust
  The Haze Trust (Checking)           $21.8M in  ← Sotheby's, Christie's
  Southern Financial LLC              $14.0M in  ← Tudor Futures
  Southern Financial (Checking)       $32.0M in  ← Haze Trust

TIER 3 — OPERATING SHELLS (paid beneficiaries)
  Jeepers Inc. (DB Brokerage)         $51.9M out → Epstein personal account (21 wires)
  Plan D LLC                          $18.0M out → Leon Black (4 wires)
  Gratitude America MMDA               $6.3M out → Morgan Stanley, charities
  Richard Kahn (attorney)              $9.3M out → Paul Morris, others
  NES LLC                              $554K out → Ghislaine Maxwell

TIER 4 — PERSONAL ACCOUNTS (terminal destinations)
  Jeffrey Epstein NOW/SuperNow        $83.4M in  ← Jeepers, Kellerhals, law firms
  Darren Indyke (estate attorney)      $6.4M in  ← Deutsche Bank

All amounts are (Unverified) automated extractions. See FINDINGS.md for details.
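One way to work with a hierarchy like this is as a list of directed edges, then roll them up into per-entity inflow, outflow, and net position. The sketch below uses only the single-counterparty lines from the diagram above. The edge list, the function name, and the rollup are illustrative assumptions, not the project's actual schema or code.

```python
from collections import defaultdict

# (source, destination, amount in USD), copied from single-counterparty lines
# in the hierarchy above. All amounts are unverified automated extractions.
EDGES = [
    ("Blockchain Capital", "The 2017 Caterpillar Trust", 15_000_000),
    ("Tudor Futures", "Southern Financial LLC", 14_000_000),
    ("The Haze Trust", "Southern Financial (Checking)", 32_000_000),
    ("Jeepers Inc.", "Epstein personal account", 51_900_000),
    ("Plan D LLC", "Leon Black", 18_000_000),
    ("NES LLC", "Ghislaine Maxwell", 554_000),
]


def entity_totals(edges):
    """Sum inflow, outflow, and net position per entity from directed edges."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for src, dst, amount in edges:
        outflow[src] += amount
        inflow[dst] += amount
    return {
        name: {
            "in": inflow.get(name, 0),
            "out": outflow.get(name, 0),
            "net": inflow.get(name, 0) - outflow.get(name, 0),
        }
        for name in sorted(set(inflow) | set(outflow))
    }


totals = entity_totals(EDGES)
for name, t in totals.items():
    print(f"{name:32s} in ${t['in']:>12,}  out ${t['out']:>12,}  net ${t['net']:>13,}")
```

An edge list keeps direction explicit, so the same structure can feed both a network diagram and an entity-level inflow/outflow summary.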

Money Flow Direction Analysis

| Direction | Wires | Amount (Unverified) | Share |
|---|---|---|---|
| MONEY IN: External → Epstein entities | 91 | $232,538,043 | 41.7% |
| INTERNAL MOVE: Shell → Shell reshuffling | 39 | $112,610,112 | 20.2% |
| PASS-THROUGH: Attorney/trust administration | 130 | $72,433,003 | 13.0% |
| MONEY OUT: Epstein entities → External | 51 | $63,266,349 | 11.3% |
| BANK → SHELL: Custodian disbursements | 27 | $53,717,045 | 9.6% |
| Other (Shell→Bank, Interbank, External→Bank) | 44 | $23,504,429 | 4.2% |
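The Share column is each direction's amount divided by the sum of all six rows. A short sketch that recomputes it from the figures above; the dictionary layout and names are illustrative only.

```python
# (wire count, amount in USD) per direction, copied from the table above.
FLOWS = {
    "MONEY IN": (91, 232_538_043),
    "INTERNAL MOVE": (39, 112_610_112),
    "PASS-THROUGH": (130, 72_433_003),
    "MONEY OUT": (51, 63_266_349),
    "BANK -> SHELL": (27, 53_717_045),
    "Other": (44, 23_504_429),
}

total_wires = sum(wires for wires, _ in FLOWS.values())
total_amount = sum(amount for _, amount in FLOWS.values())

# Share of total dollar volume per direction, formatted to one decimal place.
shares = {name: f"{amount / total_amount:.1%}" for name, (_, amount) in FLOWS.items()}

print(f"{total_wires} wires, ${total_amount:,}")
for name, share in shares.items():
    print(f"{name:14s} {share}")
```

The rows sum to 382 wires and $558,068,981. That wire count matches the 382-wire ledger from Phases 14-24.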

SAR Benchmark (Public Record)

| Bank | Reported SARs |
|---|---|
| JPMorgan Chase | ~$1.1B (4,700+ transactions) |
| Deutsche Bank | ~$400M |
| Bank of New York Mellon | ~$378M |
| Total known SARs | $1.878B |

Sources: U.S. Senate Permanent Subcommittee on Investigations; NYDFS Consent Order (2020); JPMorgan USVI Settlement (2023)


Database Schema (43 Tables)

Full architecture diagram: SCHEMA.md

Not a search index. A relational forensic database. 8.64 GB, 43 tables, 26.7 million rows.

Financial Analysis (13 tables)

Entity Intelligence (3 tables)

Redaction Analysis (3 tables)

Corpus Infrastructure (4 tables)

External Cross-Reference — FAA Aviation (7 tables)

External Cross-Reference — ICIJ Offshore Leaks (6 tables)


Pipeline Architecture

Phase 1    DOJ EFTA Scraper + Community Gap-Fill → 1.48M files + 503K media registered
Phase 2    Download & Verify → local corpus with integrity checks
Phase 3    Extract, Classify & Enrich → text, doc types, dates
Phase 3B   Entity Extraction (spaCy NLP) → 11.4M entities, 734K persons
Phase 5A   Person-of-Interest Network → news-filtered, multi-source scoring
Phase 5B   Operational Cost Model → confidence-tiered financial extraction
Phase 5C   Entity-to-Entity Fund Flows → directional A→B with 5-axis scoring
Phase 5D   Payment-Travel-Victim Correlation → temporal pattern analysis
Phase 5E   Redaction Map → navigational tool for document analysis
Phases 14-24  Wire Transfer Extraction Pipeline → 382-wire ledger, $1.964B
Phase 5I   Entity Resolution & Bank Expansion → 481-wire ledger, $973M entity-resolved, 14-bank coverage
Phase 5J   Multi-Bank Statement Parser → 1,202 verified transactions from 13 banks
Phase 5K   Payment Type Expansion → CHIPS, SWIFT, checks, bank statements beyond wire transfers
Phase 5L   Publication Ledger Assembly → 10,964 unique transactions, $2.146B, four-tier GAGAS framework

Financial Extraction Pipeline (Phases 14–5L)

| Phase | What Happened | Impact |
|---|---|---|
| 14.5-15 | Known entity fund flows + wire indicators | +$105M |
| 16.1-16.2 | Transaction-line parser + round-wire extractor | +$83M |
| 17-18 | Trust transfers + full category sweep | +$17M |
| 19 | Self-dedup bug fix (table checking against itself) | +$60M recovered |
| 20-21 | Verified wires + STRONG/MODERATE new amounts | +$63M |
| 22 | Forensic scrub: chain-hop inflation removed | -$311M removed |
| 23 | Date-aware census (same amount, different dates) | +$189M recovered |
| 24 | Above-cap verified wires + bank custodian audit | +$121M / -$113M |
| 25 | Date recovery from source context fields | +75 dates (31.9%→51.6%), 0 collisions |
| 5I | Entity resolution: 481 wires, 228 entities, 14 banks, 51% Bates coverage | $973M entity-resolved |
| 5J | Multi-bank statement parser: 1,202 transactions from 13 institutions | +$430K verified statements |
| 5K | Payment type expansion: CHIPS, SWIFT, checks, bank statements | 10 payment types classified |
| 5L | Publication ledger: 10,964 unique transactions, four-tier GAGAS, T1–T3 = 104.4% SAR | $2.146B total |

Full phase-by-phase details: METHODOLOGY.md
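Phase 23 recovered records that share an amount but fall on different dates. The sketch below shows the general idea of date-aware deduplication: keying on (amount, date) instead of amount alone. The record layout, field names, and sample records are hypothetical, made up for illustration; the actual pipeline is not published and may work differently.

```python
from datetime import date


def dedup(records, date_aware=True):
    """Keep the first record per key: (amount, date) if date_aware, else amount only."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec["amount_cents"], rec["date"]) if date_aware else rec["amount_cents"]
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Hypothetical sample records, not taken from the ledger.
records = [
    {"id": "a", "amount_cents": 50_000_000, "date": date(2015, 3, 2)},
    {"id": "b", "amount_cents": 50_000_000, "date": date(2015, 3, 2)},   # true duplicate of "a"
    {"id": "c", "amount_cents": 50_000_000, "date": date(2016, 7, 19)},  # same amount, different date
]

amount_only = [r["id"] for r in dedup(records, date_aware=False)]
with_dates = [r["id"] for r in dedup(records, date_aware=True)]
print(amount_only)  # only "a" survives
print(with_dates)   # "a" and "c" survive
```

Amount-only matching collapses record "c" into "a" even though it is a separate payment. Adding the date to the key keeps it.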


Financial Methodology: 5-Axis Forensic Scoring

Every financial record gets scored independently across five axes:

| Axis | Weight | What It Measures |
|---|---|---|
| Context Language | ×3 | Transaction vocabulary (wire, routing, SWIFT) vs. noise (lawsuit, net worth) |
| Amount Specificity | ×1 | $2,473,891.55 scores high; $10,000,000.00 exactly scores low |
| Date Presence | ×1 | Full date > year only > no date |
| Entity Quality | ×2 | 28 known banks, 64 financial actors, 71+ garbage entity exclusions |
| Source Document Type | ×1 | Financial/spreadsheet > email > general document |
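To make the weighting concrete, here is a minimal sketch that combines five per-axis scores using the weights listed above. The weights come from this README. Everything else is an assumption for illustration: the 0-to-1 scale for each axis, the weighted-mean combination, and the round-number thresholds in `amount_specificity`.

```python
from decimal import Decimal

# Axis weights as listed in this README.
WEIGHTS = {"context": 3, "amount": 1, "date": 1, "entity": 2, "source": 1}


def amount_specificity(amount: Decimal) -> float:
    """Hypothetical heuristic: round amounts score low, specific amounts score high."""
    cents = int(amount * 100)
    if cents % 100_000_000 == 0:  # exact multiple of $1,000,000
        return 0.0
    if cents % 100_000 == 0:      # exact multiple of $1,000
        return 0.5
    return 1.0


def weighted_score(axis_scores: dict) -> float:
    """Weighted mean of per-axis scores, each assumed to lie in [0, 1]."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[axis] * axis_scores[axis] for axis in WEIGHTS) / total_weight


example = {
    "context": 1.0,  # strong transaction vocabulary
    "amount": amount_specificity(Decimal("2473891.55")),
    "date": 1.0,     # full date present
    "entity": 0.5,   # partially resolved entity
    "source": 1.0,   # financial/spreadsheet document
}
score = weighted_score(example)
print(score)
```

With these weights, Context Language alone carries three eighths of the total, so a record with weak transaction vocabulary cannot score high on amount and date alone.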

Classification Tiers:

Validation: v6.2 spot-check hit 93% accuracy on top-30 PROVEN transactions (28/30), with 0% balance contamination (down from 47% in v5).


GAP Analysis

What’s Still Missing

| Gap Source | Estimable? | Reason |
|---|---|---|
| WEAK/VERY_WEAK tier exclusions | Yes: $5M-$15M | $991M excluded as low-confidence; manual review of top entries could recover $5-15M |
| Sealed/withheld documents | No | Court-sealed records inaccessible to EFTA; dollar value unknown |
| Attempted vs. completed transactions | No | SARs count attempted; I extract completed only; gap is real but unquantifiable |
| Destroyed pre-retention records | No | Bank retention policies may have purged records; unquantifiable |
| Cross-bank SAR duplication | No (directional) | Same wire triggering SARs at both banks inflates the benchmark, which reduces the gap |

One gap has a credible dollar estimate ($5-15M in excluded tiers). The rest are real information gaps with unknown values. I’m not putting specific ranges on things I can’t measure.


Data Narratives

Read all Data Narratives

| # | Title | Key Finding | Data Scope |
|---|---|---|---|
| 19 | Blueprint of a Financial Machine | Season 1 finale. $2.146B, 123 nodes, 313 edges. Full network mapped. Visualization | 10,964 txns · 123 nodes · $2.146B |
| 20 | The Verification Wall | Season 2, Narrative 20. Every document has a Bates number; the wall tests what’s behind it. 8 noise POIs ($144.4M claimed, Bates stamps → news/court filings, $0 bank docs) vs. Leon Black ($310.5M, 42 verified wires, 15 bank docs). NLP phantom autopsy with clickable EFTA source documents. | 9 POIs · 15 bank docs · $310.5M verified |

Source workbook: Forensic Workbook · Interactive Shell Network

Every claim anchored to specific wire transfers, entity classifications, and court exhibit references from the master ledger.

Repository Contents

├── README.md                              ← You are here
├── docs/
│   ├── METHODOLOGY.md                     ← 25-phase pipeline, 9 bugs, 5-axis scoring, limitations
│   ├── FINDINGS.md                        ← GAP analysis, 8 key discoveries, recommendations
│   ├── COMPLIANCE.md                      ← Professional standards, GAAS conformance, legal disclaimers
│   ├── SCHEMA.md                          ← Database architecture diagram
│   ├── NETWORK.md                         ← Trust network flow diagram
│   └── SOURCE_APPENDIX_TEMPLATE.md        ← Standard template for source appendices
├── narratives/                            ← 2 forensic data narratives with source appendices
├── data/
│   ├── publication_ledger_phase5l.json    ← 10,964 transactions, four-tier (publication dataset)
│   ├── master_wire_ledger_phase5i.json    ← 481 wires (wire-specific subset)
│   └── entity_classification.json         ← Entity → type mapping (228 entities)
├── visualizations/                        ← Interactive shell network diagram
└── tools/
    ├── narrative_sql_tools.py             ← SQL query functions for all 19 narrative data sources
    ├── linkify_efta.py                    ← Auto-link EFTA IDs → DOJ PDFs in .md files
    ├── convert_links_new_tab.py           ← Convert external links to target="_blank"
    ├── inject_efta_source_table.py        ← Add source document tables to narratives
    └── append_source_appendices.py        ← Append source appendices to narratives

Visual Guides

Forensic Workbook v9

| Tab | Name | Description |
|---|---|---|
| 1 | Executive Summary | Headline $2.146B, four-tier GAGAS framework, publication ledger |
| 2 | Extraction Phases | Full pipeline with running totals, bug fixes color-coded |
| 3 | Money Flow Patterns | Every wire classified: MONEY IN / INTERNAL MOVE / MONEY OUT |
| 4 | Shell Trust Hierarchy | 4-tier network with actual dollar flows per entity |
| 5 | Master Wire Ledger | 481 wires with flow direction, entity types, recovery flags |
| 6 | Above-Cap Verified | Court-verified wires above $10M ($120.6M) |
| 7 | Date Recovery | Same-amount different-date analysis (95 Phase 23 + 75 Phase 25 recoveries) |
| 8 | Entity P&L | 228 entities with inflow/outflow/net, shell flags |
| 9 | Shell Network | Shell-involved wires, 43 shell-to-shell |
| 10 | SAR Comparison | Bank-by-bank vs FinCEN benchmarks |
| 11 | Methodology | 9 bugs documented, data sources, 10 limitations |
| 12 | Bank Coverage | 14 banks mapped with wire counts and volumes |
| 13 | Entity Resolution | 228 canonical entities with alias mapping |
| 14 | Bates Index | 51% Bates coverage with exhibit cross-references |

What Makes This Different

I didn’t read the documents. I audited the money.

Other projects build search engines, write narrative reports, or create browsable archives. Good work, all of it. I took a different approach. I applied the same methodology I use professionally — multi-affiliate reconciliation, exception reporting, variance analysis, confidence tiering — and pointed it at the EFTA corpus.

The question isn’t “what do the documents say?” It’s: “Where did the money go, who moved it, and what did the DOJ redact around it?”


Why Findings Only — No Source Code or Database

This repo publishes methodology, findings, and summary data. The source code, database, and raw extraction pipeline are not included. That’s intentional.

The master wire ledger (481 wires) and entity classification data are published in full in the data/ directory. Those are the final audited outputs and they’re sufficient for independent verification of everything published here.


Author

Randall Scott Taylor · Data Scientist

BS Network & Cyber Security, Wilmington University · MS Applied Data Science, Syracuse University

I built this project. Every line of extraction code, every database table, every classification rule, every phase of the pipeline. AI tools (Claude, Anthropic) helped me write code faster. The analytical judgments, methodology design, and forensic interpretations are mine.

Background: multi-affiliate financial reconciliation, automated classification and exception reporting systems, large-scale data operations.


Ethical Standards


Disclaimer

This analysis does not constitute an audit, examination, or review performed in accordance with GAAS, GAGAS, or AICPA SSFS No. 1. See COMPLIANCE.md for details.

All financial amounts are (Unverified) automated extractions unless explicitly noted otherwise. Entity classifications are based on OCR text extraction with automated normalization and may contain errors. Shell entity designations are analytical classifications, not legal determinations.


Citation

Taylor, R.S. (2026). Epstein Financial Forensics: Automated forensic financial
reconstruction from 1.48 million DOJ EFTA documents. GitHub.
https://github.com/randallscott25-star/epstein-forensic-finance#readme

License

This work is licensed under Creative Commons Attribution 4.0 International.

The underlying DOJ documents are U.S. government publications in the public domain. This repository contains only metadata, extracted analysis, and methodology — no copyrighted source material is reproduced.


Project Timeline

| Date | Milestone |
|---|---|
| Feb 7, 2026 | Project started: DOJ scraper built, first dataset indexed |
| Feb 8 | DS11 (76,969 financial ledgers) fully scraped |
| Feb 10 | 633,842 files indexed; published to GitHub and Archive.org |
| Feb 12 | Phase 3 text extraction complete (513K files) |
| Feb 14 | Entity extraction (3B) launched; 565K files queued |
| Feb 15 | Corpus expanded to 1.48M files + 503K media with DS10 + community gap-fill |
| Feb 16 | Phase 5 financial analysis chain operational |
| Feb 18 | 19 datasets online (DS1-12 + DS98-DS104) |
| Feb 20 | Fund flows audit v6.2: $1.43B in P+S transactions, 39% SAR coverage |
| Feb 21 | Wire extraction pipeline (Phases 14-24): $1.964B, 104.6% SAR coverage |
| Feb 21 | Forensic workbook v6.1 published (11 tabs, 382-wire master ledger) |
| Feb 21 | Phase 25: Date recovery from context fields, 75 dates (31.9%→51.6%), 0 collisions (credit: u/miraculum_one) |
| Feb 22 | Repository made public. 17 Data Narratives published. 30 GitHub stars in 5 hours |
| Feb 24 | Phase 5I: 481 wires, $973M entity-resolved, 228 entities, 14-bank coverage, 51% Bates |
| Feb 24 | Workbook v7 published (14 tabs). Full database audit: 33 tables, 8.03GB, 26.6M rows |
| Feb 25 | Phase 5J: Multi-bank statement parser. 1,202 verified transactions from 13 banks ($430K) |
| Feb 25 | Workbook v8 (19 tabs). N18 published. JSON v26 community dataset. |
| Feb 25 | Phase 5K–5L: Payment type expansion + publication ledger assembly. 10,964 unique transactions, $2.146B, four-tier GAGAS framework |
| Feb 25 | Workbook v9. 19 data narratives live. |
| Feb 26 | N19: Blueprint of a Financial Machine, season 1 finale. 123 nodes, 313 edges, full $2.146B corpus mapped. Timeline v9 with 69 vetted persons of interest. |
| Feb 27 | N20: The Verification Wall, season 2 opener. Bates distinction framework. 8 noise POIs ($144.4M → $0 bank docs) vs. Leon Black ($310.5M, 42 wires, 15 bank docs). Clickable EFTA source documents. |
| Ongoing | Additional data narratives and follow-on analysis |