Forensic financial reconstruction from 1.48 million DOJ EFTA documents + 503K cataloged media items
I took 1.48 million documents the DOJ released under the Epstein Files Transparency Act and built a forensic financial database from scratch. Wrote all the extraction code. Designed the schema. Built the classification pipeline. Ran the analysis. Solo project, start to finish. AI tools helped me write code faster, the same way you'd use a calculator. The analytical calls are mine.
My background is multi-affiliate financial reconciliation, budget variance analysis, and automated exception reporting at institutional scale. I applied those same methods here.
Far as I can tell, nobody else has tried to reconstruct the complete financial infrastructure in the EFTA corpus using quantitative forensic methods. Plenty of good narrative work out there. Plenty of search engines. This is the first attempt to model the full network (fund flows, entity relationships, shell trust hierarchies) at scale.
For the girls.
20 data narratives reconstruct how $2.146 billion moved through 14 shell entities across 8+ banking institutions. Every claim is anchored to specific court exhibits and Bates stamps.
→ Read the Data Narratives · The Verification Wall (N20) · Blueprint of a Financial Machine (N19) · One-Way Money (N17) · Explore the Interactive Network · View the Forensic Workbook
| # | Narrative | Key Finding |
|---|---|---|
| 20 | The Verification Wall | Season 2 opener. Every document has a Bates number; the wall tests what's behind it. 8 noise POIs ($144.4M, $0 bank docs) vs. Leon Black ($310.5M, 42 verified wires, 15 bank docs). NLP phantom autopsy with clickable EFTA source documents. |
| 1 | The Jeepers Pipeline | $57.9M brokerage shell → personal checking, every wire dated |
| 2 | Art Market as Liquidity Channel | Sotheby's + Christie's proceeds entered through Haze Trust |
| 3 | The Plan D Question | $18M out to Leon Black, near-zero inflow: where did it come from? |
| 4 | Chain-Hop Anatomy | 4-tier shell network mapped, $311M double-counting removed |
| 5 | Deutsche Bank's Role | 38 wires, 75% of volume in the last 6 months; DB ranks 3rd by volume |
| 6 | Gratitude America | 88% to investments, 7% to charity: a "charity" that isn't one |
| 7 | Follow the Money, Follow the Plane | Wire-flight temporal correlation analysis; financial events cluster with flight activity across 18 years |
| 8 | The Infrastructure of Access | The people who moved the money are the people victims named |
| 9 | 734,122 Names | Every person in 1.48M files scanned. 57 bridgers. No one hiding. |
| 10 | The Round Number Problem | Benford's Law fails: 84.3% exact round numbers. One decision-maker. |
| 11 | The Shell Map | 14 shells, 8 banks. Bear Stearns has 5.7× more money mentions than Deutsche Bank. |
| 12 | The Bank Nobody Prosecuted | Bear Stearns: 5.7× Deutsche Bank's money mentions, zero enforcement action |
| 13 | Seven Banks, One Trust | Outgoing Money Trust used 7 banks for disbursement: textbook structuring |
| 14 | Where Leon Black's Money Went | 1,600 files. Every shell. $60.5M in, Apollo Management out the other side |
| 15 | Gratitude America: The Charity That Invested | Tax-exempt charity routing $2M-$20M to Boothbay, Honeycomb, Valar, Coatue |
| 16 | The Accountant | Richard Kahn / HBRK Associates: 18,833 emails, 11,153 files, touches every shell |
| 17 | One-Way Money | $272M in. $63M out. First multi-institution balance sheet. Visualization |
| 18 | Offshore Architecture: The Brunel-BVI-ICIJ Bridge | DOJ subpoena names BVI shell. ICIJ confirms. Scouting International: Tortola, 2003, defunct. 172 docs, 3 databases cross-referenced. |
| 19 | Blueprint of a Financial Machine | Season 1 finale. $2.146B, 123 nodes, 313 edges. Every bank, shell, operator, and key person mapped. Visualization |
**8.64 GB · 43 tables · 26.7 million rows · 19 datasets**
| Metric | This Project | Largest Narrative Repo | Largest Search Platform | Others |
|---|---|---|---|---|
| Total files indexed | 1,476,377 + 503K media | 1,380,937 | 1,120,000 | < 20,000 |
| Datasets covered | 19 (DS1-12 + DS98-104) | 12 | 12 | 1-3 |
| Extracted text records | 2.87M (page-level) | 993,406 pages | n/a | n/a |
| Entity extraction (NLP) | 11.4M entities | ~4,000 curated | 1,589 manual | < 500 |
| Unique persons identified | 734,125 | 1,536 registry | 1,589 | n/a |
| Financial transactions modeled | 81,451 (tiered) + 23,832 (directional) | ~186 normalized | 0 | 0 |
| Directional fund flows (A→B) | 23,832 | qualitative | 0 | 0 |
| Wire transfers in master ledger | 481 (Phase 5I audited) | 0 | 0 | 0 |
| Relational database tables | 43 | 3-4 | n/a | n/a |
| Confidence-tiered scoring | ✅ 5-axis | ❌ | ❌ | ❌ |
| Redaction proximity analysis | ✅ | ✅ (different method) | ❌ | ❌ |
| SAR cross-validation | ✅ 104.4% | ❌ | ❌ | ❌ |
| Multi-phase dedup pipeline | ✅ 3-stage evolution | ❌ | ❌ | ❌ |
| Shell hierarchy mapping | ✅ 4-tier | ❌ | ❌ | ❌ |
Note: The largest narrative repo counts individual pages as records; their unique PDF file count is ~519,548. My 1,476,377 are unique files, each with a distinct DOJ URL or registered serial. The 503,154 media items are separately cataloged from DS10 evidence photos and videos. Other projects in this space are doing solid work: narrative forensic reporting, searchable archives, community preservation. My lane is systematic financial reconstruction at scale.
⚠️ All findings are navigational tools derived from automated extraction. Not independently verified. Not established fact. See COMPLIANCE.md for full professional standards disclaimers.
| Metric | Value |
|---|---|
| Publication Ledger Total | $2,146,000,000 (10,964 unique transactions) |
| FinCEN SAR Benchmark | $1,878,000,000 |
| T1-T3 Coverage of SAR | 104.4% ($1,960,600,000) |
| Payment Types Classified | 10 |
| Wire Transfers in Master Ledger | 481 |
| Unique Entities (Entity-Resolved) | 228 |
| Bank Coverage | 14 banks |
| Bates Number Coverage | 51% |
| Shell-to-Shell Transfers Identified | 43 |
| Shell Trust Hierarchy Tiers Mapped | 4 |
| Contamination Bugs Caught & Fixed | 9 |
| Tier | Classification | Amount | % of Total |
|---|---|---|---|
| T1 | Epstein-Controlled Entities | $1,610,000,000 | 75.0% |
| T2 | Known Associates | $343,000,000 | 16.0% |
| T3 | Extended Network | $7,600,000 | 0.4% |
| T4 | Unclassified | $185,000,000 | 8.6% |
| T1-T3 | Auditable Subtotal | $1,960,600,000 | 104.4% of SAR |
| Total | Publication Ledger | $2,146,000,000 | 100.0% |
The SAR benchmark ($1.878B) only counts transactions that banks flagged as suspicious. The EFTA corpus captures a broader financial record, including legitimate activity like Sotheby's auction proceeds ($11.2M), Tudor Futures returns ($12.8M), Kellerhals law firm settlements ($23M), and Blockchain Capital VC investments ($10.5M). Total financial flows should therefore exceed the suspicious subset. That's just how it works: SAR volume ≠ total financial activity. T4 (Unclassified) is excluded from the SAR comparison because those transactions don't have enough entity resolution to classify.
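The coverage figure is plain arithmetic on the tier table. The short sketch below re-derives the % of Total column and the 104.4% SAR coverage from the published tier amounts. Note that the four tier amounts as printed sum to $2,145,600,000, which rounds to the $2.146B headline. The function names are illustrative only, not part of the project's tooling.

```python
# Tier amounts as published in the tier table (USD, unverified automated extractions).
TIERS = {
    "T1": 1_610_000_000,  # Epstein-controlled entities
    "T2": 343_000_000,    # Known associates
    "T3": 7_600_000,      # Extended network
    "T4": 185_000_000,    # Unclassified: excluded from the SAR comparison
}
SAR_BENCHMARK = 1_878_000_000  # Known FinCEN SAR total


def share_of_total(tiers: dict[str, int]) -> dict[str, float]:
    """Each tier's percentage of the summed ledger, rounded to one decimal."""
    total = sum(tiers.values())
    return {name: round(100 * amount / total, 1) for name, amount in tiers.items()}


def sar_coverage(tiers: dict[str, int], benchmark: int) -> float:
    """Auditable subtotal (every tier except T4) as a percentage of the SAR benchmark."""
    auditable = sum(amount for name, amount in tiers.items() if name != "T4")
    return round(100 * auditable / benchmark, 1)


print(share_of_total(TIERS))               # {'T1': 75.0, 'T2': 16.0, 'T3': 0.4, 'T4': 8.6}
print(sar_coverage(TIERS, SAR_BENCHMARK))  # 104.4
```

Coverage above 100% is what the paragraph above predicts when the ledger includes legitimate, non-flagged activity alongside the suspicious subset.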
Full annotated flow diagram: NETWORK.md
TIER 1: HOLDING TRUSTS (received external deposits)
Southern Trust Company Inc. $151.5M in ← Black, Rothschild, Narrow Holdings
The 2017 Caterpillar Trust $15.0M in ← Blockchain Capital
TIER 2: DISTRIBUTION TRUSTS (redistributed internally)
The Haze Trust (DBAGNY) $49.7M out → Southern Financial, Southern Trust
The Haze Trust (Checking) $21.8M in ← Sotheby's, Christie's
Southern Financial LLC $14.0M in ← Tudor Futures
Southern Financial (Checking) $32.0M in ← Haze Trust
TIER 3: OPERATING SHELLS (paid beneficiaries)
Jeepers Inc. (DB Brokerage) $51.9M out → Epstein personal account (21 wires)
Plan D LLC $18.0M out → Leon Black (4 wires)
Gratitude America MMDA $6.3M out → Morgan Stanley, charities
Richard Kahn (attorney) $9.3M out → Paul Morris, others
NES LLC $554K out → Ghislaine Maxwell
TIER 4: PERSONAL ACCOUNTS (terminal destinations)
Jeffrey Epstein NOW/SuperNow $83.4M in ← Jeepers, Kellerhals, law firms
Darren Indyke (estate attorney) $6.4M in ← Deutsche Bank
All amounts are (Unverified) automated extractions. See FINDINGS.md for details.
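The tier labels above depend on whether an entity mostly receives, redistributes, or pays out money. A minimal sketch of that inflow/outflow/net calculation follows, using Python's built-in sqlite3. It assumes flows are stored one row per movement with entity_from, entity_to, and amount columns, following how the fund_flows table is described in the schema list. The entity names and amounts are hypothetical sample rows, not ledger data, and the project's real schema is not published.

```python
import sqlite3

# Hypothetical sample rows for illustration only; NOT ledger data.
SAMPLE_FLOWS = [
    ("Outside Investor", "Holding Trust A", 100.0),
    ("Holding Trust A", "Operating Shell B", 60.0),
    ("Operating Shell B", "Personal Account C", 55.0),
]


def net_positions(rows):
    """Return {entity: (total_in, total_out, net)} computed in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fund_flows (entity_from TEXT, entity_to TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fund_flows VALUES (?, ?, ?)", rows)
    # Each flow counts once as an inflow for the receiver and once as an
    # outflow for the sender; grouping by entity gives a simple P&L.
    query = """
        SELECT entity,
               SUM(inflow)                AS total_in,
               SUM(outflow)               AS total_out,
               SUM(inflow) - SUM(outflow) AS net
        FROM (
            SELECT entity_to   AS entity, amount AS inflow, 0 AS outflow FROM fund_flows
            UNION ALL
            SELECT entity_from AS entity, 0 AS inflow, amount AS outflow FROM fund_flows
        )
        GROUP BY entity
        ORDER BY net DESC
    """
    result = {name: (t_in, t_out, net) for name, t_in, t_out, net in conn.execute(query)}
    conn.close()
    return result


for entity, (t_in, t_out, net) in net_positions(SAMPLE_FLOWS).items():
    print(f"{entity:20} in={t_in:7.1f} out={t_out:7.1f} net={net:+7.1f}")
```

In this toy data the pure sender ends with a negative net (an external source), the pure receiver has the largest positive net (a terminal destination), and Operating Shell B passes on most of what it receives. The nets across all entities sum to zero because every movement is counted once on each side.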
| Direction | Wires | Amount (Unverified) | Share |
|---|---|---|---|
| MONEY IN: External → Epstein entities | 91 | $232,538,043 | 41.7% |
| INTERNAL MOVE: Shell → Shell reshuffling | 39 | $112,610,112 | 20.2% |
| PASS-THROUGH: Attorney/trust administration | 130 | $72,433,003 | 13.0% |
| MONEY OUT: Epstein entities → External | 51 | $63,266,349 | 11.3% |
| BANK → SHELL: Custodian disbursements | 27 | $53,717,045 | 9.6% |
| Other (Shell→Bank, Interbank, External→Bank) | 44 | $23,504,429 | 4.2% |
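One way to place each wire into the direction categories above is to check the entity type on each side of the transfer. The sketch below assumes a hypothetical set of type labels (external, shell, personal, attorney, bank) and a hypothetical rule order. It is not the project's classification logic. The `shares` helper re-derives the Share column from the published amounts, which sum to $558,068,981.

```python
# Published amounts from the direction table (USD, unverified).
DIRECTION_TOTALS = {
    "MONEY IN": 232_538_043,
    "INTERNAL MOVE": 112_610_112,
    "PASS-THROUGH": 72_433_003,
    "MONEY OUT": 63_266_349,
    "BANK -> SHELL": 53_717_045,
    "Other": 23_504_429,
}

# Hypothetical entity types on the Epstein side of a transfer.
EPSTEIN_SIDE = {"shell", "personal"}


def classify_direction(from_type: str, to_type: str) -> str:
    """Bucket one wire by sender/receiver type (hypothetical rule order)."""
    if "attorney" in (from_type, to_type):
        return "PASS-THROUGH"
    if from_type == "shell" and to_type == "shell":
        return "INTERNAL MOVE"
    if from_type == "bank" and to_type == "shell":
        return "BANK -> SHELL"
    if from_type == "external" and to_type in EPSTEIN_SIDE:
        return "MONEY IN"
    if from_type in EPSTEIN_SIDE and to_type == "external":
        return "MONEY OUT"
    return "Other"


def shares(totals: dict[str, int]) -> dict[str, float]:
    """Percentage share of each bucket, rounded to one decimal."""
    grand_total = sum(totals.values())
    return {k: round(100 * v / grand_total, 1) for k, v in totals.items()}


print(classify_direction("external", "shell"))  # MONEY IN
print(shares(DIRECTION_TOTALS)["MONEY IN"])      # 41.7
```

Because the attorney check runs first, an attorney-to-shell payment lands in PASS-THROUGH rather than MONEY IN. Rule order is a real design decision in any version of this classifier.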
| Bank | Reported SARs |
|---|---|
| JPMorgan Chase | ~$1.1B (4,700+ transactions) |
| Deutsche Bank | ~$400M |
| Bank of New York Mellon | ~$378M |
| Total known SARs | $1.878B |
Sources: U.S. Senate Permanent Subcommittee on Investigations; NYDFS Consent Order (2020); JPMorgan USVI Settlement (2023)
Full architecture diagram: SCHEMA.md
Not a search index. A relational forensic database. 8.64 GB, 43 tables, 26.7 million rows.
Financial Analysis (13 tables)
| Table | Contents |
|---|---|
| publication_ledger | 10,964 deduplicated transactions ($2.146B) with four-tier GAGAS classification (T1-T4), payment type, source exhibit |
| fund_flows | 23,832 directional money movements (entity_from → entity_to, amount, date, confidence) |
| fund_flows_audited | 7,355 classified flows (5-tier: PROVEN/STRONG/MODERATE/WEAK/VERY_WEAK) with FinCEN/ICIJ match flags, composite scoring, entity classification |
| verified_wires | 185 court-exhibit authenticated wire transfers (dates, Bates numbers, exhibits) |
| verified_bank_statements | 1,202 multi-bank statement transactions from 13 institutions with statement dates and balance context |
| financial_hits | 81,451 financial content extractions across 19 categories and 3 verification tiers (C1/C2/C3) |
| financial_redactions | 2,395 recovered dollar amounts near redaction markers with confidence scoring |
| fincen_transactions | 4,507 FinCEN suspicious activity report transaction records |
| fincen_bank_connections | 5,498 bank-to-bank SAR relationship mappings |
| entity_aliases | 186 raw text → canonical name resolution rules |
| entity_roles | 74 classified entities with total inflow, outflow, net position, wire counts, and exhibit references |
| payment_type_registry | 10 classified payment types (wire, CHIPS, SWIFT, bank statement, check, etc.) |
| dedup_audit_log | Deduplication decision trail for publication ledger assembly |
Entity Intelligence (3 tables)
| Table | Contents |
|---|---|
| entities | 11,438,134 extracted entities with NLP classification (PERSON, ORG, GPE, MONEY, NORP, FAC, LOC, LAW) |
| poi_rankings | 2,000 persons of interest scored by multi-axis corpus frequency (file count, financial count, flight count, redaction dollars, direct dollars) |
| evidence_index | 1,077,516 evidentiary chain records linking documents across datasets with Bates numbers, checksums, and source types |
Redaction Analysis (3 tables)
| Table | Contents |
|---|---|
| redaction_recovery | 157,984 content fragments recovered from under redaction overlays (with financial/names/dates flags and interest scoring) |
| redaction_markers | 140,060 systematic redaction position records across corpus |
| redaction_summary | 131,860 aggregated redaction analysis per document |
Corpus Infrastructure (4 tables)
| Table | Contents |
|---|---|
| files | 1,476,377 file records with 30 columns: metadata, classification, dates, extraction status, doc types |
| extracted_text | 2,866,239 page-level text records with classification and extraction method |
| dates_found | 2,411,188 temporal references extracted across entire corpus with context |
| media_evidence | 503,154 DS10 image/video catalog with custodian, doc_type, confidentiality markings |
External Cross-Reference: FAA Aviation (7 tables)
| Table | Contents |
|---|---|
| faa_master | 309,849 active aircraft registrations |
| faa_dereg | 381,869 deregistered aircraft records with cancellation dates |
| faa_acftref | 93,521 aircraft type/model reference |
| faa_engine | 4,743 engine type reference |
| faa_dealer | 12,485 aircraft dealer registrations |
| faa_reserved | 126,504 reserved N-numbers |
| faa_docindex | 11,440 FAA document index |
External Cross-Reference: ICIJ Offshore Leaks (6 tables)
| Table | Contents |
|---|---|
| icij_entities | 814,344 offshore entities from Panama Papers, Paradise Papers, Pandora Papers, and other ICIJ investigations |
| icij_officers | 771,315 officers/directors of offshore entities |
| icij_relationships | 3,339,267 entity relationship records |
| icij_addresses | 402,246 offshore entity addresses worldwide |
| icij_intermediaries | 25,629 shell company formation agents |
| icij_others | 2,989 other offshore entities |
Phase 1 DOJ EFTA Scraper + Community Gap-Fill → 1.48M files + 503K media registered
Phase 2 Download & Verify → local corpus with integrity checks
Phase 3 Extract, Classify & Enrich → text, doc types, dates
Phase 3B Entity Extraction (spaCy NLP) → 11.4M entities, 734K persons
Phase 5A Person-of-Interest Network → news-filtered, multi-source scoring
Phase 5B Operational Cost Model → confidence-tiered financial extraction
Phase 5C Entity-to-Entity Fund Flows → directional A→B with 5-axis scoring
Phase 5D Payment-Travel-Victim Correlation → temporal pattern analysis
Phase 5E Redaction Map → navigational tool for document analysis
Phases 14-24 Wire Transfer Extraction Pipeline → 382-wire ledger, $1.964B
Phase 5I Entity Resolution & Bank Expansion → 481-wire ledger, $973M entity-resolved, 14-bank coverage
Phase 5J Multi-Bank Statement Parser → 1,202 verified transactions from 13 banks
Phase 5K Payment Type Expansion → CHIPS, SWIFT, checks, bank statements beyond wire transfers
Phase 5L Publication Ledger Assembly → 10,964 unique transactions, $2.146B, four-tier GAGAS framework
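Phase 5I's entity resolution depends on rules that map raw extracted text to a single canonical name (the entity_aliases table holds 186 of them). Below is a minimal sketch of that lookup, assuming rules are keyed on a normalized form of the text. The normalization steps and the three alias rules shown are hypothetical; the project's actual rules are not published here.

```python
import re
from typing import Optional

# Hypothetical alias rules keyed on normalized text; NOT the project's 186 rules.
ALIASES = {
    "southern trust co": "Southern Trust Company Inc.",
    "southern trust company inc": "Southern Trust Company Inc.",
    "jeepers inc": "Jeepers Inc.",
}


def normalize(raw: str) -> str:
    """Case-fold, replace punctuation with spaces, and collapse whitespace."""
    text = raw.casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def resolve(raw: str) -> Optional[str]:
    """Return the canonical entity name, or None when no rule matches."""
    return ALIASES.get(normalize(raw))


for raw in ["SOUTHERN TRUST CO.", "Southern Trust Company, Inc.", "Jeepers,  Inc", "Unknown LLC"]:
    print(f"{raw!r:32} -> {resolve(raw)}")
```

Unmatched text returns None instead of a best guess. That lets unresolved names go to manual review rather than being silently merged into the wrong entity.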
| Phase | What Happened | Impact |
|---|---|---|
| 14.5-15 | Known entity fund flows + wire indicators | +$105M |
| 16.1-16.2 | Transaction-line parser + round-wire extractor | +$83M |
| 17-18 | Trust transfers + full category sweep | +$17M |
| 19 | Self-dedup bug fix (table checking against itself) | +$60M recovered |
| 20-21 | Verified wires + STRONG/MODERATE new amounts | +$63M |
| 22 | Forensic scrub: chain-hop inflation removed | -$311M removed |
| 23 | Date-aware census (same amount, different dates) | +$189M recovered |
| 24 | Above-cap verified wires + bank custodian audit | +$121M / -$113M |
| 25 | Date recovery from source context fields | +75 dates (31.9% → 51.6%), 0 collisions |
| 5I | Entity resolution: 481 wires, 228 entities, 14 banks, 51% Bates coverage | $973M entity-resolved |
| 5J | Multi-bank statement parser: 1,202 transactions from 13 institutions | +$430K verified statements |
| 5K | Payment type expansion: CHIPS, SWIFT, checks, bank statements | 10 payment types classified |
| 5L | Publication ledger: 10,964 unique transactions, four-tier GAGAS, T1-T3 = 104.4% SAR | $2.146B total |
Full phase-by-phase details: METHODOLOGY.md
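Two rows in the phase table concern deduplication. Phase 19 fixed a table being checked against itself, and Phase 23 stopped treating same-amount wires on different dates as duplicates. The sketch below shows both failure modes on hypothetical records. The Txn fields and the (amount, date) dedup key are assumptions. The buggy function shows one plausible way a self-check drops every record; it is not the project's actual code.

```python
from datetime import date
from typing import NamedTuple, Optional


class Txn(NamedTuple):
    amount_cents: int         # store money as integer cents to avoid float drift
    txn_date: Optional[date]  # None when no date could be extracted
    source_id: str            # hypothetical: a Bates number or file ID


def dedup_amount_only(txns):
    """Naive dedup: any repeated amount is treated as a duplicate."""
    seen, kept = set(), []
    for t in txns:
        if t.amount_cents not in seen:
            seen.add(t.amount_cents)
            kept.append(t)
    return kept


def dedup_date_aware(txns):
    """Same amount on different dates counts as a distinct transaction."""
    seen, kept = set(), []
    for t in txns:
        key = (t.amount_cents, t.txn_date)
        if key not in seen:
            seen.add(key)
            kept.append(t)
    return kept


def dedup_self_check_bug(txns):
    """Bug pattern: the lookup set is built from the very rows being filtered,
    so every row finds itself and is discarded."""
    existing = {(t.amount_cents, t.txn_date) for t in txns}
    return [t for t in txns if (t.amount_cents, t.txn_date) not in existing]


SAMPLE = [
    Txn(100_000_000, date(2005, 3, 1), "A"),   # $1,000,000.00
    Txn(100_000_000, date(2005, 3, 1), "B"),   # same wire seen in a second document
    Txn(100_000_000, date(2006, 7, 15), "C"),  # same amount, different date
]

print(len(dedup_amount_only(SAMPLE)))     # 1
print(len(dedup_date_aware(SAMPLE)))      # 2
print(len(dedup_self_check_bug(SAMPLE)))  # 0
```

Two undated records with the same amount still collapse under the date-aware key, because both carry `None`. That is one reason recovering dates from context (Phase 25) matters for deduplication.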
Every financial record gets scored independently across five axes:
| Axis | Weight | What It Measures |
|---|---|---|
| Context Language | ×3 | Transaction vocabulary (wire, routing, SWIFT) vs. noise (lawsuit, net worth) |
| Amount Specificity | ×1 | $2,473,891.55 scores high; an exact $10,000,000.00 scores low |
| Date Presence | ×1 | Full date > year only > no date |
| Entity Quality | ×2 | 28 known banks, 64 financial actors, 71+ garbage entity exclusions |
| Source Document Type | ×1 | Financial/spreadsheet > email > general document |
Classification Tiers: PROVEN, STRONG, MODERATE, WEAK, VERY_WEAK (the five tiers used in fund_flows_audited).
Validation: v6.2 spot-check hit 93% accuracy on top-30 PROVEN transactions (28/30), with 0% balance contamination (down from 47% in v5).
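Below is a minimal sketch of a weighted composite over the five axes, using the weights from the axis table (×3, ×1, ×1, ×2, ×1). The per-axis scoring functions, the vocabulary excerpts, the document-type values, and the [0, 1] scale are all hypothetical illustrations, not the project's scoring code. Tier cutoffs are not published in this README, so the sketch stops at the composite score.

```python
WEIGHTS = {"context": 3, "amount": 1, "date": 1, "entity": 2, "source": 1}  # from the axis table

# Hypothetical excerpts of the vocabularies and document-type values.
TRANSACTION_TERMS = ("wire", "routing", "swift")
NOISE_TERMS = ("lawsuit", "net worth")
SOURCE_SCORES = {"financial": 1.0, "spreadsheet": 1.0, "email": 0.5, "general": 0.25}
DATE_SCORES = {"full": 1.0, "year": 0.5, "none": 0.0}


def context_score(text: str) -> float:
    """Transaction vocabulary raises the score, noise vocabulary lowers it."""
    t = text.lower()
    hits = sum(term in t for term in TRANSACTION_TERMS)
    noise = sum(term in t for term in NOISE_TERMS)
    return max(0.0, min(1.0, (hits - noise) / 2))


def amount_score(amount_cents: int) -> float:
    """Specific amounts score high; exact round millions score low."""
    if amount_cents % 100:  # non-zero cents
        return 1.0
    dollars = amount_cents // 100
    if dollars % 1_000_000 == 0:
        return 0.0
    if dollars % 1_000 == 0:
        return 0.5
    return 0.8


def score_record(text, amount_cents, date_precision, entity_quality, doc_type) -> float:
    """Weighted mean of five axis scores, each in [0, 1]."""
    axes = {
        "context": context_score(text),
        "amount": amount_score(amount_cents),
        "date": DATE_SCORES[date_precision],
        "entity": entity_quality,  # assumed to be pre-computed in [0, 1]
        "source": SOURCE_SCORES[doc_type],
    }
    return sum(WEIGHTS[a] * axes[a] for a in WEIGHTS) / sum(WEIGHTS.values())


print(score_record("Outgoing wire, routing no. and SWIFT code", 247_389_155, "full", 1.0, "financial"))  # 1.0
print(score_record("Lawsuit claims net worth of", 1_000_000_000, "year", 0.0, "general"))               # 0.09375
```

Because context language carries three of the eight weight units, a record whose text reads like a lawsuit or net-worth estimate stays low even when other axes look clean.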
| Gap Source | Estimable? | Reason |
|---|---|---|
| WEAK/VERY_WEAK tier exclusions | Yes β $5M-$15M | $991M excluded as low-confidence; manual review of top entries could recover $5-15M |
| Sealed/withheld documents | No | Court-sealed records inaccessible to EFTA; dollar value unknown |
| Attempted vs. completed transactions | No | SARs count attempted; I extract completed only; gap is real but unquantifiable |
| Destroyed pre-retention records | No | Bank retention policies may have purged records; unquantifiable |
| Cross-bank SAR duplication | No (directional) | The same wire triggering SARs at both banks inflates the benchmark, which reduces the gap |
One gap has a credible dollar estimate ($5-15M in excluded tiers). The rest are real information gaps with unknown values. I'm not putting specific ranges on things I can't measure.
| # | Title | Key Finding | Data Scope |
|---|---|---|---|
| 1 | The Jeepers Pipeline | $57.9M brokerage shell → personal checking, all dated, all on Exhibit C | 24 wires · $57,876,640 |
| 2 | Art Market as Liquidity Channel | Sotheby's + Christie's proceeds entered the shell network through Haze Trust | 20 wires · $103,786,473 |
| 3 | The Plan D Question | $18M out to Leon Black, near-zero inflow: where did Plan D get its money? | 34 wires · $163,097,604 |
| 4 | Chain-Hop Anatomy | 4-tier shell network mapped, and $311M in double-counting removed | 67 wires · $312,796,381 |
| 5 | Deutsche Bank's Role | 38 wires across every major Epstein entity, 75% of volume in the last 6 months | 38 wires · $56,792,936 |
| 6 | Gratitude America | 88% of outflows to investment accounts, 7% to charitable purposes | 20 wires · $13,080,518 |
| 7 | Follow the Money, Follow the Plane | Wire-flight temporal correlation: financial events cluster with flight activity consistently across 18 years | 185 wires · 321 flights |
| 8 | The Infrastructure of Access | The people who moved the money are the same people victims named: Maxwell appears in 204 financial docs and 1,312 victim docs | 11.4M entities · 1.48M files |
| 9 | 734,122 Names | Screened every person in 1.48M files for who bridges financial and victim docs. 57 real names. 10 operational staff | 734,122 persons · 57 bridgers |
| 10 | The Round Number Problem | Benford's Law fails: digits 2 and 5 at 29.7% and 18.4%. 84.3% of wires are exact round numbers | 185 wires · $557M |
| 11 | The Shell Map | Wire ledger captured 7 entities. The corpus contains 14, with 178K money references | 14 shells · 178K money refs |
| 12 | The Bank Nobody Prosecuted | Bear Stearns had 2.4M money mentions (5.7× Deutsche Bank): zero fines, zero investigation | 2.4M money refs · 66 shared files |
| 13 | Seven Banks, One Trust | Outgoing Money Trust disbursed through DB, Wells Fargo, BofA, TD, JPMorgan, PNC, Sabadell | 180 financial docs · 7 banks |
| 14 | Where Leon Black's Money Went | 1,600 files, every shell, "Black Family Partners LP c/o Apollo Management": the round trip | 1,600 files · $60.5M · 7 shells |
| 15 | Gratitude America: The Charity That Invested | Tax-exempt charity routing $2M-$20M to Boothbay, Honeycomb, Valar, Coatue | 89 financial · $45M wires |
| 16 | The Accountant | Richard Kahn / HBRK Associates: 18,833 emails, 11,153 files, touches every shell | 18,833 emails · 11,153 files |
| 17 | One-Way Money | $272M in. $63M out. $209M gap. First multi-institution balance sheet | 481 wires · 228 entities · $558M |
| 18 | Offshore Architecture: The Brunel-BVI-ICIJ Bridge | DOJ subpoena names BVI shell. ICIJ Offshore Leaks confirms. 3 databases cross-referenced | 172 docs · 3 databases · 8,526 pages |
| 19 | Blueprint of a Financial Machine | Season 1 finale. $2.146B, 123 nodes, 313 edges. Full network mapped. Visualization | 10,964 txns · 123 nodes · $2.146B |
| 20 | The Verification Wall | Season 2 opener. Every document has a Bates number; the wall tests what's behind it. 8 noise POIs ($144.4M claimed, Bates stamps → news/court filings, $0 bank docs) vs. Leon Black ($310.5M, 42 verified wires, 15 bank docs). NLP phantom autopsy with clickable EFTA source documents. | 9 POIs · 15 bank docs · $310.5M verified |
Source workbook: Forensic Workbook · Interactive Shell Network
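The Round Number Problem narrative rests on two checks. The first compares leading digits with Benford's Law, which predicts P(d) = log10(1 + 1/d): about 30.1% for a leading 1, 17.6% for 2, and 7.9% for 5. If the 29.7% and 18.4% in the table are leading-digit shares, those are the expectations to compare them against. The second check is the share of exact round numbers. The sketch below computes both, plus a Pearson chi-square statistic against the Benford expectation. The $1,000 rounding unit and the sample amounts are assumptions for illustration; this README does not state which unit the narrative used. With 8 degrees of freedom, the 5% critical value of the chi-square statistic is about 15.51, and a meaningful test needs far more than a handful of amounts.

```python
import math
from collections import Counter


def benford_expected() -> dict[int, float]:
    """Benford's Law first-digit probabilities: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digits(amounts_cents):
    """Leading digit of each positive amount (cents share the dollar figure's leading digit)."""
    return [int(str(a)[0]) for a in amounts_cents if a > 0]


def first_digit_shares(amounts_cents) -> dict[int, float]:
    """Observed share of each leading digit 1-9."""
    digits = Counter(leading_digits(amounts_cents))
    n = sum(digits.values())
    return {d: digits[d] / n for d in range(1, 10)}


def benford_chi_square(amounts_cents) -> float:
    """Pearson chi-square of observed leading digits vs. Benford (8 degrees of freedom)."""
    digits = Counter(leading_digits(amounts_cents))
    n = sum(digits.values())
    return sum((digits[d] - n * p) ** 2 / (n * p) for d, p in benford_expected().items())


def round_share(amounts_cents, unit_dollars: int = 1_000) -> float:
    """Fraction of amounts that are exact multiples of unit_dollars (assumed unit)."""
    unit = unit_dollars * 100
    return sum(a % unit == 0 for a in amounts_cents) / len(amounts_cents)


# Hypothetical sample amounts in cents, for illustration only.
SAMPLE_CENTS = [100_000_000, 200_000_000, 247_389_155, 500_000_000]

print(first_digit_shares(SAMPLE_CENTS)[2])  # 0.5
print(round_share(SAMPLE_CENTS))            # 0.75
print(round(benford_expected()[2], 3))      # 0.176
```

Working in integer cents keeps the round-number test exact. A float comparison such as `amount % 1000 == 0` can misfire on values that did not parse cleanly.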
├── README.md ← You are here
├── docs/
│   ├── METHODOLOGY.md ← 25-phase pipeline, 9 bugs, 5-axis scoring, limitations
│   ├── FINDINGS.md ← GAP analysis, 8 key discoveries, recommendations
│   ├── COMPLIANCE.md ← Professional standards, GAAS conformance, legal disclaimers
│   ├── SCHEMA.md ← Database architecture diagram
│   ├── NETWORK.md ← Trust network flow diagram
│   └── SOURCE_APPENDIX_TEMPLATE.md ← Standard template for source appendices
├── narratives/ ← 20 forensic data narratives with source appendices
├── data/
│   ├── publication_ledger_phase5l.json ← 10,964 transactions, four-tier (publication dataset)
│   ├── master_wire_ledger_phase5i.json ← 481 wires (wire-specific subset)
│   └── entity_classification.json ← Entity → type mapping (228 entities)
├── visualizations/ ← Interactive shell network diagram
└── tools/
    ├── narrative_sql_tools.py ← SQL query functions for all 19 narrative data sources
    ├── linkify_efta.py ← Auto-link EFTA IDs → DOJ PDFs in .md files
    ├── convert_links_new_tab.py ← Convert external links to target="_blank"
    ├── inject_efta_source_table.py ← Add source document tables to narratives
    └── append_source_appendices.py ← Append source appendices to narratives
| Tab | Name | Description |
|---|---|---|
| 1 | Executive Summary | Headline $2.146B, four-tier GAGAS framework, publication ledger |
| 2 | Extraction Phases | Full pipeline with running totals, bug fixes color-coded |
| 3 | Money Flow Patterns | Every wire classified: MONEY IN / INTERNAL MOVE / MONEY OUT |
| 4 | Shell Trust Hierarchy | 4-tier network with actual dollar flows per entity |
| 5 | Master Wire Ledger | 481 wires with flow direction, entity types, recovery flags |
| 6 | Above-Cap Verified | Court-verified wires above $10M ($120.6M) |
| 7 | Date Recovery | Same-amount different-date analysis (95 Phase 23 + 75 Phase 25 recoveries) |
| 8 | Entity P&L | 228 entities with inflow/outflow/net, shell flags |
| 9 | Shell Network | Shell-involved wires, 43 shell-to-shell |
| 10 | SAR Comparison | Bank-by-bank vs FinCEN benchmarks |
| 11 | Methodology | 9 bugs documented, data sources, 10 limitations |
| 12 | Bank Coverage | 14 banks mapped with wire counts and volumes |
| 13 | Entity Resolution | 228 canonical entities with alias mapping |
| 14 | Bates Index | 51% Bates coverage with exhibit cross-references |
I didn't read the documents. I audited the money.
Other projects build search engines, write narrative reports, or create browsable archives. Good work, all of it. I took a different approach. I applied the same methodology I use professionally (multi-affiliate reconciliation, exception reporting, variance analysis, confidence tiering) and pointed it at the EFTA corpus.
The question isn't "What do the documents say?" It's: "Where did the money go, who moved it, and what did the DOJ redact around it?"
This repo publishes methodology, findings, and summary data. The source code, database, and raw extraction pipeline are not included. That's intentional.
The master wire ledger (481 wires) and entity classification data are published in full in the data/ directory. Those are the final audited outputs, and they're sufficient for independent verification of everything published here.
Randall Scott Taylor · Data Scientist
BS Network & Cyber Security, Wilmington University
MS Applied Data Science, Syracuse University
I built this project. Every line of extraction code, every database table, every classification rule, every phase of the pipeline. AI tools (Claude, Anthropic) helped me write code faster. The analytical judgments, methodology design, and forensic interpretations are mine.
Background: multi-affiliate financial reconciliation, automated classification and exception reporting systems, large-scale data operations.
This analysis does not constitute an audit, examination, or review performed in accordance with GAAS, GAGAS, or AICPA SSFS No. 1. See COMPLIANCE.md for details.
All financial amounts are (Unverified) automated extractions unless explicitly noted otherwise. Entity classifications are based on OCR text extraction with automated normalization and may contain errors. Shell entity designations are analytical classifications, not legal determinations.
Taylor, R.S. (2026). Epstein Financial Forensics: Automated forensic financial
reconstruction from 1.48 million DOJ EFTA documents. GitHub.
https://github.com/randallscott25-star/epstein-forensic-finance#readme
This work is licensed under Creative Commons Attribution 4.0 International.
The underlying DOJ documents are U.S. government publications in the public domain. This repository contains only metadata, extracted analysis, and methodology β no copyrighted source material is reproduced.
| Date | Milestone |
|---|---|
| Feb 7, 2026 | Project started β DOJ scraper built, first dataset indexed |
| Feb 8 | DS11 (76,969 financial ledgers) fully scraped |
| Feb 10 | 633,842 files indexed β published to GitHub and Archive.org |
| Feb 12 | Phase 3 text extraction complete (513K files) |
| Feb 14 | Entity extraction (3B) launched β 565K files queued |
| Feb 15 | Corpus expanded to 1.48M files + 503K media with DS10 + community gap-fill |
| Feb 16 | Phase 5 financial analysis chain operational |
| Feb 18 | 19 datasets online (DS1-12 + DS98-DS104) |
| Feb 20 | Fund flows audit v6.2: $1.43B in PROVEN + STRONG (P+S) transactions, 39% SAR coverage |
| Feb 21 | Wire extraction pipeline (Phases 14-24): $1.964B, 104.6% SAR coverage |
| Feb 21 | Forensic workbook v6.1 published (11 tabs, 382-wire master ledger) |
| Feb 21 | Phase 25: Date recovery from context fields, 75 dates (31.9% → 51.6%), 0 collisions (credit: u/miraculum_one) |
| Feb 22 | Repository made public. 17 Data Narratives published. 30 GitHub stars in 5 hours |
| Feb 24 | Phase 5I: 481 wires, $973M entity-resolved, 228 entities, 14-bank coverage, 51% Bates |
| Feb 24 | Workbook v7 published (14 tabs). Full database audit: 33 tables, 8.03GB, 26.6M rows |
| Feb 25 | Phase 5J: Multi-bank statement parser. 1,202 verified transactions from 13 banks ($430K) |
| Feb 25 | Workbook v8 (19 tabs). N18 published. JSON v26 community dataset. |
| Feb 25 | Phase 5K-5L: Payment type expansion + publication ledger assembly. 10,964 unique transactions, $2.146B, four-tier GAGAS framework |
| Feb 25 | Workbook v9. 19 data narratives live. |
| Feb 26 | N19: Blueprint of a Financial Machine, Season 1 finale. 123 nodes, 313 edges, full $2.146B corpus mapped. Timeline v9 with 69 vetted persons of interest. |
| Feb 27 | N20: The Verification Wall, Season 2 opener. Bates distinction framework. 8 noise POIs ($144.4M claimed, $0 bank docs) vs. Leon Black ($310.5M, 42 wires, 15 bank docs). Clickable EFTA source documents. |
| Ongoing | Additional data narratives and follow-on analysis |