Data, Research & Sources
The evidence base behind the plan: the data we build on, the research that makes our analytics credible, the early-warning signals that give us a head start, and a plain-language glossary of every term used across this site.
This is a reference appendix, organized for lookup rather than reading start to finish. The Targeting page turns these datasets into a ranked county list; the Risks page covers the rules that limit access to some data (for example, 42 CFR Part 2 and restricted prescription-monitoring data). Every dataset link was checked live on 2026-06-25. Figures marked verify are early estimates we still need to confirm.
1 · The data we build on
Our analytics run on a single connected dataset that links information by county and by provider, almost all of it from free, public sources. That public data is widely available, so it isn't where our advantage lies — the advantage is the early-warning and county-targeting layer we build on top of it. Two reference files connect everything, so we set them up first: the HUD USPS ZIP crosswalk, which translates between ZIP codes, census tracts, and counties, and the NPPES NPI registry, which gives every healthcare provider a unique ID.
How the data connects (the join keys)
| Join key | What it links | Role |
|---|---|---|
| FIPS county (5-digit) | WONDER, SUDORS, VSRR, SVI, PLACES, AHRQ-SDOH, County Health Rankings, ACS, BLS, BEA, IRS-SOI, ARCOS, TEDS, Part D (agg.) | Primary spine — every county-week risk score keys here |
| Census tract (11-digit) / block group | SVI, PLACES, AHRQ-SDOH, ADI, Food Access Atlas, EJScreen, NLCD, LODES, ACS, RTI SynthPop | Sub-county vulnerability & SDOH detail |
| ZIP / ZCTA | PLACES (ZCTA), AHRQ-SDOH, IRS-SOI, APCDs, MarketScan, HUD crosswalk | Bridges claims/survey data to tract & county |
| NPI (provider) | NPPES, Part D prescribers, Part B PUF, Medicare/Medicaid claims, all-payer claims | Master provider key — ties prescribing & capability to geography |
| ICD-10 + HCPCS / NDC | WONDER, HCUP, CMS claims, MIMIC, MarketScan, Synthea, Part D (NDC) | Clinical event & drug coding |
| Lat/long → geocode → tract/county | EPA AQS/TRI, USGS, OSM, SafeGraph/Advan, NEMSIS (coarse), facility locators | Spatial joins for point data |
The HUD USPS ZIP crosswalk and the NPPES NPI registry hold everything else together, so they come first. Two timing notes: the older NPPES interface (V1) was scheduled to shut down on 2026-03-03, so we use V2; and EJScreen was pulled from the EPA site in February 2025, so we use the PEDP Azure or Harvard Dataverse copy instead.
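As a rough illustration of how the crosswalk bridges ZIP-level data to the county spine, the sketch below allocates ZIP counts to counties in proportion to a residential-address ratio. The rows, ratios, and function names are invented for illustration; the actual HUD file should be checked for its column names and ratio fields before anything like this is used.

```python
from collections import defaultdict

# Toy crosswalk rows shaped like a ZIP-to-county file:
# (zip, county_fips, share of the ZIP's residential addresses in that county).
# The ratios are invented for illustration, not taken from HUD.
CROSSWALK = [
    ("62701", "17167", 1.0),
    ("60010", "17031", 0.25),
    ("60010", "17097", 0.75),
]


def normalize_fips(code):
    """Return a 5-character county FIPS string, restoring dropped leading zeros."""
    return str(code).strip().zfill(5)


def zip_counts_to_county(zip_counts, crosswalk):
    """Allocate ZIP-level counts to counties in proportion to the ratio column."""
    county_totals = defaultdict(float)
    for zip_code, county, ratio in crosswalk:
        if zip_code in zip_counts:
            county_totals[normalize_fips(county)] += zip_counts[zip_code] * ratio
    return dict(county_totals)


county_counts = zip_counts_to_county({"62701": 8, "60010": 40}, CROSSWALK)
print(county_counts)
```

A ZIP that straddles two counties is split rather than assigned whole, which is why the county totals can be fractional. Storing FIPS codes as zero-padded strings avoids silent mismatches when a spreadsheet has turned "01001" into 1001.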
Key datasets by layer
| Dataset | Source | Layer | Granularity | Access |
|---|---|---|---|---|
| CDC WONDER — Multiple Cause of Death | CDC NCHS | Overdose mortality | County | Open |
| SUDORS / DOSE (OD2A) | CDC | Fatal-OD detail (toxicology) | State | Open |
| VSRR Provisional OD counts | CDC NCHS | Timely OD counts (3–6 mo lag) | State | Open + API |
| TEDS / TEDS-D | SAMHSA | Treatment admissions & discharges | State | Open PUF |
| N-SUMHSS / FindTreatment | SAMHSA | Facility capability census | Facility | Open |
| WaPo ARCOS pain-pill DB | WaPo / litigation | Pill-level opioid shipments '06–'14 | Pharmacy / county | Open + API |
| NPPES NPI Registry | CMS | Provider master key | NPI | Open |
| Medicare Part D Prescribers | CMS | Opioid / buprenorphine prescribing | NPI | Open |
| CDC Buprenorphine Dispensing Maps | CDC / IQVIA | MOUD access proxy | County | Open |
| CDC/ATSDR SVI | CDC/ATSDR | Social vulnerability index | Tract / county | Open |
| CDC PLACES | CDC | Local chronic-disease estimates | County / ZCTA / tract | Open + API |
| AHRQ SDOH Database | AHRQ | Pre-joined SDOH (5 domains) | County / ZIP / tract | Open |
| County Health Rankings | UW / RWJF | 90+ outcome & factor measures | County | Open |
| HUD USPS ZIP Crosswalk | HUD USER | The join bridge | ZIP ↔ tract/county | Open |
| Census ACS | Census | Universal denominator (demographics) | Nation → block group | Open API |
| Census LODES / LEHD | Census | Origin-destination commuting | Census block | Open |
| HCUP — SID/SEDD/NEDS | AHRQ | Inpatient & ED discharge records | State / hospital | App + DUA |
| Medicaid T-MSIS (TAF) | CMS / ResDAC | Medicaid claims (pays most US SUD care) | Beneficiary → county | DUA / App |
| NEMSIS (EMS) | NEMSIS TAC | EMS naloxone & OD responses | National (de-id) | Application |
| IDHS/SUPR DARTS | Illinois IDHS | True IL client/provider treatment records | Provider / client | Relationship-gated |
| Synthea / RTI SynthPop | MITRE / RTI | Synthetic build-and-test substrate | Synthetic → any geo | Open |
Start now (all free and open): CDC mortality data (WONDER, VSRR, SUDORS); provider and prescribing data (NPPES, Medicare Part D); the Washington Post ARCOS pill-shipment database; core demographic and social layers (Census ACS, HUD crosswalk, SVI, PLACES, AHRQ-SDOH, County Health Rankings); commuting and map data (LODES, OpenStreetMap); and the synthetic datasets (Synthea, SynthPop) used to test the system on fake data first.
Apply early — these take time to clear: hospital discharge records (HCUP), Medicare and Medicaid claims through ResDAC, EMS data (NEMSIS), and the All of Us research cohort.
Paid or commercial: MarketScan, Optum, and IQVIA claims; Advan mobility data; and state all-payer claims databases. Detailed prescription-monitoring (PDMP) records are not public — only the policy summaries (PDAPS) are open.
2 · The research behind our analytics
Our analytics are transparent and tested: the model's reasoning can be inspected, and its predictions are checked against real outcomes. That credibility rests on published research across five areas, plus the evidence on provider outcomes and opioid modeling that shapes the product. What follows maps the fields we draw on, not a full literature review; complete citations are in the source notes.
Does a result travel from one place to another?
Bareinboim & Pearl on combining and transporting data (PNAS 2016; Stat. Science 2014), with practical methods from Stuart et al. (2011) and Dahabreh et al. (2020). The formal basis for asking: does an effect proven in one state hold up in another?
Measuring what a policy change actually did
Synthetic control (Abadie et al. 2010), synthetic difference-in-differences (Arkhangelsky et al. 2021), corrections for staggered rollouts (Callaway & Sant'Anna 2021; Goodman-Bacon 2021), and interrupted time series (Bernal et al. 2017) — to measure the real effect of Illinois policy changes and hold settlement spending accountable.
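For intuition only, the simplest member of this family is the two-group, two-period difference-in-differences estimate. The sketch below is not the staggered-rollout or synthetic-control estimators cited above, and the numbers are invented.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences.

    Change in the treated group minus change in the comparison group.
    Relies on the parallel-trends assumption: without the policy, both
    groups would have moved by the same amount.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Invented overdose rates per 100k, two periods before and two after a policy.
effect = did_estimate(
    treated_pre=[20, 22], treated_post=[15, 17],
    control_pre=[18, 20], control_post=[17, 19],
)
print(effect)  # -4: treated fell by 5, comparison fell by 1
```

The comparison group's change stands in for what would have happened anyway; the cited methods exist because that assumption often fails when policies roll out at different times.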
Where to direct limited funding
Restless multi-armed bandits (Whittle 1988) and a real public-health deployment of them (Mate/Tambe/ARMMAN, AAAI 2022). The math behind "which county do we fund this quarter" — already proven in the field.
Predicting where overdoses are rising
Overdose forecasting (Sumetsky et al. 2021), neighborhood risk modeling (Bozorgi et al. 2021), self-exciting point processes (Mohler et al. 2011), and graph-based spatiotemporal networks (DCRNN, Li et al. 2018) — these power our county-by-week risk scores.
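To make "self-exciting" concrete, here is a minimal sketch of a point-process intensity in which each past event temporarily raises the expected rate of new events. The exponential decay kernel and all parameter values are assumptions chosen for illustration; they are not taken from the cited papers or from our production model.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a self-exciting process at time t.

    mu    : baseline event rate
    alpha : expected number of follow-on events triggered by one event
    beta  : decay rate of the triggering effect
    Only events strictly before t contribute.
    """
    excitation = sum(
        alpha * beta * math.exp(-beta * (t - s))
        for s in event_times
        if s < t
    )
    return mu + excitation


# One event at time 0: the rate jumps, then decays back toward the baseline.
soon_after = hawkes_intensity(1.0, [0.0], mu=0.2, alpha=0.5, beta=1.0)
later = hawkes_intensity(2.0, [0.0], mu=0.2, alpha=0.5, beta=1.0)
print(soon_after, later)
```

A cluster of recent events stacks these bumps on top of one another, which is how the model represents a local spike that feeds on itself.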
Keeping models honest across places
Importance weighting (Sugiyama et al. 2007), domain-generalization bounds (Ben-David et al. 2010), and stable-feature transport (Subbaswamy/Saria 2019) — keeping a model accurate when applied to a region whose data looks different from where it was trained.
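A toy version of importance weighting, assuming the only difference between regions is their mix of a single known category (here, urban versus rural): each source record is weighted by how much more common its category is in the target region. The data and function names are invented for illustration.

```python
from collections import Counter


def stratum_weights(source_strata, target_strata):
    """Weight = target share / source share for each stratum seen in the source."""
    src, tgt = Counter(source_strata), Counter(target_strata)
    n_src, n_tgt = len(source_strata), len(target_strata)
    return {s: (tgt[s] / n_tgt) / (src[s] / n_src) for s in src}


def reweighted_mean(values, strata, weights):
    """Weighted mean of source values, normalized by the total weight."""
    total_weight = sum(weights[s] for s in strata)
    return sum(v * weights[s] for v, s in zip(values, strata)) / total_weight


# Source region is mostly urban; target region is mostly rural.
source_strata = ["urban", "urban", "urban", "rural"]
source_values = [1.0, 1.0, 1.0, 5.0]
target_strata = ["urban", "rural", "rural", "rural"]

weights = stratum_weights(source_strata, target_strata)
adjusted = reweighted_mean(source_values, source_strata, weights)
print(adjusted)  # close to 4.0, versus an unweighted source mean of 2.0
```

The correction only works for strata that appear in the source data; a category present in the target but absent from the source cannot be reweighted, which is one case where the honest answer is to collect local data.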
Calibration and backtesting
Pattern-oriented modeling (Grimm et al. 2005), validation batteries (Barlas 1996), and simulation-based and Bayesian calibration. Opioid examples include the FDA's SOURCE model (Lim et al., PNAS 2022) and RESPOND (PLOS One 2024). Only about 61% of opioid models are calibrated and 31% validated (Cerdá et al. 2021) — that gap is our opening.
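Backtesting can be sketched as a rolling-origin evaluation: at each cutoff, forecast the next value using only earlier data, then score the forecast against what actually happened. The naive "last value" forecaster and the series below are placeholders, not our model or real counts.

```python
def rolling_backtest(series, forecast_fn, min_train):
    """Mean absolute error of one-step-ahead forecasts over expanding windows."""
    errors = []
    for cutoff in range(min_train, len(series)):
        prediction = forecast_fn(series[:cutoff])  # sees only the past
        errors.append(abs(prediction - series[cutoff]))
    return sum(errors) / len(errors)


def last_value(history):
    """Placeholder forecaster: next week equals this week."""
    return history[-1]


weekly_counts = [10, 12, 11, 15, 14]  # invented
mae = rolling_backtest(weekly_counts, last_value, min_train=2)
print(mae)  # 2.0
```

A naive baseline like this is useful mainly as the bar a real model has to clear on the same cutoffs.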
Data on how individual treatment providers actually perform is the hardest thing to get in this field. Public data gives state-level retention and completion rates (TEDS-D, the Medicaid Adult Core Set, mandatory since FFY2024), county-level access to medication (CDC buprenorphine maps), and a facility-by-facility capability census (N-SUMHSS). The claims-based "what works" standards are HEDIS measures like IET, FUA, and POD (staying on medication 180+ days), plus the academic Washington Circle measures. They can only be calculated at the provider or county level with claims access (Illinois T-MSIS or DARTS), which is exactly where Dr. Barthwell's relationships open doors. On the demand side, a 2025 HHS-OIG report found Medicare could have saved $301.5M (53%) on opioid-treatment bundles where the full set of services wasn't actually delivered, a clear sign of the need for the spending accountability we provide.
3 · Early-warning signals
Death-certificate data (CDC WONDER) runs four to six months behind, and even the faster provisional counts (VSRR) lag three to six months. Our edge is turning real-time, upstream signals into county-by-week risk scores that lead overdose deaths by two to eight weeks: the early warning that health departments, insurers, harm-reduction groups, and treatment networks will pay for. Most of these signals already come tagged by county or EMS region, so they slot straight into our data.
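One way to check a claimed lead time is to shift the signal against the death series and see which shift lines up best. The sketch below uses plain Pearson correlation on invented weekly series in which deaths are, by construction, the signal delayed by two weeks; real series would be noisier and would need more care with trends and seasonality.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead(signal, deaths, max_lag):
    """Correlate signal[t] with deaths[t + lag] for each lag; return the best lag."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = signal[: len(signal) - lag]
        ys = deaths[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores


signal = [1, 3, 2, 5, 4, 6, 3, 2]  # invented weekly signal
deaths = [0, 0, 1, 3, 2, 5, 4, 6]  # the same pattern, two weeks later
lead_weeks, scores = best_lead(signal, deaths, max_lag=4)
print(lead_weeks)  # 2
```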
The signals, ranked by how early they warn us
| Signal | Predicts | Lead time | County join | Access |
|---|---|---|---|---|
| ODMAP (suspected ODs) | Real-time OD spikes, automated alerts | Real-time | Excellent | MOU-gated |
| EMS / NEMSIS naloxone | Nonfatal-OD burden (best proxy) | Real-time | Good (agency→FIPS) | State-EMS partner |
| NPDS poison-center calls | Acute toxic exposures (~6-min latency) | ~Real-time | Good | Paid license |
| CDC NWSS wastewater | New/more-potent supply in a sewershed | Days | Crosswalk needed | MOU / utility |
| NPS Discovery / NDEWS | Novel adulterants (nitazenes, xylazine) | Days–weeks | Region only | Open |
| 988 / helpline volume | Population distress demand | Weeks | State / region | Mostly open |
| Syringe-services (SSP) data | Active-use population size | Weeks | Crosswalk | Partnership |
| Jail bookings / releases | Release = peak OD-risk window (lost tolerance) | Weeks (1–4) | Good | County scraping |
| HMIS homelessness | Structural risk-population flux | Months | CoC→FIPS | CoC-gated |
| Google Trends / social | Search/attention demand | Weeks | Coarse (DMA) | Open |
| PDMP / dispensing | Rx shifts; bup access (protective) | Mixed | NPI / FIPS (native) | Regulatory |
The three strongest products we can build from these
County Surge Score
Combines suspected-overdose spikes (ODMAP) with EMS naloxone use. It joins cleanly to county data, has the clearest buyer (state and county health departments), and carries little technical risk. Output: a daily or weekly surge score plus automatic spike alerts per Illinois county. It needs only an ODMAP data agreement, which Dr. Barthwell's federal-policy network can open.
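A minimal sketch of how such a score could be computed, assuming each feed is standardized against its own recent history and the two are averaged with equal weight. The weighting, the alert threshold, and the counts are all placeholder assumptions, not the product's actual formula.

```python
from statistics import mean, pstdev


def z_score(history, current):
    """How many standard deviations the current count sits above its history."""
    spread = pstdev(history)
    if spread == 0:
        return 0.0  # flat history: no basis for a standardized score
    return (current - mean(history)) / spread


def surge_score(odmap_history, odmap_now, ems_history, ems_now):
    """Equal-weight average of the two standardized signals (an assumption)."""
    return (z_score(odmap_history, odmap_now) + z_score(ems_history, ems_now)) / 2


ALERT_THRESHOLD = 2.0  # hypothetical cut-off

score = surge_score(
    odmap_history=[4, 6, 4, 6], odmap_now=8,      # invented weekly counts
    ems_history=[10, 14, 10, 14], ems_now=16,
)
print(score, score >= ALERT_THRESHOLD)  # 2.5 True
```

Standardizing each feed separately keeps a high-volume source from drowning out a low-volume one; small counties with very short or flat histories would need a different baseline.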
Drug-Supply Early Warning
A "new threat in your county" alert combining national tracking of dangerous new additives (NPS Discovery, free) with local wastewater testing and syringe-program drug checking — the upstream signal almost nobody packages well. We can prototype it for free, then add Illinois-specific local data in a premium tier.
Risk vs. Treatment-Access Map
Compares how fast acute risk is rising (poison-center call data) against how much medication treatment is actually available, flagging counties where danger is climbing but access is thin. We can build it quickly on data we already have (Part D, ARCOS, NPPES); the only added cost is the poison-center data license.
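The flagging logic can be sketched as two thresholds applied together: risk growing faster than some rate, and treatment access below some floor. The field names, thresholds, and rows below are hypothetical and only illustrate the comparison.

```python
def flag_gap_counties(rows, min_growth=0.20, max_access_per_100k=5.0):
    """Return FIPS codes where risk is rising fast and treatment access is thin."""
    flagged = []
    for row in rows:
        if row["calls_prev"] <= 0 or row["population"] <= 0:
            continue  # growth or per-capita rate is undefined; skip the row
        growth = (row["calls_now"] - row["calls_prev"]) / row["calls_prev"]
        access = row["bup_prescribers"] / row["population"] * 100_000
        if growth >= min_growth and access <= max_access_per_100k:
            flagged.append(row["fips"])
    return flagged


# Invented rows: poison-center calls in two periods, prescriber counts, population.
rows = [
    {"fips": "17001", "calls_prev": 10, "calls_now": 15,
     "bup_prescribers": 2, "population": 65_000},
    {"fips": "17031", "calls_prev": 400, "calls_now": 420,
     "bup_prescribers": 900, "population": 5_000_000},
    {"fips": "17003", "calls_prev": 4, "calls_now": 6,
     "bup_prescribers": 1, "population": 5_000},
]
gaps = flag_gap_counties(rows)
print(gaps)  # ['17001']
```

Only the first county meets both conditions: the second has ample access and slow growth, and the third has fast growth but a high per-capita prescriber rate.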
4 · Glossary
Plain-language definitions for every acronym and term used across this plan.
| Term | Full name | What it means here |
|---|---|---|
| MOUD | Medications for Opioid Use Disorder | The three FDA-approved meds — buprenorphine, methadone, naltrexone. Our clinical front door. |
| MAT | Medication-Assisted Treatment | Older term for MOUD (meds + behavioral support); now largely superseded by "MOUD." |
| OBOT | Office-Based Opioid Treatment | Office/telehealth buprenorphine prescribing. Billable in 4–8 weeks under Two Dreams' existing SUPR license; no X-waiver needed post-MAT Act 2023. |
| OTP | Opioid Treatment Program | Federally certified program that can dispense methadone. Requires SAMHSA cert + CARF/JC accreditation + DEA NTP registration (9–14 month build). |
| NTP | Narcotic Treatment Program | DEA's registration category for an OTP. A registered OTP may add a mobile van as an extension (2021 DEA rule); standalone mobile NTPs are barred. |
| MMHU | Medication-Assisted Recovery Mobile Health Units | Illinois IDHS/SUPR grant program (AG-certified up to $15M; MMHU-2 = $8.4M / 3 yrs / 4+ awards) funding the van, all three OUD meds, and staff. Two Dreams is eligible. |
| SUPR | Division of Substance Use Prevention & Recovery | The IDHS division that licenses SUD providers and administers SUD funding in Illinois. Two Dreams holds a SUPR license. |
| IDHS | Illinois Department of Human Services | The state agency that houses SUPR; the grantor/contracting body for MMHU, SOR, and related funds. |
| SOR | State Opioid Response | SAMHSA grant funding flowing to states (IDHS/SUPR), available to providers as subawards. Part of the capital stack. |
| RCORP | Rural Communities Opioid Response Program | HRSA grant program; RCORP-Impact is up to $750K/yr × 4 (rural), with for-profits eligible as a consortium lead. |
| SAMHSA | Substance Abuse & Mental Health Services Administration | The federal agency funding SOR, certifying OTPs, and collecting TEDS / N-SUMHSS data. |
| HRSA | Health Resources & Services Administration | The federal agency that runs RCORP and other rural-health funding. |
| ONDCP | Office of National Drug Control Policy | The White House drug-policy office. Dr. Andrea Barthwell is a former Deputy Director — the relationship moat for MOU-gated data. |
| ASAM | American Society of Addiction Medicine | The professional society for addiction medicine; Dr. Barthwell is a past president. |
| CoCM | Collaborative Care Model | Team-based integrated behavioral-health billing model (CPT 99492–99494, G2214; 2026 G0568 ≈ $162 verify). A core LTV layer in the BH stack. |
| IOP | Intensive Outpatient Program | A structured level of behavioral-health care; part of the separately-billable BH service stack. |
| SDOH | Social Determinants of Health | Non-clinical conditions (income, housing, food access) driving health outcomes; a major covariate layer in the fabric (SVI, AHRQ-SDOH, ADI). |
| SVI | Social Vulnerability Index | CDC/ATSDR 16-variable tract/county vulnerability index — the flagship SDOH join layer and a weight in the targeting model. |
| MAT Act | Mainstreaming Addiction Treatment Act (2023) | Eliminated the DEA X-waiver, letting any DEA-registered prescriber treat OUD with buprenorphine — the reason OBOT is billable in weeks. |
| IORAB | Illinois Opioid Remediation Advisory Board | The state body advising on allocation of Illinois opioid-settlement (abatement) funds. verify |
| IMPACT | Illinois Medicaid Provider Enrollment (IMPACT) | The IL portal through which a provider enrolls in Medicaid before MCO contracting and billing. |
| MCO | Managed Care Organization | Medicaid managed-care plans that must be contracted with to bill Medicaid volume in Illinois. |
| OASEE | Opioid Abatement Strategies Effectiveness Evaluator | An IL NOFO ($1.5M / 3 yrs to one org) to evaluate settlement-spending effectiveness — a direct buyer signal for our analytics. |
| DARTS | Departmental Automated Reporting & Tracking System | IDHS/SUPR's provider-reported, client-level treatment records — real IL provider-outcome data; not public (relationship-gated). |
| ODMAP | Overdose Detection Mapping Application Program | HIDTA-run real-time suspected-OD mapping with automated spike alerts; the strongest county-week early-warning join (MOU-gated). |
| NPDS | National Poison Data System | America's Poison Centers' near-real-time (~6-min) poisoning-surveillance feed; licensable acute-exposure signal. |
| NEMSIS | National EMS Information System | National EMS data system; source of naloxone-administration and OD-response signals (the best validated nonfatal proxy). |
| NWSS | National Wastewater Surveillance System | CDC system whose drug-metabolite layer detects new/more-potent supply in a sewershed days ahead of ED/morgue. |
| PDMP | Prescription Drug Monitoring Program | State controlled-substance dispensing database; near-real-time but record-level access is regulatory-restricted (clinician/agency only). |
| NPPES / NPI | National Plan & Provider Enumeration System / National Provider Identifier | The master US provider registry and key — the load-bearing provider join for the fabric (V1 retires Mar 2026 → V2). |
| ARCOS | Automation of Reports & Consolidated Orders System | DEA's controlled-substance distribution data; the WaPo litigation release gives transaction-level pill flows 2006–2014. |
| TEDS / TEDS-D | Treatment Episode Data Set (Discharges) | SAMHSA admissions/discharge records; the closest open thing to outcome data, but state-level only. |
| HEDIS | Healthcare Effectiveness Data & Information Set | NCQA's claims-based quality measures (incl. IET, FUA, POD) — the payer-trusted "what-works" standard. |
| POD | Pharmacotherapy for Opioid Use Disorder | HEDIS measure: % of OUD pharmacotherapy events lasting ≥180 days — the best claims-based MOUD-retention metric. |
| IET / FUA | Initiation & Engagement / Follow-Up After ED | HEDIS SUD process measures (treatment initiation/engagement; follow-up after an SUD-related ED visit). |
| T-MSIS / TAF | Transformed Medicaid Statistical Information System (Analytic Files) | CMS national Medicaid claims; the gated path to provider-level IL outcomes via ResDAC. |
| APCD | All-Payer Claims Database | State-held multi-payer claims (~20+ states); a Tier-3 commercial/DUA route to claims-based outcomes. |
| SUD / OUD | Substance Use Disorder / Opioid Use Disorder | The clinical conditions treated; OUD is the opioid-specific subset and our primary focus. |
| LTV | Lifetime Value | Per-patient revenue over the relationship; integrating BH lifts it ~2.6× annual / ~4.9× lifetime over MOUD-only. |
| FIPS | Federal Information Processing Standards (county code) | The 5-digit county code that is the primary spine of the data fabric. |
5 · How we work (methodology notes)
Build and test on synthetic data first
Before touching any real records, we build and test the whole system end to end on synthetic (computer-generated) patient data from Synthea and RTI SynthPop. This lowers engineering risk, lets us demo to partners, and keeps us clear of the strict privacy rules (42 CFR Part 2) during development.
Checking our predictions against reality
Illinois policy changes give us reliable, real-world cause-and-effect results that do double duty: a useful product on their own, and the answer key we test our prediction engine against. The claim we can stand behind — "we predicted how a result would carry over to a new area, and we were right" — is provable against these known outcomes.
Saying when a prediction doesn't apply
Every prediction comes with a confidence level. When the math says a result from one place won't reliably carry over to another, the product says so plainly — "collect local data first" — rather than overstate its confidence. We'd rather be clear about the limits than overclaim.
Caveats and data hygiene
Provider counts and access estimates point in the right direction but aren't exact (the CDC county maps are summarized from licensed IQVIA data, not raw records). In CDC WONDER, any county with fewer than 10 deaths is hidden to protect privacy. All links were checked live on 2026-06-25. Source-note identifiers marked verify come from expert recall and should be double-checked before being cited outside this document. Any financial figure on this site is preliminary until confirmed.
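One practical consequence of the suppression rule: a hidden cell must not be read as zero. The sketch below keeps suppressed cells as unknown and reports a range for any total that includes them. It assumes hidden cells arrive as the text "Suppressed" and that a hidden cell can hold anything from 0 to 9; both assumptions should be checked against the actual export.

```python
def parse_wonder_count(raw):
    """Return an int, or None when the cell is suppressed."""
    text = str(raw).strip()
    if text.lower() == "suppressed":
        return None
    return int(text)


def total_bounds(raw_counts, max_hidden=9):
    """Lowest and highest possible total given suppressed cells."""
    low = high = 0
    for raw in raw_counts:
        count = parse_wonder_count(raw)
        if count is None:
            high += max_hidden  # hidden cell could be anywhere from 0 to 9
        else:
            low += count
            high += count
    return low, high


bounds = total_bounds(["12", "Suppressed", "30", "Suppressed"])
print(bounds)  # (42, 60)
```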
The clinical foundation (Two Dreams plus Dr. Barthwell) opens the door. What makes the advantage durable is the rest: a single connected dataset, an early-warning surge layer, and a transparent, evidence-backed methodology. Together they power our county targeting today and a growing analytics business over time. See Deployment Targeting for the data in action, and Year 1·2·5+ for how it scales.