Dec 2023 · 21 min · Research

Verifiable Health Intelligence: Personalized Health Advice Without Exposing Your Data

Zero-knowledge proof systems for privacy-preserving health AI—enabling personalized recommendations without exposing sensitive user data. Because the best way to protect health data is to never expose it in the first place.

The Origin: The Data I Won't Share

I track everything. Not out of obsession—out of curiosity. Continuous glucose for over a thousand days. Sleep architecture every night. HRV trends, training load, recovery metrics, macro composition of nearly every meal. I'm not trying to optimize myself into a machine. I just want to understand how my body actually works—what makes me sharp, what makes me tired, what I can get away with and what I can't.

It's a strange kind of intimacy, this data. More revealing than a diary. Three years of glucose readings tell you when I sleep, when I eat, when I'm stressed, when I'm sick. They map my metabolism with a precision that feels almost uncomfortable when I think about it.

Last month, I came within one click of giving it all away.

The app was beautiful. The promise was compelling: "Upload your CGM data and get personalized meal recommendations based on your glucose responses." Finally—an AI that could correlate my spikes with specific foods, learn my metabolic patterns, tell me which meals actually work for me. Not generic advice. My data, making my nutrition smarter.

My finger hovered over "Connect Account."

Then I read the privacy policy.

The company had been acquired twice in eighteen months. The current owner was a data aggregator I'd never heard of. Their "anonymization" process was vague. Their data retention was indefinite. And buried in paragraph 47: they reserved the right to share "de-identified" data with "research partners."

I closed the app.

Here's the thing: I build AI systems. I know exactly how valuable this data is—and I know exactly how fragile the promises around it are. The startup that swears to protect your privacy today gets acquired tomorrow by a company with different incentives. The encryption that seems unbreakable sits on servers with keys that someone, somewhere, can access. And the "anonymized" health data can be re-identified with frightening accuracy when combined with other datasets.

I have the kind of longitudinal health data that would make personalized AI genuinely powerful. And I won't give it to anyone.

Not to a startup with a privacy policy that's one acquisition away from being meaningless. Not to a platform that stores my data encrypted but holds the keys. Not even to my own company—because the architecture that makes centralized health AI possible is the same architecture that makes it a honeypot.

The irony isn't lost on me: I want AI that understands my health deeply, but I won't trust any existing AI with the data that would make that possible.

If I won't share my data with the systems I'd build, why would anyone else?

The Thing Everyone Knows But Nobody Says

Here's what happened in 2024:

  • 276,775,457 healthcare records breached — 81% of the US population
  • 190 million people exposed in the Change Healthcare ransomware attack alone — 1 in 3 Americans
  • $10.1 million average cost per breach incident
  • 725 large breaches (500+ records) reported to HHS

This isn't an outlier year. It's the trend.

The pattern is simple: centralized data stores are honeypots. The more valuable the data, the bigger the target. Health data is worth 50x more than financial data on black markets.

Every health AI company follows the same playbook: aggregate user data, store it centrally, run models against it, promise they'll keep it secure. And then, inevitably, they don't. Not because they're careless—because the architecture is fundamentally vulnerable.

You cannot secure what you collect. The only guaranteed way to protect health data is to never have it in the first place.

The Insight: What If the Data Never Leaves Your Device?

The traditional approach to health AI is backwards. It assumes you need to see the data to reason about it. You don't.

Zero-knowledge proofs let you prove a statement is true without revealing why it's true. The math is sound—cryptographically verifiable. The proof cannot be faked. But it reveals nothing beyond the specific claim.

Consider what this means in practice:

| What You Want to Prove | What You Reveal | What Stays Private |
| --- | --- | --- |
| "I'm over 18" | True/False | Your actual birthdate |
| "My A1C is in healthy range" | True/False | Your exact A1C value |
| "I have no contraindications for Drug X" | True/False | Your complete medical history |
| "My genetic risk score qualifies me for this study" | True/False | Your genome |
| "My average glucose response to carbs is normal" | True/False | Every meal and glucose reading |

The verification is absolute. The privacy is absolute. And the data never crosses the device boundary.

This is not theoretical. Zero-knowledge systems are in production today, processing billions of dollars in cryptocurrency transactions, proving identity for financial services, verifying credentials for enterprise security. The technology works.

Healthcare just hasn't adopted it yet.

Why This Is Hard: The Circuit Complexity Problem

The cryptography isn't the bottleneck—expressing health logic as arithmetic circuits is.

A zero-knowledge proof works by converting a computation into a mathematical constraint system. To prove "my A1C is below 5.7%", you need to encode that comparison as circuit gates—additions, multiplications, constraint checks. For simple threshold comparisons, this is manageable.

Real health intelligence is not simple.

Multi-variable risk scores. Temporal pattern analysis. Conditional logic based on age, sex, genetic markers, medication history. Each layer of complexity adds constraints. More constraints mean larger circuits. Larger circuits mean longer proving times.

Current state: a complex health verification takes 15-30 seconds on mobile hardware. For real-time applications, this needs to drop below 2 seconds.

This is the research frontier. Not "can we do zero-knowledge health AI"—we can. The question is "can we do it fast enough to be useful?"

Theoretical Framework: The Mathematics of Verifiable Privacy

Understanding why zero-knowledge proofs work—and why they're hard—requires understanding the underlying mathematics.

What Makes a Proof "Zero-Knowledge"?

A zero-knowledge proof system must satisfy three properties:

Completeness: If the statement is true and both parties follow the protocol, the verifier will be convinced. If my A1C really is below 5.7%, the proof will verify.

Soundness: If the statement is false, no cheating prover can convince the verifier (except with negligible probability). I cannot fake a proof that my A1C is healthy when it isn't.

Zero-Knowledge: If the statement is true, the verifier learns nothing beyond this fact. The verifier cannot extract my actual A1C value, my glucose history, or any other information from the proof.

The mathematical magic is in the third property. The proof convinces you something is true while revealing literally nothing about why it's true.
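All three properties can be seen concretely in the oldest and simplest example: Schnorr's identification protocol, which proves knowledge of a discrete logarithm. This toy Python sketch uses deliberately tiny parameters (real systems use ~256-bit groups); the honest proof verifies (completeness), a wrong key fails (soundness), and a valid-looking transcript can be forged without the secret, which is exactly why the transcript leaks nothing (zero-knowledge):

```python
import secrets

# Tiny safe-prime group for illustration only: g = 2 generates a subgroup of
# prime order q = 11 in Z_23*. Real systems use ~256-bit groups.
p, q, g = 23, 11, 2

def keygen():
    x = secrets.randbelow(q - 1) + 1        # secret witness x
    return x, pow(g, x, p)                  # public key y = g^x mod p

def prove(x):
    r = secrets.randbelow(q)                # fresh per-proof randomness
    t = pow(g, r, p)                        # commitment
    c = secrets.randbelow(q - 1) + 1        # challenge (verifier-chosen, or Fiat-Shamir)
    s = (r + c * x) % q                     # response
    return t, c, s

def verify(y, t, c, s):
    # Completeness: g^s = g^(r + c*x) = t * y^c (mod p)
    return pow(g, s, p) == (t * pow(y, c, p)) % p

def simulate(y):
    # Zero-knowledge: a valid transcript can be forged WITHOUT the secret
    # by picking c and s first, so transcripts carry no knowledge of x.
    c = secrets.randbelow(q - 1) + 1
    s = secrets.randbelow(q)
    t = (pow(g, s, p) * pow(y, -c, p)) % p  # t = g^s * y^(-c) mod p
    return t, c, s

x, y = keygen()
t, c, s = prove(x)
assert verify(y, t, c, s)                   # completeness
assert not verify((y * g) % p, t, c, s)     # soundness: wrong key fails
assert verify(y, *simulate(y))              # zero-knowledge: simulation verifies
```

Modern SNARKs replace this single algebraic claim with an entire constraint system, but the three guarantees are the same.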

The Evolution of Proving Systems

Zero-knowledge proofs have evolved through several generations, each with different tradeoffs:

Groth16 (2016). Jens Groth's breakthrough paper introduced the most efficient zkSNARK to date:

  • Proofs are tiny: 3 elliptic curve elements, ~200 bytes
  • Verification: only 3 pairings (~3ms)
  • Tradeoff: every circuit requires its own trusted setup ceremony
  • Powers Zcash; remains the gold standard for proof size

PLONK (2019). Gabizon, Williamson, and Ciobotaru introduced universal setup:

  • One ceremony works for all circuits up to a certain size
  • Eliminates the per-circuit trust problem
  • Proof sizes larger (~400 bytes), verification slightly slower (~5ms)
  • Practical benefits enormous for systems deploying new circuits regularly

STARKs (2018). Ben-Sasson et al. eliminated trusted setup entirely:

  • Uses only collision-resistant hash functions (no elliptic curves)
  • Plausibly post-quantum secure
  • Cost: proof sizes 50-100 KB, verification 10-50ms
  • Currently prohibitive for mobile-first health applications

Arithmetic Circuits: Turning Health Logic into Math

The core challenge is expressing health computations as arithmetic circuits over finite fields.

Consider the simple claim "my average glucose over the last 30 days is below 120 mg/dL":

// Circom 2 circuit (inputs are private by default; outputs are public)
pragma circom 2.0.0;
include "circomlib/circuits/comparators.circom";

template AvgGlucoseBelow(n, threshold) {
    signal input glucose_readings[n];   // private health data
    signal output is_below_threshold;   // public output

    // Summation is a linear combination: essentially free in the constraint system
    var sum = 0;
    for (var i = 0; i < n; i++) {
        sum += glucose_readings[i];
    }

    // Field division makes averaging unsafe, so check sum < threshold * n
    // instead; the comparison is what actually becomes constraints
    component lt = LessThan(32);
    lt.in[0] <== sum;
    lt.in[1] <== threshold * n;
    is_below_threshold <== lt.out;
}

component main = AvgGlucoseBelow(30, 120);

This simple example requires ~50 constraints. A cardiovascular risk score incorporating blood pressure, cholesterol, smoking status, age, and diabetes status might require 10,000+ constraints. Each constraint adds to proving time.

The constraint explosion explains why complex health verifications take 15-30 seconds on mobile hardware. Optimizing these circuits—finding mathematically equivalent formulations with fewer constraints—is active research.
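To see where the constraints come from, here is an illustrative Python model (a sketch, not snarkjs code) of the standard circuit trick: since provers cannot branch, `a < b` is checked by bit-decomposing a shifted difference, which costs roughly one constraint per bit—the same pattern circomlib's LessThan uses. The summation, by contrast, is a linear combination and adds no constraints in R1CS:

```python
# Circuit-style comparison: `a < b` is checked by bit-decomposing a shifted
# difference, about one constraint per bit. A Python model with counts:
N_BITS = 32

def less_than(a: int, b: int):
    """Return (result, constraint_count) for the circuit check a < b."""
    shifted = a + (1 << N_BITS) - b            # top bit clear exactly when a < b
    bits = [(shifted >> i) & 1 for i in range(N_BITS + 1)]
    constraints = len(bits)                    # one booleanity constraint per bit
    constraints += 1                           # bits must recompose to `shifted`
    return 1 - bits[N_BITS], constraints

readings = [105, 118, 97] + [110] * 27         # 30 hypothetical readings (mg/dL)
total = sum(readings)                          # linear combination: zero R1CS constraints
ok, n_constraints = less_than(total, 120 * 30) # avg < 120 is equivalent to sum < 120 * 30
print(ok, n_constraints)                       # prints: 1 34
```

One threshold check costs ~34 constraints here; a risk score with dozens of comparisons, multiplications, and lookups multiplies that quickly.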

The Re-Identification Problem: Why "Anonymization" Fails

Zero-knowledge proofs solve a problem that traditional anonymization cannot: the re-identification attack.

The evidence is damning:

  • 99.98% of Americans could be re-identified using just 15 demographic attributes (Nature Communications)
  • 2016: Journalists re-identified politicians in an "anonymized" browsing history of 3 million German citizens
  • Australia: "De-identified" medical records for 10% of the population re-identified within 6 weeks

The fundamental problem: anonymization tries to remove identifying information while preserving utility. But the more useful the data, the more it can be linked to individuals. This is a mathematical inevitability, not a failure of implementation.

Zero-knowledge proofs sidestep this entirely. There's no anonymized dataset to re-identify because there's no dataset at all—just proofs about properties that reveal nothing beyond the specific claim.
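The mechanism behind re-identification is simple arithmetic. The illustrative Python below (attribute cardinalities are hypothetical round numbers) shows how each quasi-identifier divides the expected anonymity set, so a handful of attributes suffices to make most individuals unique:

```python
# Each quasi-identifier splits the population into smaller anonymity sets.
# Cardinalities are hypothetical round numbers; the point is the geometric
# collapse that defeats "remove the name" anonymization.
population = 330_000_000                        # roughly the US population

attributes = [                                  # (attribute, distinct values)
    ("ZIP code", 40_000),
    ("birth date", 30_000),                     # about 82 years of days
    ("sex", 2),
]

anonymity_set = population
for name, cardinality in attributes:
    anonymity_set /= cardinality
    print(f"after {name}: ~{anonymity_set:.3f} expected matches")

# The expected match count falls below 1 after just three attributes: most
# individuals are unique, consistent with classic re-identification results.
```

With 15 attributes instead of 3, the collapse is overwhelming—which is the intuition behind the 99.98% figure above.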

Technical Architecture: How It Actually Works

The system maintains a strict privacy boundary between the user's device and external services:

On the user's device:

  • All sensitive health data stays local (glucose, HRV, sleep metrics)
  • ZK circuit runs locally with raw data + claim as input
  • Output: a compact cryptographic proof

What crosses the boundary:

  • Only the proof (nothing else)

On the verifier side:

  • Receives proof + public inputs
  • Verifies mathematical validity in milliseconds
  • Confirms claim is true
  • Never sees underlying health data

The verification is absolute. The privacy is absolute.

The pipeline breaks into three stages:

Stage 1 - Circuit Compilation: Health logic (risk scores, eligibility checks, pattern analysis) is expressed as arithmetic circuits using Circom 2.0. These circuits define the constraints that must be satisfied for a claim to be true.

Stage 2 - Proof Generation: The user's device runs the circuit locally with their private health data as input. This produces a cryptographic proof—a compact mathematical object that certifies the computation was performed correctly without revealing the inputs.

Stage 3 - Verification: The proof is sent to a verifier (could be an AI service, healthcare provider, clinical trial coordinator, insurance company). The verifier checks the proof's mathematical validity in milliseconds. If valid, the claim is guaranteed true. But the verifier learns nothing about the underlying data.

Proving System Selection

The choice of ZK proving system involves fundamental tradeoffs:

| System | Proof Size | Proving Time | Verification | Trust Assumption |
| --- | --- | --- | --- | --- |
| Groth16 | ~200 bytes | Slow | 3ms | Requires trusted setup per circuit |
| PLONK | ~400 bytes | Medium | 5ms | Universal trusted setup (one-time) |
| STARKs | ~50-100 KB | Fast | 10-50ms | No trusted setup required |

For health applications, I'm exploring PLONK as the primary system:

  • Universal setup eliminates the need for per-circuit trust ceremonies
  • Proof size remains practical for mobile bandwidth constraints
  • Verification is fast enough for real-time applications
  • Tooling is mature (snarkjs, Circom ecosystem)

STARKs are compelling for their transparency (no trusted setup), but the proof sizes and verification times are still prohibitive for mobile-first health applications.

Technology Stack

| Component | Choice | Rationale |
| --- | --- | --- |
| Circuit Language | Circom 2.0 | Mature tooling, active community, extensive libraries |
| Proving System | PLONK (snarkjs) | Universal setup, practical proof sizes, fast verification |
| On-chain Verification | Solidity | Enables composability with decentralized identity, data marketplaces |
| Client Runtime | Rust → WASM | Cross-platform, near-native performance, runs in browser |
| Data Schema | FHIR R4 | Healthcare interoperability standard, works with existing systems |

The full stack is designed for pragmatic deployment—not bleeding-edge research, but proven technologies assembled in a novel way.

Research Challenges and Current Limitations

Completed:

  • Proof-of-concept circuits for basic health claims (threshold comparisons, range checks, eligibility verification)
  • WASM compilation working in browser and mobile environments
  • Verification contracts deployed and tested on Ethereum testnets
  • Integration with FHIR-formatted health data exports

In Progress:

  • Circuit optimization for complex multi-variable health computations
  • Trusted setup ceremony design for universal PLONK parameters
  • Performance benchmarking across mobile device classes (flagship vs. mid-range)

Key Challenge: Proving Time

A complex health verification currently takes 15-30 seconds on a mid-range smartphone. This is fine for infrequent verifications (proving eligibility for a clinical trial), but impractical for real-time AI interactions.

Target: sub-2-second proving for interactive applications.

The path forward involves:

  1. Circuit optimization: Reducing constraint counts through algebraic simplification
  2. Recursive composition: Breaking complex proofs into smaller sub-proofs that can be verified in parallel
  3. Hardware acceleration: Leveraging GPU/TPU for finite field arithmetic (the bottleneck in proof generation)

This isn't just an engineering problem—it's a research problem. The optimal circuit architectures for health-specific computations are not known. Each domain (cardiovascular risk, metabolic health, genetic analysis) has different constraint profiles that require custom optimization.
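The hardware-acceleration point can be made concrete: proof generation bottoms out in enormous numbers of ~254-bit modular multiplications over the scalar field. A rough Python timing over the BN254 (alt_bn128) field, the default curve in snarkjs, gives a feel for the unit cost; absolute numbers depend entirely on hardware:

```python
import random
import time

# BN254 / alt_bn128 scalar field modulus (snarkjs's default curve)
R = 21888242871839275222246405745257275088548364400416034343698204186575808495617

xs = [random.randrange(R) for _ in range(100_000)]
ys = [random.randrange(R) for _ in range(100_000)]

start = time.perf_counter()
acc = 0
for a, b in zip(xs, ys):
    acc = (acc + a * b) % R                # one ~254-bit multiply-accumulate
elapsed = time.perf_counter() - start

# A circuit with millions of constraints implies orders of magnitude more of
# these operations (plus elliptic-curve scalar multiplications), which is why
# GPU/TPU finite-field kernels are the main lever on proving time.
print(f"100k field mul-adds: {elapsed * 1000:.1f} ms")
```

Native provers use optimized field representations (Montgomery form, SIMD, batched curve operations), but the arithmetic volume is the fixed cost that circuit optimization and hardware acceleration both attack.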

Why Privacy-by-Architecture Matters for Regulation

The architectural properties of ZK proofs align remarkably well with privacy regulation—not by chance, but by design:

| Regulation | Requirement | How ZK Satisfies It |
| --- | --- | --- |
| HIPAA | Minimum necessary disclosure | System can only prove specific claims, nothing more |
| GDPR Article 5 | Data minimization principle | No personal data leaves user's device |
| GDPR Article 17 | Right to erasure ("right to be forgotten") | Nothing to erase—service never receives data |
| CCPA | Consumer data control | User decides what to prove, when, to whom |
| FDA 21 CFR Part 11 | Electronic records integrity | Proofs are cryptographically tamper-proof |

Traditional health AI achieves compliance through policy: privacy policies, data processing agreements, access controls, audit logs. These are necessary but not sufficient—they protect against internal misuse, but cannot prevent external breaches.

Zero-knowledge systems achieve compliance through architecture. The system is incapable of accessing raw health data because it never receives it. Compliance becomes a mathematical property, not a promise.

The Regulatory Opportunity: What Becomes Possible

If you can prove health claims without revealing health data, entire categories of applications become feasible that are currently impossible due to privacy constraints:

Clinical Trial Eligibility Screening: Patients prove they meet inclusion criteria without exposing their complete medical history. Trials recruit faster, privacy is preserved, and patients maintain control.

Insurance Underwriting: Prove you qualify for certain rates without sharing every health detail. The insurer verifies eligibility claims without accessing the raw data that could later be used to deny coverage.

Personalized AI Without Data Upload: Health AI models run recommendations based on verified characteristics ("glucose variability is high," "HRV trend is declining") without ever seeing the raw time-series data.

Cross-Border Health Data: European patient proves health status to US provider without GDPR violations, because no personal data crosses jurisdictional boundaries—only cryptographic proofs.

Genetic Privacy: Prove you carry or don't carry specific risk alleles without sequencing companies, researchers, or services ever accessing your full genome.

These aren't hypotheticals—they're the direct implications of proof systems that separate verification from disclosure.
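One common building block behind cases like the genetic one is to commit the full record set to a single Merkle root, then prove facts about individual entries; in a ZK deployment the membership path is verified inside the circuit so even the entry stays hidden. This plain-Python sketch shows only the commitment side (the record strings are hypothetical):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Commit a list of records to a single 32-byte root."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])        # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(leaves, index):
    """Sibling hashes needed to recompute the root from leaves[index]."""
    level = [h(x) for x in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        path.append((level[index ^ 1], index % 2))  # (sibling, is_right_child)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_membership(root, leaf, path):
    node = h(leaf)
    for sibling, is_right in path:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

# Hypothetical allele records; only the root would ever be published
records = [b"rs429358:TT", b"rs7412:CC", b"rs1801133:AG", b"rs662799:AA"]
root = merkle_root(records)
assert verify_membership(root, b"rs7412:CC", merkle_path(records, 1))
```

Real circuits swap SHA-256 for circuit-friendly hashes (e.g., Poseidon) and check the path in zero knowledge, but the commitment structure is the same.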

Why I'm Building This

I started this research because I wanted to use AI on my own health data without trusting anyone with that data. The technology exists to make this possible. Zero-knowledge proofs are not theoretical—they're deployed in production systems processing billions in value.

Healthcare is the next frontier. Not because it's technically harder (it's not—the math is the same), but because the incentive structures are broken. Health data aggregators have no incentive to adopt privacy-preserving architectures. Their business model depends on data access.

The path forward requires building systems where data privacy is the default, not a policy. Where users don't have to trust the service, because the architecture makes betrayal impossible.

The 276 million breached records in 2024 aren't just a statistic. They're 276 million reasons to build systems where the data never leaves the user's device in the first place.

For someone who tracks everything but trusts no one with that data, verifiable health intelligence isn't a nice-to-have feature. It's the only architecture that makes sense.

This is early-stage research. The circuits are slow. The tooling is rough. But the direction is clear: health AI can be both powerful and private. We just have to build it that way from the beginning.

Limitations and Ethical Considerations

Current Technical Limitations

1. Proving Time: A complex health verification takes 15-30 seconds on mid-range mobile hardware. This is acceptable for infrequent verifications (clinical trial eligibility) but impractical for real-time AI interactions. The sub-2-second target requires fundamental advances in circuit optimization or hardware acceleration.

2. Circuit Expressiveness: Not all health computations can be efficiently expressed as arithmetic circuits. Machine learning models, in particular, require massive circuit sizes. A simple neural network for health risk prediction might require millions of constraints, making proof generation prohibitively slow.

3. Data Freshness: Proofs attest to data at a specific point in time. A proof that "my A1C was below 5.7% when I generated this proof" doesn't guarantee current status. For time-sensitive applications, proof validity windows must be carefully designed.

4. Device Security Assumption: The architecture assumes the user's device is secure. If malware compromises the device, it could access health data before proof generation or generate proofs from fabricated data. Zero-knowledge proofs provide network-level privacy, not endpoint security.

5. Adoption Barriers: Healthcare providers, insurers, and regulators must understand and trust this technology. Explaining "your privacy is guaranteed by mathematics" to non-technical stakeholders is a significant adoption challenge.

Ethical Considerations

The Coercion Risk: If proving health status becomes easy, it might become expected. An insurer might require ZK proofs of health metrics before offering coverage. An employer might request proof of vaccination status. The technology that empowers voluntary privacy disclosure could, paradoxically, normalize mandatory health verification.

Mitigations:

  • Design proof systems that require explicit user consent for each verification
  • Support selective disclosure—prove only what's necessary, nothing more
  • Advocate for regulatory limits on what health proofs can be required

Selective Disclosure and Discrimination: Zero-knowledge proofs allow proving specific claims. But which claims? A system that lets users prove "I don't have a pre-existing condition" enables exactly the discrimination that regulations like the ACA were designed to prevent. The technology is neutral; the applications require ethical guardrails.

The Digital Divide: Zero-knowledge health verification requires:

  • A smartphone capable of running WASM
  • Technical literacy to manage cryptographic keys
  • Access to healthcare data in digital format (FHIR exports)

These requirements exclude precisely the populations most vulnerable to health data exploitation. Privacy-preserving technology that only works for the privileged isn't a solution—it's a new form of inequity.

Accountability and Auditability: Traditional health data systems leave audit trails. ZK systems, by design, leave no trace of what was proven or when. This is a feature for privacy but creates challenges for accountability. If a clinical trial later discovers fraud, there's no data trail to investigate.

Quantum Vulnerability: Current ZK systems (Groth16, PLONK) rely on elliptic curve cryptography. A sufficiently powerful quantum computer could break these assumptions, retroactively compromising all proofs ever generated. STARKs are quantum-resistant but currently impractical for mobile deployment. The transition timeline is uncertain.

Future Directions

Next (Active Development)

  • Circuit optimization: Reducing constraint counts for common health claims (threshold checks, range proofs, aggregate statistics) through algebraic simplification and custom gadgets
  • Mobile performance profiling: Systematic benchmarking across device classes to understand the practical deployment envelope
  • FHIR integration toolkit: Libraries for converting standard FHIR resources to circuit-compatible formats
  • Developer documentation: Making ZK health verification accessible to health app developers without requiring cryptography expertise

Later (Roadmap)

  • Hardware acceleration: Leveraging mobile GPUs for finite field arithmetic to achieve sub-2-second proving times
  • Recursive proof composition: Enabling complex health verifications to be built from simpler sub-proofs, reducing per-claim proving time
  • Cross-platform SDK: Native libraries for iOS and Android with consistent APIs
  • Verifier network: Decentralized infrastructure for proof verification, eliminating single points of trust

Vision (Research Direction)

  • ZK Machine Learning: Proving that an ML model's prediction about your health data meets certain criteria without revealing the data or the model's internal weights. This enables privacy-preserving AI recommendations.
  • Federated Health Intelligence: Multiple users contributing ZK proofs to aggregate health statistics. Researchers learn population-level insights without any individual's data leaving their device.
  • Post-Quantum Migration: Transitioning to STARK-based systems as hardware improves, ensuring long-term privacy guarantees against quantum attacks.
  • Regulatory Sandboxes: Partnering with health regulators to establish frameworks for ZK-verified health claims, enabling practical deployment in clinical and insurance contexts.

Conclusion

The architecture of health AI determines its privacy guarantees. Systems that aggregate data centrally will be breached—not because engineers are careless, but because centralized data stores are inherently vulnerable. The 276 million breached healthcare records in 2024 are evidence of a structural problem, not an implementation failure.

Zero-knowledge proofs offer a fundamentally different architecture: prove health claims without revealing health data. The mathematics is sound. The technology is deployed in production systems processing billions in value. The application to healthcare is the next frontier.

The challenges are real. Proving times are too slow for interactive applications. Circuit design requires specialized expertise. Adoption requires educating stakeholders who've never encountered cryptographic proofs. But these are engineering and education challenges, not fundamental barriers.

For the 81% of Americans whose healthcare data was exposed in 2024—and for everyone who, like me, tracks their health obsessively but won't trust anyone with that data—verifiable health intelligence represents the only architecture that makes sense.

Health AI can be both powerful and private. The technology exists. We just have to build it that way from the beginning.

References

  1. Groth, J. (2016). On the Size of Pairing-Based Non-Interactive Arguments. Advances in Cryptology – EUROCRYPT 2016. https://eprint.iacr.org/2016/260.pdf

  2. Ben-Sasson, E., Bentov, I., Horesh, Y., & Riabzev, M. (2018). Scalable, Transparent, and Post-Quantum Secure Computational Integrity. IACR Cryptology ePrint Archive. https://eprint.iacr.org/2018/046.pdf

  3. Gabizon, A., Williamson, Z.J., & Ciobotaru, O. (2019). PLONK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge. IACR Cryptology ePrint Archive. https://eprint.iacr.org/2019/953

  4. iden3. (2024). snarkjs: zkSNARK Implementation in JavaScript & WASM. GitHub. https://github.com/iden3/snarkjs

  5. iden3. (2024). Circom 2.0 Documentation. https://docs.circom.io/

  6. HIPAA Journal. (2025). 2024 Healthcare Data Breach Report. https://www.hipaajournal.com/2024-healthcare-data-breach-report/

  7. HIPAA Journal. (2025). The Biggest Healthcare Data Breaches of 2024. https://www.hipaajournal.com/biggest-healthcare-data-breaches-2024/

  8. Rocher, L., Hendrickx, J.M., & de Montjoye, Y.A. (2019). Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10, 3069. DOI: 10.1038/s41467-019-10933-3

  9. El Emam, K. et al. (2011). A Systematic Review of Re-Identification Attacks on Health Data. PLOS ONE, 6(12), e28071. DOI: 10.1371/journal.pone.0028071

  10. HL7 International. (2019). HL7 FHIR R4 Specification. https://www.hl7.org/fhir/R4/

  11. U.S. Department of Health and Human Services. (2024). HIPAA Privacy Rule. 45 CFR Parts 160 and 164.

  12. European Parliament. (2016). General Data Protection Regulation (GDPR). Regulation (EU) 2016/679.

  13. Mssassi, S. & El Kalam, A.A. (2024). Blockchain Based Zero Knowledge Proof Protocol For Privacy Preserving Healthcare Data Sharing. Journal of Technology Informatics and Engineering.

  14. Chen, Y. et al. (2022). Health-zkIDM: A Healthcare Identity System Based on Fabric Blockchain and Zero-Knowledge Proof. Sensors, 22(20), 7716. DOI: 10.3390/s22207716

  15. Zerka, F. et al. (2024). A zero-knowledge proof federated learning on DLT for healthcare data. Journal of Parallel and Distributed Computing, 193, 104957. DOI: 10.1016/j.jpdc.2024.104957

  16. Sun, X. et al. (2024). Integrating blockchain and ZK-ROLLUP for efficient healthcare data privacy protection system via IPFS. Scientific Reports, 14, 11638. DOI: 10.1038/s41598-024-62374-4

  17. Goldwasser, S., Micali, S., & Rackoff, C. (1989). The Knowledge Complexity of Interactive Proof Systems. SIAM Journal on Computing, 18(1), 186-208.

  18. Bowe, S., Gabizon, A., & Miers, I. (2017). Scalable Multi-party Computation for zk-SNARK Parameters in the Random Beacon Model. IACR Cryptology ePrint Archive. https://eprint.iacr.org/2017/1050

  19. StarkWare Industries. (2024). STARK Technology Overview. https://starkware.co/stark/

  20. Ethereum Foundation. (2024). ZK-Rollups. https://ethereum.org/en/developers/docs/scaling/zk-rollups/

  21. Sasson, E.B. et al. (2014). Zerocash: Decentralized Anonymous Payments from Bitcoin. IEEE Symposium on Security and Privacy.

  22. FDA. (2024). 21 CFR Part 11: Electronic Records; Electronic Signatures. U.S. Food and Drug Administration.

  23. Narayanan, A. & Shmatikov, V. (2008). Robust De-anonymization of Large Sparse Datasets. IEEE Symposium on Security and Privacy, 111-125.

  24. RISC Zero. (2024). About STARKs. RISC Zero Developer Documentation. https://dev.risczero.com/reference-docs/about-starks