Dec 2023 · 21 min · Research

Verifiable Health Intelligence: Personalized Health Advice Without Exposing Your Data

Zero-knowledge proof systems for privacy-preserving health AI—enabling personalized recommendations without exposing sensitive user data. Because the best way to protect health data is to never expose it in the first place.

The Origin: The Data I Won't Share

I track everything. Not out of obsession—out of curiosity. Continuous glucose for over a thousand days. Sleep architecture every night. HRV trends, training load, recovery metrics, macro composition of nearly every meal. I'm not trying to optimize myself into a machine. I just want to understand how my body actually works—what makes me sharp, what makes me tired, what I can get away with and what I can't.

It's a strange kind of intimacy, this data. More revealing than a diary. Three years of glucose readings tell you when I sleep, when I eat, when I'm stressed, when I'm sick. They map my metabolism with a precision that feels almost uncomfortable when I think about it.

Last month, I came within one click of giving it all away.

The app was beautiful. The promise was compelling: "Upload your CGM data and get personalized meal recommendations based on your glucose responses." Finally—an AI that could correlate my spikes with specific foods, learn my metabolic patterns, tell me which meals actually work for me. Not generic advice. My data, making my nutrition smarter.

My finger hovered over "Connect Account."

Then I read the privacy policy.

The company had been acquired twice in eighteen months. The current owner was a data aggregator I'd never heard of. Their "anonymization" process was vague. Their data retention was indefinite. And buried in paragraph 47: they reserved the right to share "de-identified" data with "research partners."

I closed the app.

Here's the thing: I build AI systems. I know exactly how valuable this data is—and I know exactly how fragile the promises around it are. The startup that swears to protect your privacy today gets acquired tomorrow by a company with different incentives. The encryption that seems unbreakable sits on servers with keys that someone, somewhere, can access. And the "anonymized" health data can be re-identified with frightening accuracy when combined with other datasets.

I have the kind of longitudinal health data that would make personalized AI genuinely powerful. And I won't give it to anyone.

Not to a startup with a privacy policy that's one acquisition away from being meaningless. Not to a platform that stores my data encrypted but holds the keys. Not even to my own company—because the architecture that makes centralized health AI possible is the same architecture that makes it a honeypot.

The irony isn't lost on me: I want AI that understands my health deeply, but I won't trust any existing AI with the data that would make that possible.

If I won't share my data with the systems I'd build, why would anyone else?

The Thing Everyone Knows But Nobody Says

Here's what happened in 2024:

  • 276,775,457 healthcare records breached — 81% of the US population
  • 190 million people exposed in the Change Healthcare ransomware attack alone — 1 in 3 Americans
  • $10.1 million average cost per breach incident
  • 725 large breaches (500+ records) reported to HHS

This isn't an outlier year. It's the trend.

The pattern is simple: centralized data stores are honeypots. The more valuable the data, the bigger the target. Health data is worth 50x more than financial data on black markets.

Every health AI company follows the same playbook: aggregate user data, store it centrally, run models against it, promise they'll keep it secure. And then, inevitably, they don't. Not because they're careless—because the architecture is fundamentally vulnerable.

You cannot secure what you collect. The only guaranteed way to protect health data is to never have it in the first place.

The Insight: What If the Data Never Leaves Your Device?

The traditional approach to health AI is backwards. It assumes you need to see the data to reason about it. You don't.

Zero-knowledge proofs let you prove a statement is true without revealing why it's true. The math is sound—cryptographically verifiable. The proof cannot be faked. But it reveals nothing beyond the specific claim.

Consider what this means in practice:

| What You Want to Prove | What You Reveal | What Stays Private |
| --- | --- | --- |
| "I'm over 18" | True/False | Your actual birthdate |
| "My A1C is in healthy range" | True/False | Your exact A1C value |
| "I have no contraindications for Drug X" | True/False | Your complete medical history |
| "My genetic risk score qualifies me for this study" | True/False | Your genome |
| "My average glucose response to carbs is normal" | True/False | Every meal and glucose reading |

The verification is absolute. The privacy is absolute. And the data never crosses the device boundary.

This is not theoretical. Zero-knowledge systems are in production today, processing billions of dollars in cryptocurrency transactions, proving identity for financial services, verifying credentials for enterprise security. The technology works.

Healthcare just hasn't adopted it yet.

Why This Is Hard: The Circuit Complexity Problem

The cryptography isn't the bottleneck—expressing health logic as arithmetic circuits is.

A zero-knowledge proof works by converting a computation into a mathematical constraint system. To prove "my A1C is below 5.7%", you need to encode that comparison as circuit gates—additions, multiplications, constraint checks. For simple threshold comparisons, this is manageable.

Real health intelligence is not simple.

Multi-variable risk scores. Temporal pattern analysis. Conditional logic based on age, sex, genetic markers, medication history. Each layer of complexity adds constraints. More constraints mean larger circuits. Larger circuits mean longer proving times.

Current state: a complex health verification takes 15-30 seconds on mobile hardware. For real-time applications, this needs to drop below 2 seconds.

This is the research frontier. Not "can we do zero-knowledge health AI"—we can. The question is "can we do it fast enough to be useful?"

Theoretical Framework: The Mathematics of Verifiable Privacy

Understanding why zero-knowledge proofs work—and why they're hard—requires understanding the underlying mathematics.

What Makes a Proof "Zero-Knowledge"?

A zero-knowledge proof system must satisfy three properties:

Completeness: If the statement is true and both parties follow the protocol, the verifier will be convinced. If my A1C really is below 5.7%, the proof will verify.

Soundness: If the statement is false, no cheating prover can convince the verifier (except with negligible probability). I cannot fake a proof that my A1C is healthy when it isn't.

Zero-Knowledge: If the statement is true, the verifier learns nothing beyond this fact. The verifier cannot extract my actual A1C value, my glucose history, or any other information from the proof.

The mathematical magic is in the third property. The proof convinces you something is true while revealing literally nothing about why it's true.
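All three properties can be seen concretely in the oldest and simplest example: Schnorr's identification protocol, which proves knowledge of a discrete logarithm. This toy Python sketch uses deliberately tiny parameters (real systems use ~256-bit groups); the honest proof verifies (completeness), a wrong key fails (soundness), and a valid-looking transcript can be forged without the secret, which is exactly why the transcript leaks nothing (zero-knowledge):

```python
import secrets

# Tiny safe-prime group for illustration only: g = 2 generates a subgroup of
# prime order q = 11 in Z_23*. Real systems use ~256-bit groups.
p, q, g = 23, 11, 2

def keygen():
    x = secrets.randbelow(q - 1) + 1        # secret witness x
    return x, pow(g, x, p)                  # public key y = g^x mod p

def prove(x):
    r = secrets.randbelow(q)                # fresh per-proof randomness
    t = pow(g, r, p)                        # commitment
    c = secrets.randbelow(q - 1) + 1        # challenge (verifier-chosen, or Fiat-Shamir)
    s = (r + c * x) % q                     # response
    return t, c, s

def verify(y, t, c, s):
    # Completeness: g^s = g^(r + c*x) = t * y^c (mod p)
    return pow(g, s, p) == (t * pow(y, c, p)) % p

def simulate(y):
    # Zero-knowledge: a valid transcript can be forged WITHOUT the secret
    # by picking c and s first, so transcripts carry no knowledge of x.
    c = secrets.randbelow(q - 1) + 1
    s = secrets.randbelow(q)
    t = (pow(g, s, p) * pow(y, -c, p)) % p  # t = g^s * y^(-c) mod p
    return t, c, s

x, y = keygen()
t, c, s = prove(x)
assert verify(y, t, c, s)                   # completeness
assert not verify((y * g) % p, t, c, s)     # soundness: wrong key fails
assert verify(y, *simulate(y))              # zero-knowledge: simulation verifies
```

Modern SNARKs replace this single algebraic claim with an entire constraint system, but the three guarantees are the same.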

The Evolution of Proving Systems

Zero-knowledge proofs have evolved through several generations, each with different tradeoffs:

Groth16 (2016). Jens Groth's breakthrough paper introduced the most efficient zkSNARK to date:

  • Proofs are tiny: 3 elliptic curve elements, ~200 bytes
  • Verification: only 3 pairings (~3ms)
  • Tradeoff: every circuit requires its own trusted setup ceremony
  • Powers Zcash; remains the gold standard for proof size

PLONK (2019). Gabizon, Williamson, and Ciobotaru introduced universal setup:

  • One ceremony works for all circuits up to a certain size
  • Eliminates the per-circuit trust problem
  • Proof sizes larger (~400 bytes), verification slightly slower (~5ms)
  • Practical benefits enormous for systems deploying new circuits regularly

STARKs (2018). Ben-Sasson et al. eliminated trusted setup entirely:

  • Uses only collision-resistant hash functions (no elliptic curves)
  • Plausibly post-quantum secure
  • Cost: proof sizes 50-100 KB, verification 10-50ms
  • Currently prohibitive for mobile-first health applications

Arithmetic Circuits: Turning Health Logic into Math

The core challenge is expressing health computations as arithmetic circuits over finite fields.

Consider the simple claim "my average glucose over the last 30 days is below 120 mg/dL":

// Circom 2 circuit (inputs are private by default; outputs are public)
pragma circom 2.0.0;
include "circomlib/circuits/comparators.circom";

template AvgGlucoseBelow(n, threshold) {
    signal input glucose_readings[n];   // private health data
    signal output is_below_threshold;   // public output

    // Summation is a linear combination: essentially free in the constraint system
    var sum = 0;
    for (var i = 0; i < n; i++) {
        sum += glucose_readings[i];
    }

    // Field division makes averaging unsafe, so check sum < threshold * n
    // instead; the comparison is what actually becomes constraints
    component lt = LessThan(32);
    lt.in[0] <== sum;
    lt.in[1] <== threshold * n;
    is_below_threshold <== lt.out;
}

component main = AvgGlucoseBelow(30, 120);

This simple example requires ~50 constraints. A cardiovascular risk score incorporating blood pressure, cholesterol, smoking status, age, and diabetes status might require 10,000+ constraints. Each constraint adds to proving time.

The constraint explosion explains why complex health verifications take 15-30 seconds on mobile hardware. Optimizing these circuits—finding mathematically equivalent formulations with fewer constraints—is active research.
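To see where the constraints come from, here is an illustrative Python model (a sketch, not snarkjs code) of the standard circuit trick: since provers cannot branch, `a < b` is checked by bit-decomposing a shifted difference, which costs roughly one constraint per bit—the same pattern circomlib's LessThan uses. The summation, by contrast, is a linear combination and adds no constraints in R1CS:

```python
# Circuit-style comparison: `a < b` is checked by bit-decomposing a shifted
# difference, about one constraint per bit. A Python model with counts:
N_BITS = 32

def less_than(a: int, b: int):
    """Return (result, constraint_count) for the circuit check a < b."""
    shifted = a + (1 << N_BITS) - b            # top bit clear exactly when a < b
    bits = [(shifted >> i) & 1 for i in range(N_BITS + 1)]
    constraints = len(bits)                    # one booleanity constraint per bit
    constraints += 1                           # bits must recompose to `shifted`
    return 1 - bits[N_BITS], constraints

readings = [105, 118, 97] + [110] * 27         # 30 hypothetical readings (mg/dL)
total = sum(readings)                          # linear combination: zero R1CS constraints
ok, n_constraints = less_than(total, 120 * 30) # avg < 120 is equivalent to sum < 120 * 30
print(ok, n_constraints)                       # prints: 1 34
```

One threshold check costs ~34 constraints here; a risk score with dozens of comparisons, multiplications, and lookups multiplies that quickly.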

The Re-Identification Problem: Why "Anonymization" Fails

Zero-knowledge proofs solve a problem that traditional anonymization cannot: the re-identification attack.

The evidence is damning:

  • 99.98% of Americans could be re-identified using just 15 demographic attributes (Nature Communications)
  • 2016: Journalists re-identified politicians in an "anonymized" browsing history of 3 million German citizens
  • Australia: "De-identified" medical records for 10% of the population re-identified within 6 weeks

The fundamental problem: anonymization tries to remove identifying information while preserving utility. But the more useful the data, the more it can be linked to individuals. This is a mathematical inevitability, not a failure of implementation.

Zero-knowledge proofs sidestep this entirely. There's no anonymized dataset to re-identify because there's no dataset at all—just proofs about properties that reveal nothing beyond the specific claim.
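The mechanism behind re-identification is simple arithmetic. The illustrative Python below (attribute cardinalities are hypothetical round numbers) shows how each quasi-identifier divides the expected anonymity set, so a handful of attributes suffices to make most individuals unique:

```python
# Each quasi-identifier splits the population into smaller anonymity sets.
# Cardinalities are hypothetical round numbers; the point is the geometric
# collapse that defeats "remove the name" anonymization.
population = 330_000_000                        # roughly the US population

attributes = [                                  # (attribute, distinct values)
    ("ZIP code", 40_000),
    ("birth date", 30_000),                     # about 82 years of days
    ("sex", 2),
]

anonymity_set = population
for name, cardinality in attributes:
    anonymity_set /= cardinality
    print(f"after {name}: ~{anonymity_set:.3f} expected matches")

# The expected match count falls below 1 after just three attributes: most
# individuals are unique, consistent with classic re-identification results.
```

With 15 attributes instead of 3, the collapse is overwhelming—which is the intuition behind the 99.98% figure above.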

Technical Architecture: How It Actually Works

The system maintains a strict privacy boundary between the user's device and external services:

On the user's device:

  • All sensitive health data stays local (glucose, HRV, sleep metrics)
  • ZK circuit runs locally with raw data + claim as input
  • Output: a compact cryptographic proof

What crosses the boundary:

  • Only the proof (nothing else)

On the verifier side:

  • Receives proof + public inputs
  • Verifies mathematical validity in milliseconds
  • Confirms claim is true
  • Never sees underlying health data

The verification is absolute. The privacy is absolute.

The pipeline breaks into three stages:

Stage 1 - Circuit Compilation: Health logic (risk scores, eligibility checks, pattern analysis) is expressed as arithmetic circuits using Circom 2.0. These circuits define the constraints that must be satisfied for a claim to be true.

Stage 2 - Proof Generation: The user's device runs the circuit locally with their private health data as input. This produces a cryptographic proof—a compact mathematical object that certifies the computation was performed correctly without revealing the inputs.

Stage 3 - Verification: The proof is sent to a verifier (could be an AI service, healthcare provider, clinical trial coordinator, insurance company). The verifier checks the proof's mathematical validity in milliseconds. If valid, the claim is guaranteed true. But the verifier learns nothing about the underlying data.

Proving System Selection

The choice of ZK proving system involves fundamental tradeoffs:

| System | Proof Size | Proving Time | Verification | Trust Assumption |
| --- | --- | --- | --- | --- |
| Groth16 | ~200 bytes | Slow | 3ms | Requires trusted setup per circuit |
| PLONK | ~400 bytes | Medium | 5ms | Universal trusted setup (one-time) |
| STARKs | ~50-100 KB | Fast | 10-50ms | No trusted setup required |

For health applications, I'm exploring PLONK as the primary system:

  • Universal setup eliminates the need for per-circuit trust ceremonies
  • Proof size remains practical for mobile bandwidth constraints
  • Verification is fast enough for real-time applications
  • Tooling is mature (snarkjs, Circom ecosystem)

STARKs are compelling for their transparency (no trusted setup), but the proof sizes and verification times are still prohibitive for mobile-first health applications.

Technology Stack

| Component | Choice | Rationale |
| --- | --- | --- |
| Circuit Language | Circom 2.0 | Mature tooling, active community, extensive libraries |
| Proving System | PLONK (snarkjs) | Universal setup, practical proof sizes, fast verification |
| On-chain Verification | Solidity | Enables composability with decentralized identity, data marketplaces |
| Client Runtime | Rust → WASM | Cross-platform, near-native performance, runs in browser |
| Data Schema | FHIR R4 | Healthcare interoperability standard, works with existing systems |

The full stack is designed for pragmatic deployment—not bleeding-edge research, but proven technologies assembled in a novel way.

Research Challenges and Current Limitations

Completed:

  • Proof-of-concept circuits for basic health claims (threshold comparisons, range checks, eligibility verification)
  • WASM compilation working in browser and mobile environments
  • Verification contracts deployed and tested on Ethereum testnets
  • Integration with FHIR-formatted health data exports

In Progress:

  • Circuit optimization for complex multi-variable health computations
  • Trusted setup ceremony design for universal PLONK parameters
  • Performance benchmarking across mobile device classes (flagship vs. mid-range)

Key Challenge: Proving Time

A complex health verification currently takes 15-30 seconds on a mid-range smartphone. This is fine for infrequent verifications (proving eligibility for a clinical trial), but impractical for real-time AI interactions.

Target: sub-2-second proving for interactive applications.

The path forward involves:

  1. Circuit optimization: Reducing constraint counts through algebraic simplification
  2. Recursive composition: Breaking complex proofs into smaller sub-proofs that can be verified in parallel
  3. Hardware acceleration: Leveraging GPU/TPU for finite field arithmetic (the bottleneck in proof generation)

This isn't just an engineering problem—it's a research problem. The optimal circuit architectures for health-specific computations are not known. Each domain (cardiovascular risk, metabolic health, genetic analysis) has different constraint profiles that require custom optimization.
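The hardware-acceleration point can be made concrete: proof generation bottoms out in enormous numbers of ~254-bit modular multiplications over the scalar field. A rough Python timing over the BN254 (alt_bn128) field, the default curve in snarkjs, gives a feel for the unit cost; absolute numbers depend entirely on hardware:

```python
import random
import time

# BN254 / alt_bn128 scalar field modulus (snarkjs's default curve)
R = 21888242871839275222246405745257275088548364400416034343698204186575808495617

xs = [random.randrange(R) for _ in range(100_000)]
ys = [random.randrange(R) for _ in range(100_000)]

start = time.perf_counter()
acc = 0
for a, b in zip(xs, ys):
    acc = (acc + a * b) % R                # one ~254-bit multiply-accumulate
elapsed = time.perf_counter() - start

# A circuit with millions of constraints implies orders of magnitude more of
# these operations (plus elliptic-curve scalar multiplications), which is why
# GPU/TPU finite-field kernels are the main lever on proving time.
print(f"100k field mul-adds: {elapsed * 1000:.1f} ms")
```

Native provers use optimized field representations (Montgomery form, SIMD, batched curve operations), but the arithmetic volume is the fixed cost that circuit optimization and hardware acceleration both attack.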

Why Privacy-by-Architecture Matters for Regulation

The architectural properties of ZK proofs align remarkably well with privacy regulation—not by chance, but by design:

| Regulation | Requirement | How ZK Satisfies It |
| --- | --- | --- |
| HIPAA | Minimum necessary disclosure | System can only prove specific claims, nothing more |
| GDPR Article 5 | Data minimization principle | No personal data leaves user's device |
| GDPR Article 17 | Right to erasure ("right to be forgotten") | Nothing to erase—service never receives data |
| CCPA | Consumer data control | User decides what to prove, when, to whom |
| FDA 21 CFR Part 11 | Electronic records integrity | Proofs are cryptographically tamper-proof |

Traditional health AI achieves compliance through policy: privacy policies, data processing agreements, access controls, audit logs. These are necessary but not sufficient—they protect against internal misuse, but cannot prevent external breaches.

Zero-knowledge systems achieve compliance through architecture. The system is incapable of accessing raw health data because it never receives it. Compliance becomes a mathematical property, not a promise.

The Regulatory Opportunity: What Becomes Possible

If you can prove health claims without revealing health data, entire categories of applications become feasible that are currently impossible due to privacy constraints:

Clinical Trial Eligibility Screening: Patients prove they meet inclusion criteria without exposing their complete medical history. Trials recruit faster, privacy is preserved, and patients maintain control.

Insurance Underwriting: Prove you qualify for certain rates without sharing every health detail. The insurer verifies eligibility claims without accessing the raw data that could later be used to deny coverage.

Personalized AI Without Data Upload: Health AI models run recommendations based on verified characteristics ("glucose variability is high," "HRV trend is declining") without ever seeing the raw time-series data.

Cross-Border Health Data: European patient proves health status to US provider without GDPR violations, because no personal data crosses jurisdictional boundaries—only cryptographic proofs.

Genetic Privacy: Prove you carry or don't carry specific risk alleles without sequencing companies, researchers, or services ever accessing your full genome.

These aren't hypotheticals—they're the direct implications of proof systems that separate verification from disclosure.
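One common building block behind cases like the genetic one is to commit the full record set to a single Merkle root, then prove facts about individual entries; in a ZK deployment the membership path is verified inside the circuit so even the entry stays hidden. This plain-Python sketch shows only the commitment side (the record strings are hypothetical):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Commit a list of records to a single 32-byte root."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])        # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(leaves, index):
    """Sibling hashes needed to recompute the root from leaves[index]."""
    level = [h(x) for x in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        path.append((level[index ^ 1], index % 2))  # (sibling, is_right_child)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_membership(root, leaf, path):
    node = h(leaf)
    for sibling, is_right in path:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

# Hypothetical allele records; only the root would ever be published
records = [b"rs429358:TT", b"rs7412:CC", b"rs1801133:AG", b"rs662799:AA"]
root = merkle_root(records)
assert verify_membership(root, b"rs7412:CC", merkle_path(records, 1))
```

Real circuits swap SHA-256 for circuit-friendly hashes (e.g., Poseidon) and check the path in zero knowledge, but the commitment structure is the same.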

Why I'm Building This

I started this research because I wanted to use AI on my own health data without trusting anyone with that data. The technology exists to make this possible. Zero-knowledge proofs are not theoretical—they're deployed in production systems processing billions in value.

Healthcare is the next frontier. Not because it's technically harder (it's not—the math is the same), but because the incentive structures are broken. Health data aggregators have no incentive to adopt privacy-preserving architectures. Their business model depends on data access.

The path forward requires building systems where data privacy is the default, not a policy. Where users don't have to trust the service, because the architecture makes betrayal impossible.

The 276 million breached records in 2024 aren't just a statistic. They're 276 million reasons to build systems where the data never leaves the user's device in the first place.

For someone who tracks everything but trusts no one with that data, verifiable health intelligence isn't a nice-to-have feature. It's the only architecture that makes sense.

This is early-stage research. The circuits are slow. The tooling is rough. But the direction is clear: health AI can be both powerful and private. We just have to build it that way from the beginning.

Limitations and Ethical Considerations

Current Technical Limitations

1. Proving Time: A complex health verification takes 15-30 seconds on mid-range mobile hardware. This is acceptable for infrequent verifications (clinical trial eligibility) but impractical for real-time AI interactions. The sub-2-second target requires fundamental advances in circuit optimization or hardware acceleration.

2. Circuit Expressiveness: Not all health computations can be efficiently expressed as arithmetic circuits. Machine learning models, in particular, require massive circuit sizes. A simple neural network for health risk prediction might require millions of constraints, making proof generation prohibitively slow.

3. Data Freshness: Proofs attest to data at a specific point in time. A proof that "my A1C was below 5.7% when I generated this proof" doesn't guarantee current status. For time-sensitive applications, proof validity windows must be carefully designed.

4. Device Security Assumption: The architecture assumes the user's device is secure. If malware compromises the device, it could access health data before proof generation or generate proofs from fabricated data. Zero-knowledge proofs provide network-level privacy, not endpoint security.

5. Adoption Barriers: Healthcare providers, insurers, and regulators must understand and trust this technology. Explaining "your privacy is guaranteed by mathematics" to non-technical stakeholders is a significant adoption challenge.

Ethical Considerations

The Coercion Risk: If proving health status becomes easy, it might become expected. An insurer might require ZK proofs of health metrics before offering coverage. An employer might request proof of vaccination status. The technology that empowers voluntary privacy disclosure could, paradoxically, normalize mandatory health verification.

Mitigations:

  • Design proof systems that require explicit user consent for each verification
  • Support selective disclosure—prove only what's necessary, nothing more
  • Advocate for regulatory limits on what health proofs can be required

Selective Disclosure and Discrimination: Zero-knowledge proofs allow proving specific claims. But which claims? A system that lets users prove "I don't have a pre-existing condition" enables exactly the discrimination that regulations like the ACA were designed to prevent. The technology is neutral; the applications require ethical guardrails.

The Digital Divide: Zero-knowledge health verification requires:

  • A smartphone capable of running WASM
  • Technical literacy to manage cryptographic keys
  • Access to healthcare data in digital format (FHIR exports)

These requirements exclude precisely the populations most vulnerable to health data exploitation. Privacy-preserving technology that only works for the privileged isn't a solution—it's a new form of inequity.

Accountability and Auditability: Traditional health data systems leave audit trails. ZK systems, by design, leave no trace of what was proven or when. This is a feature for privacy but creates challenges for accountability. If a clinical trial later discovers fraud, there's no data trail to investigate.

Quantum Vulnerability: Current ZK systems (Groth16, PLONK) rely on elliptic curve cryptography. A sufficiently powerful quantum computer could break these assumptions, retroactively compromising all proofs ever generated. STARKs are quantum-resistant but currently impractical for mobile deployment. The transition timeline is uncertain.

Future Directions

Next (Active Development)

  • Circuit optimization: Reducing constraint counts for common health claims (threshold checks, range proofs, aggregate statistics) through algebraic simplification and custom gadgets
  • Mobile performance profiling: Systematic benchmarking across device classes to understand the practical deployment envelope
  • FHIR integration toolkit: Libraries for converting standard FHIR resources to circuit-compatible formats
  • Developer documentation: Making ZK health verification accessible to health app developers without requiring cryptography expertise

Later (Roadmap)

  • Hardware acceleration: Leveraging mobile GPUs for finite field arithmetic to achieve sub-2-second proving times
  • Recursive proof composition: Enabling complex health verifications to be built from simpler sub-proofs, reducing per-claim proving time
  • Cross-platform SDK: Native libraries for iOS and Android with consistent APIs
  • Verifier network: Decentralized infrastructure for proof verification, eliminating single points of trust

Vision (Research Direction)

  • ZK Machine Learning: Proving that an ML model's prediction about your health data meets certain criteria without revealing the data or the model's internal weights. This enables privacy-preserving AI recommendations.
  • Federated Health Intelligence: Multiple users contributing ZK proofs to aggregate health statistics. Researchers learn population-level insights without any individual's data leaving their device.
  • Post-Quantum Migration: Transitioning to STARK-based systems as hardware improves, ensuring long-term privacy guarantees against quantum attacks.
  • Regulatory Sandboxes: Partnering with health regulators to establish frameworks for ZK-verified health claims, enabling practical deployment in clinical and insurance contexts.

Conclusion

The architecture of health AI determines its privacy guarantees. Systems that aggregate data centrally will be breached—not because engineers are careless, but because centralized data stores are inherently vulnerable. The 276 million breached healthcare records in 2024 are evidence of a structural problem, not an implementation failure.

Zero-knowledge proofs offer a fundamentally different architecture: prove health claims without revealing health data. The mathematics is sound. The technology is deployed in production systems processing billions in value. The application to healthcare is the next frontier.

The challenges are real. Proving times are too slow for interactive applications. Circuit design requires specialized expertise. Adoption requires educating stakeholders who've never encountered cryptographic proofs. But these are engineering and education challenges, not fundamental barriers.

For the 81% of Americans whose healthcare data was exposed in 2024—and for everyone who, like me, tracks their health obsessively but won't trust anyone with that data—verifiable health intelligence represents the only architecture that makes sense.

Health AI can be both powerful and private. The technology exists. We just have to build it that way from the beginning.

References

  1. Groth, J. (2016). On the Size of Pairing-Based Non-Interactive Arguments. Advances in Cryptology – EUROCRYPT 2016. https://eprint.iacr.org/2016/260.pdf

  2. Ben-Sasson, E., Bentov, I., Horesh, Y., & Riabzev, M. (2018). Scalable, Transparent, and Post-Quantum Secure Computational Integrity. IACR Cryptology ePrint Archive. https://eprint.iacr.org/2018/046.pdf

  3. Gabizon, A., Williamson, Z.J., & Ciobotaru, O. (2019). PLONK: Permutations over Lagrange-bases for Oecumenical Noninteractive arguments of Knowledge. IACR Cryptology ePrint Archive. https://eprint.iacr.org/2019/953

  4. iden3. (2024). snarkjs: zkSNARK Implementation in JavaScript & WASM. GitHub. https://github.com/iden3/snarkjs

  5. iden3. (2024). Circom 2.0 Documentation. https://docs.circom.io/

  6. HIPAA Journal. (2025). 2024 Healthcare Data Breach Report. https://www.hipaajournal.com/2024-healthcare-data-breach-report/

  7. HIPAA Journal. (2025). The Biggest Healthcare Data Breaches of 2024. https://www.hipaajournal.com/biggest-healthcare-data-breaches-2024/

  8. Rocher, L., Hendrickx, J.M., & de Montjoye, Y.A. (2019). Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10, 3069. DOI: 10.1038/s41467-019-10933-3

  9. El Emam, K. et al. (2011). A Systematic Review of Re-Identification Attacks on Health Data. PLOS ONE, 6(12), e28071. DOI: 10.1371/journal.pone.0028071

  10. HL7 International. (2019). HL7 FHIR R4 Specification. https://www.hl7.org/fhir/R4/

  11. U.S. Department of Health and Human Services. (2024). HIPAA Privacy Rule. 45 CFR Parts 160 and 164.

  12. European Parliament. (2016). General Data Protection Regulation (GDPR). Regulation (EU) 2016/679.

  13. Mssassi, S. & El Kalam, A.A. (2024). Blockchain Based Zero Knowledge Proof Protocol For Privacy Preserving Healthcare Data Sharing. Journal of Technology Informatics and Engineering.

  14. Chen, Y. et al. (2022). Health-zkIDM: A Healthcare Identity System Based on Fabric Blockchain and Zero-Knowledge Proof. Sensors, 22(20), 7716. DOI: 10.3390/s22207716

  15. Zerka, F. et al. (2024). A zero-knowledge proof federated learning on DLT for healthcare data. Journal of Parallel and Distributed Computing, 193, 104957. DOI: 10.1016/j.jpdc.2024.104957

  16. Sun, X. et al. (2024). Integrating blockchain and ZK-ROLLUP for efficient healthcare data privacy protection system via IPFS. Scientific Reports, 14, 11638. DOI: 10.1038/s41598-024-62374-4

  17. Goldwasser, S., Micali, S., & Rackoff, C. (1989). The Knowledge Complexity of Interactive Proof Systems. SIAM Journal on Computing, 18(1), 186-208.

  18. Bowe, S., Gabizon, A., & Miers, I. (2017). Scalable Multi-party Computation for zk-SNARK Parameters in the Random Beacon Model. IACR Cryptology ePrint Archive. https://eprint.iacr.org/2017/1050

  19. StarkWare Industries. (2024). STARK Technology Overview. https://starkware.co/stark/

  20. Ethereum Foundation. (2024). ZK-Rollups. https://ethereum.org/en/developers/docs/scaling/zk-rollups/

  21. Sasson, E.B. et al. (2014). Zerocash: Decentralized Anonymous Payments from Bitcoin. IEEE Symposium on Security and Privacy.

  22. FDA. (2024). 21 CFR Part 11: Electronic Records; Electronic Signatures. U.S. Food and Drug Administration.

  23. Narayanan, A. & Shmatikov, V. (2008). Robust De-anonymization of Large Sparse Datasets. IEEE Symposium on Security and Privacy, 111-125.

  24. RISC Zero. (2024). About STARKs. RISC Zero Developer Documentation. https://dev.risczero.com/reference-docs/about-starks