dental data, readable by the machine
the age of ai is an age of apis. an intelligence is only as smart as the data it can read, and dentistry runs on software from a time that never asked the question. this is about how to make that data readable without ever exposing a patient's name.
the api age
every era has a language it thinks in. the industrial age thought in machines and labor hours, the early internet in pages and clicks. the era we live in now thinks in data, and it demands that the data be machine-readable. an artificial intelligence is never smarter than what it finds at a structured interface. give it clean, normalized data through an api and it becomes a multiplier for almost everything a business does; turn it loose on a sprawling, sealed-off system and it stays an expensive toy. the api is not a technical footnote to this era. it is the precondition for it.
and that is exactly where the tension sits. we already live in a time that takes data mining, process optimization and automation for granted, where everything is meant to move fast and clean, ideally without a human typing anything at all. dentistry, of all industries, the one that could use these tools most, runs on systems built for the opposite: for a person at a keyboard, reading a number off one screen and entering it into the next.
software from another time
that the systems are this way is not a failing, it is history. their software comes from a world that never needed anything else. when the big lab and practice programs were written, digitization meant a record card moving into a database and an invoice coming out of a dot-matrix printer in the evening. that is what they were built for, and they still do it reliably today. the idea that one day a machine would want to read that same data in real time, compare it against a thousand others and optimize entire processes from it sat so far outside the imaginable that it never entered a single architectural decision.
a handful of programs still own the market, maybe five on the lab side and another five in the practices, and each is its own world with its own database and its own logic. one of the most common lab systems keeps its data in over three hundred tables with no fixed links between them and stores a date as a ten-character string. that was perfectly fine twenty years ago, because a person sat in front of it who knew what was meant. to a machine it says nothing.
how far out of step this is becomes clear from the people who work with it every day. two managing directors of large dental labs told me the same story, independently of each other: their staff types every order twice. the practice calls or sends a fax, someone reads the order off the page and enters it into the lab system, then turns to the next screen and types the same thing a second time into our software. every day, by hand, into two databases that sit a few meters apart and still know nothing about each other. when two people running different businesses bring you almost the same complaint word for word, it stops being an anecdote and becomes a pattern. there is no pipe between the systems, so a human becomes the pipe.
what the machine needs
before a machine can do anything with this data, it has to speak a common language first. that is the real, unglamorous work, and it has a name: normalization. each of the old systems gets a translator that turns its quirks into a single, stable model. an order then means the same thing everywhere, no matter which program it originally came from, and anyone who wants to read the data sees only that one clean shape instead of the nested tables behind it. this is the layer i am building; it is called denta.
most of that work sounds like trivia and matters anyway. dates have to be unified, amounts move from decimals into whole cents so nothing gets rounded in the math, and every identifier gets a unique, collision-free form. there are industry exchange formats, such as eLABZ for the order between practice and lab, but they mostly stop at order intake and say nothing about what happens afterward inside the lab. the real payoff lies past all of it anyway: only once "Müller", "MÜLLER" and "müller" reliably collapse to the same thing can you link it, analyze it, or hand it to an ai.
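a minimal sketch of those three small steps, under stated assumptions: the legacy date is taken to be a ten-character string in the form dd.mm.yyyy, amounts arrive as decimal text with a comma or a dot and no thousands separator, and the function names are invented for illustration. none of this is denta's actual code.

```python
import unicodedata
from datetime import date, datetime
from decimal import Decimal

# assumption: the legacy system writes dates like "14.03.2024"
LEGACY_DATE_FORMAT = "%d.%m.%Y"


def normalize_date(raw: str) -> date:
    """Parse a ten-character legacy date string into a real date."""
    return datetime.strptime(raw.strip(), LEGACY_DATE_FORMAT).date()


def to_cents(raw: str) -> int:
    """Convert a decimal amount such as "129,90" into whole cents."""
    # assumption: no thousands separators in the input
    cents = Decimal(raw.strip().replace(",", ".")) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"amount has sub-cent precision: {raw!r}")
    return int(cents)


def normalize_name(raw: str) -> str:
    """Collapse spelling variants of a name to one comparable form."""
    composed = unicodedata.normalize("NFC", raw)
    # collapse runs of whitespace, then fold case ("MÜLLER" and "müller" meet here)
    return " ".join(composed.split()).casefold()


variants = {normalize_name(n) for n in ("Müller", "MÜLLER", " müller ")}
print(normalize_date("14.03.2024"), to_cents("129,90"), variants)
```

using Decimal rather than float for the conversion is the point of the whole-cents rule: the amount is never represented as a binary fraction on its way to an integer.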
the patient the machine never sees
here a problem enters that weighs heavier in medicine than almost anywhere else. a dental lab's data hangs on patients, and patient data is specially protected. breaching medical confidentiality is a criminal offense in germany, and the gdpr treats health data as its own, especially sensitive category. an artificial intelligence meant to analyze that data must never get to see a patient's actual name in the first place.
the comfortable answer, that one will simply be careful, is no answer at all. a system that handles names can also lose them, and a promise has never protected anyone. the only dependable solution is for the layer not to know the names to begin with. that is what pseudonymization does when you take it seriously, and it should work in several stages.
the first stage is refusal. the translator that taps an old system is given a fixed list of forbidden fields, among them name, address, date of birth, insurance number and bank details, and every query is checked against that list before it ever reaches the database. what is on that list is never read. you cannot leak something you never looked at.
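the refusal stage can be sketched roughly like this. the assumption here is that the translator describes each read as a table plus an explicit column list instead of free-form sql, which is what makes the check reliable; the column names, the Query type and check_query are hypothetical.

```python
from dataclasses import dataclass

# assumption: illustrative column names; a real legacy schema will name them differently
FORBIDDEN_FIELDS = frozenset({
    "name", "address", "date_of_birth", "insurance_number", "bank_details",
})


class ForbiddenFieldError(Exception):
    """Raised when a query asks for a column on the forbidden list."""


@dataclass(frozen=True)
class Query:
    table: str
    columns: tuple[str, ...]


def check_query(query: Query) -> Query:
    """Reject the query before execution if it touches a forbidden column."""
    if not query.columns or "*" in query.columns:
        # a wildcard would read everything, including the forbidden columns
        raise ForbiddenFieldError("columns must be listed explicitly")
    blocked = sorted(c for c in query.columns if c.casefold() in FORBIDDEN_FIELDS)
    if blocked:
        raise ForbiddenFieldError(f"forbidden columns requested: {blocked}")
    return query


allowed = check_query(Query("orders", ("order_id", "due_date")))
```

the check runs before any database call, so a rejected query never produces a result set that could be logged or cached by accident.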
the second stage turns what you genuinely need as identity into a pseudonym. you have to be able to recognize that three orders belong to the same patient without knowing their name. so the name is turned into a stable token, always the same for the same person, with no way back.
the mechanics
technically this token is a keyed hash, not encryption, and the difference decides everything. encryption can be undone with the right key; a hash cannot. you take the normalized name, combine it with the field type, and run both through a proven standard function that is keyed with a secret value, random for each lab, and that produces a fixed-length string. "erika musterfrau" becomes something like pat_a4f9b2c1, and tomorrow and a year from now the same name produces exactly that string again.
three properties fall out almost as a side effect. because the procedure always yields the same result, you can link on the token like normal, without ever seeing a name. because the field type goes into the calculation, the same text as a name produces a different token than the same text as a city, so nothing can quietly be joined across fields. and because the secret value differs for each lab, one business's tokens are simply meaningless in another's data space, even if one of them were to leak.
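the mechanics and the three properties can be sketched as follows, assuming hmac-sha256 as the standard function (the text does not name one). the function names, the separator byte and the 16-character truncation are illustrative choices, not a description of the real implementation.

```python
import hashlib
import hmac
import secrets
import unicodedata


def new_lab_secret() -> bytes:
    """Generate a random per-lab secret; it belongs in the lab's key store, not in code."""
    return secrets.token_bytes(32)


def pseudonymize(value: str, field_type: str, lab_secret: bytes, prefix: str = "pat") -> str:
    """Derive a stable token from a value with HMAC-SHA256; there is no decrypt step."""
    normalized = " ".join(unicodedata.normalize("NFC", value).split()).casefold()
    # the field type is part of the message, so equal text in different fields diverges
    message = f"{field_type}\x00{normalized}".encode("utf-8")
    digest = hmac.new(lab_secret, message, hashlib.sha256).hexdigest()
    # assumption: truncated to 16 hex characters for readability; shorter tokens collide sooner
    return f"{prefix}_{digest[:16]}"


lab_a = new_lab_secret()
lab_b = new_lab_secret()

same_person = pseudonymize("Erika Musterfrau", "patient_name", lab_a)
same_again = pseudonymize("ERIKA  MUSTERFRAU", "patient_name", lab_a)   # equal to same_person
as_city = pseudonymize("Erika Musterfrau", "city", lab_a)               # differs: other field type
other_lab = pseudonymize("Erika Musterfrau", "patient_name", lab_b)     # differs: other secret
```

one caveat worth stating next to the code: names are guessable, so anyone holding a lab's secret can hash candidate names and compare. the protection rests on that secret staying at the lab.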
above this sit further stages, for the cases that are harder to catch. a name a technician typed into a note somewhere sits in no fixed column and needs a recognizer that finds it in free text. especially sensitive content can additionally be encrypted, with keys that stay at the lab, so the platform itself sees only unreadable noise. the theoretical maximum would be that the data never leaves the lab's own device and an ai works only with proofs about it, without ever touching it; i have written about that approach elsewhere at length. but it carries a weight of effort and friction today that a pipe handling thousands of orders a day cannot yet bear. pseudonymization is therefore not the last word, but it is the strongest protection you can get into a working lab today.
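to make the free-text problem concrete, here is a deliberately naive sketch that only catches names following a salutation. the salutation list, the pattern and redact_names are assumptions for illustration; a real recognizer would use a trained model, and even that misses cases.

```python
import re

# assumption: names in notes often follow a salutation; this list is illustrative, not complete
SALUTATION = r"(?:Frau|Herr|Hr\.|Fr\.|Dr\.|Mr\.|Mrs\.|Ms\.)"
# a capitalized word, optionally double-barrelled ("Müller-Lüdenscheidt")
NAME_WORD = r"[A-ZÄÖÜ][a-zäöüß]+(?:-[A-ZÄÖÜ][a-zäöüß]+)?"
NAME_AFTER_SALUTATION = re.compile(
    rf"\b((?:{SALUTATION}\s+)+)({NAME_WORD}(?:\s+{NAME_WORD})?)"
)


def redact_names(note: str, placeholder: str = "[NAME]") -> str:
    """Replace one or two capitalized words after a salutation with a placeholder."""
    return NAME_AFTER_SALUTATION.sub(lambda m: f"{m.group(1)}{placeholder}", note)


print(redact_names("call Frau Dr. Müller-Lüdenscheidt back about shade A2"))
print(redact_names("Musterfrau called again"))  # no salutation, so the name slips through
```

the second call is the instructive one: a bare surname passes untouched, which is exactly why free text needs more than a pattern.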
what becomes possible
once the data is readable and the patient inside it is protected, what is possible shifts. the double entry disappears, because an order is created only once and arrives on its own wherever it is needed. the practice no longer has to call to learn the status of a job; it sees in real time whether the crown is still in the furnace or already on its way out. and on the level above that, the thing this whole era is actually about finally becomes possible: a lab can measure its turnaround times and hold them against the market, can see which step orders pile up at, can analyze rework and material consumption without anyone assembling numbers by hand in the evening. the machine takes over exactly what people still spend their evenings on today.
the limits
so much for the direction; it only stays honest if you also say where it ends. a pseudonym is no magic trick that makes data anonymous. as long as the mapping exists somewhere, the data stays personal in the legal sense and re-identifiable in principle, and research shows a person can be traced back from just a handful of attributes. the token protects the name, it does not make the dataset anonymous, and confusing the two builds exactly the false security other industries have already failed on.
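a toy calculation shows why a token alone does not anonymize. the records below are made up for illustration, and the attribute names are hypothetical; the function counts how many rows become unique once a few harmless-looking attributes are combined.

```python
from collections import Counter

# made-up records for illustration only; the name is already a token, the rest is not
RECORDS = [
    {"patient": "pat_01", "practice": "P1", "order_month": "2024-03", "tooth": 14},
    {"patient": "pat_02", "practice": "P1", "order_month": "2024-03", "tooth": 36},
    {"patient": "pat_03", "practice": "P1", "order_month": "2024-04", "tooth": 14},
    {"patient": "pat_04", "practice": "P2", "order_month": "2024-03", "tooth": 14},
    {"patient": "pat_05", "practice": "P2", "order_month": "2024-03", "tooth": 14},
    {"patient": "pat_06", "practice": "P1", "order_month": "2024-03", "tooth": 14},
]


def unique_share(records: list[dict], attributes: list[str]) -> float:
    """Share of records whose combination of the given attributes occurs exactly once."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


for attrs in (["practice"], ["practice", "order_month"], ["practice", "order_month", "tooth"]):
    print(attrs, round(unique_share(RECORDS, attrs), 2))
```

in this toy set nobody is unique by practice alone, one in six is unique by practice and month, and two in six once the tooth is added. anyone who knows those facts about a person from another source can pick out their row without ever seeing a name.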
just as honest is that the old systems underneath still hold cleartext names. the layer does not read them, but it does not delete them either, and responsibility for that stays with the lab. and free text, that field people write anything at all into, remains the most vulnerable spot, because no automatic recognition is ever a hundred percent. none of this argues against the path. it only argues against making it sound prettier than it is.
the point
in the end this is not a story about cryptography, and not even one about dentistry in particular. it is the story of an industry whose tools stayed behind in another time while the world around it learned to think in real time and in machines. artificial intelligence does not wait for old software to catch up. it rewards anyone who makes their data readable and walks past everyone else.
the first step is always the same. you give the data a common language, you take the patient's name out before anything gets to see it, and you build that guarantee into the architecture instead of writing it into a privacy policy. then, and only then, can the machine read.
references
- VDDS / KZBV. (2023). eLABZ: Elektronischer Datenträgeraustausch Zahntechnik, V4.5.
- European Parliament and Council of the European Union. (2016). Regulation (EU) 2016/679, General Data Protection Regulation (GDPR), Art. 4(5) and Art. 9.
- Strafgesetzbuch (StGB), § 203: Verletzung von Privatgeheimnissen.
- Krawczyk, H., Bellare, M., Canetti, R. (1997). HMAC: Keyed-Hashing for Message Authentication. RFC 2104.
- Chen, L. (2009). Recommendation for Key Derivation Using Pseudorandom Functions. NIST SP 800-108.
- Rocher, L., Hendrickx, J.M., de Montjoye, Y.-A. (2019). Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10, 3069.
- Microsoft. (2024). Presidio: Data Protection and De-identification SDK.
last updated: Jun 2026