athletic data protocol
you save workouts from ten different creators. none of them talk to each other. a unified format that makes any workout comparable and trackable.
the origin
it started with chris heria's weighted muscle-up video.
i've been training seriously for over a decade. not casually—seriously. the kind where you track progressive overload in spreadsheets, obsess over periodization, and know the difference between a romanian deadlift and a stiff-leg deadlift at 5:30am when your brain hasn't fully woken up.
tuesday night, 11pm. bookmarking heria's new weighted calisthenics video. perfect for tomorrow.
next morning, i opened my training app to plan the week.
and i hit the wall i've hit a hundred times before.
the heria bookmark lived on youtube. my training log lived in a spreadsheet. my mobility routine was saved on instagram—somewhere in a folder i'd named "flexibility stuff" six months ago. the stretching sequence from yoga body was buried in a different app. and that HIIT finisher i'd saved on tiktok? gone. lost in an endless scroll i'd never reconstruct.
four apps. three browser tabs. one spreadsheet. all containing workouts i genuinely wanted to do. none of them talking to each other.
but here's what really broke me: even if i did manage to piece together a plan, i still wouldn't know where i actually stand.
heria's video showed weighted muscle-ups with a 20kg vest. i can't do a single muscle-up—weighted or not. so what's my path from here to there?
- which progressions have i already tried?
- where did i plateau?
- what worked before?
that data exists somewhere across three different apps. good luck finding it when you need it.
the irony: i've accumulated hundreds of bookmarked workouts across a dozen platforms—freeletics for bodyweight, b42 for football conditioning, youtube for strength programs, instagram for mobility flows. thousands of hours of fitness content, saved with the best intentions.
and i can't build a single coherent training plan from any of it.
the bigger picture
that frustration sent me down a rabbit hole. and what i found was worse than i expected.
the numbers:
- 87.6% of people watch health-related content on youtube
- billions of workout videos across youtube, tiktok, instagram
- 11,000+ exercises cataloged in exercisedb alone
- zero standardized way to connect any of it
every workout exists in isolation. every training app exists in isolation. the knowledge is there. it's just trapped.
| platform | data format | interoperability |
|---|---|---|
| youtube | video (unstructured) | none |
| instagram / tiktok | video/image (unstructured) | none |
| freeletics | proprietary | none |
| b42 | proprietary | none |
| apple health | healthkit (siloed) | iOS only |
| google fit | google fit API (siloed) | android only |
google fit and apple health were supposed to solve this. they didn't. they're not interoperable with each other, let alone with the billions of workout videos that exist outside their ecosystems.
research published in the journal of medical internet research confirms what every serious athlete already knows: "lack of interoperability and the presence of data silos prevent users and health professionals from getting an integrated view of health and fitness data."
the infrastructure doesn't exist.
the insight
traditional approach to standardization: ask every platform and content creator to adopt common formats.
this coordination problem has proven intractable. it's why we still don't have fitness data interoperability despite decades of trying.
athletic data protocol inverts the model.
instead of expecting the world to structure its data, i'm building AI systems capable of extracting structure from any existing content format. video, image, text, audio—regardless of source, the output maps to a unified schema.
not waiting for the world to change. building intelligence that adapts.
the science
before building, i needed to understand how exercise knowledge is formally structured. and why existing approaches fail.
exercise as structured information
an exercise isn't just a name. it's a complex information structure:
| dimension | description | example values |
|---|---|---|
| movement pattern | fundamental biomechanical category | push, pull, hinge, squat, lunge, carry, rotation |
| primary muscles | target muscle groups | quadriceps, hamstrings, pectoralis major |
| equipment | required apparatus | barbell, dumbbell, bodyweight, cable, machine |
| plane of motion | anatomical movement plane | sagittal, frontal, transverse |
| loading parameters | intensity and volume | weight, sets, reps, time under tension |
| tempo | eccentric/concentric timing | 3-1-2-0 (3s down, 1s pause, 2s up, 0s top) |
traditional fitness content captures only a fraction of this explicitly. a youtube video title might say "chest day workout" while the actual content demonstrates specific exercises with particular form cues and rep schemes.
the system must infer the full semantic structure from partial, implicit, and multimodal signals.
existing ontologies
researchers have tried to formalize exercise knowledge:
physical activity ontology (PACO) — 268 concepts organized into daily living activity and exercise/leisure activity hierarchies.
exercise medicine ontology (EXMO) — published december 2024. 434 classes and 9,732 axioms. first core reference ontology specifically for exercise prescription.
exercisedb — 11,000+ exercises with structured metadata. most extensive practical catalog but lacks formal ontological structure.
a systematic review of physical activity ontologies evaluated 28 ontologies against 12 quality criteria. average score: 4.23 out of 12. no ontology met all criteria.
the gap between ontological theory and practical completeness defines the challenge: build on existing frameworks while extending them to cover real-world fitness content diversity.
the multimodal challenge
exercise understanding from video requires fusing multiple information streams:
visual stream — pose estimation extracts body landmark positions over time. mediapipe provides 33 pose landmarks at real-time speeds. joint angle calculation. movement trajectory analysis.
audio stream — verbal cues ("squeeze at the top," "three more reps"), counting, music tempo, breathing patterns.
textual stream — titles, descriptions, captions, on-screen text.
the research challenge: fusion. how to weight and combine these streams when they provide complementary, redundant, or contradictory information.
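to make the fusion problem concrete: a toy sketch of confidence-weighted label resolution. the stream weights, function shape, and example inputs are my own illustration, not the production logic.

```python
# toy sketch: confidence-weighted fusion of per-stream exercise labels.
# stream weights and inputs are illustrative assumptions.
from collections import defaultdict

# per-stream priors: how much each modality is trusted for exercise identity.
STREAM_WEIGHTS = {"visual": 0.6, "audio": 0.25, "text": 0.15}

def fuse_labels(stream_predictions: dict[str, list[tuple[str, float]]]) -> tuple[str, float]:
    """combine (label, confidence) candidates from each stream into one label.

    visual evidence dominates by weight, so a mislabeled title
    ("best bicep exercises" over tricep footage) gets overridden
    when pose-based classification disagrees with high confidence.
    """
    scores: dict[str, float] = defaultdict(float)
    for stream, candidates in stream_predictions.items():
        weight = STREAM_WEIGHTS.get(stream, 0.0)
        for label, confidence in candidates:
            scores[label] += weight * confidence
    best = max(scores, key=scores.get)
    total = sum(scores.values()) or 1.0
    return best, scores[best] / total  # normalized fused confidence

# example: title says biceps, pose analysis says triceps
fused = fuse_labels({
    "visual": [("tricep extension", 0.92)],
    "audio":  [("tricep extension", 0.40)],
    "text":   [("bicep curl", 0.85)],
})
print(fused)  # ('tricep extension', ~0.84)
```

static weights are the simplest possible policy; learning when to override one stream with another is exactly the open question below.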
the hard problems
building a system that extracts structured workout data from any format. these are the research questions:
1. multimodal fusion — how do you optimally combine visual pose data, audio transcription, and textual metadata? a video titled "best bicep exercises" might show tricep exercises due to creator error. the visual evidence should override the textual label. but the system must learn when and how to make such judgments.
2. exercise disambiguation — romanian deadlift vs. stiff-leg deadlift. bent-over row vs. pendlay row. high bar vs. low bar squat. similar movement patterns, subtle differences. joint angle thresholds may not generalize across body types, camera angles, video quality.
3. structured extraction reliability — LLMs can produce structured JSON output, but reliability varies. even 1% error rates compound into serious data quality issues at scale.
4. exercise ontology — the fitness domain lacks a universally adopted ontology. PACO contains 268 concepts. EXMO comprises 434 classes. exercisedb catalogs 11,000+ exercises. these resources overlap incompletely and use different classification principles.
5. terminology chaos — the same exercise may be called "romanian deadlift," "RDL," "stiff-leg deadlift," or "straight-leg deadlift" across different sources. some distinctions reflect genuine biomechanical differences. others are synonyms.
these are open questions. the optimal solutions aren't known. that's what makes this research.
the architecture
five-stage pipeline. arbitrary fitness content → structured JSON.
- ingestion — platform integration, frame extraction, OCR
- multimodal analysis — pose estimation, action recognition, rep counting
- temporal segmentation — exercise boundaries, set identification, workout phases
- LLM extraction — structured output with schema enforcement and confidence scoring
- validation & enrichment — database cross-referencing, consistency checks
input: video URLs, screenshots, text descriptions, audio files. output: standardized ADP JSON schema accessible via REST API.
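as a shape, the pipeline reads like a five-function chain. everything below is a hypothetical skeleton with stub stages, sketched to show the data flow rather than the real components.

```python
# illustrative skeleton of the five-stage pipeline.
# every stage function here is a hypothetical stub, not the real component.
from dataclasses import dataclass, field

@dataclass
class ExtractionResult:
    workout: dict = field(default_factory=dict)     # ADP JSON payload
    confidence: float = 0.0                         # overall extraction confidence
    review_flags: list = field(default_factory=list)

def ingest(url):            # 1. platform download, frame extraction, OCR
    return {"frames": [], "audio": None, "text": ""}

def analyze(media):         # 2. pose estimation, action recognition, rep counting
    return {"poses": [], "actions": [], "reps": []}

def segment(signals):       # 3. exercise boundaries, set identification, phases
    return [signals]

def extract_structured(segments):   # 4. LLM extraction with schema enforcement
    return ExtractionResult(workout={"exercises": []}, confidence=0.5)

def validate_and_enrich(draft):     # 5. database cross-reference, consistency checks
    if draft.confidence < 0.7:
        draft.review_flags.append("low_confidence")
    return draft

def run_pipeline(source_url: str) -> ExtractionResult:
    return validate_and_enrich(
        extract_structured(segment(analyze(ingest(source_url)))))
```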
the schema
the core output captures complete semantics of athletic training: session metadata (source platform, duration, phases), exercise properties (movement patterns, target muscles, equipment), performance parameters (sets, reps, tempo, rest intervals).
each extraction includes confidence scores. the system is honest about uncertainty.
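the full schema isn't reproduced here, but a trimmed pydantic sketch shows the idea. field names and enums are illustrative assumptions, not the published ADP schema.

```python
# trimmed sketch of an ADP-style schema in pydantic.
# field names and enums are illustrative, not the published schema.
from enum import Enum
from pydantic import BaseModel, Field

class MovementPattern(str, Enum):
    push = "push"
    pull = "pull"
    hinge = "hinge"
    squat = "squat"
    lunge = "lunge"
    carry = "carry"
    rotation = "rotation"

class ExerciseBlock(BaseModel):
    name: str                                   # canonical exercise name
    movement_pattern: MovementPattern
    primary_muscles: list[str]
    equipment: list[str] = []
    sets: int | None = None
    reps: int | None = None
    tempo: str | None = None                    # e.g. "3-1-2-0"
    rest_seconds: int | None = None
    confidence: float = Field(ge=0.0, le=1.0)   # honest about uncertainty

class WorkoutSession(BaseModel):
    source_platform: str                        # youtube, instagram, tiktok, ...
    source_url: str
    duration_seconds: int | None = None
    phases: list[str] = []                      # warm-up, main, finisher, ...
    exercises: list[ExerciseBlock]
    extraction_confidence: float = Field(ge=0.0, le=1.0)
```

runtime validation then comes for free at the API boundary, which is what the stack table below pairs pydantic with.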
computer vision pipeline
mediapipe pose estimation — real-time, 33 body landmarks from video frames. joint angles, movement trajectories.
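a minimal sketch of that step: landmarks in, joint angles out. the angle helper, the knee-landmark choice, and the local file name are my own illustration.

```python
# minimal sketch: mediapipe landmarks → knee joint angle per frame.
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose

def joint_angle(a, b, c) -> float:
    """angle at b (degrees) formed by points a-b-c, each with .x/.y attrs."""
    ba = np.array([a.x - b.x, a.y - b.y])
    bc = np.array([c.x - b.x, c.y - b.y])
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

cap = cv2.VideoCapture("squat_clip.mp4")  # hypothetical local file
angles = []
with mp_pose.Pose(static_image_mode=False) as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks:
            lm = result.pose_landmarks.landmark
            angles.append(joint_angle(
                lm[mp_pose.PoseLandmark.LEFT_HIP],
                lm[mp_pose.PoseLandmark.LEFT_KNEE],
                lm[mp_pose.PoseLandmark.LEFT_ANKLE]))
cap.release()
```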
exercise classification via CNN — a convolutional neural network classifies exercises from joint coordinates and angles. ensemble learning combines predictions from multiple frames.
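the CNN itself is out of scope here, but the ensemble step can be sketched. soft voting (averaging per-frame probabilities) is one simple way to combine frame predictions; whether the real system votes, averages, or weights by confidence isn't specified, so treat this as an assumption.

```python
# toy sketch: combining per-frame classifier outputs into one clip label.
# soft voting (probability averaging) is assumed here for illustration.
import numpy as np

CLASSES = ["romanian deadlift", "stiff-leg deadlift", "bent-over row"]

def classify_clip(frame_probs: np.ndarray) -> tuple[str, float]:
    """frame_probs: (n_frames, n_classes) per-frame softmax outputs."""
    mean_probs = frame_probs.mean(axis=0)   # average over frames
    idx = int(mean_probs.argmax())
    return CLASSES[idx], float(mean_probs[idx])

# example: 4 noisy single-frame predictions, one stable clip-level call
probs = np.array([
    [0.55, 0.35, 0.10],
    [0.48, 0.42, 0.10],
    [0.60, 0.30, 0.10],
    [0.40, 0.45, 0.15],
])
print(classify_clip(probs))  # ('romanian deadlift', ~0.51)
```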
action recognition benchmarks:
- WAVd (workout action video dataset): 95.81% accuracy
- UCF101: 93.2% accuracy
- youtube actions: 97.2% accuracy
but these benchmarks test general action recognition, not fine-grained fitness distinctions (romanian deadlift vs. stiff-leg deadlift). developing fitness-specific evaluation benchmarks is part of the research agenda.
rep counting and set detection — temporal analysis of joint angle trajectories. research demonstrates >90% accuracy using mediapipe for landmark detection with custom repetition logic.
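rep counting then reduces to peak detection on a joint-angle time series. the smoothing window and thresholds below are illustrative, not the system's tuned values.

```python
# sketch: count reps as peaks in a smoothed joint-angle time series.
# thresholds are illustrative, not tuned values.
import numpy as np
from scipy.signal import find_peaks

def count_reps(knee_angles: list[float], fps: float = 30.0) -> int:
    """one rep ≈ one prominent peak in knee extension over time."""
    signal = np.asarray(knee_angles, dtype=float)
    kernel = np.ones(5) / 5.0                       # light smoothing
    smooth = np.convolve(signal, kernel, mode="same")  # suppress landmark jitter
    peaks, _ = find_peaks(
        smooth,
        prominence=20.0,            # ignore small wobbles (degrees)
        distance=int(fps * 1.0),    # assume reps are ≥ 1 second apart
    )
    return len(peaks)
```

fed the `angles` list from the mediapipe sketch above, this closes the loop from raw frames to a rep count.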
why claude
LLM selection for structured extraction significantly impacts reliability.
1. structured output reliability — claude's tool use functionality enables schema-constrained output generation. constrained decoding restricts token generation to valid JSON matching the schema. (a sketch follows at the end of this section.)
2. constitutional AI — for a system processing user-generated content at scale, safety matters. claude's training embeds safety constraints at the model level.
3. multi-model orchestration:
| task | model | why |
|---|---|---|
| primary content analysis | sonnet 4.5 | best balance of capability and cost |
| high-volume metadata extraction | haiku 4.5 | 4-5x faster, 1/3 cost |
| ambiguous cases / quality review | opus 4.5 | maximum capability for edge cases |
| real-time API responses | haiku 4.5 | low latency (<500ms) |
anthropic's insight: "sonnet 4.5 can break down a complex problem into multi-step plans, then orchestrate a team of multiple haiku 4.5s to complete subtasks in parallel."
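the tool-use pattern from point 1, sketched with the anthropic python SDK. the tool's input_schema is a stripped-down stand-in for the full ADP schema, the prompt is a placeholder, and the model id is assumed.

```python
# sketch: schema-constrained extraction via claude tool use.
# the input_schema is a stripped-down stand-in for the full ADP schema.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

workout_tool = {
    "name": "record_workout",
    "description": "record the structured workout extracted from the content",
    "input_schema": {
        "type": "object",
        "properties": {
            "exercises": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "sets": {"type": "integer"},
                        "reps": {"type": "integer"},
                        "equipment": {"type": "string"},
                    },
                    "required": ["name"],
                },
            },
            "confidence": {"type": "number"},
        },
        "required": ["exercises", "confidence"],
    },
}

message = client.messages.create(
    model="claude-sonnet-4-5",   # model id assumed for illustration
    max_tokens=1024,
    tools=[workout_tool],
    tool_choice={"type": "tool", "name": "record_workout"},
    messages=[{"role": "user", "content": "extract the workout: 3x8 weighted pull-ups ..."}],
)
structured = next(b.input for b in message.content if b.type == "tool_use")
```

forcing tool_choice makes the model respond with a schema-conformant tool call instead of free text.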
the stack
| component | technology | why |
|---|---|---|
| orchestration | LangChain | modular pipeline, native claude integration |
| pose estimation | mediapipe | real-time, 33 landmarks, cross-platform |
| action classification | custom CNN + ensemble | fine-tuned for fitness domain |
| video processing | FFmpeg + OpenCV | frame extraction, preprocessing |
| audio transcription | whisper | state-of-the-art accuracy |
| vector database | pinecone | exercise embedding similarity search |
| schema validation | JSON schema + pydantic | runtime type checking |
| API layer | fastAPI + openAPI | standards-compliant REST interface |
exercise ontology mapping
multi-stage resolution for mapping extracted names to canonical identifiers:
| stage | process | fallback |
|---|---|---|
| 1. exact match | normalized dictionary of 11,000+ exercises | → stage 2 |
| 2. synonym resolution | learned mappings ("RDL" → "romanian deadlift") | → stage 3 |
| 3. semantic similarity | embedding similarity against exercise database | → stage 4 |
| 4. LLM classification | claude with exercise description and visual features | → stage 5 |
| 5. human review | confidence < 0.7 flagged for manual review | feedback loop |
this feedback loop continuously improves the synonym dictionary and embedding model.
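the table's five stages, sketched as a cascade. the dictionaries, the embedding lookup, and the LLM call are simplified stubs; only the control flow mirrors the design above.

```python
# sketch of the five-stage resolution cascade; lookups and backends are
# simplified stand-ins for the real dictionary, embedding index, and LLM call.
CANONICAL = {"romanian deadlift", "stiff-leg deadlift", "bent-over row"}  # 11,000+ in practice
SYNONYMS = {"rdl": "romanian deadlift", "sldl": "stiff-leg deadlift"}
REVIEW_THRESHOLD = 0.7

def nearest_by_embedding(key):   # stub standing in for a pinecone similarity query
    return "romanian deadlift", 0.65

def llm_classify(key):           # stub standing in for a claude classification call
    return "romanian deadlift", 0.82

def resolve(name: str) -> tuple[str | None, float, str]:
    """map a raw exercise name to (canonical_id, confidence, stage)."""
    key = name.strip().lower()
    if key in CANONICAL:                        # stage 1: exact match
        return key, 1.0, "exact"
    if key in SYNONYMS:                         # stage 2: learned synonym
        return SYNONYMS[key], 0.95, "synonym"
    match, score = nearest_by_embedding(key)    # stage 3: semantic similarity
    if score >= REVIEW_THRESHOLD:
        return match, score, "embedding"
    match, score = llm_classify(key)            # stage 4: LLM classification
    if score >= REVIEW_THRESHOLD:
        return match, score, "llm"
    return None, score, "human_review"          # stage 5: flagged for manual review

print(resolve("RDL"))   # ('romanian deadlift', 0.95, 'synonym')
```

resolutions confirmed in human review feed back into SYNONYMS, which is how the dictionary grows over time.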
where it stands
iterating in public. the system is partially built. the challenges below define the research agenda.
what's working
- multimodal extraction pipeline: video → pose → classification → structured output
- exercise ontology mapping: five-stage resolution with fallbacks
- preliminary test set: 87% extraction accuracy on 500-video evaluation
- API design: REST interface with confidence scores
what's hard
1. extraction accuracy — 87% means 13% of extracted data contains errors. error distribution:
- novel exercises not in training data: 15%
- ambiguous visual quality: 25%
- conflicting modality signals: 20%
- schema edge cases: 40%
2. real-time latency — target: <5 seconds for 60-second video. current:
- video download and preprocessing: ~2s
- pose estimation (all frames): ~3s
- LLM extraction: ~2s
- total: ~7s (exceeds target)
investigating: selective frame sampling, streaming pose estimation, parallel processing, model distillation.
3. exercise disambiguation — joint angle thresholds derived from biomechanical literature may not generalize across body types, camera angles, video quality.
4. single-person assumption — current pose estimation assumes one person per video. group fitness classes, partner exercises, crowded gym footage: unsolved.
5. language — english only. german and spanish have rich fitness traditions with terminology that doesn't map directly to english.
the numbers
current accuracy: 87% field-level on core fields (exercise name, sets, reps, equipment).
target: 95%+. closing the gap requires better context models and more training data.
introducing BLOCK
athletic data protocol isn't just a research project. it's the foundation for BLOCK—a consumer app that unifies the fragmented fitness content landscape.
BLOCK transforms any workout content into your personal training library:
- import from anywhere — youtube videos, instagram reels, tiktok workouts, screenshots, text descriptions
- universal workout library — every exercise you've ever done or want to do, structured, searchable, organized
- cross-platform sync — connect your freeletics history with gym sessions, your b42 football training with youtube follow-alongs
- smart recommendations — discover workouts that match your goals, equipment, training history
BLOCK is the consumer-facing implementation of athletic data protocol. bringing interoperability to everyday athletes.
stay tuned for early access.
limitations
technical
1. accuracy ceiling — 87% means 13% of extractions contain errors. for casual tracking, probably fine. for professional athletes or injury rehab—potentially problematic.
2. language scope — english only. german, spanish, and other languages need culturally specific training data.
3. single-person limitation — group fitness, partner exercises, crowded gyms remain unsolved.
4. video quality dependency — performance degrades with poor lighting, unusual camera angles, low resolution.
5. no clinical validation — not validated for medical or rehabilitation use. don't use for injury recovery without professional oversight.
ethical
platform terms — extracting data from social media raises legal questions. the system uses official APIs where available, respects rate limits, processes only publicly available content, provides opt-out for creators.
intellectual property — workout content may be protected by copyright. the system extracts factual information (exercises, sets, reps) without reproducing creative expression.
data privacy — workout history reveals sensitive info. training patterns → health conditions. exercise timing → schedules. this data needs protection.
what's next
now
- latency optimization — selective frame sampling, parallel processing, model distillation
- accuracy improvement — active learning pipeline for edge cases
- multi-language — german and spanish first
later
- public API — tiered pricing for fitness apps, training platforms
- platform integrations — youtube, instagram, tiktok for seamless import
- mobile SDK — on-device extraction for privacy-sensitive use cases
vision
- real-time video analysis — extract exercises as videos play
- wearable integration — connect extracted workouts to heart rate, calories, recovery metrics
- progression tracking — automatic detection of strength gains, plateau patterns
- AI coaching — personalized recommendations based on training history and goals
the point
fitness data fragmentation isn't a technical limitation. it's an architecture failure.
we have:
- billions of workout videos
- comprehensive exercise databases
- robust pose estimation technology
they just don't talk to each other.
athletic data protocol builds intelligence that adapts to content as it exists. not waiting for the world to standardize. a system that understands workouts through multimodal analysis—video → pose → classification → structured data.
current state: 87% accuracy. clear path to 95%+.
research ahead: multimodal fusion. exercise disambiguation. real-time processing. multi-language support.
the impact goes beyond personal convenience. a universal layer connecting workout content to structured data enables:
- AI coaches that learn from any content source
- training logs that seamlessly aggregate across platforms
- analytics that span complete fitness journeys
for anyone who's ever tried to build a coherent training plan from bookmarked videos across five different apps—this should have existed ten years ago.
i'm building it now.
references
- Tang, Y. et al. (2025). Video Understanding with Large Language Models: A Survey. IEEE Transactions on Circuits and Systems for Video Technology.
- Workout Classification Using a Convolutional Neural Network in Ensemble Learning. (2024). Applied Sciences. PMC11124794.
- Jin, Q. et al. (2019). Developing a Physical Activity Ontology to Support the Interoperability of Physical Activity Data. JMIR.
- Tian, J. et al. (2024). Core reference ontology for individualized exercise prescription. Scientific Data.
- Content and quality of physical activity ontologies: a systematic review. (2023). International Journal of Behavioral Nutrition and Physical Activity.
- ExerciseDB. (2025). ExerciseDB API Documentation. https://github.com/ExerciseDB/exercisedb-api
- Google. (2024). MediaPipe Pose Estimation. https://developers.google.com/mediapipe
- Anthropic. (2025). Claude Sonnet 4.5 Model Card. San Francisco: Anthropic.
- Anthropic. (2025). Introducing Claude Haiku 4.5. https://www.anthropic.com/news/claude-haiku-4-5
- Bai, Y. et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.
last updated: Dec 2025