Jul 2024 · 14 min · Building

athletic data protocol

you save workouts from ten different creators. none of them talk to each other. a unified format that makes any workout comparable and trackable.

the origin

it started with chris heria's weighted muscle-up video.

i've been training seriously for over a decade. not casually—seriously. the kind where you track progressive overload in spreadsheets, obsess over periodization, and know the difference between a romanian deadlift and a stiff-leg deadlift at 5:30am when your brain hasn't fully woken up.

tuesday night, 11pm. bookmarking heria's new weighted calisthenics video. perfect for tomorrow.

next morning, i opened my training app to plan the week.

and i hit the wall i've hit a hundred times before.

the heria bookmark lived on youtube. my training log lived in a spreadsheet. my mobility routine was saved on instagram—somewhere in a folder i'd named "flexibility stuff" six months ago. the stretching sequence from yoga body was buried in a different app. and that HIIT finisher i'd saved on tiktok? gone. lost in an endless scroll i'd never reconstruct.

four apps. three browser tabs. one spreadsheet. all containing workouts i genuinely wanted to do. none of them talking to each other.

but here's what really broke me: even if i did manage to piece together a plan, i still wouldn't know where i actually stand.

heria's video showed weighted muscle-ups with a 20kg vest. i can't do a single muscle-up—weighted or not. so what's my path from here to there?

  • which progressions have i already tried?
  • where did i plateau?
  • what worked before?

that data exists somewhere across three different apps. good luck finding it when you need it.

the irony: i've accumulated hundreds of bookmarked workouts across a dozen platforms—freeletics for bodyweight, b42 for football conditioning, youtube for strength programs, instagram for mobility flows. thousands of hours of fitness content, saved with the best intentions.

and i can't build a single coherent training plan from any of it.

the bigger picture

that frustration sent me down a rabbit hole. and what i found was worse than i expected.

the numbers:

  • 87.6% of people watch health-related content on youtube
  • billions of workout videos across youtube, tiktok, instagram
  • 11,000+ exercises cataloged in exercisedb alone
  • zero standardized way to connect any of it

every workout exists in isolation. every training app exists in isolation. the knowledge is there. it's just trapped.

| platform | data format | interoperability |
| --- | --- | --- |
| youtube | video (unstructured) | none |
| instagram | video/image (unstructured) | none |
| freeletics | proprietary | none |
| b42 | proprietary | none |
| apple health | healthkit (siloed) | iOS only |
| google fit | google fit API (siloed) | android only |

google fit and apple health were supposed to solve this. they didn't. they're not interoperable with each other, let alone with the billions of workout videos that exist outside their ecosystems.

research published in the journal of medical internet research confirms what every serious athlete already knows: "lack of interoperability and the presence of data silos prevent users and health professionals from getting an integrated view of health and fitness data."

the infrastructure doesn't exist.

the insight

traditional approach to standardization: ask every platform and content creator to adopt common formats.

this coordination problem has proven intractable. it's why we still don't have fitness data interoperability despite decades of trying.

athletic data protocol inverts the model.

instead of expecting the world to structure its data, i'm building AI systems capable of extracting structure from any existing content format. video, image, text, audio—regardless of source, the output maps to a unified schema.

not waiting for the world to change. building intelligence that adapts.

the science

before building, i needed to understand how exercise knowledge is formally structured. and why existing approaches fail.

exercise as structured information

an exercise isn't just a name. it's a complex information structure:

| dimension | description | example values |
| --- | --- | --- |
| movement pattern | fundamental biomechanical category | push, pull, hinge, squat, lunge, carry, rotation |
| primary muscles | target muscle groups | quadriceps, hamstrings, pectoralis major |
| equipment | required apparatus | barbell, dumbbell, bodyweight, cable, machine |
| plane of motion | anatomical movement plane | sagittal, frontal, transverse |
| loading parameters | intensity and volume | weight, sets, reps, time under tension |
| tempo | eccentric/concentric timing | 3-1-2-0 (3s down, 1s pause, 2s up, 0s top) |

traditional fitness content captures only a fraction of this explicitly. a youtube video title might say "chest day workout" while the actual content demonstrates specific exercises with particular form cues and rep schemes.

the system must infer the full semantic structure from partial, implicit, and multimodal signals.

existing ontologies

researchers have tried to formalize exercise knowledge:

physical activity ontology (PACO) — 268 concepts organized into daily living activity and exercise/leisure activity hierarchies.

exercise medicine ontology (EXMO) — published december 2024. 434 classes and 9,732 axioms. first core reference ontology specifically for exercise prescription.

exercisedb — 11,000+ exercises with structured metadata. most extensive practical catalog but lacks formal ontological structure.

a systematic review of physical activity ontologies evaluated 28 ontologies against 12 quality criteria. average score: 4.23 out of 12. no ontology met all criteria.

the gap between ontological theory and practical completeness defines the challenge: build on existing frameworks while extending them to cover real-world fitness content diversity.

the multimodal challenge

exercise understanding from video requires fusing multiple information streams:

visual stream — pose estimation extracts body landmark positions over time. mediapipe provides 33 pose landmarks at real-time speeds. joint angle calculation. movement trajectory analysis.

audio stream — verbal cues ("squeeze at the top," "three more reps"), counting, music tempo, breathing patterns.

textual stream — titles, descriptions, captions, on-screen text.

the research challenge: fusion. how to weight and combine these streams when they provide complementary, redundant, or contradictory information.
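to make the fusion question concrete, here's a minimal sketch of one possible strategy: confidence-weighted voting across streams, with a heavier prior on the visual stream. the function name, weights, and example values are illustrative assumptions, not the system's actual fusion logic.

```python
# hypothetical sketch: confidence-weighted fusion of per-stream exercise labels.
# stream names, weights, and the normalization are illustrative assumptions.
from collections import defaultdict

def fuse_streams(predictions: dict[str, tuple[str, float]],
                 stream_weights: dict[str, float]) -> tuple[str, float]:
    """predictions: stream -> (exercise_label, confidence in [0, 1])."""
    scores: dict[str, float] = defaultdict(float)
    total = 0.0
    for stream, (label, confidence) in predictions.items():
        weight = stream_weights.get(stream, 1.0)
        scores[label] += weight * confidence
        total += weight
    best_label = max(scores, key=scores.get)
    return best_label, scores[best_label] / total  # normalized agreement score

# example: visual evidence outweighs a mislabeled title
fused = fuse_streams(
    {"visual": ("tricep extension", 0.92),
     "text": ("bicep curl", 0.60),
     "audio": ("tricep extension", 0.55)},
    stream_weights={"visual": 2.0, "text": 1.0, "audio": 1.0},
)
print(fused)  # -> ('tricep extension', 0.5975)
```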

the hard problems

building a system that extracts structured workout data from any format. these are the research questions:

1. multimodal fusion — how do you optimally combine visual pose data, audio transcription, and textual metadata? a video titled "best bicep exercises" might show tricep exercises due to creator error. the visual evidence should override the textual label. but the system must learn when and how to make such judgments.

2. exercise disambiguation — romanian deadlift vs. stiff-leg deadlift. bent-over row vs. pendlay row. high bar vs. low bar squat. similar movement patterns, subtle differences. joint angle thresholds may not generalize across body types, camera angles, video quality.

3. structured extraction reliability — LLMs can produce structured JSON output, but reliability varies. even 1% error rates compound into serious data quality issues at scale.

4. exercise ontology — the fitness domain lacks a universally adopted ontology. PACO contains 268 concepts. EXMO comprises 434 classes. exercisedb catalogs 11,000+ exercises. these resources overlap incompletely and use different classification principles.

5. terminology chaos — the same exercise may be called "romanian deadlift," "RDL," "stiff-leg deadlift," or "straight-leg deadlift" across different sources. some distinctions reflect genuine biomechanical differences. others are synonyms.

these are open questions. the optimal solutions aren't known. that's what makes this research.

the architecture

five-stage pipeline. arbitrary fitness content → structured JSON.

  1. ingestion — platform integration, frame extraction, OCR
  2. multimodal analysis — pose estimation, action recognition, rep counting
  3. temporal segmentation — exercise boundaries, set identification, workout phases
  4. LLM extraction — structured output with schema enforcement and confidence scoring
  5. validation & enrichment — database cross-referencing, consistency checks

input: video URLs, screenshots, text descriptions, audio files. output: standardized ADP JSON schema accessible via REST API.
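as a sketch of how those stages compose, here's a skeleton where each stage is passed in as a callable; the stage functions themselves are hypothetical placeholders, not the actual ADP implementation.

```python
# illustrative skeleton of the five-stage pipeline; the stage callables are
# hypothetical placeholders, not the actual ADP implementation.
from typing import Any, Callable

Stage = Callable[[Any], Any]

def build_pipeline(ingest: Stage, analyze: Stage, segment: Stage,
                   extract: Stage, validate: Stage) -> Stage:
    """compose the five stages into a single callable: source -> ADP JSON dict."""
    def run(source: Any) -> Any:
        media = ingest(source)        # 1. platform integration, frame extraction, OCR
        signals = analyze(media)      # 2. pose estimation, action recognition, rep counting
        segments = segment(signals)   # 3. exercise boundaries, sets, workout phases
        draft = extract(segments)     # 4. schema-constrained LLM extraction + confidence
        return validate(draft)        # 5. database cross-referencing, consistency checks
    return run
```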

the schema

the core output captures complete semantics of athletic training: session metadata (source platform, duration, phases), exercise properties (movement patterns, target muscles, equipment), performance parameters (sets, reps, tempo, rest intervals).

each extraction includes confidence scores. the system is honest about uncertainty.
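to make that concrete, here's a rough pydantic sketch of what the output could look like. field names, enums, and the example values are my illustrative assumptions, not the final ADP schema.

```python
# hypothetical sketch of the ADP output schema using pydantic v2; field names and
# example values are assumptions for illustration, not the published schema.
from pydantic import BaseModel, Field

class ExerciseEntry(BaseModel):
    name: str                          # canonical exercise name, e.g. "romanian deadlift"
    movement_pattern: str              # push, pull, hinge, squat, lunge, carry, rotation
    primary_muscles: list[str]
    equipment: list[str]
    sets: int | None = None
    reps: int | None = None
    tempo: str | None = None           # e.g. "3-1-2-0"
    rest_seconds: int | None = None
    confidence: dict[str, float] = Field(default_factory=dict)  # per-field confidence

class WorkoutSession(BaseModel):
    source_platform: str               # youtube, instagram, tiktok, ...
    source_url: str | None = None
    duration_seconds: int | None = None
    phases: list[str] = Field(default_factory=list)  # warm-up, main, finisher, cooldown
    exercises: list[ExerciseEntry]

session = WorkoutSession(
    source_platform="youtube",
    exercises=[ExerciseEntry(
        name="weighted muscle-up", movement_pattern="pull",
        primary_muscles=["latissimus dorsi", "triceps"], equipment=["weight vest"],
        sets=5, reps=3, confidence={"name": 0.93, "reps": 0.71},
    )],
)
print(session.model_dump_json(indent=2))
```

the confidence map sits alongside every field, so downstream consumers can decide what to trust.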

computer vision pipeline

mediapipe pose estimation — real-time, 33 body landmarks from video frames. joint angles, movement trajectories.
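a minimal sketch of that visual stream, using the classic mediapipe solutions API and OpenCV to compute one joint angle (left knee) per frame; the real pipeline tracks many more landmarks and full trajectories.

```python
# minimal sketch: mediapipe pose landmarks + a knee joint angle per frame.
# error handling and frame sampling are omitted for brevity.
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose

def joint_angle(a, b, c) -> float:
    """angle at point b (degrees) formed by points a-b-c, each as (x, y)."""
    a, b, c = np.array(a), np.array(b), np.array(c)
    cosine = np.dot(a - b, c - b) / (np.linalg.norm(a - b) * np.linalg.norm(c - b) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

def knee_angles(video_path: str) -> list[float]:
    angles = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if not results.pose_landmarks:
                continue
            lm = results.pose_landmarks.landmark
            hip = (lm[mp_pose.PoseLandmark.LEFT_HIP].x, lm[mp_pose.PoseLandmark.LEFT_HIP].y)
            knee = (lm[mp_pose.PoseLandmark.LEFT_KNEE].x, lm[mp_pose.PoseLandmark.LEFT_KNEE].y)
            ankle = (lm[mp_pose.PoseLandmark.LEFT_ANKLE].x, lm[mp_pose.PoseLandmark.LEFT_ANKLE].y)
            angles.append(joint_angle(hip, knee, ankle))
    cap.release()
    return angles
```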

exercise classification via CNN — a convolutional neural network maps joint coordinates and angles to learned movement patterns. ensemble learning combines predictions across multiple frames.
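the ensemble step can be as simple as soft-voting across per-frame probabilities; a sketch, assuming the per-frame CNN outputs one softmax vector per frame:

```python
# illustrative soft-voting ensemble: average per-frame class probabilities into a
# single video-level label; the CNN itself and the label set are not shown here.
import numpy as np

def ensemble_classify(frame_probs: np.ndarray, labels: list[str]) -> tuple[str, float]:
    """frame_probs: (num_frames, num_classes) softmax outputs from the per-frame CNN."""
    mean_probs = frame_probs.mean(axis=0)       # soft-vote across frames
    idx = int(mean_probs.argmax())
    return labels[idx], float(mean_probs[idx])  # label + aggregate confidence
```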

action recognition benchmarks:

  • WAVd (workout action video dataset): 95.81% accuracy
  • UCF101: 93.2% accuracy
  • youtube actions: 97.2% accuracy

but these benchmarks test general action recognition, not fine-grained fitness distinctions (romanian deadlift vs. stiff-leg deadlift). developing fitness-specific evaluation benchmarks is part of the research agenda.

rep counting and set detection — temporal analysis of joint angle trajectories. research demonstrates >90% accuracy using mediapipe for landmark detection with custom repetition logic.
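a toy version of that repetition logic: a hysteresis state machine over the joint-angle trajectory. the thresholds here are placeholder values and would differ per exercise.

```python
# illustrative rep counter: hysteresis state machine over a joint-angle trajectory.
# the low/high thresholds are placeholders, tuned per exercise in practice.
def count_reps(angles: list[float], low: float = 90.0, high: float = 160.0) -> int:
    reps, in_bottom = 0, False
    for angle in angles:
        if angle < low:                      # reached the bottom of the movement
            in_bottom = True
        elif angle > high and in_bottom:     # returned to the top -> one full rep
            reps += 1
            in_bottom = False
    return reps

# e.g. fed with the knee angles from the pose sketch above:
# count_reps(knee_angles("squat_set.mp4"))
```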

why claude

LLM selection for structured extraction significantly impacts reliability.

1. structured output reliability — claude's tool use functionality enables schema-constrained output generation. constrained decoding restricts token generation to valid JSON matching the schema.
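a trimmed-down sketch of schema-constrained extraction with the anthropic python SDK; the tool schema is a small subset of the ADP schema, and the prompt is a placeholder.

```python
# sketch of schema-constrained extraction via tool use; the tool schema is a
# trimmed-down illustration and the prompt content is a placeholder.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

workout_tool = {
    "name": "record_workout",
    "description": "Record the structured workout extracted from the content.",
    "input_schema": {
        "type": "object",
        "properties": {
            "exercises": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "sets": {"type": "integer"},
                        "reps": {"type": "integer"},
                        "equipment": {"type": "array", "items": {"type": "string"}},
                        "confidence": {"type": "number"},
                    },
                    "required": ["name", "confidence"],
                },
            }
        },
        "required": ["exercises"],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    tools=[workout_tool],
    tool_choice={"type": "tool", "name": "record_workout"},  # force schema-shaped output
    messages=[{"role": "user", "content": "transcript, OCR text, and pose summary go here..."}],
)
workout = next(b.input for b in response.content if b.type == "tool_use")
```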

2. constitutional AI — for a system processing user-generated content at scale, safety matters. claude's training embeds safety constraints at the model level.

3. multi-model orchestration:

| task | model | why |
| --- | --- | --- |
| primary content analysis | sonnet 4.5 | best balance of capability and cost |
| high-volume metadata extraction | haiku 4.5 | 4-5x faster, 1/3 cost |
| ambiguous cases / quality review | opus 4.5 | maximum capability for edge cases |
| real-time API responses | haiku 4.5 | low latency (<500ms) |

anthropic's insight: "sonnet 4.5 can break down a complex problem into multi-step plans, then orchestrate a team of multiple haiku 4.5s to complete subtasks in parallel."
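in practice that orchestration reduces to a routing decision per task; a sketch, where the model ids follow anthropic's published aliases and the low-confidence escalation rule is my own assumption:

```python
# illustrative routing table; task categories mirror the table above, and the
# escalation threshold is an assumption, not a documented ADP rule.
MODEL_ROUTES = {
    "content_analysis": "claude-sonnet-4-5",      # primary multimodal content analysis
    "metadata_extraction": "claude-haiku-4-5",    # high-volume, low-cost extraction
    "quality_review": "claude-opus-4-5",          # ambiguous or low-confidence cases
    "realtime_api": "claude-haiku-4-5",           # latency-sensitive responses
}

def pick_model(task: str, confidence: float | None = None) -> str:
    # escalate low-confidence extractions to the most capable model
    if confidence is not None and confidence < 0.7:
        return MODEL_ROUTES["quality_review"]
    return MODEL_ROUTES.get(task, MODEL_ROUTES["content_analysis"])
```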

the stack

| component | technology | why |
| --- | --- | --- |
| orchestration | LangChain | modular pipeline, native claude integration |
| pose estimation | mediapipe | real-time, 33 landmarks, cross-platform |
| action classification | custom CNN + ensemble | fine-tuned for fitness domain |
| video processing | FFmpeg + OpenCV | frame extraction, preprocessing |
| audio transcription | whisper | state-of-the-art accuracy |
| vector database | pinecone | exercise embedding similarity search |
| schema validation | JSON schema + pydantic | runtime type checking |
| API layer | fastAPI + openAPI | standards-compliant REST interface |

exercise ontology mapping

multi-stage resolution for mapping extracted names to canonical identifiers:

| stage | process | fallback |
| --- | --- | --- |
| 1. exact match | normalized dictionary of 11,000+ exercises | → stage 2 |
| 2. synonym resolution | learned mappings ("RDL" → "romanian deadlift") | → stage 3 |
| 3. semantic similarity | embedding similarity against exercise database | → stage 4 |
| 4. LLM classification | claude with exercise description and visual features | → stage 5 |
| 5. human review | confidence < 0.7 flagged for manual review | feedback loop |

this feedback loop continuously improves the synonym dictionary and embedding model.
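a condensed sketch of that cascade; the dictionaries, embedding lookup, and LLM call are stubbed out as parameters, and all thresholds except the 0.7 review cutoff are placeholder assumptions.

```python
# sketch of the multi-stage resolution cascade; lookups and the LLM call are
# passed in as callables, and the 0.85 threshold is a placeholder assumption.
def resolve_exercise(raw_name: str, exact: dict, synonyms: dict,
                     embed_lookup, llm_classify) -> tuple[str | None, float, str]:
    """returns (canonical_name, confidence, stage); None means route to human review."""
    key = raw_name.strip().lower()
    if key in exact:                                   # 1. exact match
        return exact[key], 1.0, "exact"
    if key in synonyms:                                # 2. learned synonym mapping
        return synonyms[key], 0.95, "synonym"
    candidate, score = embed_lookup(key)               # 3. embedding similarity
    if score >= 0.85:
        return candidate, score, "semantic"
    candidate, score = llm_classify(raw_name)          # 4. LLM classification
    if score >= 0.7:
        return candidate, score, "llm"
    return None, score, "human_review"                 # 5. flag for manual review
```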

where it stands

iterating in public. the system is partially built. the open challenges below define the research agenda.

what's working

  • multimodal extraction pipeline: video → pose → classification → structured output
  • exercise ontology mapping: five-stage resolution with fallbacks
  • preliminary test set: 87% extraction accuracy on 500-video evaluation
  • API design: REST interface with confidence scores

what's hard

1. extraction accuracy — 87% means 13% of extracted data contains errors. error distribution:

  • novel exercises not in training data: 15%
  • ambiguous visual quality: 25%
  • conflicting modality signals: 20%
  • schema edge cases: 40%

2. real-time latency — target: <5 seconds for 60-second video. current:

  • video download and preprocessing: ~2s
  • pose estimation (all frames): ~3s
  • LLM extraction: ~2s
  • total: ~7s (exceeds target)

investigating: selective frame sampling, streaming pose estimation, parallel processing, model distillation.

3. exercise disambiguation — joint angle thresholds derived from biomechanical literature may not generalize across body types, camera angles, video quality.

4. single-person assumption — current pose estimation assumes one person per video. group fitness classes, partner exercises, crowded gym footage: unsolved.

5. language — english only. german and spanish have rich fitness traditions with terminology that doesn't map directly.

the numbers

current accuracy: 87% field-level on core fields (exercise name, sets, reps, equipment).

target: 95%+. the gap requires better context models and more training data.

introducing BLOCK

athletic data protocol isn't just a research project. it's the foundation for BLOCK—a consumer app that unifies the fragmented fitness content landscape.

BLOCK transforms any workout content into your personal training library:

  • import from anywhere — youtube videos, instagram reels, tiktok workouts, screenshots, text descriptions
  • universal workout library — every exercise you've ever done or want to do, structured, searchable, organized
  • cross-platform sync — connect your freeletics history with gym sessions, your b42 football training with youtube follow-alongs
  • smart recommendations — discover workouts that match your goals, equipment, training history

BLOCK is the consumer-facing implementation of athletic data protocol. bringing interoperability to everyday athletes.

stay tuned for early access.

limitations

technical

1. accuracy ceiling — 87% means 13% of estimates contain errors. for casual tracking, probably fine. for professional athletes or injury rehab—potentially problematic.

2. language scope — english only. german, spanish, and other languages need culturally specific training.

3. single-person limitation — group fitness, partner exercises, crowded gyms remain unsolved.

4. video quality dependency — performance degrades with poor lighting, unusual camera angles, low resolution.

5. no clinical validation — not validated for medical or rehabilitation use. don't use for injury recovery without professional oversight.

ethical

platform terms — extracting data from social media raises legal questions. the system uses official APIs where available, respects rate limits, processes only publicly available content, provides opt-out for creators.

intellectual property — workout content may be protected by copyright. the system extracts factual information (exercises, sets, reps) without reproducing creative expression.

data privacy — workout history reveals sensitive info. training patterns → health conditions. exercise timing → schedules. this data needs protection.

what's next

now

  • latency optimization — selective frame sampling, parallel processing, model distillation
  • accuracy improvement — active learning pipeline for edge cases
  • multi-language — german and spanish first

later

  • public API — tiered pricing for fitness apps, training platforms
  • platform integrations — youtube, instagram, tiktok for seamless import
  • mobile SDK — on-device extraction for privacy-sensitive use cases

vision

  • real-time video analysis — extract exercises as videos play
  • wearable integration — connect extracted workouts to heart rate, calories, recovery metrics
  • progression tracking — automatic detection of strength gains, plateau patterns
  • AI coaching — personalized recommendations based on training history and goals

the point

fitness data fragmentation isn't a technical limitation. it's an architecture failure.

we have:

  • billions of workout videos
  • comprehensive exercise databases
  • robust pose estimation technology

they just don't talk to each other.

athletic data protocol builds intelligence that adapts to content as it exists. not waiting for the world to standardize. a system that understands workouts through multimodal analysis—video → pose → classification → structured data.

current state: 87% accuracy. clear path to 95%+.

research ahead: multimodal fusion. exercise disambiguation. real-time processing. multi-language support.

the impact goes beyond personal convenience. a universal layer connecting workout content to structured data enables:

  • AI coaches that learn from any content source
  • training logs that seamlessly aggregate across platforms
  • analytics that span complete fitness journeys

for anyone who's ever tried to build a coherent training plan from bookmarked videos across five different apps—this should have existed ten years ago.

i'm building it now.

references

  1. Tang, Y. et al. (2025). Video Understanding with Large Language Models: A Survey. IEEE Transactions on Circuits and Systems for Video Technology.
  2. PMC11124794. (2024). Workout Classification Using a Convolutional Neural Network in Ensemble Learning. Applied Sciences.
  3. Jin, Q. et al. (2019). Developing a Physical Activity Ontology to Support the Interoperability of Physical Activity Data. JMIR.
  4. Tian, J. et al. (2024). Core reference ontology for individualized exercise prescription. Scientific Data.
  5. IJBNPA. (2023). Content and quality of physical activity ontologies: a systematic review. International Journal of Behavioral Nutrition and Physical Activity.
  6. ExerciseDB. (2025). ExerciseDB API Documentation. https://github.com/ExerciseDB/exercisedb-api
  7. Google. (2024). MediaPipe Pose Estimation. https://developers.google.com/mediapipe
  8. Anthropic. (2025). Claude Sonnet 4.5 Model Card. San Francisco: Anthropic.
  9. Anthropic. (2025). Introducing Claude Haiku 4.5. https://www.anthropic.com/news/claude-haiku-4-5
  10. Bai, Y. et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.

last updated: Dec 2025