Hypermedia Friendly Model Context Protocol App Architecture
I am working on speedystride.com (https://speedystride.com), a programming tool that helps athletes quickly input workouts on their Apple and Garmin watches.
These watches come with a built-in workout programming feature that is especially useful for structured programs. For example, runners will often do interval training, such as 5x1000m with a 2-minute rest.
And sometimes they’ll want to do a fartlek (Swedish for ‘Speed Play’) where they will vary their speed: run 400 meters fast - run 800 meters slower - sprint 200 meters. These smartwatches will vibrate and beep to help the user perform at the desired target, and also count down rest periods so the user is rested enough for that next hard interval.
Unfortunately, I did not like any of the first-party workout builders. These are form-based, with a drag-and-drop interface for structuring your workouts. I think these builders have high user friction: too many user inputs are required in proportion to the output. Additionally, these builders run on a small watch screen (https://support.apple.com/en-gb/guide/watch/apd66fcd5c5c/watchos) or require a separate app. This is less than ideal when you are trying to program your watch right before a track workout. There are third-party tools in this space, but as far as I can tell, they do not fundamentally break this pattern.
I also wanted to share these workouts with everyone at my city’s track club. My service provides a scheduler that pushes workouts automatically at a specified time for our club training; about 90% of our members have either an Apple Watch or a Garmin so cross-platform compatibility is a very important factor.
To solve these problems, I came up with a very simple domain specific language (DSL) for both people and machines. It can describe exercises, define rest times, and combine everything into repeat intervals. I implemented a simple recursive descent parser (sketched after the example below), which outputs data formats for both Apple and Garmin devices. By defining a small language, I avoided implementing the complex forms of the current offerings; user input is reduced to plain text.
Example workout DSL (https://htmx.org/essays/mcp-apps-hypermedia/#example-workout-dsl)
User:
10x200m max effort with 2 minute rest
DSL:
Repeat 10 times:
- Run 200m @RPE 10
- Rest 2 minutes
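The parser itself is small. Here is a minimal TypeScript sketch of the recursive descent approach, restricted to the three constructs in the example; the node types and grammar subset are illustrative simplifications, not my actual implementation:

// A minimal sketch of the recursive descent idea, simplified to the three
// constructs in the example above (nesting of repeats is omitted).
type Step =
  | { kind: "run"; meters: number; rpe?: number }
  | { kind: "rest"; seconds: number }
  | { kind: "repeat"; times: number; body: Step[] };

function parseWorkout(src: string): Step[] {
  const lines = src.split("\n").map((l) => l.trim()).filter(Boolean);
  let pos = 0;

  function parseSteps(inRepeat: boolean): Step[] {
    const steps: Step[] = [];
    while (pos < lines.length) {
      const line = lines[pos];
      // A repeat body is the run of "- ..." lines that follows it.
      if (inRepeat && !line.startsWith("-")) break;
      let m: RegExpExecArray | null;
      if ((m = /^Repeat (\d+) times:$/i.exec(line))) {
        pos++;
        steps.push({ kind: "repeat", times: Number(m[1]), body: parseSteps(true) });
      } else if ((m = /^(?:-\s*)?Run (\d+)m(?:\s*@RPE\s*(\d+))?$/i.exec(line))) {
        pos++;
        steps.push({ kind: "run", meters: Number(m[1]), rpe: m[2] ? Number(m[2]) : undefined });
      } else if ((m = /^(?:-\s*)?Rest (\d+) minutes?$/i.exec(line))) {
        pos++;
        steps.push({ kind: "rest", seconds: Number(m[1]) * 60 });
      } else {
        // Rich errors like this are what later lets an LLM self-correct.
        throw new Error(`Line ${pos + 1}: cannot parse "${line}"`);
      }
    }
    return steps;
  }

  return parseSteps(false);
}

// The resulting tree is then compiled to Apple's and Garmin's formats.
console.log(JSON.stringify(
  parseWorkout("Repeat 10 times:\n- Run 200m @RPE 10\n- Rest 2 minutes"),
  null, 2,
));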
I had initially wanted coaches to learn this DSL and enter programs into my website, assisted by a CodeMirror editor. I incorrectly thought that it was close enough to English for people to quickly learn with the help of autocomplete. I was not meeting my users where they were; graybeard track coaches had zero interest in learning how to program. What I needed was a translator that could convert natural English into my DSL.
Model Context Protocol (MCP) and MCP Apps (https://htmx.org/essays/mcp-apps-hypermedia/#model-context-protocol-mcp-and-mcp-apps)
As I started sharing my project with other people, large language models were becoming popular, and they were an obvious tool for translating natural English workouts into my training DSL. By integrating LLMs, I can massively reduce user friction. There are no forms with complex UIs to implement. There is no DSL to learn, as the AI can translate natural language for you. Users can now express their workouts in their own way.
An AI can transform the above user input into this domain specific language with a relatively small language specification. A nice side effect is token efficiency: JSON payloads defining a repeated workout set can get quite large, while my DSL stays compact. Any errors can be corrected, as my parser provides rich feedback on what went wrong. I have found that about 95% of the interval workouts I see can be expressed in my language.
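To make the size difference concrete, here is a hypothetical JSON encoding of the earlier 10x200m workout. This is not Apple's or Garmin's actual schema, just an illustration of how structured formats balloon compared to three short lines of DSL:

// Hypothetical JSON shape, for illustration only; real device payloads differ.
const workoutAsJson = {
  title: "10x200m max effort",
  steps: [
    {
      type: "repeat",
      count: 10,
      steps: [
        { type: "run", distanceMeters: 200, intensity: { scale: "rpe", value: 10 } },
        { type: "rest", durationSeconds: 120 },
      ],
    },
  ],
};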
LLMs also enable new capabilities, such as programming your watch from a photo of a whiteboard. Even more importantly, the Model Context Protocol (MCP) was starting to gain traction. MCP gives LLM systems a way to interact with the real world, which means that besides just outputting workout programs, the LLM can call a remote function to actually send that workout to your device.
Anthropic and OpenAI both support MCP, so it would be awesome for my business to support LLM integrations: so many users already have Claude and ChatGPT installed on their phones.
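Here is a minimal sketch of what such a remote function could look like using the official TypeScript MCP SDK. The tool name, the sendToWatch helper, and the handler body are illustrative assumptions, not my actual code:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

declare function parseWorkout(dsl: string): unknown; // parser from the sketch above
declare function sendToWatch(program: unknown): Promise<void>; // hypothetical delivery helper

const server = new McpServer({ name: "speedystride", version: "1.0.0" });

// The LLM translates natural language into the DSL, then calls this tool.
// Parse errors flow back to the model, which can correct its DSL and retry.
server.tool(
  "push_workout",
  "Compile a speedystride DSL workout and send it to the user's watch",
  { dsl: z.string().describe("The workout program in the speedystride DSL") },
  async ({ dsl }) => {
    try {
      await sendToWatch(parseWorkout(dsl));
      return { content: [{ type: "text", text: "Workout sent to the watch." }] };
    } catch (err) {
      return {
        content: [{ type: "text", text: `Parse error: ${(err as Error).message}` }],
        isError: true,
      };
    }
  },
);

// A production server would use the Streamable HTTP transport;
// stdio keeps the sketch short.
await server.connect(new StdioServerTransport());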
Still, there were opportunities to further improve user experience.
I had mentioned earlier that my track club uses speedystride.com to program members’ watches. In order to do so, we have to define a few parameters:
• What is the workout?
• Should the workout be added to my Monday night track intervals, or to the Tuesday fartleks?
LLMs can help massively with the first question. But how about the second? Much of the current human-to-LLM interaction is text-based, and MCP is no exception. While AI tools improved the workout-building UX, they also introduced new frictions. To associate a workout with an event, I had an events tool that would fetch upcoming events for the user and add them to the LLM context. Then it was up to the LLM to guide the user. Some systems like Claude do provide simple select controls if your tools output JSON objects that look like a set of choices, but this interaction forces the developer to surrender control of the happy path, which often leaves users confused. The AI would also sometimes try to be too helpful and just guess the tool inputs. In summary, back-and-forth conversation with the LLM is not an ideal UX, as users have to figure out how to guide the AI to the right inputs.
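For reference, the text-era version of that events tool looked roughly like this (the event shape and data-access helper are illustrative):

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

declare const server: McpServer; // the server from the earlier sketch
declare function fetchUpcomingEvents(): Promise<
  { id: string; title: string; startsAt: string }[]
>; // hypothetical data access

// Dump the user's upcoming events into the model's context and hope the
// model guides the user to a single occurrence. Hosts like Claude may
// render the JSON as a select control, but the happy path is out of our hands.
server.tool("list_events", "List the user's upcoming club events", {}, async () => ({
  content: [{ type: "text", text: JSON.stringify(await fetchUpcomingEvents()) }],
}));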
A form with a selector interface is an obvious way to solve this problem.
Luckily for me, the new MCP Apps specification was released in January 2026. This is an extension to the MCP specification that allows rendering custom UI inside the LLM host's chat interface.
References:
• MCP (https://modelcontextprotocol.io/docs/getting-started/intro)
• MCP Apps (https://modelcontextprotocol.io/extensions/apps/overview)
MCP App Architecture (https://htmx.org/essays/mcp-apps-hypermedia/#mcp-app-architecture)
You need to host an MCP server that can communicate with the AI systems.
Communication model (https://htmx.org/essays/mcp-apps-hypermedia/#communication-model)
MCP Server <-Proxied Request-> LLM Host (Claude or ChatGPT) <-App Bridge-> MCP App UI (rendered inside the LLM host)
All traffic between the MCP App UI and the LLM host must be routed through the App Bridge. The LLM host will then make proxied requests to my MCP server.
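From inside the iframe, that bridge is a JSON-RPC channel over postMessage. Below is a hand-rolled sketch of a proxied tool call; I am paraphrasing the wire framing here, and a real app would use the spec's bridge helper rather than this:

// Inside the MCP App iframe: every interaction is routed through the host.
let nextId = 1;
const pending = new Map<number, (result: unknown) => void>();

window.addEventListener("message", (event: MessageEvent) => {
  const msg = event.data;
  if (msg && typeof msg.id === "number" && pending.has(msg.id)) {
    pending.get(msg.id)!(msg.result);
    pending.delete(msg.id);
  }
});

// Ask the host to proxy a tools/call to the MCP server on our behalf.
function callTool(name: string, args: Record<string, unknown>): Promise<unknown> {
  const id = nextId++;
  window.parent.postMessage(
    { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } },
    "*",
  );
  return new Promise((resolve) => pending.set(id, resolve));
}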
Interactive Hypermedia UIs in MCP Apps (https://htmx.org/essays/mcp-apps-hypermedia/#interactive-hypermedia-uis-in-mcp-apps)
Let’s say that we are developing a simple workout scheduler, where we have the LLM-generated workout program. Our goal is to associate this workout with a calendar event occurrence. A user could have multiple events on her calendar, so a dynamic choice of occurrences should be available for each event she is subscribed to. On a traditional website, this could be trivially handled by a full page refresh.
On MCP App systems without interactivity, we would have to ask the LLM to fully re-render the MCP App in the chat for every change. This adds friction and unnecessarily consumes tokens. So we must find a path to interactivity within the same UI context.
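Using the callTool helper from the bridge sketch above, the UI can do exactly that: when the user picks an event, it fetches that event's occurrences through a proxied tool call and swaps them into the page, with no chat round-trip. The list_occurrences tool and the markup IDs here are illustrative:

declare function callTool(
  name: string,
  args: Record<string, unknown>,
): Promise<unknown>; // from the bridge sketch above

// Repopulate the occurrence <select> when the chosen event changes.
const eventSelect = document.querySelector<HTMLSelectElement>("#event")!;
const occurrenceSelect = document.querySelector<HTMLSelectElement>("#occurrence")!;

eventSelect.addEventListener("change", async () => {
  const result = (await callTool("list_occurrences", {
    eventId: eventSelect.value,
  })) as { content: { type: string; text: string }[] };

  // The tool returns its occurrences as JSON text; swap them into the DOM.
  const occurrences: { id: string; startsAt: string }[] =
    JSON.parse(result.content[0].text);
  occurrenceSelect.replaceChildren(
    ...occurrences.map((o) => new Option(o.startsAt, o.id)),
  );
});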
There are existing UI toolkits, such as MCP-UI (https://mcpui.dev), which works really well with React. However, using React for a simple selector form felt like overkill.