JSON Is The Wrong Content Type For LLM Inputs.

This isn't an exhaustive or fully baked idea yet, but I've been noticing a trend with MCP servers -- they love to just yeet a bunch of JSON at an LLM. I think this is well-intentioned but not super optimal.

In practice, I've been experimenting with different response types and modalities depending on the source data. It stands to reason that LLMs can interpret many forms of structured input, and that they pick up implicit meaning from the format of an input (even beyond overfitting due to alignment), because those formats show up so often in the training corpus.

Here are a few things I've noticed --

1. Send tabular data as CSV. The same data expressed as a table uses about 50% fewer tokens than JSON with, near as I can tell, no real loss in coherence. I suspect this holds up less well if you have many columns, but that leads to point 2...
2. Paginate, paginate, paginate. I think it's pretty reasonable to expect that most users will be working within a 200k context window, so when you can avoid sending complete objects or pages, do so.
3. Pictures tell a thousand words, literally. Most multimodal models aren't at the point where they can do the really fancy o3-level zoom in/zoom out stuff yet, but I've found that you can pretty reliably have them interpret a plain image via OCR. Especially when it comes to interpreting data, sending a bar/line chart with a legend and clear axes/labels seems more efficient in terms of tokens than sending the raw points.
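
To make the first two points concrete, here's a rough sketch of what I mean, using only the Python standard library. The function names (`rows_to_csv`, `paginate_csv`) and the sample data are made up for illustration, and they aren't part of any MCP SDK. The character counts it prints are only a crude stand-in for tokens, so treat the comparison as directional and measure with your model's actual tokenizer.

```python
import csv
import io
import json


def rows_to_csv(rows):
    """Serialize a list of flat dicts (all sharing the same keys) to CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    # The header row states each key once, instead of repeating it per object.
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def paginate_csv(rows, page=0, page_size=50):
    """Return one page of rows as CSV, plus a footer saying how much is left."""
    start = page * page_size
    chunk = rows[start:start + page_size]
    remaining = max(len(rows) - (start + len(chunk)), 0)
    footer = f"({len(chunk)} of {len(rows)} rows shown; {remaining} remaining)\n"
    return rows_to_csv(chunk) + footer


# Hypothetical sample data, just to compare the two encodings.
rows = [{"id": i, "name": f"user{i}", "score": i * 10} for i in range(1, 6)]

print("JSON chars:", len(json.dumps(rows)))
print("CSV chars: ", len(rows_to_csv(rows)))
print(paginate_csv(rows, page=0, page_size=2))
```

The footer is there so the model knows the result is partial and can ask for the next page, rather than assuming it has seen everything.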

Bonus item -- the trickiest part about testing this stuff is definitely evals. I haven't found a great solution here that isn't just 'write my own eval agent'. Most off-the-shelf stuff isn't optimized for multi-turn conversations.

@aparker.io

2025-05-24T00:00:00.000Z

austin
