Mini essays, AI, Tech

Code mode for MCP servers and LLMs

Code mode for MCP servers is about the LLM writing and running code that calls an MCP tool properly, instead of invoking the tool directly with the whole context. It makes the call a lot smaller: no overhead is passed along, just the basics required to call the right MCP method. Just as you would in code, a method or a function gets proper parameters, everything is validated and… BAM! We get back a context that the LLM uses in further stages.
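A minimal sketch of that idea, not a real MCP client: the `tools_call` helper, the tool name and the fake 10,000-row result set are all invented here for illustration. The point is that in code mode only the final aggregate flows back into the LLM context, not the raw rows.

```python
# Hypothetical stand-in for an MCP tools/call round-trip; in direct mode,
# the entire result set below would be pasted into the LLM context.
def tools_call(name: str, arguments: dict) -> list[dict]:
    if name == "query_db":
        # Pretend the server returns a big result set.
        return [{"id": i, "total": i * 10} for i in range(10_000)]
    raise ValueError(f"unknown tool: {name}")

rows = tools_call("query_db", {"sql": "SELECT * FROM orders"})

# Code mode: the LLM writes this aggregation itself, so only the summary
# (a handful of tokens) is returned as context for the next stage.
revenue = sum(row["total"] for row in rows)
summary = {"order_count": len(rows), "revenue": revenue}
print(summary)
```

The 10,000 rows never leave the execution sandbox; the model only ever sees the two-field summary.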

Anthropic wrote…

This reduces the token usage from 150,000 tokens to 2,000 tokens—a time and cost saving of 98.7%. Nice that we know it, but it's the companies that know people need more and more context, and it is getting out of hand… whole Excel sheets just pushed into the LLM with hope for the best…

Code mode for MCP servers

This looks really nice, but only for huge models with a 1M-token context. We need to remember that this is not possible on any kind of high-end consumer PC. Additionally, a lot of LLM providers are moving to per-request pricing instead of per-token.

We should always remember that any LLM will lose context, just as we do during a conversation. Things get weighted as more important or less important based on tokens, words and repetition.

Read this:

  • https://blog.cloudflare.com/code-mode/
  • https://www.anthropic.com/engineering/code-execution-with-mcp

Example tool list for code mode with MCP servers

An example of a txt file that could work as an entrypoint for our army of MCP servers:

MCP Servers Tools:

  • Everything: tools/list, tools/call {"name": "echo", "arguments": {"text": "hi"}}
  • Filesystem: tools/call(read_file) {"name": "read_file", "arguments": {"path": "/data.txt"}}
  • Git: tools/call(git_log) {"name": "git_log", "arguments": {"repo": "user/repo"}}
  • Fetch: tools/call(fetch_url) {"name": "fetch_url", "arguments": {"url": "https://example.com"}}
  • Memory: tools/call(query_memory) {"name": "query_memory", "arguments": {"query": "user prefs"}}
  • PostgreSQL: tools/call(query_db) {"name": "query_db", "arguments": {"sql": "SELECT * FROM orders"}}
  • Google Drive: tools/call(list_files) {"name": "list_files", "arguments": {"folder": "docs"}}
  • Puppeteer: tools/call(navigate) {"name": "navigate", "arguments": {"url": "https://site.com"}}
  • Slack: tools/call(send_message) {"name": "send_message", "arguments": {"channel": "#dev", "text": "update"}}

Flow: tools/list → LLM picks tool → tools/call(params) → result callback
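That flow can be sketched in a few lines. This is a toy simulation, not the MCP wire protocol: the in-memory tool registry and the keyword-based `pick_tool` stand in for a real server and for the LLM's choice.

```python
# Hypothetical in-process tool registry standing in for an MCP server.
TOOLS = {
    "echo": lambda args: args["text"],
    "read_file": lambda args: f"<contents of {args['path']}>",
}

def tools_list() -> list[str]:
    """Step 1: the client asks which tools exist."""
    return sorted(TOOLS)

def pick_tool(task: str, available: list[str]) -> str:
    """Step 2: crude stand-in for the LLM picking a tool from the list."""
    return "read_file" if "file" in task and "read_file" in available else "echo"

def tools_call(name: str, arguments: dict):
    """Step 3: invoke the chosen tool with its validated arguments."""
    return TOOLS[name](arguments)

name = pick_tool("read the config file", tools_list())
result = tools_call(name, {"path": "/data.txt"})
print(result)  # step 4: the result callback hands this back to the LLM
```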

Code mode for MCP servers and LLMs can look like a list of methods: just an API schema, a YAML file or Swagger. Plain txt will also work, because our model (the LLM) will, and/or should, figure it out on its own.
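For instance, the same entrypoint could be a small YAML file. This shape is made up for illustration, not a real MCP manifest format:

```yaml
# Hypothetical entrypoint schema -- server names, tool names and the
# args shapes are illustrative, not part of any spec.
servers:
  filesystem:
    tools:
      - name: read_file
        args: { path: string }
  git:
    tools:
      - name: git_log
        args: { repo: string }
  slack:
    tools:
      - name: send_message
        args: { channel: string, text: string }
```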

RAG for MCP (optional)

Retrieval-Augmented Generation for MCP servers and LLMs would be almost ideal. The perfect example is Context7 retrieving the library id and then the documentation. Based on the prompt, instruction and context sent to the LLM, we need to figure out which MCP server and method to use and what data to send to that method.
A RAG server would analyze the query and, based on that, provide a vector with embedded data that describes our execution of the MCP server. Here be dragons, because building RAG vector embeddings with models like BERT or Wav2Vec is a lot of fancy stuff to do, but that is for another time.
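A toy sketch of that routing idea: the tool descriptions and the bag-of-words "embedding" below are stand-ins for a real embedding model (something BERT-class) and a vector store, but the mechanics are the same — embed the query, embed each tool description, pick the nearest.

```python
from collections import Counter
from math import sqrt

# Hypothetical tool descriptions; a real setup would index many more.
TOOL_DESCRIPTIONS = {
    "query_db": "run sql queries against the orders database",
    "fetch_url": "fetch the contents of a web page by url",
    "send_message": "send a chat message to a slack channel",
}

def embed(text: str) -> Counter:
    """Fake embedding: word counts instead of a dense model vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route(query: str) -> str:
    """Pick the tool whose description is closest to the query."""
    q = embed(query)
    return max(TOOL_DESCRIPTIONS, key=lambda t: cosine(q, embed(TOOL_DESCRIPTIONS[t])))

print(route("post an update to the dev slack channel"))
```

Swap `embed` for a proper sentence-embedding model and the same `route` function becomes the "RAG server" deciding which MCP method to execute.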

Summary tl;dr

Next time we should remember that throwing the whole context around might not be the most efficient thing to do. There are also additional considerations like security: why should an MCP server even know about the whole context? Where is the separation of concerns? It is nice to see that the big players have started to notice those issues and are fixing them pretty fast.
The race is still on 🙂

Piotr Kowalski