Head of the Agents and Assistants Department

Let me state upfront: my attitude toward AI assistants cannot be expressed as a boolean value. If you need an answer to the question posed point-blank—“New York Yankees or Boston Red Sox?”—I do not watch baseball at all; I’m a Barça fan. That said, I find AI assistants a perfectly legitimate and even liquid asset. The text below is an account of what made my work with agents pleasant and reduced their errors and rough edges to an acceptable minimum.

A little under a year ago I began working on Cure, a programming language with dependent types, finite state machines as first-class citizens, SMT verification, and other niceties, compiling to the BEAM.

My first approach to the apparatus ended in ignominious failure. I got tangled in my own architectural decisions, started piling on crutches wherever they fit, turned the code completely into an Italian restaurant menu, and lost heart. In a fit of idiocy I had an assistant generate the website and showed it to the public—the ideas themselves were intriguing enough, dependent types and SMT solvers on the BEAM are hardly redundant, and I harboured a quiet hope for community interest. The community correctly identified the language’s site as slop generated by a language model and received my attempt with arctic indifference. I got a great many “nothing works” responses and not a single coherent suggestion for improvement (no fault of the community’s—against the backdrop of the slop-flood of those days, my project would not have looked like Noah’s Ark even to an extremely charitable observer).

I stepped back, examined my creation from all angles, and was forced to admit: I had produced a monster. I saw no chance of licking it into shape, even through a global refactor. I bought a ream of paper and some pencils and started drawing, in order to understand exactly where I had gone wrong. (Spoiler: I had been so enchanted by the idea itself, and so desperate to get something to launch and run, that I had done literally everything wrong.)

I had no intention of giving up, and the necessity of rewriting everything from scratch without repeating the mistakes loomed before me at full height. By that point I knew that artificial assistants could significantly accelerate the actual writing of code, so I started by erecting scaffolding around the future project. It was obvious to me then (I can now confirm that former intuition with experience) that all those prompts along the lines of “You are a genius architect with three hundred years of experience designing languages with dependent types and SMT solvers” work no better than the morning pep talk to an intern at standup: “You are a great programmer who has written three hundred million billion lines of code without a debugger.” If an assistant tasked with writing coherent code to a spec needed motivational gibberish to function, it would not be worth using under any circumstances whatsoever. Seriously, think about it: a model is asked to implement a GCD module using the Euclidean algorithm. Are you really suggesting its deeply baked-in internal rules will not guide it down the correct branches of its conditionals without first being told it has the soul of a prima ballerina and an avant-garde poet? What on earth does “You are a senior architect” actually mean? Do the people who advocate this believe that without such a preamble the training pathways via the memoirs of axolotl breeders will activate instead?

So, instead of all those skills/agents/whatever, I started by feeding this mechanised beast the source code of all my own libraries, lovingly written by hand, with the note: “Here are examples of good code. Write like this. Not like that—do not write like that.” I know it is immodest, but it is my assistant. You are welcome to feed yours your own code.

Then I reclaimed the wasteland: thus were born Metastatic, implementing MetaAST for different languages across different paradigms, and Ragex—a RAG built on AST rather than plain text (my hunch that AST fits into a context window far more easily and is far better structured than raw source code turned out to be correct).
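The hunch about ASTs is easy to illustrate. Below is a minimal Python sketch of AST-based chunking for retrieval, using only the standard ast module. It is a hypothetical illustration of the idea, not Ragex’s actual code or API: instead of splitting raw source into overlapping text windows, it emits one compact, structured record per definition.

```python
# Sketch of AST-based chunking for retrieval (hypothetical illustration,
# not Ragex's implementation): one structured record per top-level
# definition instead of flat, overlapping text windows.
import ast


def ast_chunks(source: str, filename: str = "<memory>"):
    """Yield one retrieval record per top-level function or class."""
    tree = ast.parse(source, filename=filename)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            yield {
                "kind": type(node).__name__,          # FunctionDef / ClassDef / ...
                "name": node.name,
                "signature": ast.unparse(node).splitlines()[0],
                "doc": ast.get_docstring(node),
                # Names this definition calls: cheap call-graph edges.
                "calls": sorted({
                    n.func.id
                    for n in ast.walk(node)
                    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
                }),
            }


chunks = list(ast_chunks(
    "def gcd(a, b):\n"
    "    'Euclid.'\n"
    "    while b:\n"
    "        a, b = b, a % b\n"
    "    return a\n"
))
```

Each record is small, already named, and carries its call-graph edges, which is exactly why such summaries pack into a context window better than the same code as plain text.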

My task was building a new language; constructing a new ecosystem from scratch was not part of the brief. So I analysed existing solutions—Rust, Go, Elm, Gleam—and chose the one I considered most mature (I never promised the project would be neutral with respect to my tastes and preferences). I simply copied the Elixir ecosystem and added to it what I had personally found lacking over the last ten years. Thanks to standing on the shoulders of giants in this regard, the model wrote almost the entire ecosystem for me; I simply told it: “Look how beautifully new project creation is handled in Elixir—do the same for Cure.” Language models are strong at translation, and Elixir is considerably more intelligible to them than almost any other language.

So, before the first line of code, my backpack already held: the right AST—a language in which the assistant and I can communicate far more easily than in homespun English—a handcrafted RAG, and a clear understanding that every step must be a simple, atomic change. The fewer forks in the road where the assistant must choose between two paths, the cleaner the result. This principle outweighs the quality of all prompts combined.

Next I had to solve the problem of validating the produced code. My eyes are sharp, but they occasionally miss non-obvious bad decisions in review. Thus was born oeditus_credo—a set of nearly forty additional credo checks covering vulnerabilities, anti-patterns, and the like. The library also ships a mix oeditus_assistant_rules command, a rules generator for the soulless assistant. To those rules I also added: after each stage, verify that mix format && mix credo --strict && mix dialyzer && mix test passes; update all documentation; add regression tests for new code; then run all regression tests and confirm they are still as green as my face the morning after a party.
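That “after each stage” checklist can be mechanised so neither I nor the assistant can skip a step. Here is a hypothetical gate runner, assuming only that the mix commands listed above are available on the PATH; it is an illustration, not part of oeditus_credo:

```python
# Hypothetical per-stage gate (illustration, not part of oeditus_credo):
# every command must exit 0, in order, or the stage is rejected.
import subprocess

GATE = [
    ["mix", "format"],
    ["mix", "credo", "--strict"],
    ["mix", "dialyzer"],
    ["mix", "test"],
]


def run_gate(commands=GATE) -> bool:
    """Return True only if every command in the gate exits 0; fail fast."""
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            print("gate failed at:", " ".join(cmd))
            return False
    return True
```

The same list can of course live in a one-line shell alias; the point is that the gate is ordered and fail-fast, so the cheap checks reject a stage before the expensive ones run.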

Every reasonably significant stage also ends with creating an “example” in the examples folder. Something for people to look at, and so the regression tests never sit idle.

At this point I felt ready to start writing actual code. Cure has certain critical parts I wrote by hand from scratch. Every other pull request gets manual edits from me. Every bug I find through manual testing I fix by hand (the obvious ones aside). And yet I reached the desired result considerably faster than if I had written every line in Vim.

Over the course of working on Cure I learned to keep the number of errors—and consequently manual edits—to a minimum. A fairly substantial release, v0.26.0, required not a single correction, for example. Here is the distillate of my rules for communicating with an assistant, in case it proves useful to anyone:

  • the task must be self-contained but not too large; “first define the types for numeric, then we will write the converters” does not work
  • inside the task there must be no unresolved ambiguities that will derail our T9; there must be exactly one path to the solution
  • before tackling any task, demand an implementation plan and edit it until all ambiguities have vanished
  • the task must be roughly solved in your own head before turning to the assistant; otherwise the odds of agreeing with an incorrect solution are uncomfortably high
  • generated code must be comprehensible and elegant; “rewrite this nicely, I fed you three gigabytes of rules” does not work; if the solution is aesthetically repellent, there is a problem with the task formulation—close the session and start over
  • setting a task and going for coffee is the direct path to the infinite iterations described in the previous point; the stream of unconsciousness must be watched and any attempt to stray from the planned path killed without mercy
  • finally, if it seems to you that the cognitive load is decreasing and that any cook could now implement such a project—you need pharmaceutical intervention; your fingers tire less, yes, but if you could not have implemented the project from scratch in a text editor, the LLM is no help to you; it will generate something suspiciously twitching, no argument there, but the first halfway-serious complexity requiring a considered architectural decision will doom the whole enterprise.

That is my experience. Yours may differ; I can live with that.

Try https://cure-lang.org and maybe you will like it. The site now has a playground where you can experiment with types in real time, and an almost real console where you can play with the REPL without installing anything locally.

Happy curing!