Revisiting 2023 Thoughts on AI
In 2023, I had a few thoughts about how LLMs would affect the world. I’m revisiting this today, and thinking about what’s next.
Retrospective
OpenAI first-mover advantage
I thought that companies like the NYT would start to lock down their APIs and make it harder for generative models to be trained on their data, which would give OpenAI a first-mover advantage.
The point I was trying to make is that OpenAI would have access to a lot more training data than companies training foundation models later.
This point doesn’t feel like it has fully played out yet.
Taking Software Development Jobs
Looking back about two years later, here’s what my prediction was:
The harder questions to answer are how to generate good structure for projects at scale and what choices to make when many engineers work on complex codebases. When I ask it ‘best practices’ questions, such as how to structure something or if I should choose a particular service for a use case, it seems to try to tell me what I want to hear or doesn’t “think too deep” about the domain.
This is directionally correct. Structure matters at scale, and modularizing the code in a good way can make it easier for agents to traverse it.
In larger-scale projects that are mostly vibe-coded, once the project gets too large and context is lost between LLM sessions (compacting doesn’t help enough), the model will start to re-implement existing code and get lost.
However, I’m confident this “best practices” or “good software practices” gap will be mitigated as tooling improves and maximum context increases.
Still, there are some places where I truly believe it is better to prompt it for more quality. Some examples:
- Configuration management: You can use `pydantic-settings` to strictly enforce configuration fields and fetch them from a combination of env vars or `.env` files. Without prompting, an LLM will just use `dotenv` and `dictionary.get("config-key")`, which works but does not provide as much extensibility.
- Toolchain: Without prompting, LLMs will just use `pip` and not take advantage of `uv` or do `ty` type checking, which would greatly improve code quality and is easy for agents to run.
- Patterns: I would love it if the LLM preferred test classes over just top-level test methods.
Note that all these issues contain some variation of “without prompting…”. This is where software development experience is crucial.
With LLMs it is so much faster to implement these patterns and refactor toward them. If I want to use a repository pattern for a data store, I can essentially just mention it and the LLM can fully implement a working solution.
At what point do none of the patterns matter, though, if the code is a black box that ‘just works’? That day will certainly come soon.
Lack of hype
I wrote that I was “over the hype.” I remained unhyped until agent usage matured and models got better.
At some point between 2023 and now, LLMs stopped being wrong so often, especially with code.
Now models write better code than I do. The output doesn’t always conform to the practices I prefer (aggressive Python type hinting, abstracting common behavior, treating some code paths as mappings from classes to other classes, among others), but models will adopt them eagerly when prompted.
And when they started running tools in my IDE to edit files or run AWS commands, it really took off. The ceiling has been blown off again, and the future will probably look more like the “Background Agent” that Ramp is building.
For the future
I don’t want to make many more predictions about technical capabilities. My life keeps me far from the coasts and the circles of people on the bleeding edge of this, so I’ll let simonw make those predictions instead.
Instead, I’ll focus on what I want to accomplish with LLMs. Some of this is already in progress:
Zero-to-prod applications with only prompting
I created a template that is a good foundation for building applications that can go from prompt to something hosted at a subdomain, as cheaply as possible.
Part of this strategy is encoding my ‘best practices’ in the AGENTS.md file, and putting domain knowledge in a SPEC.md.
The AGENTS.md describes how to interact with AWS to set up the resources required to get this working, and recommends datastores that keep it cheap, like SQLite mounted on gp3 storage.
And for these applications to send messages, some configuration is required (Twilio/SendGrid logins, etc.), but I have a worker that these ‘microservices’ can dispatch tasks to.
(As an aside: It looks like we moved from microservices to monoliths and are now moving back to microservices. I need to write a followup to my “project structure opinions” article)
This structure has worked well for a project in use by others already, and as I pare it down to this “generic application / specific application” distinction it’s getting faster and faster to deploy apps.
The goal is to eventually be able to just prompt something to a subdomain, and give the llm some basic tools and libraries to accomplish tasks that would otherwise require external service integrations.
IoT with AI
I recently installed a Home Assistant setup in my home. The classic use case for LLMs is a ‘Jarvis’-style interface where I can tell it to turn off the lights and it turns off the lights.
But more interesting to me is whether I can get it to generate code that reads from and interacts with the Home Assistant platform. For example, asking it to turn off the lights at sunset on every weekday, or to let me know when a package arrives by enabling an image-recognition task on the front porch camera that texts me when it sees a box.
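The generated-code side of this could start from Home Assistant’s REST API. A minimal sketch (the host, token, and entity ID are placeholders; this only builds the request rather than sending it):

```python
import json
import urllib.request

HASS_URL = "http://homeassistant.local:8123"    # placeholder address
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"          # placeholder token


def build_service_call(domain: str, service: str, entity_id: str) -> urllib.request.Request:
    """Build (but do not send) a Home Assistant service call, e.g. light/turn_off."""
    return urllib.request.Request(
        f"{HASS_URL}/api/services/{domain}/{service}",
        data=json.dumps({"entity_id": entity_id}).encode(),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Passing the request to urllib.request.urlopen() would actually send it.
req = build_service_call("light", "turn_off", "light.porch")
```

An LLM generating automations would emit calls shaped like this, plus whatever scheduling or camera-event logic surrounds them.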
I’ve had enough fun playing with pure software, and am ready to expand into the world of hardware that interacts with the world we live and breathe in.
Sandboxing
An experiment I am trying is allowing a tightly scoped part of the codebase to be a little fuzzy, allowing pydantic classes to be generated in a sandbox.
As enough data comes in from a stream, it could determine the structure of the data and codify a pydantic class, assuming the structure won’t change in the future.
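A rough sketch of that inference step, using pydantic’s `create_model` (the inference logic here is a naive assumption: the last observed type per key wins, every field becomes required, and nested structures aren’t handled):

```python
from pydantic import create_model


def infer_model(name: str, samples: list[dict]) -> type:
    """Codify a pydantic class from observed stream records (naive: one type per key)."""
    fields: dict = {}
    for sample in samples:
        for key, value in sample.items():
            fields[key] = (type(value), ...)  # every observed field becomes required
    return create_model(name, **fields)


# Hypothetical stream records:
Event = infer_model("Event", [{"id": 1, "message": "hello", "ok": True}])
event = Event(id=2, message="world", ok=False)
```

Once codified, the generated class validates future records the same way a hand-written pydantic model would.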
Another aspect is having some backend code which exposes a library to a frontend, so users can ask natural language questions and have the system generate code to be run in a sandbox to return results to the user.
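The execution half can be illustrated with a toy runner; to be clear, `exec` with trimmed builtins is illustration only, not a security boundary, which is exactly why real isolation (a separate process, container, or hosted sandbox) matters:

```python
def run_generated_code(code: str, inputs: dict) -> object:
    """Run generated code against some inputs and return its `result` binding.

    NOT a security boundary; production use needs process/container isolation.
    """
    namespace: dict = {
        "inputs": inputs,
        # Expose only a small allowlist of builtins to the generated code.
        "__builtins__": {"len": len, "sum": sum, "min": min, "max": max, "sorted": sorted},
    }
    exec(code, namespace)
    return namespace.get("result")


# e.g. code an LLM generated for the question "what is the total of these amounts?"
total = run_generated_code("result = sum(inputs['amounts'])", {"amounts": [3, 4, 5]})
```

The natural-language question comes in from the frontend, the backend asks the model for code against the exposed library, and only the `result` value travels back to the user.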
Cloudflare already offers a sandbox product, and I won’t be surprised when this becomes the next buzzword thrown around on the business side of tech.
Advantages of Fast LLMs
I’ve been experimenting with models optimized for speed, like Mercury. They aren’t very robust, but I believe we will get faster and faster models, much as we moved past slow dial-up modems.
Working on Ambitious Projects
There are some repos I’ve been iterating on that are a lot of work and technically complex. With LLMs, I hope I can put more cycles into these projects.
Conclusion
Not much else to say. I’m excited about this, and if my usage of AI and increased productivity are any indication, I’ll continue to work with code, whether writing it or prompting for it, for years to come.