Szymon Kaliski

Independent Consulting, and Interfacing with LLMs

Hi,

the biggest news of this quarter is that I'm an independent consultant now!

I left Replit ↗ in October, after 2.5 years. While there, I oscillated between exploring and prototyping, and designing and developing production features. It was definitely a valuable experience to be part of rapid, two-orders-of-magnitude ARR growth. Some of my notable projects are documented on Replit's blog: 1 ↗, 2 ↗, 3 ↗, 4 ↗, and the Agent itself.

The company is now focused on creating production applications by prompting, and I've been itching for a while to explore some of the other Open Questions Around LLM Interfaces. Going independent seemed like the best way to do so, and I'm excited to share that I'm currently working with Google Creative Lab ↗, where I'm exploring some of these ideas!

Reach out if you're interested in working together: hi@szymonkaliski.com


In the meantime, I've been making small vignettes around interacting with language models. I believe that chatting is not going to work for everything, and that there's still a lot to figure out in how to do things with these new fuzzy functions.

Below is a small sampling of these explorations, some of them done on my own time, some at Google.

Codegen Environments

One of the directions I'm excited about is how a little codegen can fit within a Programming System — instead of generating whole applications, can we, by the nature of the environment, get compartmentalization, interoperability, and other interesting properties automatically?

Below is a canvas-based environment where each node is generated by the model from a short piece of text, and the way you use it is a mix of natural canvas interactions: duplicating, resizing, moving around... — with the added communication channel of being able to react to other nodes:
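A minimal sketch of how such a node graph could be wired up (all names here are hypothetical, and the model call is replaced by a stub): each node keeps the short text it was generated from, plus references to the nodes it reacts to, and evaluating a node just resolves those references first.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    prompt: str                                  # short text the node was generated from
    inputs: list = field(default_factory=list)   # ids of nodes this one reacts to

def evaluate(graph, node_id):
    # Resolve the nodes this one reacts to first, then run this node
    # with their results. A real node would execute model-generated code;
    # this stub only demonstrates the data flow between nodes.
    node = graph[node_id]
    upstream = [evaluate(graph, i) for i in node.inputs]
    return {"prompt": node.prompt, "saw": upstream}

graph = {
    "slider": Node("a slider from 0 to 10"),
    "chart": Node("a chart of the slider's value", inputs=["slider"]),
}
```

The nice property is that interoperability comes from the environment, not the generated code: a node only needs to know which other nodes it reads from.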

A friend ↗ made a good point ↗ that instead of asking "to the left of" we could have draggable whiskers coming out of the nodes, and that evolved into a full-on operating system of sorts, where every tool is generated on the fly, and I can combine them together by roughly describing what I want:

I also looked a bit into the meta direction of making the process of making visible — which was inspired by earlier experiments with Liunon:

Here, instead of changing the prompt inline, you create modifications on top of the things you already made. I quite like this linear history of progress, but this clearly starts lacking when you want to explore multiple possible directions (which is where something like Spellburst ↗ comes in).

Finally, I poked around combining some of these ideas with Programmable Ink, but I didn't get very far:

The main issues were that bounding-box detection for matching similar hand-drawn items remained finicky, and that I hit a wall with my vibe-coding skills: I would have had to get into the code to debug some of the performance issues, which really didn't seem like a fun way to spend a couple of evenings of my time.

Improving Chat

I also played around with expanding the chat interface itself.

Often when I try to learn with the help of an LLM, I want to start follow-up questions from specific passages in the reply:

While this is a complete prototype, I don't really use it — I think partly because I'd have to go to some directory and run npm start, and partly because often I'm already a couple of questions deep in an LLM session, and it's very hard to move that over somewhere else. I'm not sure what to do about this! I could host this tool somewhere, and that would solve the first problem, but the second one is a larger issue — I'd either have to move all my conversations there, or know up-front when something will turn into more of an exploratory session.

If only we had Malleable Software ↗ so I could jam this into some existing tool that I already use...

On the topic of learning with LLMs, another idea that I explored was prompting the model to generate explanations together with interactive demos, with an extra twist — these demos have to store their state in a way that's inspectable, and attached as a part of the conversation. What that allows for is exploring some behavior in an interactive way, and then asking follow-up questions about the specific state you're seeing:
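As a rough sketch of that mechanism (the message shape and function name are my own, not any provider's API): the demo serializes its current state to JSON, and the follow-up question gets sent with that snapshot inlined, so the model answers about exactly what's on screen.

```python
import json

def attach_demo_state(messages, question, demo_state):
    # Inline the demo's inspectable state next to the user's follow-up,
    # so the model sees the same state the user is looking at.
    snapshot = json.dumps(demo_state, indent=2, sort_keys=True)
    content = f"{question}\n\nCurrent demo state:\n```json\n{snapshot}\n```"
    return messages + [{"role": "user", "content": content}]
```

The point is that the demo's run state travels with the conversation, instead of staying invisible inside an artifact.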

I know you could always take a screenshot and paste it back into the chat, but I do wonder why none of the AI providers do this automatically. We can create an artifact, a canvas, or what-have-you, but its run state remains invisible to the model.

Little Tools

A completely different direction that I'm interested in is exploring small situated "fuzzy" tools:

name-pdfs ↗ renames PDF files by their content. I use it a couple of times a week after downloading a bunch of articles.
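I won't vouch for how name-pdfs actually does it, but the general shape of such a tool is roughly: extract some text from the PDF, ask the model for a title, slugify it into a filename, and rename. A sketch with the model call stubbed out and the PDF text-extraction elided:

```python
import re
from pathlib import Path

def suggest_title(text):
    # Stand-in for the LLM call that would summarize `text` into a title;
    # here we just take the first line.
    return text.splitlines()[0] if text else "untitled"

def to_filename(title):
    # Lowercase, collapse non-alphanumeric runs into dashes.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return (slug or "untitled") + ".pdf"

def rename_pdf(path, extracted_text):
    target = Path(path).with_name(to_filename(suggest_title(extracted_text)))
    Path(path).rename(target)
    return target
```

The fuzzy part (title extraction) stays in the model; the deterministic part (slugifying, renaming) stays in plain code.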

ai-grep ↗ finds things "fuzzily" matching the search term. It nicely complements something like rg (even the output format is the same), and can be useful when I forget the exact wording I used. I use it a couple of times a month, most often when searching through my wiki or an unfamiliar codebase.
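I haven't read ai-grep's internals, but a minimal version of the idea (with a naive substring stub standing in for the model) could ask the model which lines match the query in meaning, then print the hits back in rg's path:line:text format:

```python
def fuzzy_match(lines, query):
    # Stand-in for the LLM call: return the 1-based numbers of lines
    # that match `query` in meaning. Here: plain substring matching.
    return [i for i, line in enumerate(lines, 1) if query in line.lower()]

def grep_format(path, lines, matches):
    # Mirror ripgrep's default `path:line:text` output.
    return [f"{path}:{n}:{lines[n - 1]}" for n in matches]

lines = ["def launch():", "    # fire the rocket", "    ignite()"]
hits = fuzzy_match(lines, "rocket")
print("\n".join(grep_format("launch.py", lines, hits)))
```

Keeping the output format identical to rg means the tool slots into existing pipes and editor integrations for free.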

ai-translate ↗ was an experiment in bi-directional text-to-text translation. It sounded interesting on paper, but I don't use it at all. Still, the idea of translating between code and pseudo-code has stuck with me:
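The actual tool surely works differently, but the bi-directional framing can be sketched with a crude direction heuristic standing in for anything smarter (all names here are mine):

```python
def looks_like_code(text):
    # Crude heuristic: code tends to contain these tokens; prose doesn't.
    markers = ("def ", "return", "{", "=>", "();")
    return any(m in text for m in markers)

def translation_prompt(text):
    # Build the prompt for whichever direction the input calls for.
    if looks_like_code(text):
        return f"Rewrite this code as plain-language pseudo-code:\n\n{text}"
    return f"Rewrite this pseudo-code as working code:\n\n{text}"
```

The interesting property is that one tool covers both directions: the same round-trip that explains code can also regenerate it from the edited explanation.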


I'm hopeful about the possibilities LLMs open for End-User Programming and creative tools — it's been exciting to revisit the ideas I wrote about seven years ago in this Ink&Switch essay ↗. One of the hard parts is now (at least partly) solved, but a lot of the other ones ↗ still remain!

Worth Checking Out

What I've been reading lately:

On the web: