
AI In 2023

What is Vintage May ’23 AI good for?

I believe that the recent innovations in language models are probably the biggest technological shift of the last 20 years, and will inevitably transform many parts of daily life over the coming years. This is the big one. Here are what I think are the most exciting parts:

  1. ChatGPT is a uniquely intuitive & versatile interface
    1. You used to have to make users communicate in a more structured way. Now your website can accept arbitrary plain text or speech! (This is good and bad, but IMO mostly good.)
    2. Note that in this way, AI feels very much part of a continuum — from hand-punched cards in the 1940s, through hexadecimal machine code and early programming languages, to Web 2.0-style UIs.
    3. Compare this to skeuomorphism. Using icons that looked like the physical objects people associated with each task made it intuitive for the masses to adopt computing technology. Everyone can communicate in English, so it’s intuitive — and if they speak a language other than English, LLMs can work in those languages too, and can translate very well between them. Most technically savvy people substantially underestimate how much people struggle to use modern desktop user interfaces.
    4. A general rule of good API / software design — be forgiving about the formats you accept, but be extremely consistent about the format you output. LLMs are extremely good at accommodating different incoming formats, and there are decent solutions to regulate the format you output (see the sketch after this list).
    5. The next 3–6 months of software development will look like a ton of ChatGPT-clone UIs (or just text bars added to existing UIs). This seems like a natural extension of the “Command-K” command palette — but much more flexible. Going forward, though, most LLMs will be operating under the hood most of the time, doing tasks that you aren’t explicitly prompting them to do. If you work with an EA (executive assistant), you don’t prompt them directly every time you need something done with an email or calendar event — instead, you set policies and workflows with them, and they do the implementation. Copy/paste into ChatGPT is too high friction.
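
As a concrete illustration of the “forgiving in, consistent out” rule above, here is a minimal sketch. It accepts free-form user text but forces the model’s output through a strict JSON validator, retrying on failure. The `ask_llm` helper and the retry policy are my own stand-ins, not any particular library’s API.

```python
import json

def ask_llm(prompt: str) -> str:
    """Stand-in for whatever LLM call you use (hypothetical)."""
    raise NotImplementedError

def extract_contact(free_text: str, max_retries: int = 3) -> dict:
    # Liberal input: the user can paste anything (an email signature,
    # a sentence, a vCard). Strict output: we insist on one JSON shape.
    prompt = (
        "Extract the person's name and email from the text below.\n"
        'Respond with ONLY a JSON object: {"name": "...", "email": "..."}\n\n'
        f"Text: {free_text}"
    )
    for _ in range(max_retries):
        raw = ask_llm(prompt)
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict) and set(parsed) == {"name", "email"}:
                return parsed  # the one format we ever emit
        except json.JSONDecodeError:
            pass
        # Feed the failure back in, so the next attempt can self-correct.
        prompt += f"\n\nYour last reply was not valid JSON: {raw!r}. Try again."
    raise ValueError("LLM never produced valid JSON")
```

The key property is the asymmetry: the input side stays maximally forgiving, while the output side is validated until it conforms.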
  2. LLMs can reformat many previously “low N” tasks to be “high N” tasks instead.
    1. A quick refresher on the “low N” vs “high N” terminology: it’s rarely worth automating a task you’ll only do once. But if you’re going to do it many times, and it takes a while, then it’s often worth automating. “Low N” tasks are ones you won’t repeat often (one-offs) and “high N” tasks are ones you’ll repeat many times (think: factory assembly lines). The upfront investment required means that you’ll usually only automate the high N stuff.
    2. Many tasks that we do are fairly similar to each other, but just different enough that, before LLMs, they weren’t similar enough to automate — e.g. summarizing action items from a transcript of a meeting. With LLMs, that stuff is easy now (see the sketch after this list).
    3. Note that LLMs still don’t make “low N” tasks worth automating — there isn’t a time difference between writing an LLM-based script and a standard automation script, and writing either takes a similar amount of work (if anything, LLM-based tools generally need a bit more hand-holding & verification along the way).
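
To make that concrete: before LLMs, every meeting-transcript format (Zoom, Otter, hand-typed notes) would have needed its own parser, each one a low-N one-off. With an LLM, one short script covers the whole family of tasks. A sketch, reusing the hypothetical `ask_llm` stand-in from above:

```python
def ask_llm(prompt: str) -> str: ...  # same hypothetical stand-in as above

def action_items(transcript: str) -> list[str]:
    # One script now handles every transcript format; the LLM absorbs
    # the variation that used to make each of these a one-off task.
    prompt = (
        "List the action items from this meeting transcript, one per line, "
        "each starting with the responsible person's name:\n\n" + transcript
    )
    return [line for line in ask_llm(prompt).splitlines() if line.strip()]
```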
  3. Most companies believe they can achieve positive ROI by hiring people with IQs below 100. How do they achieve that?
    1. Implement process — flowcharts, checklists, approval processes, org charts. These intuitively map onto things you can do with LLMs (see langchain and prompt-engineering techniques; a sketch follows this list). Different agents should have different responsibilities, prompting, authority, and data access — much like people in real org charts.
    2. Implement the same types of processes for the humans working with the LLMs, and use LLMs to help enforce and implement these processes!
    3. Require citations — much like a high school paper, your answers get better when you’re required to cite your sources.
    4. Write everything down — your LLM should have an explicitly worded version of everything you want it to remember. One wrinkle on this — maybe you will literally want to keep notes about the personalities and work styles of the people the LLM will engage with — a very detailed CRM.
    5. Many very smart people are allergic to process, because it takes time. But LLMs work 100x faster than a human. So process is a great thing to give an LLM.
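
Here is a rough sketch of what “process for LLMs” might look like in code: distinct roles, scoped data access, and a citation requirement baked into the prompt. All the names here are illustrative, not any real framework’s API.

```python
from dataclasses import dataclass

def ask_llm(prompt: str) -> str: ...  # hypothetical stand-in, as above

@dataclass
class Agent:
    role: str                    # a box on the org chart
    system_prompt: str           # the written-down policy for this role
    allowed_sources: list[str]   # scoped data access

    def answer(self, question: str, documents: dict[str, str]) -> str:
        # Only hand the agent the sources its role is allowed to see.
        visible = {k: v for k, v in documents.items()
                   if k in self.allowed_sources}
        context = "\n\n".join(f"[{k}]\n{v}" for k, v in visible.items())
        prompt = (
            f"{self.system_prompt}\n"
            "Cite the source tag (e.g. [handbook]) for every claim.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        return ask_llm(prompt)

triage = Agent(
    role="support triage",
    system_prompt="You route customer tickets. Follow the checklist exactly.",
    allowed_sources=["handbook", "faq"],  # no payroll docs for this agent
)
```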
  4. Now is a very good time to get API access set up between your various tools. If you rely on something that isn’t AI-accessible, now may be a good time to think about whether you want things to stay that way.
  5. LLMs make it much easier to build more LLM-based tools
    1. LLMs can write decent code snippets to help write their own code. These initial snippets are quite often wrong, but even when they are wrong, the error is usually informative. I generally find ChatGPT and GitHub Copilot to be a moderately faster way to solve coding problems I’m unfamiliar with, compared to using Google.
    2. They can also write decent example data or prompts.
  6. LLMs have limitations that are not intuitive to everybody
    1. Prompt injection and related attacks are serious issues for LLM development — and probably worth being thoughtful about.
    2. LLMs, in my hands, seem to be much less creative than the average person I know. At present, LLMs give me the response I’d expect of a well-read, normie, not super-smart person.
    3. As a corollary, ChatGPT will share common biases and mistakes that most people have.
    4. LLM tokenization can cause unexpected problems — for example, LLMs struggle to work with DNA sequence information (see the sketch after this list).
    5. For now, processing images / charts etc. is limited. Multimodal models currently in training may fix this.
    6. LLMs are basically clones. They have the same blind spots, and struggle to effectively grade the results that they produce.
    7. These limitations will significantly shape how LLMs get successfully deployed.
    8. Standard website UX makes it very clear to users what things the website can do, and critically, what things it cannot do. ChatGPT-style interfaces struggle to implicitly communicate the same thing to their users. Maybe a “skeuomorphic” approach could work here? A cartoon of the LLM in its “uniform”, so you know what questions to ask it?
    9. For LLMs, P ≠ NP. Don’t use LLMs in places where verifying the end result would take longer than just doing it yourself — unless you’re betting on a plan in which the LLM will continue to improve.
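
The tokenization point is easy to see for yourself. The snippet below uses OpenAI’s tiktoken library; the exact token boundaries depend on which encoding you pick.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the GPT-3.5/GPT-4 encoding

dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
tokens = enc.encode(dna)
print([enc.decode([t]) for t in tokens])
# The sequence is chopped into irregular multi-character chunks rather
# than single bases, so base-level operations (counting, reverse
# complement) are much harder for the model than they look.
```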
  7. The bottleneck to implementing LLMs right now is that most people haven’t spent much time thinking about which parts of their day could be automated. Some places where AI can be good:
    1. We will manage LLMs, and LLMs will manage us. Set up your LLM so that it tells you when you need to complete a task on its behalf. Use your LLM as an executive assistant — give it clear and explicit tasks.
    2. Retrieval-integrated LLMs (see PaperQA, among others; a sketch follows this list).
    3. Put LLMs into places where a response time of seconds or minutes, 24/7, is highly desirable. Large research tasks that could once have been done by a high schooler or undergrad with access to Google can now be done well by LLMs.
    4. Find very specific tasks & workflows that you can automate. Use as general an approach as you can, but expect to gradually build your way up to higher levels of abstraction.
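
For the retrieval-integrated pattern, the core loop is short enough to sketch: embed your documents once, retrieve the nearest ones to each question, and stuff them into a prompt that demands citations. The `embed` and `ask_llm` functions are stand-ins for whatever provider you use.

```python
import numpy as np

def embed(text: str) -> np.ndarray: ...   # hypothetical embedding call
def ask_llm(prompt: str) -> str: ...      # hypothetical completion call

def answer_with_retrieval(question: str, docs: list[str], k: int = 3) -> str:
    # Retrieve the k documents most similar to the question (cosine similarity).
    doc_vecs = np.stack([embed(d) for d in docs])
    q = embed(question)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    top = [docs[i] for i in np.argsort(sims)[::-1][:k]]
    # Require citations, per the process points above.
    context = "\n\n".join(f"[{i}] {d}" for i, d in enumerate(top))
    prompt = (
        "Answer using ONLY the sources below, citing [i] for each claim. "
        "If they don't contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```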
  8. The community is currently too hung up on:
    1. specific terminology — newcomers to the field make sloppy use of “foundation models”, “agent-based architecture”, etc. Let words take on the meanings that people find for them, and coin alternative words that reproduce the original intended precise meaning. The old definitions will get outnumbered in AI’s eternal September.
    2. General purpose AI — a tiny number of teams (<10) have a realistic path to meaningful contributions on broadly useful AI. If you aren’t working on general purpose AI, the majority of the EAG-flavored AI safety conversation isn’t relevant to your work.
    3. architecture trends — there is substantial capability overhang. Pick a problem that matches the criteria above, and go solve it. Your ability to pick a good problem is much, much more important than your ability to pick the right vector db. Once you have first-hand contact with the problem, upgrade your approach as needed.