Building AI products


This is an ‘unfair’ test. It’s a good example of a ‘bad’ way to use an LLM. These are not databases: they do not retrieve precise factual answers to questions, because they are probabilistic systems, not deterministic ones. LLMs today cannot give me a completely and precisely accurate answer to this question. The answer might be right, but you can’t guarantee that.
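To make that ‘probabilistic, not deterministic’ point concrete, here is a toy Python sketch. The probabilities are invented for illustration and this is not how any real model is queried, but it shows the basic mechanic: the model assigns probabilities to candidate next tokens and samples one, so the same prompt can produce different outputs on different runs.

```python
import random

# Toy illustration (made-up numbers, not a real model): an LLM scores
# candidate next tokens with probabilities and *samples* from them.
next_token_probs = {"Paris": 0.90, "Lyon": 0.06, "Marseille": 0.04}

tokens, weights = zip(*next_token_probs.items())
for _ in range(3):
    # Each run draws from the distribution; the likely answer usually
    # comes out, but nothing guarantees it every time.
    print(random.choices(tokens, weights=weights, k=1)[0])
```

That sampling step is why ‘the answer might be right, but you can’t guarantee that’: likelihood is built in, certainty is not.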

There is something of a trend for people (often drawing parallels with crypto and NFTs) to presume that this means these things are useless. That is a misunderstanding. Rather, a useful way to think about generative AI models is that they are extremely good at telling you what a good answer to a question like that would probably look like. There are some use-cases where ‘looks like a good answer’ is exactly what you want, and there are some where ‘roughly right’ is ‘precisely wrong’.

Indeed, pushing this a little further, one could suggest that exactly the same prompt and exactly the same output could be a good or bad result depending on why you wanted it.

Be that as it may, in this case, I do need a precise answer, and ChatGPT cannot, in principle, be relied on to give me one, and instead it gave me a wrong answer. I asked it for something it can’t do, so this is an unfair test, but it’s a relevant test. The answer is still wrong.

There are two ways to try to solve this. One is to treat it as a science problem – this is early, and the models will get better. You could say ‘RAG’ and ‘multi-agentic’ a lot. The models certainly will get better, but how much better? You could spend weeks of your life watching YouTube videos of machine learning scientists arguing about this, and learn
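For what ‘RAG’ (retrieval-augmented generation) actually refers to, here is a minimal sketch of the pattern. The helpers are hypothetical: `vector_store`, `llm`, and the `search` method stand in for whatever embedding index and model you actually use; only the shape of the idea is the point.

```python
# A minimal sketch of the RAG pattern: retrieve relevant documents first,
# then hand them to the model so the answer is grounded in real text
# rather than in the model's guess at what a good answer looks like.
# vector_store and llm are hypothetical stand-ins, not a specific library.

def answer_with_rag(question: str, vector_store, llm, k: int = 3) -> str:
    docs = vector_store.search(question, top_k=k)   # retrieval step
    context = "\n\n".join(d.text for d in docs)     # assemble grounding text
    prompt = (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                              # generation step
```

Note that this narrows the problem rather than solving it: the model is still sampling an answer, just from better raw material.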

Read the rest of the article here.