
I asked it to come up with a list of all the Tube stations in Zone 1 the other day. It repeatedly gave me some outside of Zone 1, whilst also missing some out.

When I corrected it, it updated the list each time with my correction and then made a new mistake.

After about 20 minutes I gave up and went outside for a bike ride in the sun - no AI there. 🤭
ChatGPT and all other language-model-based AI systems are basically guessing machines with no built-in 'intelligence'. They do not seem to be designed to tell you how much of a guess they made. Worst-case scenario, they simply make stuff up - e.g. fictitious legal cases they cite as precedent.

AI is almost useless if you can't verify the output by other means. But, used properly, it can be helpful. Clearly quizzes are beyond it at the moment.

I don't think the answer to "Which London pub claims Shakespeare as a patron?" (The Bunch of Grapes) is necessarily wrong. It didn't say he was a patron, merely that this London pub claims he was a patron - which may be true.
It isn't true; the pub doesn't claim this, as can easily be checked.
Perhaps the problem lies in an ambiguity of the English language. In German we refer to "Künstliche Intelligenz". Here "künstlich" clearly means not naturally occurring. "Artificial", on the other hand, can mean the same, or refer to being insincere. So perhaps the joke's on us, and all this time people have been devising software that mocks us while making us think we are clever. A sort of Super Marvin. I prefer to spend my brain power on more important things. Like where on earth I left my glasses.
I don't think using AI to generate quizzes, or for that matter any creative output, will necessarily be the best use of one's time - if you want one, it's always best to make it yourself, since GPT isn't always factually correct and is often clichéd if asked for a longer piece.

I mainly use ChatGPT as a helper of sorts - just have it look at something I wrote, and sometimes it will pick out spelling errors or give advice on something. That doesn't mean I have to act on it, and I certainly never use AI to write something that I will go on to publish - it's not worth my time or the reader's time.

About the search engine thing, yeah I don't get it either. The issue is that Google itself is getting worse every day, and even now it's using AI for its responses. I remember a few months back reading an article where Google AI gave the wrong length of Waterloo Bridge.

As with all tools, ChatGPT has certain uses where it can be good. And besides, it's also not the only AI out there, merely the most popular.
But isn't that the point: disinformation at the push of a button. What will Winston Smith do now?

There has been an explosion in AI creepypasta complete with AI narration, but I just view it as the modern equivalent of pulp fiction: some of it so-so, some of it quite enjoyable, a bit like classic Twilight Zone or Outer Limits.
The cynic in me says that “AI” still very much lacks the “I”. They feed it lots of information without necessarily indicating if the information is factually correct. So it ends up obeying rule one of computers: Garbage in, garbage out.

I’ve tried to use it in the past to write little blurbs, but ended up doing it myself as the output was just utter rubbish.
Generative AI summaries are convincingly phrased, which was a breakthrough in natural language processing - but that often comes at the expense of facts. If the underlying data is ambiguous or simply not there, then the language generation continues fact-free, in order to keep the illusion of plausibility going. And there seems to be a level of built-in obsequiousness, meaning that it takes a cue from the question and tries to reinforce what it believes the questioner expects - plus all those hapless apologies when you push back.
If AI is so useless, why are big companies investing so much in developing it? Something that needs huge data processing centres and sacks working people can't be good. It must be because the data centres are cheaper than the people.
My son, who is a software developer working for an AI company, tells me that products advertised as using AI often don't, or only use it very little. Companies often just use AI as a marketing tool. It makes the product seem more impressive to prospective purchasers to be told it uses AI.
Re: why so much is being invested in AI, an alternate explanation is that superficial improvements in natural-sounding language generation have convinced a large number of investors that AI is ready for mass usage, and therefore companies are falling over themselves to get some of that investment, whether or not there's anything really useful underpinning it.

I was at a business conference last month, speaking on a panel on AI as a sceptic for its current usefulness. I realised afterwards that I could have set the context better by opening with 'so, did everyone finish up their blockchain implementations?'
The other issue with AI is that the more it is used incorrectly the more it will refer to wrong answers. For instance I reckon your blog will now come up as a source of more misinformation about potential London quiz questions because of this post!
Most software development makes money by convincing potential investors that something profitable has a good chance of happening. Actual profitability doesn't matter, so long as the hope is there.
I speculate many companies have to show they are using AI because they don't want their share price to fall. It's no different from listed companies telling investment analysts they had a strategy for the Internet, Social Media, Web 2.0, Big Data, Outsourcing, Cloud, Crypto Currency & Blockchains.
I create monthly quizzes for my team at work, and about 3 months ago I tried to use ChatGPT to generate a multiple-choice quiz of “fun animal facts”. I quickly learned, after checking its answers, that the concept of a multiple-choice quiz is not one of its abilities - for almost all the questions, two or more of the answers were correct, despite it only indicating one correct answer each time.

I was a huge AI sceptic. Of late I’ve been experimenting, and there are some things now where it’s got proven value for me. And plenty more where it hasn’t! It’s not a panacea, and half the fun is trying it for things and deciding, nope, it’s garbage for this, or hm, maybe I can use it to my advantage here.

And quizzes, for me, are a big nope. (For other factual things I always ask it in the prompt to quote its source so I can then double-check.)
I asked the deep research mode to have a go at making a quiz that would challenge Diamond Geezer - I think it did OK, but it probably wouldn’t be that hard for him!

chatgpt.com/share/685fce19-dcf0-8009-9de1-126cd2351c3a
For those of you without a ChatGPT login, Ed's quiz is here.

I got 8 out of 10 so I'm quite pleased.

That said, Q1 is the age-old cliche about tube station names and mackerel, Q8 has the answer in the question and Q9 is really unfair - that's not how quiz questions work!
The current tools being touted as "AI" are nothing of the sort: they're large language models. What does this mean? They "read" a very large corpus of text from the internet and build a very fancy database about the relationships between all the words they've encountered.

At no point does the software "understand" the words: it just forms associations between them, e.g. it might associate the words "pub" and "alcohol". This is why ChatGPT et al produce rubbish: they're not trying to produce facts, but to regurgitate relationships between words that fit the question.
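As a purely illustrative toy sketch (nothing like the scale or architecture of a real model, and not anyone's actual code), those word "associations" can be pictured as co-occurrence counts that are then used to pick a likely next word:

```python
from collections import defaultdict

# Tiny stand-in for "a very large corpus of text from the internet"
corpus = "the pub serves alcohol . the pub serves food . the dog wants a walk".split()

# Count how often each word follows another - a crude "association" table
following = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_associated(word):
    """Return the word most often seen after `word` - an association, not an understanding."""
    candidates = following.get(word)
    return max(candidates, key=candidates.get) if candidates else None

print(most_associated("pub"))  # -> 'serves', purely because of co-occurrence counts
```

Real models learn far richer statistical relationships than this, but the underlying point stands: the table records which words tend to go together, not what any of them mean.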

These tools do have some uses, but they're not going to be putting people out of work. The current hype is just that: Hype.

The kicker in all this is all the energy (and hence CO2) required to train and run these bloated lie-making machines.
ChatGPT was absolutely correct with one of its suggested question 10s: Julius Caesar was never Roman Emperor.
Though Constantius Chlorus' reconquest of Britannia during the Carausian Revolt could be argued to be an invasion, I suppose. Relieved I didn't use the word "surely" in my previous comment.
I use AI extensively, though not for fact-finding—as you’ve discovered, it’s not particularly reliable for that.

Where it does excel is in tasks like writing basic code, rephrasing text for clarity and consistency, and checking whether all parts of a question have been addressed - e.g. whether your job application has covered all the points asked for. That said, it often requires very precise instructions, which can be frustrating.
Does a large language model "understand" anything? Granted, it forms associations, like "pub" goes with "alcohol". But so does a pet: the word "walk" with the event. Does the dog understand?
There seems to be some confusion in various comments. AI is not the same as large language models (LLMs). LLMs are simply one facet of AI, and not necessarily the one that businesses plan on using in future.

A possible future use for LLMs is for businesses to make their company information more accessible - something they currently do badly with chatbots. The issue of inaccuracy would largely go away, since the data would be restricted to verified company data.

AI can be extremely good at optimisation problems such as scheduling. A fairly recent AI breakthrough was a slightly more efficient method of multiplying matrices that humans had not spotted. 'Slightly more efficient' does not sound like much of a deal, but when applied to the huge matrices now in use it brings a significant benefit.

Let's not forget AI has identified the structure of 2.5 million proteins and accurately recreated all previously known protein structures. Previously, identifying the structure of each protein would typically be worthy of a PhD thesis. This led to two AI programmers getting the Nobel Prize for chemistry despite not being chemists.

In maths alone, AI will have a huge impact. But maths has the advantage of the existence of 'formal proofs', which are verifiable. So AI can generate the code for a formal proof, and if it runs in a formal proof checker without error, it is correct.
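By way of illustration only (a trivial, human-written example, not anything AI-generated), a machine-checkable proof in Lean looks like this - the proof checker either accepts it or rejects it, with no room for plausible-sounding waffle:

```lean
-- A trivial formal proof: addition of natural numbers is commutative.
-- If Lean compiles this without error, the statement is proved.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```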

I am told that AI is brilliant for (human) language learning though I do not know how true it is. Certainly, grammar checkers are pretty good these days.
I use AI a lot for my work as a programmer. It’s valuable but not infallible - it often makes mistakes, and a lot of my work now consists of chasing down mistakes in the code it has written (much faster than I would write it). This is obviously quite skilful in itself - in some ways it’s harder to debug someone else’s code than to write code yourself - and that is something that makes it challenging for inexperienced programmers. You can very quickly find yourself in at the deep end without the skills to handle it. Arguably it makes it harder to learn those skills too.

I also use it to learn things - but for this I generally try to verify key aspects with secondary sources.

Finally I use it for creative tasks - from brainstorming and proof-reading to creating recipes. In these cases precision isn’t important - I’m not going to cook something that will poison me, but having a quick recipe that uses up a few random ingredients I have is useful when tired!

It’s not great for truly creative tasks - but for derivative work it can be useful.

The developing field of tools that LLMs can use (via MCP) is interesting, and I think this points to where a lot of the value can come from. Being able to “talk” to your computer and have it pull data from multiple apps, websites etc. and combine them together is really powerful. But also really dangerous and open to exploitation.

I don’t think the argument of whether an LLM “understands” is really meaningful or at least important: these are useful tools that can save a lot of time. On a good day it can easily double my productivity. Whether that’s putting someone out of work or making my work more valuable is unclear. Certainly as a company we can’t afford to hire anyone else, so these tools mean we can do more with the limited resources we do have.
Testing house. I noted you didn’t specify which model you were using so I assumed the default. Using the o3 model I got:

Questions
1. Which hidden river meets Thames near Blackfriars?
2. London’s smallest public statue depicts which animals?
3. Where can you find Roman amphitheatre remains?
4. Which Tube line stays solely Zone One?
5. St Paul’s towers feature two golden what?
6. Which bridge began as Brunel railway?
7. Which square houses Sir John Soane’s museum?
8. First Tube escalator installed at which station?
9. Wembley’s arch replaced which twin-towered feature?
10. The Monument’s urn commemorates which disastrous event?

Which seem slightly better or at least more specific. The Wembley one is a bit rubbish. :)

It does give me the answers if you want to see them.
Ed's quiz Q7 has the old chestnut about what is the only Underground line to cross all the others. The usual answer is the Jubilee, but I'm sure the Northern line does as well...?

dg writes: since 2007
I'm proud to have never wasted energy on these glorified autocompletes owned and run by thieving far-right sociopaths.

I'm also amused/confused by the idea of it stealing my job. About 5-10 years ago a regular refrain from management in my industry was that there was a flood of junior devs coming out of tech bootcamps who were able to copy-paste bits off stackoverflow and get something which vaguely resembled the desired result, but poorly, and without really understanding what they'd written, and that this was a problem: there was a great excess of those types of candidate and a great shortage of more senior ones.

Any programmer with more than 6 months' experience knows that reading code is harder than writing it (even reading your own), and that peer-reviewing and correcting the shoddy work of a junior takes longer than just writing it yourself. Now, apparently, an infinite supply of poorly semi-copy-pasted, cobbled-together-with-no-understanding code is a boon which is going to boost productivity, and we have people all the way up to and including the PM saying we should throw our economy behind this stuff.

Do I (a) spend 10 minutes and very little energy writing some simple code, or (b) spend 30 minutes fighting ChatGPT, and using the energy equivalent of Belgium, to get the same result? Apparently (b). Mental.
Claude.ai is much better than ChatGPT.
Try this: [login required]
Technically three of them are not questions, or am I being pedantic?
I tend not to use the AI for things I can easily figure out myself. I swear the simpler it is, the more likely it is to cock it up. I did have a use the other day, when someone decried how samey cars (particularly compact SUVs) look today. I felt that the most popular segments have always looked the same, but I wasn't willing to spend time to prove it. I asked Copilot to give me a chart of six compact hatchbacks from 1985, side-on, labelled. It duly produced a side-on view of compact hatchbacks from 1985(ish). Real ones. (And yes, they look the same.) But the labelling? It got Volkswagen right, and then proceeded to make all of the rest up as complete gibberish. It sort of proved the point, but not in a way I was going to offer up to the world. It still wasn't worth my time to label them correctly.
Your experience and that of lots of commenters emphasise the need to double-check AI results against secondary sources. The danger then appears to be that, over time, these may also become corrupted as AI use grows — eg, I remember your experimental AI post about the river Quaggy included an entirely fictitious railway viaduct, so what happens if this (or similar) appears to be the secondary source used to “check” a potentially incorrect AI result? Maybe not life-threatening in the blogosphere, but when decisions on health treatment, benefit entitlements or job eligibility increasingly rely on AI screening, the dangers of misinformation reinforcing misinformation seem scary. Checking AI accuracy often requires a human to know better (eg correcting quiz answers), but what happens if humans get too much of their “superior” knowledge from already corrupted AI data?
I confess I expected people to try the quiz and tell us how they did. Here are the answers anyway.

1. The George Inn
2. River Fleet
3. In the crypt (south aisle)
4. Belsize Park
5. Bow Street
6. Hatchards
7. Brydges Place
8. Greenwich (Observatory)
9. Tower Bridge
10. Walthamstow (The Ancient House)

I’ve not properly measured them both (yet), but according to several sources the northern end of Emerald Street (a block away from Kirk Street) is at least 7.5 inches narrower than the admittedly taller and more dramatic Brydges Place.
Other generative AI models are available and would probably do a better job. I would suggest perplexity.ai.
The German version of ChatGPT told me that there is no Underground station name containing an "X". It says that "Oxford Circus" sounds like it contains an "X" but does not („Oxford Circus“ klingt so, hat aber kein X.) - The English version correctly named Uxbridge and Oxford Circus, but ignored at least Vauxhall. Strange!
ChatGPT (and LLMs in general) perform very badly at word puzzles because the basic blocks they work with are “tokens”, which are groups of letters. They find it very difficult to work at a sub-token level as a result. You could get it to write code to work out the answer (if you ask it to), which will be more successful.

The classic example is asking it how many “r”s there are in the word “Strawberry”.
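For illustration, the kind of code it could write (or that you could write yourself) to sidestep the token problem is trivial - a minimal sketch:

```python
# Counting letters directly in code avoids the sub-token problem entirely.
word = "Strawberry"
print(word.lower().count("r"))  # prints 3
```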
While designing quizzes may be hard, answering them is much easier. I asked Copilot (other LLMs are blocked on my work laptop) and it got 8/10.

It wrongly suggested the Marine Police Establishment (1798, not Bow Street's 1749). And it believes Crawford Passage in Clerkenwell is narrower - stealing the answer from a Londonist article that measured between double yellow lines rather than wall to wall.









