My experience with so-called “artificial intelligence” (AI) goes back to the late 1980s and my research into the application of neural networks in music. This included the design of a system able to learn a musical style and then generate a live accompaniment in the same style—all of which led to my 1996 MPhil thesis, imaginatively titled ‘The Design of a Computer-Assisted Composition System’.

Expose the system to enough Bach, for example, and the idea was that it would respond to a suitable musical prompt with a novel Bach-like piece. It's a similar approach to the one used by the large language model (LLM) behind ChatGPT to respond with text: consume vast amounts of other people's work to train the network, then use it to generate "new" output when prompted.

My experiments last year with ChatGPT found it to be wildly contradictory, inaccurate and derivative, and that includes GPT-4. That exactly mirrors my experiences with AI and music all those decades ago.

More recently, I’ve been testing various AI text-to-image generation tools. The three tools I’ve experimented with so far are Bing’s creative search (with image creation powered by DALL-E from OpenAI), Runway, and Midjourney (accessed via Discord).

I’ll briefly illustrate what each of these has produced in response to my text prompts. As you’ll see towards the end of this post, I think Midjourney is currently in a class of its own in terms of image quality.

Bing / DALL-E

DALL-E-powered Bing image creation is fun and reasonably fast. The results are highly variable, particularly with people, who can end up with unwanted limbs and distorted features. Here’s a less distorted one I created of a cave man riding a bike in central London:

A cave man cycles a bike in central London

I wondered what a similar figure might look like settling down in his cave with a pint of beer. Well, here’s the answer to that:

A cave man in the shadows of his cave enjoying a beer

As is well known—and which I’m sure ChatGPT can readily confirm—cats actually run the world (or at least social media). So here’s an image of the true original creators of the pyramids busy with their work:

Ginger cats wearing sunglasses, busy building the pyramids

As you can see, despite my requesting a realistic, high-definition photo, the result looks more like an illustration.

Okay, let’s change the subject entirely and see how well the system copes when asked to create a recently rediscovered 1920s image of Batgirl:

A sepia tint image of a woman in the 1920s wearing a top with the bat logo

I suspect that’s good enough to fool some people that it really is an old photo, and not something created from a text prompt.

I also experimented with more abstract concepts. I wanted to discover whether I could describe some of my dreams and have them visualised. The results are interesting, if frequently derivative:

A dreamlike, pastel-shaded image of a landscape and clouds with the outline of a giant figure striding into the distance
A pastel-shaded image of a landscape with a human figure staring into the distance and strange shapes bobbling out of the

Enough abstraction. Let’s get back to attempts at realism and photographic quality. In this one, I described an image of a concert taking place in the grounds of a certain regal castle in Windsor:

A large stage and band and big audience sitting on the grass with a castle in the background

I then turned my attention to space, describing an image I wanted for an article about comets:

A comet streaking through space

That’s enough Bing / DALL-E for now—apart from this final one of my cave dweller enjoying a nightcap:

A caveman sits in his cave by a cosy fire, holding a glass with a nightcap

What about the costs? Well, so far I’ve been able to generate as many images as I like with no attempt to make me sign up to a subscription service. But that may be because I’m an Office 365 subscriber—or just got lucky.

Runway

Next up is Runway. I found it similar in quality and reliability to Bing / DALL-E.

Here’s an example of an image where I described a bat person illustrated in the style of Bruegel (although I didn’t specify which Bruegel):

A Bruegel-like image and landscape with distorted figures of a bat person and what may or may not be some children

And then I described an image of Batman walking through Victorian London:

Batman strolling boldly through a black and white street scene of what is meant to be Victorian London

I also wanted to visualise a Roman centurion walking through today’s London, along with his chariot. The result looks more like something from a Terry Gilliam film:

Modern London, with a figure based on a Roman Centurion walking behind a strange cart covered in a red robe

This next one is a personal favourite, created in response to my description of a nervous giant fish trying to sneak undetected around the back streets:

A very large shimmering blue fish levitates along the backstreets of a city

And finally, here’s the Houses of Parliament in a goldfish bowl:

An image of the Houses of Parliament inside a glass globe

Runway is also significant in that it can generate video from text. However, access to this latest generation feature is limited and I haven’t yet been able to play with or evaluate it.

Runway has various subscription plans: a free starter tier, a Standard plan at $12 per month, and a Pro plan at $28 per month if you choose to pay annually (pay monthly instead and those fees rise to $15 and $35 respectively). The more you pay, the more you can do, as you’d expect.

Midjourney

I’ve kept the best text-to-image generator to last. Midjourney is in a class of its own: my experience suggests it’s a generation ahead of the others.

Let’s start with a still image taken from my imaginary new film about ginger cats attempting to seize power in London. During this revolution, they storm Buckingham Palace and other famous landmarks:

Angry ginger cats face the camera, with the blurred image of Buckingham Palace in the background
Ginger cats wearing leather armour alongside Tower Bridge, while spears are being thrown at them
A ginger cat looking scared, surrounded by other querulous ginger cats, with Tower Bridge visible in the background

After the success of this feline revolution, there is, of course, a coronation and a new queen in town.

A portrait style photo of a ginger cat wearing a crown and facing the camera, soft focus

Talking of cats, here’s one of a somewhat larger variety: a young leopard in the wild in Malawi (even if his paw magically goes through the tree trunk):

A very young leopard in a golden light amongst the trees lies down facing the camera, face-on

Next up, I described various images to illustrate a storybook I might get around to writing one day—all about an unemployed wizard and his cat:

In a green clearing, an old bearded man wearing a battered top hat holds a mug of tea whilst on the table in front of him a ginger cat looks into the distance

In this one, my wizard appears a little confused about how best to make a cup of tea—and the cat is clearly embarrassed by his efforts:

In a green clearing, an old bearded man sits pouring something into a teapot whilst on the table in front of him a ginger cat looks as if he/she disapproves of the whole thing

And finally, back to another imaginary film. In this opening scene, a detective stands looking out over the city he loves:

A man in a dark raincoat has his back towards us as he stands on the top of a skyscraper and looks out over the night time lights of the city below

Like Runway, Midjourney lets you try it for free. Once you’ve burned through the initial credits, annual billing gets you a Basic plan at $8 a month, a Standard plan at $24 a month, or a Pro plan at $48 a month. Choose to pay monthly instead, and those costs rise to $10, $30 and $60, respectively.

What I learned

Testing these tools has been great fun. But they made me very conscious of long-standing issues about the uncredited creative works used to train the underlying systems—and the derivative nature of much of what they produce.

These issues of ownership and original creators’ rights are important. They need to be resolved in a way that respects creatives’ work rather than ripping them off. A decent percentage of the subscription charges needs to go into a fund for onward distribution to the original creators, something like the way the ALCS collects ‘secondary use’ royalties on behalf of authors.

I enjoyed using these tools. I’m particularly interested in the feature I haven’t been able to access yet, generating video from text, especially after seeing some of the Runway text-to-video renders doing the rounds on social media. And there are also text-to-interactive-3D-world tools.

As these tools carry on improving, I’m hoping they’ll help me visualise and share the often surreal imagery, unlikely plotlines, and half-remembered dreams long trapped inside my head. And I’m optimistic they’ll also help unleash a wave of untapped creativity in others, particularly as the costs come down to make the tools more accessible.

I’ll post a follow-up article once I’ve had the chance to access and test text to video and other related options. So you can expect a future post full of bizarre events, places, and creatures—oh, and the sounds/music to go with them too, of course. (You have been warned 😄).

P.S. There remain many related concerns about the societal and political impacts of the false images, videos, and audio that can now easily be created by such tools. These issues are outside the scope of this short piece, but they are discussed in more detail in my new book, Fracture.