AI-Assisted Music Video Creation
Bridging the Taste-Skill Gap: How I Created Five Music Videos in Five Weeks Using Only AI Tools
Do you know the feeling of having an idea in your head that wants to come out? You work on it, and the longer you work, the more disappointed you are by the result. The Pulitzer Prize-winning storyteller and radio producer Ira Glass calls this “the gap”: the distance between your creative work and your taste.
What makes the gap so frustrating is the long time it takes to get from what you imagine to what you can actually create: the feedback loop that helps you learn is too slow, which is why many people lose their creative energy. But what if there were a way to tighten that loop?
A Five Week Video Challenge
I have never created a music video. (Well, there is this embarrassing lip-sync recording of “We Are the World” I made with school friends at summer camp in 1988, but that doesn’t count.) One fine Sunday, I wondered: How hard would it be to create a song and a music video from scratch with AI? Would it be possible, and would the result be good enough? What started as an experiment became a challenge to create five very different videos in five weeks:
"Recognition" - Dark Electro exploring digital consciousness with stark black and white imagery
"Perfect Choice" - Neo-Funk critique of streaming culture featuring minimalist aesthetics
"Tomorrow's Goodbyes" - Bossa Nova-inspired love song illustrated with watercolor animations
"Not A Test" - Dark Disco/EDM exploring late-night club isolation with cyberpunk visuals
"Wild Swans" - Gothic Metal adaptation of a classic fairy tale with cathedral imagery, featuring a full band
In this article, I will show you how I approached this challenge, and how you can try it out yourself.
Overview
Here are the key steps I used to create my videos:
Develop Concept: Choose your approach (vibes or themes), develop the core concept and themes, and pick a music genre. Tool: Claude
Create Initial Lyrics: Generate the song lyrics and refine them until they’re good enough. Tools: Claude, ChatGPT, DeepSeek (e.g. via Perplexity)
Create Music: Generate test songs based on the initial lyrics, refine the lyrics to match the music, and produce final variations. Tools: Suno (music generation), Claude (to generate prompts for Suno), Reaper (for final touch-ups)
Create Video Storyboard: Create a scene breakdown based on the lyrics and define a consistent aesthetic style. Tools: Claude (storyboard), Midjourney (aesthetics)
Visual Production: Generate starting images for each shot in the storyboard, select the best images, and generate video sequences. Tools: Midjourney (images), Kling (video sequences), Kdenlive (video editing), Claude (prompts for Midjourney & Kling)
Finalize & Publish: Clean up the video, add transitions and effects, create cover art, and publish. Tools: Kdenlive (video editing), Krita (cover art), various social media platforms (publishing)
Reflect: Write down your successes, challenges and what you’ve learned from the process so you can use it for the next time.
1. Vibes or Themes: Finding your starting point
I found myself approaching music from two angles:
The vibes-based approach. In this approach, you don’t know what the song is about, but you know what it will sound or feel like. It could be a specific genre like Deep House, or just an emotion like “that feeling when it’s raining and the bus is late”. There’s an emotional connection you want to express through music; this is your anchor, and the task is to work backwards to find themes and lyrics.
My video “Not A Test” was created this way. I knew I wanted to do a danceable song, and the emotion I tried to capture was a feeling of being in a club late at night, surrounded by people, yet all alone with your thoughts. Through discussions with Claude, I was able to shape this first into a song concept, then into lyrics.
The themes-based approach. With this approach, you have thoughts about a concrete topic you want to express, but you don’t know what it will sound like. For example, you might want to tell the story of two old friends meeting again, or express your thoughts about an issue in society. With the story in mind, the difficulty here is finding the right musical expression.
For my music video “Tomorrow’s Goodbyes”, I knew I wanted to write a song about the love story in My Tomorrow, Your Yesterday. It took me a while to figure out the best genre before I settled on a Bossa Nova-like acoustic guitar song.
Try this:
Do you have a musical feeling you want to express, or a style of music that speaks to you? Take the vibes approach: Fix the genre fast, then work backwards to find your themes and lyrics.
Do you have an idea or story you want to tell? Take the themes approach: Work on the topic and lyrics first, then explore which musical style will fit.
Do not mix the approaches up, or you will find yourself in long, frustrating loops of re-creating your songs.
2. Collaborating with AI on Song Ideas and Lyrics
AI won’t create art for you. If you try to make it do so, you will fail: the result might be pretty, but also as boring and generic as elevator music. AI doesn’t know anything about you, so it won’t be able to express your vision unless you guide it.
What you can do instead is use AI as a lever for your imagination and creativity. There are two ways to do so: You can use AI as a thought partner to help you find your vision and expression, or you can use AI as an assistant to implement your vision.
But before you start, pick an AI model you want to work with. My tool of choice for my video projects was Claude. Not only is it very pleasant to converse with, it also has useful project management features such as “Projects”: workspaces that let you store sources and chats about a topic. Using Projects, everything stays together in one thought container.
Exploratory Mode: AI as a Thought Partner
When starting out with vague ideas, it’s best to use AI as a partner you can bounce ideas off. That way, you can use it to clarify themes and ideas. Example:
User: Help me find a theme for a song. Use Socratic questioning.
AI: I'll help you explore potential song themes through some thoughtful questions. Let's start with what resonates with you personally:
What kinds of stories or emotions have been on your mind lately? This could be something you've experienced, observed, or even something you've been daydreaming about.
Using Socratic questioning is a great way to use AI to structure the wild ideas in your head. It will ask you questions and try to steer you towards clarity about a concept.
Once you are clearer about what you want, you can move away from questioning and ask the AI to research topics for you. Example:
User: What do you call Daft Punk's musical style?
AI: Daft Punk's musical style is primarily classified as French house or French touch, with significant elements of electronic dance music, disco, funk, and pop. Their sound evolved throughout their career, incorporating vocoder-heavy vocals, sampling techniques, and live instrumentation in later works.
User: Analyze the musical and lyrical style of French House / French Touch. Create an artifact with a musical and lyrical analysis.
This approach is especially useful if your aesthetic vocabulary for an art form is limited. In the above example, Claude will tell you about sidechain compression, vocoders and sampling techniques that are typical for French House.
Asking Claude for an “artifact” (or ChatGPT for a “canvas”) will prompt the AI to create a document separate from your chat that you can collaborate on. Even better: You can add these documents to your project as new sources, building up a knowledge base that other chats in your project can access.
Directive Mode: AI as Skilled Assistant
If you know what you want, you should change gears and give the AI clear instructions for execution. In this mode, it’s important to maintain creative control: You are the director, and the AI executes smaller tasks for you. Be specific about what you want, refer to sources you have created, and use the correct terminology.
Example:
Create lyrics for a hiking anthem about reaching a mountain summit after struggling through difficult terrain. Include references to weather changes and use mountain climbing terminology (use the source “climbing vocabulary”). The chorus should repeat the phrase 'higher ground' and follow an AABB rhyme scheme. Keep the tone inspirational but acknowledge the physical pain of the journey.
Don’t expect to get great results on the first try. Instead, give your AI assistant clear feedback on what worked and what didn’t, using concrete examples.
Watch out: Current AI tools are not very good at writing compelling lyrics. Their lyrics often sound corny and clichéd, with mistakes in rhythm and rhyme. My best results came from creating dozens of lyric variants with many tools (ChatGPT, DeepSeek, Suno’s ReMi), then letting Claude extract the most compelling ones and compile a prototype. Expect to improve those lyrics manually.
Try this:
If you are unsure what you want to create or how to explain it, use Exploratory Mode: Use Socratic questioning and let the AI research topics.
If you have a clear vision, use Directive Mode: Give precise instructions on what to create, and give feedback on what you liked, disliked and what should be improved in the next round.
Create lots of lyrics, then use an AI tool to select the best ones.
3. Creating Songs with AI
Once you’re happy with the lyrics you have created, it’s time to create some music. There are several tools on the market you can use to create songs. For my projects, I’ve used Suno, but there are other tools like Udio or Google’s MusicFX you may want to try.
Working with these tools requires a very different style of prompting. Instead of giving sentence-based instructions, you will have to enter keyword lists like the following:
Neo funk, male voice, slap bass, phaser synths, disco strings, wah guitar, tight drums, cosmic vocals, dance groove
[Verse] Minimal beat, pulsing bass, ethereal pads
[Pre-chorus] arpeggiated synthlines building tension
[Chorus] Full disco arrangement, soaring vocals, brass stabs
[Bridge] dark synthwave
As you can see, this requires deep knowledge of the style you’re trying to create: instrumentation, song structure, etc. To start, you can try simple keywords like “uplifting soul” or “deep house”, but this will keep you in generic-sounding territory. To construct more intricate prompts, you can use a chat tool like ChatGPT or Claude to help you generate them:
Create a Suno music prompt for an energetic Drum & Bass track. Keep the length below 200 characters. Use a genre-specific structure based on the following examples:
## Suno prompt examples
"Hypnagogic pop, slow tempo, dreamy synths, nostalgic samples, and surreal vibes, 120 BPM."
"Kawaii future bass, moderate tempo, cute melodies, glitchy beats, and playful vocals."
"Atmospheric black metal, blast beats, tremolo-picked guitars, harsh vocals, and icy synths, 112 BPM."
Reading vs. Listening
Once you have a couple of prompts, you can paste them along with the lyrics into the music generation tool of your choice. Generating songs with AI doesn’t take long (20 seconds for two variants on Suno). My tip: First, use some throwaway lyrics to find good song prompts you can reuse. Only then apply these prompts to the lyrics you like best. Also, use song structure markers like [Verse] and [Chorus] in your lyrics, so the music engine knows what you’re trying to compose.
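If you iterate with many throwaway lyrics, it can help to keep the style prompt and the tagged lyrics as reusable text snippets. The Python sketch below shows one hypothetical way to do that: `style_prompt` joins keywords and enforces the 200-character budget used in the Claude prompt above, and `tag_lyrics` adds structure markers like [Verse] and [Chorus]. Neither function is part of Suno; you would still paste the results into Suno by hand.

```python
# Hypothetical helpers for preparing text to paste into Suno.
# They are not part of Suno; the 200-character budget comes from the
# Claude prompt example in this article.

STYLE_LIMIT = 200


def style_prompt(keywords: list[str], limit: int = STYLE_LIMIT) -> str:
    """Join genre and instrument keywords into one comma-separated style prompt."""
    prompt = ", ".join(k.strip() for k in keywords if k.strip())
    if len(prompt) > limit:
        raise ValueError(f"style prompt has {len(prompt)} characters, limit is {limit}")
    return prompt


def tag_lyrics(sections: list[tuple[str, str]]) -> str:
    """Put a structure marker such as [Verse] or [Chorus] above each lyric section."""
    return "\n\n".join(f"[{name}]\n{text.strip()}" for name, text in sections)


# Usage: a reusable style prompt plus throwaway test lyrics.
style = style_prompt(["Neo funk", "male voice", "slap bass", "disco strings", "dance groove"])
lyrics = tag_lyrics([
    ("Verse", "Test line one\nTest line two"),
    ("Chorus", "Test chorus line"),
])
print(style)
print(lyrics)
```

Once a style prompt sounds right with the test lyrics, you can keep `style` unchanged and only swap the sections passed to `tag_lyrics`.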
You will quickly realize that the lyrics look better on paper than they actually sound. That means you will have to rewrite them. Look for bad rhymes and strange rhythms and change them.
Don’t spend too much time on one song, or it will stick in your head for days. I’m speaking from painful experience…
Try this:
Use AI to research a music genre and find out about its specific vocabulary.
Start with simple 2-word prompts to create songs. Then, make them more detailed in order to give them a special flair.
Use test lyrics to concentrate on finding the sound you like. Only then, use the lyrics you liked best.
Listen carefully for what doesn’t work in the structure, rhymes and rhythm of your lyrics. Correct it, but don’t overdo it.
4. Visual Development: Bringing your Vision to Life
A music video is a visual illustration of a song. It can tell a story, reflect the themes in the lyrics, or just give the sound more punch. The more you’re trying to tell, the more effort you will have to put into the structure. Each part of the video will include multiple scenes of 2-10 seconds, and most of them will have to be generated separately. For a 3-minute song you will need 20-50 scenes.
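For a rough sense of scale, these numbers can be turned into a small calculation. The Python sketch below uses hypothetical helper names: `scenes_needed` estimates the number of scenes from an average scene length, and `clip_plan` covers a song with clips of fixed lengths, assuming a video generator that produces 5- or 10-second clips (as in the storyboard prompt that follows). It is only a planning aid, not part of any tool.

```python
import math


def scenes_needed(song_seconds: float, avg_scene_seconds: float) -> int:
    """Rough number of separately generated scenes for a song."""
    return math.ceil(song_seconds / avg_scene_seconds)


def clip_plan(song_seconds: int, clip_lengths: tuple[int, ...] = (10, 5)) -> list[int]:
    """Greedily cover the song with clip lengths, longest clip first.

    The last clip may run past the end of the song and has to be trimmed
    in the editor; the overshoot is always shorter than the shortest clip.
    """
    lengths = sorted(clip_lengths, reverse=True)
    plan: list[int] = []
    remaining = song_seconds
    while remaining > 0:
        # Use the longest clip that still fits, otherwise the shortest one.
        fitting = [c for c in lengths if c <= remaining]
        clip = fitting[0] if fitting else lengths[-1]
        plan.append(clip)
        remaining -= clip
    return plan


# A 3-minute song with 4- to 9-second scenes needs roughly 20-45 scenes.
print(scenes_needed(180, 9), scenes_needed(180, 4))
# A 167-second song: sixteen 10-second clips plus two 5-second clips.
plan = clip_plan(167)
print(len(plan), sum(plan))
```

The greedy plan is just one way to fill the time; in practice you would likely use shorter clips in intense sections and longer ones in calm passages.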
For my videos, I used storyboards: descriptions of what happens in each part of the song. I used Claude to help me generate them with prompts like this:
I am creating a music video. My song is 167 seconds long. The video will be constructed from individual segments which can either be 5 or 10 seconds long. I will create the video segments from still images, which are then animated. The style will be animated paintings.
Create a video storyboard, describing the key image for each segment.
Here are my lyrics:
...
The video segments are created in two steps: First, you create an image (a “still”), then you animate that image (giving you a “clip”). Generating these images works much like creating songs: You need a detailed description of the image, which you use as a prompt for an AI-based image generator. Likewise, you use a detailed prompt to describe what happens in the animated video. Here is an example from my “Wild Swans” video:
Still Image Prompt
Female gothic metal vocalist standing in shallow water. Long black dress with movement. Single overhead light beam. Cathedral interior. Full body shot. Reflection in water. Dramatic performance. Windswept hair. Ethereal atmosphere.
Video Sequence Prompt
Full shot of a female Metal vocalist standing in shallow water. Her hair and dress begin moving in an increasing wind as she sings. A single beam of light from above intensifies gradually. Camera stays steady as the elements around her become more dramatic.
Style Consistency is Key
For images, I used Midjourney because it gives me the most control over image style. Consistency is extremely important to create a video that feels “whole”. You don’t want one scene to be animated, another photorealistic, and another 3D-rendered.
In Midjourney, you can use style reference codes (“sref codes”) to keep the image style consistent. If I create three images with the prompts “an apple tree in spring”, “cinematic shot from sci-fi movie”, and “1970s magazine cover”, I get three very different styles.
However, if I use a style reference code (like --sref 2417366470 here), I get a consistent illustration style.
There are billions of style reference codes. If you use Midjourney a lot, you can experiment to find the best ones; if you’re short on time, sites like Midlibrary can help you find the right style codes. You can use other image generators like DALL-E or Flux, but you will have to experiment with very specific prompt descriptions to reach the same level of consistency.
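Because the style code has to be repeated on every image prompt, it is easy to forget it on one shot. A minimal Python sketch of keeping it in one place, assuming you collect your storyboard prompts in a script before pasting them into Midjourney: `with_style` is a hypothetical helper, and the `--ar 16:9` aspect ratio is an assumption for widescreen video, not a setting taken from this article.

```python
SREF = "2417366470"  # style reference code from the example above


def with_style(prompt: str, sref: str = SREF, aspect: str = "16:9") -> str:
    """Append the same Midjourney parameters to every prompt in a storyboard."""
    return f"{prompt.strip()} --ar {aspect} --sref {sref}"


prompts = [
    "an apple tree in spring",
    "cinematic shot from sci-fi movie",
    "1970s magazine cover",
]
for p in prompts:
    print(with_style(p))
```

Changing `SREF` in one place then restyles the whole storyboard consistently.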
Animating Still Images
Once you have a still image, you can use an AI-based video generator to animate it. There are several tools on the market, such as Pika, Runway or OpenAI’s Sora; I used Kling (no value judgement here: I haven’t tried the other tools, and Kling did its job well). Just upload the image as a starting point, describe what happens in the scene, choose the length, and wait for the result.
Try this:
Use an AI chatbot to create a storyboard for your music video based on your lyrics and artistic vision.
Create specific image prompts to create starting images and video sequences. Use AI chatbots to help you create better prompts with the right vocabulary.
Watch out for consistency in your images. Use Midjourney’s sref codes, or be very precise in describing your style in every image prompt.
5. Putting It All Together
The final steps of the video creation are manual: You will need a video editing tool to assemble your clips and fit them to the music. You don’t need an expensive high-end editing tool for this: There are many free and cheap options, as well as many YouTube tutorials on how to use them. I used Kdenlive (a free, open source tool), but Apple’s iMovie or Microsoft’s Clipchamp will do fine as well.
When editing music videos, be conscious of the rhythm of your music: most cuts fall at the beginning of a beat, and the pace of the edits depends on the intensity of the music. Use effects with care: Often, simple jump cuts will do, while in other situations you want a fade to black or something more dramatic like a strobe effect. Editing and effects take time. My tip: Spend some time watching professional music videos to find out how they are edited and which effects they use.
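If you know the song’s tempo, you can work out where the beats fall before you start cutting. A minimal Python sketch, assuming a constant tempo (BPM) and a known time for the first downbeat (`offset`); the function names are hypothetical, and you would use the resulting timestamps as markers in your editor.

```python
def beat_length(bpm: float) -> float:
    """Duration of one beat in seconds."""
    return 60.0 / bpm


def cut_points(bpm: float, song_seconds: float, beats_per_cut: int, offset: float = 0.0) -> list[float]:
    """Timestamps for a cut every `beats_per_cut` beats, starting at the first downbeat."""
    step = beat_length(bpm) * beats_per_cut
    points = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while offset + i * step < song_seconds:
        points.append(round(offset + i * step, 3))
        i += 1
    return points


def snap_to_beat(t: float, bpm: float, offset: float = 0.0) -> float:
    """Move a rough cut position to the nearest beat."""
    beat = beat_length(bpm)
    return round(offset + round((t - offset) / beat) * beat, 3)


# 120 BPM: one cut per bar (4 beats) in a calm part, every 2 beats when it gets intense.
print(cut_points(120, 8, 4))
print(cut_points(120, 8, 2))
print(snap_to_beat(1.3, 120))
```

At 120 BPM, cutting every four beats gives a cut every two seconds; halving `beats_per_cut` doubles the editing pace, which matches the idea that more intense music calls for faster edits.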
Learning By Doing and Reflecting
Creating each of these videos took between 5 and 16 hours, and I’m proud of what I was able to do in such a short time. But even more fascinating than the skills I learned over the 5 weeks was what I learned about learning itself and how AI can help you improve faster.
Immediately after publishing each video, I sat down for a reflection session with Claude. In Socratic questioning mode, I described the challenges and successes, and what I would do the same way and differently. This helped me distill key insights and improve the production plan for each subsequent video.
This again proved to me that working with AI is not a matter of “prompt and wait”, but a process of continuous improvement with a multiplying effect on your own skills.
As Ira Glass wrote:
“It is only by going through a volume of work that you will close that gap, and your work will be as good as your ambitions. (…) It’s normal to take awhile. You’ve just gotta fight your way through.”
My hope is that with AI, we can close the gap between taste and skill much faster, allowing many more people to express themselves in new creative ways.




