a technology and design blog

Chopping up a speech into word chunks using Adobe software

If you ever need to chop up a piece of spoken word video into chunks, with single words from the speech in each video clip, then I have a great automated solution for you.

First bring your video clip into Soundbooth. Double-click the file to display the video and show its sound wave. Open the ‘Metadata’ window and click Transcribe at the bottom of the panel. Select the language of the speaker and adjust the quality if needed. Check ‘Identify Speakers’ if there is more than one speaker, then click OK.


You’ll then have a transcription of the speech. Select each word in turn and click play to check its accuracy. Double-click a word to edit it, or right-click to merge words where the transcription has split them incorrectly. Transcription works best with clear speech and no background noise, so spend a bit of time on these settings, and on any noise reduction you can apply, to get the best results. You can export this data to XML using File/Export/Speech Transcription. This might be all you need, as the XML can be used directly in Flash to seek streamed video to specific points based on the ‘Name’, ‘Time’ and ‘Duration’ nodes.
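To give a feel for how that exported data could be used outside Flash, here is a minimal Python sketch that reads word timings and looks up a seek point. The sample XML is made up for illustration: only the ‘Name’, ‘Time’ and ‘Duration’ node names come from the export described above, while the wrapping `Transcript` and `Word` elements, the helper names and the assumption that times are in seconds are mine, so check them against your own file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The real export may nest or name its wrapper
# elements differently; only Name/Time/Duration are taken from the post.
SAMPLE_XML = """
<Transcript>
  <Word><Name>hello</Name><Time>0.50</Time><Duration>0.40</Duration></Word>
  <Word><Name>world</Name><Time>1.10</Time><Duration>0.55</Duration></Word>
</Transcript>
"""

def read_words(xml_text):
    """Return (word, start, duration) for every element that has
    Name, Time and Duration children. Times are assumed to be seconds."""
    root = ET.fromstring(xml_text.strip())
    words = []
    for node in root.iter():
        name = node.findtext("Name")
        time = node.findtext("Time")
        duration = node.findtext("Duration")
        if name is None or time is None or duration is None:
            continue  # not a word entry
        words.append((name.strip(), float(time), float(duration)))
    return words

def seek_time(words, target):
    """Return the start time of the first occurrence of target, or None."""
    for name, start, _duration in words:
        if name.lower() == target.lower():
            return start
    return None

words = read_words(SAMPLE_XML)
print(words)
print(seek_time(words, "world"))
```

Because `read_words` only looks for elements with those three children, it should tolerate a different wrapper structure, but it will need adjusting if the values are stored as attributes or in another time unit.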

However, you may wish to take this further, as I did today, and actually chop the video up in a fairly automated fashion. From Soundbooth, use File/Save As to save a new version of the video with embedded XMP metadata. In the export settings, select your preferred audio and video settings, but before you click OK to render the file make sure that ‘Include Source XMP Metadata’ is selected in the drop-down menu.


Switch over to After Effects and import the video file with XMP metadata. Go to After Effects/Preferences/Media & Disk Cache and check ‘Create Layer Markers from Footage XMP Metadata’.


Create a new composition from this video file and you should now see layer markers labeled with the text of each spoken word.

Now we need a couple of scripts to help things along, so head over to Paul Tuersley’s site to get his ‘Split at Markers’ script and Jeff Almasol’s site to get his RD Pre-Compose script. Save both to Applications/Adobe After Effects CS4/Scripts. Save your project, then restart After Effects so that it picks up the new scripts.

Go to File/Scripts/SplitAtMarkers to split that layer into chunks on the timeline. Then select the chopped-up layers for the words you’d like to render and go to File/Scripts/Rd_Pre-Compose. It’s important to give each new composition the same name as its word and to check ‘Trim new composition to combined layers duration’.
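If it helps to picture what splitting at markers amounts to, the idea can be sketched in a few lines of Python. This is not the code of the ‘Split at Markers’ script, just an illustration under my own assumptions: each marker is a (time, label) pair in seconds, each chunk runs from its marker to the next one, the last chunk runs to the end of the layer, and anything before the first marker is ignored. The function name and sample values are hypothetical.

```python
def split_at_markers(markers, layer_end):
    """Turn (time, label) markers into (label, start, end) chunks.

    Illustrative only: each chunk spans from its marker to the next
    marker, and the final chunk ends at layer_end.
    """
    ordered = sorted(markers, key=lambda m: m[0])
    chunks = []
    for i, (start, label) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1][0]  # next marker starts the next chunk
        else:
            end = layer_end          # last chunk runs to the layer's end
        chunks.append((label, start, end))
    return chunks

# Made-up marker data for a two-second layer.
markers = [(1.1, "world"), (0.5, "hello")]
for label, start, end in split_at_markers(markers, layer_end=2.0):
    print(label, start, end)
```

Each resulting chunk corresponds to one layer that you would then pre-compose and name after its word.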

Once you’ve done all of this you will have lots of named compositions that can be exported using the batch renderer at whatever settings your heart desires.
