Wrapping Up Week 4: Mixing For Variation

updates

Mar 30, 2020

This was a busy week, but I definitely got some benefit from the seeds planted last week around testing. Fortunate, because it’s also the first week I really started feeling the impact of COVID-19. Spent a day this week manning the fort while my partner worked from home, and I wasn’t able to realistically move forward at the breakneck pace I wanted without damaging my headspace. As always though, my problems are small, and I’m blessed to be in the situation I’m in. I spent a little more time this week getting some interactions in the point and click experience working (so we’d get certain responses triggering when interacting with the experience) but once that was in place, there were some interesting discoveries:

Audio Is The Last, Late Step

It’s possible to implement audio as part of a skill as one of the last steps in development. And if development and testing are done right, there are minimal changes necessary, even to the test suite. There was one git commit during development this week where the core implementation changed from text-to-speech to audio, with no unittest failures. This is exciting because audio testing is easily the most time-intensive process. Get everything working and tested, and then plug in audio and start testing that. VELOCITY FTW!
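One way a switch like that could leave the tests untouched is sketched below. This is not the skill's actual code: `Line`, `respond`, the two renderers and `AUDIO_BASE_URL` are all hypothetical names, and the only Alexa-specific detail assumed is that recorded audio is played through the SSML `<audio src="..."/>` tag. The idea is that tests assert on a stable line key rather than on the rendered output.

```python
from dataclasses import dataclass

# Hypothetical location for recorded assets; the post does not give a real URL.
AUDIO_BASE_URL = "https://example.com/audio"


@dataclass
class Line:
    key: str   # stable identifier that tests can assert on
    text: str  # script text, used when rendering with text-to-speech


def render_tts(line: Line) -> str:
    """Render the line as SSML to be read by a text-to-speech voice."""
    return f"<speak>{line.text}</speak>"


def render_audio(line: Line) -> str:
    """Render the same line as SSML that plays a recorded file instead."""
    return f'<speak><audio src="{AUDIO_BASE_URL}/{line.key}.mp3"/></speak>'


def respond(line: Line, render=render_tts) -> dict:
    """Build a response; the key travels with it, whatever the output form."""
    return {"line_key": line.key, "ssml": render(line)}


greeting = Line(key="greeting", text="Welcome back.")
tts_response = respond(greeting)
audio_response = respond(greeting, render=render_audio)
```

A test that checks `line_key` passes for both responses, so only the audio itself still needs a listen.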

Non-destructive Enhancements

An Alexa Skill’s implementation as Python code in a Lambda function is a series of Python classes with a common pair of methods implemented. (That’s a mouthful of jargon, to be sure.) One of these methods determines whether a handler can deal with a given intent request - essentially the mail man coming to the door with a package, showing it and saying “This yours, guvnah?”

These handlers are registered in a particular order, and the first handler that says yes gets the package (getting to process the request). That makes it possible to add new functionality (or deal with specific intents differently) by adding new handlers and registering them earlier in the order. The big advantage here is pluggability and easy reversion to older code. Deal with a new intent a different way? Add a new handler. Not working suddenly? Comment out the line registering the new handler.
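Here is a minimal sketch of that ordering idea in plain Python. It is a stand-in rather than the real SDK: in the ASK SDK for Python the two methods are `can_handle` and `handle`, and handlers are registered with `add_request_handler`, but the `dispatch` function, the handler classes and the dictionary-shaped request below are assumptions made up for illustration.

```python
class LookHandler:
    """Original handler: deals with every LookIntent."""

    def can_handle(self, request):
        return request["intent"] == "LookIntent"

    def handle(self, request):
        return "You see a dusty room."


class LookAtDoorHandler:
    """New handler: claims only the LookIntent requests aimed at the door."""

    def can_handle(self, request):
        return request["intent"] == "LookIntent" and request.get("target") == "door"

    def handle(self, request):
        return "The door is locked."


def dispatch(handlers, request):
    """Give the request to the first registered handler that says yes."""
    for handler in handlers:
        if handler.can_handle(request):
            return handler.handle(request)
    raise LookupError("no handler accepted the request")


handlers = [
    LookAtDoorHandler(),  # comment this line out to revert to the old behaviour
    LookHandler(),
]
door_reply = dispatch(handlers, {"intent": "LookIntent", "target": "door"})
room_reply = dispatch(handlers, {"intent": "LookIntent"})
```

The older handler is never edited, which is what makes the enhancement non-destructive.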

Brevity is Key

Especially with text to speech, brevity is key. Using the fewest words to provide clear instructions or feedback ensures a positive experience. Greater facility with SSML (Alexa’s markup language for giving instruction on how text should be read) may help here, but there are limits to what Amazon Polly (the actual implementation of text-to-speech voices) can do to interpret a script and bring words to life. Even with a human’s ability to alter rhythm and bring meaning to a script, shorter is better.

To that end, I’ve modified a recording template used by Tavern of Voices to give feedback on the length of a script overall, and for individual lines. Once I get a feel for the ideal limits, I’ll be adding colour-coding to the word count cells to automatically draw the eye to potential problem areas in a script. You can find a copy of the spreadsheet here.
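The same feedback can be sketched outside a spreadsheet. The snippet below is an illustration, not the template itself: `word_counts` and `flag_long_lines` are hypothetical names, and the `max_words` limit of 12 is a placeholder, since the ideal limits haven't been worked out yet.

```python
def word_counts(script_lines):
    """Count whitespace-separated words on each line of a script."""
    return [len(line.split()) for line in script_lines]


def flag_long_lines(script_lines, max_words=12):
    """Return (line_number, count) for every line over the limit.

    max_words is a placeholder value, not a recommendation.
    """
    return [
        (number, count)
        for number, count in enumerate(word_counts(script_lines), start=1)
        if count > max_words
    ]


script = [
    "Welcome back, captain.",
    "You can look around, talk to the crew, check the cargo hold, or set a course for the next system.",
    "What next?",
]
counts = word_counts(script)
total_words = sum(counts)
too_long = flag_long_lines(script)
```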

Fat Audio Workflow

One outcome from the week was realising that audio for Alexa seems to require some fairly heavy compression (reducing dynamic range to ‘squish’ the difference between loud and soft). I didn’t spend a huge amount of time on the sample audio I created, but even with what I felt was a high amount of compression, the audio still sounded ‘thin’, making it hard to hear compared to text-to-speech voices. I’ll be experimenting with a normalise, compress, normalise approach that I’ve used before with audiobooks to see if that helps.
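To show what normalise, compress, normalise does to the numbers, here is a toy sketch on a list of sample values between -1.0 and 1.0. It is a heavy simplification and an assumption throughout: a real compressor works in decibels with attack and release times, and the `threshold` and `ratio` values are arbitrary. The function names are hypothetical.

```python
def normalise(samples, peak=1.0):
    """Scale the samples so the loudest one sits at the target peak."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]


def compress(samples, threshold=0.5, ratio=4.0):
    """Reduce whatever exceeds the threshold by the given ratio (no attack/release)."""
    squashed = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            level = threshold + (level - threshold) / ratio
        squashed.append(level if s >= 0 else -level)
    return squashed


def normalise_compress_normalise(samples, threshold=0.5, ratio=4.0):
    """First pass sets a known level, compression squashes peaks, second pass restores loudness."""
    return normalise(compress(normalise(samples), threshold, ratio))


quiet_take = [0.1, -0.4, 0.2]
processed = normalise_compress_normalise(quiet_take)
```

In this example the quietest sample ends up at roughly 40% of the peak instead of 25%, which is the 'squish' that makes quiet detail easier to hear.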

I’ll also be investing some time in an automated audio workflow to publish assets to S3 (Amazon’s file storage solution), including post-processing audio. I ran into some issues with file permissions in S3 that will require more sophisticated solutions, so that’s been added to the list.

I’ve written up a post on some techniques for mixing for variation as a result of this week’s work.

What’s On For Next Week?

Next week I’ll be starting development of a skill involving complex rule systems. Using the awesome consultation advice from this week (and some ongoing consulting), I’ve decided to go for a space trading game of some sort. I’ll be spending time tomorrow nailing down the poetic layer of the experience and working through Alexander Swords’ Forest Paths method to pin down what I want it to do. I’m beyond excited to dig into this - one real validation that came out of this week’s work is that playing with a complex skill is fun, as a player. Playing with one that has voice acting implemented is even better - it feels like a responsive, intuitive interface. There’s a lot that goes into the user experience to make that smooth, but the core loop is FUN.

I’ll be spending some time tomorrow implementing a pipeline build in Jenkins to automate a lot of the steps I spent time on manually this week, too - I’m expecting that to be another investment in time that pays dividends in the coming weeks.

I’ve also got my voice acting consultation tomorrow (via Zoom) and I’m behind on reading for Art of Game Design, which I really want to get back to.

Stay safe out there, folks. Things will get better. We will see the other side of this.

And in the meantime, I’m headed to space. See you out in the black.
