Tales of the Rampant Coyote

Adventures in Indie Gaming!

Voice Acting: Am I on the Fringe, Here?

Posted by Rampant Coyote on September 11, 2014

I tend to agree with Ken Rolston on the use of voice acting in RPGs. And Shamus Young. I’ll just excerpt the salient bit from Ken’s interview a few years back:

“[L]et’s talk in the abstract about the worst thing that ever happened to role-playing games is recorded audio for dialogue. I happen to believe that was the death of my joy. Because that limits… that causes production things… the content has to be nailed down at a certain point.

“So [voiced] text is not easily revisable. As I play, text is easily revisable; audio isn’t. As I play, I want to make tiny little changes to the tone, to the feel of things, but you can’t do that when you have all this audio — oh my god, all the audio that we have to record! So what I’m going to say is: for what the audience wants, we are forced to create these things that are very brittle, that cannot be revised.

“Whereas in the happy old days of Baldur’s Gate and things like that, I thought you had the best of both worlds. You could have a little snippet of dialogue that would give character, but then you would get in text trees which you could easily scan and click through. For pace, that’s the important thing; dialogue pace. In a good old-fashioned role-playing game, the user controls the pace, where unfortunately in both video and recorded audio, you can’t scan it and you can’t backtrack in it.”

Now – I love good voice acting in games. Everybody does. And I cringe as much as everybody else at the bad stuff… Few things can make a little animated character on the screen come to life as well as an expertly done human voice. We hear the nuance. We hear the emotion. We hear character.

But… like much of story, it comes at a cost. It conflicts with gameplay. It limits, as Young and Rolston explain. If you want your game to be more like a movie, so that the gameplay is a linear affair that acts as an interactive separation between talky bits, fine. But I like my RPGs to be a bit more open-ended than that.

Speaking as a developer… I had text and dialog changes constantly in Frayed Knights: The Skull of S’makh-Daon. Right up until release (and afterwards, actually), I was tweaking the story as much as the gameplay. The two had to work together, and in spite of my best efforts I’d find dialog that had to be adjusted (or branched) to account for the different ways in which the player might navigate the story and the world. That’s hard enough to write when it’s just text. What if those lines had to be revised with yet another trip to the recording studio? What if the original voice-actor wasn’t available? Do the cost and inflexibility of voice-over work encourage developers to skimp on potential dialog? (If so, I haven’t noticed, but it would certainly impact me, the shoestring budget guy).

Talking to someone at Comic Con last weekend, they recommended fully voiced dialog for my game because they hate to read when playing a game, but they will listen to somebody talk to them “all day.” I mentioned that for me, I’m kind of the opposite. Maybe for the first couple of lines – especially if it’s a major bit of character-revelation or a significant plot development – I’ll listen all the way through. But most of the time, I’ll read through the line of dialog, and then interrupt the voice-over in mid-sentence to jump to the next part. Voice is just too slow.

I love the idea of bringing the Baldur’s Gate style back into vogue – where the first line or paragraph was voiced in any particular scene, so you get the flavor of the character. But in talking to some modern players, they aren’t so keen on that. In short – players don’t want to read when they are playing a game. (I suspect some of them don’t like to read, period, but that’s another story.) I do get that. I find myself in the same boat. When playing a game, certain parts of your brain are active, and get into a rhythm. Going into text-reading mode completely breaks that flow, and engages different parts of your brain. At least, that’s how it feels. And while gaming is primarily visual, we can interpret audio information and communication in a way that’s less disruptive than stuff we have to process visually.

So I dunno. Maybe I’m out there on the fringe, wanting a return to the Baldur’s Gate style “samples” in hopes of getting the best of both worlds. Maybe I’m really out to lunch on this. Maybe I’m too far from the “mainstream.” I dunno.

Filed Under: Game Development - 18 Comments

  • CdrJameson said,

    In the past I’ve had to go back and tweak the recorded audio for length, or even stitch together a completely new section from bits to cover a case that we missed in the recording.

    It really is a pain.

    Samples & text FTW.

  • Xenovore said,

    I’m totally fine with doing it Baldur’s Gate style. I think that is a great compromise, actually. As you said, you get the “flavor” of the character, but then you can have lengthy, involved conversations if you want. (And if you’ve provided enough voiced dialogue, the text dialogue tends to read in the character’s voice, at least in my mind.)

    And for a downloadable game, or one that can potentially go onto a mobile platform, you don’t want gigabytes of audio data anyway.

  • Felipe Pepe said,

    Do like in Alien Fires: 2199 AD, use a text-to-speech software. :3

  • Steve H said,

    Why wouldn’t a developer wait until a script is tested, revised, then locked down before recording audio? I’m not a developer, so have zero insight into games production requirements, but it strikes me as highly troublesome to create a component of a project that you can’t change while you’re still changing everything else around it.

    I like games with flexible outcomes, so I’m okay with text and little or no voice. The trouble is, a lot of games writing isn’t very good, and so often it’s bloated with tons of exposition, flavour text, and self-conscious character beats. The Elder Scrolls games are a perfect example of this; I can’t think of another game series at which I shout “Oh shut up already” so frequently.

    A more economical writing style might be a good halfway solution: you could have more outcomes within a game with the same volume of voice recordings.

  • Felix said,

    I feel the same about video and audio in general. Text, I can read at my own pace. I can skim, go back, pause and resume without moving a finger. I can easily deal with interruptions. With video and audio… not so much.

    But there are other arguments in favor of having text for dialogues. (Because a lot of games seem to eliminate text entirely.) One, what if I can’t have sound on at the time, for whatever reason? What if I’m *deaf*? What if… wait for it… English is my third language? Even after all these years, some accents are hard to comprehend for me. Whereas text is always crystal-clear.

    But nooo… they have to force their wonderful canned “experiences” on the poor little players. Who are, isn’t it, invariably dumb white male teenagers, native English speakers and with no disabilities.

  • Xenovore said,

    @Felix: Great reasons to primarily have text.

  • Cuthalion said,

    It does seem like you’d have to wait until near the end to do voice recording.

    For my part, I love voice acting… but not in games. I prefer the snippet approach you mention from Baldur’s Gate. Half the time, I get impatient; and the other half, I don’t like the acting. It comes off weird and doesn’t match the animations or something. It’s uncanny valley, I guess?

  • Rachel said,

    Fire Emblem: Awakening only has voice acted “snippets” and I’d say it’s completely mainstream! I’m with you: I prefer just a little bit of voice acting, if any, otherwise it’s boring and slow. Of course, I also can’t stand podcasts, whereas lots of people seem to like them.

  • Ruber Eaglenest said,

    You know what? You are not mad. All the points raised in your post are correct; well, all but one. Gamers who don’t like to read are not your market, for obvious reasons.

    Well, to the point…

    I think you should take a look at Unepic, by Fran, or consult him. He made the game and released it in 2011. He sold around six thousand copies, gathered a community and such, and it was rejected by Steam. Then he ran a crowdfunding campaign for the dubbing of the game in 2013, two years after release. Then Greenlight came, he was admitted, etc.

    Right now it has sixteen localizations, and dubbing only for Spanish and English. The thing is, the tools are there, and the game is open enough to allow user localizations and dubbing. And all this came after the game had been two years in the wild. So the text was already tested and done.

    I don’t know the sales right now, but I think this is a good example of continuing development, and for audio dialogues, I think it is the way to go.

    The snippets thing sounds great to me.

    Ah! The dubbing by a professional studio cost him around 1200 euros.

  • McTeddy said,

    Nope, I agree with you. When I was young and stupid I thought voice acting was amazing, but I’ve changed.

    Games that have voice acting tend to take much longer to play than if I was reading the text. I end up getting impatient and skipping the dialogue anyways because I can read the subtitles.

    The other thing I like about text is that modern animation *shaking his head sadly* doesn’t do a good job of conveying subtle emotion. We either get the dull American animations or the overly silly Japanese ones. *Teddy’s face contorts sickly as he lets out an audible sigh.*

    Combine these mechanical issues with the high development costs of implementing it and I’m not a fan of voice-work. The snippets are a good balance.

    Although, we could all be biased since we’re all old-school RPG’ers.

  • Rampant Coyote said,

    Yeah, I definitely admit my bias. I mean, I used to play text adventures.

    But it’s not like the old-school RPGs were very *wordy* or anything. I mean, the amount of text they could fit inside a text window on a 320 x 200 screen was pretty minimal – especially for fuzzy non-HD TV screens.

    But blending text and gaming may just seem more natural to me, whereas people who grew up watching TV instead of playing games may find it a bizarre combination. You never read anything on TV!

  • Mephane said,

    “Do like in Alien Fires: 2199 AD, use a text-to-speech software.”

    I am not up to date with regards to speech synthesizers, but I suppose they are still deep down in the uncanny valley, otherwise they would have found widespread use in video games. But I suppose eventually there will be a breakthrough, won’t there? I mean, ingame rendered cutscenes have largely replaced video sequences with human actors, too, once a certain level of visual fidelity had been reached; I guess the same could happen for audio, eventually. I suppose the hardware etc. is easily capable of that, and it is merely a matter of the software itself.

  • Anon said,

    For mainstream, often “cinematic” games voice acting is simply a necessity. No way around that. Imagine a game like Uncharted without voice acting! Yeah…

    For low-budget or indie titles the situation isn’t so clear, is it? There’s stuff with good voice acting out there (but often not with too much content), there are games strictly using text (and losing mass appeal because of it) and there are games that don’t use speech (neither text nor voice acting) at all – but that’s hardly a general solution.

    Where do I stand? Let’s take a quick “history lesson”:

    I come from a long gaming background and know the old systems that weren’t able to put out speech on their own, or only with a massive amount of software (one package being called “S.A.M. – Software Automated Mouth”) or hardware (you could buy add-on boards, even with good phoneme generators, but they weren’t used by many games – see Apple II or TI 99/4a). Some games like “Impossible Mission” didn’t use add-on hardware and were brilliant in the audio department, too. However, they only had a few sentences because voices cost too much memory. Most of the games had to fit into less than 64KB, after all…

    Then came the 16-bit computers (and consoles) with more computing power, more memory and (stereo) DAC sound hardware like the Amiga and later the Sound Blaster card for the PC. This revolutionized game sound – like many others I really wanted voice acting by then, and early games with it, like the later Space Quest entries, were really a lot of fun. They often were quality games, too, but that drove game development costs up massively.

    Nowadays, pretty much any commercial graphics adventure and RPG has voice acting – but what do I know?
    Whenever a game offers subtitles I activate them and read them quicker than the voice acting takes – and then I click the (mouse) button to advance in the story or be able to move again.

    In other words: Now that I have voice acting I don’t care much about it anymore. I want more freedom, I want to *play* the damn game, I want the interactivity back!

    Non-cinematic indie games (action/adventures, RPGs, even jump&runs) are what I play mostly now and I don’t care for speech anymore – but I’m old, in my mid-forties – LOL!
    What about the youth? The one that can’t even write correctly, as can be witnessed by reading web forums. Yes, web forums, I know…, but companies are complaining about the “mad comprehension skillz0rs” of their (not “there” ;)) job applicants, too. They blame the schools and rightfully claim that they can’t make up for what the young ones didn’t learn there.

    It seems that language skills deteriorate quickly and when I recently thought of that I had to read that the youngest generation can’t even speak complete sentences anymore!

    WTF!?! If they can’t even speak complete sentences, how long are the sentences they can read and understand? Three words?

    (IMHO it’s clear what the reasons are but that’s a different discussion)

    For whom am I designing my own game (which doesn’t feature voice acting – I promise! ;)) now? For old farts like myself? And here I thought I could stimulate young folks to think about history like the Assassin’s Creed series does (one of their undeniable virtues)…

    So whatever you think of voice acting – it may well be the *only* choice you have as a game developer/publisher in the future!

  • Xian said,

    One thing that drives me crazy is the repetitive phrases that even the minimally voiced games have. The first time going into an area in Might and Magic X a character would say “You are aware that there may be brigands in these parts”. The first time added color to the game, but I was grinding my teeth by the hundredth repetition. I am currently playing Risen 3 and nearly every other battle my current companion tells the opponent that he is going to make a hat out of his skin.

    Even if you are going to just voice small parts, at least mix it up some, not just a few stock phrases to be repeated over and over.

  • Anon said,

    I used to be an adventurer but then I took an arrow in the knee!

  • ogg said,

    As much as I like hammy acting, I think text serves an open game better. As said by those above, it is more flexible with fewer assumptions about the player or dev process. I play almost entirely open games, and voiced dialogue is too expensive for devs to put in much variety.

    Assuming some dialogue experience like webcomics, at least with text the guard could have mentioned his past a few dozen different ways for the same dev cost.

  • Keldryn said,

    @Steve H:

    Voice recording is usually done later in the production cycle. In the case of a fully-voiced RPG, recording has to start well before all of the content is finally locked-down, simply given the sheer volume of dialogue necessary. Recording sessions need to be scheduled well in advance — often long before the release date is determined.

    The in-game animations and cutscenes can’t be finalized until the recorded dialogue is in place. For example, in a game like Mass Effect or Dragon Age, the characters’ mouth movements need to be synchronized to the dialogue, plus gestures, body movements, and facial expressions need to be timed to the dialogue as well.

    Content isn’t usually completely locked down until developers get to the last couple of months of bug-fixing. Even then, some minor content changes will need to be made. Game design is an iterative process throughout the entire production, no matter how solid the initial design. No design document will survive the production cycle intact. In-house testing is done throughout the entire production, but the really extensive testing doesn’t generally happen until closer to the end, and that’s when things that the developers didn’t think of really start coming to light.

    As for my preferences, I like the “snippets” of voice acting, as in Baldur’s Gate and Planescape: Torment. It’s enough to convey some of the character’s personality without becoming restrictive.

    A fully-voiced conversation in a cutscene will often feel pretty natural. The pacing feels right, and there isn’t an overwhelming amount of information being conveyed.

    The fully-voiced interactive conversations usually end up feeling very stilted and unnatural. The pacing is weird, and it often feels like an interrogation or a lecture. The player asks a question, the NPC responds with 3 paragraphs. With the typical “tree” structure, the conversation jumps from one topic to another, and often the NPC will finish one branch of dialogue sounding sad or angry, only to immediately follow up with the upbeat-sounding “so, what can I do for you today?” root node.

    Characters also frequently look like they are having some sort of motor-control problems, as they are constantly waving their arms around, shifting from one foot to the other, nodding their heads, or pacing back and forth.

    In short, full voice acting in an RPG is very expensive and labor-intensive work that not only imposes a large degree of inflexibility upon the game, but also produces results that often look and sound unnatural. Plus I can read the dialogue five times over in the time it takes for it to be spoken. 🙂

  • Anon said,

    Good arguments about the pacing and the “mood problems” in interactive conversations!

    And you are spot on with the “motor-control problems”:
    I once had a game where the main character nearly constantly adjusted his glasses – it was unbearable about two hours into the game.

    Too many repetitions generally kill the mood, though.

    Example: When the main character in Uncharted 2 meets a yak(?) while in a Tibetan mountain village he casually says “Moo!”. With the high-quality voice acting by Nolan North this is really funny, simply because it’s somewhat unexpected in this game and it doesn’t interrupt the game play because no text window pops up etc.
    A gag like this only works well once and any repetition instantly kills it. If there are too many repetitions then a game like this would become “the game where the lead character always says Moo! to cows”…