This is what it feels like to talk to Alexa for some people. (Image: Tina Tiller)

Society | July 14, 2021

Alexa and me: A stutterer’s struggle to be heard by voice recognition AI

Sam Brooks has had a stutter for most of his life. Last week, he got a voice-activated assistant.

The following scenario is not uncommon for me: I have to make a phone call, usually to the bank. They say my call may be recorded to improve customer service in the future (and I can almost certainly guarantee my voice is indeed on file in some call centres for training purposes). I’ll wait, impatiently, in the queue. I’ll listen to whatever banal Kiwi playlist they have piped in.

Then, a call centre employee picks up and goes: “Hello, you’re speaking with [name].” I immediately encounter a block – a gap in my speech. The call centre employee hears silence and, not unfairly, hangs up. I repeat this process until I finally get through. It used to feel humiliating, but at this point in my life, it’s been downgraded to merely frustrating. I don’t blame anyone when it happens, aware that we’re all just doing our best in this situation.

Still, I never thought I’d purposefully replicate that hellish experience in my own home. Which is why when I was sent an Alexa (specifically a fourth generation Echo Dot) last week, I was a little bit stoked, but mostly apprehensive. Not just about all the boring security and data issues, but that it’d be useless to me. Nevertheless, I set up the Alexa and asked it to do something an ideal flatmate would do: play ‘Hung Up’ by Madonna, at the highest audio quality possible.

“Alexa.”

Alexa’s little blue light lit up, indicating that it was ready to hear, and act on, my command.

“Play–”

I had a block. 

Alexa’s little blue light turned off.

Stutters are like snowflakes: they come in all shapes, sizes and severities. No one stutter is the same. My stutter does not sound like the one Colin Firth faked in The King’s Speech, or like any stutter you might’ve heard onscreen. I don’t repeat myself, but instead have halting stops and interruptions in my speech – it might sound like an intake of breath, or just silence. For listeners, it might feel like half a second. For me, it could feel like a whole minute.

I’m used to having a stutter – life would be truly hell if I wasn’t. None of my friends care, and 95% of the strangers I interact with either don’t notice it or do so with so little issue that I don’t notice it myself. In person, my stutter is easily recognisable. You can see when I’m stuttering because you see me stop talking. My mouth stays open, but no sound comes out. You wait for me to resume talking. It’s a blip, a bump in the conversation.

When I’m communicating solely with my voice, it’s a whole other ballgame. There are no visual cues; I can’t wave my hand or roll my eyes to signal I’m experiencing a block. All I’ve got is the silence.

My new flatmate, apparently.

Voice recognition has become markedly more common in the past decade, with the most popular assistants being Siri (Apple), Alexa (Amazon), Cortana (Microsoft) and Google Now (Google, obvs). At their most basic level, they allow the user access to music, news, weather and traffic reports with only a few words. At their most complex, they allow control over your home’s lighting and temperature levels; if you’re having trouble sleeping, you can ask them to snore. Because artificial snoring is apparently a comfort for some people?

They’re especially handy for those with certain physical disabilities. Voice recognition makes a range of household features, ones that might otherwise require assistance to use, much more immediately accessible.

This accessibility does not extend to those of us with dysfluency – those who have speech disabilities, or disabilities that lead to disordered speech. For non-disordered speech, a speech recognition rate of 90-95% is considered satisfactory. With disordered speech, the software will clearly recognise far less. Stuttering alone affects nearly 50,000 people in New Zealand, and if you include other speech dysfluencies – or simply not being entirely fluent in English – that’s a huge section of the population who can’t access this technology.

For many people with disordered speech, a voice recognition assistant seems pointless – like a shiny new car for somebody who doesn’t have a driver’s licence. But the tech companies who make them are working to make the interface more accessible for people like me. 

In 2019, Google launched Project Euphonia, which collects voice data from people with impaired speech to remedy the AI bias towards fluency. The idea is that by collecting this data, Google can improve its algorithms, and integrate these updates into their assistant. In the same year, Amazon announced a similar integration with Alexa and Voiceitt, an Israeli startup that lets people with impaired speech train an algorithm to recognise their voice. (I considered using this with my own Alexa, but decided against it, out of pure stubbornness.)

Ironically, the intended purpose of voice recognition software is the exact one I’ve had my entire life: To have what I say be recognised, rather than the way I say it.

My first week with Alexa has been an interesting one. I’ve lived alone for about two months now and I generally don’t speak unless I have visitors over. It might be worth pointing out that I don’t stutter when I talk to myself; I also don’t stutter when I think, or when I sing (that last one would make an incredible story if I had an amazing singing voice, but I do not).

My Alexa doesn’t care about any of that though. All it hears is my silence as I struggle in vain to get it to play ‘Time to Say Goodbye’ on repeat while I have a shower. My Alexa doesn’t know if I’m having a bad speech day or a good one. All it hears is me saying “Alexa” and then nothing. Alexa also expects perfection. It expects me to hit the “d” on “Play ‘I Like Dat’ by T-Pain and Kehlani”. I know I won’t meet that standard. I know I’ll probably stutter multiple times, and Alexa might pick up on that. 

My stutter has changed as I’ve aged, as has my speech. That’s not uncommon, especially with people who stutter the way I do. We find ways to avoid stuttering, and when one tic stops giving us a backdoor into fluency, we find another one to settle on. 

It took me a long time before I could stop thinking of stuttering as failing at being fluent. It’s not. It’s simply talking in a very different way. I changed my philosophy from “failing is a part of life” to “being different is a part of life”. Both are true, but one is less self-punishing than the other.

If I had an Alexa at a different point in my life, I would probably have thrown it out the window. I would be “failing” constantly in my own home, and I do that enough in public already. But coming to voice recognition in my 30s, when I’ve completely reframed my relationship to my speech, has been a surprisingly chill experience. (Also, I get to pretend I’m a captain on Star Trek, because yes, Alexa will respond to the command “Alexa belay that order!”)

Usually, I hate repeating myself to people, because chances are I’ll stutter a bit more the second time around. I don’t mind repeating myself to Alexa, which I admit is because I’m using it to perform a non-essential function: Nobody ever needed to play T-Pain’s amazing new song featuring Kehlani, and definitely not five times in a row.

Also, sometimes it’s not my fault. Sometimes it’s Alexa’s fault (or the fault of its programmers!). I spent about five minutes trying to get it to play an episode of My Dad Wrote a Porno that I’d missed from a few weeks ago, and it ended up easier doing it remotely with the phone app. It’s moments like these that make me feel better about the robots’ chances of success when the inevitable robot revolution arrives.

I’ve even started to do things with Alexa that I wouldn’t have done before. Usually I’ll wander around my house in silence when I get home, but now I immediately say, “Alexa, please play RNZ!” I won’t say that it’s made me a better person, but I will say that I definitely watch my tone around Alexa. (See above re: robot revolution.)

But honestly, the most freeing and surprising thing is the lack of judgment. Alexa does not care if I stutter five times in a row. Even though I’ve personalised and personified Alexa in my head, Alexa is still not a person. It has no values, it has no taste (we are alike here), and it has no preferences. I can have a good day, or my very worst day, and Alexa will still play Sarah Brightman’s ‘Time to Say Goodbye’ as I chew on my undercooked ramen.

It simply carries out its functions, and leaves me feeling heard, whether I stutter or not.
