The elevator pitch for Rillavoice goes something like this: “Rilla builds speech analytics for conversations between store associates and shoppers in physical stores. Associates clip on a mic connected to their phone and talk to customers like normal. Our AI listens and captures anonymized analytics from those conversations, like what the customer cares about and how their experience was. It’s like Google Analytics for offline commerce.”
Simple. But that’s on purpose.
It’s actually a bit complicated. And not for the reason you might think.
The part where we take all these conversations between associates and shoppers and use AI to turn them into numbers? That’s actually the easy part.
The hard part is capturing the conversations in the first place.
I know. You’re probably like “huh? What do you mean? Isn’t that just pressing record on the phone and calling it a day?” Well, yes, but not really. The reason? Gossip.
You see, when you have people working in a store for hours and hours on end, you’re bound to get them talking with each other about their private lives. What we refer to in the common vernacular as gossip.
And that’s a good thing. It’s a human thing. It’s only natural when you have people spending so much time with each other.
But this, this is the kind of thing that makes our engineers scratch their heads.
Because you see, we don’t want to process these conversations.
These are private conversations between employees that are not relevant at all for improving the business.
The point of our technology is to understand how to give your shoppers a better experience, not to spy on the private conversations between store associates.
So it poses a very peculiar technical challenge: how do we capture the relevant conversations between shoppers and store associates without processing the private conversations store associates have with each other?
Naturally, our first idea was just to tell the store associates to start and stop the recording any time they talked with a customer.
But associates are people, and people forget.
They don’t think about it all the time.
Especially when they’re busy stocking shelves, dealing with complaints, and trying to help customers.
The last thing they want on their mind is remembering to pull out their phone any time someone talks to them.
And what if a shopper just asks a quick question about where a particular product is? Is the store associate supposed to say “hold on, let me turn this thing on… can you repeat that again please?”
No, that’s annoying. That’s making the customer experience worse, not better. So we thought of something else.
“What if we just process all the audio, use Automatic Speech Recognition to transcribe it, and then use Natural Language Processing to find where the conversations with shoppers are, and then we throw out the rest?”
That’s more reasonable, but still not good enough.
Sometimes store associates talk with each other about products in the store. The NLP will mistake that for a conversation with a shopper, and some of their private conversations will still end up in the system. That’s no bueno.
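The “transcribe everything, then filter with NLP” idea can be sketched roughly like this. The function names here are hypothetical placeholders, not our actual pipeline:

```python
# Sketch of the "transcribe everything, then filter with NLP" idea.
# `transcribe` and `looks_like_shopper_conversation` are hypothetical
# stand-ins for an ASR model and a text classifier.

def filter_with_nlp_only(audio_segments, transcribe, looks_like_shopper_conversation):
    """Keep only segments whose transcript reads like a shopper conversation."""
    kept = []
    for segment in audio_segments:
        text = transcribe(segment)
        if looks_like_shopper_conversation(text):
            kept.append(text)
    return kept

# The flaw: two associates discussing a product *sounds* like a shopper
# conversation to a text classifier, so their private chat slips through.
```

Notice that the filter only ever sees text, so it has no way to tell who is talking, which is exactly why this approach isn’t good enough on its own.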
What about using wake words like Alexa? Instead of listening for “Hey Alexa”, the app is only activated when the store associate says “Hey welcome” or “Do you need any help?”
That’s much better, but still not good enough.
Conversations with shoppers don’t always start the same, so you’re bound to lose many important conversations.
Then there’s also the risk that the store associate mentions a wake word when they’re talking to their colleagues and the app starts recording by accident. No bueno.
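A wake-word gate like that might look something like the sketch below. The phrase list is illustrative, not a real production config:

```python
# Sketch of the wake-word idea: only pass audio through once a greeting
# phrase is heard. WAKE_PHRASES is illustrative, not a real config.
WAKE_PHRASES = ("hey welcome", "do you need any help")

def wake_word_gate(utterances):
    """Yield utterances only after a wake phrase has been heard."""
    recording = False
    for utterance in utterances:
        if not recording and any(p in utterance.lower() for p in WAKE_PHRASES):
            recording = True
        if recording:
            yield utterance
```

Both failure modes from above show up immediately: a shopper who opens with “excuse me, where are the shoes?” never trips the gate, and an associate quoting the phrase to a colleague trips it by accident.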
That’s when it came to us.
The third vector of the Anti-Gossip Algorithm: voice identification.
One of the things we’ve always been really good at is identifying individual people’s voices.
Like we’re really, really good at that.
So good that one time I tried to trick our AI by doing my Scooby Doo impression and it could still figure out it was me.
You might be thinking that’s just because I’m bad at impressions (which I kinda am), but my Scooby Doo impression is actually flawless.
So the AI is actually good.
So we thought: “Why don’t we just use our fantastic voice ID model for this?” And that’s what we did.
Any time a store associate signs on to the app for the first time they have to speak for like a minute. That gives us their voice print.
At the same time, when store associates talk with shoppers, we can detect whenever a unique, unenrolled voice print shows up.
So we have identifiable voice prints (store associates) and unidentifiable ones (shoppers).
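The usual way to do this kind of matching is to turn each voice into an embedding vector and compare by cosine similarity. Here’s a toy sketch of that idea; a real system would use a trained speaker-embedding model, and the threshold here is arbitrary:

```python
# Toy sketch of voice-print matching. A real system would get these
# vectors from a speaker-embedding model; here they are plain lists
# and the 0.85 threshold is an arbitrary illustration.
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_enrolled(embedding, enrolled_prints, threshold=0.85):
    """True if this voice matches any enrolled store associate's print."""
    return any(cosine(embedding, p) >= threshold for p in enrolled_prints)
```

The minute of speech at sign-up is what produces each associate’s enrolled print; any voice that doesn’t match one is, by elimination, a shopper.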
Once we’ve signed up all the store staff into the app, it becomes a very simple equation.
“If unidentifiable voice print not in conversation, throw out conversation.”
Any time store associates are speaking to each other without a shopper in the conversation, it will be thrown out.
Using this as the foundation, plus Natural Language Processing to make sure the conversations are relevant to customer experience, is a surefire way to prevent private conversations between store associates from ever being processed, stored, or analyzed by our system.
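Putting the two layers together, the logic can be sketched like this. `is_relevant` stands in for the NLP relevance check, and the `None` sentinel for an unidentifiable (shopper) voice is just an illustration:

```python
# Sketch of the two-layer filter: first the voice-ID rule ("if
# unidentifiable voice print not in conversation, throw out
# conversation"), then an NLP relevance check. `is_relevant` is a
# hypothetical placeholder for the NLP layer.

def keep_conversation(speaker_ids, enrolled, transcript, is_relevant):
    """speaker_ids: voice-print IDs detected in the conversation;
    an unidentified voice carries the sentinel ID None (a shopper)."""
    has_shopper = any(sid is None or sid not in enrolled for sid in speaker_ids)
    if not has_shopper:
        return False  # associates only: thrown out, never processed
    return is_relevant(transcript)
```

The order matters: the voice-ID check runs first, so associate-only conversations are discarded before any language analysis happens at all.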
In practical terms, this means store associates can start their shift, turn on the Rillavoice mobile app, and go about their day without having to worry about turning the app off at any point.
Their private conversations with each other will never be captured, processed or stored.
This is not to say that store associates can’t turn off the app. They absolutely can.
It’s a one button app. You start, you stop.
If a store associate goes on a break, they click stop.
If they go to the bathroom, they click stop.
If they forget to click stop and leave it on by accident, it won’t matter.
Our two layers of protection (voice identification + natural language processing) will prevent their private conversations with anyone who’s not a shopper from being processed or analyzed by our system.