Wei Yan's Work | Senior Product Designer in NYC

Coach Ana - AI-powered speech training

A sound-to-sound AI engine

During my time at Giving Tech Labs, a social impact tech incubator, I had the honor of working closely with Dr. Ying Li, a world-renowned data scientist recognized for her contributions to the practice of data mining. SuvoTek[1] is a specialized, privacy-first, speech-to-text-free voice technology she designed to help people communicate, especially by identifying the tone of speech. Our initial research showed that SuvoTek stood out to teachers and parents. This user group is most interested in understanding their tone when communicating with kids, and cares most about privacy.

I was excited to build a quick prototype with one of our engineers. The goal was to experiment and test how well the tech could solve their real problem.

Speech tone visualization

The first question was how to effectively visualize the emotion or tone of speech.

Plutchik's Wheel of Emotions

Surfacing specific emotions during a live training session would overwhelm users' cognitive load. The idea was to synthesize the emotions to a granularity that's easy to grasp. Blue --> cold --> calm and yellow --> warm --> high energy is a universal color concept understood by diverse user groups.

Color code to indicate tone

One thing to be careful about: both joy and anger can mean high energy. In the visual language, I wanted to stay objective and avoid labeling anything as negative or positive, so I chose a more objective measure of tone: "energy".
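To make the grouping concrete, here is a minimal sketch of how specific emotions could collapse into the two energy colors. Only the pairing of joy and anger as high energy comes from the text above; the placement of sadness and trust, and all names (`Energy`, `EMOTION_TO_ENERGY`, `tone_color`), are hypothetical and not taken from the actual prototype.

```python
from enum import Enum
from typing import Optional


class Energy(Enum):
    """Two tone buckets, each carrying its display color."""
    CALM = "blue"      # blue --> cold --> calm
    HIGH = "yellow"    # yellow --> warm --> high energy


# Hypothetical grouping. Joy and anger as high energy follows the text;
# where sadness and trust land is an assumption for illustration only.
EMOTION_TO_ENERGY = {
    "joy": Energy.HIGH,
    "anger": Energy.HIGH,
    "sadness": Energy.CALM,
    "trust": Energy.CALM,
}


def tone_color(emotion: str) -> Optional[str]:
    """Return the display color for an emotion, or None if it is unmapped."""
    energy = EMOTION_TO_ENERGY.get(emotion.lower())
    return energy.value if energy else None
```

With this mapping, joy and anger resolve to the same color, so the interface shows energy without judging the emotion as positive or negative.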

Summary screen after a training session

The summary dashboard offers an overview of how the emotion flows during the entire speech. It's designed to be glanceable.

Real-time speed monitoring

Speed is another key indicator of a speaker's cognitive clarity. Different contexts require different paces. A controlled, deliberate pace suggests that the speaker is in command of the material, and it helps the listener process information. This is especially critical in high-stress situations, such as for crisis line operators.

Underlying logic to set the desired target speed

The parameters that affect which target speed is suggested are audience, intention, and setting. I translated these into an onboarding experience that gives users options to start with.
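As a rough illustration of that logic, the sketch below derives a target pace from the three onboarding choices and classifies a live reading against it. Every number, option name, and function here (`BASE_PACE`, the adjustment tables, `target_pace`, `pace_status`) is a placeholder assumption, not a value or API from the project, and the pace unit is whatever the voice engine reports.

```python
from dataclasses import dataclass

# Placeholder values for illustration only; not figures from the project.
BASE_PACE = 150
AUDIENCE_ADJUST = {"kids": -20, "adults": 0}
INTENTION_ADJUST = {"calm": -15, "inform": 0, "energize": 10}
SETTING_ADJUST = {"one_on_one": 0, "classroom": -5, "crisis_line": -25}


@dataclass(frozen=True)
class Context:
    """The three choices a user makes during onboarding."""
    audience: str
    intention: str
    setting: str


def target_pace(ctx: Context) -> int:
    """Start from a base pace and apply one adjustment per parameter."""
    return (
        BASE_PACE
        + AUDIENCE_ADJUST[ctx.audience]
        + INTENTION_ADJUST[ctx.intention]
        + SETTING_ADJUST[ctx.setting]
    )


def pace_status(current: float, target: float, tolerance: float = 0.1) -> str:
    """Classify a live pace reading relative to the target band."""
    if current < target * (1 - tolerance):
        return "slow"
    if current > target * (1 + tolerance):
        return "fast"
    return "on target"


ctx = Context(audience="kids", intention="calm", setting="classroom")
print(target_pace(ctx), pace_status(130, target_pace(ctx)))
```

A simple additive model like this keeps each onboarding choice independently explainable to the user; a real implementation might weight or combine the parameters differently.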

The onboarding flow

The next challenge was to find a way to show real-time speed data on top of the speech tone.

Visual research into how speed is represented and felt

Speed info should be glanceable. I wanted to minimize the cognitive load needed for users to get feedback.

The speed wheel in active use

I landed on a speedometer. It mimics the real-life experience of checking the speed limit while driving to stay at the target speed, a mental model familiar to users. The goal was to keep the interface clean and minimal, with the main actions, "Pause" and "Stop", easily accessible.

Onboarding

I tested with 9 users to check flow clarity. 4 out of 9 were not 100% sure what everything meant, even though they guessed it right. To give users full confidence before they commit to the app, we rolled out a short onboarding.

The onboarding walkthrough

Reflection

We saw positive signals about the idea of the app. However, to make it useful, just showing speed and tone does not seem sufficient to win users' trust. More work is needed to get into the nuance of defining what high-quality speech is in different contexts. At the time, the voice technology wasn't mature enough to do so. I hope one day it will be. The mission of helping people with special needs improve their quality of life while protecting their privacy is something I want to stand behind, and I felt lucky to get a chance to explore it.

Acknowledgements

Special thanks to Kyle Coburn and Dr. Ying Li for collaborating on this.

[1] If you are a nerd and want to read more about it, here's the research paper: https://dl.acm.org/doi/epdf/10.1145/3394486.3403326
