Mantra Mirror

Mantra Mirror is a design exercise: an attempt to create an engaging mobile experience within the limits of image object recognition and text-to-speech technology.
Date: October 2017
Role: UI/UX Designer, Web Developer
Members: Wangshu Sun

Smile at the camera, and you’ll hear a random quote like this:


When you really don’t feel like smiling, it will try to comfort you:


Photo credit: Torrey Wiley

Voice source: Amazon Polly, Justin.



Given image object recognition and text-to-speech technology, how can I create an interesting mobile experience around these capabilities?

Within one week, I brainstormed ways to combine their strengths and appeal to most users, then built a functional prototype with HTML/CSS/JavaScript, the Microsoft Emotion API, and Amazon Polly.





It would feel unnatural if the robot just kept saying “Your smile is beautiful.”

  • Solution: quote famous sayings from history:


What if the user just cannot smile at that moment?

  • Instead of punishing someone who doesn’t smile, I think it is better to show some empathy.
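
The two cases above (a quote for a smile, a comforting line otherwise) can be sketched as a small selection function. This is a minimal sketch, not the prototype's actual code: it assumes the emotion service yields a happiness score between 0 and 1, and the threshold, the names `pickLine` and `pickRandom`, and the placeholder lines are all hypothetical.

```javascript
// Hypothetical line pools; the prototype's real quotes are not reproduced here.
const SMILE_LINES = ["<famous saying 1>", "<famous saying 2>"];
const COMFORT_LINES = ["<comforting line 1>", "<comforting line 2>"];

// Assumed cutoff: a happiness score at or above this counts as a smile.
const SMILE_THRESHOLD = 0.5;

// Pick a random element; `rand` is injectable so the choice can be tested.
function pickRandom(items, rand = Math.random) {
  return items[Math.floor(rand() * items.length)];
}

// Choose the text to hand to text-to-speech from a 0..1 happiness score.
function pickLine(happiness, rand = Math.random) {
  const pool = happiness >= SMILE_THRESHOLD ? SMILE_LINES : COMFORT_LINES;
  return pickRandom(pool, rand);
}
```

Drawing from a pool rather than repeating one sentence is what keeps the response from feeling robotic, and the non-smile branch returns comfort instead of an error.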


Why use text-to-speech instead of a pre-recorded voice?

Text-to-speech can say anything you want, and say it a thousand times without effort.


Video vs image

In the end, I chose uploading a single image over continuously monitoring the video stream.

Reason: cost.

  • Time of API response: ~2s.
  • Google Cloud Vision: $1 per 1000 units.
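
To make the cost argument concrete, here is a rough back-of-the-envelope sketch using the two figures above. It assumes that a video stream would send a new frame as soon as the previous response arrives (one call every ~2 s) and that each frame is billed as one unit; the function names are hypothetical and the numbers are not measurements from the prototype.

```javascript
// Figures quoted above.
const PRICE_PER_UNIT_USD = 1 / 1000; // $1 per 1000 units
const RESPONSE_TIME_S = 2;           // ~2 s per API response

// Cost of watching a video stream for `durationSeconds`,
// assuming one billed API call per response interval.
function streamCostUsd(durationSeconds) {
  const frames = Math.floor(durationSeconds / RESPONSE_TIME_S);
  return frames * PRICE_PER_UNIT_USD;
}

// Cost of a single uploaded image: exactly one billed unit.
function singleImageCostUsd() {
  return PRICE_PER_UNIT_USD;
}

console.log(`1 min of streaming: $${streamCostUsd(60).toFixed(3)}`);
console.log(`1 uploaded image:   $${singleImageCostUsd().toFixed(3)}`);
```

Under these assumptions, one minute of streaming makes 30 calls, about $0.03, versus $0.001 for a single upload, and the ~2 s latency means the stream would not feel real-time anyway.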


UI design



Based on


Live Prototype

Live demo:

Sample voice when you smile (Amazon Polly, Justin):

Sample voice when you show a sad face:

Further thoughts:

  • Monetization: the mental health industry.
  • Emotion diary: log entries such as “Oct. 2017, smile.”
  • Social? The experience is rather private for now, but it could be made into a party game, like “try not to laugh first.”

Unused Ideas: Language

Theme: Language

An app that helps you speak a foreign language by inputting images.

Foreign Language

  • Image -> text: something you cannot see.

Pronunciation of foreign language

  • Text -> speech: something you cannot say.


Reasons for not using it:

  • Similar apps already exist, such as OCR tools and Google Translate.
  • OCR works best on scanned images of printed text, which is not very convenient on a mobile phone.


Unused Ideas: Extended Vision

Theme: Extended Vision for Driving

An app that alerts you to cars approaching from behind while you drive.

Something behind you

  • Image -> text: things behind you are hard to see. Even though cars have rearview mirrors, drivers have to keep checking them while also looking ahead, so the cognitive load is high.

Voice assistance

  • Text -> speech: information on the phone is also hard to see while driving, so it is best delivered by voice.


Reasons for not using it:

  • Why use a mobile phone in a driving context at all?
  • It is doubtful that the technology is mature enough to yield distance information from the phone's single camera. 2D object recognition is known to be less precise than 3D object recognition. If the instructions given are not helpful, they become additional distractions and could lead to disaster.