Introducing VoiceFrom
We are building real-time speech-to-speech translation at the level of meaning, not words. From former Google audio AI leads Dominik Roblek and Hassan Rom.
We are building at the level of meaning, not words.
What to listen for
Translation technology has existed for decades. And for most of that time, it has solved the wrong problem.
The dominant measure of quality has been accuracy: does the output say the same things the speaker said? This is a reasonable starting point. But in live speech, it is not enough. A keynote speaker builds to a peak, slows for emphasis, lets a pause carry weight. A trainer’s warmth signals that a learner should feel safe asking a question. A negotiator’s measured calm communicates confidence that the words alone do not. When you strip all of that out and produce a flat, accurate transcript of what was said, you have not translated the communication. You have translated the surface of it.
This is the problem we are building to solve.
Why the pipeline matters
The reason most AI translation loses meaning is architectural. The standard pipeline converts speech to text, translates the text, and generates new audio from the translation. This approach is efficient, and it produces readable output. But it discards the audio signal at step one, and meaning lives in the audio. Prosody, rhythm, emphasis, emotional register: these are not recoverable from text. Any system that reduces speech to text before translating it cannot preserve what makes spoken communication distinct from written communication.
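To make the architectural point concrete, here is a toy sketch of a cascaded pipeline. It is illustrative only and is not VoiceFrom's code or any real system's: `SpokenSegment`, `transcribe`, `translate_text`, `synthesize`, the glossary lookup, and the default prosody values are all hypothetical stand-ins, and real speech recognition, translation, and synthesis are far more involved. The only thing it aims to show is that once step one returns plain text, nothing downstream can see how the words were spoken.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical neutral defaults a text-only synthesizer might fall back to.
DEFAULT_PITCH_HZ = 120.0
DEFAULT_PAUSE_S = 0.0
DEFAULT_EMPHASIS = 0.0


@dataclass(frozen=True)
class SpokenSegment:
    """Toy stand-in for a stretch of speech: words plus a few prosodic cues."""
    words: str
    pitch_hz: float       # average pitch of the segment
    pause_after_s: float  # silence the speaker leaves after it
    emphasis: float       # 0.0 (flat) to 1.0 (strongly stressed)


def transcribe(segment: SpokenSegment) -> str:
    # Step 1: speech to text. Only the words survive this step.
    return segment.words


def translate_text(text: str, glossary: dict[str, str]) -> str:
    # Step 2: text to text. A lookup table stands in for a translation model.
    return glossary.get(text, text)


def synthesize(text: str) -> SpokenSegment:
    # Step 3: text to speech. With no access to the source audio,
    # prosody can only come from defaults.
    return SpokenSegment(text, DEFAULT_PITCH_HZ, DEFAULT_PAUSE_S, DEFAULT_EMPHASIS)


def cascaded_translate(segment: SpokenSegment, glossary: dict[str, str]) -> SpokenSegment:
    return synthesize(translate_text(transcribe(segment), glossary))


glossary = {"this changes everything": "esto lo cambia todo"}

# The same words, delivered two very different ways.
emphatic = SpokenSegment("this changes everything", 180.0, 1.2, 0.9)
flat = SpokenSegment("this changes everything", 110.0, 0.1, 0.1)

# Both deliveries collapse to the same output once the audio is discarded.
print(cascaded_translate(emphatic, glossary) == cascaded_translate(flat, glossary))
```

In this sketch the emphatic and the flat delivery produce identical output, because the pause and the stress were dropped before translation began.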
We are building directly on the speech signal. This is harder. It requires solving the problem of preserving speaker characteristics (tone, pace, emphasis) across language boundaries in real time, with the latency constraints that live events demand. It is also, we believe, the only approach that can actually make the language barrier invisible, rather than just smaller.
Who we are
We are Dominik Roblek and Hassan Rom. Before VoiceFrom, we spent more than a decade at Google working on audio AI: the systems behind Google Meet, Google Assistant, Pixel Buds, and Waymo. We were inside the infrastructure that hundreds of millions of multilingual conversations depend on, and we watched the same meaning problem go unsolved. Solving it is why we started VoiceFrom.
VoiceFrom Pro today
VoiceFrom Pro is live. It delivers real-time speech-to-speech translation in the browser, with no hardware, no interpreter booths, and no AV setup, for conferences, enterprise events, and live communications in English, Spanish, French, German, Italian, and Portuguese. We were recognized in the Slator 2025 Language AI 50 Under 50.
This is the beginning, not the finished state. We are actively building toward broader language coverage, more robust handling of multi-speaker environments, and deeper integration with the workflows our customers depend on.
Why we’re writing
We are starting this blog because we want to share how we think about the hard problems in real-time speech translation: not just what VoiceFrom does, but why we built it the way we did. We’ll publish technical posts on model architecture and the decisions behind it, field observations from production deployments, and research we find genuinely useful. We’ll also be direct about what is hard, what we have not solved yet, and what we are working on next.
The language barrier is a solvable problem. We are solving it.
Dominik Roblek & Hassan Rom
Co-founders, VoiceFrom