A soft-faced humanoid robot has learned to move its lips in sync with speech and song simply by watching people talk on video and studying its own reflection in a mirror.
The machine, called EMO, was built at Columbia Engineering and is at the center of a new study showing that robots can pick up complex speech-related gestures through observation instead of hand-written code.
The work, which appears in the journal Science Robotics, points to a future where robot conversations feel far less stiff and cartoonish than they do today.
Why faces and lips matter in conversation
If you have ever found yourself staring at someone’s mouth while they speak, you are not alone. Eye-tracking studies suggest humans devote a notable share of their attention to the lips and lower face during conversation, which is one reason clumsy mouth motion makes many robots feel unsettling. EMO tries to solve that.
Its silicone face is driven by 26 tiny motors that can pull and push the lips with fine control, more like human muscle than the rigid jaws seen on many social robots.
How EMO trained itself using a mirror and YouTube
Training started with a kind of robotic mirror play. Engineers sat EMO in front of a reflective surface and let it fire off thousands of random expressions while a vision-to-action model learned how different motor patterns produced different mouth shapes.
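For readers curious what that self-modeling stage might look like in code, here is a minimal sketch in Python (PyTorch). It is an illustration, not the study's method: the robot and camera interfaces, the landmark count, and the network sizes are all assumptions. Only the figure of 26 motors comes from the article.

```python
# Minimal sketch of the "mirror" self-modeling stage (interface names,
# dimensions, and hyperparameters are assumptions, not study details).
# Idea: the robot issues random motor commands in front of a mirror, records
# the mouth shape each command produces, then learns an inverse model from
# observed shape back to motor command.

import torch
import torch.nn as nn

N_MOTORS = 26          # the article says 26 motors drive EMO's face
N_LANDMARKS = 2 * 20   # assumed: 20 tracked 2-D lip landmarks

class InverseFaceModel(nn.Module):
    """Maps an observed mouth shape to motor commands that reproduce it."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_LANDMARKS, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, N_MOTORS), nn.Tanh(),  # commands scaled to [-1, 1]
        )

    def forward(self, landmarks):
        return self.net(landmarks)

def collect_mirror_data(robot, camera, n_samples=10_000):
    """Random 'motor babbling': pair each command with the shape it produced."""
    commands, shapes = [], []
    for _ in range(n_samples):
        cmd = torch.rand(N_MOTORS) * 2 - 1     # random expression
        robot.apply(cmd)                       # hypothetical robot API
        shapes.append(camera.lip_landmarks())  # hypothetical vision API
        commands.append(cmd)
    return torch.stack(shapes), torch.stack(commands)

def train_inverse_model(shapes, commands, epochs=50):
    model = InverseFaceModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(model(shapes), commands)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```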
Once the system understood its own face, the team fed it hours of talking and singing clips from YouTube. By matching the sounds it heard with the lip positions it saw, the robot gradually learned to turn raw audio into the right sequence of facial movements across ten different languages.
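One way to frame that second stage is as a sequence-to-sequence problem: audio features in, lip shapes out, with the mirror-learned inverse model closing the loop to motor commands. The sketch below assumes mel-spectrogram inputs and a small GRU; none of these specifics come from the paper.

```python
# Sketch of the audio-to-lips stage (architecture, feature sizes, and helper
# names are assumptions). Training pairs would come from video: audio frames
# aligned with lip landmarks detected in the same clip.

import torch
import torch.nn as nn

N_MEL = 80             # assumed mel-spectrogram bins per audio frame
N_LANDMARKS = 2 * 20   # same lip-landmark layout as the mirror stage

class AudioToLips(nn.Module):
    """Predicts a lip-shape sequence from an audio-feature sequence."""
    def __init__(self, hidden=256):
        super().__init__()
        self.gru = nn.GRU(N_MEL, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, N_LANDMARKS)

    def forward(self, mel):          # mel: (batch, frames, N_MEL)
        out, _ = self.gru(mel)
        return self.head(out)        # (batch, frames, N_LANDMARKS)

def speak(audio_model, inverse_model, mel_frames, robot):
    """Turn one utterance's audio features into motor commands, frame by frame."""
    with torch.no_grad():
        lips = audio_model(mel_frames.unsqueeze(0))[0]  # (frames, N_LANDMARKS)
        for shape in lips:
            robot.apply(inverse_model(shape))           # hypothetical robot API
```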
What people noticed in tests
To find out whether people actually bought the effect, the researchers showed videos of EMO speaking to more than one thousand volunteers.
Viewers compared three different control methods against a reference of ideal lip motion and chose the new vision-to-action approach in roughly sixty-two percent of trials, far ahead of the simpler baselines that only tracked loudness or copied past examples.
Hard consonants such as B and sounds that require lip puckering still trip the system up, but the team expects performance to improve as EMO keeps “listening” and practicing.
Why this could change human robot interaction
The bigger story, though, is what happens when this kind of realistic face is paired with conversational artificial intelligence.
Lead author Yuhang Hu notes that combining fluent lip syncing with modern dialogue models could make exchanges with robots feel more like talking to another person than to a machine, especially in settings such as classrooms, hospitals, or elder care homes where empathy and trust matter.
That possibility cuts both ways. Study supervisor Hod Lipson has warned that robots which smile and speak convincingly will be powerful tools and should be developed slowly and carefully so they help people without misleading them.
If billions of humanoid machines are coming, as some economists suggest, then teaching them to “use their face” responsibly may matter as much as teaching them to walk.
The study was published in Science Robotics.