A Netflix data scientist taught an A.I. to recognize smooching scenes in movies

A senior data scientist at Netflix has taught an artificial intelligence (A.I.) algorithm to recognize smooches. Amir Ziai developed the tool, which watches movies and picks out scenes in which the characters lock lips, as part of his work toward an A.I. graduate certificate from Stanford University.

He selected kissing scenes because, he told Digital Trends, they can be tough to detect using traditional video-processing techniques. That’s because machines get easily confused and produce false positives when seeing other scenarios in which two people have their heads close to one another — such as talking scenes or ones in which characters walk in close proximity.

“Training was done using a database of Hollywood movies spanning multiple decades and genres,” Ziai said. “I annotated kissing and non-kissing segments in 100 of these movies, and used those segments to train a multimodal neural network that uses both audio and visual features from 1-second segments. The major challenge with training these models is twofold. First, I had to make sure that I’ve annotated a representative set of training examples that’ll help with generalizing to a diverse set of movies. Second, training deep-learning models on video can be very resource intensive.”

Nonetheless, he pulled off the feat, and the resulting tool turns out to be impressively accurate. The system employs a two-phase process. First, a binary classifier predicts whether or not kissing is taking place in each segment, using features extracted from still frames and audio waves. The second component then aggregates the binary labels for “contiguous non-overlapping segments” into a set of kissing scenes. The final result achieves a validation F1 score (the harmonic mean of precision and recall) of 0.95 on a diverse database of movies.
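
To make the second phase concrete, here is a minimal Python sketch of how per-segment labels could be merged into scenes. It is an illustration only, not Ziai's code: the function name `aggregate_segments`, the `min_segments` threshold, and the choice to treat every unbroken run of positive labels as one scene are all assumptions. The only detail taken from his description is the 1-second segment length.

```python
from typing import List, Tuple


def aggregate_segments(
    labels: List[int],
    segment_seconds: float = 1.0,  # segment length quoted in the article
    min_segments: int = 1,         # hypothetical filter for very short runs
) -> List[Tuple[float, float]]:
    """Merge runs of consecutive positive labels into (start, end) times in seconds."""
    scenes: List[Tuple[float, float]] = []
    run_start = None  # index of the first segment in the current positive run

    for i, label in enumerate(labels):
        if label and run_start is None:
            # A positive run begins at this segment.
            run_start = i
        elif not label and run_start is not None:
            # The run ended at the previous segment; keep it if long enough.
            if i - run_start >= min_segments:
                scenes.append((run_start * segment_seconds, i * segment_seconds))
            run_start = None

    # Close a run that lasts until the final segment.
    if run_start is not None and len(labels) - run_start >= min_segments:
        scenes.append((run_start * segment_seconds, len(labels) * segment_seconds))

    return scenes


if __name__ == "__main__":
    # One label per 1-second segment, as a binary classifier might emit them.
    predictions = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
    print(aggregate_segments(predictions))
    print(aggregate_segments(predictions, min_segments=2))
```

Raising `min_segments` is one simple way to drop isolated one-second hits, which would otherwise show up as very short scenes of their own.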

Ziai, it should be noted, isn’t the only person interested in getting machine intelligence to recognize kisses. Recently, Google unveiled a new feature for its Photobooth tool that prompts Pixel smartphones to automatically take photos when they recognize that the subjects in a frame are kissing.

“Systems like the kissing detector can be used to automatically add metadata to movies,” Ziai said. “This metadata can be used to search and retrieve relevant snippets. For example, a video editor can use such metadata to quickly find relevant segments and to speed up the process of editing a movie.”

A paper describing the work, titled Detecting Kissing Scenes in a Database of Hollywood Films, is available to read on the preprint server arXiv.