A new app that makes just about any photo literally sing has become hugely popular on social media.
The tool, known as Wombo.AI, comes in the form of an iPhone or Android app. All a person needs to do to turn someone into a singer is download the app, choose their image or take a new one, pick a song from a limited list, and then let the app do the work.
The service is incredibly simple and allows anyone to participate in what appears to be a growing trend of deepfakes or synthetic media. Such tools use machine learning to spot the parts of a face that need to be animated, and move them in time with the music.
But the simplicity of the tool might also lead to concerns that the app is doing something more sinister with the images than simply turning their subjects into popstars. Earlier apps of this kind – those that make people look old, for instance, or apply other effects – have led to warnings about what is actually being done with the relatively detailed data that needs to be handed over.
Wombo is explicit that it will delete what it calls “facial feature data” after the images are created. While it may retain some other information, it promises to do so only to improve the app.
Apple recently forced all app developers to give detailed information on their data collection in their App Store listings. Those “nutrition cards” for Wombo make the same claims: that no personally identifying information is collected, and that other information is kept to a minimum.
Rather than making money through selling or using personal data, Wombo runs as a “freemium” service that pushes people to pay for a subscription to get its full range of features. The subscription costs £4.49 per month or £26.99 per year – with a free three-day trial – and gives faster processing and no ads.
Paying for the full subscription does not give access to any other songs. The selection is relatively limited, though most of the songs are very popular.
The app has already been used on just about everyone, from popular video game characters to chairs of the US Federal Reserve.
The success of the creations seems to depend on the quality of the image. Pictures that are more three-dimensional – rather than flat illustrations, for instance – seem to work better, and so do images where the subject is facing and looking at the camera.
Wombo also advises that pictures in which a person’s teeth can be seen work best. When the app is used on pictures where a person’s mouth is firmly closed, the videos can sometimes show stretched and uncanny singing.