Children’s Privacy at Risk as AI Trained on Unauthorized Photos

A recent report by Human Rights Watch (HRW) has revealed that photos of Brazilian children, some spanning their entire childhood, have been used without their consent to train artificial intelligence (AI) tools, such as popular image generators like Stable Diffusion. This practice poses significant privacy risks to children and increases the likelihood of non-consensual AI-generated images bearing their likeness.

HRW researcher Hye Jung Han uncovered the issue by analyzing LAION-5B, a dataset of links to 5.85 billion images and their accompanying captions posted online since 2008. Within this dataset, Han found 170 photos of children from various Brazilian states, sourced from family photos on personal and parenting blogs, as well as stills from low-view YouTube videos.

The German nonprofit organization LAION, responsible for creating the dataset, has collaborated with HRW to remove links to the children’s images. However, the report warns that this action may not fully address the problem, as there could be a significant amount of children’s personal data still present in the dataset. Additionally, the images remain on the public web, where they can be accessed and utilized in other AI datasets.

HRW’s analysis found that many of the Brazilian children’s identities were easily identifiable due to names and locations included in image captions. This raises concerns about the potential misuse of these images, especially in the context of generating deepfake content targeting children.

LAION has since taken down all publicly available versions of LAION-5B as a precautionary measure while it works to remove any illegal content from the dataset before republishing it. The risks are not hypothetical: in Brazil, AI tools have reportedly been used to create explicit deepfakes of girls based on photos from their social media profiles, illustrating the lasting harm such misuse can cause.

HRW emphasized the need for government policies to safeguard children’s data from misuse by AI technologies. It is crucial to address these privacy concerns and protect children from potential exploitation in the digital age.

As the conversation around AI ethics and data privacy continues, the well-being and safety of vulnerable populations must come first, particularly children whose images are being used without their consent. Regulating and monitoring the use of AI in sensitive contexts like this is essential to prevent further harm and uphold ethical standards in technology development.

Hye Jung Han is a researcher at Human Rights Watch (HRW) who played a key role in uncovering the unauthorized use of children’s photos in AI training datasets. With a background in digital rights advocacy and data privacy, Han focuses on protecting vulnerable populations from the potential harms of emerging technologies. She holds a degree in computer science and has extensive experience analyzing datasets for evidence of human rights violations. Han is committed to raising awareness of the ethical implications of AI and advocating for stronger regulations to protect individuals’ privacy rights.