The Role of AI Moderation in Building Safer Social Platforms
On the 11th of May 2026, Generation AI hosted a webinar by Natalie Boll & Connor Bryan from Tribela. Dr Anthony Bridgen reflects on this.
LLM developers espouse an almost endless list of use-cases for their products, whilst researchers and policymakers rightly point out the risks of their deployment for young people. But is there a space for using LLMs to safeguard young people from harmful and inappropriate content? Tribela, a new social media platform prioritising safety over algorithmic engagement, makes a strong case.
A key challenge in building safe, age-appropriate social platforms is moderating content accurately, at scale. As the volume of user-generated content grows, relying solely on human review becomes both impractical and ethically complex, particularly given reviewers' exposure to harmful and traumatic material.
We were fortunate to be joined by Natalie Boll (CEO and Founder) and Connor Bryan (Head of Engineering) of Tribela, who explained how they have leveraged AI and Machine Learning to address this challenge. Their system has not only improved the scale and efficiency of moderation, but also helped them define and respond to different thresholds of harm.
Typical social media algorithms are designed to be reactive, activating only when content is reported, at which point harm has already occurred. With 81% of 10 to 12 year olds on social media and 70% of UK youth having seen harmful content on these platforms, this represents an enormous safeguarding failure. While many countries are discussing banning social media entirely for certain age groups in order to prevent these harms, the data on the efficacy of this approach is mixed. Tribela seeks to moderate upstream to mitigate harms before they happen, whilst preserving the social aspects of social media.
Achieving effective AI moderation is not without its challenges: models are not deterministic, and they require continual training to understand context, prevent spoofing, and keep up with fast-evolving trends and slang. A cooking video showing someone using a knife is very different from footage of a knife fight, but they share characteristics. Distinguishing these requires that AI moderators be able to interpret a huge amount of mixed-media content, spanning video, audio and text. Analysing behavioural indicators and metadata can be key to understanding the context of a piece of content, reaching a conclusion about it and assigning a confidence level to that conclusion. It is then up to platforms to decide on the confidence threshold they’re willing to accept and when to refer content to human reviewers. AI moderation is a valuable tool for dealing with the scale of content, but it is vital that humans remain in the loop to deal with edge cases and appeals.
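To make the idea of a confidence threshold concrete, here is a minimal sketch of how a platform might route content based on a model's score. It is an illustration only and not a description of Tribela's system: the names `Decision`, `Thresholds` and `route`, and the threshold values of 0.20 and 0.90, are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    HUMAN_REVIEW = "human_review"
    BLOCK = "block"


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; a real platform would tune these, likely per harm category.
    allow_below: float = 0.20
    block_at_or_above: float = 0.90


def route(harm_confidence: float, thresholds: Thresholds = Thresholds()) -> Decision:
    """Map a model's harm-confidence score in [0, 1] to a moderation action."""
    if not 0.0 <= harm_confidence <= 1.0:
        raise ValueError("harm_confidence must be between 0 and 1")
    if harm_confidence < thresholds.allow_below:
        return Decision.ALLOW
    if harm_confidence >= thresholds.block_at_or_above:
        return Decision.BLOCK
    # Everything in between is an edge case: keep a human in the loop.
    return Decision.HUMAN_REVIEW
```

Under these assumed thresholds, a cooking video scored at 0.05 would be allowed, footage scored at 0.97 would be blocked, and an ambiguous clip scored at 0.55 would go to a human reviewer. Narrowing or widening the middle band is exactly the value judgement a platform has to make: a wider band means more human review and fewer automated mistakes, at the cost of scale.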
Building these systems requires making value judgements, raising the question of who decides what is ‘appropriate’, something which differs across time, socioeconomic background and culture. How do we safeguard whilst still enabling critical discussion and debate? What do we want social media platforms to be for? Addressing these questions requires effective governance, system transparency and ongoing dialogue with users to understand how moderation is experienced.
Whether AI or human, moderation systems shape digital spaces. In deploying them, we must actively consider the kind of system we want to optimise towards.
If you are interested in hearing about future seminars, reach out to global.challenges@reuben.ox.ac.uk