Multi-task learning model improves hate speech identification

Right-wing political figures are fueling online hatred

Different approaches to addressing labeling biases in hate speech datasets. The traditional machine learning approach enlarges the training dataset by adding more labeled rows under the same label definition, which further entrenches the biases in those labeling criteria. The new multi-task learning approach instead brings multiple datasets and label definitions into the training pipeline, yielding a more general representation. Credit: Computer Speech & Language (2024). DOI: 10.1016/j.csl.2024.101690

Researchers have developed a way to detect hate speech on social media platforms more accurately and consistently, using a new multi-task learning (MTL) model, a type of machine learning model that is trained on multiple datasets at once.

The spread of hateful and offensive speech online can deepen political divisions, marginalize vulnerable groups, weaken democracy and cause real-world harm, including an increased risk of domestic terrorism.

Associate Professor Marian-Andrei Rizoiu, Head of the Behavioral Data Science Lab at the University of Technology Sydney (UTS), works on the front line of the fight against online disinformation and hate speech. His interdisciplinary research combines computer science and social science to better understand and predict human attention in online environments, including the kinds of speech that influence and polarize opinion on digital channels.

“As social media becomes an important part of our daily lives, automatic identification of hateful and offensive content is crucial to combat the spread of harmful content and prevent its harmful consequences,” said Associate Professor Rizoiu.

“Designing effective automatic detection of hate speech is a major challenge. Current models are not very effective at identifying all the different types of hate speech, including racism, sexism, harassment, incitement to violence and extremism.

“This is because current models are trained on only part of one dataset and tested on the same dataset. This means that when confronted with new or different data, they can struggle and perform inconsistently.”

Associate Professor Rizoiu outlines the new model in the paper ‘Generalizing Hate Speech Detection Using Multi-Task Learning: A Case Study of Political Public Figures’, published in Computer Speech & Language with co-author and UTS Ph.D. candidate Lanqin Yuan.

A multi-task learning model can perform multiple tasks simultaneously and share information between datasets. In this case, it was trained on eight hate speech datasets from platforms including Twitter (now X), Reddit, Gab and the neo-Nazi forum Stormfront.
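The study's own code is not reproduced here, but the core idea described above can be sketched as a shared text encoder with one classification head per dataset, so that every corpus contributes gradients to the shared layers. The sketch below is only illustrative: the encoder choice (a GRU), the dataset names and the label counts are assumptions, not details taken from the paper.

```python
# Minimal multi-task learning sketch (illustrative, not the published architecture):
# a shared encoder learns one representation, and each dataset has its own head.
import torch
import torch.nn as nn

class MultiTaskHateSpeechClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, task_num_classes):
        super().__init__()
        # Shared layers, updated by batches from every dataset.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # One task-specific head per dataset, since each corpus has its own labels.
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, n_classes)
            for task, n_classes in task_num_classes.items()
        })

    def forward(self, token_ids, task):
        embedded = self.embedding(token_ids)
        _, hidden = self.encoder(embedded)           # hidden: (1, batch, hidden_dim)
        return self.heads[task](hidden.squeeze(0))   # logits for the requested task


# Hypothetical label spaces for a few of the training corpora.
tasks = {"twitter": 3, "reddit": 2, "gab": 2, "stormfront": 2}
model = MultiTaskHateSpeechClassifier(vocab_size=30_000, embed_dim=128,
                                      hidden_dim=256, task_num_classes=tasks)

# During training, batches from different datasets are interleaved; each batch
# updates its own head plus the shared encoder, which encourages generalization.
dummy_batch = torch.randint(0, 30_000, (8, 40))     # 8 tokenized texts, 40 tokens each
logits = model(dummy_batch, task="twitter")
print(logits.shape)                                  # torch.Size([8, 3])
```

Because the shared encoder sees many label definitions rather than one, the representation it learns is less tied to any single dataset's annotation quirks, which is what lets the trained model be applied to unseen data.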

The MTL model was then tested on a unique dataset of 300,000 tweets from 15 American public figures, including former presidents, conservative politicians, far-right conspiracy theorists, media pundits and progressive left-wing representatives.

The analysis found that offensive and hate-filled tweets, often containing misogyny and Islamophobia, came mainly from right-wing individuals. Specifically, of the 5,299 offensive posts, 5,093 were generated by right-leaning figures.

“Hate speech as a concept is not easy to quantify. It lies on a continuum with offensive speech and other harmful content such as bullying and harassment,” Rizoiu said.

The United Nations defines hate speech as “any form of communication, whether speech, writing or behavior, that attacks or uses pejorative or discriminatory language relating to a person or a group on the basis of who they are,” including their religion, race, gender or other identity factor.

The MTL model was able to separate abuse from hate speech and identify certain topics, including Islam, women, ethnicity and immigrants.

More information:
Lanqin Yuan et al, Generalizing hate speech detection using multi-task learning: A case study of political public figures, Computer Speech & Language (2024). DOI: 10.1016/j.csl.2024.101690

Provided by the University of Technology Sydney

Citation: Multi-task learning model improves hate speech identification (2024, October 14), retrieved October 14, 2024 from https://techxplore.com/news/2024-10-multi-task-speech-identification.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
