Evaluation in Online Safety, a discussion of hate speech classification and safety measures

12 March 2024

In this report we discuss the existing literature on how effectively the safety measures applied by some online platforms reduce hate speech online. We summarise the key findings, highlight gaps, and offer suggestions for the direction of future research. In particular, we highlight the importance of assessing the accuracy of hate speech classification: we conduct our own analysis of the accuracy of commonly used hate speech classifiers and explore the implications for research on the effectiveness of a safety measure.

Evaluation in Online Safety, a discussion of hate speech classification and safety measures (PDF, 424.4 KB)

Welsh version: Gwerthusiad o Ddiogelwch Ar-lein, trafodaeth ar ddosbarthiad iaith casineb a mesurau diogelwch (PDF, 161.2 KB)