Hsuan-Yu Chou, Wajiha Naveed, Shuyan Zhou, Xiaowei Yang
Open-weight large language models (LLMs) show promise for social media moderation, with performance comparable to proprietary models in detecting harmful content on platforms like Bluesky.
With the increasing volume of harmful content on social media, effective moderation is crucial. This study investigates whether open-weight large language models (LLMs), which are freely accessible, can effectively moderate content on social media platforms such as Bluesky. Comparing a range of open-weight and proprietary LLMs, the study finds that open-weight models detect harmful posts about as well as proprietary ones. This suggests that open-weight LLMs are a feasible moderation option that preserves user privacy and can run on standard consumer hardware.