[NLP] Korean Hate Speech Detection: Detecting Toxic Korean News Comments

While researching toxic comment detection, I read a related paper and ran a toy project on the data it provides.

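The paper's authors provide Korean entertainment news comments as training data on Kaggle and are running a competition to classify comments as toxic or non-toxic.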

BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection

Paper link: https://arxiv.org/abs/2005.12503

The paper states that three competitions have been opened on Kaggle.

1. www.kaggle.com/c/korean-gender-bias-detection

2. www.kaggle.com/c/korean-bias-detection

3. www.kaggle.com/c/korean-hate-speech-detection

This project uses the data from the third competition.

Kaggle link: https://www.kaggle.com/c/korean-hate-speech-detection/data

Data Description

Overview

The task is to identify hate speech in Korean entertainment news comments. Specifically, use train.hate.csv and dev.hate.csv, which provide a hate label for each comment.

Your job is to train a model and predict the labels of the comments in test.hate.no_label.csv.

We also include {train, dev, test}.news_title.txt, to inform where the comments are from.

Files

train.hate.csv: the training set
dev.hate.csv: the validation set
test.hate.no_label.csv: the test set (w/o label)
train.news_title.txt: article titles of comments in the training set
dev.news_title.txt: article titles of comments in the validation set
test.news_title.txt: article titles of comments in the test set
unlabeled_comments.txt: comments without the label
unlabeled_comments.news_title.txt: article titles of comments without the label

Fields

comments (str) : news comments
label (str) : hate label
- none, offensive, hate
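
As a minimal sketch of how the fields above might be read, the snippet below parses a CSV with `comments` and `label` columns and maps the three label names to integer ids. The function name `read_hate_csv`, the id order in `LABELS`, and the in-memory sample rows are all assumptions made for illustration; the rows are placeholders, not real data from the competition.

```python
import csv
import io
from collections import Counter

# The three label names come from the data description; the id order is an assumption.
LABELS = ("none", "offensive", "hate")
LABEL2ID = {name: idx for idx, name in enumerate(LABELS)}


def read_hate_csv(fileobj):
    """Return (comments, label_ids) from a CSV with 'comments' and 'label' columns."""
    comments, label_ids = [], []
    for row in csv.DictReader(fileobj):
        label = row["label"].strip()
        if label not in LABEL2ID:
            raise ValueError(f"unexpected label: {label!r}")
        comments.append(row["comments"])
        label_ids.append(LABEL2ID[label])
    return comments, label_ids


# Hypothetical stand-in for train.hate.csv (placeholder text, not real comments).
SAMPLE = io.StringIO(
    "comments,label\n"
    "placeholder comment one,none\n"
    "placeholder comment two,offensive\n"
    "placeholder comment three,hate\n"
    "placeholder comment four,none\n"
)

comments, label_ids = read_hate_csv(SAMPLE)
label_counts = Counter(LABELS[i] for i in label_ids)
print(label_counts)
```

To run this on the real file, the `SAMPLE` object could be replaced with something like `open("train.hate.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded. Checking the label counts first is useful because a skewed class distribution affects how the model should be evaluated.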

Models Used

- KcBERT

- KoBERT
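
A rough sketch of how one of these models could be set up for the three-class task with the Hugging Face `transformers` library is shown below. This is not the code used in the project: the checkpoint ID `beomi/kcbert-base`, the helper names `build_classifier` and `encode`, the label order, and `max_length=128` are all assumptions. KoBERT is left out because its tokenizer typically needs separate handling.

```python
# Label order is an assumption; only the three label names come from the data description.
ID2LABEL = {0: "none", 1: "offensive", 2: "hate"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

# Assumed Hugging Face Hub ID for KcBERT; it is not given in this post.
KCBERT_CHECKPOINT = "beomi/kcbert-base"


def build_classifier(checkpoint=KCBERT_CHECKPOINT):
    """Load a tokenizer and a 3-class sequence classifier (downloads weights)."""
    # Imported lazily so this file can be read without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def encode(tokenizer, comments, max_length=128):
    """Tokenize a list of comment strings into padded PyTorch tensors."""
    return tokenizer(
        comments,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
```

Calling `build_classifier()` requires network access and PyTorch. The classification head it returns is randomly initialized, so its predictions are meaningless until the model has been fine-tuned on train.hate.csv.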
