• We demonstrate through human evaluation that existing detectors
of machine-generated text are effective at predicting low quality
pages, outperforming, quite surprisingly, supervised spam classifiers.
To our knowledge, this is the first use of machine detection for a
different NLP task.
• Using half a billion webpages, we conduct the largest application
of the detection models in the wild.
• We quantify the low quality pages that are surfaced by our
detector models. We perform extensive analysis, breaking
them down by attributes such as document length, age, and
topic.
• We qualitatively characterize and categorize the nature of the
low quality documents. We find traces of essay generation
farms, machine translated text, keyword optimizations, and
Not-Safe-For-Work (NSFW) content.
The study is interesting and helps to better understand how GPT works behind the scenes.