AI Content Detection - A Comparison of GPTZero, ZeroGPT, and isGPT (Part 2 - Accuracy)
Experimental Design
This section describes the experimental setup used to compare the accuracy of three AI content detection tools (GPTZero, ZeroGPT, and isGPT) across different types of writing. Each tool is tested on both human-written and AI-generated text and scored by its error rates and overall accuracy.
Sample Types
Human-Written Group: 60 samples, including:
- 20 Study Abroad Documents (Personal Statements/Recommendation Letters)
- 20 Academic Paper Paragraphs (pre-2019)
- 20 Daily Writing (English press releases, technical blogs, Twitter posts)
AI-Generated Group: 60 samples (20 per category) on the same themes as the human-written group, generated using GPT-4o and DeepSeek models.
Statistical Metrics
- False Positive Rate: The percentage of human-written texts mistakenly flagged as AI (Human → AI).
- False Negative Rate: The percentage of AI-generated texts that are not detected (AI → Human).
- Overall Accuracy: Correct detections divided by the total number of samples.
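The three metrics above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the tooling used in the experiment; the `DetectionCounts` class and the function names are hypothetical, and the example counts are taken from the GPTZero row of the study abroad table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Error counts for one detector on one sample category (hypothetical helper)."""
    false_positives: int  # human-written texts flagged as AI
    false_negatives: int  # AI-generated texts judged to be human
    n_human: int          # number of human-written samples tested
    n_ai: int             # number of AI-generated samples tested


def false_positive_rate(c: DetectionCounts) -> float:
    # Share of human-written texts mistakenly flagged as AI.
    return c.false_positives / c.n_human


def false_negative_rate(c: DetectionCounts) -> float:
    # Share of AI-generated texts that were not detected.
    return c.false_negatives / c.n_ai


def overall_accuracy(c: DetectionCounts) -> float:
    # Correct detections divided by the total number of samples.
    total = c.n_human + c.n_ai
    correct = total - c.false_positives - c.false_negatives
    return correct / total


# GPTZero on study abroad documents: 4 false positives, 3 false negatives,
# 20 human-written and 20 AI-generated samples.
gptzero_study_abroad = DetectionCounts(
    false_positives=4, false_negatives=3, n_human=20, n_ai=20
)

print(f"FP rate:  {false_positive_rate(gptzero_study_abroad):.1%}")
print(f"FN rate:  {false_negative_rate(gptzero_study_abroad):.1%}")
print(f"Accuracy: {overall_accuracy(gptzero_study_abroad):.1%}")
```

Because each category has 20 human-written and 20 AI-generated samples, overall accuracy here is simply (40 minus total errors) divided by 40.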
1. Study Abroad Application Documents (Personal Statements / Recommendation Letters)
Tool | FP (Human → AI) | FN (AI → Human) | Overall Accuracy |
---|---|---|---|
GPTZero | 20% (4/20) | 15% (3/20) | 82.5% |
ZeroGPT | 25% (5/20) | 30% (6/20) | 72.5% |
isGPT | 5% (1/20) | 15% (3/20) | 90% |
2. Academic Paper Paragraph Detection
Tool | FP (Human → AI) | FN (AI → Human) | Overall Accuracy |
---|---|---|---|
GPTZero | 10% (2/20) | 15% (3/20) | 87.5% |
ZeroGPT | 20% (4/20) | 25% (5/20) | 77.5% |
isGPT | 15% (3/20) | 20% (4/20) | 82.5% |
On academic paper paragraphs, GPTZero (87.5%) outperforms isGPT (82.5%), and both tools perform better than ZeroGPT.
3. Daily Writing (Social Media/Blogs)
Tool | FP (Human → AI) | FN (AI → Human) | Overall Accuracy |
---|---|---|---|
GPTZero | 40% (8/20) | 35% (7/20) | 62.5% |
isGPT | 35% (7/20) | 30% (6/20) | 67.5% |
ZeroGPT | 45% (9/20) | 40% (8/20) | 57.5% |
For daily writing samples such as social media posts and blogs, all three tools perform poorly, with none exceeding 67.5% accuracy. isGPT (67.5%) and GPTZero (62.5%) do somewhat better than ZeroGPT (57.5%).
Overall Summary
- isGPT and GPTZero tie for first place, each receiving a 2-star rating: isGPT leads on study abroad documents (90%) and daily writing (67.5%), while GPTZero leads on academic paper paragraphs (87.5%).
- ZeroGPT receives a 1-star rating, as it has the lowest accuracy in all three categories, dropping to 57.5% on daily writing.
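As a rough cross-check on this ranking, the per-category error counts can be pooled into a single accuracy figure per tool. Pooling is an assumption made here for illustration, not a metric reported in the experiment, and the `ERRORS` table and `pooled_accuracy` function are hypothetical names; the counts themselves are copied from the three tables above.

```python
# (false positives, false negatives) per category, copied from the tables above.
ERRORS = {
    "GPTZero": {"study_abroad": (4, 3), "academic": (2, 3), "daily": (8, 7)},
    "ZeroGPT": {"study_abroad": (5, 6), "academic": (4, 5), "daily": (9, 8)},
    "isGPT": {"study_abroad": (1, 3), "academic": (3, 4), "daily": (7, 6)},
}

# Each category has 20 human-written plus 20 AI-generated samples.
SAMPLES_PER_CATEGORY = 40


def pooled_accuracy(per_category: dict[str, tuple[int, int]]) -> float:
    """Accuracy over all categories combined: correct detections / all samples."""
    total = SAMPLES_PER_CATEGORY * len(per_category)
    errors = sum(fp + fn for fp, fn in per_category.values())
    return (total - errors) / total


for tool, per_category in ERRORS.items():
    print(f"{tool}: {pooled_accuracy(per_category):.1%}")
```

With these counts, pooled accuracy comes out at 80.0% for isGPT, 77.5% for GPTZero, and about 69.2% for ZeroGPT, which is consistent with the first two tools being close and ZeroGPT trailing.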
This experiment provides valuable insights into the strengths and weaknesses of each AI content detection tool, helping users make informed decisions when selecting a tool for AI detection.