Experimental Design

This section describes the experimental setup used to compare the accuracy of three AI content detection tools (GPTZero, ZeroGPT, and isGPT) across three types of writing. Each tool is evaluated on both human-written and AI-generated samples, and performance is measured by false positive rate, false negative rate, and overall accuracy.

Sample Types

Human-Written Group: 60 samples, including:
- 20 Study Abroad Documents (Personal Statements/Recommendation Letters)
- 20 Academic Paper Paragraphs (pre-2019)
- 20 Daily Writing (English press releases, technical blogs, Twitter posts)
AI-Generated Group: 60 samples on the same three themes (20 per theme), generated using GPT-4o and DeepSeek models.

Statistical Metrics

False Positive Rate: The percentage of human-written texts mistakenly flagged as AI (Human → AI).
False Negative Rate: The percentage of AI-generated texts that are not detected (AI → Human).
Overall Accuracy: The number of correct detections divided by the total number of samples (40 per category: 20 human-written and 20 AI-generated).
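
The three metrics defined here can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the tooling used in the experiment; the class name `DetectionCounts` and its field names are hypothetical. The worked example uses GPTZero's counts for study abroad documents from the results tables (4 of 20 human texts flagged, 3 of 20 AI texts missed).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Raw error counts for one detector on one sample category (hypothetical helper)."""
    false_positives: int  # human-written texts flagged as AI
    human_total: int      # human-written texts tested
    false_negatives: int  # AI-generated texts that passed as human
    ai_total: int         # AI-generated texts tested

    def false_positive_rate(self) -> float:
        # Share of human-written texts wrongly flagged as AI.
        return self.false_positives / self.human_total

    def false_negative_rate(self) -> float:
        # Share of AI-generated texts that were not detected.
        return self.false_negatives / self.ai_total

    def overall_accuracy(self) -> float:
        # Correct detections divided by all samples in the category.
        total = self.human_total + self.ai_total
        correct = total - self.false_positives - self.false_negatives
        return correct / total


# GPTZero on study abroad documents: 4/20 false positives, 3/20 false negatives.
gptzero_application = DetectionCounts(
    false_positives=4, human_total=20, false_negatives=3, ai_total=20
)
print(f"FP rate:  {gptzero_application.false_positive_rate():.1%}")
print(f"FN rate:  {gptzero_application.false_negative_rate():.1%}")
print(f"Accuracy: {gptzero_application.overall_accuracy():.1%}")
```

With 20 human and 20 AI samples per category, accuracy is (40 - FP - FN) / 40, so each table row can be checked by hand: here (40 - 4 - 3) / 40 = 82.5%.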

1. Study Abroad Application Documents (Personal Statements / Recommendation Letters)

Tool	FP (Human → AI)	FN (AI → Human)	Overall Accuracy
GPTZero	20% (4/20)	15% (3/20)	82.5%
ZeroGPT	25% (5/20)	30% (6/20)	72.5%
isGPT	5% (1/20)	15% (3/20)	90%

For application documents, isGPT is the most accurate tool (90%), followed by GPTZero (82.5%) and ZeroGPT (72.5%).

2. Academic Paper Paragraph Detection

Tool	FP (Human → AI)	FN (AI → Human)	Overall Accuracy
GPTZero	10% (2/20)	15% (3/20)	87.5%
ZeroGPT	20% (4/20)	25% (5/20)	77.5%
isGPT	15% (3/20)	20% (4/20)	82.5%

On academic paper paragraphs, GPTZero (87.5%) outperforms isGPT (82.5%), and both tools perform better than ZeroGPT.

3. Daily Writing (Social Media/Blogs)

Tool	FP (Human → AI)	FN (AI → Human)	Overall Accuracy
GPTZero	40% (8/20)	35% (7/20)	62.5%
ZeroGPT	45% (9/20)	40% (8/20)	57.5%
isGPT	35% (7/20)	30% (6/20)	67.5%

For daily writing samples such as social media posts and blogs, all three tools perform poorly, with accuracy between 57.5% and 67.5%. isGPT does best, followed by GPTZero and then ZeroGPT.

Overall Summary

isGPT and GPTZero tie for first place, each receiving a 2-star rating: both perform comparatively well on application documents and academic paragraphs, although neither is reliable on daily writing.
ZeroGPT receives a 1-star rating because it has the lowest overall accuracy in all three categories, with its weakest result on daily writing.
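
As a rough cross-check of these ratings, the per-category error counts can be pooled into a single accuracy figure per tool. The sketch below assumes that simply summing errors over all 120 samples is a fair way to combine the categories; that pooling is an assumption made here for illustration, not the method behind the star ratings, and the names `ERRORS` and `pooled_accuracy` are hypothetical.

```python
# (false positives, false negatives) per category, copied from the three tables:
# application documents, academic paragraphs, daily writing.
ERRORS = {
    "GPTZero": [(4, 3), (2, 3), (8, 7)],
    "ZeroGPT": [(5, 6), (4, 5), (9, 8)],
    "isGPT": [(1, 3), (3, 4), (7, 6)],
}

# Each category has 20 human-written and 20 AI-generated samples.
SAMPLES_PER_CATEGORY = 40


def pooled_accuracy(errors):
    """Accuracy over all categories combined: correct detections / total samples."""
    total = SAMPLES_PER_CATEGORY * len(errors)
    wrong = sum(fp + fn for fp, fn in errors)
    return (total - wrong) / total


pooled = {tool: pooled_accuracy(errors) for tool, errors in ERRORS.items()}

# Print the tools from most to least accurate.
for tool, accuracy in sorted(pooled.items(), key=lambda item: item[1], reverse=True):
    print(f"{tool}: {accuracy:.1%}")
```

Under this pooling, isGPT and GPTZero land within a few points of each other (80.0% and 77.5%), with ZeroGPT clearly behind at about 69.2%, which is in line with the ratings given.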

These results show where each AI content detection tool is strong and where it is weak, which should help users choose a tool suited to the kind of text they need to check.

AI Content Detection - A Comparison of GPTZero, ZeroGPT, and isGPT (Part 2 - Accuracy)