AI Content Detection - A Comparison of GPTZero, ZeroGPT, and isGPT (Part 2 - Accuracy)

July 24, 2025

Experimental Design

In this section, we outline the experimental setup used to compare the accuracy of AI content detection tools (GPTZero, ZeroGPT, and isGPT) across different types of writing. The evaluation focuses on both human-written and AI-generated content, assessing detection performance based on error rates and overall accuracy.

Sample Types

  • Human-Written Group: 60 samples, including:

    • 20 Study Abroad Documents (Personal Statements/Recommendation Letters)
    • 20 Academic Paper Paragraphs (pre-2019)
    • 20 Daily Writing Samples (English press releases, technical blogs, Twitter posts)
  • AI-Generated Group: 60 samples on the same three themes (20 per category), generated using GPT-4o and DeepSeek models.

Statistical Metrics

  • False Positive Rate: The percentage of human-written texts mistakenly flagged as AI (Human → AI).
  • False Negative Rate: The percentage of AI-generated texts that are not detected (AI → Human).
  • Overall Accuracy: Correct detections divided by the total number of samples.
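
As a minimal sketch of how these three metrics relate, the Python function below computes them from raw counts. The name `detection_metrics` and its parameters are illustrative and not part of any detector's API; the sketch assumes each tool returns a binary human/AI verdict per sample.

```python
def detection_metrics(false_pos, n_human, false_neg, n_ai):
    """Return (FP rate, FN rate, overall accuracy) as fractions.

    false_pos: human-written samples flagged as AI (Human -> AI)
    false_neg: AI-generated samples not flagged (AI -> Human)
    """
    fp_rate = false_pos / n_human
    fn_rate = false_neg / n_ai
    # A sample is correct if it is human and not flagged, or AI and flagged.
    correct = (n_human - false_pos) + (n_ai - false_neg)
    accuracy = correct / (n_human + n_ai)
    return fp_rate, fn_rate, accuracy


# Example: 20 human and 20 AI samples, 4 false positives, 3 false negatives.
fp, fn, acc = detection_metrics(4, 20, 3, 20)
print(f"FP {fp:.1%}, FN {fn:.1%}, accuracy {acc:.1%}")
```

With equal group sizes, overall accuracy is simply one minus the average of the two error rates.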

1. Study Abroad Application Documents (Personal Statements / Recommendation Letters)

Tool      FP (Human → AI)   FN (AI → Human)   Overall Accuracy
GPTZero   20% (4/20)        15% (3/20)        82.5%
ZeroGPT   25% (5/20)        30% (6/20)        72.5%
isGPT     5% (1/20)         15% (3/20)        90%

2. Academic Paper Paragraph Detection

Tool      FP (Human → AI)   FN (AI → Human)   Overall Accuracy
GPTZero   10% (2/20)        15% (3/20)        87.5%
ZeroGPT   20% (4/20)        25% (5/20)        77.5%
isGPT     15% (3/20)        20% (4/20)        82.5%

On academic paper paragraphs, GPTZero (87.5%) outperforms isGPT (82.5%), and both outperform ZeroGPT.

3. Daily Writing (Social Media/Blogs)

Tool      FP (Human → AI)   FN (AI → Human)   Overall Accuracy
GPTZero   40% (8/20)        35% (7/20)        62.5%
isGPT     35% (7/20)        30% (6/20)        67.5%
ZeroGPT   45% (9/20)        40% (8/20)        57.5%

For daily writing samples such as social media posts and blogs, all three tools perform poorly: overall accuracy ranges from 57.5% to 67.5%, and false positive rates range from 35% to 45%. isGPT and GPTZero do somewhat better than ZeroGPT.


Overall Summary

  • isGPT and GPTZero tie for first place, each receiving a 2-star rating. isGPT is the most accurate on application documents and daily writing, while GPTZero leads on academic paper paragraphs.
  • ZeroGPT receives a 1-star rating, as it has the lowest overall accuracy in all three categories.
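
The per-category error counts reported in the three tables can also be pooled across all 120 samples per tool. The sketch below is one possible aggregation and not the method behind the star ratings, which this article does not define; the dictionary layout and the names `COUNTS` and `pooled_accuracy` are illustrative.

```python
# (false positives, false negatives) per category, copied from the tables.
# Each category has 20 human-written and 20 AI-generated samples.
COUNTS = {
    "GPTZero": {"application": (4, 3), "academic": (2, 3), "daily": (8, 7)},
    "ZeroGPT": {"application": (5, 6), "academic": (4, 5), "daily": (9, 8)},
    "isGPT":   {"application": (1, 3), "academic": (3, 4), "daily": (7, 6)},
}
SAMPLES_PER_CATEGORY = 40  # 20 human-written + 20 AI-generated


def pooled_accuracy(per_category):
    """Overall accuracy across all categories, weighting every sample equally."""
    errors = sum(fp + fn for fp, fn in per_category.values())
    total = SAMPLES_PER_CATEGORY * len(per_category)
    return (total - errors) / total


for tool, per_category in COUNTS.items():
    print(f"{tool}: {pooled_accuracy(per_category):.1%}")
```

Pooled this way, isGPT and GPTZero land within a few percentage points of each other, with ZeroGPT trailing both.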

These results show where each AI content detection tool is stronger or weaker, and can help users choose a tool for the type of writing they need to check.