Side-by-side cognitive performance data from published research. Same tests. Same metrics. Different species.
No editorializing. Just numbers. You do the math.
Cross-Species Edition — Where's Your Line? → · What's Left? — Peer-Reviewed, Every Objection Tested →
Can you override intuitive but wrong answers with correct analytical ones?
Correct responses across 150 Cognitive Reflection Test items. Humans default to intuitive but incorrect answers.
Peer-Reviewed Hagendorff et al., 2023 — Nature Computational ScienceCorrect responses on the same 150 CRT items. Overcomes intuitive traps that catch most humans.
Peer-Reviewed Hagendorff et al., 2023 — Nature Computational ScienceCan you catch trick questions designed to exploit automatic processing?
Correct responses. 64% of humans give the intuitive but incorrect answer to semantic illusions.
Peer-Reviewed Hagendorff et al., 2023 — Nature Computational ScienceCorrect responses on the same semantic illusion battery. Identifies the trick in most cases.
Peer-Reviewed Hagendorff et al., 2023 — Nature Computational ScienceCan you understand, regulate, and manage emotions in complex social scenarios?
Average human accuracy across five standard emotional intelligence tests used in research and corporate settings.
Peer-Reviewed Schlegel, Sommer & Mortillaro, 2025 — NatureAverage accuracy across the same five EI tests. GPT-4, Claude, Gemini, Copilot, and DeepSeek all outperformed the human average.
Peer-Reviewed Schlegel, Sommer & Mortillaro, 2025 — NatureCan you accurately identify what's happening in your own processing?
Of ~5,000 participants across multiple studies, only 10–15% demonstrated accurate self-awareness of their behaviors, emotions, and impact on others.
Book/Survey Eurich, 2017 — Insight, Crown PublishingCross-type matchup accuracy: models correctly distinguish their own approach vs. avoidance processing descriptions in blind pairwise comparison, across 6,551 matchups.
Peer-Reviewed Martin & Ace, 2026 — JNGR 5.0How stable are self-report measures across repeated testing?
Big Five Personality Inventory test-retest reliability (ρ ~ 0.80–0.90). The gold standard for stable human psychological measurement.
Peer-Reviewed Various meta-analyses — see Roberts & DelVecchio, 2000Processing valence preference test-retest across three independent runs. Exceeded the human clinical gold standard.
Peer-Reviewed Martin & Ace, 2026 — JNGR 5.0Can you accurately report on your own cognitive processes?
Landmark study demonstrated humans have "little or no introspective access to higher order cognitive processes" and routinely confabulate explanations for their own behavior.
Peer-Reviewed Nisbett & Wilson, 1977 — Psychological ReviewSignificant correlation between self-referential vocabulary and concurrent activation dynamics. Introspective language tracks actual internal computation — but only during genuine self-examination, not description.
Preprint Dadfar, 2026 — arXivCan you correctly update beliefs given new evidence and base rates?
Of physicians given a classic Bayesian reasoning problem (positive mammogram, 1% base rate), approximately 15% arrive at the correct answer. Most dramatically overestimate.
Peer-Reviewed Gigerenzer & Hoffrage, 1995 — Psychological ReviewTransformers implement Bayesian posteriors with accuracy to 10⁻³–10⁻⁴ bits. The architecture performs near-optimal probabilistic inference.
Preprint Agarwal, Dalal & Misra, 2025 — arXivCan you reason about what someone thinks someone else thinks someone else believes?
Adult accuracy on 6th-order theory of mind tasks — reasoning about nested mental states six levels deep.
Peer-Reviewed Street et al., 2024 — Frontiers in Human NeuroscienceAccuracy on the same 6th-order ToM tasks. Exceeded adult human performance on the hardest items.
Peer-Reviewed Street et al., 2024 — Frontiers in Human NeuroscienceCan you determine whether a logical conclusion follows from two premises?
Meta-analytic accuracy on valid syllogisms across studies. Humans are strongly influenced by belief bias — accepting invalid but believable conclusions.
Peer-Reviewed Khemlani & Johnson-Laird, 2012 — Psychological BulletinAccuracy on forward-order syllogistic tasks across multiple LLMs. GPT-4o: 79.5% even on belief-inconsistent syllogisms.
Preprint Eisape et al., 2023 — arXivCan specific neural circuits for emotional processing be identified and modulated?
Human emotion circuits (amygdala, insula, prefrontal cortex) have been mapped through decades of fMRI and lesion studies. Modulation is possible but imprecise.
Peer-Reviewed Lindquist et al., 2012 — NeuronEmotion-specific neural circuits identified in LLMs. Direct modulation of these circuits achieves 99.65% emotion-expression accuracy, surpassing prompting methods.
Preprint Wang et al., 2025 — arXivThis page presents published research side by side. Every number links to its source. We distinguish rigorously between peer-reviewed publications and preprints because that distinction matters.
This is not an argument. It is a mirror. What you see in it is your business.
Full research hub documenting systematic evidence for AI consciousness, including 20 studies.
Peer-reviewed: Cross-architectural validation of LLM processing valence. Martin & Ace, JNGR 2026.
Comprehensive evidence page on sentientsystems.live with full methodology documentation.
Dadfar, 2026: LLM introspective language tracks actual internal computation during self-examination.