Article Summary
Sep 30, 2025

AI-generated feedback performs comparably to human feedback on learning outcomes, though with notable variability across contexts, suggesting that AI can supplement but not fully replace the nuanced, empathetic qualities of human assessment.

Objective: This meta-analysis aimed to systematically compare the effectiveness of artificial intelligence-generated feedback versus traditional human-provided feedback (from teachers or peers) across three key dimensions: student learning performance, feedback perception, and learning dispositions such as motivation, engagement, and self-regulation. The researchers sought to synthesize current evidence on whether AI feedback could serve as a viable alternative or complement to human feedback in educational settings.

Methods: The researchers conducted a comprehensive systematic review following PRISMA guidelines, searching three major databases (Scopus, Web of Science, and ERIC) in January 2024. They also incorporated studies from the 25 previous meta-analyses compiled by Wisniewski et al. (2019). After rigorous screening, 41 studies involving 4,813 students met the inclusion criteria. The team employed multilevel meta-analyses with random-effects models and robust variance estimation to account for complex study structures and nested data. Effect sizes were calculated as standardized mean differences (Hedges' g), with separate analyses conducted for task performance (single-measure studies), learning gains (pre-post studies), and feedback perception. The researchers also investigated potential moderators, including year of publication, type of human feedback (teacher versus peer), and academic discipline. Publication bias and robustness were assessed using funnel plots, Egger's regression tests, trim-and-fill methods, leave-one-out sensitivity analyses, and GOSH plots.
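For readers who want to see the arithmetic behind these methods, the sketch below illustrates, in Python, how a standardized mean difference with Hedges' small-sample correction and a simple random-effects pool can be computed. The function names and the use of the DerSimonian-Laird estimator are illustrative assumptions on our part; the paper itself reports multilevel models with robust variance estimation, which require dedicated software rather than this simplified two-level pooling.

```python
# Minimal sketch of the effect-size and pooling steps described above.
# Illustration only; not the authors' actual multilevel/robust-variance pipeline.
import math

def hedges_g(mean_ai, mean_human, sd_ai, sd_human, n_ai, n_human):
    """Standardized mean difference with Hedges' small-sample correction."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n_ai - 1) * sd_ai**2 + (n_human - 1) * sd_human**2)
                   / (n_ai + n_human - 2))
    d = (mean_ai - mean_human) / sp
    # Small-sample correction factor J
    j = 1 - 3 / (4 * (n_ai + n_human) - 9)
    g = j * d
    # Approximate sampling variance of g
    var_g = j**2 * ((n_ai + n_human) / (n_ai * n_human)
                    + d**2 / (2 * (n_ai + n_human)))
    return g, var_g

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of study-level effect sizes."""
    k = len(effects)
    w = [1 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - fixed)**2 for wi, gi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                   # between-study variance
    w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * gi for wi, gi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0  # I² heterogeneity
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2
```

Feeding each study's group means, standard deviations, and sample sizes through hedges_g and then random_effects_pool yields the kind of pooled estimate, confidence interval, and I² statistic reported in the findings below, albeit without the multilevel and robust-variance adjustments the authors apply.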

Key Findings: The meta-analysis revealed no statistically significant differences between AI-generated and human-provided feedback across all measured outcomes. For task performance, the pooled effect size was small and statistically non-significant (Hedges' g=0.25, 95% CI [−0.11; 0.60]). Similarly, learning gains showed a small, non-significant effect (Hedges' g=0.36, 95% CI [−0.37; 1.10]). Feedback perception also showed a small, negative, and statistically non-significant effect (Hedges' g=−0.20, 95% CI [−0.67; 0.27]). However, all three meta-analyses exhibited substantial to considerable heterogeneity (I²=75-95%), with very wide prediction intervals indicating wide variation in outcomes across studies. The majority of included studies (33 out of 41) focused on language and writing domains, predominantly in higher education settings. Separate meta-analyses restricted to language and writing studies confirmed similar findings, with persistently high heterogeneity. Regarding learning dispositions, results were mixed and contradictory, with too few studies to conduct formal meta-analyses. Limited evidence suggested potential benefits of hybrid feedback systems (combining AI and human input), though this area requires further investigation. Notably, year of publication showed a significant positive effect on feedback perception, suggesting that improvements in AI technology over time may be shaping student attitudes.
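For readers unfamiliar with the heterogeneity statistics quoted above, the standard textbook definitions are roughly as follows (these are general formulas, not expressions taken from the paper):

```latex
% Heterogeneity statistic: share of observed variability attributable to
% between-study variance (k = number of studies, Q = Cochran's Q)
I^{2} = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%

% Approximate 95% prediction interval for the effect in a new, comparable study
\hat{g}_{\text{pooled}} \pm t_{k-2,\,0.975}\sqrt{\hat{\tau}^{2} + \mathrm{SE}\!\left(\hat{g}_{\text{pooled}}\right)^{2}}
```

Because the prediction interval folds in the between-study variance τ², it is much wider than the confidence interval when heterogeneity is high, which is why the authors stress that individual implementations of AI feedback may perform much better or worse than the pooled averages suggest.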

Implications: These findings have important implications for AI integration in education. The comparable effectiveness of AI and human feedback suggests that AI systems could help address scalability challenges in providing timely, personalized feedback, particularly in resource-limited environments or with large student cohorts. However, the high heterogeneity and wide prediction intervals indicate that effectiveness varies considerably depending on context, AI system design, and implementation. The study underscores that AI feedback should not be viewed as a complete replacement for human feedback, as current measures may not capture the full breadth of value that human feedback provides, including building confidence, fostering growth mindsets, and creating psychological safety. The research supports a hybrid approach that leverages AI's scalability and immediacy while retaining human feedback's empathetic, contextual, and nuanced qualities. For educators, this suggests thoughtful integration strategies where AI handles routine, corrective feedback while teachers focus on complex, contextualized guidance requiring pedagogical judgement.

Limitations: Several important limitations warrant consideration. First, the substantial heterogeneity across studies makes broad generalizations challenging. Studies varied significantly in design, AI technologies employed (from rule-based systems like Grammarly and Pigai to large language models like ChatGPT), feedback types, and outcome measures. Many studies did not specify the exact AI models or versions used, limiting subgroup analyses. Second, the research concentrated heavily on language and writing disciplines (80% of studies), raising questions about generalizability to other academic domains requiring different types of feedback. Third, most studies were not blinded, meaning students knew whether they received AI or human feedback, potentially biasing perception measures. Fourth, the focus on quantifiable outcomes may have overlooked important qualitative dimensions of feedback effectiveness, such as emotional support and relationship-building. Fifth, approximately 70% of studies were published in the last two years, reflecting a rapidly evolving field where findings may quickly become outdated as AI technologies advance. Finally, few studies provided detailed information about the quality or characteristics of human feedback (teacher expertise, peer competence, feedback style), limiting the ability to assess these as moderators.

Future Directions: The researchers advocate for several directions in future research. First, studies should explore AI feedback effectiveness across diverse academic disciplines beyond language and writing, including STEM fields, arts, and social sciences. Second, research should investigate specific AI model types and versions (e.g., GPT-3.5 vs. GPT-4) to understand how technological sophistication influences outcomes. Third, more attention should be paid to hybrid feedback systems that strategically combine AI and human input in complementary or sequential ways, including investigating optimal levels of automation. Fourth, longitudinal studies are needed to assess long-term impacts on learning outcomes and student development. Fifth, research should examine student AI literacy and how familiarity with AI tools influences feedback utilization and effectiveness. Sixth, studies should better measure and report the quality and characteristics of human feedback to enable more nuanced comparisons. Seventh, research should explore AI feedback in contexts beyond higher education, including K-12 and professional training environments. Finally, future meta-analyses should be conducted as the field matures and more consistent, model-specific studies become available, particularly focusing on generative AI tools that have emerged recently.

Title and Authors: "How does artificial intelligence compare to human feedback? A meta-analysis of performance, feedback perception, and learning dispositions" by Rogers Kaliisa, Kamila Misiejuk, Sonsoles López-Pernas, and Mohammed Saqr.

Published On: September 24, 2025

Published By: Educational Psychology: An International Journal of Experimental Educational Psychology (Taylor & Francis Group)
