A meta-analysis finds that generative AI tools such as ChatGPT have a large positive effect on students' higher-order thinking skills, countering concerns about potential cognitive harm.
Objective: The main goal of this study was to systematically examine the impact of generative artificial intelligence (Gen-AI), particularly ChatGPT, on students' higher-order thinking (HOT) skills through a comprehensive meta-analysis. The researchers aimed to address ongoing debates about whether Gen-AI helps or harms cognitive development, specifically focusing on critical thinking, creativity, problem-solving, and computational thinking abilities.
Methods: The researchers conducted a three-level meta-analysis following PRISMA guidelines, synthesizing 19 experimental and quasi-experimental studies published between January 2023 and September 2024. The analysis included 68 effect sizes from 2,347 participants across nine countries. Studies were systematically searched across multiple databases including Web of Science, ERIC, IEEE Xplore, PsycINFO, SCOPUS, and CNKI. The three-level analytical approach accounted for sampling variance (level 1), within-study variance (level 2), and between-study variance (level 3). Effect sizes were calculated using Hedges' g, with moderator analyses examining impact targets, Gen-AI design elements, study contexts, and methodological characteristics. Quality assessment was conducted using established indicators, and publication bias was evaluated through funnel plots, Egger's regression tests, and trim-and-fill methods.
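As an illustration of the effect-size metric named above, the following is a minimal sketch of how Hedges' g is commonly computed for two independent groups from means, standard deviations, and sample sizes. It uses the standard small-sample correction factor and a common approximation for the sampling variance. The function name and the example numbers are hypothetical and are not taken from the study, and the summary does not state which exact formulas or software the authors used.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (g, var_g) for a treatment vs. control comparison.

    Illustrative only: assumes two independent groups with reported
    means, standard deviations, and sample sizes.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    # Cohen's d: standardized mean difference
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor (approximate form)
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate sampling variance of d, then rescaled for g
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    var_g = j ** 2 * var_d
    return g, var_g


# Hypothetical example: 30 students per group, post-test scores
g, var_g = hedges_g(mean_t=78.0, mean_c=72.0, sd_t=10.0, sd_c=10.0, n_t=30, n_c=30)
print(f"g = {g:.3f}, variance = {var_g:.4f}")
```

In a three-level meta-analysis, values like `var_g` would serve as the known level-1 sampling variances, while the within-study and between-study variance components are estimated by the model.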
Key Findings:
- Gen-AI demonstrated a significant, large positive effect on students' higher-order thinking (Hedges' g = 0.851, p < 0.001), indicating substantial improvement in cognitive abilities.
- Specific HOT subcategories showed varying effects: problem-solving had the largest positive impact (g = 1.570), followed by reasoning (g = 1.454), critical thinking (g = 0.830), and computational thinking (g = 0.383).
- Creativity and reflection showed non-significant effects, suggesting Gen-AI may not effectively enhance these particular cognitive domains.
- Sample size emerged as a significant moderator: studies with fewer than 80 participants showed larger positive effects (g = 1.334) than larger studies (g = 0.340).
- Intervention duration significantly moderated outcomes: both short-term (less than 4 weeks) and long-term interventions (more than 8 weeks) produced significant positive effects, while medium-duration interventions (4-8 weeks) showed no significant impact.
- Gen-AI positively affected both convergent and divergent thinking, as well as both skills and dispositions related to higher-order thinking.
- Studies conducted in higher education settings (89.47%) dominated the research landscape, with limited representation from K-12 education.
Implications: These findings provide crucial evidence supporting the educational integration of generative AI tools, directly countering concerns about potential cognitive harm. The research demonstrates that Gen-AI can serve as a powerful educational tool for enhancing complex thinking skills when properly implemented. The positive effects on problem-solving, critical thinking, and computational thinking suggest that Gen-AI can support students in developing essential 21st-century skills. The study emphasizes the importance of thoughtful design and implementation, particularly regarding sample sizes and intervention duration. For educators and policymakers, these results support the strategic integration of Gen-AI tools into educational practices while highlighting the need for appropriate training and support systems.
Limitations: Several important limitations constrain the study's generalizability. The limited number of included studies (19) reflects the nascent state of Gen-AI research, as most studies were published after ChatGPT's 2022 launch. The analysis excluded unpublished papers and studies written in languages other than English or Chinese, potentially introducing publication bias. The heavy concentration on higher education participants (89.47%) limits applicability to K-12 settings. The small number of effect sizes within individual HOT subcategories may limit the reliability of findings for specific cognitive domains. The studies examined diverse HOT types across different contexts, potentially reducing the precision of subcategory analyses. Additionally, the research does not address long-term effects or establish whether observed improvements persist beyond immediate intervention periods.
Future Directions: The researchers recommend several critical areas for future investigation. Longitudinal studies are essential to determine whether Gen-AI's positive effects on higher-order thinking persist over extended periods and to distinguish between genuine cognitive enhancement and novelty effects. Expanded research in K-12 educational settings is crucial, as current findings cannot be generalized to younger learners. Future studies should explore optimal intervention durations, particularly investigating why medium-term interventions (4-8 weeks) showed no significant effects. Research should also examine Gen-AI's impact on creativity and reflection more thoroughly, given the non-significant results in these domains. The development and testing of theoretically-grounded frameworks for Gen-AI integration in education would strengthen the field's conceptual foundation. Additionally, investigations into cultural and contextual factors affecting Gen-AI's educational impact across different global settings would enhance understanding of implementation best practices.
Title and Authors: "The impact of generative artificial intelligence on students' higher order thinking: Evidence from a three-level meta-analysis" by Xinxiao Nie, Yuan Tian, Mengjie Liu, Di Wu, and Yunxiao Guo.
Published On: September 20, 2025 (online); received December 7, 2024; accepted July 21, 2025
Published By: Education and Information Technologies (Springer)