Elementary students using a reflection-enhanced generative AI system significantly outperformed their peers in both science learning achievement and computational thinking skills.
Objective: This study investigated the effectiveness of a Causal Explanation and Reflection (CER) model-based generative AI learning system in cultivating elementary school students' computational thinking core skills within science education contexts. The researchers sought to address the limitation that computational thinking development has largely been confined to programming courses rather than integrated into the K-12 science curriculum, while also exploring how to prevent students from becoming overly dependent on AI-generated responses.
Methods: The researchers conducted a six-week quasi-experimental study with 118 fifth-grade students (average age 12) from three intact classes in Fujian, China. The study employed a three-group design: an experimental group using the CER model-based GenAI learning system (38 students), control group 1, which used the CER model-based learning system without AI (35 students), and control group 2, which used a causal-explanation-based GenAI learning system without reflection components (45 students). The learning content focused on "the principle of levers" across four lessons. Data collection included pre- and post-tests of science achievement, computational thinking questionnaires, behavioral analysis of screen recordings, and semi-structured interviews. The system utilized ERNIE Bot, a Chinese large language model, integrated with structured reflection prompts spanning descriptive, dialogic, and critical reflection levels.
Key Findings: The experimental group demonstrated statistically significant improvements in both science learning achievement and computational thinking skills compared to both control groups. Specifically, students using the CER model-based GenAI system achieved adjusted mean scores of 53.02 in science tests, compared to 48.55 for control group 1 and 46.75 for control group 2. In computational thinking skills, the experimental group significantly outperformed control group 2, with particular strengths in pattern recognition and algorithm development. Behavioral analysis revealed that experimental group students engaged in more diverse and productive learning behaviors, including active questioning of AI responses, iterative refinement of understanding, and autonomous reasoning processes. Interview data indicated that while students appreciated AI's immediate assistance, those in the experimental group better understood the importance of critical evaluation and reflection rather than passive acceptance of AI-generated answers.
Implications: This research contributes significantly to the field of AI in education by demonstrating that the effectiveness of generative AI depends heavily on pedagogical design rather than technology alone. The study shows that computational thinking need not be confined to computer science courses but can be effectively developed through science education when properly scaffolded. The CER model provides a practical framework for educators to integrate AI tools while maintaining student agency and critical thinking. The findings suggest that reflection-based pedagogical strategies can mitigate the risks of AI overreliance and shallow learning, offering a blueprint for responsible AI integration in educational settings. The research also highlights the importance of cultural and linguistic alignment in AI tools, as the study used a locally trained model that better understood Chinese educational contexts.
Limitations: Several limitations constrain the study's generalizability. The research focused exclusively on elementary science education with a single topic (lever principles), limiting broader applicability. The GenAI system supported only text-based input, which may challenge young students with limited typing skills and reduce engagement compared to multimodal interfaces. The study duration was relatively short (six weeks), which may not capture long-term effects or account for novelty effects associated with new technology. The use of ERNIE Bot, trained primarily on Chinese data, may limit cross-cultural applicability. Additionally, the study observed a ceiling effect in the abstraction subscale of the computational thinking measures, suggesting measurement sensitivity issues. The research also lacked standardized instruments to systematically assess the depth and quality of students' reflective thinking processes.
Future Directions: The researchers recommend several avenues for future investigation. First, exploring more accessible AI interfaces, including voice input and multimodal interactions, to better support learners with limited digital skills. Second, extending the approach to diverse scientific topics and subject areas to assess broader applicability in fostering computational thinking development. Third, conducting longer-term interventions to minimize potential novelty effects and examine sustained learning impacts. Fourth, refining computational thinking assessment instruments to improve measurement sensitivity, particularly for high-performing students, and developing validated tools to systematically evaluate reflective thinking quality. Fifth, investigating the framework's effectiveness across different cultural and linguistic contexts to enhance cross-cultural applicability. Finally, researching the optimal balance between AI assistance and student autonomy to maximize learning benefits while preventing overreliance.
Title and Authors: "Exploring the Effects of the CER Model-Based GenAI Learning System to Cultivate Elementary School Students' Computational Thinking Core Skills in Science Courses" by Jia-Hua Zhao, Shu-Tao Shangguan, and Ying Wang.
Published On: The article was received on February 6, 2025, revised on August 4, 2025, and accepted on August 15, 2025.
Published By: Journal of Computer Assisted Learning, published by John Wiley & Sons Ltd.