AI-generated visual flashcards produced with DALL-E significantly enhanced first-grade students' immediate vocabulary learning compared with traditional flashcards, but the benefit faded over time, whereas traditional flashcards produced better long-term retention.
Objective: The main goal of this experimental research was to evaluate the effectiveness of artificial intelligence-generated visual outputs (specifically DALL-E created flashcards) on English vocabulary learning among first-grade primary school students. The study aimed to examine and compare the impact of AI-powered vocabulary learning cards against traditional paper flashcards in vocabulary teaching to young learners who had not yet developed reading and writing skills.
Methods: The researchers employed a quantitative experimental design with a pretest-posttest control group structure across two phases. The study involved 40 first-grade students from a private primary school in Istanbul, Turkey, divided equally into an experimental group (AI-generated flashcards) and a control group (traditional flashcards) of 20 students each. The first phase lasted four weeks in fall 2023, and the second phase lasted six weeks in spring 2024, followed by a delayed posttest two weeks later. For the experimental group, flashcards were created with DALL-E through Microsoft Design Creative AI, using detailed prompts designed to appeal to seven-year-old children and producing colorful, animated, and detailed imagery. The control group used traditional paper flashcards from the British Council website with simple black-and-white line drawings. Vocabulary words were selected to be age-appropriate and not included in the students' regular coursebook. Because the students were pre-literate, all assessments were conducted individually and orally: students selected the correct image from multiple-choice picture options. Data analysis used percentage calculations for the first phase and non-parametric statistical tests (Mann-Whitney U and Wilcoxon Signed-Rank) for the second phase, run in SPSS.
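The two non-parametric tests named in the Methods can be sketched in plain Python to show what each compares: Mann-Whitney U ranks the pooled scores of two independent groups, while the Wilcoxon Signed-Rank test ranks the sizes of within-student changes. This is a minimal illustration, not the authors' SPSS procedure. It assumes the large-sample normal approximation with no tie or continuity correction, so its z values can differ slightly from SPSS output, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Between-group test: U for sample x and its normal-approximation z.

    Hypothetical helper; no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # rank sum of x minus its minimum
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - mu) / sigma


def wilcoxon_signed_rank(before: Sequence[float], after: Sequence[float]) -> Tuple[float, float]:
    """Within-group test: W+ (rank sum of positive changes) and its z.

    Hypothetical helper; zero differences are dropped and no tie correction
    is applied. Here a positive z means scores rose; statistical packages
    may report the same magnitude with a negative sign.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mu) / sigma
```

In the study's design, one would call `mann_whitney_u` on the two groups' posttest scores and `wilcoxon_signed_rank` on each group's pretest and posttest scores; the individual score lists are not given in this summary.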
Key Findings: The results showed significant differences between the two approaches at different time points. In the first application, the experimental group achieved success rates of 100% compared with 75% for the control group, a consistent 25-percentage-point advantage across all success metrics. During the second phase, the Mann-Whitney U test revealed a statistically significant difference between groups (U=346.50, z=3.96, p=.01) with a large effect size (r=.62), favoring the experimental group. The Wilcoxon Signed-Rank test showed significant improvement from pretest to posttest in the experimental group (z=−2.268, p=.023, r=.51), with 15 students increasing their scores and only 5 decreasing. However, the delayed posttest revealed a critical limitation: in the experimental group, delayed-test scores did not differ significantly from either pretest (z=−0.905, p=.366) or posttest scores (z=−0.359, p=.720), suggesting that the AI-enhanced learning effects faded over time. Conversely, the control group showed no significant change from pretest to posttest but demonstrated significant improvement at the delayed posttest (z=−3.014, p=.003), suggesting better long-term retention with traditional methods.
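As a quick arithmetic check on the reported statistics, the sketch below recomputes three quantities using only numbers stated in this summary. It assumes the large-sample normal approximation for U with n1 = n2 = 20 and no tie correction, and the common effect size r = |z|/√N, taking N = 40 for the between-group test and N = 20 for the within-group test (the N that reproduces the reported r=.51). The function names are hypothetical.

```python
import math


def z_from_u(u: float, n1: int, n2: int) -> float:
    """Normal approximation to Mann-Whitney U (no tie correction; an assumption)."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu) / sigma


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = |z| / sqrt(N), with N chosen per test (see lead-in)."""
    return abs(z) / math.sqrt(n)


# First phase: 100% vs 75% success, a gap in percentage points
# (relative to the control group's 75%, the gain would be about 33%).
gap_points = 100 - 75

# Second phase, using the reported U and z values.
z_mw = z_from_u(346.5, 20, 20)       # between groups, 20 students each
r_mw = effect_size_r(3.96, 40)       # N = 40 students in total
r_wsr = effect_size_r(-2.268, 20)    # N = 20 students in the experimental group

print(f"gap: {gap_points} percentage points")
print(f"z from U: {z_mw:.2f}")
print(f"r (Mann-Whitney): {r_mw:.3f}")
print(f"r (Wilcoxon, experimental): {r_wsr:.3f}")
```

Under these assumptions, U=346.50 maps to z of about 3.96, matching the report, and the effect sizes come out at about .626 and .507. The summary does not say whether the reported r=.62 reflects truncation or a slightly different z.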
Implications: The findings offer important insights for integrating educational technology into early childhood language learning. The study indicates that AI-generated visual materials can significantly enhance immediate vocabulary acquisition by providing more engaging, detailed, and contextually rich images that appeal to young learners' visual processing. The superior initial performance suggests that AI tools like DALL-E can capture students' attention and support initial word-meaning associations through richer visual representations. However, the fading effect over time highlights a crucial limitation of AI-enhanced approaches, suggesting they may be more effective for initial engagement than for long-term retention. The unexpected finding that traditional flashcards produced better delayed retention indicates that simpler, less detailed visual representations may support more durable memory formation. Together, these results suggest that optimal vocabulary instruction might follow a hybrid approach: AI-generated materials for initial presentation and engagement, combined with traditional methods for reinforcement and long-term retention.
Limitations: Several important limitations affect the study's scope and generalizability. The sample was limited to 40 students from two first-grade classrooms in a single private school in Istanbul, which restricts applicability to other educational contexts, socioeconomic backgrounds, and cultural settings. Because participants shared a geographic location and similar socio-cultural backgrounds, they may not represent more diverse learner populations. The study's duration was relatively short, particularly the four-week first phase, which may have been insufficient to observe long-term learning effects. Significant external factors emerged during implementation, including differences in classroom management styles: the control group teacher maintained stricter discipline and academic focus than the experimental group teacher. Students' adaptation to the school environment and to teacher expectations also differed between the first and second terms, potentially confounding the results; the researchers noted that students in the first application were still adjusting to formal schooling, whereas by the second term they had established relationships with their teachers and classroom routines. Additionally, because the participants were pre-literate, they had to be tested individually and orally, which may have raised concerns about assessment consistency.
Future Directions: The researchers recommend several avenues for further investigation and practical application. Future studies should use larger, more diverse samples drawn from different educational contexts, geographic regions, and socioeconomic backgrounds to improve generalizability. Longer intervention periods with multiple measurement points would better capture learning trajectories and retention patterns. Research should explore combinations of AI-enhanced and traditional teaching methods that maximize both immediate engagement and long-term retention. Investigating different prompt engineering strategies could help identify the most effective visual design principles for vocabulary instruction. Studies should also examine whether periodic review sessions and reinforcement activities can sustain AI-enhanced learning gains over time. Multimodal approaches that combine AI-generated visuals with auditory and kinesthetic elements warrant exploration. Teacher training programs should be developed to help educators integrate AI tools effectively while maintaining pedagogical best practices. Research should investigate the optimal timing and frequency of AI-enhanced interventions within broader curriculum frameworks. Finally, future studies should explore age-specific effects, examining how AI-enhanced vocabulary instruction affects learners at different developmental stages and proficiency levels.
Title and Authors: "Utilization of AI-aided vocabulary teaching in K-12: A case study" by Gürkan Temiz and Elif Nazlı Kafadar.
Published on: June 8, 2025
Published by: The Journal of Educational Research (Taylor & Francis Group)