Objective: This study applied semantic tagging to AI-generated science texts to identify key semantic tags that can indicate relevant, age-appropriate topics for STEM instruction in K-12 contexts. It addresses a common problem: teachers, particularly those without specialized STEM training, often lack clear guidance on which topics and concepts suit young learners and must rely on broad curriculum guidelines that may not reflect the real-world language and skills needed for strong STEM understanding.
Methods: The study built two datasets on themes contextually relevant to K-12 learners: raising domestic animals and growing vegetables. The researchers used ChatGPT to generate narrative texts about these agricultural practices, producing content that is familiar and meaningful to the target learners, especially those in rural or agricultural communities. The generated texts were processed with the English USAS (UCREL Semantic Analysis System) semantic tagger, a tool developed at Lancaster University that automatically assigns semantic field labels to words in a text, and the tagged output was then cleaned and prepared for analysis. The domestic animals dataset contained 12,050 semantic tags and the vegetable dataset contained 11,966. Using AntConc, a free corpus analysis toolkit, the researchers compared the two datasets against each other and calculated log-likelihood (LL) statistics to identify the tags most distinctive of each dataset. Following the Top N method, they selected the five most distinctive tags from each dataset, then used Mutual Information (MI) scores to find each tag's meaningful collocates and extracted concordances to characterize each semantic category.
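The statistical steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline or AntConc's implementation: it assumes the widely used two-corpus log-likelihood formula for keyness and the simple pointwise form of Mutual Information, and AntConc's exact variants and settings may differ. The function names and the tag counts are hypothetical and chosen only for illustration.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood for one tag.

    a, b: frequency of the tag in corpus 1 and corpus 2
    c, d: total number of tags in corpus 1 and corpus 2
    """
    # Expected frequencies if the tag were spread evenly across both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # a zero count contributes nothing to the sum
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def top_n_key_tags(target, reference, n=5):
    """Return the n tags most distinctive of `target` relative to `reference`."""
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for tag, a in target.items():
        b = reference.get(tag, 0)
        # Keep only tags that are relatively more frequent in the target.
        if a / c > b / d:
            scored.append((tag, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """Simple pointwise MI: log2 of observed over expected co-occurrence."""
    return math.log2(pair_freq * corpus_size / (node_freq * collocate_freq))


# Hypothetical tag counts, NOT the study's data.
animals = {"L2": 300, "H1": 120, "Z5": 500, "L3": 20}
vegetables = {"L3": 310, "F4": 150, "Z5": 480, "L2": 10}

for tag, score in top_n_key_tags(animals, vegetables, n=2):
    print(f"{tag}\tLL={score:.2f}")
```

Under these assumptions, a higher LL score means the tag's frequency differs more strongly between the two datasets, and a higher MI score means a word co-occurs with the tagged item more often than chance would predict.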
Key Findings: The analysis revealed distinct semantic patterns in both datasets that serve as valuable indicators for STEM curriculum design. For the domestic animals dataset, the key semantic tags included: L2 Living creatures (animals, birds, etc.), H1 Architecture, houses and buildings (shelter, cage, housing), B2 Health and disease, T3 Time: Old, new and young; age, and N3.5 Measurement: Weight. For the vegetable production dataset, the identified tags were: L3 Plants (bean, okra, eggplant, plants, pumpkin), O4.3 Colour and colour patterns (purple, red, yellow, green), O1.1 Substances and materials: Solid (soil, sand, compost), F4 Farming & Horticulture (watering, harvest, sowing, planting), and X3.1 Sensory: Taste (flavor, sweet, bitter, spicy, taste). Each semantic tag was found to offer natural entry points for integrated STEM learning, connecting concepts across Science, Technology, Engineering, and Mathematics disciplines through real-world, age-appropriate contexts.
Implications: The findings contribute to the field of AI in education by demonstrating how semantic tagging can provide a systematic, evidence-based approach to curriculum development, one presented as more reliable than traditional methods that rely on subjective expert judgment or outdated textbook content. The approach helps ground selected topics in observed language patterns and real-world communication, making them more relevant to learners. The study shows how AI-generated texts can be used to create customized educational content tailored to specific themes, age groups, and local contexts, offering greater flexibility than published texts. The identified semantic tags can inform and enrich STEM curriculum design by helping educators base decisions about instructional content on actual language patterns rather than assumptions. The methodology supports the integration of different STEM subjects while promoting the critical thinking and real-world problem-solving skills needed to address global challenges such as food security and environmental conservation.
Limitations: The study acknowledges several limitations that may affect the generalizability of its findings. The research focused on two agricultural themes (domestic animals and vegetable production), which may limit its applicability to other STEM domains. The reliance on AI-generated texts from ChatGPT, while offering flexibility and customization, may introduce biases inherent in the language model's training data. Although each dataset contained roughly 12,000 semantic tags, the analysis used the Top N method to select only five tags per dataset for demonstration purposes, which may not capture the full complexity of STEM concepts. Additionally, the research was conducted within a specific cultural and linguistic context, potentially limiting its transferability to other educational settings or languages. Finally, the effectiveness of the identified semantic tags in actual classroom implementation was not empirically tested, leaving a gap between theoretical identification and practical application.
Future Directions: The researchers suggest several avenues for future investigation to extend and validate their findings. Future studies should explore the application of semantic tagging to other STEM-related domains beyond agriculture, such as physics, chemistry, environmental science, and technology, to develop a more comprehensive framework for curriculum design. Empirical validation of the identified semantic tags through classroom implementation and assessment of student learning outcomes would strengthen the practical utility of this approach. Research should also investigate the long-term impact of curriculum designed using semantic tagging methods on student achievement, engagement, and STEM career interest. Additionally, future work could explore the development of automated tools that can assist educators in applying semantic tagging techniques to identify appropriate content for different grade levels and learning objectives. Cross-cultural and multilingual studies would help determine the universal applicability of this methodology across different educational contexts and language systems.
Title and Authors: "Semantic tagging of AI-generated Science texts to identify topics for teaching STEM in K-12 contexts" by Raymund T. Palayon (King Mongkut's Institute of Technology Ladkrabang), Regie P. Amamio (Mindanao State University), Yenying Chongchit (King Mongkut's Institute of Technology Ladkrabang), and Naruethai Chanthap (King Mongkut's Institute of Technology Ladkrabang).
Published on: 2025
Published by: 2025 10th International STEM Education Conference (iSTEM-Ed), IEEE