Qualitative research in science education: A literature review of current publications



INTRODUCTION
Science education refers to the main science disciplines such as biology, chemistry, physics, and others. Science education research, however, is concerned with the ways science is taught by teachers, learned by students, and the affective factors that influence these phenomena. Duit (2007) … How these domains are explored, described, or analyzed can vary according to the research questions and methods of the study. Another analysis of areas of study about science teachers in educational research includes the domains of science teachers' knowledge, conceptions and beliefs, understandings of scientific inquiry and the nature of science, pedagogical content knowledge, and knowledge of goals and curriculum (Avraamidou, 2014). Science teachers play a vital role in student learning of science, and research on science teachers is a part of that role. The National Science Teacher Association (NSTA) maintains that continual inquiry into science teaching and learning promotes 21st-century students' scientific literacy (NSTA, 2017). Similarly, the National Research Council (NRC) argued that teacher-based research is an important step in students' scientific proficiency and also leads to science education that is more equitable and inclusive (NRC, 2012).
Traditionally, educational research has been dominated by two approaches: qualitative and quantitative. The two approaches have underlying philosophical assumptions, which heavily influence how they approach truth. While qualitative research typically rejects worldviews that subscribe to one truth (i.e., non-positivism), quantitative research typically seeks out a single truth (i.e., positivism). The difference in these assumptions stems from a contrast in the overall purpose of each approach and affects inquiries positioned under each paradigm differently. For example, while quantitative inquiries may be guided by "what" questions (e.g., what …), qualitative inquiries may be guided by "how" or "why" questions (e.g., how). Mixed methods research is the combined use of both quantitative and qualitative methods in a way that provides a better understanding of the research objectives than can be gained from a single method (Creswell & Plano Clark, 2011).
Scientific research is a field steeped in quantitative traditions, and research in science education has historically mirrored that paradigm. However, paradigms shift over time, and articles in science education can today be quantitative, qualitative, or mixed methods. Researchers choose methods to carry out their qualitative research depending on the type, focus, and nature of the study. For example, when study participants were students, data collection focused on achievement, typically measured by quantitative standardized tests such as the Trends in International Mathematics and Science Study. Studies related to student achievement tended to analyze quantitative data such as test scores (Libarkin & Kurdziel, 2002). In contrast, qualitative studies tended to focus on teacher beliefs, teacher practices, teacher development, teacher leadership, or learner experiences in the classroom and collected data with interviews and observations (cf. Lundqvist & Sund, 2018; Vázquez-Bernal et al., 2021; Wen et al., 2021). Whereas quantitative approaches typically test theories by examining the connection or relationships among a set of variables, qualitative approaches tend to explore the meaning individuals ascribe to social phenomena (Creswell, 2014). As a result, quantitative approaches or quantitative data analyses may reveal meaningful interactions among variables and produce valuable findings from large data sets. By contrast, "qualitative research allows researchers to get at the inner experience of participants, to determine how meanings are formed through and in culture, and to discover rather than test variables" (Corbin & Strauss, 2008, p. 12).
Given the need for research on teachers (cf. NRC, 2012; NSTA, 2017), this study situates itself in the growing field of teacher-centered literature as a critical survey of the extent to which qualitative secondary science teaching research publications reflect high-quality practices as found in mainstream methodological texts. Despite an observable increase in the frequency of publications that are qualitative in nature, there are no discernible guidelines from publications that define how qualitative research studies should be carried out. Moreover, differences across scholarly journals with respect to reviewers' training and professional opinions regarding the nature of qualitative research increase variability. The goal of this project is thus to understand more broadly the current state of the field's research by analyzing the qualitative research articles published in top-tier science education journals.

LITERATURE REVIEW
Previous systematic reviews of the literature informed the field about popular focuses within science education research. Chang et al. (2010), for instance, systematically analyzed the trends in science education research. Through a content analysis method of segmenting, clustering, and visualization, the authors analyzed 1,401 articles from four top-tier journals for the years 1990-2007 and identified nine topics in science education research. However, because the authors chose to analyze a large number of articles, their methodology could not accommodate a detailed analysis of each article's methodology. Furthermore, Chang et al.'s (2010) stated purpose of analysis was to identify topics, trends, and contributors to the field of science education research so that novice researchers may more easily understand which areas in the field need to be further explored, which excluded a motivation to analyze each article individually. Karampelas (2021) similarly sought to better understand trends in education research and reviewed a large number of articles (n = 6,504) published over the ten-year period of 2010-2020. Whereas Chang et al. (2010) focused on journals that emphasized science education, Karampelas' (2021) analysis looked at a range of topics in educational research articles published in four journals for evidence of attention given to science education research. They found that of the thousands of articles they screened, 400 were based on the topic of science education. However, their analysis did not extend to identifying trends within science education; rather, it reported on trends within education research as a whole. Lin et al.'s (2018) examination of research trends in science education from 2013-2017 was similar to both Chang et al. (2010) and Karampelas (2021) in providing the field with an analysis of article trends through systematic analysis. The authors' study was the latest in a series of studies carried out by a group of scholars (see Lee et al., 2009; Lin et al., 2013; Tsai & Wen, 2011) on the same three journals (i.e., Science Education, Journal of Research in Science Teaching, and International Journal of Science Education) every four to five years. Of the four papers published by this group, two (Lee et al., 2009; Lin et al., 2013) reported a breakdown of the paradigmatic alignment of the articles analyzed. Lee et al. (2009) reported that 10 out of 15 empirical studies published from 1998-2002 were qualitative. The group's later publication, Lin et al. (2013), reported that of articles published from 2008-2012, three out of seven studies were qualitative. However, none of this group's articles have analyzed how each paper carried out its methodology.

Two published works titled … (e) science teacher education (Abell & Lederman, 2007). Volume two expanded the volume one list to also include theory and methods of science education research as well as diversity and equity in science learning (Lederman & Abell, 2014). The articles included in these two volumes of the science education research handbooks indicated the types of research publications that were widely accepted as the best in the field. The handbook categorizations were similar to those found in other literature reviews (cf. Chang et al., 2010; Karampelas, 2021; Lee et al., 2009; Lin et al., 2013, 2018; Tsai & Wen, 2011); however, the handbooks did not include an analysis of paradigmatic orientation or methodology.
While these literature reviews focused on qualitative secondary science studies more broadly, their findings underscored a need for closer analysis. For example, in Chang et al.'s (2010) survey of publications from 1995-2005, the authors found that "researchers were becoming small-scaled in research design while more qualitative data collection methods were used" (p. 317). This is a similar finding to that of Karampelas (2021), who wrote that qualitative methodologies have become preferred due to a growing concentration of researchers on classroom teaching practices, which "researchers prefer to carry out [with] empirical research that takes place in a classroom in the form of an action research case" (p. 10). Because of this increasing number of researchers deploying qualitative methodologies, there is a need to analyze more closely the individual methods and methodological orientations of authors engaging in this work. It is in this analytic gap that this article positions itself, so that the field may know not only what broad trends were present in research, but more specifically what methods were being used and how they were being deployed.

Research Question & Purpose
The purpose of our systematic review is to go beyond previously completed reviews, which have reported paradigmatic orientations, and to more fully explore which methods were used and how they were used. This systematic literature review analyzed articles from the last four years that utilized qualitative methodologies and centered science teachers, answering the call by the NRC (2012) and NSTA (2017). The research question that guided the study through systematic literature review was: To what extent do qualitative secondary science teaching research publications reflect high-quality practices found in mainstream methodological texts?

CONCEPTUAL FRAMEWORK
To answer this research question, we sought to develop a conceptual framework around how qualitative methodologies have been deployed in secondary science teaching research. We developed this conceptual framework around what we consider to be most applicable to secondary science studies that focus on the teacher. Moreover, while previous studies have focused on qualitative methodologies broadly, we developed our conceptual framework to engage with a smaller number of studies more closely. Toward this end, this conceptual framework functions to clarify our intent and methodological choices.

Qualitative Studies
Qualitative methodology is characterized as being descriptive and dealing with phenomena in a deeper or more individualized way (Merriam, 2009). In qualitative studies, researchers are the designers of the study, the collectors and analyzers of the data, and in some cases the facilitators in the programs being studied. Maxwell (2013) advocated that, no matter the level of involvement, the researcher is an instrument in the qualitative study, representing a strong departure from quantitative studies that consider researchers to be neutral. Additionally, qualitative studies collect different forms of data than quantitative studies as they "seek to recreate the contextual setting as a framework that can be analyzed and understood. By necessity, qualitative research often consists of as much data as possible, including detailed field notes, tape and video transcripts, and written documents" (Libarkin & Kurdziel, 2002, p. 80). Toward this end, qualitative studies tended to utilize some form of interview and observation. Qualitative researchers often coded these data, looking for patterns in order to draw conclusions and immerse themselves in the research (Saldaña, 2016).
Qualitative researchers thus have differing practices from those found in quantitative studies. For example, in a quantitative study, reliability and validity have long been used as standards of consistency or accuracy of the measures utilized in the study. A qualitative study, however, is evaluated on the basis of credibility, transferability, dependability, and confirmability (Lincoln & Guba, 1985). However, for a variety of reasons (e.g., journal reporting standards, researchers' professional writing styles, etc.), these individual categories may not be explicitly addressed, leading to confusion or misinterpretation by readers.

Credibility
The logic behind researcher choices must make sense to the reader and lend a sense of credibility to the researcher; "the basic notion with credibility is that both the readers and participants must be able to look at the research design and have it make sense to them" (Jensen, 2008b, p. 139). Had we chosen to focus on articles in the area of K-12 history education, this would represent a disconnect in the logic of our choice, causing a credibility crisis when readers attempt to understand our research design. Jensen (2008b) recommended five ways that researchers can improve their credibility: time: spend enough time in the field with participants; angles: utilize various perspectives to analyze data to gain a more holistic view; colleagues: reach out or partner with other researchers in the field to review your analysis and findings; triangulation: utilize multiple sources as well as multiple methods to gather data; member checks: enlist help from participants to ensure your analyses are accurate. Other scholars have argued for increased credibility through prolonged engagement (Korstjens & Mosler, 2018), the use of member checking to confirm participant responses (Anderson, 2017), and triangulation through the use of multiple data points (Stake, 1995; Steinke, 2004).

Transferability
Researchers generally choose study participants with the "inherent notion that they somehow represent the entire population" (Jensen, 2008d, p. 886). This is so that readers may make connections from a researcher's study to other applicable contexts. However, this is not true of all types of qualitative research. For example, Stake (1995) argued that while it may be beneficial to select typical cases for case study research, atypical cases may help us understand matters that are often overlooked. Jensen (2008d) wrote that qualitative researchers could increase the transferability of their research by (a) ensuring the study's context and participants are closely intertwined through purposeful sampling, (b) conveying the study's context using thick description so that readers are given a full account of all aspects of the research, and (c) ensuring that each research question is answered appropriately.

Dependability
One unavoidable reality of research in general is that the context of research may change over time. Values for quality are contextual (Tracy, 2010). This is important for qualitative research as it may produce findings that are highly sensitive to environmental contexts. This may also happen as unexpected forces arise at some point during the research process. Jensen (2008c) wrote that dependability "recognizes that the research is evolving and that it cannot be completely understood a priori as a singular moment in time" (p. 209). To increase a study's dependability, researchers should give readers enough information about the structure of the research project, including methodological choices, so that others may attempt to replicate it. With this information, other researchers can account for changes in other contexts as they attempt to replicate the study in that context.

Confirmability
One of the implicit ideas of a published research article is that its findings should be confirmable by other researchers. Readers must be provided enough evidence to support a researcher's conclusions, findings, or implications. Jensen (2008b) defined confirmability as "the degree to which the results of the study are based on the research purpose" (p. 113). Anderson (2017) proposed a "communication of methodological awareness evidenced by an audit trail as a standard of quality in qualitative research" (p. 127). Examples of this include providing coding examples in the write-up, asking participants to review coding processes and analyses, and using multiple data points through triangulation (Korstjens & Mosler, 2018).

METHODOLOGY
Following this guidance, we first established the search terms "science teaching" and "secondary education." Next, we chose databases based on what was available through our institution. The databases chosen were "ERIC EBSCO Host," "APA PsycInfo," and "Education Full Text (H.W. Wilson)." We then used the search terms in the databases. Furthermore, we selected criteria designed to limit the search to peer-reviewed journal articles published since 2018, available online as full text, and published in English. These criteria were selected for several reasons. We believed that articles published since 2018 would ensure the literature returned was current and relevant. Due to the scope of this project, full text versions of each article were necessary for in-depth analysis. Additionally, articles available to the researchers at no charge were selected due to funding. English language articles were chosen because English is the first language of the research team. This search yielded 928 non-duplicated articles. For the sake of comparison, we searched ERIC EBSCO Host with the same terms without the additional criteria. This resulted in 1,139,711 individual articles, an unmanageable quantity for two researchers. We repeated the article search with the Web of Science database, which accesses open-access journals, in an effort to broaden the selection of articles. We set similar search criteria and limitations: open access, peer-reviewed journal articles from 2018-2022 published in English. In this database search we were able to select the following categories: Education & Educational Research and Education, Scientific Disciplines. The Web of Science search provided 752 additional articles to screen.
We next exported all 1,680 articles included from both searches (i.e., 928 from search 1 and 752 from search 2) to ProQuest RefWorks to gather pertinent information such as the author(s), title, citation, and abstract. Next, we transferred the article information to an Excel spreadsheet for sorting and screening. We used the following criteria for inclusion: The study was …
We took the 1,680 articles from the search results through three phases of screening.
First, we screened the title of each article according to the inclusion criteria listed above. Articles were eliminated that focused on non-education subjects such as cancer, diet, mental health, or sexuality, as they were beyond the scope of the inclusion criteria to primarily focus on science education. Some articles were eliminated at this phase of screening because they addressed non-science education topics such as mathematics, English language learning, physical education, or literacy. Articles identifying both mathematics and science were not eliminated during the title screening. Topics identified as STEM but not science specifically were not eliminated during this phase. However, some STEM-related articles were discarded, such as those addressing only engineering, robotics, or computer science. Additionally, all articles with elementary or collegiate settings that were extraneously included through the database searches were eliminated at this stage as they did not fit within the criteria of secondary education settings. A total of 1,092 articles were eliminated by title screening, leaving 588 articles to continue into the abstract screening.
Subsequently, we conducted an abstract screening employing the same inclusion parameters: qualitative studies of secondary science education teachers. Articles eliminated based on the abstract included non-qualitative studies such as quantitative or mixed methods research studies; studies developing or evaluating measurement tools or curricula, which do not limit their focus to the science teacher; meta-analyses; and literature reviews. The abstract screening eliminated an additional 392 articles.
Finally, we scanned the methods section of the 196 remaining full text articles to identify qualitative, empirical studies, which were based on observed, actual experiences as opposed to theory. The methods section screening further identified articles to exclude, including mixed methods studies that did not originally present as such in terms of Creswell's (2014) definition for mixed methods research. Also excluded were additional program or curriculum design studies that had escaped notice during the abstract screening. Additionally, studies involving preservice teachers were eliminated in favor of limiting the search to in-service teachers to align with the research question. At this point, we deselected qualitative studies based solely on questionnaires, as those studies were not likely to demonstrate rich qualitative engagement such as interpersonal interactions through observations and interviews. Additionally, articles lacking open or institutional access were eliminated to ensure ready availability for the project. The methods section screening resulted in elimination of an additional 171 articles. This detailed three-level screening process left us with a total of 25 eligible articles for the literature analysis to address the research question. A PRISMA diagram (Moher et al., 2009) illustrates our search and screening process (Figure 1).
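The three screening phases described above can be checked as a simple arithmetic funnel. The following is a minimal sketch, not the procedure the authors used (they report working in RefWorks and Excel); the counts are those reported in the text, while the function name `screening_funnel` and the stage labels are hypothetical and introduced only for illustration.

```python
# Counts are those reported in the text; labels and names are illustrative.
identified = {"database search 1": 928, "Web of Science search": 752}
eliminated = [("title", 1092), ("abstract", 392), ("methods section", 171)]


def screening_funnel(total, stages):
    """Return (stage, eliminated, remaining) for each screening phase."""
    rows = []
    remaining = total
    for stage, removed in stages:
        if removed > remaining:
            raise ValueError(f"{stage}: cannot remove {removed} of {remaining}")
        remaining -= removed
        rows.append((stage, removed, remaining))
    return rows


total = sum(identified.values())
funnel = screening_funnel(total, eliminated)
for stage, removed, remaining in funnel:
    print(f"{stage:<16} eliminated {removed:>5}, remaining {remaining:>5}")
```

Tallying the counts this way offers a quick consistency check between the narrative and a PRISMA-style flow diagram: each phase's remainder should equal the number of records entering the next phase.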
Following screening, we read each of the 25 included articles (see Appendix A) and coded them manually. We approached coding in a deductive manner as we had a predetermined set of variables related to qualitative research. These variables were determined through meetings in which we discussed what we felt were indicators of rich quality in qualitative research. We also consulted the literature as well as colleagues in the field to verify and more fully understand these indicators. After establishing the indicators as variables, we created a matrix to track article features. The matrix variables were as follows: … (g) coding method (see Appendix B for the coding explanation).
These variables were chosen based on our understanding of research in science education as well as how we understand qualitative research with respect to credibility, transferability, dependability, and confirmability. For example, because this project sought to understand the methodological approach of articles that center the experiences of science teachers, the first variable coded for was whether the researcher was a participant or observer, which helps the reader understand the perspective of the researcher as well as how their perspective impacts the study. We also recorded the methodology explicitly stated by the author as it allowed us to understand the researcher's approach within the broader context of mainstream methodological texts as well as how they stated and approached data collection. The variables of time spent in the field and triangulation through multiple data sources lend not only credibility to the researcher's data collection, but also confirmability to their analyses. Similarly, member checks were recorded, as enlisting the help of research participants to confirm findings also increases credibility. We recorded information related to coding, as a researcher's transparency in revealing their coding method helps to increase their article's transferability by assuring the reader that each question was appropriately answered. Similarly, this also increases an article's dependability as other researchers may attempt to replicate an author's inquiry. A full list of the variables and descriptions is found in Appendix C.
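One way to picture a row of such a coding matrix is as a small record per article. The sketch below is only an illustration under stated assumptions: the class `ArticleRecord`, its field names, and the example entry are hypothetical and are not taken from the appendices; the fields simply mirror the variables discussed in the paragraph above, and triangulation is treated as the presence of two or more distinct data sources.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical row of a coding matrix; field names are assumptions."""
    citation: str
    researcher_role: str                      # "participant" or "observer"
    stated_methodology: Optional[str] = None  # None if not stated by authors
    time_in_field: Optional[str] = None
    data_sources: List[str] = field(default_factory=list)
    member_checks: Optional[str] = None
    coding_method: Optional[str] = None

    def uses_triangulation(self) -> bool:
        # Assumption: triangulation means two or more distinct data sources.
        return len(set(self.data_sources)) >= 2


# Placeholder entry, not one of the reviewed articles.
example = ArticleRecord(
    citation="Hypothetical Author (2020)",
    researcher_role="observer",
    stated_methodology="case study",
    data_sources=["interviews", "observations", "journal entries"],
)

records = [example]
role_counts = Counter(r.researcher_role for r in records)
print(role_counts, example.uses_triangulation())
```

Storing the data sources as a list rather than a yes/no flag keeps the detail of how triangulation was carried out, which a dichotomous variable would discard.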

Credibility
In order to increase the credibility of our study, we adhered to writings from Jensen (2008b) and relied on our collective understanding of credibility developed from qualitative methods coursework we had completed during our doctoral studies. The methodology used for this systematic literature review was presented at the Southeastern Universities Graduate Research Symposium (SUGRS) Conference, an annual regional conference hosted by The University of Alabama's College of Education. We received rich, detailed feedback about the methodology from reviewers who are current science education faculty and who regularly engage in scholarly research and publication. This feedback was considered as we moved forward with our study. For example, instead of reporting variables like triangulation as dichotomous (i.e., yes or no), we recorded how a researcher used triangulation (e.g., using focus group data, interview data, and journal entry data). Additionally, the articles chosen for this systematic review were rigorously analyzed: first independently by each author, then as a group. When we met to discuss the articles, we talked through disagreements with each author's analysis, and each article was checked several times to ensure consistency in our analysis.

Transferability
Our systematic literature review yielded findings from journals with high impact factors that are broadly representative of science education scholarship (Jensen, 2008d). Additionally, we were purposeful in sharing as many rich methodological descriptions as possible within the limits of the journal's word count.

Dependability
We worked to increase our study's dependability by explicitly describing all steps and search factors in order to provide readers the information necessary to replicate this research project. The values that impact beliefs in qualitative research have changed dramatically over time, and these values will surely change in the future. We believe that describing our process and values in as much detail as possible, as well as connecting them to the literature, will help future researchers understand our findings so that they may replicate the study later when reporting standards inevitably change.

Confirmability
To convey a sense of confirmability to our study, we strived to be as thorough as possible when describing our methods. Some methodological decisions we made in this study to support its confirmability include descriptions of the coding themes, providing a PRISMA diagram of the literature search process, and including research support of qualitative methodologies used.

FINDINGS
As a review of qualitative science education research, our findings were guided by our coding process and are presented according to the coded variables. A full coding matrix is also provided (Appendix C). We do not propose that any one feature of the coding has precedence over another.

Role of researcher
Researchers positioned as outside observers constituted the bulk of the articles (n = 21). However, four of the articles featured the researcher as a participant (Hordvik et al., 2021; Keiler, 2018; King & Pringle, 2019; Velasco et al., 2021). Only two of the publications that we reviewed described the impact of the researcher in the study (i.e., Gardner & Tillotson, 2019; Hordvik et al., 2021). When authors considered researcher influence, it was framed in terms of their efforts in coding the data rather than their interactions with the participants (e.g., Brown & Bogiages, 2019; Fitzgerald et al., 2019).

Methodology
We found that authors identified their study's methodology beyond stating they were qualitative in all but four articles. The methodologies explicitly stated by the authors were case studies, narrative or discourse analysis, phenomenology, and one observational study.
Case study: Case studies represented the majority of the authors' chosen methodologies. Some authors described their case studies as descriptive (e.g., Gardner & Tillotson, 2019; Wilson, 2021), longitudinal (e.g., Dogan et al., 2020; Vázquez-Bernal et al., 2021), or multi-case (e.g., Kirmaci et al., 2019; Vossen et al., 2020). Brown and Bogiages (2019) called their study an instrumental case study while Velasco et al. (2021) identified their methodology as an embedded single-case study. Two studies did not identify any particular methodology but could have been framed as case studies based on our understanding of what case studies are and how they were identified by other authors in the literature review (i.e., Litman & Greenleaf, 2018; Vale et al., 2020). On the other hand, several of the studies that explicitly stated case study as their methodology did not mention key components of a case study. For example, only eight of the 13 self-identified case studies mentioned triangulation, a common hallmark of case studies found in methodological literature in education research (Merriam, 2009; Stake, 1995).
Narrative/discourse analysis: Two articles stated they used discourse analysis methodology. Andrée and Hansson (2021) sought to understand teacher agency by analyzing talk among teachers from five focus groups as they evaluated curriculum materials provided by commercial agencies. The researchers stated that discourses are socially embedded and appropriate for analyzing how teachers talk about these resources. King and Pringle (2019) acknowledged the world was culturally and socially defined; therefore, the students' narratives were important for giving their own account. Each researcher created narratives from participants' interview transcripts while participants wrote narrative accounts of their own experiences in STEM education.
Phenomenology: Three authors identified their studies as phenomenological. Birth et al. (2018) used a phenomenological methodology to understand physics teachers' perceptions of professional development. Strachan (2020) similarly used phenomenology to explore the experiences of two African American science teachers "to determine the connection between what is being perceived and how it is being experienced" (p. 228).

Data collection
In the articles analyzed, data collection largely took the form of interviews, focus groups, and observations.Articles differed in the way they carried out a specific collection method.For example, Berge et al. (2020) analyzed body language in video recordings, while Gardner and Tillotson (2019) made classroom observations of teachers.However, we classified both of these data collection methods as observations.Interviews: 19 articles (76%) identified semi-structured interviews as a data collection method.Two of these articles stated they used a distinct approach to interviewing, using what they called video-stimulated interviews (i.e., Overman et al., 2019;Vale et al., 2020).In these articles, participants and interviewers watched video recordings together of the participant teaching lessons to focus the discussion on critical moments and to prompt further reflection by the participants.Studies that engaged in interviewing, discussed the value of interview as a data collection method in various ways.For example, Gardner and Tillotson (2019) stated that "teacher interviews provided historical perspectives of the [STEM] model" of the institution (p.1288).Another perspective on interview was from Nixon et al. (2019) who acknowledged that interviews have limits as "these interviews only approximated tasks of teaching, as they were removed from the context of teaching.In the complexity of the actual teaching practice, it is more challenging to identify the knowledge teachers are using" (p.155).
Authors used interview transcripts in a variety of ways.Several studies briefly stated that interviews were recorded and transcribed verbatim (Andrée & Hansson, 2021;Birth et al., 2018;Hordvik et al., 2021;Vossen et al., 2020;Wen et al., 2021).Other researchers attended to the transcriptions in more detail in their publication.An example, Berge et al. (2020): All presentations were first transcribed verbatim and read to obtain an overview of the data … In order to be able to compare and contrast expected and unexpected patterns in teacher-student interaction the three presentations, all thematically varied and comprehensive, were transcribed a second time to capture important body language such as what the teacher drew on the whiteboard in the classroom (p.68).
Likewise, King and Pringle (2019) described their transcription process this way: "In crafting the first draft of the profile, we transcribed the first interview word-for-word with the coughs, sneezes, giggles, pauses, and idiosyncrasies. We kept the original transcript but started a new document, where we deleted all of the interview questions from the transcript so that only the girls' words remained in the document" (p. 553).
Although all the articles included quotes from the interview transcripts, each author described the process differently. Studies varied in how they conducted and transcribed interviews and in the length of interviews they reported. As examples: (a) Lundqvist and Sund (2018) documented "the main data were collected by means of three group interviews, each lasting approximately 75 min" (p. 359), (b) Velasco et al. (2021) described "the individual interviews were 45 min, conducted and recorded using the video conference application Zoom" (p. 441), and (c) Walan (2020) simply put "the interviews were semi-structured, audio-recorded and transcribed" (p. 434).
Focus groups: Focus groups were a data collection method described in eight of the qualitative studies. Each author described their rationale for using focus groups differently. For example, Keiler (2018) used focus groups "to prioritize teachers' perspectives about their experiences" (p. 5). Through a different lens, Andrée and Hansson's (2021) "study is based on an analysis of discursive practices employed by teachers in focus group conversations" (p. 357).
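Observations: Observations were used as a data collection method in 11 of the 25 publications (e.g., Berge et al., 2020; Dogan et al., 2020). Their descriptions and bases for using observations varied. Wilson (2021) described how the observation protocols were developed, including their setting, social climate, program and unplanned activities, and non-verbal communication. Other articles were less explicit when describing observation protocols. For example, Litman and Greenleaf (2018) explained their observation protocol stating, "classroom observations were protocol driven. The observation and analytic protocol focused on three aspects of the lesson: texts, classroom activities, and classroom culture" (p. 111). Similarly, Berge et al.'s (2020) article stated they "document[ed] classroom activities through video recordings" (p. 67).

Time durations
Twenty (80%) of the articles explicitly cited times from their research process; these reported the time spent observing or the length of time of the interviews. For example, (a) Dogan et al. (2020) reported interviewing their participant for three years "in the fall of each year for periods of about 1 to 1.5 hours" (p. 87); (b) Gardner and Tillotson (2019) recorded a total of 1383 min of classroom instruction in four classrooms multiple times per week over three months; and (c) Keiler (2018) held interviews and focus groups that lasted from 20 to 90 minutes four times a year for three years of the program under study.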

Triangulation
Only half of the case studies indicated they utilized triangulation in their study design and analysis. Without explicitly stating so, Kirmaci et al. (2019) thoroughly described appropriate data triangulation: "We constantly compared and cross-checked the data that were collected through individual and focus-group interviews and participant observations to develop better understanding of and provide multiple sources of evidence of how teachers' participation in the program influenced their perspectives and practices" (p. 15).
Other examples include King and Pringle (2019), who stated they used triangulation in their narrative inquiry study for "robust counter-stories" (p. 539), and Navy et al. (2020), who employed triangulation in their study by using multiple data sources collected by different researchers to "ensure validity and reliability of the findings" (p. 191).

Member-checking
Only 11 articles (44%) reported member-checking with the participants. King and Pringle (2019) used member-checking "to ensure that the narrative being written accurately depicted the girls' perceptions and experiences ... The Black girls, as participants in the research, were elevated to the position of co-researchers and knowledge generators through the co-construction of their counter-stories" (pp. 553-554). Their study contained multiple phases in which participant input influenced the direction of the study. Another example was Vázquez-Bernal et al. (2021), who involved the participant and wrote, "finally, in phase 3 (2011-2019), Marina was given the opportunity to read and write as narratives a major part of the reports elaborated by the researchers in the first two phases" (p. 5).

Coding
We analyzed each article's data analysis section based on our understanding of deductive and inductive coding, further informed by Saldaña (2016). We have included these results in the coding chart in Appendix C. Each article reviewed utilized coding as a method of data analysis. We found these articles were implicitly or explicitly coded either deductively (n=11), inductively (n=11), or both (n=3). However, authors differed qualitatively in how their coding was reported. For example, Navy et al. (2020) stated, "coding was partway between a priori and inductive approaches" (p. 191). Additionally, the authors explicitly stated that the deductive coding in their first stage was based on "three primary types of resources (human, material, social)," which reflected an integration of the article's conceptual framework into their coding (p. 191). Brown and Bogiages's (2019) codes were based on a previous pilot study and the authors included a rich (~450 words) description of their coding process. Walan's (2020) coding came from a previous study by Pringle et al. (2015) that "used selective parts of TPACK [technological pedagogical content knowledge] to code data" (p. 432). The author stated, "I also decided to use only certain parts of TPACK" (p. 432). However, these codes were not revealed explicitly. Hordvik et al.'s (2021) article did not utilize coding in the traditional sense as the authors' theoretical framework rejects static conceptualizations of codes for analysis. Despite this, the authors' description of their analysis was rich and highly detailed. Some authors identified computer software, such as NVivo or ATLAS.ti, for organizing their codes in logical ways (cf. Navy et al., 2020; Velasco et al., 2021; Wilson, 2021). Fitzgerald et al. (2019) used a digital application to produce graphical visualizations but manually coded their data.

DISCUSSION
This systematic literature review sought to understand how qualitative secondary science teaching research publications reflect high-quality practices found in mainstream methodological texts. We found that the studies examined in this systematic literature review differed not only in their alignment to qualitative methodologies, but also in the way data collection, time durations, triangulation, member-checking, and coding were reported. This analysis led us to make the following arguments.

Secondary Science Teaching Qualitative Methodologies
With respect to methodology, many studies reported themselves as case studies. Yazan (2015) wrote that case study was the most widely used qualitative research methodology. This research design is characterized by a small sample size delineated by a boundary and triangulation of data through multiple data sources. Stake (1995) described case studies as purposefully highly contextualized. He wrote "the real business of case study is particularization, not generalization. We take a particular case and come to know it well, not primarily as to how it is different from others but what it is, what it does. There is an emphasis on uniqueness...on understanding the case itself" (Stake, 1995, p. 8). However, some of these studies do not follow the characteristics of case study laid out by prominent case study methodologists (i.e., Stake, 1995; Merriam, 1998, 2009) and more closely resemble a standard approach to qualitative inquiry. Merriam (1998) asserted, "the term case study is not used precisely; it has become a catchall category for studies that are clearly not experimental, survey, or historical. And to a large extent, the term has been used interchangeably with other qualitative research terms" (p. 43). Stake (1995) went into great detail about what can be considered a case and what should not. More specifically, he stated that "people and programs clearly are prospective cases. Events and processes fit the definition less well, and studies of them are less likely to capitalize on the methods [of case study]" (p. 2). Stake's argument was aligned with other scholarly arguments around case study's bounded system (i.e., Merriam, 1998; Smith, 1979). Bounded systems serve a methodological function, focusing the researcher's attention on the case under study. In this way, the case study boundary functions as a set of metaphorical blinders, giving the researcher laser focus on what is under study (Robertson & Yazan, 2022). Simply studying one particular entity does not necessarily make that study a case study. For example, Brown and Bogiages (2019) presented a case study without the hallmark features of a case study such as triangulation or defined boundaries. Of the 25 articles within this literature review, 13 identified as case studies. However, only Keiler's (2018) article contained elements of careful case study engagement, as it reported time spent in the field, triangulation, and member-checking. Without the components of a case study (i.e., clearly defined case, case boundary, triangulation, and member-checking), what many report as "case studies" are more accurately general approaches to qualitative inquiry.
Phenomenological research designs are broadly applicable as they are a more general way to qualitatively explore the views of the participants. Phenomenology is the study of phenomena and the personal experiences of people. Vagle (2014) explained the purpose of phenomenology is "to study what it is like as we find-ourselves-being-in-relation-with others ... and other things ..." (p. 20). As opposed to case study, a phenomenological study is not accompanied by the same elements such as boundaries. Instead, a phenomenological study can be a free-form, in-depth look at personal experiences. Groenewald (2004) wrote that phenomenology was less prescriptive since imposing a strict method would compromise the integrity of the phenomenological methodology. Birth et al. (2018) used the phenomenological approach to their study "because of its emphasis on the phenomenon of the study" (p. 91), that being the teachers' perceptions of a particular program. Further, they justified that such methodology uses an emerging qualitative approach involving data collection in the natural setting and data analysis of patterns and themes (Birth et al., 2018). Gardner and Tillotson (2019) situated their study as both a phenomenology and a case study claiming the "phenomenological investigation emerged as an appropriate research method" (p. 1287). Because phenomenology is less restrictive with respect to the elements that define what it is, it proves to be useful for secondary science researchers who seek to understand the views and experiences of participants.

Reporting Standards of Qualitative Secondary Science Research
This systematic literature review yielded articles demonstrating a range of engagements with qualitative research methods. Hordvik et al. (2021) and Birth et al. (2018) presented very qualitatively oriented articles. For example, Hordvik et al. (2021) described the data production in this way: "The multiple layers of qualitative data provided varied sources of experience and modes of expression, thus helping enhance the trustworthiness of the findings" (p. 5). Likewise, Keiler (2018) discussed qualitative concerns such as minimizing bias, including participants with diverse views and experiences, and triangulation of data. These articles engaged deeply with the overarching goals of qualitative research.
Data collection through interviewing is ubiquitous in qualitative research studies. Despite its widespread usage, variations in transcription practices may impact data analysis and reporting of findings. Researchers may manually transcribe interviews or utilize a variety of transcription services, whether human or artificial intelligence-assisted. Additionally, some transcription practices reproduce only spoken words, whereas others reproduce utterances like "um's," "ah's," throat clearings, long pauses, and a host of other audible sounds. These utterances, which may lack meaning on the surface, can often be used to determine emotional context or aid in conversation analysis (Poland, 2011). Despite these varying practices, some scholars avoid describing how their transcription was produced. For example, Berge et al. (2020) stated, "all presentations were first transcribed verbatim and read to obtain an overview of the data" (p. 68). Additionally, when the authors included participants' utterances, they were reproduced with narrative conventions of syntax and grammar, which differ greatly from spoken speech. The word "verbatim" means word for word in both American English and British English varieties (Verbatim, n. d., 2023). Contrasted with Berge et al. (2020), Andrée and Hansson's (2021) transcripts are filled with "like's" and ellipses to represent long pauses, more closely resembling spoken utterances.
Reporting research durations is not standardized; however, qualitative research studies should report how long interviews last, the length of time for an observation, or the time of a focus group session. What is often referred to as "time spent in field" improves both transparency and credibility (Jenson, 2008a). To be clear, methodological texts do not prescribe a perfect amount of time, but researchers should make the amount of time spent in the field known to the reader. For example, Dogan et al. (2020) indicated they wanted to uncover the epistemological underpinnings of a teacher's beliefs. The authors utilized various observation protocols over a three-year span and provided the length of time for each annual interview. Given the context of what the authors sought to uncover, it makes sense that they undertook a prolonged engagement in the field. Had the authors sought to investigate the same phenomenon but only spent one month in the field, the reader's sense of the authors' credibility would be undermined.
Triangulation can be a powerful tool in qualitative research. Some researchers may wish for findings to converge to amplify their confirmability, while others, particularly in case study research, may desire a richer analysis that comes from weighing multiple data sources. According to Flick (2022), Merriam (1988), and Stake (1995), triangulation is an integral part of case study research as it allows a researcher the opportunity to collect data using multiple methodologies, which have differing strengths and weaknesses so that when combined these deficiencies may be overcome. However, despite its importance, scholars differ in their use and purpose of triangulation in their data analysis. Velasco et al. (2021) explained their use of triangulation in their case study was "to check the consistency of findings by using focus group and document data as secondary sources to inform analysis of the individual interviews, which were the primary data source" (pp. 443-444). Kirmaci et al. (2019) similarly stated that they used triangulation to enhance trustworthiness. However, some studies that claimed to be case studies did not use triangulation at all (e.g., Brown & Bogiages, 2019; Wen et al., 2021).
Member-checking is a process related to research credibility. King and Pringle (2019) utilized extensive member-checking with their young participants as they "diligently listened to the girls to co-construct their counter-stories with authenticity" (p. 550). The authors sought to increase Black girls' engagement with STEM and utilized a methodology informed by critical race theory. Meaningfully including their participants in the project at multiple stages and empowering them as co-researchers not only increased their study's confirmability, but it also aligned with the goals of the authors' theoretical framework. Vázquez-Bernal et al. (2021) provided their participant the "opportunity to read and write as narratives a major part of the reports elaborated by the researchers in the first two phases" (p. 5) over eight years of participation. In this way, the researchers continuously performed member-checking, increasing the credibility of their findings. In contrast, Dolfing et al. (2020) presented their research as a case study yet made declarations about their data analysis that do not align with methodological texts. The authors stated: "The interpretations of the results of both authors, in general, were comparable. Member checks were not performed during the program, as they would have had too great an influence on the process of sense-making, whereas, after the program, member checks would result in teachers' rational responses concerning their intuitive, emotional process of sense-making" (p. 145).
As Dolfing et al.'s (2020) stated purpose of their research was to understand teachers' process of sense-making, the researchers' analysis could have been richer had it included participants' thoughts on the researchers' analysis. Stake's (1995) rationale for member checks was, "I often do not have all my facts straight and I need help … I think I can say that all my reports have been improved by member checking" (p. 116).
Each article reviewed utilized coding during data analysis. Individual researchers will undoubtedly utilize different coding methods given that the researcher is the primary instrument of analysis in qualitative studies. The more a researcher shares about their coding process, the better idea a reader has of how that researcher arrived at a particular result or analysis. Navy et al. (2020) provided a rich description of their coding process, guiding the reader through multiple cycles of coding, creation of codes and subcodes, and the specific kind of coding that was utilized at each stage. With this, readers can more fully confirm the article's results as they have a deeper understanding of the methodological process that guided the research. Other authors gave few details about their coding process. For example, Walan (2020) coded deductively based on teachers' use and knowledge of technology in the classroom. The author gave little information regarding their coding process beyond coding alongside a scholar knowledgeable about TPACK and that they "totally agreed on all of them without any differences in our interpretations of data" (2018, p. 434). Additionally, the author stated that "inductive themes emerged through interpretive readings of the interview transcripts" (2018, p. 434). Claims that themes emerge give agency to data, taking away or obfuscating the researcher's role in data analysis.
While there is no agreement around how much of the research process should be described in a publication, descriptive detailing of research methods is an important facet of qualitative research. King and Pringle (2019) provided an example of this level of detail of their methods: "We used a word processing program and copied and pasted all of the passages of interest for each participant into her own running document as a single transcript and read even closer to select the most compelling pieces to start crafting the counter-stories" (p. 553). This kind of detailed explanation increases the transparency of the researchers' methodology, which helps establish their findings' credibility, dependability, and transferability.
Understandably, there is an inherent dissonance between qualitative research methodology and a traditionally positivist area such as science and, by association, science education. Velasco et al. (2021) are an example of researchers who straddled the two paradigms of research. Despite their credible qualitative language in the methods section, they justified their inter-rater reliability with descriptive statistics as percentages in the analysis. Another example is Fitzgerald et al. (2019), who based their study on interview data but applied Bayesian confirmatory analysis, a quantitative statistical inference method, to their communities of concepts.
Further aspects of deep engagement with qualitative research methods include positionality and reflexivity. King and Pringle's (2019) study was the only article that included "Subjectivity Statements" from each author. They prefaced with "our personal histories, cultural worldviews, and professional experiences color our lens and decisions for how we approached this study" (King & Pringle, 2019, p. 543). Strachan (2020) included a "researcher positionality" section in acknowledging their outsider status with their participants. Strachan also addressed paying particular attention to matters of race and culture so as to not solidify "racialized deficit perspectives" (2020, p. 229).
Chang et al. (2010) and Karampelas (2021) reviewed 1,401 and 6,504 articles, respectively, which can be considered to be more representative of science education research generally. For our project, we purposefully chose to examine a smaller number of articles to enable us to more carefully examine each article's methodology. As a result, the articles selected may not be representative of the breadth of published science education research.

Limitations
We were unable to account for the variable of reviewer input in the publication process. The 25 articles comprising this literature review were sourced from 16 individual journals. Each journal has countless reviewers, each possessing individual understandings and experiences with research paradigms. The anonymity given to reviewers makes it difficult to know a reviewer's evaluation criteria or paradigmatic orientations, so this variable could not be accounted for.
What we know about qualitative research is the result of completing a combined 48 hours of coursework in qualitative methodology at our institution, preparing for and presenting original qualitative research at a combined 10 conferences, and countless hours of contact with colleagues and scholars engaged in qualitative inquiry. Through our doctoral studies, we have come to understand the history of qualitative research and its current status in our individual fields of study. We have engaged with countless methodological texts that are cited frequently in the literature and that have been written by well-established figures within qualitative research. While our analysis is based on these experiences, which we consider to have made us well-informed on the topic, it is our own and could differ from that of other scholars who have differing understandings of qualitative research. With this in mind, we were proactive in ensuring that we shared as much detail as possible about our analysis, coding, and other methods so that others can more fully understand our analysis.
Ethical considerations in qualitative research extend far beyond approval from an institutional review board (IRB) or other institutional interests. Researchers must address a host of considerations during research planning, data collection, and reporting results. These considerations can be highly contextual based on the scope of the researcher's inquiry and their own personal researcher identity. Moreover, Roth and von Unger (2018) wrote that ethical issues can manifest in any phase of research, requiring ethical reflexivity to be a core consideration of qualitative research. While we wished to analyze the 25 articles for the researchers' ethical considerations, we ultimately decided to save this inquiry for another project so that we may more fully attune to ethical considerations in research.
Chang et al. (2010) and Karampelas (2021) identified from their literature reviews the topics covered in science education research over previous decades. We designed our study to go beyond their work with an in-depth focus on the qualitative methodologies used in secondary science teaching research. Additionally, whereas Chang et al. (2010) and Karampelas (2021) reviewed articles from a range of years (1990-2007 and 2010-2020, respectively), our review considered only articles published since 2018. While Chang et al. (2010) and Karampelas (2021) claimed that qualitative science education research was on the rise, findings from our small-scale study revealed that few articles addressed science teaching and learning with deep qualitative engagement. While several articles contained elements that represent rigorous, high-quality engagement with qualitative research practices, there was no single article included in our review that was exemplary in all of its methodology.

CONCLUSIONS
Education, particularly science education, has long been dominated by methodological positivism, and paradigm shifts present difficulties to researchers (Kuhn, 1970). The struggle with this dichotomy can be seen in the Velasco et al. (2021) publication as well as in our own findings, where we describe qualitative research features in the articles in terms of quantitative percentages. Likewise, Dogan et al. (2020) took the process of observation, an opportunity to provide rich qualitative data, and quantified their observations with the reformed teaching observation protocol (Sawada et al., 2002). Qualitative research is new compared to quantitative research and has gone through several phases. The development to this current moment has been driven by questions related to democracy, race, gender, class, freedom, and community (Given, 2008). The present paradigm, just as paradigms of the past, does not exist as a monolith, and flexibility is inherent and arguably needed as qualitative research pushes into the future. However, this does not mean that researchers invoking qualitative methodologies should ignore the developments in qualitative research that have defined it over the years.
defined three major domains of science education research: (a) analysis of content structure, (b) research on teaching and learning, and (c) development and evaluation of instruction/instructional design.
The Handbook of Research on Science Education contained comprehensive syntheses of science education research.Volume one categorized its articles under the following headings: (a) science learning, (b) culture, gender, society, and science learning, (c) science teaching, (d) curriculum and assessment in science, and characterized a systematic review as containing four steps: (a) create search terms based on the research question, (b) choose databases with which to conduct the search, (c) conduct the search and collect articles, and (d) select articles based on inclusion criteria.
(a) primarily focused on science education, (b) in a secondary education setting, (c) qualitative studies, and (d) pertained to classroom teachers and teaching practices.

Figure 1. PRISMA diagram (Adopted from Moher et al., 2009)
explained their observation protocol stating, "classroom observations were protocol driven. The observation and analytic protocol focused on three aspects of the lesson: texts, classroom activities, and classroom culture" (p. 111). Similarly, Berge et al.'s (2020) article stated they "document[ed] classroom activities through video recordings" (p. 67).
Time durations
20 (80%) of the articles explicitly cited times from their research process; these reported the time spent observing or the length of time of the interviews. For example, (a) Dogan et al. (2020) reported interviewing their participant for three years "in the fall of each year for periods of about 1 to 1.5 hours" (p. 87); (b) Gardner and Tillotson (2019) recorded a total of 1383 min of classroom instruction out of four classrooms multiple times per week over three months; and