Summary
This article is a comprehensive methodological guide for doctoral students, aimed at optimizing the treatment, analysis, and quantification of qualitative data. In contrast to the apparent dichotomy between the two approaches, it argues that the interpretive richness of qualitative data can be strategically complemented by quantification (quantitizing), thereby reinforcing the validity and impact of research. The manuscript systematically addresses the methodological transition from data collection and cleaning to the iterative phases of open, axial, and selective coding, highlighting the rigorous treatment of emergent categories against pre-established theoretical frameworks. It also critically examines the process of converting qualitative narratives into numerical data, exploring techniques ranging from frequency counting and the creation of categorical variables to the application of Artificial Intelligence and Data Science with the R language for complex phenomena. It concludes with practical methodological recommendations, emphasizing that quantification should add analytical precision without eclipsing the contextual depth that characterizes qualitative research.
Keywords: Qualitative data analysis, Coding, Emergent categories, Quantitizing, Research methodology, Data science, Artificial Intelligence.
Introduction
In the development of a doctoral dissertation, the data analysis phase represents one of the most intellectually demanding challenges. Historically, qualitative research has focused on the deep understanding of meaning, language and experiences, seeking to describe complex phenomena rather than simply measuring them empirically. However, contemporary academic research demands a methodological rigor that often requires combining the contextual nuances of the qualitative approach with the structural precision of numerical and computational analysis.
Unlike quantitative research, which tends to operate under standardized algorithmic sequences and "recipes," the qualitative approach is inherently dynamic, flexible, and open-ended, requiring researchers to interact with the environment and continually adapt their methods. The purpose of this article is to provide doctoral researchers with a comprehensive methodological roadmap for the treatment and analysis of qualitative data. By detailing the processes of coding, categorization, and quantification, this guide seeks to equip the student with theoretical, practical, and technological tools to manage narrative complexity and raise the interpretive quality of their research.
Qualitative Data Analysis Processing and Development
The approach to qualitative data begins long before its segmentation; it starts from the very conception of the phenomenon to be investigated. In the collection phase, the researcher acts as the main research instrument, obtaining information through in-depth interviews, focus groups, observations or textual data. Because qualitative collection usually generates massive volumes of unstructured data, the first methodological step is the correct organization, transcription and preparation of the empirical material.
The qualitative treatment process is not a linear path, but a constant cycle of revision and refinement. The researcher must begin by applying an initial "filter" through which the raw information is read and organized in light of the background and theoretical references of the study. This organizational step makes it possible to visualize a map of the research design, establishing an interrelated scheme of methods and evidence that will answer the doctoral research questions. From this point on, the textual data (descriptions, narratives, emotions) are prepared for the next level of analytical abstraction.
The Coding and Categorization Process
The bridge between raw data and scientific conclusions is built through coding, an analytical process that translates textual information into a classifiable format. A qualitative code is a word or short phrase that symbolically captures the essence or evocative attribute of a piece of data. This procedure, which requires an iterative approach, is methodologically developed at three main levels:
- Open Coding: In this first immersion, the researcher codes the textual units through constant comparison, discovering initial ideas in the data collected. The information is disaggregated, paying attention to recurring themes, key phrases or patterns of behavior expressed by the participants.
- Axial Coding (or Relational Analysis): Subsequently, the concepts and categories discovered are related. It consists of contrasting the opinions and experiences of the informants to identify convergences and divergences, logically connecting the initial codes and grouping them into broader thematic families or categories.
- Selective Coding: Finally, the researcher integrates and condenses the previous categories to develop "substantive theories" or global constructs. This phase seeks to articulate an analytical narrative that responds directly to the core purpose of the doctoral research.
The use of specialized software (such as ATLAS.ti), the integration of Data Science techniques (particularly text mining libraries in the R language), and the use of Artificial Intelligence (AI) tools based on Natural Language Processing (NLP) exponentially optimize these steps. These technologies make it possible to organize hierarchical coding systems, discover hidden semantic patterns, and systematically retrieve coded segments from large volumes of data.
Categories and Emerging Issues: Rigor and Justification
A distinctive feature of qualitative rigor is the management of emergent categories. Whereas in a strictly deductive approach the researcher sticks to codes established a priori from the literature, the inductive nature of qualitative research allows the data themselves to "speak".
During the filtering of information, insights emerge from informants that the researcher had not anticipated in the original theoretical framework. Identifying and validating these emergent categories is fundamental; discarding them because they do not fit the preconceived theory constitutes a serious methodological bias. On the contrary, the emergence of these categories reflects the real complexity and innovation inherent in the phenomenon under study.
Methodologically, the validity of an emerging category is justified by two principles: theoretical saturation (the point at which the new data collected no longer contribute substantial variations to the category) and triangulation (the contrast of the new codes with different data sources or with the researcher's reflective notes).
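Saturation can be made operational by tracking how many previously unseen codes each successive interview contributes: when the most recent interviews add nothing new, the category may be saturated. A minimal sketch, with entirely hypothetical per-interview code sets:

```python
# Hypothetical codes assigned per interview, in order of collection.
interviews = [
    {"digital parental interference", "public disavowal"},
    {"digital parental interference", "eroded trust"},
    {"public disavowal"},
    {"eroded trust"},
    {"public disavowal", "digital parental interference"},
]

def new_codes_per_interview(coded_interviews):
    """Count how many never-before-seen codes each interview contributes."""
    seen, counts = set(), []
    for codes in coded_interviews:
        fresh = codes - seen
        counts.append(len(fresh))
        seen |= codes
    return counts

print(new_codes_per_interview(interviews))  # [2, 1, 0, 0, 0]
```

A tail of zeros is only an indicator, not proof: the researcher still judges saturation substantively, and triangulates against other data sources.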
Practical Example of Emergent Categorization: Let's imagine a doctoral research project focused on proposing a new didactic model for the improvement of school discipline. The researcher, based on the literature, establishes deductive categories such as Institutional Regulation and Classroom Strategies. However, when transcribing and openly coding the interviews with teachers, the phrase "I feel that parents disavow my decisions in WhatsApp groups" emerges repeatedly.
This phenomenon was not foreseen in the initial theory. The researcher codes these fragments (codes: digital parental interference, public disavowal), groups them relationally (axial coding), and proposes an emergent category named "Digital Erosion of Teaching Authority". By integrating this emergent category with the established ones, the doctoral thesis acquires an originality that responds to the contemporary reality of the problem.
Quantifying Qualitative Data (The Why and the How)
Once the data have been narratively structured, the doctoral researcher can be faced with the decision to quantify them. This procedure, known in the international methodological literature as quantitizing (the numerical transformation of qualitative data), has become an essential technique to strengthen empirical findings.
The Why: The goal of quantifying qualitative data is not to reduce their phenomenological complexity, but to provide a multidimensional lens. From a "conditional complementarity" perspective, numerical conversion adds descriptive-statistical rigor. While the qualitative approach answers the "why" or "how," quantification makes visible the magnitude, prevalence, or intensity of the phenomenon at the aggregate level.
The How (Development and Practical Example): The process assumes that the boundaries between qualitative and quantitative are permeable. There are several methodological techniques, which we will illustrate by continuing with the example of the research on the didactic model and school discipline, assuming a corpus of interviews with 40 teachers:
Frequency Counting, Distribution, and Artificial Intelligence: This technique consists of calculating how often a code recurs. Using R scripts or AI algorithms, lexical frequency and associated sentiment can be analyzed quickly.
- Application: The analysis reveals that the emergent category "Digital Erosion of Teaching Authority" was coded 120 times throughout the transcripts, appearing in the discourse of 35 of the 40 teachers (87.5%). This simple quantification transforms a qualitative perception into a strong indicator of prevalence.
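The frequency and prevalence figures above can be computed directly from a coded export. A self-contained sketch in Python (the article references R scripts; all data here are fabricated to mirror the illustrative numbers of 120 segments over 35 of 40 teachers):

```python
# (teacher_id, code) pairs, as they might be exported from a coding tool.
# Fabricated so that 120 segments fall on 35 distinct teachers.
coded_segments = [(t % 35, "digital_erosion") for t in range(120)]

code_count = sum(1 for _, c in coded_segments if c == "digital_erosion")
teachers_mentioning = len({t for t, c in coded_segments if c == "digital_erosion"})
prevalence = teachers_mentioning / 40 * 100  # 40 teachers in the corpus

print(code_count, teachers_mentioning, f"{prevalence:.1f}%")  # 120 35 87.5%
```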
Dichotomization: The presence (1) or absence (0) of a particular theme in a unit of analysis is coded.
- Application: A Data Frame (data matrix) is created where the rows are the 40 teachers. If the teacher reported having suffered parental disavowal, a '1' is assigned; if not, a '0' is assigned. This prepares the data for descriptive statistical tests or predictive models in Data Science.
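The dichotomized matrix can be sketched with pandas (hypothetical data: which specific teachers reported the theme is invented, keeping the article's 35-of-40 proportion):

```python
import pandas as pd

# One row per teacher; hypothetically, teachers 1..35 reported disavowal.
df = pd.DataFrame({"teacher_id": range(1, 41)})
reported = set(range(1, 36))
df["parental_disavowal"] = df["teacher_id"].isin(reported).astype(int)

print(df["parental_disavowal"].sum())   # 35 teachers coded 1
print(df["parental_disavowal"].mean())  # 0.875 prevalence
```

The 1/0 column can now feed descriptive statistics or, with more variables, predictive models.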
Ordinal Variables and Co-occurrence Matrices: This technique cross-references thematic codes with sociodemographic or contextual variables to identify latent patterns.
- Application: Using the above matrix, the researcher crosses the dichotomized category (1/0) with "Years of teaching experience" (categorical variable). The quantitizing reveals that 95% of Digital Erosion incidents occur in teachers with less than 5 years of experience.
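The cross-referencing step can be sketched with a pandas contingency table (all counts invented; they approximate, but do not exactly reproduce, the article's illustrative 95% figure):

```python
import pandas as pd

# Hypothetical: 35 of 40 teachers report digital erosion; 33 of those 35
# have under 5 years of experience.
df = pd.DataFrame({
    "digital_erosion": [1] * 35 + [0] * 5,
    "experience": ["<5 years"] * 33 + [">=5 years"] * 2
                + ["<5 years"] * 1 + [">=5 years"] * 4,
})

# Co-occurrence of the dichotomized category with the experience variable.
crosstab = pd.crosstab(df["experience"], df["digital_erosion"])
print(crosstab)

# Share of reported incidents in the low-experience group (33/35 here).
incidents = df[df["digital_erosion"] == 1]
share = (incidents["experience"] == "<5 years").mean()
print(f"{share:.0%}")
```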
This final quantification does not negate the richness of the interviews, but allows the doctoral candidate to assert, with mixed-methods support, that the loss of school discipline is strongly associated with the teacher's lack of experience in managing digital communication with parents.
Conclusions
For the doctoral researcher, the transition from the collection of stories, interviews and field notes to the construction of a well-founded thesis is complex but highly rewarding. Based on what is presented in this guide, the following practical recommendations are made:
- Assume flexibility as rigor: Understand that the qualitative route is not a straight path; it requires iterative recalculation and reevaluation of data.
- Balance approaches: While starting with deductive (a priori) codes helps to organize the initial analysis, it is imperative to keep an open mind to authentically capture and integrate the emergent categories that arise from the empirical field.
- Quantify with purpose, not by inertia: When using quantitizing, ensure that frequencies complement textual meanings, never replace them. Extracted numbers should always be interpreted under the original discursive context.
- Optimize technological and analytical resources: The use of qualitative software (ATLAS.ti), statistical programming languages (R) and Artificial Intelligence algorithms allows systematizing codes and modeling complex relationships in a much more transparent, deep and replicable way for the evaluation committees.
The meticulous integration of human voices and structured abstraction (whether theoretical, numerical or computational) will guarantee not only an academic degree, but a genuine and valuable contribution to the body of scientific knowledge.
Bibliographic References
Leal, J. [Javier Leal Data Science for Business]. (n.d.). Investigación Cualitativa - Análisis de Datos [Video]. YouTube. https://www.youtube.com/watch?v=-aqDJzuAk7g&t=16s
Sandelowski, M., Voils, C. I., & Knafl, G. (2009). On Quantitizing. Journal of Mixed Methods Research, 3(3), 208–222. https://doi.org/10.1177/1558689809334210
Stewart, L. (n.d.). Cuantificación de datos cualitativos [Quantifying qualitative data]. ATLAS.ti Research Hub. Retrieved February 8, 2026, from https://atlasti.com/es/research-hub/cuantificacionde-datos-cualitativos
Stewart, L. (n.d.). Guía completa para el análisis cualitativo de datos [Complete guide to qualitative data analysis]. ATLAS.ti Research Hub. Retrieved February 8, 2026, from https://atlasti.com/es/guias/guia-investigacion-cualitativa-parte-2/analisis-de-datos-cualitativos
Dr. José Javier Leal Rivero
Advisor of the Doctorate in Education and Innovation at the University of Research and Innovation of Mexico (UIIX).