Stanford Researchers Unveil Machine Learning Tool to Decode the Immune System for Disease Diagnosis

by Azzam Bilal Chamdy · July 9, 2025

Stanford Medicine researchers have developed a machine learning algorithm, Mal-ID, that reads the immunological memory stored in the human body to identify a wide range of conditions, from infections such as COVID-19 and HIV to complex autoimmune diseases such as lupus and Type 1 diabetes, as well as recent influenza vaccination. The approach analyzes the sequences of B and T cell receptors, the molecular sentinels of our immune system, to build a more comprehensive picture of an individual’s health status. Traditional diagnostic methods, by contrast, often overlook the information encoded by the immune system’s lifelong encounters with pathogens, vaccines, and even the body’s own tissues.

The Immune System: A Living Chronicle of Health and Disease

The human immune system is an extraordinary biological archive, meticulously recording every encounter with foreign invaders and internal anomalies. This internal database, often described as a "biological Rolodex," contains detailed information about past battles fought against viruses, bacteria, and other threats. These records are primarily etched into the unique structures of B and T cell receptors. B cells produce antibodies, which are Y-shaped proteins that bind to specific antigens on the surface of pathogens, marking them for destruction. T cells, on the other hand, are crucial for cellular immunity; helper T cells coordinate immune responses, while cytotoxic T cells directly kill infected or cancerous cells. Both B and T cells possess receptors that are exquisitely specific, allowing them to recognize and respond to a vast array of molecular signatures.

The diversity of these receptors is staggering, generated through a process of random gene segment recombination. This inherent randomness ensures that the immune system can, in theory, recognize virtually any threat. When a B or T cell encounters a matching antigen, it becomes activated, proliferates, and differentiates into effector cells that mount a targeted immune response. The resulting increase in the prevalence of cells bearing receptors specific to a particular pathogen or antigen creates a unique immunological fingerprint. Mal-ID, the new tool developed at Stanford, is designed to read and interpret these fingerprints.

Mal-ID: Harnessing Machine Learning for Immunological Diagnosis

At the heart of this breakthrough is Mal-ID, a machine learning model inspired by the principles underlying large language models like ChatGPT. These sophisticated algorithms excel at identifying complex patterns within vast datasets. In this case, the researchers trained the model on millions of B and T cell receptor sequences and their structural data. By analyzing these sequences, Mal-ID can identify subtle commonalities and distinct patterns that correlate with specific disease states or immune responses.

"The diagnostic toolkits that we use today don’t make much use of the immune system’s internal record of the diseases it has encountered," explained Maxim Zaslavsky, PhD, a postdoctoral scholar and lead author of the study published in Science. "But our immune system is constantly surveilling our bodies with B and T cells, which act like molecular threat sensors. Combining information from the two main arms of the immune system gives us a more complete picture of the immune system’s response to disease and the pathways to autoimmunity and vaccine response."

The study involved nearly 600 participants, encompassing healthy individuals, those infected with SARS-CoV-2 (the virus responsible for COVID-19), individuals with HIV, people who had recently received an influenza vaccine, and patients diagnosed with autoimmune conditions like lupus and Type 1 diabetes. The Mal-ID algorithm was tasked with classifying these individuals based solely on their B and T cell receptor profiles. The results were remarkably accurate, demonstrating the algorithm’s potential to serve as a powerful diagnostic tool.

Deciphering the "Language" of Immune Receptors

The researchers employed a novel approach, treating receptor sequences as a form of biological "language." Large language models are adept at learning the grammar, syntax, and context of human languages, enabling them to predict words, generate text, and understand complex meanings. Similarly, Mal-ID was trained on the "language" of proteins, specifically the amino acid sequences that constitute B and T cell receptors. This training allowed the model to discern similarities in receptor structures and binding preferences, even when the sequences themselves appeared highly variable.
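To make the "sequence as language" analogy concrete, the sketch below breaks a receptor's amino acid sequence into overlapping three-letter "words" and compares two receptors by the words they share. This is a deliberately simple stand-in, not the language model used in Mal-ID: a learned model is meant to find similarity even where surface overlap like this is missing. The sequences and function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k=3):
    """Count overlapping k-letter 'words' in an amino acid sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two k-mer count vectors (0.0 if either is empty)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[key] * counts_b[key] for key in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def receptor_similarity(seq_a, seq_b, k=3):
    """Compare two receptor sequences by their shared k-mers."""
    return cosine_similarity(kmer_counts(seq_a, k), kmer_counts(seq_b, k))


if __name__ == "__main__":
    # Made-up receptor fragments used only to exercise the functions.
    close_pair = ("CARDYYGSGSYW", "CARDYYGSGTYW")
    distant_pair = ("CARDYYGSGSYW", "CASSLAPGATNEKLFF")
    print(receptor_similarity(*close_pair))
    print(receptor_similarity(*distant_pair))
```

Two receptors that differ by a single letter score high here, while two that share no three-letter words score zero even if they were to bind the same target. That blind spot is the gap a trained language model is intended to close.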

"The sequences of these immune receptors are highly variable," Zaslavsky elaborated. "This variability helps the immune system detect virtually anything, but also makes it harder for us to interpret what these immune cells are targeting. In this study, we asked whether we could decode the immune system’s record of these disease encounters by interpreting this highly variable information with some new machine learning techniques. This idea isn’t new, but we’ve been missing a robust way to capture the patterns in these immune receptor sequences that indicate what the immune system is responding to."

The inherent diversity of B and T cell receptors arises from a complex genetic shuffling process. Segments of DNA are randomly combined, with occasional additional mutations, to create trillions of unique receptor variants. While this diversity is essential for broad threat detection, it also presents a challenge for human interpretation. Mal-ID bridges this gap by applying machine learning to identify recurring patterns that signify specific immune system activities.
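The shuffling described above can be sketched in a few lines. The example assumes tiny, made-up pools of V, D, and J gene segments and adds up to three random nucleotides at each junction. Real segment pools are far larger and the biology involves more steps, so the names and numbers here are illustrative only.

```python
import random

# Hypothetical toy segment pools; real human gene segments differ in number and sequence.
V_SEGMENTS = ["TGTGCGAGA", "TGTGCGAAA", "TGTGCCAGA"]
D_SEGMENTS = ["GGGTATAGC", "GACTACGGT"]
J_SEGMENTS = ["TACTTTGACTAC", "TGGTTCGACCCC"]


def recombine(rng, max_insert=3):
    """Join one random V, D, and J segment with random junction nucleotides."""

    def junction():
        length = rng.randint(0, max_insert)
        return "".join(rng.choice("ACGT") for _ in range(length))

    return (
        rng.choice(V_SEGMENTS)
        + junction()
        + rng.choice(D_SEGMENTS)
        + junction()
        + rng.choice(J_SEGMENTS)
    )


def count_outcomes(max_insert=3):
    """Count the possible recombination paths for the toy pools above."""
    segment_choices = len(V_SEGMENTS) * len(D_SEGMENTS) * len(J_SEGMENTS)
    # Each junction holds between 0 and max_insert random nucleotides.
    junction_choices = sum(4 ** n for n in range(max_insert + 1))
    return segment_choices * junction_choices ** 2


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same receptors
    for _ in range(3):
        print(recombine(rng))
    print(count_outcomes())
```

Even with only seven toy segments, the junction insertions push the number of possible outcomes into the tens of thousands, which hints at how the real process reaches such large numbers.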

A Unified Approach: Combining B and T Cell Insights

A key finding of the study was that B and T cell receptors carry complementary information. T cell receptor sequences proved particularly informative for identifying autoimmune diseases like lupus and Type 1 diabetes, while B cell receptor sequences were more effective at detecting infection with SARS-CoV-2 or HIV, as well as responses to vaccinations like the influenza shot. Crucially, when data from both B and T cell receptors were combined, the accuracy of Mal-ID’s classifications improved significantly across all studied conditions, irrespective of demographic factors like sex, age, or race.
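One simple way to picture how two sources of evidence can be combined is to average the class probabilities produced by a B cell model and a T cell model, then pick the most likely label. The article does not say how Mal-ID merges the two, so the weighted average below is an assumption chosen for illustration, and the probabilities are invented.

```python
def combine_predictions(bcr_probs, tcr_probs, bcr_weight=0.5):
    """Blend two per-class probability dictionaries with a weighted average."""
    if bcr_probs.keys() != tcr_probs.keys():
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= bcr_weight <= 1.0:
        raise ValueError("bcr_weight must lie between 0 and 1")
    tcr_weight = 1.0 - bcr_weight
    return {
        label: bcr_weight * bcr_probs[label] + tcr_weight * tcr_probs[label]
        for label in bcr_probs
    }


def predict(probs):
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Invented scores for one hypothetical patient.
    bcr_only = {"Healthy": 0.40, "COVID-19": 0.35, "Lupus": 0.25}
    tcr_only = {"Healthy": 0.20, "COVID-19": 0.10, "Lupus": 0.70}
    combined = combine_predictions(bcr_only, tcr_only)
    print(predict(bcr_only), predict(tcr_only), predict(combined))
```

In this invented case the B cell scores alone are ambiguous, and the T cell scores tip the combined call toward lupus, mirroring the complementary roles described above.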

"Traditional approaches sometimes struggle to find groups of receptors that look different but recognize the same targets," noted Zaslavsky. "But this is where large language models excel. They can learn the grammar and context-specific clues of the immune system just like they have mastered English grammar and context. In this way, Mal-ID can generate an internal understanding of these sequences that give us insights we haven’t had before."

Broader Implications and Future Directions

The potential applications of Mal-ID extend far beyond initial disease diagnosis. Researchers envision its use in tracking patient responses to cancer immunotherapies, a rapidly evolving field that harnesses the immune system to fight cancer. Furthermore, the algorithm could help to subcategorize diseases, revealing underlying biological differences that might not be apparent with current broad diagnostic labels.

"Several of the conditions we were looking at could be significantly different at a biological or molecular level, but we describe them with broad terms that don’t necessarily account for the immune system’s specialized response," said Scott Boyd, MD, PhD, a senior author of the study and co-director of the Sean N. Parker Center for Allergy and Asthma Research. "Mal-ID could help us identify subcategories of particular conditions that could give us clues to what sort of treatment would be most helpful for someone’s disease state."

This ability to identify distinct disease subtypes holds immense promise for personalized medicine. For complex autoimmune diseases like lupus, which can manifest with a wide range of symptoms and progress differently in individuals, Mal-ID could provide the precision needed to tailor treatments effectively. Patients often endure lengthy diagnostic journeys, facing uncertainty and receiving treatments that may not be optimal for their specific disease profile.

"Patients can struggle for years before they get a diagnosis, and even then, the names we give these diseases are like umbrella terms that overlook the biological diversity behind complex diseases," Zaslavsky stated. "If we can use Mal-ID to unravel the heterogeneity behind lupus, or rheumatoid arthritis, that would be very clinically impactful."

Moreover, Mal-ID’s capacity to uncover patterns in immune responses, even without prior knowledge of the specific target molecules, opens new avenues for therapeutic discovery. By identifying common immunological signatures associated with certain conditions, researchers may be able to pinpoint novel targets for drug development.

"The beauty of this approach is that it works even if we don’t at first fully know what molecules or structures the immune system is targeting," Boyd added. "We can still get the information simply by seeing similar patterns in the way people respond. And, by delving deeper into these responses we may uncover new directions for research and therapies."

The Road Ahead: From Research to Clinical Application

The development of Mal-ID is the culmination of extensive research and collaboration, involving a multidisciplinary team of scientists from Stanford Medicine and numerous other institutions. The study received significant funding from various government agencies, including the National Institutes of Health, and private foundations, underscoring the recognized importance of this line of research.

While the current study focused on six distinct immune states, the researchers are optimistic about Mal-ID’s scalability and adaptability. They expect the underlying machine learning approach can be adapted quickly to identify immunological signatures for a much broader array of diseases and conditions. The next steps will likely involve validating Mal-ID in larger, more diverse patient cohorts and exploring its integration into clinical workflows. If those steps succeed, the tool could change how diseases are diagnosed, how patients are matched to treatments, and how new therapies are developed.

The study was a collaborative effort involving researchers from the Swiss Tropical and Public Health Institute, the University of Basel, the Oklahoma Medical Research Foundation, the University of Pennsylvania, the University of Cincinnati, the Cincinnati Children’s Hospital Medical Center, the Icahn School of Medicine at Mount Sinai, Duke University, the Swedish Medical Center, the University of Washington, the Institute for Systems Biology, the Harvard T.H. Chan School of Public Health, Beth Israel Deaconess Medical Center, New York University, and the Lupus Foundation of America.

Funding for the research was provided by the National Institutes of Health (grants R01AI130398, R01AI127877, U19AI057229, U54CA260518, U19AI167903, 5R01 EB001988-16, UM-1 AI100645, UM1 AI144371, AI 101093, AI-086037, AI-48693, R01AI153133, R01AI137272, 3U19AI057229-17W1 COVID SUPP2, AR07375, UM1AI144292, NIDDK P30DK116074, U54CA260518, U19AI167903, R01 AI175771-01, R01 CA264090-01, U19 AI057229 and 1U54CA260518), the National Science Foundation, the Burroughs Wellcome Fund, the Sunshine Foundation, the Henry Gustav Floren Trust, a philanthropic gift from Eva Grove, and a philanthropic gift from an anonymous donor.
