Unveiling Molecular Mechanisms and Biomarkers for Colitis-Associated Colorectal Cancer in Ulcerative Colitis
Sara Pashapour1,*, Nesa Kazemifard2, Maryam Farmani3, Shaghayegh Baradaran Ghavami4, Mohammad Kazemi5
1–4. Research Institute for Gastroenterology and Liver Diseases; 5. Isfahan University of Medical Sciences
Introduction: Inflammatory bowel diseases (IBD), including ulcerative colitis (UC) and Crohn’s disease (CD), are chronic inflammatory disorders of the gastrointestinal tract. Their exact causes remain unclear but involve immune dysregulation, microbial imbalance, environmental triggers, and genetic predisposition. These factors also contribute to a serious complication of IBD: colitis-associated colorectal cancer (CAC). Colorectal cancer (CRC) is the third most common cancer worldwide, and patients with UC face a 2–3-fold higher risk, particularly those with extensive, long-standing colitis. The cumulative risk of CAC in UC increases with disease duration, reaching 2%, 8%, and 18% after 10, 20, and 30 years, respectively. CRC incidence is higher in UC than in CD, likely reflecting the more extensive colonic inflammation in UC. Key risk factors for CAC include a family history of CRC, disease duration, inflammation severity, and primary sclerosing cholangitis. Compared with sporadic CRC, CAC occurs at a younger age, has a worse prognosis, and develops through an inflammation-neoplasia-cancer pathway rather than the adenoma-carcinoma pathway. This study aims to identify novel biomarkers for early CAC detection in UC patients through bioinformatic analysis of microarray datasets.
Methods: Microarray datasets were analyzed to identify molecular mechanisms and biomarkers of UC, CRC, and CAC. Approximately 40 datasets from the Gene Expression Omnibus (GEO) database were screened using criteria such as sample size, publication date, and species, restricted to expression profiling by array. The selected datasets were GSE87466 and GSE75214 (UC), GSE44076 and GSE41568 (CRC), and GSE37283 (CAC). Data were merged, batch effects were removed with the R package sva, and quality control was performed with the ggplot2 and grid packages. Differential expression analysis was conducted with the limma package (adjusted p-value < 0.05, |log2FC| > 1) to generate lists of upregulated and downregulated differentially expressed genes (DEGs). Common and unique DEGs among UC, CRC, and CAC were identified with the VennDiagram package. Enrichment analysis of the DEGs against REACTOME and KEGG pathways was performed in R to explore biological functions and signaling pathways. Protein-protein interaction (PPI) networks were constructed with the STRING plugin in Cytoscape, using confidence cut-offs of 0.7 for common DEGs and 0.6 for CAC-specific DEGs. The cytoHubba plugin then ranked network nodes by topological measures such as degree and betweenness to identify key nodes as potential novel biomarkers for IBD and CAC.
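The thresholding, overlap, and hub-ranking logic described in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R pipeline: each contrast is assumed to be a mapping from gene to (log2FC, adjusted p-value), mirroring the `logFC` and `adj.P.Val` columns of limma's `topTable` output; "unique to CAC" is assumed to mean the Venn region CAC \ (UC ∪ CRC); the gene names G1 to G6 and all statistics are hypothetical placeholders; and hub ranking uses plain degree only, one of several cytoHubba ranking methods.

```python
# Minimal sketch of DEG thresholding, Venn partitioning and degree ranking.
# Gene names (G1..G6) and all statistics are hypothetical placeholders, not study data.
from collections import Counter

ADJ_P_CUTOFF = 0.05  # adjusted p-value threshold stated in the Methods
LOG2FC_CUTOFF = 1.0  # |log2FC| threshold stated in the Methods


def classify_degs(stats):
    """Split {gene: (log2fc, adj_p)} into sets of up- and downregulated DEGs."""
    up, down = set(), set()
    for gene, (log2fc, adj_p) in stats.items():
        if adj_p < ADJ_P_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF:
            (up if log2fc > 0 else down).add(gene)
    return up, down


def all_degs(stats):
    """Union of up- and downregulated DEGs for one contrast."""
    up, down = classify_degs(stats)
    return up | down


def venn_partition(uc, crc, cac):
    """Return (DEGs shared by all three contrasts, DEGs found only in CAC)."""
    return uc & crc & cac, cac - (uc | crc)


def rank_by_degree(edges, top_k):
    """Rank nodes of an undirected PPI edge list by degree (number of neighbors)."""
    degree = Counter()
    seen = set()
    for a, b in edges:
        key = tuple(sorted((a, b)))
        if a == b or key in seen:
            continue  # ignore self-loops and duplicate interactions
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for reproducibility.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


uc_stats = {"G1": (2.1, 0.001), "G2": (-1.5, 0.01), "G3": (0.4, 0.001), "G4": (1.8, 0.2)}
crc_stats = {"G1": (1.7, 0.003), "G2": (-2.0, 0.02), "G5": (1.2, 0.04)}
cac_stats = {"G1": (2.5, 0.0005), "G2": (-1.1, 0.03), "G6": (3.0, 0.001)}
edges = [("G1", "G2"), ("G1", "G6"), ("G2", "G1"), ("G6", "G6")]

common, cac_only = venn_partition(all_degs(uc_stats), all_degs(crc_stats), all_degs(cac_stats))
print(sorted(common), sorted(cac_only))  # ['G1', 'G2'] ['G6']
print(rank_by_degree(edges, top_k=2))    # [('G1', 2), ('G2', 1)]
```

Degree here counts distinct interaction partners, the usual definition of node degree; cytoHubba's path-based methods such as betweenness and closeness require shortest-path computations and are not reproduced in this sketch.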
Results: Microarray data from GSE87466, GSE75214, GSE44076, GSE41568, and GSE37283 were analyzed to identify DEGs in UC, CRC, and CAC. Principal component analysis (PCA), plotted with the R packages ggplot2 and grid, confirmed data quality across 87 control, 161 UC, 137 CRC, and 11 CAC samples. Differential expression analysis (|log2FC| ≥ 1, p ≤ 0.05) revealed 268 upregulated and 294 downregulated genes in UC, 234 upregulated and 314 downregulated genes in CRC, and 328 upregulated and 293 downregulated genes in CAC. The Venn diagram identified 461 DEGs (60.8%) common to UC, CRC, and CAC and 153 DEGs (20.2%) unique to CAC. REACTOME enrichment analysis of the common DEGs identified key pathways including collagen degradation, degradation of the extracellular matrix, and extracellular matrix organization. CAC-specific DEGs, in contrast, were enriched for distinct terms including leukocyte chemotaxis, leukocyte migration, and chemotaxis. The PPI networks built with Cytoscape’s STRING plugin comprised 459 nodes and 2632 edges for the common DEGs (198 singletons removed) and 152 nodes and 201 edges for the CAC-specific DEGs (87 singletons removed). High-degree common DEGs included IL6 (degree = 46), IL1B (degree = 40), and MMP9 (degree = 33), whereas VCAM1 (degree = 7), TLR2 (degree = 6), and MSN (degree = 6) were prominent among the CAC-specific DEGs. Using cytoHubba, ten common (e.g., IL1B, MMP9, SERPINE1, COL1A1) and ten CAC-exclusive (e.g., VCAM1, TLR2, ANXA5, CAV1) nodes were identified as candidate biomarkers based on 11 ranking methods, including betweenness and closeness.
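The denominator behind the 60.8% and 20.2% figures is not stated. Assuming both are percentages of the same total (presumably the union of all DEGs in the Venn diagram, which is an inference rather than a reported value), a short search for integer totals that reproduce both rounded percentages serves as a consistency check. The function name `consistent_totals` is hypothetical.

```python
def consistent_totals(reported, max_total=5000):
    """Return every total N for which each count/N rounds to its reported percentage.

    `reported` is a list of (count, percentage rounded to one decimal place).
    """
    return [
        n
        for n in range(1, max_total + 1)
        if all(round(100 * count / n, 1) == pct for count, pct in reported)
    ]


# Counts and percentages as reported for the common and CAC-specific DEGs.
print(consistent_totals([(461, 60.8), (153, 20.2)]))  # [758]
```

Only a total of 758 genes reproduces both percentages, which is compatible with the per-condition DEG counts (562 in UC, 548 in CRC, and 621 in CAC), since a union of sets must be at least as large as its largest member.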
Conclusion: This study elucidated molecular mechanisms linking UC, CRC, and CAC through bioinformatic analysis of microarray datasets. By identifying 461 common and 153 CAC-specific DEGs, the analysis highlighted key genes (ANXA5, CAV1, VCAM1, TLR2) as potential biomarkers of CAC risk in UC patients. Enrichment and PPI analyses revealed biological pathways and networks that may drive CAC. These findings advance the understanding of UC-associated tumorigenesis and propose candidate biomarkers for early CAC detection, which could improve diagnosis and targeted treatment for IBD patients at risk of colorectal cancer.