Molecular Biology and Nanomedicine

© The Author(s) 2021. / Viewed: / Downloaded:35 / Cited:0 / DOI:10.37813/j.mbn.2707-4692.016

Screen and Identify Key Biomarkers Related to Bladder Cancer Through Multi-View Analysis


Screen and Identify Key Biomarkers Related to BladderCancer Through Multi-View Analysis

Xiaxia Wang 1,2, YanruiDing 1,3,*

1     School of Science, Jiangnan University, Wuxi214122, China;

2     Laboratory of Media Design and SoftwareTechnology, Jiangnan University, Wuxi 214122, China

3     Key Laboratory of Industrial Biotechnology,Jiangnan University, Wuxi 214122, China

*Correspondence to:E-mail:; Fax/Tel.: +86-510-85916929

Received: 24 January 2021; Accepted: 28 March2021; Published:29 March 2021

Abstract: Accurate tumor staging is the premise andfoundation of precision medicine. This study aims to find the effectivebiomarkers in the staging process of bladder cancer tumors. First, screen keygene module related to the staging of bladder cancer. Then, taking the hubgenes in the key gene module as features, a stage prediction model isconstructed by logistic regression and stochastic gradient descent method. Itshows that themolecular combination of CTHRC1, COL6A3, TIMP2 andSULF1 can be used as an indicator for stage diagnosis. Importantly, the COL1A1is not only the pivot gene in the weighted gene co-expression network, but alsothe core of the protein-protein interaction network. And a set of single-cellRNA sequencing data in Gene Expression Omnibus (GEO) database is used to verifythe results. The consistent conclusions indicate that COL1A1 is a potentialoncogene and biomarker in the progression of bladder cancer, and can be used asa molecular target for the treatment of bladder cancer to slow the rate oftumor development. In addition, function and pathway enrichment analysis showthat focal adhesion is a crucial pathway in the tumor metastasis of bladdercancer, and members of the fibrous collagen family play an important role inthe proliferation and spread of tumor cells. The biomarkers and pathwaysidentified in this study provide an important basis for the determination ofpersonalized medical programs.

Keywords: bladder cancer; biomarkers; focaladhesion; tumor staging; COL1A1

1. Introduction

The Bladder cancer(BC) is one of the most common malignant tumors in the urinary system. Itsmorbidity and mortality rate rank first among malignant tumors of the urinarysystem, seriously endangering the lives and health of people all over the world[1].Due to the biological characteristics of BC that are prone to relapse, invasionand metastasis, about 30% of early patients eventually develop muscularinvasive or metastatic BC. The survival rate of these patients at the advancedstage of cancer is only about 50% [2,3]. Studies have shown that tumorstaging can be used to evaluate the biological behavior and prognosis ofmalignant tumors [4,5].Therefore, starting from the pathogenesis of BC, studying the different stagesof tumor development at the molecular level and exploring the mechanism of itsrecurrence and metastasis, which is of great practical significance for thediagnosis and prevention of BC.

In recent years,researches related to BC have received extensive attention, including theidentification of protein markers and specific pathways [6,7], the screening ofurine biomarkers [8,9],and the discovery of a series of oncogenic signaling pathways and therapeutictargets [10,11].Although many molecular markers have been developed for clinical treatment ofBC, it is still a challenge to make a systematic explanation and effectivetreatment plan for tumor staging. An appropriate and unified prediction modelcan improve the accuracy of staging diagnosis of BC. Ahmadi et al. incorporatesmultiple factors such as age, number of intravesical treatments, and invasionof lymphatic vessels into the principal component analysis (PCA) model to predictpathological staging [12].Kouznetsova et al used the biomarkers including metabolites and correspondinggenes for different stages of BC to build a binary classifier for early- andlate-stage BC [13].Smith et al developed a gene expression model to predict the molecular nodalstaging of BC [14].However, few studies have constructed a staging prediction model for BC basedon the multi-view information of the data. On the one hand, it will providevaluable biomarkers and a certain theoretical reference for clinical medicine,which can help clinicians identify high-risk patients and implement furthertreatments, and ultimately improve the prognosis of patients. On the otherhand, it will avoid unnecessary invasive surgery, reduce the secondary injuriesassociated with BC screening, and reduce unnecessary medical expenses and wasteof resources.

Weighted geneco-expression network analysis (WGCNA) can be used for network-based genescreening, which in turn can be used to identify candidate biomarkers andtherapeutic targets. It is powerful in exploring the hub genes of varioustumors [15–18].As we know, there are complicate relations among genes, proteins and pathways.It is meaningful to identify the key biomarkers related to the stage of BC frommulti-view of data. Therefore, in this study, we built co-expression networksof co-expressed genes and used WGCNA to identify the key module related to thetumor staging of bladder cancer. According to the key module, the crucialpathway in the tumor metastasis of BC was determined. Then, combining the genenetwork and the protein-protein interaction network, we screened the keybiomarkers of BC progression, and verified the performance of the key markersfrom the outside using different data sets.

2. Materials and Methods

2.1. Data Collectionand Processing

Transcriptome data andclinical data of BC patients are from the Cancer Genome Atlas (TCGA) database ( Then, use Perl programand R program to preprocess the original data, and delete the samples withouttumor staging information. The top quarter of the mRNA with the highestvariance are selected, and finally 402 cancer samples and 4879 gene sets are asthe preliminary research object [19]. Among them, 131 patients are inthe early stage of bladder cancer, and 271 patients are in the advanced stageof bladder cancer. The raw data on key clinical information for BC patients arepresented in Supplementary Table S1. In addition, the human bladder cancer single-cellRNA sequencing data in the GSE146137 data set from the GEO database( are used as a validation set, whichcontains gene expression data of 3847 cells.The workflow of this study isshown in Figure 1.

Figure 1. The workflow ofdata analysis.

2.2. Identify KeyModule

As we all know,systematically exploring the interaction mode between genes is very importantto understand the mechanism of cancer occurrence and development. Therefore, weuse WGCNA algorithm to construct the gene co-expression network, this analysismethod uses the similarity of gene expression in samples to find co-expressedgene modules. In order to make the network more concise and effective, wefurther filter the samples and genes. First, outlier sample data is removed bychecking the expression data spectrum of BC patients. Then genes are clusteredby setting the minimum number of genes in the module to 40, and gene dendrogramis generated by module recognition. Highly similar modules are merged intogroups by hierarchical clustering with a height shear threshold of 0.25. Inaddition, WGCNA can associate gene modules with external clinical information,and then further identify gene modules with clinical significance. The genemodule with the highest correlation coefficient with tumor staging isconsidered to be the key module, which is worthy of our priority analysis andin-depth exploration. Therefore, the genes in this module will be included inthe next analysis.

2.3. Function and PathwayEnrichment Analysis

We perform Gene Ontology(GO) function analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG)pathway enrichment analysis to reveal and determine functional roles of genesin the key module in the progression of BC. GO analysis includes biologicalprocess (BP) analysis, cellular component (CC) analysis and molecular function(MF) analysis. According to which GO biological process classifications andKEGG pathways are enriched by genes, we determine the molecular biologicalmechanisms related to tumor staging from gene functions and pathway functions,respectively.

2.4. Identify Hub Genes in Gene Networks

In the key module, itis necessary to search for the most important genes. Generally, there are threecriteria to screen for the hub genes in the network. The first is modulemembership (MM), it is the correlation coefficient value of gene expression andeigengene of the module. The second is gene significance (GS), it indicates thecorrelation coefficient value of gene expression and corresponding phenotypevalue. The last is connectivity, it means that hub genes are a series of geneswith the highest connectivity in a gene module. Here,the hub genes satisfy GS >0.25 and MM > 0.8, and genes are ranked in the top 20 for connectivitywithin key module.

2.5. Validation of Hub Genes

First,we perform quality control and data filtering on the single-cell RNA sequencingdata, and standardize the data. Next, identify highly variable features thatshow higher cell-to-cell differences in the data set, that is, they are highlyexpressed in some cells and low in other cells. In order to reduce theinstability of gene expression caused by the cell itself and batch effects, weuse principal component analysis (PCA) to retain most of the gene expressioninformation. Then use t-Distributed Stochastic Neighbor Embedding (t-SNE) tocluster the cells. By comparing the average value of gene expression in eachcluster and other cell clusters, we can obtain the set of highly expressedgenes in each cell subgroup, that is, the marker genes. And further verifywhether the hub genes obtained from our previous analysis is also the markergenes in the single-cell RNA data.

2.6. Construction of StagePrediction Model

Exploring therelationship between key module and tumor staging can be simplified to studythe relationship between the hub genes of this module and tumor staging. Therefore,in this study, we set the samples in the early stage of cancer to label 0, andthe samples in the advanced stage of cancer to label 1, and the hub genes asthe characteristics of the model to build stage prediction model. The essenceof the stage prediction model is to determine the probability P (y | x) thatthe sample x belongs to the category y, this process is a typical binaryclassification problem. Therefore, we use the logistic regression algorithmwith L1 penalty term to classify the samples. Since there is usually a strongcorrelation between variables, the L1 norm as a penalty term can reduce thefeatures dimension.

For agiven BC data set: . Randomly select 80% of the dataas our training data set, and use five-fold cross-validation to adjust thevalue of the hyperparameter . When the optimal model isobtained on the validation set, the hyperparameters of the model can be used toretrain the model at this time. Assuming that the probability that theprediction is 1 is p, the logistic regression can be expressed as follows:


Among them, the feature vectorcoefficient w is unknown and needs to be learned from samples. Here, we use themaximum likelihood estimation method to construct the objective function, andthe objective function is as follows:


Then use the stochastic gradientdescent method to solve the parameters in the model. The gradient descentmethod finds the direction of descent by solving the first derivative of L(w)with respect to w, and updates the parameters in an iterative manner. Theupdate method is:


Where‘t’ is the number of iterations and ‘h’is the learning step length. After each parameter update, if is less than a threshold e, or the number of iterations isgreater than 500 times, then the iteration process is terminated.

Finally, evaluate thefinal performance of the model in the remaining 20% of the test data set.Evaluation indicators include accuracy, precision, recall, F1_score and the areaunder curve (AUC). Among them, accuracy is the most commonly used classificationperformance index, its definition is: for a given test data set, the ratio ofthe number of samples correctly classified by the classifier to the totalnumber of samples. The F1_score considers both the precision rate and therecall rate. And AUC is defined as the area under the receiver operating characteristiccurve, usually greater than 0.5 and less than 1.

2.7. Construction ofProtein-protein Interaction Network

For all the genes inthe key module, we use the STRING database ( to buildprotein-protein interaction (PPI) relationships to screen for hub proteins inthe PPI network.

3. Results

3.1. Key Module Related tothe Tumor Staging

402 cancer samples areclustered by WGCNA algorithm and 8 outliers are deleted. Based onthe dynamic hybridcutting method in the dynamic cutting tree algorithm provided by the WGCNAsoftware package, a total of 13 gene modules are identified. Principalcomponent analysis (PCA) is performed to calculate the module eigengenes (MEs)of each module. Then perform hierarchical clustering on the modules. As shownin Figure 2A,the height of each branch is 1 minus the correlation coefficient between thetwo modules. We merge the modules with high correlation (the correlationcoefficient is greater than 0.75) into one cluster. In this clustering tree,the corresponding result is to merge modules with height less than 0.25. Fromthe red line in Figure 2A,we can see that no modules are merged in this experiment. Different modules aremarked with different colors, and then the number of clusters and moduleinformation are integrated. According to the minimum number of genes in themodule we set, the genes that cannot form a module will be merged into the greymodule, which is not considered here.

In order to study thecorrelation between network modules and tumor staging, the correlationcoefficient and significance value between the MEs of each module and tumorstaging are calculated (Figure 2B).And we find that the red module shows a higher positive correlation with thetumor staging than other modules, suggesting that the nodes contained thereinmay have an important relationship with the progression of bladder cancer.Therefore, in the subsequent analysis, we will focus on the genes of the redmodule, which has a total of 166 genes.

Figure2.Identification of modules related to tumor staging ofbladder cancer based on WGCNA. (A) Clustering tree of 13 co-expressionmodules, the red line represents the horizontal line when the height is 0.25; (B)The correlation between the characteristic value of each module and the tumorstage, where the number in parenthesis represents the p-value of statisticaltest.

3.2. Function and PathwayEnrichment Analysis

Through gene ontologyanalysis of the genes in the red module, we find that these genes are mainlylocated in the extracellular matrix, collagen-containing extracellular matrix,etc. The biological processes involved include extracellular matrixorganization, extracellular structure organization, collagen fibrilorganization, collagen metabolic process, etc. The molecular functions involvedare extracellular matrix structural constituent, collagen binding,extracellular matrix structural constituent conferring tensile strength, etc.The top first five significantly enriched GO Term of each part are shown in Figure 3A and Figure 3B.

Similarly, theenrichment analysis of the KEGG pathway shows that the genes of the red moduleare mainly enriched in 11 pathways (Table 1), suggesting that theprogression of bladder cancer may be related to multiple signaling pathways.Among them, some genes of the fibrous collagen family (COL gene family) areenriched in multiple pathways, suggesting that members of this family have acertain effect on the proliferation and spread of tumor cells. In addition, wefind that a large number of genes are significantly enriched in ECM-receptorinteraction and focal adhesion. Research has shown that the ECM-receptorinteraction is a microenvironmental pathway to maintain cell and tissuestructure and function. And extracellular matrix (ECM) is an intricate networkof extracellular macromolecules, which affects cell metabolism, migration,reproduction and differentiation, and provides a convenient place for cellsurvival and activity [20,21].From Figure 3C,we can find ECM-receptor interaction as the upstream pathway of focal adhesion.The dysregulated mRNA in ECM will lead to dysfunction of ECM and its receptors,and further abnormally activate focal adhesion pathway, and then triggerPI3K-Akt, MAPK, Wnt and other downstream signal transduction pathways toregulate cell proliferation, adhesion, migration, survival and DNA damage.

Table 1. The pathways enrichedin the red module.







ECM-receptor  interaction






Focal adhesion







Protein  digestion and absorption






Human  papillomavirus infection







PI3K-Akt  signaling pathway







Proteoglycans  in cancer











TGF-beta  signaling pathway





AGE-RAGE  signaling pathway in diabetic complications





Relaxin  signaling pathway









Figure3.(A) Bubble chart of Significantly enriched GO TERM. The y-axis is the enriched term, and the x-axis represents the ratio ofthe number of enriched genes to the total number of genes in the red module.The size of the circle represents the number of enriched genes. The colorrepresents the significant degree of enrichment.BP stands for biological process, CC stands for cellularcomponent, and MF stands for molecular function; (B) Bar graph of Significantly enriched GO TERM, where the lengthof the horizontal bar represents the number of genes enriched in each pathway;(C) Focal adhesion pathway diagram, the value in the upper right cornerrepresents -log10 (p-value).

According to previousreports, focal adhesion kinase (FAK) is a non-receptor protein tyrosine kinasethat is triggered by special extracellular signals such as certain growthfactors and integrins. FAK exists at the site of cell matrix attachment and isrelated to cell migration, invasion, movement, gene expression, survival, andapoptosis. And because FAK is involved in the carcinogenic signals of invasionand migration, it is considered to be a new therapeutic target for thetreatment of BC patients [22].Later studies also showed that the expression pattern of FAK is closely relatedto the pathological stage of BC patients, disease progression andcancer-specific survival rate [23].Interestingly, in our study, we accidentally find that the combination ofintegrin (ITGA and ITGB) and extracellular matrix protein can directly orindirectly activate FAK, and the connection between FAK and different proteinscan initiate the corresponding downstream signal transduction events. In otherwords, changes in gene expression are the prerequisites for these signalingevents to finally work. In addition, different cell morphological changes andthe regulation of gene expression are triggered by the binding of growthfactors to their respective receptors, which fully reflects the considerablecrosstalk between adhesion and growth factor-mediated signaling.

3.3. Hub Genes in GeneExpression Network

Plot the scatter plotsof GS and MM on the genes of the red module (Figure 4A), we can find thatgenes with high MM generally have higher importance. Then calculate the intra-module connectivity of allthe genes of the red module, that is, the direct weighted correlation of thegenes, and find that there are 9 hub genes. The GS, MMand the internal connection degree of the red module genes are listed in Supplementary Table S2.Since thehub node represents the characteristics of the module to some extent, exploring therelationship between the red module and the tumor staging can be simplified tostudy the relationship between the hub genes of the red module and the tumorstaging. From Figure 4B,we find that there are relatively high correlation coefficients among these 9hub genes, indicating that the abnormality of one gene may affect theexpression of other genes.

Figure4.(A) Scatterplot of gene significance and modulemembership of genes in red module; (B) The correlation values betweenhub genes, where red represents positive correlation and blue representsnegative correlation.

3.4. Validation of HubGenes

Select 2000 genes withlarge coefficients of variation between different cells, and perform PCAdimensionality reduction. And select the top 20 principal components andperform t-SNE cluster analysis to obtain 8 cell populations, and differenttypes of cell populations are marked with different colors (Figure 5A). Each clusterselects the first 6 marker genes to make a heat map (Figure 5B), and draw the distributionpoint map of hub genes obtained in the previous analysis in each cluster (Figure 5C). We find that almostall of these hub genes are only highly expressed in cluster3 and cluster6.Through the annotation of cell types, we noticed that the cell type annotatedby cluster3 and cluster6 is Fibroblasts, which is different from other cellpopulations. The results indicate that these marker genes are all highlymutated genes of Fibroblasts, which may be potential oncogenes and biomarkersfor the diagnosis and treatment of bladder cancer. This result also furtherconfirms the accuracy of the previous analysis.

Figure5.(A) PCA and t-SNE cluster analysis, each colorrepresents a cluster; (B) Heat map of marker genes in each cluster; (C)Bubble chart of gene expression of hub genes in different clusters, theabscissa is the hub genes, the ordinate is the different cluster, the size ofthe dot represents the cell proportion of the gene, and the color representsthe average expression level.

3.5. Layered Analysis ofDifferent Stages

According to the staginginformation in the clinical documents, we divide the patients into two groups,one group is patients in the early stage of cancer, and the other group ispatients in the late stage of cancer. The Kaplan Meier (K–M) method is used to draw thesurvival curves of the two groups of bladdercancer patients in the early and late stages, in which the curve of patientswith early cancer is shown in red, and the curve of patients with advancedcancer is shown in blue(Figure 6A).The log-rank test is used to calculate the difference in survival rates betweenthe two groups.In order to display the results more intuitively, we also show the number ofpatients corresponding to each time point. From Figure 6A, We can see that thesurvival rates of the two groups have declined over time, and through the p-valuewe can know whether they are different. Since the p-value here is much lessthan 0.01, the survival rate of the two groups of patients is different and thedifference is very significant. Next, we can see the five-year survival rateeach group of patients. The five-year survival rate of patients in the earlystage of cancer is about 70%, and the five-year survival rate for patients withadvanced cancer is about 30%. This result is very reasonable. Most patientswith early-stage BC are well-differentiated or moderately differentiatednon-muscle invasive BC, and most patients with advanced BC are muscle-invasiveor metastatic BC. In general, the tumor staging of a tumor is directly relatedto the malignancy of the tumor and the degree of disease progression. Thehigher the stage, the worse the prognosis.

Looking further at thedifference in the expression of the hub genes in the gene network between thetwo groups (Figure 6B),the results suggest that the high expression levels of hub genes in advanced BCtissues is higher than those in patients with early BC. It is further confirmedat the gene expression level that the hub genes are related to theproliferation and migration ability of BC cells.

Figure 6.(A) Survival curve of the patient. The abscissais the survival time, and the ordinate is the survival rate and the number ofpatients; (B) The heat map of the difference between hub genes in earlypatients and late patients. Among them, red represents patients with earlytumors and blue represents patients with advanced tumors.

3.6. Construction ofStaging Model

We try to use the hubgenes in the gene network to construct a staging model for BC patients. First,narrow down the range of target genes through L1 norm, screen out four genesCTHRC1, COL6A3, TIMP2 and SULF1, and further use logistic regression algorithmand stochastic gradient descent method to build a staging prediction model. Andcalculate the value of each indicator in the model through the test data set toevaluate the effect of the model. The accuracy rate is 75.87%, the precisionrate is 81.35%, the recall rate is 83.06%, the F1_score is 82.06%, and the AUCis 78.72%. Albeit inspired by Kouznetsova et al.’s method [13], our model has someinnovation: less input data is needed for prediction without sacrificing thepredictive performance. From the comprehensive score, it can be seen that justusing the four genes as feature vector, the prediction model can predict thestage of nearly 4/5 BC patients, indicating that the combination of these fourgenes can be used as a cofactor for predicting patient staging. Impressively,it outperforms previous models [12,14].

3.7. Hub Proteins Relatedto the Tumor Staging

Inthe STRING database, set the minimum required interaction score > 0.9, hidedisconnected nodes in the network, and construct a PPI network for the proteinsencoded by the genes (Figure 7A).Calculate the degree of nodes in the network, and visualizethe top 20 nodes with the highest degree of connectivity (Figure 7B). We find that thegenes of the COL family seem to be more important, showing that they are widelylinked to proteins encoded by other genes.

Figure 7.(A) PPI network, line color indicatesthe type of interaction evidence; (B) The degree of nodes in thenetwork.

We select nodes with adegree of connectivity greater than 20 as the hub nodes of PPI networks. And wefind that type Ⅰ collagen (COL1A1) is not only the hub gene of the geneexpression network, but also the hub protein in the PPI network. In addition,the protein encoded by the COL1A1 gene has the highest connectivity (connectivity= 27), suggesting that the COL1A1 is at the core of the network. According toreports, COL1A1 is the main component of the fibrous collagen family, which ismainly involved in the composition of the extracellular matrix structure and isconsidered to be a tumor-related gene, participated in the invasion andprogress of various tumors [24–26].Through the analysis of this study, we believe that COL1A1 is a key geneinvolved in the progression of BC.

4. Discussion

Cancer is generallydivided into four stages in the world. Stages I and stage II belong to theearly stage, and the chance of recovery after treatment is large. Stages IIIand stage IV belong to the advanced stage, and the five-year survival rate isrelatively low. In general, the stage of cancer is the degree of infection andspread of cancer, the higher the stage, the higher the tumor progression. Alarge number of studies have shown that the staging of BC is closely related toits recurrence and aggressive behavior, but so far, no clear molecular targethas been widely recognized. In this study, through WGCNA and the analysis ofPPI network, and combined with the verification of GEO external data sets, webelieve that COL1A1 is highly correlated with tumor progression, and may be apotential oncogene and biomarker for the diagnosis and treatment of BC.

According to GOenrichment analysis and KEGG signal pathway analysis, we find that thebiological processes involved in genes of key module mainly include cell matrixadhesion, regulation of extracellular matrix tissue, etc. The enriched pathwaysmainly include ECM-receptor interaction, focal adhesion, protein digestion andabsorption and PI3K-Akt signaling pathway. These biological functions andpathways are related to the proliferation and migration of tumor cells, whichlargely determine the degree of tumor invasion and the prognosis of patients [27–30]. The results of thepathway map show that the combination of ECM and related integrins leads to theactivation of cell adhesion and focal adhesion-related signaling pathways, andsubsequently activates downstream signaling pathways such as PI3K-Akt relatedto cell proliferation and invasion. The ECM is an active and complexmicroenvironment that can indirectly or directly control cell adhesion, migration,proliferation and differentiation, and participate in organ and tissueregeneration and homeostasis [31].Interestingly, we are pleasantly surprised to find that most of the genes comefrom gene family related to collagen formation (COL gene family), includingCOL1A1, COL1A2, COL6A3, COL6A2, COL6A1, etc. Studies have shown that theprocess of BC development is a dynamic process that interacts with the tumormicroenvironment. Tumor infiltration and migration of BC are closely related tothe microenvironment [32].Collagen is the main component of the extracellular matrix of BC cells and theinterstitial microenvironment. And collagen can provide growth attachment andscaffold for tumor cells and induce tumor cell migration [33]. Evidence suggests thatcollagen synthesis increases when BC occurs [34]. Therefore, wespeculate that during the development of BC, the collagen family may affect theformation or degradation of intercellular adhesion complexes, thereby affectingthe proliferation and spread of tumor cells. It may even integrate somesignaling pathways to induce epithelial-mesenchymal transition, which leads tothe infiltration and metastasis of BC tumor cells.

Type I collagen isencoded by the COL1A1 gene, which can constitute collagen fibers and at thesame time as an effective component of bone marrow, involved in cellproliferation, infiltration, metastasis and angiogenesis. In this study, it canbe found that the expression level of COL1A1 increases with the increases oftumor stage, suggesting that its expression may promote tumor invasion andmetastasis and play a role in the occurrence and development of BC. And studieshave confirmed this result [35],further illustrating that COL1A1 may become a new intervention target for thetreatment of BC. Previous reports indicate that COL1A1, as a proto-oncogene, isclosely related to the occurrence of various tumors. Studies have suggestedthat COL1A1 can be used as a marker for early screening of gastric cancer [36]. And COL1A1 isassociated with non-small cell lung cancer [37]. Other studies havefound that COL1A1 promotes colorectal cancer metastasis by regulating theWNT/PCP pathway [38].Liu et al. confirmed that COL1A1 can mediate breast cancer metastasis and canbe used as a potential therapeutic target for breast cancer [39]. And COL1A1 is alsoan important marker for the occurrence and development of liver cancer [40].

Constructinga staging prediction model based on BC is a work of practical clinicalsignificance. Staging prediction models can facilitate the development of newstaging predictors for BC patients. And because the tumor staging of BC playsan important role in BC treatment selection and prognosis judgment, oursolution significantly reduces the cost of sequencing, which makes theapplication of gene-specific targeted sequencing more cost-effective androutine. Moreover, if there is more tumor staging data, we believe that ourmodel can better fit the data. However, the development of precision medicineand personalized medicine needs to clarify the role of more BC related genesand its relationship with the clinical traits of BC patients. Therefore, weplan to invest in the analysis of BC subtypes in the future to identify theimpact of these genes on different subtypes. At the same time, through the Cmapdatabase, search for small drug molecules closely related to the disease tofind more ideal molecular markers, and provide new ideas for more accurateclinical treatment methods. In addition, all the data in this study comes fromthe RNA expression profile of human bladder cancer patients provided in thedatabase, which may not fully represent the general expression of genes. Andwhether these genes are really specific to patients with bladder cancer orobserved in other types of cancer is still worth exploring. Therefore, in thefollow-up research, we should combine cell and animal experiments to furtheranalyze, discuss and verify the prediction of this research.


In summary, we findfor the first time that a combination of several molecules can be used as a BCtumor staging marker, thereby providing guidance for clinical treatmentprograms, further improving patient prognosis, and improving survival rate. Inaddition, our results show that COL1A1 can regulate the proliferation,apoptosis, migration and DNA damage of BC cells by affecting the transcriptionof different genes and the expression of signaling pathway proteins. Therefore,COL1A1 can be used as a molecular target for BC treatment to slow the rate of tumordevelopment.

SupplementaryMaterials:The following are available online at

Author Contributions:Wang X and Ding Ydesigned the research ideas; Wang X collected and organized the data. Wang Xand Ding Y analyzed the data; Wang X supplied technical support and wrote themanuscript; Ding Y supervised the whole study.

Funding: Thisstudy was supported by the National Nature Science Foundation of China (grantnumbers 21541006, 61772241) and, the Postgraduate Research & PracticeInnovation Program of Jiangsu Province under Grant SJCX20_0777.

Conflicts of Interest:All authors declare noconflict of interest.

Copyright Statement


©2021 the authors. This article is an open access article  licensed under the terms and conditions of the CREATIVE COMMONS ATTRIBUTION  (CC BY) LICENSE (


1.SiegelRL, Miller KD, Jemal A. Cancer statistics, 2019. CA A Cancer Journal forClinicians, 2019, 69: 7–34.

2.PerezA, Loizaga A, Arceo R, Lacasa I, Rabade A, et al. A Pilot Study on thePotential of RNA-Associated to Urinary Vesicles as a Suitable Non-InvasiveSource for Diagnostic Purposes in Bladder Cancer. Cancers (Basel), 2014,6: 179–192.

3.SanliO, Dobruch J, Knowles MA, Burger M, Alemozaffar M, et al. Bladder cancer. NatureReviews Disease Primers, 2017, 3: 17022.

4.LiS, Liu X, Liu T, Meng X, Yin X, et al. Identification of Biomarkers Correlatedwith the TNM Staging and Overall Survival of Patients with Bladder Cancer. Frontiersin Physiology, 2017, 8: 947.

5.HanafiAR, Jayusman AM, Alfasunu S, Sadewa AH, Pramono D, et al. Serum MiRNA asPredictive and Prognosis Biomarker in Advanced Stage Non-small Cell Lung Cancerin Indonesia. Zhongguo Fei Ai Za Zhi, 2020, 23: 321–332.

6.WongYH, Li CW, Chen BS. Evolution of network biomarkers from early to late stagebladder cancer samples. BioMed Research International, 2014, 2014:159078.

7.IsmailMF, El Boghdady NA, Shabayek MI, Awida HA, Abozeed H. Evaluation and screeningof mRNA S100A genes as serological biomarkers in different stages of bladdercancer in Egypt. Tumour Biology, 2016, 37: 4621–4631.

8.MargelD, Pevsner-Fischer M, Baniel J, Yossepowitch O,Cohen IR. Stress proteins andcytokines are urinary biomarkers for diagnosis and staging of bladder cancer. EuropeanUrology, 2011, 59: 113–119.

9.VagoR, Ravelli A, Bettiga A, Casati S, Lavorgna G, et al. Urine Endocannabinoids asNovel Non-Invasive Biomarkers for Bladder Cancer at Early Stage. Cancers(Basel), 2020, 12:

10.WangX, Ding Y, Wang J, Wu Y. Identification of the Key Factors Related to BladderCancer by lncRNA-miRNA-mRNA Three-Layer Network. Frontiers in Genetics,2019, 10: 1398.

11.HanB, Cui D, Jing Y, Hong Y, Xia S. Estrogen receptor beta (ERbeta) is a novelprognostic marker of recurrence survival in non-muscle-invasive bladder cancerpotentially by inhibiting cadherin switch. World Journal of Urology,2012, 30: 861–867.

12.AhmadiH, Mitra AP, Abdelsayed GA, Cai J, Djaladat H, et al. Principal componentanalysis based pre-cystectomy model to predict pathological stage in patientswith clinical organ-confined bladder cancer. BJU International, 2013,111: E167–172.

13.KouznetsovaVL, Kim E, Romm EL, Zhu A,Tsigelny IF. Recognition of early and late stages ofbladder cancer using metabolites and machine learning. Metabolomics,2019, 15: 94.

14.SmithSC, Baras AS, Dancik G, Ru Y, Ding K-F, et al. A 20-gene model for molecularnodal staging of bladder cancer: development and prospective assessment. TheLancet Oncology, 2011, 12: 137–143.

15.LiCY, Cai JH, Tsai JJP, Wang CCN. Identification of Hub Genes Associated WithDevelopment of Head and Neck Squamous Cell Carcinoma by IntegratedBioinformatics Analysis. Frontiers in Oncology, 2020, 10: 681.

16.LiuH, Sun Y, Tian H, Xiao X, Zhang J, et al. Characterization of long non-codingRNA and messenger RNA profiles in laryngeal cancer by weighted gene co-expressionnetwork analysis. Aging (Albany NY), 2019, 11: 10074–10099.

17.NiemiraM, Collin F, Szalkowska A, Bielska A, Chwialkowska K, et al. MolecularSignature of Subtypes of Non-Small-Cell Lung Cancer by Large-ScaleTranscriptional Profiling: Identification of Key Modules and Genes by WeightedGene Co-Expression Network Analysis (WGCNA). Cancers (Basel), 2019, 12: 37.

18.WangH, Liu J, Li J, Zang D, Wang X, et al. Identification of gene modules and hubgenes in colon adenocarcinoma associated with pathological stage based on WGCNAanalysis. Cancer Genet, 2020, 242: 1–7.

19.IvlievAE, Hoen PACt, Sergeeva MG. Coexpression Network Analysis IdentifiesTranscriptional Modules Related to Proastrocytic Differentiation and SproutySignaling in Glioma. Cancer Research, 2010, 70: 10060–10070.

20.ReinhardJ, Brosicke N, Theocharidis U, Faissner A. The extracellular matrix nichemicroenvironment of neural and cancer stem cells in the brain. TheInternational Journal of Biochemistry & Cell Biology, 2016, 81: 174–183.

21.AlfanoM, Canducci F, Nebuloni M, Clementi M, Montorsi F, et al. The interplay ofextracellular matrix and microbiome in urothelial bladder cancer. NatureReviews Urology, 2016, 13: 77–90.

22.KongD, Chen F, Sima NJO. Focal adhesion kinases crucially regulate TGFβ-inducedmigration and invasion of bladder cancer cells via Src kinase and E-cadherin. OncoTargetsand Therapy, 2017, 10: 1783–1792.

23.ZhangQ, Wang H, Wei H, Zhang D. Focal adhesion kinase (FAK) is associated with poorprognosis in urinary bladder carcinoma. International Journal of Clinicaland Experimental Pathology, 2018, 11: 831–838.

24.WuYH, Chang TH, Huang YF, Huang HD, Chou CY. COL11A1 promotes tumor progressionand predicts poor clinical outcome in ovarian cancer. Oncogene, 2014,33: 3432–3440.

25.WangQ, Yu J. MiR-129-5p suppresses gastric cancer cell invasion and proliferationby inhibiting COL1A1. Biochemistry Cell Biology, 2018, 96: 19–25.

26.Ilhan-MutluA, Siehs C, Berghoff AS, Ricken G, Widhalm G, et al. Expression profiling ofangiogenesis-related genes in brain metastases of lung cancer and melanoma. TumourBiology, 2016, 37: 1173–1182.

27.JangM, Koh I, Lee JE, Lim JY, Cheong JH, et al. Increased extracellular matrixdensity disrupts E-cadherin/beta-catenin complex in gastric cancer cells. BiomaterialsScience, 2018, 6: 2704–2713.

28.ZhangT, Ma Y, Fang J, Liu C, Chen L. A Deregulated PI3K-AKT Signaling Pathway inPatients with Colorectal Cancer. Journal of Gastrointestinal Cancer,2019, 50: 35–41.

29.YangW, Xie T. Hsa_circ_CSPP1/MiR-361-5p/ITGB1 Regulates Proliferation and Migrationof Cervical Cancer (CC) by Modulating the PI3K-Akt Signaling Pathway. ReproductiveSciences, 2020, 27: 132–144.

30.AlowaidiF, Hashimi SM, Alqurashi N, Wood SA, Wei MQ. Cripto-1 overexpression in U87glioblastoma cells activates MAPK, focal adhesion and ErbB pathways. OncologyLetters, 2019, 18: 3399–3406.

31.AiyelabeganHT,Sadroddiny E. Fundamentals of protein and cell interactions in biomaterials.Biomed Pharmacother, 2017, 88: 956–970.

32.LiD, Jiao W, Liang Z, Wang L, Chen Y, et al. Variation in energy metabolismarising from the effect of the tumor microenvironment on cell biologicalbehaviors of bladder cancer cells and endothelial cells. Biofactors,2020, 46: 64–75.

33.ZhuH, Chen H, Wang J, Zhou L, Liu S. Collagen stiffness promotednon-muscle-invasive bladder cancer progression to muscle-invasive bladdercancer. OncoTargets and Therapy, 2019, 12: 3441–3457.

34.MiyakeM, Morizawa Y, Hori S, Tatsumi Y, Onishi S, et al. Diagnostic and prognosticrole of urinary collagens in primary human bladder cancer. Cancer Science,2017, 108: 2221–2228.

35.MoriK, Enokida H, Kagara I, Kawakami K, Chiyomaru T, et al. CpG hypermethylation ofcollagen type I alpha 2 contributes to proliferation and migration activity ofhuman bladder cancer. International Journal of Oncology, 2009, 34: 1593–1602.

36.LiJ, Ding Y, Li AJ. Identification of COL1A1 and COL1A2 as candidate prognosticfactors in gastric cancer. World Journal of Surgical Oncology, 2016, 14:297.

37.OleksiewiczU, Liloglou T, Tasopoulou K, Daskoulidou N, Gosney JR, et al. COL1A1, PRPF40A,and UCP2 correlate with hypoxia markers in non-small cell lung cancer. Journalof Cancer Research and Clinical Oncology, 2017, 143: 1133–1141.

38.ZhangZ, Wang Y, Zhang J, Zhong J, Yang R. COL1A1 promotes metastasis in colorectalcancer by regulating the WNT/PCP pathway. Molecular Medicine Reports,2018, 17: 5037–5042.

39.YuanYH, Wang HY, Lai Y, Zhong W, Liang WL, et al. Epigenetic inactivation of HOXD10is associated with human colon cancer via inhibiting the RHOC/AKT/MAPKsignaling pathway. Cell Communication and Signaling, 2019, 17: 9.

40.MaHP, Chang HL, Bamodu OA, Yadav VK, Huang TY, et al. Collagen 1A1 (COL1A1) Is aReliable Biomarker and Putative Therapeutic Target for HepatocellularCarcinogenesis and Metastasis. Cancers (Basel), 2019, 11: 786.

Download PDF