A simple way to detect disease-associated cellular molecular alterations from mixed-cell blood samples

Hong, Guini; Li, Hongdong; Li, Mengyao; Zheng, Weicheng; Li, Jing; Chi, Meirong; Cheng, Jun; Guo, Zheng

doi:10.1093/bib/bbx009

Abstract

Blood is a promising surrogate for solid tissue to investigate disease-associated molecular biomarkers. However, proportion changes of the constituent cells in the often-used peripheral whole blood (PWB) or peripheral blood mononuclear cell (PBMC) samples may influence the detection of cell-specific alterations under disease states. We propose a simple method, Ref-REO, to detect molecular alterations in leukocytes using mixed-cell blood samples. The method is based on the predetermined within-sample relative expression orderings (REOs) of genes in purified leukocytes of healthy people. Both simulated and real mixed-cell blood gene expression profiles were used to evaluate the method. Approximately 99% of the differentially expressed genes (DEGs) detected by Ref-REO in the simulated mixed-cell data are attributable to transcriptional alterations in leukocytes rather than to proportion changes of leukocytes. For the real mixed-cell data, the DEGs detected by Ref-REO in the PBMC expression data for systemic lupus erythematosus (SLE) patients overlap significantly with the DEGs detected in the expression data of SLE CD4+ T cells and B cells, and they are mainly enriched with mRNA editing and interferon-associated genes. The DEGs detected in the PWB data for lung carcinoma patients are significantly enriched with coagulation-associated functional categories that are closely associated with cancer progression. In conclusion, the proposed method is capable of detecting disease-associated leukocyte-specific molecular alterations using mixed-cell blood samples, which provides simple, transferable and easy-to-use candidates for disease biomarkers.

leukocyte subtypes, relative expression orderings, cell-mixed

Background

As blood sampling is less invasive and easy to handle, many investigators have attempted to identify disease-associated biomarkers from peripheral whole blood (PWB) and peripheral blood mononuclear cells (PBMCs) [1–3]. It is well known that PWB and PBMCs comprise a heterogeneous population of cells, whose relative proportions may shift under disease states [4, 5]. For example, the proportions of T cells, NK cells and B cells decrease while the proportions of monocytes and neutrophils increase in PWB of head and neck carcinoma, ovarian cancer and rheumatoid arthritis patients [6, 7]. The use of peripheral blood cells to detect disease-associated molecular alterations relies on the natural role of these cells in immune responses to physiologic and pathologic changes [8]. Unfortunately, the proportion change of leukocyte subtypes introduces additional signals in the expression data when using mixed-cell blood samples, which could influence and even disguise the disease-associated signals in the leukocytes themselves [9, 10]. If the absolute measurement values of genes are compared directly between diseased and normal blood samples, some artificial differentially expressed genes (DEGs) will be detected that have no expression change in any leukocyte subtype and arise simply from the leukocyte proportion change. In our previous studies, we revealed that both the aberrant gene expression [9] and the DNA methylation alterations [10] in PWB of patients with cancer and inflammatory diseases were mainly attributable to proportion changes of leukocyte subtypes. Therefore, when detecting disease-associated cellular (leukocyte-specific) molecular alterations in mixed-cell blood samples, it is necessary to avoid detecting signals that originate from leukocyte subtype proportion changes.

In developing methods to detect disease-associated cellular molecular changes, some researchers have already taken the influence of leukocyte subtype proportion changes on the overall signals of the mixed-cell blood sample into consideration [7, 11]. According to whether the purified leukocyte subtypes are required as reference data, these methods can be classified into reference-free and reference-dependent methods.

The reference-free methods, such as SVA-PLS (surrogate variable analysis using partial least squares) [12] and RefFreeEWAS (reference-free analysis epigenome-wide association studies) [13], are closely related to surrogate variable analysis, which is based on singular value decomposition [12]. However, these methods have difficulty eliminating the influence of cell proportion changes that are genuinely associated with disease progression. In addition, because their statistical models are population dependent, the detected biomarkers could be impractical for an individual sample in real clinical settings.

The reference-dependent methods need to estimate and adjust for the proportion of each leukocyte subtype in blood samples using deconvolution algorithms based on the profiles of the purified leukocyte subtypes [7]. These methods are intuitive and have been used to detect risk factors for rheumatoid arthritis [11], systemic lupus erythematosus (SLE) [14], ovarian cancer and head and neck carcinoma using mixed-cell blood samples [15]. However, methods of this type rely on statistical calculations on the absolute measurement values of genes in the purified leukocyte subtypes, and the detected molecular changes are sensitive to the systematic biases in microarray measurements [16].

It has been reported recently that within-sample relative expression orderings (REOs) of genes are insensitive to the systematic biases in microarray measurements, invariant to monotonic data normalization and robust against inter-individual biological variation of gene expression levels [17, 18]. Here, we propose a new reference-dependent method, Ref-REO, which is based on within-sample REOs of genes in profiles of purified leukocyte subtypes, to detect the disease-associated leukocyte-specific alterations in mixed-cell blood samples. The usage of this method is illustrated through applications to simulated mixed-cell data and real data obtained from PBMCs for SLE patients and PWB for lung carcinoma patients.

Materials and methods

Data sources and data preprocessing

All the data analyzed in this study were downloaded from the Gene Expression Omnibus database (GEO, http://www.ncbi.nlm.nih.gov/geo/). Detailed information for each data set is given in Table 1. Set 1 and Set 2 examined expression profiles of purified leukocyte subtypes extracted from normal control cohorts and were used as reference data. Set 1 examined 47 expression profiles of nine cell types, including CD16+CD66b+ neutrophils, CD16−CD66b+ eosinophils, CD14+ monocytes, CD4+ T cells, CD8+ T cells, CD56+ NK cells, CD19+ B cells, CD123+ pDCs and CD11c+ mDCs, which were isolated from blood samples of healthy humans and assessed for cell type purity by flow cytometry [19]. Set 2 examined 33 expression profiles for seven cell subsets, including CD16+CD66b+ neutrophils, CD16−CD66b+ eosinophils, CD14+ monocytes, CD4+ T cells, CD8+ T cells, CD56+ NK cells and CD19+ B cells, which were obtained from a separate panel of healthy donors at the University Hospital of Geneva [19]. Set 3 examined 61 expression profiles of PBMCs, which were collected from SLE patients in an observational study performed at the University of Michigan, and 20 expression profiles of PBMCs from normal controls [20]. Set 4 examined the expression profiles of CD4+ T cells and B cells collected from peripheral blood of SLE patients and normal controls. Set 5 examined expression profiles of PWB from 73 lung adenocarcinoma and 80 normal controls, which were collected from the Environment and Genetics in Lung Cancer Etiology study [21]. The normalized data were downloaded from GEO, and the original platform annotation file obtained from GEO for each data set was used to annotate the CloneIDs to GeneIDs. The number of genes measured in each data set is shown in Table 1. In total, 7429 genes were measured in all data sets, and these genes were analyzed in the following study.

Table 1

Data sets analyzed in this study

Data set	Sample characteristic	#Gene	Platform	Accession ID	Reference
Purified normal leukocyte subtypes
Set 1	Sample No: 47	11 241	GPL570	GSE28490	[19]
Set 1	CD16+CD66b+ Neutrophils: 3, CD16-CD66b+ Eosinophils: 4, CD14+ Monocytes: 10, CD4+ T cells: 5, CD8+ T cells: 5, CD56+ NK cells: 5, CD19+ B cells: 5, CD123+ pDCs: 5, CD11c+ mDCs: 5	11 241	GPL570	GSE28490	[19]
Set 2	Sample No: 33	10 689	GPL570	GSE28491	[19]
Set 2	CD16+CD66b+ Neutrophils: 5, CD16−CD66b+ Eosinophils: 3, CD14+ Monocytes: 5, CD4+ T cells: 5, CD8+ T cells: 5, CD56+ NK cells: 5, CD19+ B cells: 5	10 689	GPL570	GSE28491	[19]
SLE
Set 3	PBMCs Sample No: 81	20 283	GPL570	GSE50772	[20]
Set 3	SLE patients: 61, normal controls: 20	20 283	GPL570	GSE50772	[20]
Set 4	Sample No: 34	20 283	GPL570	GSE4588	–
	CD4+ T cells Sample No: 18
	SLE patients: 10, Normal controls: 8
	B cells: 16
	SLE patients: 9, Normal controls: 7
Lung adenocarcinoma
Set 5	PWB Sample No: 153	12 753	GPL571	GSE20189	[21]
Set 5	Lung adenocarcinoma: 80, Controls: 73	12 753	GPL571	GSE20189	[21]


Detecting disease-associated cellular alterations from mixed-cell samples

Suppose there are n leukocyte subtypes in a mixed-cell sample. Let $P_{i}$ denote the proportion of the i-th leukocyte subtype, then $\sum_{i = 1}^{n} P_{i} = 1$. Suppose there are two genes A and B measured in the normal mixed-cell sample, with expression levels $E_{i}^{A}$, $E_{i}^{B}$ in the i-th leukocyte subtypes. Then, the expression levels of genes A and B in the normal mixed-cell sample could be represented by formula (1) and (2), respectively:

E_{normal}^{A} = \sum_{i = 1}^{n} P_{i} \times E_{i}^{A}

(1)

E_{normal}^{B} = \sum_{i = 1}^{n} P_{i} \times E_{i}^{B}

(2)

If the REO of genes A and B is $E_{A} > E_{B}$ in all leukocyte subtypes, namely, $E_{i}^{A} > E_{i}^{B}$, then the REO of genes A and B in the normal mixed-cell sample should be $E_{A} > E_{B}$, as described in formula (3):

\Delta E_{normal}^{AB} = E_{normal}^{A} - E_{normal}^{B} = \sum_{i = 1}^{n} P_{i} \times (E_{i}^{A} - E_{i}^{B}) > 0

(3)

Similarly, the REO of genes A and B in a mixed-cell sample under a disease state could be represented by formula (4):

\Delta E_{disease}^{AB} = E_{disease}^{A} - E_{disease}^{B} = \sum_{i = 1}^{n} {P^{'}}_{i} \times (E_{i}^{A'} - E_{i}^{B'}) > 0

(4)

Here, ${P^{'}}_{i}$ represents the proportion of the i-th leukocyte subtype under the disease state, and $E_{i}^{A'}$ and $E_{i}^{B'}$ represent the expression levels of genes A and B, respectively, in the i-th leukocyte subtype under the disease state.

According to formula (4), if genes A and B have no expression changes in any leukocyte subtype under the disease state, then every difference $E_{i}^{A'} - E_{i}^{B'}$ stays positive and, because the proportions are non-negative and sum to 1, the REO of genes A and B should also be $E_{A} > E_{B}$, no matter how the proportions of leukocyte subtypes change. Consequently, if the REO of genes A and B changes to $E_{A} < E_{B}$ under the disease state, the expression levels of genes A and/or B must have changed in at least one of the leukocyte subtypes.
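The argument behind formulas (1) to (4) can be checked numerically. The following is a minimal sketch in Python, not part of the published method: the expression values, the number of subtypes and the helper names `mixed_expression` and `random_proportions` are all hypothetical and chosen only for illustration.

```python
import random

def mixed_expression(proportions, levels):
    """Formulas (1)/(2): proportion-weighted sum of subtype expression levels."""
    return sum(p * e for p, e in zip(proportions, levels))

def random_proportions(n, rng):
    """Draw n positive proportions that sum to 1."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)

# Hypothetical expression levels of genes A and B in three leukocyte subtypes,
# chosen so that E_A > E_B holds within every subtype.
gene_a = [8.0, 5.5, 9.1]
gene_b = [6.2, 5.0, 3.3]

# With unchanged expression, the REO survives any change in proportions.
reo_preserved = all(
    mixed_expression(p, gene_a) > mixed_expression(p, gene_b)
    for p in (random_proportions(3, rng) for _ in range(1000))
)

# Down-regulating gene A in the third subtype only can reverse the mixed-sample REO.
gene_a_disease = [8.0, 5.5, 0.5]
disease_proportions = [0.1, 0.1, 0.8]
reo_reversed = (
    mixed_expression(disease_proportions, gene_a_disease)
    < mixed_expression(disease_proportions, gene_b)
)
```

In this toy setting `reo_preserved` holds for every random set of proportions, whereas the reversal appears only after the expression of gene A itself has been changed in one subtype.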

According to the above deduction, we could detect disease-associated cellular alterations in mixed-cell blood samples by determining which gene pairs have changed REOs under the disease state, taking the gene pairs with stable REOs in normal leukocyte subtypes as the reference. The flow chart is shown in Figure 1. First, the gene pairs with identical REOs (e.g. $E_{A} > E_{B}$) in all normal leukocyte subtypes are extracted and defined as the reference gene pairs. Second, because only a limited number of leukocyte subtypes have been collected, the mixed-cell normal control samples for studying a disease are used to filter the reference pairs. If the REO of a reference gene pair is maintained in >95% of the mixed-cell normal control samples, then this gene pair is kept for the following analysis. Third, Fisher’s exact test is used to evaluate whether the REOs of the filtered reference pairs are significantly reversed in the disease samples. Finally, according to the filtered reference gene pairs and reversed gene pairs, we could identify the genes with altered expression levels in leukocyte subtypes under the disease state. The detailed procedure is described below.
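The first three steps above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: profiles are represented as plain dictionaries from gene name to expression value, the function names are hypothetical, and the layout of the 2×2 table passed to Fisher’s exact test (normal/disease by kept/reversed) is an assumption, as the text does not specify it.

```python
from itertools import combinations

def stable_pairs(purified_profiles):
    """Step 1: pairs (high, low) whose ordering is identical in every purified profile."""
    genes = sorted(purified_profiles[0])
    pairs = set()
    for g1, g2 in combinations(genes, 2):
        if all(p[g1] > p[g2] for p in purified_profiles):
            pairs.add((g1, g2))
        elif all(p[g1] < p[g2] for p in purified_profiles):
            pairs.add((g2, g1))
    return pairs

def filter_pairs(pairs, normal_mixed, min_fraction=0.95):
    """Step 2: keep pairs whose REO holds in more than min_fraction of normal mixed samples."""
    return {
        (high, low)
        for high, low in pairs
        if sum(s[high] > s[low] for s in normal_mixed) / len(normal_mixed) > min_fraction
    }

def reversal_table(pair, normal_mixed, disease_mixed):
    """Step 3 input (assumed layout): rows normal/disease, columns kept/reversed."""
    high, low = pair
    rows = []
    for samples in (normal_mixed, disease_mixed):
        kept = sum(s[high] > s[low] for s in samples)
        rows.append((kept, len(samples) - kept))
    return rows

# Hypothetical toy data: two purified profiles, 20 normal and 20 disease mixed samples.
purified = [{"A": 9.0, "B": 5.0, "C": 7.0}, {"A": 8.0, "B": 4.0, "C": 9.0}]
normal = [{"A": 8.5, "B": 4.5, "C": 8.0}] * 18 + [{"A": 8.5, "B": 4.5, "C": 4.0}] * 2
disease = [{"A": 3.0, "B": 4.5, "C": 8.0}] * 15 + [{"A": 8.5, "B": 4.5, "C": 8.0}] * 5

reference = stable_pairs(purified)          # pairs (A, B) and (C, B)
filtered = filter_pairs(reference, normal)  # (C, B) holds in only 90% of normals
table = reversal_table(("A", "B"), normal, disease)
```

The resulting table could then be passed to any implementation of Fisher’s exact test. Enumerating all gene pairs is quadratic in the number of genes, so a real analysis over thousands of genes would need a vectorized approach.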

Figure 1

The flow chart of the Ref-REO method. The Ref-REO algorithm consists of the following four steps: (1) detection of gene pairs with stable REOs in normal leukocyte subtypes; (2) filtration of the stable gene pairs by mixed-cell normal blood samples; (3) extraction of the filtered stable gene pairs that have reversed REOs in mixed-cell disease blood samples; (4) identification of the DEGs.


For a gene A, suppose the REOs of the gene pairs it involves are $E_{A} > E_{B}$ (gene B could be any gene except gene A) in the filtered reference gene pairs. If this gene is down-regulated under the disease state, then the REOs of the gene pairs it involves should tend to change to $E_{A} < E_{B}$. The significance level of down-regulation could be calculated by the hypergeometric distribution model as follows:

P_{down} = 1 - \sum_{i = 0}^{k - 1} \frac{\binom{n}{i} \binom{M - n}{N - i}}{\binom{M}{N}}

(5)

Here, M represents the number of filtered reference gene pairs, N represents the number of reversed pairs under the disease state, n represents the number of gene pairs involving gene A with REOs $E_{A} > E_{B}$ in the filtered reference pairs, and k represents the number of gene pairs involving gene A with REOs $E_{A} < E_{B}$ in the reversed pairs.

Otherwise, if this gene is up-regulated under the disease state, the gene pairs involving gene A with REOs $E_{A} < E_{B}$ in the filtered reference gene pairs should tend to change to $E_{A} > E_{B}$ under the disease state. The significance level, $P_{up}$, could be calculated in the same way as $P_{down}$. Finally, the smaller of $P_{down}$ and $P_{up}$ is used to determine the direction of dysregulation of this gene under the disease state, and this significance level is denoted as P. After multiple test correction [22], if the adjusted P-value is <0.05, the gene is identified as a DEG in leukocytes under the disease state.

Some genes with no changes under the disease state could be identified as DEGs because they frequently appear as partners of true DEGs. To solve this problem, the following procedure is used: (1) Calculate the cumulative hypergeometric distribution P-value for each gene based on the filtered reference gene pairs and the reversed pairs, and adjust the P-values for multiple comparisons. (2) Sort all DEGs by adjusted P-value in ascending order. (3) Select the first DEG (with the smallest adjusted P-value) from the sorted DEG list and remove the pairs involving this gene from the reversed pairs. (4) Repeat the above steps until no more DEGs can be found. All the selected genes are the final DEGs.
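The significance calculation of formula (5) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the counts in the example are invented, and the upper tail is summed directly (from k to min(n, N)), which is mathematically equal to one minus the lower sum in formula (5) but avoids subtracting two nearly equal numbers. Multiple test correction and the iterative removal of pairs are not shown.

```python
from math import comb

def hypergeom_upper_tail(M, N, n, k):
    """P(X >= k) for the hypergeometric model of formula (5).

    M: filtered reference pairs, N: reversed pairs,
    n: reference pairs involving the gene in the tested direction,
    k: reversed pairs involving the gene in the opposite direction.
    """
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / total

def dysregulation(p_down, p_up):
    """Pick the direction with the smaller P-value, as described in the text."""
    return ("down", p_down) if p_down <= p_up else ("up", p_up)

# Hypothetical counts: 10 filtered reference pairs, 4 of them reversed in disease.
p_down = hypergeom_upper_tail(M=10, N=4, n=3, k=2)  # 70 / 210 = 1/3
p_up = hypergeom_upper_tail(M=10, N=4, n=2, k=0)    # no reversals observed, so 1.0
direction, p_value = dysregulation(p_down, p_up)
```

For the millions of reference pairs reported in this study, the exact binomial coefficients become very large integers, so a library routine for the hypergeometric survival function would be a more practical choice than this direct sum.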

Simulation experiment

In the simulation experiment, we assumed for simplicity that there are only two cell types, myeloid and lymphoid cells, in the mixed-cell samples. The flow chart of the simulation experiment is shown in Figure 2. First, we classified the expression profiles of purified normal leukocyte subtypes examined in Set 1 into myeloid and lymphoid cell groups. The group of myeloid cells included CD16+CD66b+ neutrophils, CD16−CD66b+ eosinophils and CD14+ monocytes. The group of lymphoid cells included CD4+ T cells, CD8+ T cells, CD56+ NK cells and CD19+ B cells. Then, for each gene, we calculated the baseline expression level for myeloid and lymphoid cells by averaging its expression levels over the cell types belonging to each group. To simulate a normal myeloid or lymphoid cell profile, we added an increase or decrease of up to ∼5% to the baseline for each gene, as described in formula (6).

{E^{'}}_{i} = (1 \pm 5\% \times \mathrm{rand}(0, 1)) \times E_{i}

(6)

Figure 2

The flow chart of the simulation experiment.


Here, $E_{i}$ represents the baseline expression value of gene i in myeloid or lymphoid cells, rand(0,1) represents a uniformly distributed random number between 0 and 1, and ${E^{'}}_{i}$ represents the simulated value for gene i in myeloid or lymphoid cells.

Finally, 50 simulated normal lymphoid cell profiles and 50 simulated myeloid cell profiles were created, respectively. Given that the mixed-cell normal samples were composed of 50% lymphoid cells and 50% myeloid cells, the expression level of a gene in a mixed-cell normal profile could be represented by formula (7).

E_{i, j} = 50\% \times E_{i, j}^{M} + 50\% \times E_{i, j}^{L}

(7)

Here, $E_{i, j}$ represents the expression level of the i-th gene in the j-th simulated mixed-cell normal profile, $E_{i, j}^{M}$ represents the expression level of the i-th gene in the j-th simulated myeloid cell profile and $E_{i, j}^{L}$ represents the expression level of the i-th gene in the j-th simulated lymphoid cell profile.

Then, to simulate mixed-cell disease profiles that have both cell proportion shifts and expression changes under the disease state, we randomly selected 500 genes as the DEGs for myeloid cells and changed their expression levels with 2∼3-fold increases or decreases. In the same way, we also randomly selected 500 genes as the DEGs for lymphoid cells. These two gene lists shared 30 genes. Then, we constructed mixed-cell disease profiles with different cell proportions by increasing the proportion of myeloid cells in mixed-cell samples from 50% to 80% and correspondingly decreasing the proportion of lymphoid cells from 50% to 20% with a step of 5%.
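The simulation design described by formulas (6) and (7) can be sketched as below. This is an illustrative Python sketch under several assumptions that are not stated in the text: the sign in formula (6) is drawn at random for each gene, expression values are treated as linear intensities when fold changes are applied, and the three-gene baselines and all function names are hypothetical.

```python
import random

def simulate_profile(baseline, rng, noise=0.05):
    """Formula (6): scale each baseline value by 1 +/- noise * rand(0, 1)."""
    return {
        gene: (1 + rng.choice((-1, 1)) * noise * rng.random()) * value
        for gene, value in baseline.items()
    }

def apply_degs(baseline, deg_genes, rng):
    """Give each chosen gene a 2- to 3-fold increase or decrease."""
    altered = dict(baseline)
    for gene in deg_genes:
        fold = rng.uniform(2.0, 3.0)
        altered[gene] = baseline[gene] * fold if rng.random() < 0.5 else baseline[gene] / fold
    return altered

def mix(myeloid, lymphoid, myeloid_fraction=0.5):
    """Formula (7), generalised to an adjustable myeloid proportion."""
    return {
        gene: myeloid_fraction * myeloid[gene] + (1 - myeloid_fraction) * lymphoid[gene]
        for gene in myeloid
    }

rng = random.Random(1)
baseline_m = {"g1": 10.0, "g2": 4.0, "g3": 7.0}  # hypothetical myeloid baseline
baseline_l = {"g1": 3.0, "g2": 6.0, "g3": 7.5}   # hypothetical lymphoid baseline

# 50 mixed-cell normal profiles with 50% myeloid and 50% lymphoid cells.
normal_mixed = [
    mix(simulate_profile(baseline_m, rng), simulate_profile(baseline_l, rng))
    for _ in range(50)
]

# Disease profiles: g1 is a DEG in myeloid cells, and the myeloid share grows in 5% steps.
disease_m = apply_degs(baseline_m, ["g1"], rng)
disease_mixed = {
    pct: [
        mix(simulate_profile(disease_m, rng), simulate_profile(baseline_l, rng), pct / 100)
        for _ in range(50)
    ]
    for pct in range(50, 85, 5)  # 50%, 55%, ..., 80% myeloid cells
}
```

Keeping the proportion as an explicit argument of `mix` makes it easy to generate the proportion-only control samples as well, by mixing unaltered myeloid profiles at each proportion.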

Functional pathway enrichment

To analyze the disease-associated genes identified from mixed-cell blood samples, 1330 canonical pathways collected in the C2 gene sets of MSigDB (The Molecular Signatures Database, http://software.broadinstitute.org/gsea/msigd) were downloaded. The hypergeometric distribution model was used to test whether the disease-associated genes observed in canonical pathways were significantly more than what was expected by random chance.

Results

Performance evaluation of Ref-REO using simulated data

As described in the ‘Materials and methods’ section, we generated simulation data for 50 normal myeloid cell profiles, 50 normal lymphoid cell profiles and 50 mixed-cell normal profiles with 50% myeloid cells and 50% lymphoid cells. To evaluate the influence of cell proportion changes on expression signals in mixed-cell samples, we constructed mixed-cell samples in which only the cell proportions were changed, i.e. with no changes in expression values in leukocytes, by increasing the proportion of myeloid cells in mixed-cell samples from 50% to 80% and correspondingly decreasing the proportion of lymphoid cells from 50% to 20% with a step of 5%. In each run of the simulation experiments, 50 mixed-cell expression profiles were constructed and DEGs were detected through comparison with the mixed-cell normal samples using the Student’s t-test with false discovery rate (FDR) < 5%. The simulation result shows that the number of DEGs increases as the proportion of myeloid cells increases (Figure 3). For example, when the proportion of myeloid cells increased to 80% in the mixed-cell disease samples, 6180 DEGs were detected between the mixed-cell samples and the normal samples. These DEGs were all significantly differentially expressed between the simulated normal lymphoid cells and myeloid cells. This result indicates that cell proportion changes could distort the overall expression signals in mixed-cell samples [10]. In contrast, for the 15 361 780 gene pairs with stable REOs in both simulated normal myeloid cells and lymphoid cells, the relative ranks did not change in the mixed-cell samples, which indicates that changes in cell proportions alone cannot affect REOs that are stable across all leukocyte subtypes.

Figure 3

Number of DEGs detected from mixed-cell samples with different proportions of myeloid cells. The bar plot shows the number of DEGs (y-axis) detected by Student’s t-test from simulated mixed-cell samples with different proportions of myeloid cells (x-axis).


Further simulation experiments were performed on the simulated mixed-cell disease profiles with changes in both the cell proportions and expression values. We selected 500 genes as true DEGs for myeloid cells and 500 genes for lymphoid cells (970 genes in total, because 30 genes were shared by the two cell types) and changed their values. In addition, we adjusted the cell proportions in the mixed-cell samples. Then, with FDR < 5%, DEGs between the mixed-cell disease samples and mixed-cell normal samples were detected using the Student’s t-test, the reference-free [13], the reference-dependent [7] and Ref-REO methods, respectively. The reference-free and reference-dependent methods developed by Houseman et al. were performed with the default parameter settings. For the reference-dependent method, the top 100 genes with the most varied expression levels between the simulated gene expression profiles of myeloid and lymphoid cells were used as the cell signatures to estimate the cell proportions. As shown in Table 2, when there is no cell proportion change under the disease state, all four methods performed well, with high precision and recall (above 95%) for DEG detection. However, when the proportion of myeloid cells increased only slightly from 50% to 55%, the recalls of the Student’s t-test and the reference-free method remained high, but their precisions decreased sharply to below 50%. When the proportion of myeloid cells increased to 80%, both methods detected >6000 DEGs, but their precisions were only 14.72% and 14.71%, respectively. Although the reference-dependent method performed better than the Student’s t-test and reference-free methods, its precision also decreased sharply to 67.48% when the proportion of myeloid cells increased slightly from 50% to 55%. In contrast, no matter how much the proportions of myeloid and lymphoid cells changed, the precision of the Ref-REO method remained above 99%, while its recall declined from 95.46% to 75.77% as the proportion of myeloid cells increased from 50% to 80%.

Table 2

Performance comparison of Ref-REO and the other three methods

Proportion of myeloid cells	Student’s t-test	Reference-free	Reference dependent	Ref-REO
0.50	966 (100%, 99.59%)^a	966 (100%, 99.59%)	966 (99.90%, 99.48%)	927 (99.89%, 95.46%)
0.55	2167 (44.62%, 99.69%)	2249 (43.00%, 99.69%)	1230 (67.48%, 85.57%)	917 (99.78%, 94.33%)
0.60	4307 (22.48%, 99.90%)	4382 (22.11%, 99.90%)	665 (54.74%, 37.53%)	918 (99.56%, 94.23%)
0.65	5330 (18.14%, 99.69%)	5305 (18.23%, 99.69%)	403 (40.94%, 17.01%)	897 (99.67%, 92.16%)
0.70	5874 (16.43%, 99.48%)	5897 (16.36%, 99.48%)	267 (34.83%, 9.59%)	876 (99.89%, 90.21%)
0.75	6157 (15.46%, 98.04%)	6174 (15.40%, 98.04%)	188 (27.66%, 5.36%)	819 (99.88%, 84.33%)
0.80	6324 (14.72%, 95.98%)	6314 (14.71%, 95.98%)	128 (25.78%, 3.40%)	735 (100%, 75.77%)

^aThe number outside the brackets is the number of detected DEGs, and the percentages inside the brackets are the precision and recall, respectively.
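As a worked example of how the entries in Table 2 relate to each other, the short Python sketch below recomputes precision and recall from a count of detected genes. The number of true positives is not reported in the text; the value of 926 used here is inferred because it reproduces the Ref-REO percentages reported at a myeloid proportion of 0.50, and the function name is hypothetical.

```python
TRUE_DEGS = 970  # 500 myeloid + 500 lymphoid DEGs, minus the 30 shared genes

def precision_recall(detected, true_positives, total_true=TRUE_DEGS):
    """Precision = TP / detected genes; recall = TP / all true DEGs."""
    return true_positives / detected, true_positives / total_true

# Ref-REO at a myeloid proportion of 0.50: 927 detected genes (Table 2).
# 926 true positives is an inferred value, not a number reported in the text.
precision, recall = precision_recall(detected=927, true_positives=926)
print(f"{precision:.2%}, {recall:.2%}")  # 99.89%, 95.46%
```

The same arithmetic shows why the Student’s t-test keeps a high recall while losing precision: the number of true positives stays close to 970, but the number of detected genes in the denominator of the precision grows to several thousand.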


The results suggest that the Ref-REO method is able to detect cellular alterations in mixed-cell samples effectively while the DEGs introduced by the cell proportion changes can be excluded.

SLE associated cellular alterations in PBMCs

Using the purified leukocyte subtype profiles examined in Set 1 and Set 2, 8 037 062 gene pairs with stable REOs in >95% of the samples were identified and used as the reference pairs in the following analysis. Among them, 7 035 262 gene pairs were also stable in >95% of the normal control PBMC profiles examined in Set 3, which was unlikely to happen by chance (binomial distribution test, P-value = 2.2 × 10⁻¹⁶), and they were used as the filtered reference pairs for analyzing PBMC samples. Among these filtered reference pairs, 59 624 showed significantly reversed REOs in the SLE PBMC profiles examined in Set 3 (Fisher’s exact test, FDR < 5%). Using the filtered reference pairs and reversed pairs, 1301 DEGs were detected by Ref-REO with FDR < 5% (Supplementary Table S1).

To validate whether the DEGs detected by Ref-REO were leukocyte specific, we compared them with the DEGs detected for leukocyte subtypes. As gene expression data for SLE leukocyte subtypes were limited, we only analyzed the SLE CD4+ T cells and B cells (Set 4). Using the Student’s t-test with FDR < 5%, 271 and 147 DEGs were detected, respectively, in the two types of cells. Among the 271 DEGs detected in SLE CD4+ T cells, 72 were also detected in SLE PBMCs by Ref-REO, which was unlikely to be observed by chance (hypergeometric distribution test, P-value = 9.76 × 10⁻⁵). Similarly, among the 147 DEGs detected in SLE B cells, 44 DEGs were also detected in SLE PBMCs by Ref-REO, which was unlikely to happen by chance (hypergeometric distribution test, P-value = 1.32 × 10⁻⁴). Notably, although the DEGs detected in both SLE PBMCs and CD4+ T cells/B cells accounted for only approximately 30% of the DEGs detected in SLE CD4+ T cells/B cells, these shared DEGs had greater expression alterations than those missed by Ref-REO in PBMCs. For the 72 shared DEGs detected in CD4+ T cells, the median difference in expression levels between normal and SLE samples was 0.95, significantly greater than the 0.67 calculated from the remaining DEGs (Wilcoxon rank sum test, P-value = 3.01 × 10⁻⁴). Similarly, for the DEGs detected in B cells, the median difference in expression levels of the 44 shared DEGs was 1.21, significantly higher than 0.70 for the remaining DEGs (Wilcoxon rank sum test, P-value = 1.32 × 10⁻⁴). This result indicates that the Ref-REO method tends to detect the cellular expression alterations with larger magnitude.

Furthermore, we performed functional enrichment analysis for the 1301 SLE PBMC cellular DEGs. They were mainly enriched in the interferon and mRNA-editing-associated functional modules (Table 3). The interferon-associated genes have been reported to be significantly associated with SLE development and progression [23], and aberrant expression of these genes has been found in CD19+ B cells, CD3+CD4+ T cells and CD33+ myeloid cells [24]. Aberrant mRNA editing is also thought to be involved in the pathogenesis of SLE. For example, increased FAS mRNA editing mutation in T cells could impair the FAS/FASL system, which plays a central role in maintaining peripheral immune tolerance [25], and a 3ʹ-untranslated region splice variant with reduced mRNA stability has been reported to be expressed in increased amounts in SLE T cells [26]. The application to SLE PBMC data further supports that Ref-REO is able to detect disease-associated cellular alterations using mixed-cell blood samples.

Table 3

Functional modules enriched by the DEGs detected from SLE PBMC

Functional module	Gene number^a	Hits	P-value
Antiviral mechanism by interferon-stimulated genes	50	18	6.64 × 10⁻⁴
mRNA splicing	77	25	4.08 × 10⁻⁴
Interferon alpha/beta signaling	38	17	4.03 × 10⁻⁵
Interferon signaling	99	31	1.80 × 10⁻⁴
Processing of capped intron-containing pre-mRNA	100	30	5.23 × 10⁻⁴

^aThe number of genes involved in the functional module.

Lung adenocarcinoma-associated cellular alterations in PWB

We applied the Ref-REO method to the lung adenocarcinoma PWB profiles in Set 5 to detect lung adenocarcinoma-associated cellular alterations. The 8 037 062 reference gene pairs extracted from the purified leukocyte subtype profiles were filtered using the normal control PWB profiles in Set 5. After filtering, 6 282 742 gene pairs with stable REOs in >95% of the normal control PWB profiles were retained as filtered reference pairs for PWB. Evaluating these filtered reference pairs in the lung adenocarcinoma PWB profiles, we found that the REOs of 256 gene pairs were significantly reversed. With FDR < 5%, 22 DEGs were detected (Table 4). Some of these DEGs have been reported to be closely associated with lung cancer development. For example, CLU, ITGA2B and SPARC are prognostic factors of overall survival for lung cancer [27–29]; SELP has been considered as a therapy target for lung cancer, as the promoter activities of this gene are increased in lung cancer cells as SF1 mutation [30]. TSTA3 (tissue-specific transplantation antigen P35B), which has been reported to change consistently in lung cancer and blood samples, could be used as a candidate diagnostic biomarker for lung cancer [2]. We also found that 12 of the 22 detected DEGs were significantly differentially expressed in lung tumor tissues (Table 4). The tissue data were collected from GEO (Accession No: GSE19804), which examined 60 lung tumor tissues and adjacent normal controls [31]. Functional enrichment analysis showed that these genes were enriched in the coagulation-associated functional modules, including ‘Hemostasis’ (P-value = 3.16 × 10⁻⁵), ‘Response to elevated platelet cytosolic Ca²⁺’ (P-value = 2.08 × 10⁻⁵) and ‘Platelet activation, signaling and aggregation’ (P-value = 3.99 × 10⁻⁶). A growing number of studies have reported that coagulation disorders are commonly the first sign of malignancy and that they also occur in patients with lung cancer [32]. It has now been established that cancer development leads to an increased risk of thrombosis and, conversely, that excessive activation of blood coagulation profoundly influences cancer progression. Coagulation could facilitate tumor progression through the release of platelet granule contents, inhibition of natural killer cells and recruitment of macrophages [32–34]. These results again show the capability of the Ref-REO method to detect disease-associated cellular alterations using mixed-cell samples.
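The filtering and reversal steps described for the PWB data can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary-based profile layout and the toy expression values are hypothetical, and the statistical test used to call a pair "significantly reversed" is not reproduced here.

```python
def reo_holds(profile, pair):
    """True if the first gene of the pair is expressed above the second."""
    higher, lower = pair
    return profile[higher] > profile[lower]


def filter_stable_pairs(reference_pairs, normal_profiles, min_fraction=0.95):
    """Keep reference pairs whose ordering holds in more than
    min_fraction of the normal control profiles."""
    kept = []
    for pair in reference_pairs:
        agree = sum(reo_holds(profile, pair) for profile in normal_profiles)
        if agree / len(normal_profiles) > min_fraction:
            kept.append(pair)
    return kept


def reversal_fraction(pair, disease_profiles):
    """Fraction of disease profiles in which the reference ordering
    does not hold (ties are counted as not holding)."""
    flipped = sum(not reo_holds(profile, pair) for profile in disease_profiles)
    return flipped / len(disease_profiles)


# Toy data (hypothetical values): each pair (x, y) means x > y in the reference.
reference_pairs = [("A", "B"), ("A", "C")]
normal_profiles = [
    {"A": 5, "B": 3, "C": 4},
    {"A": 6, "B": 2, "C": 7},
    {"A": 5, "B": 4, "C": 6},
    {"A": 7, "B": 3, "C": 5},
]
disease_profiles = [
    {"A": 2, "B": 3, "C": 4},
    {"A": 1, "B": 4, "C": 5},
    {"A": 6, "B": 3, "C": 4},
    {"A": 2, "B": 5, "C": 3},
]

stable_pairs = filter_stable_pairs(reference_pairs, normal_profiles)
fractions = {pair: reversal_fraction(pair, disease_profiles) for pair in stable_pairs}
```

In the toy data only the pair (A, B) passes the stability filter, and its ordering is reversed in three of the four disease profiles.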

Table 4

DEGs detected from lung adenocarcinoma PWB and their differential expression status in lung cancer tissue

Gene ID	Symbol	P-value*	Gene ID	Symbol	P-value*
Up-regulated in lung adenocarcinoma PWB
DEGs in lung cancer tissue			Non-DEGs in lung cancer tissue
1191	CLU	<2.20 × 10⁻¹⁶	6678	SPARC	<2.20 × 10⁻¹⁶
3674	ITGA2B	<2.20 × 10⁻¹⁶	7168	TPM1	<2.20 × 10⁻¹⁶
5577	PRKAR2B	<2.20 × 10⁻¹⁶	6478	SIAH2	2.13 × 10⁻⁹
6403	SELP	<2.20 × 10⁻¹⁶	10857	PGRMC1	3.26 × 10⁻²
6583	SLC22A4	<2.20 × 10⁻¹⁶	267	AMFR	5.54 × 10⁻⁷
7264	TSTA3	<2.20 × 10⁻¹⁶	5742	PTGS1	8.36 × 10⁻⁷
2791	GNG11	1.70 × 10⁻¹⁰	80896	NPL	1.03 × 10⁻²
5594	MAPK1	1.01 × 10⁻⁵
29780	PARVB
Down-regulated in lung adenocarcinoma PWB
DEGs in lung cancer tissue			Non-DEGs in lung cancer tissue
11214	AKAP13	2.09 × 10⁻⁴	57062	DDX24	2.62 × 10⁻⁴
1108	CHD4	2.00 × 10⁻⁴	7536	SF1	2.70 × 10⁻⁴
23215	PRRC2C	4.01 × 10⁻⁷	56987	BBX	1.40 × 10⁻²

*The P-value is calculated by accumulated hypergeometric distribution.

Discussion

One important challenge in the development of blood-borne biomarkers is to distinguish the disease-associated signals from the signals introduced by cell proportion changes [8]. A novel method, Ref-REO, based on the within-sample REOs of genes, was proposed to detect disease-associated cellular alterations using mixed-cell blood samples. The method first specifies the stable within-sample REOs obtained from the purified leukocyte subtypes as the reference, then identifies gene pairs whose REOs are reversed in the mixed-cell blood samples compared with the reference and finally detects DEGs based on the stable and reversed gene pairs. The results suggest that this method can effectively detect disease-associated cellular alterations in the simulated mixed-cell data and in the real SLE PBMC and lung adenocarcinoma PWB data.
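Why a reference REO shared by all purified leukocyte subtypes is insensitive to proportion changes can be shown with a small numerical sketch. It assumes that a mixed-cell measurement is a proportion-weighted average of the cell-type profiles on the linear (non-log) scale; this linear mixing model, the function `mix` and the toy values are illustrative assumptions rather than part of the original method description.

```python
def mix(profiles, proportions):
    """Proportion-weighted average of per-cell-type expression profiles."""
    genes = profiles[0].keys()
    return {
        gene: sum(w * profile[gene] for profile, w in zip(profiles, proportions))
        for gene in genes
    }


# Hypothetical purified profiles: gene A is above gene B in both cell types.
lymphoid = {"A": 8.0, "B": 2.0}
myeloid = {"A": 20.0, "B": 12.0}

# Mixtures with a lymphoid proportion of 0%, 10%, ..., 100%.
mixtures = {
    pct: mix([lymphoid, myeloid], [pct / 100, 1 - pct / 100])
    for pct in range(0, 101, 10)
}

# The absolute level of A depends on the proportions ...
level_a_50 = mixtures[50]["A"]
level_a_20 = mixtures[20]["A"]
# ... but the within-sample ordering A > B holds for every mixture.
reo_preserved = all(m["A"] > m["B"] for m in mixtures.values())
```

In this toy example the level of gene A rises from 14 to about 17.6 when the lymphoid proportion drops from 50% to 20%, although neither cell type has changed. A comparison of absolute levels could therefore flag A, whereas the ordering A > B is unaffected.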

However, some genes with cellular alterations in leukocytes may go undetected. (1) The differential signals of some genes may be masked in the mixed-cell samples when the proportion of the affected cell type is limited. In the simulated data, the true DEGs detected in lymphoid cells decreased from 94% (470/500) to 51% (255/500) when the proportion of lymphoid cells was decreased from 50% to 20%. (2) The differential signals of some genes may be diluted in the mixed-cell samples if they are dysregulated in opposite directions in different cell types under the disease state. (3) Genes with weak expression changes may be missed, as weak expression changes may not result in significant REO changes. This is especially the case for lowly expressed genes, as the REO between a lowly expressed gene and a non-lowly expressed gene is mainly determined by the latter. For gene pairs comprising two lowly expressed genes, REOs are easily influenced by the systematic variations of microarray measurements, and thus, such gene pairs are unlikely to form stable gene pairs in all normal samples. As shown in Figure 4, lowly expressed genes tended not to be detected as DEGs for SLE by Ref-REO. At the other extreme, highly expressed genes were also less likely to be detected as DEGs by Ref-REO, which could be explained by the possibility that highly expressed genes are likely to be involved in functionally conserved pathways [35] and thus have relatively stable expression levels. Nevertheless, the DEGs detected by our method, which tend to have greater changes in cell types, are more reliable as risk factors of diseases. As shown in the ‘Results’ section, the DEGs detected in SLE PBMCs have greater expression alterations in CD4+ T cells and B cells than those missed by our method.
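Limitation (1) can be made concrete with a two-cell-type sketch. It assumes a linear mixing model, in which the mixed-cell level is the proportion-weighted average of the cell-type levels (an assumption of this illustration, not a statement from the method). Under that model, a REO that is reversed only in the altered cell type becomes visible in the mixture only when that cell type exceeds a threshold proportion. The function names and the toy differences below are hypothetical.

```python
def mixture_difference(w_altered, diff_altered, diff_other):
    """Difference (gene A minus gene B) in the mixture, given the
    difference in the altered cell type and in the other cell type."""
    return w_altered * diff_altered + (1.0 - w_altered) * diff_other


def min_proportion_for_reversal(diff_altered, diff_other):
    """Proportion of the altered cell type above which the mixture REO flips.

    Assumes the REO is reversed in the altered cell type (diff_altered < 0)
    and kept in the other cell type (diff_other > 0).
    """
    if not (diff_altered < 0 < diff_other):
        raise ValueError("expected diff_altered < 0 < diff_other")
    return diff_other / (diff_other - diff_altered)


# Toy values: in diseased lymphoid cells A - B = -4 (reversed),
# in unchanged myeloid cells A - B = +2 (reference ordering kept).
threshold = min_proportion_for_reversal(-4.0, 2.0)
visible_at_50 = mixture_difference(0.5, -4.0, 2.0) < 0
visible_at_20 = mixture_difference(0.2, -4.0, 2.0) < 0
```

With these toy values the threshold is one third, so the reversal is visible at a 50% lymphoid proportion but masked at 20%, which qualitatively mirrors the drop in detected DEGs reported for the simulated data.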

Figure 4

The expression level distribution of DEGs detected for SLE. The x-axis represents the gene expression levels in normal control samples, and the y-axis represents the gene expression levels in SLE samples. Gray circles represent all genes, black circles represent the DEGs detected by Student’s t-test and red circles represent the DEGs detected by Ref-REO.

From another perspective, unlike other reference-based methods, the Ref-REO method does not need to estimate or adjust for the proportion of each cell type in the mixed-cell samples [12, 13], nor does it need complex statistical models. The method is easy to use and validate in clinical settings. Furthermore, like k-TSP [36] and RankComp [18], our method is based only on the REOs of genes, which makes it transferable to other studies for discovering diagnostic biomarkers or for individual-level analysis. In conclusion, our method provides a new way to detect disease-associated leukocyte molecular alterations in mixed-cell blood samples.

Key Points

Cell proportion changes in mixed-cell samples may distort the expression signals.
Within-sample relative expression orderings of genes in profiles of purified leukocytes are used to detect the disease-associated cellular alterations in mixed-cell blood samples.
The Ref-REO method is effective, transferable and easy-to-use for the detection of candidate biomarkers from mixed-cell blood samples.

Supplementary Data

Supplementary data are available online at http://bib.oxfordjournals.org/.

Funding

National Natural Science Foundation of China (grant no. 81501215, 81501829, 81372213, 81572935; in part); the Natural Science Foundation of Fujian Province, China (grant no. 2016J01706; in part); and the Joint Fund for Program of Science and Technology Innovation of Fujian Province, China (grant no. 2016Y9102).

Guini Hong, PhD, is a lecturer of Bioinformatics at Fujian Medical University, China. Her research focuses on the functional enrichment analysis of omics data and the biomarker discovery for cancer.

Hongdong Li, PhD, is a lecturer of Bioinformatics at Fujian Medical University, China. His research focuses on investigating the relationship between aging and cancer.

Mengyao Li is a postgraduate student of Bioinformatics at Fujian Medical University, China. Her research focuses on tumor metastasis.

Weicheng Zheng is a lecturer of Bioinformatics at Fujian Medical University, China. His research focuses on constructing databases of omics data and subtype identification for cancers.

Jing Li, PhD, is a lecturer of Bioinformatics at Fujian Medical University, China. Her research focuses on small RNAs and cancer-related pathway analysis.

Meirong Chi is a postgraduate student in Bioinformatics at Fujian Medical University, China. Her research focuses on investigating the relationship between inflammation and cancer.

Jun Cheng is a postgraduate student of Bioinformatics at Fujian Medical University, China. His research focuses on investigating the influence of the proportion of tumor cells on cancer biomarker detection.

Zheng Guo, PhD, is a professor of Bioinformatics at Fujian Medical University. His research interests include investigating complex diseases at the functional module level and translational medicine.

References

1. Tudoran O, Virtic O, Balacescu L, et al. Differential peripheral blood gene expression profile based on Her2 expression on primary tumors of breast cancer patients. PLoS One 2014;9:e102764.

2. Rotunno M, Hu N, Su H, et al. A gene expression signature from peripheral whole blood for stage I lung adenocarcinoma. Cancer Prev Res (Phila) 2011;4:1599–608.

3. Hausler SF, Keller A, Chandran PA, et al. Whole blood-derived miRNA profiles as potential new tools for ovarian cancer screening. Br J Cancer 2010;103:693–700.

4. Reinius LE, Acevedo N, Joerink M, et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS One 2012;7:e41361.

5. Li H, Guo Z, Guo Y, et al. Common DNA methylation alterations of Alzheimer's disease and aging in peripheral whole blood. Oncotarget 2016;7:19089–98.

6. Cho H, Hur HW, Kim SW, et al. Pre-treatment neutrophil to lymphocyte ratio is elevated in epithelial ovarian cancer and predicts survival after treatment. Cancer Immunol Immunother 2009;58:15–23.

7. Houseman EA, Accomando WP, Koestler DC, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 2012;13:86.

8. LaBreche HG, Nevins JR, Huang E. Integrating factor analysis and a transgenic mouse model to reveal a peripheral blood predictor of breast tumors. BMC Med Genomics 2011;4:61.

9. Hong G, Chen B, Li H, et al. Similar source of differential blood mRNAs in lung cancer and pulmonary inflammatory diseases: calls for improved strategy for identifying cancer-specific biomarkers. PLoS One 2014;9:e108104.

10. Li H, Zheng T, Chen B, et al. Similar blood-borne DNA methylation alterations in cancer and inflammatory diseases determined by subpopulation shifts in peripheral leukocytes. Br J Cancer 2014;111:525–31.

11. Liu Y, Aryee MJ, Padyukov L, et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotechnol 2013;31:142–7.

12. Chakraborty S, Datta S, Datta S. Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies. Bioinformatics 2012;28:799–806.

13. Houseman EA, Molitor J, Marsit CJ. Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics 2014;30:1431–9.

14. Abbas AR, Wolslegel K, Seshasayee D, et al. Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus. PLoS One 2009;4:e6098.

15. Langevin SM, Houseman EA, Accomando WP, et al. Leukocyte-adjusted epigenome-wide association studies of blood from solid tumor patients. Epigenetics 2014;9:884–95.

16. Leek JT, Scharpf RB, Bravo HC, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 2010;11:733–9.

17. Qi L, Chen L, Li Y, et al. Critical limitations of prognostic signatures based on risk scores summarized from gene expression levels: a case study for resected stage I non-small-cell lung cancer. Brief Bioinform 2016;17:233–42.

18. Wang H, Sun Q, Zhao W, et al. Individual-level analysis of differential expression of genes and pathways for personalized medicine. Bioinformatics 2015;31:62–8.

19. Allantaz F, Cheng DT, Bergauer T, et al. Expression profiling of human immune cell subsets identifies miRNA-mRNA regulatory relationships correlated with cell type specific expression. PLoS One 2012;7:e29979.

20. Kennedy WP, Maciuca R, Wolslegel K, et al. Association of the interferon signature metric with serological disease manifestations but not global activity scores in multiple cohorts of patients with SLE. Lupus Sci Med 2015;2:e000080.

21. Yuan J. [Pathogenesis of Alzheimer's disease]. Zhonghua Yi Xue Za Zhi 1990;70:429–30, 430.

22. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B 1995;57:289–300.

23. Niewold TB, Clark DN, Salloum R, et al. Interferon alpha in systemic lupus erythematosus. J Biomed Biotechnol 2010;2010:948364.

24. Becker AM, Dao KH, Han BK, et al. SLE peripheral blood B cell, T cell and myeloid cell transcriptomes display unique profiles and each subset contributes to the interferon signature. PLoS One 2013;8:e67003.

25. Wu J, Xie F, Qian K, et al. FAS mRNA editing in Human Systemic Lupus Erythematosus. Hum Mutat 2011;32:1268–77.

26. Chowdhury B, Tsokos CG, Krishnan S, et al. Decreased stability and translation of T cell receptor zeta mRNA with an alternatively spliced 3'-untranslated region contribute to zeta chain down-regulation in patients with systemic lupus erythematosus. J Biol Chem 2005;280:18959–66.

27. Panico F, Rizzi F, Fabbri LM, et al. Clusterin (CLU) and lung cancer. Adv Cancer Res 2009;105:63–76.

28. Ramirez NE, Zhang Z, Madamanchi A, et al. The alpha(2)beta(1) integrin is a metastasis suppressor in mouse models and human cancer. J Clin Invest 2011;121:226–37.

29. Huang Y, Zhang J, Zhao YY, et al. SPARC expression and prognostic value in non-small cell lung cancer. Chin J Cancer 2012;31:541–8.

30. Xu R, Guo LJ, Xin J, et al. Luciferase assay to screen tumour-specific promoters in lung cancer. Asian Pac J Cancer Prev 2014;14:6557–62.

31. Lu TP, Tsai MH, Lee JM, et al. Identification of a novel biomarker, SEMA5A, for non-small cell lung carcinoma in nonsmoking women. Cancer Epidemiol Biomarkers Prev 2010;19:2590–7.

32. Voulgaris E, Pentheroudakis G, Vassou A, et al. Disseminated intravascular coagulation (DIC) and non-small cell lung cancer (NSCLC): report of a case and review of the literature. Lung Cancer 2009;64:247–9.

33. Timp JF, Braekkan SK, Versteeg HH, et al. Epidemiology of cancer-associated venous thrombosis. Blood 2013;122:1712–23.

34. Khorana AA, Francis CW, Culakova E, et al. Frequency, risk factors, and trends for venous thromboembolism among hospitalized cancer patients. Cancer 2007;110:2339–46.

35. Huang H, Li X, Guo Y, et al. Identifying reproducible cancer-associated highly expressed genes with important functional significances using multiple datasets. Sci Rep 2016;6:36227.

36. Geman D, d'Avignon C, Naiman DQ, et al. Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol 2004;3:article19.

Author notes

Guini Hong and Hongdong Li contributed equally to this work.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
