
    SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data

    2019-07-12 06:35:22 Xianwen Ren, Liangtao Zheng, Zemin Zhang
    Genomics, Proteomics & Bioinformatics, 2019, Issue 2

    Xianwen Ren*, Liangtao Zheng, Zemin Zhang*

    BIOPIC, Beijing Advanced Innovation Center for Genomics, and School of Life Sciences, Peking University, Beijing 100871, China

    KEYWORDS Single cell; RNA-seq; Clustering; Subsampling; Classification

    Abstract Clustering is a prevalent analytical means to analyze single cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are of pressing need. Here we propose Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while consuming only 66% of the memory, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.

    Introduction

    Single cell RNA sequencing (scRNA-seq) has revolutionized transcriptomic studies by revealing the heterogeneity of individual cells with high resolution [1-6]. Clustering has become a routine analytical means to identify cell types, depict their functional states, and infer potential cellular dynamics [4-10]. Multiple clustering algorithms have been developed, including Seurat [11], SC3 [12], SIMLR [13], ZIFA [14], CIDR [15], SNN-Cliq [16], and Corr [17]. These algorithms greatly improve the clustering accuracy of scRNA-seq data but often have high computational complexity, impeding their extension to large-scale scRNA-seq datasets. With the rapid development of scRNA-seq technologies, the throughput has increased from initially hundreds of cells to tens of thousands of cells in one run nowadays [18]. Integrative analyses of scRNA-seq datasets from multiple runs or even across multiple studies further exacerbate the computational difficulties. Thus, algorithms that can cluster single cells both efficiently and accurately are needed.

    To handle multiple large-scale scRNA-seq datasets, ad hoc computational strategies have been proposed by downsampling or convoluting large datasets to small ones [12,19-21] or by accelerating the computation with new software implementation [22]. Such strategies have reached variable levels of success but have not adequately addressed the challenges. Considering the importance of efficient and accurate clustering tools for analyses of large-scale scRNA-seq data, here we propose a new computational framework, the Spearman subsampling-clustering-classification (SSCC), based on machine learning techniques, including feature engineering and random projection, to achieve both improved clustering accuracy and efficiency. Benchmarking on various scRNA-seq datasets demonstrates that, compared to the current solutions, SSCC can reduce the computational complexity from O(n²) to O(n) while maintaining high clustering accuracy. Moreover, the flexibility of the new computational framework allows our methods to be further extended and adapted to a wide range of applications for scRNA-seq data analysis.

    Method

    Framework overview

    Among the available solutions to handle large scRNA-seq datasets, clustering with subsampling and classification [12,19] has linear complexity, i.e., O(n). Such a framework generally consists of four steps (Figure 1A). (1) A gene expression matrix is constructed by data preprocessing techniques including gene and cell filtration and normalization; (2) cells are divided into two subsets for clustering and classification separately by subsampling; (3) the subsetted cells for clustering are grouped into clusters using k-means [23], hierarchical clustering [24], density clustering [25], or algorithms developed specially for scRNA-seq; and (4) supervised algorithms such as k-nearest neighbors [26], support vector machines (SVMs) [27], or random forests [28] are used to predict the labels of other cells based on the clustering results at the third step. For simplicity, we refer to this existing framework as subsampling-clustering-classification (SCC). Because clustering is time-consuming and memory-exhaustive, limiting this step to a small subset of cells through subsampling can greatly reduce the computational cost from O(n²) to O(n) by leveraging the efficiency of supervised machine learning algorithms. However, classifiers built on the original gene expression data of a small subset of cells may be flawed and biased due to noise in the raw data and the small number of cells, thus impairing the accuracy of label assignment for the total cells.
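    The four SCC steps can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by SC3 or any other package: the function names (`kmeans`, `knn_predict`, `scc`), the farthest-point initialisation, and the default of three neighbours are assumptions made for this example, and step (1) is assumed to have already produced `cells` as a list of expression vectors.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd k-means; farthest-point initialisation is an assumption."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knn_predict(train, train_labels, query, k=3):
    """Majority vote among the k nearest clustered cells."""
    nearest = sorted(range(len(train)), key=lambda i: sq_dist(train[i], query))[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)


def scc(cells, n_clusters, subsample_rate, seed=0):
    """Step 2: subsample; step 3: cluster the subsample; step 4: classify the rest."""
    rng = random.Random(seed)
    n_sub = max(n_clusters, int(round(subsample_rate * len(cells))))
    sub_idx = rng.sample(range(len(cells)), n_sub)
    sub_cells = [cells[i] for i in sub_idx]
    sub_labels = kmeans(sub_cells, n_clusters)
    labels = [None] * len(cells)
    for i, lab in zip(sub_idx, sub_labels):
        labels[i] = lab
    for i, cell in enumerate(cells):
        if labels[i] is None:
            labels[i] = knn_predict(sub_cells, sub_labels, cell)
    return labels
```

    Only the subsample passes through the clustering step. Each remaining cell is compared against the fixed-size subsample, which is why the overall cost grows linearly with the number of cells.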

    Here we propose a new computational framework for clustering large scRNA-seq data by adding a feature engineering/projecting step into SCC (Figure 1B). Similar to SCC, a gene expression matrix is first constructed through gene and cell filtration and normalization (Step 1; Figure 1B), and is then split randomly into two subsets for clustering and classification separately (Step 2; Figure 1B). Unlike SCC, which directly uses the raw data of gene expression, our new framework projects cells into a feature space (Step 3; Figure 1B) for clustering (Step 4; Figure 1B) and classification (Step 5; Figure 1B). As the new framework is characterized by a subsampling-featuring-clustering-classification strategy, we named it SFCC. Specifically, we divide feature construction into two steps: (1) feature extraction techniques are applied to cells subject to clustering; and (2) according to the selected feature extraction method, cells for classification are then projected into the built feature space. Many established techniques in the machine learning field can be exploited in these two steps. For example, principal component analysis (PCA) [29] can be used to first construct features for cells undergoing clustering, while the resultant loading vectors can be used as linear transformations to project cells for classification into the feature space. Selecting different algorithms in each step of the SFCC framework would then form different pipelines for clustering large-scale scRNA-seq datasets. To reduce the total number of algorithmic combinations, here we focus on comparing the performance of various feature engineering algorithms. We hold the algorithms for gene and cell filtration, normalization, subsampling, and classification fixed to those frequently used in practice. The existing SCC strategy can be treated as a special case of SFCC in which the original data space is the feature space.

    Feature engineering techniques involved in this study include distance-based methods (Euclidean and cosine), correlation-based methods (Pearson [30] and Spearman [31] correlations), and a neural network-based method (autoencoder) [32]. For distance- and correlation-based methods, the distance/correlation matrix for cells subject to clustering is directly used as their features, and the distance/correlation matrix between cells subject to classification and cells subject to clustering is used to construct features for cells undergoing classification. For the autoencoder, the gene expression data of cells for clustering are used to train a neural network model first, and then all cells are projected into a feature space through the encoding function of the trained model. To obtain evaluation results independent of clustering algorithms, we use silhouette values [33] to examine the global performance of these feature engineering methods. Upon the global evaluation, we then select the most effective method, SSCC (SFCC with Spearman correlation as the feature construction method), for further evaluation.
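    As an illustration of the correlation-based feature construction, the sketch below computes Spearman correlation features in plain Python. Each cell is represented by its correlations with the subsampled (reference) cells, so the same function yields features both for the cells subject to clustering (correlated with themselves) and for the cells subject to classification. The function names and the choice to return 0.0 for a constant expression profile are assumptions of this example, not details taken from the sscClust package.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant vector (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman_features(cells, reference_cells):
    """One feature row per cell: Spearman correlation with every reference cell."""
    ref_ranks = [average_ranks(c) for c in reference_cells]
    return [[pearson(average_ranks(c), r) for r in ref_ranks] for c in cells]
```

    Calling `spearman_features(sub_cells, sub_cells)` would give the features used for clustering, and `spearman_features(other_cells, sub_cells)` the features used for classification. Both live in the same space, with one dimension per subsampled cell.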

    scRNA-seq datasets used in this study

    We used seven scRNA-seq datasets to evaluate the clustering performance in feature space. These include the Kolodziejczyk dataset [34], Pollen dataset [8], Usoskin dataset [9], Zeisel dataset [10], Zheng dataset [5], PBMC 68k dataset [18], and Macosko dataset [19]. Detailed descriptions of these datasets are listed below.

    The Kolodziejczyk dataset [34] contains 704 cells with three clusters, which were obtained from mouse embryonic stem cells under different culture conditions. About 10,000 genes were profiled with high sequencing depth (average 9,000,000 reads per cell, >80% of reads mapped to the Mus musculus genome GRCm38 with >60% to exons) using the Fluidigm C1 system and applying the SMARTer Kit to obtain cDNA and the Nextera XT Kit for Illumina library preparation.

    The Pollen dataset [8] contains 249 cells with 11 clusters, which were obtained from skin cells, pluripotent stem cells, blood cells, neural cells, etc. Either low or high sequencing depth based on the C1 Single-Cell Auto Prep Integrated Fluidic Circuit, the SMARTer Ultra Low RNA Kit, and the Nextera XT DNA Sample Preparation Kit was used to depict the gene expression profiles of individual cells (~50,000 reads per cell).

    Figure 1 Two computational frameworks for rapid clustering of large-scale scRNA-seq datasets. A. The original computational framework proposed in SC3 (referred to as SCC) consists of four main steps: (1) constructing the gene expression matrix; (2) dividing the matrix into two parts through cell subsampling; (3) clustering the subsampled cells; and (4) classifying the unsampled cells into clusters. B. The new computational framework proposed in this study (referred to as SFCC). A feature construction step is added before clustering and classification. The whole framework comprises five steps: (1) constructing the gene expression matrix; (2) dividing the matrix into two parts through cell subsampling; (3) projecting the subsampled/unsampled cells into feature space; (4) clustering the subsampled cells in the feature space; (5) classifying the unsampled cells into clusters in the feature space. scRNA-seq, single cell RNA-sequencing; SC3, single-cell consensus clustering; SCC, subsampling-clustering-classification; SFCC, subsampling-featuring-clustering-classification.

    The Usoskin dataset [9] contains 622 mouse neuronal cells with four clusters, i.e., peptidergic nociceptor-containing, non-peptidergic nociceptor-containing, neurofilament-containing, and tyrosine hydroxylase-containing cells. The neuronal cells were picked with a robotic cell-picking setup and positioned in wells of 96-well plates before RNA-seq (1,140,000 reads and 3574 genes per cell).

    The Zeisel dataset [10] contains 3005 cells from the mouse brain with nine major subtypes. The gene expression levels were estimated by counting the number of unique molecular identifiers (UMIs) obtained by Drop-seq.

    The Zheng dataset [5] contains 5063 T cells from five patients with hepatocellular carcinoma. Nine subtypes of samples were prepared according to the tissue types and cell types, and then subjected to Smart-seq2 for gene expression profiling (~1,290,000 uniquely mapped read pairs per cell).

    The PBMC 68k dataset [18] contains 68,578 peripheral blood mononuclear cells (PBMCs) of a healthy human subject. This cell population includes eleven major immune cell types. Gene expression was profiled using the 10× Genomics GemCode platform, and 3′ UMI counts were used to quantify gene expression levels with their customized computational pipeline.

    The Macosko dataset [19] contains 49,300 mouse retina cells without known distinct clusters. The gene expression levels were estimated by counting the number of UMIs obtained by Drop-seq. Cells were further clustered into 39 subtypes by the authors based on the Seurat algorithm.

    Data preprocessing

    The first four datasets (i.e., the Kolodziejczyk, Pollen, Usoskin, and Zeisel datasets) have been widely used for evaluating clustering algorithms, and their preprocessed data are included in the SIMLR software package for test use (https://github.com/BatzoglouLabSU/SIMLR). We downloaded these four datasets from the Matlab subdirectory of the SIMLR package, and then selected the top 5000 most informative genes (with both the average and the standard deviation of log2-transformed expression values >1) for subsequent analysis. If the number of genes in a dataset was smaller than 5000, then all the genes in the dataset were retained for further analysis. For the Zheng dataset, one patient (P0508) was selected for comparison of different clustering algorithms, which had 1020 T cells with eight subtypes defined by the tissue sources and the cell surface markers. Genes with both the average and the standard deviation of log2-transformed expression values >1 were retained, and then the transcripts per million (TPM) values were used for clustering evaluation. For the PBMC 68k dataset, the preprocessing pipeline described in the original report [18] was used to prepare data for clustering (https://github.com/10XGenomics/single-cell-3prime-paper). For the Macosko dataset, the UMI counts were used for evaluation without gene filter.
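    The gene filter described above (average and standard deviation of log2-transformed expression both >1, capped at 5000 genes) might look like the following sketch. Several details are assumptions made only for this illustration because the text does not specify them: the pseudo-count of 1 before taking log2, the use of the sample standard deviation, ranking genes by standard deviation when more than `max_genes` pass, and the function name `select_informative_genes`.

```python
import math
import statistics


def select_informative_genes(expr, max_genes=5000, threshold=1.0):
    """expr maps gene name -> expression values across cells (at least two cells).

    Keeps genes whose log2-transformed mean and standard deviation both
    exceed `threshold`, then returns at most `max_genes` of them.
    """
    kept = []
    for gene, values in expr.items():
        logged = [math.log2(v + 1.0) for v in values]  # pseudo-count is an assumption
        mean = statistics.mean(logged)
        sd = statistics.stdev(logged)  # sample standard deviation (assumption)
        if mean > threshold and sd > threshold:
            kept.append((sd, gene))
    kept.sort(reverse=True)  # most variable genes first (assumed ranking)
    return [gene for _, gene in kept[:max_genes]]
```

    A gene with constant expression has zero standard deviation and is dropped. So is a gene that is expressed at very low levels in every cell, because its mean falls below the threshold.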

    Consistency between true labels and the original as well as the projected data

    The silhouette value [33] is used to measure the consistency between the true labels and the original as well as the projected data. Given a dataset with n samples and a clustering scheme, a silhouette value is calculated for each sample. For a sample i, its silhouette value $s_i$ is calculated according to the following formula:

    \[ s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \]

    where $a_i$ is the average dissimilarity of sample i to samples in its own cluster and $b_i$ is the lowest average dissimilarity of sample i to any other cluster of which sample i is not a member. The values of $s_i$ range from -1 to 1. A value close to 1 means that sample i is well matched to its cluster, whereas a value close to -1 means that sample i would be more appropriately classified into its neighboring cluster. For each feature construction method, the median silhouette value of all the cells after projection was used to evaluate its consistency with the true cluster labels. The fraction of cells that have silhouette values increased after projection compared to the original data (i.e., the fraction of cells above the diagonal in Figure 2) was also used to evaluate the feature construction methods.
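    For readers who want to reproduce per-cell silhouette values, the following sketch applies the standard definition, s_i = (b_i - a_i) / max(a_i, b_i), with Euclidean distance as the dissimilarity. The distance choice, the convention of assigning 0 to a cell that is alone in its cluster, and the function name are assumptions of this example. At least two clusters are required.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_values(points, labels):
    """Per-sample silhouette values for a labelling with at least two clusters."""
    n = len(points)
    result = []
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            d = euclidean(points[i], points[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if counts.get(own, 0) == 0:
            result.append(0.0)  # singleton cluster: s_i = 0 by convention
            continue
        a = sums[own] / counts[own]  # mean dissimilarity to own cluster
        b = min(sums[c] / counts[c] for c in sums if c != own)  # nearest other cluster
        denom = max(a, b)
        result.append((b - a) / denom if denom > 0 else 0.0)
    return result
```

    With reference labels in place of clustering output, a high value indicates that a cell's features agree with its annotated type, which is how silhouette values are used in this study.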

    Clustering accuracy/consistency evaluation

    Normalized mutual information (NMI) [35] was used to evaluate the accuracy of various clustering results. Given two clustering schemes $A=\{A_1,\dots,A_R\}$ and $B=\{B_1,\dots,B_S\}$, the overlap between A and B can be represented through the contingency table C (also named the confusion matrix) of size R×S, where $C_{ij}$ denotes the number of cells that are shared by clusters $A_i$ and $B_j$. Then the normalized mutual information NMI(A,B) of the two clustering schemes A and B is defined as follows:

    \[ \mathrm{NMI}(A,B) = \frac{-2\sum_{i=1}^{R}\sum_{j=1}^{S} C_{ij}\log\left(\frac{C_{ij}\,n}{C_{i-}\,C_{-j}}\right)}{\sum_{i=1}^{R} C_{i-}\log\left(\frac{C_{i-}}{n}\right)+\sum_{j=1}^{S} C_{-j}\log\left(\frac{C_{-j}}{n}\right)} \]

    where n is the total number of cells, $C_{i-}$ is the number of cells assigned to cluster i in the clustering scheme A, and $C_{-j}$ is the number of cells assigned to cluster j in the clustering scheme B. If A is identical to B, NMI(A,B)=1. If A and B are completely different, NMI(A,B)=0. When true cluster labels were available, the NMI values between true cluster labels and various clustering results were used to evaluate the clustering accuracy. When true cluster labels were not available, NMI was used to evaluate clustering consistency between different subsampling rates in this study. Besides NMI, we also used the Rand index and adjusted Rand index to evaluate clustering accuracy and consistency, and obtained similar observations.
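    A contingency-table NMI can be computed directly from two label vectors. The sketch below uses the common form in which twice the mutual information is divided by the sum of the two entropies, written with the cell counts $C_{ij}$, $C_{i-}$, and $C_{-j}$. Whether this matches the exact variant used in the original analysis is an assumption, as is returning 1.0 when both schemes consist of a single cluster.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same cells."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # C_ij: cells shared by A_i and B_j
    row = Counter(labels_a)                   # C_i-: size of cluster i in A
    col = Counter(labels_b)                   # C_-j: size of cluster j in B
    numerator = -2.0 * sum(
        c * math.log(c * n / (row[a] * col[b])) for (a, b), c in joint.items()
    )
    denominator = sum(c * math.log(c / n) for c in row.values()) + sum(
        c * math.log(c / n) for c in col.values()
    )
    if denominator == 0:
        return 1.0  # both schemes are a single cluster (assumed convention)
    return numerator / denominator
```

    The score depends only on how cells are grouped, not on the label names, so relabelling the clusters of either scheme leaves it unchanged.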

    Clustering and classification algorithms

    Many clustering algorithms are available. We selected five widely used clustering algorithms in this study to evaluate the impact of the Spearman correlation-based feature construction method. These five algorithms include three general clustering algorithms that were not initially designed for scRNA-seq data, i.e., affinity propagation (AP) [36], k-means [23], and k-medoids [37], and two algorithms that were specially designed for clustering of scRNA-seq data, i.e., SC3 [12] and SIMLR [13]. k-means and k-medoids are pure clustering algorithms that partition samples into groups, while AP, SC3, and SIMLR inherently include feature construction techniques. All these clustering algorithms were evaluated on five small-scale datasets (the Kolodziejczyk, Pollen, Usoskin, Zeisel, and Zheng datasets), while only SC3 was evaluated on the PBMC 68k dataset and only k-means was evaluated on the Macosko dataset for simplicity. Parameters (ks=10:12, gene_filter=FALSE, biology=FALSE, svm_max=5000) were used for SC3 (default), whereas parameters (ks=11, gene_filter=FALSE, biology=FALSE, svm_max=200) were used for SC3+SSCC. On the Macosko dataset, ~5% and 10% of cells were randomly picked for clustering analyses. We used the k-nearest neighbor algorithm, which is robust to parameter selection, for classifying unsubsampled cells.

    Results

    Feature construction can greatly improve the consistency of cell features and the reference cell labels

    First, we evaluated whether feature extraction methods can improve clustering results of scRNA-seq data. We calculated the silhouette values to evaluate the consistency between cell features extracted using various methods and the reference labels. Silhouette values are frequently used to indicate whether a sample is properly clustered. Here, however, we use silhouette values in reverse, to indicate whether the extracted features are consistent with the reference cell labels. By comparing with silhouette values of the original scRNA-seq data, we observed that most of the evaluated feature extraction methods can improve the silhouette values for many cells in multiple datasets (Figure 2). For the Kolodziejczyk [34] and Pollen [8] datasets, all five feature extraction methods improved the silhouette values compared with the original data. For the Usoskin [9] dataset, all methods showed significantly better performance except Euclidean and cosine. For the Zeisel [10] dataset, only Spearman correlation resulted in improvement for >80% of cells compared with the original data, while other feature extraction methods except Euclidean resulted in little improvement. Euclidean resulted in even worse results for the Zeisel dataset, indicating low robustness. For the Zheng [5] dataset, most methods failed except the Spearman correlation method. The Spearman correlation-based feature extraction method consistently improved the accordance between cell features and labels on all five datasets. Considering the robustness of the Spearman correlation-based method and the great improvement of silhouette values of single cells, we evaluated the accuracy, robustness, and efficiency of SSCC in the next section.

    Figure 2 Consistency with true cluster labels between engineered features and the original data of five datasets. In each plot, each dot represents a cell. Silhouette values calculated using true cluster labels and the original data are shown on the X axis, whereas silhouette values calculated using true cluster labels and the engineered features are shown on the Y axis. A silhouette value of 1 represents a perfect match between labels and features, whereas a silhouette value of -1 indicates that the cell might be mis-clustered. The percentage in the plotting area of each plot indicates the fraction of cells above the diagonal. The five datasets tested are the Kolodziejczyk dataset [34], Pollen dataset [8], Usoskin dataset [9], Zeisel dataset [10], and Zheng dataset [5].

    Clustering accuracy of the total cells is enhanced in feature space when subsampling is applied

    While subsampling can greatly boost the efficiency of clustering of large scRNA-seq data, it often compromises the clustering accuracy. We observed that the improvements of silhouette scores by SSCC were robust to subsampling fluctuations (Figure 3). For all five datasets evaluated, the silhouette values of Spearman correlation-based features were almost unchanged across subsampling rates (Figure 3). These data suggest that features constructed using SSCC at low subsampling rates may contain information close to that obtained with the total cell population.

    Figure 3 Silhouette values between Spearman correlation features and true cluster labels are independent of subsampling rates in five datasets. Spearman correlation features were constructed at various subsampling rates of the original data in the five datasets. In each plot, each dot represents a cell. Silhouette values of Spearman correlation features constructed with 100% of cells are shown on the X axis, whereas silhouette values of Spearman correlation features constructed with 10%, 20%, 30%, 40%, and 50% of cells in each dataset are shown on the Y axis. Pearson correlation between X and Y axes was calculated, where the correlation coefficient (r) is provided in the upper triangle and the corresponding P value is provided in the lower triangle of each plot.

    We further evaluated whether the improved silhouette values can be translated into clustering accuracy. By evaluating five clustering algorithms including k-means, k-medoids, AP, SC3, and SIMLR, we observed that, compared to SCC, SSCC can significantly improve the clustering accuracy in terms of NMI for all five clustering algorithms on all the benchmark datasets tested (Figure 4). The accuracy improvements measured by ΔNMI range from 0.12 to 0.60 for the Kolodziejczyk dataset, 0.04 to 0.19 for the Pollen dataset, 0.14 to 0.37 for the Usoskin dataset, 0.02 to 0.28 for the Zeisel dataset, and 0.10 to 0.28 for the Zheng dataset, depending on the algorithms and subsampling rates chosen. Other accuracy metrics including Rand index, adjusted Rand index, and adjusted mutual information reveal the same trends (data not shown), suggesting that SSCC can greatly enhance the power of multiple clustering algorithms when subsampling is used.

    Figure 4 Clustering performance comparison between SCC and SSCC with varied subsampling rates in five datasets. Clustering accuracy using SCC and SSCC was measured at various subsampling rates of the original data in the five datasets, i.e., the percentage of cells used in clustering. The clustering accuracy is indicated using NMI. For each subsampling rate, calculations were repeated ten times, based on which the average and the standard deviation of the clustering accuracy were calculated and plotted. NMI, normalized mutual information; SSCC, Spearman subsampling-clustering-classification; AP, affinity propagation.

    Clustering consistency between different subsampling runs is also greatly improved with SSCC

    In practice, the reference cell labels are generally unknown. The confidence of clustering results is often evaluated by the consistency between different algorithms. Due to the subsampling fluctuations, clustering results based on SCC are inconsistent among different subsampling operations. However, in the new framework of SSCC, the consistency was much improved for all evaluated clustering algorithms on all datasets (Figure 5). For the Kolodziejczyk dataset, all five clustering algorithms had consistency >0.5 (measured by NMI) in SSCC, while the corresponding consistency in SCC was much smaller. For the Pollen dataset, SSCC still showed better performance than SCC although both frameworks had high clustering consistency. Similar trends were observed on the Usoskin, Zeisel, and Zheng datasets.

    Application of SSCC to large scRNA-seq datasets with or without reference cell labels

    Besides the aforementioned five scRNA-seq datasets, we further tested SSCC on two additional large scRNA-seq datasets. One is the PBMC 68k dataset [18], which contains 10× Genomics-based expression data for 68,578 blood cells from a healthy donor. The other is the Macosko dataset [19], which contains 49,300 mouse retina cells lacking experimentally determined cell labels. The large cell numbers generally prohibit classic scRNA-seq clustering algorithms from running on a desktop computer, thus providing two realistic examples to demonstrate the performance of SCC and SSCC.

    For the PBMC 68k dataset, we compared SSCC with SCC using SC3 [12] as the clustering algorithm. The SC3 software package inherently applies an SCC strategy to handle large scRNA-seq datasets. By default, if a dataset has more than 5000 cells, the SCC strategy will be triggered, with 5000 cells randomly subsampled for SC3 clustering and the other cells for classification by SVM. We applied SC3 to the PBMC 68k dataset on a desktop computer with 8 GB memory and a 3 GHz 4-core CPU and repeated the run ten times. The average clustering accuracy of SC3 in terms of NMI was 0.48, the calculation took 99 min on average, and the maximum memory usage exceeded 5.6 GB (Figure 6A). With the SSCC strategy, the average clustering accuracy reached 0.59, representing a ~21% increase over SC3 with the default parameters. It is of note that the computation time was dramatically reduced to 2.2 min on average, representing a 50-fold acceleration. Meanwhile, the maximum memory usage of SC3+SSCC was 3.7 GB, saving >33% compared to that of SC3 with the default parameters. Compared to dropClust [20], a clustering algorithm specialized for large scRNA-seq datasets, SC3+SSCC also demonstrated superior performance in terms of clustering accuracy, speed, and memory usage (Figure 6A).

    Figure 5 Comparison of clustering consistency between SSCC and SCC for five datasets. The consistency (measured by NMI) of clustering between using 10% of cells and that using 50% of cells with SCC is shown on the X axis, whereas the consistency (measured by NMI) of clustering between using 10% of cells and that using 50% of cells with SSCC is shown on the Y axis. Subsamplings were repeated ten times and each subsampling result was processed using the five clustering algorithms shown on the left.

    For the Macosko dataset, using k-means as the clustering algorithm and k-nearest neighbors for classification, the SCC strategy resulted in a large average silhouette difference (0.29) between two subsampling schemes (-0.80 with 5% of cells and -0.51 with 10% of cells), whereas the difference using SSCC became negligible (0.01). The NMI values between the two subsampling schemes were 0.60 and 0.69 when using SCC and SSCC, respectively. Pearson correlation coefficients of silhouette values between the two subsampling schemes increased from 0.47 to 0.58 when switching from SCC to SSCC (Figure 6B).

    All these metrics demonstrate that SSCC can greatly improve not only the clustering efficiency and accuracy for large-scale scRNA-seq datasets, but also the consistency.

    Discussion

    The availability of large-scale scRNA-seq data raises an urgent need for efficient and accurate clustering tools. Currently, a few scRNA-seq data analysis packages have been proposed to address this challenge. Of these tools, SC3 [12], Seurat [11], and dropClust [20] adopt an SCC strategy, bigScale [21] employs a convolution strategy to merge similar single cells into mega-cells by a greedy-searching algorithm, and SCANPY [22] uses Python as the programming language to accelerate the clustering process. Although these strategies greatly boost the efficiency of large scRNA-seq data analysis, there exists much room for further improvement. In particular, the SCC strategy suffers from biases introduced by subsampling, which may greatly decrease the clustering accuracy and robustness, although it can reduce the computational complexity from O(n²) to O(n). Here we introduce feature engineering and projecting techniques into the SCC framework and propose SFCC as an alternative. Specifically, with Spearman correlations as the feature engineering and projecting method, we formulate a framework named SSCC, which can significantly improve clustering accuracy and consistency for many general and specially designed clustering algorithms. Evaluations on real scRNA-seq datasets, which cover a wide range of scRNA-seq technologies, sequencing depths, and organisms, demonstrate the robustness of the superior performance of SSCC. Therefore, SSCC is expected to be a useful computational framework that can further unleash the great power of scRNA-seq in the future.

    Figure 6 Clustering performance evaluation of SSCC on two extremely large scRNA-seq datasets. A. Performance comparison between SC3 (default), dropClust, and SC3+SSCC on the PBMC 68k dataset [18] in terms of clustering accuracy, running time, and maximum memory required. In total, 5000 cells were subsampled for SC3 (default), while 200 cells were subsampled for SC3+SSCC. B. Consistency comparison between SSCC (on the right) and SCC (on the left) evaluated on 49,300 mouse retina cells in the Macosko dataset [19]. Silhouette values of two clustering schemes (using 2000 cells and 4930 cells, separately) were plotted and then Pearson correlation coefficients were calculated. The 39 cell clusters were colored according to cluster labels based on ~10% of cells and original expression data.

    Authors' contributions

    XR and ZZ designed the study. XR and LZ collected the data, implemented the software, and did the analysis. XR and ZZ wrote the manuscript. All authors read and approved the final manuscript.

    Competing interests

    The authors have declared no competing interests.

    Acknowledgments

    This project was supported by grants from Beijing Advanced Innovation Center for Genomics at Peking University, Key Technologies R&D Program (Grant No. 2016YFC0900100) by the Ministry of Science and Technology of China, and the National Natural Science Foundation of China (Grant Nos. 81573022 and 31530036).
