Chongling Zhang, Binduo Xu, Ying Xue, Yiping Ren b
a College of Fisheries, Ocean University of China, 216, Fisheries Hall, 5 Yushan Road, Qingdao, 266003, China
b Qingdao National Laboratory for Marine Science and Technology, 1 Wenhai Road, Qingdao, 266000, China
ABSTRACT
Keywords:
Multispecies survey
Boral
Sampling methods
Estimation methods
Sample size
Field surveys are fundamental to ecological studies, conservation and management plans, but they are often expensive, time-consuming and labor-intensive (Margules & Pressey, 2000; Pullin, Knight, Stone, & Charman, 2004; Reynolds, Thompson, & Russell, 2011; Sutherland, Pullin, Dolman, & Knight, 2004), especially in marine ecosystems. It is therefore necessary to develop survey designs that are cost-efficient while addressing research and management objectives (Reynolds et al., 2011; Sanderlin, Block, & Ganey, 2014). A tradeoff or prioritization is often needed under the constraints of limited financial support, logistics and the biological characteristics of target species (Miller, Skalski, & Ianelli, 2006; Sanderlin et al., 2014), yet an integrative framework covering the multiple facets of survey design is still lacking.
Technically, a survey design should be planned with respect to spatial-temporal coverage, sample size, survey frequency and sampling techniques (Cabral & Murta, 2004; Kercher, Frieswyk, & Zedler, 2003; Statzner, Gore, & Resh, 1998). Fisheries surveys in particular need to consider potential gears and the annual frequency and seasonality of samples. This study focuses on two aspects of survey design: sampling methods and sample size. Sampling methods refer to the geographical configuration of sampling sites. Classical sampling methods, including simple random sampling, stratified random sampling and systematic sampling, have long been used in various research fields (Cadima et al., 2005; Cochran, 1946; Neyman, 1934). More recently, adaptive sampling designs have been proposed as an improvement that draws on prior knowledge (Smith & Lundy, 2006; Thompson & Seber, 1996; Yu, Jiao, Su, & Reid, 2012). These designs assume statistical independence among sample units and are commonly referred to as “design-based sampling” (Wang, Stein, Gao, & Ge, 2012). An alternative approach, called “model-based sampling”, explicitly accounts for spatial autocorrelation and heterogeneity of sample units. These methods use geostatistical or species distribution models to estimate population abundance and distribution (Petitgas, 2001) and have shown satisfactory performance (Liu, Chen, & Cheng, 2009).
Unfortunately, none of the sampling methods consistently outperforms the others across diverse contexts, and there is no consensus on the choice of sampling method (Bailey, Hines, Nichols, & MacKenzie, 2007; Bijleveld et al., 2012). Numerous studies have concluded that survey efficiency varies with survey objectives, the spatial autocorrelation of organisms and sample size (Bijleveld et al., 2012; Jardim & Ribeiro, 2007; Liu et al., 2009; Mier & Picquelle, 2008), which implies that survey designs are context-dependent and should be evaluated before implementation. In addition, this study highlights that the methods used to interpret the collected data, i.e., estimation methods, are critical but largely overlooked in surveys. Sample data from “design-based sampling” and “model-based sampling” should be treated differently (Gregoire, 1998), and different methods might lead to varying results and distinct conclusions (Dormann et al., 2007). In particular, many fisheries surveys follow fixed designs and predefined sample sizes, in which case the estimation method is the only component of the survey that can be improved. Furthermore, evaluation studies have commonly focused on single species, whereas typical multispecies fisheries surveys have been less studied.
A few pioneering studies suggested that multispecies surveys require a balance among target species, as the accurate estimation of one species often comes at the cost of others (Carvalho, Gonçalves, Guisan, & Honrado, 2016; Manly, Akroyd, & Walshe, 2002; Sinclair et al., 2003). For instance, Miller et al. (2006) used a weighted sum of relative variances as the objective to optimize a stratified sampling design; Sanderlin et al. (2014) developed a survey program for avian species based on the average accuracy of richness of all species. Even in these studies, however, different species were commonly handled individually, which implicitly assumed independent distributions among species. In fact, species distributions tend to be correlated with each other as a result of extensive biotic interactions or hidden environmental gradients (Austin, 2007; Berlow et al., 2004; Elith & Leathwick, 2009; Guisan & Thuiller, 2005; Mouquet et al., 2015; Wisz et al., 2013). Simulation studies that evaluate survey designs should therefore account for species associations, but this has been limited by the development of operating models. The recently developed “Joint Species Distribution Modelling” (JSDM) approach (Warton et al., 2015) provides an opportunity to address these issues in survey designs for multiple species. Models of this type can handle multispecies distributions, environmental effects and species associations simultaneously (Clark, Nemergut, Seyednasrollah, Turner, & Zhang, 2017; Harris, 2015; Hui, 2016; Ovaskainen et al., 2017; Pollock et al., 2014; Rizopoulos, 2006; Thorson et al., 2015).
We developed a JSDM based on a pilot survey in the North Yellow Sea, China, and used it as the operating model to simulate the spatial distribution of multiple species simultaneously. Data generated from the model were used to simulate the survey process and to estimate multispecies abundance. We conducted a variety of simulation scenarios with different sampling designs, estimation methods and sample sizes. This study aims to present an integrative framework of survey evaluation for multiple species and to contribute to efficient fisheries surveys.
We chose the “boral” model (Hui, Taskinen, Pledger, Foster, & Warton, 2015) among diverse candidate JSDMs as the operating model, considering its usability (C. Zhang, Chen, Xu, Xue, & Ren, 2018). Boral (Bayesian ordination and regression analysis, version 1.4) is a model-based approach to describe multispecies responses to environmental variables (EVs) while accounting for residual correlations among species. The model can handle a variety of error distributions, including normal, Bernoulli, Poisson, negative binomial, Gamma, Tweedie and log-normal distributions, for different types of response variables (Hui, 2016). Boral uses a latent variable (LV) approach to handle multispecies associations parsimoniously (Skrondal & Rabe-Hesketh, 2004; Warton et al., 2015). LVs are introduced as unobserved predictors in the form of random effects resulting from either biotic interactions or unmeasured EVs, i.e.,

$$g(\mu_{ij}) = \beta_{0j} + X_i^{\top}\beta_j + Z_i^{\top}\lambda_j$$

where $g(\cdot)$ is the link function, $\mu_{ij}$ denotes the expected abundance of species $j$ at sampling site $i$, and $\beta_{0j}$ represents species-specific intercepts. $X_i$ and $\beta_j$ denote site-specific environmental values and species-specific regression coefficients of EVs; $Z_i$ and $\lambda_j$ denote the values and the regression coefficients of LVs, respectively. $Z_i$ are treated as random effects and need to be estimated together with $\lambda_j$. The LV approach reduces the number of parameters needed to track species correlations and allows diverse hierarchical structures in modelling (Ovaskainen et al., 2017; Warton et al., 2015). It should be noted that boral does not account for the spatial structure of LVs (Ovaskainen et al., 2017; Thorson, Shelton, Ward, & Skaug, 2015). Model fitting is computation-intensive and relies on the Markov Chain Monte Carlo algorithm via JAGS (Plummer, 2003). The model is implemented in the R package “boral” (Hui, 2016).
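The structure of the latent-variable predictor, g(mu_ij) = beta0_j + X_i' beta_j + Z_i' lambda_j, can be illustrated with a short Python sketch. This is not the boral implementation: it assumes a log link and Poisson counts in place of the Tweedie distribution used in this study, and the function name, dimensions and coefficient values are all hypothetical.

```python
import numpy as np

def simulate_lv_counts(n_sites=200, n_species=5, n_env=3, n_lv=2, seed=0):
    """Simulate counts from g(mu_ij) = beta0_j + X_i' beta_j + Z_i' lambda_j.

    Assumptions for illustration only (not from the study): log link,
    Poisson errors, standard-normal covariates and latent variables.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_sites, n_env))                    # environmental variables
    Z = rng.normal(size=(n_sites, n_lv))                     # latent variables
    beta0 = rng.normal(loc=1.0, scale=0.5, size=n_species)   # species intercepts
    beta = rng.normal(scale=0.3, size=(n_env, n_species))    # EV coefficients
    lam = rng.normal(scale=0.5, size=(n_lv, n_species))      # LV loadings
    eta = beta0 + X @ beta + Z @ lam                         # linear predictor
    mu = np.exp(eta)                                         # inverse of the log link
    y = rng.poisson(mu)                                      # simulated counts
    return y, mu, lam

y, mu, lam = simulate_lv_counts()

# With independent standard-normal LVs, the loadings imply this residual
# covariance among species on the linear-predictor scale.
residual_cov = lam.T @ lam
```

The point of the sketch is that a few latent variables induce a full species-by-species covariance matrix through the loadings, which is why the approach needs far fewer parameters than an unstructured covariance.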
The boral model was fitted to the fish community observed during a bottom trawl survey conducted in the coastal waters of the North Yellow Sea, China, in winter 2016 (Supplementary materials, Fig. S1). An otter trawl with a width of 15 m and a cod-end mesh size of 20 mm was used in the survey. The trawls were towed for 1 h at a speed of 3 knots at 114 sampling sites, and environmental variables including temperature, salinity and depth of bottom water were measured using a CTD (XR-420) at the same sites following the hauls. A total of 82 fish species were identified in the survey. Rare species, with an occurrence frequency of less than 5%, were excluded to avoid instability in model fitting. The remaining 48 species accounted for 99% of the abundance of the community.
The abundance of the 48 common fish species was used to fit the boral model, with temperature, salinity and depth as explanatory variables. The model was specified with a Tweedie distribution because of the large number of zeros (Candy, 2004). The number of LVs was determined by the minimum deviance information criterion (DIC), which suggested that three LVs fit best in this study. We added quadratic terms for the explanatory variables to address potential unimodal relationships. A “stochastic search variable selection” (SSVS) method was used for variable selection (George & McCulloch, 1993), based on the posterior probability of a variable being included in the model.
The boral model was used for data generation to project the multispecies distribution throughout the study area. The simulation data followed the same Tweedie distribution as the fitted model. Ideally, the spatial resolution of the simulation should match the area swept by typical trawl surveys. For simplicity, we used regular spatial grids of 1.0 nautical mile, which resulted in 54976 spatial cells in the study area (Supplementary materials, Fig. S4). The values of EVs and LVs were spatially interpolated onto the same spatial resolution (Supplementary materials, Fig. S6) using the universal kriging method (Oliver & Webster, 1990). It should be noted that neither EVs nor LVs were necessarily spatially autocorrelated. The spatial interpolation was a compromise to generate simulation data with sufficient spatial resolution, and the strategy was consistent with spatially structured JSDMs (Ovaskainen et al., 2017; Thorson et al., 2015).
We examined five sampling methods, including three classical and two less prevalent ones. The classical methods were random sampling (RDS), stratified random sampling (SRS) and systematic sampling (SYS) (Cadima et al., 2005; Cochran, 1977); details are given in the Supplementary materials. In addition, we considered two spatially balanced sampling designs, generalized random-tessellation stratified sampling (GRT) and spatial coverage sampling (SPC) (Stevens & Olsen, 2004; Walvoort, Brus, & de Gruijter, 2010). GRT aims to balance the randomness and spatial coverage of sampling by mapping two-dimensional space into one dimension using a quadrant-recursive function (Stevens & Olsen, 2004), which preserves the spatial relationship or “order” of sampling points along a line. SPC emphasizes the importance of equal spatial coverage (Brus, Spätjens, & De Gruijter, 1999; van Groenigen & Stein, 1998) and allocates sampling sites as uniformly as possible by minimizing the mean squared shortest distance (MSSD) between locations in the area and their nearest sampling site. A k-means clustering algorithm was used to optimize the MSSD objective function and produce compact geographical strata (Walvoort et al., 2010). The stratification in SRS, GRT and SPC followed the same configuration, based on water depth and latitude in the survey area of the Yellow Sea (the five survey designs are exemplified in the Supplementary materials, Fig. S5). The GRT design was implemented using the R package “spsurvey” (version 3.3), and the SPC design was implemented using “spcosa” (version 0.3-6). The packages are available on CRAN (https://cran.r-project.org/).
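The three classical designs can be sketched in Python as below. This is a simplified illustration rather than the procedure used in the study: grid cells are indexed in one dimension, the stratified design assumes proportional allocation (the allocation rule is not stated here), and the function names and the four-stratum example are hypothetical. GRT and SPC are not reproduced, since they rely on the R packages named above.

```python
import numpy as np

def simple_random(n_cells, n, rng):
    """RDS: draw n distinct cells with equal probability."""
    return rng.choice(n_cells, size=n, replace=False)

def stratified_random(strata, n, rng):
    """SRS with proportional allocation (an assumption of this sketch).

    strata holds one integer stratum label per cell. Rounding means the
    total can differ slightly from n for awkward stratum sizes.
    """
    strata = np.asarray(strata)
    labels, counts = np.unique(strata, return_counts=True)
    alloc = np.maximum(1, np.round(n * counts / counts.sum()).astype(int))
    picks = [rng.choice(np.flatnonzero(strata == lab), size=k, replace=False)
             for lab, k in zip(labels, alloc)]
    return np.concatenate(picks)

def systematic(n_cells, n, rng):
    """SYS along a 1-D cell ordering: random start, then a fixed step."""
    step = n_cells / n
    start = rng.uniform(0, step)
    return np.floor(start + step * np.arange(n)).astype(int)

rng = np.random.default_rng(42)
strata = np.repeat([0, 1, 2, 3], [400, 300, 200, 100])  # hypothetical strata
rds = simple_random(strata.size, 50, rng)
srs = stratified_random(strata, 50, rng)
sys_idx = systematic(strata.size, 50, rng)
```

In a real survey grid the systematic design would be laid out in two dimensions; the one-dimensional version only shows the random-start, fixed-interval logic.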
In addition, we examined the effects of sample size on the performance of survey designs, as this factor is well known to influence the precision of sampling and the accuracy of the resulting models (Stockwell & Peterson, 2002; Wisz et al., 2008). We simulated different sample sizes by randomly drawing sampling sites from the simulation data, with 10 levels ranging from 30 to 300 sites.
We adopted different estimation methods following the rationale of the design-based and model-based approaches, respectively. In the design-based approach, arithmetic means of survey data were used as estimates; in the model-based approach, spatial and environmental information was used to build models and derive estimates (Gregoire, 1998; Petitgas, 2001). This study considered four estimation methods that differed substantially in assumptions and mathematics:
(1) Arithmetic mean of sample abundance (Arm), following the design-based approach.
(2) Universal kriging (Ukr), which assumed that the variation of data was spatially structured following a variogram (Oliver & Webster, 1990). The estimates of species abundance were derived from spatial interpolation based on the variogram.
(3) Multispecies distribution model (Mvd), which used common regression models to estimate the spatial distribution of multiple species individually (Bahn & Mcgill, 2007). We adopted the boral framework to build the multispecies regression model and eliminated species associations by removing LVs, which essentially resulted in multiple GLMs.
(4) Boral model (Brm), which was implemented with the same number of LVs as in the operating model. The values of LVs were estimated for each sampling site, and spatial interpolation was needed for abundance estimation because the LVs were unknown at unobserved sites. The interpolation was conducted by marginalizing over the LVs via Monte-Carlo integration (Hui, 2016) (details are given in the reference manual of the “boral” package).
With each estimation method, spatial prediction was conducted throughout the survey area, i.e., over all grid cells, and the results were used to calculate the average abundance of each species. Ukr was implemented using the R package “automap” (version 1.0-14), and the other model-based methods were implemented using the same “boral” package.
The processes of data generation, sampling and estimation were repeated 200 times for each scenario. Relative absolute bias (RAB) was calculated for every species in the scenarios of different sampling designs, sample sizes and estimation methods. Survey performance was species-specific, whereas multispecies surveys should be optimized for all species rather than for each species individually. Therefore, we used regression analyses to delineate the effects of “sampling”, “sample size” and “estimation”, as well as their interactions, by treating “species” as a controlled covariate. That is, the regression model was fitted with four explanatory variables, RAB = f(sampling, size, estimation, species), in which the explanatory variables were specified as factors. The interactions among sampling, size and estimation were also examined within the regression framework. The processes of simulation and analysis are illustrated in a flowchart (Fig. 1).
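The replicate-and-score loop can be sketched as follows. The exact formula for RAB is not reproduced in this section, so the sketch assumes a common form, the mean absolute deviation of the estimates from the true value divided by the true value. The gamma-distributed population, the function names and the use of simple random sampling with the arithmetic mean are hypothetical stand-ins for the operating model and the designs in the study.

```python
import numpy as np

def relative_absolute_bias(estimates, true_value):
    """Assumed form of RAB: mean(|estimate - true|) / true."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean(np.abs(estimates - true_value)) / true_value

rng = np.random.default_rng(1)

# Hypothetical population: abundance of one species in 10,000 grid cells.
population = rng.gamma(shape=0.5, scale=4.0, size=10_000)
true_mean = population.mean()

def simulate_rab(n_sites, n_rep=200):
    """Repeat random sampling plus arithmetic-mean estimation n_rep times."""
    estimates = [
        population[rng.choice(population.size, size=n_sites, replace=False)].mean()
        for _ in range(n_rep)
    ]
    return relative_absolute_bias(estimates, true_mean)

rab_30 = simulate_rab(30)     # smallest sample size considered
rab_300 = simulate_rab(300)   # largest sample size considered
```

Repeating this for each species, design and estimator would give the table of RAB values that the regression on sampling, size, estimation and species then summarizes.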
Fig. 1. The flowchart of model building, data generation, survey simulation and evaluation in this study.
Fig. 2. Performance measures of the multispecies survey with respect to species and sampling methods. (a) shows the relative absolute bias (RAB) of estimates for each species, and (b) shows the effects of sampling methods over all species using a regression analysis. The species in (a) are ordered for better display, in an order generally consistent with their prevalence. The results were derived from a simulated sample size of 100 with the arithmetic mean (Arm) as the estimation method.
The simulated RAB differed substantially among species, and these differences overwhelmed the effects of sampling methods (Fig. 2). Species as an explanatory factor accounted for 98.1% of the total variance in RAB (ANOVA), whereas sampling methods accounted for only 0.4% (both factors with p < 0.01). Estimates for less prevalent species tended to be less precise (Fig. 2a). Overall, GRT and SYS performed best among the sampling methods, followed by SPC, and RDS and SRS yielded the least precision (Fig. 2b).
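The share of variance attributed to each factor can be illustrated with a sum-of-squares decomposition. The sketch below assumes a balanced layout with one RAB value per species and sampling method, which is simpler than the ANOVA in the study; the function name and the simulated effect sizes are hypothetical and chosen only so that species effects dominate.

```python
import numpy as np

def variance_shares(table):
    """Partition total variance of a balanced species-by-method table.

    table[i, j] is the RAB of species i under sampling method j.
    Returns the fraction of the total sum of squares due to each factor.
    """
    table = np.asarray(table, dtype=float)
    grand = table.mean()
    ss_total = ((table - grand) ** 2).sum()
    ss_species = table.shape[1] * ((table.mean(axis=1) - grand) ** 2).sum()
    ss_sampling = table.shape[0] * ((table.mean(axis=0) - grand) ** 2).sum()
    return {
        "species": ss_species / ss_total,
        "sampling": ss_sampling / ss_total,
        "residual": (ss_total - ss_species - ss_sampling) / ss_total,
    }

rng = np.random.default_rng(2)
species_effect = rng.uniform(0.1, 1.5, size=48)[:, None]          # 48 species
method_effect = np.array([0.00, 0.01, -0.02, -0.02, -0.01])[None, :]  # 5 designs
rab = species_effect + method_effect + rng.normal(scale=0.01, size=(48, 5))

shares = variance_shares(rab)
```

When between-species differences are this large, a small share for sampling methods does not mean the design is irrelevant; it means design effects are only visible after species is controlled for.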
Using the results from GRT sampling, we examined the effects of estimation methods on survey performance. Species effects were again dominant (Fig. 3a), accounting for 88% of the total variance in RAB, while estimation methods explained 8% (p < 0.01, ANOVA). Overall, Mvd and Arm showed satisfactory performance, followed by Ukr. Brm yielded the least precision and a much larger RAB than the other methods (Fig. 3b).
Although GRT and Mvd each performed best in our study, their combination was not necessarily the optimal choice because of an interaction between sampling and estimation methods. ANOVA on RAB showed that, after accounting for the effects of species, 51% of the remaining variance was explained by estimation methods, 7.7% by sampling methods and 5.7% by their interaction. The interaction implied that sampling and estimation methods should be considered together: Arm worked best with GRT, Mvd with SYS, and Ukr with SPC (Table 1). In contrast, the combination of RDS and Ukr performed poorly.
Table 1. The interactive effects of sampling methods and estimation methods on survey precision. The values denote coefficients derived from the linear regression of RAB (relative absolute bias) on sampling methods and estimation methods, using species as a controlled covariate.
Sample size had a substantial influence on the precision of survey estimates. In general, RAB tended to decrease linearly with increasing sample size on a log scale. With a small sample size (N = 30), the combination of SPC and Mvd outperformed the other sampling and estimation methods (Fig. 4). With larger sample sizes, the performance of the different methods tended to converge, except for Brm and the combinations of Ukr with SRS or RDS. With a very large sample size (N = 300), GRT and SYS slightly outperformed SPC, and Arm, Mvd and Ukr yielded similar results. ANOVA showed significant pairwise interactions among sample size, sampling methods and estimation methods, but no significant three-way interaction (p = 0.45).
Fig. 3. The precision of the multispecies survey with respect to species and estimation methods. (a) shows the relative absolute bias (RAB) of estimates for each species, and (b) shows the effects of estimation methods over all species using a regression analysis. The results were derived from a simulated sample size of 100 with GRT (generalized random-tessellation stratified sampling) as the sampling method.
Fig. 4. The variation of survey precision with respect to a range of combinations of sample size, sampling methods and estimation methods. Sampling methods are shown in different panels and estimation methods are shown with colored lines. RAB represents the relative absolute bias.
Table 2. The efficiency of increasing sample size to improve survey precision. The values denote the relative rate of change of relative absolute bias (RAB) with each doubling of sample size, using the combination of Arm and RDS as the reference level.
To quantify the efficiency of increasing sample size to improve surveys, we fitted a linear regression of RAB on sample size (on a log2 scale). The regression coefficient measured the rate of change of RAB, and the effects of sampling and estimation methods were examined through their interaction terms with sample size. For example, the combination of RDS and Arm gave a regression coefficient of -0.097, meaning that on average RAB decreased by nearly 0.1 for every doubling of sample size between 30 and 300. The rates of decrease varied substantially among species but only slightly among combinations of sampling and estimation methods. SYS and Mvd were the most efficient sampling and estimation methods, respectively, in utilizing additional samples (Table 2).
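The interpretation of the log2 regression coefficient can be checked with a small Python sketch. The RAB values below are synthetic and noise-free, built from the reported slope of -0.097 and a hypothetical intercept of 0.9; the ten evenly spaced sample sizes are also an assumption, since only the range of 30 to 300 is stated.

```python
import numpy as np

def doubling_rate(sample_sizes, rab_values):
    """Slope of RAB against log2(sample size): change in RAB per doubling."""
    slope, intercept = np.polyfit(np.log2(sample_sizes), rab_values, deg=1)
    return slope, intercept

# Assumed levels: ten evenly spaced sample sizes between 30 and 300.
sizes = np.arange(30, 301, 30)

# Synthetic RAB following the reported slope; the intercept is hypothetical.
rab = 0.9 - 0.097 * np.log2(sizes)

slope, intercept = doubling_rate(sizes, rab)

# Predicted change in RAB when going from 30 to 60 sites (one doubling).
change_per_doubling = (intercept + slope * np.log2(60)) - (intercept + slope * np.log2(30))
```

Because the predictor is log2 of sample size, the slope is read directly as the change in RAB per doubling, so going from 30 to 300 sites (about 3.3 doublings) would reduce RAB by roughly 0.32 under these assumed values.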
Although the importance of sampling design has been widely acknowledged in a large body of literature, survey procedures in practice are often conducted with little consideration of accuracy (Mier & Picquelle, 2008; Yu et al., 2012). This situation can be largely attributed to the inconsistent performance of sampling designs with respect to different survey objectives, sample sizes and spatial structures of the population. It is therefore critical to develop an integrative framework for the evaluation of survey designs. As a preliminary step towards this framework, this study systematically evaluates a range of survey designs in the North Yellow Sea (NYS) with a JSDM. We demonstrate the selection of sampling methods, estimation methods and sample sizes with the objective of estimating multispecies abundance. Alternative survey objectives and statistical methods can be easily incorporated into this framework.
In general, the present study favored the spatial-coverage sampling methods, SYS, GRT and SPC, over the randomized sampling methods, SRS and RDS. This result might be attributed to the autocorrelated spatial structure of species abundance. Specifically, the environmental gradients and species interactions underlying species distributions were characterized by a range of variogram models in our simulation, which implied substantial similarity of species composition among nearby sample sites. The randomized sampling methods easily produced clustered sampling sites, which tended to be similar to each other and less representative. In contrast, the spatial coverage of SYS, GRT and SPC might yield less correlated samples and benefit spatial interpolation for abundance estimates (Bijleveld et al., 2012; Cao, Chen, Chang, & Chen, 2014; Stevens & Olsen, 2004; Wang et al., 2012). Accordingly, we recommend SYS and GRT for surveys of multispecies abundance, and SPC in particular when sample size is small.
It should be noted that model-based estimation methods that consider environmental gradients and spatial autocorrelation might not necessarily outperform simple methods such as Arm. In fact, our results suggested that Arm yielded better estimates than Brm and Ukr in most scenarios and was only slightly inferior to Mvd at small sample sizes. This result differed from previous studies, which suggested that spatially explicit models often outperformed traditional models (Bahn & Mcgill, 2007; Dormann et al., 2007). The divergence might be due to different objectives and criteria: this study focused on the average abundance throughout the study area, whereas previous studies concerned the prediction of site-level species distribution. An estimation method that tends to underestimate large values and overestimate small values could therefore still provide reasonable estimates of the regional average. In addition, uncertainty in the estimation models themselves might contribute to the poor performance, as complex models involve more parameters, and parameter errors can propagate to survey estimates. This was evidenced by the comparison between Mvd and Brm. Within the same mathematical framework, the estimation method was expected to improve with the extra information carried by LVs (Ovaskainen et al., 2017; Warton et al., 2015); however, the results were the opposite (in addition, a spatialized Brm with the same LVs as in the operating model showed less desirable results in our preliminary analyses and is not shown in this study). As random effects, the LVs required many parameters to be estimated, leading to an increased risk of overfitting, especially when many species had sparse data (Ovaskainen et al., 2017; Ovaskainen & Soininen, 2011). Besides, the marginalization and interpolation of LVs in the boral model introduced further uncertainty. We conclude that simple models or a non-modelling approach should be sufficient for regional abundance estimation, but more elaborate models should be favored for spatial distribution predictions.
Our simulation demonstrated that each doubling of sample size reduced RAB by about 0.1 on average. The rates of change varied significantly among species (p < 0.01, result not shown), being large for rare species and small for prevalent species. The log-linear relationship suggested that substantially increasing sample size might not be cost-efficient except for very rare species. A balance between the decreasing RAB and the increasing cost could therefore be used in the design of survey programs. Nevertheless, it should be noted that this study focused on the estimation of species abundance. For other research objectives, such as species distribution modeling, species-specific mapping, and the estimation of population composition and life-history traits (Liu et al., 2009; Miller et al., 2006; Stockwell & Peterson, 2002; Wisz et al., 2008), the effect of sample size might be different. We emphasize that survey designs should be optimized with respect to varying research objectives (Müller, 2007).
The results of this study reflect the case of a specific marine ecosystem, and the operating model for simulation was fitted with data of a short temporal scope. Some conclusions may therefore not hold for other ecosystems or with improved data collection. Meanwhile, we emphasize that for the purpose of evaluating survey designs, the spatial pattern, rather than the detailed distribution, is crucial, in which case the requirement for prediction accuracy may be relaxed to a certain degree. Nevertheless, our study should be informative for developing a systematic evaluation framework in future studies. In particular, the JSDM-based operating model is recommended. Although many statistical models can properly handle the spatial structure of species distributions (Bahn & Mcgill, 2007; Dormann et al., 2007), such as spatial autocorrelation (Yu et al., 2012), environmental gradients (Cao et al., 2014) or a combination of the two (Liu et al., 2009), there is no guarantee that the models cover all critical abiotic and biotic variables. In this sense, JSDMs provide a more realistic approach for community simulation by incorporating unobserved or unobservable predictors and potential biotic interactions (Warton et al., 2015; Zhang & Ren, 2019), although sufficient data are particularly important for building such operating models to mitigate overfitting (Auger-Méthé et al., 2016; Ovaskainen et al., 2017). In addition, JSDMs, like species distribution models in general, reflect only a snapshot of biotic communities (Elith & Leathwick, 2009; Guisan & Thuiller, 2005), whereas species distributions vary over time as a result of habitat selection, aggregation and dispersal behaviours. The role of temporal variation in populations has commonly been ignored in survey designs, and a thorough consideration of the spatial and temporal dynamics of species distributions should be included in future studies (Cabral & Murta, 2004; Thorson & Barnett, 2017; Thorson et al., 2016).
Acknowledgements
This study was supported by the National Natural Science Foundation of China (31802301, 31772852). We thank the anonymous reviewers for constructive comments that helped improve earlier versions of this manuscript.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.aaf.2019.11.002.
Aquaculture and Fisheries, 2020, Issue 3