
      Data Selection Using Support Vector Regression

Advances in Atmospheric Sciences, 2015, Issue 3

Michael B. RICHMAN*1, Lance M. LESLIE1, Theodore B. TRAFALIS2, and Hicham MANSOURI3

1 School of Meteorology and Cooperative Institute for Mesoscale Meteorological Studies, University of Oklahoma, Norman, Oklahoma 73072, USA

2 School of Industrial and Systems Engineering, University of Oklahoma, Norman, Oklahoma 73019, USA

3 Power Costs, Inc., 301 David L. Boren Blvd., Suite 2000, Norman, Oklahoma 73072, USA


Geophysical data sets are growing at an ever-increasing rate, requiring computationally efficient data selection (thinning) methods to preserve essential information. Satellites, such as WindSat, provide large data sets for assessing the accuracy and computational efficiency of data selection techniques. A new data thinning technique, based on support vector regression (SVR), is developed and tested. To manage large online satellite data streams, observations from WindSat are formed into subsets by Voronoi tessellation and then each is thinned by SVR (TSVR). Three experiments are performed. The first confirms the viability of TSVR for a relatively small sample, comparing it to several commonly used data thinning methods (random selection, averaging and Barnes filtering), producing a 10% thinning rate (90% data reduction), low mean absolute errors (MAE) and large correlations with the original data. A second experiment, using a larger dataset, shows TSVR retrievals with MAE < 1 m s-1 and correlations ≥ 0.98. TSVR was an order of magnitude faster than the commonly used thinning methods. A third experiment applies a two-stage pipeline to TSVR to accommodate online data. The pipeline subsets reconstruct the wind field with the same accuracy as in the second experiment and are an order of magnitude faster than the nonpipeline TSVR. Therefore, pipeline TSVR is two orders of magnitude faster than commonly used thinning methods that ingest the entire data set. This study demonstrates that TSVR pipeline thinning is an accurate and computationally efficient alternative to commonly used data selection techniques.

data selection, data thinning, machine learning, support vector regression, Voronoi tessellation, pipeline methods

      1. Introduction

The quantity of geophysical data is increasing at a rapid rate. Hence, it is essential to identify and/or select features that preserve relevant information in the data. Data selection has as its two main aims the removal of redundant and faulty data. Here, the emphasis is on redundant data, so the terms data selection and data thinning will be used interchangeably. Redundant data arise from two main sources: when the data density is greater than the spatial and temporal resolution of the analysis grid, and when the data are not linearly independent. Penalties for retaining redundant data are the (possibly massive) increase in computational cost, the failure to satisfy key assumptions of the data analysis scheme (Lorenc, 1981), and the increased risk of overfitting (particularly for problems with high dimensions).

The need for data selection is exemplified by satellite observations, which are among the largest contributors of observations to the data selection process and, hence, to the analysis. Notably, satellites provide high-resolution observations over data-poor regions, especially the oceans and sparsely populated land areas. Historically, data redundancy issues led to the development of data selection approaches that were simple and cost effective. These included: allocating the observations to geographical grid boxes and then averaging the data in each box to produce so-called superobservations, or "superobs" (Lorenc, 1981; Purser et al., 2000); the selection of observations, in both meridional and zonal directions, with random sampling of the observations (Bondarenko et al., 2007); and the use of filters, such as the Barnes scheme (Barnes, 1964). Owing to their simplicity, and because they are non-adaptive, such strategies are referred to as unintelligent data selection techniques. For example, they do not specify targeted areas of interest or weight the data according to their contribution to minimizing differences between the thinned and non-thinned data.

Recently, various intelligent data selection strategies have emerged (e.g., Lazarus et al., 2010). Such approaches are effective in identifying and removing redundant data and have other desirable features. One example is the Density Adjusted Data Thinning (DADT; Ochotta et al., 2005; 2007), and its successor, the modified DADT (mDADT; Lazarus et al., 2010). The intelligent data selection schemes are adaptive, as they attempt to retain those observations that are less highly correlated with other observations, but contribute more significantly to the retention of the information content in the observations (e.g., they employ metrics based on gradients and/or curvature of the fields). Intelligent data selection schemes usually require definitions of redundancy measures, and their sampling strategies iteratively remove observations that fail to meet the metric threshold criteria.

The present work develops an entirely different, kernel-based, intelligent data selection technique using Support Vector Machines (SVMs). SVMs require neither a priori specification of metrics nor of thinning rates. SVMs are alternatives to artificial neural networks, decision trees and Bayesian networks for classification and prediction tasks (Schölkopf and Smola, 2002) used in supervised learning, such as statistical classification and regression analysis. Although SVMs were introduced several decades ago (Vapnik, 1982), they have been investigated extensively by the machine learning community only since the mid-1990s (Shawe-Taylor and Cristianini, 2004).

SVMs require solving a quadratic programming problem with linear constraints. Therefore, the speed of the algorithm is a function of the number of observations (data points) used during the training period. Hence, the SVM solution to problems comprised of numerous data points is computationally inefficient. Several methods have been proposed to ameliorate this problem. Platt (1999) applied Sequential Minimal Optimization (SMO) to break the large quadratic programming problem into a series of the smallest analytically solvable problems. A faster SMO SVM algorithm, advantageous for real-time or online prediction or classification for large-scale problems, was suggested by Bottou and LeCun (2004). Musicant and Mangasarian (2000) applied a linear program SVM method to accommodate very large datasets. Bakır et al. (2004) selectively removed data using probabilistic estimates, without modifying the location of the decision boundary. Other techniques used online training to reduce the impact of large data sets. Bottou and LeCun (2005) showed that performing a single epoch of an online algorithm converges to the solution of the learning problem. Laskov et al. (2006) developed incremental SVM learning with the aim of providing a fast, numerically stable and robust implementation. Support Vector Regression (SVR) uses the kernel approach from SVM to replace the inner product in regression. It is discussed extensively by Smola and Schölkopf (1998). SVM techniques have been applied to small-scale meteorological applications, such as rainfall and diagnostic analysis fields supporting tornado outbreaks. These include the studies of Son et al. (2005), Santosa et al. (2005), Trafalis et al. (2005), and, in satellite data retrievals, of Wei and Roan (2012). The present study seeks to further enhance SVR in two respects: (1) by applying a Voronoi tessellation (Bowyer, 1981) to reduce the size of the large observational data sets; and (2) by adopting a pipeline methodology (Quinn, 2004) to improve the computational efficiency of the data selection scheme.

In section 2, large-scale problems using satellite datasets are described. In section 3, it is shown how Voronoi tessellation reduces the size of the large observational data sets, and how a pipeline SVM methodology substantially enhances the computational efficiency of the data selection scheme. The results are presented in section 4. Finally, conclusions are discussed in section 5.

      2. Data

This study employs data from the WindSat microwave polarimetric radiometry sensor (Gaiser et al., 2004). WindSat provides environmental data products, including latitude, longitude, cloud liquid water, column integrated precipitable water, rain rate, and sea surface temperature. WindSat measurements over the ocean are used operationally to generate analysis fields and also as input to numerical weather prediction models of the U.S. Navy, the U.S. National Oceanic and Atmospheric Administration (NOAA) and the United Kingdom Meteorological Office. As a polarimetric radiometer, WindSat measures not only the principal polarizations (vertical and horizontal), but also the cross-correlation of the vertical and horizontal polarizations. The cross-correlation terms represent the third and fourth parameters of the modified Stokes vector (Gaiser et al., 2004). The Stokes vector provides a full characterization of the electromagnetic signature of the ocean surface and the independent information needed to uniquely determine the wind direction (Chang et al., 1997).

To illustrate the data selection procedure introduced herein, it suffices to explore a single data type, namely, sea surface wind (SSW) speeds and directions. For SSW data, it is necessary to account not only for random errors but also for spatially correlated errors. Typical ascending swaths for a 24-hour sample of WindSat data provide ~1.5 million observations. Given this massive number of data points, oversampling of wind data can severely degrade the analysis and, consequently, the model forecasts.

Three experiments were carried out using different WindSat datasets. The first experiment was designed to assess, on a relatively small sample, the accuracy and computational efficiency of a Voronoi tessellation followed by SVR to thin the WindSat data. Hereafter, this sequential combination of Voronoi tessellation followed by SVR will be referred to as "TSVR". Two hours of WindSat data from 1 January 2005 were chosen in the region 127°W to 145°E longitude and 23° to 42°N latitude, providing 13 540 observations for the data selection process. Additionally, TSVR was compared to three commonly used data thinning techniques (simple averaging, random selection and a Barnes filter) to assess the relative accuracy and computational efficiency of each method. A second experiment used 226 393 observations to determine if the accuracy and computational efficiency gains by TSVR were preserved with a much larger dataset. The third experiment employs a pipeline methodology (section 3.3), as it has been employed successfully to achieve much higher computational efficiency (e.g., Ragothaman et al., 2014). Such an approach is expected to enhance real-time processing of an online stream of WindSat data.

      3. Learning Machine Methodologies

3.1. Voronoi Tessellation

Experiments show that the standard SVR algorithm loses computational efficiency when analyzing more than several thousand observations (Platt, 1999). Since the WindSat data sets used in this study are in excess of this, and can exceed 10^6 observations, direct application of SVR is not feasible. Methods have been proposed to reduce this problem (e.g., Platt, 1999; Musicant and Mangasarian, 2000). Voronoi tessellation partitions a plane with p points into convex polygons such that each polygon contains exactly one generating point and every point in a given polygon is closer to its generating point than to any other. The cells are called polytopes (e.g., Voronoi polygons). They were employed by Voronoi (1908) and have been applied in diverse fields, such as computer graphics, epidemiology, geology, and meteorology. As shown in Fig. 1, the tessellation is achieved by allocating the data points to a number of Voronoi cells (Du et al., 1999; Mansouri et al., 2007; Gilbert and Trafalis, 2009; Helms and Hart, 2013). The process uses the Matlab "voronoi" function (Matlab, 2012).
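As a sketch of this tessellation step, the Matlab call can be mirrored in Python with SciPy; the synthetic coordinates and the centroid choice (every 50th observation) are illustrative assumptions, not the paper's configuration:

```python
# Sketch of the tessellation step using SciPy as a stand-in for the
# paper's Matlab "voronoi" call; the generating points (centroids) are
# assumed here to be a subsample of the observation locations.
import numpy as np
from scipy.spatial import Voronoi, cKDTree

rng = np.random.default_rng(0)
obs = rng.uniform(0.0, 10.0, size=(5000, 2))   # observation locations (lon, lat)
centroids = obs[::50]                          # hypothetical choice: every 50th point

vor = Voronoi(centroids)                       # polygon geometry, for plotting/inspection

# Membership in a Voronoi cell is equivalent to nearest-centroid assignment,
# so a k-d tree gives each observation's cell index directly.
_, cell = cKDTree(centroids).query(obs)
print(cell.shape, cell.max() + 1)              # one cell label per observation
```

Because each cell holds only a small fraction of the data, the per-cell SVR problems stay tractable and can be solved independently (and in parallel).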

As mentioned above, for a discrete set S of points in R^n and for any point x, there is one point of S closest to x. More formally, let X be a space (and S a nonempty subset of X) provided with a distance function d. Let C, a nonempty subset of X, be a set of p centroids P_c, c ∈ [1, p]. The Voronoi cell, or Voronoi region, V_c, associated with the centroid P_c is the set of all points in X whose distance to P_c is not greater than their distance to the other centroids P_j, where j is any index different from c. That is, if D(x, A) = inf{d(x, a) | a ∈ A} denotes the distance between the point x and the subset A, then V_c = {x ∈ X | d(x, P_c) ≤ d(x, P_j), for all j ≠ c}.

In general, the set of all points closer to P_c than to any other point of S is called the Voronoi cell for P_c. The set of such polytopes is the Voronoi tessellation corresponding to the set S. In two-dimensional space, a Voronoi tessellation can be represented as shown in Fig. 1. Since the number of data points inside each Voronoi polygon is much less than for the full data set, the computational time is reduced greatly. Moreover, further efficiency can be gained by using parallel computing, solving a set of Voronoi polygons simultaneously.

3.2. Support Vector Regression

In SVR, it is assumed that there is a data source providing a sequence of l observations, and no distributional assumptions are made. Each observation (data point) is represented as a vector with a finite number n of continuous and/or discrete variables that can be denoted as a point in the Euclidean space R^n. Hence, the l observations are data points in the Euclidean space R^n.

The l observations are divided into p cells using Voronoi tessellation. The methodology consists of making each kth observation a seed or "centroid" for a Voronoi cell V_c, ∀c ∈ [1, p]. The parameter k is set such that p ≈ l/k; hence, for a larger k, fewer cells will be generated. Each cell V_c will be composed of data points represented by x_{i,c} ∈ R^n, ∀i ∈ [1, l]. In regression problems, each observation x_{i,c} is related to a unique real-valued scalar target denoted by y_{i,c}. The couplets (x_{i,c}, y_{i,c}) in R^{n+1} are a set of points that have a continuous unknown shape that is not assumed to follow a known distribution. The objective of support vector regression (SVR) is to find a machine learning prediction function (in our application, this is an estimation at a particular time t, rather than a forecast at time t + Δt), denoted by f_c for each cell V_c, such that the differences between f_c(x_{i,c}) and the target values y_{i,c} are minimized.

In the present study, the target is either the u- or the v-component of the winds. By introducing, for each observation x_{i,c}, a set of positive slack variables ξ_{i,c}, which are minimized, the following set of constraints for the regression problems is generated for each cell V_c:

For linear regression, in the SVM literature, f_c belongs to a class of functions denoted by F, such that:

where b_c is the bias term, B_c > 0 is a constant that bounds the weight space, w_c = Σ_j α_{j,c} x_{j,c}, and α_{j,c} ∈ R, ∀j ∈ [1, l].

In the case of nonlinear regression, the class of functions F is changed to allow for linear regression in the Hilbert space to which the observations x_{i,c} will be mapped. This is achieved by introducing a nonnegative definite kernel k: R^n × R^n → R, to induce a new Hilbert space H and a map φ: R^n → H such that k(x, y) = ⟨φ(x), φ(y)⟩_H for any x and y in R^n. Hence, F becomes:

where w_c = Σ_j α_{j,c} φ(x_{j,c}), and α_{j,c} ∈ R, ∀j ∈ [1, l]. Explicit knowledge of H and φ is not required. Therefore, the set of constraints in Eq. (1) becomes:

SVM allows for an objective function that reduces the slack variables and the expected value of |f_c(x_{i,c}) - y_{i,c}|. To achieve that objective, the quantities b_c, ξ_{i,c}, and ‖w_c‖_H are minimized.

where C > 0 is a positive trade-off constant that penalizes the non-zero values of the ξ_{i,c}, 1 is an l × 1 vector of ones, and y_c is the vector with elements y_{i,c}.

The optimal solution of Eq. (5) yields the following prediction function:

The vectors x_{i,c} for which the values of α_{i,c} are nonzero are called support vectors.
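A minimal sketch of this per-cell regression and support-vector extraction, using scikit-learn's SVR as a stand-in for the authors' implementation (the synthetic field and the C and epsilon values are assumptions):

```python
# Per-cell SVR on a synthetic one-component wind field: fit, then keep
# only the support vectors (points with nonzero alpha). scikit-learn is
# used here as an illustrative stand-in; parameter values are placeholders.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(400, 2))              # observation locations in one cell
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)   # e.g. a u-wind component

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)

# Only the support vectors need to be retained to reconstruct the field
# within the cell; their fraction of the cell is the (inverse) thinning rate.
n_sv = model.support_vectors_.shape[0]
print(f"kept {n_sv}/{len(X)} points ({100 * n_sv / len(X):.0f}% of the cell)")

recon = model.predict(X)
mae = np.abs(recon - y).mean()                     # reconstruction error
```

The fraction of points retained as support vectors depends on the complexity of the field and on epsilon, mirroring the paper's observation that the thinning rate varies with the spatial and temporal data structure.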

From Eq. (3), a kernel is required. In this work, several kernels were tested for their ability to select a smaller number of observations with a minimum loss of information. Those tested were:

      the linear kernel,

      the radial basis function kernel(RBF),

      the polynomial kernel of degreeq,

      and the sigmoidal kernel,

where σ, g, a and θ are scaling constants.
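In their standard textbook forms, with the scaling constants named above, these kernels can be written as follows (the exact parameterization used in the original displayed equations may differ slightly):

```latex
\begin{align*}
k_{\text{lin}}(x,y)  &= \langle x, y \rangle, \\
k_{\text{RBF}}(x,y)  &= \exp\!\left(-\frac{\lVert x - y \rVert^{2}}{\sigma^{2}}\right), \\
k_{\text{poly}}(x,y) &= \left(\langle x, y \rangle + g\right)^{q}, \\
k_{\text{sig}}(x,y)  &= \tanh\!\left(a\,\langle x, y \rangle + \theta\right).
\end{align*}
```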

3.3. Pipeline TSVR

To improve the efficiency of the TSVR, a pipeline methodology (Quinn, 2004) is introduced to allow for an online stream of meteorological satellite data. The pipeline approach is appropriate for such data because the satellite samples a swath of new wind data as it orbits. Within each Voronoi polygon, the pipeline is applied to the variables used to estimate the winds by TSVR. A two-stage pipeline (with 50% overlap, as shown in Fig. 2) is applied that fetches and preprocesses new data while old data are executing in the CPU. Figure 2 illustrates the pipeline, showing that the orbital swath is divided into discrete steps and how these new data are incorporated into the TSVR process. Figure 2 shows the pipeline window, of width four CPU time units, ingesting the data set. At each step, the most recent data are included in the window, while the oldest data are released. Next, the window moves to the right by one-half step. Hence, instead of thinning all the data within a window, the cells outside the window are dropped and new Voronoi cells are formed that contain only the new data. If this overlapping approach were not adopted, the data would have to be ingested, preprocessed and analyzed prior to moving on to the next batch of data, thereby reducing the efficiency of the process.
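A schematic of the two-stage, 50%-overlap window logic described above, assuming the stream arrives as fixed-size batches; thin is a placeholder for the per-window tessellation-plus-SVR step:

```python
# Schematic of the two-stage, 50%-overlap pipeline: the window advances by
# half its width each step, so only the newly arrived half must be thinned;
# the older half is reused. "thin" stands in for the per-window TSVR step.
from collections import deque

def pipeline(stream, window=4, thin=lambda batch: batch[::10]):
    """stream: iterable of data batches, one per CPU time unit."""
    buf = deque(maxlen=window)
    half = window // 2
    thinned = []
    for t, batch in enumerate(stream):
        buf.append(batch)
        if len(buf) == window and (t + 1) % half == 0:
            if not thinned:
                # first full window: thin everything in the buffer
                new = [x for b in buf for x in thin(b)]
            else:
                # later steps: thin only the newest half-window
                new = [x for b in list(buf)[-half:] for x in thin(b)]
            thinned.append(new)
    return thinned

batches = [list(range(i * 100, i * 100 + 100)) for i in range(8)]
out = pipeline(batches)
```

Only the first window pays the full thinning cost; each subsequent step processes half a window, which is the source of the order-of-magnitude speedup reported in the third experiment.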

3.4. Measures of differences between non-thinned and thinned data

Mean squared differences (commonly referred to as MSE), mean absolute differences (MAE), as well as the correlation between the original (non-thinned) and thinned satellite-observed winds, are employed to measure the quality of the thinned observations. MSE, MAE and correlations are defined in Wilks (2011). These are commonly applied metrics to measure differences between two fields.
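These three metrics can be computed directly; the sketch below assumes the fields are flattened into one-dimensional arrays:

```python
# The three comparison metrics (as defined in, e.g., Wilks, 2011), computed
# between an original field and its thinned-and-reconstructed counterpart.
import numpy as np

def thinning_metrics(original, reconstructed):
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    err = reconstructed - original
    mae = np.abs(err).mean()        # mean absolute difference
    mse = (err ** 2).mean()         # mean squared difference
    corr = np.corrcoef(original, reconstructed)[0, 1]  # Pearson correlation
    return mae, mse, corr

mae, mse, corr = thinning_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```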

      4. Results

4.1. Results of the first experiment

Table 1. MAE, MSE and correlation metrics comparing the differences between observed and thinned u- and v-components of wind for different SVR kernels. The kernel selected (RBF 1) is in bold font.

The main objective of this experiment is to assess the feasibility of the TSVR, and to determine the most effective kernel, using a small sample (13 540 observations) of WindSat data. Support vectors are used for the reproduction of the wind field after data selection. Because of the intelligent adaptive capability of the TSVR, fewer than 8% of the observed satellite data were needed to reconstruct the wind field. To quantify the accuracy of the reconstructed winds using TSVR, the thinned winds are compared to the non-thinned observations. From Eq. (3), a kernel must be selected to generate the support vectors and reconstruct the wind fields. Table 1 shows metrics (MSE, MAE and correlations) for the kernels defined in section 3.2. The various kernels tested were: linear; seven radial basis functions with the σ parameter varying from 0.5 to 100; polynomials with g = 1 and of orders (q) 2 and 3; and sigmoidal with the two scale parameters (a, θ) set to 1. The smallest differences between thinned and non-thinned wind data were obtained for the RBF kernel, with a u-component MAE (MSE) of 1.05 m s-1 (5.99 m2 s-2), which are 44% (53%) reductions in the discrepancies, respectively, compared with any non-RBF kernel. For the v-component, the corresponding reductions for the RBF kernel, compared to a non-RBF kernel, were even larger at 63% (65%). The variances explained (correlations squared) are 82.8% and 96.0% for the u- and v-components, representing improvements of 33% and 6%, respectively, over any non-RBF kernel. Therefore, the RBF kernel with parameter 1 is used for all subsequent TSVR analyses.

Figure 3 shows frequency counts of the reconstructed wind errors for the 13 540 observations thinned by TSVR. For the u-component (Fig. 3a), 77% (87%) of the discrepancy magnitudes are ≤1 m s-1 (2 m s-1), which is at or below the accepted observation error for these data (Quilfen et al., 2007). Similar discrepancies were found for the v-component (Fig. 3b). Both distributions are highly leptokurtic, illustrating the efficacy of TSVR. Figure 4 presents the thinned (Figs. 4a, c) and non-thinned (Figs. 4b, d) satellite wind field contours for the u- and v-components. The close spatial correspondence of the patterns for each component is consistent with the large positive correlations in Table 1 for the RBF 1 kernel.

For the present problem, most of the support vectors have alpha values near zero (Fig. 5); thus, they make an insignificant contribution to the final solution. From Eq. (6), those support vectors with zero or near-zero alpha values are ignored, providing further data reduction. Figure 5 illustrates the large data reduction capability of SVR for these data. From the available 13 540 data points, only ~1000 support vectors (<8%) are required to reconstruct the wind vector field with the aforementioned high level of accuracy. Specifically, for each Voronoi cell, the satellite data points inside the cell are used to train the SVR. Fewer than 8% of the observations were support vectors and are retained; therefore, the thinning rate is >92%. The <8% of points that are support vectors have an MAE of 0; the MAE of the other >92% of data points was calculated using only the <8% that are support vectors. Since the percentage of support vectors is a function of the complexity of the data field, it will vary according to the spatial and temporal data structure.
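The additional reduction step described here amounts to discarding support vectors whose alpha magnitudes fall below a tolerance; the alpha values and the threshold below are synthetic illustrations, not values from the paper:

```python
# Dropping near-zero-alpha support vectors: per Eq. (6), such points
# contribute almost nothing to the prediction function, so removing them
# thins the retained set further. Alpha values and tol are illustrative.
import numpy as np

alpha = np.array([2.1, -0.8, 1e-6, 0.4, -3e-7, 1.3])  # signed dual coefficients
tol = 1e-4                                            # illustrative cutoff

keep = np.abs(alpha) > tol        # mask of support vectors worth retaining
reduced = alpha[keep]             # the further-thinned set of coefficients
print(f"{keep.sum()} of {alpha.size} support vectors retained")
```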

4.2. Results of the second experiment

Given the large data reduction and high level of accuracy in reproducing the wind fields provided by TSVR, as found in section 4.1, a considerably larger sample (226 393 data points) was drawn to assess the scalability of TSVR and to compare it to several commonly used data thinning techniques. For these commonly used techniques, the observations were assigned to cells of h degrees latitude and longitude. For random sampling, a single observation was selected. For the other schemes, all data were used. The accuracy of these data selection methods is shown in Figs. 6a-d (MAE, MSE) and Fig. 7 (correlation). The MAE for the u-component (Fig. 6a) shows that, as the width of the data cells decreases, the discrepancies decrease for both averaging and random selection. The accuracy of Barnes filtering improves as the cells decrease in size and reaches a minimum at a cell width of approximately 0.7 degrees; beyond that, insufficient data density produces increasingly inaccurate results. As the Voronoi tessellation is applied to TSVR, the cells do not change and hence the accuracy remains constant. For the v-component (Fig. 6b), similar behavior is noted for all techniques. TSVR is the most accurate thinning technique, with MAE ~0.5 m s-1. The MSE values (Figs. 6c, d) are larger than the corresponding MAE values; however, the ranking of the techniques remains the same, with random sampling being least accurate, averaging and Barnes giving similar results, and TSVR producing the most accurate thinning. The correlation between the thinned and non-thinned winds is calculated for the same data selection methods (Fig. 7). As the cell width decreases, the correlations for the u-components given by the three commonly used techniques move closer to the TSVR value, but never exceed it. Despite these large correlations at small cell widths, the larger MAE and MSE of the three commonly used techniques indicate less accurate thinning for those methods. The v-component correlations for the other methods are considerably lower than those for TSVR (Fig. 7). Moreover, the high correlations obtained with the three commonly used data selection methods are achieved at the expense of a loss of computational efficiency (Fig. 8), as the TSVR requires approximately 250 seconds to thin these data at the aforementioned accuracy (correlations of 0.99 and 0.98 for the TSVR) versus over 1000 seconds for the other three techniques. For this experiment, the percentage of data required to obtain this level of accuracy for the TSVR is ~10%. In comparison, the thinning rates of the three commonly used methods, to achieve accuracy close to that of the TSVR, are much larger (~26%).

4.3. Results of the third experiment

Using TSVR, computation times can be decreased by buffering in a series of subsets of data and calculating the support vectors of each sample. This process is known as pipeline thinning (Fig. 2). To investigate the gain in computational efficiency of the pipeline approach, compared to TSVR without a pipeline, a sample of 120 983 data points was drawn from the 1.5 million observations. The results for the regular and pipeline TSVR are very similar, with MAE magnitude differences (Figs. 9a, b) of ≤0.05 m s-1 and MSE differences of ≤0.1 m2 s-2 (Figs. 9c, d). The correlations between the reconstructed and observed winds for the regular versus pipeline methods (Figs. 9e, f) show trivial differences, in the second decimal place at most. It is notable that the correlations for the u-component are, for both the regular and pipeline methods, ~0.97 (Fig. 9e) and, for the v-component, ~0.99 (Fig. 9f), indicating the very close correspondence between the thinned and the non-thinned data. The computation time for the pipeline TSVR is less than that for the regular TSVR. The computational efficiency gain arises because, for the first CPU time step (Fig. 10; t = 1), all the data within the window are thinned; however, for t > 1, using pipeline TSVR, only the new data are thinned. For both the pipeline and non-pipeline TSVR approaches, the time needed to thin the data for the first period was ~145 seconds. However, for periods 2-13, the average thinning time was ~142 seconds for the regular TSVR, decreasing by an order of magnitude to 13 seconds for the pipeline TSVR approach (Fig. 10). Therefore, the pipeline TSVR approach requires just 9% of the time of the non-pipeline TSVR method, while providing almost identical accuracy.

      5. Conclusions

The removal of redundant data is commonly known as data thinning. In this study, the application is the thinning of u- and v-components of the winds estimated from WindSat. The number of observations is reduced through a combination of Voronoi tessellation and support vector regression (TSVR). Here, hundreds of thousands of observations are assigned to several thousand Voronoi cells to optimize the wind retrieval accuracy. For each cell, separate TSVR analyses were conducted for the u- and v-components of the winds. The number of Voronoi cells can be adapted, consistent with the complexity of the field, by increasing or decreasing their number. The process can be extremely efficient if it is parallelized by assigning the SVR calculation inside each Voronoi cell to a separate CPU.

The thinning experiments yielded decidedly encouraging results. The TSVR requires fewer than 8%-10% of the WindSat data to produce a highly accurate estimate of the wind field (MAE < 1 m s-1 and correlation ≥ +0.98). In comparison, commonly used techniques, such as random selection, averaging and a Barnes filter, are computationally efficient but have poor retrieval accuracy at coarse spatial resolution. However, at high spatial resolution, as the accuracy of the three commonly used techniques approaches that of TSVR, the computational times for the other thinning methods exceed those of the TSVR approach by a factor of ~4.

High retrieval accuracy is a requirement for meaningful analysis. Of the thinning techniques examined, only TSVR offers the combination of extremely high retrieval accuracy and the shortest clock time. To determine whether the computational efficiency of the TSVR approach could be improved further, a pipeline thinning methodology was applied to the TSVR, reducing the clock time from 150 to 15 seconds. Therefore, for any application requiring ingesting and preprocessing online data, followed by thinning, the pipeline TSVR methodology is advantageous. In this study, it is not only the most accurate of all methods tested but also the fastest, by up to two orders of magnitude.

Acknowledgements. The authors wish to acknowledge NOAA Grant NA17RJ1227 and NSF Grant EIA-0205628 for providing financial support for this work. The third author was partly supported by RSF Grant 14-41-00039. The opinions expressed herein are those of the authors and not necessarily those of NOAA or NSF. The authors also wish to thank Kevin HAGHI, Andrew MERCER and Chad SHAFER for their assistance with several of the figures.

      REFERENCES

Bakır, G. H., L. Bottou, and J. Weston, 2004: Breaking SVM complexity with cross-training. Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, Eds., MIT Press, 81-88.


Barnes, S. L., 1964: A technique for maximizing details in numerical weather-map analysis. Journal of Applied Meteorology, 3, 396-409.

Bondarenko, V., T. Ochotta, and D. Saupe, 2007: The interaction between model resolution, observation resolution and observation density in data assimilation: A two-dimensional study. Preprints, 11th Symp. on Integrated Observing and Assimilation Systems for the Atmosphere, Oceans, and Land Surface, San Antonio, TX, Amer. Meteor. Soc., P5.19. [Available online at http://ams.confex.com/ams/pdfpapers/117655.pdf.]

      Bottou L.,and Y.LeCun,2004:On-line learning for very large datasets.Applied Stochastic Models in Business and Industry, 21,137–151.

      Bowyer,A.,1981:Computing Dirichlet tessellations.Comput.J., 24,162–166.

      Chang,P.,P.Gaiser,K.St.Germain,and L.Li,1997:Multi-Frequency Polarimetric Microwave Ocean Wind Direction Retrievals.Proceedings of the International Geoscience and Remote Sensing Symposium 1997,Singapore.[Available online at http://w.nrl.navy.mil/research/nrl-review/2004/ featured-research/gaiser/#sthash.IskB3x9l.dpuf.]

      Du Q.,V.Faber,and M.Gunzburger,1999:Centroidal Voronoi tessellations:applications and algorithms.SIAM Review,41, 637–676.

Gaiser, P. W., K. M. St. Germain, E. M. Twarog, G. A. Poe, W. Purdy, D. Richardson, W. Grossman, W. L. Jones, D. Spencer, G. Golba, J. Cleveland, L. Choy, R. M. Bevilacqua, and P. S. Chang, 2004: The WindSat spaceborne polarimetric microwave radiometer: Sensor description and early orbit performance. IEEE Trans. Geosci. Remote Sensing, 42, 2347-2361.

Gilbert, R. C., and T. B. Trafalis, 2009: Quadratic programming formulations for classification and regression. Optimization Methods and Software, 24, 175-185.

      Helms,C.N.,and R.E.Hart,2013:A polygon-based line-integral method for calculating vorticity,divergence,and deformation from nonuniform observations.J.Appl.Meteor.Climatol.,52, 1511–1521.

Laskov, P., C. Gehl, S. Krüger, and K.-R. Müller, 2006: Incremental support vector learning: Analysis, implementation and applications. Journal of Machine Learning Research, 7, 1909-1936.

      Lazarus,S.M.,M.E.Splitt,M.D.Lueken,R.Ramachandran,X. Li,S.Movva,S.J.Graves,and B.T.Zavodsky,2010:Evaluation of data reduction algorithms for real-time analysis.Wea. Forecasting,25,511–525.

Lorenc, A. C., 1981: A three-dimensional multivariate statistical interpolation scheme. Mon. Wea. Rev., 109, 1177-1194.

Mansouri, H., R. C. Gilbert, T. B. Trafalis, L. M. Leslie, and M. B. Richman, 2007: Ocean surface wind vector forecasting using support vector regression. Intelligent Engineering Systems Through Artificial Neural Networks, C. H. Dagli, A. L. Buczak, D. L. Enke, M. J. Embrechts, and O. Ersoy, Eds., 17, 333-338.

      MATLAB,2012:MATLABand StatisticsToolbox Release2012b, The MathWorks,Inc.,Natick,Massachusetts,United States. [Available online at http://nf.nci.org.au/facilities/software/ Matlab/techdoc/ref/voronoi.html.]

      Musicant D.R.,and O.L.Mangasarian,2000:Large scale kernel regression via linear programming.Machine Learning,46, 255–269.

      Ochotta,T.,C.Gebhardt,D.Saupe,and W.Wergen,2005:Adaptive thinning of atmospheric observations in data assimilation with vector quantization and fltering methods.Quart.J. Royal Meteorol.Soc.,131,3427–3437.

      Ochotta,T.,C.Gebhardt,V.Bondarenko,D.Saupe,and W.Wergen,2007:On thinning methods for data assimilation of satellite observations.Preprints,23rd Int.Conf.on InteractiveInformation ProcessingSystems(IIPS),SanAntonio,TX, Amer.Meteor.Soc.,2B.3.[Available online at http://ams. confex.com/ams/pdfpapers/118511.pdf.]

      Platt,J,1999:Using sparseness and analytic QP to speed training of support vector machines.In M.S.Kearns,S.A.Solla,and D.A.Cohn,editors,Advances in Neural Information Processing Systems11,MIT Press,557–563.

      Purser,R.J.,D.F.Parrish,and M.Masutani,2000:Meteorological observational data compression:An alternative to conventional“super-obbing”.NCEP Offce Note430,12pp.[Available online at http://w.emc.ncep.noaa.gov/mmb/papers/ purser/on430.pdf.]

      Quilfen,Y.,C.Prigent,B.Chapron,A.A.Mouche,and N.Houti, 2007:The potential of QuikSCAT and WindSat observations for the estimation of sea surface wind vector under severe weather conditions,J.Geophys.Res.Oceans,112,49–66.

      Quinn,M.J.2004:Parallel Programming in C with MPI andopenMP.Dubuque,Iowa:McGraw-Hill Professional,544pp.

      Ragothaman,A.,S.C.Boddu,N.Kim,W.Feinstein,M.Brylinski,S.Jha,and J.Kim,2014:Developing ethread pipeline using saga-pilot abstraction for large-scale structural bioinformatics.BioMed Research International,2014.1–12,doi: 10.1155/2014/348725.

      Santosa,B.,M.B.Richman,and T.B.Trafalis,2005:Variable selection and prediction of rainfall from WSR-88D radar using support vector regression.Proceedings of the6th WSEAS Transactions on Systems,4,406–411.

      Sch¨olkopf,B.,and A.Smola,2002:Learning with Kernels.MIT Press,650pp.

      Smola,A.J.,andB.Sch¨olkopf,1998:ATutorialon Support Vector RegressionRoyal Holloway College,NeuroCOLT Technical Report(NC-TR-98-030),University of London,UK.[Available online at http://svms.org/tutorials/SmolaScholkopf1998. pdf.]

      Shawe-Taylor,J.,and N.Cristianini,2004:Kernel Methods for Pattern Analysis.Cambridge University Press,478pp.

      Son,H-J,T.B.Trafalis,and M.B.Richman,2005:Determination of the optimal batch size in incremental approaches:An application to tornado detection,Proceedings of International Joint Conference on Neural Networks,IE,2706–2710.

      Trafalis,T.B.,B.Santosa,and M.B.Richman,2005:Feature selection with linear programming support vector machines and applications to tornado prediction,WSEAS Transactions on Computers,4,865–873.

      Vapnik,V.,1982:Estimation of Dependences Based on Empirical Data.Springer,505pp.

      Voronoi,G.,1908:Recherches sur les parall′elo`edres Primitives.J. Reine Angew.Math.134,198–287(in French).

      Wei,C.-C.,and J.Roan,2012:Retrievals for the rainfall rate over land using special sensor microwave imager data during tropical cyclones:Comparisons of scattering index,regression,and support vector regression.J.Hydrometeor,13, 1567–1578.

      Wilks,D.S.,2011:Statistical Methods in the Atmospheric Sciences.3rd ed.,Elsevier,676 pp.

Citation: Richman, M. B., L. M. Leslie, T. B. Trafalis, and H. Mansouri, 2015: Data selection using support vector regression. Adv. Atmos. Sci., 32(3), 277–286, doi: 10.1007/s00376-014-4072-9.

      (Received 17 April 2014;revised 11 September 2014;accepted 18 September 2014)

*Corresponding author: Michael B. RICHMAN

      Email:mrichman@ou.edu

