Mohamed Hallek, Fethi Smach, and Mohamed Atri
Abstract Computation of stereoscopic depth and disparity map extraction are dynamic research topics. A large variety of algorithms has been developed, among which we cite feature matching, moment extraction, and image representation using descriptors to determine a disparity map. This paper proposes a new method for stereo matching based on Fourier descriptors. The robustness of these descriptors under photometric and geometric transformations provides a better representation of a template or a local region in the image. In our work, we specifically use generalized Fourier descriptors to compute a robust cost function. Then, a box filter is applied for cost aggregation to enforce a smoothness constraint between neighboring pixels. Optimization and disparity calculation are done using dynamic programming, with a cost based on similarity between generalized Fourier descriptors using Euclidean distance. This local cost function is used to optimize correspondences. Our stereo matching algorithm is evaluated using the Middlebury stereo benchmark; our approach has been implemented on parallel high-performance graphics hardware using CUDA to accelerate our algorithm, giving a real-time implementation.
Keywords: generalized Fourier descriptors; stereo matching; dynamic programming; CUDA
Due to technological advances and improvements in digital cameras, stereo vision is an important research area. However, dense correspondence and 3D reconstruction are key problems for computer vision researchers. The main objective of stereo systems is to provide an efficient algorithm for the reconstruction of 3D information from a stereo pair of images of the same scene taken from distinct viewpoints. A stereo system must follow three basic steps: calibration, correspondence, and reconstruction.
In this work, we focus on the correspondence step. The main aims of a stereo matching algorithm are to correctly identify corresponding pixels in the rectified stereo images and fill the disparity map [1, 2]. Stereo matching algorithms must overcome various problems, the most commonly encountered ones being noise, occlusion, and repetitive textures. Researchers must also respect various constraints, including epipolar geometry, ordering constraints, and smoothness. Many stereo matching algorithms have been developed to solve the correspondence problem using patch-based image synthesis methods [3, 4]. Analyzing the state of the art, stereo matching methods may be divided into local and global categories [5, 6]. The most popular local methods are based on block matching and feature matching [7, 8]. Generally, these methods involve an analysis of local light intensities around each pixel or some regions in the image. In contrast, all pixels in the image are involved in global methods, such as graph cuts, belief propagation, and dynamic programming [5].
In 2002, Scharstein and Szeliski [1] defined a taxonomy to categorize dense correspondence algorithms. It shows that most existing stereo matching methods contain four steps:
• Cost function calculation: a matching process for each pixel at all possible disparity levels.
• Cost aggregation: aggregating the cost over the support region.
• Disparity computation and optimization: selecting the disparity value that optimizes the function and filling the disparity map.
• Disparity refinement: post-processing to improve the results.
Different techniques are used to realize each step. For example, block matching and box filters provide the most popular techniques for cost calculation and cost aggregation respectively. Usually, block matching is done on gray images or on the intensity channel of color images. In this work, we use a mathematical transformation before calculating the cost function. Here, the first step is done by calculating generalized Fourier descriptors and then finding the Euclidean distance between descriptors. Next, we apply a box filter to aggregate the cost function. Optimization and disparity computation are done using dynamic programming, while the last step is performed using a stereo consistency check and median filter to improve the final disparity map.
Many stereo matching algorithms, especially the global methods, are computationally expensive. For this reason, much research has focused on runtime reduction and real-time implementation. In this paper, we present a new approach for stereo image matching based on generalized Fourier descriptors. This approach is detailed in Section 3. In Section 4, we evaluate our algorithm and give some experimental results. In Section 5, we present a CUDA-based real-time implementation of our approach on a GPU. Finally, conclusions are presented in Section 6.
As already indicated, the stereo matching process is realized by following four fundamental steps. It starts by defining a cost function and calculating the cost volume for each pixel at all disparity levels, then aggregating the matching cost. Next, the energy function is optimized and the disparity map filled. Finally, the obtained disparity map is post-processed. In the literature, there are several techniques to achieve each step. The most common cost functions are absolute difference and block matching. These two functions are characterized by linear computational complexity, simplicity of implementation, and fast runtime. However, their limitations are failure in discontinuous areas and sensitivity to the window size used. A simple comparison of light intensities is not always enough, hence the use of mathematical transformations such as census or rank transforms is required. These nonparametric transformations provide standard metrics and are more robust to radiometric distortion and occlusion [9]. In our work, we use a mathematical transformation based on the Fourier transform to extract robust descriptors. Similarity of descriptors provides our cost function. We note that many stereo matching methods are based on feature extraction and interest point detection. In this context we can mention various descriptors such as the scale-invariant feature transform (SIFT) [10] and speeded-up robust features (SURF) [11]. Zernike moments are also used for the determination of corresponding points [12]. Generally, a mathematical transformation is calculated and robust descriptors are extracted to define an efficient cost function.
For cost aggregation, many techniques are employed. The simplest solution is the use of linear image filters such as a box or Gaussian filter. Edge-preserving filters such as the bilateral filter and guided filter can also be good solutions, but they are computationally expensive. In our work, we adopt a simple box filter for cost aggregation.
For disparity computation, we note that winner-takes-all is the most common solution. This step can be improved using semi-global or global optimization algorithms such as graph cuts [13], belief propagation [14], or dynamic programming [15, 16]. Disparity refinement is done using the same approach as Mattoccia et al. [17] and Kordelas et al. [18] to detect occlusions and depth borders. In this step, three consecutive processes are applied: invalid disparity detection, fill-in of invalid disparity values, and median filtering.
Many works consider acceleration of these computationally intensive algorithms. Different architectures are used to achieve real-time performance. One is based on field-programmable gate arrays (FPGAs) [19]. A second alternative is based on graphics hardware using the CUDA language, and is used in many real-time algorithms such as the work of Kowalczuk et al. [20] and Congote et al. [16]. Different real-time algorithms focus on reducing the complexity associated with cost calculation, at the expense of reduced accuracy. In our work we exploit NVIDIA's GeForce GTX 960 computing capabilities to produce an accurate disparity map.
Figure 1 presents a block diagram of our stereo matching algorithm. The cost function is based on similarity of generalized Fourier descriptors, denoted SGFD. The cost is aggregated over square windows using a box filter. Then we use dynamic programming for energy optimization and filling the disparity map. At this stage, the obtained disparity map contains some invalid or unwanted pixels, so a post-processing step is required. A left–right consistency check allows us to detect these invalid pixels. Then we perform a fill-in process to replace the invalid pixels with valid minimum values. The refinement step includes median filtering to remove noise and enforce smoothness between neighboring pixels.
In pattern recognition, the Fourier transform has been used for many years to extract sets of invariants. In 2002, generic Fourier descriptors [21] were applied to grayscale images. Color descriptors called generalized Fourier descriptors (GFDs) were defined in 2008 which can be applied to both grayscale and color images [22]. These descriptors are given in Eq. (1), where (r, θ) are the polar coordinates of the input point M and f̂(r, θ) denotes the Fourier transform of the function f at the point M(r, θ).
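Eq. (1) itself is not reproduced here. Based on the definition of GFDs in Ref. [22], it integrates the squared modulus of the Fourier transform over circles of fixed radius; the following form is our reconstruction and the exact normalization may differ:

$$GFD_f(r) = \int_0^{2\pi} \left|\hat{f}(r,\theta)\right|^2 \, \mathrm{d}\theta$$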
These invariants are functions of a single variable (the radius), which makes them simple to calculate. In an image, the integral in Eq. (1) becomes a discrete sum that gives us a set of values forming a vector. All descriptors must satisfy some invariance and robustness properties. For GFDs, these properties are well respected. The theoretical properties of GFDs are detailed in the work of Smach et al. [22]; we note their invariance under motion, change of scale, and reflection. In practice, GFDs are obtained for color images using the following steps:
• decompose the color image into three channels (red, green, and blue);
• calculate the Fourier transform and its square modulus for each channel;
• vary the radius r and compute the sum of the pixels located along each ray.
The final descriptors concatenate the descriptors for each channel. These descriptors give a complete and robust description of the image which can be used for color object recognition and image classification. Smach et al. [22] evaluated the performance of GFDs on several standard and personal databases. The results obtained using GFDs and support vector machines (SVMs) for classification indicated the robustness of these descriptors. GFDs outperform various families of invariants, such as Zernike moments. See Refs. [22, 23] for more details of GFDs.
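As an illustration of the per-channel procedure above, the following is a minimal NumPy sketch (not the authors' CUDA implementation); the function names, the ring-based radial sampling, and the number of radii are our assumptions:

```python
import numpy as np

def gfd_channel(channel, n_radii=16):
    """Generalized Fourier descriptor of one image channel: the squared
    FFT modulus summed over rings of increasing radius (assumed sampling)."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(channel))) ** 2
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    max_r = min(cy, cx)
    descriptor = np.empty(n_radii)
    for k in range(n_radii):
        # ring k collects frequencies whose radius lies in [k, k+1) * max_r / n_radii
        lo, hi = k * max_r / n_radii, (k + 1) * max_r / n_radii
        descriptor[k] = spectrum[(radius >= lo) & (radius < hi)].sum()
    return descriptor

def gfd_color(image):
    """Concatenate the descriptors of the three color channels ((H, W, 3) input)."""
    return np.concatenate([gfd_channel(image[..., c]) for c in range(3)])
```

Applied to the left and right templates, gfd_color yields the vectors whose Euclidean distance is used as the cost in the next subsection.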
Invariance under geometric transformations, and robustness to noise and lighting changes, are important properties of GFDs which allow us to use them for stereo matching. A full and easily accessible description can characterize a region in the reference image and identify it in a target image.
In stereo matching, the cost function is the main step and it differs from one algorithm to another. Our energy function, denoted SGFD, is defined by similarity between GFDs: for a stereo pair, we take the left image as the reference image and characterize a region in this image by the left color descriptors, GFD_lc. Then, we calculate the descriptors for the candidate region in the right image, GFD_rc. The two descriptors can be expressed as below, where a_i and b_i are the components of the left and right color descriptors and N is the radius of the window. After computing these descriptors, we compute their similarity SGFD_c using their Euclidean distance:
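Eqs. (2)-(4) are not reproduced in the text; written out from the description above, the two descriptor vectors and their Euclidean distance are (our reconstruction):

$$GFD_{lc} = (a_1, a_2, \ldots, a_N), \qquad GFD_{rc} = (b_1, b_2, \ldots, b_N)$$

$$SGFD_c(p,d) = \sqrt{\sum_{i=1}^{N} (a_i - b_i)^2}$$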
Fig.1 Block diagram of our approach.
Many algorithms combine matching costs for colors and gradients in order to increase robustness against radiometric differences. Yang et al. [24] and Hosni et al. [25] combined absolute differences of color and gradient using a parameter α. In a similar way, our matching cost combines the Euclidean distance between color descriptors (SGFD_c) and the Euclidean distance between gradient descriptors (SGFD_g). The global cost function for a pixel p at disparity value d is denoted by SGFD(p, d):
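The combined cost of Eq. (5), consistent with the final line of Algorithm 1 below, is:

$$SGFD(p,d) = \alpha\, SGFD_c(p,d) + (1-\alpha)\, SGFD_g(p,d)$$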
The parameter α is used to balance the color and gradient terms as in Yang et al. [24]. In the above, we need to calculate SGFD_g(p, d) based on the gradient. This is done using the following steps:
• Calculate the gradient values in horizontal and vertical directions for the left and right images (I_l, I_r). These values, denoted G_x and G_y, are given by Eqs. (6) and (7).
• Calculate the gradient magnitude G for both images, as given by Eq. (8).
• Calculate the generalized Fourier descriptors for the reference and target images.
• Compute the Euclidean distance between descriptors.
The necessary equations are given below, where A = [1 0 −1] and I is either the left or right image. Here ∗ denotes the convolution operation, and A^T is the transpose of A.
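From these definitions, Eqs. (6)-(8) can be reconstructed as follows (the square-root magnitude is our assumption; an absolute-value sum is an equally common choice):

$$G_x = A \ast I, \qquad G_y = A^{T} \ast I, \qquad G = \sqrt{G_x^2 + G_y^2}$$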
After calculating SGFD_g(p, d), this cost function is aggregated using a simple technique. By applying a box filter, we aggregate the matching cost over a square window ω. The aggregated cost for a pixel p at disparity value d is given by the following, after Scharstein and Szeliski [1].
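A box-filter aggregation of this kind (our reconstruction of Eq. (9)) is simply a mean of the per-pixel costs over the support window ω(p):

$$C_A(p,d) = \frac{1}{|\omega(p)|}\sum_{q \in \omega(p)} SGFD(q,d)$$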
After cost calculation and use of a box filter for the aggregation step, the third step in Scharstein and Szeliski's taxonomy is performed. The aim of this step is to optimize the cost and fill the disparity map. The most popular method is the winner-takes-all (WTA) technique, which selects the minimal aggregated cost value for each pixel:
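In this notation, the WTA selection rule (our reconstruction of the referenced equation) reads:

$$d(p) = \underset{d \in d_r}{\arg\min}\; C_A(p,d)$$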
where d_r defines the set of allowed discrete disparity levels in the image. The use of WTA reduces computational complexity but can produce unmatched pixels and invalid disparity values at the image border and in occluded regions.
Once the global approach has been developed, a variety of algorithms can be used to find the correctly matching points. These algorithms make explicit smoothness assumptions and optimize a global cost function that combines matching cost and smoothness cost, as detailed in Ref. [1]. This global cost function is defined by:
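The equation itself is not reproduced in the text; the standard form used in Ref. [1], combining a data term and a weighted smoothness term, is:

$$E(d) = E_{\mathrm{data}}(d) + \lambda\, E_{\mathrm{smooth}}(d)$$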
Dynamic programming (DP) is a popular optimization approach. Generally, the aim of DP is to solve a global problem by dividing it into smaller sub-problems whose solutions can easily be obtained. The global solution is the concatenation of the solutions of all sub-problems. This optimization approach was introduced for stereo vision by Ohta and Kanade [26]. DP exploits the ordering and smoothness constraints to optimize the matching cost between two scanlines. This technique is based on two stages: a step constructing the cost matrix for all pixels at all possible disparity levels, and a step in which pairs of corresponding pixels are selected by searching for the optimal path. Let the aggregated cost function be denoted by C_A(x, y, d), where x and y represent the position of a pixel p. The matrix extracted from the cost volume C_A(x, y, d) for a fixed line is denoted M_h(x, d). The dimensions of M_h are W × D_max, where W and D_max represent the width of the image and the maximal disparity. In our work, we use the DP approach developed by Congote et al. [16]. Each matrix M_h, representing the matching between two scanlines, is updated according to:
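The update rule is not reproduced in the text; a common form of the scanline recurrence, consistent with the penalty λ described next, is (our reconstruction rather than the authors' exact formula):

$$M_h(x,d) = C_A(x,y,d) + \min\big(M_h(x-1,d),\; M_h(x-1,d-1)+\lambda,\; M_h(x-1,d+1)+\lambda\big)$$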
We note that λ represents a penalty for change in disparity values between neighboring pixels. After calculating M_h, we compute the disparity values of the corresponding line using the optimal path algorithm.
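A minimal NumPy sketch of this two-stage process (forward accumulation, then backtracking along the optimal path) is given below. It implements the recurrence reconstructed above, with transitions limited to ±1 disparity level; the function name and this restriction are our assumptions, not the authors' exact kernel:

```python
import numpy as np

def dp_scanline(cost_row, lam):
    """cost_row: (W, D) aggregated costs C_A for one scanline.
    Returns the disparity chosen for each of the W pixels on the optimal path."""
    W, D = cost_row.shape
    M = np.empty((W, D))
    back = np.zeros((W, D), dtype=np.int64)
    M[0] = cost_row[0]
    for x in range(1, W):
        # candidate predecessors: same disparity, or +/-1 level with penalty lam
        prev = np.full((3, D), np.inf)
        prev[0] = M[x - 1]
        prev[1, 1:] = M[x - 1, :-1] + lam
        prev[2, :-1] = M[x - 1, 1:] + lam
        choice = prev.argmin(axis=0)
        M[x] = cost_row[x] + prev[choice, np.arange(D)]
        back[x] = choice
    # backtracking from the best terminal disparity
    disp = np.empty(W, dtype=np.int64)
    disp[-1] = int(M[-1].argmin())
    for x in range(W - 1, 0, -1):
        c = back[x, disp[x]]
        disp[x - 1] = disp[x] if c == 0 else (disp[x] - 1 if c == 1 else disp[x] + 1)
    return disp
```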
By this stage, the cost function has been calculated and aggregated, and dynamic programming has been applied to fill the disparity map. The obtained disparity map contains unmatched pixels due to occlusion and repetitive textures, so a post-processing step is required. This step involves three consecutive processes: invalid disparity detection, fill-in of invalid disparity values, and weighted median filtering. Disparity refinement starts by detection of invalid disparity values and unmatched pixels in the depth map. This stage is done using a left–right consistency checking process, comparing the left disparity map to the right disparity map. Pixels that are inconsistent between the two disparity maps are marked as having invalid disparities. In our work, we use the same approach as defined by Mattoccia et al. [17] and Kordelas et al. [18].
Disparity values are marked as invalid if they do not satisfy the condition below:
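The condition itself is not given in the text; a typical left–right consistency test of this kind (our reconstruction, with a tolerance of one disparity level assumed) is:

$$\big| D_{LR}(x,y) - D_{RL}\big(x - D_{LR}(x,y),\, y\big) \big| \le 1$$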
where D_LR and D_RL represent the left (reference) disparity map and the right disparity map, respectively.
The next step in disparity refinement is fill-in of invalid disparity values. The disparity of each unmatched pixel is replaced with the nearest valid disparity, where the valid value used is located on the same scanline. This process is defined by:
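Eq. (14) is not reproduced in the text; given the "valid minimum values" mentioned in the algorithm overview, it presumably takes the form (our reconstruction):

$$d(p) = \min\big(d(p-i),\, d(p+j)\big)$$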
where the disparity value at the location of p is defined by d(p), (p − i) indicates the location of the first valid disparity on the left side, and (p + j) is the location of the first valid disparity on the right side. We finish the refinement step with median filtering in order to reduce noise and enforce smoothness between neighboring pixels.
To evaluate our stereo matching approach it is necessary to use standard databases. We tested our algorithm against typical stereo matching problems involving occlusion and discontinuous regions. The average number of bad pixels in these regions indicates the accuracy of our stereo matching method, and is given by:
where D_x is the obtained disparity map, GT_x is the ground truth, and δ is a disparity tolerance.
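Eq. (15) is not reproduced in the text; the standard Middlebury bad-pixel measure it refers to can be written as (our reconstruction, with N the number of pixels evaluated and [·] equal to 1 when the condition holds):

$$B = \frac{1}{N}\sum_{(x,y)} \big[\, |D_x(x,y) - GT_x(x,y)| > \delta \,\big]$$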
To evaluate our approach, we start by evaluating our proposed local function: SGFD can be compared with other local energy functions. We study the effect of the window size and give the evaluation errors. The test was done by changing the window size from 3×3 to 25×25, and the errors were calculated in the non-occluded regions. The images used were ArtL and Teddy from the Middlebury dataset. Results are shown in Fig. 2.
The first column in Fig. 2 shows the left image of ArtL and the percentage of bad pixels in non-occluded areas (curves (a) and (b)). The second column presents the Teddy image and the corresponding errors (curves (c) and (d)). The results obtained for all cost functions were calculated without performing any post-processing, and errors were calculated using Eq. (15) with δ = 1. We compare SGFD, ZSAD (zero-mean sum of absolute differences), and NCC (normalized cross-correlation) in curves (a) and (c) for the ArtL and Teddy images respectively. Curves (b) and (d) compare SGFD, ENCC (enhanced normalized cross-correlation) [27], and ZNCC (zero-mean normalized cross-correlation) for the two stereo pairs. For the images used, we note that the lowest error in non-occluded areas is obtained by SGFD using a small window size (3×3 or 5×5). In addition, this error always remains lower than the errors given by the other functions for a large window. Thus, these curves indicate that SGFD is more accurate than other local cost functions and more robust to window size variation.
After evaluating the local function, we evaluated our stereo matching algorithm on the Middlebury dataset. This database is used as a de facto standard for comparing stereo matching methods and ranking them according to their performance. We started by using version 2 of this database, denoted MV2; it contains only 4 stereo pairs (Tsukuba, Venus, Teddy, Cones). Results are provided in Tables 1 and 2. We denote the average errors in non-occluded regions and the average of absolute errors by Avg non-occ and Avg all respectively, while the average of depth discontinuity errors is denoted Avg disc. The metric Avg indicates the average of the bad pixel error. These measures are used as the main metrics to evaluate the accuracy of all stereo matching methods.
Fig.2 Effect of window size variation for SGFD and other cost functions.
Table 1 Comparison of methods on MV2 using δ = 1
Table 2 Comparison of methods on MV2 using δ = 2
We note that the calculation of these metrics is performed using Eq. (15). To evaluate our approach, we determined the disparity maps for all images in MV2. Then we calculated the four metrics using δ = 1 and δ = 2, as indicated respectively in Tables 1 and 2. These two tables show that our approach gives the lowest average errors for both thresholds, and confirm that our proposed approach outperforms the optimized dynamic programming method of Salmen et al. [28] and the approach of Wang et al. [29] based on joint bilateral filtering and dynamic programming. In addition, our approach is more accurate than other recent works listed in the evaluation table for MV2 (http://vision.middlebury.edu/stereo/eval).
Furthermore, we evaluated our stereo matching algorithm on version 3 of the Middlebury benchmark (MV3). This dataset contains 30 stereo pairs: 15 pairs for training and 15 pairs for testing. The images in this database have resolution up to 750×500 and a maximal disparity that can reach 200. Evaluation results for MV3 are shown in Table 3. It presents the percentage of bad pixels in non-occluded regions and in all regions (nonocc, all). The errors are calculated for the two regions using thresholds equal to 2 and 4 (Bad2, Bad4). The average absolute error is denoted Avgerr and indicated in the last column. Table 3 summarizes the evaluation results on the training dataset. The obtained results are detailed in Tables 4 and 5, which respectively show the error for each stereo pair in non-occluded and all regions.
Table 3 Comparison of methods on MV3
The evaluation on MV3 indicates that our proposed approach outperforms other algorithms such as DoGGuided [33], which uses a guided filter based on the response of the difference of Gaussians, binary stereo matching (BSM) [34], and other recent approaches. In addition, our stereo matching algorithm is more accurate than ICSG [35] and semi-global matching (SGBM1) [36]. Further details of the evaluation on MV3 are provided by the disparity maps displayed in Fig. 3. This figure presents the left images and ground truths (GT) for the stereo pairs used in the second and third columns respectively; the disparity maps in the fourth column are calculated using the proposed approach. Further columns show the disparity maps obtained with other stereo matching algorithms. Errors in both regions (all and non-occluded) for these disparity maps are given in Tables 4 and 5.
Stereo matching methods are classified according to different measures, essentially based on disparity map quality and execution time. Global methods are computationally expensive. Their complexity is O(N^N) operations per scanline, for scanlines of N pixels. The use of dynamic programming can reduce this complexity to O(N^2), but this does not offer a fast-running implementation. A software implementation of our method has a long runtime and does not meet real-time needs.
To accelerate our stereo matching system we propose an implementation based on graphics hardware as detailed in the following section.
Table 4 Performance comparison of quantitative evaluation results based on nonocc error from MV3
Table 5 Performance comparison of quantitative evaluation results based on all error from MV3
Fig.3 Disparity maps for the proposed algorithm and other methods on MV3.
Real-time stereo matching has become a reality. Until very recently, all real-time implementations made use of GPUs or FPGAs. Our method is based on the calculation of generalized Fourier descriptors. We mentioned previously that the GFD calculation is done for each channel separately, then the final descriptor is calculated by concatenating the results. We can use parallelism to compute descriptors for the three channels simultaneously and consequently reduce the time required to get the final descriptors. On the other hand, dynamic programming allows us to optimize matching between two scanlines. A CPU implementation performs these steps successively, which is why we seek an appropriate environment for simultaneous processing. We believe that a GPU implementation can provide an efficient solution.
In a few years, GPUs have become powerful tools for massively parallel intensive computing. They are currently used for several applications including image processing. These applications exploit classical image processing methods implemented on a GPU, typically using a specific language, CUDA, defined by NVIDIA in 2007. There are many predefined functions and libraries for image processing using the CUDA language. As our descriptors are based on the Fourier transform, we can exploit the CUFFT library for fast Fourier transform calculation. Our stereo model relies on CUDA for parallel processing, result visualization, and reduction of data transfer costs between CPU memory and GPU memory. This model contains 4 steps:
• Loading input images: transferring the stereo pairs from CPU to GPU memory (host to device).
• Thread allocation: fixing the number of threads for the calculation grid so that each thread can perform processing on a pixel template.
• Parallel processing with CUDA: executing the stereo kernel functions N times using the N threads created in the previous step.
• Presentation of results: transferring results from GPU to CPU memory (device to host).
We start by fixing the number of threads and blocks and loading the left and right images into device memory. All processes of our stereo matching algorithm are performed with specific functions, or kernels, that are executed in parallel by multiple threads. For our method, the organization of the kernel functions is presented in Fig. 1. The first step in our algorithm is the calculation of the cost volume V(x, y, d), where x, y indicate the position of the pixel p and d is the disparity value. This volume is obtained for all pixels and for all possible disparity values. It is obtained by matching pixel p to p̄ at position (x + d, y) using SGFD defined by Eq. (5). Therefore, the aim of the first kernel function is to calculate V(x, y, d) based on our local cost function. The SGFD is built from Fourier transforms and calculated using CUFFT. This library implements several FFT algorithms for varying types of input and output including C2C (complex input to complex output), R2C (real input to complex output), and C2R (complex input to real output). CUFFT offers highly optimized algorithms to calculate the FFT in different dimensions: 1D, 2D, and 3D. In our approach, we use the 2D FFT to compute the FFT of a square window, following Haythem et al. [37].
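Purely as an illustration (the authors call the CUFFT C API directly from CUDA code), the same R2C transform and squared modulus can be obtained from Python through CuPy, which dispatches to CUFFT; the function name below is ours:

```python
import cupy as cp

def window_power_spectrum(window):
    """Squared modulus of the 2D FFT of a real-valued square window on the GPU;
    cupy.fft uses an R2C CUFFT plan for real input."""
    spectrum = cp.fft.rfft2(cp.asarray(window, dtype=cp.float32))
    return cp.abs(spectrum) ** 2
```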
After the descriptors are obtained to characterize this region, the Euclidean distance is employed to determine the matching cost, as indicated in Algorithm 1. We start by loading the input images, fixing the window size, and extracting the templates from the original images (I_l, I_r) and gradients (G_l, G_r). Next, we calculate the generalized Fourier descriptors (GFDs) for all templates. We obtain four descriptors: GFD_lc for the left window (Tmp_left), GFD_rc for the right window (Tmp_right), and GFD_lg, GFD_rg for the left and right gradient windows (Tmp_grad_l and Tmp_grad_r respectively). The Euclidean distance is computed between the descriptors and the final cost function is determined. The aggregated cost volume C_A(p, d) in Eq. (9) is easy to calculate using a box filter to average the cost. The next kernel function is dedicated to optimization and calculation of the disparity. The goal of this kernel is defined in Eq. (12). In our work, we follow Congote et al. [16], where the dynamic programming kernel is well described. Before transferring the results to host memory, a last kernel performs post-processing. The goal of this function is to improve the disparity map by detecting invalid disparity values, filling them in, and applying a median filter. We start with a simple comparison between the left and right disparity maps to identify the unwanted pixels according to the condition in Eq. (13). We then replace invalid pixels with valid values from the left or right side as indicated in Eq. (14). Finally, a simple median filter is used to reduce the noise and impose smoothness between neighboring pixels.
Algorithm 1 Cost function at disparity value d
Input: left image I_l, right image I_r, left gradient image G_l, right gradient image G_r, parameter α.
Output: cost function SGFD(p, d).
for every pixel p(x, y) do
    for i ← y − w/2 to y + w/2 do
        for j ← x − w/2 to x + w/2 do
            Tmp_left[i·ω + j] ← I_l[i·width + j];
            Tmp_right[i·ω + j] ← I_r[i·width + j + d];
            Tmp_grad_l[i·ω + j] ← G_l[i·width + j];
            Tmp_grad_r[i·ω + j] ← G_r[i·width + j + d];
        end for
    end for
    SGFD_c ← dist(GFD(Tmp_left), GFD(Tmp_right));
    SGFD_g ← dist(GFD(Tmp_grad_l), GFD(Tmp_grad_r));
    SGFD(p, d) ← α·SGFD_c(p, d) + (1 − α)·SGFD_g(p, d);
end for
The computational complexity of our stereo method and its execution time distribution are now discussed. In practice, the graphics card available was an NVIDIA GeForce GTX 960 with Maxwell architecture. It has 1024 CUDA cores running at 1.2 GHz. It is connected to an Intel Core i7-3770M based CPU with a clock speed of 3.4 GHz. We tested our stereo matching implementation on images with resolution 320×240 pixels and 32 disparity levels. Our implementation gives an execution time of 26.2 ms, with the steps of cost calculation and aggregation taking 76% of the overall runtime. Optimization and disparity filling take 18% of the total processing time, and the refinement kernel takes 6% of the total runtime.
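For reference, these numbers are consistent with the frame rate and throughput figures quoted in the comparison below:

$$\frac{1}{26.2\ \mathrm{ms}} \approx 38\ \mathrm{FPS}, \qquad \frac{320 \times 240 \times 32\ \mathrm{DE}}{26.2\ \mathrm{ms}} \approx 94\ \mathrm{MDE/s}$$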
In order to compare our stereo matching method with other real-time algorithms, we used the same stereo pairs with a resolution of 320×240 pixels and 32 disparity levels. We evaluated the performance of the algorithms based on three important metrics: the number of millions of disparity computations performed per second (MDE/s), the number of frames per second (FPS), and the average percentage of bad pixel errors across all test images. Results using accuracy and runtime metrics are indicated in Table 6, which provides a comparison between our proposed algorithm and other real-time stereo matching methods. Our implementation achieves 38 frames per second, and is more accurate and faster than DCBGrid [38] and RealTimeGPU [39], which is based on adaptive cost aggregation and dynamic programming. ReliabilityDP [40], using reliability-based dynamic programming, produces less accurate results and is slower than our proposed algorithm. Moreover, our approach gives almost the same accuracy as that obtained using ESAW [41], although the latter method is faster.
Evaluation of our approach on the Middlebury MV3 database requires the calculation of disparity maps for all stereo pairs. We present some results in Fig. 4. The first row shows the left images of each stereo pair and the second row shows the corresponding ground truths. The last row presents the disparity maps obtained using our approach.
The test on MV3 allows us to place our algorithm in an evaluation table (http://vision.middlebury.edu/stereo/eval3/). From it, we extract the most important factors: average absolute errors in non-occluded regions (Avg nonocc), average absolute errors in all regions (Avg all), total time (time), time normalized by the number of pixels (s/megapixel, denoted time/MP), and time normalized by the number of disparity hypotheses (s/(gigapixel·ndisp), denoted time/GD). In Table 7, we compare our approach with other stereo matching algorithms in terms of accuracy and runtime metrics.
Table 7 Accuracy and speed for MV3
The results in Table 7 show that our approach gives average absolute errors in non-occluded regions equal to 9.76% and average absolute errors in all regions equal to 17.6%. The total execution time of our proposed algorithm is 0.27 s. These results indicate that our approach is more accurate and faster than DoGGuided [33], BSM [34], ICSG [35], and SGBM1 [36]. SPS [32] produces more accurate results over all regions (16.6%) but its results are less accurate in non-occluded regions (10.4%). Also, this method is slower than our stereo matching algorithm, with a global execution time of 22.1 s.
Fig.4 Some disparity maps obtained for MV3.
This paper presents a new cost function for stereo matching based on generalized Fourier descriptors. The cost function for the proposed stereo matching algorithm is the Euclidean distance between Fourier descriptors applied to color and gradient images. Cost aggregation, disparity calculation, and result refinement are performed respectively using a box filter, dynamic programming, and post-processing. To evaluate our algorithm, we used the Middlebury stereo benchmark. The experimental results indicate that our proposed method outperforms many stereo matching algorithms, including ones based on joint bilateral filtering and dynamic programming, semi-global matching, and optimized dynamic programming. Also, our proposed approach is more accurate than recent works involving binary stereo matching and stereo matching based on sampled photoconsistency computation.
Furthermore, we have presented an implementation of our approach on graphics hardware using CUDA. This implementation exploits the CUFFT library to compute the cost function and the CUDA parallel computing architecture to implement the dynamic programming. Results show that this implementation can reach real-time performance, confirming that it outperforms many real-time algorithms in terms of accuracy and runtime metrics.