Adaptive Median Filtering Algorithm Based on Divide and Conquer and Its Application in CAPTCHA Recognition

2019-03-18 08:15:26 Wentao Ma, Jiaohua Qin, Xuyu Xiang, Yun Tan, Yuanjing Luo and Neal Xiong

Computers, Materials & Continua, 2019, Issue 3

Wentao Ma, Jiaohua Qin, Xuyu Xiang, Yun Tan, Yuanjing Luo and Neal N. Xiong

Abstract: As the first barrier protecting cyberspace, the CAPTCHA has made significant contributions to maintaining Internet security and preventing malicious attacks. By researching the CAPTCHA, we can find its vulnerabilities and improve its security. Recently, many studies have shown that improving the image preprocessing of the CAPTCHA lets state-of-the-art machine learning algorithms achieve a better recognition rate. The CAPTCHA images in this experiment contain many kinds of noise and distortion. In this paper, we propose an adaptive median filtering algorithm based on divide and conquer. Firstly, the filtering window data are quickly sorted by exploiting data correlation, which greatly improves the filtering efficiency. Secondly, the size of the filtering window is adaptively adjusted according to the noise density. The experimental results show that the proposed scheme achieves superior performance compared with the conventional median filter. The algorithm not only detects and removes noise effectively, but also preserves details well. Therefore, this algorithm can be a strong tool for various CAPTCHA image recognition and related applications.

Keywords: Image preprocessing, machine learning, CAPTCHA recognition, adaptive median filtering algorithm.

1 Introduction

The development of CAPTCHA recognition technology not only helps users obtain massive amounts of information, but also promotes the development of technologies such as image processing and pattern recognition. Image denoising is an important part of CAPTCHA recognition technology. Thus, this paper analyzes image denoising algorithms with the image CAPTCHA as the research object.

Image denoising algorithms have formed a complete technical system along with the emergence and development of image processing technology [Singh and Shree (2016)]. The emergence of mean filtering drove the development of linear denoising [Gonzalez and Woods (2002)]. However, such algorithms blur the details of the image while removing noise, so they were quickly replaced by nonlinear denoising. Nonlinear filtering denoising algorithms [Varade, Dhotre and Pahurkar (2013)] can effectively suppress impulse interference and random noise while preserving the edge information of the image. In recent years, with the development of computer vision applications such as autonomous driving, the real-time requirements for image preprocessing have kept rising. Conventional median filtering sorts slowly and cannot meet these requirements, so improving the median filter remains a focus of research.

Conventional median filtering uses bubble sort on the pixel values, which must sort all pixels in the neighborhood to obtain the median value [Tukey (1974); Pitas and Venetsanopoulos (1992)]. For an N × N filter window, N^2(N^2 - 1)/2 comparison operations are required. Taking the 3 × 3 filter window as an example, finding the median takes 9 × 8/2 = 36 comparisons, which is time-consuming. Brodland et al. [Brodland and Veldhuis (1998)] proposed a weighted median filter in which the weight of the central pixel is defined by the degree of noise pollution. This algorithm effectively suppresses noise and greatly reduces complexity, and it meets the requirements of computer vision detection systems for protecting edges and details. Huang et al. [Huang, Yang and Tang (2003)] made full use of data correlation: by considering the relationship between the move-in values, the move-out values, and the median value, they greatly improved the efficiency of the filtering process. After comparing various fast median filters, Dai et al. [Dai, Xu and Piao et al. (2017)] proposed an improved fast median filtering algorithm that combines the sorting algorithm with the hardware system to improve the processing speed. However, these algorithms concentrate on filtering efficiency and do not make the filtering window adaptive. The window size of conventional median filtering is fixed, so it cannot denoise and preserve image details at the same time [Zhang, Xu and Dong (2006)]. Therefore, it is necessary to alter the window size dynamically during filtering. Zhang et al. [Zhang, Tang and Shi (2014)] proposed a recursive least squares (RLS) adaptive filter with good denoising performance and high precision. Bhadouria et al. [Bhadouria, Ghoshal and Siddiqi (2014); Roy, Singha and Manam et al. (2017)] optimized the adaptiveness of the filtering window extensively and achieved good results, but they neglected the complexity. Ding et al. [Ding, Niu and Lu et al. (2018); Fan, Han and Gou et al. (2018); Roy, Singha and Devi (2016)] used the convolutional neural network (CNN) and the support vector machine (SVM) for image recognition. Although these methods achieved good results, they did not preprocess the original images fed to the network; combining an outstanding image preprocessing algorithm with a state-of-the-art machine learning algorithm can be expected to improve CAPTCHA recognition accuracy.

In this paper, we propose an adaptive median filtering algorithm based on divide and conquer:

(1) Conventional median filtering is mostly based on bubble sort. This paper presents another idea. On the one hand, a median filter based on the quicksort algorithm is developed using divide and conquer, which effectively reduces the sorting cost. On the other hand, data correlation is fully exploited by the methods from the literature, which greatly improves the filtering efficiency.

(2) The window size of the conventional median filter is fixed, so it cannot denoise and preserve image details at the same time. Thus, we propose an algorithm that adaptively adjusts the window size according to the noise density, which not only removes noise but also protects details and avoids thinning or thickening of edges.

This paper combines the advantages of the above two algorithms and proposes an adaptive median filtering algorithm based on divide and conquer, which improves the adaptiveness of the filtering window and reduces the sorting complexity. The experimental results for peak signal-to-noise ratio (PSNR) and recognition rate show that, compared with the conventional median filtering algorithm, the proposed method performs better in denoising and in preserving image information.

2 Related work

2.1 Preprocessing

To facilitate the subsequent denoising, the image first needs to be converted to grayscale. The RGB image is generally converted into single-channel gray values according to certain coefficients.

where R, G, and B denote the pixel values in the three color channels, and Gray is the gray value corresponding to the pixel. The specific calculation is to obtain the R, G, and B values and take their average as the gray value of the corresponding pixel in the new image. The grayscale image is denoted by G.
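The conversion formula itself is not present in this copy of the text. Read literally, the description (the average of the three channel values) corresponds to equal coefficients of 1/3, which would give the following; the exact form used by the authors is an assumption here.

```latex
\mathrm{Gray}(x, y) = \frac{R(x, y) + G(x, y) + B(x, y)}{3}
```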

Since the background and the character colors of a CAPTCHA image are quite different, binarization is performed as follows.

where G(x, y) denotes the grayscale value at position (x, y) of the image, and F(x, y) denotes the binarized value. The steps to select an appropriate threshold T0 are as follows.

Step 1. Set an initial threshold T0 = 127.

Step 2. Each pixel P is assigned to the character data G1 or the background data G2 according to the threshold.

Step 3. Compute the average value M1 of G1 and the average value M2 of G2.

Step 4. Compute the new threshold T1 = (M1 + M2)/2.

Step 5. If T1 ≠ T0, set T0 = T1 and return to Step 2; if they are equal, T1 is the final threshold.

After these five steps, a suitable threshold has been found, and the binarized image is denoted by I.
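The threshold search above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the tolerance `eps`, the iteration cap, and the choice that pixels above the threshold map to 1 are all assumptions, and the update in Step 4 is taken to be the midpoint of the two class means, the usual choice for this iterative scheme.

```python
def to_gray(rgb_image):
    """Average the R, G, B values of each pixel (integer result)."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_image]


def iterative_threshold(gray, t0=127.0, eps=0.5, max_iter=100):
    """Repeat Steps 2-5 until the threshold changes by less than eps."""
    pixels = [p for row in gray for p in row]
    for _ in range(max_iter):
        above = [p for p in pixels if p > t0]    # one class of pixels
        below = [p for p in pixels if p <= t0]   # the other class
        if not above or not below:               # degenerate image: keep t0
            return t0
        m1 = sum(above) / len(above)
        m2 = sum(below) / len(below)
        t1 = (m1 + m2) / 2                       # assumed update rule
        if abs(t1 - t0) < eps:
            return t1
        t0 = t1
    return t0


def binarize(gray, t):
    """Map pixels above the threshold to 1 and the rest to 0 (assumed polarity)."""
    return [[1 if p > t else 0 for p in row] for row in gray]


gray = [[10, 20, 200], [15, 210, 220]]
t = iterative_threshold(gray)
print(t, binarize(gray, t))
```

Whether characters end up as 1 or 0 depends on whether they are darker or lighter than the background, so the polarity in `binarize` may need to be flipped for a given CAPTCHA set.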

2.2 Closed operation

Morphological processing is a form of neighborhood operation. A specially defined neighborhood called a "structural element" is used to perform a specific logical operation on the corresponding region of the binarized image I at each pixel position. The result of the logical operation is output as the corresponding pixel of the output image. The closed operation is applied after image binarization to smooth the outlines of the characters, suppress small noise on the character boundaries, and fill small holes between characters.

In the closed operation, the structural element acts on the input image to produce the output image. The structural element S used to process the binarized image I is usually a relatively small window. The closed operation is a dilation followed by an erosion, which is expressed as follows.

where I denotes the image to be processed and S denotes the defined structural element.

The functions of the closed operation are as follows.
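The expression referred to here is missing from this copy. In standard morphological notation, with ⊕ for dilation and ⊖ for erosion, the closing of I by S is usually written as below; using H for the result follows the symbol the text assigns to the closed operation, and the exact notation of the original is an assumption.

```latex
H = I \bullet S = (I \oplus S) \ominus S
```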

(1) Closed operations can fill small holes,connect adjacent objects,smooth their boundaries,and maintain the position and shape of the object.

(2) It filters image by filling the concave corners of the image.

(3) The effect of filtering depends on the size of structural elements.

After that, the image is clearer and the edges are smoother. The result of the closed operation is recorded as H.
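A small pure-Python sketch can make the closed operation concrete. It assumes a square k × k structural element and clips the window at the image border (pixels outside the image are ignored); both choices, and all function names, are illustrative rather than taken from the paper.

```python
def _window(img, y, x, r):
    """Values in the (2r+1) x (2r+1) neighbourhood, clipped at the border."""
    h, w = len(img), len(img[0])
    return [img[yy][xx]
            for yy in range(max(0, y - r), min(h, y + r + 1))
            for xx in range(max(0, x - r), min(w, x + r + 1))]


def dilate(img, k=3):
    """A pixel becomes 1 if any pixel under the structural element is 1."""
    r = k // 2
    return [[int(any(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img, k=3):
    """A pixel stays 1 only if every pixel under the structural element is 1."""
    r = k // 2
    return [[int(all(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def closing(img, k=3):
    """Closed operation: dilation followed by erosion."""
    return erode(dilate(img, k), k)


# A 3 x 3 ring with a one-pixel hole in the middle of a 7 x 7 image.
ring = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        ring[y][x] = 1
ring[3][3] = 0
closed = closing(ring)
print(closed[3])  # the hole at (3, 3) is filled, the outline is unchanged
```

A larger `k` fills larger holes but also merges characters that sit close together, which is the size dependence noted in item (3).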

2.3 Image quality evaluation standard

Mean Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) are commonly used to measure image quality.

(1) Mean Square Error (MSE)

where p′(i, j) and p(i, j) denote the image to be evaluated and the original image, respectively, and M and N denote the length and width of the image.

(2) Peak Signal-to-Noise Ratio (PSNR)
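Neither formula survives in this copy, so the sketch below uses the standard definitions: MSE is the mean of the squared pixel differences between the evaluated and the original image, and PSNR is 10·log10(peak² / MSE). The peak value of 255 assumes 8-bit grayscale images; the function names are illustrative.

```python
import math


def mse(original, evaluated):
    """Mean of squared pixel differences between two equally sized images."""
    h, w = len(original), len(original[0])
    total = sum((evaluated[i][j] - original[i][j]) ** 2
                for i in range(h) for j in range(w))
    return total / (h * w)


def psnr(original, evaluated, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, evaluated)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


clean = [[0, 0], [0, 0]]
noisy = [[0, 0], [0, 10]]
print(mse(clean, noisy), round(psnr(clean, noisy), 2))
```

A higher PSNR means the evaluated image is closer to the reference, which is how the comparison tables in Section 5.2 should be read.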

3 Conventional median filter

In the two-dimensional case, a digital image with m rows and n columns can be denoted by {f(t, s); t = 1, 2, …, m; s = 1, 2, …, n}. Then the output of the two-dimensional median filter is

where (x, y) denotes a position coordinate and A(x, y) denotes a neighborhood of (x, y), that is, the filter window of size r × r containing r × r pixels. The median filter for image denoising can be implemented by the following steps.

Figure 1: The sliding rule of the median filter window

Step 1. The filtering window traverses the entire image as shown in Fig. 1, with its center placed on each pixel position in turn.

Step 2. Read the gray values of the pixels covered by the window into an array and sort them.

Step 3. Find the median value of the array and assign it to the pixel at the center of the filter window.

Algorithm 1: Conventional Median Filtering Algorithm
Input: image X of size m × n, kernel radius r
Output: image Y of the same size as X
1: for i = 1 to m do
2:   for j = 1 to n do
3:     initialize array A[]
4:     for a = i - r to i + r do
5:       for b = j - r to j + r do
6:         add X(a, b) to A[]
7:       end for
8:     end for
9:     sort A[], then Y(i, j) = med(A[])
10:   end for
11: end for
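For reference, a conventional median filter can be sketched in plain Python as follows. The border handling (clipping the window at the image edge) and the use of the upper median for even-sized clipped windows are assumptions made for this sketch, and Python's built-in sort stands in for the bubble sort the paper describes.

```python
def median_filter(image, r=1):
    """Replace every pixel with the median of its (2r+1) x (2r+1) window."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            # Gather the window, clipped at the image border.
            window = [image[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out


# A flat patch with a single "salt" pixel in the centre.
patch = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(patch))
```

Every window is gathered and sorted from scratch here, which is exactly the cost that Section 4.1 sets out to reduce.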

The conventional median filter has been widely used to eliminate impulse noise and works well at low noise density. However, it does not work well at high noise density, where image details such as thin lines and sharp corners are easily destroyed, leading to image distortion. Furthermore, it uses a predefined, fixed filter window, so all pixels are filtered by windows of the same size. It is therefore hard to obtain good filtering performance when the image is seriously polluted.

4 Adaptive median filtering based on divide and conquer

The median filtering algorithm has been developed over decades and has achieved encouraging results in denoising, but it still has several disadvantages when processing complex images: 1) The denoising performance is affected by the noise density and drops sharply when the noise density is high. 2) Image details such as thin lines and sharp corners may be erroneously eliminated. 3) The filter window is not flexible enough to adjust its size automatically to the image being processed. This paper takes CAPTCHA images as experimental data to analyze the adaptive median filtering algorithm from two aspects: the adaptability of the filter window and the improved denoising effect.

4.1 Quicksort by data correlation

Partitioning into blocks is the essence of quicksort, which is based on divide and conquer. It greatly reduces the sorting complexity and improves the filtering efficiency compared with the conventional sorting algorithm. The method quickly sorts the data of the first filter window. The algorithm is described as follows.

Algorithm 2: Quicksort Algorithm Based on Divide and Conquer
Input: unordered array
Output: ordered array
1: filter window array A[], initialize i = left, j = right
2: if left >= right, return
3: key = A[left], take the first array element as standard data
4: while i < j do
5:   while i < j && A[j] >= key, looking forward from right
6:     j = j - 1
7:   A[i] = A[j]
8:   while i < j && A[i] <= key, looking from left to back
9:     i = i + 1
10:   A[j] = A[i]
11: end while
12: A[i] = key
13: sort(A, left, i - 1), sort the small sequences on the left
14: sort(A, i + 1, right), sort the small sequences on the right
15: return ordered array A[]
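A runnable version of this kind of quicksort is sketched below. It follows the textbook scheme in which the first element is the pivot and the two scans move inward from the right and from the left; the variable names and the recursive signature are assumptions for illustration and are not recovered from the paper.

```python
def quicksort(a, left=0, right=None):
    """Sort list a in place between indices left and right (inclusive)."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return a
    i, j, key = left, right, a[left]   # first element is the pivot
    while i < j:
        while i < j and a[j] >= key:   # scan from the right
            j -= 1
        a[i] = a[j]
        while i < j and a[i] <= key:   # scan from the left
            i += 1
        a[j] = a[i]
    a[i] = key                         # pivot lands in its final position
    quicksort(a, left, i - 1)          # sort the left block
    quicksort(a, i + 1, right)         # sort the right block
    return a


window = [12, 255, 0, 37, 37, 9, 200, 41, 5]
print(quicksort(window))  # the median of the window is the middle element
```

For a 3 × 3 window the median is `window[4]` after sorting.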

The filter window moves one pixel at a time. At each move, a column of elements is removed on one side of the window and a column is inserted on the other, but most elements remain the same. Therefore, we need only consider the influence of the inserted and deleted elements on the median of the previous window rather than compare all the elements, which greatly reduces the sorting complexity.

Assume that, as the filter window moves, one column of elements is inserted on the right side and one column of elements is shifted out on the left side. The algorithm steps are as follows.

Algorithm 3: Sliding Window Pixel Sorting Algorithm Based on Data Correlation
Input: window array A[]
Output: the median med(A[])
1: Initialize A[], sorted by Algorithm 2
2: repeat
3:   Judge whether each inserted element is equal to the element shifted out.
4:   If they are equal, the order of array A[] is unchanged.
5:   If they are not equal, replace the shifted-out element with the inserted element and reorder the new array by Algorithm 2.
6:   Output the median of A[].
7: The filter window continues to move until the entire image is traversed.
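The idea of reusing the previous window can be sketched as follows. Note the substitution: instead of re-running quicksort after each replacement, this sketch keeps the window as a sorted list and uses Python's standard `bisect` module to remove the outgoing value and insert the incoming one. The function name `row_medians` and its arguments are hypothetical, and the sketch only handles windows that lie fully inside the image.

```python
from bisect import bisect_left, insort


def row_medians(image, y, r=1):
    """Medians of all full (2r+1) x (2r+1) windows centred on row y."""
    w = len(image[0])
    rows = range(y - r, y + r + 1)
    # Sort the first window once.
    window = sorted(image[yy][xx] for yy in rows for xx in range(2 * r + 1))
    mid = len(window) // 2
    medians = [window[mid]]
    for x in range(r + 1, w - r):            # x is the new window centre
        for yy in rows:
            out_val = image[yy][x - r - 1]   # leaves on the left
            in_val = image[yy][x + r]        # enters on the right
            if in_val != out_val:            # equal values change nothing
                del window[bisect_left(window, out_val)]
                insort(window, in_val)
        medians.append(window[mid])
    return medians


img = [[1, 9, 3, 7, 5],
       [4, 2, 8, 6, 0],
       [5, 5, 1, 3, 9]]
print(row_medians(img, 1))
```

Each move touches only the 2r + 1 outgoing and 2r + 1 incoming values, so the work per window no longer grows with a full re-sort of all (2r + 1)² elements.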

4.2 Adaptiveness of filtering window

The performance of the median filter is greatly affected by the size of the filter window. There is a trade-off between noise elimination and detail protection: a smaller filter window protects some image details, but its noise filtering effect is weaker. Conversely, a larger window filters noise well, but it blurs the image to some extent. In addition, according to the median filter principle, if the number of noise points in the filtering window exceeds half the number of pixels in the window, the performance of the conventional median filter drops dramatically.

In the filtering process, the adaptive median filter changes the size of the window according to preset conditions, and it also judges whether the current pixel is noise. If the current pixel is noise, the pixel is replaced by the median value. If not, its current value is kept. The adaptive median filter has three functions: 1) filtering salt-and-pepper noise, 2) smoothing other non-impulse noise, and 3) protecting the details of the image as much as possible to avoid thinning or thickening of the edges.

To describe the adaptive median filter algorithm based on divide and conquer, the relevant symbols are defined as follows. The ordered array of the filter window Sr×r can be obtained quickly by the fast sorting method based on divide and conquer. Zmin is the minimum value in the filter window, Zmax is the maximum value in the filter window, Zmed is the median in the filter window, Zxy is the value at (x, y), and Smax is the maximum size allowed for the filter window Sr×r. The formulas are as follows.


Formulas (7) and (8) determine whether the median value of Sr×r is noise, and formulas (9) and (10) determine whether Zxy is noise.
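The four formulas are not present in this copy of the text. In the standard adaptive median filter, which uses the same symbols Zmin, Zmax, Zmed and Zxy, the corresponding quantities are usually written as below; the names A1, A2, B1, B2 and their mapping to equation numbers (7) to (10) are assumptions.

```latex
\begin{aligned}
A_1 &= Z_{med} - Z_{min}, & A_2 &= Z_{med} - Z_{max}, \\
B_1 &= Z_{xy} - Z_{min},  & B_2 &= Z_{xy} - Z_{max}.
\end{aligned}
```

Under this reading, the median is taken not to be an impulse when A1 > 0 and A2 < 0, and the current pixel is taken not to be an impulse when B1 > 0 and B2 < 0.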

Algorithm 4: Adaptive Median Filtering Based on Divide and Conquer
Input: image X of size m × n, kernel radius r
Output: image Y of the same size as X
1: for i = 1 to m do
2:   for j = 1 to n do
3:     initialize array A[]
4:     for a = i - r to i + r do
5:       for b = j - r to j + r do
6:         add X(a, b) to A[]
7:         get the ordered array A[] quickly by Algorithms 2 and 3, where Zmin, Zmed, and Zmax are read from the ordered array
8:       end for
9:     end for
10:     B1, B2 are given by formulas (9) and (10)
11:     A1, A2 are given by formulas (7) and (8); judge whether Zmed is noise or not
12:     if A1 > 0 && A2 < 0, Zmed is not noise
13:       if B1 > 0 && B2 < 0, Zxy is not noise, return Zxy directly
14:       else
15:         return Zmed, output Zmed
16:       end if
17:     else
18:       Zmed is noise, extend the size of Sr×r
19:       if the size of Sr×r is less than the maximum size Smax
20:         extend the filter window size, repeat 4-16
21:       else
22:         return Zmed
23:       end if
24:     end if
25:   end for
26: end for
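The overall control flow can be sketched in Python as below. This follows the standard two-level adaptive median filter rather than the authors' exact procedure: the window is sorted with Python's built-in `sorted` instead of Algorithms 2 and 3, the window is clipped at the image border, and the output when the maximum size is reached is assumed to be the median. The name `adaptive_median_filter` and the default `s_max=7` are illustrative.

```python
def adaptive_median_filter(image, s_max=7):
    """Grow the window from 3 x 3 up to s_max x s_max for each pixel."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            z_xy = image[y][x]
            size = 3
            while True:
                r = size // 2
                window = sorted(
                    image[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1)))
                z_min, z_max = window[0], window[-1]
                z_med = window[len(window) // 2]
                if z_min < z_med < z_max:
                    # The median is not an impulse; keep the pixel if it
                    # is not an extreme value either, else use the median.
                    out[y][x] = z_xy if z_min < z_xy < z_max else z_med
                    break
                size += 2                  # the median looks like noise
                if size > s_max:           # window cannot grow further
                    out[y][x] = z_med      # assumed fallback output
                    break
    return out


img = [[10, 20, 30, 40, 50],
       [20, 30, 40, 50, 60],
       [30, 40, 255, 60, 70],
       [40, 50, 60, 70, 80],
       [50, 60, 70, 80, 90]]
print(adaptive_median_filter(img)[2])
```

Pixels that equal the minimum or maximum of their window are treated as impulses, so a genuinely extreme but clean pixel can still be replaced; this is a property of the standard scheme, not something specific to this sketch.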

The adaptive median filter algorithm based on divide and conquer is innovative in two respects: it greatly simplifies the sorting, and it makes the size of the filtering window more flexible. In particular, the system changes the size of the filter window automatically with the noise density, which greatly reduces human intervention and thereby improves performance.

5 Experimental results and analysis

In order to evaluate the performance of the proposed algorithm, we denoise four kinds of CAPTCHA image experimental data provided by Inspur Group and compare the results with conventional median filtering. The experiment was carried out on a computer with 64-bit Windows, an 8 GHz CPU, and 4 GB of RAM. We first briefly describe the CAPTCHA images used in this experiment. There are 4 types of CAPTCHA images, with 10,000 images in each class.

(1) Expressions of the four arithmetic operations (including numbers and operators, resolution 350×80);

(2) Lowercase English letters and numbers (resolution 200×60);

(3) Slightly overlapped uppercase English letters and numbers (resolution 200×80);

(4) Chinese character strings with complicated geometric structure and uneven stroke thickness (resolution 150×45).

5.1 Results of related preprocessing

Image preprocessing involves multiple steps, and this paper focuses on the filter denoising module. The effects of the three related preprocessing operations are shown in Fig. 2.

Figure 2: Comparison of the effects of related work

In Fig. 2, after the three preprocessing operations (b), (c), and (d), much of the noise in the CAPTCHA images has been removed, especially after the closed operation. Although the contour is clear, problems such as line noise and blurred strokes remain in the image details, which still need to be filtered.

The results of applying the two filtering algorithms to CAPTCHA images are shown in Fig. 3.

Figure 3: Comparison of two filtering denoising algorithms

From Figs. 3(a) and 3(b), we can see that the proposed algorithm denoises better than the conventional median filtering algorithm. In the first category of CAPTCHA images, the strings are not tilted, the characters do not touch, and the overall contour is clear. Both algorithms perform very well, and the adaptive median filtering algorithm removes the noise completely.

For the second category, the outline of the characters is clear and regular. The proposed algorithm not only denoises well but also preserves fine stroke information. Although the conventional algorithm removes more noise, it does so by destroying image information, which is clearly undesirable.

For the third category, although the overall outline of the characters is clear, the characters are slightly stuck together and tilted. Both algorithms perform very well overall, but the proposed algorithm smooths the images better.

For the fourth category, the geometric structure of the Chinese characters is complicated and the thickness of the strokes is uneven, which leaves the image slightly blurred after the related preprocessing. Even under these poor conditions, the proposed algorithm still performs well. Compared with the conventional algorithm, it destroys less image information and preserves the main information of the image as far as possible.

5.2 Performance comparison

Since the goal is to improve the CAPTCHA recognition rate, two evaluation criteria are used: one is the peak signal-to-noise ratio, and the other is the recognition rate of the CAPTCHA image by a tiny convolutional neural network.

Table 1: Comparison of PSNR by two filtering algorithms

From Tab. 1, the image quality obtained by the adaptive median filtering algorithm based on divide and conquer is better than that of conventional median filtering, especially for the fourth category.

Table 2: Comparison of PSNR by proposed algorithm and closed operation

From Fig. 2, the outline of the CAPTCHA image after the closed operation is very clear. Nevertheless, the experimental results in Tab. 2 show that adaptive filtering is still necessary, since it improves the image quality.

CAPTCHA image denoising aims to improve the recognition rate, and recognition here uses a state-of-the-art machine learning algorithm, the convolutional neural network. Network models are trained separately for the different kinds of CAPTCHA image so that each suits the characteristics of its category. The trained models then recognize the differently filtered images to obtain the recognition rates.

The first three types of CAPTCHA image include only numbers, operators, and letters. Therefore, 7000 images from these three categories were randomly selected for preprocessing, and then the convolutional neural network model was trained. The remaining 3000 images were processed by the two different filtering methods and input into the trained model to compare the recognition rates.

The fourth type of CAPTCHA image includes only Chinese characters, so its network model structure must differ from that of the first three types. After preprocessing, 7000 images were randomly selected and input directly to the network model for training. Since rotated Chinese characters yield distinctive features in the feature tensor generated from the whole image, the model can recognize Chinese characters correctly. Then, the remaining 3000 images were processed by the two different filtering methods and input into the trained model to compare the recognition rates.

Table 3: Comparison of recognition rates by various algorithms

From Tab. 3, images processed by the adaptive median filter algorithm based on divide and conquer show a significant improvement in recognition rate. In the experiment, the strokes of the first and fourth types of CAPTCHA are clear, so the two algorithms have broadly similar effects on the recognition rate. However, for the second and third types of CAPTCHA, which have blurred strokes and handwriting-like characters, the adaptive median filter algorithm based on divide and conquer achieves a much better recognition rate than the conventional median filter. Although the algorithm performs very well on CAPTCHA images of the second and third categories, the recognition rate still cannot reach 100%, which is one of the next research directions.

6 Conclusions

This paper proposes an improved median filter algorithm based on divide and conquer. On the one hand, the filter window automatically adjusts its size according to the noise density to achieve good denoising performance and preserve the information in the images as far as possible. On the other hand, the algorithm quickly calculates the median value of the first filter window by the divide and conquer method and then, making full use of data correlation, calculates the median values of the remaining windows in turn without sorting all the data, which greatly improves the filtering efficiency. Although the algorithm achieved good results in the experiment, its denoising effect is poor for images with blurred writing and oblique strokes, which leads to poor recognition performance. In future work, we will further improve the filtering performance under higher noise intensity and severe distortion.

Acknowledgments: This work is supported by the National Natural Science Foundation of China (No. 61772561), the Key Research & Development Plan of Hunan Province (No. 2018NK2012), the Postgraduate Research and Innovation Project of Hunan Province (No. CX2018B447), the Postgraduate Science and Technology Innovation Foundation of Central South University of Forestry and Technology (20183027), and the Key Laboratory for Digital Dongting Lake Basin of Hunan Province.
