• <tr id="yyy80"></tr>
  • <sup id="yyy80"></sup>
  • <tfoot id="yyy80"><noscript id="yyy80"></noscript></tfoot>
  • 99热精品在线国产_美女午夜性视频免费_国产精品国产高清国产av_av欧美777_自拍偷自拍亚洲精品老妇_亚洲熟女精品中文字幕_www日本黄色视频网_国产精品野战在线观看 ?

    Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning

    2022-07-08 01:38:44AhmadHusseinAbabneh

    Ahmad Hussein Ababneh

    Abstract—Training and testing different models in the field of text classification mainly depend on the pre-classified text document datasets.Recently, seven datasets have emerged for Arabic text classification, including Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004.This study investigates which of these datasets can provide significant training and fair evaluation for text classification(TC).In this investigation, well-known and accurate learning models are used, including naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machines (SVM), and logistic regression (LR) models.We present relevance and time measures of training the models with these datasets to enable Arabic language researchers to select the appropriate dataset to use based on a solid basis of comparison.The performances of the five learning models across the seven datasets are measured and compared with the performances of the same models trained on a well-known English language dataset.The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time,with the accuracy of 82%.

    1.Introduction

    Text classification (TC) is defined as a text mining process that aims to specify the class or category of a given textual document.The output of such a process can be viewed as a set of classes; each of them contains a set of text documents that represents a specific topic or kind.Single-label TC provides a single label for each document, whereas multi-label TC may generate multiple labels for each document[1].

    The massive amount of texts found on the web and the amount of texts that is being loaded every second necessitate a classification model that can classify each document into its appropriate category, facilitating tasks in other fields, such as natural language processing and information retrieval tasks[2].The importance of TC in other text mining fields has been deeply emphasized by Elnagaret al.[1],[3].

    The training and evaluation of newly developed models in the field of TC mainly depend on the existence of a pre-classified dataset or corpus of text documents[4].A large number of datasets based on the English language have been evaluated for developing TC models using various statistical learning models.In [5]Zhanget al.listed these datasets with their respective statistics, including the number of text documents contained in training and testing sets, the number of classes, and accuracy obtained using several supervised and unsupervised learning models on these datasets.The construction of datasets for the TC research purpose for the Arabic language, which is at the core of this research, is still in its early stages.However,promising research in this field has recently emerged, creating new datasets, such as Single-Label Arabic News Articles Dataset (SANAD), NADiA, and KALIMAT.

    In the field of TC, selecting appropriate datasets is typically a major challenge; thus, researchers should have clear understanding of the data on which a model will be trained and tested to obtain unbiased and significant results[5].It is critical to evaluate the existing datasets and assess whether these datasets were prepared according to a scientific approach.Well-prepared datasets imply consistent judgment across different TC models.

    From our viewpoint, two important factors should be considered when assessing the effectiveness and usefulness of a dataset:

    1) Experiments should be conducted on the dataset using successful machine learning (ML) models in the field of TC.In general, supervised TC models have achieved significant results and have been used in this field since 1968[6].Support vector machines (SVMs), logistic regression (LR),K-nearest neighbor (KNN), etc.have been used intensely in this field.A lot of studies, with various ML models, achieved consistent results[7]-[12].Thus, a new dataset, with its training and testing set, should produce comparable results if it is processed using such pre-trained ML models.

    2) The dataset should be compared with the existing datasets.This comparison includes the application of uniform TC models with uniform implementation constraints.In addition, it includes standard accuracy measures that have been used to evaluate TC models, such as the precision, recall,f1-score, macro_avg,and weighted_avg.Furthermore, a heat map could be constructed for each run to provide a clear picture of the behavior of classifiers over each dataset.

    In this paper, we focus on assessing the quality of Arabic language datasets that have been constructed for training and evaluating TC models developed especially for single-label Arabic TC.The Arabic language is imperative because it is used for more than 450 million people and the religious language for all Muslims worldwide[13].Furthermore, the Arabic language has unique features that make the text processing of the language such a challenging task.These features include 1) the inflectional and derivational nature of the language, 2) the lack of capitalization and short vowels, 3) the existence of different variations of the language in use—the modern, classical, and colloquial, and 4) the permission of the construction of a sentence with an implicit subject[14].

    Several Arabic language datasets have been created for training the newly developed TC models.These datasets include SANAD, Khaleej, Arabiya, Akhbarona[15], KALIMAT, a multipurpose Arabic corpus[16],Waten2004[17], and Khaleej2004[17].There are other datasets, but these datasets do not contain a reasonable number of text documents.This study presents a comprehensive guide for researchers in the field of Arabic TC on the best and reliable datasets to use for model training and evaluation.This research aims to investigate existing Arabic datasets that can provide significant model training and fair evaluation of the models for TC.To achieve this aim, several classifiers have been developed and implemented.These classifiers are based on well-known and tested supervised TC methods.These classifiers are tested on the previously mentioned Arabic datasets.The analysis also includes a comparison with an English dataset that has been used in several research studies[18]-[22].

    After introducing the main problem and explaining our aim in Introduction, the remaining sections of the paper are structured as follows.Section 2 describes the learning models used in our experiment.Section 3 illustrates a general framework and highlights steps performed before, during, and after training the models.Section 4 evaluates the models’ training relevance results and highlights the main findings.Finally, Section 5 draws the main conclusion.

    2.Text Classification Tasks

    Both the supervised and unsupervised ML approaches have been examined in the field of TC[6],[23],[24].The difference between supervised and unsupervised learning lies in labeled data.Supervised learning models use labeled data that represent the models’ knowledge repository, and then use this knowledge to make future predictions.Simply, we teach a model with a sufficient amount of data and then make the model for future predictions.For unsupervised learning models, data are unlabeled, and the models are not aware of any information about the classes or categories in the data; these models attempt to find meaningful patterns in the data[24].

    In this paper, learning models in our experiments include a multinomial naive Bayes (NB) classifier, an SVM classifier, a random forest (RF) classifier, a KNN classifier, and an LR classifier.These classifiers adopted the supervised learning approach, and we have explained the reason for this choice in Introduction.The following subsections explain the five models.

    2.1.NB Model

    The NB classifier is one of the simplest classification learning models but has achieved reasonable accuracy[25],[26].Experiments have shown that it is comparable to advanced models, such as SVM models[7].Furthermore, it is considered an efficient learning method[27], and, as Jianget al.mentioned in [28], NB is among the top ten data mining algorithms due to its efficiency and simplicity.

    NB is a family of models.They use the NB assumption, given pairs of datais a point in ad-dimensional real space,andyi∈Y, whereY={1,2,…,m} is a finite set of classifications.A probabilistic model or a joint distribution betweenxandycan be built under the following assumption:

    Assume a family of distributionsPθf(wàn)orx(i)∈Rdandyi∈Y, such that

    In TC,xandycan be expressed as documents and their classes or words.x-values represent features, and these features may be represented by numerical values, such as frequency.Regardless of the numerical representation of features, the fundamental assumption in NB is the conditional independence[27]:

    If we have anotherx∈Rd, we need to predicty.We need to estimateθf(wàn)romD(e.g., using the maximum likelihood); then, we could computeyfrom the equation:

    From the NB rule:

    The denominator does not depend ony; we will omit it, and if we return to the conditional assumption, we can rewrite the previous equation as follows:

    2.2.RF Model

    RF is identified as one of the most popular and powerful ML algorithms that can perform both classification and regression tasks[29],[30].It is an ensemble ML approach, with a divide-and-conquer approach that combines multiple learning models to form one model to improve the performance[1].Thus, as the name suggests, the model creates several decision trees as a boost of a single tree in a cut model.The model builds up these multiple decision trees using an algorithm, such as the information gain or the Gini index approach[31].In general, the more trees in the forest, the more robust the classification, resulting in higher accuracy[32].

    In RF, to classify a new object based on attributes or features, each tree provides its prediction, and the model chooses the classification that has the most votes (the majority voting) of the trees in the forest.These majority votes then help rectify the high variance which means that a model makes a poor prediction when the training dataset is changed and generally appears in a single decision tree model[31].Thus, RF has a low bias, because it is based on decision trees, and has low variance.It uses the bagging technique (the bagging technique is an ensemble ML approach), which combines multiple decision trees to form a single model.The low bias and variance reduce the overfitting problem, which happens when a trained model performs well with training data but provides inaccurate predictions with the testing data[33].

    Complete mathematical analysis of RF can be found in [32].Assume the number of cases in a training set isN(or the number of items in the training data).Then, samples fromNcases will be taken at random but with replacement.If there areMinput variables or features, a numberm<Mis specified such that at each node,mvariables are selected at random out ofM.The best split ofmis used to split the tree node.The value ofmis kept constant, while the forest is grown.Each tree is grown to the largest extent possible, and there is no pruning.We can predict new data by aggregating the prediction ofktrees[30].

    D={x1,x2,…,xn},xi∈Rdyi∈YY={1,2,…,m}

    Given a training dataset and specified classes , where ,selectkrandom samples with replacement fromD, and build or fitkdecision trees, as follows:

    Select random samples, with replacement, fromDandY; name them asDiYi.

    FithionDiYi, wherehiis a decision tree learner that was constructed from theDirandom sample.

    After training, classifying newx′into the appropriate class (or merely obtaining thevalue) can be made by taking the majority vote from all individual decision trees onx′:

    Besides, the low bias and variance advantage of the RF classifier implies that the classifier takes less time for training and testing than other supervised learning models[30],[31].This is rational, because RF belongs to the low bias family, which reduces the number of assumptions made by RF to fit a model, and it needs a maximum of two tuning parameters[32].

    2.3.KNN Model

    KNN is a powerful classification model in the multidimensional classification space that provides competitive accuracy results[34]-[38].The estimation of the classification assumes that neighbors always share common features and, consequently, common classes.

    KNN assumes that in thed-dimensional space, the closer the two objects, the more likely they belong to the same class.The closeness could be depicted as the distance between the objects or simply between points.The distance can be computed using the Euclidean, Hamming, Manhattan, or Minkowski methods.The closeness could also be seen as the similarity between the objects, and this similarity can be computed using one of the similarity computation methods, such as the cosine similarity[38].

    KNN is a classical ML algorithm that has theoretical guarantees: Asnor the size of dataDbecomes huge,there is some theoretical guarantee about the performance of KNN[36].

    Given a training datasetD={(x1,y1), (x2,y2),…,(xn,yn)},xi∈Rdand a finite set of classesyi∈Y,whereY={1, 2,…,m}, there is some distance metric, such as the Euclidean distance, which measures the distance between two points in thed-dimensional space such that

    wherexi,xj∈Rd,xi=(xi1,xi2,…,xid), andxj=(xj1,xj2,…,xjd).In KNN, if we consider the distance between the newxcalledx′and its closest points (point) as the similarity measure, the class ofx′will be

    If we consider the similarity betweenxiandxj, wherexiis a vector of features, such that=(xi1,xi2,…,xih)and=(xj1,xj2,…,xjl),xi,xj∈Rd, is[2]

    we can generalize the similarity principle and reform (3) to be

    KNN also has a probabilistic formulation[34]as follows.A random variabley:

    whereP(y∣x,D) is the fraction of pointsxiinNk(x), andNk(x) is thek-nearest points toxi.

    The predicted class ofx′is

    2.4.SVM Model

    SVM is a learning model that separates data into classes by creating balanced hyperplanes in a highdimensional space.The idea is to find hyperplanes that maximize the distances between the edges of the classes.SVM gradually seeks to find the maximum empty area between any pair of classes, which is represented as the margin that spans from the central line that separates two classes, to the boundaries of the classes[12].In a multi-classification model, SVM creates a set of hyperplanes and then chooses a subset of them that maximizes the separation margins[26].

    SVM is considered a robust and accurate classification model.This judgment is based on a large number of research studies in different fields, which used SVM and achieved significant results[39]-[46].For the Arabic language, Qadiet al.[11]experimented with 10 supervised learning models, wherein using the SVM model was able to obtain the most significant performance result.A complete mathematical formulation of SVM can be found in [47], but a brief and summarized explanation is essential.

    To illustrate the optimization behind the SVM model, we start from a simple linear model for the binary classification.Given thek-dimensional vectorsx(1),x(2), …,x(n), the training set, and input vectors=(x1,x2,…,xd), wheredis the number of features,xi∈Rd, andyis the classes’ labels,y∈YandY∈{1, -1}, the classification function is

    SVM depicts the data as instances that are embedded in ad-dimensional space and finds a weighted vector=(w1,w2,…,wd),wi∈Rd, such that the linew·x=θ, where operator “·” means the dot product,separates the boundary of two classes (as shown inFig.1).

    Representing a text document in ad-dimensional space means that each distinct word (feature)establishes a dimension, and the document is a point in this space; the classification is performed by creating a separation line, such that all points that fall on one side of the line belong to class 1, while the points which fall on the second side belong to class 2, as shown inFig.2.

    Fig.1.Binary linear classification.

    Fig.2.Text documents on the feature space.

    Now, to define the SVM classifier, we can update the idea of the linear classification.FromD,we have pre-labeled examples (training dataset),,y∈Y, andY∈{1,2,…,m}.Using the same technique in the linear classification, the dot product of the weighted vectorwand the vectorxis.

    In linear classification, multiple lines can exist and separate their classes, but SVM seeks to find the best line that divides data points.To define the best line, SVM considers the distance from the separating line that tells us how confident a prediction is[48].Thus, SVM searches for the line that has the largest margins.

    Define the margin 2γas the length of the empty area between the boundaries of the classes, and the best line is located at the middle of this area, as shown inFig.3.

    Fig.3.Largest margins in SVM.

    Then, the distance betweenAandLis

    Fig.4.Computation of γ.

    AndMis onL, so the=-b, then

    This means that the distance or the margin betweenAandLis the vectorw, which is perpendicular to lineL, multiplying by the coordinates ofAplus the offsetb,γ=(w·A+b)y.So, we want to findwsuch that the smallest marginis as large as possible[47]:

    and this leads to the initial optimization

    such that

    In the previous definition ofγ, the longer thew, the bigger the margin will be; thus, we need to normalizewby its length:

    where

    As shown inFig.5, lineLis defined by a few points that are located on the boundary (not by all points in the space).This means that we can ignore most of the points and focus on points that defined the line (these points are called the support vectors).Let us also require support vectorsxito be on the plane defined by

    Fig.5.Relationship between γ and the support vectors.

    To findγ, we need to find the relation between the support vectors: Assume thatx1is the support vector on one side of lineLandx2is the vector point on the other side ofL,

    We can substitute the value of x1into the equationw·x1+b=1, and we obtain

    Now, the equation w ·x2+b=-1, because it is on the other side ofL:

    since

    This implies that maximizingγcan be expressed as follows:

    which is equivalent to

    Therefore, (7) can be rewritten as

    such that

    So far, we implicitly assume that our data are linearly separable, and it is always possible to find the line that entirely separates the classes.However, most real datasets are noisy, which means that no such separable hyperplanes exist[49].Dealing with such data, where finding out a robust separator seems impossible and this almost happens in every dataset, implies to change our formulation and introduce the penalty parameter:

    whereCis a control variable that can be set using cross-validation.

    The previous equation means that the line should have a maximum margin and makes a small number of mistakes.However, the mistaken points do not have equal severity[49], and we can penalize them using the slack variableξ, as shown inFig.6.

    Fig.6.Separated line with slack variable ξ.

    with the constraint

    A new optimization problem appears in which our goal is to find the largest margin and the smallest sum of the penaltyξi:

    2.5.LR Model

    LR is one of the most widely used classification models[50],[51].Even the name contains the word“regression”, LR is mainly used for classification problems, because it predicts categorical values, not continuous ones.

    In LR, we model our data with a linear functionf(w,b)=wx+b, wherewis the weight andbis the bias,and this produces continuous values.In LR, we seek to covert continuous values to the probability.LR models the probability using the sigmoid function[52]:

    This prediction outputs a probability value between 0 and 1.

    In the approximation functionhθ(x), we have to define thew(weight) andb(bias) using a specific cost function, such as the cross-entropy[52]:

    LR optimizes the cost overwandbusing the gradient descent optimization strategy.It starts at some point and iteratively updates the parameters (wandb) through the calculation of the derivative and then follows the direction of the derivative until reaching the global minimum.In each iteration, the parameters are updated in small steps called the learning rateη.The gradient descent then updateswandbusing the following rules:

    Furthermore, the derivatives can be performed using the following equations[51]:

    3.Structural Framework

    This study’s goal is to measure the relevance of existing Arabic language datasets in the context of TC tasks using well-known and widely used supervised learning models.Therefore, we construct a complete experiment that includes the data collection of the existing datasets, text preprocessing, model implementation, and model performance and analysis.The framework established in this experiment is depicted in Fig.7.

    3.1.Experiment Datasets

    Fig.7shows all the steps performed during the implementation of this experiment.Note that the starting point is to collect datasets used and proposed in the field of TC for the Arabic language.As we stated in Introduction, the selection is based on the number of documents included in the datasets and on the number of classes into which the documents are classified.Therefore, many datasets have been excluded.Datasets selected in the experiment and their features are presented inTable 1.

    Table 1: Features of the Arabic language TC datasets

    Fig.7.Experiment framework.

    InTable 1,seven datasets have been chosen and split into training and testing sets.These datasets have been recently proposed in TC as datasets for training and evaluating newly developed models for the Arabic language[3].Table 1shows that the number of documents in both training and testing sets in these datasets varies, ranging from 5690 documents in Khaleej2004 to 110900 in SANAD.

    The SANAD dataset is the largest because it contains three merged datasets: Khaleej, Akhbarona, and Arabiya.Therefore, we process SANAD as one dataset, and as three datasets by splitting it into three parts.Table 1shows the connection between SANAD and the three datasets, and we refer to them as SANADKhaleej, SANAD-Akhbarona, and SANAD-Arabiya inTable 1, but throughout the paper, they will be referred to as Khaleej, Akhbarona, and Arabiya.We process SANAD in this way to determine whether we can use one of the three parts individually as training and testing sets, especially when they contain sufficient documents.These individual examinations of Khaleej, Akhbarona, and Arabiya have an efficiency effect, since there are significant differences in the size of SANAD as a whole and the sizes of its parts.

    The other three datasets are KALIMAT, Waten2004, and Khaleej2004[15].These datasets are constructed in folders; each folder represents a class (culture, sport, religion, etc.).KALIMAT, Waten2004, and Khaleej2004 datasets are originally not split into training and testing sets, but for consistency with the SANAD’s datasets, we split them according to the splitting ratio used in SANAD.

    For comparison, we use another dataset called “20_newsgroup dataset” that is built in English and has been used to the experiment with new classification models[18]-[22].Another reason that motivates us to use this dataset is the number of documents it contains (18846 documents), as it contains a moderate number of documents (most English language datasets have a huge number of documents, see [5]), it is comparable to the seven Arabic datasets shown inTable 1.

    3.2.Dataset Preprocessing

    The general preprocessing operations have been performed, which include the following:

    1) Tokenization: This operation tokenizes a text and represents the text as a set of words or tokens.

    2) Stopwords removal: This operation includes the removal of any kind of speech that is not a noun or verb.The list of distinct stopwords for the Arabic language contains more than 400 terms[13].

    3) Stemming: The process of removing prefixes and suffixes of a token.The stemming operation is essential and has the positive efficiency and performance effect[14].

    3.3.Learning Model Training

    After preprocessing the text documents, the documents are then split into training and testing documents,as shown inTable 1andFig.7.Learning models are trained in the supervised learning context using the dataset to generate the required learning models.Five learning models are developed based on each training dataset.For example, the experiments developed five learning models for SANAD using five learning approaches described in Section 2.This allows us to measure the performance of each learning approach on each dataset and provides a way to measure how the learning approach suited each dataset.So, the total number of models that are learned from the eight datasets (including the English language one) is 40.

    Python 3 is used to develop these models, and we use a machine with the 8-GB random access memory(RAM) and Intel(R) core? i5-7200U CPU@2.50 GHz 2.71 GHz processor.This piece of information is important, so that readers can relate to the execution time analysis that appears in the evaluation section.

    3.4.Outputs of Learning Models

    The classification process is initiated using the 40 training models, and classification results are collected.Each document will automatically classified into a certain category based on the learning model.The outputs of the learning models are a set of categories.The grouped documents share certain criteria based on the learning method used to perform the classification process.Therefore, for the 8 datasets, we have 40 classification output sets (because each dataset was used to train 5 classification methods).The automatically generated output takes the following quadruple shape:

    The quadruple (SVM, Arabiya, 103, Politics) means that the SVM model classifies document 103 in the Arabiya dataset into the Politics category.

    This representation helps grasp the automatically generated category of the document using different learning models.For example, the same document (103 in Arabiya) has been classified by KNN into the Finance category, and this fact can be easily discovered by processing the quadruple (KNN, Arabiya, 103,Finance), which means that the KNN classifier classifies document 103 in the Arabiya dataset into the Finance category.Since document 103 talks about a loan from the International Monetary Fund to Egypt, the two classification results seem both reasonable.

    The output sets of all experiments are large; they equal the number of documents in the testing sets multiplied by 5.However, the important thing is to measure the classification accuracy and relevancy of each learning method on each dataset.This means that the automatically generated classifications should match the classifications obtained using the testing sets in a significant ratio.The well-known accuracy and relevancy measures in the field of TC are described in the next subsection.

    3.5.Accuracy and Relevance Measures

    The testing phase is established using the testing set of each dataset.The testing set contains preclassified documents (actual classes), and then the learned model predicts a class for each document.Important relevance measures are used to measure the match performance between the pre-classified classes and predicted classes.The performance metrics used in this study include the precision, recall,f1-score, macro_avg, weighted_avg, accuracy, and confusion matrix.For clarity, we assume the following:

    1) True positive: The number of documents that are classified into classciand should be inci.

    2) False positive: The number of documents that are classified into classciand should not be inci.

    3) True negative: The number of documents that are not classified into classciand should not be inci.

    4) False negative: The number of documents that are not classified into classciand should be inci.

    5) Support: The actual number of documents in each class (taken from the testing dataset).

    The relevance measures include

    The confusion matrix has also been used in the evaluation, which tells us if the classification model did right or wrong[38].The confusion matrix has been depicted in a heat map form to ensure clarity for readers.The confusion matrix takes the following form:

    wherexdenotes the number of documents classified into Class 1 and predicted by the learning model as Class 1, whereasyrepresents the number of documents classified into Class 1 and predicted by the learning model as Class 2.Thus, the main diagonal represents the learning model’s correct classification (the actual and the predicated match).

    4.Evaluation and Discussion

    The relevance measurement outcomes for each learning model on the eight datasets are collected for further analysis and discussion.The aim is not to choose a particular dataset or model; the aim is to show,using empirical experiments, the features of each model over the Arabic language datasets shown inTable 1.

    4.1.Accuracy Measure

    The final accuracy results are displayed in Fig.8.Note that the accuracy of the LR and SVM models is higher than that of the other models on all datasets, and it shows that all learning models, except the RF model, obtain the highest accuracy on the English language dataset.The accuracy of all models on the Khaleej and Arabiya datasets is convergent and is higher than the accuracy on the other Arabic language datasets.In this study, all learning models obtain the lowest accuracy on the Waten2004 dataset.

    4.2.Execution Time

    The accuracy of the five classifiers is supplemented with the time (in seconds) taken to train and develop each model.Table 2shows the time in seconds required to develop each model over the eight datasets.The horizontal view represents the time according to the datasets,whereas the vertical view represents the time according to the learning models.Table 2shows that both the LR and SVM models consume a considerable amount of time to be able to achieve the final accuracy, as shown inFig.8, whereas the KNN model consumes the lowest time.

    Fig.8.Accuracy results of the five learning models over the eight datasets.

    Table 2: Time in seconds required to build each model using the eight datasets

    The vertical view of Khaleej and English corpus columns shows the convergence between the behavior of the five models on Khaleej and the English language corpus, respectively.And note that the time consumed by the LR and SVM models on the SANAD dataset is 439.73 s and 1942.84 s, respectively.When we compare this to the accuracy of these models over the SANAD dataset inFig.8, we see that the increase in the time penalty comes with no improvement in the accuracy level.

    4.3.Relevance Measures

    The time execution provides an idea of how long it takes to run each model over each dataset with the hardware specifications mentioned in subsection 3.3; it will not be a critical factor in the experiment, because the datasets have different sizes.Therefore, we explore the relevance measures, because they are important to determine which datasets contain sufficient semantic information that can yield a reasonable classification result.

    4.3.1.Macro Average and Weighted Average

    The accuracy curve inFig.8is supplemented byFigs.9to14, which also measure the relevance of each model over the eight datasets.InFigs.9to 14, the datasets are placed on thex-axis, while they-axis represents the relevance values (precision, recall, etc.).The presentation in this way allowed us to see the behavior of each model on the eight datasets separately as well as the behavior of the models in one picture.The salient points in these figures can be summarized as follows:

    1) InFigs.9to14, two peaks appear; one at Arabiya, which spans to Khaleej, and one at the English language dataset.This note provides a positive indication about the possibility of using Khaleej and Arabiya as training datasets for TC tasks.For all learning models, these two datasets offer a productive environment that leads to significant relevant results.

    Fig.9.The macro_avg of precision values of five learning models over eight datasets.

    Fig.10.The macro_avg of the recall values of five learning models over eight datasets.

    Fig.11.The macro_avg of the f1 score values of five learning models over eight datasets.

    Fig.12.The weighted_avg of precision values of five learning models over eight datasets.

    Fig.13.The weighted_avg of the recall values of five learning models over eight datasets.

    Fig.14.The weighted_avg of the f1 score values of five learning models over eight datasets.

    2) InFigs.9to14, the curves are noted to decrease toward the SANAD dataset, which means that the learning models obtained smaller relevance values, except for the NB model, as shown inFig.9; however,the NB model shows unstable behaviors, as shown inFigs.9to14.This comment is valuable because it is mentioned before that SANAD is a collection of three datasets, that is, Arabiya, Khaleej, and Akhbarona, and when we process these three datasets separately using the learning models, we have obtained significant results.The question is why the fusion of these three datasets reduces relevance measures, which seems illogical because ML models should benefit from an increase in the dataset size.The answer to this question relies on the Arabic language.As we mentioned in Introduction, Arabic is a semantically rich language with a massive number of distinct terms.Although all Arab countries speak the same language, Arabic, each has its set of terms that are commonly used in its media.This variation in terms used from one country to the next impacts the statistics used by the learning model.

    3) InFigs.9to14, the models show the lowest relevance on the Waten2004 and KALIMAT datasets except for the NB model; the reason for this comes from the categories chosen.Both of them contain International and Local categories, which contain International or Local news data.Both the Local and International categories contained documents that could be predicted or classified into another category.For example, if the document processed refers to an international sports event, there is a good chance that the learning model will classify this document into the Sports category rather than the International category.

    4) According to relevance curves inFigs.9to14:

    ? The KNN and RF models show a similar trend, and their relevance scores are confined between 50%and 60%.

    ? The SVM and LR models show a convergent trend, and their relevance scores are confined between 70% and 85%.

    ? The SVM model obtains the most significant scores, which is consistent with the results shown inFig.8.

    ? The NB model shows significant results on Akhbarona, Arabiya, Khaleej, SANAD, and the English language corpus.

    4.3.2.Confusion Matrix and Heat Map

    The confusion matrix provides a clear picture of how the learning model predicted or classified the documents.Table 3represents the confusion matrix of the SVM model on Waten2004.The number of categories is six (Culture, Economy, International, Local, Religion, and Sports).The bold and underlined values are the numbers of documents that the SVM model classified correctly based on the actual classification.

    Table 3: Confusion matrix of SVM on Waten2004 dataset

    Confusion matrix interpretation is crucial because it provides information about the relationship between the categories.For example, 307 documents of 656 are classified into the Culture category, and they are preclassified in the testing set of Waten2004 in the Culture category, whereas 349 (33 + 30 + 80 + 133 + 73)documents that should be in the Culture category are incorrectly classified into other categories.Note that 133 of the documents that should be classified into the Culture category are classified into the Religion category, which means that the SVM model found relationships between the documents that belong to both categories.The confusion matrix can also reveal a problem in the classification process.The International category has 149 (33 + 27 + 37 + 21 + 31) misclassified documents, whereas the Local category has 296 (83+ 90 + 17 + 49 + 57) misclassified documents.We clarified the classification problem of International and Local categories when we analyzedFigs.9to 14, and the confusion matrix asserts our clarification.

    As we find that the SVM model has obtained the most significant relevance results, we present the confusion matrices of the SVM model on the eight datasets.The confusion matrices are presented in the heat map form to facilitate the trace of the performance of the classification process, as shown inFigs.15to21.

    Fig.15.Confusion matrix of SVM experiment on Akhbarona dataset.

    Fig.16.Confusion matrix of SVM experiment on Waten2004 dataset.

    Fig.17.Confusion matrix of SVM experiment on Khaleej2004 dataset.

    Fig.18.Confusion matrix of SVM experiment on Arabiya dataset.

    Fig.19.Confusion matrix of SVM experiment on KALIMAT dataset.

    Fig.20.Confusion matrix of SVM experiment on SANAD dataset.

    InFigs.15to22, the color brightness of a cell represents the number of documents in that cell; the brighter the color, the larger the number of documents.The scale on the right of each figure shows where the color changes.Based on this fact, the classification using Arabiya (Fig.18) seems to be the most significant followed by SANAD (Fig.20) and Khaleej (Fig.21).In these three figures, the diagonal appears, which gives the positive indication of the classification quality.Moreover, the trend of the classification inFigs.18,20, and 21 is similar to the trend inFig.22, which represents the classification using the 20_newgroup dataset.Based on classifications represented inFigs.15to 17, and 19, we can see that brighten cells are not located on the diagonal parts of the maps, which could mean that the number of predicted documents that does not match the number of the actual classification is large.

    Fig.21.Confusion matrix of SVM experiment on the Khaleej dataset.

    Fig.22.Confusion matrix of SVM experiment on the 20_newsgroup dataset.

    4.4.Main Findings after Analysis

    The relevance measures and time constraints should be used to determine which of the seven datasets should be used to train a classification model.In this paper, we aim to provide researchers with the required relevance measures upon which researchers can give their decision.Accordingly, we highlight the following findings from our experiment:

    1) For the five learning models, the use of KALIMAT, Waten2004, and Khaleej2004 as training datasets has produced insignificant relevance results (e.g., for the SVM classifier, the accuracy rates using the datasets are 67%, 66%, and 70%, respectively (Fig.8)).Thus, there is no need to consider the time penalty.

    2) The use of SANAD as a training dataset provides reasonable classification (Fig.20), but the time penalty is high (Table 2).The time required to train the SVM model with the SANAD dataset is ten times that required to train with the Khaleej dataset and 42 times that required to train with the Arabiya dataset.

    3) The use of Arabiya and Khaleej, which are complete datasets that are fused to form SANAD, as training datasets for the five learning models, provides significant relevance results (Figs.8to14) with an acceptable time penalty (Table 2).The advantage of the Arabiya dataset is that it provides the most significant classification, and this is shown in the confusion matrix representation.However, the disadvantage is the fewer number of categories; Arabiya has only five categories (Fig.18).On the other hand, the Khaleej dataset has more categories, but the relevance score is the second-best (Fig.21).

    4) The time required to process Akhbarona is less than that required to process SANAD (Table 2).Therefore, Akhbarona can be used instead of SANAD in training a learning model.

    5) In the seven Arabic language datasets, the number of categories is small (Figs.15to21)—the maximum is seven categories.In contrast, the English dataset 20_newgroup contains 20 categories with 7 subcategories of the Computer category (Fig.22).The categories of the seven Arabic datasets are general and each can contain several subcategories.The detailed categories of the English dataset help in increasing the accuracy of the learning process.A future challenge is to create a dataset that contains a considerable number of categories and classify the documents into these categories in a consistent way that improves the learning strategy to develop a learning model with an accurate classification.

    5.Conclusion

    In the paper, we trained five well-known and widely used learning models using seven recently published Arabic language datasets for TC tasks.The SANAD, Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004,and Khaleej2004 datasets were studied.This study presented all relevance measures that are necessary for the researchers to decide which of these datasets provides the best training and testing environment for TC of the Arabic language text.The selected learning models included the well-known and accurate learning models that have demonstrated their competence in TC.

    The decision of which dataset(s) to use as training and testing sets is based on the behaviors of these datasets during the training and evaluation of the learning models.The behavior was then measured through standard relevance and time measures obtained on 40 different learning experiments.After analyzing all experiments, we have found that the Arabiya and Khaleej datasets provided significant training, and this was reflected in their relevance measures, which reached 82%.Furthermore, we found that the learning time of the models on Arabiya and Khaleej was deemed reasonable.The SANAD and Akhbarona datasets showed a convergent relevance trend, with relevance scores ranging from 56% to 68%.However, the execution time to train the models using the SANAD dataset was insignificant.KALIMAT, Waten2004, and Khaleej2004 showed a moderate performance for both relevance and training time.

    One shortcoming was observed in this study: The number of categories in the seven datasets is small compared with that in the existing English language dataset.The maximum number of categories in the Arabic datasets is seven, and all categories depict general topics that could be subdivided into subcategories.

    Disclosures

    The authors declare no conflicts of interest.

    综合色丁香网| 亚洲中文av在线| 天天躁狠狠躁夜夜躁狠狠躁| 日韩免费高清中文字幕av| 两个人看的免费小视频| 精品国产一区二区久久| 精品少妇一区二区三区视频日本电影 | 久久久久久免费高清国产稀缺| 国产亚洲午夜精品一区二区久久| 久久久欧美国产精品| 狠狠精品人妻久久久久久综合| 亚洲av综合色区一区| 老司机亚洲免费影院| 好男人视频免费观看在线| 久久99热这里只频精品6学生| 国产高清国产精品国产三级| xxx大片免费视频| 免费久久久久久久精品成人欧美视频| 在线观看免费日韩欧美大片| 一区在线观看完整版| 免费不卡黄色视频| 久久久久国产一级毛片高清牌| av一本久久久久| 成人漫画全彩无遮挡| 久久综合国产亚洲精品| 亚洲色图综合在线观看| 久久精品国产综合久久久| 五月天丁香电影| 在线免费观看不下载黄p国产| 亚洲精品在线美女| 亚洲av成人不卡在线观看播放网 | 两个人看的免费小视频| 日日摸夜夜添夜夜爱| 一边摸一边抽搐一进一出视频| 亚洲成人免费av在线播放| videosex国产| 18禁裸乳无遮挡动漫免费视频| 王馨瑶露胸无遮挡在线观看| 精品亚洲成a人片在线观看| 国产男女超爽视频在线观看| 国产成人欧美在线观看 | 丝瓜视频免费看黄片| 老汉色av国产亚洲站长工具| 又大又黄又爽视频免费| 精品少妇久久久久久888优播| 青春草国产在线视频| 秋霞伦理黄片| 91国产中文字幕| 久久ye,这里只有精品| 一级a爱视频在线免费观看| 秋霞伦理黄片| 亚洲精品日韩在线中文字幕| 丰满迷人的少妇在线观看| 久久久久国产精品人妻一区二区| 亚洲国产精品999| 这个男人来自地球电影免费观看 | www.av在线官网国产| 一区在线观看完整版| 精品少妇久久久久久888优播| 老汉色∧v一级毛片| 亚洲,欧美,日韩| 香蕉国产在线看| 成人漫画全彩无遮挡| 啦啦啦啦在线视频资源| 五月开心婷婷网| 看非洲黑人一级黄片| av国产精品久久久久影院| 妹子高潮喷水视频| 国产精品.久久久| 看非洲黑人一级黄片| 久久 成人 亚洲| 三上悠亚av全集在线观看| 亚洲中文av在线| 男人操女人黄网站| 校园人妻丝袜中文字幕| 国产精品蜜桃在线观看| 丝袜美腿诱惑在线| 少妇的丰满在线观看| 亚洲欧美清纯卡通| 久久毛片免费看一区二区三区| 麻豆av在线久日| 国产人伦9x9x在线观看| 国产亚洲一区二区精品| 女人久久www免费人成看片| 日本一区二区免费在线视频| 中文字幕精品免费在线观看视频| 午夜免费男女啪啪视频观看| 成人黄色视频免费在线看| 国产99久久九九免费精品| 三上悠亚av全集在线观看| 成人亚洲精品一区在线观看| 一级片'在线观看视频| www日本在线高清视频| 免费少妇av软件| 亚洲精品aⅴ在线观看| 国产精品 欧美亚洲| 99热全是精品| 欧美黄色片欧美黄色片| 一区二区av电影网| 不卡av一区二区三区| 国产淫语在线视频| 亚洲av中文av极速乱| 精品一区二区三区av网在线观看 | 久久久久久久久久久久大奶| 我的亚洲天堂| 91成人精品电影| 男女免费视频国产| 免费av中文字幕在线| 精品少妇一区二区三区视频日本电影 | 一级片免费观看大全| 亚洲国产中文字幕在线视频| 日韩中文字幕欧美一区二区 | 波多野结衣一区麻豆| 最近中文字幕高清免费大全6| 女的被弄到高潮叫床怎么办| 久久人妻熟女aⅴ| 国产男人的电影天堂91| 美女视频免费永久观看网站| 看十八女毛片水多多多| 色综合欧美亚洲国产小说| 九色亚洲精品在线播放| 欧美激情极品国产一区二区三区| 大码成人一级视频| 欧美亚洲日本最大视频资源| 啦啦啦 在线观看视频| 成人国产麻豆网| 午夜福利影视在线免费观看| 男女无遮挡免费网站观看| 亚洲精品自拍成人| 在线观看免费高清a一片| 人人妻,人人澡人人爽秒播 | 69精品国产乱码久久久| 咕卡用的链子| 1024香蕉在线观看| 精品一区二区免费观看| 国产日韩欧美视频二区| 免费高清在线观看日韩| 中文字幕最新亚洲高清| 三上悠亚av全集在线观看| 国产av码专区亚洲av| 国产精品亚洲av一区麻豆 | 香蕉国产在线看| 国产淫语在线视频| 日本色播在线视频| 高清欧美精品videossex| 国产日韩欧美在线精品| 1024视频免费在线观看| 91老司机精品| 97在线人人人人妻| 男女边摸边吃奶| xxxhd国产人妻xxx| 波多野结衣av一区二区av| 精品福利永久在线观看| 国产99久久九九免费精品| 久久精品国产a三级三级三级| 亚洲七黄色美女视频| 精品一区在线观看国产| 亚洲国产欧美日韩在线播放| 交换朋友夫妻互换小说| 亚洲精品久久午夜乱码| 日韩视频在线欧美| 国产国语露脸激情在线看| 我要看黄色一级片免费的| 精品人妻在线不人妻| 在线观看免费日韩欧美大片| 丝袜美足系列| 另类亚洲欧美激情| 精品一区二区三区四区五区乱码 | 日韩一区二区三区影片| 国产免费现黄频在线看| 久久久精品94久久精品| 嫩草影视91久久| 男女下面插进去视频免费观看| 丰满迷人的少妇在线观看| 亚洲图色成人| 免费人妻精品一区二区三区视频| 99九九在线精品视频| 国产精品香港三级国产av潘金莲 | 久久综合国产亚洲精品| 亚洲少妇的诱惑av| 欧美亚洲 丝袜 人妻 在线| 日日摸夜夜添夜夜爱| 国产又爽黄色视频| 免费在线观看黄色视频的| 高清欧美精品videossex| 亚洲成人国产一区在线观看 | 精品国产国语对白av| 日韩电影二区| 极品人妻少妇av视频| 777久久人妻少妇嫩草av网站| 国产一区二区三区综合在线观看| 2018国产大陆天天弄谢| 另类精品久久| 纵有疾风起免费观看全集完整版| av卡一久久| 亚洲国产看品久久| 成人国产麻豆网| 美女国产高潮福利片在线看| 波野结衣二区三区在线| 成人漫画全彩无遮挡| 高清在线视频一区二区三区| 亚洲精品,欧美精品| 咕卡用的链子| 在线天堂中文资源库| 可以免费在线观看a视频的电影网站 | 精品福利永久在线观看| 午夜91福利影院| 成人午夜精彩视频在线观看| 亚洲一级一片aⅴ在线观看| 亚洲熟女毛片儿| 国产成人精品在线电影| 精品第一国产精品| 欧美在线黄色| 一区二区av电影网| 久久久久国产精品人妻一区二区| 免费高清在线观看日韩| 免费观看性生交大片5| 国产xxxxx性猛交| 日本午夜av视频| 免费日韩欧美在线观看| 午夜福利免费观看在线| 亚洲av电影在线观看一区二区三区| 久久久久久人妻| 亚洲精品aⅴ在线观看| 亚洲精品av麻豆狂野| 久久国产精品男人的天堂亚洲| 一区二区三区精品91| 午夜福利影视在线免费观看| 日本黄色日本黄色录像| 国产精品欧美亚洲77777| 中文字幕亚洲精品专区| 国产黄频视频在线观看| 欧美精品亚洲一区二区| 欧美日韩亚洲综合一区二区三区_| 亚洲精品国产色婷婷电影| 国产乱人偷精品视频| 老鸭窝网址在线观看| 啦啦啦视频在线资源免费观看| 日韩精品免费视频一区二区三区| 精品国产一区二区久久| 老汉色∧v一级毛片| 人成视频在线观看免费观看| netflix在线观看网站| 免费看不卡的av| 极品人妻少妇av视频| 建设人人有责人人尽责人人享有的| 精品国产露脸久久av麻豆| 国产高清国产精品国产三级| 精品视频人人做人人爽| 国产成人欧美在线观看 | 亚洲中文av在线| www.熟女人妻精品国产| 国产又爽黄色视频| 亚洲免费av在线视频| 激情视频va一区二区三区| 精品国产露脸久久av麻豆| 我的亚洲天堂| a级片在线免费高清观看视频| 最新在线观看一区二区三区 | 极品少妇高潮喷水抽搐| 日韩伦理黄色片| 97精品久久久久久久久久精品| 叶爱在线成人免费视频播放| 男男h啪啪无遮挡| 亚洲第一青青草原| 日韩av免费高清视频| 久久人人97超碰香蕉20202| 国产黄色免费在线视频| 亚洲国产精品成人久久小说| 精品国产乱码久久久久久男人| 人人澡人人妻人| 亚洲第一av免费看| 日日撸夜夜添| 亚洲图色成人| 国产国语露脸激情在线看| 日韩中文字幕欧美一区二区 | 亚洲国产中文字幕在线视频| 巨乳人妻的诱惑在线观看| 午夜av观看不卡| 老司机在亚洲福利影院| 国产成人精品无人区| 亚洲成人手机| 国产精品.久久久| 丰满迷人的少妇在线观看| 亚洲国产看品久久| 七月丁香在线播放| av线在线观看网站| 日韩伦理黄色片| 天堂俺去俺来也www色官网| 久久99一区二区三区| 精品少妇内射三级| 国产欧美亚洲国产| 久久青草综合色| 少妇被粗大的猛进出69影院| 亚洲精品国产一区二区精华液| 精品福利永久在线观看| 2021少妇久久久久久久久久久| 国产欧美亚洲国产| av国产精品久久久久影院| 国产一卡二卡三卡精品 | 伊人久久国产一区二区| 日韩伦理黄色片| 亚洲精品在线美女| 青草久久国产| 九色亚洲精品在线播放| 欧美精品人与动牲交sv欧美| 老司机亚洲免费影院| 国产男女超爽视频在线观看| 好男人视频免费观看在线| 亚洲精品美女久久av网站| 国产亚洲最大av| 国产淫语在线视频| 一边亲一边摸免费视频| 国产成人系列免费观看| 中国三级夫妇交换| av在线老鸭窝| 女人精品久久久久毛片| 蜜桃国产av成人99| 又大又爽又粗| 免费黄色在线免费观看| 这个男人来自地球电影免费观看 | 亚洲色图 男人天堂 中文字幕| 九色亚洲精品在线播放| 麻豆av在线久日| 9热在线视频观看99| 国产成人精品久久久久久| 午夜日韩欧美国产| 伦理电影大哥的女人| 亚洲精品久久午夜乱码| 看免费成人av毛片| 久久精品久久精品一区二区三区| 在线天堂中文资源库| 精品一区二区三区av网在线观看 | 高清av免费在线| 国产亚洲av片在线观看秒播厂| 操出白浆在线播放| 亚洲精品国产av蜜桃| 国产熟女午夜一区二区三区| 如何舔出高潮| 99精品久久久久人妻精品| 亚洲精品国产区一区二| 69精品国产乱码久久久| 三上悠亚av全集在线观看| 一区在线观看完整版| 妹子高潮喷水视频| 欧美人与性动交α欧美精品济南到| 激情视频va一区二区三区| 亚洲av国产av综合av卡| 建设人人有责人人尽责人人享有的| 亚洲一区二区三区欧美精品| 青春草国产在线视频| 这个男人来自地球电影免费观看 | 操出白浆在线播放| 亚洲av成人不卡在线观看播放网 | 青草久久国产| 丝袜在线中文字幕| 欧美日韩一级在线毛片| 午夜精品国产一区二区电影| 亚洲国产av影院在线观看| 日韩制服丝袜自拍偷拍| 在线观看免费午夜福利视频| 男人舔女人的私密视频| 日本vs欧美在线观看视频| 伊人久久国产一区二区| 一本—道久久a久久精品蜜桃钙片| 日日啪夜夜爽| e午夜精品久久久久久久| 亚洲精品一区蜜桃| 国产精品亚洲av一区麻豆 | 国产高清国产精品国产三级| 国产毛片在线视频| av国产精品久久久久影院| 啦啦啦 在线观看视频| 亚洲,一卡二卡三卡| 国产一区二区三区综合在线观看| 91精品国产国语对白视频| 欧美日本中文国产一区发布| 嫩草影院入口| 免费观看av网站的网址| 国产精品嫩草影院av在线观看| 你懂的网址亚洲精品在线观看| 夜夜骑夜夜射夜夜干| 免费观看av网站的网址| 下体分泌物呈黄色| 国产精品国产三级专区第一集| 青春草国产在线视频| 国产精品成人在线| 三上悠亚av全集在线观看| 日韩不卡一区二区三区视频在线| 国产野战对白在线观看| 自拍欧美九色日韩亚洲蝌蚪91| 精品亚洲成a人片在线观看| 最近最新中文字幕免费大全7| 欧美av亚洲av综合av国产av | 国产精品二区激情视频| 国产免费视频播放在线视频| 亚洲av电影在线进入| 精品国产国语对白av| 叶爱在线成人免费视频播放| 国产片特级美女逼逼视频| 纵有疾风起免费观看全集完整版| 秋霞在线观看毛片| 国产熟女午夜一区二区三区| 国产在线一区二区三区精| 日韩一卡2卡3卡4卡2021年| 日本91视频免费播放| 一本—道久久a久久精品蜜桃钙片| 国产精品久久久av美女十八| 一级毛片电影观看| 亚洲在久久综合| 久久 成人 亚洲| 免费高清在线观看日韩| 在线 av 中文字幕| 亚洲激情五月婷婷啪啪| 亚洲成人一二三区av| 欧美日韩福利视频一区二区| 精品亚洲乱码少妇综合久久| 一本色道久久久久久精品综合| 美女午夜性视频免费| 夫妻午夜视频| 亚洲在久久综合| 卡戴珊不雅视频在线播放| 90打野战视频偷拍视频| 一本大道久久a久久精品| 成人毛片60女人毛片免费| 最近的中文字幕免费完整| 在线观看免费日韩欧美大片| 免费黄网站久久成人精品| 七月丁香在线播放| 日韩一区二区视频免费看| 久久午夜综合久久蜜桃| 黄色视频不卡| 丰满少妇做爰视频| 欧美精品一区二区大全| 久久久久久免费高清国产稀缺| 在线观看免费午夜福利视频| tube8黄色片| 多毛熟女@视频| 高清视频免费观看一区二区| 秋霞伦理黄片| 亚洲欧洲精品一区二区精品久久久 | 亚洲欧美精品自产自拍| 少妇 在线观看| 视频在线观看一区二区三区| 欧美 日韩 精品 国产| 性少妇av在线| 男女无遮挡免费网站观看| 欧美精品av麻豆av| 国产在线免费精品| av电影中文网址| 国产精品久久久久久人妻精品电影 | 女性生殖器流出的白浆| 亚洲四区av| av网站免费在线观看视频| 亚洲精品成人av观看孕妇| 久久精品亚洲av国产电影网| 自线自在国产av| 少妇的丰满在线观看| av视频免费观看在线观看| 国产一区二区 视频在线| 亚洲国产日韩一区二区| 一本大道久久a久久精品| 亚洲国产欧美一区二区综合| 免费看av在线观看网站| 亚洲av男天堂| 两个人免费观看高清视频| 黑人欧美特级aaaaaa片| 天天添夜夜摸| 国产精品免费视频内射| 在线观看免费午夜福利视频| 搡老岳熟女国产| 中文字幕最新亚洲高清| 成人免费观看视频高清| 不卡av一区二区三区| 国产黄色免费在线视频| 伊人久久大香线蕉亚洲五| 狂野欧美激情性xxxx| 午夜免费观看性视频| 我的亚洲天堂| 亚洲成人av在线免费| 国产精品嫩草影院av在线观看| 汤姆久久久久久久影院中文字幕| 日韩欧美一区视频在线观看| 观看美女的网站| 国产av精品麻豆| 亚洲一区中文字幕在线| 午夜av观看不卡| 国产野战对白在线观看| 国产高清国产精品国产三级| 亚洲成色77777| 免费日韩欧美在线观看| 香蕉国产在线看| 男女午夜视频在线观看| 男女高潮啪啪啪动态图| 国产在线免费精品| 久久久久网色| 哪个播放器可以免费观看大片| 国产日韩欧美亚洲二区| 国产片内射在线| 国产探花极品一区二区| 一区二区三区乱码不卡18| 人成视频在线观看免费观看| 欧美成人精品欧美一级黄| 各种免费的搞黄视频| a级片在线免费高清观看视频| 国产1区2区3区精品| 亚洲一码二码三码区别大吗| 国产精品久久久久久精品古装| 七月丁香在线播放| 亚洲成人国产一区在线观看 | 久久久久久人妻| 2018国产大陆天天弄谢| 国产欧美亚洲国产| 美女国产高潮福利片在线看| 美女主播在线视频| 精品国产一区二区三区四区第35| 亚洲精品久久午夜乱码| 男女免费视频国产| 悠悠久久av| 日韩中文字幕视频在线看片| 欧美人与善性xxx| 嫩草影院入口| 午夜福利免费观看在线| 美国免费a级毛片| 十八禁人妻一区二区| 日日啪夜夜爽| 国产男人的电影天堂91| 国产一区二区三区综合在线观看| 老司机亚洲免费影院| 亚洲天堂av无毛| 一个人免费看片子| 欧美日韩视频高清一区二区三区二| 99精国产麻豆久久婷婷| 天堂俺去俺来也www色官网| 国产成人免费观看mmmm| 人妻一区二区av| 一级毛片我不卡| 日韩大码丰满熟妇| 丁香六月天网| 国产成人av激情在线播放| 中文字幕另类日韩欧美亚洲嫩草| 热re99久久精品国产66热6| 亚洲一级一片aⅴ在线观看| 久久av网站| 免费在线观看视频国产中文字幕亚洲 | 精品一品国产午夜福利视频| 国产1区2区3区精品| 电影成人av| 国产深夜福利视频在线观看| 国产高清国产精品国产三级| 国产av码专区亚洲av| 在线观看www视频免费| 狂野欧美激情性xxxx| 午夜福利,免费看| 超碰成人久久| 最黄视频免费看| 久久精品aⅴ一区二区三区四区| 午夜福利视频精品| 亚洲国产欧美一区二区综合| 伊人久久国产一区二区| 黑人欧美特级aaaaaa片| 国产免费福利视频在线观看| 久久久久久久大尺度免费视频| 欧美日韩亚洲国产一区二区在线观看 | 久久久精品免费免费高清| 嫩草影院入口| av在线观看视频网站免费| 午夜福利视频在线观看免费| 久久人人爽av亚洲精品天堂| 免费观看av网站的网址| 久久久久久久大尺度免费视频| √禁漫天堂资源中文www| 日韩熟女老妇一区二区性免费视频| 欧美日韩精品网址| 亚洲伊人久久精品综合| 久久天堂一区二区三区四区| 无限看片的www在线观看| 亚洲欧美一区二区三区黑人| 欧美亚洲日本最大视频资源| 精品久久蜜臀av无| 大香蕉久久网| 国产亚洲av高清不卡| 日日撸夜夜添| 不卡视频在线观看欧美| 久久久久网色| 久久这里只有精品19| 免费在线观看完整版高清| 国产精品嫩草影院av在线观看| 69精品国产乱码久久久| 国产精品久久久人人做人人爽| av电影中文网址| 亚洲一卡2卡3卡4卡5卡精品中文| 日韩电影二区| 久久精品国产综合久久久| 两性夫妻黄色片| 国产免费又黄又爽又色| 日日爽夜夜爽网站| 最新的欧美精品一区二区| 一个人免费看片子| 黄色毛片三级朝国网站| 亚洲精品久久久久久婷婷小说| 一级爰片在线观看| 在线观看免费高清a一片| 亚洲人成电影观看| 欧美av亚洲av综合av国产av | 国产伦人伦偷精品视频| 三上悠亚av全集在线观看| 久久久久人妻精品一区果冻| 欧美日韩一区二区视频在线观看视频在线| 如何舔出高潮| 亚洲精品国产一区二区精华液| 欧美日韩综合久久久久久| 国产成人欧美| 国产视频首页在线观看| av.在线天堂|