A Comparative Study of Higher

Higher order kernel estimation and kernel density derivative estimation are techniques for reducing the asymptotic mean integrated squared error (AMISE) in nonparametric kernel density estimation, and a reduction in this error criterion indicates better performance. Kernel density estimation relies heavily on the bandwidth, and the reduction methods identified in the literature depend on the bandwidth for their implementation. This study examines the performance of higher order kernel estimation and kernel density derivative estimation with reference to the Gaussian kernel estimator, owing to its wide applicability in real-life settings. Explicit expressions for the bandwidth selectors of the two techniques under the Gaussian kernel were derived, and the corresponding bandwidths were obtained. Empirical results using two data sets revealed that kernel density derivative estimation clearly outperformed higher order kernel estimation, with the AMISE as the criterion function.


Introduction.
The Gaussian kernel estimator has numerous applications in many fields of study. The wide applicability of the Gaussian distribution is attributed to its ability to describe the probability structure of a given data set and to the smoothness of its estimates, which are easy to handle with mathematical tools. In probability theory, the Gaussian distribution is symmetric and its support is the entire real line, so every possible value of an observation in the probability space receives positive density. The Gaussian function is also very important in image processing because it extends easily to any desired dimension. In density estimation, the Gaussian distribution is fundamental to probability estimation, especially in statistics and related fields (Nan and Ji, 2020; Siloko et al., 2020a; Johnson et al., 2021).
Density estimation is the construction of probability density estimates from a set of observations, and it is of great significance in statistics. Generally, a density estimate may be constructed either by assuming a known probability distribution, which is parametric estimation, or without assuming any distribution, which is nonparametric estimation. Parametric estimation imposes a predetermined structure, whereas nonparametric estimation does not, which makes it flexible. However, the flexibility of nonparametric techniques comes with a high computational cost, which restricted their use before the advent of fast computing systems. This cost is particularly pronounced when many observations are analyzed with complex statistical designs. Nonparametric estimation imposes no distribution, because the data reveal their own statistical structure; this accounts for its extensive use, since prior information about the data may not be readily available (Green et al., 2015; Scott, 2015).

Pakistan Journal of Statistics and Operation Research
One of the most studied nonparametric density estimation techniques is the kernel method, whose prominence is credited to the clarity of its results and to its graphical presentation of observations in a comprehensible form for visualization (Stanislaw, 2018). The kernel estimator is a probability density function with many uses across fields of study, particularly in the present information age, in which big data are analyzed for better decision making by individuals and organizations. The popularity of the kernel estimator is also attributed to its computational cost-effectiveness and its clear, distinctive presentation of data. As an exploratory and visualization tool, the kernel estimator has been used in many areas (see Raykar et al., 2015; King et al., 2016; Helu et al., 2017; Rojas, 2017; Spencer, 2017; Bonnier et al., 2019; Siloko et al., 2019; Li et al., 2019; Siloko et al., 2021). Despite the relevance of the kernel estimator in numerous fields and its wide applicability in density estimation, the difficulty of appropriately selecting the bandwidth, or smoothing parameter, which determines the smoothness of the estimate, is a serious setback to its implementation. The smoothing parameter is the main influence on the performance of the kernel estimator, although different kernel functions, such as the Epanechnikov and triweight kernels, also produce somewhat different estimates because of differences in their shapes and differentiability.
Nonetheless, the performance of the kernel method in data analysis can be enhanced by using higher order kernels and kernel density derivatives because of their bias and variance reduction properties. The performance of higher order kernels typically depends on the magnitude of the bandwidth and on the kernel order, since higher order kernels converge at faster rates than lower order kernels. These faster convergence rates justify the use of higher order kernels, but such kernels often produce estimates with negative values; the estimates are therefore not always probability densities, which complicates their interpretation by data scientists. Despite these negative values, higher order kernels have shown superiority over the corresponding lower order kernels in the estimation of large data sets. The improvement in performance of higher order kernels is typically accompanied by a larger bandwidth than is usual for lower order kernels (Marron and Wand, 1992; Jones, 1992; Marron, 1994; Wand and Jones, 1995).
Kernel density derivative estimation has been of immense significance in data exploration and visualization. The first and second derivatives, also known as the density gradient and the density curvature, are of vital importance because they provide auxiliary information that is not available from examining the density function alone. This secondary information enables the data analyst to make more accurate decisions and predictions. Critical details about the structure of a density function are not readily apparent from the density itself but can be obtained through the estimation of its derivatives. Given the usefulness of density derivative estimation, attention should focus on the important cases of density gradient estimation and density Hessian estimation, which are the first and second derivatives, before considering higher derivatives with more complex mathematical formulations (see Charnigo and Srinivasan, 2011; Henderson and Parmeter, 2015; Sasaki et al., 2015).
In practice, the application of the kernel density estimator is largely hindered by the complexity of the bandwidth selection procedure, because the performance of the estimator depends on the bandwidth. Accurate bandwidth selection is even more critical in higher order kernel estimation and kernel density derivative estimation, whose intricate mathematical formulations require considerable computational expertise. The difficulty of bandwidth selection is more evident in higher dimensional settings, since most applications of kernel estimation are multivariate, with different forms of parameterization. A plethora of bandwidth selectors exists in the literature for density estimation, but little progress has been made for higher order kernel estimation and kernel density derivative estimation. In spite of the numerous bandwidth selectors, no single method applies in all circumstances because of the variation and structural differences among the estimators. Several bandwidth selectors exist for higher order kernel estimation and kernel density derivative estimation in the univariate case, whereas the multivariate case has been neglected because its parameterizations involve complex mathematical manipulation (see Härdle et al., 1990; Dobrovidov and Rud'ko, 2010; Chacón and Duong, 2013; Somé and Kokonendji, 2021).
The aim of this paper is to examine two methods of improving the performance of the kernel density estimator, namely kernel density derivative estimation and higher order kernel estimation. Both are asymptotic mean integrated squared error (AMISE) reduction strategies, acting on the bias component, the variance component, or both. The performance of a kernel method is a major determinant of its usability, hence the examination of these techniques. The results show that kernel density derivative estimation is superior to higher order kernel estimation, with the AMISE as the criterion function. The scope of this paper is limited to kernels up to the sixth order and to derivatives up to the density curvature (the second derivative), because most benefits of higher order kernels arise at the fourth order, while the density gradient helps to identify the modes of distributions. The paper compares kernel density derivative estimation and higher order kernel estimation using the Gaussian kernel function because it is continuously differentiable. The rest of the paper is organized as follows. Section 2 presents the general form of the kernel estimator and its performance measure, while Section 3 presents higher order kernel estimation and kernel density derivative estimation with their performance measures. Section 4 presents the results and discussion using real data, with emphasis on the univariate and bivariate cases only. Section 5 concludes the paper.

Kernel Density Estimator.
The kernel estimator proposed by Rosenblatt (1956) and Parzen (1962) is a weighting function whose univariate form is

$$\hat f(x) = \frac{1}{n h_x}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h_x}\right), \tag{2.1}$$

where $K(\cdot)$ is the kernel function, $n$ is the sample size, $h_x > 0$ is the bandwidth (also known as the smoothing parameter), $x$ ranges over the observations and $X_i$, $i = 1, \ldots, n$, are the observations. The kernel function is a symmetric, nonnegative function satisfying

$$\int K(x)\,dx = 1, \qquad \int x K(x)\,dx = 0, \qquad \int x^2 K(x)\,dx = \mu_2(K) > 0. \tag{2.2}$$

The conditions in (2.2) imply that the kernel function is a probability density function: it integrates to one and has zero mean and positive variance (Scott, 2015). The choice of kernel function is not critical, because most kernel functions are probability density functions, whereas the choice of bandwidth has been investigated extensively without any single rule being acceptable in all situations. The performance of the estimator in (2.1) can be assessed with several error criteria; this paper uses the asymptotic mean integrated squared error (AMISE) because, unlike dimensionless error criteria, it accounts for dimension, a characteristic that provides significant benefits in practical applications. The AMISE is the sum of the integrated variance and the integrated squared bias,

$$\operatorname{AMISE}(h_x) = \frac{R(K)}{n h_x} + \frac{1}{4} h_x^4 \mu_2(K)^2 R(f''), \tag{2.3}$$

where $R(K) = \int K(x)^2\,dx$ is the roughness of the kernel, $\mu_2(K)$ is the kernel variance and $R(f'') = \int f''(x)^2\,dx$ is the roughness of the second derivative of the unknown density $f$. Both components depend on the bandwidth, which regulates their respective contributions to the AMISE. The bandwidth that minimizes the AMISE, called the optimal bandwidth, is

$$h_{\text{AMISE}} = \left[\frac{d\,R(K)}{n\,\mu_2(K)^2 R(f'')}\right]^{1/(d+4)}, \tag{2.4}$$

where $d$ is the dimension of the kernel estimator ($d = 1$ in the univariate case). The kernel estimator is mostly applied in the multivariate setting, especially the two-dimensional estimator, which bridges the univariate and higher dimensional kernel estimators. The bivariate kernel density estimator treats two random variables jointly and is

$$\hat f(x, y) = \frac{1}{n h_x h_y}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h_x}, \frac{y - Y_i}{h_y}\right), \tag{2.5}$$

where $h_x > 0$ and $h_y > 0$ are the bandwidths for $X$ and $Y$ respectively, and $x$ and $y$ range over the observations. The bivariate kernel estimator is an effective tool for data analysis and visualization, either as a wireframe or as a contour plot, and it often reveals hidden information in the observations (Silverman, 2018; Siloko and Siloko, 2019). The product form of the bivariate kernel multiplies univariate kernels, one for each coordinate:

$$\hat f(x, y) = \frac{1}{n h_x h_y}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h_x}\right) K\left(\frac{y - Y_i}{h_y}\right). \tag{2.6}$$

The product kernel estimator is commonly used and is most beneficial when the data differ in scale along the axes. Kernel estimation is practicable in low and moderate dimensions; in higher dimensions the data become sparse and the estimates unstable. Hence the bivariate estimator is widely applicable, also because bivariate data are easily accessible (Scott, 2015). Under the usual regularity assumptions, the AMISE of the bivariate product kernel estimator is

$$\operatorname{AMISE}(h_x, h_y) = \frac{R(K)^2}{n h_x h_y} + \frac{\mu_2(K)^2}{4}\iint \left(h_x^2 f_{xx}(x, y) + h_y^2 f_{yy}(x, y)\right)^2 dx\,dy, \tag{2.7}$$

where $f_{xx}$ and $f_{yy}$ are the second partial derivatives of $f$. As in the univariate case, the AMISE of the bivariate kernel estimator involves higher derivatives of the unknown density, which must be approximated.
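A minimal Python sketch may help make the univariate estimator (2.1), its AMISE under a normal reference density, the resulting optimal bandwidth $h = (4/(3n))^{1/5}\sigma$ (the univariate case) and the bivariate product-kernel estimate concrete. It assumes the Gaussian kernel and takes $f$ to be normal when evaluating $R(f'')$; the function names are hypothetical, and the code is an illustration rather than the computation behind the paper's tables, which were produced in Mathematica.

```python
import math
import random
import statistics


def gaussian_kernel(u):
    """Standard normal density, used here as the second order kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h):
    """Univariate estimate (2.1): (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def amise_normal_reference(h, n, sigma):
    """AMISE of the Gaussian kernel estimate when f is assumed to be N(0, sigma^2).

    Uses R(K) = 1 / (2 sqrt(pi)), mu_2(K) = 1 and R(f'') = 3 / (8 sqrt(pi) sigma^5).
    """
    r_k = 1.0 / (2.0 * math.sqrt(math.pi))
    r_f2 = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)
    return r_k / (n * h) + 0.25 * h ** 4 * r_f2


def normal_reference_bandwidth(data):
    """Minimiser of amise_normal_reference: h = (4 / (3 n))^(1/5) * sigma."""
    n = len(data)
    sigma = statistics.stdev(data)  # sample standard deviation as the scale
    return (4.0 / (3.0 * n)) ** 0.2 * sigma


def product_kde(x, y, xs, ys, hx, hy):
    """Bivariate product-kernel estimate with separate bandwidths hx and hy."""
    total = sum(
        gaussian_kernel((x - xi) / hx) * gaussian_kernel((y - yi) / hy)
        for xi, yi in zip(xs, ys)
    )
    return total / (len(xs) * hx * hy)


if __name__ == "__main__":
    random.seed(1)
    sample = [random.gauss(0.0, 1.0) for _ in range(500)]  # illustrative data only
    h = normal_reference_bandwidth(sample)
    print(f"h = {h:.4f}, f_hat(0) = {kde(0.0, sample, h):.4f}")
```

Because the product kernel multiplies one univariate kernel per coordinate inside the sum, separate bandwidths can follow the different spreads of the two axes.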

Higher Order Kernel and Kernel Density Derivative of Gaussian Estimator.
Higher order kernel estimation and kernel density derivative estimation are AMISE reduction techniques that use large bandwidths. The Gaussian kernel function is the focus here because it is continuously differentiable and has well-established, extensive applicability in data mining and related statistical fields.

Higher Order Kernel.
The rationale for higher order kernels is their faster convergence rate, which results from their bias reduction property, whereas lower order kernel estimators converge more slowly (Jones and Foster, 1993; Marron, 1994; Jones and Signorini, 1997; Ishiekwene and Osemwenkhae, 2006; Siloko et al., 2019a). A kernel function is of higher order when its order, denoted by $r$, is greater than two, that is $r > 2$. Higher order kernels are usually constructed from their corresponding lower order kernels by techniques such as the additive and multiplicative methods. A kernel of order $r$ must satisfy

$$\int K(x)\,dx = 1, \qquad \int x^j K(x)\,dx = 0,\ \ j = 1, 2, \ldots, r-1, \qquad \int x^r K(x)\,dx \neq 0, \tag{3.1.1}$$

where the order $r$ is the index of the first nonzero moment; the odd moments of any symmetric kernel are zero. The AMISE of the $r$th order kernel density estimator is

$$\operatorname{AMISE}(h) = \frac{R(K_r)}{n h} + \frac{h^{2r}\mu_r(K_r)^2}{(r!)^2}\,R\left(f^{(r)}\right), \tag{3.1.2}$$

where $K_r$ denotes the $r$th order kernel and $\mu_r(K_r) = \int x^r K_r(x)\,dx$ is its $r$th moment. The bandwidth that minimizes the AMISE is

$$h_{\text{AMISE}} = \left[\frac{(r!)^2 R(K_r)}{2r\,n\,\mu_r(K_r)^2 R\left(f^{(r)}\right)}\right]^{1/(2r+1)}. \tag{3.1.3}$$

If, in the bandwidth that minimizes the AMISE of higher order kernel estimation, the unknown density is taken to be the normal distribution with variance $\sigma^2$ and the Gaussian kernel is employed, then the roughness of the $r$th derivative of the normal density, denoted by $R(\phi_\sigma^{(r)})$, is

$$R\left(\phi_\sigma^{(r)}\right) = \frac{\Gamma(r + 1/2)}{2\pi\,\sigma^{2r+1}}, \tag{3.1.4}$$

where $\Gamma(\cdot)$ is the gamma function and $r = 2, 4, 6, \ldots$ is the order of the kernel function. Substituting (3.1.4) into (3.1.3) gives the optimal bandwidth with the least AMISE,

$$h_{\text{AMISE}} = \sigma\left[\frac{\pi\,(r!)^2 R(K_r)}{r\,n\,\mu_r(K_r)^2\,\Gamma(r + 1/2)}\right]^{1/(2r+1)}. \tag{3.1.5}$$

The optimal bandwidth of higher order kernel estimation is of order $O(n^{-1/(2r+1)})$, while its AMISE is of order $O(n^{-2r/(2r+1)})$. Because $n^{-1/(2r+1)}$ decreases more slowly as $r$ increases, higher order kernels require larger bandwidths. Higher order kernel estimation also requires considerably large sample sizes to realize its potential benefits (Siloko et al., 2019b).
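The higher order bandwidth and AMISE expressions can be evaluated numerically. The sketch below assumes the standard Gaussian-based fourth and sixth order kernels $\frac12(3-x^2)\phi(x)$ and $\frac18(15-10x^2+x^4)\phi(x)$ and a normal reference density; $R(K_r)$ and $\mu_r(K_r)$ are obtained with a simple Simpson rule rather than in closed form. All names are hypothetical, and any printed values are illustrative, not the paper's results.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Gaussian-based kernels of order r, written as p_r(x) * phi(x).
KERNEL_POLYNOMIALS = {
    2: lambda x: 1.0,
    4: lambda x: 0.5 * (3.0 - x * x),
    6: lambda x: (15.0 - 10.0 * x ** 2 + x ** 4) / 8.0,
}


def simpson(f, a=-12.0, b=12.0, m=4000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def kernel_constants(r):
    """Return (R(K_r), mu_r(K_r)) for the Gaussian-based kernel of order r."""
    poly = KERNEL_POLYNOMIALS[r]

    def kernel(x):
        return poly(x) * phi(x)

    roughness = simpson(lambda x: kernel(x) ** 2)
    moment = simpson(lambda x: x ** r * kernel(x))
    return roughness, moment


def normal_roughness(r, sigma):
    """R(f^(r)) for f = N(0, sigma^2): Gamma(r + 1/2) / (2 pi sigma^(2r + 1))."""
    return math.gamma(r + 0.5) / (2.0 * math.pi * sigma ** (2 * r + 1))


def higher_order_amise(h, r, n, sigma):
    """R(K_r)/(n h) + h^(2r) mu_r^2 R(f^(r)) / (r!)^2 under the normal reference."""
    r_k, mu_r = kernel_constants(r)
    variance = r_k / (n * h)
    bias2 = h ** (2 * r) * mu_r ** 2 * normal_roughness(r, sigma) / math.factorial(r) ** 2
    return variance + bias2


def higher_order_bandwidth(r, n, sigma):
    """Bandwidth minimising higher_order_amise; it shrinks like n^(-1/(2r + 1))."""
    r_k, mu_r = kernel_constants(r)
    numerator = math.factorial(r) ** 2 * r_k
    denominator = 2 * r * n * mu_r ** 2 * normal_roughness(r, sigma)
    return (numerator / denominator) ** (1.0 / (2 * r + 1))


if __name__ == "__main__":
    n, sigma = 1000, 1.0  # illustrative values, not the paper's data
    for r in (2, 4, 6):
        h = higher_order_bandwidth(r, n, sigma)
        print(r, round(h, 4), higher_order_amise(h, r, n, sigma))
```

For $r = 2$ the bandwidth reduces to the familiar $(4/(3n))^{1/5}\sigma$, which is a convenient sanity check on the general expression.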

Kernel Density Derivative Estimation.
A kernel estimate of a density derivative is obtained by differentiating the kernel estimator. Given a kernel function $K$ that is $r$ times differentiable, the $r$th derivative of the estimator in Equation (2.1) is

$$\hat f^{(r)}(x) = \frac{1}{n h^{r+1}}\sum_{i=1}^{n} K^{(r)}\left(\frac{x - X_i}{h}\right), \tag{3.2.1}$$

where $K^{(r)}$ is the $r$th derivative of $K$, and $K$ itself is usually a symmetric probability density function. If $f$ is sufficiently smooth, the AMISE of the kernel density derivative estimator is

$$\operatorname{AMISE}(h) = \frac{R\left(K^{(r)}\right)}{n h^{2r+1}} + \frac{1}{4} h^4 \mu_2(K)^2 R\left(f^{(r+2)}\right), \tag{3.2.2}$$

where $R(K^{(r)})$ is the roughness of the $r$th kernel derivative, $\mu_2(K)$ is the kernel variance and $R(f^{(r+2)})$ is the roughness of the $(r+2)$th derivative of the unknown density. Each additional derivative order adds two powers of the bandwidth to the denominator of the asymptotic variance, while the order of the asymptotic squared bias stays at $h^4$ (Scott, 2015). The bandwidth that minimizes (3.2.2) is

$$h_{\text{AMISE}} = \left[\frac{(2r+1)\,R\left(K^{(r)}\right)}{n\,\mu_2(K)^2 R\left(f^{(r+2)}\right)}\right]^{1/(2r+5)}. \tag{3.2.3}$$

For the Gaussian kernel $\phi$, the roughness of the $r$th derivative, denoted by $R(\phi^{(r)})$, is

$$R\left(\phi^{(r)}\right) = \frac{\Gamma(r + 1/2)}{2\pi}. \tag{3.2.4}$$

If the unknown distribution is Gaussian with mean zero and variance $\sigma^2$, the roughness of its $(r+2)$th derivative is

$$R\left(f^{(r+2)}\right) = \frac{\Gamma(r + 5/2)}{2\pi\,\sigma^{2r+5}}. \tag{3.2.5}$$

Substituting Equations (3.2.4) and (3.2.5) into Equation (3.2.3), using $\mu_2(\phi) = 1$, and expressing the result in terms of the dimension $d$ gives the bandwidth with minimum AMISE for the kernel density derivative estimator,

$$h_{\text{AMISE}} = \sigma\left[\frac{4}{(d + 2r + 2)\,n}\right]^{1/(d+2r+4)}, \tag{3.2.6}$$

where $\sigma$ is usually the standard deviation of the observations. The bandwidth for kernel density derivative estimation must be chosen carefully, because a bandwidth that yields a good density estimate often does not yield good density derivative estimates, especially as the order of the derivative increases (Sasaki, 2015). In the univariate case, the least AMISE value attained at the optimal bandwidth in (3.2.6) is

$$\operatorname{AMISE}^{*} = \frac{2r+5}{4}\cdot\frac{R\left(\phi^{(r)}\right)}{n\,h_{\text{AMISE}}^{2r+1}}. \tag{3.2.7}$$

The derivative of a function is noisier than the function itself, hence large bandwidths are usually required for kernel density derivative estimation. For the first and second derivatives, the bandwidths with minimal AMISE are of orders $O(n^{-1/7})$ and $O(n^{-1/9})$, and the corresponding AMISE values are of orders $O(n^{-4/7})$ and $O(n^{-4/9})$ respectively.
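Under the same normal-reference assumption, the derivative bandwidth and AMISE can be sketched as follows. The code assumes the Gaussian kernel ($\mu_2 = 1$), computes the bandwidth from the general minimiser of the derivative AMISE, and checks it against the closed-form expression $\sigma[4/((d+2r+2)n)]^{1/(d+2r+4)}$ with $d = 1$; at the optimum the AMISE equals $(2r+5)/4$ times the variance term. Function names are hypothetical, and this is an illustration, not the paper's Mathematica code.

```python
import math


def gaussian_derivative_roughness(r):
    """R(phi^(r)) for the standard Gaussian kernel: Gamma(r + 1/2) / (2 pi)."""
    return math.gamma(r + 0.5) / (2.0 * math.pi)


def normal_roughness(s, sigma):
    """R(f^(s)) for f = N(0, sigma^2): Gamma(s + 1/2) / (2 pi sigma^(2s + 1))."""
    return math.gamma(s + 0.5) / (2.0 * math.pi * sigma ** (2 * s + 1))


def derivative_amise(h, r, n, sigma):
    """Variance R(phi^(r)) / (n h^(2r+1)) plus squared bias h^4 R(f^(r+2)) / 4 (mu_2 = 1)."""
    variance = gaussian_derivative_roughness(r) / (n * h ** (2 * r + 1))
    bias2 = 0.25 * h ** 4 * normal_roughness(r + 2, sigma)
    return variance + bias2


def derivative_bandwidth(r, n, sigma):
    """Minimiser of derivative_amise: [(2r+1) R(phi^(r)) / (n R(f^(r+2)))]^(1/(2r+5))."""
    ratio = (2 * r + 1) * gaussian_derivative_roughness(r) / (n * normal_roughness(r + 2, sigma))
    return ratio ** (1.0 / (2 * r + 5))


def normal_scale_bandwidth(r, n, sigma, d=1):
    """Closed form sigma * (4 / ((d + 2r + 2) n))^(1/(d + 2r + 4))."""
    return sigma * (4.0 / ((d + 2 * r + 2) * n)) ** (1.0 / (d + 2 * r + 4))


if __name__ == "__main__":
    n, sigma = 1000, 1.0  # illustrative values only
    for r in (0, 1, 2):
        h = derivative_bandwidth(r, n, sigma)
        print(r, round(h, 4), round(normal_scale_bandwidth(r, n, sigma), 4),
              derivative_amise(h, r, n, sigma))
```

With $r = 0$ both functions give $(4/(3n))^{1/5}\sigma$, so the zeroth derivative case coincides with ordinary second order Gaussian kernel estimation.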

The Gaussian Kernel Estimator.
The Gaussian kernel is one of the most popular kernels because it produces smooth density estimates and possesses derivatives of all orders, which has promoted its extensive use in kernel density derivative estimation. The classical second order Gaussian kernel is the limiting case of the beta polynomial kernel family, and its univariate estimator is

$$\hat f(x) = \frac{1}{n h \sqrt{2\pi}}\sum_{i=1}^{n}\exp\left\{-\frac{1}{2}\left(\frac{x - X_i}{h}\right)^2\right\}. \tag{3.3.1}$$

The corresponding bivariate Gaussian kernel estimator is

$$\hat f(x, y) = \frac{1}{2\pi n h_x h_y}\sum_{i=1}^{n}\exp\left\{-\frac{1}{2}\left[\left(\frac{x - X_i}{h_x}\right)^2 + \left(\frac{y - Y_i}{h_y}\right)^2\right]\right\}. \tag{3.3.2}$$

With $\phi(x) = (2\pi)^{-1/2}e^{-x^2/2}$ denoting the standard normal density, the fourth and sixth order Gaussian kernel functions are

$$K_4(x) = \frac{1}{2}\left(3 - x^2\right)\phi(x), \tag{3.3.3}$$

$$K_6(x) = \frac{1}{8}\left(15 - 10x^2 + x^4\right)\phi(x). \tag{3.3.4}$$

The derivatives of the Gaussian kernel are usually expressed through the Hermite polynomial family: $\phi^{(r)}(x) = (-1)^r H_r(x)\phi(x)$, where $r = 0, 1, 2, \ldots$ is the derivative order and $H_r(x)$ is the $r$th Hermite polynomial. The first six members are $H_0(x) = 1$, $H_1(x) = x$, $H_2(x) = x^2 - 1$, $H_3(x) = x^3 - 3x$, $H_4(x) = x^4 - 6x^2 + 3$ and $H_5(x) = x^5 - 10x^3 + 15x$. The usual Gaussian kernel density derivative estimator is

$$\hat f^{(r)}(x) = \frac{(-1)^r}{n h^{r+1}}\sum_{i=1}^{n} H_r\left(\frac{x - X_i}{h}\right)\phi\left(\frac{x - X_i}{h}\right). \tag{3.3.5}$$

The estimator in Equation (3.3.5) is the general form for estimating the $r$th derivative of a density with the popular Gaussian kernel. If $r = 0$, it reduces to the traditional second order kernel estimator in Equation (2.1). Another widely used member of the beta polynomial family is the Epanechnikov kernel, also known as the optimal kernel with respect to the asymptotic mean integrated squared error (Siloko et al., 2019b).
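The Hermite representation of Gaussian kernel derivatives translates directly into code. The following sketch (standard library only; function names are hypothetical) builds $H_r$ by the three-term recurrence $H_{k+1}(x) = xH_k(x) - kH_{k-1}(x)$, evaluates the derivative estimator (3.3.5), and includes the fourth and sixth order Gaussian-based kernels, assumed here to be $\frac12(3-x^2)\phi(x)$ and $\frac18(15-10x^2+x^4)\phi(x)$, so that their negative values can be checked directly.

```python
import math


def phi(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def hermite(r, x):
    """Probabilists' Hermite polynomial H_r(x), via H_{k+1} = x H_k - k H_{k-1}."""
    if r == 0:
        return 1.0
    h_prev, h_curr = 1.0, x
    for k in range(1, r):
        h_prev, h_curr = h_curr, x * h_curr - k * h_prev
    return h_curr


def gaussian_kernel_derivative(r, x):
    """phi^(r)(x) = (-1)^r H_r(x) phi(x)."""
    return (-1) ** r * hermite(r, x) * phi(x)


def kde_derivative(r, x, data, h):
    """Estimator (3.3.5): (1 / (n h^(r+1))) * sum_i phi^(r)((x - X_i) / h)."""
    n = len(data)
    return sum(gaussian_kernel_derivative(r, (x - xi) / h) for xi in data) / (n * h ** (r + 1))


def gaussian_kernel_order4(x):
    """Fourth order Gaussian-based kernel (1/2)(3 - x^2) phi(x)."""
    return 0.5 * (3.0 - x * x) * phi(x)


def gaussian_kernel_order6(x):
    """Sixth order Gaussian-based kernel (1/8)(15 - 10 x^2 + x^4) phi(x)."""
    return (15.0 - 10.0 * x ** 2 + x ** 4) / 8.0 * phi(x)


if __name__ == "__main__":
    # Both higher order kernels dip below zero at x = 2, which is why
    # higher order kernel estimates are not always probability densities.
    print(gaussian_kernel_order4(2.0) < 0, gaussian_kernel_order6(2.0) < 0)
```

Setting `r = 0` in `kde_derivative` recovers the ordinary Gaussian kernel estimate, mirroring the reduction of (3.3.5) to (2.1).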

Results and Discussion.
The performance of higher order kernel estimation and kernel density derivative estimation is investigated with the AMISE as the criterion function because of its mathematical tractability. The graphical and computational results for both methods were obtained with Mathematica version 12.3 (Wolfram Research, Inc.). Both methods usually employ larger bandwidths as the order increases, which accounts for their bias and variance reduction properties that often translate into AMISE reduction. A simulated sample of size 1000 and two real data sets were analyzed using the Gaussian kernel estimator for higher order kernel estimation and kernel density derivative estimation.
Table 1 reports the results for a randomly generated sample of size 1000 with standard deviation 23.2069.
The standard deviation is one of the statistics required to compute the bandwidths for higher order kernel estimation and kernel density derivative estimation. The performance of both methods depends on the size of the bandwidth, and both require larger bandwidths as the kernel order and the derivative order increase, respectively. The bandwidths of higher order kernel estimation are larger than those of kernel density derivative estimation, yet the AMISE values of kernel density derivative estimation are smaller, which indicates better performance. As seen in Table 1, the benefit of higher order kernel estimation arises mainly at the fourth order, and the reduction in AMISE at subsequent orders tends to be minimal (Ishiekwene and Osemwenkhae, 2006). In kernel density derivative estimation, the benefits lie mainly in the first and second derivative estimates, and graphical visualization beyond the second derivative is often difficult, especially in higher dimensions. The smaller AMISE values of kernel density derivative estimation, despite its smaller bandwidths, are attributed to the two extra powers of the bandwidth in the variance component.

The first data set is the annual snowfall in Buffalo, with a sample size of 63 observations (Scott, 2015). Buffalo is one of the largest cities in the state of New York, with a recorded annual snowfall of 84.8 inches, which makes it snowier than other areas of the state. The kernel estimates of the snowfall data are unimodal, with the mode lying within the typical range of annual snowfall. Figure 1 shows the second order and fourth order kernel estimates, while Figure 2 shows the sixth order kernel estimate together with the second order to sixth order kernel estimates. The zeroth derivative and first derivative estimates are shown in Figure 3, while Figure 4 shows the second derivative estimate together with the zeroth to second derivative estimates. The higher order kernel estimates of the snowfall data display unimodality, but the first derivative estimate displays two distinct lobes; this is attributed to the fact that features obscured in a kernel density estimate can be revealed by kernel density derivative estimation, particularly by the density gradient.
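The role of the density gradient in identifying modes can be illustrated with a short sketch: a mode of the estimate is a point where the estimated first derivative changes sign from positive to negative. The code below assumes a Gaussian kernel, a fixed user-supplied bandwidth and a grid search with linear interpolation of each zero crossing; it uses a simulated two-cluster sample rather than the snowfall data, and all names are hypothetical.

```python
import math
import random


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def density_gradient(x, data, h):
    """Gaussian first derivative estimate: -(1/(n h^2)) * sum_i u_i phi(u_i), u_i = (x - X_i)/h."""
    total = 0.0
    for xi in data:
        u = (x - xi) / h
        total += u * phi(u)
    return -total / (len(data) * h * h)


def estimated_modes(data, h, grid_size=400):
    """Locate modes as points where the estimated gradient changes sign from + to -."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / grid_size
    grid = [lo + i * step for i in range(grid_size + 1)]
    grads = [density_gradient(x, data, h) for x in grid]
    modes = []
    for i in range(grid_size):
        if grads[i] > 0.0 >= grads[i + 1]:
            # Linear interpolation of the zero crossing between grid[i] and grid[i + 1].
            t = grads[i] / (grads[i] - grads[i + 1])
            modes.append(grid[i] + t * step)
    return modes


if __name__ == "__main__":
    random.seed(3)
    # Illustrative two-cluster sample (not the snowfall data).
    sample = ([random.gauss(-3.0, 0.5) for _ in range(200)]
              + [random.gauss(3.0, 0.5) for _ in range(200)])
    print([round(m, 2) for m in estimated_modes(sample, h=0.5)])
```

Sign changes from negative to positive would mark antimodes instead; the number of detected modes depends on the bandwidth, which echoes the point that large bandwidths can smooth away features.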
Table 2 and Table 3 report the results of the two AMISE reduction techniques using the univariate Gaussian kernel function. Both techniques require large bandwidths for effective implementation, but the bandwidths of higher order kernel estimation are larger than those of kernel density derivative estimation. Moreover, the AMISE of kernel density derivative estimation is smaller than that of higher order kernel estimation, which supports the conclusion that kernel density derivative estimation outperforms higher order kernel estimation. The second order kernel estimate and the zeroth derivative estimate coincide; hence they have the same bandwidths and AMISE values in Table 2 and Table 3. The second, fourth and sixth order kernel estimates in Figure 2(b) look similar graphically, yet their AMISE values differ considerably. As the kernel order increases, the AMISE decreases because of the larger bandwidths and the higher kernel order, but there is a tendency to smooth away useful statistical features of the observations. In general, multimodal data estimated with large bandwidths may appear unimodal, even though the AMISE is reduced. Although the error criterion determines the acceptability of a method in kernel density estimation, statistical effort has also been directed toward retaining the inherent characteristics of the data for prediction and decision making. The detection of vital statistical features in kernel estimation depends greatly on the magnitude of the bandwidth. The results in Table 2 and Table 3 show that the reduction in AMISE is more noticeable with kernel density derivative estimation, because its variance component carries two additional powers of the bandwidth. Hence both the variance and bias terms are reduced, which translates into a reduction of the AMISE, unlike higher order kernel estimation, whose reduction is mainly in the bias term. Theoretically, any derivative order can be estimated, but the benefits lie mainly in gradient and Hessian estimation, with little or no benefit at higher derivative orders. For higher order kernel estimation, whose estimates look similar across orders, the benefit lies mainly in the fourth order kernel, and further reductions at higher orders may be minimal.
The second data set is the Old Faithful data, which comprise the duration of eruptions and the waiting time between eruptions of the geyser in Yellowstone National Park in the United States of America (Azzalini and Bowman, 1990). The Old Faithful data consist of 272 observations on each of the two variables, and the bivariate kernel estimates are bimodal, indicating that the joint distribution of eruption duration and waiting time to the next eruption is bimodal. The bivariate second, fourth and sixth order kernel estimates are shown in Figure 5, Figure 6 and Figure 7, which clearly display the bimodality. The fundamental role of kernel density estimation is exploratory data analysis and visualization, owing to its ability to highlight significant features in the data, unlike other visualization tools such as scatterplots, in which the observer is drawn only to the data cloud while inherent features of the data remain hidden. The contour plots of the bivariate fourth and sixth order kernel estimates in Figure 6 and Figure 7 also reveal data points that may be regarded as outliers, while preserving the bimodal nature of the data set.
The bimodality of the bivariate kernel estimates indicates that eruption duration and the waiting time before the next eruption are clearly related. The bivariate density gradient and density curvature estimates are shown in Figure 8 and Figure 9, and all of them depict the bimodal nature of the data. The bivariate zeroth derivative estimate is the same as the bivariate second order kernel estimate, because zeroth derivative estimation coincides with second order kernel estimation. As in the univariate case, the bivariate bandwidths of kernel density derivative estimation are smaller than those of higher order kernel estimation, yet they yield smaller AMISE values.
The application of kernel density derivative estimation and higher order kernel estimation becomes very complex as the dimension of the observations increases. The complexity is most noticeable in the graphical presentation of the observations: beyond four dimensions, visualization, which is very important in kernel density estimation, becomes practically difficult. This difficulty of nonparametric estimation in higher dimensions is known as the curse of dimensionality, and it has largely limited kernel density estimation to the bivariate case, although numerical computation can be extended to higher dimensions (Scott, 2015; Siloko et al., 2020b). Beyond the second derivative order, kernel density derivative estimation usually yields little or no further gain in AMISE reduction, since the reductions become very small as both components of the AMISE decrease concurrently. Kernel density derivative estimation and higher order kernel estimation have demonstrated the potential to reduce the AMISE, either in the bias component alone or in both the variance and bias components concurrently, depending mainly on the magnitude of the bandwidths and on other quantities of interest such as the roughness of the kernel function and the roughness of the distribution.
In kernel density estimation, the superiority of one method over other existing methods can generally be ascertained by the value of a known performance measure such as the AMISE or another criterion function (Jarnicka, 2009). Although both methods use large bandwidths, the inherent statistical features of the investigated observations were retained, as seen in the kernel estimates. Retaining the statistical characteristics of the observations is central to kernel density estimation, especially in data visualization. Overall, the results demonstrate numerically and graphically the dominance of kernel density derivative estimation over higher order kernel estimation in terms of performance.

Conclusions.
Higher order kernel estimation and kernel density derivative estimation are AMISE reduction techniques that employ larger bandwidths as the order increases. Although both are AMISE reduction techniques, this paper set out to compare their AMISE reduction capacities and thereby establish which technique is superior. The study showed that higher order kernel estimation is a bias reduction approach that translates into AMISE reduction, while kernel density derivative estimation is a bias and variance reduction strategy that reduces the AMISE owing to the two extra powers of the bandwidth in the variance term for every derivative order. Both methods require large bandwidths because higher order kernel estimates and kernel density derivative estimates are noisier than lower order kernel estimates and the density estimate itself. Numerical evaluation of the two methods using the AMISE shows that kernel density derivative estimation outperforms higher order kernel estimation. Moreover, kernel density derivative estimation, particularly the density gradient, can reveal hidden features in the analyzed data that may help in prediction and decision making. Hence, kernel density derivative estimation, a bias and variance reduction technique, is highly recommended in nonparametric density estimation, particularly for analyzing data about which little prior information is available.

Figure 1: Second order kernel estimate and fourth order kernel estimate of the snowfall data.
Figure 2: Sixth order kernel estimate and second order to sixth order kernel estimates of the snowfall data. b. Second to sixth order estimates

Figure 3: Zeroth derivative estimate and first derivative estimate of the snowfall data. a. Zeroth derivative estimate b. First derivative estimate
Figure 4: Second derivative estimate and zeroth to second derivative estimates of the snowfall data.

Figure 5: Surface and contour plots of bivariate second order estimates of the old faithful data.

Figure 6: Surface and contour plots of bivariate fourth order estimates of the old faithful data.

Figure 7: Surface and contour plots of bivariate sixth order estimates of the old faithful data.

Figure 8: Surface and contour plots of bivariate first derivative estimate of the old faithful data.

Figure 9: Surface and contour plots of bivariate second derivative estimate of the old faithful data.

Table 1. Kernel Order, Derivative Order, Bandwidths and AMISE of Simulated Data of Size n = 1000

Table 2. Order, Bandwidths, Bias², Variance and AMISE of Higher Order Kernel Estimation for First Data.

Table 3. Order, Bandwidths, Bias², Variance and AMISE of Kernel Density Derivative Estimation for First Data.

Table 4. Order, Bandwidths, Bias², Variance and AMISE of Higher Order Kernel Estimation for Second Data.

Table 5. Order, Bandwidths, Bias², Variance and AMISE of Kernel Density Derivative Estimation for Second Data.
The bias and variance reduction property of univariate kernel density derivative estimation is also demonstrated in the bivariate estimation, with smaller AMISE values as presented in Table 5 in comparison with Table