On Smoothed MWSD Estimation of Mixing Proportion

The problem considered in the present paper is that of smooth (nonparametric) estimation of the mixing proportion in a mixture population of two (known) distributions represented by F(x) = pF_1(x) + (1-p)F_2(x), -∞<x<+∞, 0<p<1. The two classes of 'smoothed' and 'unsmoothed' estimators studied in the paper are based on an iid sample of n observations from the mixture population, using the minimum weighted square distance (MWSD) methodology due to Wolfowitz (1953). The estimators are compared on the basis of their relative mean square errors (MSEs). The superiority of the smoothed estimators over their unsmoothed counterparts is established theoretically as well as through a small Monte Carlo study that compares the resulting MSEs. Large sample properties, such as a.s. rates of convergence and asymptotic normality, are also established. The results proved are new in the literature.


Introduction
Let $\{X_1, X_2, \ldots, X_n\}$ be independent observations from a mixture population represented by the mixture distribution function (d.f.) F of (1.1). However, when dealing with (continuous) mixture populations where one cannot legitimately or formally assume that the component distributions possess densities, or when the data recordings are available only in discrete or grouped form, the validity of density-based techniques becomes highly questionable. In such situations, it seems much safer to revert to the suggestion of Hall (1981) and base one's minimum distance estimation procedures on empirical distribution functions rather than on empirical density estimators.
Mixtures of distributions are widely used in both the biological and physical sciences. Many typical problems in which such mixtures occur have been well described in a series of research papers. A few among them are:
i. In oceanography, the interest is in measuring characteristics of natural populations of species (fish). Samples of fish are therefore taken, and the characteristics of each fish in the sample are measured. Many such characteristics, such as weight and length, vary with the age of the fish. These characteristics have a different distribution for each age group, so that the population has a mixture of distributions.
ii. In failure time analysis, it is desired to measure the failure time of units in a population. For this purpose, samples of units are taken and the failure time is measured for each unit in the sample. However, failures occur due to different causes. The failure times have distinct distributions for the different causes, so that the overall population has a mixture of distributions.
For more details of such examples, refer to Choi and Bulgren (1968), Harris (1958), Blischke (1963), Fu (1968), Macdonald and Pitcher (1979), Odell and Basu (1976) and Bruni et al. (1983). Several methods of estimating mixing proportions are discussed in the literature. Choi and Bulgren (1968) estimated the mixing measures of a combination of known distributions by using the Wolfowitz minimum distance method, which minimizes
$\int \big(\tilde{F}_n(x) - F(x)\big)^2 \, d\tilde{F}_n(x)$, (1.2)
based on the usual unsmoothed empirical distribution function $\tilde{F}_n(x) = n^{-1} \sum_{i=1}^{n} I(X_i \le x)$ of a random sample $X_i$, 1≤i≤n. They investigated asymptotic properties of their estimators such as strong consistency and asymptotic normality. Van Houwelingen (1974) pointed out that the variance of the Choi and Bulgren estimator is hard to compute and that the small sample properties of the estimator are difficult to evaluate. As pointed out in Hall (1981), methods based on nonparametric density estimators involve some significant drawbacks:
• the specification of the window width in kernel-based estimators, whose behavior is very sensitive to the choice of the window width parameter; and
• their mean square errors converge at a slower rate than order $n^{-1}$.
To avoid these drawbacks, Hall (1981) proposed nonparametric (MWSD) estimators of mixing proportions in finite mixtures based on the usual empirical distribution function only, but made no attempt to derive either the small or the large sample properties of the proposed estimators. In the present paper, we discuss estimators based on both the usual empirical distribution function (e.d.f.) $\tilde{F}_n(x)$ and the kernel-based smoothed e.d.f. $\hat{F}_n(x)$ defined, respectively, by
$\tilde{F}_n(x) = n^{-1} \sum_{i=1}^{n} I(X_i \le x)$ and $\hat{F}_n(x) = n^{-1} \sum_{i=1}^{n} K\big(\frac{x - X_i}{a_n}\big)$,
$\{a_n\}$ being the smoothing bandwidth sequence satisfying $0 < a_n \to 0$, $n a_n \to \infty$, as n→∞, and K the distribution function corresponding to a known suitable kernel density k.
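The two empirical distribution functions can be sketched numerically as follows. This is only an illustration under stated assumptions: the Epanechnikov kernel, the bandwidth $a_n = n^{-1/3}$, the normal components in the demonstration and the function names `edf`, `kernel_cdf` and `smoothed_edf` are choices made here, not specifications taken from the paper.

```python
import numpy as np

def edf(sample, x):
    """Unsmoothed e.d.f.: proportion of observations <= x, for each x."""
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return (sample[None, :] <= x[:, None]).mean(axis=1)

def kernel_cdf(t):
    """D.f. K of the Epanechnikov density k(t) = 0.75 (1 - t^2) on [-1, 1].

    The kernel is an assumption for illustration; K(t) = 0 for t <= -1
    and K(t) = 1 for t >= 1.
    """
    t = np.clip(np.asarray(t, dtype=float), -1.0, 1.0)
    return 0.5 + 0.75 * t - 0.25 * t**3

def smoothed_edf(sample, x, a_n):
    """Kernel-smoothed e.d.f.: average of K((x - X_i) / a_n) over the sample."""
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return kernel_cdf((x[:, None] - sample[None, :]) / a_n).mean(axis=1)

# Small demonstration on a simulated two-component normal mixture (hypothetical setup).
rng = np.random.default_rng(0)
n = 100
from_first = rng.random(n) < 0.3
sample = np.where(from_first, rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n))
points = np.array([0.0, 1.0, 2.0])
a_n = n ** (-1.0 / 3.0)  # hypothetical bandwidth choice
print(edf(sample, points))
print(smoothed_edf(sample, points, a_n))
```

The smoothed version replaces each jump of the step function by a continuous rise spread over an interval of half-width $a_n$ around the observation, so the two curves differ only within $a_n$ of a data point.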

Smoothed Hall's Minimum Weighted Squared Distance Estimator of Mixing Proportions
In this section, we shall define and study the 'smoothed' version of Hall's (1981) Minimum Weighted Squared Distance (MWSD) estimators, based on the smoothed e.d.f.s defined below in (2.2).
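The problem of estimating the mixing proportion p for the m = 2 case defined in (1.1) is studied in the present paper; the case m>2 will be considered separately. Now, if $F(x) = pF_1(x) + (1-p)F_2(x)$, -∞<x<+∞, clearly we have $p = \frac{F(x) - F_2(x)}{F_1(x) - F_2(x)}$ wherever $F_1(x) \neq F_2(x)$. The MWSD estimator of p is the value for which the weighted squared distance between the (unsmoothed or smoothed) e.d.f. and $pF_1(x) + (1-p)F_2(x)$ is minimum, when $F_j(x)$, j = 1,2, are known and W(x) is a suitable known weight function, discrete or continuous.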
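As a worked illustration of this minimization, suppose (as an assumption about the criterion) that the estimator minimizes a weighted squared distance of the form below, where $G_n$ stands for either e.d.f. The symbols $G_n$, $D_n$, $F_{12}$ and $p^{*}$ are introduced here for illustration only. Because the criterion is quadratic in p, the minimizer is available in closed form:

```latex
\begin{align*}
D_n(p) &= \int \bigl[G_n(x) - pF_1(x) - (1-p)F_2(x)\bigr]^2 \, dW(x)
        = \int \bigl[G_n(x) - F_2(x) - p\,F_{12}(x)\bigr]^2 \, dW(x),
        \qquad F_{12} := F_1 - F_2, \\
\frac{dD_n}{dp} &= -2 \int \bigl[G_n(x) - F_2(x) - p\,F_{12}(x)\bigr] F_{12}(x) \, dW(x) = 0
\;\Longrightarrow\;
p^{*} = \frac{\int \bigl[G_n(x) - F_2(x)\bigr] F_{12}(x) \, dW(x)}{\int F_{12}^{2}(x) \, dW(x)}, \\
\frac{d^{2}D_n}{dp^{2}} &= 2 \int F_{12}^{2}(x) \, dW(x) > 0
\quad \text{whenever } F_1 \neq F_2 \text{ on a set of positive } W\text{-measure.}
\end{align*}
```

Taking $G_n$ to be the unsmoothed or the smoothed e.d.f. gives the unsmoothed or the smoothed estimator, respectively. This unconstrained minimizer is linear in $G_n$ and is not guaranteed to lie in (0,1) for a given sample.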
The object of the present investigation is to show that Hall's (1981) MWSD procedure can be improved by basing it on the smoothed e.d.f.s in place of the usual e.d.f.s. This is demonstrated below, in general terms, for both cases when the averaging distribution W is discrete or continuous, and decisively when W is discrete.
So far, no researchers have attempted to compute the variance of such estimators. In the present work, the variances, exact MSEs and large sample properties such as strong consistency and asymptotic normality are established, together with the superiority of the smoothed estimator over the unsmoothed version in the sense of a smaller MSE. Our results are completely new in the literature.

Representations of the MWSD estimators $\hat{p}$ and $\tilde{p}$
We first establish a representation of $\hat{p}$ in order to prove its asymptotic properties. Recall the definition in (2.3). We shall use equation (2.5) to evaluate the MSE of the estimator $\hat{p}$ in both cases, when the selected W(x) is a discrete or a continuous (averaging) distribution.

Lemma 2.1:
Assume the following conditions on F and the kernel function k:
i. F(x) possesses at least three continuous derivatives;
ii.
We now establish the large sample MSEs of $\hat{p}$ and $\tilde{p}$ and show the superiority of the smoothed estimator $\hat{p}$ over the unsmoothed version $\tilde{p}$ in the sense of a lower MSE.

Variances of MWSD estimators:
We first obtain the large sample expressions for the variances of both MWSD estimators $\tilde{p}$ and $\hat{p}$ defined in (2.4); the constants appearing in these expressions (indexed 0 and 2) are defined below in (2.35).
Proof: The proof starts from Lemma 2.1 and (2.5). To evaluate the resulting quantities as n→∞, note that, in view of (2.7), the last equation (2.9a) immediately yields (2.10). For the evaluation of the first of these quantities, we have to deal separately with the two cases when W is discrete or of continuous type.

Let W stand for a discrete d.f. assigning positive probability $p(x_i)$ to a countable set of reals $\{x_i : x_i < x_{i+1}\}$, with i∈C, a countable set of integers, and with only a finite number of the $x_i$'s belonging to each compact interval. (For example, consider the Double Exponential distribution.) In this case, the expansion (2.12) consists of a leading term, a term of order $a_n^2$ and a remainder $O(a_n^4)$, and the first quantity from (2.9a) is expressed through (2.14) in terms of functionals of the kernel d.f. K indexed by j = 1,2. For j<i ⟺ $x_j < x_i$ and sufficiently large n, so that $(x_i - x_j) > 2a_n$, the last-but-one equality in (2.15) follows since, for $(x_i - x_j) > 2a_n$, $K\big(\frac{x_i - x_j}{a_n} + t\big) \equiv 1$ identically. From (2.10), (2.14) and (2.15), it therefore follows that the second expansion, (2.16), has a remainder of order $O(a_n^3)$, while the third, which involves the products $p(x_i)p(x_j)$ of the atom weights, again consists of a leading term, a term of order $a_n^2$ and a remainder $O(a_n^4)$.

Let W be continuous: If W in the definition of the weighted (Wolfowitz) squared distance possesses a density, or is simply a continuous d.f., the detailed evaluations of MSE($\hat{p}$) show (see Theorem 2.4(b) below) that the situation is not as clear cut as in the case when W is discrete. Since, in this continuous case, W assigns no weight to the terms involving the sets [x=y] in the integral evaluations (unlike the evaluations done for equations (2.12)-(2.19) in the discrete case), the MSE($\hat{p}$) evaluations contain no negative term of the order $a_n$. Even in this case, of the two terms of order $a_n^2$ that we obtain, one is clearly negative and the other is of indeterminate sign. So, for large n, one may expect in most situations some reduction, on average, in MSE($\hat{p}$) brought about by smoothing.
Now we do the calculations for the continuous case. From (2.10), we already have the expansion (2.21), in which the term integrated with respect to W(y) is followed by a remainder $O(a_n^4)$. To calculate the first quantity, note from (2.13) the form it takes in this case. Further, for evaluating the second and third quantities and the later-order terms, note that the limits for the variable t, -∞<t<∞, are effectively -1≤t≤1. This is because, for $\frac{x-y}{a_n} > 0$, when t>1, $K\big(\frac{x-y}{a_n} + t\big)K(t) \equiv 1$ identically, so that $d\big[K\big(\frac{x-y}{a_n} + t\big)K(t)\big] = 0$ for t>1; again, for t<-1, the same differential vanishes.

Optimal bandwidth
The optimal bandwidth $a_n$ in the MWSD estimator $\hat{p}$ is selected in such a way that its MSE is minimum with respect to $a_n$. From Theorem 2.4, when W is a discrete d.f., MSE($\hat{p}$) is given by an expansion in powers of $a_n$.

Monte Carlo Simulation study
A simulation study is carried out to estimate the mixing proportion p by $\tilde{p}$ and $\hat{p}$ when the two component distributions are known, for both Normal and Exponential populations. Comments: The simulation results show that the estimated MSE of the smoothed estimator $\hat{p}$ is less than that of the unsmoothed estimator $\tilde{p}$ uniformly for all samples. So the smoothed estimator appears to be the better estimator in terms of MSE. The average gain in observed efficiency due to smoothing lies between 3% and 90% for the different N sets, each of size n.
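A comparison of this kind can be sketched as follows. Everything specific in the sketch is an assumption made for illustration and is not the design used in the paper: the components N(0,1) and N(2,1), p = 0.3, n = 50, a discrete uniform weight W on 17 grid points, the Epanechnikov kernel, the bandwidth $n^{-1/3}$, and the closed-form least-squares minimizer used as the estimator. The numbers it produces should not be read as reproducing the reported results.

```python
import math
import numpy as np

def norm_cdf(x, mu=0.0):
    """Normal d.f. with mean mu and unit variance, evaluated pointwise."""
    return np.array([0.5 * (1.0 + math.erf((v - mu) / math.sqrt(2.0)))
                     for v in np.atleast_1d(x)])

def kernel_cdf(t):
    """D.f. of the Epanechnikov kernel on [-1, 1] (assumed kernel)."""
    t = np.clip(t, -1.0, 1.0)
    return 0.5 + 0.75 * t - 0.25 * t**3

def mwsd_estimate(G, F1, F2, w):
    """Minimizer over p of sum_i w_i [G_i - p F1_i - (1 - p) F2_i]^2."""
    F12 = F1 - F2
    return np.sum((G - F2) * F12 * w) / np.sum(F12**2 * w)

def simulate(p=0.3, n=50, reps=2000, a_n=None, seed=0):
    """Return Monte Carlo MSEs (unsmoothed, smoothed) under the assumed setup."""
    rng = np.random.default_rng(seed)
    a_n = n ** (-1.0 / 3.0) if a_n is None else a_n  # hypothetical bandwidth
    grid = np.linspace(-3.0, 5.0, 17)                # atoms of the discrete W
    w = np.full(grid.size, 1.0 / grid.size)          # equal weights on the atoms
    F1, F2 = norm_cdf(grid, 0.0), norm_cdf(grid, 2.0)
    est_u = np.empty(reps)
    est_s = np.empty(reps)
    for r in range(reps):
        from_first = rng.random(n) < p
        x = np.where(from_first, rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n))
        G_u = (x[None, :] <= grid[:, None]).mean(axis=1)                  # e.d.f.
        G_s = kernel_cdf((grid[:, None] - x[None, :]) / a_n).mean(axis=1)  # smoothed e.d.f.
        est_u[r] = mwsd_estimate(G_u, F1, F2, w)
        est_s[r] = mwsd_estimate(G_s, F1, F2, w)
    return np.mean((est_u - p) ** 2), np.mean((est_s - p) ** 2)

mse_u, mse_s = simulate()
print("MSE unsmoothed:", mse_u)
print("MSE smoothed:  ", mse_s)
print("ratio smoothed/unsmoothed:", mse_s / mse_u)
```

Rerunning `simulate` with other values of `n`, `p` or `a_n` gives a rough view of how sensitive the comparison is to the bandwidth; a ratio below 1 indicates a gain from smoothing in that configuration.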