Some Model Assisted Estimators Using Functional Form Calibration Approach

Model-assisted estimators are approximately design unbiased and consistent, provide robustness, and reduce the design variance if the underlying model reasonably describes the regression relationship. If the model is misspecified, model-assisted estimators might show an increase in design variance but remain approximately design unbiased and robust against model misspecification. The well-known model-assisted estimators, the generalized regression estimators, are members of a larger class of calibration estimators. The calibration method generates calibration weights that meet the calibration constraints and have minimum distance from the sampling design weights. By using different distance measures, the classical calibration approach generates different calibration estimators with asymptotically identical properties. Later, the distance-minimization requirement was relaxed for studying the properties of calibration estimators by proposing a simple functional form approach. This approach generates calibration weights whose changes can be controlled by using different functions of the auxiliary variables' values. This paper extends the model-assisted approach by using functional form calibration weights. Some new model-assisted estimators are considered to obtain efficient and stabilized regression weights by introducing a control matrix. The asymptotic unbiasedness of the proposed estimators is verified and expressions for the MSE are derived in three different cases. A simulation study is carried out to compare the efficiency of the proposed estimators with some existing estimators of the population total.


Introduction
A well-known model-assisted approach, the classical calibration approach of Deville and Särndal (1992), uses auxiliary information to produce efficient estimates of population parameters. The approach uses weighted sample observations, where the weights are obtained such that the distance between them and the design weights is minimized, subject to the condition that they satisfy calibration (benchmark) constraints; they are therefore called calibration weights. Let $y_k$ be the value of the $k$th observation of the study variable and let $x_k' = (x_{1k}, x_{2k}, \ldots, x_{pk})$, for $k = 1, 2, \ldots, N$, be the auxiliary vector associated with $y_k$, such that the population totals of the auxiliary variables are known prior to sampling. Let the population total of the $j$th auxiliary variable be $X_j = \sum_U x_{jk}$, and let the vector of population totals of the $p$ auxiliary variables be denoted by $X$, where $\sum_U$ is the sum over all $k \in U$ for $U = \{1, 2, \ldots, N\}$; $X$ is known prior to estimation of the population total of the study variable. Also, the vector of Horvitz-Thompson estimators of the population totals of the auxiliary variables is $\hat{X} = \sum_s d_k x_k$, where $\sum_s$ is the sum over $k \in s$ and $d_k$ is the design weight. The classical calibration estimator proposed by Deville and Särndal (1992) for estimating the population total of the study variable is defined as

$$\hat{Y}_{cal} = \sum_s w_k y_k. \tag{1.1}$$

Pakistan Journal of Statistics and Operation Research
They named the weights $w_k$ the calibration weights because these weights have minimum distance from the survey design weights and satisfy the calibration (benchmark) constraints

$$\sum_s w_k x_k = X. \tag{1.2}$$

By using different distance functions, many linear and nonlinear calibration estimators can be obtained. The chi-square distance function generates linear calibration weights that result in linear calibration estimators. The resulting calibration weights can be written in the form

$$w_k = d_k (1 + q_k x_k' \lambda). \tag{1.3}$$

Here $d_k = 1/\pi_k$ is the sampling design weight, where $\pi_k$ is the inclusion probability of the $k$th observation, and $q_k$ is any individual observation weight for the $k$th observation. Also, $\lambda$ is a vector of Lagrange multipliers that can be obtained from the calibration constraints defined in (1.2). Using the value $\lambda = \big(\sum_s d_k q_k x_k x_k'\big)^{-1} (X - \hat{X})$, the calibration estimator (1.1) can also be seen as a general linear regression estimator,

$$\hat{Y}_{cal} = \hat{Y}_{HT} + (X - \hat{X})' \hat{B},$$

where $\hat{Y}_{HT} = \sum_s d_k y_k$ is the Horvitz-Thompson estimator of the population total of the study variable and $\hat{B}$ is a vector of order $p \times 1$ defined as

$$\hat{B} = \Big(\sum_s d_k q_k x_k x_k'\Big)^{-1} \sum_s d_k q_k x_k y_k.$$

To study the properties of calibration estimators in general, Estevao and Särndal (2000) proposed a functional form of calibration weights. The proposed weights have an explicit mathematical form and, by setting two parameters, produce different weight systems. They specified a vector $z_k' = (z_{1k}, z_{2k}, \ldots, z_{pk})$ for every $k \in s$ such that

(a) $\dim(z_k) = \dim(x_k) = p$, and
(b) the matrix $\sum_s d_k x_k z_k'$ of order $p \times p$ is non-singular.

The components of the vector $z_k$ are functions of $x_k$ and can be defined as

$$z_{jk} = c_k \, x_{jk}^{\gamma - 1}, \quad \text{for } \gamma \ge 0 \text{ and } c_k > 0. \tag{1.5}$$

Also, for $\gamma = 2$, we have $z_k = c_k x_k$. The resulting calibration weights, which yield asymptotically design-unbiased estimators, are

$$w_k = d_k (1 + \lambda' z_k), \tag{1.6}$$

where the parameters $\gamma$ and $c_k$ are chosen to satisfy (a) and (b) and the vector $\lambda = \big(\sum_s d_k x_k z_k'\big)^{-1} (X - \hat{X})$ is determined by the calibration constraints.
The vector $X$ is the vector of population totals of the auxiliary variables and is assumed to be known prior to estimation. The final form of $w_k$ can be obtained by substituting $\lambda$ in (1.6), that is,

$$w_k = d_k \big[ 1 + (X - \hat{X})' T^{-1} z_k \big], \quad T = \sum_s d_k z_k x_k'.$$

The resulting functional form calibration estimator can be obtained as $\hat{Y}_{FC} = \sum_s w_k y_k$ and can be written in the form

$$\hat{Y}_{FC} = \hat{Y}_{HT} + (X - \hat{X})' \hat{B}_z, \tag{1.9}$$

$$\hat{B}_z = T^{-1} \sum_s d_k z_k y_k, \tag{1.10}$$

where $\hat{Y}_{HT} = \sum_s d_k y_k$ is the Horvitz-Thompson estimator of the study variable total. The functional form $\hat{Y}_{FC}$ generates a variety of calibration weights when different values of the two parameters of $z_k$ are used, and is therefore more convenient and flexible. Also, the functional form calibration estimator is asymptotically equivalent to the instrumental calibration estimators, as proved by Kim and Park (2014). Owing to their flexibility and construction, the functional form calibration weights can be helpful in controlling the undesirable effects of auxiliary variables through different choices of these two parameters.

The functional form calibration estimator $\hat{Y}_{FC}$ defined in (1.9) is a linear function of the design weights and the adjustment term. For a strong to moderate linear relationship between the study variable and the auxiliary variable(s), the adjustment term will be approximately equal to the error of the design-weighted estimator with opposite sign, and therefore gives minimum mean square error; but the presence of discrepant or multicollinear auxiliary variable(s) may cause negative, inefficient or extreme weights. Estevao and Särndal (2002) have shown that the purpose is not to put the given auxiliary information blindly into estimation; rather, efficient use of the auxiliary information is what matters. The efficient use of auxiliary information in model-assisted approaches can result in a substantial gain in precision. According to Ståhl et al. (2016), the main advantage of model-assisted estimation is that it does not rely totally on the suitability of the model: the model only helps to improve the precision of an estimator, and if the assumed model fails, the resulting estimator remains asymptotically unbiased. Model-assisted estimators have been shown to be efficient and robust under different scenarios.
Kim and Rao (2012) used the model-assisted approach to integrate data sets from two independently conducted surveys. Breidt (2017) used the model-assisted approach with complex survey data together with auxiliary information to estimate finite population parameters, reviewing a very broad class of prediction methods including linear models, linear mixed models, nonparametric regression and machine learning techniques. Different functions of auxiliary variables have also been considered by researchers to improve the efficiency of model-assisted estimators. Kumar et al. (2017) considered the case in which the study variable and the auxiliary variable are inversely related and showed that two calibration approaches yield different variances. Gard (2019) found that model-assisted estimators generate the most accurate estimates if relevant auxiliary variables that explain nonresponse and the target variables are available. The efficiency of an estimator depends not only on good, related auxiliary information but also on the methods by which the information is utilized in estimation. Eric (2020) also used the model-assisted approach to estimate model parameters under a complex survey. Recently, Ben et al. (2021) considered obtaining calibration weights when the covariates are high dimensional, especially when interactions between variables exist. They proposed a multilevel calibration weighting system that enforces strict calibration constraints for main effects and looser calibration constraints for higher-order interactions. The asymptotic properties of these estimators were also developed and assessed.
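The functional form weights discussed in this section can be illustrated numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes simple random sampling without replacement (so every design weight equals $N/n$), NumPy, and components of $z_k$ of the power form $c_k x_{jk}^{\gamma-1}$; the function name `functional_form_weights` and the simulated population are hypothetical.

```python
import numpy as np

def functional_form_weights(x, d, totals, c=None, gamma=2.0):
    """Return w_k = d_k * (1 + lambda' z_k), assuming z_jk = c_k * x_jk**(gamma - 1)."""
    x = np.asarray(x, dtype=float)              # (n, p) sample auxiliary values
    d = np.asarray(d, dtype=float)              # (n,) design weights 1 / pi_k
    c = np.ones(len(d)) if c is None else np.asarray(c, dtype=float)
    z = c[:, None] * x ** (gamma - 1.0)         # (n, p) vectors z_k (needs x > 0)
    x_ht = d @ x                                # Horvitz-Thompson totals of x
    m = (d[:, None] * x).T @ z                  # sum over s of d_k x_k z_k'
    lam = np.linalg.solve(m, np.asarray(totals, dtype=float) - x_ht)
    return d * (1.0 + z @ lam)

# Hypothetical population and one simple random sample without replacement.
rng = np.random.default_rng(0)
N, n = 1000, 100
x_pop = rng.uniform(1.0, 5.0, size=(N, 2))
y_pop = 2.0 + x_pop @ np.array([1.5, 0.5]) + rng.normal(size=N)
s = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)

w = functional_form_weights(x_pop[s], d, x_pop.sum(axis=0))   # gamma = 2: z_k = x_k
print("calibrated x totals:", w @ x_pop[s])
print("known x totals:     ", x_pop.sum(axis=0))
print("estimate of Y:", w @ y_pop[s], " true Y:", y_pop.sum())
```

Whatever admissible power is chosen, the weighted sample totals of the auxiliary variables reproduce the known totals; with a single positive auxiliary variable and a power of 1, the weights reduce to those of the ratio estimator. With several auxiliary variables a power of 1 makes all components of $z_k$ identical, so the matrix to be inverted becomes singular.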

The New Functional Form of Calibration Weights:
In this paper, some new model-assisted estimators are proposed using a new functional form of calibration weights that can be used to control unwanted effects of the auxiliary variable(s) and to stabilize the calibration weights for precise estimation and prediction. The proposed functional form of calibration weights, given in (2.1), involves a matrix $C$ of order $p \times p$. We call it the control matrix because it is used to control the effect of the auxiliary variables in model-assisted estimation. The vector $\lambda$ is obtained from the calibration constraints (1.2) and the vector $z_k$ has been defined in (1.5).
This method, like the ridge regression approach, is a trade-off between bias and variance. To stabilize the calibration weights and to protect them from unwanted effects of the auxiliary variable(s), a control matrix is used; but unlike the ridge approach, the penalty factor is not added to the cross-product matrix. Instead, the adjustment terms are controlled explicitly by the control matrix.
We can simplify the new functional form of calibration weights defined in (2.1) by substituting the vector $\lambda$, which gives (2.2). The new functional form of model-assisted estimator of the population total is then obtained from the functional form calibration weights in (2.2):

$$\hat{Y}_C = \hat{Y}_{HT} + (X - \hat{X})' C \hat{B}_z, \tag{2.3}$$

where $\hat{B}_z$ has been defined in (1.10). Different model-assisted estimators can be obtained by using different values of the two parameters of $z_k$ and of the control matrix $C$ in (2.3), which results in a wider class of regression-type estimators. To obtain a particular functional form model-assisted estimator, we need to specify $C$ in (2.3), of order $p \times p$, such that the resulting estimator remains asymptotically unbiased under the randomization distribution. For $C = I$ with $z_k = x_k$, the estimator (2.3) is the classical calibration estimator proposed by Deville and Särndal (1992) for estimating the population total of the study variable.
The objective is to find values of the control matrix $C$ that help to control the undesirable variability in the calibration weights and to obtain stabilized weights. For example, the total number of deaths from COVID-19 in Pakistan can be estimated from sample data when confirmed and recovered cases are used as auxiliary variables and their population totals are known prior to estimation; however, the high variation within the data (daily observations of confirmed and recovered cases within and after a wave) can cause extreme calibration weights and may affect the estimation of total deaths. In this case, instead of using the value 2 for the power parameter in $z_k$, some other value may be a better choice for generating optimum weights and estimates. Also, when multiple auxiliary variables are available for estimation, it may be helpful to assign weights to these auxiliary variables according to their correlation with the study variable. An example is the estimation of total population on the basis of household size, employed persons and taxpayers from a previous survey: household size is more highly correlated with population size than the other variables and may therefore receive a larger weight in estimating the total population. We are free to make different choices of the matrix $C$ for which the calibration estimator remains asymptotically unbiased. We consider three different cases for which the resulting calibration estimators remain asymptotically unbiased.
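A rough sketch of how a control matrix could enter the estimator is given below. It is an illustration under stated assumptions rather than the paper's exact formulation: it assumes that the control matrix (called `control` here) rescales the sampling errors of the auxiliary totals, so that the estimate is the Horvitz-Thompson estimate plus $(X - \hat{X})' C \hat{B}$, with $z_k = x_k$ by default. The function name and the toy population are hypothetical.

```python
import numpy as np

def controlled_estimator(y, x, d, totals, control=1.0, z=None):
    """Y_HT + (X - X_HT)' C B_hat; the placement of C is an assumption of this sketch."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)                    # (n, p)
    d = np.asarray(d, dtype=float)
    z = x if z is None else np.asarray(z, dtype=float)
    p = x.shape[1]
    # A scalar control value is treated as that value times the identity matrix.
    if np.ndim(control) == 0:
        c_mat = control * np.eye(p)
    else:
        c_mat = np.asarray(control, dtype=float)
    dz = d[:, None] * z
    b_hat = np.linalg.solve(dz.T @ x, dz.T @ y)       # (sum d z x')^{-1} sum d z y
    x_err = np.asarray(totals, dtype=float) - d @ x   # sampling errors X - X_HT
    return d @ y + x_err @ c_mat @ b_hat

rng = np.random.default_rng(1)
N, n = 500, 50
x_pop = rng.uniform(1.0, 5.0, size=(N, 2))
y_pop = x_pop @ np.array([2.0, 1.0]) + rng.normal(size=N)
s = rng.choice(N, size=n, replace=False)
y, x, d, totals = y_pop[s], x_pop[s], np.full(n, N / n), x_pop.sum(axis=0)

est_ht = controlled_estimator(y, x, d, totals, control=0.0)    # C = 0: Horvitz-Thompson
est_full = controlled_estimator(y, x, d, totals, control=1.0)  # C = I: full adjustment
est_half = controlled_estimator(y, x, d, totals, control=np.diag([1.0, 0.5]))
print(est_ht, est_half, est_full, y_pop.sum())
```

Under this assumption a zero control matrix returns the Horvitz-Thompson estimate and the identity returns the usual calibration estimate, so intermediate choices shrink the adjustment variable by variable.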

CASE I:
The efficiency of model-assisted estimators depends on the correlated auxiliary variables used in the estimation procedure. In most surveys, complete information about the variance of the study variable is not available and the available auxiliary information explains only a portion of the variability. The coefficient of determination measures the variation that is explained by the auxiliary variables used in a model. In the first case we consider $C = \bar{R}^2$, a matrix of order $1 \times 1$, where $\bar{R}^2$ is the adjusted coefficient of determination. The functional form model-assisted estimator in this case will be

$$\hat{Y}_1 = \hat{Y}_{HT} + \bar{R}^2 \, (X - \hat{X})' \hat{B}_z. \tag{2.5}$$

The value of the adjusted $\bar{R}^2$ assigns an appropriate weight to the adjustment term according to the proportion of variability explained by the auxiliary variables. If $\bar{R}^2 \approx 0$, the adjustment term also tends to zero, the auxiliary variable(s) are automatically excluded from the model and only the design-based Horvitz-Thompson estimator is used for estimation. When the linear relationship of the study variable is perfectly described by the auxiliary variable(s), $\bar{R}^2 \approx 1$ and there will be no change in the calibration weights.
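As a hedged illustration of Case I, the sketch below scales the regression adjustment by the adjusted coefficient of determination. The source does not say how the adjusted $R^2$ is computed, so this sketch assumes an unweighted ordinary least squares fit with intercept on the sample, together with $z_k = x_k$; the function names and data are hypothetical.

```python
import numpy as np

def adjusted_r2(y, x):
    """Adjusted R^2 of an OLS fit of y on x with intercept (unweighted sample fit)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def case1_estimator(y, x, d, totals):
    """Return (Case I estimate, Horvitz-Thompson estimate, unshrunk calibration estimate)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    dx = d[:, None] * x
    b_hat = np.linalg.solve(dx.T @ x, dx.T @ y)        # z_k = x_k assumed
    adjustment = (np.asarray(totals, dtype=float) - d @ x) @ b_hat
    y_ht = d @ y
    return y_ht + adjusted_r2(y, x) * adjustment, y_ht, y_ht + adjustment

rng = np.random.default_rng(2)
N, n = 1000, 100
x_pop = rng.uniform(1.0, 5.0, size=N)
y_pop = 2.0 + 1.5 * x_pop + rng.normal(size=N)
s = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)

y1, y_ht, y_fc = case1_estimator(y_pop[s], x_pop[s], d, [x_pop.sum()])
r2 = adjusted_r2(y_pop[s], x_pop[s])
print("adjusted R^2:", round(r2, 3))
print("HT:", y_ht, " Case I:", y1, " full adjustment:", y_fc, " true:", y_pop.sum())
```

Whenever the adjusted $R^2$ lies between 0 and 1, the Case I estimate falls between the Horvitz-Thompson estimate and the fully adjusted calibration estimate, which matches the two limiting cases described above.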

CASE II:
Today a large amount of information is available, and multiple auxiliary variables are used in estimation to increase the precision and accuracy of the estimates. When a large number of auxiliary variables is used, one cause of inefficient weights may be that some of the less correlated auxiliary variable(s) have large sampling errors, which influence the calibration weights and result in unstable or inefficient weights. To avoid such situations, the auxiliary variables can be weighted according to their correlation with the study variable. Therefore, one possible value for the control matrix is

$$C = \operatorname{diag}\big(|r_1|, |r_2|, \ldots, |r_p|\big),$$

a diagonal matrix of order $p$ having the partial correlation coefficients as its diagonal elements, where $r_j$ is the partial correlation coefficient of $x_j$ with $y$. The absolute value is used to avoid negative signs. The matrix $C$ assigns weights to the sampling error(s) of the auxiliary variable(s) according to their correlation with the study variable, so that a more highly correlated auxiliary variable receives a large weight and a less correlated variable receives a small weight in the construction of the calibration weights. If an auxiliary variable is perfectly correlated with the study variable, its correlation coefficient will be close to 1 and its sampling error term will remain unchanged. Similarly, if an explanatory variable is not correlated with the study variable, the value will be approximately zero and the variable will in effect be excluded from the model. Some of these correlation coefficients (of the $p$ variables included in the estimation) can be obtained from a previous survey and others can be obtained at the sample level. The estimator of the population total in this case will be

$$\hat{Y}_2 = \hat{Y}_{HT} + (X - \hat{X})' C \hat{B}_z. \tag{2.7}$$
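The Case II control matrix can be sketched as follows. This is an illustrative computation only: it assumes that the partial correlations are estimated from the sample, using the standard identity that expresses partial correlations through the inverse of the correlation matrix. The function names and the simulated data are hypothetical.

```python
import numpy as np

def partial_correlations(y, x):
    """Partial correlation of y with each x_j, given the remaining x variables."""
    data = np.column_stack([y, x])                            # first column is y
    prec = np.linalg.inv(np.corrcoef(data, rowvar=False))     # inverse correlation matrix
    return -prec[0, 1:] / np.sqrt(prec[0, 0] * np.diag(prec)[1:])

def case2_control_matrix(y, x):
    """Diagonal matrix of absolute partial correlations."""
    return np.diag(np.abs(partial_correlations(y, x)))

# Hypothetical sample: x[:, 0] is strongly related to y, x[:, 1] only weakly.
rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * x[:, 0] + 0.2 * x[:, 1] + rng.normal(size=n)

C = case2_control_matrix(y, x)
print(np.round(C, 3))
```

With a single auxiliary variable the partial correlation reduces to the ordinary correlation coefficient, so the control matrix becomes the $1 \times 1$ matrix holding its absolute value.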

CASE III:
We consider another case in which inefficient weights occur because the values of some auxiliary variables are more dispersed than those of the study variable and may therefore have large sampling errors. This may influence the calibration weights and may cause inefficient or extreme weights. The matrix given below may be useful for assigning appropriate weights to the auxiliary variables by comparing their variation with that of the study variable:

$$C = \operatorname{diag}\left(\frac{CV_y^2}{CV_{x_1}^2}, \frac{CV_y^2}{CV_{x_2}^2}, \ldots, \frac{CV_y^2}{CV_{x_p}^2}\right),$$

that is, the $j$th diagonal element is the ratio of $CV_y^2$ (the square of the coefficient of variation of the study variable) to $CV_{x_j}^2$ (the square of the coefficient of variation of the $j$th auxiliary variable). The elements of the control matrix assign an appropriate weight to the sampling error of each auxiliary variable by comparing its variation with that of the study variable.

The functional form model-assisted estimator in this case will be

$$\hat{Y}_3 = \hat{Y}_{HT} + (X - \hat{X})' C \hat{B}_z, \tag{2.9}$$

where the coefficient of variation of a variable is its standard deviation divided by its mean.
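The Case III control matrix is straightforward to compute. The short sketch below assumes unweighted sample variances and means, which is an assumption of this example (the source does not state how the coefficients of variation are estimated); the function names and the small data set are hypothetical.

```python
import numpy as np

def cv_squared(v):
    """Squared coefficient of variation, column by column (sample variance / mean^2)."""
    v = np.asarray(v, dtype=float)
    return v.var(axis=0, ddof=1) / v.mean(axis=0) ** 2

def case3_control_matrix(y, x):
    """Diagonal matrix with entries CV_y^2 / CV_xj^2."""
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    return np.diag(cv_squared(y) / cv_squared(x))

y = np.array([9.0, 10.0, 11.0])                  # CV^2 = 1 / 100
x = np.array([[5.0, 9.0],
              [10.0, 10.0],
              [15.0, 11.0]])                     # CV^2 = 25 / 100 and 1 / 100
C = case3_control_matrix(y, x)
print(C)                                         # diagonal is approximately (0.04, 1.0)
```

An auxiliary variable that is more dispersed than the study variable receives a diagonal entry below 1, so its sampling error is damped; the entries are unchanged if an auxiliary variable is rescaled by a positive constant.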
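We assume that the sampling errors $\hat{X} - X$ and $\hat{Y}_{HT} - Y$ are small in magnitude compared with $X$ (the auxiliary variable total(s)) and $Y$ (the study variable total); also, $X_j$ (the population total of the $j$th auxiliary variable) is positive and the relative sampling error is less than 1 in absolute value.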

Result 1:
The calibration estimator (2.5) is asymptotically unbiased. Using the results of Fuller and Isaki (1982) and Estevao and Särndal (2000), the bias of the estimated quantities involved is negligible when the sample size $n$ is large, and even for a modest sample size.
The Mean Square Error (MSE) of the estimator (2.5) is:

Result 3:
The estimator (2.9) is asymptotically unbiased. Its estimated component has bias of order $O(1/n)$, and hence the bias of the estimator vanishes as the sample size $n$ increases. The MSE of estimator (2.9) is derived as:

3-Simulation Study:
We examine and compare the performance of the proposed estimators $\hat{Y}_1$, $\hat{Y}_2$ and $\hat{Y}_3$, defined in (2.5), (2.7) and (2.9) respectively, through a simulation study against the classical Horvitz-Thompson estimator ($\hat{Y}_{HT}$) and the special case of the functional form calibration estimator ($\hat{Y}_{FC}$) proposed by Estevao and Särndal (2000, 2002) and defined in (1.9). Efficiency is compared through bias and mean square error (MSE). A population of 1000 normally distributed values of the dependent variable ($Y$) is generated. The correlated auxiliary information is generated in two cases. In the first case, only one auxiliary variable ($X_1$), moderately linearly related to the study variable, is generated, with the power parameter set to 2, that is, $z_k = x_k$. The bias and MSE are calculated from repeated sampling, using 500 samples each of size 100. The process is then repeated for two auxiliary variables: one is generated with a correlation coefficient between 0.65 and 0.85, so that it is moderately correlated with the study variable, whereas the second is generated by fixing the correlation coefficient between 0.25 and 0.30. In particular, the purpose is to study the impact of including less correlated auxiliary variables in model-assisted estimation. The population total of the 1000 values (1000.055) is estimated using the five estimators. The table below shows the results of the simulation study.

The results of the simulation study show that the estimator $\hat{Y}_{FC}$ underestimates the total of the study variable compared with $\hat{Y}_{HT}$ (the Horvitz-Thompson estimator) and the three considered cases of the functional form model-assisted estimators. However, when only one auxiliary variable is used, a decrease in bias and MSE can be observed when the proposed functional form model-assisted estimators are used to estimate the population total, under the condition that the response variable follows a normal distribution.
An interesting result is that choosing the control matrix of Case II, a $p \times p$ diagonal matrix with the partial correlations of $y$ with $x_j$ as its diagonal elements, yields the best results among all estimators considered for comparison. Also, in the case of two auxiliary variables, when one is weakly correlated with the response variable, $\hat{Y}_2$ again produces the minimum bias and MSE. The reason may be that the control matrix assigns weights to the sampling errors of the auxiliary variables that are proportional to their correlation with the study variable, so that the second auxiliary variable receives less weight than the first in the generation of the calibration weights and in estimation. The other two proposed estimators are also more efficient than the Horvitz-Thompson estimator ($\hat{Y}_{HT}$) and the functional form calibration estimator ($\hat{Y}_{FC}$) proposed by Estevao and Särndal (2000), but yield slightly larger bias and MSE than $\hat{Y}_2$ when the data are generated from a normal distribution.
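A simulation of this kind can be sketched in a few lines. The code below is not the authors' simulation and will not reproduce their table: the population parameters, the seed, simple random sampling without replacement, the sample-based estimation of every control matrix, and the assumption that the control matrix rescales the sampling errors $X - \hat{X}$ are all choices made for this illustration, and all names are hypothetical.

```python
import numpy as np

def adjusted_r2(y, x):
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def partial_correlations(y, x):
    prec = np.linalg.inv(np.corrcoef(np.column_stack([y, x]), rowvar=False))
    return -prec[0, 1:] / np.sqrt(prec[0, 0] * np.diag(prec)[1:])

def cv_squared(v):
    return v.var(axis=0, ddof=1) / v.mean(axis=0) ** 2

def five_estimates(y, x, d, totals):
    """HT, functional form calibration (z_k = x_k) and the three controlled versions."""
    dx = d[:, None] * x
    b_hat = np.linalg.solve(dx.T @ x, dx.T @ y)
    err = totals - d @ x                       # sampling errors of the x totals
    y_ht = d @ y
    return {
        "HT": y_ht,
        "FC": y_ht + err @ b_hat,
        "Y1": y_ht + adjusted_r2(y, x) * (err @ b_hat),
        "Y2": y_ht + (err * np.abs(partial_correlations(y, x))) @ b_hat,
        "Y3": y_ht + (err * cv_squared(y) / cv_squared(x)) @ b_hat,
    }

def simulate(n_rep=500, N=1000, n=100, seed=1):
    rng = np.random.default_rng(seed)
    y = rng.normal(50.0, 5.0, N)               # normally distributed study variable
    u = (y - 50.0) / 5.0
    # Auxiliary variables with target correlations of about 0.75 and 0.28 with y.
    x1 = 20.0 + 4.0 * (0.75 * u + np.sqrt(1 - 0.75 ** 2) * rng.normal(size=N))
    x2 = 30.0 + 6.0 * (0.28 * u + np.sqrt(1 - 0.28 ** 2) * rng.normal(size=N))
    x = np.column_stack([x1, x2])
    totals, y_total = x.sum(axis=0), y.sum()
    d = np.full(n, N / n)                      # SRSWOR design weights
    draws = {}
    for _ in range(n_rep):
        s = rng.choice(N, size=n, replace=False)
        for name, est in five_estimates(y[s], x[s], d, totals).items():
            draws.setdefault(name, []).append(est)
    # For each estimator: (bias, MSE) over the repeated samples.
    return {name: (np.mean(v) - y_total, np.mean((np.array(v) - y_total) ** 2))
            for name, v in draws.items()}

results = simulate()
for name, (bias, mse) in results.items():
    print(f"{name}: bias = {bias:10.3f}   MSE = {mse:14.3f}")
```

The ranking of the estimators in such a sketch depends on the assumed population, for example on how well a regression through the origin fits, so its output should be read as an illustration of the procedure rather than as evidence for or against any estimator.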

4-Conclusion:
The utility of a given estimator depends on what auxiliary data are available and how they are used in estimation. Auxiliary variables with large variances and discrepant values may affect the estimation process. To control the unwanted effects of auxiliary variables and to make optimal use of the available information, a new functional form of calibration estimator is proposed for estimating the population total, which results in a family of model-assisted estimators. The bias and MSE expressions of the three functional form calibration estimators are derived and a simulation study is conducted to assess the performance of the proposed estimators. The results demonstrate that the proposed estimators perform well compared with the Horvitz-Thompson estimator and the functional form calibration estimator proposed by Estevao and Särndal (2000). Because the results are obtained under a single condition, in which the random error follows a normal distribution and the auxiliary variables are moderately or weakly correlated with the study variable, more work may be needed to develop specific guidelines for the various types of auxiliary information available for estimation. A further study could explore the performance of the different cases of the proposed functional form under various other probability distributions. Important open questions are which choice of control matrix performs better when more than two auxiliary variables are used, and, for auxiliary variables with large variances, what the role of the control matrix will be and which estimator will perform well.