
Date: 2019-10-10 10:04

arXiv:1611.06653v1 [stat.ME] 21 Nov 2016

SIMEX estimation for single-index model with covariate measurement error

Yiping Yang1,

1College of Mathematics and Statistics, Chongqing Technology and Business University,

Chongqing 400067, P. R. China

2Department of Mathematics, Hong Kong Baptist University, Hong Kong

3Beijing Institute for Scientific and Engineering Computing, Beijing University of

Technology, Beijing 100124, P. R. China

Abstract

In this paper, we consider the single-index measurement error

model with mismeasured covariates in the nonparametric part. To

solve the problem, we develop a simulation-extrapolation (SIMEX)

algorithm based on the local linear smoother and the estimating

equation. For the proposed SIMEX estimation, it is not needed to

assume the distribution of the unobserved covariate. We transform

the boundary of a unit ball in $\mathbb{R}^{p}$ to the interior of a unit ball in $\mathbb{R}^{p-1}$ by using the constraint $\|\beta\| = 1$. The proposed SIMEX estimator

of the index parameter is shown to be asymptotically normal

under some regularity conditions. We also derive the asymptotic

bias and variance of the estimator of the unknown link function.

Finally, the performance of the proposed method is examined by

simulation studies and is illustrated by a real data example.

Key Words: Single-index model; Measurement error; Local linear smoother; SIMEX;

Estimating equation.

AMS2000 Subject Classifications: primary 62G05, 62G08; secondary 62G20


1 Introduction

One major problem in fitting multivariate nonparametric regression models is the

“curse of dimensionality”. To overcome the problem, the single-index model has

played an important role in the literature. In this paper, we consider the single-index model of the form

$$Y = g(\beta^{T} X) + \varepsilon, \qquad (1.1)$$

where $Y$ is the response variable, $X$ is a $p \times 1$ covariate vector, $g(\cdot)$ is the unknown link function, $\beta = (\beta_1, \ldots, \beta_p)^{T}$ is the unknown index parameter, and $\varepsilon$ is a random error with $E(\varepsilon \mid X) = 0$ almost surely. We further assume the Euclidean norm $\|\beta\| = 1$ for identifiability. Model (1.1) reduces the covariate vector

into an index which is a linear combination of covariates, and hence avoids the

“curse of dimensionality”.

Single-index models have been extensively studied in the literature. See, for example,

Härdle & Tsybakov (1993), Härdle, Hall & Ichimura (1993), Carroll, Fan,

Gijbels & Wand (1997), Xue & Zhu (2006), Li, Zhu, Xue & Feng (2010), Lai, Li

& Lian (2013), Li, Lai & Lian (2015), among others. For estimating the index

parameter and the unknown link function, Duan & Li (1991) developed the sliced

inverse regression method. Härdle & Tsybakov (1993) proposed the average derivative

method to obtain a root-n consistent estimator of the index vector β. Carroll

et al. (1997) used the local linear method to estimate the unknown parameters and

the unknown link function for generalized partially linear single-index models. Naik

& Tsai (2000) proposed the partial least squares estimator for single-index models.

Xue & Zhu (2006) and Zhu & Xue (2006) proposed the bias-corrected empirical

likelihood method to construct the confidence intervals or regions of the parameters

of interest. Liang, Liu, Li & Tsai (2010) proposed the semiparametrically efficient

profile least-squares estimators of regression coefficients for partially linear single-index

models. Zhang, Huang & Lv (2010) extended the generalized likelihood ratio

test to the single-index model. Cui, Härdle & Zhu (2011) introduced the estimating

function method to study the single-index models. Pang & Xue (2012) and Yang,

Xue & Li (2014) investigated the single-index random effects models with longitudinal

data. Li, Peng, Dong & Tong (2014) constructed the simultaneous confidence

bands for the nonparametric link function in single-index models.

In this paper, we are interested in estimating the index parameter β and the

unknown link function g(·) in model (1.1) when the covariate vector X is measured


with error. We assume an additive measurement error model as

W = X + U, (1.2)

where W is the observed surrogate, U follows N(0, Σu) and is independent of (X, Y ).

When U is zero, there is no measurement error. For simplicity, we consider only the

case where the measurement error covariance matrix Σu is known. Otherwise, Σu needs to be estimated first, e.g., by the replication experiments method in Carroll,

Ruppert, Stefanski & Crainiceanu (2006). We refer to the models characterized by

(1.1) and (1.2) as the single-index measurement error model.

The measurement error models arise frequently in practice and are attracting

attention in medical and statistical research. For example, covariates such as the

blood pressure (Carroll et al. 2006) and the CD4 count (Lin & Carroll 2000, Liang

2009) are often subject to measurement error. For a class of generalized linear

measurement error models, Stefanski & Carroll (1989) and Nakamura (1990) used a

method of moment identities to construct the corrected score functions, and Yang, Li & Tong (2015) further developed the corrected empirical likelihood method. Cook &

Stefanski (1994) developed the SIMEX method to correct the effect estimates in the

presence of additive measurement error. Carroll, Lombard, Küchenhoff & Stefanski

(1996) further investigated the asymptotic distribution of the SIMEX estimator.

Since then, the SIMEX method has become a standard tool for correcting the biases

induced by measurement error in covariates for many complex models. Carroll,

Maca & Ruppert (1999) and Delaigle & Hall (2008) applied the SIMEX technique

to local polynomial nonparametric regression and spline-based regression. Liang &

Ren (2005) applied the SIMEX technique to the generalized partially linear models

with the linear covariate being measured with additive error. Other interesting

works in SIMEX include, for example, Cui & Zhu (2003), Ma & Carroll (2006),

Apanasovich & Carroll (2009), Ma & Li (2010), Ma & Yin (2011), Sinha & Ma

(2014), Zhang, Zhu & Zhu (2014), Cao, Lin, Shi, Wang & Zhang (2015), and Wang

& Wang (2015).

Note that the aforementioned SIMEX methods may not be able to handle the

multivariate nonparametric measurement error regression models owing to the “curse

of dimensionality”. In view of this, Liang & Wang (2005) considered the partially

linear single-index measurement error models with the linear part containing the

measurement error, where they applied the correction for attenuation approach to

obtain the efficient estimators of the parameters of interest. Their method, however,

is not applicable when the measurement errors occur in the nonparametric part. This motivates us to develop a new SIMEX method to solve this problem.

Specifically, we combine the SIMEX method, the local linear approximation method,

and the estimating equation to handle the single-index measurement error model.

Our method has several desirable features. First, the proposed method can deal with multivariate nonparametric measurement error regression and avoids the “curse of dimensionality” by introducing the index parameter. Second, we use the SIMEX technique to construct an efficient estimator and reduce its bias, without assuming a distribution for the unobservable X. Third, to obtain an efficient estimator of β, we regard the constraint $\|\beta\| = 1$ as a piece of prior information and adopt the “delete-one-component” method.

The remainder of the paper is organized as follows. In Section 2, we develop the

SIMEX algorithm to obtain the estimators of the index parameter and the unknown

link function, and investigate their asymptotic properties. In Section 3, we present

and compare the results from simulation studies and also apply the proposed method

to a real data example for illustration. Some concluding remarks are given in Section

4, and the proofs of the main results are given in the Appendix.

2 Main Results

2.1 Methodology

To conduct efficient estimation for β in the presence of covariate measurement error,

Cook & Stefanski (1994) introduced the SIMEX algorithm. The SIMEX algorithm

consists of a simulation step, an estimation step, and an extrapolation step. It aims

to add additional variability to the observed W in order to establish the trend between

the measurement error induced bias and the variance of induced measurement

error, and then extrapolate this trend back to the case without measurement error

(Carroll et al. 2006). In this section, we use the SIMEX algorithm, the local linear

smoother and the estimating equation to estimate β and g(·). First, we estimate g(·)

as a function of β by using the local linear smoother. We then estimate the parametric

part based on the estimating equation. The proposed algorithm is described

as follows.

(I) Simulation step

For each $i = 1, \ldots, n$, we generate a sequence of variables

$$W_{ib}(\lambda) = W_i + (\lambda \Sigma_u)^{1/2} U_{ib}, \quad b = 1, \ldots, B,$$

where $U_{ib} \sim N(0, I_p)$, $I_p$ is the $p \times p$ identity matrix, $B$ is a given integer, and $\lambda \in \Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_M\}$ is the grid of $\lambda$ used in the extrapolation step. We let $\lambda$ range from 0 to 2.
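As an illustration of the simulation step, the following is a minimal sketch in Python with NumPy. The function name `simulate_remeasured`, the array layout, and the use of a symmetric matrix square root are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def simulate_remeasured(W, Sigma_u, lam, B, rng):
    """Return an array of shape (B, n, p) with W_ib(lam) = W_i + (lam * Sigma_u)^{1/2} U_ib."""
    # Symmetric square root of lam * Sigma_u via an eigendecomposition.
    # This also works when Sigma_u is singular (some covariates are error free).
    vals, vecs = np.linalg.eigh(lam * Sigma_u)
    root = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    n, p = W.shape
    U = rng.standard_normal((B, n, p))  # U_ib ~ N(0, I_p)
    # root is symmetric, so each row u^T @ root equals (root @ u)^T.
    return W[None, :, :] + U @ root

rng = np.random.default_rng(0)
W = rng.standard_normal((200, 2))      # hypothetical observed surrogates
Sigma_u = np.diag([0.4 ** 2, 0.0])     # only the first covariate is mismeasured
Wb = simulate_remeasured(W, Sigma_u, lam=1.0, B=50, rng=rng)
```

With `lam = 0` no extra noise is added, so every pseudo data set equals the observed `W`; larger values of `lam` inflate the measurement error variance to $(1 + \lambda)\Sigma_u$.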

(II) Estimation step

Suppose that $g(\cdot)$ has a continuous second derivative. For $t$ in a small neighborhood of $t_0$, $g(t)$ can be approximated as $g(t) \approx g(t_0) + g'(t_0)(t - t_0) \equiv a + b(t - t_0)$. With the simulated $W_{ib}(\lambda)$, we first estimate $g(t_0)$ as a function of $\beta$ by a local linear smoother, denoted by $\hat g(\beta, \lambda; t_0)$, in Step 1. We then propose a new estimator of $\beta(\lambda)$ in Steps 2 and 3, denoted by $\hat\beta(\lambda)$. The specific procedure is as follows.

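Step 1. For each fixed $t_0$ and $\beta$, $\hat g(\beta, \lambda; t_0)$ and $\hat g'(\beta, \lambda; t_0)$ are estimated by minimizing

$$\sum_{i=1}^{n} \{Y_i - a - b[\beta^{T} W_{ib}(\lambda) - t_0]\}^2 K_h(\beta^{T} W_{ib}(\lambda) - t_0) \qquad (2.1)$$

with respect to $a$ and $b$, where $K_h(\cdot) = h^{-1} K(\cdot/h)$, $K(\cdot)$ is a kernel function and $h$ is the bandwidth. Let $\hat a$ and $\hat b$ be the solutions to problem (2.1). Then, $\hat g(\beta, \lambda; t_0) = \hat a$ and $\hat g'(\beta, \lambda; t_0) = \hat b$. Let

$$M_{ni}(\beta, \lambda; t_0) = U_{ni}(\beta, \lambda; t_0) \Big/ \sum_{j=1}^{n} U_{nj}(\beta, \lambda; t_0),$$

$$\tilde M_{ni}(\beta, \lambda; t_0) = \tilde U_{ni}(\beta, \lambda; t_0) \Big/ \sum_{j=1}^{n} U_{nj}(\beta, \lambda; t_0),$$

where $U_{ni}(\beta, \lambda; t_0) = K_h(\beta^{T} W_{ib}(\lambda) - t_0)\{S_{n,2}(\beta, \lambda; t_0) - [\beta^{T} W_{ib}(\lambda) - t_0] S_{n,1}(\beta, \lambda; t_0)\}$, $\tilde U_{ni}(\beta, \lambda; t_0) = K_h(\beta^{T} W_{ib}(\lambda) - t_0)\{[\beta^{T} W_{ib}(\lambda) - t_0] S_{n,0}(\beta, \lambda; t_0) - S_{n,1}(\beta, \lambda; t_0)\}$, and $S_{n,l}(\beta, \lambda; t_0) = \frac{1}{n} \sum_{i=1}^{n} (\beta^{T} W_{ib}(\lambda) - t_0)^{l} K_h(\beta^{T} W_{ib}(\lambda) - t_0)$ for $l = 0, 1, 2$. Simple calculation yields

$$\hat g(\beta, \lambda; t_0) = \sum_{i=1}^{n} M_{ni}(\beta, \lambda; t_0) Y_i, \qquad (2.2)$$

$$\hat g'(\beta, \lambda; t_0) = \sum_{i=1}^{n} \tilde M_{ni}(\beta, \lambda; t_0) Y_i. \qquad (2.3)$$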

Chang, Xue & Zhu (2010) showed that the convergence rate of the estimator of $g'(t)$ is slower than that of $g(t)$ if the same bandwidth is used. Because of this, we use another bandwidth $h_1$ to control the variability in the estimator of $g'(t)$. We use $h_1$ to replace $h$ in $\hat g'(\beta, \lambda; t_0)$ and write it as $\hat g'_{h_1}(\beta, \lambda; t_0)$.
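The local linear smoother of Step 1 can be sketched as follows. This is an illustrative Python implementation only: the function names are hypothetical, the Epanechnikov kernel is the one used later in the simulation study, and the closed form relies on the standard weighted least-squares solution with moments $S_{n,l}$ scaled by $1/n$, which is an assumption of this sketch.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def local_linear(index, y, t0, h):
    """Local linear fit at t0; returns (a_hat, b_hat), estimates of g(t0) and g'(t0).

    `index` holds the values beta^T W_ib(lambda) for one pseudo data set.
    """
    d = index - t0
    k = epanechnikov(d / h) / h                      # K_h(d) = K(d / h) / h
    s0, s1, s2 = (np.mean(k * d ** l) for l in range(3))
    denom = s0 * s2 - s1 ** 2                        # equals (1/n) * sum_j U_nj
    a_hat = np.mean(k * (s2 - d * s1) * y) / denom   # sum_i M_ni * Y_i
    b_hat = np.mean(k * (d * s0 - s1) * y) / denom   # sum_i M~_ni * Y_i
    return a_hat, b_hat
```

To mimic the separate bandwidth for the derivative, one would call `local_linear` once with `h` and keep `a_hat`, and once with `h1` and keep `b_hat`. A useful sanity check is that a local linear fit reproduces an exactly linear response without error.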


Step 2. To estimate $\beta$, we use the “delete-one-component” method in Zhu & Xue (2006) to transform the boundary of a unit ball in $\mathbb{R}^{p}$ to the interior of a unit ball in $\mathbb{R}^{p-1}$. Let $\beta^{(r)} = (\beta_1, \ldots, \beta_{r-1}, \beta_{r+1}, \ldots, \beta_p)^{T}$ be the $(p-1)$-dimensional vector obtained by deleting the $r$th component $\beta_r$. Without loss of generality, we assume there is a positive component $\beta_r$; otherwise, we may consider $\beta_r = -(1 - \|\beta^{(r)}\|^2)^{1/2}$. Let

$$\beta = (\beta_1, \ldots, \beta_{r-1}, (1 - \|\beta^{(r)}\|^2)^{1/2}, \beta_{r+1}, \ldots, \beta_p)^{T}.$$

Note that $\beta^{(r)}$ satisfies the constraint $\|\beta^{(r)}\| < 1$. We conclude that $\beta$ is infinitely differentiable in a neighborhood of $\beta^{(r)}$ and the Jacobian matrix is $J_{\beta^{(r)}} = (\gamma_1, \ldots, \gamma_p)^{T}$, where $\gamma_s$ ($1 \le s \le p$, $s \ne r$) is a $(p-1)$-dimensional vector with the $s$th component being 1, and $\gamma_r = -(1 - \|\beta^{(r)}\|^2)^{-1/2} \beta^{(r)}$. Given the estimators $\hat g(\beta, \lambda; t_0)$ and $\hat g'_{h_1}(\beta, \lambda; t_0)$ in (2.2) and (2.3), respectively, an estimator of $\beta^{(r)}$, $\hat\beta^{(r)}_b(\lambda)$, is obtained by solving the following equation:

Next, we can obtain an estimator of $\beta$, say $\hat\beta_b(\lambda)$, by implementing the Fisher scoring version of the Newton-Raphson algorithm to solve the estimating equation (2.4). We summarize the iterative algorithm in what follows.

(1) Choose the initial values for $\beta$, denoted by $\tilde\beta_b(\lambda)$, where $b = 1, \ldots, B$.

(2) Update $\tilde\beta_b(\lambda)$ with $\tilde\beta_b(\lambda) = \hat\beta^{*}_b$

(3) Repeat Step (2) until convergence.

In the iterative algorithm, the initial value of $\beta$, $\beta_{\mathrm{int}}$, with norm 1, is obtained by fitting a linear model.
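The “delete-one-component” reparametrization used in Step 2 can be illustrated with a short Python sketch. The helper names `full_beta` and `jacobian` and the 0-based position `r` are assumptions of this example; the sketch covers only the map from $\beta^{(r)}$ to $\beta$ and its Jacobian, not the estimating equation itself.

```python
import numpy as np

def full_beta(beta_r, r):
    """Rebuild the unit vector beta from beta^(r).

    The deleted component, at 0-based position r, is taken to be the
    positive root (1 - ||beta^(r)||^2)^{1/2}.
    """
    root = np.sqrt(1.0 - beta_r @ beta_r)
    return np.insert(beta_r, r, root)

def jacobian(beta_r, r):
    """p x (p-1) Jacobian of beta with respect to beta^(r).

    Rows other than r form an identity matrix; row r is
    -(1 - ||beta^(r)||^2)^{-1/2} * beta^(r).
    """
    root = np.sqrt(1.0 - beta_r @ beta_r)
    return np.insert(np.eye(beta_r.size), r, -beta_r / root, axis=0)

beta_r = np.array([0.3, 0.5])          # hypothetical interior point, ||beta_r|| < 1
beta = full_beta(beta_r, r=1)          # unit-norm index vector in R^3
J = jacobian(beta_r, r=1)
```

Working with $\beta^{(r)}$ turns the constrained problem on the unit sphere into an unconstrained one on the open unit ball, and the chain rule with `J` carries derivatives with respect to $\beta$ over to derivatives with respect to $\beta^{(r)}$.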

Remark 1. Similar to Cui et al. (2011), we discuss the solution of the estimating equation. In fact, the solution of the estimating equation $Q_{nb}(\beta^{(r)}, \lambda)$ is just the least-squares estimator of $\beta^{(r)}$. The least-squares objective function is defined by

$$G(\beta^{(r)}, \lambda) = \sum_{i=1}^{n} \{Y_i - \hat g(\beta, \lambda; \beta^{T} W_{ib}(\lambda))\}^2.$$

The minimum of the objective function $G(\beta^{(r)}, \lambda)$ with respect to $\beta^{(r)}$ is the solution of the estimating equation $Q_{nb}(\beta^{(r)}, \lambda)$ because the estimating equation $Q_{nb}(\beta^{(r)}, \lambda)$ is the gradient vector of $G(\beta^{(r)}, \lambda)$. Note that $\{\|\beta^{(r)}\| < 1\}$ is an open, connected subset of $\mathbb{R}^{p-1}$. By the regularity condition (C2), we know that the least-squares objective function $G(\beta^{(r)}, \lambda)$ is twice continuously differentiable on $\{\|\beta^{(r)}\| < 1\}$ such that the global minimum of $G(\beta^{(r)}, \lambda)$ can be achieved at some point. By some simple calculations, we have

where $A(\beta(\lambda), \lambda)$ is a positive definite matrix for $\lambda \in \Lambda$ defined in Condition (C6). Then, the Hessian matrix of $G(\beta^{(r)}, \lambda)$ is positive definite for all values of $\beta^{(r)}$ and $\lambda \in \Lambda$. Hence, the estimating equation (2.4) has a unique solution.

Step 3. With the estimated values $\hat\beta_b(\lambda)$ over $b = 1, \ldots, B$, we average them and obtain the final estimate of $\beta$ as

$$\hat\beta(\lambda) = \frac{1}{B} \sum_{b=1}^{B} \hat\beta_b(\lambda).$$

(III) Extrapolation step

For the extrapolant function, we consider the widely used quadratic function $G(\lambda, \Psi) = \psi_1 + \psi_2 \lambda + \psi_3 \lambda^2$ with $\Psi = (\psi_1, \psi_2, \psi_3)^{T}$ (Lin & Carroll 2000, Liang & Ren 2005). We fit a regression model of $\{\hat\beta(\lambda), \lambda \in \Lambda\}$ on $\{\lambda \in \Lambda\}$ based on $G(\lambda, \Gamma)$, and denote by $\hat\Gamma$ the estimated value of $\Gamma$. The SIMEX estimator of $\beta$ is then defined as $\hat\beta_{\mathrm{SIMEX}} = G(-1, \hat\Gamma)$. When $\lambda$ shrinks to 0, the SIMEX estimator reduces to the naive estimator, $\hat\beta_{\mathrm{Naive}} = G(0, \hat\Gamma)$, which neglects the measurement error and directly replaces $X$ by $W$.
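The quadratic extrapolation can be sketched in a few lines of Python. The function name `simex_extrapolate` and the componentwise least-squares fit are assumptions of this example; the input values below are synthetic and serve only to check the mechanics.

```python
import numpy as np

def simex_extrapolate(lambdas, estimates, target=-1.0):
    """Fit G(lambda) = psi1 + psi2*lambda + psi3*lambda^2 by least squares
    (separately for each component) and evaluate the fit at `target`."""
    lambdas = np.asarray(lambdas, dtype=float)
    estimates = np.asarray(estimates, dtype=float)   # shape (M,) or (M, p)
    design = np.column_stack([np.ones_like(lambdas), lambdas, lambdas ** 2])
    coef, *_ = np.linalg.lstsq(design, estimates, rcond=None)
    return np.array([1.0, target, target ** 2]) @ coef

lambdas = np.arange(0.0, 2.01, 0.2)
# Synthetic estimates that follow an exact quadratic trend in lambda.
synthetic = 1.0 + 0.5 * lambdas - 0.1 * lambdas ** 2
beta_simex = simex_extrapolate(lambdas, synthetic, target=-1.0)
beta_naive = simex_extrapolate(lambdas, synthetic, target=0.0)
```

Evaluating the fitted curve at `target=-1.0` corresponds to $G(-1, \hat\Gamma)$, and evaluating it at `target=0.0` corresponds to $G(0, \hat\Gamma)$.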

The SIMEX estimator of the link function, $\hat g_{\mathrm{SIMEX}}(t_0)$, is obtained in the same way. In Step 1 of the estimation step, $\beta$ is replaced by $\hat\beta_{\mathrm{SIMEX}}$ and the estimator $\hat g_b(\lambda; t_0)$ is obtained with the bandwidth $h_2$. Averaging $\hat g_b(\lambda; t_0)$ over $b = 1, \ldots, B$ then gives $\hat g(\lambda; t_0)$. The extrapolation step results in $\hat A$, which minimizes $\sum_{\lambda \in \Lambda} \{\hat g(\lambda; t_0) - G(\lambda; A)\}^2$ with respect to $A$. The SIMEX estimator of $g(t_0)$ is given by

$$\hat g_{\mathrm{SIMEX}}(t_0) = G(-1, \hat A).$$

2.2 Asymptotic properties

To investigate the asymptotic properties of the estimators for the index parameter

and the link function, we first present some regularity conditions.

(C1) The density function, $f(t)$, of $\beta^{T} X$ is bounded away from zero. It also satisfies the Lipschitz condition of order 1 on $\mathcal{T} = \{t = \beta^{T} x : x \in A\}$, where $A$ is the bounded support set of $X$.

(C2) $g(\cdot)$ has a continuous second derivative on $\mathcal{T}$.

(C3) The kernel $K(\cdot)$ is a bounded and symmetric density function with a bounded support satisfying the Lipschitz condition of order 1 and $\int_{-\infty}^{\infty} u^2 K(u)\,du \ne 0$.

(C6) A(β(λ), λ) is a positive definite matrix for λ ∈ Λ, where

A(β(λ), λ) = E

(C7) The extrapolant function is theoretically exact.

Remark 2. Condition (C1) ensures that the density function of $\beta^{T} X$ is positive. Condition (C2) is the standard smoothness condition. Condition (C3) is the common assumption for second-order kernels. Condition (C4) is a necessary condition for deriving the asymptotic normality of the proposed estimator. Condition (C5) specifies some mild conditions for the choice of bandwidth. Finally, Condition (C6) ensures that the asymptotic variance of the estimator $\hat\beta_{\mathrm{SIMEX}}$ exists, and Condition (C7) is the common assumption for the SIMEX method.

To derive the theoretical results, we introduce some new definitions and notations.

For the given Λ = {λ1, . . . , λM}, let βˆ(Λ) be the vector of estimators


(βˆ(λ1), . . . , βˆ(λM)), denoted by vec{βˆ(λ), λ ∈ Λ}. Let also Γ = (

where Γj

is the parameter vector estimated in the extrapolation step for the jth

component of βˆ(λ) for j = 1, . . . , p. We define G(Λ, Γ) = vec{G(λm, Γj ), j =

1, . . . , p, m = 1, . . . , M}, Res(Γ) = βˆ(Λ) − G(Λ, Γ),

Theorem 1. Suppose that the regularity conditions (C1)–(C7) hold. Then, as n →

∞, we have

−→ denotes the convergence in distribution, GΓ(λ, Γ) = {∂/∂(Γ)

T }G(λ, Γ),

Theorem 1 indicates that $\hat\beta_{\mathrm{SIMEX}}$ is a root-n consistent estimator. Its asymptotic distribution is similar to that of the parametric estimator of $\beta$ without measurement error, whereas the asymptotic covariance matrix of the resulting estimator is more complicated.

Let f0(·) be the density function of β

(λ, A)Eq, where Eq is the q × q matrix of all elements being

zero except for the first element being one and q is the dimension of A.


Theorem 2. Suppose that the regularity conditions (C1)–(C7) hold, and assume that $n h_2^5 = O(1)$. Then, as $n \to \infty$ and $B \to \infty$, the SIMEX estimator $\hat g_{\mathrm{SIMEX}}(t_0)$ is asymptotically equivalent to an estimator whose bias and variance are given respectively

Theorem 2 implies that $\hat\beta_{\mathrm{SIMEX}}$ does not affect the asymptotic behavior of $\hat g_{\mathrm{SIMEX}}(t_0)$ because $\hat\beta_{\mathrm{SIMEX}}$ is root-n consistent. As pointed out in Carroll et al. (1999), the variance of $\hat g_{\mathrm{SIMEX}}(t_0)$ is asymptotically the same as if the measurement error were ignored, but multiplied by a factor, $C(\Lambda, A) D C^{T}(\Lambda, A)$, which is independent of the regression function.

3 Numerical studies

3.1 Simulation study

In this section, we evaluate the finite sample performance of the proposed method

via simulation studies. Consider the following model

where $X_i$ is a two-dimensional vector with independent $N(0, 1)$ components, the error $\varepsilon_i$ is generated from $N(0, 0.2^2)$, $Y_i$ is generated according to the model, and $U_i$ is generated from $N(0, \mathrm{diag}(\sigma_u^2, 0))$. We take $\sigma_u = 0.2$, 0.4 and 0.6 to represent different levels of measurement error. In the simulation study, we compare the naive estimates (Naive), which ignore the measurement errors, with the SIMEX estimates based on the quadratic extrapolation function. The sample sizes are $n = 50$, 100 and 150. For each setting, we run 500 replications to assess the performance. For the SIMEX algorithm, we take $\lambda = 0, 0.2, \ldots, 2$ and $B = 50$. We use the Epanechnikov kernel $K(u) = 0.75(1 - u^2)_{+}$. As pointed out in Liang & Wang

(2005), the computation is quite expensive for the SIMEX method. In view of this,

we apply a “rule of thumb” to select the bandwidths, which is the same in spirit as

the selection method in Apanasovich & Carroll (2009). Specifically, the bandwidths


$h$, $h_1$ and $h_2$ are taken to be $c n^{-1/4} (\log n)^{-1/2}$, $c n^{-1/5}$ and $c n^{-1/5}$, respectively, where $c$ is the standard deviation of $\beta_{\mathrm{int}}^{T} W$. To justify the “rule of thumb” (RT), we compare its results with those obtained when the bandwidths are selected by the cross-validation (CV) method. We apply the same bandwidths for each $\lambda$ and $b$ since the CV method is time consuming. The CV statistic is given by

CV(h) = 1

where $\hat g_{[i]}(\cdot)$ and $\hat\beta_{[i]}$ are the SIMEX estimators of $g(\cdot)$ and $\beta$ computed from all of the samples with the $i$th subject deleted. The bandwidth $h_{\mathrm{opt}}$ is obtained by minimizing CV($h$). It can be shown that $h_{\mathrm{opt}} = C n^{-1/5}$ for a constant $C > 0$. Therefore, we use the bandwidths

$$h = h_{\mathrm{opt}}\, n^{-1/20} (\log n)^{-1/2}, \quad h_1 = h_{\mathrm{opt}}, \quad h_2 = h_{\mathrm{opt}}.$$
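The two bandwidth choices described above can be written as small helper functions. This is a sketch only: the function names are hypothetical, and the constants passed in the example are arbitrary illustrative values rather than values from the paper.

```python
import math

def rule_of_thumb_bandwidths(c, n):
    """Rule-of-thumb bandwidths (h, h1, h2), where c is the standard
    deviation of the initial index beta_int^T W and n is the sample size."""
    h = c * n ** (-1 / 4) * math.log(n) ** (-1 / 2)
    h1 = c * n ** (-1 / 5)
    h2 = c * n ** (-1 / 5)
    return h, h1, h2

def cv_based_bandwidths(h_opt, n):
    """Bandwidths (h, h1, h2) derived from a cross-validated h_opt."""
    h = h_opt * n ** (-1 / 20) * math.log(n) ** (-1 / 2)
    return h, h_opt, h_opt

# Illustrative values only.
n, c = 100, 1.3
rt = rule_of_thumb_bandwidths(c, n)
cv = cv_based_bandwidths(c * n ** (-1 / 5), n)
```

Because $n^{-1/5} \cdot n^{-1/20} = n^{-1/4}$, the two rules coincide whenever $h_{\mathrm{opt}} = c\, n^{-1/5}$, which is why `rt` and `cv` agree in this example. In both rules $h$ is of smaller order than $h_1$ and $h_2$, so the smoother used when estimating $\beta$ is undersmoothed relative to the one used for the link function.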

To evaluate the performance of the bandwidth selection by the CV method, we first plot CV($h$) against the bandwidth $h$. The simulation result is shown in Figure 1 with $n = 100$ and $\sigma_u = 0.4$ for one run; other cases are similar. Figure 1 shows the relationship between CV($h$) and $h$ for $h$ in $[0.1, 1]$. From Figure 1, we can see that the CV($h$) function is convex and reaches its minimum when $h$ is around 0.35.

Table 1 summarizes the biases and standard deviations (SD) of the parameter

β obtained by the SIMEX and naive estimators with the two different bandwidth

selections. Table 1 shows that the SIMEX and naive estimates differ little between the two bandwidth selections. Hence, to reduce the computation time,

we use the “rule of thumb” to select the bandwidths in the real data analysis.

Next, we compare the naive estimators and the SIMEX estimators. From Table

1, we can see that the SIMEX estimates of β1 and β2 have smaller biases than the

naive estimates. However, the standard deviations based on the SIMEX estimates

are larger than those based on the naive estimates. We can also see that the bias and SD decrease as n increases, and that the performance of both estimators depends on the level of measurement error.

Figure 1: Plot of CV($h$) versus the bandwidth $h$ with $n = 100$ and $\sigma_u = 0.4$.

The performance of the estimator of the link function $g(t)$ is assessed over the 500 replications. The estimator $\hat g(t)$ is $\hat g(t) = \frac{1}{500} \sum_{m=1}^{500} \hat g_m(t)$. To assess the estimator $\hat g(t)$, we use the root mean squared error (RMSE), which is given by

$$\mathrm{RMSE} = \Big[ \frac{1}{n_{\mathrm{grid}}} \sum_{k=1}^{n_{\mathrm{grid}}} \{\hat g(t_k) - g(t_k)\}^2 \Big]^{1/2},$$

where $n_{\mathrm{grid}}$ is the number of grid points, and $\{t_k, k = 1, 2, \ldots, n_{\mathrm{grid}}\}$ are equidistant grid points. In the simulation study, we take $n_{\mathrm{grid}} = 15$. The estimated link function

and the boxplot for the 500 RMSEs are given in Figure 2. From Figure 2 (a), we

see that the SIMEX estimated curve is closer to the real link function curve than

the naive estimated curve. Figure 2 (b) shows that the RMSEs of the SIMEX and

naive estimators for the link function are not large, but the RMSEs of the SIMEX estimator are slightly larger than those of the naive estimator.
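A grid-based RMSE of this kind can be computed as in the following sketch. The function name `rmse_on_grid`, the grid range, and the two test functions are assumptions made for illustration; the form of the RMSE used here, the square root of the average squared difference over the grid points, is the usual one and is assumed rather than quoted from the paper.

```python
import numpy as np

def rmse_on_grid(g_hat, g_true, grid):
    """Root mean squared error between two curves over the grid points t_k."""
    grid = np.asarray(grid, dtype=float)
    return float(np.sqrt(np.mean((g_hat(grid) - g_true(grid)) ** 2)))

grid = np.linspace(-1.0, 1.0, 15)           # 15 equidistant points, hypothetical range
g_true = np.sin                             # hypothetical true link function
g_hat = lambda t: np.sin(t) + 0.1           # hypothetical estimate with constant offset
value = rmse_on_grid(g_hat, g_true, grid)   # constant offset 0.1 gives RMSE 0.1
```

Computing this quantity once per replication gives the collection of RMSE values that a boxplot such as Figure 2 (b) summarizes.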

Note that the SD and RMSE based on the SIMEX estimators are larger than those based on the naive estimators for the parameter $\beta$ and the link function $g(\cdot)$, respectively. This can be intuitively illustrated with the linear model. Consider the linear model $Y = \beta_0 + \beta_x x + \epsilon$, where $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma_\epsilon^2$. If $x$ is replaced with $W + \sqrt{\lambda}\,\sigma_e e_b$, where $e_b \sim N(0, 1)$ and $W = x + e$ with $e$ having mean 0 and variance $\sigma_e^2$, then $\hat\beta_x(b, \lambda)$ has the asymptotic variance $\sigma_\epsilon^2 / [\sigma_x^2 + (1 + \lambda)\sigma_e^2]$. If $\lambda = -1$, then $\hat\beta_x(b, -1)$ is identical to the true parameter, with the asymptotic variance $\sigma_\epsilon^2 / \sigma_x^2$. If $\lambda = 0$, $\hat\beta_x(b, 0)$ is just the naive estimator, with the asymptotic variance $\sigma_\epsilon^2 / (\sigma_x^2 + \sigma_e^2)$. Since $\sigma_\epsilon^2 / (\sigma_x^2 + \sigma_e^2) < \sigma_\epsilon^2 / \sigma_x^2$ whenever $\sigma_e^2 > 0$, the SD or RMSE of the naive estimators is smaller than that of the SIMEX estimators.
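The linear-model illustration can be reproduced numerically with a small SIMEX experiment. The sketch below is not the authors' code: the sample size, the parameter values, the seed, and the helper names `slope` and `simex_slope` are all assumptions chosen for this example.

```python
import numpy as np

def slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simex_slope(w, y, sigma_e, lambdas, B, rng):
    """Return (SIMEX slope, naive slope) for a simple linear model."""
    means = []
    for lam in lambdas:
        # Simulation step: add extra noise with variance lam * sigma_e^2.
        fits = [slope(w + np.sqrt(lam) * sigma_e * rng.standard_normal(w.size), y)
                for _ in range(B)]
        means.append(np.mean(fits))       # estimation step, averaged over b
    coef = np.polyfit(lambdas, means, deg=2)
    # Extrapolation step: evaluate the fitted quadratic at lambda = -1.
    return float(np.polyval(coef, -1.0)), float(means[0])

rng = np.random.default_rng(1)
n, beta0, beta_x, sigma_e = 5000, 1.0, 2.0, 0.5
x = rng.standard_normal(n)                        # true covariate, sigma_x = 1
y = beta0 + beta_x * x + 0.2 * rng.standard_normal(n)
w = x + sigma_e * rng.standard_normal(n)          # observed surrogate
lambdas = np.arange(0.0, 2.01, 0.2)
simex, naive = simex_slope(w, y, sigma_e, lambdas, B=50, rng=rng)
```

In this setting the naive slope is attenuated toward $\beta_x \sigma_x^2 / (\sigma_x^2 + \sigma_e^2) = 1.6$, while the extrapolated value moves back toward $\beta_x = 2$. The quadratic extrapolant is only an approximation here, so some bias remains, and repeating the experiment over many seeds would show the larger spread of the SIMEX estimate discussed above.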


Table 1: The biases and standard deviations (SD) of the parameters β1 and β2

obtained by the SIMEX and naive estimators.


Figure 2: (a) The real curve (solid curve), the naive estimated curve (dashed curve)

and the SIMEX estimated curve (dotted-dashed curve) for the link function g(t) when

n = 100 and σu = 0.4. (b) The boxplots of the 500 RMSE values for the estimate of

g(t).


3.2 Real data analysis

We now analyze a data set from the Framingham Heart Study to illustrate the

proposed method. The data set contains 5 variables with 1615 males and it has

been used by many authors to illustrate semiparametric partially linear models (see

Liang, H¨ardle & Carroll (1999), Wang, Brown & Cai (2011)). We are interested in

whether the age and the serum cholestoral have an effect to the blood pressure. We

use the proposed model to analyze the Framingham data to compare the SIMEX and

naive estimators. We use the Epanechnikov kernel and the bandwidths h = 0.0589

and h1 = h2 = 0.2309. Let Y be their average blood pressure in a fixed two-year

period, W1 and W2 be the standardized variable for the logarithm of the serum

cholestoral level (log(SC)) and age, respectively. Similar to Liang et al. (1999),

W1 is subject to the measurement error U and σ

2

u

is estimated to be 0.2632 by

two replicates experiments. Figure 3 shows the duplicated serum cholestoral level

measurements from 1615 males. The estimators of β and g(·) based on the SIMEX

and naive methods are reported in Table 2, Figure 4 and Figure 5.
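When two replicate measurements are available, a simple moment estimator of the measurement error variance can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, the data are simulated rather than taken from the Framingham study, and the estimator shown is the standard difference-based one for a single measurement, which may differ in detail from the procedure the authors used.

```python
import numpy as np

def error_variance_from_duplicates(w1, w2):
    """Moment estimator of var(U) from duplicates W_ij = X_i + U_ij, j = 1, 2.

    The difference W_i1 - W_i2 = U_i1 - U_i2 has variance 2 * var(U),
    so half the mean squared difference estimates var(U).
    """
    d = np.asarray(w1, dtype=float) - np.asarray(w2, dtype=float)
    return float(np.mean(d ** 2) / 2.0)

rng = np.random.default_rng(2)
x = rng.standard_normal(20000)                 # simulated true covariate
w1 = x + 0.5 * rng.standard_normal(x.size)     # first replicate
w2 = x + 0.5 * rng.standard_normal(x.size)     # second replicate
sigma_u2_hat = error_variance_from_duplicates(w1, w2)   # close to 0.25
```

If the analysis uses the average of the two replicates as the surrogate, the relevant error variance is half of this value.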

Figure 3: Duplicated serum cholesterol level measurements from 1615 males in the Framingham Heart Study. (Axes: first serum cholesterol level and second serum cholesterol level.)

Table 2: The estimators of the parameters obtained by the SIMEX and naive methods for the Framingham data.

Method    log(SC)    Age
SIMEX     0.5237     0.8502
Naive     0.4194     0.9099


Figure 4: The extrapolated point estimators for the Framingham data. The simulated estimates $\{\hat\beta(\lambda), \lambda\}$ are plotted (dots), and the fitted quadratic function (solid lines) is extrapolated to $\lambda = -1$. The extrapolation results are the SIMEX estimates (squares). (Panels: Cholesterol($\lambda$) versus $\lambda$ and Age($\lambda$) versus $\lambda$; marked points: SIMEX estimate and naive estimate.)

Figure 5: The link function estimators for the Framingham data: the naive estimated curve (solid curve) and the SIMEX estimated curve (dashed curve).


From Table 2, we can see that the SIMEX estimate of the index coefficient of log(SC) is larger than the naive estimate, while the SIMEX estimate of the coefficient of Age is smaller. The results also show that serum cholesterol and age are statistically significant. Figure 4 shows the trace of the extrapolation step of the SIMEX algorithm. The estimates of the two index coefficients for the different $\lambda$ values are plotted. The SIMEX estimates of the index coefficients correspond to $-1$ on the horizontal axis, while the naive estimates correspond to 0. Figure 5 shows the estimates of $g(\cdot)$ obtained by the SIMEX method and the naive method. The patterns of the two curves are similar. Table 2 and Figure 5 show that age and serum cholesterol have a positive association with blood pressure. As expected, when the measurement error is taken into account, we find a somewhat stronger positive association between serum cholesterol and blood pressure. Liang et al. (1999) also analyzed the relationship among blood pressure, age, and the logarithm of the serum cholesterol level using the partially linear errors-in-variables model, in which the logarithm of the serum cholesterol level was the covariate of the parametric part and age was the scalar covariate of the unknown function. When they accounted for the measurement error, the estimate of the parameter was larger than the one obtained by ignoring the measurement error. This implied that blood pressure and serum cholesterol had a stronger positive correlation when the measurement error was considered. Their estimator of the unknown function showed that age was positively associated with blood pressure. Our findings basically agree with those of Liang et al. (1999).

4 Conclusion

We propose the SIMEX estimation of the index parameter and the unknown link

function for single-index models with covariate measurement error. The asymptotic

normality of the estimator of the index parameter and the asymptotic bias and

variance of the estimator of the unknown link function are derived under some

regularity conditions. The proposed index parameter estimator is root-n consistent,

which is similar to that of the estimator of a parameter without measurement error,

but the asymptotic covariance has a complicated form. The asymptotic variance of the estimator of the unknown link function is of order $(n h_2)^{-1}$. Our simulation

studies indicate that the proposed method works well in practice.

The proposed method can be extended to some other models, including partially

linear single-index models with measurement error in nonparametric components


and generalized single-index models with covariate measurement error. We can also extend the method to single-index measurement error models with clustered data by assuming working independence in the estimating equations. Future study is needed to investigate how to take into account the within-cluster correlation to improve the efficiency of the estimator of the index parameter for single-index measurement error models with clustered data.

Appendix

The following notation will be used in the proofs of the lemmas and theorems. Let $\beta_0$ be the true value of $\beta$, and let $B_n = \{\beta : \|\beta\| = 1, \|\beta - \beta_0\| \le c_1 n^{-1/2}\}$ for some positive constant $c_1$. Let $f_\lambda(\cdot)$ be the density function of $\beta^{T} W_b(\lambda)$. Note that if $\lambda = 0$, $f_0(\cdot)$ is the density function of $\beta^{T} W$.

Lemma 1. Let $(\zeta_1, \eta_1), \ldots, (\zeta_n, \eta_n)$ be i.i.d. random vectors, where the $\eta_i$'s are scalar random variables. Assume further that $E|\eta_1|^{s} < \infty$ and $\sup_x \int |y|^{s} f(x, y)\,dy < \infty$, where $f(\cdot, \cdot)$ denotes the joint density of $(\zeta_1, \eta_1)$. Let $K(\cdot)$ be a bounded positive function with a bounded support, satisfying a Lipschitz condition.

Proof: This follows immediately from the result that was obtained by Mack &

Silverman (1982).

Lemma 2. Suppose that conditions (C1)–(C4) hold.

Proof: By the theory of least squares, we have

$\xi_n(\beta, \lambda; t) = (\xi_{n,0}(\beta, \lambda; t), \xi_{n,1}(\beta, \lambda; t))^{T}$

for l = 0, 1, 2. A simple calculation yields, for l = 0, 1, 2, 3,

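$$E[h^{-1} S_{n,l}(\beta, \lambda; t)] = f_\lambda(t)\mu_l + O(h). \qquad (A.2)$$

By Lemma 1, we have

$$h^{-1} S_{n,l}(\beta, \lambda; t) - E[h^{-1} S_{n,l}(\beta, \lambda; t)] = O_p\Big( \Big\{ \frac{\log(1/h)}{nh} \Big\}^{1/2} \Big),$$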

which, combining with (A.2), proves that, for t ∈ T and β ∈ Bn,

where S(λ;t) = fλ(t) ⊗ diag(1, µ2), and ⊗ is the Kronecker product.

which has mean zero and the following asymptotic variance

$$[n h_2 f_0(t_0)]^{-1}\, \mathrm{var}(Y \mid \beta^{T} W = t_0)\, \nu_2. \qquad (A.10)$$

For $\lambda > 0$, using an argument similar to (A8) in Carroll et al. (1999), we have $\mathrm{var}(\hat g(\lambda; t_0)) = O\{(n h_2 B)^{-1}\}$. Then, for $B$ sufficiently large, the variability of $\hat g(\lambda; \cdot)$ is negligible for $\lambda > 0$ compared to $\lambda = 0$. Hence, in what follows, we will ignore this variability by treating $B$ as if it were equal to infinity.

We obtain Aˆ by solving the following equation

Applying the Taylor expansion for the left side of (A.11), we obtain

The left side of (A.12) has approximate mean

and its approximate variance is given by

$[n h_2 f_0(t_0)]^{-1}$

Because $\hat g_{\mathrm{SIMEX}}(t_0) = G(-1, \hat A)$, its asymptotic bias is

and its asymptotic variance is

$$[n h_2 f_0(t_0)]^{-1}\, \nu_2\, \mathrm{var}(Y \mid \beta^{T} W = t_0)\, C(\Lambda, A) D C^{T}(\Lambda, A).$$

This completes the proof.


References

Apanasovich, T. V. & Carroll, R. J. (2009). SIMEX and standard error estimation

in semiparametric measurement error models, Electronic Journal of Statistics

3: 318–348.

Cao, C. Z., Lin, J. G., Shi, J. Q., Wang, W. & Zhang, X. Y. (2015). Multivariate

measurement error models for replicated data under heavy-tailed distributions,

Journal of Chemometrics 29: 457–466.

Carroll, R. J., Fan, J., Gijbels, I. & Wand, M. P. (1997). Generalized partially linear single-index models, Journal of the American Statistical Association 92: 477–489.

Carroll, R. J., Lombard, F., Küchenhoff, H. & Stefanski, L. A. (1996). Asymptotics for the SIMEX estimator in structural measurement error models, Journal of the American Statistical Association 91: 242–250.

Carroll, R. J., Maca, J. & Ruppert, D. (1999). Nonparametric regression in the

presence of measurement error, Biometrika 86: 541–554.

Carroll, R. J., Ruppert, D., Stefanski, L. A. & Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models, 2nd ed. Chapman & Hall, London.

Chang, Z. Q., Xue, L. G. & Zhu, L. X. (2010). On an asymptotically more efficient estimation of the single-index model, Journal of Multivariate Analysis 101: 1898–1901.

Cook, J. & Stefanski, L. A. (1994). Simulation-extrapolation method in parametric

measurement error models, Journal of the American Statistical Association

89: 1314–1328.

Cui, H. J. & Zhu, L. X. (2003). Semiparametric regression model with errors in

variables, Scandinavian Journal of Statistics 30: 429–442.

Cui, X., Härdle, W. & Zhu, L. X. (2011). The EFM approach for single-index

models, Annals of Statistics 39: 1658–1688.

Delaigle, A. & Hall, P. (2008). Using SIMEX for smoothing parameter choice in errors-in-variables problems, Journal of the American Statistical Association 103: 280–287.


Duan, N. & Li, K. C. (1991). Slicing regression: a link free regression method,

Annals of Statistics 19: 505–530.

Härdle, W., Hall, P. & Ichimura, H. (1993). Optimal smoothing in single-index

models, Annals of Statistics 21: 157–178.

Härdle, W. & Tsybakov, A. B. (1993). How sensitive are average derivatives?, Journal

of Econometrics 58: 31–48.

Lai, P., Li, G. R. & Lian, H. (2013). Quadratic inference functions for partially linear

single-index models with longitudinal data, Journal of Multivariate Analysis

118: 115–127.

Li, G. R., Lai, P. & Lian, H. (2015). Variable selection and estimation for partially

linear single-index models with longitudinal data, Statistics and Computing

25: 579–593.

Li, G. R., Peng, H., Dong, K. & Tong, T. J. (2014). Simultaneous confidence bands

and hypothesis testing in single-index models, Statistica Sinica 24: 937–955.

Li, G. R., Zhu, L. X., Xue, L. G. & Feng, S. Y. (2010). Empirical likelihood

inference in partially linear single-index models for longitudinal data, Journal

of Multivariate Analysis 101: 718–732.

Liang, H. (2009). Generalized partially linear mixed-effects models incorporating

mismeasured covariates, Annals of the Institute of Statistical Mathematics

61: 27–46.

Liang, H., Härdle, W. & Carroll, R. J. (1999). Estimation in a semiparametric

partially linear errors-in-variables model, Annals of Statistics 27: 1519–1535.

Liang, H., Liu, X., Li, R. Z. & Tsai, C. L. (2010). Estimation and testing for partially

linear single-index models, Annals of Statistics 38: 3811–3836.

Liang, H. & Ren, H. (2005). Generalized partially linear measurement error models,

Journal of Computational and Graphical Statistics 14: 237–250.

Liang, H. & Wang, N. (2005). Partially linear single-index measurement error models,

Statistica Sinica 15: 99–116.

Lin, X. & Carroll, R. J. (2000). Nonparametric function estimation for clustered data

when the predictor is measured without/with error, Journal of the American

Statistical Association 95: 520–534.

Ma, Y. & Carroll, R. J. (2006). Locally efficient estimators for semiparametric models

with measurement error, Journal of the American Statistical Association

101: 1465–1474.

Ma, Y. Y. & Li, R. Z. (2010). Variable selection in measurement error models,

Bernoulli 16: 274–300.

Ma, Y. & Yin, G. (2011). Censored quantile regression with covariate measurement

errors, Statistica Sinica 21: 949–971.

Mack, Y. P. & Silverman, B. W. (1982). Weak and strong uniform consistency of

kernel regression estimates, Z. Wahrsch. verw. Gebiete 61: 405–415.

Naik, P. & Tsai, C. L. (2000). Partial least squares estimator for single-index models,

Journal of the Royal Statistical Society, Series B 62: 763–771.

Nakamura, T. (1990). Corrected score functions for errors-in-variables models:

methodology and application to generalized linear models, Biometrika 77: 127–137.

Pang, Z. & Xue, L. G. (2012). Estimation for the single-index models with random

effects, Computational Statistics & Data Analysis 56: 1837–1853.

Sinha, S. & Ma, Y. (2014). Semiparametric analysis of linear transformation models

with covariate measurement errors, Biometrics 70: 21–32.

Stefanski, L. & Carroll, R. (1989). Unbiased estimation of a nonlinear function of a

normal mean with application to measurement error models, Communications

in Statistics: Theory and Methods 18: 4335–4358.

Wang, L., Brown, L. D. & Cai, T. T. (2011). A difference based approach to the

semiparametric partial linear model, Electronic Journal of Statistics 5: 619–641.

Wang, X. & Wang, Q. (2015). Semiparametric linear transformation model with

differential measurement error and validation sampling, Journal of Multivariate

Analysis 141: 67–80.

Xue, L. G. & Zhu, L. X. (2006). Empirical likelihood for single-index models, Journal

of Multivariate Analysis 97: 1295–1312.

Yang, S. G., Xue, L. G. & Li, G. R. (2014). Simultaneous confidence bands for single-index

random effects models with longitudinal data, Statistics and Probability

Letters 85: 6–14.

Yang, Y. P., Li, G. R. & Tong, T. J. (2015). Corrected empirical likelihood for a class

of generalized linear measurement error models, Science China Mathematics 58.

Zhang, J., Zhu, L. & Zhu, L. (2014). Surrogate dimension reduction in measurement

error regressions, Statistica Sinica 24: 1341–1363.

Zhang, R. Q., Huang, Z. S. & Lv, Y. Z. (2010). Statistical inference for the index

parameter in single-index models, Journal of Multivariate Analysis 101: 1026–1041.

Zhu, L. X. & Xue, L. G. (2006). Empirical likelihood confidence regions in a partially

linear single-index model, Journal of the Royal Statistical Society, Series B

68: 549–570.
