
Package ‘cat’

February 19, 2015

Version 0.0-6.5

Repository CRAN

Date/Publication 2012-10-30 18:21:53

NeedsCompilation yes

License_restricts_use no

R topics documented:

belt
bipf
crimes
da.cat
dabipf
ecm.cat
em.cat
imp.cat
ipf
logpost.cat
mda.cat
mi.inference
older
prelim.cat
rngseed
Index


belt Data on driver injury and seat belt use

Description

Data on driver injury and seat belt use.

Usage

data(belt)

Format

The data frame belt.frame contains the following columns:

I Injury to driver (I1=Reported by police, I2=Follow up)

B Belt use (B1=Reported by police, B2=Follow up)

D Damage to vehicle (high, low)

S Sex: Male or Female

Freq Count

Note

A matrix belt with similarly named columns exists that can be input directly to functions which do

not admit data frames. Both the data frame and matrix include all complete and incomplete cases,

from the police reports and follow up study.

Source

Schafer (1996) Analysis of Incomplete Multivariate Data. Chapman & Hall, Section 7.4.3, which

cites

Hochberg, Y. (1977) On the use of double sampling schemes in analyzing categorical data with

misclassification errors, JASA, vol. 72, p. 914-921.

bipf Bayesian Iterative Proportional Fitting (BIPF)

Description

Markov-Chain Monte Carlo method for simulating posterior draws of cell probabilities under a

hierarchical loglinear model.

Usage

bipf(table,margins, prior=0.5, start, steps=1, showits=FALSE)


Arguments

table contingency table (array) to be fitted by a log-linear model. All elements must

be non-negative.

margins vector describing the marginal totals to be fitted. A margin is described by the

factors not summed over, and margins are separated by zeros. Thus c(1,2,0,2,3,0,1,3)

would indicate fitting the (1,2), (2,3), and (1,3) margins in a three-way table, i.e.,

the model of no three-way association.

prior optional array of hyperparameters specifying a Dirichlet prior distribution. The

default is the Jeffreys prior (all hyperparameters = .5). If structural zeros appear

in table, a prior should be supplied with hyperparameters set to NA for those

cells.

start starting value for the algorithm. The default is a uniform table. If structural zeros

appear in table, start should contain zeros in those cells and ones elsewhere.

steps number of cycles of Bayesian IPF to be performed.

showits if TRUE, reports the iterations so the user can monitor the progress of the algorithm.
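The zero-separated margins vector can be hard to read at a glance. The following is a minimal sketch of how such a vector maps to one integer vector per margin. The helper split_margins is hypothetical and is not part of the cat package; the list form it returns is the one accepted by the margin argument of base R's loglin.

```r
# Hypothetical helper (not part of the cat package): split a zero-separated
# margins vector into a list with one vector per margin.
split_margins <- function(margins) {
  group <- cumsum(margins == 0)   # group index increases at each zero separator
  keep  <- margins != 0           # drop the separators themselves
  unname(split(margins[keep], group[keep]))
}

m <- c(1, 2, 0, 2, 3, 0, 1, 3)    # the (1,2), (2,3) and (1,3) margins
str(split_margins(m))
```

Printing the result this way is a quick check that a long margins vector describes the intended model before it is passed to a fitting function.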

Value

array like table, but containing simulated cell probabilities that satisfy the loglinear model. If the

algorithm has converged, this will be a draw from the actual posterior distribution of the parameters.

Note

The random number generator seed must be set at least once by the function rngseed before this

function can be used.

The starting value must lie in the interior of the parameter space. Hence, caution should be used

when using a maximum likelihood estimate (e.g., from ipf) as a starting value. Random zeros in

a table may produce MLEs with expected cell counts of zero, and any zero in a starting value is

interpreted by bipf as a structural zero. This difficulty can be overcome by using as a starting value

an estimate calculated by ipf after adding a small positive constant (e.g., 1/2) to each cell.
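As a rough illustration of the advice in this note, the sketch below adds 0.5 to every cell, fits the model of no three-way association, and checks that no fitted cell is zero. It uses base R's loglin as a stand-in for ipf, which is an assumption made only so the sketch runs without the cat package; the two functions may differ in convergence settings.

```r
# Stand-in sketch: loglin() from base R replaces ipf() here (assumption).
tab     <- HairEyeColor
margins <- list(c(1, 2), c(1, 3), c(2, 3))   # no three-way association

# Fit after adding a small positive constant to each cell, as the note suggests.
fit <- loglin(tab + 0.5, margin = margins, fit = TRUE, print = FALSE)$fit

# Every fitted cell is strictly positive, so none would be read as a
# structural zero when the fit is used as a starting value.
all(fit > 0)
```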

References

Schafer (1996) Analysis of Incomplete Multivariate Data. Chapman & Hall, Chapter 8.

See Also

ipf and rngseed.

Examples

data(HairEyeColor) # load data

m=c(1,2,0,1,3,0,2,3) # no three-way interaction

thetahat <- ipf(HairEyeColor,margins=m,

showits=TRUE) # fit model

thetahat <- ipf(HairEyeColor+.5,m) # find an interior starting value

rngseed(1234567) # set random generator seed


theta <- bipf(HairEyeColor,m,

start=thetahat,prior=0.5,

steps=50) # take 50 steps

crimes U.S. National Crime Survey

Description

Victimization status of households on two occasions.

Usage

data(crimes)

Format

The matrix crimes contains the following columns:

V1 Victimization status on first occasion (1=No, 2=Yes)

V2 Victimization status on second occasion (1=No, 2=Yes)

Freq Count

Source

Schafer (1996) Analysis of Incomplete Multivariate Data. Chapman & Hall, Section 7.4.3, which

cites

Kadane, J.B. (1985) Is victimization chronic? A Bayesian Analysis of multinomial missing data,

Journal of Econometrics, vol. 29, p. 47-67.

da.cat Data Augmentation algorithm for incomplete categorical data

Description

Markov-Chain Monte Carlo method for simulating draws from the observed-data posterior distribution

of underlying cell probabilities under a saturated multinomial model. May be used in

conjunction with imp.cat to create proper multiple imputations.

Usage

da.cat(s, start, prior=0.5, steps=1, showits=FALSE)


Arguments

s summary list of an incomplete categorical dataset created by the function prelim.cat.

start starting value of the parameter. This is an array of cell probabilities of dimension

s$d, such as one created by em.cat. If structural zeros appear in the table,

starting values for those cells should be zero.

prior optional array of hyperparameters specifying a Dirichlet prior distribution. The

default is the Jeffreys prior (all hyperparameters = .5). If structural zeros appear in the table,

a prior should be supplied with hyperparameters set to NA for those cells.

steps number of data augmentation steps to be taken. Each step consists of an imputation

or I-step followed by a posterior or P-step.

showits if TRUE, reports the iterations so the user can monitor the progress of the algorithm.

Details

At each step, the missing data are randomly imputed under their predictive distribution given the

observed data and the current value of theta (I-step), and then a new value of theta is drawn

from its Dirichlet posterior distribution given the complete data (P-step). After a suitable number

of steps are taken, the resulting value of the parameter may be regarded as a random draw from its

observed-data posterior distribution.

When the pattern of observed data is close to a monotone pattern, then mda.cat is preferred because

it will tend to converge more quickly.
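The I-step and P-step described above can be sketched in base R for a 2 x 2 table in which some units have only one of the two variables observed. This is an illustrative sketch, not the package's implementation: the counts are hypothetical, the helper names rdirichlet1 and da_step are invented for the example, and set.seed is used instead of rngseed because no cat functions are called.

```r
set.seed(1)

# Hypothetical counts, for illustration only.
full     <- matrix(c(40, 10, 8, 12), 2, 2)  # both variables observed (rows = V1, cols = V2)
row_only <- c(6, 4)                         # V1 observed, V2 missing
col_only <- c(5, 3)                         # V2 observed, V1 missing
prior    <- 0.5                             # Jeffreys hyperparameter in every cell

# One draw from a Dirichlet distribution via normalized gamma variates.
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

da_step <- function(theta) {
  z <- full
  # I-step: allocate partially classified units to cells given the current theta.
  for (i in 1:2) z[i, ] <- z[i, ] + rmultinom(1, row_only[i], theta[i, ])[, 1]
  for (j in 1:2) z[, j] <- z[, j] + rmultinom(1, col_only[j], theta[, j])[, 1]
  # P-step: draw theta from its Dirichlet posterior given the completed table.
  matrix(rdirichlet1(as.vector(z) + prior), 2, 2)
}

theta <- matrix(0.25, 2, 2)                 # uniform starting value
for (k in 1:50) theta <- da_step(theta)
theta
```

Each call to da_step is one data augmentation step; the number of steps needed before a draw can be treated as coming from the observed-data posterior depends on the data and is not addressed by this sketch.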

Value

an array like start containing simulated cell probabilities.

Note

IMPORTANT: The random number generator seed must be set at least once by the function rngseed

before this function can be used.

References

Schafer (1996) Analysis of Incomplete Multivariate Data, Chapman & Hall, Chapter 7.

See Also

prelim.cat, rngseed, mda.cat, imp.cat.

Examples

data(crimes)

x <- crimes[,-3]

counts <- crimes[,3]

s <- prelim.cat(x,counts) # preliminary manipulations

thetahat <- em.cat(s) # find ML estimate under saturated model

rngseed(7817) # set random number generator seed


theta <- da.cat(s,thetahat,steps=50) # take 50 steps from MLE

ximp <- imp.cat(s,theta) # impute once under theta

theta <- da.cat(s,theta,steps=50) # take another 50 steps

ximp <- imp.cat(s,theta) # impute again under new theta

dabipf Data augmentation-Bayesian IPF algorithm for incomplete categorical

data

Description

Markov-Chain Monte Carlo method for simulating draws from the observed-data posterior distribution

of underlying cell probabilities under hierarchical loglinear models. May be used in conjunction

with imp.cat to create proper multiple imputations.

Usage

dabipf(s, margins, start, steps=1, prior=0.5, showits=FALSE)

Arguments

s summary list of an incomplete categorical dataset created by the function prelim.cat.

margins vector describing the marginal totals to be fitted. A margin is described by the

factors not summed over, and margins are separated by zeros. Thus c(1,2,0,2,3,0,1,3)

would indicate fitting the (1,2), (2,3), and (1,3) margins in a three-way table, i.e.,

the model of no three-way association.

start starting value of the parameter. The starting value should lie in the interior of

the parameter space for the given loglinear model. If structural zeros are present,

start should contain zeros in those positions.

steps number of complete cycles of data augmentation-Bayesian IPF to be performed.

prior optional array of hyperparameters specifying a Dirichlet prior distribution. The

default is the Jeffreys prior (all hyperparameters = .5). If structural zeros are

present, a prior should be supplied with hyperparameters set to NA for those

cells.

showits if TRUE, reports the iterations so the user can monitor the progress of the algorithm.
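The conventions for structural zeros in the prior and start arguments can be sketched as follows. The 2 x 3 table and the position of the structural zero are hypothetical, and the uniform starting value over the remaining cells is one reasonable choice rather than a requirement stated above.

```r
# Hypothetical 2 x 3 table in which cell [2, 1] is impossible by design.
dims <- c(2, 3)
structural <- array(FALSE, dim = dims)
structural[2, 1] <- TRUE

prior <- array(0.5, dim = dims)   # Jeffreys hyperparameter in ordinary cells
prior[structural] <- NA           # NA marks the structural zero

start <- array(1, dim = dims)
start[structural] <- 0            # zero in the structural-zero position
start <- start / sum(start)       # uniform probabilities over the other cells

prior
start
```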

Value

array of simulated cell probabilities that satisfy the loglinear model. If the algorithm has converged,

this will be a draw from the actual posterior distribution of the parameters.


Note

The random number generator seed must be set at least once by the function rngseed before this

function can be used.

The starting value must lie in the interior of the parameter space. Hence, caution should be used

when using a maximum likelihood estimate (e.g., from ecm.cat) as a starting value. Random zeros

in a table may produce MLEs with expected cell counts of zero. This difficulty can be overcome by

using as a starting value a posterior mode calculated by ecm.cat with prior hyperparameters greater

than one.

References

Schafer (1996) Analysis of Incomplete Multivariate Data. Chapman & Hall, Chapter 8.

Examples

# Example 1. Based on Schafer, pp. 329 ff. This is a toy version,

# using a much shorter chain than required. To

# generate results comparable with those in the book, run

# the "## Not run:" statement below instead of the one before it.

data(belt)

attach(belt.frame)

EB <- ifelse(B1==B2,1,0)

EI <- ifelse(I1==I2,1,0)

belt.frame <- cbind(belt.frame,EB,EI)

colnames(belt.frame)

a <- xtabs(Freq ~ D + S + B2 + I2 + EB + EI,

data=belt.frame)

m <- list(c(1,2,3,4),c(3,4,5,6),c(1,5),

c(1,6),c(2,6))

b <- loglin(a,margin=m) # fits (DSB2I2)(B2I2EBEI)(DEB)(DEI)(SEI)

# in Schafer's p. 304

a <- xtabs(Freq ~ D + S + B2 + I2 + B1 + I1,

data=belt.frame)

m <- list(c(1,2,5,6),c(1,2,3,4),c(3,4,5,6),

c(1,3,5),c(1,4,6),c(2,4,6))

b <- loglin(a,margin=m) # fits (DSB1I1)(DSB2I2)(B2I2B1I1)(DB1B2)

# (DI1I2)(SI1I2) in Schafer's p. 329

s <- prelim.cat(x=belt[,-7],counts=belt[,7])

m <- c(1,2,5,6,0,1,2,3,4,0,3,4,5,6,0,1,3,5,0,1,4,6,0,2,4,6)

theta <- ecm.cat(s,margins=m, # excruciatingly slow; needs 2558

maxits=5000) # iterations.

rngseed(1234)

# Now ten multiple imputations of the missing variables B2, I2 are

# generated, by running a chain and taking every 2500th observation.

# Prior hyperparameter is set at 0.5 as in Schafer's p. 329

imputations <- vector("list",10)


for (i in 1:10) {

cat("Doing imputation ",i,"\n")

theta <- dabipf(s,m,theta,prior=0.5, # toy chain; for comparison with

steps=25) # results in Schafer's book the next

# statement should be run,

# rather than this one.

## Not run: theta <- dabipf(s,m,theta,prior=0.5,steps=2500)

imputations[[i]] <- imp.cat(s,theta)

}

detach(belt.frame)

# Example 2 (reproduces analysis performed in Schafer's p. 327.)

# Caveat! I try to reproduce what has been done in that page, but although

# the general appearance of the boxplots generated below is quite similar to

# that of Schafer's Fig. 8.4 (p. 327), the VALUES of the log odds do not

# quite fall in line with those reported by said author. It doesn't look like

# the difference can be traced to decimal vs. natural logs. On the other hand,

# Fig. 8.4 refers to log odds, while the text near the end of page 327 gives

# 1.74 and 1.50 as the means of the *odds* (not log odds). FT, 22.7.2003.

data(older) # reading data

x <- older[,1:6] # preliminary manipulations

counts <- older[,7]

s <- prelim.cat(x,counts)

colnames(x) # names of columns

rngseed(1234)

m <- c(1,2,3,4,5,0,1,2,3,5,6,0,4,3) # model (ASPMG)(ASPMD)(GD) in

# Schafer's p. 327

# do analysis with different priors

theta <- ecm.cat(s,m,prior=1.5) # Strong pull to uniform table

# for initial estimates

prob1 <- dabipf(s,m,theta,steps=100, # Burn-in period

prior=0.1)

prob2 <- dabipf(s,m,theta,steps=100, # Id. with second prior

prior=1.5)

lodds <- matrix(0,5000,2) # Where to store log odds ratios.

oddsr <- function(x) { # Odds ratio of 2 x 2 table.

o <- (x[1,1]*x[2,2])/

(x[1,2]*x[2,1])

return(o)

}

for(i in 1:5000) { # Now generate 5000 log odds

prob1 <- dabipf(s,m,prob1, prior=0.1)

t1 <- apply(prob1,c(1,2),sum) # Marginal GD table


# Log odds ratio

lodds[i,1] <- log(oddsr(t1))

prob2 <- dabipf(s,m,prob2, prior=1.5) # Id. with second prior

t2 <- apply(prob2,c(1,2),sum)

lodds[i,2] <- log(oddsr(t2))

}

lodds <- as.data.frame(lodds)

colnames(lodds) <- c("0.1","1.5") # Similar to Schafer's Fig. 8.4.

boxplot(lodds,xlab="Prior hyperparameter")

title(main="Log odds ratio generated with DABIPF (5000 draws)")

summary(lodds)
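The collapsing step used in Example 2 (summing a multiway array of cell probabilities down to a two-way table, then taking the log odds ratio) can be tried in isolation. The sketch below uses a randomly generated 2 x 2 x 3 array as hypothetical input; it does not use the older data or any cat function.

```r
# Hypothetical 2 x 2 x 3 array of cell probabilities (illustration only).
set.seed(42)
p <- array(runif(12), dim = c(2, 2, 3))
p <- p / sum(p)

# The second argument of apply() lists the dimensions that are kept;
# all other dimensions are summed over.
two_way <- apply(p, c(1, 2), sum)

# Log odds ratio of the resulting 2 x 2 table.
log_or <- log((two_way[1, 1] * two_way[2, 2]) /
              (two_way[1, 2] * two_way[2, 1]))
log_or
```

Because the kept dimensions are given by position, the result is only the intended margin if those positions match the column order of the data matrix passed to prelim.cat.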

ecm.cat ECM algorithm for incomplete categorical data

Description

Finds ML estimate or posterior mode of cell probabilities under a hierarchical loglinear model.

Usage

ecm.cat(s, margins, start, prior=1, showits=TRUE, maxits=1000,

eps=0.0001)

Arguments

s summary list of an incomplete categorical dataset produced by the function

prelim.cat.

margins vector describing the sufficient configurations or margins in the desired loglinear

model. A margin is described by the factors not summed over, and margins are

separated by zeros. Thus c(1,2,0,2,3,0,1,3) would indicate the (1,2), (2,3), and

(1,3) margins in a three-way table, i.e., the model of no three-way association.

The integers 1,2,... in the specified margins correspond to the columns of the

original data matrix x that was used to create s.

start optional starting value of the parameter. This is an array with dimensions s$d

whose elements sum to one. The default starting value is a uniform array (equal

probabilities in all cells). If structural zeros appear in the table, start should

contain zeros in those positions and nonzero (e.g. uniform) values elsewhere.

prior optional vector of hyperparameters for a Dirichlet prior distribution. The default

is a uniform prior distribution (all hyperparameters = 1) on the cell probabilities,

which will result in maximum likelihood estimation. If structural zeros appear

in the table, a prior should be supplied with NAs in those cells.

showits if TRUE, reports the iterations of ECM so the user can monitor the progress of

the algorithm.

maxits maximum number of iterations performed. The algorithm will stop if the parameter

still has not converged after this many iterations.


eps convergence criterion. This is the largest proportional change in an expected cell

count from one iteration to the next. Any expected cell count that drops below

1E-07 times the average cell probability (1/number of non-structural zero cells)

is set to zero during the iterations.

Details

At each iteration, performs an E-step followed by a single cycle of iterative proportional fitting.
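To make "a single cycle of iterative proportional fitting" concrete, here is a base R sketch of one IPF cycle on a complete table, iterated until the largest proportional change in a fitted cell count falls below eps. It covers only the IPF part: the E-step that ECM performs on incomplete data is omitted, the function name ipf_cycle is hypothetical, and the code is not taken from the package.

```r
# One IPF cycle: rescale the fit to match each observed margin in turn.
ipf_cycle <- function(fit, target, margins) {
  for (m in margins) {
    observed <- apply(target, m, sum)          # observed marginal totals
    current  <- apply(fit, m, sum)             # fitted marginal totals
    ratio    <- ifelse(current > 0, observed / current, 0)
    fit      <- sweep(fit, m, ratio, "*")      # scale cells within each margin cell
  }
  fit
}

target  <- unclass(HairEyeColor) + 0.5         # complete table, all cells positive
margins <- list(c(1, 2), c(1, 3), c(2, 3))     # no three-way association
fit     <- array(sum(target) / length(target), dim = dim(target))  # uniform start
eps     <- 1e-4

for (iter in 1:1000) {
  new_fit <- ipf_cycle(fit, target, margins)
  change  <- max(abs(new_fit - fit) / fit)     # largest proportional change
  fit     <- new_fit
  if (change < eps) break
}
iter
```

After the final cycle the last margin in the list is reproduced exactly and the others approximately, which is why a convergence criterion on the change between cycles is needed.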

Value

array of dimension s$d containing the ML estimate or posterior mode, assuming that ECM has

converged by maxits iterations.

Note

If zero cell counts occur in the observed-data tables, the maximum likelihood estimate may not be

unique, and the algorithm may converge to different stationary values depending on the starting

value. Also, if zero cell counts occur in the observed-data tables, the ML estimate may lie on the

boundary of the parameter space. Supplying a prior with hyperparameters greater than one will give

a unique posterior mode in the interior of the parameter space. Estimated probabilities for structural

zero cells will always be zero.

References

Schafer (1996), Analysis of Incomplete Multivariate Data. Chapman & Hall, Chapter 8.

X. L. Meng and D. B. Rubin (1991), "IPF for contingency tables with missing data via the ECM

algorithm," Proceedings of the Statistical Computing Section, Amer. Stat. Assoc., 244–247.

See Also

prelim.cat, em.cat, logpost.cat

Examples

data(older) # load data

# Example 1

older[1:2,] # see partial content; older.frame also

# available.

s <- prelim.cat(older[,-7],older[,7]) # preliminary manipulations

m <- c(1,2,5,6,0,3,4) # margins for restricted model

try(thetahat1 <- ecm.cat(s,margins=m))# will complain

thetahat2 <- ecm.cat(s,margins=m,prior=1.1)

# same model with prior information

logpost.cat(s,thetahat2) # loglikelihood under thetahat2

# Example 2 (reproduces analysis performed in Schafer's p. 327.)

m1 <- c(1,2,3,5,6,0,1,2,4,5,6,0,3,4) # model (ASPMG)(ASPMD)(GD) in


# Schafer's p. 327

theta1 <- ecm.cat(s,margins=m1,

prior=1.1) # Prior to bring MLE away from boundary.

m2 <- c(1,2,3,5,6,0,1,2,4,5,6) # model (ASPMG)(ASPMD)

theta2 <- ecm.cat(s,margins=m2,

prior=1.1)

lik1 <- logpost.cat(s,theta1) # posterior log likelihood.

lik2 <- logpost.cat(s,theta2) # id. for restricted model.

lrt <- -2*(lik2-lik1) # for testing significance of (GD)

p <- 1 - pchisq(lrt,1) # significance level

cat("LRT statistic for \n(ASPMG)(ASPMD) vs. (ASPMG)(ASPMD)(GD): ",lrt," with p-value = ",p,"\n")
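The likelihood ratio arithmetic at the end of Example 2 can be checked on its own. The sketch below uses hypothetical log-likelihood values (they are not results from the older data) and one degree of freedom, as in the example. It computes the p-value with lower.tail = FALSE, which is equivalent to 1 - pchisq(...) but avoids loss of precision when the p-value is very small.

```r
# Hypothetical log-likelihood values, for illustration only.
loglik_full       <- -1234.5   # model that includes the extra term
loglik_restricted <- -1237.1   # model without it
df <- 1                        # one additional parameter, as in the example

lrt     <- -2 * (loglik_restricted - loglik_full)
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)

c(lrt = lrt, p_value = p_value)
```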

em.cat EM algorithm for incomplete categorical data

Description

Finds ML estimate or posterior mode of cell probabilities under the saturated multinomial model.

Usage

em.cat(s, start, prior=1, showits=TRUE, maxits=1000,