Likelihood ratio test - dreamjdn - ChinaUnix blog


Reposted from http://blog.sina.com.cn/s/blog_4a45f01301013ndt.html

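The likelihood ratio (LR) is an indicator of the validity of a test. It is a composite measure that reflects both sensitivity and specificity: the probability of obtaining a given screening test result among people who have the disease, divided by the probability of obtaining the same result among people who do not.

This indicator reflects the diagnostic value of a screening test comprehensively and is very stable. Its calculation involves only sensitivity and specificity, so it is not affected by prevalence.

Because a test result may be positive or negative, the likelihood ratio is correspondingly divided into the positive likelihood ratio (+LR) and the negative likelihood ratio (−LR).

The positive likelihood ratio is the ratio of the true positive rate to the false positive rate of the screening result. It states how many times more likely the test is to call a positive correctly than to call one wrongly. The larger the ratio, the greater the probability that a positive result is a true positive.

Put differently, it is the ratio between the test-positive rate in the population confirmed as diseased by the diagnostic gold standard, a/(a+c), and the test-positive rate among subjects in whom the gold standard excluded the diagnosis, i.e. the false positive rate b/(b+d).

Because the true positive rate is the sensitivity and the false positive rate is the complement of the specificity, the positive likelihood ratio can also be written as the ratio of sensitivity to (1 − specificity):

+LR = [a/(a+c)] ÷ [b/(b+d)] = Sen / (1 − Spe)

Sen: sensitivity; Spe: specificity; a: true positives; b: false positives; c: false negatives; d: true negatives

The negative likelihood ratio is the ratio of the false negative rate (1 − Sen) to the true negative rate (Spe) of the screening result, −LR = [c/(a+c)] ÷ [d/(b+d)] = (1 − Sen) / Spe. It states how many times more likely the test is to call a negative wrongly than to call one correctly. The smaller the ratio, the greater the probability that a negative result is a true negative.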
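The two ratios can be computed directly from the four cells of a 2×2 table. The following is a minimal sketch, assuming the cell labels a, b, c, d used in the text; the function name `likelihood_ratios` and the counts in the example call are hypothetical and chosen only for illustration.

```python
def likelihood_ratios(a, b, c, d):
    """Return (+LR, -LR) from a 2x2 diagnostic table.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives.
    """
    sen = a / (a + c)              # sensitivity = true positive rate
    spe = d / (b + d)              # specificity = true negative rate
    lr_pos = sen / (1 - spe)       # +LR = Sen / (1 - Spe)
    lr_neg = (1 - sen) / spe       # -LR = (1 - Sen) / Spe
    return lr_pos, lr_neg


# Hypothetical counts, not taken from the text.
lr_pos, lr_neg = likelihood_ratios(a=90, b=20, c=10, d=80)
print(f"+LR = {lr_pos:.2f}, -LR = {lr_neg:.3f}")
```

Note that the sketch does not guard against a specificity of exactly 1 (or 0), where one of the ratios is undefined.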

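The likelihood ratio test (LRT) is used to assess which of two models fits the data at hand better. Specifically, a relatively complex model is compared with a simpler one to test whether it fits a particular data set significantly better. If it does, the additional parameters of the complex model can be used in subsequent analysis of the data. A precondition for the LRT is that the models being compared are hierarchically nested: relative to the simple model, the complex model merely has one or more additional parameters. Adding parameters to a model never lowers the maximized likelihood, so judging the fit of a model by its likelihood alone is unreliable. The LRT provides an objective criterion for choosing a suitable model. The LRT statistic is

LR = 2 × (ln L1 − ln L2)

where L1 is the maximized likelihood of the complex model and L2 is the maximized likelihood of the simple (null) model. LR approximately follows a chi-squared distribution. To test whether the difference between the two likelihoods is significant, the degrees of freedom must be taken into account: in the LRT they equal the number of parameters added in the complex model. The significance of the difference can then be read off a table of chi-squared critical values.

Further reading: The LRT is explained in more detail by Felsenstein (1981), Huelsenbeck and Crandall (1997), Huelsenbeck and Rannala (1997), and Swofford et al. (1996). While the focus of this page is using the LRT to compare two competing models, under some circumstances one can compare two competing trees estimated using the same likelihood model. There are many additional considerations (e.g., see Kishino and Hasegawa 1989, Shimodaira and Hasegawa 1999, and Swofford et al. 1996).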

Likelihood ratio may refer to:

Likelihood-ratio test, a statistical test for comparing two models.
Likelihood ratios in diagnostic testing, ratios based on sensitivity and specificity, used to assess diagnostic tests.
Bayes factor, a ratio of likelihoods used to update prior probabilities to posterior probabilities in Bayesian inference.

In statistics, a likelihood ratio test is a statistical test used to compare the fit of two models, one of which (the null model) is a special case of the other (the alternative model). The test is based on the likelihood ratio, which expresses how many times more likely the data are under one model than the other. This likelihood ratio, or equivalently its logarithm, can then be used to compute a p-value, or compared to a critical value to decide whether to reject the null model in favour of the alternative model.

Use

Each of the two competing models, the null model and the alternative model, is separately fitted to the data and the log-likelihood recorded. The test statistic (usually denoted D) is twice the difference in these log-likelihoods:

$\begin{align} D &= -2\ln\left( \frac{\text{likelihood for null model}}{\text{likelihood for alternative model}} \right) \\ &= -2\ln(\text{likelihood for null model}) + 2\ln(\text{likelihood for alternative model}) \end{align}$

The model with more parameters will always fit at least as well (have a log-likelihood at least as great). Whether it fits significantly better and should thus be preferred is determined by deriving the probability or p-value of the difference D. In many cases, the probability distribution of the test statistic is approximately a chi-squared distribution with degrees of freedom equal to df2 − df1, if the nested model with fewer parameters is correct. Symbols df1 and df2 represent the number of free parameters of models 1 and 2, the null model and the alternative model, respectively. The test requires nested models, that is, models in which the more complex one can be transformed into the simpler model by imposing a set of constraints on the parameters.

For example: if model 1 has 1 free parameter and a log-likelihood of −8024 and the alternative model has 3 free parameters and a log-likelihood of −8012, then the probability of this difference is that of a chi-squared value of +2·(8024 − 8012) = 24 with 3 − 1 = 2 degrees of freedom. Certain assumptions must be met for the statistic to follow a chi-squared distribution, and often empirical p-values are computed.
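The arithmetic of this example can be checked with a short script. This is a sketch using only the Python standard library; the helper `chi2_sf` is a hypothetical name for a closed-form evaluation of the chi-squared upper tail that is valid for integer degrees of freedom, and `likelihood_ratio_test` is likewise an illustrative wrapper rather than any library's API.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable
    with a positive integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(loglik_null, loglik_alt, df_null, df_alt):
    """Return (D, degrees of freedom, asymptotic p-value)."""
    d = 2.0 * (loglik_alt - loglik_null)
    df = df_alt - df_null
    return d, df, chi2_sf(d, df)


# Values from the example in the text.
d, df, p = likelihood_ratio_test(-8024.0, -8012.0, df_null=1, df_alt=3)
print(f"D = {d}, df = {df}, p = {p:.3g}")
```

With 2 degrees of freedom the upper tail reduces to exp(−D/2), so the p-value here is exp(−12), a little over 6 in a million, and the simpler model would be rejected at any conventional significance level.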

Background

The likelihood ratio, often denoted by Λ (the capital Greek letter lambda), is the ratio of the likelihood function varying the parameters over two different sets in the numerator and denominator. A likelihood-ratio test is a statistical test for making a decision between two hypotheses based on the value of this ratio.

It is central to the Neyman–Pearson approach to statistical hypothesis testing and, like statistical hypothesis testing generally, is both widely used and much criticized; see Criticism, below.

Simple-versus-simple hypotheses

A statistical model is often a parametrized family of probability density functions or probability mass functions f(x | θ). A simple-vs-simple hypothesis test has completely specified models under both the null and alternative hypotheses, which for convenience are written in terms of fixed values of a notional parameter θ:

$\begin{align} H_0 &: \theta=\theta_0, \\ H_1 &: \theta=\theta_1. \end{align}$

Note that under either hypothesis, the distribution of the data is fully specified; there are no unknown parameters to estimate. The likelihood ratio test statistic can be written as:

$\Lambda(x) = \frac{ L(\theta_0|x) }{ L(\theta_1|x) } = \frac{ f(x|\theta_0) }{ f(x|\theta_1) }$

or

$\Lambda(x)=\frac{L(\theta_0\mid x)}{\sup\{\,L(\theta\mid x):\theta\in\{\theta_0,\theta_1\}\}},$

where L(θ | x) is the likelihood function. Note that some references may use the reciprocal as the definition. In the form stated here, the likelihood ratio is small if the alternative model is better than the null model, and the likelihood ratio test provides the decision rule:

If Λ > c, do not reject H₀;
If Λ < c, reject H₀;
Reject with probability q if Λ = c.

The values c and q are usually chosen to obtain a specified significance level α, through the relation $q\cdot P(\Lambda=c \mid H_0) + P(\Lambda < c \mid H_0) = \alpha$. The Neyman–Pearson lemma states that this likelihood ratio test is the most powerful among all level-α tests for this problem.

Definition (likelihood ratio test for composite hypotheses)

A null hypothesis is often stated by saying the parameter θ is in a specified subset Θ₀ of the parameter space Θ.

$\begin{align} H_0 &: \theta \in \Theta_0 \\ H_1 &: \theta \in \Theta_0^{\complement} \end{align}$

The likelihood function is L(θ | x) = f(x | θ) (with f(x | θ) being the pdf or pmf), which is a function of the parameter θ with x held fixed at the value that was actually observed, i.e., the data. The likelihood ratio test statistic is

$\Lambda(x)=\frac{\sup\{\,L(\theta\mid x):\theta\in\Theta_0\,\}}{\sup\{\,L(\theta\mid x):\theta\in\Theta\,\}}.$

Here, the sup notation refers to the supremum function.

A likelihood ratio test is any test with critical region (or rejection region) of the form $\{x \mid \Lambda \le c\}$, where c is any number satisfying $0 \le c \le 1$. Many common test statistics, Pearson's chi-squared test among them, are tests for nested models and can be phrased as log-likelihood ratios or approximations thereof.

Interpretation

Being a function of the data x, the LR is therefore a statistic. The likelihood-ratio test rejects the null hypothesis if the value of this statistic is too small. How small is too small depends on the significance level of the test, i.e., on what probability of Type I error is considered tolerable ("Type I" errors consist of the rejection of a null hypothesis that is true).

The numerator corresponds to the maximum probability of an observed outcome under the null hypothesis. The denominator corresponds to the maximum probability of an observed outcome varying parameters over the whole parameter space. The numerator of this ratio is no greater than the denominator, so the likelihood ratio lies between 0 and 1. Lower values of the likelihood ratio mean that the observed result was much less likely to occur under the null hypothesis than under the alternative. Higher values of the statistic mean that the observed outcome was nearly as likely or equally likely to occur under the null hypothesis as under the alternative, and the null hypothesis cannot be rejected.

Distribution: Wilks' theorem

If the distribution of the likelihood ratio corresponding to a particular null and alternative hypothesis can be explicitly determined, then it can be used directly to form decision regions (to accept or reject the null hypothesis). In most cases, however, the exact distribution of the likelihood ratio corresponding to specific hypotheses is very difficult to determine. A convenient result, due to Samuel S. Wilks, says that as the sample size n approaches ∞, the test statistic −2 log(Λ) for a nested model will be asymptotically χ²-distributed with degrees of freedom equal to the difference in dimensionality of Θ and Θ₀. This means that for a great variety of hypotheses, a practitioner can compute the likelihood ratio Λ for the data and compare −2 log(Λ) to the chi-squared value corresponding to a desired statistical significance as an approximate statistical test.

Examples

Coin tossing

As an example, in the case of Pearson's test, we might try to compare two coins to determine whether they have the same probability of coming up heads. Our observation can be put into a contingency table with rows corresponding to the coin and columns corresponding to heads or tails. The elements of the contingency table will be the number of times the coin for that row came up heads or tails. The contents of this table are our observation X.

	Heads	Tails
Coin 1	k_1H	k_1T
Coin 2	k_2H	k_2T

Here Θ consists of the parameters p_1H, p_1T, p_2H, and p_2T, which are the probabilities that coin 1 (2) comes up heads (tails). The hypothesis space H is defined by the usual constraints on a distribution, $0 \le p_{ij} \le 1$, and p_iH + p_iT = 1. The null hypothesis H₀ is the sub-space where p_1j = p_2j. In all of these constraints, i = 1,2 and j = H,T.

Writing n_ij for the best values for p_ij under the hypothesis H, maximum likelihood is achieved with

$n_{ij} = \frac{k_{ij}}{k_{iH}+k_{iT}}.$

Writing m_ij for the best values for p_ij under the null hypothesis H₀, maximum likelihood is achieved with

$m_{ij} = \frac{k_{1j}+k_{2j}}{k_{1H}+k_{2H}+k_{1T}+k_{2T}},$

which does not depend on the coin i.

The hypothesis and null hypothesis can be rewritten slightly so that they satisfy the constraints for the logarithm of the likelihood ratio to have the desired nice distribution. Since the constraint causes the two-dimensional H to be reduced to the one-dimensional H₀, the asymptotic distribution for the test will be χ²(1), the χ² distribution with one degree of freedom.

For the general contingency table, we can write the log-likelihood ratio statistic as

$-2 \log \Lambda = 2\sum_{i, j} k_{ij} \log \frac{n_{ij}}{m_{ij}}.$
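The statistic above is straightforward to evaluate numerically. The sketch below uses the same n_ij and m_ij as the text; the function name `g_statistic` and the head/tail counts are hypothetical, and the p-value relies on the χ²(1) approximation, whose upper tail equals erfc(√(x/2)).

```python
import math


def g_statistic(table):
    """-2 log Lambda for a contingency table given as rows of counts,
    testing whether every row shares the same column probabilities."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    g = 0.0
    for i, row in enumerate(table):
        for j, k in enumerate(row):
            if k > 0:                      # a zero count contributes nothing
                n_ij = k / row_tot[i]      # MLE under the full hypothesis H
                m_ij = col_tot[j] / n      # MLE under the null hypothesis H0
                g += 2.0 * k * math.log(n_ij / m_ij)
    return g


# Hypothetical counts: coin 1 gave 60 heads / 40 tails, coin 2 gave 45 / 55.
table = [[60, 40], [45, 55]]
g = g_statistic(table)
p = math.erfc(math.sqrt(g / 2.0))          # chi-squared upper tail, 1 d.o.f.
print(f"-2 log Lambda = {g:.3f}, approximate p-value = {p:.4f}")
```

For two coins the table is 2×2, so one degree of freedom applies, as derived above; larger tables would need the general chi-squared tail with (rows − 1)(columns − 1) degrees of freedom.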

Criticism

Bayesian criticisms of classical likelihood ratio tests focus on two issues:

the supremum function in the calculation of the likelihood ratio, saying that this takes no account of the uncertainty about θ and that using maximum likelihood estimates in this way can promote complicated alternative hypotheses with an excessive number of free parameters;
testing the probability that the sample would produce a result as extreme or more extreme under the null hypothesis, saying that this bases the test on the probability of extreme events that did not happen.

Instead they put forward methods such as Bayes factors, which explicitly take uncertainty about the parameters into account, and which are based on the evidence that did occur.

A frequentist reply to this critique is that likelihood ratio tests provide a practicable approach to statistical inference: they can easily be computed, by contrast to Bayesian posterior probabilities, which are more computationally intensive. The Bayesian reply to the latter is that computers obviate any such advantage.

References

Mood, A. M.; Graybill, F. A.; Boes, D. C. Introduction to the Theory of Statistics (page 410)
Cox, D. R. and Hinkley, D. V. Theoretical Statistics, Chapman and Hall, 1974. (page 92)
Casella, George and Berger, Roger L. Statistical Inference, Second edition (page 375)
Wilks, S. S. (1938). "The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses". The Annals of Mathematical Statistics 9: 60–62.

Likelihood ratios in diagnostic testing

In evidence-based medicine, likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists.

Calculation

Two versions of the likelihood ratio exist, one for positive and one for negative test results. Respectively, they are known as the likelihood ratio positive (LR+) and likelihood ratio negative (LR−).

The likelihood ratio positive is calculated as

$LR+ = \frac{\text{sensitivity}}{1 - \text{specificity}}$

which is equivalent to

$LR+ = \frac{\Pr({T+}|D+)}{\Pr({T+}|D-)}$

or "the probability of a person who has the disease testing positive divided by the probability of a person who does not have the disease testing positive." Here "T+" or "T−" denote that the result of the test is positive or negative, respectively. Likewise, "D+" or "D−" denote that the disease is present or absent, respectively. So "true positives" are those that test positive (T+) and have the disease (D+), and "false positives" are those that test positive (T+) but do not have the disease (D−).

The likelihood ratio negative is calculated as

$LR- = \frac{1 - \text{sensitivity}}{\text{specificity}}$

which is equivalent to

$LR- = \frac{\Pr({T-}|D+)}{\Pr({T-}|D-)}$

or "the probability of a person who has the disease testing negative divided by the probability of a person who does not have the disease testing negative." These formulae and formulae for approximate confidence intervals can be found in Altman et al.

The pre-test odds of a particular diagnosis, multiplied by the likelihood ratio, determine the post-test odds. This calculation is based on Bayes' theorem. (Note that odds can be calculated from, and then converted to, probability.)

Application to medicine

A likelihood ratio greater than 1 indicates that the test result is associated with the disease. A likelihood ratio less than 1 indicates that the result is associated with absence of the disease. Tests whose likelihood ratios lie close to 1 have little practical significance, as the post-test probability (odds) is little different from the pre-test probability. Likelihood ratios are accordingly used primarily for diagnostic purposes, and not screening purposes. When the positive likelihood ratio is greater than 5 or the negative likelihood ratio is less than 0.2 (i.e. 1/5), they can be applied to the pre-test probability of a patient having the disease tested for, to estimate a post-test probability of the disease state existing. A positive result for a test with an LR of 8 adds approximately 40% to the pre-test probability that a patient has a specific diagnosis. In summary, the pre-test probability refers to the chance that an individual has a disorder or condition prior to the use of a diagnostic test. It allows the clinician to better interpret the results of the diagnostic test and helps to predict the likelihood of a true positive (T+) result.

Research suggests, however, that physicians rarely make these calculations in practice, and when they do, they often make errors. A randomized controlled trial that compared how well physicians interpreted diagnostic tests presented as either sensitivity and specificity, a likelihood ratio, or an inexact graphic of the likelihood ratio found no difference between the three modes in the interpretation of test results.

Example

A medical example is the likelihood that a given test result would be expected in a patient with a certain disorder compared to the likelihood that the same result would occur in a patient without the target disorder.

Some sources distinguish between LR+ and LR−. A worked example is shown below.

Relationships among terms

		Condition (as determined by the gold standard)
		Positive	Negative
Test outcome	Positive	True Positive	False Positive (Type I error)	→ Positive predictive value = $\tfrac{\Sigma\text{ True Positive}}{\Sigma\text{ Test outcome Positive}}$
Test outcome	Negative	False Negative (Type II error)	True Negative	→ Negative predictive value = $\tfrac{\Sigma\text{ True Negative}}{\Sigma\text{ Test outcome Negative}}$
		↓ Sensitivity = $\tfrac{\Sigma\text{ True Positive}}{\Sigma\text{ Condition Positive}}$	↓ Specificity = $\tfrac{\Sigma\text{ True Negative}}{\Sigma\text{ Condition Negative}}$

A worked example

The fecal occult blood (FOB) screen test was used in 2030 people to look for bowel cancer:

		Patients with bowel cancer (as confirmed on endoscopy)
		Positive	Negative
Fecal occult blood screen test outcome	Positive	True Positive (TP) = 20	False Positive (FP) = 180	→ Positive predictive value = TP / (TP + FP) = 20 / (20 + 180) = 20 / 200 = 10%
Fecal occult blood screen test outcome	Negative	False Negative (FN) = 10	True Negative (TN) = 1820	→ Negative predictive value = TN / (FN + TN) = 1820 / (10 + 1820) = 1820 / 1830 ≈ 99.5%
		↓ Sensitivity = TP / (TP + FN) = 20 / (20 + 10) = 20 / 30 ≈ 66.67%	↓ Specificity = TN / (FP + TN) = 1820 / (180 + 1820) = 1820 / 2000 = 91%

Related calculations

False positive rate (α) = FP / (FP + TN) = 180 / (180 + 1820) = 9% = 1 − specificity
False negative rate (β) = FN / (TP + FN) = 10 / (20 + 10) = 33% = 1 − sensitivity
Power = sensitivity = 1 − β
Likelihood ratio positive = sensitivity / (1 − specificity) = 66.67% / (1 − 91%) = 7.4
Likelihood ratio negative = (1 − sensitivity) / specificity = (1 − 66.67%) / 91% = 0.37

Hence, with large numbers of false positives and few false negatives, a positive FOB screen test is in itself poor at confirming cancer (PPV = 10%) and further investigations must be undertaken; it did, however, correctly identify 66.7% of all cancers (the sensitivity). As a screening test, on the other hand, a negative result is very good at reassuring that a patient does not have cancer (NPV = 99.5%), and at this initial screen it correctly identifies 91% of those who do not have cancer (the specificity).

Estimation of pre- and post-test probability

The likelihood ratio of a test provides a way to estimate the pre- and post-test probabilities of having a condition.

With the pre-test probability and the likelihood ratio given, the post-test probability can be calculated in the following three steps:

Pretest odds = Pretest probability / (1 − Pretest probability)
Posttest odds = Pretest odds × Likelihood ratio
Posttest probability = Posttest odds / (Posttest odds + 1)

In the second step, the positive post-test probability is calculated using the likelihood ratio positive, and the negative post-test probability is calculated using the likelihood ratio negative.

In fact, the post-test probability estimated from the likelihood ratio and the pre-test probability is generally more accurate than one estimated from the positive predictive value of the test whenever the tested individual has a pre-test probability that differs from the prevalence of the condition in the population.

Example

Taking the medical example from above (20 true positives and 10 false negatives among 2030 people), the positive post-test probability is calculated as:

Pretest probability = (20 + 10) / 2030 = 0.0148
Pretest odds = 0.0148 / (1 − 0.0148) = 0.015
Posttest odds = 0.015 × 7.4 = 0.111
Posttest probability = 0.111 / (0.111 + 1) = 0.1 or 10%

As demonstrated, the positive post-test probability is numerically equal to the positive predictive value; likewise, the negative post-test probability (the probability of disease after a negative result) is numerically equal to 1 − negative predictive value.
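The three-step conversion can be sketched in a few lines of Python. The counts are those of the FOB worked example; the function name `post_test_probability` and the variable names are illustrative assumptions, not part of any library.

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (posttest_odds + 1.0)


# Counts from the fecal occult blood example.
tp, fp, fn, tn = 20, 180, 10, 1820

prevalence = (tp + fn) / (tp + fp + fn + tn)      # used as pre-test probability
sensitivity = tp / (tp + fn)
specificity = tn / (fp + tn)
lr_pos = sensitivity / (1.0 - specificity)
lr_neg = (1.0 - sensitivity) / specificity

p_pos = post_test_probability(prevalence, lr_pos)  # disease given a positive test
p_neg = post_test_probability(prevalence, lr_neg)  # disease given a negative test
print(f"after a positive test: {p_pos:.4f}")
print(f"after a negative test: {p_neg:.4f}")
```

Because the pre-test probability here is the sample prevalence, the first result coincides with TP / (TP + FP) and the second with FN / (FN + TN); with a different pre-test probability the two routes would no longer agree.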

References

Gardner, M.; Altman, Douglas G. (2000). Statistics with confidence: confidence intervals and statistical guidelines. London: BMJ Books.
Beardsell A, Bell S, Robinson S, Rumbold H. MCEM Part A: MCQs, Royal Society of Medicine Press 2009
McGee S (August 2002). "Simplifying likelihood ratios". J Gen Intern Med 17 (8): 646–9.
Harrell F, Califf R, Pryor D, Lee K, Rosati R (1982). "Evaluating the Yield of Medical Tests". JAMA 247 (18): 2543–2546.
Reid MC, Lane DA, Feinstein AR (1998). "Academic calculations versus clinical judgments: practicing physicians' use of quantitative measures of test accuracy". Am. J. Med. 104 (4): 374–80.
Steurer J, Fischer JE, Bachmann LM, Koller M, ter Riet G (2002). "Communicating accuracy of tests to general practitioners: a controlled study". BMJ 324 (7341): 824–6.
Puhan MA, Steurer J, Bachmann LM, ter Riet G (2005). "A randomized trial of ways to describe test accuracy: the effect on physicians' post-test probability estimates". Ann. Intern. Med. 143 (3): 184–9.
.. Retrieved 2009-04-04.
, from CEBM (Centre for Evidence-Based Medicine). Page last edited: 01 February 2009

Bayes factor

In statistics, the use of Bayes factors is a Bayesian alternative to classical hypothesis testing. Bayesian model comparison is a method of model selection based on Bayes factors.

Definition

The posterior probability of a model M, given data D, Pr(M | D), is given by Bayes' theorem:

$\Pr(M|D) = \frac{\Pr(D|M)\Pr(M)}{\Pr(D)}.$

The key data-dependent term Pr(D | M) is a likelihood, and is sometimes called the evidence for the model or hypothesis M; evaluating it correctly is the key to Bayesian model comparison. The evidence is usually the normalizing constant or partition function of another inference, namely the inference of the parameters of model M given the data D.

Given a problem in which we have to choose between two models on the basis of observed data D, the plausibility of the two different models M₁ and M₂, parametrised by model parameter vectors θ₁ and θ₂, is assessed by the Bayes factor K given by

$K = \frac{\Pr(D|M_1)}{\Pr(D|M_2)} = \frac{\int \Pr(\theta_1|M_1)\Pr(D|\theta_1,M_1)\,d\theta_1}{\int \Pr(\theta_2|M_2)\Pr(D|\theta_2,M_2)\,d\theta_2},$

where Pr(D | M_i) is called the marginal likelihood for model i.

If, instead of the Bayes factor integral, the likelihood corresponding to the maximum likelihood estimate of the parameter for each model is used, then the test becomes a classical likelihood-ratio test. Unlike a likelihood-ratio test, this Bayesian model comparison does not depend on any single set of parameters, as it integrates over all parameters in each model (with respect to the respective priors). An advantage of the use of Bayes factors is that it automatically, and quite naturally, includes a penalty for including too much model structure. It thus guards against overfitting. For models where an explicit version of the likelihood is not available or is too costly to evaluate numerically, approximate Bayesian computation can be used for model selection in a Bayesian framework.

Other approaches are:

to treat model comparison as a decision problem, computing the expected value or cost of each model choice;
to use minimum message length (MML).

Interpretation

A value of K > 1 means that M₁ is more strongly supported by the data under consideration than M₂. Note that classical hypothesis testing gives one hypothesis (or model) preferred status (the 'null hypothesis'), and only considers evidence against it. Harold Jeffreys gave a scale for interpretation of K:

K	dB	bits	Strength of evidence
< 1:1	< 0		Negative (supports M₂)
1:1 to 3:1	0 to 5	0 to 1.6	Barely worth mentioning
3:1 to 10:1	5 to 10	1.6 to 3.3	Substantial
10:1 to 30:1	10 to 15	3.3 to 5.0	Strong
30:1 to 100:1	15 to 20	5.0 to 6.6	Very strong
>100:1	>20	>6.6	Decisive

The second column gives the corresponding weights of evidence in decibans (tenths of a power of 10); bits are added in the third column for clarity. According to I. J. Good, a change in a weight of evidence of 1 deciban or 1/3 of a bit (i.e. a change in an odds ratio from evens to about 5:4) is about as finely as humans can reasonably perceive their degree of belief in a hypothesis in everyday use.

The use of Bayes factors or classical hypothesis testing takes place in the context of inference rather than decision-making under uncertainty. That is, we merely wish to find out which hypothesis is true, rather than actually making a decision on the basis of this information. Frequentist statistics draws a strong distinction between these two because classical hypothesis tests are not coherent in the Bayesian sense. Bayesian procedures, including Bayes factors, are coherent, so there is no need to draw such a distinction. Inference is then simply regarded as a special case of decision-making under uncertainty in which the resulting action is to report a value. For decision-making, Bayesian statisticians might use a Bayes factor combined with a prior distribution and a loss function associated with making the wrong choice. In an inference context the loss function would take the form of a scoring rule. Use of a logarithmic score function, for example, leads to the expected utility taking the form of the Kullback–Leibler divergence.

Example

Suppose we have a random variable which produces either a success or a failure. We want to compare a model M₁ where the probability of success is q = ½, and another model M₂ where q is completely unknown and we take a prior distribution for q which is uniform on [0,1]. We take a sample of 200, and find 115 successes and 85 failures. The likelihood can be calculated according to the binomial distribution:

${200 \choose 115}q^{115}(1-q)^{85}.$

So we have

$P(X=115|M_1)={200 \choose 115}\left({1 \over 2}\right)^{200}=0.005956\ldots,$

but

$P(X=115|M_2)=\int_{0}^1{200 \choose 115}q^{115}(1-q)^{85}\,dq = {1 \over 201} = 0.004975\ldots.$

The ratio is then 1.197..., which is "barely worth mentioning" even if it points very slightly towards M₁.

This is not the same as a classical likelihood ratio test, which would have found the maximum likelihood estimate for q, namely 115/200 = 0.575, and used that to get a ratio of 0.1045... (rather than averaging over all possible q), and so pointing towards M₂. Alternatively, the "exchange rate" of two units of likelihood per degree of freedom suggests that M₂ is preferable (just) to M₁, as $0.1045\ldots = e^{-2.25\ldots}$ and 2.25 > 2: the extra likelihood compensates for the unknown parameter in M₂.

A frequentist hypothesis test of M₁ (here considered as a null hypothesis) would have produced a more dramatic result, saying that M₁ could be rejected at the 5% significance level, since the probability of getting 115 or more successes from a sample of 200 if q = ½ is 0.0200..., and the two-tailed probability of getting a figure as extreme as or more extreme than 115 is 0.0400... Note that 115 is more than two standard deviations away from 100.
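The figures quoted in this example can be reproduced with exact integer arithmetic. The sketch below assumes Python 3.8 or later (for `math.comb`); the variable names are illustrative only.

```python
import math
from fractions import Fraction

n, k = 200, 115                      # sample size and number of successes

# Marginal likelihood under M1 (q fixed at 1/2).
p_m1 = Fraction(math.comb(n, k), 2 ** n)

# Marginal likelihood under M2: the binomial likelihood integrated against
# a uniform prior on q equals 1 / (n + 1) for every k.
p_m2 = Fraction(1, n + 1)

bayes_factor = p_m1 / p_m2           # K = Pr(D | M1) / Pr(D | M2)

# Classical likelihood ratio: q = 1/2 against the maximum likelihood estimate.
q_hat = k / n
log_lr = n * math.log(0.5) - (k * math.log(q_hat) + (n - k) * math.log(1 - q_hat))
max_lr = math.exp(log_lr)

# One-tailed frequentist p-value: P(X >= 115 | q = 1/2).
tail = Fraction(sum(math.comb(n, i) for i in range(k, n + 1)), 2 ** n)

print(f"Pr(D|M1) = {float(p_m1):.6f}")
print(f"Pr(D|M2) = {float(p_m2):.6f}")
print(f"Bayes factor K = {float(bayes_factor):.3f}")
print(f"maximum likelihood ratio = {max_lr:.4f}")
print(f"one-tailed p-value = {float(tail):.4f}")
```

The contrast between the Bayes factor (slightly above 1) and the maximized likelihood ratio (well below 1) comes entirely from averaging the likelihood over q under M₂ rather than evaluating it at its best value.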

M₂ is a more complex model than M₁ because it has a free parameter which allows it to model the data more closely. The ability of Bayes factors to take this into account is a reason why Bayesian inference has been put forward as a theoretical justification for and generalisation of Occam's razor, reducing Type I errors.


References

Goodman S (1999). "Toward evidence-based medical statistics. 1: The P value fallacy". Ann Intern Med 130 (12): 995–1004.
Goodman S (1999). "Toward evidence-based medical statistics. 2: The Bayes factor". Ann Intern Med 130 (12): 1005–13.
Robert E. Kass and Adrian E. Raftery (1995) "Bayes Factors", Journal of the American Statistical Association, Vol. 90, No. 430, p. 791.
Toni, T.; Stumpf, M.P.H. (2009). "Simulation-based model selection for dynamical systems in systems and population biology". Bioinformatics 26 (1): 104.
H. Jeffreys, The Theory of Probability (3e), Oxford (1961); p. 432
Good, I. J. (1979). "Studies in the History of Probability and Statistics. XXXVII A. M. Turing's statistical work in World War II". Biometrika 66 (2): 393–396.

Gelman, A., Carlin, J., Stern, H. and Rubin, D. Bayesian Data Analysis. Chapman and Hall/CRC. (1995)
Bernardo, J., and Smith, A.F.M., Bayesian Theory. John Wiley. (1994)
Lee, P.M. Bayesian Statistics. Arnold. (1989)
Denison, D.G.T., Holmes, C.C., Mallick, B.K., Smith, A.F.M., Bayesian Methods for Nonlinear Classification and Regression. John Wiley. (2002)
Richard O. Duda, Peter E. Hart, David G. Stork (2000) Pattern classification (2nd edition), Section 9.6.5, p. 487-489, Wiley
Chapter 24 in Probability Theory: The Logic of Science by E. T. Jaynes, 1994.
MacKay, David J. C. (2003) Information theory, inference and learning algorithms, CUP
Winkler, Robert, Introduction to Bayesian Inference and Decision, 2nd Edition (2003), Probabilistic.

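Signal detection and estimation

In noisy communication and control systems, the receiving end applies statistical inference theory to the received, interference-corrupted signal in order to decide whether a signal is present and to estimate its parameters. Deciding, according to some criterion, whether a signal is present in the corrupted received signal, using information such as the signal probabilities and the noise power, is called signal detection. Using the corrupted received sequence to estimate certain parameters of the transmitted signal (such as amplitude, frequency, phase, time delay and waveform) as accurately as possible is called signal estimation or parameter estimation.

Historical overview. In 1942 N. Wiener proposed the optimal smoothing filter and the prediction filter and established Wiener filtering theory, which laid the theoretical foundation for signal detection and estimation. Wiener's results were not formally published until 1949. Because the filter requires very large amounts of storage and computation, real-time processing is difficult, and this limited its application and development. In 1943 D. O. North proposed the theory of the matched filter, which maximizes the output signal-to-noise ratio. In 1946 the Soviet scholar V. A. Kotelnikov proposed the theory of potential noise immunity, which constructs an ideal receiver under the maximum a posteriori probability criterion and then compares practical receivers with it to find the remaining room for improvement. In 1950 P. M. Woodward applied the concept of information content to radar signal detection and proposed a series of new concepts for synthesizing optimal radar systems.

In 1953 D. Middleton, W. W. Peterson and others applied methods of mathematical statistics, such as statistical hypothesis testing and statistical inference theory, to signal detection and established statistical detection theory. In 1960–1961 R. E. Kalman and R. S. Bucy proposed the recursive filter, introducing the state-variable method into filtering theory. Past measurements need not be stored: once new data have been measured, new estimates of the quantities are computed by a recursion formula from the new data and the estimates at the previous instant, with the help of the system's own state transition equation. This greatly reduces the storage and computation required by the filter, removes the restriction to stationary random processes, and makes real-time processing convenient. Since 1965 adaptive filters have been widely used in signal estimation, with good results in digital communication, speech processing and the cancellation of periodic interference.

Signal detection. Because many practical communication and control problems are binary in nature, so that the received signal can be classified as 1 or 0, signal detection is mainly the problem of choosing one of two hypotheses on the basis of the received signal. To form an optimal inference procedure, each correct or incorrect inference is assumed to represent a gain or loss to the observer at the receiving end, called the loss function and written C_ij. Here j is the actual value of the signal, i is the inferred value, and C_ij is the gain or loss when the actual value is j and the inferred value is i. Commonly used signal detection methods include parametric detection, nonparametric detection, robust detection and adaptive detection.

(1) Parametric detection: the optimal detector is designed from the probability distribution of the noise and some statistical inference criterion. Figure 1 shows the schematic of the optimal detector under white noise. In the figure, h_i(t) is the impulse response of the filter matched to the signal s_i and ξ_i is the bias of the matched filter, which depends on the statistical properties of the noise, the signal energy, the prior probabilities P_i of the transmitted signals and the loss function C_ij. Different inference criteria lead only to slightly different biases. The most commonly used criteria are the Bayes criterion, the maximum inverse probability criterion, the maximum likelihood ratio criterion, the minimax criterion and the Neyman–Pearson criterion.
(2) Nonparametric detection: used when the statistical properties of the noise are essentially unknown; it requires only the general condition that the noise distribution function be continuous. It is a relatively conservative detection method.
(3) Robust detection: a compromise between the two methods above. It assumes that the noise distribution follows some statistical law, without making the constraints of the model too strict.
(4) Adaptive detection: if the noise is time-varying and non-white, the impulse response h_i(t) and the bias ξ_i must be adjusted adaptively. An adaptive optimal detector can cope with time-varying noise of differing probability distributions, but its structure is complex and it is best implemented with large-scale integrated circuits.

Signal estimation. In communication and control it is often necessary to use a corrupted sequence of the transmitted signal to estimate certain of its parameters (such as amplitude, phase, frequency, time delay, or even the waveform) as accurately as possible. The signal estimation problem is mainly that of finding the optimal estimator, i.e. designing a filter that processes the observed data and produces an optimal estimate. The desired output of the filter is the estimate of the signal; it may be the signal itself, or a delayed or advanced version of it, which gives the filtering, smoothing and prediction problems. Signal estimation is usually divided into two classes, conditional and unconditional. An unconditional estimator does not need knowledge of the prior probabilities of the transmitted signal, i.e. the prior probability density is taken to be uniform. A conditional estimator uses knowledge of the probability density of the transmitted signal. The most common criterion for evaluating a signal estimate is minimum mean-square error. Figure 2 shows the waveforms of the signal estimate ŝ_i(t) and of the optimal filter output x₀(t). If the mean-square error is minimal, the output waveform of the optimal filter is the closest to the waveform of the signal estimate. Commonly used estimation methods include the Wiener filter, the Kalman–Bucy filter, adaptive filters, correlation estimation and unbiased estimation.

(1) Wiener filter: Wiener derived the optimal filter from the minimum mean-square error criterion. The output of the Wiener filter is the signal estimate.
(2) Kalman–Bucy filter: if, for the corrupted signal stream x(t), samples 0 to m are processed first, samples 1 to m+1 second, samples 2 to m+2 third, and so on, and every two successive processing steps are linked together, then a computer of limited storage capacity can process a signal stream of very long duration.
(3) Adaptive filter: with time-varying noise the Wiener filter is no longer the optimal filter. The weighting matrix can then be adjusted adaptively according to the distribution of the noise to improve the filter's performance. Adaptive filters apply not only in the time and frequency domains but can also be extended to spatial filtering.
(4) Correlation estimation: with stationary white noise the matched filter delivers the maximum output signal-to-noise ratio and is equivalent to a time-domain correlator. If only one parameter of the signal (such as delay, phase or frequency) is to be estimated, a local signal identical to the transmitted signal can be generated in the receiver, with a known value of the parameter to be estimated. The corrupted received signal is correlated with the local signal; when the correlator output reaches its maximum, the known parameter of the local signal is the estimate sought. The method is simple to implement. Although it constrains the noise distribution rather strictly, such assumptions are admissible in long-range air-defence radar and deep-space communication, so it is very widely used. Correlation estimation is in essence estimation under the maximum inverse probability criterion.
(5) Unbiased estimation: the mean of the estimates produced equals the true value of the parameter being estimated; the method consists in designing an unbiased estimator.

Modulation methods in modern communication, the guidance of rockets and missiles, and most industrial control problems all involve nonlinear message models, so solutions to the nonlinear filtering problem are urgently needed. For nonlinear filtering, an exact expression for the mean estimate can be obtained from the modified Fokker–Planck equation, but it is very difficult to apply in engineering and only approximate methods can be used.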

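Large-sample statistics

Large-sample statistics studies the limiting properties (also called asymptotic properties) of statistics and of the corresponding statistical methods as the sample size n tends to infinity, and uses them to construct statistical methods with specified limiting properties. For example, when the sample mean $\bar{X}$ is used to estimate the population mean θ, $\bar{X}$ converges to θ with probability 1 as n → ∞, and is called a strongly consistent estimator of θ. This property of $\bar{X}$ is meaningful only as n → ∞; it is called a large-sample property, and the study of strong consistency belongs to large-sample statistics. Statistical methods derived from the limiting properties of statistics are called large-sample methods. For example, let X₁, X₂, …, X_n be a sample drawn from the normal population N(μ, σ²) with μ and σ unknown, and suppose an interval estimate of μ is wanted. Write the sample variance as $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$. As n → ∞, $\sqrt{n}(\bar{X}-\mu)/S$ converges in distribution to the standard normal distribution N(0, 1). It follows that when n is large, $\bar{X} \pm u_{\alpha/2}\,S/\sqrt{n}$ can be used as an interval estimate of μ, where $u_{\alpha/2}$ is the upper α/2 quantile of the standard normal distribution; the confidence coefficient of this estimate tends to the specified 1 − α (0 < α < 1) as n → ∞. This is a large-sample method.

In contrast to large-sample properties and methods, a small-sample property is a property of a statistical method when the sample size n is fixed, and a small-sample method is a statistical method based on the properties of statistics for fixed n. In the first example above, for fixed n we have $E\bar{X} = \theta$, i.e. $\bar{X}$ is an unbiased estimator of θ; this property is meaningful for fixed n and is therefore a small-sample property. As another example, the British statistician W. S. Gosset (pen name "Student") found in 1908 that the exact distribution of $\sqrt{n}(\bar{X}-\mu)/S$ is the t distribution with n − 1 degrees of freedom. It follows that for any fixed n the interval estimate $\bar{X} \pm t_{n-1}(\alpha/2)\,S/\sqrt{n}$ of μ has the exact confidence coefficient 1 − α, where $t_{n-1}(\alpha/2)$ is the upper α/2 quantile of the t distribution with n − 1 degrees of freedom. This holds for every fixed n, so this interval estimate is a small-sample method. In short, what distinguishes large-sample from small-sample properties (or methods) is whether the sample size n tends to infinity or is fixed, not how large n is numerically.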
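The large-sample interval estimate of a mean can be sketched with the Python standard library alone, assuming the usual form "sample mean ± (upper α/2 normal quantile) × S/√n" with S computed using the n − 1 denominator. The function name `large_sample_interval` and the data are hypothetical; the exact small-sample interval would replace the normal quantile by a t quantile, which the standard library does not provide.

```python
import math
import statistics


def large_sample_interval(sample, alpha=0.05):
    """Approximate (1 - alpha) interval for the mean, based on the
    asymptotic normality of sqrt(n) * (mean - mu) / S."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)                 # n - 1 in the denominator
    u = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = u * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical measurements; n is deliberately small to stress that the
# stated confidence level is only approximate for finite n.
sample = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4]
lo, hi = large_sample_interval(sample)
print(f"approximate 95% interval: ({lo:.3f}, {hi:.3f})")
```

For a sample this small the t-based interval is noticeably wider, which is the gap between nominal and actual confidence coefficient that the large-sample method leaves unquantified.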
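Small-sample methods are also called "exact methods", because they are usually based on the exact distribution of the relevant statistic (such as the t distribution in the example above); correspondingly, the statistical characteristics of a small-sample method, such as its significance level or confidence coefficient, are usually exact rather than approximate. Large-sample methods, by contrast, are also called "asymptotic methods" or "approximate methods", because they are based on the asymptotic distribution of a statistic and their statistical characteristics are only approximate. In applications the sample size n is always finite, which raises the question of how good the approximation is. In the example of the interval estimate of μ in N(μ, σ²) with specified confidence coefficient 0.95, the interval built from large-sample theory has a confidence coefficient that tends to 0.95 as n → ∞, but even for very large n it is only close to, not exactly equal to, 0.95. To use the method with a clear idea of what it delivers, one needs a useful estimate, for fixed n, of the gap between the true confidence coefficient and its approximate value 0.95; this issue arises in general when large-sample methods are used. Because of mathematical difficulties, however, many of the large-sample methods in current use have few effective error estimates, and this is their weakness. They nevertheless have important theoretical and practical significance: they provide a range of statistical methods to choose from, and experience shows that a statistical method lacking certain basic large-sample properties (such as consistency) will often not have good small-sample properties either. Large-sample properties cannot be ignored when judging the quality of a statistical method.

Consistency is an important large-sample property. Generally speaking, a statistical method is consistent if, provided the sample size n is large enough, it answers the statistical inference question posed to any desired degree of precision. For example, consistency of an estimator means that as n → ∞ the estimator converges to the quantity being estimated in some sense, such as convergence in probability, almost sure convergence, or convergence in r-th mean. Consistency of a test means that its power at any specified alternative hypothesis tends to 1 as n → ∞. Consistency is the most basic large-sample property and the easiest to satisfy. Asymptotic unbiasedness, asymptotic efficiency and asymptotic normality, or more generally convergence to some particular limiting distribution, are also important large-sample properties.

The development of large-sample statistics depends on the limit theory of probability, and to some extent it has become one aspect of that theory. In 1900 K. Pearson proved the famous theorem that the distribution of the χ² goodness-of-fit statistic is asymptotically χ², which can be regarded as the beginning of large-sample theory. Earlier still, probability theory had established that the binomial distribution is asymptotically normal; this theorem can also be used in large-sample methods (for a large-sample interval estimate of the binomial parameter), but by custom it is regarded as a theorem of pure probability theory. Since 1900, and especially in the thirty-odd years after the Second World War, large-sample theory developed rapidly and reached considerable depth. Important results include the theory that the χ² goodness-of-fit test is asymptotically χ²-distributed, the theory of maximum likelihood estimation and of asymptotically efficient estimation in general, the theory of the likelihood ratio test and of asymptotically efficient estimation in general, the large-sample theory of robust estimation, and a large body of further large-sample theory. Large-sample theory remains an active area of research in mathematical statistics.

Bibliography
R. J. Serfling, Approximation Theorems of Mathematical Statistics, John Wiley & Sons, New York, 1980.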

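Statistical methods of data processing

Because of random measurement errors and the random nature of the physical phenomena under study, experimental observations consist of random data subject to chance. The task of experimental data processing is to infer, from a finite number of measured random data (one sample of the observed random variable), the value of the physical quantity being measured, the functional relation between physical quantities, or other regularities of the phenomenon under study. Data processing must use statistical mathematics whose object is random quantities, chiefly the methods of probability theory, mathematical statistics and the theory of random processes. In particle physics experiments the inherent randomness of the phenomena is especially prominent and their regularities are often masked by the apparent randomness of the measured data, so choosing suitable statistical methods is all the more important. The statistical methods most used in data processing are parameter estimation, hypothesis testing, fitting and Monte Carlo simulation.

Parameter estimation. The physical quantity being measured is often a parameter of the statistical distribution followed by the observations. For example, the lifetime of a stable particle is a parameter of the exponential distribution followed by its observed survival times, and the mass and lifetime of a resonance are parameters of the Breit–Wigner distribution followed by the invariant mass of the system of its decay products. Inferring the value of a physical quantity from observed data requires the parameter estimation methods of mathematical statistics.

The maximum likelihood method is one of the most commonly used methods of estimating distribution parameters. Suppose the observation x follows a statistical distribution with probability density p(x; θ), where the distribution parameter θ is the physical quantity to be determined, and N independent measurements give the observations x₁, x₂, …, x_N. The likelihood function is the probability of obtaining this set of observations given that the parameter takes a particular value θ. The maximum likelihood method chooses as the estimate of θ the parameter value that maximizes the likelihood function:

$L(\theta) = \prod_{i=1}^{N} p(x_i;\theta), \qquad \hat{\theta} = \arg\max_{\theta} L(\theta).$

The error of the estimate is expressed by a confidence interval at a given confidence level. The confidence interval can be determined by interval estimation from the distributional properties of the estimate. With the confidence-distribution approach to parameter estimation, a complete probabilistic inference about the estimated physical quantity can be obtained, namely its confidence distribution. The usual treatment of measurement errors is a special case of parameter estimation: the observations are normally distributed, the true value of the measured quantity is the expectation of that normal distribution, its maximum likelihood estimate is the arithmetic mean of the observations, and the interval of one standard error on either side of the mean has a confidence level of 68.3%.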
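For the particle-lifetime case mentioned above, maximum likelihood has a closed form. The sketch below assumes the density p(t; τ) = exp(−t/τ)/τ, for which the estimate of τ is the sample mean and the curvature of the log-likelihood at its maximum gives an approximate standard error of τ̂/√N. The function names and the decay times are hypothetical.

```python
import math


def exp_log_likelihood(tau, times):
    """Log-likelihood of lifetime tau for independent decay times,
    assuming the density p(t; tau) = exp(-t / tau) / tau."""
    return -len(times) * math.log(tau) - sum(times) / tau


def mle_lifetime(times):
    """Maximum likelihood estimate of tau with an approximate standard
    error derived from the second derivative of the log-likelihood."""
    n = len(times)
    tau_hat = sum(times) / n
    sigma = tau_hat / math.sqrt(n)
    return tau_hat, sigma


# Hypothetical decay times in arbitrary units.
times = [0.8, 2.1, 0.3, 1.7, 1.1, 0.5, 3.2, 0.9]
tau_hat, sigma = mle_lifetime(times)
print(f"tau = {tau_hat:.3f} +/- {sigma:.3f}")

# The log-likelihood is lower on either side of the estimate.
for factor in (0.9, 1.0, 1.1):
    print(factor, round(exp_log_likelihood(factor * tau_hat, times), 4))
```

The quoted error is a large-sample approximation; with only a handful of decays the likelihood is visibly asymmetric and an interval read off the log-likelihood curve would be more faithful.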
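　　When the measured quantity θ is itself a random variable whose probability distribution p(θ) (the prior distribution) is known, Bayes' formula gives a sharper inference about its value from the observations x₁, x₂, …, x_N, namely the posterior distribution of the quantity. The posterior probability density of θ is

$$p(\theta\mid x_1,\dots,x_N)=\frac{p(\theta)\prod_{i=1}^{N}p(x_i;\theta)}{\int p(\theta')\prod_{i=1}^{N}p(x_i;\theta')\,d\theta'},$$

and the confidence level of any interval [θ₁, θ₂] is

$$P(\theta_1\le\theta\le\theta_2)=\int_{\theta_1}^{\theta_2}p(\theta\mid x_1,\dots,x_N)\,d\theta.$$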
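The posterior density and the confidence level of an interval can be approximated numerically on a grid. The sketch below is one possible illustration, assuming a normal measurement model with known σ and a normal prior; the function names, the grid, and the data are all invented for the example.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_on_grid(xs, sigma, prior_mean, prior_sd, grid):
    """Posterior density of theta on an equally spaced grid:
    prior times likelihood, normalized by a Riemann sum."""
    weights = []
    for theta in grid:
        like = 1.0
        for x in xs:
            like *= normal_pdf(x, theta, sigma)
        weights.append(like * normal_pdf(theta, prior_mean, prior_sd))
    step = grid[1] - grid[0]
    norm = sum(weights) * step
    return [w / norm for w in weights]


def interval_level(grid, density, lo, hi):
    """Approximate integral of the posterior density over [lo, hi]."""
    step = grid[1] - grid[0]
    return sum(d for t, d in zip(grid, density) if lo <= t <= hi) * step


# assumption: three invented measurements, sigma = 0.5, prior N(4.0, 1.0^2)
grid = [i * 0.001 for i in range(10001)]  # theta from 0 to 10
density = posterior_on_grid([4.8, 5.1, 5.3], 0.5, 4.0, 1.0, grid)
post_mean = sum(t * d for t, d in zip(grid, density)) * 0.001
print("posterior mean:", post_mean)
print("level of [4.7, 5.3]:", interval_level(grid, density, 4.7, 5.3))
```

In this conjugate case the posterior is itself normal, so the grid result can be checked against the closed-form posterior mean (prior mean and data weighted by their precisions).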

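　　Fitting　Fitting is the statistical method for finding the functional relationship between observed physical quantities; it is also called smoothing of the observed data. Let y and x be observed quantities with y a function of x, the relationship being given by a theoretical formula y = f(x; c), where c = (c₁, c₂, …, c_m) are m parameters to be determined. The task of fitting is to infer the unknown parameters c from the N measured pairs (x₁, y₁), (x₂, y₂), …, (x_N, y_N).
　　The most commonly used fitting method is least squares. When the observations are mutually independent and the measurement error of x is negligible, the least squares method takes as the parameter estimate the value ĉ that minimizes the weighted sum of squared residuals (the differences between the observed and theoretical values of y), that is,

$$\sum_{i=1}^{N}\frac{[y_i-f(x_i;\hat{c})]^2}{\sigma_i^2}=\min,$$

where σ_i² is the variance of the observation y_i.
　　Least squares can be used to establish experimental curves for all kinds of empirical formulas in physics experiments (for example, reconstructing particle tracks in particle physics experiments).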
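For the simplest case the weighted least-squares estimate has a closed form. The sketch below assumes a straight-line model y = a + b·x (the text leaves f(x; c) general) and solves the two normal equations directly; the function names and data are invented for illustration.

```python
def fit_line(xs, ys, sigmas):
    """Weighted least-squares fit of y = a + b*x.
    Minimizes sum(((y_i - a - b*x_i) / sigma_i)**2) and returns (a, b)."""
    ws = [1.0 / s ** 2 for s in sigmas]  # weights are inverse variances
    S = sum(ws)
    Sx = sum(w * x for w, x in zip(ws, xs))
    Sy = sum(w * y for w, y in zip(ws, ys))
    Sxx = sum(w * x * x for w, x in zip(ws, xs))
    Sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / det
    b = (S * Sxy - Sx * Sy) / det
    return a, b


def weighted_sum_of_squares(xs, ys, sigmas, a, b):
    """Value of the quantity that the fit minimizes."""
    return sum(((y - a - b * x) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))


# assumption: invented measurements with unequal errors
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
sigmas = [0.1, 0.2, 0.1, 0.3]
a, b = fit_line(xs, ys, sigmas)
print("intercept:", a, "slope:", b)
print("minimum:", weighted_sum_of_squares(xs, ys, sigmas, a, b))
```

Points with small σ_i pull the line harder than points with large σ_i, which is exactly the effect of the 1/σ_i² weights in the sum.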
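　　Hypothesis testing　Parameter estimation and fitting are used to estimate, from experimental data, the unknown parameters in the distribution of the observations or in the functional relationship between observed quantities. But the distribution followed by the observations, or the theoretical formula relating the observed quantities, is often only a statistical hypothesis; whether it applies to the experiment at hand, and whether it conflicts significantly with the observations, has to be tested against the data. The hypothesis to be tested may also be an assertion about the value of a parameter of the distribution. Hypothesis tests are commonly used to judge whether experimental conditions (for example instrument specifications) are normal, whether there is an appreciable systematic error, or whether the results contain new phenomena not accounted for in the theoretical hypothesis about the distribution or the functional relationship. They can also be used to select the more probable of two theoretical hypotheses, for example to pick out efficiently, from the measured data, particles of a wanted kind in a mixed beam of different particles.
　　The general method of hypothesis testing is to choose a function λ(x) of the observed data, called the test statistic, whose value reflects the discrepancy between the theoretical hypothesis and the measured data and whose distribution is known when the hypothesis holds. If the value of λ computed from the data falls in a region indicating a large discrepancy from the hypothesis (that is, a region into which λ falls with only a small probability, the significance level, when the hypothesis holds), the data are in significant conflict with the hypothesis.
　　In practice the test statistic should be chosen to suit the problem at hand.
　　A widely used test statistic is the Pearson χ² quantity, defined as

$$\chi^2=\sum_{i}\frac{(n_i-E_i)^2}{E_i},$$

where n_i is the number of observations falling in bin i and E_i is the theoretically expected number in bin i. Clearly the size of χ² reflects the size of the discrepancy between data and theory; moreover, if the hypothesis is correct, χ² asymptotically follows a known χ² distribution, which can be used to make a quantitative probability statement about the discrepancy.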
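A short illustration of the Pearson statistic: the sketch below tests whether invented counts from 60 throws of a die are compatible with equal probabilities for the six faces. The function name is hypothetical, and the critical value 11.07 is the standard tabulated 5% point of the χ² distribution with 5 degrees of freedom.

```python
def pearson_chi_square(observed, expected):
    """Sum over bins of (n_i - E_i)^2 / E_i."""
    return sum((n - e) ** 2 / e for n, e in zip(observed, expected))


observed = [8, 12, 9, 11, 10, 10]  # invented counts for faces 1..6
expected = [10.0] * 6              # 60 throws, equal probabilities assumed

chi2 = pearson_chi_square(observed, expected)
CRITICAL_5PCT_5DF = 11.07          # tabulated value, 6 bins - 1 = 5 degrees of freedom
print("chi2 =", chi2)
print("reject" if chi2 > CRITICAL_5PCT_5DF else "no significant conflict")
```

The degrees of freedom are the number of bins minus one here because the only constraint is the fixed total; each parameter estimated from the data would remove one more.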
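　　When the parameter θ of the distribution of the observations has only two possible values, θ₀ and θ₁, and the problem is to decide from the observation x whether the parameter equals the particular value θ₀, the likelihood ratio is a very useful test statistic. It is the ratio of the likelihoods of the observation under the two parameter values, written here as

$$\lambda(x)=\frac{p(x;\theta_0)}{p(x;\theta_1)}.$$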

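　　Monte Carlo simulation　This topic is treated in a separate entry.
　　References
　Li Tipei (李惕碚), 《实验的数学处理》 (Mathematical Treatment of Experiments), Science Press, Beijing, 1980.
　A. G. Frodesen et al., Probability and Statistics in Particle Physics, Universitetsforlaget, Bergen, 1979.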

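Detection theory

Detection theory applies the theory of statistical inference to study the optimal way of extracting information from noisy signals and the optimal design of detection systems. Detection is the process of choosing among a finite set of possible signals, and thereby extracting information from a noisy signal.
　　Detection theory is one of the theoretical foundations of statistical signal processing. Besides its wide use in radar, sonar, communications, automatic control and similar technologies, it is also widely applied in seismology, radio astronomy, geophysics, biomedicine, biophysics and other disciplines.
　　Classical detection　Also called hypothesis testing, it has three basic components. The first is the source that generates the hypotheses. A hypothesis can be regarded as a statement about a possible decision, and the mechanism that produces these statements is called the source. The second is the probabilistic transition mechanism: for each hypothesis it generates, according to some probability law, the observation on which the decision is based, that is, a point in the observation space. The last is the decision system: according to a decision rule that is optimal in some sense, the observation space is partitioned into regions corresponding to the hypotheses, which yields the optimal decision.
　　Decisions are based on the statistical properties of the observation. For each hypothesis the likelihood function is defined as the conditional probability density P_i(r | H_i) (i = 0, 1, …, M−1), where r is the observation vector (r₁, r₂, …, r_n). In parametric detection the probability densities of the noise and of signal plus noise are either known, or known except for a finite number of parameters that determine them. The former case corresponds to simple hypothesis testing, the latter to composite hypothesis testing. If the true form of the noise distribution is unknown, so that a finite number of parameters is not enough to determine it, the problem is called nonparametric detection. Signal detection problems fall into three levels: (1) detection of known signals, as in synchronous digital communication; (2) detection of signals with unknown parameters, as in target detection, noncoherent digital communication systems, and slowly fading channels; (3) detection of random signals, as in passive sonar, seismic detection, and radio-astronomy detection.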
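　　Binary detection　The source output is one of two possible hypotheses, denoted H₀ and H₁, with H₀ called the null hypothesis. For example, in radar target detection H₀ and H₁ denote the absence and presence of a target respectively, written
H₀: r(t) = n(t)

H₁: r(t) = s(t) + n(t)　(0 ≤ t ≤ T)
where r(t) is the received signal, s(t) is the expected target echo, and n(t) is the interfering noise. The interval [0, T] is the observation interval. Another common example is binary communication, where H₀ and H₁ correspond to the transmitted space s₀(t) and mark s₁(t): H₀: r(t) = s₀(t) + n(t), H₁: r(t) = s₁(t) + n(t), 0 ≤ t ≤ T. The purpose of signal detection is to decide from r(t), optimally in some sense, whether a target is present or absent, or whether a space or a mark was sent.
　　M-ary detection　The source output is one of M possible hypotheses H₀, H₁, …, H_{M−1}. Typical examples are multiple-target detection in radar and M-ary communication. Signal parameter estimation can also be treated approximately as an M-ary detection problem.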
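　　Optimality criteria　The theoretically optimal criterion is the Bayes criterion, which minimizes the average risk over all decisions. Assume the prior probabilities of the M possible messages are known and equal to P(H_j) (j = 0, 1, …, M−1). If message j is actually present but message i is decided, the cost of that decision is defined as C_ij, and the C_ij (i, j = 0, 1, …, M−1) are assumed given. Under the Bayes criterion, for any set of observations one chooses the hypothesis that yields the smallest average risk. The average risk is defined as

$$\bar{R}=\sum_{i=0}^{M-1}\sum_{j=0}^{M-1}C_{ij}\,P(H_j)\,P(H_i\mid H_j),$$

where P(H_i | H_j) is the probability of choosing H_i when H_j is true. Choosing the hypothesis that minimizes the average risk is equivalent to choosing the one that minimizes the conditional risk, defined as

$$R(H_j\mid r)=\sum_{i=0}^{M-1}C_{ji}\,P(H_i\mid r),$$

that is, the risk of deciding that H_j is true given the measurements r. P(H_i | r) is called the posterior probability, the probability that H_i is true given r. A detector that is optimal under this criterion is realized by computing a set of M−1 likelihood ratios and deciding on the basis of their relative sizes; this is the likelihood ratio detection system. The i-th likelihood ratio is defined as the ratio of the i-th likelihood function to the 0-th. For binary detection one need only compare the likelihood ratio Λ(r) = P₁(r | H₁)/P₀(r | H₀) with a particular threshold λ: decide H₁ if it exceeds the threshold, and H₀ otherwise. Several other decision criteria commonly used in practice are special cases of the Bayes criterion.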
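The binary likelihood ratio test can be sketched for one concrete case. The code below assumes discrete samples, a known constant signal level under H₁, and independent Gaussian noise, none of which is specified in the text; the threshold formula is the usual Bayes threshold for two hypotheses, with C_ij the cost of deciding H_i when H_j is true. All names and numbers are invented for illustration.

```python
import math


def bayes_threshold(p0, p1, c00, c10, c01, c11):
    """Bayes threshold for the likelihood ratio.
    c_ij is the cost of deciding H_i when H_j is true."""
    return (p0 * (c10 - c00)) / (p1 * (c01 - c11))


def log_likelihood_ratio(r, amplitude, sigma):
    """ln Lambda(r) for H0: r_k = n_k versus H1: r_k = amplitude + n_k,
    with independent N(0, sigma^2) noise samples n_k (an assumption)."""
    n = len(r)
    return (amplitude * sum(r) - 0.5 * n * amplitude ** 2) / sigma ** 2


def decide(r, amplitude, sigma, threshold):
    """Return 1 (decide H1) if Lambda(r) exceeds the threshold, else 0."""
    return 1 if log_likelihood_ratio(r, amplitude, sigma) > math.log(threshold) else 0


# equal priors, zero cost for correct decisions, unit cost for errors
lam = bayes_threshold(0.5, 0.5, c00=0.0, c10=1.0, c01=1.0, c11=0.0)
print("threshold:", lam)
print(decide([0.9, 1.2, 0.7, 1.1], amplitude=1.0, sigma=1.0, threshold=lam))
print(decide([0.1, -0.3, 0.2, -0.1], amplitude=1.0, sigma=1.0, threshold=lam))
```

Working with the logarithm of the ratio avoids numerical underflow for long observation vectors and leaves the decision unchanged, since the logarithm is monotone.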
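　　If the cost functions are given but the prior probabilities are unknown, one reasonable strategy is to assume the least favourable prior distribution and then apply the Bayes criterion; this is the minimax criterion.
　　Communication systems commonly use the minimum error probability criterion, that is, the maximum a posteriori probability criterion, also called the "ideal observer" criterion. Correct decisions are assumed to cost nothing and all kinds of wrong decisions to cost the same, in which case minimizing the average error probability is equivalent to minimizing the Bayes risk.
　　In radar and sonar target detection neither the prior probabilities nor the cost functions are easy to determine. The Neyman-Pearson criterion can then be used. Under it the decision threshold λ is determined from the false-alarm probability α (the probability of wrongly declaring a target present) by

$$\alpha=\int_{\lambda}^{\infty}p(\Lambda\mid H_0)\,d\Lambda.$$

For a system that makes likelihood ratio decisions with threshold λ, the miss probability (the probability of failing to declare a target that is present)

$$\beta=\int_{0}^{\lambda}p(\Lambda\mid H_1)\,d\Lambda$$

is minimized for the given false-alarm probability α.
　　Because the likelihood ratio depends neither on the prior distribution nor on the decision costs, the optimal detection system under each of the above criteria is still a likelihood ratio system; only the decision threshold is set by the criterion in question.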
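To illustrate how a Neyman-Pearson threshold follows from the false-alarm probability, the sketch below again assumes a constant positive signal level in independent Gaussian noise. In that case the likelihood ratio is an increasing function of the sample mean, so the threshold can be placed on the sample mean directly. The function names and numbers are invented for the example.

```python
import math
from statistics import NormalDist


def neyman_pearson_threshold(alpha, n, sigma):
    """Threshold on the sample mean giving false-alarm probability alpha
    under H0: samples ~ N(0, sigma^2)."""
    return sigma / math.sqrt(n) * NormalDist().inv_cdf(1.0 - alpha)


def miss_probability(threshold, amplitude, n, sigma):
    """Probability that the sample mean stays below the threshold
    under H1: samples ~ N(amplitude, sigma^2)."""
    return NormalDist(amplitude, sigma / math.sqrt(n)).cdf(threshold)


# assumption: 4 samples, unit noise, signal level 1, 5% false alarms
gamma = neyman_pearson_threshold(alpha=0.05, n=4, sigma=1.0)
beta = miss_probability(gamma, amplitude=1.0, n=4, sigma=1.0)
print("threshold on the sample mean:", gamma)
print("miss probability:", beta)
```

Lowering α raises the threshold and therefore raises the miss probability; for fixed α the only ways to reduce β in this model are more samples or a better signal-to-noise ratio.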
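　　Matched filter　Suppose the received noisy signal is s_i(t) + n(t), where n(t) is white Gaussian noise and s_i(t) is the known signal corresponding to message i. The device that computes the likelihood ratio is then essentially a linear filter whose transfer function H(ω) is related to the signal spectrum S(ω) by

$$H(\omega)=k\,S^{*}(\omega)\,e^{-j\omega t_0},$$

where k is a real constant and t₀ is the sampling instant at which the decision is made. The formula shows that the transfer function of the required linear filter equals the complex conjugate of the signal spectrum, up to the gain k and the delay factor; hence it is customarily called the matched filter.
　　The matched filter has the following properties: (1) Among all linear filters, the matched filter gives the maximum instantaneous signal-to-noise ratio at its output. (2) In white noise, the instantaneous output signal-to-noise ratio of the matched filter depends only on the input signal energy and the power spectral density of the white noise. (3) A filter matched to the signal s(t) is also matched to the signal a·s(t−τ) (a is a constant and τ a time delay); that is, the matched filter adapts to signals of similar waveform that differ in amplitude and delay, but in general it does not adapt to frequency-shifted signals. (4) In white Gaussian noise the matched filter is equivalent to a cross-correlator.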
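Properties (3) and (4) can be sketched in discrete time: correlating the received samples with the known template gives an output that peaks at the signal's delay, whatever its amplitude. The template (a 5-chip Barker code), the delay, the gain, and the noise level below are all assumptions made for this illustration.

```python
import random


def correlate(received, template):
    """Sliding correlation: output[k] = sum_j received[k + j] * template[j]."""
    m = len(template)
    return [sum(received[k + j] * template[j] for j in range(m))
            for k in range(len(received) - m + 1)]


def simulate_received(template, delay, gain, length, noise_sd, seed=0):
    """Scaled, delayed copy of the template plus white Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    received = [rng.gauss(0.0, noise_sd) for _ in range(length)]
    for j, s in enumerate(template):
        received[delay + j] += gain * s
    return received


template = [1, 1, 1, -1, 1]  # 5-chip Barker code, chosen for its low sidelobes
received = simulate_received(template, delay=7, gain=2.0, length=20, noise_sd=0.3)
output = correlate(received, template)
peak = max(range(len(output)), key=lambda k: output[k])
print("estimated delay:", peak)
```

The correlator is the time-domain counterpart of the matched filter: filtering with the time-reversed template and sampling at the right instant yields the same sums.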
　　References
　H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I, Wiley, New York, 1968.

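The ideas and methods of the likelihood ratio, Wald, and Lagrange multiplier tests

Question: what are the ideas and methods behind the three classical tests in econometrics, namely the likelihood ratio test, the Wald test, and the Lagrange multiplier test?

The likelihood ratio, Wald, and Lagrange multiplier tests are all based on maximum likelihood estimation (MLE), and in large samples the three are asymptotically equivalent.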
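1. The idea of the likelihood ratio test: if the parameter restrictions are valid, imposing them should not cause a large drop in the maximized likelihood.

In other words, the likelihood ratio test compares the maximum of the likelihood function under the restrictions with its maximum without them. The likelihood ratio is defined as the ratio of the restricted maximum to the unrestricted maximum. From it one can construct a statistic, −2 times the logarithm of the ratio, that asymptotically follows a chi-square distribution (see Greene for the exact form).

2. The idea of the Wald test: if the restrictions are valid, the estimator obtained without imposing them should satisfy the restrictions asymptotically, because the MLE is consistent.

A Wald statistic can be constructed from the unrestricted estimator (see Greene for the exact form); it too asymptotically follows a chi-square distribution.

3. The idea of the Lagrange multiplier test: under the restrictions, the objective function can be set up by the Lagrange method. If the restrictions are valid, the estimator obtained by maximizing the Lagrangian should lie close to the estimate obtained by unrestricted maximization.

Here too an LM statistic is constructed (see Greene for the exact form), and it asymptotically follows a chi-square distribution.

The likelihood ratio test requires estimating both the restricted and the unrestricted model; the Wald test requires only the unrestricted model; the LM test requires only the restricted model. In general, because estimating the restricted model is relatively more complicated, the Wald test is the most commonly used. In small samples the likelihood ratio test behaves best and the LM test also does reasonably well, whereas the Wald test sometimes rejects the null hypothesis too readily, so its small-sample properties are not entirely satisfactory.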
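The three statistics can be compared on the simplest possible model. The sketch below assumes a Bernoulli sample and the single restriction p = p₀, which is not an example from the text; the function name and the data are invented. In this model the Wald statistic uses the variance evaluated at the unrestricted estimate, and the LM statistic uses the variance evaluated at the restricted value.

```python
import math


def three_tests(successes, n, p0):
    """LR, Wald and LM statistics for H0: p = p0 in a Bernoulli model.
    Requires 0 < successes < n so that all logarithms are finite."""
    p_hat = successes / n  # unrestricted MLE

    def loglik(p):
        return successes * math.log(p) + (n - successes) * math.log(1.0 - p)

    lr = 2.0 * (loglik(p_hat) - loglik(p0))                 # needs both fits
    wald = (p_hat - p0) ** 2 / (p_hat * (1.0 - p_hat) / n)  # unrestricted fit only
    lm = (p_hat - p0) ** 2 / (p0 * (1.0 - p0) / n)          # restricted value only
    return lr, wald, lm


CHI2_1DF_5PCT = 3.841  # tabulated 5% critical value, 1 degree of freedom
lr, wald, lm = three_tests(successes=60, n=100, p0=0.5)
for name, value in (("LR", lr), ("Wald", wald), ("LM", lm)):
    print(name, round(value, 4), "reject" if value > CHI2_1DF_5PCT else "do not reject")
```

With these invented numbers the three statistics are close to one another but not equal, which is what asymptotic equivalence leads one to expect in a finite sample.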

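In 1943, in discussing the optimality of test procedures, 许宝騄 (Pao-Lu Hsu) gave the first proof of an optimality property of the likelihood ratio test for linear hypotheses in the linear model; this was the first non-local optimality result for tests of multi-parameter hypotheses. Writing λ for the noncentrality parameter in the non-null distribution of the likelihood ratio test statistic, he proved that among tests whose power function depends only on λ, the likelihood ratio test is uniformly most powerful. The result was confirmed by later research and has been very highly regarded.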

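The Choice of Priors Based on the Theory of Likelihood Ratio Test
Authors: Qiao-Juan Zhou (周巧娟); Yi-Min Shi (师义民)
Journal: 科技通報
Volume/issue and date: vol. 23, no. 5 (2007/09)
Using the principle of the likelihood ratio test, the paper gives a likelihood-ratio-test method for choosing a prior distribution or a family of prior distributions. The method is applied to several cases of simple and composite hypotheses. Taking the normal distribution as an example, the paper shows how to choose the prior distribution and the prior family for the mean parameter θ. Examples indicate that the priors and prior families obtained this way are reasonably robust.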

Maximum likelihood estimation and the generalized method of moments

Generalized likelihood ratio test

