E-book, English, 178 pages
Manski, Partial Identification of Probability Distributions
1st edition 2006
ISBN: 978-0-387-21786-4
Publisher: Springer US
Format: PDF
Copy protection: PDF watermark
Series: Springer Series in Statistics
Sample data alone never suffice to draw conclusions about populations. Inference always requires assumptions about the population and sampling process. Statistical theory has revealed much about how the strength of assumptions affects the precision of point estimates, but has had much less to say about how it affects the identification of population parameters. Indeed, it has been commonplace to think of identification as a binary event - a parameter is either identified or not - and to view point identification as a precondition for inference. Yet there is enormous scope for fruitful inference using data and assumptions that partially identify population parameters. This book explains why and shows how.
The book presents in a rigorous and thorough manner the main elements of Charles Manski's research on partial identification of probability distributions. One focus is prediction with missing outcome or covariate data. Another is decomposition of finite mixtures, with application to the analysis of contaminated sampling and ecological inference. A third major focus is the analysis of treatment response. Whatever the particular subject under study, the presentation follows a common path. The author first specifies the sampling process generating the available data and asks what may be learned about population parameters using the empirical evidence alone. He then asks how the (typically) set-valued identification regions for these parameters shrink if various assumptions are imposed. The approach to inference that runs throughout the book is deliberately conservative and thoroughly nonparametric.
Conservative nonparametric analysis enables researchers to learn from the available data without imposing untenable assumptions. It enables establishment of a domain of consensus among researchers who may hold disparate beliefs about what assumptions are appropriate. Charles F. Manski is Board of Trustees Professor at Northwestern University. He is author of Identification Problems in the Social Sciences and Analog Estimation Methods in Econometrics. He is a Fellow of the American Academy of Arts and Sciences, the American Association for the Advancement of Science, and the Econometric Society.
Further Information & Material
Preface
Contents
Introduction: Partial Identification and Credible Inference
1 Missing Outcomes
  1.1. Anatomy of the Problem
  1.2. Means
  1.3. Parameters that Respect Stochastic Dominance
  1.4. Combining Multiple Sampling Processes
  1.5. Interval Measurement of Outcomes
  Complement 1A. Employment Probabilities
  Complement 1B. Blind-Men Bounds on an Elephant
  Endnotes
2 Instrumental Variables
  2.1. Distributional Assumptions and Credible Inference
  2.2. Some Assumptions Using Instrumental Variables
  2.3. Outcomes Missing-at-Random
  2.4. Statistical Independence
  2.5. Mean Independence and Mean Monotonicity
  2.6. Other Assumptions Using Instrumental Variables
  Complement 2A. Estimation with Nonresponse Weights
  Endnotes
3 Conditional Prediction with Missing Data
  3.1. Prediction of Outcomes Conditional on Covariates
  3.2. Missing Outcomes
  3.3. Jointly Missing Outcomes and Covariates
  3.4. Missing Covariates
  3.5. General Missing-Data Patterns
  3.6. Joint Inference on Conditional Distributions
  Complement 3A. Unemployment Rates
  Complement 3B. Parametric Prediction with Missing Data
  Endnotes
4 Contaminated Outcomes
  4.1. The Mixture Model of Data Errors
  4.2. Outcome Distributions
  4.3. Event Probabilities
  4.4. Parameters that Respect Stochastic Dominance
  Complement 4A. Contamination Through Imputation
  Complement 4B. Identification and Robust Inference
  Endnotes
5 Regressions, Short and Long
  5.1. Ecological Inference
  5.2. Anatomy of the Problem
  5.3. Long Mean Regressions
  5.4. Instrumental Variables
  Complement 5A. Structural Prediction
  Endnotes
6 Response-Based Sampling
  6.1. Reverse Regression
  6.2. Auxiliary Data on Outcomes or Covariates
  6.3. The Rare-Disease Assumption
  6.4. Bounds on Relative and Attributable Risk
  6.5. Sampling from One Response Stratum
  Complement 6A. Smoking and Heart Disease
  Endnotes
7 Analysis of Treatment Response
  7.1. Anatomy of the Problem
  7.2. Treatment Choice in Heterogeneous Populations
  7.3. The Selection Problem and Treatment Choice
  7.4. Instrumental Variables
  Complement 7A. Identification and Ambiguity
  Complement 7B. Sentencing and Recidivism
  Complement 7C. Missing Outcome and Covariate Data
  Complement 7D. Study and Treatment Populations
  Endnotes
8 Monotone Treatment Response
  8.1. Shape Restrictions
  8.2. Monotonicity
  8.3. Semi-Monotonicity
  8.4. Concave Monotonicity
  Complement 8A. Downward-Sloping Demand
  Complement 8B. Econometric Response Models
  Endnotes
9 Monotone Instrumental Variables
  9.1. Equalities and Inequalities
  9.2. Mean Monotonicity
  9.3. Mean Monotonicity and Mean Treatment Response
  9.4. Variations on the Theme
  Complement 9A. The Returns to Schooling
  Endnotes
10 The Mixing Problem
  10.1. Within-Group Treatment Variation
  10.2. Known Treatment Shares
  10.3. Extrapolation from the Experiment Alone
  Complement 10A. Experiments Without Covariate Data
  Endnotes
References
Index
Introduction: Partial Identification and Credible Inference (pp. 1-2)
Statistical inference uses sample data to draw conclusions about a population of interest. However, data alone do not suffice. Inference always requires assumptions about the population and the sampling process. Statistical theory illuminates the logic of inference by showing how data and assumptions combine to yield conclusions.
Empirical researchers should be concerned with both the logic and the credibility of their inferences. Credibility is a subjective matter, yet I take there to be wide agreement on a principle that I shall call:
The Law of Decreasing Credibility: The credibility of inference decreases with the strength of the assumptions maintained.
This principle implies that empirical researchers face a dilemma as they decide what assumptions to maintain: Stronger assumptions yield inferences that may be more powerful but less credible. Statistical theory cannot resolve the dilemma but can clarify its nature.
It is useful to distinguish combinations of data and assumptions that point-identify a population parameter of interest from ones that place the parameter within a set-valued identification region. Point identification is the fundamental necessary condition for consistent point estimation of a parameter. Strengthening an assumption that achieves point identification may increase the attainable precision of estimates of the parameter. Statistical theory has had much to say about this matter. The classical theory of local asymptotic efficiency characterizes, through the Fisher information matrix, how attainable precision increases as more is assumed known about a population distribution. Nonparametric regression analysis shows how the attainable rate of convergence of estimates increases as more is assumed about the shape of the regression. These and other achievements provide important guidance to empirical researchers as they weigh the credibility and precision of alternative point estimates.
Statistical theory has had much less to say about inference on population parameters that are not point-identified (see the historical note at the end of this Introduction). It has been commonplace to think of identification as a binary event - a parameter is either identified or it is not - and to view point identification as a precondition for meaningful inference. Yet there is enormous scope for fruitful inference using data and assumptions that partially identify population parameters. This book explains why and shows how.
Origin and Organization of the Book
The book has its roots in my research on nonparametric regression analysis with missing outcome data, initiated in the late 1980s. Empirical researchers estimating regressions commonly assume that missingness is random, in the sense that the observability of an outcome is statistically independent of its value. Yet this and other point-identifying assumptions have regularly been criticized as implausible. So I set out to determine what random sampling with partial observability of outcomes reveals about mean and quantile regressions if nothing is known about the missingness process or if assumptions weak enough to be widely credible are imposed. The findings were sharp bounds whose forms vary with the regression of interest and with the maintained assumptions. These bounds can readily be estimated using standard methods of nonparametric regression analysis.
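The flavor of these sharp bounds can be illustrated with a minimal sketch. The snippet below computes the worst-case ("empirical evidence alone") bound on a population mean when some outcomes are missing and the outcome is known to lie in a bounded interval: missing values are pushed to their logical extremes. The function name, data, and interval are hypothetical illustrations, not code from the book.

```python
import numpy as np

def worst_case_mean_bounds(y, observed, y_min=0.0, y_max=1.0):
    """Worst-case bounds on E[y] with missing outcomes.

    With P(observed) = p and E[y | observed] = m, the identification
    region for E[y] is [m*p + y_min*(1-p), m*p + y_max*(1-p)]:
    each missing outcome is replaced by the smallest or largest
    value it could logically take.
    """
    p = observed.mean()                       # probability an outcome is observed
    m = y[observed].mean() if p > 0 else 0.0  # mean among observed outcomes
    lower = m * p + y_min * (1 - p)
    upper = m * p + y_max * (1 - p)
    return lower, upper

# Hypothetical sample: 10 outcomes in [0, 1], the last three unobserved.
rng = np.random.default_rng(0)
y = rng.uniform(size=10)
observed = np.array([True] * 7 + [False] * 3)
lo, hi = worst_case_mean_bounds(y, observed)
# The width of the bound is (y_max - y_min) * P(missing), here 0.3,
# regardless of what the missing values actually are.
```

Note the key property the text emphasizes: the bound is sharp without any assumption on the missingness process, and its width shrinks only as the missing-data rate shrinks or as assumptions are added.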
Study of regression with missing outcome data stimulated investigation of more general incomplete data problems. Some sample realizations may have unobserved outcomes, some may have unobserved covariates, and others may be entirely missing. Sometimes interval data on outcomes or covariates are available, rather than point measurements. Random sampling with incomplete observation of outcomes and covariates generically yields partial identification of regressions. The challenge is to describe and estimate the identification regions produced by incomplete-data processes when alternative assumptions are maintained.
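The interval-measurement case mentioned above admits an especially simple identification region: if each outcome is known only to lie in an interval, the mean is bounded by the means of the interval endpoints. The sketch below illustrates this under hypothetical bracketed data; the helper name and numbers are illustrative, not from the book.

```python
import numpy as np

def interval_mean_bounds(y_lo, y_hi):
    """Bounds on E[y] when each outcome y_i is known only to lie
    in [y_lo[i], y_hi[i]]: the identification region for the mean
    is [mean(y_lo), mean(y_hi)]."""
    return float(np.mean(y_lo)), float(np.mean(y_hi))

# Hypothetical bracketed responses (e.g. income reported in ranges).
y_lo = np.array([0.0, 10.0, 25.0, 50.0])
y_hi = np.array([10.0, 25.0, 50.0, 100.0])
bounds = interval_mean_bounds(y_lo, y_hi)  # (21.25, 46.25)
```

As with missing outcomes, the bound uses the empirical evidence alone; narrower brackets, or added assumptions, shrink the identification region.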