Book, English, Volume 199, 224 pages, Format (W × H): 139 mm × 215 mm
From Raw to Ready
Series: Quantitative Applications in the Social Sciences
ISBN: 978-1-0719-1956-9
Publisher: Sage Publications Inc Ebooks
This book focuses on the process of preparing raw data for analysis—commonly known as data cleaning. It covers a range of topics including data compilation, variable naming and labeling, data examination, and variable re-coding and transformations, among others. Two example projects and datasets are used to illustrate the methods in the book, and the datasets, script files, and output files in both R and Stata are available to download from the accompanying website.
Further Information & Material
Preface
Acknowledgments
About the Author
Part 1: Introduction
Chapter 1: Data Preparation: The Need for Strategy and Transparency
Importance of data preparation
Transparency
Tools for transparency
Summary
Chapter 2: Software and Script File Considerations
Software considerations
Script file robustness and legibility
Summary
Chapter 3: File Organization and Naming
Dual workflow and primary script files
File structure and document organization
Naming: files, folders, and more
Summary
Part 2: CLEANR Method
Chapter 4: Introduction
Data preparation steps
Rationale behind the order
Reconsider the rules of ordering
Summary
Chapter 5: Compiling Data
Preparing for data compilation
Collecting data
Downloading data
Steps between downloading data and uploading data
Uploading and importing data
Dropping and keeping variables
Appending and merging data frames
Re-shaping data frames
Summary
Chapter 6: Labeling and Naming Variables and Values
Variable naming
Variable and value labels
Summary
Chapter 7: Examining Data
Data quality indicators
Respondent quality
Characteristics of data sample
Summary
Chapter 8: Addressing Data Problems
Low quality data
Anomalous data
Missing data strategies
Summary
Chapter 9: New Variable Creation
Composite (scale) variables
Standardizing through use of proportions, percents, and rates
Integer/label encoding
Re-coding/discretization
Summary
Chapter 10: Re-configure, Re-examine, and Review
Re-configuring data
Re-examining data
Code review
Summary
Part 3: Review and Conclusion
Chapter 11: CLEANR in Practice
General Social Survey
Systematic Review
Summary
Chapter 12: Conclusion
Section I: Best Practices
Section II: CLEANR Method
Section III: Conclusion
Conclusion
References