E-book, English, 444 pages
Series: Statistics and Computing
Krause / Olson The Basics of S-PLUS
4th edition, 2005
ISBN: 978-0-387-28390-6
Publisher: Springer US
Format: PDF
Copy protection: 1 - PDF Watermark
Proven bestseller: almost 6000 copies sold in the U.S. in two editions. New edition updated to cover S-PLUS 6.0. Can be used as an introduction to R, as well as S-PLUS. New exercises have been added. Includes a comparison of S-PLUS and R. Well-suited for self-study.
Further Information & Material
1;Preface;6
2;Contents;9
3;Figures;17
4;Tables;21
5;1 Introduction;23
5.1;1.1 The History of S and S-Plus;24
5.2;1.2 S-Plus on Different Operating Systems;26
5.3;1.3 Notational Conventions;28
6;2 Graphical User Interface;31
6.1;2.1 Introduction;31
6.2;2.2 System Overview;32
6.2.1;2.2.1 Using a Mouse;33
6.2.2;2.2.2 Object Explorer;33
6.2.3;2.2.3 Commands Window;33
6.2.4;2.2.4 Toolbars;34
6.2.5;2.2.5 Graph Sheets;34
6.2.6;2.2.6 Script Window;34
6.3;2.3 Getting Started with the Interface;35
6.3.1;2.3.1 Importing Data;35
6.3.2;2.3.2 Graphs;35
6.3.3;2.3.3 Data and Statistics;37
6.3.4;2.3.4 Customizing the Toolbars;37
6.3.5;2.3.5 Chapters;38
6.4;2.4 Detailed Use of the GUI Interface;40
6.5;2.5 Object Explorer;40
6.6;2.6 Help;41
6.7;2.7 Data Export;43
6.8;2.8 Working Directory;45
6.9;2.9 Data Import;46
6.10;2.10 Data Summaries;49
6.11;2.11 Graphs;51
6.12;2.12 Trellis Graphs;58
6.13;2.13 Linear Regression;60
6.14;2.14 PowerPoint (Windows Only);64
6.15;2.15 Excel (Windows Only);66
6.16;2.16 Script Window;67
6.17;2.17 UNIX/Linux GUI;69
6.18;2.18 Summary;78
6.19;2.19 Exercises;79
6.20;2.20 Solutions;80
7;3 A First Session;95
7.1;3.1 General Information;95
7.1.1;3.1.1 Starting and Quitting;96
7.1.2;3.1.2 The Help System;97
7.1.3;3.1.3 Before Beginning;97
7.2;3.2 Simple Structures;98
7.2.1;3.2.1 Arithmetic Operators;98
7.2.2;3.2.2 Assignments;99
7.2.3;3.2.3 The Concatenate Command: c;101
7.2.4;3.2.4 The Sequence Command: seq;102
7.2.5;3.2.5 The Replicate Command: rep;103
7.3;3.3 Mathematical Operations;104
7.4;3.4 Use of Brackets;106
7.5;3.5 Logical Values;107
7.6;3.6 Review;110
7.7;3.7 Exercises;113
7.8;3.8 Solutions;114
8;4 A Second Session;117
8.1;4.1 Constructing and Manipulating Data;117
8.1.1;4.1.1 Matrices;118
8.1.2;4.1.2 Arrays;123
8.1.3;4.1.3 Data Frames;126
8.1.4;4.1.4 Lists;129
8.2;4.2 Introduction to Functions;130
8.3;4.3 Introduction to Missing Values;131
8.4;4.4 Merging Data;132
8.5;4.5 Putting It All Together;133
8.6;4.6 Exercises;136
8.7;4.7 Solutions;138
9;5 Graphics;147
9.1;5.1 Basic Graphics Commands;147
9.2;5.2 Graphics Devices;148
9.2.1;5.2.1 Working with Multiple Graphics Devices;150
9.3;5.3 Plotting Data;150
9.3.1;5.3.1 The plot Command;151
9.3.2;5.3.2 Modifying the Data Display;152
9.3.3;5.3.3 Modifying Figure Elements;153
9.4;5.4 Adding Elements to Existing Plots;155
9.4.1;5.4.1 Functions to Add Elements to Graphs;155
9.4.2;5.4.2 More About;157
9.4.3;5.4.3 More on Adding Axes;157
9.4.4;5.4.4 Adding Text to Graphs;159
9.5;5.5 Setting Options;160
9.6;5.6 Figure Layouts;162
9.6.1;5.6.1 Layouts Using Trellis Graphs;162
9.6.2;5.6.2 Matrices of Graphs;162
9.6.3;5.6.3 Multiple-Screen Graphs;163
9.6.4;5.6.4 Figures of Specified Size;164
9.7;5.7 Exercises;167
9.8;5.8 Solutions;168
10;6 Trellis Graphics;175
10.1;6.1 An Example;176
10.2;6.2 Trellis Basics;178
10.2.1;6.2.1 Trellis Syntax;178
10.2.2;6.2.2 Trellis Functions;179
10.2.3;6.2.3 Displaying and Storing Graphs;179
10.3;6.3 Output Devices;180
10.4;6.4 Customizing Trellis Graphs;182
10.4.1;6.4.1 Setting Options;182
10.4.2;6.4.2 Arranging the Layout of a Trellis Graph;183
10.4.3;6.4.3 Ordering of Graphs;185
10.4.4;6.4.4 Axis Customization;186
10.4.5;6.4.5 Modifying Panel Strips;187
10.4.6;6.4.6 Arranging Several Graphs on a Single Page;187
10.4.7;6.4.7 Updating Existing Trellis Graphs;189
10.4.8;6.4.8 Writing Panel Functions;190
10.5;6.5 Further Trellis Hints;193
10.5.1;6.5.1 Useful General Trellis Settings;194
10.5.2;6.5.2 Graphing Individual Profiles;195
10.5.3;6.5.3 Preparing Data to Use for Trellis;196
10.5.4;6.5.4 The subset Option;197
10.5.5;6.5.5 Adding a Key;197
10.5.6;6.5.6 The subscripts Option in Panel Functions;199
10.6;6.6 Exercises;203
10.7;6.7 Solutions;205
11;7 Exploring Data;215
11.1;7.1 Descriptive Data Exploration;215
11.2;7.2 Graphical Exploration;226
11.2.1;7.2.1 Interactive Dynamic Graphics;241
11.2.2;7.2.2 Old-Style Graphics;241
11.3;7.3 Distributions and Related Functions;242
11.4;7.4 Confirmatory Statistics and Hypothesis Testing;247
11.5;7.5 Missing and Infinite Values;253
11.5.1;7.5.1 Testing for Missing Values;254
11.5.2;7.5.2 Supplying Data with Missing Values to Functions;254
11.5.3;7.5.3 Missing Values in Graphs;255
11.5.4;7.5.4 Infinite Values;255
11.6;7.6 Exercises;257
11.7;7.7 Solutions;260
12;8 Statistical Modeling;273
12.1;8.1 Introductory Examples;273
12.1.1;8.1.1 Regression;273
12.1.2;8.1.2 Regression Diagnostics;275
12.2;8.2 Statistical Models;277
12.3;8.3 Model Syntax;278
12.4;8.4 Regression;279
12.4.1;8.4.1 Linear Regression and Modeling Techniques;280
12.4.2;8.4.2 ANOVA;283
12.4.3;8.4.3 Logistic Regression;285
12.4.4;8.4.4 Survival Data Analysis;287
12.4.5;8.4.5 Endnote;289
12.5;8.5 Exercises;290
12.6;8.6 Solutions;293
13;9 Programming;307
13.1;9.1 Lists;307
13.1.1;9.1.1 Adding and Deleting List Elements;309
13.1.2;9.1.2 Naming List Elements;310
13.1.3;9.1.3 Applying the Same Function to List Elements;312
13.1.4;9.1.4 Unlisting a List;316
13.1.5;9.1.5 Generating a List by Using;316
13.2;9.2 Writing Functions;316
13.2.1;9.2.1 Documenting Functions;319
13.2.2;9.2.2 Scope of Variables;319
13.2.3;9.2.3 Parameters and Defaults;320
13.2.4;9.2.4 Passing an Unspecified Number of Parameters to a Function;322
13.2.5;9.2.5 Testing for Existence of an Argument;323
13.2.6;9.2.6 Returning Warnings and Errors;323
13.2.7;9.2.7 Using Function Arguments in Graphics Labels;324
13.3;9.3 Iteration;325
13.3.1;9.3.1 The for Loop;325
13.3.2;9.3.2 The while Loop;326
13.3.3;9.3.3 The repeat Loop;327
13.3.4;9.3.4 Vectorizing a Loop;327
13.3.5;9.3.5 Large Loops;329
13.4;9.4 Debugging: Searching for Errors;330
13.4.1;9.4.1 Syntax Errors;331
13.4.2;9.4.2 Invalid Arguments;332
13.4.3;9.4.3 Execution or Run-Time Errors;332
13.4.4;9.4.4 Logical Errors;333
13.5;9.5 Output Using the;336
13.6;9.6 The paste Function;338
13.7;9.7 Exercises;340
13.8;9.8 Solutions;341
14;10 Object-Oriented Programming;345
14.1;10.1 Creating Classes and Objects;347
14.2;10.2 Creating Methods;350
14.3;10.3 Debugging;355
14.4;10.4 Help;356
14.5;10.5 Summary and Overview;356
14.6;10.6 Exercises;357
14.7;10.7 Solutions;358
15;11 Input and Output;371
15.1;11.1 Reading Commands from a File: The source Function;371
15.2;11.2 Data Import/Export: Easiest Method;372
15.3;11.3 Data Import/Export: General Method;374
15.4;11.4 Data Import/Export: Basic Method;375
15.5;11.5 Reading Data from the Terminal;376
15.6;11.6 Editing Data;377
15.7;11.7 Transferring Data: The data.dump and data.restore Functions;378
15.8;11.8 Recording a Session;378
15.9;11.9 Exercises;380
15.10;11.10 Solutions;381
16;12 Tips and Tricks;385
16.1;12.1 Useful Techniques;385
16.1.1;12.1.1 Housekeeping: Cleaning Up Directories;385
16.1.2;12.1.2 Storing and Restoring Graphical Parameters;386
16.1.3;12.1.3 Naming of Objects;386
16.1.4;12.1.4 Repeating Commands;387
16.2;12.2 Programming Environment and Techniques;388
16.2.1;12.2.1 The Process of Developing a Function;388
16.2.2;12.2.2 Setting up an Editor and Running the Code in S-Plus;388
16.2.3;12.2.3 Treating Data Frames as Lists;390
16.2.4;12.2.4 Working with Graph Sheets;391
16.2.5;12.2.5 Incorporating and Accessing C and Fortran Programs;393
16.2.6;12.2.6 Batch Jobs;396
16.2.7;12.2.7 Libraries;398
16.3;12.3 Factors;400
16.3.1;12.3.1 Creating Factors and Ordered Factors;400
16.3.2;12.3.2 Internal Representation of Factors;402
16.3.3;12.3.3 Where Levels Play a Role;403
16.3.4;12.3.4 Where Factors Can Lead Their Own Lives;404
16.3.5;12.3.5 How Factors Come Into Life;406
16.3.6;12.3.6 Adding and Dropping Factor Levels;407
16.4;12.4 Including Graphs in Text Processors;408
16.4.1;12.4.1 Generating Graphs for Windows Applications;409
16.4.2;12.4.2 Generating PostScript Graphs;410
16.4.3;12.4.3 PostScript Graphs in LaTeX;411
16.4.4;12.4.4 If You Don’t Have a PostScript Printer;412
16.4.5;12.4.5 Greek Letters in Graphs;412
16.5;12.5 Exercises;414
16.6;12.6 Solutions;416
17;13 S-Plus Internals;423
17.1;13.1 How S-Plus Works Under UNIX;423
17.1.1;13.1.1 The Working Chapter;424
17.1.2;13.1.2 Customization on Start-Up and Exit;424
17.2;13.2 How S-Plus Works Under Windows;426
17.2.1;13.2.1 Command Line Options;426
17.2.2;13.2.2 Start-up and Exit Functions;427
17.2.3;13.2.3 How the Script Window works;428
17.3;13.3 Storing Mechanism;429
17.4;13.4 Levels of Calls;430
17.5;13.5 Exercises;432
17.6;13.6 Solutions;433
18;14 Information Sources on and Around S-Plus;435
18.1;14.1 Insightful;435
18.2;14.2 S-News: Exchanging Information with Other Users;436
18.3;14.3 The StatLib Server;436
18.4;14.4 What Next?;437
19;15 R;439
19.1;15.1 Development;440
19.2;15.2 Some Similarities Between R and S;440
19.3;15.3 Some Differences Between R and S;440
19.3.1;15.3.1 Language;441
19.3.2;15.3.2 Libraries;442
19.3.3;15.3.3 Trellis-Type Graphs;442
19.3.4;15.3.4 Colors and Lines;443
19.3.5;15.3.5 Data Import and Export Formats;443
19.3.6;15.3.6 Memory Handling;443
19.3.7;15.3.7 Mathematical Formulae in Graphs;443
19.3.8;15.3.8 Graphical User Interfaces;443
19.3.9;15.3.9 Start-Up Mechanism;444
19.3.10;15.3.10 Windows Integration;444
19.3.11;15.3.11 Support;444
19.4;15.4 Summary;445
20;16 Bibliography;447
20.1;16.1 Print Bibliography;447
20.2;16.2 On-Line Bibliography;449
20.2.1;16.2.1 S-PLUS Related Sources;449
20.2.2;16.2.2 TeX-Related Sources;451
20.2.3;16.2.3 Other Sources;451
21;Index;453
7 Exploring Data (p. 193)
In the preceding chapters, we have laid the foundation for understanding the concepts and ideas of the S-Plus system. We explored basic ideas and how to use S-Plus for performing calculations, and we have seen how data can be generated, stored, and accessed. Furthermore, we also looked at how data can be displayed graphically. All this will be useful as we explore real data sets in this chapter. We will explore data sets that come with S-Plus, specifically the Barley and Geyser data sets.
Rather than presenting a list of available statistical functions, we will go through a typical data analysis as a way of introducing the more useful and common commands and the kind of output we’ll encounter. We chose to use S-Plus data sets so you can follow along with the analysis we present and complete the exercises at the end of this chapter. We divide the data analysis into two categories: "descriptive" and "graphical" exploration. Further sections cover distributions and related functions, confirmatory statistics and hypothesis testing, and missing and infinite values.
7.1 Descriptive Data Exploration
We will now explore the different variables contained in the Barley data set. We will first analyze the variables in one dimension, or, in other words, we will take a univariate approach. The analysis of the dependence between the variables and the exploration of higher-dimensional structure follows later.
The Barley Data Set
The Barley data are measurements of yield in bushels per acre at different sites. The analysis comprises 6 sites planting 10 different varieties of barley in 2 successive years, 1931 and 1932. The data set therefore contains 120 measurements of barley yield. Our main goal will be to investigate differences in barley yields given by the different variable constellations, such as the 1931 harvest of the fifth variety on site 4 and the 1932 harvest of the seventh variety at the same site.
Just enter
> barley
to see the data.
Exploratory data analysis (EDA) is an approach to investigating data that stresses the need to know more about the structure and information inherent in the data. The methods used with this approach are referred to as descriptive, as opposed to confirmatory. Descriptive simply means that simple summaries are used to describe the data: their shapes, sizes, relationships, and the like. Examples of descriptive statistics are means, medians, standard deviations, ranges, and so on.
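The descriptive statistics named above can be sketched for the yield measurements as follows. This is a minimal illustration written for R, not taken from the book: it assumes the lattice package, which ships a copy of the same data as `barley` with a `yield` column, whereas in S-Plus the data set is available without loading anything. The name `yield.summary` is introduced here for illustration only.

```r
# Assumption: R with the lattice package, which provides `barley`
# (in S-Plus the data set is built in and needs no library call).
library(lattice)

yield <- barley$yield

# Simple descriptive summaries of the yield measurements
yield.summary <- c(mean   = mean(yield),
                   median = median(yield),
                   sd     = sd(yield),
                   min    = min(yield),
                   max    = max(yield))

print(round(yield.summary, 2))
```

Collecting the summaries in one named vector keeps them together for printing or later comparison, for example between the two harvest years.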
Given the basic information about the Barley data, the following analysis is intended to gain more information and structural knowledge about the numbers we have.
A typical place to begin is, of course, looking at the data. If the data set is small, we can easily look at it simply by printing it out. We check the data size by entering
> dim(barley)
120 4
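The 120 rows can be checked against the design described earlier (6 sites, 10 varieties, 2 years). The following is a hedged sketch for R, again assuming the lattice copy of `barley`, in which `site`, `variety`, and `year` are stored as factors. The names `n.levels` and `cell.counts` are illustrative and do not come from the book.

```r
# Assumption: R with the lattice package, where site, variety, and year
# are factors in `barley`.
library(lattice)

# Number of distinct sites, varieties, and years
n.levels <- sapply(barley[c("site", "variety", "year")], nlevels)
print(n.levels)

# The product of the level counts should match the number of rows
print(prod(n.levels) == nrow(barley))

# Cross-tabulate the three factors to see how often each
# site/variety/year combination occurs
cell.counts <- table(barley$site, barley$variety, barley$year)
print(all(cell.counts == 1))
```

If the last check holds, every site, variety, and year combination contributes exactly one yield measurement, which is what makes comparisons across these constellations straightforward.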




