E-book, English, 579 pages
Muller / Brisebarre / De Dinechin: Handbook of Floating-Point Arithmetic
1st edition, 2009
ISBN: 978-0-8176-4705-6
Publisher: Birkhäuser Boston
Format: PDF
Copy protection: PDF watermark
Floating-point arithmetic is the most widely used way of implementing real-number arithmetic on modern computers. However, making such an arithmetic reliable and portable, yet fast, is a very difficult task. As a result, floating-point arithmetic is far from being exploited to its full potential. This handbook aims to provide a complete overview of modern floating-point arithmetic. So that the techniques presented can be put directly into practice in actual coding or design, they are illustrated, whenever possible, by a corresponding program. The handbook is designed for programmers of numerical applications, compiler designers, programmers of floating-point algorithms, designers of arithmetic operators, and more generally, students and researchers in numerical analysis who wish to better understand a tool used in their daily work and research.
Further Information & Material
Preface  14
List of Figures  16
List of Tables  19

I Introduction, Basic Definitions, and Standards  22
  1 Introduction  23
    1.1 Some History  23
    1.2 Desirable Properties  26
    1.3 Some Strange Behaviors  27
      1.3.1 Some famous bugs  27
      1.3.2 Difficult problems  28
  2 Definitions and Basic Notions  33
    2.1 Floating-Point Numbers  33
    2.2 Rounding  40
      2.2.1 Rounding modes  40
      2.2.2 Useful properties  42
      2.2.3 Relative error due to rounding  43
    2.3 Exceptions  45
    2.4 Lost or Preserved Properties of the Arithmetic on the Real Numbers  47
    2.5 Note on the Choice of the Radix  49
      2.5.1 Representation errors  49
      2.5.2 A case for radix 10  50
    2.6 Tools for Manipulating Floating-Point Errors  52
      2.6.1 The ulp function  52
      2.6.2 Errors in ulps and relative errors  57
      2.6.3 An example: iterated products  57
      2.6.4 Unit roundoff  59
    2.7 Note on Radix Conversion  60
      2.7.1 Conditions on the formats  60
      2.7.2 Conversion algorithms  63
    2.8 The Fused Multiply-Add (FMA) Instruction  71
    2.9 Interval Arithmetic  71
      2.9.1 Intervals with floating-point bounds  72
      2.9.2 Optimized rounding  72
  3 Floating-Point Formats and Environment  74
    3.1 The IEEE 754-1985 Standard  75
      3.1.1 Formats specified by IEEE 754-1985  75
      3.1.2 Little-endian, big-endian  79
      3.1.3 Rounding modes specified by IEEE 754-1985  80
      3.1.4 Operations specified by IEEE 754-1985  81
      3.1.5 Exceptions specified by IEEE 754-1985  85
      3.1.6 Special values  88
    3.2 The IEEE 854-1987 Standard  89
      3.2.1 Constraints internal to a format  89
      3.2.2 Various formats and the constraints between them  90
      3.2.3 Conversions between floating-point numbers and decimal strings  91
      3.2.4 Rounding  92
      3.2.5 Operations  92
      3.2.6 Comparisons  93
      3.2.7 Exceptions  93
    3.3 The Need for a Revision  93
      3.3.1 A typical problem: "double rounding"  94
      3.3.2 Various ambiguities  96
    3.4 The New IEEE 754-2008 Standard  98
      3.4.1 Formats specified by the revised standard  99
      3.4.2 Binary interchange format encodings  100
      3.4.3 Decimal interchange format encodings  101
      3.4.4 Larger formats  111
      3.4.5 Extended and extendable precisions  111
      3.4.6 Attributes  112
      3.4.7 Operations specified by the standard  116
      3.4.8 Comparisons  118
      3.4.9 Conversions  118
      3.4.10 Default exception handling  119
      3.4.11 Recommended transcendental functions  122
    3.5 Floating-Point Hardware in Current Processors  123
      3.5.1 The common hardware denominator  123
      3.5.2 Fused multiply-add  123
      3.5.3 Extended precision  123
      3.5.4 Rounding and precision control  124
      3.5.5 SIMD instructions  125
      3.5.6 Floating-point on x86 processors: SSE2 versus x87  125
      3.5.7 Decimal arithmetic  126
    3.6 Floating-Point Hardware in Recent Graphics Processing Units  127
    3.7 Relations with Programming Languages  128
      3.7.1 The Language Independent Arithmetic (LIA) standard  128
      3.7.2 Programming languages  129
    3.8 Checking the Environment  129
      3.8.1 MACHAR  130
      3.8.2 Paranoia  130
      3.8.3 UCBTest  134
      3.8.4 TestFloat  135
      3.8.5 IeeeCC754  135
      3.8.6 Miscellaneous  135

II Cleverly Using Floating-Point Arithmetic  136
  4 Basic Properties and Algorithms  137
    4.1 Testing the Computational Environment  137
      4.1.1 Computing the radix  137
      4.1.2 Computing the precision  139
    4.2 Exact Operations  140
      4.2.1 Exact addition  140
      4.2.2 Exact multiplications and divisions  142
    4.3 Accurate Computations of Sums of Two Numbers  143
      4.3.1 The Fast2Sum algorithm  144
      4.3.2 The 2Sum algorithm  147
      4.3.3 If we do not use rounding to nearest  149
    4.4 Computation of Products  150
      4.4.1 Veltkamp splitting  150
      4.4.2 Dekker's multiplication algorithm  153
    4.5 Complex Numbers  157
      4.5.1 Various error bounds  158
      4.5.2 Error bound for complex multiplication  159
      4.5.3 Complex division  162
      4.5.4 Complex square root  167
  5 The Fused Multiply-Add Instruction  169
    5.1 The 2MultFMA Algorithm  170
    5.2 Computation of Residuals of Division and Square Root  171
    5.3 Newton–Raphson-Based Division with an FMA  173
      5.3.1 Variants of the Newton–Raphson iteration  173
      5.3.2 Using the Newton–Raphson iteration for correctly rounded division  178
    5.4 Newton–Raphson-Based Square Root with an FMA  185
      5.4.1 The basic iterations  185
      5.4.2 Using the Newton–Raphson iteration for correctly rounded square roots  186
    5.5 Multiplication by an Arbitrary-Precision Constant  189
      5.5.1 Checking for a given constant C if Algorithm 5.2 will always work  190
    5.6 Evaluation of the Error of an FMA  193
    5.7 Evaluation of Integer Powers  195
  6 Enhanced Floating-Point Sums, Dot Products, and Polynomial Values  198
    6.1 Preliminaries  199
      6.1.1 Floating-point arithmetic models  200
      6.1.2 Notation for error analysis and classical error estimates  201
      6.1.3 Properties for deriving validated running error bounds  204
    6.2 Computing Validated Running Error Bounds  205
    6.3 Computing Sums More Accurately  207
      6.3.1 Reordering the operands, and a bit more  207
      6.3.2 Compensated sums  209
      6.3.3 Implementing a "long accumulator"  216
      6.3.4 On the sum of three floating-point numbers  216
    6.4 Compensated Dot Products  218
    6.5 Compensated Polynomial Evaluation  220
  7 Languages and Compilers  222
    7.1 A Play with Many Actors  222
      7.1.1 Floating-point evaluation in programming languages  223
      7.1.2 Processors, compilers, and operating systems  225
      7.1.3 In the hands of the programmer  226
    7.2 Floating Point in the C Language  226
      7.2.1 Standard C99 headers and IEEE 754-1985 support  226
      7.2.2 Types  227
      7.2.3 Expression evaluation  230
      7.2.4 Code transformations  233
      7.2.5 Enabling unsafe optimizations  234
      7.2.6 Summary: a few horror stories  235
    7.3 Floating-Point Arithmetic in the C++ Language  237
      7.3.1 Semantics  237
      7.3.2 Numeric limits  238
      7.3.3 Overloaded functions  239
    7.4 FORTRAN Floating Point in a Nutshell  240
      7.4.1 Philosophy  240
      7.4.2 IEEE 754 support in FORTRAN  243
    7.5 Java Floating Point in a Nutshell  244
      7.5.1 Philosophy  244
      7.5.2 Types and classes  245
      7.5.3 Infinities, NaNs, and signed zeros  247
      7.5.4 Missing features  248
      7.5.5 Reproducibility  249
      7.5.6 The BigDecimal package  250
    7.6 Conclusion  251

III Implementing Floating-Point Operators  253
  8 Algorithms for the Five Basic Operations  254
    8.1 Overview of Basic Operation Implementation  254
    8.2 Implementing IEEE 754-2008 Rounding  256
      8.2.1 Rounding a nonzero finite value with unbounded exponent range  256
      8.2.2 Overflow  258
      8.2.3 Underflow and subnormal results  259
      8.2.4 The inexact exception  260
      8.2.5 Rounding for actual operations  260
    8.3 Floating-Point Addition and Subtraction  261
      8.3.1 Decimal addition  264
      8.3.2 Decimal addition using binary encoding  265
      8.3.3 Subnormal inputs and outputs in binary addition  266
    8.4 Floating-Point Multiplication  266
      8.4.1 Normal case  267
      8.4.2 Handling subnormal numbers in binary multiplication  267
      8.4.3 Decimal specifics  268
    8.5 Floating-Point Fused Multiply-Add  269
      8.5.1 Case analysis for normal inputs  269
      8.5.2 Handling subnormal inputs  273
      8.5.3 Handling decimal cohorts  274
      8.5.4 Overview of a binary FMA implementation  274
    8.6 Floating-Point Division  277
      8.6.1 Overview and special cases  277
      8.6.2 Computing the significand quotient  278
      8.6.3 Managing subnormal numbers  279
      8.6.4 The inexact exception  280
      8.6.5 Decimal specifics  280
    8.7 Floating-Point Square Root  280
      8.7.1 Overview and special cases  280
      8.7.2 Computing the significand square root  281
      8.7.3 Managing subnormal numbers  282
      8.7.4 The inexact exception  282
      8.7.5 Decimal specifics  282
  9 Hardware Implementation of Floating-Point Arithmetic  283
    9.1 Introduction and Context  283
      9.1.1 Processor internal formats  283
      9.1.2 Hardware handling of subnormal numbers  284
      9.1.3 Full-custom VLSI versus reconfigurable circuits  285
      9.1.4 Hardware decimal arithmetic  286
      9.1.5 Pipelining  287
    9.2 The Primitives and Their Cost  288
      9.2.1 Integer adders  288
      9.2.2 Digit-by-integer multiplication in hardware  294
      9.2.3 Using nonstandard representations of numbers  294
      9.2.4 Binary integer multiplication  295
      9.2.5 Decimal integer multiplication  297
      9.2.6 Shifters  298
      9.2.7 Leading-zero counters  298
      9.2.8 Tables and table-based methods for fixed-point function approximation  300
    9.3 Binary Floating-Point Addition  302
      9.3.1 Overview  302
      9.3.2 A first dual-path architecture  303
      9.3.3 Leading-zero anticipation  305
      9.3.4 Probing further on floating-point adders  309
    9.4 Binary Floating-Point Multiplication  310
      9.4.1 Basic architecture  310
      9.4.2 FPGA implementation  310
      9.4.3 VLSI implementation optimized for delay  312
      9.4.4 Managing subnormals  315
    9.5 Binary Fused Multiply-Add  316
      9.5.1 Classic architecture  317
      9.5.2 To probe further  319
    9.6 Division  319
      9.6.1 Digit-recurrence division  320
      9.6.2 Decimal division  323
    9.7 Conclusion: Beyond the FPU  323
      9.7.1 Optimization in context of standard operators  324
      9.7.2 Operation with a constant operand  325
      9.7.3 Block floating point  327
      9.7.4 Specific architectures for accumulation  327
      9.7.5 Coarser-grain operators  331
    9.8 Probing Further  334
  10 Software Implementation of Floating-Point Arithmetic  335
    10.1 Implementation Context  336
      10.1.1 Standard encoding of binary floating-point data  336
      10.1.2 Available integer operators  337
      10.1.3 First examples  340
      10.1.4 Design choices and optimizations  342
    10.2 Binary Floating-Point Addition  343
      10.2.1 Handling special values  344
      10.2.2 Computing the sign of the result  346
      10.2.3 Swapping the operands and computing the alignment shift  347
      10.2.4 Getting the correctly rounded result  349
    10.3 Binary Floating-Point Multiplication  355
      10.3.1 Handling special values  355
      10.3.2 Sign and exponent computation  357
      10.3.3 Overflow detection  359
      10.3.4 Getting the correctly rounded result  360
    10.4 Binary Floating-Point Division  363
      10.4.1 Handling special values  364
      10.4.2 Sign and exponent computation  365
      10.4.3 Overflow detection  368
      10.4.4 Getting the correctly rounded result  369
    10.5 Binary Floating-Point Square Root  375
      10.5.1 Handling special values  376
      10.5.2 Exponent computation  378
      10.5.3 Getting the correctly rounded result  379

IV Elementary Functions  387
  11 Evaluating Floating-Point Elementary Functions  388
    11.1 Basic Range Reduction Algorithms  392
      11.1.1 Cody and Waite's reduction algorithm  392
      11.1.2 Payne and Hanek's algorithm  394
    11.2 Bounding the Relative Error of Range Reduction  395
    11.3 More Sophisticated Range Reduction Algorithms  397
      11.3.1 An example of range reduction for the exponential function  399
      11.3.2 An example of range reduction for the logarithm  400
    11.4 Polynomial or Rational Approximations  401
      11.4.1 L2 case  402
      11.4.2 L∞, or minimax case  403
      11.4.3 "Truncated" approximations  405
    11.5 Evaluating Polynomials  406
    11.6 Correct Rounding of Elementary Functions to binary64  407
      11.6.1 The Table Maker's Dilemma and Ziv's onion peeling strategy  407
      11.6.2 When the TMD is solved  408
      11.6.3 Rounding test  409
      11.6.4 Accurate second step  413
      11.6.5 Error analysis and the accuracy/performance tradeoff  414
    11.7 Computing Error Bounds  415
      11.7.1 The point with efficient code  415
      11.7.2 Example: a "double-double" polynomial evaluation  416
  12 Solving the Table Maker's Dilemma  418
    12.1 Introduction  418
      12.1.1 The Table Maker's Dilemma  419
      12.1.2 Brief history of the TMD  423
      12.1.3 Organization of the chapter  424
    12.2 Preliminary Remarks on the Table Maker's Dilemma  425
      12.2.1 Statistical arguments: what can be expected in practice  425
      12.2.2 In some domains, there is no need to find worst cases  429
      12.2.3 Deducing the worst cases from other functions or domains  432
    12.3 The Table Maker's Dilemma for Algebraic Functions  433
      12.3.1 Algebraic and transcendental numbers and functions  433
      12.3.2 The elementary case of quotients  435
      12.3.3 Around Liouville's theorem  437
      12.3.4 Generating bad rounding cases for the square root using Hensel 2-adic lifting  438
    12.4 Solving the Table Maker's Dilemma for Arbitrary Functions  442
      12.4.1 Lindemann's theorem: application to some transcendental functions  442
      12.4.2 A theorem of Nesterenko and Waldschmidt  443
      12.4.3 A first method: tabulated differences  445
      12.4.4 From the TMD to the distance between a grid and a segment  447
      12.4.5 Linear approximation: Lefèvre's algorithm  449
      12.4.6 The SLZ algorithm  456
      12.4.7 Periodic functions on large arguments  461
    12.5 Some Results  462
      12.5.1 Worst cases for the exponential, logarithmic, trigonometric, and hyperbolic functions  462
      12.5.2 A special case: integer powers  471
    12.6 Current Limits and Perspectives  471

V Extensions  473
  13 Formalisms for Certifying Floating-Point Algorithms  474
    13.1 Formalizing Floating-Point Arithmetic  474
      13.1.1 Defining floating-point numbers  475
      13.1.2 Simplifying the definition  477
      13.1.3 Defining rounding operators  478
      13.1.4 Extending the set of numbers  481
    13.2 Formalisms for Certifying Algorithms by Hand  482
      13.2.1 Hardware units  482
      13.2.2 Low-level algorithms  483
      13.2.3 Advanced algorithms  484
    13.3 Automating Proofs  485
      13.3.1 Computing on bounds  486
      13.3.2 Counting digits  488
      13.3.3 Manipulating expressions  490
      13.3.4 Handling the relative error  494
    13.4 Using Gappa  495
      13.4.1 Toy implementation of sine  495
      13.4.2 Integer division on Itanium  499
  14 Extending the Precision  503
    14.1 Double-Words, Triple-Words…  504
      14.1.1 Double-word arithmetic  505
      14.1.2 Static triple-word arithmetic  508
      14.1.3 Quad-word arithmetic  510
    14.2 Floating-Point Expansions  513
    14.3 Floating-Point Numbers with Batched Additional Exponent  519
    14.4 Large Precision Relying on Processor Integers  520
      14.4.1 Using arbitrary-precision integer arithmetic for arbitrary-precision floating-point arithmetic  522
      14.4.2 A brief introduction to arbitrary-precision integer arithmetic  523

VI Perspectives and Appendix  527
  15 Conclusion and Perspectives  528
  16 Appendix: Number Theory Tools for Floating-Point Arithmetic  529
    16.1 Continued Fractions  529
    16.2 The LLL Algorithm  532

Bibliography  537
Index  574
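As a taste of the techniques catalogued above, the Fast2Sum algorithm (Section 4.3.1) is one of the classical error-free transformations the handbook covers. The sketch below is ours, not taken from the book: assuming binary floating-point with round-to-nearest and |a| ≥ |b|, three operations recover the rounding error of an addition exactly.

```python
def fast2sum(a, b):
    """Fast2Sum (Dekker): assumes |a| >= |b| and round-to-nearest.

    Returns (s, t) where s = fl(a + b) is the rounded sum and t is
    the rounding error, so that a + b = s + t exactly.
    """
    s = a + b      # rounded sum
    z = s - a      # exact under the assumptions: the part of b absorbed into s
    t = b - z      # exact: the part of b that was rounded away
    return s, t

# The error term recovers bits that the rounded sum cannot represent:
s, t = fast2sum(1.0, 2.0**-60)
# s == 1.0 and t == 2.0**-60, so s + t equals 1 + 2^-60 exactly
```

Compensated summation (Section 6.3.2) chains such error terms to sum many values far more accurately than naive accumulation.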




