E-book, English, Volume 50, 370 pages
Series: Lecture Notes in Computational Science and Engineering
Bücker / Corliss / Hovland: Automatic Differentiation: Applications, Theory, and Implementations
1st edition, 2006
ISBN: 978-3-540-28438-3
Publisher: Springer Berlin Heidelberg
Format: PDF
Copy protection: PDF watermark
This collection covers the state of the art in automatic differentiation theory and practice. Practitioners and students will learn about advances in automatic differentiation techniques and strategies for the implementation of robust and powerful tools. Computational scientists and engineers will benefit from the discussion of applications, which provide insight into effective strategies for using automatic differentiation for design optimization, sensitivity analysis, and uncertainty quantification.
Written for: Computational scientists
Keywords: automatic differentiation, optimization, sensitivity analysis.
Authors/Editors
Further information & material
1;Preface;5
2;Contents;7
3;List of Contributors;11
4;Perspectives on Automatic Differentiation: Past, Present, and Future?;18
4.1;1 The Algorithmic Approach;19
4.2;2 Transformation of Algorithms;20
4.3;3 Development of AD;21
4.4;4 Present Tasks and Future Prospects;28
4.5;5 Beyond AD;31
5;Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities ;32
5.1;1 Introduction and Summary;32
5.2;2 Motivations and Early History;34
5.3;3 Types of Differentiation Capability We Have Developed;42
6;Solutions of ODEs with Removable Singularities ;52
6.1;1 Introduction;52
6.2;2 Notation and Some Polynomial Algebra;53
6.3;3 Elementary Functions;53
6.4;4 Other Functions;59
6.5;5 Higher Order Equations;61
6.6;6 Open Questions;62
7;Automatic Propagation of Uncertainties;64
7.1;1 Introduction;64
7.2;2 Linear Models;65
7.3;3 Contrast with Interval Analysis;68
7.4;4 Nonlinear Models;69
7.5;5 Implementation with Automatic Differentiation;71
7.6;6 Validation of Uncertainty Models;73
7.7;7 Way Ahead;75
8;High-Order Representation of Poincaré Maps;76
8.1;1 Introduction;76
8.2;2 Overview of DA Tools;77
8.3;3 Description of the Method;78
8.4;4 Examples;80
9;Computation of Matrix Permanent with Automatic Differentiation;84
9.1;1 Introduction;84
9.2;2 Formulation;85
9.3;3 Methods;87
9.4;4 Algorithms;89
9.5;5 Discussions and Comments;92
9.6;6 Conclusion;93
10;Computing Sparse Jacobian Matrices Optimally ;94
10.1;1 Introduction;94
10.2;2 Optimal Matrix Compression and Restoration;96
10.3;3 Schur Complement Approach;98
10.4;4 Combined Determination;100
10.5;5 Using Recurring Sparsity Structure in Rows;101
10.6;6 Numerical Experiments;103
10.7;7 Concluding Remarks;103
11;Application of AD-based Quasi-Newton Methods to Stiff ODEs;106
11.1;1 Introduction;106
11.2;2 Quasi-Newton Approximations;108
11.3;3 Implementation Details;112
11.4;4 Numerical Results;112
11.5;5 Conclusions and Outlook;115
12;Reduction of Storage Requirement by Checkpointing for Time-Dependent Optimal Control Problems in ODEs;116
12.1;1 Introduction;116
12.2;2 Quasilinearization Techniques;118
12.3;3 Nested Reversal Schedules;121
12.4;4 Numerical Example;126
12.5;5 Conclusion and Outlook;127
13;Improving the Performance of the Vertex Elimination Algorithm for Derivative Calculation ;128
13.1;1 Introduction;128
13.2;2 Heuristics;130
13.3;3 Performance Analysis;131
13.4;4 A Statement Reordering Scheme;133
13.5;5 A Greedy List Scheduling Algorithm;135
13.6;6 Conclusions and Further Work;137
13.7;Acknowledgements;137
14;Flattening Basic Blocks;138
14.1;1 The Problem;138
14.2;2 Variable Identification;141
14.3;3 Removing Ambiguity by Splitting;142
14.4;4 Practical Solution;143
14.5;5 Splitting into Edge Subgraphs;146
14.6;6 Outlook;148
14.7;7 Conclusions;150
15;The Adjoint Data-Flow Analyses: Formalization, Properties, and Applications;152
15.1;1 Introduction;152
15.2;2 Adjoints by Automatic Differentiation;153
15.3;3 Classical Data-Flow Analyses;154
15.4;4 Adjoint Data-Flow Analyses;155
15.5;5 Application;160
15.6;6 Conclusion;162
16;Semiautomatic Differentiation for Efficient Gradient Computations;164
16.1;1 Introduction;164
16.2;2 Action on a Mesh;165
16.3;3 Some AD Alternatives;166
16.4;4 The RAD Package for Reverse AD;169
16.5;5 Test Results;170
16.6;6 Implications for Source Transformation;174
16.7;7 Concluding Remarks;174
16.8;Acknowledgment;175
17;Computing Adjoints with the NAGWare Fortran 95 Compiler;176
17.1;1 Aims of the CompAD Project;176
17.2;2 Compiler AD – A Motivating Example;177
17.3;3 Linearization of the Computational Graph;179
17.4;4 Putting AD into the Compiler;181
17.5;5 Case Study: Seeding in Forward and Reverse Mode;183
17.6;6 Summary, Conclusion, and Outlook;186
18;Extension of TAPENADE toward Fortran 95;188
18.1;1 Introduction;188
18.2;2 Nesting of Modules and Subprograms;189
18.3;3 Derived Types;190
18.4;4 Overloading;191
18.5;5 Array Features;193
18.6;6 Conclusion;195
19;A Macro Language for Derivative Definition in ADiMat;198
19.1;1 Introduction;198
19.2;2 MATLAB in the Context of an AD Tool;199
19.3;3 The Macro Language;200
19.4;4 Exploiting Structure of a Given Code;205
19.5;5 Conclusion and Future Work;205
20;Transforming Equation-Based Models in Process Engineering ;206
20.1;1 Introduction;206
20.2;2 Dynamic Optimization;207
20.3;3 The Intermediate Format CapeML;208
20.4;4 ADiCape: Automatic Differentiation of CapeML;210
20.5;5 The Overall Structure of the System;213
20.6;6 Concluding Remarks and Directions for Future Work;215
21;Simulation and Optimization of the Tevatron Accelerator;216
21.1;1 Introduction;216
21.2;2 The Tevatron Accelerator – Machine Description;217
21.3;3 The Model, Criteria and Parameters to Control;217
21.4;4 Map Methods;218
21.5;5 Different Optimization Schemes and Proposals;221
21.6;6 Transfer Map Comparison;226
21.7;7 Conclusions;226
22;Periodic Orbits of Hybrid Systems and Parameter Estimation via AD ;228
22.1;1 Hybrid Systems;230
22.2;2 Taylor Series Integration;231
22.3;3 Periodic Orbits;232
22.4;4 Parameter Estimation;234
22.5;5 Software;235
22.6;6 Applications;235
22.7;7 Conclusions;240
23;Implementation of Automatic Differentiation Tools for Multicriteria IMRT Optimization;242
23.1;1 Introduction;242
23.2;2 Methods and Materials;244
23.3;3 Results;248
23.4;4 Conclusions;251
24;Application of Targeted Automatic Differentiation to Large-Scale Dynamic Optimization ;252
24.1;1 Introduction;252
24.2;2 Directional Second Order Adjoint Method and AD;254
24.3;3 Dynamic Optimization and dSOA;259
24.4;4 Conclusions;263
25;Automatic Differentiation: A Tool for Variational Data Assimilation and Adjoint Sensitivity Analysis for Flood Modeling;266
25.1;1 Introduction;267
25.2;2 The Adjoint Method;267
25.3;3 River Hydraulics;268
25.4;4 Catchment Hydrology;274
25.5;5 Conclusion;279
26;Development of an Adjoint for a Complex Atmospheric Model, the ARPS, using TAF ;280
26.1;1 Introduction;280
26.2;2 The Advanced Regional Prediction System;281
26.3;3 Transformation of Algorithms in Fortran (TAF);282
26.4;4 Code Generation and Testing;282
26.5;5 Testing Results;286
26.6;6 Conclusion;289
27;Tangent Linear and Adjoint Versions of NASA/GMAO’s Fortran 90 Global Weather Forecast Model;292
27.1;1 Introduction;292
27.2;2 Finite-volume General Circulation Model;293
27.3;3 Applying TAF to fvGCM;293
27.4;4 Parallelisation;294
27.5;5 Linearising around an External Trajectory;297
27.6;6 Performance of Generated Code;298
27.7;7 Application Example;299
27.8;8 Conclusions;301
28;Efficient Sensitivities for the Spin-Up Phase;302
28.1;1 Introduction;302
28.2;2 Spin-up Sensitivities;304
28.3;3 Implementation;305
28.4;4 Numerical Example;307
28.5;5 Performance Analysis;308
28.6;6 Summary and Outlook;310
28.7;Acknowledgements;310
29;Streamlined Circuit Device Model Development with fREEDA and ADOL-C ;312
29.1;1 Introduction;312
29.2;2 Background;313
29.3;3 Transient Circuit Simulation Device Modelling;316
29.4;4 Selected Modelling Examples;321
29.5;5 Conclusion;324
30;Adjoint Differentiation of a Structural Dynamics Solver;326
30.1;1 Introduction;326
30.2;2 Optimisation of the Boom Structure;328
30.3;3 Differentiation of the BEAM3D Code;330
30.4;4 Performance Issues;335
30.5;5 Conclusions;336
30.6;Acknowledgements;336
31;A Bibliography of Automatic Differentiation;338
31.1;Comments;338
32;References;340
33;Index;372
Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities (p. 15)
Paul J. Werbos
National Science Foundation, Arlington, VA, USA
pwerbos@nsf.gov
Summary.
Backwards calculation of derivatives – sometimes called the reverse mode, the full adjoint method, or backpropagation – has been developed and applied in many fields. This paper reviews several strands of history, advanced capabilities and types of application – particularly those which are crucial to the development of brain-like capabilities in intelligent control and artificial intelligence.
Key words: Reverse mode, backpropagation, intelligent control, reinforcement learning, neural networks, MLP, recurrent networks, approximate dynamic programming, adjoint, implicit systems
1 Introduction and Summary
Backwards differentiation or "the reverse accumulation of derivatives" has been used in many different fields, under different names, for different purposes. This paper will review that part of the history and concepts which I experienced directly. More importantly, it will describe how reverse differentiation could have more impact across a much wider range of applications.
Backwards differentiation has been used in four main ways known to me:
1. In automatic differentiation (AD), a field well covered by the rest of this book. In AD, reverse differentiation is usually called the "reverse method" or the "adjoint method." However, the term "adjoint method" has actually been used to describe two different generations of methods. Only the newer generation, which Griewank has called "the true adjoint method," captures the full power of the method.
2. In neural networks, where it is normally called "backpropagation" [532, 541, 544]. Surveys have shown that backpropagation is used in a majority of the real-world applications of artificial neural networks (ANNs). This is the stream of work that I know best, and may even claim to have originated.
3. In hand-coded "adjoint" or "dual" subroutines developed for specific models and applications, e.g., [534, 535, 539, 540].
4. In circuit design. Because the calculations of the reverse method are all local, it is possible to insert circuits onto a chip which calculate derivatives backwards physically on the same chip which calculates the quantity(ies) being differentiated. Professor Robert Newcomb at the University of Maryland, College Park, is one of the people who has implemented such "adjoint circuits."
Some of us believe that local calculations of this kind must exist in the brain, because the computational capabilities of the brain require some use of derivatives and because mechanisms have been found in the brain which fit this idea.
These four strands of research could benefit greatly from greater collaboration. For example – the AD community may well have the deepest understanding of how to actually calculate derivatives and to build robust dual subroutines, but the neural network community has worked hard to find many ways of using backpropagation in a wide variety of applications.
The gap between the AD community and the neural network community reminds me of a split I once saw between some people making aircraft engines and people making aircraft bodies.




