using FORTRAN 95 homeranking.info These worksheets aim to provide an introduction to programming. The language chosen for this is FORTRAN. You must attribute the work to “Introduction to Programming using Fortran. 95/ /” (but not in any way that suggests that the author endorses you or your. Fortran 95 contains only minor changes to Fortran 90; however, a between Fortran 90/95 and Fortran will not be as large as that between FORTRAN.
|Language:||English, Spanish, Portuguese|
|Genre:||Children & Youth|
|ePub File Size:||21.53 MB|
|PDF File Size:||13.29 MB|
|Distribution:||Free* [*Regsitration Required]|
Lahey/Fujitsu Fortran 95 (LF95) is a complete implementation of the Fortran 95 standard This manual is intended as a reference to the Fortran 95 language for . University of Cambridge. Department of Physics. Computational Physics. Self- study guide 2. Programming in Fortran Dr. Rachael Padman. Michaelmas. trying to write a compiler or operating system in Fortran is probably mad or about to The UNIX f95 command compiles that file, assuming it is valid Fortran.
In the statement print " A ", "Enter the lower and upper bounds: The selector61, selector62, selector63, This section descri es some of the asic elements of 0ortran. Intel Fortran New Project dialog. VBA is an abbreviated version of the Visual Basic language used by popular data-management programs, including Excel, to write macros. Use the fact that Y is the number of Bernoulli trials required to obtain r successes, with p as the per-trial success rate.
Navigation index next Fortran90 1. Summary of the language: What are good books to learn Fortran from? What are other good online Fortran resources? How to interface Fortran with C and Python? What is the most natural way to handle initial and final points of intervals? What is the most natural starting index for numbering? Why does Fortran default array indexing start at 1? What is the motivation behind Fortran numbering convention?
Does Fortran support closures? Does Fortran support nested functions? However, others will disregard the extra digits and store it in single precision as approximately 1. Putting the value into a double-precision variable, as in double precision:: The best way to guarantee greater accuracy is to specify the literal in exponential notation using D rather than E.
That is, the constant could be written as 1. Exponential notation using an E, as in 1. If you still have any doubts about what your compiler is doing, the intrinsic function kind may be used to query the system to determine 2.
To see how your compiler handles the previous example, run this test program: For example, the generic absolute value function abs may be applied to integer or real variables. If x is real, then abs x returns a real number of the same kind; if x is an integer, then abs x returns an integer of the same kind. Other commonly used generic functions include sqrt, exp, log, log10, sign,] sin, cos, and tan.
All of these functions are elemental and can thus be applied to arrays. When using these generic functions, it is helpful to check the language reference material supplied with the compiler to see what types of arguments are expected and what type of value will be returned; not doing so could produce unexpected results.
For example, the generic square root function sqrt expects to operate on real numbers of various kinds but not on integers. Depending on the options of your compiler, sqrt may accept an integer argument and perform a type conversion to real or it may not. Introduction to Modern Fortran That is, the expression sqrt 4 might possibly evaluate to 2. To help ensure consistency of results across compilers, it would be wise to explicitly convert the argument with real in this case.
Fortran allows you to write your own generic procedures. This will be discussed in the next chapter, when we take up the subject of modules. This statement, along with its diabolical cousin pause which was declared obsolescent in Fortran 90 and deleted from Fortran 95 , appear in old-fashioned Fortran programs to halt execution in the event of an error. Streamline the procedure by making use of the fact that X T WX is symmetric. Make the argument for w optional, so that if no weights are provided, the procedure computes X TX by default.
Explain why the expression 2. D0 may evaluate to. D0 may not. If you are not sure, write a test program to print out the value and kind of each subexpression. The X matrix is overwritten with Q. Write a procedure that accepts an assumed-shape two-way real array of nonnegative observed counts and computes the two-way array of estimated expected counts, X 2 , G2 , and the degrees of freedom.
When calculating G2 , handle observed counts of zero by treating 0 log 0 as zero. A hallmark of object-oriented programming is encapsulation: Fortran 95 is not a true object-oriented language.
The full object-oriented paradigm has been promised for Fortran Nevertheless, recent work by Akin and others has demonstrated that it is indeed possible to write structured Fortran 95 code that mimics many of the key qualities of object-orientedness.
We then describe three new features of Fortran—modules, derived types, and pointers—and discuss their role in the development of self-contained software components. The chapter concludes with a detailed example of an object class, a generic error handler, which will be used to store and retrieve error and warning messages in all of our subsequent program examples. A Pseudo Object-Oriented Style 3.
More precisely, the archetype or design for the package is called a class, and an object is a particular instance or realization of the class created by a program.
For example, consider the following variable declaration statement: Many objects of interest to a statistical programmer will be considerably more complicated than a single integer. An object designed to hold the input data for this analysis may contain two rank-one real arrays—one for holding the response variable and one for holding the weights—and a rank-two array for the covariates.
In Fortran, even the simplest objects have a surprising number of properties. For example, in well-written code, any expression involving an element x i will never be evaluated unless x has been allocated and i lies between lbound x and ubound x. Recall, for example, the rank-one real array x used in the uniform generator program of Section 2.
Consider the size of a rank-one real allocatable array x. We get the size by evaluating size x. Then, for each property, we must decide whether it will be a read-write property, whose value can be put or gotten by an external entity; a write-only property, whose value can be put but will never have to be gotten; or a read-only property, whose value can be gotten but not put.
For example, putting one read-write property could automatically change the values of many read-only properties. Restricting certain properties to be read-only is an excellent strategy for preventing the occurrence of unwanted events.
Put is one special kind of method that sets a property, and get is another special kind of method that retrieves a property. Other methods may be more complicated, performing nontrivial computations and altering the values 58 3.
A Pseudo Object-Oriented Style of one or more properties simultaneously. Consider again a rank-one real array x designed to hold random variates. The data values stored within x could then be made read-only, so that a user could never set them directly but only retrieve them after the random generation method has been called.
The constructor determines the initial or default state of the object. Taken together, these methods are called the interface. By nature, statisticians who write programs or procedures may have a tendency to focus primarily on computational details and numerical methods, paying relatively little attention to the interface. In many cases, however, the quality of the interface is ultimately the most crucial factor in determining how useful the procedure will be.
That is, given a table of nonnegative observed 3. For example: A primitive kind of polymorphism has been implemented in Fortran. In Section 2. This feature, called overloading, can be implemented within a Fortran module using an interface block; this will be explained in the next section. Inheritance is an asymmetric relationship between classes of objects in which one class shares all the properties and behavior of another class.
Languages that support inheritance are convenient because they allow a programmer to create multiple classes of similar objects with minimal duplication of code. Fortran 95 was never designed to support inheritance. In this book, we will not attempt to do either. We will, however, show the statistical programmer how to use modern Fortran to develop object classes with their own methods and interfaces, which may then become the building blocks of high-quality, reliable applications.
Investigate the history of object-oriented programming. Which major programming languages implement all aspects of the object-oriented paradigm? Which languages are only partially object-oriented? What is a method? Should every method alter one or more properties of an object?
What are the practical reasons for making a property of an object read-only? Would it ever be sensible to have a property that is writeonly? Choose a simple but nontrivial statistical procedure that is familiar to you.
For example, if you have some background in survival analysis, the procedure could be computing the Kaplan-Meier estimate of a survivor function from a sample of failure times, some of which are right-censored. Conceptualize an object class for this procedure. Describe what actions will be taken by the constructor whenever an object of this class is created. List all the relevant properties of the object class, and decide for each property whether it will be readwrite or read-only.
A typical Fortran 77 application contained a main program that called various subroutines and functions. With the advent of Fortran 90, a new organizational unit was introduced, called the module. The essential syntax of a module is shown below. A mod- 62 3. A Pseudo Object-Oriented Style ule whose contents are all private can be of no use to other program units, so at least some features must be made public. On the other hand, automatically declaring all contents public can be undesirable because it makes the module less self-contained and increases the possibility that the module will interact with other program units in undesirable ways e.
To use the public aspects of a module in another program unit, you need to include a use statement near the beginning of that unit. If a module is used by another program unit, that module must be compiled before compiling the other unit. Consider the following modules designed to hold arrays used in weighted least-squares regression. For example, consider this subroutine for calculating the symmetric matrix X T W X: The source code looks clean and compact because no arguments are being passed.
However, this strategy violates a number of principles of good programming practice, and we do not recommend it. One reason is that, when data exist as global variables within a module, it becomes tedious to create or use multiple instances of them. If we decided to make changes to the workspaces e. In our experience, we have found it undesirable for Fortran modules to contain any variables at all, whether public or private.
Module parameters i. For these reasons, we will refrain from using variables in any of our modules from this point onward, and we urge our readers to do the same. Derived types—the subject of the next section— are custom-designed data structures of arbitrary complexity that can be instantiated within programs and passed as arguments to functions and subroutines. Functions and subroutines that operate on the derived type become the gets, puts, and other methods that create an interface between the object and the outside world.
Judicious use of public and private enables the objects to be created and manipulated by external programs while keeping the inner workings of the module hidden from them.
Gaining familiarity with object-oriented programming 3. Our approach of using Fortran modules to create object classes will be illustrated profusely throughout the rest of this book. A generic procedure is actually a group of functions or subroutines that can be invoked by a common name. This feature, which is also called overloading, is illustrated by the following example.
Then we write a simple interface block above the contains statement, like this: By writing additional subroutines and listing them in the interface block, we can easily expand the capabilities of the generic procedure to handle double-precision reals, logical values, and so on.
An interface block can group either a set of subroutines or a set of functions, but we cannot have both subroutines and functions in the same block. A subroutine or function is considered to be external if its source code is located outside of the main program and is not a part of any other module or procedure.
An external procedure does not have an explicit interface, so the compiler may not require the actual and dummy arguments to agree in type, kind, or rank. Show that it is possible but not recommended to write an external function whose dummy argument is a rank-one array but whose actual argument has rank two. Why do you think that we avoid the use of external procedures, placing them instead within modules?
Give as many reasons as you can.
Why not? Think about the build process. Create a generic module procedure for computing variances. Use double-precision arithmetic throughout. Arrays, although powerful, are limited in important respects. It is not possible to create a mixed-type matrix with one column containing reals and another containing integers. Nor is it possible to add properties to an array beyond the basic ones already provided by Fortran kind, rank, shape, etc. For example, there is no way to attach row or column names to a matrix.
The concept is well-illustrated by a simple example. Suppose that you want to construct a database for holding vital information for people whom you know—names, addresses, telephone numbers, birthdates, and so on.
The components of a derived type may be scalars, arrays, pointers, and even derived types. For example, we may create a derived type to hold a birthdate, type:: One may create arrays, even allocatable arrays, whose elements are instances of a derived type: A function with a derived-type argument is shown below. Therefore, as discussed in Section 2. In fact, it is the key idea of pseudo object-oriented programming in Fortran. We will use this technique extensively to develop self-contained code modules that can be shared among many applications yet are easy to maintain and extend without breaking the applications that use them.
At this point, however, we have not yet discussed how to create a derived type whose components may vary in size. This feature has been promised for Fortran , but it is not part of the 95 standard.
Every derived type created by Fortran 95 must have a physical size that can be discerned at the time of compilation, yet we clearly need components whose sizes may vary at run time. The solution to this problem is found in pointers; this will be discussed in Section 3. The problem is that we have not written a constructor.
A constructor, as described in Section 3. For example, here is a procedure that initializes an instance to a unit circle centered at the origin: Unlike the initialization of local variables Section 2. If the dummy argument is intent out , the components will be given the initial values each time the procedure is called. The procedure might begin like this. Style tip When creating a derived type, apply default initialization to each of the components. The location of any point in two-dimensional space R2 can be represented by a pair of real numbers x, y.
Write a procedure that rotates a point for any given angle and pivot. Overload the procedure to rotate circles and triangles as well. Euclid c. Choose any two sides of the triangle and construct their perpendicular bisectors. The point at which these bisectors intersect is the center of the circle. The bisector of the third side passes through this point as well.
The radius is simply the distance from this center to any of the three vertices.
This is unfortunate. In Fortran, however, use of pointers is much more limited. The architects of Fortran 90 and Fortran 95 enforced restrictions on pointers to enhance the ability of compilers to optimize code for faster execution. These restrictions also bring a degree of safety to the programmer because many pointer-related mistakes are detected during compilation.
In Fortran, however, a pointer is best thought of as an alias, a portable name, that can be attached to any object of a given type. Consider the following snippet of code: Rather, c is a name that can be assigned to an already existing rank-two integer array.
If we had declared integer, allocatable:: As a pointer, however, statements involving c access the same memory locations as statements involving x; c and x refer to a single array. A pointer may point to any variable with the target attribute, provided that the target is of the same type, kind, and rank. However, the name b may be assigned to a section of x, provided that the section has rank one: A pointer can be made to point to nothing, in which case it is said to be disassociated or null; this is accomplished by the nullify command: To prevent such mistakes, we may test the status of a pointer using the intrinsic inquiry function associated.
The expression associated a evaluates to. A pointer declared at the beginning of a program, as in program test real:: For this reason, programmers are strongly encouraged to nullify all pointers at the beginning of a program. In Fortran 90, this had to be done through nullify statements.
Repeated use of nullify at the beginning of a program or procedure is simple enough, but it can also be somewhat tedious. To simplify matters, a feature was added in Fortran 95 to declare and nullify new pointer variables at the same time. The new feature is the null constructor, which can be used like this: Suppose we declare a pointer to a real value, real, pointer:: First, if x happens to be associated with a target, it is disassociated from that target.
Second, the processor sets aside a section of the heap currently unused memory large enough to hold a single real value. Third, x is pointed to that new section of memory, so that x can now be used to store a datum.
In a similar fashion, we can also allocate array pointers.
If x is declared as integer, pointer:: Suppose that we apply allocate to a pointer that is presently assigned to an allocatable array: This is a rather bad practice and should be discouraged for reasons that we now describe. When deallocating pointers, we need to be careful because deallocate should only be applied to a pointer 78 3.
Nor should we attempt to deallocate a pointer that is currently pointing to an allocatable array or an array section. But the second kind of violation— trying to deallocate a pointer whose target goes by another name—is more insidious. To prevent those mistakes, it is an excellent practice to draw a sharp distinction in any program between two kinds of pointers: Consider this: Data can no longer be stored in or retrieved from them, nor can they be deallocated and returned to the heap, because they no longer have names.
If pointers are repeatedly allocated within an iterative procedure, the program may eventually run out of memory and crash. Leaks can be stopped by judiciously checking the status of pointers, putting a statement such as if associated x deallocate x before each allocation. Explain what action is taken by each line of code below, and identify the state of x, y, p, and q at each step.
Suppose that a pointer appears as a local variable within a procedure. Does this depend on whether the null constructor was used? If you are not sure, write a test program. Fortran allows you to write a procedure in which a dummy argument is not a pointer but the corresponding actual argument is. To see how this can be useful, apply a simple matrix operation e. Fortran does not allow you to create an array whose elements are pointers.
Demonstrate with an example. However, we have not yet presented any compelling reasons why pointers are necessary or even helpful. Not at all. One very important reason for using pointers is that they allow us to create derived types whose components vary in size.
In Section 3. A Pseudo Object-Oriented Style hold the data needed for weighted least-squares regression. The only way to do this in Fortran 95 is to use pointers within a derived type: Once we have determined the number of cases n and the number of covariates p for the current dataset, we may allocate the components of data, like this: Rather, the user should be allowed only to retrieve values of these components after the regression procedure 3. To make them read-only, two things must be done.
For getting the estimated coefficients from WLS. Returns 0 if successful, 1 if error. Later we ought to add something here to report a! A Pseudo Object-Oriented Style In this example, it seems logical to give the dummy argument beta the attribute intent out. However, Fortran does not allow us to declare intent for pointer dummy arguments; doing so would create confusion because it would be unclear whether this attribute refers to the pointer or to its target.
Recursive data structures tend to be an unpopular topic among students in introductory programming courses. Nevertheless, these structures are quite useful.
To see why this can be useful, suppose that we want to create a data structure to store text message lines that can grow during program execution to hold as many additional strings of text as necessary.
A much better strategy is to implement the message handler as a linked list. This type has two components: To build a longer linked 3. Two linked instances of a recursive derived type. An object of this new type can then be passed as an argument to procedures that will insert additional nodes, print the contents of the list, and so on.
Here is a procedure for adding a new message line to the list: A linked list for storing text message lines. The next procedure prints all of the message lines to the screen in the order in which they were stored. Merely nullifying the head and tail pointers is unacceptable because that will cause a memory leak; we need to explicitly deallocate each node.
Notice, however, that once these procedures are written, they are extremely easy to use: This illustrates perhaps the greatest advantage of the objectoriented approach to programming.
Details about the inner workings of the objects are kept hidden from the outside world and may be enhanced or changed in the future without requiring changes to programs and procedures that use them. Can you do this without pointers?
Using pointers, modify this procedure so that it optionally returns the number of iterations performed and the estimated value of y at each iteration. We can now begin to implement this object class in Fortran using modules, derived types, and pointers.
Provide default initialization for all components. Place the derived type within a module. Declare the derived type to be public but keep all of its components private. Write public procedures for getting the values of the read-only properties expected, df, Pearson, and deviance. Write a simple test program that calls each of these module procedures.
Recursive data structures naturally lend themselves to recursive procedures. Create a derived type that can serve as a node in a linked list. Write recursive subroutines for adding another node to the end of the list and for deleting the entire list. Implement a personal telephone directory as a linked list. Write procedures for adding a new node, for printing the entire list, and for deleting the list.
Binary tree containing an alphabetically ordered list of names. Modify the linked list from the previous exercise so that entries are stored in alphabetical order with respect to name. That is, instead of inserting new nodes at the end of the list, insert them at an appropriate place to maintain the ordering.
For simplicity, determine alphabetical order by comparing character strings with the intrinsic functions llt, lle, lgt, or lge. A binary tree is a recursive data structure in which each node contains pointers to two additional nodes. The left-hand pointer is directed toward a node that is in some sense less than the current node, and the right-hand pointer is directed at a node that is greater.
A simple binary tree containing an alphabetically ordered list of names is shown in Figure 3. Write a recursive subroutine that adds a new name to a tree by allocating a node at the appropriate place.
This subroutine should have two dummy arguments: A Pseudo Object-Oriented Style d. Write a recursive procedure for destroying i. Write a test program that adds the names shown in Figure 3. Using a binary tree, write a procedure that tabulates numeric data. That is, given a rank-one array of real numbers x1 , x2 ,. A Generic Error Handler 3. Just a few reasons are: Sometimes the problem arises because someone attempts to use the program incorrectly or in a manner that the author does not anticipate.
Recognizing that run-time errors are inevitable, one should develop a coherent strategy for handling them and apply it consistently from the start. The subject of anticipating exceptions is discussed in the next chapter. See section 4. We also want to report meaningful messages to users at run time so that they may understand what happened and take 3. A Generic Error Handler 89 corrective measures.
Finally, we want the error-handling facility to be simple to use, to avoid tedious repetition, and to keep the code clean and compact. As mentioned in Section 2. Consider a procedure for matrix inversion.
If the procedure encounters a singular matrix, halting the program may be reasonable in some situations, but in others we may want to continue execution, switching to a method capable of computing a generalized e. In general, when an error occurs within a procedure, the ultimate decision about what action to take should not be made within the procedure itself but should be deferred to the program unit that calls it. Our strategy has two crucial features.
First, whenever an error occurs within a procedure, control is returned to the program unit that calls it. Second, we create a class of objects for storing a sequence of text error messages.
Whenever an error occurs, we store an informative message that includes a precise description of the error to help the program user to understand what happened. In addition, we also store the name of the program unit in which the error occurred, along with the names of any higher-level units that called it, so that a traceback of the calling sequence is available to the programmer.
As in the previous section, we will implement the messaging system as a linked list. The type itself is public so that it can be used by the outside world, but its contents are kept private.
The module without its procedures is shown below. A Pseudo Object-Oriented Style error handler. Generic error message handler for both console and non-console!
The routines in this module do not halt program! Written by J. Parameters private to this module integer, parameter:: Public type for holding a linked list of messages sequence private! Using sequence causes the data within a type to be stored in memory in the same sequence as the components are declared. Using sequence will be necessary when creating COM servers as described in the later chapters of this book. These public procedures, and any private procedures used by them, appear in the module below the contains statement.
The next function returns. Next, we have a routine for appending a single line to the list. This subroutine has two required arguments: It also has a number of optional arguments through 3. One may also supply the name of the subroutine or function where the error occurred. If we adhere to the practice of always returning control to the calling procedure if an error occurs and always reporting the name of the program unit to the error handler, then a complete traceback of the calling sequence will be stored for debugging purposes.
This subroutine, whose major parts are shown below, also provides a good example of the case construct described in Section 2.
Stores a message in the error handler! In the string, message lines are separated by characters that signal the on-screen printing mechanism to advance to 3. A Generic Error Handler 95 new lines. These nonprinting characters can be inserted through the Fortran intrinsic function achar. Retrieves all stored messages as a single character! ASCII carriage control characters.
As discussed in Section 3. The instance can then be passed by argument to any procedure that may generate an error. Below is a module containing a procedure that overwrites the lower triangle of a matrix with its Cholesky factor based on an algorithm given by Thisted The dummy argument for the input matrix has assumed shape.
The procedure will fail if the actual argument is not square or not positive. Overwrites lower triangle of a symmetric, pos. The upper triangle is untouched. Returns 0 if successful, 1 if failed. Style tip Character-string parameters containing the module and procedure names provide a convenient way to pass these names to the error handler.
D0 98 3. A Pseudo Object-Oriented Style integer:: The screen output from this test program looks like this: A module should contain no variables, whether public or private. Local variables within module procedures are acceptable, but there should be no variables declared in a module before the contains statement.
Module parameters are acceptable, both public and private. If a module contains a public derived type whose contents are private, then communication between an instance of the type and the outside should be accomplished by public functions or subroutines placed in the same module. When developing a module, do not presume that your public functions or subroutines will be used intelligently or correctly by other program units—even if the developer of the other program units is you. Rather, assume that the procedures will be misused.
Extensively check the input arguments for correctness and consistency, and provide some means for reporting inconsistencies and error messages to the outside world e.
Make your source code self-documenting by including extensive comments, especially in public functions and subroutines. Therefore, strive to make 3. Implement this method in a function or subroutine that uses the error handler module. Our error handler module can be enhanced in many ways. Many datasets can be represented in a rectangular form, with rows corresponding to observational units and columns corresponding to variables.
If all of the variables are numeric, then the dataset can be stored very simply as a rank-two real array. If both numeric and nonnumeric e. Devise a strategy for storing variables in a linked list, with one node per variable, and implement it in a Fortran module. Create a derived type for storing an empirical distribution function, with procedures for computing it from a sample and for drawing random samples from it.
We now turn our attention to the heart of a statistical application, the computational routines. Volumes of procedures written since the late s, many of them in Fortran 77, are still widely used. Fortunately, we have learned much from past mistakes. Golub and van Loan provide excellent coverage of general matrix computations, 4.
A respectable treatment of numerical accuracy and error analysis is beyond the scope of this book. We would feel irresponsible, however, if we did not mention some key ideas and provide a few simple guidelines dictated by common sense. Today, unless we are processing large volumes of data, it often makes sense simply to declare all real variables to have 64 bits. The declarations integer, parameter:: A similar intrinsic function is available for integers. The declaration integer, parameter:: The functions range and huge can also be applied to integer variables.
Alternatively, you can use the one-pass updating method described in Exercise 10 of Section 2. Another famous example of where catastrophic cancellation error may occur is in the computation of ex for x j. Or we can compute the upper triangle of the matrix and copy its contents into the lower triangle, eliminating calculations that are redundant.
These extended-range representations may be helpful in certain circumstances but should not be used as a matter of routine, as they may cause a program to be slow and nonportable.
In other cases, the result may be represented by a special code such as NaN or Inf, and the program may continue to run. This is simply not true. A simple remedy in this case is to work with the log of the gamma function. The main culprit is exponentiation. In most cases, this is acceptable. We should, however, make sure that a zero value does not cause any problems later e.
As a matter of routine, one should incorporate checks to prevent division by zero, logarithms of nonpositive numbers, square roots of negative numbers, and so on. Many novice programmers fail to put enough safeguards into their computational routines, and the resulting programs crash too often.
Carried to an extreme, however, too much vigilance may cause the programmer and the program itself to get bogged down. With enough safeguards, it is theoretically possible to thwart every conceivable arithmetic exception that might ever occur, but the code would be too cumbersome to write and would run too slowly.
Expert programmers who produce very-highperformance numerical procedures, such as those in LAPACK, strike a careful balance between preventing exceptions through coded safeguards and allowing exceptions to occur, with the subsequent actions determined by special exception-handling procedures. A useful discussion on this balance is given by Hauser For most of us, the commonsense strategy is to add enough safeguards to prevent crashes during routine use and in exceptional circumstances, yet allow them to be theoretically possible under rare and highly extreme conditions.
Implementing Computational Routines 4. Some of these tend to be numerically stable and accurate, whereas others are inherently ill-conditioned and susceptible to major error. If the source code is provided, the source can simply be copied and placed in your own program.
If the source is not available, you will have to link to the procedures either during the build process or at run time using dynamic link libraries DLLs. If you choose to use procedures from any of these libraries, make sure that you understand the implications of any copyright restrictions or licensing requirements before distributing your program to others.
On the other hand, if you choose to code a procedure yourself, take some time to learn about the properties of the algorithm that you intend to use. The well-known Numerical Recipes textbooks by Press et al. Most of these procedures were written in Fortran 77, and some of them will need revision to conform to practices suggested in this book—e. Adaptations of some Applied Statistics algorithms to modern Fortran are publicly available from the Statlib archives at Carnegie Mellon University and other Web sites.
Write a procedure for computing a sample variance with single precision using the right-hand side of the identity 4. Implement this procedure in a Fortran procedure using single-precision arithmetic. Fitting a Simple Finite Mixture 4. Suppose we have a sample y1 ,.
Maximum-likelihood ML estimates can be easily calculated by the following EM algorithm. EM increases the loglikelihood at each step and will eventually converge to something, but the solution may not be well-behaved. Sometimes EM will converge to an estimate on the boundary of the parameter space.
The parameter space also has continuous regions of indeterminacy. For these reasons, it is a good idea to examine the score functions at the EM solution to see whether they are zero and to check the Hessian to see if the loglikelihood is concave. A bare-bones module of constants is shown below. Fitting a Simple Finite Mixture constants. Define compiler-specific KIND numbers for integers,! Common integer values returned by all functions to indicate! Style tip Place important parameters in a module so that, if it becomes necessary to change them, the changes can be made quickly.
First, the procedure will automatically be given an explicit interface. As a result, during the build process, the linker will check to make sure that when a program calls the procedure, the actual and dummy arguments all agree in type and kind.
With a module procedure, the argument list may use enhanced features such as assumed-shape arrays and optional arguments to make the calling process more convenient and less prone to error. Most importantly, if the procedure is placed within a module, all of its inner workings—the parameters, data types, and procedures used by it—can be kept private, so that future changes and enhancements are less likely to propagate errors.
Here is the skeleton of the module that will hold the computational engine. This module has one public procedure; everything else is kept private. Implementing Computational Routines implicit none private! Because it is public, we should presume that it will be misused and put in safeguards to prevent common errors and provide feedback through informative messages.
EM algorithm for computing ML estimates for the mixture of! Input data containing the sample: Starting values for parameters; these will also return! Number of EM iterations performed: T if EM converged, F otherwise: Loglikelihood function and its first two derivatives at!
Error messages: Implementing Computational Routines Notice how easy it is to add safeguards and error messages using the system that we developed in Section 3. If necessary, more can be added in the future. For example, we should probably check to make sure that all the data values in y are positive. But life is hectic, and despite our good intentions we may never actually do it later.
As a rule, force yourself to add comments generously to all public procedures so that they become self-documenting. Style tip If an iterative procedure must be aborted, report to the error handler the iteration at which the problem occurred. This procedure as shown is still not entirely safe.
We may approach this task in at least three ways. First, we can write a console program that reads in a dataset, calls the EM procedure, and reports the results.
A second way is to encapsulate the procedure in a dynamic link library DLL and call it from another application e. A third way is to package it as a COM server and call it from another application. Each method has its own advantages and drawbacks. A stand-alone console program is highly portable. Writing a program for your own personal use that works on a single dataset is easy. Moreover, console applications tend to be poor environments for testing. Either way, this requires sending data back and forth between our console program and a computational package, which can be tedious.
If we want to test the procedure as quickly as possible and then move on to other tasks, a better method is to produce a DLL. Most Fortran compilers can create DLLs quite easily. However, the methods by which applications communicate with DLLs may be crude or poorly documented.
It is easy to make mistakes by passing data incorrectly, and if there is a problem, the system may not tell us where it occurred. The process of creating and using DLLs will be described in Chapter 6. The third option, creating a COM server, opens up elegant and robust channels of communication between your engine and the outside world. Ultimately, this may be the best way to package your software to make it versatile and easy for others to use.
It does, however, require a fair amount of additional coding to create and document all the properties and methods. The subject of COM servers will be taken up in detail beginning with Chapter 7.
Implementing Computational Routines This is what we see when we run the program from a command line. How can we be sure that this answer is correct? This would help us to verify not only that the program is running as intended but that our formulas for the derivatives are correct.
Most Fortran projects will be substantially larger than this one. In practice, it is unwise to write much more than this without stopping to test what we have written and verify that the results are correct. Novice programmers often make the mistake of writing long programs without stopping to check them along the way. Any program, aside from a very short one, should be written incrementally and tested repeatedly throughout the coding process. Fitting a Simple Finite Mixture need a method for calling it—usually through a crude console program or a DLL—to systematically test each part before going on to the next step.
If you are able to compute the log-gamma function see Exercise 4, Section 4. The ML estimation procedure of the previous section is a fairly complete analysis that would be applied only once to a given dataset. Under ordinary circumstances, however, a typical client application is likely to call it just once.
Routines of this type are called higher-level procedures. Lowerlevel procedures, on the other hand, are those that are called from a typical application hundreds, thousands, or millions of times. In describing matrix computations, numerical analysts use more precise notions of level. The distinction between these classes is not always clear because a higher-level procedure in one application may serve as a lower-level procedure in another.
Lower-level procedures are the building blocks of a statistical program. A procedure written for one application could easily be used in another. As your programming experience grows, you will want to collect these procedures and arrange them into modules that can be shared among applications. The origin of this saying is not clear, although a possible early source is Knuth Either way, the important message is that in computationally intensive programming, the crucial lines of source code are often found at the lower levels.
Important strategies for lower-level procedures include reducing the overhead cost associated with each procedure call, taking advantage of special structure, loop reordering, and optimization. One task that should be avoided in lower-level routines is dynamic allocation of temporary workspaces.
If the procedure is called over and over, the repeated allocation and deallocation of memory may seriously degrade performance. Lower-level procedures do sometimes require workspace arrays whose size is not known in advance. The accuracy of the approximation may be improved by increasing the number of quadrature points n.
An implementation for an arbitrary f might begin like this: If this procedure were called repeatedly with the same value for n, the reallocation and recomputation of x and w would be redundant.
A better strategy would be to write a separate routine for computing x and w so that these arrays could be created in advance. Implementing Computational Routines Be aware that memory may be allocated within a procedure even if there are no allocate statements. In that case, Fortran would have automatically allocated them at the start of the procedure and deallocated them at the end Section 2. Arrays of this type, called automatic arrays, should be avoided in lower-level procedures.
There is another situation where an array may be allocated without an explicit request from the programmer: This will be explained shortly, after we discuss issues of array storage and stride. In general, we want to use methods that are tailored to take advantage of any special characteristics of A. Whether you are calling professionally written routines from a package such as LAPACK or writing the routines yourself, be sure to apply methods that take advantage of any special structure that may exist.
A better method puts j in the outer loop, k in the middle, and i in the innermost loop: The reasons have to do with the way data are stored and accessed. For example, the elements of A are stored in a columnwise fashion, with a 1,1 , a 2,1 ,. Implementing Computational Routines must be moved in one section at a time. This copying of data from one area of memory to another can be more time-consuming than the actual mathematical computations performed by the processor.
Stride refers to the increment or spacing of memory locations that are being operated upon in an iterative procedure. Many of the matrix procedures commonly used in statistics, including Gaussian elimination and Cholesky factorization, can be arranged to make them rich in saxpys and other unit-stride operations.
A guide to matrix computations that exploit the use of saxpys is given by Golub and van Loan Optimization refers to a variety of measures automatically taken to improve overall performance. Another commonly used trick is loop unrolling, in which iteration cycles of an inner loop are replaced by multiple copies of the code within the body of the loop. An old-fashioned processor would wait for an instruction to be fully loaded, execute the instruction, wait for the next one, and so on.
Newer, more sophisticated processors are designed for vector computations and pipelining. Pipelining is a technique that increases the overall volume of work performed by reducing the amount of processor idle time, much like an assembly line in a factory. In a pipeline, future instructions are loaded while current instructions are being carried out.
Pipelining tends to work best for highly repetitive tasks with unit stride, such as saxpys on long vectors. For shorter vectors, the time required to set up the pipeline may outweigh any resulting savings.
Recent years have also seen developments in parallel computing, in which computational tasks are simultaneously scheduled across multiple processors. The best methods for exploiting parallel architecture typically depend 4. Implementing Computational Routines on the details of the system, including the number of processors, the manner in which they communicate with each other and the degree to which they share memory. An extended discussion of how to write Fortran programs to exploit parallel architecture is given by Press et al.
Most readers will be more interested in writing code that performs well across a variety of systems than optimizing a program for any particular system. Therefore, we suggest that you follow a few simple principles. Be aware of operations on array elements that can be carried out in any order, and implement them with expressions involving whole arrays, array sections, and where and forall statements. Finally, take advantage of the optimization features of your compiler when your program is ready for distribution and use.
The programs were run on a 1. When the 4. In this procedure, subroutine schmubroutine a, n, b integer:: Similarly, the contents of an allocatable array are stored in contiguous memory. The target of an array pointer, however, is not necessarily contiguous, as you can see from these examples. Many Fortran compilers will set up these temporary arrays to maintain contiguity even if they are not really needed i. There are two ways to resolve this problem. Moreover, the procedure then becomes less useful to other program units because we can no longer pass non-pointer arrays to it; if a dummy argument is a pointer, the corresponding actual argument must also be a pointer.
A better way to resolve this problem is to restructure the calling program so that the actual argument is no longer a pointer but an adjustable or allocatable array. One of the most popular techniques for generating normal random variatesnormal random variates is the polar method of Box and Muller Implement the polar method in a Fortran procedure that returns a single random variate with each call. Section 4.
In the procedure for in-place Cholesky factorization presented in Section 3. Repeat the experiment, passing the same matrix as an array pointer. Then modify the procedure by making the dummy argument into an adjustable array and repeat the two calls. These procedures are not of professional quality; when very 4. An alternative algorithm described by Golub and van Loan , Algorithm 4. An implementation of that algorithm is shown below. If the procedure will be public, add safeguards to ensure that the actual arguments have the correct dimensions.
The standard way to invert SPD matrices is by the Cholesky factorization: The inverse of a lower-triangular matrix, which is also lower triangular, may be computed by a variant of forward substitution. A procedure for overwriting a lower triangle with its inverse is shown below. In the innermost loop, this procedure accesses the elements of the matrix by rows and by columns.
Without overwriting, a version of forward substitution is available based entirely on columns Golub and van Loan, Overwrites a lower-triangular matrix a with its inverse! Style tip If you do not want to overwrite the matrix supplied, do not use this function. Rather, overload it with a companion function that accepts two array arguments, one that is intent in and another that is intent out.
Having both procedures available helps to avoid unnecessary copying of data. Here is a multiplication procedure that avoids unnecessary computation by skipping elements known to be zero and by capitalizing on the symmetry of the result. The input array is accessed entirely by column. Premultiplies a lower-triangular matrix a by its upper! Implementing Computational Routines! When weights w1 ,.