.po 1.05i .ll 6.55i 7 October 1992 .pp Here is a current version of the maximum likelihood program for estimating quantitative genetic parameters for two populations and obtaining tests of the null hypothesis that their G matrices are the same. This version provides the capability of estimating additive, dominance, and environmental components of variance and covariance from, for example, a nested sib design. However, it can also be used to analyse data in simpler structures (e.g. full-sib or clonal design). I've tried to make the program understandable by augmenting the comments (within braces{}) throughout, but I can't claim that it is particularly "user-friendly". I hope you at least find it usable. .pp The source code for the program, written in standard Pascal, is on the diskette as pcr2.p. The program estimates genetic and environmental variances and covariances. It also estimates the grand means for the traits in the two populations, as well as additional fixed effects (e.g., block effects in the example). It should compile without trouble under Turbo, Berkeley Unix, or VMS. As it stands, the programs should compile under Unix or VMS. To compile under Turbo, remove brackets around the two "assign" statements at the beginning of the main program and the "close" statement toward the end of the main program (3rd line from the end). .pp I haven't sent compiled versions, because you will undoubtedly find it necessary to modify the constants at the beginning of the programs, in order to accommodate your data. Complaints like "array index out of bounds" indicate that the dimensions are too small. The constants are defined as follows: .np r = the number of traits to be analysed. This may be fewer than the number of traits in the dataset (see below). .np maxnparm = 6 * nvarcov. This is the maximum number of variance and covariance components to be estimated. .np f = the number of "extended families". This is the number of distinct groups of interrelated individuals. .np sibs = the maximum number of individuals in an extended family, included unmeasured individuals (i.e., often parents). .np hfsibs = the maximum number of individuals related through at least one parent; .np fsibs = the maximum number of individuals related as full sibs; (hsibs and fsibs will not differ in the case of full-sib or clonal data). .np nvc = r (r + 1)/2. .np endval integer value used to indicate the last line of data. Must appear at the beginning of that line. .np missing is used to test for missing data. The values specified here must be larger than the missing value used in the data, but less than any actual data values. Thus, in the code and example file you receive, -98 is specified in the program, and -99 is used as the missing value in the data. .np pos = r(sibs) .np nmax = the maximum total number of individuals in the dataset. .np null = the missing value for parental ids. (i.e. the parents of the founders. This must be left set to zero). .np delta = the stopping criterion for the maximization. When the difference of successive likelihoods < delta, then the program stops and prints results. .np maxround = the maximum number of iterations permitted. If maxround is reached before the difference between successive likelihoods < delta, then the program prints a warning, current estimates and stops. .np maxfix=the maximum number of fixed factors (including population). .np p=the maximum number of fixed effects to be estimated: i.e. the sum of number of levels for each fixed factor, multiplied by the number of traits. .pp The data should be in a file called 'pcdata'. The input file should contain the following information in order: .pp Line 1: a) A designation of the type of analysis you want, '1' for maximum likelihood or '2' for restricted maximum likelihood. (The former gives biased estimates with smaller variances, the latter the converse. The latter gives estimates identical to ANOVA in the balanced case.) b) An integer flag to indicate that the 'sibs' within a family are clonally related: 1, if this is the case; 0 otherwise. c) The number of characters to be analysed (may be less than the number in the dataset. d) the number of categorical fixed factors in the data (apart from the grand mean). If the only fixed factor is the population effect, the number here is 1. e) A list of integers designating which of the traits are to be entered in the analysis, referring to the traits in the order in which they appear in the data. .pp Line 2: Supply hypothetical values for all the variance and covariance components to be estimated. They should appear in this order: V\dA\u of trait 1, Cov\dA\u of traits 1 and 2, Cov\dA\u of traits 1 and 3, etc. V\dA\u of trait 2, Cov\dA\u of traits 2 and 3, and so on through the additive variances and covariances for population 1, Followed by all these components for population 2. After these, come the dominance components in the same order, and these are followed by environmental components in the same order. If there is one trait, then the program expects six values on this line. In general, there are p=3r(r+1) parameters, where r is the number of characters. The values given must be such that phenotypic variances (V\dA\u + V\dD\u + V\dE\u) are > or = 0 for each population, and phenotypic covariances must be less than or equal to the absolute value of the square root of the product of the two phenotypic variances. The values you choose serve as starting values for the iterative estimation. N.B. These values need not all appear on the same line. In the example dataset, each value occupies a different line. For the numbering of lines that follows, assume a single line is used for these values. .pp Line 3: On the third line should appear the number of components you wish to eliminate from the full model, followed by their indices, according to the ordering given for values in line 2 (e.g, in a one character case, if you want to constrain V\dD\u for population 1 to 0, specify 1 3). This permits you to explicitly constrain to zero any components of variance (and their associated covariance components) known (from previous analyses, for example) to take on estimates < 0. This program does not automatically constrain the estimates to lie within the parameter space, but such a version is in the works. .pp Line 4: On the fourth line should appear the number of additive components you wish to constrain to be equal in the two populations, followed by their indices, within the range 1 to r*(r+1)/2. .pp Line 5ff: The data from the pedigree begin on this line, one line for each individual in the pedigree, including parents of the progeny generation, whether or not they are measured. Each individual has: .np an ID number (these must be consecutive through the whole dataset, and the sequence of ID numbers must include the ID's for both parents) .np the ID# of the paternal parent .np the ID# of the maternal parent. In the case of a clonal analysis, it is necessary to imagine that each observed individual has maternal and paternal clonal 'parents', who are the same individual. Thus an individual indexed 1, might have father 107 and mother 107. .np an integer identifier for the population (1 or 2) .np an integer identifier for the level of each categorical fixed factor. The number of columns giving fixed effect levels must match the number of fixed factors given in line 1. (This is used only if there are fixed factors other than 'population'). Consecutive values must be used for the levels within a fixed factor. .np the phenotype(s) in the order in which you referred to them in line 2. Individuals in two generations must be included, whether or not the parents were measured. (It is straightforward to extend the analysis to more generations, but this version of the program will not properly accommodate such data.) Every offspring must have ID#s for two parents specified, in order to establish the pedigree. The ID#s for their parents, in turn, should be 0. Where any phenotype is unmeasured on a given individual, the phenotypic value should be -99.0. (You can change this to another value by changing the value of the constant, 'missing', at the beginning of the program). .lp After all the data, there must be a line that contains only a zero. You can check to make sure that the sorting into families is being done properly by examining the output from the procedure called "PrintFams", which gives a listing of the last 20 lines of data in the order received. In each line there will also be values for next maternal sib (NMS), and family ID number (FamID) immediately before the value of the first character. (Here, as above, "family" refers to the extended family, the collection of all individuals that are related). .pp As noted above, this version of the program gives unconstrained estimates, that is, estimates that are not confined to the parameter space, as true ML (REML) estimates should be. This means you may obtain negative estimates for variance components and estimates of covariance components that yield genetic correlations outside the range (-1,1). The estimates are reported (to a file name 'pcout') in the order corresponding to line 2 above. .pp A word of caution about the number of characters that can or should be analysed for comparison of G: the number of parameters to be estimated (maxnparm) goes up steeply with the number of traits. The number of parameters can readily surpass the total number of observations available. When that happens, it is not possible to obtain estimates for all parameters. Similarly, as that point is approached, there is progressively less information available per parameter. Thus, comparison of G for many traits in a single analysis may be infeasible for many datasets. (Obviously, how many is too many depends on the number of individuals and their genetic design). Analyses of too many traits for the data at hand will tend to have low statistical power and may have convergence difficulties. This is not a drawback of likelihood, but simply a consequence of the amount of information available in the data. If information is available on many traits, I recommend that analyses be carried out on subsets of the traits. .pp Included on the diskette are four files, including this one (progdoc.pcr). The source code of the ml analysis program is pcr2.p. There is a sample set of data (pcdata), and an output file for the analysis of those data (pcout. Additional analyses in which particular components are constrained are given in the file pc2tests. The example dataset is that for tiller dry weight used in Shaw and Billington (1991, Evolution 45:1287). .nf Ruth Shaw, Department of Botany and Plant Sciences University of California, Riverside, CA 92521 After January 1: Department of Ecology, Evolution, and Behavior University of Minnesota, St. Paul, MN 55108