© Copyright 1997 by Ziheng Yang
Ne is written and distributed for academic use, free of charge, by Ziheng Yang. The program implements the maximum likelihood methods of Takahata, Satta, and Klein (1995) and Yang (1997) for analysing sequence divergence data from multiple loci to estimate THETA = 4N*MU, where MU is the mutation (substitution) rate per nucleotide site and N is the population size of either an extant species or the common ancestor of two extant species. (The notation of Yang (1997) is used, with BLOCK CAPITALS used for Greek letters.) Takahata et al. also considered the case of three species, but this is not yet implemented in the program. Takahata et al.'s method assumes the same mutation rate for all loci, while Yang's method allows variable mutation rates among loci, either by using a gamma distribution or by using independently estimated relative mutation rates for different loci.
For one species, two individuals are sampled at random at each locus, and the data consist of the numbers of sites (ni) and mutations (ki) at locus i, with i = 1, 2, ..., p. The program estimates THETA = 4N*MU, where N is the population size of the analysed species.
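The one-species case with constant rates among loci can be sketched in a few lines. The sketch below is not taken from Ne.c; it assumes the standard result for two sequences under the infinite-sites model, in which the number of mutations at a locus with ni sites is geometric with mean THETA*ni. The function names and the search interval are hypothetical, chosen for illustration only.

```python
import math


def loglik_one_species(theta, loci):
    """Log-likelihood of theta for a list of (n_i, k_i) pairs.

    Assumes P(k | n, theta) = (1 / (1 + x)) * (x / (1 + x))**k with x = theta * n,
    and the same mutation rate at every locus. Requires theta > 0.
    """
    total = 0.0
    for n, k in loci:
        x = theta * n  # expected number of mutations at this locus
        total += k * math.log(x / (1.0 + x)) - math.log(1.0 + x)
    return total


def maximize(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - g * (b - a)
    d = a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - g * (b - a)
        d = a + g * (b - a)
    return (a + b) / 2.0


# Hypothetical data: two loci of 1000 sites with 4 and 6 mutations.
example_loci = [(1000, 4), (1000, 6)]
theta_hat = maximize(lambda t: loglik_one_species(t, example_loci), 1e-6, 0.1)
```

When all loci have the same number of sites n, the estimate under this model reduces to the mean of ki divided by n, which gives a quick sanity check on the numerical search.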
For two species, one individual is sampled at random from each species at each locus, and the data consist of the number of sites at locus i (ni) and the number of mutations (ki) separating the two species at locus i. The two parameters that can be estimated are THETA0 = 4N0*MU, where N0 is the population size of the ancestral population, and GAMMA = TAU*MU, where TAU is the separation time of the two species. If an independent estimate of GAMMA is available, say from a phylogenetic analysis, THETA0 alone can be estimated by the program (see Yang 1997). While Yang (1997) suggested this as a possible way of reducing the sampling error in the estimated THETA0, Takahata and Satta (1997) pointed out that phylogenetic estimates of GAMMA may be biased for the very reason that ancestral polymorphism is ignored in their estimation.
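A minimal sketch of the two-species likelihood with constant rates among loci follows. It is a reconstruction under stated assumptions, not code from Ne.c: the mutations accumulated since the species separated are taken to be Poisson with mean 2*GAMMA*ni, the mutations accumulated in the ancestral population are taken to be geometric with mean THETA0*ni, and ki is their sum. The function names are hypothetical.

```python
import math


def prob_k_two_species(k, n, theta0, gamma):
    """P(k mutations | n sites) as a Poisson-geometric convolution.

    Assumes a Poisson part with mean 2*gamma*n (divergence since separation)
    and a geometric part with mean theta0*n (coalescence in the ancestor).
    Requires theta0 > 0 and gamma > 0.
    """
    lam = 2.0 * gamma * n
    x = theta0 * n
    q = x / (1.0 + x)
    total = 0.0
    for j in range(k + 1):
        # j mutations since separation, k - j in the ancestral population
        poisson = math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
        geometric = (1.0 - q) * q ** (k - j)
        total += poisson * geometric
    return total


def loglik_two_species(theta0, gamma, loci):
    """Sum of log-probabilities over a list of (n_i, k_i) pairs."""
    return sum(math.log(prob_k_two_species(k, n, theta0, gamma)) for n, k in loci)
```

Fixing `gamma` at an externally supplied value and maximising over `theta0` alone corresponds to the second mode of analysis described above.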
In Takahata et al.'s data (which were used by Yang 1997), the number of mutations at each locus was estimated by the number of sites that differ between the two sequences, under the infinite-sites model. The data sets are included in this distribution. It is a good idea to run the example data sets and reproduce the estimates of Yang (1997) before attempting to analyse your own data. This file gives a brief description of the program.
The program (Ne.c) is written in ANSI C and should work with any ANSI C compatible compiler. Executables for PowerMac and Windows 95 are also supplied.
UNIX systems. The command for compiling the program will depend on your compiler. Try acc with and without the flag -lm. One of them should work.
cc -o Ne Ne.c -lm
gcc -o Ne Ne.c
Windows 95. The executable file Ne.exe is a Win32 console application.
PowerMac. The PowerMac executable Ne.PPC uses the default data file name (
The program can be obtained from
The default data file is Ne.dat. The following data are from Takahata et al. (1995, Table 2) for a two-species analysis.
The first line specifies the number of loci. Each following row lists ni and ki for one locus. If externally obtained estimates of relative rates for loci are to be used, they should be listed as the third column in the data file.
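The layout described above can be read with a short parser. The sketch below is an illustration only: it assumes one locus per line and whitespace-separated columns, which may be stricter than what Ne.c accepts, and the name `parse_ne_data` is hypothetical. ki is read as a floating-point number in case the mutation counts are not integers.

```python
def parse_ne_data(text):
    """Parse data in the layout described above into (n_i, k_i, rate_i) tuples.

    rate_i is None when the optional third column is absent.
    Assumes one locus per non-empty line after the first line.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty data file")
    num_loci = int(rows[0][0])
    loci = []
    for fields in rows[1:num_loci + 1]:
        n = int(fields[0])      # number of sites at this locus
        k = float(fields[1])    # number of mutations at this locus
        rate = float(fields[2]) if len(fields) > 2 else None
        loci.append((n, k, rate))
    if len(loci) != num_loci:
        raise ValueError("expected %d loci, found %d" % (num_loci, len(loci)))
    return loci


# Hypothetical values, not the Takahata et al. data.
example = parse_ne_data("3\n500 4\n1200 10\n800 0\n")
```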
Run Ne, and it will read the default data file Ne.dat and print the results on the screen. On UNIX or MS Windows systems, you can specify the data file name as a command line argument:
The program uses a simple interactive interface, asking questions about how you want the analysis to be done. The questions are self-explanatory. For a one-species analysis, it asks whether rates are constant among loci, approximated by a gamma distribution, or provided in the data file. For a two-species analysis, it asks whether you want to estimate GAMMA (= TAU*MU) or specify its value so that THETA0 alone is estimated.
The program then goes through a loop to perform several runs, using different initial values for the maximum likelihood iteration. This is done to guard against the possible existence of local optima. The program makes use of an intuitive scaling strategy, which may not work well in some cases. Depending on your data, you may encounter floating exception errors with poor initial values. If you are unable to obtain estimates even with good initial values, contact the author.
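The idea of repeating the iteration from several initial values can be illustrated as follows. This is a generic sketch, not the optimiser or the scaling strategy used in Ne.c: `hill_climb` is a simple shrinking-step search on the logarithm of each (positive) parameter, and `multi_start` keeps the best of several runs. Both names, and the toy objective used to exercise them, are hypothetical.

```python
import math


def hill_climb(f, x0, step=0.5, tol=1e-8):
    """Local maximisation of f over positive parameters.

    Each parameter is multiplied or divided by exp(step); the step is
    halved whenever no move improves f, until it falls below tol.
    """
    x = list(x0)
    fx = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for factor in (math.exp(step), math.exp(-step)):
                trial = x[:]
                trial[i] *= factor
                ft = f(trial)
                if ft > fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2.0
    return x, fx


def multi_start(f, starts):
    """Run hill_climb from each starting point and return the best (x, f(x))."""
    best = None
    for x0 in starts:
        x, fx = hill_climb(f, x0)
        if best is None or fx > best[1]:
            best = (x, fx)
    return best


def toy_objective(x):
    """Toy surface with a local maximum at x = 1 and a higher one at x = exp(3)."""
    u = math.log(x[0])
    return max(-u * u, 1.0 - 4.0 * (u - 3.0) ** 2)


local_only = hill_climb(toy_objective, [0.5])
best = multi_start(toy_objective, [[0.5], [10.0]])
```

In this toy case a single run started at 0.5 stops at the lower peak, while the pair of starting values recovers the higher one. In practice the objective would be a log-likelihood of the kind the program maximises.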
All results are printed out on the screen, and no output file is generated by the program.