This tutorial is designed to be read from beginning to end, but if you like you can jump straight to any section.
First off, you should have read the basic tutorial at least to the point where it links to this document (though it is wise to read the whole thing, as many of the concepts are the same). You should also read our discussion of why you might want to use a Bayesian analysis.
Now do the following:
Step 1: Use the converter to get a LAMARC infile from your data.
Step 2: Run LAMARC and tell it about that infile.
For the purposes of illustration, I'm going to assume that you have at least two genetic regions in your data set, coming from at least two populations.
Now we're going to use the menu. To change from a likelihood run to a Bayesian run, first select the 'Search Strategy menu' (S), and toggle the 'Perform Bayesian or Likelihood analysis' option (P). If you're experimenting and just want to see a Bayesian run, that's all you'd need to do--you could hit '.' now and run LAMARC. But let's explore some of the other options available to us in a Bayesian run.
First, you'll notice that you now have a new menu option available to you entitled, 'Bayesian Priors Menu' (B). Select that, and you'll get a sub-menu listing the different active forces, and a summary of what the priors look like for each of them. By default, the priors for all forces but growth are logarithmic, so you'll see something like:
Bayesian Priors Menu

  T  Bayesian priors for Theta           (all logarithmic)
  M  Bayesian priors for Migration Rate  (all logarithmic)
  ----------
  <Return> = Go Up | . = Run | Q = Quit
Hit 'T' and it'll take you to a list of all the priors for the thetas, including a default. You can then edit the default with 'D', or edit one particular prior by selecting that prior's number. This will take you to a menu like the following:
Bayesian Priors Menu for Theta for Population 1

  Priors may be either linear or logarithmic.  The lower bound for
  logarithmic priors must be above zero, in addition to any other
  constraints an evolutionary force might have.

  D  Use the default prior for this force   Yes
  S  Shape of the prior                     log
  U  Upper bound of the prior               10
  L  Lower bound of the prior               1e-05
  ----------
  <Return> = Go Up | . = Run | Q = Quit
As you can see, you have two ways to change the prior--you can change the boundaries, and you can change its shape (or density). The 'S' option will toggle the shape between logarithmic and linear, and you can set the upper and lower bounds with the 'U' and 'L' options. This is your opportunity to input what you already know about the parameters you wish to estimate. It's obviously important to get the units right, so be sure to read the forces section of the manual, and figure out any differences between how you typically think about your parameters and how LAMARC uses those parameters. (One typical 'gotcha' is that LAMARC always uses per-site estimates, but some researchers use per-locus estimates.)
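To make the 'shape' option more concrete, here is a small Python sketch (not part of LAMARC, and using made-up draws) of what a logarithmic prior means: the draws are uniform on a log scale, so every order of magnitude between the bounds is equally likely, which is also why the lower bound must be above zero.

```python
import math
import random

def sample_log_uniform(lower, upper):
    """Draw one value from a logarithmic (log-uniform) prior.

    Uniform on a log scale: each order of magnitude between
    lower and upper is equally likely.  lower must be > 0.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    return math.exp(log_lo + random.random() * (log_hi - log_lo))

# With the default Theta bounds of 1e-05 to 10 shown in the menu above,
# roughly half the draws fall below the geometric midpoint
# sqrt(1e-05 * 10) = 0.01, even though 0.01 is tiny on a linear scale.
draws = [sample_log_uniform(1e-05, 10) for _ in range(10000)]
midpoint = math.sqrt(1e-05 * 10)
frac_below = sum(d < midpoint for d in draws) / len(draws)
```

A linear prior over the same bounds would instead put almost all of its weight above 0.01, so the choice of shape matters a great deal when the bounds span several orders of magnitude.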
Not quite ready to run yet--the standard search strategy is not really appropriate for a Bayesian run, so we're going to change it. Go to the top menu by hitting 'Return' a few times, then select 'S' ('Search Strategy Menu'), then 'S' again ('Sampling strategy (chains and replicates)'). Once there, change the number of initial chains to 1 and the number of final chains to 1 (options 1 and 5). Finally, change the number of replicates ('R') to 3. (In a production run, you're probably better off changing the 'Final number of samples' to 30,000 instead of changing the number of replicates to 3, but we want a bit more feedback for this sample run, which replicates will give us.)
One quick thing before we run, however: from the main menu, select 'S' (Search Strategy Menu), then 'R' (the Rearrangers Menu). Here's where you can change the various rearrangers, including (now) the Bayesian rearranger. This controls how much relative time is spent sampling new parameters (the Bayesian rearranger) vs. sampling new trees (all the other rearrangers). In a run where you are trying to estimate many parameters (say, in a system with several populations), this menu is where you could increase the time spent sampling new values for those parameters.
But for now, we'll leave it as it is, with the Bayesian rearranger set to the same relative frequency as the Topology rearranger. Hit Run ('.') and I'll walk you through the output.
This probably looks something like:
  14:33:01  Initial chain 1: [====================] 1000 steps
  14:35:47  Predicted end of chains for this region:  Fri Apr 22 08:32:28 2005
  14:35:47  Accepted 17% | Point Likelihood 2.09739689 | Data lnL -3268.84119
            Trees discarded due to too many events: 2
            Trees discarded due to too small population sizes: 0
            Trees discarded due to an infinitesimal data likelihood: 0
            Trees discarded due to extremely long branch lengths: 0
            Bayes-Arranger accepted 80/421 proposals
            Tree-Arranger accepted 81/468 proposals
            Tree-Size-Arranger accepted 9/111 proposals
            Number of unique sampled values for each parameter:
                 9: Theta for population number 1
                 4: Theta for population number 2
                20: Migration rate into population number 1 from population number 2
                31: Migration rate into population number 2 from population number 1

            Class                  Theta
            population number 1    0.002060
            population number 2    0.009636

            Population             Mig
            population number 1    --------   184.1942
            population number 2    48.03229   --------

  14:35:47  Final chain 1: [|                   ] 325
Much of this is the same as in the basic tutorial, but let's revisit all the pieces anyway.
The main output file will match the output file from a likelihood run; more information can be found in this section of the main tutorial, and still more information can be found in the Output Files section of the main manual. The principal difference between a Bayesian output file and a likelihood output file is that Bayesian parameter estimates are known as Most Probable Estimates (MPEs) instead of Maximum Likelihood Estimates (MLEs). This is because a Bayesian run produces probability density functions from which we read off the peak, instead of calculating likelihoods.
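"Reading off the peak" can be shown in two lines of Python. This sketch uses invented (parameter value, posterior density) pairs, not real LAMARC output; the point is only that the MPE is the parameter value where the density is highest.

```python
# Hypothetical (parameter value, posterior density) pairs, as a Bayesian
# run might tabulate them for one parameter.  These numbers are made up.
curve = [(0.001, 0.2), (0.002, 1.5), (0.004, 3.1), (0.008, 1.1)]

# The Most Probable Estimate (MPE) is the parameter value at the peak
# of the posterior density function.
mpe_value, mpe_density = max(curve, key=lambda pair: pair[1])
```

Contrast this with an MLE, which is found by maximizing a likelihood function rather than reading the mode of a posterior density.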
This difference also shows up in the profile tables of the outfile, where the Point Probabilities of the parameters are reported instead of the Log Likelihoods. Again, this is due to the type of analysis being done. A point probability is the value of the probability density function at that parameter value. Point probabilities can be compared with one another at different parameter values, but are meaningless outside of that context.
Finally, the parameters are profiled without regard to other parameters in the run. As noted earlier, this can mask any correlations that might legitimately exist in your data, but the profiles are otherwise accurate.
Curve files, as you no doubt remember, are the full detailed output from a Bayesian run, and you should definitely look at them. The easiest way to do this is to import the file into a spreadsheet program like Excel (the numbers are tab-delimited), highlight the two columns of data, and select 'make a graph of this' (you'll want an X-Y Scatter Plot type graph). You can then simply look at the resulting curve to see whether it's lumpy or a nice smooth single-peaked curve.
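If you prefer a script to a spreadsheet, the same check can be sketched in Python. The snippet below parses a tiny stand-in for a two-column, tab-delimited curve file (the numbers are invented, and real curve files may contain header lines this sketch doesn't handle) and tests whether the density rises to a single peak and then falls, i.e. whether the curve is unimodal rather than lumpy.

```python
import csv
import io

# A tiny stand-in for a LAMARC curve file: two tab-delimited columns,
# parameter value and posterior density.  Values are invented.
curvefile = io.StringIO(
    "0.0001\t0.10\n"
    "0.0010\t0.80\n"
    "0.0100\t2.40\n"
    "0.1000\t0.30\n"
)

rows = [(float(x), float(y)) for x, y in csv.reader(curvefile, delimiter="\t")]
values = [x for x, _ in rows]
densities = [y for _, y in rows]

# A well-behaved curve rises monotonically to one peak, then falls.
peak_index = densities.index(max(densities))
rises = all(a <= b for a, b in zip(densities[:peak_index],
                                   densities[1:peak_index + 1]))
falls = all(a >= b for a, b in zip(densities[peak_index:],
                                   densities[peak_index + 1:]))
unimodal = rises and falls
```

For a real run, replace the `io.StringIO` stand-in with `open("curvefile_theta1.txt")` (a hypothetical filename) and plot `values` against `densities` to see the shape directly.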
An excellent question, and one that we attempt to answer in some detail in Analyzing the Rest of Your Data.