The converter command file is an XML-format text file that can be used to bypass the converter's GUI and provide information directly to the converter.
For most LAMARC users, running the LAMARC file converter in GUI mode is the quickest and most intuitive way to convert data files for use in LAMARC. However, there are a few situations in which it may be necessary to write a converter command file. These situations include:
If a command file is needed to access a particular feature, it can be read into the converter either in batch mode or from the GUI.
An example converter command file with matching MIGRATE data files is provided in the batch_converter/ directory. The file sample-conv-cmd.xml is annotated with comments and should be a good guide to what's going on.
The simplest way to create your own file is probably a combination of:
The rest of this section is provided as a reference in case copying from the examples is not sufficient for your needs.
You can use your converter command file by:
The top-level tag of the file is the <lamarc-converter-cmd> tag. Its possible immediate children are listed in the table below. Note that none of these child tags are required. This is because, generally speaking, fragments of a complete converter command file may be read in from the GUI.
Top Level Tags in Lamarc Converter Command File

| parent tag | child tag | child required | child instances allowed |
|---|---|---|---|
| <lamarc-converter-cmd> | <traits> | optional | SINGLE |
| | <regions> | optional | SINGLE |
| | <populations> | optional | SINGLE |
| | <individuals> | optional | SINGLE |
| | <panels> | optional | SINGLE |
| | <infiles> | optional | SINGLE |
| | <outfile> | optional | SINGLE |
| | <lamarc-header-comment> | optional | SINGLE |
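As a rough outline, a complete command file nests the optional top-level tags from the table above inside a single <lamarc-converter-cmd> root. The sketch below is only a skeleton: each child is shown as a comment, because some of them (for example <regions>, which must contain at least one <region>) are not valid when left empty. The order simply follows the table; whether the converter cares about order is not stated here.

```xml
<lamarc-converter-cmd>
  <!-- Each child below is optional and may appear at most once. -->
  <!-- <traits> ... </traits>               only needed when trait mapping -->
  <!-- <regions> ... </regions>             regions, segments, trait locations -->
  <!-- <populations> ... </populations>     population names -->
  <!-- <individuals> ... </individuals>     samples grouped into individuals -->
  <!-- <panels> ... </panels>               only needed for Panel Correction -->
  <!-- <infiles> ... </infiles>             input data files -->
  <!-- <outfile> ... </outfile>             name of the LAMARC file to write -->
  <!-- <lamarc-header-comment> ... </lamarc-header-comment> -->
</lamarc-converter-cmd>
```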
The <traits> tag is used only when trait mapping. If you are not mapping traits, you may skip ahead to the regions section.
The <traits> tag contains definitions of one or more of the following objects.
Tags Describing Traits in Lamarc Converter Command File

| parent tag | child tag | child required | child instances allowed |
|---|---|---|---|
| <traits> | <trait-info> | optional | multiple |
| | <phenotype> | optional | multiple |
| <trait-info> | <name> | REQUIRED | SINGLE |
| | <allele> | REQUIRED | multiple |
| <phenotype> | <name> | REQUIRED | SINGLE |
| | <genotype-resolutions> | REQUIRED | multiple |
| <genotype-resolutions> | <trait-name> | REQUIRED | SINGLE |
| | <haplotypes> | REQUIRED | multiple |
| <haplotypes> | <alleles> | REQUIRED | SINGLE |
| | <penetrance> | REQUIRED | SINGLE |

| tag | contents |
|---|---|
| <allele> | unique name; should not contain spaces |
| <alleles> | ordered list of names (from the <allele> tags of the corresponding trait), separated by spaces |
| <penetrance> | value between 0 and 1; indicates the chance that an individual with these specific alleles will display the enclosing trait |
| <name> | unique name; should not contain spaces |
| <trait-name> | unique name; should not contain spaces |
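A minimal sketch of a <traits> block, assuming a hypothetical trait named disease-trait with two alleles and a hypothetical phenotype named affected. The phenotype lists one <haplotypes> entry per allele combination, each with its penetrance. All names and penetrance values are invented for illustration, and the <trait-name> inside <genotype-resolutions> is assumed to repeat the <name> given in <trait-info>.

```xml
<traits>
  <!-- Hypothetical trait with two alleles (names contain no spaces) -->
  <trait-info>
    <name>disease-trait</name>
    <allele>normal</allele>
    <allele>mutant</allele>
  </trait-info>
  <!-- Hypothetical phenotype: chance of displaying it for each allele combination -->
  <phenotype>
    <name>affected</name>
    <genotype-resolutions>
      <trait-name>disease-trait</trait-name>
      <haplotypes>
        <alleles>mutant mutant</alleles>
        <penetrance>1.0</penetrance>
      </haplotypes>
      <haplotypes>
        <alleles>normal mutant</alleles>
        <penetrance>0.5</penetrance>
      </haplotypes>
    </genotype-resolutions>
  </phenotype>
</traits>
```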
Regions and segments are discussed in the section Modeling Linkage Properties and Relative Mutation Rates of Your Data of the documentation.
Specifying Inheritance Relationships

| parent tag | child tag | child required | child instances allowed |
|---|---|---|---|
| <regions> | <region> | REQUIRED | multiple |
| <region> | <name> | REQUIRED | SINGLE |
| | <effective-popsize> | optional | SINGLE |
| | <segments> | optional | SINGLE |
| | <trait-location> | optional | multiple |
| <trait-location> | <trait-name> | REQUIRED for mapping, optional for others | SINGLE |

| tag | contents |
|---|---|
| <effective-popsize> | value greater than 0; defaults to 1; the relative effective population size of samples from this region |
| <trait-name> | unique name; should not contain spaces |
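A minimal sketch of a <regions> block for a single hypothetical region that also carries a trait location (only needed when mapping a trait). The region and trait names are placeholders. The <segments> tag is left as a placeholder here; before use it must contain at least one <segment>, as described in the next table.

```xml
<regions>
  <region>
    <name>region1</name>
    <!-- Relative effective population size; 1 is the default -->
    <effective-popsize>1</effective-popsize>
    <segments>
      <!-- Placeholder: one or more <segment> tags go here -->
    </segments>
    <!-- Only needed when mapping a trait; the trait name is hypothetical -->
    <trait-location>
      <trait-name>disease-trait</trait-name>
    </trait-location>
  </region>
</regions>
```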
Specifying Properties of Data Samples

| parent tag | child tag or attribute | child required | child instances allowed |
|---|---|---|---|
| <segments> | <segment> | REQUIRED | multiple |
| <segment> | datatype | REQUIRED | - |
| | marker-proximity | optional | - |
| | <name> | REQUIRED | SINGLE |
| | <markers> | REQUIRED | SINGLE |
| | <map-position> | optional | SINGLE |
| | <length> | optional | SINGLE |
| | <locations> | optional | SINGLE |
| | <first-position-scanned> | optional | SINGLE |
| | <unresolved-markers> | optional | SINGLE |

| tag | contents |
|---|---|
| <markers> | number of sites with data: for DNA this is the number of sites sequenced; for SNP data it is the number of SNPs; for k-allele and microsatellite data it is the number of distinct sites at which k-allele or microsatellite data were collected |
| <map-position> | location of <first-position-scanned> in region-wide coordinates |
| <length> | total number of bases searched for data |
| <locations> | the location of each data site in your data, in segment coordinates |
| <first-position-scanned> | the location of the first sampled site in your data, in segment coordinates |

| attribute | value | meaning |
|---|---|---|
| datatype | dna | DNA data |
| | snp | SNP data |
| | kallele | k-allele data |
| | microsat | microsatellite data |
| marker-proximity | linked | individual data markers are likely to be inherited together |
| | unlinked | individual data markers are independently inherited |
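A minimal sketch of a <segments> block with one DNA segment and one SNP segment. All names and numbers are hypothetical. The SNP segment uses <locations> to give the position of each of its seven markers in segment coordinates; listing those positions separated by spaces is an assumption here, modeled on the space-separated <alleles> list.

```xml
<segments>
  <!-- Hypothetical DNA segment: 300 contiguous sequenced sites -->
  <segment datatype="dna" marker-proximity="linked">
    <name>seg-dna</name>
    <markers>300</markers>
    <first-position-scanned>1</first-position-scanned>
  </segment>
  <!-- Hypothetical SNP segment: 7 SNPs found in 250 scanned bases -->
  <segment datatype="snp" marker-proximity="linked">
    <name>seg-snp</name>
    <markers>7</markers>
    <!-- Where first-position-scanned falls in region-wide coordinates -->
    <map-position>1000</map-position>
    <length>250</length>
    <!-- One position per marker, in segment coordinates (assumed space-separated) -->
    <locations>13 19 35 77 102 112 204</locations>
    <first-position-scanned>1</first-position-scanned>
  </segment>
</segments>
```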
The <populations> tag is used to name distinct populations. If your data files have named populations, the population names here should match the names that are in your files.
Specifying population names with the <populations> tag

| parent tag | child tag | child required | child instances allowed |
|---|---|---|---|
| <populations> | <population> | REQUIRED | multiple |

| tag | contents |
|---|---|
| <population> | a name unique among all populations, regions, and segments |
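For example, a <populations> block naming two hypothetical populations might look like this; if your data files already name their populations, these names would need to match the names in the files.

```xml
<populations>
  <!-- Hypothetical names; must be unique among populations, regions, and segments -->
  <population>north</population>
  <population>south</population>
</populations>
```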
The <infiles> tag will tell the converter where to find your data, and how to associate each file with the previously-defined regions, segments, and populations.
Tags Describing Input Files in Lamarc Converter Command File

| parent tag | child tag or attribute | child required | child instances allowed |
|---|---|---|---|
| <infiles> | <infile> | REQUIRED | multiple |
| <infile> | datatype | REQUIRED | - |
| | format | optional | - |
| | sequence-alignment | optional | - |
| | <name> | REQUIRED | SINGLE |
| | <segments-matching> | REQUIRED | SINGLE |
| | <pop-matching> | optional | SINGLE |
| | <individuals-from-samples> | optional | SINGLE |
| <individuals-from-samples> | type | REQUIRED | - |
| <population-matching> | type | REQUIRED | - |
| | <population-name> | depends on value of type attribute | multiple |
| <segments-matching> | type | REQUIRED | - |
| | <segment-name> | depends on value of type attribute | multiple |

| tag | contents |
|---|---|
| <individuals-from-samples> | the number of adjacent samples to bundle into a single individual |

| attribute | value | meaning |
|---|---|---|
| datatype | dna | DNA data |
| | snp | SNP data |
| | kallele | k-allele data |
| | microsat | microsatellite data |
| format | migrate | input file is a MIGRATE file |
| | phylip | input file is a PHYLIP file |
| sequence-alignment | interleaved | the first line of each sequence appears, followed by all second lines, then all third lines, etc. |
| | sequential | each entire sequence appears in the file before the next one starts |
| type for <individuals-from-samples> | byAdjacency | bundle adjacent samples into individuals |
| type for <population-matching> | byList | Each population referred to in the file is to be assigned to a particular population defined in this file. If this type is used, sub-tags of the type <population-name> should be used to define those populations (each should have a name that matches a population defined in the <populations> tag, above). |
| | byName | The file itself contains information about what populations the data refer to. These names must match the names given in the <populations> tag, above. |
| | single | All individuals in the file are to be assigned to a single population. That population must then be defined by a <population-name> subtag. |
| type for <segments-matching> | byList | Each segment referred to in the file is to be assigned to a particular segment defined in this file. If this type is used, sub-tags of the type <segment-name> should be used to define those segments (each should have a name that matches a defined segment). |
| | single | All individuals in the file are to be assigned to a single segment. That segment must then be defined by a <segment-name> subtag. |
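A minimal sketch of one <infile> entry, assuming a hypothetical sequential MIGRATE file, mydata.mig, whose population names match the <populations> tag and whose DNA data all belong to one previously defined segment (seg-dna, also hypothetical). The tag name <pop-matching> follows the child-tag list above; it is worth checking the exact spelling against sample-conv-cmd.xml.

```xml
<infiles>
  <!-- Hypothetical file name; DNA data in sequential MIGRATE format -->
  <infile datatype="dna" format="migrate" sequence-alignment="sequential">
    <name>mydata.mig</name>
    <!-- All data in this file go to one previously defined segment -->
    <segments-matching type="single">
      <segment-name>seg-dna</segment-name>
    </segments-matching>
    <!-- Population names are taken from the file itself -->
    <pop-matching type="byName"/>
    <!-- Bundle each run of 2 adjacent samples into one individual -->
    <individuals-from-samples type="byAdjacency">2</individuals-from-samples>
  </infile>
</infiles>
```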
The <outfile> tag specifies the name of the file that you want the converter to produce.
Tags Describing Output Files in Lamarc Converter Command File

| tag | contents |
|---|---|
| <outfile> | name of outfile to produce; defaults to infile.xml |
Miscellaneous Tags in Lamarc Converter Command File

| tag | contents |
|---|---|
| <lamarc-header-comment> | text of comment to be inserted in lamarc file |
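For example, the two tags could appear together inside the root tag as follows; the output file name and the comment text are placeholders.

```xml
<lamarc-converter-cmd>
  <!-- Other top-level tags omitted -->
  <outfile>lamarc-input.xml</outfile>
  <lamarc-header-comment>Converted from hypothetical MIGRATE data</lamarc-header-comment>
</lamarc-converter-cmd>
```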
For most LAMARC analyses, it is not necessary to specify which pairs (or more) of data sequences belong to the same individual. However, there are a few cases where it may be necessary, including:
The <individuals> tag is used to assign samples to individuals and, optionally, to assign trait phenotypes or information about haplotype resolution to those individuals. An example can be found in section Assigning Phenotypes to Individuals of the Trait Mapping documentation.
Specifying Relationships between Individuals and Sample Data in Converter Command File

| parent tag | child tag | child required | child instances allowed |
|---|---|---|---|
| <individuals> | <individual> | optional | multiple |
| <individual> | <name> | REQUIRED | SINGLE |
| | <sample> | REQUIRED | multiple |
| | <phase> | optional | multiple |
| | <has-phenotype> | optional | multiple |
| | <genotype-resolutions> | optional | multiple |
| <sample> | <name> | REQUIRED | SINGLE |
| <phase> | <segment-name> | REQUIRED | SINGLE |
| | <unresolved-markers> | REQUIRED | SINGLE |

| tag | contents |
|---|---|
| <name> | a name unique among all individuals and samples |
| <has-phenotype> | the name of a <phenotype> already defined in the <traits> section |
| <genotype-resolutions> | an "anonymous" phenotype belonging to the enclosing individual only; see the <traits> subtags table for its definition |
| <segment-name> | the name of the segment to which this set of phase information applies |
| <unresolved-markers> | sites for which data markers are unresolved for this individual and segment |
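A minimal sketch of an <individuals> block, assuming two hypothetical samples, ind1-a and ind1-b, that come from the same individual, and a phenotype named affected that is already defined in the <traits> section. The sample names are assumed here to match the sequence names used in the input files.

```xml
<individuals>
  <individual>
    <!-- Must be unique among all individuals and samples -->
    <name>ind1</name>
    <sample><name>ind1-a</name></sample>
    <sample><name>ind1-b</name></sample>
    <!-- Refers to a <phenotype> defined in the <traits> section -->
    <has-phenotype>affected</has-phenotype>
  </individual>
</individuals>
```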
For an example of the <phase>, <segment-name> and <unresolved-markers> tags in use, see the file sample-conv-cmd.xml.
The values for the 'unresolved-markers' tag should be site labels. The first valid site in a segment is the value of the 'first-position-scanned' tag for that segment, and the last valid site is determined by the length of the segment. If the segment has fewer markers than valid sites (as with SNP data), the values here should match the values in the 'locations' tag for the segment. In the example file, the second segment of the second chromosome has SNP data with markers at positions 13, 19, 35, 77, 102, 112, and 204. These are therefore the only valid values for the 'unresolved-markers' tag inside a 'phase' tag for this segment.
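Using the SNP marker positions from the example described above, a <phase> entry might look like the sketch below. The individual, sample, and segment names are hypothetical, the choice of 19 and 102 as unresolved sites is arbitrary, and separating the site labels with spaces is an assumption.

```xml
<individuals>
  <individual>
    <name>ind2</name>
    <sample><name>ind2-a</name></sample>
    <sample><name>ind2-b</name></sample>
    <phase>
      <!-- Hypothetical SNP segment with locations 13 19 35 77 102 112 204 -->
      <segment-name>seg-snp</segment-name>
      <!-- Each value must be one of that segment's marker locations -->
      <unresolved-markers>19 102</unresolved-markers>
    </phase>
  </individual>
</individuals>
```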
Panel member counts should be entered only if you wish to invoke Panel Correction. They need not be specified for all regions, only for those for which you have the number of sequences used to create the panel.
WARNING: Do not estimate the number of sequences used to create a panel; doing so will make your results indefensible. If you do not have the actual number of sequences, you should not use Panel Correction. Your mutation rates will be lower, but that is the best you can do without knowing more about how the panel was created.
Specifying Panel Correction Information in Converter Command File

| parent tag | child tag | child required | child instances allowed |
|---|---|---|---|
| <panels> | <panel> | optional | multiple |
| <panel> | <panel-name> | optional | SINGLE |
| | <panel-region> | REQUIRED | SINGLE |
| | <panel-pop> | REQUIRED | SINGLE |
| | <panel-size> | REQUIRED | SINGLE |
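A minimal sketch of a <panels> block. The table gives no contents description for these tags, so this sketch assumes that <panel-region> and <panel-pop> name a region and a population defined elsewhere in the file, and that <panel-size> holds the number of sequences used to create the panel. All values are placeholders; in line with the warning above, <panel-size> must be the actual count, never an estimate.

```xml
<panels>
  <panel>
    <!-- Optional, hypothetical label -->
    <panel-name>panel1</panel-name>
    <!-- Assumed: name of a region defined in <regions> -->
    <panel-region>region1</panel-region>
    <!-- Assumed: name of a population defined in <populations> -->
    <panel-pop>north</panel-pop>
    <!-- Placeholder: use the real number of sequences, never a guess -->
    <panel-size>12</panel-size>
  </panel>
</panels>
```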