
Converter Command File Reference

Converter Command File Introduction

The converter command file is an XML-format text file that can be used to bypass the converter GUI and provide information directly to the converter.

When to use a Converter Command File

For most LAMARC users, running the lamarc file converter in GUI mode will be the quickest and most intuitive way to convert data files for use in LAMARC. However, there are a few situations in which it may be necessary to write a converter command file. These situations include:

If a command file is needed to access a particular feature, it can be read into the converter either in batch mode or from the GUI.

An Example Converter Command File

An example converter command file with matching MIGRATE data files is provided in the batch_converter/ directory. The file sample-conv-cmd.xml (the actual XML is here) is annotated with comments and should be a good guide to what's going on.

How to Create a Converter Command File

The simplest way to create your own file is probably a combination of:

The rest of this section is provided as a reference in case copying from examples is not sufficient for your needs.

How to Use a Converter Command File

You can use your converter command file by:

Command File Overview

The top level tag of the file is a <lamarc-converter-cmd> tag. Its possible immediate children are listed in the table below. Note that none of these child tags are required. This is because, generally speaking, fragments of complete converter command files are allowed to be read in from the GUI.

Top Level Tags in Lamarc Converter Command File
parent tag              child tag                child required  child instances allowed
<lamarc-converter-cmd>  <traits>                 optional        SINGLE
                        <regions>                optional        SINGLE
                        <populations>            optional        SINGLE
                        <individuals>            optional        SINGLE
                        <panels>                 optional        SINGLE
                        <infiles>                optional        SINGLE
                        <outfile>                optional        SINGLE
                        <lamarc-header-comment>  optional        SINGLE
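
As a rough sketch of the table above, a command file skeleton might look like the following. The order of the child tags simply follows the table and is an assumption, as is the use of empty placeholder elements; a real file would fill in only the sections it needs.

```xml
<lamarc-converter-cmd>
    <!-- Every child is optional and may appear at most once. -->
    <!-- The order shown follows the table; it is not known to be required. -->
    <traits>      <!-- trait and phenotype definitions -->      </traits>
    <regions>     <!-- regions and their segments -->           </regions>
    <populations> <!-- population names -->                     </populations>
    <individuals> <!-- sample-to-individual assignments -->     </individuals>
    <panels>      <!-- panel correction information -->         </panels>
    <infiles>     <!-- input data files -->                     </infiles>
    <outfile>my-lamarc-input.xml</outfile> <!-- hypothetical file name -->
    <lamarc-header-comment>Converted from example data</lamarc-header-comment>
</lamarc-converter-cmd>
```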

Traits

The <traits> tag is used only when trait mapping. If you are not mapping traits, you may skip ahead to the regions section.

The <traits> tag contains definitions of one or more of the following objects.

Below is a table describing the relevant XML tags. You can also find an example trait-info definition and examples of phenotype definitions in the section on trait mapping.

Table of Sub-Tags of <traits>

Tags Describing Traits in Lamarc Converter Command File
parent tag              child tag               child required  child instances allowed
<traits>                <trait-info>            optional        multiple
                        <phenotype>             optional        multiple
<trait-info>            <name>                  REQUIRED        SINGLE
                        <allele>                REQUIRED        multiple
<phenotype>             <name>                  REQUIRED        SINGLE
                        <genotype-resolutions>  REQUIRED        multiple
<genotype-resolutions>  <trait-name>            REQUIRED        SINGLE
                        <haplotypes>            REQUIRED        multiple
<haplotypes>            <alleles>               REQUIRED        SINGLE
                        <penetrance>            REQUIRED        SINGLE

tag           contents
<allele>      unique name; should not contain spaces
<alleles>     ordered list of names (from <allele> tags of corresponding trait), separated by spaces
<penetrance>  value between 0 and 1; indicates the chance that an individual with these specific alleles will display the enclosing trait
<name>        unique name; should not contain spaces
<trait-name>  unique name; should not contain spaces
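
To make the nesting in the tables above concrete, here is a minimal sketch of a <traits> block. The trait name, allele names, phenotype name, and penetrance values are all hypothetical and chosen only for illustration; the tag structure follows the tables.

```xml
<traits>
    <!-- A trait with two alleles; all names here are hypothetical. -->
    <trait-info>
        <name>trait1</name>
        <allele>A</allele>
        <allele>a</allele>
    </trait-info>
    <!-- A phenotype, described by the haplotypes that can produce it. -->
    <phenotype>
        <name>affected</name>
        <genotype-resolutions>
            <trait-name>trait1</trait-name>
            <haplotypes>
                <alleles>A A</alleles>
                <penetrance>1.0</penetrance>
            </haplotypes>
            <haplotypes>
                <alleles>A a</alleles>
                <penetrance>0.5</penetrance>
            </haplotypes>
        </genotype-resolutions>
    </phenotype>
</traits>
```

Each <alleles> entry uses only names declared in the <allele> tags of the trait named by <trait-name>.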

Tags Specifying Inheritance and Mutation Models: <regions> and <segments>

In section Modeling Linkage Properties and Relative Mutation Rates of Your Data of the documentation

Regions

Specifying Inheritance Relationships
parent tag        child tag            child required                             child instances allowed
<regions>         <region>             REQUIRED                                   multiple
<region>          <name>               REQUIRED                                   SINGLE
                  <effective-popsize>  optional                                   SINGLE
                  <segments>           optional                                   SINGLE
                  <trait-location>     optional                                   multiple
<trait-location>  <trait-name>         REQUIRED for mapping; optional for others  SINGLE

tag                  contents
<effective-popsize>  value greater than 0; defaults to 1; the relative effective population size of samples from this region.
<trait-name>         unique name; should not contain spaces
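
A minimal sketch of a <regions> block built from the table above follows. The region, segment, and trait names and the numeric values are hypothetical; the <trait-location> entry is only needed when mapping a trait, and its <trait-name> is assumed to refer to a trait defined in the <traits> section.

```xml
<regions>
    <region>
        <name>chrom1</name>
        <!-- Optional; defaults to 1 when omitted. -->
        <effective-popsize>1</effective-popsize>
        <segments>
            <segment datatype="dna">
                <name>seg1</name>
                <markers>500</markers>
            </segment>
        </segments>
        <!-- Only needed when mapping a trait within this region. -->
        <trait-location>
            <trait-name>trait1</trait-name>
        </trait-location>
    </region>
</regions>
```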

Segments

Specifying Properties of Data Samples
parent tag  child tag or attribute    child required  child instances allowed
<segments>  <segment>                 REQUIRED        multiple
<segment>   datatype                  REQUIRED        -
            marker-proximity          optional        -
            <name>                    REQUIRED        SINGLE
            <markers>                 REQUIRED        SINGLE
            <map-position>            optional        SINGLE
            <length>                  optional        SINGLE
            <locations>               optional        SINGLE
            <first-position-scanned>  optional        SINGLE
            <unresolved-markers>      optional        SINGLE

tag                       contents
<markers>                 number of sites with data; for dna this is the number of sites sequenced; for snp data it is the number of snps; for kallele and microsat data it is the number of distinct sites at which kallele/msat data was collected.
<map-position>            location of <first-position-scanned> in region-wide coordinates
<length>                  total number of bases searched for data
<locations>               the location of each particular data site of your data in segment coordinates
<first-position-scanned>  the location of the first sampled location in your data in segment coordinates

attribute         value     meaning
datatype          dna       DNA data
                  snp       SNP data
                  kallele   k-allele data
                  microsat  microsatellite data
marker-proximity  linked    individual data markers likely to be inherited together
                  unlinked  individual data markers are independently inherited
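
The following sketch shows how the tags and attributes above might combine for a single SNP segment. The segment name, map position, length, and first scanned position are hypothetical, and the space-separated format of <locations> is an assumption; the seven marker positions are borrowed from the example discussed later in this document.

```xml
<segments>
    <segment datatype="snp" marker-proximity="linked">
        <name>chrom2-seg2</name>
        <!-- Seven SNPs, so <locations> lists seven positions. -->
        <markers>7</markers>
        <!-- Position of the first scanned site in region-wide coordinates. -->
        <map-position>1000</map-position>
        <!-- Total number of bases searched for SNPs. -->
        <length>250</length>
        <!-- Marker positions in segment coordinates. -->
        <locations>13 19 35 77 102 112 204</locations>
        <first-position-scanned>1</first-position-scanned>
    </segment>
</segments>
```

With these values, every location falls between the first scanned position (1) and the last valid site implied by the length (250).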

Populations

The <populations> tag is used to name distinct populations. If your data files have named populations, the population names here should match the names that are in your files.

Specifying population names with the <populations> tag
parent tag     child tag     child required  child instances allowed
<populations>  <population>  Y               Y

tag           contents
<population>  a name unique among all populations, regions, and segments
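
For illustration, a <populations> block naming two populations could look like this. The names are hypothetical; in practice they should match any population names in your data files.

```xml
<populations>
    <population>North</population>
    <population>South</population>
</populations>
```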

Data files

The <infiles> tag will tell the converter where to find your data, and how to associate each file with the previously-defined regions, segments, and populations.

Tags Describing Input Files in Lamarc Converter Command File
parent tag                  child tag or attribute      child required                      child instances allowed
<infiles>                   <infile>                    REQUIRED                            multiple
<infile>                    datatype                    REQUIRED                            -
                            format                      optional                            -
                            sequence-alignment          optional                            -
                            <name>                      REQUIRED                            SINGLE
                            <segments-matching>         REQUIRED                            SINGLE
                            <pop-matching>              optional                            SINGLE
                            <individuals-from-samples>  optional                            SINGLE
<individuals-from-samples>  type                        REQUIRED                            -
<population-matching>       type                        REQUIRED                            -
                            <population-name>           depends on value of type attribute  multiple
<segments-matching>         type                        REQUIRED                            -
                            <segment-name>              depends on value of type attribute  multiple

tag                         contents
<individuals-from-samples>  the number of adjacent samples to bundle into a single individual

attribute                            value        meaning
datatype                             dna          DNA data
                                     snp          SNP data
                                     kallele      k-allele data
                                     microsat     microsatellite data
format                               migrate      input file is a migrate file
                                     phylip       input file is a phylip file
sequence-alignment                   interleaved  the first line of each sequence appears, followed by all second lines, then all third lines, etc.
                                     sequential   each entire sequence appears in the file before the next one starts.
type for <individuals-from-samples>  byAdjacency  bundle adjacent samples into individuals
type for <population-matching>       byList       Each population referred to in the file is to be assigned to a particular population defined in this file. If this type is used, sub-tags of the type <population-name> should be used to define those populations (each should have a name that matches a population defined in the <populations> tag, above).
                                     byName       The file itself contains information about what populations the data refers to. These names must match the names given in the <populations> tag, above.
                                     single       All individuals in the file are to be assigned to a single population. That population must then be defined by a <population-name> subtag.
type for <segments-matching>         byList       Each segment referred to in the file is to be assigned to a particular segment defined in this file. If this type is used, sub-tags of the type <segment-name> should be used to define those segments (each should have a name that matches a defined segment).
                                     single       All individuals in the file are to be assigned to a single segment. That segment must then be defined by a <segment-name> subtag.
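
As a sketch of how these pieces fit together, the fragment below describes one MIGRATE-format DNA file. The file name, population names, and segment name are hypothetical and would need to match definitions elsewhere in the command file. The first table lists the population child as <pop-matching> while the attribute tables use <population-matching>; this sketch assumes the latter.

```xml
<infiles>
    <infile datatype="dna" format="migrate" sequence-alignment="sequential">
        <name>chrom1.mig</name>
        <!-- byList: map each population in the file to a defined one. -->
        <population-matching type="byList">
            <population-name>North</population-name>
            <population-name>South</population-name>
        </population-matching>
        <!-- single: all data in this file belongs to one segment. -->
        <segments-matching type="single">
            <segment-name>seg1</segment-name>
        </segments-matching>
        <!-- Bundle each pair of adjacent samples into one individual. -->
        <individuals-from-samples type="byAdjacency">2</individuals-from-samples>
    </infile>
</infiles>
```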

Specifying the Name of the Produced Lamarc file

The <outfile> tag specifies the name of the file that you want the converter to produce.

Tags Describing Output Files in Lamarc Converter Command File
tag        contents
<outfile>  name of outfile to produce; defaults to infile.xml

Miscellaneous Tags

Miscellaneous Tags in Lamarc Converter Command File
tag                      contents
<lamarc-header-comment>  text of comment to be inserted in lamarc file
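
Both of these tags sit directly under the top-level tag. A minimal sketch, with a hypothetical output file name and comment text:

```xml
<lamarc-converter-cmd>
    <!-- Omit <outfile> to get the default name, infile.xml. -->
    <outfile>my-lamarc-input.xml</outfile>
    <lamarc-header-comment>Converted from chrom1.mig for a test run</lamarc-header-comment>
</lamarc-converter-cmd>
```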

Specifying Relationships Between Individuals and Data Samples

For most LAMARC analyses, it is not necessary to specify which pairs (or more) of data sequences belong to the same individual. However, there are a few cases where it may be necessary, including:

Assigning samples to individuals, and optionally assigning trait phenotypes or information about haplotype resolution to them, is done with the <individuals> tag. An example can be found in the section Assigning Phenotypes to Individuals of the Trait Mapping documentation.

Specifying Relationships between Individuals and Sample Data in Converter Command File
parent tag     child tag               child required  child instances allowed
<individuals>  <individual>            optional        multiple
<individual>   <name>                  REQUIRED        SINGLE
               <sample>                REQUIRED        multiple
               <phase>                 optional        multiple
               <has-phenotype>         optional        multiple
               <genotype-resolutions>  optional        multiple
<sample>       <name>                  REQUIRED        SINGLE
<phase>        <segment-name>          REQUIRED        SINGLE
               <unresolved-markers>    REQUIRED        SINGLE

tag                     contents
<name>                  a name unique among all individuals and samples
<has-phenotype>         a <phenotype> name already defined in the <traits> section
<genotype-resolutions>  an "anonymous" phenotype belonging to the enclosing individual only. See <traits> subtags table for definition
<segment-name>          the name of the segment to which this set of phase information applies
<unresolved-markers>    sites for which data markers are unresolved for this individual and segment
To see an example of the <phase>, <segment-name> and <unresolved-markers> tags in use, see the file sample-conv-cmd.xml (the actual XML is here).

The values for the 'unresolved-markers' tag should be site labels. The first valid site in a segment is the value of the 'first-position-scanned' tag for that segment, and the last valid site is determined by the length of the segment. If the segment does not have as many markers in it as valid sites (as for SNP data), the values here should match the values in the 'locations' tag for the segment. In the example file, the second segment of the second chromosome has SNP data with markers at positions 13, 19, 35, 77, 102, 112, and 204. These are therefore the only valid values for the 'phase' tag for this segment.
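
The fragment below sketches one individual made of two samples, with phase left unresolved at two of the SNP positions mentioned above. The individual, sample, segment, and phenotype names are hypothetical, and the space-separated format of <unresolved-markers> is an assumption. The sample names are assumed to match sample names in the data file, and the phenotype name to match one defined in the <traits> section.

```xml
<individuals>
    <individual>
        <name>ind1</name>
        <!-- The two sequences that belong to this individual. -->
        <sample><name>ind1-a</name></sample>
        <sample><name>ind1-b</name></sample>
        <!-- Phase is unknown at sites 13 and 35 of this segment. -->
        <phase>
            <segment-name>chrom2-seg2</segment-name>
            <unresolved-markers>13 35</unresolved-markers>
        </phase>
        <!-- Optional: refers to a <phenotype> name from <traits>. -->
        <has-phenotype>affected</has-phenotype>
    </individual>
</individuals>
```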

Specifying Panel Correction Information

Panel member counts should be entered only if the user wishes to invoke Panel Correction. They need not be specified for all regions, only those for which one has the number of sequences used to create the panel.

WARNING: Do not estimate the number of sequences used to create a panel; doing so will make your results indefensible. If you do not have the actual number of sequences, you should not use Panel Correction. Your mutation rates will be lower, but that's the best you can do without knowing more about how the panel was created.

Specifying Panel Correction Information in Converter Command File
parent tag  child tag       child required  child instances allowed
<panels>    <panel>         optional        multiple
<panel>     <panel-name>    optional        SINGLE
            <panel-region>  REQUIRED        SINGLE
            <panel-pop>     REQUIRED        SINGLE
            <panel-size>    REQUIRED        SINGLE
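
The table does not describe the contents of these tags, so the following is only a sketch under stated assumptions: that <panel-region> and <panel-pop> hold the names of a defined region and population, and that <panel-size> holds the number of sequences used to create the panel. All names and the count are hypothetical.

```xml
<panels>
    <panel>
        <!-- Optional label for this panel. -->
        <panel-name>panel1</panel-name>
        <!-- Assumed to name a region and population defined elsewhere. -->
        <panel-region>chrom1</panel-region>
        <panel-pop>North</panel-pop>
        <!-- Must be the actual number of panel sequences, never an estimate. -->
        <panel-size>40</panel-size>
    </panel>
</panels>
```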
