ATTN: A newer version of RelOut has made this version obsolete. The new version is HERE.

INTRODUCTION

It is often desirable to exclude close relatives from the genotype files before GWAS. RelOut is a script that allows you to do that in various ways. It uses the standard PLINK files as input and excludes the minimal number of individuals while meeting the threshold criteria set by the user. ATTN! If your RelOut input files are not produced by PLINK, please CLICK HERE!

There are four ways to set the exclusion criteria (and the combinations of them):
a) a genome file from PLINK (--genome),
b) a missing file from PLINK (--missing),
c) a phenotype file,
d) the syntax priority.
These options are explained below. For more info please contact RelOut 'at' ToomasHaller 'dot' com.

A VERY QUICK START

If you want to do the simplest thing, which is to exclude the minimal number of individuals so that all the remaining genetic distances are less than 0.1, simply do this:

1) Use PLINK to make the genome file:
./plink --ped *.ped --map *.map --genome --min 0.1 --out relatives
Note: the min value should be <= 0.1. The main output file is called "relatives.genome".

2) Use RelOut to get the list of individuals to remove:
./RELOUT32 -gfile relatives.genome -lowlim 0.1 -out remove_these_individuals.txt
Note: you may need to use RELOUT64 instead (if you have a 64-bit machine).

3) Use PLINK to remove the selected individuals:
./plink --ped *.ped --map *.map --remove remove_these_individuals.txt --recode --out relatives_removed

DETAILS FOR ADVANCED USE

RelOut can accept three types of files (all tab-delimited) as input but only one is absolutely required:

1. PLINK genome file (ALWAYS REQUIRED)
This contains the genetic distances info and should be made like this:
./plink --ped *.ped --map *.map --genome --min 0.1 --out relatives
The main output file is called "relatives.genome".

2. PLINK individual call rate (missing) file (OPTIONAL)
This contains the '1 - missing' values for all individuals and should be made like this:
./plink --ped *.ped --map *.map --missing --out missing

3. PLINK style phenotype file (OPTIONAL)
This should contain the same FID and IID values in the first two columns as the previous two files, followed by the phenotype info in the subsequent columns. Any number of columns is allowed. The entries must be positive numbers; -9 (or another negative number) should be used to indicate missing values. The file delimiter can be anything, but tab is used by default. This file must have a header. NOTE! To see an example of a phenotype file, please CLICK HERE!
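The phenotype file rules above can be checked before running RelOut. The following is only a sketch: check_pheno is a hypothetical helper (it is not part of RelOut), it assumes the default tab delimiter, and it treats the FID and IID pair as the ID that must be unique.

```shell
# check_pheno FILE
# Hypothetical helper, not part of RelOut. Assumes a tab-delimited file with a header line.
# Prints "OK" and returns 0 if no problem is found; otherwise prints the problems and returns 1.
check_pheno() {
    awk -F'\t' '
        NR == 1 {
            # The header defines how many columns every following line must have.
            ncol = NF
            if (NF < 3) { print "header has fewer than 3 columns"; bad = 1 }
            next
        }
        NF != ncol {
            printf "line %d: expected %d columns, found %d\n", NR, ncol, NF
            bad = 1
            next
        }
        {
            # Assumption: an ID is the FID and IID pair in the first two columns.
            key = $1 FS $2
            if (key in seen) { printf "line %d: duplicate ID %s %s\n", NR, $1, $2; bad = 1 }
            seen[key] = 1
            # Phenotype entries must be numbers; negative values mark missing data.
            for (i = 3; i <= NF; i++)
                if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) {
                    printf "line %d, column %d: \"%s\" is not a number\n", NR, i, $i
                    bad = 1
                }
        }
        END { if (!bad) print "OK"; exit (bad ? 1 : 0) }
    ' "$1"
}
```

Usage: check_pheno phenos.txt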

- - - - - - -
SCENARIO I: if you have 1 input file (PLINK genome file):

You can use the following parameters:
./RELOUT32 -gfile -out -uplim -lowlim -like -dontlike
-gfile:  Name of the PLINK genome file (see above). REQUIRED.
-out:  Name of the output file. Default: the PLINK genome file + ".out".
-uplim:  Genetic distance upper threshold (0.0-1.0); genetic distances above this value are considered to indicate duplicate entries and are eliminated by removing one individual. Default: 0.9
-lowlim:  Genetic distance lower threshold (0.0-1.0); genetic distances above this value are eliminated by removing one individual. This is the MAIN PARAMETER that controls the RelOut program. Default: 0.1
-like:  When one ID needs to be removed from the dataset and there is no reason to prefer one, the user can define combinations of keyboard symbols (no spaces) to give some IDs an advantage against elimination. IDs that contain the defined sequence of characters are less likely to be removed. Default: not used
-dontlike:  When one ID needs to be removed from the dataset and there is no reason to prefer one, the user can define combinations of keyboard symbols (no spaces) to give some IDs a disadvantage in elimination. IDs that contain the defined sequence of characters are more likely to be removed. Default: not used

Example:
./RELOUT32 -gfile relatives.genome -out results.txt -uplim 0.95 -lowlim 0.125 -like _good -dontlike _test
In this example uplim and lowlim values are defined, the IDs that contain the phrase "_good" are favored and the ones that contain "_test" are disfavored.
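Before settling on -lowlim and -uplim values, it can help to see how many pairs in the genome file fall above each threshold. The sketch below uses a hypothetical helper, count_pairs, that is not part of RelOut. It assumes that the "genetic distance" RelOut reads is the PI_HAT column of the PLINK genome file (the column that PLINK's --min option filters on) and that "above" means strictly greater than the threshold.

```shell
# count_pairs GENOME_FILE LOWLIM UPLIM
# Hypothetical helper, not part of RelOut.
# Assumption: the relevant value is the PI_HAT column of the PLINK .genome file.
count_pairs() {
    awk -v low="$2" -v up="$3" '
        NR == 1 {
            # Locate the PI_HAT column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "PI_HAT") col = i
            if (!col) { print "no PI_HAT column found" > "/dev/stderr"; exit 1 }
            next
        }
        $col > up  { dup++; next }   # pairs that would count as duplicates
        $col > low { rel++ }         # pairs above lowlim but not above uplim
        END { if (col) printf "above_uplim=%d above_lowlim_only=%d\n", dup, rel }
    ' "$1"
}
```

Usage: count_pairs relatives.genome 0.125 0.95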

- - - - - - -
SCENARIO II: if you have 2 input files (PLINK genome file and PLINK call rate file):

You can use the following parameters:
./RELOUT32 -gfile -mfile -out -uplim -lowlim -like -dontlike
-gfile:  Name of the PLINK genome file (see above). REQUIRED.
-mfile:  Name of the PLINK call rate file (see above). REQUIRED.
-out, -uplim, -lowlim, -like, -dontlike:  Please refer to SCENARIO I above for explanations.

Example:
./RELOUT32 -gfile relatives.genome -mfile callrates.txt -out results.txt -uplim 0.85 -lowlim 0.0625 -like experiment1 -dontlike experiment2
In this example uplim and lowlim values are defined. The IDs that contain the phrase "experiment1" are favored and the ones that contain "experiment2" are disfavored only when there are no other reasons to prefer one ID to the other.

- - - - - - -
SCENARIO III: if you have 2 input files (PLINK genome file and phenotype file):

You can use the following parameters:
./RELOUT32 -gfile -pfile -pfiledelim -pheno1col -pheno2col -pheno1mode -pheno2mode -pheno1value -pheno2value -out -uplim -lowlim -like -dontlike
-gfile:  Name of the PLINK genome file (see above). REQUIRED.
-pfile:  Name of the phenotype file (see above). REQUIRED.
-pfiledelim:  Phenotype file delimiter. This can be "tab", "space", or any symbol or string without spaces. Default: tab
-pheno1col, -pheno2col:  Column number (3-x) indicating where the first and second phenotypes are, respectively. If only one phenotype is used, only -pheno1col should be specified. At least one phenotype column must be specified. NOTE that both columns can be the same number (in that case each phenotype is tested twice before deciding which ID to remove, see below). Default: 0, which means 'this parameter is not used'
-pheno1mode, -pheno2mode:  This specifies how the phenotypes should be treated. The options are: "max", "min", "equal". 'Max' means that larger values are preferred, 'min' means that smaller values are preferred, and 'equal' means that a given value is preferred when deciding which IDs to retain (and not to remove). The first phenotype gets a higher priority; the second phenotype is used in the decision making only when a) the first phenotypes were equal, b) the second phenotype is a negative value. NOTE that this can be utilized to use the first phenotype as a yes/no switch (e.g. men vs. women) and the second one as a finer modulator (e.g. tall vs. short). NOTE that if one column was specified as two phenotypes (e.g. -pheno1col 3 -pheno2col 3) then the same phenotype is tested twice as explained above. Default: "max". NOTE that if one of the phenotypes is a negative number, it is always preferentially removed, regardless of how the mode is specified.
-pheno1value, -pheno2value:  This is used to specify a value if '-pheno1mode' or '-pheno2mode' was set to 'equal'. If only one phenotype is used, only -pheno1value should be specified. Default: no default
-out, -uplim, -lowlim, -like, -dontlike:  Please refer to SCENARIO I above for explanations.

Example:
./RELOUT32 -gfile relatives.genome -pfile phenos.txt -pfiledelim space -pheno1col 3 -pheno2col 4 -pheno1mode equal -pheno2mode min -pheno1value 1 -out remove.txt -uplim 0.8 -lowlim 0.11 -like confirmed -dontlike unconfirmed
This is a maximal example; usually not so many parameters are required. Here the phenotype file delimiter is set to 'space', phenotype one is in column 3, and phenotype two is in column 4. Phenotype one is tested first. If one ID has a value equal to 1 and the other one does not, the ID with the value 1 is retained and the other one is removed. If the two are equal, however, the second phenotype is tested and smaller values are preferred (not removed). Please refer to SCENARIO I above for the explanation of the other parameters.
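The decision rules described for the phenotype modes can be illustrated for a single pair of IDs, A and B. This is only a sketch of one reading of the text above, not RelOut's actual code: prefer_by_pheno and choose_keep are hypothetical names, a negative (missing) value always loses, and any case the description leaves open (for example, neither value matching in 'equal' mode) is treated as a tie that falls through to the second phenotype.

```shell
# prefer_by_pheno MODE VALUE PHENO_A PHENO_B
# Hypothetical helper, not part of RelOut. Prints the ID to keep: "A", "B" or "tie".
# VALUE is only used when MODE is "equal"; pass 0 otherwise.
prefer_by_pheno() {
    awk -v mode="$1" -v val="$2" -v a="$3" -v b="$4" 'BEGIN {
        a += 0; b += 0; val += 0              # force numeric comparison
        # A negative (missing) phenotype is always the one to remove.
        if (a < 0 && b < 0) { print "tie"; exit }
        if (a < 0)          { print "B"; exit }
        if (b < 0)          { print "A"; exit }
        if (mode == "max")        res = (a > b) ? "A" : ((b > a) ? "B" : "tie")
        else if (mode == "min")   res = (a < b) ? "A" : ((b < a) ? "B" : "tie")
        else if (mode == "equal") res = (a == val && b != val) ? "A" : ((b == val && a != val) ? "B" : "tie")
        else { print "unknown mode: " mode > "/dev/stderr"; exit 2 }
        print res
    }'
}

# choose_keep MODE1 VALUE1 A1 B1 MODE2 VALUE2 A2 B2
# Tests the first phenotype; the second is consulted only if the first gives a tie.
choose_keep() {
    first=$(prefer_by_pheno "$1" "$2" "$3" "$4") || return
    if [ "$first" != "tie" ]; then
        echo "$first"
    else
        prefer_by_pheno "$5" "$6" "$7" "$8"
    fi
}
```

Usage, mirroring the example above (-pheno1mode equal -pheno1value 1 -pheno2mode min): choose_keep equal 1 1 1 min 0 5 3 compares two IDs whose first phenotype is 1 in both cases, so the second phenotype decides and the ID with the smaller value is kept.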

- - - - - - -
SCENARIO IV: if you have 3 input files (PLINK genome file, PLINK call rate file, phenotype file):

You can use the following parameters:
./RELOUT32 -gfile -mfile -pfile -pfiledelim -sigcr -pheno1col -pheno2col -pheno1mode -pheno2mode -pheno1value -pheno2value -out -uplim -lowlim -like -dontlike
-gfile:  Name of the PLINK genome file (see above). REQUIRED.
-mfile:  Name of the PLINK call rate file (see above). REQUIRED.
-pfile:  Name of the phenotype file (see above). REQUIRED.
-sigcr:  Significant call rate value (0.0-1.0). This defines the call rate threshold value. If one call rate is above the sigcr and the other one is below it, the ID with the lower callrate value is removed regardless of how the phenotypes are defined. If both call rates are either above or below the sigcr value, only phenotype data are used to decide which ID to remove (see SCENARIO III above). Default: 0.97
-pfiledelim, -pheno1col, -pheno2col, -pheno1mode, -pheno2mode, -pheno1value, -pheno2value:  Please refer to SCENARIO III above for explanations.
-out, -uplim, -lowlim, -like, -dontlike:  Please refer to SCENARIO I above for explanations.

Example:
./RELOUT32 -gfile relatives.genome -mfile missing.txt -pfile phenos.txt -sigcr 0.95 -pfiledelim space -pheno1col 7 -pheno2col 11 -pheno1mode max -pheno2mode min -out remove.txt -uplim 0.9 -lowlim 0.125 -like cases -dontlike controls
Here sigcr is set to 0.95 and two phenotypes are considered. Please refer to SCENARIO I, II, III above for the explanation of the other parameters.
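To see which individuals would fall below a candidate -sigcr value, the call rates can be listed from the PLINK missing file. The sketch below uses a hypothetical helper, list_low_callrate, that is not part of RelOut. It assumes the per-individual file written by PLINK's --missing option (with an F_MISS column) and computes the call rate as 1 - F_MISS.

```shell
# list_low_callrate IMISS_FILE SIGCR
# Hypothetical helper, not part of RelOut.
# Assumption: the input is the per-individual PLINK --missing output with an F_MISS column.
# Prints FID, IID and call rate (tab-separated) for every individual below SIGCR.
list_low_callrate() {
    awk -v sigcr="$2" '
        NR == 1 {
            # Locate the F_MISS column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "F_MISS") col = i
            if (!col) { print "no F_MISS column found" > "/dev/stderr"; exit 1 }
            next
        }
        (1 - $col) < sigcr { printf "%s\t%s\t%.4f\n", $1, $2, 1 - $col }
    ' "$1"
}
```

Usage: list_low_callrate missing.imiss 0.95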

- - - - - - -
General notes: The IDs must always be unique. The PLINK files should not be modified. All columns must contain values. Put -9 in place of missing values.

DOWNLOAD

The RelOut download contains the executable. Please note that since RelOut uses Qt libraries, you either need to have Qt (4.4.3 or higher) installed on your computer or the correct Qt libraries available. Static compilation options will become available in the future. When you transfer the executable into your destination folder, you need to give it the required permissions. This can be done by typing 'chmod ugo+rwx RELOUT##', where ## is either 32 or 64.

 Linux (v.0.2 - last update March 20, 2012):
   RelOut executable for 32-bit Ubuntu 10.04 (zip format)
   RelOut executable for 64-bit Redhat (gz format)

Before you download, please decide whether you need the 32-bit or 64-bit version. A regular laptop or desktop likely needs the 32-bit version and servers typically need the 64-bit version. If you are using a Linux distribution I'm not currently supporting or a different OS altogether (Windows, MacOS), please contact me by e-mail and I'll see what I can do. Please contact me if you have difficulties running the program (relout 'at' toomashaller 'dot' com).


© 2015 www.ToomasHaller.com