I have noticed that the genetic maps in IBD6-11_SNPs.text and IBD12-22_SNPs.text are very accurate, i.e. the position of each SNP in bp, as given in column D, is exactly the same as the one given by
http://integrin.ucd.ie/cgi-bin/rs2cm.cgi
I have checked a few thousand SNPs.
The map in IBD1-5_SNPs.text was slightly off, but still accurate enough to estimate segment lengths in cM.
So all the data the script needs is available in the files; we do not have to query outside genetic maps.
For those who would like to investigate their data in the meantime, here is a short guide to what needs to be done:
Every row in the extended IBD files has a FastIBD score below 1.0E-10 in column E, so the score is already sufficient for IBD and need not concern us any more. The length in cM is of greatest interest: we are looking for segments longer than 1 cM, as only those are IBD with high probability.
How to calculate length in cM?
For example, I take the first row of the extended IBD file that goes with IBD6-11_SNPs.text:
Code:
A        B          C   D   E
NA21436  GSM536527  10  45  6,08E-11
(What NA21436 and GSM536527 are can be found in IBD6-11_samples.text.)
10 in column C marks the beginning of the segment.
45 in column D marks the end of the segment.
This means that we should look for the starting SNP in line 10 of IBD6-11_SNPs.text and for the ending SNP in line 45 of IBD6-11_SNPs.text.
Code:
Line  Chrom  SNP         ?  Distance
10    6      rs12212839  0  173345
45    6      rs2473492   0  511673
From the Distance column we get the beginning and the end of the segment in bp.
One centimorgan corresponds to about 1 million base pairs in humans on average.
The length in cM will be (511673 - 173345) / 1000000 = 0.338328 cM.
This means the segment is shorter than 1 cM, i.e. too short to count as IBD.
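The steps above can be sketched in a few lines of Python. This is only a rough illustration, not the actual script: the function names are made up for this example, and it assumes that the files are whitespace-separated, that the SNP file has no header row (so line 10 is the tenth row), and that the score may be written with a decimal comma as in the row shown above. It uses the same approximation of 1 cM per 1 million bp.

```python
BP_PER_CM = 1_000_000  # rough human average used above: 1 cM is about 1 million bp
MIN_IBD_CM = 1.0       # segments longer than this are of interest


def parse_snp_positions(snp_lines):
    """Map 1-based line number -> (chromosome, rsid, position in bp).

    Assumes rows look like "6 rs12212839 0 173345" with no header row.
    """
    positions = {}
    for line_no, line in enumerate(snp_lines, start=1):
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        chrom, rsid, _unknown, bp = fields[:4]
        positions[line_no] = (chrom, rsid, int(bp))
    return positions


def parse_segment_row(row):
    """Split one IBD row into (sample A, sample B, start line, end line, score)."""
    sample_a, sample_b, start_line, end_line, score = row.split()
    # The score may use a decimal comma (6,08E-11), so normalise it first.
    return sample_a, sample_b, int(start_line), int(end_line), float(score.replace(",", "."))


def segment_length_cm(positions, start_line, end_line):
    """Approximate segment length in cM from the bp positions of its end SNPs."""
    start_chrom, _, start_bp = positions[start_line]
    end_chrom, _, end_bp = positions[end_line]
    if start_chrom != end_chrom:
        raise ValueError("segment start and end are on different chromosomes")
    return (end_bp - start_bp) / BP_PER_CM


# The two SNP rows from the example, keyed by their line numbers in the SNP file.
positions = {
    10: ("6", "rs12212839", 173345),
    45: ("6", "rs2473492", 511673),
}
_, _, start_line, end_line, score = parse_segment_row("NA21436 GSM536527 10 45 6,08E-11")
length_cm = segment_length_cm(positions, start_line, end_line)
print(f"{length_cm:.6f} cM, long enough for IBD: {length_cm > MIN_IBD_CM}")
# prints: 0.338328 cM, long enough for IBD: False
```

For a whole file, parse_snp_positions can be fed the lines of the SNP file (for example from open(...).read().splitlines()), and the same calculation repeated for every row of the IBD file.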