I thought it might be useful to have a tool that goes beyond calculating the plain Euclidean distance between different genomes. The Euclidean distance is one of the most accurate methods we can use here. Its problem is that it does not take into account that the different components (West_Asian, South_Asian, Atlantic_Baltic, etc.) are at different distances from each other. For instance, we know that the West_Asian component is closer to Atlantic_Baltic than to the Sub_Saharan component.
Intuitively, we know that Individual#1 and #2 are more closely related to each other than either is to Individual#3, because the Sub_Saharan component is so different from the other two components. However, the Euclidean distance does not see that, because it treats every component equally.
Euclidean distance #1 vs #2: ((10%-80%)^2+(80%-10%)^2+(10%-10%)^2)^0.5=98.99
Euclidean distance #1 vs #3: ((80%-10%)^2+(10%-10%)^2+(10%-80%)^2)^0.5=98.99
Euclidean distance #2 vs #3: ((10%-10%)^2+(80%-10%)^2+(10%-80%)^2)^0.5=98.99
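The three calculations above can be reproduced with a few lines of Python. This is only a sketch: the percentage vectors are an assumption reconstructed from the numbers in the formulas (each individual has 80% in one component and 10% in the other two), and the labels component_A and component_B are placeholders for the two non-African components.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length admixture vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Assumed percentages, ordered (component_A, component_B, Sub_Saharan).
# Reconstructed from the worked formulas, not taken from a results table.
individuals = {
    "Individual#1": (80, 10, 10),
    "Individual#2": (10, 80, 10),
    "Individual#3": (10, 10, 80),
}

names = list(individuals)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = euclidean(individuals[first], individuals[second])
        print(f"{first} vs {second}: {d:.2f}")
```

Every pair comes out at the same distance, which is exactly the weakness described above: the plain Euclidean distance cannot tell that one of the three components is far more divergent than the other two.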
So I set up a new method (a distance calculator) that takes into account how closely related the components are when it calculates the adjusted distance.
Normal Euclidean and adjusted Euclidean distance for Individual#1 (sorted by adjusted Euclidean distance):
#   ID             Normal distance   Adjusted distance
1   Individual#1     0.0               0.0
2   Individual#2    99.0              89.0
3   Individual#3    99.0             124.5
Normal Euclidean and adjusted Euclidean distance for Individual#2 (sorted by adjusted Euclidean distance):
#   ID             Normal distance   Adjusted distance
1   Individual#2     0.0               0.0
2   Individual#1    99.0              89.0
3   Individual#3    99.0             125.2
Normal Euclidean and adjusted Euclidean distance for Individual#3 (sorted by adjusted Euclidean distance):
#   ID             Normal distance   Adjusted distance
1   Individual#3     0.0               0.0
2   Individual#1    99.0             124.5
3   Individual#2    99.0             125.2
---------- Post added 2011-11-03 at 16:10 ----------
I think the reason Iranians_D get closer to Kurds might be that the Iranians in the Behar data are from southern Iran and likely from the same region, most likely southwestern Iran.
What mathematical formula do you use to adjust Euclidean distances to the Fst distance?
I've been trying to do it for a while, but I lack some quite basic maths knowledge to do it properly.
I ended up doing an approximation by calculating coordinates for the groups on each of the 6 dimensions (based on the approximate locations of the components on the graphs Dienekes provided), but that's a bit clunky and still an approximation, so I'd be really interested in your method.
I will try to explain it with an example, though it is not easy.
I.
In order to calculate the distance of two points in 2 dimensions (x and y) you can use the Pythagorean theorem (a^2+b^2=c^2 or (a^2+b^2)^0.5=c):
Point1:
x1=3
y1=0
Point2:
x2=0
y2=4
((x1-x2)^2+(y1-y2)^2)^0.5
((3-0)^2+(0-4)^2)^0.5
=(9+16)^0.5
=(25)^0.5
=5
The distance between point1 and point2 is 5.
II.
In order to calculate the distance of two points in 3 dimensions (x, y and z) you can use the extended Pythagorean theorem or Euclidean distance (a^2+b^2+c^2=d^2 or (a^2+b^2+c^2)^0.5=d):
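Point1:
x1=3
y1=0
z1=1
Point2:
x2=0
y2=4
z2=2
((x1-x2)^2+(y1-y2)^2+(z1-z2)^2)^0.5
=((3-0)^2+(0-4)^2+(1-2)^2)^0.5
=(9+16+1)^0.5
=(26)^0.5
=5.099
The distance between point1 and point2 is 5.099.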
III.
There is an alternative way to get the correct distance result (5.099) of II.
Imagine you have these two points (point1 and point2) in a 3-dimensional matrix, but you can only see the distance based on two dimensions at a time. You have to look at it from 3 different positions, each giving you a different distance.
Point1:
x1=3
y1=0
z1=1
Point2:
x2=0
y2=4
z2=2
So, you would get 3 distances:
distance1=((x1-x2)^2+(y1-y2)^2)^0.5
=(25)^0.5
=5
distance2=((x1-x2)^2+(z1-z2)^2)^0.5
=(10)^0.5
=3.16
distance3=((y1-y2)^2+(z1-z2)^2)^0.5
=(17)^0.5
=4.12
Just by knowing these three distances (i.e. 5, 3.16, and 4.12) in 2-dimensional matrices for the two 3-dimensional points, you can calculate the total distance in the 3-dimensional matrix.
(((Distance1)^2+(Distance2)^2+(Distance3)^2)/(number of total dimensions-1))^0.5
=((5^2+3.16^2+4.12^2)/(3-1))^0.5
=((25+10+17)/2)^0.5
=(52/2)^0.5
=(26)^0.5
=5.099
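As a quick check of step III, here is a minimal Python sketch. The function names plane_distances and combine are made up for this illustration; they compute the distance seen in each 2-dimensional plane (xy, xz, yz) and then recombine those values with the formula above.

```python
import math
from itertools import combinations

def plane_distances(p, q):
    """Distance between p and q as seen in every 2-dimensional coordinate plane."""
    return [math.hypot(p[i] - q[i], p[j] - q[j])
            for i, j in combinations(range(len(p)), 2)]

def combine(distances, n_dims):
    """Recombine the plane distances into the full n-dimensional distance."""
    return math.sqrt(sum(d * d for d in distances) / (n_dims - 1))

point1 = (3, 0, 1)
point2 = (0, 4, 2)

planes = plane_distances(point1, point2)      # order: xy, xz, yz
print([round(d, 2) for d in planes])          # [5.0, 3.16, 4.12]
print(round(combine(planes, 3), 3))           # 5.099
print(round(math.dist(point1, point2), 3))    # 5.099, direct 3D distance
```

The recombined value matches the direct 3-dimensional distance, so nothing is lost by going through the 2-dimensional views.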
IV.
In order to calculate the distance between two points in 4 dimensions you can do the same as in III. Of course, when only 2-dimensional distances are given, you now have to look at it from even more positions, precisely 6 different positions, each giving you a different distance.
Point1:
w1=5
x1=3
y1=0
z1=1
Point2:
w2=0
x2=0
y2=4
z2=2
So, you would get 6 distances:
distance1=((x1-x2)^2+(y1-y2)^2)^0.5
=(25)^0.5
=5
distance2=((x1-x2)^2+(z1-z2)^2)^0.5
=(10)^0.5
=3.16
Just by knowing these 6 distances in 2-dimensional matrices for the two 4-dimensional points, you can calculate the total distance in the 4-dimensional matrix.
(((Distance1)^2+(Distance2)^2+(Distance3)^2+(Distance4)^2+(Distance5)^2+(Distance6)^2)/(number of total dimensions-1))^0.5
=((25+10+17+34+41+26)/(4-1))^0.5
=((153)/3)^0.5
=51^0.5
=7.14
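The six squared plane distances that go into the sum in IV can be listed with a short Python sketch. The dictionary layout and the plane labels (wx, wy, ...) are just a convenience chosen for this illustration; the coordinates are the ones given above.

```python
import math
from itertools import combinations

axes = ("w", "x", "y", "z")
point1 = {"w": 5, "x": 3, "y": 0, "z": 1}
point2 = {"w": 0, "x": 0, "y": 4, "z": 2}

# Squared distance seen in each of the 6 two-dimensional planes.
squared = {}
for a, b in combinations(axes, 2):
    squared[a + b] = (point1[a] - point2[a]) ** 2 + (point1[b] - point2[b]) ** 2

for plane, value in squared.items():
    print(plane, value, round(math.sqrt(value), 2))

# Sum of the squares divided by (number of dimensions - 1), then the root.
total = math.sqrt(sum(squared.values()) / (len(axes) - 1))
print(round(total, 2))  # 7.14
```

The squared values are 34, 41, 26, 25, 10 and 17, the same six numbers that appear in the sum above, only in a different order.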
V.
You can add more dimensions the same way I did in III. and IV.
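Why dividing by (number of total dimensions-1) works for any number of dimensions can be written out in a few lines. The symbols are my own notation, not from the posts above: p and q are the two points, n is the number of dimensions, and d_ij is the distance seen in the plane of coordinates i and j. Each coordinate is paired with each of the other n-1 coordinates exactly once, so its squared difference is counted n-1 times in the sum over all planes.

```latex
\begin{aligned}
\sum_{i<j} d_{ij}^2
  &= \sum_{i<j} \left[ (p_i - q_i)^2 + (p_j - q_j)^2 \right] \\
  &= (n-1) \sum_{k=1}^{n} (p_k - q_k)^2 , \\
d &= \sqrt{ \frac{1}{n-1} \sum_{i<j} d_{ij}^2 } .
\end{aligned}
```

For the 4-dimensional example in IV this gives 153/3 = 51, which is the same as the direct sum of squared differences 25+9+16+1.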
Treat the components of Dodecad as dimensions and the individual Dodecad Eurasia7 results as points in a 7-dimensional matrix (e.g. w=Atlantic_Baltic, x=South_Asian, y=East_Asian, etc.).
First, we assume that all components are equally related to each other (all Fst values are equal).
IIX.
Calculate all 21 two-dimensional distances between two individual Dodecad Eurasia7 results and then use these 21 distances to calculate the overall distance in the 7-dimensional matrix, just like in IV. (but with more distances (21) and dimensions (7-1=6) in the formula).
IX.
Now we assume that the components are not all equally related to each other (the Fst values are different).
Do the same as in IIX., but multiply each 2-dimensional distance by the corresponding Fst distance from Dienekes' table.
You have now adjusted the total distance by the Fst values provided by Dienekes.
X.
Because you multiplied each 2D distance by a factor (Fst), your number is off by the mean Fst value. For Eurasia7 it is 0.108, or 1/9.261981398.
Thus, you have to multiply the final result from IX. by 9.261981398 to normalize the data.
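Steps IIX. to X. can be put together in a short Python sketch. This is one reading of the description, not the actual calculator: the function name adjusted_distance is made up, the mean Fst is assumed to be the plain average over all component pairs, and the Fst numbers in the example are invented placeholders, NOT the values from Dienekes' table.

```python
import math
from itertools import combinations

def adjusted_distance(p, q, fst):
    """Fst-weighted distance between two admixture vectors p and q.

    fst[i][j] is the Fst distance between components i and j (symmetric).
    """
    n = len(p)
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for i, j in pairs:
        d2 = math.hypot(p[i] - q[i], p[j] - q[j])  # 2D distance in plane (i, j)
        total += (fst[i][j] * d2) ** 2             # step IX: multiply by Fst
    raw = math.sqrt(total / (n - 1))
    # Step X: normalize by the mean Fst (assumed: mean over all pairs).
    mean_fst = sum(fst[i][j] for i, j in pairs) / len(pairs)
    return raw / mean_fst

# Placeholder Fst matrix for (component_A, component_B, Sub_Saharan).
# Invented for illustration only.
fst = [
    [0.00, 0.05, 0.15],
    [0.05, 0.00, 0.15],
    [0.15, 0.15, 0.00],
]

ind1 = (80, 10, 10)
ind2 = (10, 80, 10)
ind3 = (10, 10, 80)

print(round(adjusted_distance(ind1, ind2, fst), 1))
print(round(adjusted_distance(ind1, ind3, fst), 1))
```

With these placeholder values the pair that differs only in the two closely related components comes out closer than the pair that differs in the Sub_Saharan component. If all Fst values are set equal, the function returns the normal Euclidean distance, which is why the normalization in X. is needed.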
This is amazing. It's a much better way of looking at affinity via ADMIXTURE clusters.
Can you come up with a simple tool that takes in data from different ADMIXTURE runs? I could make a blog post about it, because this is the correct way of doing things, but no one's doing it.
Also, can you make a list for the Poles?
Thanks.
Here is the TOP20 list for the Poles.
TOP20 normal and adjusted Euclidean distances for Polish_D using Dodecad Eurasia7 (sorted by adjusted Euclidean distance):
Can you come up with a simple tool that takes in data from different ADMIXTURE runs?
I could make an Excel file with a macro.
---------- Post added 2011-11-04 at 01:34 ----------
Originally Posted by StarDS9
I think the reason Iranians_D get closer to Kurds might be that the Iranians in the Behar data are from southern Iran and likely from the same region, most likely southwestern Iran.
Yes, I agree. When Zack from the Harappa Ancestry Project analyzed the Behar data, he saw elevated African admixture (>10%) in 3 Iranians, most likely from the south.
Thanks, Palisto, this is absolutely awesome. I just have a little question:
Originally Posted by Palisto
III.
There is an alternative way to get the correct distance result (5.099) of II.
Imagine you have these two points (point1 and point2) in a 3-dimensional matrix, but you can only see the distance based on two dimensions at a time. You have to look at it from 3 different positions, each giving you a different distance.
Point1:
x1=3
y1=0
z1=1
Point2:
x2=0
y2=4
z2=2
So, you would get 3 distances:
distance1=((x1-x2)^2+(y1-y2)^2)^0.5
=(25)^0.5
=5
distance2=((x1-x2)^2+(z1-z2)^2)^0.5
=(10)^0.5
=3.16
Just by knowing these three distances (i.e. 5, 3.16, and 4.12) in 2-dimensional matrices for the two 3-dimensional points, you can calculate the total distance in the 3-dimensional matrix.
(((Distance1)^2+(Distance2)^2+(Distance3)^2)/(number of total dimensions-1))^0.5
=((5^2+3.16^2+4.12^2)/(3-1))^0.5
=((25+10+17)/2)^0.5
=(52/2)^0.5
=(26)^0.5
=5.099
IV.
In order to calculate the distance between two points in 4 dimensions you can do the same as in III. Of course, when only 2-dimensional distances are given, you now have to look at it from even more positions, precisely 6 different positions, each giving you a different distance.
Point1:
w1=5
x1=3
y1=0
z1=1
Point2:
w2=0
x2=0
y2=4
z2=2
So, you would get 6 distances:
distance1=((x1-x2)^2+(y1-y2)^2)^0.5
=(25)^0.5
=5
distance2=((x1-x2)^2+(z1-z2)^2)^0.5
=(10)^0.5
=3.16
Just by knowing these 6 distances in 2-dimensional matrices for the two 4-dimensional points, you can calculate the total distance in the 4-dimensional matrix.
(((Distance1)^2+(Distance2)^2+(Distance3)^2+(Distance4)^2+(Distance5)^2+(Distance6)^2)/(number of total dimensions-1))^0.5
=((25+10+17+34+41+26)/(4-1))^0.5
=((153)/3)^0.5
=51^0.5
=7.14
Did you actually mean the following for distance3?
distance3=((y1-y2)^2+(z1-z2)^2)^0.5