
Thread: PCA/MDS analyses can be wrong

  1. #1
    Established Member
    Race Realist Lemminkäinen
    Join Date: 2010-05-25
    Posts: 10,622
    Location: Helsinki
    Y-DNA: I1-L258
    mtDNA: H39
    Phenotype: Appalachid
    Metaethnos: Finnic-Baltic-Germanic
    Ethnicity: Finnish
    Politics: Vandalism in Rome
    Finland, European Union

    PCA/MDS analyses can be wrong

    Like an echo of what I have said on several forums:


    By computer simulation of vicariance on the basis of coalescent theory, EIGENSOFT systematically overestimates the number of significant principal components. Furthermore, this overestimation is larger for samples of admixed individuals than for samples of unadmixed individuals. Overestimating the number of significant principal components can potentially lead to a loss of power in association testing by adjusting for unnecessary covariates and may lead to incorrect inferences about group differentiation.

    http://www.ncbi.nlm.nih.gov/pubmed/2...?dopt=Abstract
    Blog: http://terheninenmaa.blogspot.fi/, with essence "Believe me, or I'll nuke you".

    H39 - Thracia 1650 BC, Hungary 5000 BC
    I1 - Transdanubia 5000 BC

    Three simple facts about Finns:
    1. Baltic Finnic languages (including Finnish) never came from the Volga basin along with the ancestors of present-day Finns.
    2. Finnish I1 (around 30% of all Finns) has Germanic roots from the late Bronze Age or the early Iron Age.
    3. As to Finnish prehistory, we have no evidence of any Iron Age (or later) east-to-west migration, but much unquestionable evidence of west-to-east migrations.

    Väinämöinen - R1a
    Lemminkäinen - I1
    Joukahainen - N

  2. The Following User Says Thank You to Lemminkäinen For This Useful Post:

    Svin (2011-04-04)


  4. #2
    Established Member
    Your Friend
    Join Date: 2009-10-23
    Posts: 9,652
    Y-DNA: R1a-Z282
    mtDNA: H7
    Metaethnos: Slavic
    Ethnicity: Polish
    Phenotype: Barbarian
    Religion: Crop Circles
    Poland

    It's not really possible to make the error described above on a 2D or 3D plot.

    There are many issues with PCA/MDS plots, but they're more about correctly interpreting the data than about the results being wrong.
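    For anyone who wants to check this on their own data, here is a minimal sketch of how the 2D/3D coordinates in such plots are usually obtained. It assumes a samples × SNPs matrix of 0/1/2 allele counts with no missing data and uses a common normalisation (centre each SNP, divide by sqrt(p(1-p))). `pca_coordinates` is a hypothetical helper, not EIGENSOFT's code, and smartpca's own normalisation may differ in details.

```python
import numpy as np


def pca_coordinates(genotypes, k: int = 3):
    """Return (coords, eigenvalues) for the top-k principal components.

    genotypes: samples x SNPs array of 0/1/2 allele counts (no missing data assumed).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                  # per-SNP allele frequency
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    x = (g - 2 * p) / np.sqrt(p * (1 - p))    # centre and scale each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    eigenvalues = s ** 2 / x.shape[1]         # eigenvalues of X X^T / n, largest first
    coords = u[:, :k] * s[:k]                 # sample scores on the first k axes
    return coords, eigenvalues


# Example on random, unstructured genotypes: 50 samples, 1000 SNPs.
rng = np.random.default_rng(0)
coords, eigenvalues = pca_coordinates(rng.binomial(2, 0.3, size=(50, 1000)), k=3)
print(coords.shape)
```

    The plot axes are just the first two or three columns of `coords`. How many of the remaining eigenvalues carry real structure is a separate question, and that is the one the thread is about.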

  5. The Following User Says Thank You to Polako For This Useful Post:

    sgc2009 (2011-04-04)

  6. #3
    Established Member
    Race Realist Lemminkäinen's Avatar
    Last Online
    @
    Join Date
    2010-05-25
    Posts
    10,622
    Location
    Helsinki
    Gender
    Y-DNA
    I1-L258
    mtDNA
    H39
    Phenotype
    Appalachid
    Metaethnos
    Finnic-Baltic-Germanic
    Ethnicity
    Finnish
    Politics
    Vandalism in Rome
    Finland European Union

    Default

    Quote (originally posted by Polako):
    It's not really possible to make the error described above on a 2D or 3D plot.

    There are many issues with PCA-MDS plots, but they're more about correctly interpreting the data than the fact that the results are wrong.
    It is not a question of "can you make an error on the plots"; it is a question of "can the method make errors with respect to reality", even if you did everything right.

    As I understand it, "EIGENSOFT systematically overestimates the number of significant principal components" means that MDS plots build too many significant principal components; in other words, the method builds components that cause errors with respect to reality. Hence: "Furthermore, this overestimation is larger for samples of admixed individuals than for samples of unadmixed individuals." I can see certain reasons why admixed samples lead to additional components with less real meaning.

    In this context, even the 4th and higher dimensions do not generate particular errors, because this is an internal error of the method and not specific to certain dimensions. If you see something odd in higher dimensions, it is because you are seeing fewer, less significant components, not because those dimensions add extra errors.
    Last edited by Lemminkäinen; 2011-04-04 at 19:12.

  7. #4
    Polako

    ^ When using EIGENSOFT, that program will tell you which eigenvectors are significant, via what's called the Tracy-Widom statistics. So, for example, it'll say that the first 25 eigenvectors are good to use in the analysis for population stratification.
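    A rough sketch of the Tracy-Widom procedure as described for EIGENSOFT by Patterson, Price and Reich (2006). It takes the largest remaining eigenvalue, normalises it by the sum of the remaining ones, and centres and scales it with constants that depend on the number of samples m and markers n. The result is compared against the Tracy-Widom (beta = 1) distribution, then that eigenvalue is dropped and the process repeats. This is not smartpca's code. The quantiles below are approximate tabulated values, `n_markers` is taken as given (the paper also describes estimating an effective number of markers to allow for linkage disequilibrium), and the exact bookkeeping of m from step to step is an assumption.

```python
import numpy as np

# Approximate upper quantiles of the Tracy-Widom (beta = 1) distribution:
# a statistic above TW_95 / TW_99 corresponds roughly to p < 0.05 / p < 0.01.
TW_95, TW_99 = 0.9793, 2.0234


def tracy_widom_stats(eigenvalues, n_markers: float) -> list:
    """TW statistic for each eigenvalue in turn, largest first (sketch)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    lam = lam[lam > 1e-12]                      # ignore numerically zero eigenvalues
    stats = []
    for i in range(len(lam) - 1):
        tail = lam[i:]                          # eigenvalues not yet tested
        m = len(tail)
        ell = (m - 1) * tail[0] / tail.sum()    # normalised leading eigenvalue
        a, b = np.sqrt(n_markers - 1), np.sqrt(m)
        mu = (a + b) ** 2 / n_markers
        sigma = (a + b) / n_markers * (1 / a + 1 / b) ** (1 / 3)
        stats.append(float((ell - mu) / sigma))
    return stats


def count_significant(eigenvalues, n_markers: float, threshold: float = TW_99) -> int:
    """Number of leading eigenvalues that pass the test before the first failure."""
    count = 0
    for x in tracy_widom_stats(eigenvalues, n_markers):
        if x <= threshold:
            break
        count += 1
    return count
```

    If the study cited at the top of the thread is right, a count produced this way is better treated as an upper bound than as the number of axes to use.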

    However, it looks like these estimates are too high, so instead of 25, only 15 or so eigenvectors should be considered. So if you only use 3 to make a 3D plot, then there's no drama... unless, of course, the audience can't read the plot in the context it was designed for.

    This is an issue for someone like Dienekes, who often uses more than 10 dimensions to find population structure. Basically, he might be overdoing it, and lumping together groups and individuals that shouldn't be lumped together. But I don't really know, because I prefer 3D plots that I can spin around myself.

  8. #5
    Lemminkäinen

    Quote (originally posted by Polako):
    ^ When using EIGENSOFT, that program will tell you which eigenvectors are significant, via what's called the Tracy-Widom statistics. So, for example, it'll say that the first 25 eigenvectors are good to use in the analysis for population stratification.

    However, it looks like these estimates are too high, so instead of 25, only 15 or so eigenvectors should be considered. So if you only use 3 to make a 3D plot, then there's no drama... unless, of course, the audience can't read the plot in the context it was designed for.

    This is an issue for someone like Dienekes, who often uses more than 10 dimensions to find population structure. Basically, he might be overdoing it, and lumping together groups and individuals that shouldn't be lumped together. But I don't really know, because I prefer 3D plots that I can spin around myself.
    Yes, this is true, but it is a different thing. This study speaks about components, not dimensions; components are inside dimensions. Dienekes' mistake is that he believes adding dimensions corrects errors in components. I saw it in his writing. That is absolutely wrong.

    ---------- Post added 2011-04-04 at 21:42 ----------

    Then, speaking about the practical issues, where I have no experience: I think you are right that we don't need tens of dimensions; just three is good. But what can be done to avoid the internal methodological errors? I really hesitate to say anything, because I would need testing to see the effects. But I suppose we should avoid overly large ethnic sample groups, especially homogeneous ones, because this would lead to exactly what the study cited above warns about: too many significant components. Of course there are special cases such as relatives, but overly heavy ethnic groups do the same. This is only an improvement, though, not a final solution.
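    The practical suggestion above (cap the size of large, homogeneous groups and remove relatives before running PCA) could be sketched as a pre-filtering step. Both helpers are hypothetical. The group labels and the list of related pairs are assumed to come from elsewhere, for example a separate kinship analysis. As the post itself says, whether capping actually reduces the overestimation is untested here.

```python
import random
from collections import defaultdict


def cap_group_sizes(samples, max_per_group: int, seed: int = 0):
    """samples: list of (sample_id, group_label); keep at most max_per_group per label."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)
    kept = []
    for group, ids in by_group.items():
        if len(ids) > max_per_group:
            ids = rng.sample(ids, max_per_group)   # random subset of an over-large group
        kept.extend((i, group) for i in ids)
    return kept


def drop_relatives(sample_ids, related_pairs):
    """Greedily drop the second member of each related pair whose members are both still kept."""
    removed = set()
    for a, b in related_pairs:
        if a not in removed and b not in removed:
            removed.add(b)
    return [s for s in sample_ids if s not in removed]


# Example: 100 Finnish and 10 Polish samples, capped at 20 per group.
samples = [(f"fin{i}", "Finnish") for i in range(100)] + [(f"pol{i}", "Polish") for i in range(10)]
print(len(cap_group_sizes(samples, 20)))
```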
    Last edited by Lemminkäinen; 2011-04-04 at 19:46.