LD Clusters (TagSNPs)
Linkage disequilibrium (LD) between polymorphic sites in a locus was analyzed to identify "clusters" of highly correlated sites, based on the r² LD statistic. Sites were binned into sets of highly informative markers to minimize redundant data. These data are useful for developing a minimal set of SNPs that could be used for large-scale genotyping of similar sample populations.
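As an illustration of how sites can be binned by pairwise r², the following is a simplified greedy sketch in Python. It is not necessarily the exact procedure used to generate the clusters shown here. The function name, the input layout (a dictionary of pairwise r² values), and the 0.8 threshold are all assumptions made for the example.

```python
def greedy_ld_bins(sites, r2, threshold=0.8):
    """Greedily group sites into bins of highly correlated markers.

    sites     -- list of site identifiers
    r2        -- dict mapping frozenset({a, b}) to the pairwise r2 value;
                 missing pairs are treated as r2 = 0 (an assumption)
    threshold -- hypothetical r2 cutoff; not taken from the text above
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(sites)
    bins = []
    while remaining:
        # For each unbinned site, list the other unbinned sites it is linked to.
        partners = {
            s: [t for t in remaining if t != s and linked(s, t)]
            for s in remaining
        }
        # Seed the next bin with the site that has the most partners
        # (ties go to the site listed first).
        seed = max(remaining, key=lambda s: len(partners[s]))
        members = [seed] + partners[seed]
        # A site can tag the bin if it is linked to every other member.
        tags = [
            s for s in members
            if all(linked(s, t) for t in members if t != s)
        ]
        bins.append({"members": members, "tags": tags})
        remaining = [s for s in remaining if s not in members]
    return bins


example_r2 = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.85,
    frozenset(("B", "C")): 0.50,
}
example_bins = greedy_ld_bins(["A", "B", "C", "D"], example_r2)
```

In this toy input, site A is linked to both B and C, so the three sites form one bin with A as its only tag; site D is linked to nothing and ends up in a bin by itself.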
The sequence context for each SNP is indicated by a two-letter code preceding the SNP reference sequence position. The first letter of the code indicates whether the SNP lies in (U)nique sequence or (R)epeat-containing sequence. This information is important because genotyping assays are generally easier to design for unique sequence.
The second letter of the code indicates the genomic context:
(F)lanking region, 5' or 3' (U)TR, (I)ntron, (S)ynonymous cSNP, or (N)onsynonymous cSNP.
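The two-letter code can be decoded with a small lookup, sketched below in Python. The function and dictionary names are hypothetical, and the sketch handles only the two-letter code itself, since the exact way the code is joined to the reference sequence position is not specified here.

```python
# First letter: sequence context.
SEQUENCE_CONTEXT = {
    "U": "unique sequence",
    "R": "repeat-containing sequence",
}

# Second letter: genomic context.
GENOMIC_CONTEXT = {
    "F": "flanking region",
    "U": "5' or 3' UTR",
    "I": "intron",
    "S": "synonymous cSNP",
    "N": "nonsynonymous cSNP",
}


def decode_context(code):
    """Return (sequence context, genomic context) for a two-letter code."""
    if len(code) != 2:
        raise ValueError(f"expected a two-letter code, got {code!r}")
    first, second = code[0].upper(), code[1].upper()
    if first not in SEQUENCE_CONTEXT:
        raise ValueError(f"unknown sequence context letter: {first!r}")
    if second not in GENOMIC_CONTEXT:
        raise ValueError(f"unknown genomic context letter: {second!r}")
    return SEQUENCE_CONTEXT[first], GENOMIC_CONTEXT[second]
```

For example, decode_context("UI") describes a SNP in unique sequence within an intron, while decode_context("RU") describes a SNP in repeat-containing sequence within a UTR. Note that "U" means different things in the first and second positions.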
See Carlson et al., Am. J. Hum. Genet., 74:106-120, 2004