Data Download


All PGA Variation Data

Bulk Download of All Variation Data Files

WARNING: This is a very large file and will take several minutes to download. The file is a compressed, "tarred" Unix file containing the entire directory of text data files. These are the same text data files that appear in the data pages for each candidate gene on our Finished Genes List. Please see our Usage Statement if this work is used in a publication. A complete description of the contents of each of these files is found in the README.txt file.
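
As a rough sketch of how the downloaded archive might be inspected before unpacking it, Python's standard `tarfile` module can list its members and read README.txt directly. The helper names `list_data_files` and `read_member` are hypothetical, and because this page does not state the archive's file name or compression type, the mode `"r:*"` is used so that `tarfile` detects the compression itself.

```python
import tarfile


def list_data_files(archive_path):
    """Return the names of the regular files stored in a tar archive."""
    # "r:*" lets tarfile detect gzip, bzip2, or xz compression on its own.
    with tarfile.open(archive_path, mode="r:*") as archive:
        return [m.name for m in archive.getmembers() if m.isfile()]


def read_member(archive_path, member_name, encoding="utf-8"):
    """Return the text of one archive member (for example README.txt)."""
    with tarfile.open(archive_path, mode="r:*") as archive:
        handle = archive.extractfile(member_name)
        if handle is None:
            # extractfile returns None for members that are not regular files.
            raise ValueError(f"{member_name} is not a regular file")
        return handle.read().decode(encoding)
```

Listing the members first is a cheap way to find where README.txt sits in the directory tree without extracting the whole archive.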


Download of Variation Data (Single File)

Global Prettybase Files

This is a tab-delimited text file in our "prettybase" format, which describes all SNP sites discovered by the SeattleSNPs PGA. The format of this file is:

Line format:
<chromosome position-HUGO_NAME-chromosome> <PGA Sample ID> <Allele1> <Allele2>

Example: 74772592-PLAU-10 D001 G T

The 'chromosome position' is generated by mapping to the most recent genome assembly available from the UCSC Genome Assembly.
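
A minimal sketch of reading one prettybase line in Python, assuming the field order shown in the line format above (position, HUGO name, chromosome, joined by hyphens). The `PrettybaseRecord` type and `parse_prettybase_line` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class PrettybaseRecord(NamedTuple):
    position: int
    gene: str
    chromosome: str
    sample_id: str
    allele1: str
    allele2: str


def parse_prettybase_line(line):
    """Parse one prettybase line into a PrettybaseRecord."""
    # Split on any whitespace so tab- and space-separated copies both parse.
    site, sample_id, allele1, allele2 = line.split()
    # Position is the text before the first hyphen and chromosome the text
    # after the last one, so a hyphen inside the gene name is tolerated.
    position, rest = site.split("-", 1)
    gene, chromosome = rest.rsplit("-", 1)
    return PrettybaseRecord(int(position), gene, chromosome,
                            sample_id, allele1, allele2)


record = parse_prettybase_line("74772592-PLAU-10\tD001\tG\tT")
print(record.gene, record.chromosome, record.position)  # PLAU 10 74772592
```

The chromosome is kept as a string rather than an integer so that X and Y can be represented in the same field.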

Download of PGA Variation Data by Chromosome

These are tab-delimited text files in our "prettybase" format, which describe all SNP sites discovered by the SeattleSNPs PGA, separated into one file per chromosome. The format of these files is:

Line format:
<chromosome position-HUGO_NAME-chromosome> <PGA Sample ID> <Allele1> <Allele2>

Example: 74772592-PLAU-10 D001 G T

The chromosome position is generated by mapping to Genome Build 36 (hg18) of the UCSC Genome Assembly:

Chromosome 1
Chromosome 2
Chromosome 3
Chromosome 4
Chromosome 5
Chromosome 6
Chromosome 7
Chromosome 8
Chromosome 9
Chromosome 10
Chromosome 11
Chromosome 12
Chromosome 13
Chromosome 14
Chromosome 15
Chromosome 16
Chromosome 17
Chromosome 18
Chromosome 19
Chromosome 20
Chromosome 21
Chromosome 22
X Chromosome
Y Chromosome

PGA SIFT/PolyPhen Data

Putative changes in a candidate gene's protein function were assessed by running the nonsynonymous coding SNPs (cSNPs) for each gene through both SIFT and PolyPhen. In general, each nonsynonymous amino acid change is analyzed in the context of evolutionarily similar proteins to estimate the likelihood that the change affects protein function, and is then statistically classified. SIFT classifies each coding SNP as tolerant or intolerant; PolyPhen classifies it as benign, possibly damaging, or probably damaging.

Combined SIFT/PolyPhen Data for PGA Nonsynonymous SNPs
Combined SIFT/PolyPhen Data for PGA Nonsynonymous SNPs (Intolerant or Potentially Damaging)
SIFT Data for PGA Nonsynonymous SNPs (Potentially Intolerant)
PolyPhen Data for PGA Nonsynonymous Genes (Potentially Damaging)

Download of dbSNP rs IDs For All Variations (Single File)


This is a tab-delimited text file that lists the NCBI dbSNP rs IDs as of June 4, 2007. In addition, the chromosome positions for UCSC Browser builds hg17 and hg18 are given.

Following several header lines, the columns are:

<gene name> <PGA local ID> <chromosome number> <hg17 position> <hg18 position> <rs ID>

Example: sftpb    SFTPB-002233    chr2    85805543    85747396    rs3024799
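
The column layout above can be read with Python's standard `csv` module, as in the following sketch. The layout of the header lines is not described on this page, so the code assumes (and this is only an assumption) that any line lacking exactly six tab-separated fields with numeric hg17 and hg18 positions is a header and can be skipped; rows with a missing position would be skipped too. The column keys and the function name `read_rs_id_rows` are hypothetical.

```python
import csv
import io

# Hypothetical keys for the six columns described above.
COLUMNS = ["gene_name", "pga_local_id", "chromosome",
           "hg17_position", "hg18_position", "rs_id"]


def read_rs_id_rows(handle):
    """Yield one dict per data row of the tab-delimited rs ID file."""
    for fields in csv.reader(handle, delimiter="\t"):
        # Assumption: only data rows have six fields with numeric positions.
        if len(fields) != 6:
            continue
        if not (fields[3].isdigit() and fields[4].isdigit()):
            continue
        row = dict(zip(COLUMNS, fields))
        row["hg17_position"] = int(row["hg17_position"])
        row["hg18_position"] = int(row["hg18_position"])
        yield row


sample = io.StringIO(
    "some header text\n"
    "sftpb\tSFTPB-002233\tchr2\t85805543\t85747396\trs3024799\n"
)
rows = list(read_rs_id_rows(sample))
print(rows[0]["rs_id"], rows[0]["hg18_position"])  # rs3024799 85747396
```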