
Exercise: Genotype to Phenotype


In this exercise you will reconstruct the past by analysing the DNA of the first ancient human genome to be sequenced: that of a Paleo-Eskimo, dug out from permafrost in northern Greenland. You will analyse the genome to find single nucleotide polymorphisms (SNPs), differences in single DNA base pairs that exist between individual genomes and that may act as markers of an individual's physical traits.

Rasmussen et al. [1] sequenced the genome of a man from the Saqqaq culture, using DNA from hair preserved in permafrost in Greenland, and analysed the genome for SNPs.

Figure: (a) A short stretch of human DNA that is a marker for normal earwax. (b) The analogous DNA from the Saqqaq individual, with a SNP in which a C in the lower strand has been replaced by a T (C, G, T and A denote the four kinds of DNA base).

This SNP shows that the Saqqaq man had dry earwax. Rasmussen and colleagues identified other SNPs indicating that the ancient human had, among other things, brown eyes, non-white skin, thick dark hair and an increased susceptibility to baldness.

You will start by searching for physical traits (or diseases) that are associated with SNPs. Each SNP has a unique RefSNP accession ID (rs number) and a unique location (chromosome and position). Our Solexa sequencing project produced around 3 billion sequence reads for the Saqqaq genome, which have been mapped to the human reference genome. You can take any position on a chromosome and use the script get_saqqaq.py to retrieve the alignment around that position. From the alignment you can do a manual "SNP calling": determining which allele the Saqqaq genome has at a particular location and how that fits the known SNP-phenotype associations.

A step by step example - the exciting story about wet vs dry earwax

From the NCBI Coffee Break Bookshelf:
Don't put anything smaller than your elbow in your ear: the genetics of ear wax
Recently, an exciting genetic discovery was made in the field of ear wax. It appears that a change in a single nucleotide of your DNA can determine whether your ear wax is wet or dry. This marks the first time that a single-nucleotide polymorphism (SNP) has been found to determine a visible genetic trait.
  • Point your browser to SNPedia (http://www.snpedia.com) and search for earwax
  • Read the full earwax story (NCBI Coffee Break introduction) and note which change determines the type of human earwax.
  • Find the corresponding RefSNP accession ID (rs number) for this SNP. Ignore the SNP location given on SNPedia, because it refers to a different human reference genome (build 37.1) than the one used in the Saqqaq study. Instead, follow the dbSNP link, scroll down to "Integrated maps" and look up the position on the line mentioning build 36.3 and ref_assembly (chr16, position 46815699).
  • log on to organism.cbs.dtu.dk using your user-id and password
  • from organism: ssh -Y life (the software we will use in this exercise is installed on that machine)
  • retrieve the alignment around this particular SNP position
get_saqqaq.py chr16 46815699 | less -S
get_saqqaq.py is a script we wrote for quickly browsing through the assembled reads. If you have spare time and some Unix experience, feel free to look through the script. It uses the standard Unix tool awk together with Python to extract the mapped reads at the relevant positions. Notice that this is reasonably quick despite the rather large data file.
 less `which get_saqqaq.py` 
  • The get_saqqaq.py command retrieves the alignment on chromosome 16 at position 46815699 plus 10 bp flanking regions (this can take a while).
  • The line of interest is:
    chr16   46815699        C       39      @tTtttTTTTtTTTtTtTTTTtttTTTTTTttttTtTttT        9,58,7,7,7,51,49,49,49,1,46,45,45,4,42,30,39,38,38,36,37,39,19,30,30,30,30,... 

The different fields mean:

    1. chromosome
    2. position
    3. base in the reference genome
    4. depth
    5. read bases. The fifth column always starts with ‘@’. In this column, read bases identical to the reference are shown as a comma ‘,’ or dot ‘.’, and read bases different from the reference as letters. A comma or an upper-case letter indicates that the base comes from a read aligned on the forward strand, while a dot or a lower-case letter indicates the reverse strand.
    6. base qualities
  • This tells us that the reference genome has C and the Saqqaq genome is homozygous for T. What phenotype (what kind of earwax) does this indicate for Inuk? Can we say anything about his ancestry?
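
The manual SNP calling above can also be sketched in a few lines of Python. This is only an illustration and not part of get_saqqaq.py: the function names parse_pileup_line and count_alleles are made up for this example, and the code assumes the whitespace-separated columns described above. It tallies the read bases in the fifth column of the earwax line, leaving out the truncated quality column.

```python
from collections import Counter


def parse_pileup_line(line):
    """Split one pileup line into its first five whitespace-separated fields."""
    chrom, pos, ref, depth, bases = line.split()[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref.upper(),
        "depth": int(depth),
        # The read-base column always starts with '@'; drop that marker.
        "bases": bases[1:] if bases.startswith("@") else bases,
    }


def count_alleles(ref, bases):
    """Count observed alleles, pooling forward- and reverse-strand reads."""
    counts = Counter()
    for symbol in bases:
        if symbol in ",.":              # read base identical to the reference
            counts[ref] += 1
        elif symbol.upper() in "ACGT":  # read base differing from the reference
            counts[symbol.upper()] += 1
    return counts


# First five columns of the earwax line shown above (quality column omitted).
line = "chr16 46815699 C 39 @tTtttTTTTtTTTtTtTTTTtttTTTTTTttttTtTttT"
record = parse_pileup_line(line)
counts = count_alleles(record["ref"], record["bases"])
print(record["ref"], record["depth"], dict(counts))
```

For the earwax position this prints C 39 {'T': 39}: the reference base is C, and all 39 reads covering the position carry T, on both strands.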

Preparing the physical traits

Assemble in groups and discuss which physical traits (or diseases) you think could be associated with single nucleotide polymorphisms. Search for these traits on SNPedia and see whether any SNP association data exists.

If you cannot find physical traits with enough data for Inuk, try hair colour, eye colour or tooth shape.

Check what Inuk had

Using the SNP information for your chosen traits, look up each position in the Saqqaq genome assembly, do the manual "SNP calling" and note which variant (if any) Inuk had.

Remember the discussion during the lecture earlier today: what are reasonable criteria for SNPs?
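
One way to make such criteria explicit is to write them down as thresholds. The sketch below is only an illustration: the function name call_genotype and the default values (at least 10 reads covering the position, at least 20% of the reads supporting an allele) are assumptions chosen for this example, not the criteria from the lecture or from Rasmussen et al.

```python
def call_genotype(counts, min_depth=10, min_allele_fraction=0.2):
    """Return a two-letter genotype such as 'TT' or 'CT', or None if no call.

    counts maps each observed base to the number of reads carrying it.
    Both thresholds are hypothetical defaults for illustration only.
    """
    total = sum(counts.values())
    if total < min_depth:
        return None  # too few reads to trust any call
    supported = [
        allele for allele, n in counts.items()
        if n / total >= min_allele_fraction
    ]
    if len(supported) == 1:
        return supported[0] * 2            # homozygous
    if len(supported) == 2:
        return "".join(sorted(supported))  # heterozygous
    return None  # no allele, or more than two alleles, passed the threshold


print(call_genotype({"T": 39}))           # every read carries T
print(call_genotype({"C": 10, "T": 12}))  # two well-supported alleles
print(call_genotype({"T": 3}))            # depth below the threshold
```

With these assumed thresholds, a single stray read (for example 1 C among 30 T) is treated as a sequencing error rather than as evidence of a second allele. Whether that is appropriate for your SNP is exactly the question to discuss.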

Compare amongst populations

Search for the same rs number in dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP) and note the distribution of this SNP among the following populations (scroll down to the population diversity section):

  • HapMap-CEU European
  • HapMap-HCB Chinese
  • HapMap-JPT Japanese
  • HapMap-YRI African

Which population does the Saqqaq genotype fit best?

More complex traits

Only a few phenotypes are determined by a single SNP (like the earwax type). Most depend on multiple SNPs. In SNPedia such a combination is called a genoset, defined as the combination of alleles at 2 or more distinct SNP locations. Browse through SNPedia's genosets, settle on one or two phenotypes, diseases or conditions, and find out what our friend Inuk had.
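
Checking a genoset amounts to checking several SNP calls together. The following is a minimal sketch, assuming each SNP has already been called by hand as a two-letter genotype. The function name matches_genoset is made up for this example, and the rs identifiers and genotypes are placeholders that do not correspond to any real SNPedia genoset.

```python
def matches_genoset(genoset, called):
    """Return True or False, or None when a required SNP could not be called.

    genoset maps each rs identifier to the set of genotypes it accepts.
    called maps rs identifiers to the genotypes read off the alignment.
    """
    undetermined = False
    for rs_id, accepted_genotypes in genoset.items():
        genotype = called.get(rs_id)
        if genotype is None:
            undetermined = True  # no usable call at this SNP
        elif genotype not in accepted_genotypes:
            return False         # one mismatch rules the genoset out
    return None if undetermined else True


# Placeholder identifiers and genotypes, NOT real SNPedia data.
example_genoset = {"rsEXAMPLE1": {"TT"}, "rsEXAMPLE2": {"AG", "GG"}}
manual_calls = {"rsEXAMPLE1": "TT", "rsEXAMPLE2": "AG"}
print(matches_genoset(example_genoset, manual_calls))
```

Returning None for an uncalled SNP keeps "does not have the genoset" separate from "cannot tell from the reads", which matters when coverage at a position is low.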


For at least three different traits or diseases and two genosets, report the following:

  1. the rs number
  2. the phenotypic effect of the SNP and implied zygosity (if known)
  3. Inuk's genotype, and how you interpreted the reads at this location
  4. Inuk's phenotype, if it is possible to determine
  5. the distribution of this SNP amongst the HapMap populations
  6. the genomic location of the SNP and possible functional consequences