Exercises For Week Five

This lab is due on Thursday, March 24th

This lab exercise is a bit more ambitious and far-ranging than what you have seen before. In exchange, I am asking you to start using the tools we have been learning about to address a set of specific (and, I hope, interesting) biological questions.

The focus of the exercise is the evolution of snake venoms.

Most of you realize that many snakes are venomous, and this is an opportunity to explore how this diverse and very successful group of vertebrates evolved a heterogeneous set of proteins that are critical to their evolutionary success.

You should probably begin by doing a little bit of reading here and here. These articles provide some context for the questions you will be addressing.

As the articles suggest, the evolution of snake venoms has taken place in a phylogenetically complex setting. Certain venom types appear to trace their ancestry to the last common ancestor of certain lizards and snakes; other venoms appear to emerge only in the lineage leading to contemporary snakes. In both cases, venoms evolve from existing proteins.

I would like you to focus your analysis on one of the venom families described in the readings (your choice).

Let's imagine that you've decided to focus on Kunitz-like venoms.

Your first step, of course, should be to read a little bit about what a Kunitz-like venom actually is (in fact, we are dealing with a specific type of protease inhibitor). Now that you are no longer operating in intellectual darkness, you realize that there are innumerable questions that we could ask about the evolution of Kunitz-like venoms.

I'd like us to concentrate on the following:
1) What forces are shaping the evolution of the particular family of venoms you are examining?
2) Is there any evidence that selection may be shaping the evolution of the family?
3) Are there any changes associated with becoming a venom that you can detect from an analysis of the sequences? In the case of Kunitz-like venoms, for example, what distinguishes the venom form from the other protease inhibitors to which it is related?
4) (optional) To what extent does the evolution of orthologs differ from the evolution of paralogs in your system?

The first step might be to go to NCBI and type in “Kunitz and snake”. When I do that, the search returns 83 hits, not all of which come from snakes. If you look at the right-hand column you will see two little windows, one entitled “filter your results” and the other entitled “top organisms”.
At this point, we might want to concentrate specifically on mRNA sequences, since they will likely provide us with the complete coding sequence, already spliced. If you click on mRNA, you will see that you now have 41 sequences. Now look at the “top organisms” window and expand it by selecting “more”: you now have a list of the organisms for which Kunitz-like sequences have been identified.
Note that some organisms will have several sequences associated with them. You should also note that apparently homologous sequences seem to have different names in different organisms; no big deal here. More importantly, however, certain organisms seem to have different versions of the sequence: paralogs. What this means is that you're going to have to be careful when you construct the data set for your analysis: you do not want to be comparing orthologs and paralogs carelessly.
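If you would rather script the search than click through the web page, here is a minimal sketch that only builds the query URL for NCBI's E-utilities esearch service. The function name build_esearch_url is my own, and I am assuming that the biomol_mrna[PROP] filter corresponds to the mRNA checkbox on the web page, so check the hit count against what you see in the browser. The database changes over time, so your counts will probably not match mine.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, mrna_only=True, retmax=100):
    """Return an esearch URL against the Nucleotide database (hypothetical helper)."""
    query = term
    if mrna_only:
        # ASSUMPTION: biomol_mrna[PROP] restricts hits to mRNA records,
        # like the "mRNA" filter on the web page. Verify before relying on it.
        query = f"({term}) AND biomol_mrna[PROP]"
    params = {"db": "nuccore", "term": query, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


url = build_esearch_url("Kunitz AND snake")
print(url)  # paste into a browser; the XML reply lists the matching record IDs
```

The sketch deliberately stops at the URL, so nothing here touches the network. Fetching and parsing the reply is left to you.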

How then should you go about this? How do you know, for instance, if mulgin-1 is the ortholog of tigerin-1?

One option might be to start out by constructing a phylogenetic hypothesis (tree) that includes all of the available sequences, as a way to get a glimpse of the structure of these data.

When I do that in MEGA (using the DNA sequence), where I first align the sequences and then use the alignment to construct a neighbor-joining tree, I note that with very few exceptions the paralogs in any of the species are more closely related to each other than any is to an ortholog in another species. Stated differently, it looks as though the gene duplications that gave rise to the paralogs occurred after speciation, or that gene conversion is homogenizing the paralogs within a species, or both. Either way, the result reassures me that I can select a single gene from each of the species represented for subsequent analysis and still be on relatively safe terrain.
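To make the "paralogs cluster within species" observation concrete, here is a rough sketch that computes pairwise p-distances (the proportion of sites that differ) from an alignment. This is not what MEGA does internally, just the simplest distance you could feed to a tree-building method. The sequences and names are made-up toy data, not real venom genes.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# TOY DATA (invented for illustration): two gene copies in each of two species.
aligned = {
    "speciesA_copy1": "ATGAAGTGCTTC",
    "speciesA_copy2": "ATGAAGTGTTTC",
    "speciesB_copy1": "ATGAGGTCCTTA",
    "speciesB_copy2": "ATGAGGTCCTTC",
}

for (name1, s1), (name2, s2) in combinations(aligned.items(), 2):
    print(f"{name1} vs {name2}: {p_distance(s1, s2):.3f}")
```

If the copies within a species are consistently closer to each other than to any copy in another species, you are seeing the same pattern I describe above.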

When I take my alignment, however, and try to translate it, I soon realize that the sequences are a bit of a mess, with stop codons everywhere. Maybe it would make more sense to search the protein database and extract Kunitz-like sequences in that form. I could then do the alignment and create a tree (note that you can do this in NCBI if you are just trying to get a first glimpse of the data). Do you actually reach different conclusions about orthologs and paralogs depending on what data you use (DNA or protein) for the analysis?
(A hint: if you look at a record as Protein, you will see a link along the right margin that says "Encoding mRNA". Try it.)
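One likely reason for stop codons everywhere is that the mRNA record includes untranslated sequence ahead of the start codon, so you end up translating in the wrong frame. That is my assumption here, not something I have checked for every record. The sketch below translates a sequence in all three forward frames using the standard genetic code, so you can see which frame gives a sensible protein. The toy sequence is invented.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate from the given frame offset (0, 1, or 2); unknown codons become X."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def stops_per_frame(dna):
    """Count stop codons (*) in each of the three forward reading frames."""
    return {frame: translate(dna, frame).count("*") for frame in range(3)}


# TOY DATA: two leader bases, a short open reading frame, two trailing bases.
toy = "GGATGAAGTGCTTCTAAGC"
for frame in range(3):
    print(frame, translate(toy, frame))
```

In this toy case only frame 2 starts with M and ends with a single stop, which is what a correctly framed coding sequence should look like.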

Eventually, you have to make a decision about what genes to include in your analysis. My advice is to follow some simple rules:
1) keep it simple: don't try to include more than about a dozen sequences, unless you have the patience to clean up the alignments.
2) make sure you're looking at complete sequences, and that the sequences are actually translated (watch out for pseudogenes).
3) check and recheck the alignment: your results are only as good as your alignment. You're looking for a start codon at the beginning, and an alignment that stays in frame.
4) a hint: look carefully at the GenBank entry, where the actual positions of the start and stop codons are indicated under the heading CDS. If you use BLAST to find homologs, you can specify that the search be limited to the sequence encompassed by the start and stop codons. It will save you a lot of trouble.
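Rules 2 through 4 can be turned into a quick sanity check. Here is a minimal sketch, assuming you have copied the CDS start and end positions (1-based, inclusive) from the GenBank record by hand. The function names extract_cds and check_cds are my own, and the toy sequence is invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_cds(mrna, start, end):
    """Slice out the coding region given 1-based, inclusive CDS coordinates."""
    return mrna[start - 1:end].upper()


def check_cds(cds):
    """Return a list of problems found in a coding sequence (empty list if none)."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons only, dropping any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("does not begin with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon (possible pseudogene or frame error)")
    return problems


# TOY DATA: pretend the record's CDS line reads 3..17.
mrna = "GGATGAAGTGCTTCTAAGC"
print(check_cds(extract_cds(mrna, 3, 17)))  # correct coordinates
print(check_cds(extract_cds(mrna, 1, 15)))  # wrong coordinates, out of frame
```

A sequence that passes all four checks is not guaranteed to be a functional gene, but one that fails any of them deserves a second look before it goes into your alignment.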

Once you have a good alignment (in frame, with no stop codons, including the real stop codon: REMOVE IT), you are ready to carry out the analyses that we have been learning about. At this point, I'd like you to decide what the question(s) is/are, and what analyses might allow you to address the questions. Some of the analyses are obvious: dN/dS, for instance, or codon-by-codon evidence of selection. But we might also want to know if there is codon bias, or what the matrix of amino acid substitutions might be, or what the transition/transversion ratio is. No single right answer here; just explain to me why you are doing what you are doing. And tell me something interesting that you learned from exploring the evolution of snake venoms.
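As an illustration of two of the simpler quantities mentioned above, here is a sketch that counts transitions and transversions between two aligned sequences and tallies codon usage in a single coding sequence. These are raw counts only: they are not corrected for multiple hits, and they are no substitute for the model-based estimates you get from the software we have been using. The sequences are invented toy data.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq1, seq2):
    """Count observed transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip identical sites, gaps, and ambiguous bases.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv


def codon_counts(cds):
    """Tally how often each codon appears in an in-frame coding sequence."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


# TOY DATA: two short aligned coding sequences, stop codons already removed.
seq_a = "ATGAAGTGCTTC"
seq_b = "ATGAGGTGTTTA"
ts, tv = ts_tv_counts(seq_a, seq_b)
print("transitions:", ts, "transversions:", tv)
print(codon_counts("ATGAAGAAGTGC"))
```

Comparing the codon counts for synonymous codons of the same amino acid is a first, informal look at codon bias.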