biopython@biopython.org
Re: [Biopython] Divergent sequence data set Peter Wed Nov 18 03:00:12 2009
On Wed, Nov 18, 2009 at 8:19 AM, Animesh Agrawal <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have been trying to develop a divergent sequence data set for a
> phylogenetic analysis. Do we have something in Biopython, where for a given
> set of sequences we can choose identity threshold to reduce redundancy in
> the dataset.
>
> Cheers,
>
> Animesh

Hi Animesh,

There are probably hundreds of ways to do this. I think you should consult
the literature for the best approach (in terms of the algorithm), or talk
to a phylogeneticist. Once you have an algorithm in mind, it can probably
be done with Python.

For example, you could do pairwise BLAST alignments (e.g. using the NCBI
standalone tools) or maybe pairwise Needleman-Wunsch global alignments
(e.g. using the EMBOSS needle tool) and construct a distance matrix in
terms of percentage identity.

Alternatively, you could build a rough phylogenetic tree (perhaps using NJ
if your starting dataset is very large), and use that to sample the nodes
to get a fairly uniform distribution w.r.t. the phylogenetic space.

These are just rough ideas - I am not a phylogenetics specialist.

I have a vague recollection that one of the sequence alignment tools
includes an option to do something like this for you... but I can't
remember the details.

Peter
_______________________________________________
Biopython mailing list - [EMAIL PROTECTED]
http://lists.open-bio.org/mailman/listinfo/biopython
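The pairwise-identity idea in the message above can be sketched in plain Python. This is only a rough illustration under several assumptions, not a recommended method: the function names (`global_align`, `percent_identity`, `reduce_redundancy`) are hypothetical, the scoring scheme (match +1, mismatch -1, linear gap -1) is arbitrary, percentage identity is taken as identical columns divided by alignment length (other definitions exist), and the filter is a simple greedy pass whose result depends on input order. For real data, an external aligner such as EMBOSS needle or BLAST would likely be faster and better parameterised.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, using '-' for gaps.
    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned columns as a percentage of the alignment length."""
    aln_a, aln_b = global_align(a, b)
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def reduce_redundancy(seqs, threshold=90.0):
    """Greedy filter: keep a sequence only if it is below `threshold`
    percent identity to every sequence already kept.

    `seqs` maps names to sequence strings; input order decides which
    member of a similar group survives.
    """
    kept = {}
    for name, seq in seqs.items():
        if all(percent_identity(seq, other) < threshold
               for other in kept.values()):
            kept[name] = seq
    return kept


if __name__ == "__main__":
    example = {
        "a": "ACGTACGTAC",
        "b": "ACGTACGTAA",  # one mismatch relative to "a"
        "c": "TTTTGGGGCC",
    }
    print(sorted(reduce_redundancy(example, threshold=80.0)))
```

Each comparison costs time proportional to the product of the two sequence lengths, and the filter may compare every pair, so this is only practical for small sets of short sequences. The threshold itself is a judgment call that depends on the analysis.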
- [Biopython] Divergent sequence data set Animesh Agrawal 2009/11/18
- Re: [Biopython] Divergent sequence data set Christian Schäfer 2009/11/18
- Re: [Biopython] Divergent sequence data set Peter 2009/11/18 <=