A Python library for doing fast, thread-safe computations with phylogenetic trees.
New for SuchTree v1.2
- Quartet topology tests provided by SuchTree.get_quartet_topology( a, b, c, d )
- Optimized, thread-safe bulk quartet topology tests provided by SuchTree.quartet_topologies( [N,4] )
- SuchTree now automatically detects and uses NEWICK strings for initialization
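For readers new to quartet tests, here is a minimal sketch of what a quartet topology is, using the four-point condition on pairwise distances. It is not SuchTree's implementation and does not call SuchTree; the function `quartet_topology`, the helper `toy_distance` and the toy distances are all hypothetical names and values made up for illustration.

```python
def quartet_topology(dist, a, b, c, d):
    """Return the pairing of four leaves implied by the four-point condition.

    dist is any callable returning the patristic distance between two leaves.
    """
    # The three possible unrooted pairings of four leaves.
    pairings = [
        ((a, b), (c, d)),
        ((a, c), (b, d)),
        ((a, d), (b, c)),
    ]
    # On a tree metric, the true pairing has the smallest sum of
    # within-pair distances. Ties (a star-like quartet) resolve to the
    # first pairing in the list, which is an arbitrary choice.
    return min(pairings, key=lambda p: dist(*p[0]) + dist(*p[1]))


# Toy distances for the tree ((A:1,B:1):2,(C:1,D:1):2); (made up).
toy = {
    frozenset("AB"): 2.0, frozenset("CD"): 2.0,
    frozenset("AC"): 6.0, frozenset("AD"): 6.0,
    frozenset("BC"): 6.0, frozenset("BD"): 6.0,
}


def toy_distance(x, y):
    return toy[frozenset((x, y))]


result = quartet_topology(toy_distance, "A", "C", "B", "D")
print(result)
```

Whatever order the four leaves are given in, the sketch groups A with B and C with D, because that pairing has the shortest combined within-pair distance.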
New for SuchTree v1.1
- Basic handling of node support values provided by SuchTree.get_support( node_id )
- Relative evolutionary divergence (RED)
- Bipartitions
- Node generators for in-order and preorder traversal
- Summary of leaf relationships via SuchTree.relationships()
You have a phylogenetic tree. You want to do some statistics with it. No
problem! There are lots of packages in Python that let you manipulate
phylogenies, like dendropy, scikit-bio and ete3. If your tree isn't too big
and your statistical method doesn't require too many traversals, you have a
lot of great options. If you're working with about a thousand taxa or fewer,
you should have no problem. You can forget about SuchTree and use a tree
package that has lots of cool features.
However, if you are working with trees that include tens of thousands, or
maybe even millions of organisms, you are going to run into problems. ete3,
dendropy and scikit-bio's TreeNode are all designed to give you lots of
flexibility. You can re-root trees, use different traversal schemes, attach
metadata to nodes, attach and detach nodes, splice sub-trees into or out of
the main tree, plot trees for publication figures and do lots of other useful
things. That power and flexibility comes with a price -- speed.
For trees of moderate size, it is possible to solve the speed issue by working with matrix representations of the tree. Unfortunately, most representations scale quadratically with the number of taxa in the tree. A distance matrix for a tree of 100,000 taxa will consume about 20GB of RAM. If your method performs sampling, then almost every operation will be a cache miss. Even if you have the RAM, it will be painfully slow.
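To make the quadratic scaling concrete, here is a back-of-the-envelope sketch. The storage layout is an assumption: the 20GB figure is consistent with a condensed (upper-triangle) matrix of 4-byte floats, while a full square matrix of 8-byte floats would be four times larger. The function name `matrix_bytes` is made up for this example.

```python
def matrix_bytes(n_taxa, bytes_per_value=4, condensed=True):
    """Bytes needed to store pairwise distances for n_taxa leaves."""
    if condensed:
        # Upper triangle only: one value per unordered pair of leaves.
        n_values = n_taxa * (n_taxa - 1) // 2
    else:
        # Full square matrix, including the diagonal and both triangles.
        n_values = n_taxa * n_taxa
    return n_values * bytes_per_value


n = 100_000
condensed_gb = matrix_bytes(n) / 1e9
full_gb = matrix_bytes(n, bytes_per_value=8, condensed=False) / 1e9

print(f"condensed float32 : {condensed_gb:.1f} GB")  # 20.0 GB
print(f"full float64      : {full_gb:.1f} GB")       # 80.0 GB
```

Doubling the number of taxa roughly quadruples either figure.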
Suppose you have more than one group of organisms, and you want to study the way their interactions have influenced their evolution. Now, you have several trees that link together to form a generalized graph. Oh no, not graph theory!
Calm yourself! SuchLinkedTrees has you covered. At the moment,
SuchLinkedTrees supports trees of two interacting groups, but work is
underway to generalize it to any number of groups. Like SuchTree,
SuchLinkedTrees is not intended to be a general-purpose graph theory
package. Instead, it leverages SuchTree to efficiently handle the
problem-specific tasks of working with co-phylogeny systems. It will load
your datasets. It will build the graphs. It will let you subset the graphs
using their phylogenetic or ecological properties. It will generate
weighted adjacency and Laplacian matrices of the whole graph or of subgraphs
you have selected. It will generate spectral decompositions of subgraphs if
spectral graph theory is your thing.
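If adjacency matrices, Laplacians and spectra are unfamiliar, the following sketch shows the general idea with plain numpy. It does not use the SuchLinkedTrees API, and it leaves out the phylogenetic weighting mentioned above: the `links` matrix is a made-up, unweighted table of interactions between the leaves of two trees.

```python
import numpy as np

# Hypothetical link matrix: rows are leaves of one tree (say, hosts),
# columns are leaves of the other (say, guests); 1 marks an interaction.
links = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 0, 1]], dtype=float)

n_rows, n_cols = links.shape
n_nodes = n_rows + n_cols

# Adjacency matrix of the bipartite graph: links only connect a row
# node to a column node, never two nodes from the same tree.
adjacency = np.zeros((n_nodes, n_nodes))
adjacency[:n_rows, n_rows:] = links
adjacency[n_rows:, :n_rows] = links.T

# Combinatorial graph Laplacian: degree matrix minus adjacency matrix.
degree = np.diag(adjacency.sum(axis=1))
laplacian = degree - adjacency

# The Laplacian is symmetric, so eigvalsh returns real eigenvalues in
# ascending order. The number of (near) zero eigenvalues equals the
# number of connected components in the graph.
eigenvalues = np.linalg.eigvalsh(laplacian)
n_components = int(np.sum(np.isclose(eigenvalues, 0.0)))

print(eigenvalues.round(3))
print("connected components :", n_components)
```

In this toy network the third row and third column interact only with each other, so the graph splits into two components and the spectrum has two zero eigenvalues.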
And, if that doesn't solve your problem, it will emit subgraphs as Graph
objects for use with the igraph network analysis package, or node and edge
data for building graphs in networkx. Now you can do even more things.
Maybe you want to get all crazy with some graph kernels? Well, now you can.
SuchTree is motivated by a simple observation. A distance matrix of
100,000 taxa is quite bulky, but the tree it represents can be made to fit
into about 7.6MB of RAM if implemented using only C primitives. This is
small enough to fit into L2 cache on many modern microprocessors. This comes
at the cost of traversing the tree for every calculation (about 16 hops from
leaf to root for a 100,000 taxa tree), but, as these operations all happen
on-chip, the processor can take full advantage of pipelining, speculative
execution and other optimizations available in modern CPUs.
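The trade described above (a compact tree, traversed on every query) can be sketched with two flat arrays. This is only an illustration of the idea, not SuchTree's actual data layout or code: the array names `parent` and `branch`, the node numbering and the toy tree are all assumptions made for this example.

```python
import numpy as np

# Toy tree ((A:1,B:1):2,(C:1,D:3):2); stored as two flat arrays.
# Node ids: 0=A, 1=B, 2=C, 3=D, 4=(A,B), 5=(C,D), 6=root
parent = np.array([4, 4, 5, 5, 6, 6, -1], dtype=np.int32)   # -1 marks the root
branch = np.array([1, 1, 1, 3, 2, 2, 0], dtype=np.float32)  # length to parent


def distance_to_root(node):
    """Sum branch lengths while hopping from node up to the root."""
    total = 0.0
    while parent[node] != -1:
        total += float(branch[node])
        node = int(parent[node])
    return total


def mrca(a, b):
    """Most recent common ancestor, found by walking both paths upward."""
    ancestors = set()
    while a != -1:
        ancestors.add(a)
        a = int(parent[a])
    while b not in ancestors:
        b = int(parent[b])
    return b


def distance(a, b):
    """Patristic distance via the distances to the root and to the MRCA."""
    m = mrca(a, b)
    return distance_to_root(a) + distance_to_root(b) - 2 * distance_to_root(m)


print(distance(0, 1))  # A to B
print(distance(0, 3))  # A to D
print(parent.nbytes + branch.nbytes, "bytes for", len(parent), "nodes")
```

Every query walks a handful of array entries rather than looking up a precomputed value, which is cheap as long as the arrays stay in cache.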
Here, we use SuchTree to compare the topology of two trees built
from the same 54,327 sequences using two methods : neighbor joining
and Morgan Price's FastTree approximate maximum likelihood algorithm.
Using one million randomly chosen pairs of leaf nodes, we look at the
patristic distances in each of the two trees, plot them against one
another, and compute correlation coefficients.
On an Intel i7-3770S, SuchTree completes the two million distance
calculations in a little more than ten seconds.
from SuchTree import SuchTree
import random
T1 = SuchTree( 'data/bigtrees/ml.tree' )
T2 = SuchTree( 'data/bigtrees/nj.tree' )
print( 'nodes : %d, leafs : %d' % ( T1.length, len(T1.leafs) ) )
print( 'nodes : %d, leafs : %d' % ( T2.length, len(T2.leafs) ) )
nodes : 108653, leafs : 54327
nodes : 108653, leafs : 54327
N = 1000000
v = list( T1.leafs.keys() )
pairs = []
for i in range(N) :
    pairs.append( ( random.choice( v ), random.choice( v ) ) )
%time D1 = T1.distances_by_name( pairs ); D2 = T2.distances_by_name( pairs )
CPU times: user 10.1 s, sys: 0 ns, total: 10.1 s
Wall time: 10.1 s
from scipy.stats import kendalltau, pearsonr
print( 'Kendall\'s tau : %0.3f' % kendalltau( D1, D2 )[0] )
print( 'Pearson\'s r : %0.3f' % pearsonr( D1, D2 )[0] )
Kendall's tau : 0.709
Pearson's r : 0.969
SuchTree depends on the following packages :
scipy
numpy
dendropy
cython
pandas
To install the current release, you can install from PyPI :
pip install SuchTree
If you install using pip, binary packages (wheels) are available for
CPython 3.6, 3.7, 3.8, 3.9, 3.10 and 3.11 on Linux x86_64 and on MacOS with
Intel and Apple silicon. If your platform isn't in that list, but it is
supported by cibuildwheel, please file an issue to request your platform!
I would be absolutely delighted if someone were actually running SuchTree
on an exotic embedded system or a mainframe.
To install the most recent development version :
git clone https://github.com/ryneches/SuchTree.git
cd SuchTree
./setup.py install
SuchTree will accept either a URL or a file path :
from SuchTree import SuchTree
T = SuchTree( 'test.tree' )
T = SuchTree( 'https://github.com/ryneches/SuchTree/blob/master/data/gopher-louse/gopher.tree' )
The available properties are :
- length : the number of nodes in the tree
- depth : the maximum depth of the tree
- root : the id of the root node
- leafs : a dictionary mapping leaf names to their ids
- leafnodes : a dictionary mapping leaf node ids to leaf names
The available methods are :
- get_parent : for a given node id or leaf name, return the parent id
- get_children : for a given node id or leaf name, return the ids of the child nodes (leaf nodes have no children, so their child node ids will always be -1)
- get_distance_to_root : for a given node id or leaf name, return the integrated phylogenetic distance to the root node
- mrca : for a given pair of node ids or leaf names, return the id of the nearest node that is an ancestor of both
- distance : for a given pair of node ids or leaf names, return the patristic distance between the pair
- distances : for an (n,2) array of pairs of node ids, return an (n) array of patristic distances between the pairs
- distances_by_name : for an (n,2) list of pairs of leaf names, return an (n) list of patristic distances between each pair
- dump_array : print out the entire tree (for debugging only! May produce pathologically gigantic output.)
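Several of these methods describe quantities that are tied together by a simple identity. Assuming a rooted tree in which the patristic distance is the sum of branch lengths along the path between two nodes, the pairwise distance can be written in terms of distances to the root and the most recent common ancestor. This is a general property of trees, not a statement about how SuchTree computes distances internally; the symbols r and d are shorthand introduced here.

```latex
d(a, b) = r(a) + r(b) - 2\, r\big(\operatorname{mrca}(a, b)\big)
```

Here r(x) stands for the quantity described under get_distance_to_root, and d(a, b) for the quantity described under distance. The path from a to b climbs to the common ancestor and back down, so the shared stretch from that ancestor to the root is counted twice in r(a) + r(b) and has to be subtracted twice.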
For analysis of ecological interactions, SuchTree is distributed
with a curated collection of several different examples from the
literature. Additionally, a collection of simulated interactions with
various properties, along with an annotated notebook of Python code
for generating them, is also included. Interactions are registered in
a JSON object (data/studies.json).
- gopher-louse Hafner, M.S. & Nadler, S.A. 1988. Phylogenetic trees support the coevolution of parasites and their hosts. Nature 332: 258-259.
- dove-louse Dale H. Clayton, Sarah E. Bush, Brad M. Goates, and Kevin P. Johnson. 2003. Host defense reinforces host–parasite cospeciation. PNAS.
- sedge-smut Escudero, Marcial. 2015. Phylogenetic congruence of parasitic smut fungi (Anthracoidea, Anthracoideaceae) and their host plants (Carex, Cyperaceae): Cospeciation or host-shift speciation? American journal of botany.
- fish-worm Maarten P. M. Vanhove, Antoine Pariselle, Maarten Van Steenberge, Joost A. M. Raeymaekers, Pascal I. Hablützel, Céline Gillardin, Bart Hellemans, Floris C. Breman, Stephan Koblmüller, Christian Sturmbauer, Jos Snoeks, Filip A. M. Volckaert & Tine Huyse. 2015. Hidden biodiversity in an ancient lake: phylogenetic congruence between Lake Tanganyika tropheine cichlids and their monogenean flatworm parasites, Scientific Reports.
These were originally collected by Enrico Rezende et al. :
Enrico L. Rezende, Jessica E. Lavabre, Paulo R. Guimarães, Pedro Jordano & Jordi Bascompte "Non-random coextinctions in phylogenetically structured mutualistic networks," Nature, 2007
- arr1 Arroyo, M.T.K., R. Primack & J.J. Armesto. 1982. Community studies in pollination ecology in the high temperate Andes of central Chile. I. Pollination mechanisms and altitudinal variation. Amer. J. Bot. 69:82-97.
- arr2 Arroyo, M.T.K., R. Primack & J.J. Armesto. 1982. Community studies in pollination ecology in the high temperate Andes of central Chile. I. Pollination mechanisms and altitudinal variation. Amer. J. Bot. 69:82-97.
- arr3 Arroyo, M.T.K., R. Primack & J.J. Armesto. 1982. Community studies in pollination ecology in the high temperate Andes of central Chile. I. Pollination mechanisms and altitudinal variation. Amer. J. Bot. 69:82-97.
- bahe Barrett, S. C. H., and K. Helenurm. 1987. The Reproductive-Biology of Boreal Forest Herbs.1. Breeding Systems and Pollination. Canadian Journal of Botany 65:2036-2046.
- cllo Clements, R. E., and F. L. Long. 1923, Experimental pollination. An outline of the ecology of flowers and insects. Washington, D.C., USA, Carnegie Institute of Washington.
- dihi Dicks, LV, Corbet, SA and Pywell, RF 2002. Compartmentalization in plant–insect flower visitor webs. J. Anim. Ecol. 71: 32–43
- dish Dicks, LV, Corbet, SA and Pywell, RF 2002. Compartmentalization in plant–insect flower visitor webs. J. Anim. Ecol. 71: 32–43
- dupo Dupont YL, Hansen DM and Olesen JM 2003 Structure of a plant-flower-visitor network in the high-altitude sub-alpine desert of Tenerife, Canary Islands. Ecography 26:301-310
- eol Elberling, H., and J. M. Olesen. 1999. The structure of a high latitude plant-flower visitor system: the dominance of flies. Ecography 22:314-323.
- eolz Elberling & Olesen unpubl.
- eski Eskildsen et al. unpubl.
- herr Herrera, J. 1988 Pollination relationships in southern Spanish Mediterranean shrublands. Journal of Ecology 76: 274-287.
- hock Hocking, B. 1968. Insect-flower associations in the high Arctic with special reference to nectar. Oikos 19:359-388.
- inpk Inouye, D. W., and G. H. Pyke. 1988. Pollination biology in the Snowy Mountains of Australia: comparisons with montane Colorado, USA. Australian Journal of Ecology 13:191-210.
- kevn Kevan P. G. 1970. High Arctic insect-flower relations: The interrelationships of arthropods and flowers at Lake Hazen, Ellesmere Island, Northwest Territories, Canada. Ph.D. thesis, University of Alberta, Edmonton, 399 pp.
- kt90 Kato, M., Kakutani, T., Inoue, T. and Itino, T. (1990). Insect-flower relationship in the primary beech forest of Ashu, Kyoto: An overview of the flowering phenology and the seasonal pattern of insect visits. Contrib. Biol. Lab., Kyoto, Univ., 27, 309-375.
- med1 Medan, D., N. H. Montaldo, M. Devoto, A. Mantese, V. Vasellati, and N. H. Bartoloni. 2002. Plant-pollinator relationships at two altitudes in the Andes of Mendoza, Argentina. Arctic Antarctic and Alpine Research 34:233-241.
- med2 Medan, D., N. H. Montaldo, M. Devoto, A. Mantese, V. Vasellati, and N. H. Bartoloni. 2002. Plant-pollinator relationships at two altitudes in the Andes of Mendoza, Argentina. Arctic Antarctic and Alpine Research 34:233-241.
- memm Memmott J. 1999. The structure of a plant-pollinator food web. Ecology Letters 2:276-280.
- moma Mosquin, T., and J. E. H. Martin. 1967. Observations on the pollination biology of plants on Melville Island, N.W.T., Canada. Canadian Field Naturalist 81:201-205.
- mott Motten, A. F. 1982. Pollination Ecology of the Spring Wildflower Community in the Deciduous Forests of Piedmont North Carolina. Doctoral Dissertation thesis, Duke University, Durham, North Carolina, USA; Motten, A. F. 1986. Pollination ecology of the spring wildflower community of a temperate deciduous forest. Ecological Monographs 56:21-42.
- mull McMullen 1993
- oflo Olesen unpubl.
- ofst Olesen unpubl.
- olau Olesen unpubl.
- olle Ollerton, J., S. D. Johnson, L. Cranmer, and S. Kellie. 2003. The pollination ecology of an assemblage of grassland asclepiads in South Africa. Annals of Botany 92:807-834.
- perc Percival, M. 1974. Floral ecology of coastal scrub in southeast Jamaica. Biotropica, 6, 104-129.
- prap Primack, R.B. 1983. Insect pollination in the New Zealand mountain flora. New Zealand J. Bot. 21, 317-333, AB.
- prca Primack, R.B. 1983. Insect pollination in the New Zealand mountain flora. New Zealand J. Bot. 21, 317-333. Cass
- prcg Primack, R.B. 1983. Insect pollination in the New Zealand mountain flora. New Zealand J. Bot. 21, 317-333. Craigieb.
- ptnd Petanidou, T. 1991. Pollination ecology in a phryganic ecosystem. Unp. PhD. Thesis, Aristotelian University, Thessaloniki.
- rabr Ramirez, N., and Y. Brito. 1992. Pollination Biology in a Palm Swamp Community in the Venezuelan Central Plains. Botanical Journal of the Linnean Society 110:277-302.
- rmrz Ramirez, N. 1989. Biología de polinización en una comunidad arbustiva tropical de la alta Guyana Venezolana. Biotropica 21, 319-330.
- schm Schemske, D. W., M. F. Willson, M. N. Melampy, L. J. Miller, L. Verner, K. M. Schemske, and L. B. Best. 1978. Flowering Ecology of Some Spring Woodland Herbs. Ecology 59:351-366.
- smal Small, E. 1976. Insect pollinators of the Mer Bleue peat bog of Ottawa. Canadian Field Naturalist 90:22-28.
- smra Smith-Ramírez C., P. Martinez, M. Nuñez, C. González and J. J. Armesto 2005 Diversity, flower visitation frequency and generalism of pollinators in temperate rain forests of Chiloé Island, Chile. Botanical Journal of the Linnean Society, 2005, 147, 399–416.
- bair Baird, J.W. 1980. The selection and use of fruit by birds in an eastern forest. Wilson Bulletin 92: 63-73.
- beeh Beehler, B. 1983. Frugivory and polygamy in birds of paradise. Auk, 100: 1-12.
- cacg Carlo et al. 2003. Avian fruit preferences across a Puerto Rican forested landscape: pattern consistency and implications for seed removal. Oecologia 134: 119-131
- caci Carlo et al. 2003. Avian fruit preferences across a Puerto Rican forested landscape: pattern consistency and implications for seed removal. Oecologia 134: 119-131
- caco Carlo et al. 2003. Avian fruit preferences across a Puerto Rican forested landscape: pattern consistency and implications for seed removal. Oecologia 134: 119-131
- cafr Carlo et al. 2003. Avian fruit preferences across a Puerto Rican forested landscape: pattern consistency and implications for seed removal. Oecologia 134: 119-131
- crom Crome, F.H.J. 1975. The ecology of fruit pigeons in tropical Northern Queensland. Australian Journal of Wildlife Research, 2: 155-185.
- fros Frost, P.G.H. 1980. Fruit-frugivore interactions in a South African coastal dune forest. Pages 1179-1184 in: R. Noring (ed.). Acta XVII Congressus Internationalis Ornithologici, Deutsche Ornithologen-Gesellschaft, Berlin.
- gen1 Galetti, M., Pizo, M.A. 1996. Fruit eating birds in a forest fragment in southeastern Brazil. Ararajuba, Revista Brasileira de Ornitologia, 4: 71-79.
- gen2 Galetti, M., Pizo, M.A. 1996. Fruit eating birds in a forest fragment in southeastern Brazil. Ararajuba, Revista Brasileira de Ornitologia, 4: 71-79.
- hamm Hammann, A. & Curio, B. 1999. Interactions among frugivores and fleshy fruit trees in a Philippine submontane rainforest
- hrat Jordano P. 1985. El ciclo anual de los paseriformes frugívoros en el matorral mediterráneo del sur de España: importancia de su invernada y variaciones interanuales. Ardeola, 32, 69-94.
- kant Kantak, G.E. 1979. Observations on some fruit-eating birds in Mexico. Auk, 96: 183-186.
- lamb Lambert F. 1989. Fig-eating by birds in a Malaysian lowland rain forest. J. Trop. Ecol., 5, 401-412.
- lope Tutin, C.E.G., Ham, R.M., White, L.J.T., Harrison, M.J.S. 1997. The primate community of the Lopé Reserve, Gabon: diets, responses to fruit scarcity, and effects on biomass. American Journal of Primatology, 42: 1-24.
- mack Mack, AL and Wright, DD. 1996. Notes on occurrence and feeding of birds at Crater Mountain Biological Research Station, Papua New Guinea. Emu 96: 89-101.
- mont Wheelwright, N.T., Haber, W.A., Murray, K.G., Guindon, C. 1984. Tropical fruit-eating birds and their food plants: a survey of a Costa Rican lower montane forest. Biotropica, 16: 173-192.
- ncor P. Jordano, unpubl.
- nnog P. Jordano, unpubl.
- sapf Noma, N. 1997. Annual fluctuations of sapfruits production and synchronization within and inter species in a warm temperate forest on Yakushima Island, Japan. Tropics, 6: 441-449.
- snow Snow, B.K., Snow, D.W. 1971. The feeding ecology of tanagers and honeycreepers in Trinidad. Auk, 88: 291-322.
- wes Silva, W.R., P. De Marco, E. Hasui, and V.S.M. Gomes, 2002. Patterns of fruit-frugivores interactions in two Atlantic Forest bird communities of South-eastern Brazil: implications for conservation. Pp. 423-435. In: D.J. Levey, W.R. Silva and M. Galetti (eds.) Seed dispersal and frugivory: ecology, evolution and conservation. Wallingford: CAB International.
- wyth Snow B.K. & Snow D.W. 1988. Birds and berries, Calton, England.
Special thanks to @camillescott and @pmarkowsky for their many helpful suggestions (and for their patience).