Calculate genetic flow (Fsp)
get_genetic_flow(fasta, locs, matrix = TRUE, pt)
Argument | Description |
---|---|
fasta | an ape DNAbin object (e.g. from a FASTA file of SNPs), read in using read.fasta |
locs | a named vector of locations of isolates (e.g. facility of isolation), with the name being the sample ID |
matrix | whether to output symmetric matrix (TRUE; default) or long form (FALSE) |
pt | a named vector of patients each isolate originated from, with the name being the sample ID. If this information is unavailable, set pt = NULL. |
A facility x facility matrix of Fsp values when matrix = TRUE (the default), or the same values in long form when matrix = FALSE.
Genetic flow (Fsp) is described in Donker et al. 2017 (mgen.microbiologyresearch.org/pubmed/content/journal/mgen/10.1099/mgen.0.000113). Only bi-allelic sites are included when computing Fsp. Fsp values lie between 0 and 1, where lower values indicate more similar populations. Note that the current implementation of this function is fairly slow; see https://github.com/nateosher/RPTfast for a faster implementation.
if (FALSE) {
# This takes a long time to run right now!
locs <- metadata %>%
  dplyr::select(isolate_id, facility) %>%
  tibble::deframe()
pt <- metadata %>%
  dplyr::select(isolate_id, patient_id) %>%
  tibble::deframe()
facil_fsp <- get_genetic_flow(aln, locs, matrix = TRUE, pt)
}
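The locs and pt arguments are named vectors keyed by sample ID. As a minimal sketch of that shape, the following builds both from a small made-up metadata table using base R only. The data values are hypothetical and exist purely for illustration; the column names mirror those in the example above. It does not call get_genetic_flow itself.

```r
# Hypothetical toy metadata (illustrative values, not real data).
metadata <- data.frame(
  isolate_id = c("iso1", "iso2", "iso3"),
  facility   = c("A", "A", "B"),
  patient_id = c("p1", "p2", "p3"),
  stringsAsFactors = FALSE
)

# Named vectors: values are locations / patients, names are sample IDs.
locs <- stats::setNames(metadata$facility, metadata$isolate_id)
pt   <- stats::setNames(metadata$patient_id, metadata$isolate_id)

# Basic sanity checks on the inputs: names must exist, be unique,
# and agree between the two vectors.
stopifnot(!is.null(names(locs)), !anyDuplicated(names(locs)))
stopifnot(identical(names(locs), names(pt)))
```

Building the vectors this way gives the same kind of object as the dplyr::select() plus tibble::deframe() pipeline, without needing the pipe operator to be loaded.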