Calculate genetic flow (Fsp)

get_genetic_flow(fasta, locs, matrix = TRUE, pt)

Arguments

fasta

an ape DNAbin object (e.g. read from a fasta file of SNPs using ape::read.FASTA)

locs

a named vector of locations of isolates (e.g. facility of isolation), with the name being the sample ID

matrix

whether to output a symmetric matrix (TRUE; default) or long form (FALSE)

pt

a named vector of patients each isolate originated from, with the name being the sample ID. If this information is unavailable, set pt = NULL.

Value

a facility x facility matrix of Fsp values (or the same values in long form if matrix = FALSE)
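To illustrate how a symmetric matrix relates to a long-form table, here is a minimal base R sketch. The matrix values, the facility names (A, B, C) and the column names loc1, loc2 and fsp are hypothetical; the long-form output actually returned by get_genetic_flow may be laid out differently.

```r
# Toy symmetric facility x facility matrix (made-up values, not real Fsp output)
facils <- c("A", "B", "C")
fsp_mat <- matrix(
  c(0.0, 0.2, 0.5,
    0.2, 0.0, 0.7,
    0.5, 0.7, 0.0),
  nrow = 3, dimnames = list(facils, facils)
)

# Keep each unordered facility pair once by taking the upper triangle
idx <- which(upper.tri(fsp_mat), arr.ind = TRUE)

# Hypothetical long-form layout: one row per facility pair
fsp_long <- data.frame(
  loc1 = rownames(fsp_mat)[idx[, "row"]],
  loc2 = colnames(fsp_mat)[idx[, "col"]],
  fsp  = fsp_mat[idx]
)
fsp_long
```

Because the matrix is symmetric, the upper triangle carries all pairwise information, so the long form needs only one row per pair.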

Details

Genetic flow (Fsp) is described in Donker et al. 2017 (mgen.microbiologyresearch.org/pubmed/content/journal/mgen/10.1099/mgen.0.000113). Only bi-allelic sites are included when computing Fsp. Fsp values range from 0 to 1, with lower values indicating more similar populations. Note that the current implementation of this function is fairly slow; see https://github.com/nateosher/RPTfast for a faster implementation.
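As a rough illustration of what restricting to bi-allelic sites means, the base R sketch below keeps only the alignment columns with exactly two distinct bases. It is not the package's implementation: the toy alignment is made up, and counting only a, c, g and t (ignoring gaps and ambiguous bases) is an assumption.

```r
# Toy alignment: rows are isolates, columns are SNP positions (made-up data)
aln_mat <- rbind(
  s1 = c("a", "c", "g", "a"),
  s2 = c("a", "t", "g", "c"),
  s3 = c("a", "c", "t", "g"),
  s4 = c("a", "t", "g", "t")
)

# Count distinct unambiguous bases at each position
# (assumption: gaps and ambiguous bases are not counted as alleles)
n_alleles <- apply(aln_mat, 2, function(site) {
  length(unique(site[site %in% c("a", "c", "g", "t")]))
})

# Keep positions with exactly two alleles
biallelic <- aln_mat[, n_alleles == 2, drop = FALSE]
```

Here the first position is invariant and the last has four alleles, so only the two middle positions are retained.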

Examples

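if (FALSE) {
# This takes a long time to run right now!
# Assumes `metadata` (one row per isolate, with isolate_id, facility and
# patient_id columns) and `aln` (a DNAbin alignment) are already loaded.
library(magrittr) # provides %>%
# Named vectors: names are isolate IDs, values are the facility / patient
locs <- metadata %>% dplyr::select(isolate_id, facility) %>% tibble::deframe()
pt <- metadata %>% dplyr::select(isolate_id, patient_id) %>% tibble::deframe()
facil_fsp <- get_genetic_flow(aln, locs, matrix = TRUE, pt = pt)
}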