util module

ensure_dir

Check to make sure the supplied directory path does not exist, if so, create it.

usage: phylotoast.util.ensure_dir(d)
d:

It is the full path to a directory.

return:

Does not return anything, but creates a directory path if it doesn’t exist already.


file_handle

Takes either a file path or an open file handle, checks validity and returns an open file handle or raises an appropriate Exception.

usage: phylotoast.util.file_handle(fnh, mode='rU')
fnh:

It is the full path to a file, or open file handle.

mode:

The way in which this file will be used, for example to read or write or both. By default, file will be opened in rU mode.

return:
Returns an opened file for appropriate usage.

gather_categories

Find the user specified categories in the map and create a dictionary to contain the relevant data for each type within the categories. Multiple categories will have their types combined such that each possible combination will have its own entry in the dictionary.

usage: phylotoast.util.gather_categories(imap, header, categories=None)
imap:

The input mapping file data keyed by SampleID.

header:

The header line from the input mapping file. This will be searched for the user-specified categories.

categories:

The list of user-specified categories from the mapping file.

return:

A sorted dictionary keyed on the combinations of all the types found within the user-specified categories. Each entry will contain an empty DataCategory namedtuple. If no categories are specified, a single entry with the key ‘default’ will be returned.


parseFASTA

Parse the records in a FASTA-format file by first reading the entire file into memory.

usage: phylotoast.util.parseFASTA(fastaFNH)
fastaFNH:

The data source from which to parse the FASTA records. Expects the input to resolve to a collection that can be iterated through, such as an open file handle.

return:

FASTA records containing entries for id, description and data.


parse_map_file

Opens a QIIME mapping file and stores the contents in a dictionary keyed on SampleID (default) or a user-supplied one. The only required fields are SampleID, BarcodeSequence, LinkerPrimerSequence (in that order), and Description (which must be the final field).

usage: phylotoast.util.parse_map_file(mapFNH)
mapFNH:

Either the full path to the map file or an open file handle.

return:

A tuple of header line for mapping file and a map associating each line of the mapping file with the appropriate sample ID (each value of the map also contains the sample ID). An OrderedDict is used for mapping so the returned map is guaranteed to have the same order as the input file.


parse_taxonomy_table

Greengenes provides a file each OTU a full taxonomic designation. This method parses that file into a map with (key,val) = (OTU, taxonomy).

usage: phylotoast.util.parse_taxonomy_table(idtaxFNH)
idtaxFNH:

Either the full path to the map file or an open file handle.

return:

A map associating each OTU ID with the taxonomic specifier. An OrderedDict is used so the returned map is guaranteed to have the same order as the input file.


parse_unifrac

Parses the unifrac results file into a dictionary.

usage: phylotoast.util.parse_unifrac(unifracFN)
unifracFN:

The path to the unifrac results file.

return:

A dictionary with keys: ‘pcd’ (principle coordinates data) which is a dictionary of the data keyed by sample ID, ‘eigvals’ (eigenvalues), and ‘varexp’ (variation explained).


parse_unifrac_v1_8

Function to parse data from older version of unifrac file obtained from Qiime version 1.8 and earlier.

usage: phylotoast.util.parse_unifrac_v1_8(unifrac, file_data)
unifrac:

The path to the unifrac results file.

file_data

Unifrac data lines after stripping whitespace characters.

return:

A dictionary with keys: ‘pcd’ (principle coordinates data) which is a dictionary of the data keyed by sample ID, ‘eigvals’ (eigenvalues), and ‘varexp’ (variation explained).


parse_unifrac_v1_9

Function to parse data from newer version of unifrac file obtained from Qiime version 1.9 and later.

usage: phylotoast.util.parse_unifrac_v1_9(unifrac, file_data)
unifrac:

The path to the unifrac results file.

file_data

Unifrac data lines after stripping whitespace characters.

return:

A dictionary with keys: ‘pcd’ (principle coordinates data) which is a dictionary of the data keyed by sample ID, ‘eigvals’ (eigenvalues), and ‘varexp’ (variation explained).


split_phylogeny

Return either the full or truncated version of a QIIME-formatted taxonomy string.

usage: phylotoast.util.split_phylogeny(p, level='s')
p:

A QIIME-formatted taxonomy string: k__Foo; p__Bar; …

level:

The different level of identification are kingdom (k), phylum (p), class (c),order (o), family (f), genus (g) and species (s). The default level of identification is species.

return:

A QIIME-formatted taxonomy string up to the classification given by param level.


storeFASTA

Parse the records in a FASTA-format file by first reading the entire file into memory.

usage: phylotoast.util.storeFASTA(fastaFNH)
fastaFNH:

The data source from which to parse the FASTA records. Expects the input to resolve to a collection that can be iterated through, such as an open file handle.

return:

FASTA records containing entries for id, description and data.


write_map_file

Given a list of mapping items (in the form described by the parse_mapping_file method) and a header line, write each row to the given input file with fields separated by tabs.

usage: phylotoast.util.write_map_file(mapFNH, items, header)
mapFNH:

Either the full path to the map file or an open file handle.

items:

The list of row entries to be written to the mapping file.

header:

The descriptive column names that are required as the first line of the mapping file.

return:

None.