API

Command Line Interface

allfreqs

Calculate allele frequencies from the given input multialignment.

Input can be either a fasta or csv file of multialigned sequences. The reference sequence may be included as the first sequence of that file; if it is not, an additional reference sequence file is required, in either fasta or csv format.

allfreqs [OPTIONS] INPUT_FILE

Options

-o, --out <out>: Output filename [default: all_freqs.csv]

-r, --reference <reference>: Optional reference file (if not present in INPUT_FILE)

-a, --ambiguous: Show frequencies for ambiguous nucleotides too [default: False]

--version: Show the version and exit.

Arguments

INPUT_FILE: Required argument. The fasta or csv file containing the multialigned sequences.
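For scripted runs, the command line above can be assembled from Python before being launched. The following is a minimal sketch, not part of allfreqs: the helper name `build_allfreqs_command` and the file names are hypothetical, and only the options documented above are used.

```python
import shlex
from typing import List, Optional


def build_allfreqs_command(
    input_file: str,
    out: Optional[str] = None,
    reference: Optional[str] = None,
    ambiguous: bool = False,
) -> List[str]:
    """Assemble an argv list for the allfreqs CLI (hypothetical helper)."""
    cmd = ["allfreqs"]
    if out is not None:
        cmd += ["--out", out]              # -o, --out <out>
    if reference is not None:
        cmd += ["--reference", reference]  # -r, --reference <reference>
    if ambiguous:
        cmd.append("--ambiguous")          # -a, --ambiguous
    cmd.append(input_file)                 # INPUT_FILE (required)
    return cmd


# File names below are placeholders, not files shipped with allfreqs.
cmd = build_allfreqs_command(
    "sequences.fasta", out="freqs.csv", reference="ref.fasta"
)
print(shlex.join(cmd))

# To actually run the command (requires allfreqs to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list first and passing it to `subprocess.run` avoids shell quoting problems with file names that contain spaces.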

Python Module

class allfreqs.allfreqs.AlleleFreqs(multialg: allfreqs.classes.MultiAlignment, reference: allfreqs.classes.Reference, ambiguous: bool = False)

Class used to calculate allele frequencies from a multialignment.

Input can be either a fasta or csv file of multialigned sequences. The reference sequence may be included as the first sequence of that file; if it is not, an additional reference sequence file is required, in either fasta or csv format.

df: Convert sequences to the proper dataframe for further allele frequency calculations.

frequencies: Calculate allele frequencies for the 4 basic nucleotides, gaps and other (non-canonical) nucleotides.
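To make the idea of per-position allele frequencies concrete, here is a minimal standard-library sketch. It is not the allfreqs implementation: the toy alignment, the function name `position_frequencies` and the column labels `gap` and `oth` are assumptions chosen for illustration, and the library's actual output layout may differ.

```python
from collections import Counter

# Toy multialignment (illustrative data): all sequences share one length.
alignment = ["ACGT-", "ACGTA", "ATGNA", "ACG-A"]

CANONICAL = "ACGT"


def position_frequencies(seqs):
    """Return one dict of frequencies per alignment column."""
    n = len(seqs)
    rows = []
    # zip(*seqs) walks the alignment column by column.
    for column in zip(*seqs):
        counts = Counter(column)
        row = {base: counts.get(base, 0) / n for base in CANONICAL}
        gaps = counts.get("-", 0)
        known = sum(counts.get(base, 0) for base in CANONICAL) + gaps
        row["gap"] = gaps / n
        # Everything that is neither A, C, G, T nor a gap (e.g. N).
        row["oth"] = (n - known) / n
        rows.append(row)
    return rows


freqs = position_frequencies(alignment)
for position, row in enumerate(freqs, start=1):
    print(position, row)
```

Each row sums to 1, since every symbol in a column is counted as exactly one of the four bases, a gap, or "other".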

classmethod from_csv(sequences: str, reference: Optional[str] = None, ambiguous: bool = False, **kwargs)

Read a multialignment from a csv file.

If reference is not provided, the first sequence of the multialignment is assumed to be the reference sequence. If reference is provided, it must be a csv file containing the reference sequence. In both cases, the input csv file must contain exactly two columns, one for the sequence IDs and one for the sequences themselves; if it contains more, pass additional options to pandas.read_csv() to restrict which columns are read.

Parameters

sequences – input csv file with multialignment
reference – optional csv file with reference sequence
ambiguous – show frequencies for ambiguous nucleotides too [default: False]
**kwargs – additional options for pandas.read_csv()
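As an illustration of the two-column requirement, the sketch below writes a csv file that carries one extra column and then restricts the columns read through `usecols`, a standard `pandas.read_csv()` option. The file contents, the header names `id`, `sequence` and `notes`, and the wrapper `load_alignment` are assumptions made for this example; whether allfreqs expects a header row or specific column names is not stated here.

```python
import csv
import os
import tempfile

# Illustrative records: the first row plays the role of the reference.
records = [
    ("ref", "ACGTA", "reference sequence"),
    ("seq1", "ACGTA", "sample 1"),
    ("seq2", "ATG-A", "sample 2"),
]

work_dir = tempfile.mkdtemp()
csv_path = os.path.join(work_dir, "sequences.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["id", "sequence", "notes"])  # assumed header names
    writer.writerows(records)


def load_alignment(path):
    """Hypothetical wrapper; requires allfreqs to be installed."""
    from allfreqs.allfreqs import AlleleFreqs

    # usecols is forwarded to pandas.read_csv() via **kwargs, so only
    # the ID and sequence columns are read and "notes" is ignored.
    return AlleleFreqs.from_csv(path, usecols=["id", "sequence"])


# With allfreqs installed:
# af = load_alignment(csv_path)
```

Since no reference file is passed, the first sequence in the file would be taken as the reference.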

classmethod from_fasta(sequences: str, reference: Optional[str] = None, ambiguous: bool = False)

Read a multialignment from a fasta file.

If reference is not provided, the first sequence of the multialignment is assumed to be the reference sequence. If reference is provided, it must be a fasta file containing the reference sequence.

Parameters

sequences – input fasta file with multialignment
reference – optional fasta file with reference sequence
ambiguous – show frequencies for ambiguous nucleotides too [default: False]

to_csv(output_file: str = 'all_freqs.csv')

Write the resulting allele frequency dataframe to disk.

Parameters: output_file – output file name
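Putting the pieces together, a typical session reads a multialignment, computes the frequencies and writes them to disk. The sketch below is an assumption-laden illustration rather than an official recipe: the fasta contents and the wrapper `compute_and_save` are hypothetical, and it assumes `frequencies` is accessible as an attribute, as listed above.

```python
import os
import tempfile

# Illustrative alignment; the first record acts as the reference.
fasta_records = {"ref": "ACGTA", "seq1": "ACGTA", "seq2": "ATG-A"}

work_dir = tempfile.mkdtemp()
fasta_path = os.path.join(work_dir, "sequences.fasta")
with open(fasta_path, "w") as handle:
    for name, seq in fasta_records.items():
        handle.write(f">{name}\n{seq}\n")


def compute_and_save(fasta_file, output_file="all_freqs.csv"):
    """Hypothetical wrapper; requires allfreqs to be installed."""
    from allfreqs.allfreqs import AlleleFreqs

    # No reference file given: the first sequence is the reference.
    af = AlleleFreqs.from_fasta(fasta_file)
    # Write the allele frequency dataframe to disk.
    af.to_csv(output_file)
    return af.frequencies


# With allfreqs installed:
# frequencies = compute_and_save(fasta_path, os.path.join(work_dir, "freqs.csv"))
```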