API

Command Line Interface

allfreqs

Calculate allele frequencies from the given input multialignment.

Input can be either a fasta or csv file with multialigned sequences, which may or may not contain the reference sequence in the first position. If it does not, pass an additional reference sequence file (fasta or csv) with -r/--reference.
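The rule for locating the reference sequence can be sketched as follows. This is a minimal illustration, not the package's code: the helper name `split_reference` and the representation of an alignment as a list of `(id, sequence)` tuples are assumptions.

```python
from typing import List, Optional, Tuple

# An alignment is represented here as (sequence id, aligned sequence) pairs.
Record = Tuple[str, str]


def split_reference(
    records: List[Record], reference: Optional[List[Record]] = None
) -> Tuple[Record, List[Record]]:
    """Return (reference record, sequences to analyze).

    Hypothetical helper. Without a separate reference, the first record of
    the multialignment is taken as the reference; with one, every record in
    the multialignment is kept as a sample.
    """
    if reference is None:
        if not records:
            raise ValueError("empty multialignment")
        return records[0], records[1:]
    # Assumption: a separate reference file holds exactly one sequence.
    if len(reference) != 1:
        raise ValueError("reference must contain exactly one sequence")
    return reference[0], records


aln = [("ref", "ACGT"), ("s1", "ACGA"), ("s2", "AC-T")]
ref, samples = split_reference(aln)
print(ref[0], [sid for sid, _ in samples])  # ref ['s1', 's2']
```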

allfreqs [OPTIONS] INPUT_FILE

Options

-o, --out <out>

Output filename [default: all_freqs.csv]

-r, --reference <reference>

Reference sequence file (fasta or csv); needed only if INPUT_FILE does not contain the reference as its first sequence

-a, --ambiguous

Show frequencies for ambiguous nucleotides too [default: False]

--version

Show the version and exit.
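Ambiguous nucleotides are conventionally written with the IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V and N). The sketch below shows one way the -a/--ambiguous switch could change how alignment characters are bucketed; the helper name `classify` and the exact set of characters allfreqs treats as ambiguous are assumptions, not the package's documented behavior.

```python
# Standard IUPAC nucleotide ambiguity codes. Which of them allfreqs reports
# with --ambiguous is an assumption in this sketch.
IUPAC_AMBIGUOUS = frozenset("RYSWKMBDHVN")
CANONICAL = frozenset("ACGT")
GAP = "-"


def classify(base: str, ambiguous: bool = False) -> str:
    """Map one alignment character to a frequency category (hypothetical).

    With ambiguous=False, ambiguity codes are lumped into "other"; with
    ambiguous=True, each ambiguity code gets its own category.
    """
    b = base.upper()
    if b in CANONICAL:
        return b
    if b == GAP:
        return "gap"
    if ambiguous and b in IUPAC_AMBIGUOUS:
        return b
    return "other"


print([classify(c) for c in "aC-NRx"])
# ['A', 'C', 'gap', 'other', 'other', 'other']
print([classify(c, ambiguous=True) for c in "aC-NRx"])
# ['A', 'C', 'gap', 'N', 'R', 'other']
```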

Arguments

INPUT_FILE

Required argument


Python Module

class allfreqs.allfreqs.AlleleFreqs(multialg: allfreqs.classes.MultiAlignment, reference: allfreqs.classes.Reference, ambiguous: bool = False)

Class used to calculate allele frequencies from a multialignment.

The constructor expects an already loaded MultiAlignment and Reference. To build an instance from files, use from_fasta() or from_csv(): the input multialignment may or may not contain the reference sequence in the first position, and if it does not, an additional reference sequence file in the same format is needed.

df

Convert the sequences into a dataframe suitable for the allele frequency calculations.

frequencies

Calculate allele frequencies for the four canonical nucleotides (A, C, G, T), gaps, and other (non-canonical) nucleotides.
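A per-position frequency calculation of this kind can be sketched with the standard library alone. This is a hypothetical stand-in for AlleleFreqs.frequencies, not its implementation: it assumes the sequences are already aligned to equal length and that the reference is not counted. Iterating over columns with zip(*seqs) plays the role of the per-position layout that df presumably provides.

```python
from collections import Counter
from typing import Dict, List

CATEGORIES = ["A", "C", "G", "T", "gap", "other"]
CANONICAL = frozenset("ACGT")


def allele_frequencies(seqs: List[str]) -> List[Dict[str, float]]:
    """Relative frequency of each category at every alignment position.

    Hypothetical helper: assumes equal-length aligned sequences and that
    the reference sequence is not among them.
    """
    if not seqs:
        return []
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences are not all the same length")
    n = len(seqs)
    result = []
    # zip(*seqs) yields one tuple per alignment column (position).
    for column in zip(*seqs):
        counts = Counter()
        for base in column:
            b = base.upper()
            if b in CANONICAL:
                counts[b] += 1
            elif b == "-":
                counts["gap"] += 1
            else:
                counts["other"] += 1
        result.append({cat: counts[cat] / n for cat in CATEGORIES})
    return result


freqs = allele_frequencies(["ACGA", "AC-T", "ANGT"])
print(freqs[1]["C"], freqs[1]["other"])  # 0.6666666666666666 0.3333333333333333
```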

classmethod from_csv(sequences: str, reference: Optional[str] = None, ambiguous: bool = False, **kwargs)

Read a multialignment from a csv file.

If reference is not provided, the first sequence of the multialignment is assumed to be the reference sequence; otherwise, reference must be a csv file containing the reference sequence. In both cases, the input csv file must have exactly two columns, one with the sequence ids and one with the actual sequences; if it has more, pass additional options to pandas.read_csv() through **kwargs (for example, usecols) to restrict the columns read.

Parameters
  • sequences – path to the input csv file with the multialignment

  • reference – optional csv file with the reference sequence

  • ambiguous – show frequencies for ambiguous nucleotides too [default: False]

  • **kwargs – additional options for pandas.read_csv()
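The two-column layout and the reference rule above can be sketched with the standard csv module. This is not how allfreqs reads files (it delegates to pandas.read_csv()); the helper names `read_csv_records` and `load_csv_alignment`, and the assumption that files have no header row, are hypothetical. Selecting columns by index here mirrors what passing usecols through **kwargs would do.

```python
import csv
import io
from typing import List, Optional, TextIO, Tuple

Record = Tuple[str, str]  # (sequence id, aligned sequence)


def read_csv_records(
    handle: TextIO, id_col: int = 0, seq_col: int = 1
) -> List[Record]:
    """Read (id, sequence) pairs from a headerless CSV stream (hypothetical).

    id_col and seq_col pick two columns out of possibly more, much like
    passing usecols to pandas.read_csv().
    """
    records = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        records.append((row[id_col].strip(), row[seq_col].strip()))
    return records


def load_csv_alignment(
    seq_handle: TextIO, ref_handle: Optional[TextIO] = None
) -> Tuple[Record, List[Record]]:
    """Return (reference, samples), following the rule described for from_csv."""
    records = read_csv_records(seq_handle)
    if ref_handle is None:
        # No separate reference: the first sequence is the reference.
        return records[0], records[1:]
    return read_csv_records(ref_handle)[0], records


ref, samples = load_csv_alignment(io.StringIO("ref,ACGT\ns1,ACGA\ns2,AC-T\n"))
print(ref, len(samples))  # ('ref', 'ACGT') 2
```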

classmethod from_fasta(sequences: str, reference: Optional[str] = None, ambiguous: bool = False)

Read a multialignment from a fasta file.

If reference is not provided, the first sequence of the multialignment is assumed to be the reference sequence; otherwise, reference must be a fasta file containing the reference sequence.

Parameters
  • sequences – path to the input fasta file with the multialignment

  • reference – optional fasta file with the reference sequence

  • ambiguous – show frequencies for ambiguous nucleotides too [default: False]
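For reference, a dependency-free fasta reader producing `(id, sequence)` pairs can be sketched as below. It is not the parser allfreqs uses; the helper name `read_fasta_records` and the choice of the first word of each header as the id are assumptions. Wrapped (multi-line) sequences are joined, so the first record returned would be the reference when no separate reference file is given.

```python
import io
from typing import List, Optional, TextIO, Tuple

Record = Tuple[str, str]  # (sequence id, aligned sequence)


def read_fasta_records(handle: TextIO) -> List[Record]:
    """Parse a fasta stream into (id, sequence) pairs (hypothetical helper).

    Sequence lines after a '>' header are concatenated; the id is the first
    word of the header.
    """
    records: List[Record] = []
    seq_id: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        else:
            if seq_id is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


records = read_fasta_records(io.StringIO(">ref\nACGT\n>s1 sample one\nAC\nGA\n"))
print(records)  # [('ref', 'ACGT'), ('s1', 'ACGA')]
```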

to_csv(output_file: str = 'all_freqs.csv')

Write the resulting allele frequency dataframe to disk.

Parameters

output_file – output file name
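The column layout of the file written by to_csv() is not documented here, so the sketch below assumes a hypothetical one: one row per alignment position (1-based) and one column per frequency category. It only illustrates writing such a table with the standard csv module; the helper name `write_frequencies` is an assumption.

```python
import csv
import io
from typing import Dict, List, TextIO

CATEGORIES = ["A", "C", "G", "T", "gap", "other"]


def write_frequencies(freqs: List[Dict[str, float]], handle: TextIO) -> None:
    """Write per-position frequencies as CSV (hypothetical layout).

    One row per alignment position, numbered from 1, and one column per
    category in CATEGORIES.
    """
    writer = csv.DictWriter(handle, fieldnames=["position"] + CATEGORIES)
    writer.writeheader()
    for pos, row in enumerate(freqs, start=1):
        writer.writerow({"position": pos, **row})


buf = io.StringIO()
write_frequencies(
    [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0, "gap": 0.0, "other": 0.0}], buf
)
print(buf.getvalue().splitlines())
# ['position,A,C,G,T,gap,other', '1,1.0,0.0,0.0,0.0,0.0,0.0']
```

To write to disk instead, open the target with `open("all_freqs.csv", "w", newline="")` and pass that handle, as the csv module recommends.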