Usage

Command Line Interface

allfreqs can be used as a command line tool: run the allfreqs command and pass it the fasta or csv file containing the multialigned sequences:

# multialignment in fasta format
$ allfreqs multialg_seqs.fasta
# if reference is stored separately:
$ allfreqs multialg_seqs.fasta --reference my_ref.fasta

# multialignment in csv format
# e.g. seq1,ACGTACGT
#      seq2,A-CTAGGT
$ allfreqs multialg_seqs.csv
# if reference is stored separately:
$ allfreqs multialg_seqs.csv --reference my_ref.csv

By default, the program uses the first sequence in the multialignment as the reference sequence. If the reference is stored in a separate file, supply it with the --reference|-r option followed by the fasta or csv file containing it. In that case, the multialigned sequences and the reference sequence must be in the same format (both fasta or both csv).
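The csv layout shown in the comments above (one name,sequence row per sequence) can be produced with the standard library alone. The following is a minimal sketch, not part of allfreqs: the helper name write_alignment_csv is hypothetical, and it assumes the file has no header row, which the text above does not state either way. It also checks that all sequences have the same length, as expected of a multialignment.

```python
import csv
import tempfile
from pathlib import Path


def write_alignment_csv(records, path):
    """Write (name, sequence) pairs as one 'name,sequence' row each.

    Hypothetical helper for illustration; assumes no header row.
    """
    # Aligned sequences must all span the same number of columns.
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(records)


# Same two sequences used in the csv example above.
records = [("seq1", "ACGTACGT"), ("seq2", "A-CTAGGT")]
out_path = Path(tempfile.mkdtemp()) / "multialg_seqs.csv"
write_alignment_csv(records, out_path)
print(out_path.read_text())
```

The resulting file can then be passed to the allfreqs command in place of multialg_seqs.csv.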

By default, allfreqs sums the frequencies of non-standard (ambiguous) nucleotides and reports the total in the oth column of the output; use the --ambiguous flag to report each ambiguous nucleotide in its own column instead.

allfreqs calculates allele frequencies for each position in the multialignment and saves them as a csv file called all_freqs.csv in the current working directory. To write the output elsewhere, use the --out|-o option followed by the desired path/filename.
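To make the idea of per-position allele frequencies concrete, here is a minimal pure-Python sketch of the calculation. It is an illustration only and not how allfreqs is implemented: the function name position_frequencies and the gap key are assumptions, only the oth name comes from the text above, and the sketch counts every sequence it is given without any special handling of the reference.

```python
from collections import Counter

STANDARD = ("A", "C", "G", "T")
GAP = "-"


def position_frequencies(sequences):
    """Return one dict per alignment column with A, C, G, T, gap and oth frequencies."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and equally long")
    n = len(sequences)
    rows = []
    # zip(*sequences) walks the alignment column by column.
    for column in zip(*sequences):
        counts = Counter(base.upper() for base in column)
        row = {base: counts[base] / n for base in STANDARD}
        row["gap"] = counts[GAP] / n
        # Anything that is neither a standard base nor a gap is pooled in "oth".
        other = n - sum(counts[b] for b in STANDARD) - counts[GAP]
        row["oth"] = other / n
        rows.append(row)
    return rows


freqs = position_frequencies(["ACGTACGT", "A-CTAGGT", "ANGTACGT"])
for position, row in enumerate(freqs, start=1):
    print(position, row)
```

In the second column of this toy alignment, C, a gap and the ambiguous N each occur once, so each gets a frequency of one third, with N counted under oth.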


Python Module

allfreqs can be used in a Python script by importing its AlleleFreqs class.

This class provides two methods, .from_fasta() and .from_csv(), which load a multialignment from a fasta or csv file respectively. Both accept a mandatory sequences argument, the file containing the multialigned sequences, and an optional reference argument, the file containing the reference sequence when it is not the first sequence in the sequences file:

from allfreqs import AlleleFreqs

# multialignment in fasta format
a = AlleleFreqs.from_fasta(sequences="multialg_seqs.fasta")
# if reference is stored separately:
a = AlleleFreqs.from_fasta(sequences="multialg_seqs.fasta", reference="my_ref.fasta")

# multialignment in csv format
a = AlleleFreqs.from_csv(sequences="multialg_seqs.csv")
# if reference is stored separately:
a = AlleleFreqs.from_csv(sequences="multialg_seqs.csv", reference="my_ref.csv")

The AlleleFreqs class has two useful properties:

  • df, which returns a dataframe with sequences as rows and single positions as columns;

  • frequencies, which returns a dataframe with the actual allele frequencies for each position.

By default, allfreqs sums the frequencies of non-standard (ambiguous) nucleotides and reports the total in the oth column of the output; use the ambiguous=True option to report each ambiguous nucleotide in its own column instead.

These allele frequencies can be saved to a csv file using the .to_csv() method; by default they will be saved to a file called all_freqs.csv in the current working directory, but this can be overridden by providing the output_file argument to .to_csv().
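Putting the pieces of this section together, a script might look like the sketch below. It uses only the calls described above (.from_fasta(), the frequencies property and .to_csv() with its output_file argument); the wrapper function export_frequencies and the example file names are hypothetical, and the import is done inside the function so that the sketch can be defined even where allfreqs is not installed.

```python
def export_frequencies(sequences, output_file="all_freqs.csv", reference=None):
    """Load a fasta multialignment, print its frequencies and save them as csv.

    Hypothetical wrapper for illustration; it relies only on the
    AlleleFreqs calls described in this section.
    """
    # Imported lazily so this sketch can be defined without allfreqs installed.
    from allfreqs import AlleleFreqs

    if reference is None:
        # The first sequence in the file is used as the reference.
        a = AlleleFreqs.from_fasta(sequences=sequences)
    else:
        a = AlleleFreqs.from_fasta(sequences=sequences, reference=reference)
    print(a.frequencies)
    a.to_csv(output_file=output_file)
    return a


# Example call (requires allfreqs and the input files to exist):
# export_frequencies("multialg_seqs.fasta", output_file="results/freqs.csv")
```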