cogent3.core.alignment.Alignment

class Alignment(seqs_data: AlignedSeqsDataABC, slice_record: SliceRecord | None = None, **kwargs: Any)

A collection of aligned sequences.

Attributes:
annotation_db

the annotation database for the collection

array_positions

Returns a numpy array of positions; axis 0 is alignment positions (columns), axis 1 is sequences in the order given by names.

array_seqs

Returns a numpy array of sequences; axis 0 is sequences, in the order given by names.
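
The two array attributes hold the same data along different axes: array_positions is the transpose of array_seqs. The stdlib-only sketch below illustrates that layout using nested lists of characters in place of the numpy arrays cogent3 returns; the example sequences are made up for illustration.

```python
# Hypothetical aligned sequences keyed by name (stand-in for an Alignment)
seqs = {"a": "AC-G", "b": "ACTG", "c": "A--G"}
names = list(seqs)

# array_seqs layout: axis 0 indexes sequences, in the order of names
array_seqs = [list(seqs[name]) for name in names]

# array_positions layout: axis 0 indexes alignment columns, and the entries
# in each row follow the order of names, i.e. the transpose of array_seqs
array_positions = [list(column) for column in zip(*array_seqs)]

print(array_positions[2])  # ['-', 'T', '-']
```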

modified

whether the collection is a modification of the underlying storage

name_map

returns mapping of seq names to parent seq names

names

returns the names of the sequences in the collection

num_seqs

the number of sequences in the collection

positions

seqs

iterable of sequences in the collection

storage

the aligned sequence storage instance of the collection

Methods

add_feature(*[, seqid, parent_id, strand, ...])

adds a feature to the named sequence, or to the alignment itself

add_seqs(seqs, **kwargs)

Returns new collection with additional sequences.

alignment_quality([app_name])

Computes the alignment quality using the indicated app

apply_scaled_gaps(other[, aa_to_codon])

applies the gaps in self to the ungapped sequences in other

coevolution([stat, segments, drawable, ...])

performs pairwise coevolution measurement

copy([copy_annotations])

creates a new instance; only mutable attributes are copied

copy_annotations(seq_db)

copy annotations into attached annotation db

count_ambiguous_per_seq()

Return the counts of ambiguous characters per sequence as a DictArray.

count_gaps_per_pos([include_ambiguity])

return counts of gaps per position as a DictArray
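
Counting gaps per position amounts to tallying the gap character down each column. The helper below is a hypothetical plain-Python sketch, not cogent3's implementation: it takes a dict of equal-length aligned strings, treats only '-' as a gap, and returns a list rather than a DictArray.

```python
def gap_counts_per_pos(seqs: dict[str, str], gap: str = "-") -> list[int]:
    """Return the number of gap characters in each alignment column."""
    # zip(*...) walks the aligned strings column by column
    return [sum(ch == gap for ch in column) for column in zip(*seqs.values())]


aln = {"a": "AC-G", "b": "ACTG", "c": "A--G"}
print(gap_counts_per_pos(aln))  # [0, 1, 2, 0]
```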

count_gaps_per_seq([induced_by, unique, ...])

return counts of gaps per sequence as a DictArray

counts([motif_length, include_ambiguity, ...])

counts of motifs

counts_per_pos([motif_length, ...])

return MotifCountsArray of counts per position

counts_per_seq([motif_length, ...])

counts of non-overlapping motifs per sequence

deepcopy(**kwargs)

returns deep copy of self

degap([storage_backend])

returns the collection's sequences without gaps or missing characters.
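
Degapping strips every gap and missing-data character, so the resulting sequences can differ in length and no longer form an alignment. A minimal sketch, assuming '-' marks a gap and '?' marks missing data; degap_seqs is a hypothetical name used only for illustration.

```python
def degap_seqs(seqs: dict[str, str], remove: str = "-?") -> dict[str, str]:
    """Return each sequence with gap ('-') and missing ('?') characters removed."""
    return {
        name: "".join(ch for ch in seq if ch not in remove)
        for name, seq in seqs.items()
    }


aln = {"a": "AC-G", "b": "AC?G", "c": "A--G"}
print(degap_seqs(aln))  # {'a': 'ACG', 'b': 'ACG', 'c': 'AG'}
```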

distance_matrix([calc, drop_invalid, parallel])

Returns pairwise distances between sequences.

dotplot([name1, name2, window, threshold, ...])

make a dotplot between two sequences.

drop_duplicated_seqs()

returns self without duplicated sequences

duplicated_seqs()

returns the names of duplicated sequences
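
Duplicate detection reduces to grouping names by identical sequence strings. The sketch below shows that idea for both duplicated_seqs and drop_duplicated_seqs; the helper names, the list-of-groups return shape, and the rule of keeping the first-seen name are assumptions, not documented cogent3 behaviour.

```python
from collections import defaultdict


def duplicated_groups(seqs: dict[str, str]) -> list[list[str]]:
    """Group names whose sequences are identical; keep groups of two or more."""
    by_seq: dict[str, list[str]] = defaultdict(list)
    for name, seq in seqs.items():
        by_seq[seq].append(name)
    return [names for names in by_seq.values() if len(names) > 1]


def drop_duplicates(seqs: dict[str, str]) -> dict[str, str]:
    """Keep only the first-seen name for each distinct sequence (an assumption)."""
    seen: set[str] = set()
    kept = {}
    for name, seq in seqs.items():
        if seq not in seen:
            seen.add(seq)
            kept[name] = seq
    return kept


aln = {"a": "ACGT", "b": "ACGT", "c": "AC-T", "d": "ACGT"}
print(duplicated_groups(aln))  # [['a', 'b', 'd']]
print(drop_duplicates(aln))  # {'a': 'ACGT', 'c': 'AC-T'}
```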

entropy_per_pos([motif_length, ...])

returns Shannon entropy per position
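
Per-position Shannon entropy is H = -sum(p_i * log2(p_i)) over the frequencies p_i of the states in a column. This hypothetical sketch uses base-2 logarithms and ignores gap and missing characters when computing frequencies; both choices are assumptions about how the real method treats them.

```python
import math
from collections import Counter


def entropy_per_pos(seqs: dict[str, str], exclude: str = "-?") -> list[float]:
    """Shannon entropy (bits) of each column, ignoring excluded characters."""
    result = []
    for column in zip(*seqs.values()):
        counts = Counter(ch for ch in column if ch not in exclude)
        total = sum(counts.values())
        if total == 0:
            # an all-gap column has no defined entropy
            result.append(math.nan)
            continue
        probs = [count / total for count in counts.values()]
        result.append(sum(-p * math.log2(p) for p in probs))
    return result


aln = {"a": "AACG", "b": "ACCG", "c": "AGT-", "d": "ATT-"}
print(entropy_per_pos(aln))  # [0.0, 2.0, 1.0, 0.0]
```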

entropy_per_seq([motif_length, ...])

Returns the Shannon entropy per sequence.

filtered(predicate[, motif_length, ...])

Returns a new alignment containing only the positions where predicate(column) is True.

get_ambiguous_positions()

Returns dict of seq:{position:char} for ambiguous chars.

get_degapped_relative_to(name)

Remove all columns with gaps in sequence with given name.
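
Removing columns relative to one sequence yields an alignment in that sequence's ungapped coordinates, while the other sequences may still contain gaps. A minimal sketch over a plain dict; degapped_relative_to is a hypothetical helper, not the cogent3 method.

```python
def degapped_relative_to(seqs: dict[str, str], name: str, gap: str = "-") -> dict[str, str]:
    """Drop every column in which the sequence `name` has a gap."""
    keep = [i for i, ch in enumerate(seqs[name]) if ch != gap]
    return {n: "".join(seq[i] for i in keep) for n, seq in seqs.items()}


aln = {"ref": "A-CG-T", "x": "AACGGT", "y": "A-C--T"}
print(degapped_relative_to(aln, "ref"))  # {'ref': 'ACGT', 'x': 'ACGT', 'y': 'AC-T'}
```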

get_drawable(*[, biotype, width, vertical, ...])

make a figure from sequence features

get_drawables(*[, biotype])

returns a dict of drawables, keyed by type

get_features(*[, seqid, biotype, name, ...])

yields Feature instances

get_gap_array([include_ambiguity])

returns a bool array that is True where a position is a gap, False otherwise

get_gapped_seq(seqname[, recode_gaps])

Return a gapped Sequence object for the specified seqname.

get_identical_sets([mask_degen])

returns sets of names for sequences that are identical

get_lengths([include_ambiguity, allow_gap])

returns sequence lengths as a dict of {seqid: length}

get_motif_probs([alphabet, ...])

Return a dictionary of motif probs, calculated as the averaged frequency across sequences.

get_position_indices(f[, negate])

Returns list of column indices for which f(col) is True.

get_projected_feature(*, seqid, feature)

returns an alignment feature projected onto the seqid sequence

get_projected_features(*, seqid, **kwargs)

projects all features from other sequences onto seqid

get_seq(seqname[, copy_annotations])

Return a Sequence object for the specified seqname.

get_seq_names_if(f[, negate])

Returns list of names of seqs where f(seq) is True.

get_similar(target, min_similarity, ...)

Returns new SequenceCollection containing sequences similar to target.

get_translation([gc, incomplete_ok, ...])

translate sequences from nucleic acid to protein

has_annotation_db()

returns True if self has annotation db

has_terminal_stop([gc, strict])

Returns True if any sequence has a terminal stop codon.

information_plot([width, height, window, ...])

plot information per position

is_ragged()

by definition False for an Alignment

iter_positions([pos_order])

Iterates over positions in the alignment, in order.

iter_seqs([seq_order])

Iterates over sequences in the collection, in order.

iupac_consensus([allow_gap])

Returns string containing IUPAC consensus sequence of the alignment.

majority_consensus()

Returns consensus sequence containing most frequent item at each position.
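
Majority consensus picks the most common character in each column. In this hypothetical sketch gaps are counted like any other character and ties go to the character seen first in the column; cogent3 may handle both cases differently.

```python
from collections import Counter


def majority_consensus(seqs: dict[str, str]) -> str:
    """Most frequent character in each column; ties go to the first seen."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*seqs.values())
    )


aln = {"a": "ACGT", "b": "ACGA", "c": "TCGA"}
print(majority_consensus(aln))  # ACGA
```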

make_feature(*, feature[, on_alignment])

creates a feature on the named sequence, or on the alignment itself

matching_ref(ref_name, gap_fraction, gap_run)

Returns new alignment with seqs well aligned with a reference.

no_degenerates([motif_length, allow_gap])

returns new alignment without degenerate characters

omit_bad_seqs([quantile])

Returns a new alignment without the sequences whose number of uniquely introduced gaps exceeds the given quantile

omit_gap_pos([allowed_gap_frac, motif_length])

Returns new alignment where all cols (motifs) have <= allowed_gap_frac gaps.
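
The gap-fraction filter keeps a column only if the share of gap characters in it is at most allowed_gap_frac. The sketch below handles single-character columns only (motif_length is not modelled) and counts both '-' and '?' as gaps, which is an assumption; omit_gap_pos here is a stand-in, not cogent3's code.

```python
def omit_gap_pos(
    seqs: dict[str, str], allowed_gap_frac: float, gap_chars: str = "-?"
) -> dict[str, str]:
    """Keep only columns whose gap fraction is <= allowed_gap_frac."""
    num_seqs = len(seqs)
    keep = [
        i
        for i, column in enumerate(zip(*seqs.values()))
        if sum(ch in gap_chars for ch in column) / num_seqs <= allowed_gap_frac
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}


aln = {"a": "AC-G-", "b": "ACTG-", "c": "A--G-", "d": "ACTGA"}
print(omit_gap_pos(aln, allowed_gap_frac=0.5))
# {'a': 'AC-G', 'b': 'ACTG', 'c': 'A--G', 'd': 'ACTG'}
print(omit_gap_pos(aln, allowed_gap_frac=0.0))
# {'a': 'AG', 'b': 'AG', 'c': 'AG', 'd': 'AG'}
```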

pad_seqs([pad_length])

Returns a copy in which sequences are padded with the gap character to the same length.

probs_per_pos([motif_length, ...])

returns MotifFreqsArray per position

probs_per_seq([motif_length, ...])

return frequency array of motifs per sequence

quick_tree([calc, drop_invalid, parallel, ...])

Returns a phylogenetic tree.

rc()

Returns the reverse complement of all sequences in the alignment.

renamed_seqs(renamer)

Returns new alignment with renamed sequences.

replace_annotation_db(value[, check])

public interface to assigning the annotation_db

reverse_complement()

Returns the reverse complement of all sequences in the collection.

sample(*, n, with_replacement, motif_length, ...)

Returns random sample of positions from self, e.g. to bootstrap.
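
Bootstrapping an alignment means resampling its columns with replacement to build a pseudo-replicate of the same length. The sketch covers only the n and with_replacement parameters named in the signature; sample_positions, its seed argument, and the use of Python's random module are illustrative assumptions, and the remaining options of sample are not modelled.

```python
import random
from typing import Optional


def sample_positions(
    seqs: dict[str, str],
    *,
    n: int,
    with_replacement: bool = False,
    seed: Optional[int] = None,
) -> dict[str, str]:
    """Return an alignment built from n randomly chosen columns."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    if with_replacement:
        # columns may repeat, as in a bootstrap resample
        cols = rng.choices(range(length), k=n)
    else:
        # n distinct columns, in random order
        cols = rng.sample(range(length), n)
    return {name: "".join(seq[i] for i in cols) for name, seq in seqs.items()}


aln = {"a": "ACGTAC", "b": "ACG-AT"}
boot = sample_positions(aln, n=6, with_replacement=True, seed=1)
print(boot)
```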

seqlogo([width, height, wrap, vspace, colours])

returns Drawable sequence logo using mutual information

sliding_windows(window, step[, start, end])

Generator yielding new alignments of given length and interval.
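
Sliding windows are contiguous column slices taken at a fixed interval. In this hypothetical sketch end is treated as an exclusive column bound and a window is yielded only if it fits entirely before end; the cogent3 method may interpret start and end differently.

```python
from typing import Iterator, Optional


def sliding_windows(
    seqs: dict[str, str],
    window: int,
    step: int,
    start: int = 0,
    end: Optional[int] = None,
) -> Iterator[dict[str, str]]:
    """Yield sub-alignments of `window` columns, advancing `step` columns each time."""
    if end is None:
        end = len(next(iter(seqs.values())))
    # only full-length windows that finish at or before `end` are produced
    for pos in range(start, end - window + 1, step):
        yield {name: seq[pos : pos + window] for name, seq in seqs.items()}


aln = {"a": "ACGTAC", "b": "ACGAAC"}
for win in sliding_windows(aln, window=3, step=2):
    print(win)
# {'a': 'ACG', 'b': 'ACG'}
# {'a': 'GTA', 'b': 'GAA'}
```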

strand_symmetry([motif_length])

returns dict of strand symmetry test results per ungapped seq

take_positions(cols[, negate])

Returns new Alignment containing only specified positions.
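
Selecting positions is column indexing applied to every sequence at once, with negate inverting the selection. A plain-Python sketch with a hypothetical helper name; this version keeps the columns in the order given in cols, which may or may not match cogent3.

```python
def take_positions(seqs: dict[str, str], cols: list[int], negate: bool = False) -> dict[str, str]:
    """Keep the listed columns, or every other column when negate is True."""
    if negate:
        excluded = set(cols)
        length = len(next(iter(seqs.values())))
        cols = [i for i in range(length) if i not in excluded]
    return {name: "".join(seq[i] for i in cols) for name, seq in seqs.items()}


aln = {"a": "ACGT", "b": "AGGA"}
print(take_positions(aln, [0, 3]))  # {'a': 'AT', 'b': 'AA'}
print(take_positions(aln, [0, 3], negate=True))  # {'a': 'CG', 'b': 'GG'}
```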

take_positions_if(f[, negate])

Returns new Alignment containing cols where f(col) is True.

take_seqs(names[, negate, copy_annotations])

Returns new collection containing only specified seqs.

take_seqs_if(f[, negate])

Returns new collection containing seqs where f(seq) is True.

to_dict() -> dict[str, str]

Return a dictionary of sequences.

to_dna()

returns copy of self as a collection of DNA moltype seqs

to_fasta([block_size])

Return collection in Fasta format.
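
FASTA output is a '>' header line per sequence followed by the sequence, wrapped every block_size characters. A minimal sketch; the default width of 60 and the absence of a trailing newline are choices made here, not necessarily cogent3's.

```python
def to_fasta(seqs: dict[str, str], block_size: int = 60) -> str:
    """Format sequences as FASTA, wrapping each sequence every block_size characters."""
    lines = []
    for name, seq in seqs.items():
        lines.append(f">{name}")
        lines.extend(seq[i : i + block_size] for i in range(0, len(seq), block_size))
    return "\n".join(lines)


print(to_fasta({"a": "ACGTAC", "b": "AC-TAC"}, block_size=4))
```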

to_html([name_order, wrap, limit, colors, ...])

returns html with embedded styles for sequence colouring

to_json()

returns json formatted string

to_moltype(moltype)

returns copy of self with changed moltype

to_phylip()

Return collection in PHYLIP format and mapping to sequence ids

to_pretty([name_order, wrap])

returns a string representation of the alignment in pretty print format

to_rich_dict()

returns a json serialisable dict

to_rna()

returns copy of self as a collection of RNA moltype seqs

variable_positions([include_gap_motif, ...])

Return a list of variable position indexes.
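
A column is variable when it holds more than one distinct state. The real method exposes include_gap_motif for a related choice, but its exact semantics are not described on this page, so the sketch uses its own hypothetical ignore_gaps flag to show how treating '-' as a state changes the answer.

```python
def variable_positions(seqs: dict[str, str], ignore_gaps: bool = False, gap: str = "-") -> list[int]:
    """Indices of columns containing more than one distinct character."""
    variable = []
    for i, column in enumerate(zip(*seqs.values())):
        states = set(column)
        if ignore_gaps:
            states.discard(gap)
        if len(states) > 1:
            variable.append(i)
    return variable


aln = {"a": "ACGT-", "b": "ACTT-", "c": "AC-TA"}
print(variable_positions(aln))  # [2, 4]
print(variable_positions(aln, ignore_gaps=True))  # [2]
```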

with_masked_annotations(biotypes[, ...])

returns an alignment with regions replaced by mask_char

write(filename[, format_name])

Write the sequences to a file, preserving order of sequences.

from_rich_dict

gapped_by_map

trim_stop_codons

Notes

Should be constructed using make_aligned_seqs().