cogent3.core.seq_storage.AlignedSeqsData

class AlignedSeqsData(*, gapped_seqs: NumpyIntArrayType, names: Collection[str], alphabet: c3_alphabet.CharAlphabet[Any], ungapped_seqs: dict[str, NumpyIntArrayType] | None = None, gaps: Mapping[str, NumpyIntArrayType] | None = None, offset: dict[str, int] | None = None, align_len: int | None = None, check: bool = True, reversed_seqs: set[str] | None = None)

The built-in cogent3 implementation of the aligned-sequence storage underlying an Alignment. Indexing this object returns an AlignedDataView, which can realise the corresponding slice as a string, bytes, or numpy array, either gapped or ungapped.

Attributes
align_len

Return the length of the alignment.

alphabet

The character alphabet used to validate, encode, and decode sequences.

names

Returns the names of the sequences in the storage.

offset

Returns the offset of each sequence in the Alignment.

reversed_seqs

Names of the sequences that are reverse complemented.
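Taken together, these attributes describe a rectangular layout: one row of encoded characters per name, `align_len` columns, and `reversed_seqs` drawn from `names`. The hypothetical `validate_layout` helper below is a pure-Python sketch of the consistency checks that layout implies. It assumes rows are stored in the same order as `names` and does not reproduce whatever validation cogent3 performs when `check=True`.

```python
from collections.abc import Collection, Sequence


def validate_layout(
    names: Sequence[str],
    gapped_rows: Sequence[Sequence[int]],
    reversed_seqs: Collection[str] = (),
) -> int:
    """Check a toy aligned layout and return its alignment length.

    Hypothetical helper: row i is assumed to hold the sequence for names[i].
    """
    if len(set(names)) != len(names):
        raise ValueError("sequence names must be unique")
    if len(names) != len(gapped_rows):
        raise ValueError("need exactly one row per name")
    # every aligned row must span the same number of columns
    lengths = {len(row) for row in gapped_rows}
    if len(lengths) > 1:
        raise ValueError(f"rows have differing lengths {sorted(lengths)}")
    # reverse-complemented names must refer to stored sequences
    unknown = set(reversed_seqs) - set(names)
    if unknown:
        raise ValueError(f"reversed_seqs not in names: {sorted(unknown)}")
    return lengths.pop() if lengths else 0


print(validate_layout(["a", "b"], [[0, 1, 4], [2, 4, 3]], reversed_seqs={"b"}))  # 3
```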

Methods

add_seqs(seqs[, force_unique_keys, offset])

Returns a new AlignedSeqsData object with added sequences.

copy(**kwargs)

Returns a shallow copy of self.

from_names_and_array(*, names, data, alphabet)

Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.

from_seqs(*, data, alphabet, **kwargs)

Construct an AlignedSeqsData object from a dict of aligned sequences.

from_seqs_and_gaps(*, seqs, gaps, alphabet, ...)

Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.

get_gapped_seq_array(*, seqid[, start, ...])

Return sequence data corresponding to seqid as an array of indices.

get_gapped_seq_bytes(*, seqid[, start, ...])

Return sequence corresponding to seqid as a bytes string.

get_gapped_seq_str(*, seqid[, start, stop, step])

Return sequence corresponding to seqid as a string.

get_gaps(seqid)

Returns the gap data for seqid.

get_hash(seqid)

Returns the hash for seqid.

get_pos_range(names[, start, stop, step])

Returns an array of the selected positions for names.

get_positions(names, positions)

Returns the alignment positions for names.

get_seq_array(*, seqid[, start, stop, step])

Return ungapped sequence corresponding to seqid as an array of indices.

get_seq_bytes(*, seqid[, start, stop, step])

Return ungapped sequence corresponding to seqid as a bytes string.

get_seq_length(seqid)

Returns the length of the ungapped sequence for seqid.

get_seq_str(*, seqid[, start, stop, step])

Return ungapped sequence corresponding to seqid as a string.

get_ungapped(name_map[, start, stop, step])

Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.

get_view(seqid[, slice_record])

Returns a view of the aligned sequence data for seqid.

to_alphabet(alphabet[, check_valid])

Returns a new AlignedSeqsData object with the same underlying data with a new alphabet.

variable_positions(names[, start, stop, step])

Returns the absolute indices of positions that have more than one state.

Notes

Methods on this object only accept plus-strand start, stop and step indices for selecting segments of data. The object can also return the gap coordinates for a sequence in the form used by IndelMap.
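A minimal sketch of gap coordinates of this kind, assuming (not taken from the cogent3 source) that each gap run is recorded as a pair (ungapped position where the run is inserted, cumulative gap length up to and including that run). Both helpers are hypothetical; they show that a gapped sequence can be rebuilt from its ungapped sequence plus these pairs.

```python
GAP = "-"  # assumed gap character for this sketch


def gaps_from_gapped(gapped: str) -> tuple[str, list[tuple[int, int]]]:
    """Split a gapped string into (ungapped, [(gap_pos, cum_gap_length), ...])."""
    ungapped_chars: list[str] = []
    gaps: list[tuple[int, int]] = []
    cum = 0
    for ch in gapped:
        if ch == GAP:
            cum += 1
            pos = len(ungapped_chars)  # gap sits before this ungapped position
            if gaps and gaps[-1][0] == pos:
                gaps[-1] = (pos, cum)  # extend the current gap run
            else:
                gaps.append((pos, cum))  # start a new gap run
        else:
            ungapped_chars.append(ch)
    return "".join(ungapped_chars), gaps


def gapped_from_gaps(ungapped: str, gaps: list[tuple[int, int]]) -> str:
    """Rebuild the gapped string by inserting each run at its ungapped position."""
    parts: list[str] = []
    prev_pos, prev_cum = 0, 0
    for pos, cum in gaps:
        parts.append(ungapped[prev_pos:pos])
        parts.append(GAP * (cum - prev_cum))  # run length from cumulative totals
        prev_pos, prev_cum = pos, cum
    parts.append(ungapped[prev_pos:])
    return "".join(parts)


seq, gaps = gaps_from_gapped("AC--GT-")
print(seq, gaps)                    # ACGT [(2, 2), (4, 3)]
print(gapped_from_gaps(seq, gaps))  # AC--GT-
```

With this encoding the last cumulative length is the total number of gap characters, so the ungapped length is the aligned length minus that value: for "AC--GT-", 7 - 3 = 4.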