get_heteroplasmy is one of the two main functions of the MitoHEAR package (together with get_raw_counts_allele). It computes the allele frequencies and the heteroplasmy matrix from the counts matrix returned by get_raw_counts_allele.

get_heteroplasmy(
  raw_counts_allele,
  name_position_allele,
  name_position,
  number_reads,
  number_positions,
  filtering = 1,
  my.clusters = NULL
)

Arguments

raw_counts_allele

A raw counts matrix obtained from get_raw_counts_allele.

name_position_allele

A character vector with elements specifying the genomic coordinate of the base and the allele (obtained from get_raw_counts_allele).

name_position

A character vector with elements specifying the genomic coordinate of the base (obtained from get_raw_counts_allele).

number_reads

Integer specifying the minimum number of counts above which we consider the base covered by the sample.

number_positions

Integer specifying the minimum number of bases that must be covered by the sample (with counts > number_reads) in order to keep the sample for downstream analysis.

filtering

Numeric value equal to 1 or 2. If 1, only the bases that are covered by all the samples are kept for the downstream analysis. If 2, all the bases that are covered by more than 50% of the samples in each cluster (specified by my.clusters) are kept for the downstream analysis. Default is 1.

my.clusters

Character vector specifying a partition of the samples. It is only used when filtering is equal to 2. Default is NULL.

Value

It returns a list with 5 elements:

sum_matrix

A matrix (n_row=number of samples, n_col=number of bases) with the counts for each sample/base, for all the initial samples and bases included in the raw counts allele matrix.

sum_matrix_qc

A matrix (n_row=number of samples, n_col=number of bases) with the counts for each sample/base, for all the samples and bases that pass the two consecutive filtering steps.

heteroplasmy_matrix

A matrix with the same dimensions as sum_matrix_qc, where each entry (i,j) is the heteroplasmy for sample i at base j.

allele_matrix

A matrix (n_row=number of samples, n_col=4*number of bases) with allele frequencies, for all the samples and bases that pass the two consecutive filtering steps.

index

Indices of the samples that cover each base, for all the bases and samples that pass the two consecutive filtering steps (only when filtering = 2). If all the samples cover all the bases (as is the case for filtering = 1), index is NULL.

Details

Starting from the raw counts allele matrix, the function performs two consecutive filtering steps. The first one is on the samples, keeping only the ones that cover a number of bases above number_positions. The second one is on the bases, and is defined by the parameter filtering. The heteroplasmy for each sample-base pair is computed as 1-max(f), where f are the frequencies of the four alleles.
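
The two filtering steps and the 1-max(f) rule can be illustrated with a minimal base R sketch on toy data. This is not the package's internal code: the toy matrix, the column names (pos1_A, ...), the column layout (four allele columns per base) and the use of strict ">" comparisons are assumptions made for illustration only, and only the filtering = 1 case is shown.

```r
# Toy raw counts: 3 samples (rows) x 2 bases, 4 allele columns per base.
# Names and layout are hypothetical, chosen only for this illustration.
alleles <- c("A", "C", "G", "T")
name_position <- rep(c("pos1", "pos2"), each = 4)
name_position_allele <- paste(name_position, rep(alleles, times = 2), sep = "_")

raw_counts <- matrix(
  c(90, 10,  0, 0,  50, 50,  0, 0,   # sample1
    100, 0,  0, 0,   0, 80, 20, 0,   # sample2
    2,   0,  0, 0,  60,  0,  0, 0),  # sample3 (pos1 poorly covered)
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), name_position_allele)
)

number_reads <- 10      # a base is "covered" if its total counts exceed this
number_positions <- 1   # a sample must cover more than this many bases

# Total counts per sample/base (sum over the four alleles of each base).
bases <- unique(name_position)
sum_matrix <- sapply(bases, function(b) {
  rowSums(raw_counts[, name_position == b, drop = FALSE])
})

# Step 1: filter samples by the number of covered bases.
covered <- sum_matrix > number_reads
keep_samples <- rowSums(covered) > number_positions
sum_matrix_qc <- sum_matrix[keep_samples, , drop = FALSE]

# Step 2 (filtering = 1): keep bases covered by all remaining samples.
keep_bases <- colSums(sum_matrix_qc > number_reads) == nrow(sum_matrix_qc)
sum_matrix_qc <- sum_matrix_qc[, keep_bases, drop = FALSE]

# Heteroplasmy: 1 - max(f), with f the four allele frequencies at a base.
heteroplasmy_matrix <- sapply(colnames(sum_matrix_qc), function(b) {
  counts <- raw_counts[rownames(sum_matrix_qc), name_position == b, drop = FALSE]
  freq <- counts / rowSums(counts)  # each row divided by its own total
  1 - apply(freq, 1, max)
})

print(heteroplasmy_matrix)
```

In this toy example sample3 is removed in the first step because it covers only one base, and both bases pass the second step. For sample1 the allele frequencies at pos1 are 0.9 and 0.1, so the heteroplasmy is 0.1; at pos2 the two alleles are equally frequent, so the heteroplasmy is 0.5.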

Author

Gabriele Lubatti gabriele.lubatti@helmholtz-muenchen.de