orca_utils module
This module contains the utilities for Orca-based applications, including a class for structural variants and plotting utilities.
-
class orca_utils.GRange(chr, start, end, strand)
Bases: tuple
-
property chr
Alias for field number 0
-
property end
Alias for field number 2
-
property start
Alias for field number 1
-
property strand
Alias for field number 3
-
class orca_utils.LGRange(len, ref)
Bases: tuple
-
property len
Alias for field number 0
-
property ref
Alias for field number 1
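As a minimal sketch of how these two named tuples fit together, the snippet below redefines them locally with `collections.namedtuple` using the documented field order. The local definitions are stand-ins (the real classes come from orca_utils), and the half-open coordinate convention used to derive the length is an assumption made for illustration.

```python
from collections import namedtuple

# Stand-in definitions mirroring the documented field order; the real
# classes are provided by orca_utils.
GRange = namedtuple("GRange", ["chr", "start", "end", "strand"])
LGRange = namedtuple("LGRange", ["len", "ref"])

# A reference segment on chr1 (coordinates are illustrative only).
ref = GRange(chr="chr1", start=1000, end=5000, strand="+")

# Assumption: the segment length equals end - start.
seg = LGRange(len=ref.end - ref.start, ref=ref)

# Fields are reachable by name or by tuple index.
print(seg.len, seg.ref.chr, seg[1][2])
```

Because both types are plain tuples, they can be unpacked, compared, and indexed like any other tuple.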
-
class orca_utils.StructuralChange2(chr_name, length)
Bases: object
This class stores and manipulates structural changes for a single chromosome. It allows the mutated chromosome to be queried by coordinates and provides utilities for retrieving the corresponding reference genome segments.
The basic operations that StructuralChange2 supports are duplication, deletion, inversion, insertion, and concatenation. StructuralChange2 objects can be concatenated with the ‘+’ operator, which joins two chromosomes. ‘+’ can be combined with the other basic operations to create fused chromosomes.
These operations can be applied sequentially to introduce arbitrarily complex structural changes. Note, however, that coordinates are updated after each operation to reflect the current state of the chromosome, so coordinates specified in later operations must account for the effects of all previous operations.
- Parameters
-
segments
List of reference genome segments that constitute the (mutated) chromosome. Each element is an LGRange namedtuple holding a length and a GRange namedtuple (chr: str, start: int, end: int, strand: str).
-
coord_points
Stores N+1 key coordinates, where N is the number of segments. The key coordinates are 0, the segment junction positions, and the chromosome end coordinate. coord_points reflects the current state of the chromosome.
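The relationship between segments and coord_points described above can be sketched as follows. This is an illustration under stated assumptions, not the StructuralChange2 implementation: the named tuples are local stand-ins, and `build_coord_points` and `locate` are hypothetical helpers that do not exist in orca_utils.

```python
from bisect import bisect_right
from collections import namedtuple
from itertools import accumulate

# Local stand-ins for the orca_utils named tuples.
GRange = namedtuple("GRange", ["chr", "start", "end", "strand"])
LGRange = namedtuple("LGRange", ["len", "ref"])


def build_coord_points(segments):
    """Hypothetical helper: return [0, junction positions..., chromosome end]."""
    return [0] + list(accumulate(seg.len for seg in segments))


def locate(pos, coord_points):
    """Hypothetical helper: map a position on the mutated chromosome to
    (segment index, offset within that segment)."""
    if not 0 <= pos < coord_points[-1]:
        raise ValueError("position outside chromosome")
    i = bisect_right(coord_points, pos) - 1
    return i, pos - coord_points[i]


# A 1000 bp chromosome after deleting [300, 500): two segments remain.
segments = [
    LGRange(300, GRange("chr1", 0, 300, "+")),
    LGRange(500, GRange("chr1", 500, 1000, "+")),
]
coord_points = build_coord_points(segments)  # N + 1 = 3 key coordinates

idx, offset = locate(350, coord_points)
ref_pos = segments[idx].ref.start + offset
print(coord_points, idx, offset, ref_pos)
```

In this toy case, position 350 on the mutated chromosome falls in the second segment and corresponds to reference position 550, which illustrates why coordinates given to a later operation must already account for earlier ones.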
-
orca_utils.coord_clip(pos, chrlen, binsize=128000, window_radius=16000000)
Clip the coordinate so that the full window centered at the coordinate stays within the chromosome boundaries. coord_clip also tries to preserve the position of the coordinate relative to the grid specified by binsize whenever possible.
- Parameters
pos (int) – The coordinate to clip.
chrlen (int) – The chromosome length.
binsize (int, optional) – Default is 128000. The grid size used when preserving the relative position of the coordinate.
window_radius (int, optional) – Default is 16000000. Half the size of the window centered at the coordinate.
- Returns
The clipped coordinate
- Return type
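The clipping behavior described above can be sketched as below. This is one possible reading of the description, not the orca_utils implementation: `coord_clip_sketch` is a hypothetical name, and the exact rule used to restore the within-bin offset is an assumption.

```python
def coord_clip_sketch(pos, chrlen, binsize=128000, window_radius=16000000):
    """Hypothetical sketch: keep a window of radius window_radius inside
    [0, chrlen] while preserving pos % binsize where possible."""
    lo, hi = window_radius, chrlen - window_radius
    if hi < lo:
        raise ValueError("chromosome is shorter than the window")
    if lo <= pos <= hi:
        return pos  # the window already fits; nothing to do

    clipped = min(max(pos, lo), hi)
    # Assumption: restore the original offset within a bin, then shift by
    # one bin if that pushed the coordinate back out of range.
    candidate = clipped - clipped % binsize + pos % binsize
    if candidate < lo:
        candidate += binsize
    elif candidate > hi:
        candidate -= binsize
    return candidate if lo <= candidate <= hi else clipped


# Small numbers keep the arithmetic easy to follow.
print(coord_clip_sketch(37, chrlen=1000, binsize=10, window_radius=100))
print(coord_clip_sketch(995, chrlen=1000, binsize=10, window_radius=100))
```

In both calls the result lies within [100, 900] and keeps the same remainder modulo binsize as the input.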
-
orca_utils.coord_round(x, gridsize=4000)
Round coordinate to multiples of gridsize.
- Parameters
x (int or numpy.ndarray) – Coordinates to round.
gridsize (int) – The gridsize to round by
- Returns
The rounded coordinate
- Return type
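A minimal sketch of rounding to a grid is shown below. It assumes rounding down to the nearest multiple of gridsize; the description above does not say whether the function floors or rounds to the nearest multiple, so `coord_round_sketch` is a hypothetical illustration only.

```python
def coord_round_sketch(x, gridsize=4000):
    """Hypothetical sketch: round x down to a multiple of gridsize.

    Floor division works for Python ints and, element-wise, for
    numpy arrays, so no numpy import is needed here.
    """
    return (x // gridsize) * gridsize


for pos in (3999, 4000, 10500):
    print(pos, coord_round_sketch(pos))
```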
-
orca_utils.genomeplot(output, show_genes=False, show_tracks=False, show_coordinates=True, unscaled=False, file=None, cmap=None, unscaled_cmap=None, colorbar=True, maskpred=False, vmin=-1, vmax=2, model_labels=['H1-ESC', 'HFF'])
Plot the multiscale prediction outputs for 32Mb output.
- Parameters
output (dict) – The result dictionary to plot as returned by genomepredict_256Mb.
show_genes (bool, optional) – Default is False. If True, plot the retrieved gene annotations corresponding to all windows used for the multiscale prediction.
show_tracks (bool, optional) – Default is False. If True, plot the retrieved chromatin tracks for CTCF, chromatin accessibility and histone marks for all windows used for the multiscale prediction.
show_coordinates (bool, optional) – Default is True. If True, annotate the generated plot with the genome coordinates.
unscaled (bool, optional) – Default is False. If True, plot the predictions and observations without normalizing by distance-based expectation.
file (str or None, optional) – Default is None. The output file prefix. No output file is generated if set to None.
cmap (str or None, optional) – Default is None. The colormap for plotting scaled interactions (log fold over distance-based background). If None, use colormaps.hnh_cmap_ext5.
unscaled_cmap (str or None, optional) – Default is None. The colormap for plotting unscaled interactions (log balanced contact score). If None, use colormaps.hnh_cmap_ext5.
colorbar (bool, optional) – Default is True. Whether to plot the colorbar.
maskpred (bool, optional) – Default is False. If True, the prediction heatmaps are masked at positions where the observed data have missing values when observed data are provided in output dict.
vmin (int, optional) – Default is -1. The lower bound of the heatmap colormap.
vmax (int, optional) – Default is 2. The upper bound of the heatmap colormap.
model_labels (list(str), optional) – Model labels for plotting. Default is ['H1-ESC', 'HFF'].
- Returns
- Return type
-
orca_utils.genomeplot_256Mb(output, show_coordinates=True, unscaled=False, file=None, cmap=None, unscaled_cmap=None, colorbar=True, maskpred=True, vmin=-1, vmax=2, model_labels=['H1-ESC', 'HFF'])
Plot the multiscale prediction outputs for 256Mb output.
- Parameters
output (dict) – The result dictionary to plot as returned by genomepredict_256Mb.
show_coordinates (bool, optional) – Default is True. If True, annotate the generated plot with the genome coordinates.
unscaled (bool, optional) – Default is False. If True, plot the predictions and observations without normalizing by distance-based expectation.
file (str or None, optional) – Default is None. The output file prefix. No output file is generated if set to None.
cmap (str or None, optional) – Default is None. The colormap for plotting scaled interactions (log fold over distance-based background). If None, use colormaps.hnh_cmap_ext5.
unscaled_cmap (str or None, optional) – Default is None. The colormap for plotting unscaled interactions (log balanced contact score). If None, use colormaps.hnh_cmap_ext5.
colorbar (bool, optional) – Default is True. Whether to plot the colorbar.
maskpred (bool, optional) – Default is True. If True, the prediction heatmaps are masked at positions where the observed data have missing values when observed data are provided in output dict.
vmin (int, optional) – Default is -1. The lower bound of the heatmap colormap.
vmax (int, optional) – Default is 2. The upper bound of the heatmap colormap.
model_labels (list(str), optional) – Model labels for plotting. Default is ['H1-ESC', 'HFF'].
- Returns
- Return type
-
orca_utils.process_anno(anno_scaled, base=0, window_radius=16000000)
Process annotations to the format used by Orca plotting functions such as genomeplot and genomeplot_256Mb.
- Parameters
anno_scaled (list(list(..))) – List of annotations. Each annotation is either a region specified by [start: int, end: int, info: str] or a position specified by [pos: int, info: str]. Acceptable info strings for a region currently include matplotlib color names. Acceptable info strings for a position are currently ‘single’ or ‘double’, which determine whether the annotation is drawn with a single or a double line.
base (int) – The starting position of the 32Mb (if window_radius is 16000000) or 256Mb (if window_radius is 128000000) region analyzed.
window_radius (int) – The radius (half the size) of the region analyzed. It must be either 16000000 (32Mb region) or 128000000 (256Mb region).
- Returns
annotation – Processed annotations with coordinates transformed to relative coordinates in the range 0-1.
- Return type
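The coordinate transformation described above can be sketched as follows. The exact structure returned by process_anno is not documented here, so `to_relative` is a hypothetical helper that only illustrates the assumed rescaling: subtract base and divide by the full region size (twice window_radius).

```python
def to_relative(anno, base=0, window_radius=16000000):
    """Hypothetical sketch: rescale annotation coordinates to the 0-1 range.

    Each annotation is [start, end, info] or [pos, info]; the trailing
    info string is passed through unchanged.
    """
    span = 2 * window_radius  # full size of the analyzed region
    out = []
    for item in anno:
        *coords, info = item
        out.append([(c - base) / span for c in coords] + [info])
    return out


annotations = [
    [8000000, 16000000, "red"],  # a region, drawn in a matplotlib color
    [24000000, "double"],        # a position, drawn with a double line
]
relative = to_relative(annotations, base=0, window_radius=16000000)
print(relative)
```

With a 32Mb region starting at 0, the region above maps to the interval from 0.25 to 0.5 and the position maps to 0.75.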