deepnog.utils package

deepnog.utils.bio module

deepnog.utils.bio.parse(p: pathlib.Path, fformat: str = 'fasta', alphabet=None) → Iterator[source]

Parse a possibly compressed sequence file.

Parameters
  • p (Path or str) – Path to sequence file

  • fformat (str) – File format supported by Biopython's SeqIO.parse, e.g. “fasta”

  • alphabet (any) – Pass alphabet to SeqIO.parse

Returns

it – The SeqIO.parse iterator yielding SeqRecords

Return type

Iterator
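
A minimal usage sketch; the file name is hypothetical, and passing a plain string relies on the documented "Path or str" parameter:

    from deepnog.utils.bio import parse

    # Iterate over the records of a (possibly compressed) FASTA file.
    # "proteins.faa.gz" is a placeholder for an actual sequence file.
    for record in parse("proteins.faa.gz", fformat="fasta"):
        print(record.id, len(record.seq))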

deepnog.utils.config module

deepnog.utils.config.get_config(config_file: Optional[Union[pathlib.Path, str]] = None) → Dict[source]

Get a config dictionary.

If no file is provided, look up the path in the DEEPNOG_CONFIG environment variable. If this fails, load a default config file (lacking any user customization).

The config contains the available models (databases, taxonomic levels). Additional config may be added in future releases.
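
A minimal sketch of loading the configuration; the exact keys of the returned dictionary depend on the shipped config file and are not specified here:

    from deepnog.utils.config import get_config

    # Uses DEEPNOG_CONFIG or the bundled default, since no file is given.
    config = get_config()
    print(config)  # lists the available databases and taxonomic levels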

deepnog.utils.io_utils module

Author: Roman Feldbauer

Date: 2020-02-19

Description:

Input/output helper functions

deepnog.utils.io_utils.create_df(class_labels: list, preds: torch.Tensor, confs: torch.Tensor, ids: List[str], indices: List[int], threshold: Optional[float] = None)[source]

Creates one dataframe storing all relevant prediction information.

The rows in the returned dataframe have the same order as the original sequences in the data file. The first column of the dataframe represents the position of the sequence in the data file.

Parameters
  • class_labels (list) – Stores the class name corresponding to each output node of the network.

  • preds (torch.Tensor, shape (n_samples,)) – Stores the index of the output node with the highest activation.

  • confs (torch.Tensor, shape (n_samples,)) – Stores the confidence in the prediction.

  • ids (list[str]) – Stores the (possibly empty) protein labels extracted from the data file.

  • indices (list[int]) – Stores the unique indices of sequences mapping to their position in the file.

  • threshold (float) – If given, prediction labels and confidences are set to ‘’ if the confidence in the prediction is below the threshold.

Returns

df – Stores prediction information about the input protein sequences. Duplicates (defined by their sequence_id) have been removed from df.

Return type

pandas.DataFrame
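
A sketch with hypothetical class labels, predictions, and sequence IDs, following the documented parameter types:

    import torch
    from deepnog.utils.io_utils import create_df

    class_labels = ["COG0001", "COG0002"]       # one label per output node
    preds = torch.tensor([0, 1, 0])             # most active output node per sequence
    confs = torch.tensor([0.97, 0.42, 0.88])    # prediction confidences
    ids = ["seq_a", "seq_b", "seq_c"]           # protein labels from the data file
    indices = [0, 1, 2]                         # positions in the data file

    # Predictions with confidence below 0.5 are blanked out ('').
    df = create_df(class_labels, preds, confs, ids, indices, threshold=0.5)
    print(df)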

deepnog.utils.io_utils.get_data_home(data_home: Optional[str] = None, verbose: int = 0) → pathlib.Path[source]

Return the path of the deepnog data dir.

This folder is used for large files that cannot go into the Python package on PyPI, etc. For example, the network parameters (weights) files may be larger than 100 MiB. By default, the data dir is set to a folder named ‘deepnog_data’ in the user home folder. Alternatively, it can be set by the ‘DEEPNOG_DATA’ environment variable or programmatically by giving an explicit folder path. If the folder does not already exist, it is automatically created.

Parameters
  • data_home (str | None) – The path to deepnog data dir.

  • verbose (int) – Verbosity level of logging.

Notes

Adapted from scikit-learn's sklearn.datasets.get_data_home.
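
A minimal sketch; the resolved folder depends on the DEEPNOG_DATA environment variable and the user's home directory:

    from deepnog.utils.io_utils import get_data_home

    # Resolve (and create, if necessary) the deepnog data directory.
    data_home = get_data_home(verbose=1)
    print(data_home)  # typically ~/deepnog_data unless DEEPNOG_DATA overrides it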

deepnog.utils.io_utils.get_weights_path(database: str, level: str, architecture: str, data_home: Optional[str] = None, download_if_missing: bool = True, verbose: int = 0) → pathlib.Path[source]

Get path to neural network weights.

This is a path on local storage. If the corresponding files are not present, download from remote storage. The default remote URL can be overridden by setting the environment variable DEEPNOG_REMOTE.

Parameters
  • database (str) – The orthologous groups database. Example: eggNOG5

  • level (str) – The taxonomic level within the database. Example: 2 (for bacteria)

  • architecture (str) – Network architecture. Example: deepnog

  • data_home (str, optional) – Specify another download and cache folder for the weights. By default all deepnog data is stored in ‘$HOME/deepnog_data’ subfolders.

  • download_if_missing (boolean, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.

  • verbose (int) – Verbosity level of logging.

Returns

weights_path – Path to the file containing the network weights.

Return type

Path
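
A sketch using the example values from the parameter list above; the first call may trigger a download:

    from deepnog.utils.io_utils import get_weights_path

    # Fetch the weights for the eggNOG5 bacterial (level 2) deepnog model.
    weights_path = get_weights_path(
        database="eggNOG5",
        level="2",
        architecture="deepnog",
        download_if_missing=True,
    )
    print(weights_path)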

deepnog.utils.logger module

deepnog.utils.logger.get_logger(initname: str = 'deepnog', verbose: int = 0) → logging.Logger[source]

This function provides a nicely formatted logger.

Parameters
  • initname (str) – The name of the logger that appears in the log records.

  • verbose (int) – Increasing levels of verbosity.

References

Shamelessly stolen from phenotrex
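
A minimal sketch; the logger name is arbitrary, and the mapping of the verbose value onto logging levels is not specified here:

    from deepnog.utils.logger import get_logger

    logger = get_logger(initname="my_analysis", verbose=3)
    logger.info("Starting predictions ...")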

deepnog.utils.metrics module

deepnog.utils.metrics.estimate_performance(df_true: pandas.DataFrame, df_pred: pandas.DataFrame) → Dict[source]

Calculate various model performance measures.

Parameters
  • df_true (pandas.DataFrame) – The ground truth labels. DataFrame must contain ‘sequence_id’ and ‘label’ columns.

  • df_pred (pandas.DataFrame) – The predicted labels. DataFrame must contain ‘sequence_id’ and ‘prediction’ columns.

Returns

perf

Performance estimates:
  • macro_precision

  • micro_precision

  • macro_recall

  • micro_recall

  • macro_f1

  • micro_f1

  • accuracy

  • mcc (Matthews correlation coefficient)

Return type

dict
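
A sketch with hypothetical ground-truth and prediction tables, using the required column names; the returned dictionary is indexed by the keys listed above:

    import pandas as pd
    from deepnog.utils.metrics import estimate_performance

    df_true = pd.DataFrame({"sequence_id": ["s1", "s2", "s3"],
                            "label": ["COG0001", "COG0002", "COG0001"]})
    df_pred = pd.DataFrame({"sequence_id": ["s1", "s2", "s3"],
                            "prediction": ["COG0001", "COG0002", "COG0002"]})

    perf = estimate_performance(df_true, df_pred)
    print(perf["accuracy"], perf["macro_f1"])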

deepnog.utils.network module

Author: Roman Feldbauer

Date: 2020-02-19

Description:

Various utility functions

deepnog.utils.network.count_parameters(model, tunable_only: bool = True) → int[source]

Count the number of parameters in the given model.

Parameters
  • model (torch.nn.Module) – PyTorch model (deep network)

  • tunable_only (bool, optional) – Count only tunable network parameters

References

https://stackoverflow.com/questions/49201236/check-the-total-number-of-parameters-in-a-pytorch-model
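
A minimal sketch; any torch.nn.Module works, so a small linear layer stands in for a deepnog network:

    import torch
    from deepnog.utils.network import count_parameters

    model = torch.nn.Linear(in_features=10, out_features=2)
    print(count_parameters(model))                      # tunable parameters only
    print(count_parameters(model, tunable_only=False))  # all parameters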

deepnog.utils.network.load_nn(architecture: Union[str, Sequence[str]], model_dict: Optional[dict] = None, phase: str = 'eval', device: Union[torch.device, str] = 'cpu', verbose: int = 0)[source]

Import NN architecture and set loaded parameters.

Parameters
  • architecture (str or list-like of two str) – If single string: name of neural network module and class to import. E.g. ‘deepnog’ will load deepnog.models.deepnog.deepnog. Otherwise, separate module and class name of deep network to import. E.g. (‘deepthought’, ‘DeepNettigkeit’) will load deepnog.models.deepthought.DeepNettigkeit.

  • model_dict (dict, optional) – Dictionary holding all parameters and hyper-parameters of the model. Required during inference, optional for training.

  • phase (['train', 'infer', 'eval']) – Set the network to training or inference/evaluation mode, which affects gradient storage, dropout, etc.

  • device ([str, torch.device]) – Device to load the model into.

  • verbose (int) – Increasingly verbose logging

Returns

model – Neural network object of type architecture with parameters loaded from model_dict and moved to device.

Return type

torch.nn.Module
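
A sketch of loading the ‘deepnog’ architecture for inference. It assumes the weights file returned by get_weights_path is a torch checkpoint readable with torch.load, which is not stated above:

    import torch
    from deepnog.utils.io_utils import get_weights_path
    from deepnog.utils.network import load_nn

    weights_path = get_weights_path("eggNOG5", "2", "deepnog")
    # Assumption: the downloaded weights file can be read with torch.load.
    model_dict = torch.load(weights_path, map_location="cpu")

    model = load_nn("deepnog", model_dict=model_dict, phase="eval", device="cpu")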

deepnog.utils.network.set_device(device: Union[str, torch.device]) → torch.device[source]

Set device (CPU/GPU) depending on user choice and availability.

Parameters

device ([str, torch.device]) – Device set by user as an argument to DeepNOG call.

Returns

device – Object containing the device type to be used for prediction calculations.

Return type

torch.device
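
A minimal sketch; how an unavailable GPU is handled (fallback or error) is decided by the function and not restated here:

    import torch
    from deepnog.utils.network import set_device

    device = set_device("cpu")       # explicit CPU
    # device = set_device("cuda")    # GPU, subject to availability
    assert isinstance(device, torch.device)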

deepnog.utils.sync module

Author: Roman Feldbauer

Date: 2020-02-19

Description:

Parallel processing helpers

class deepnog.utils.sync.SynchronizedCounter(init: int = 0)[source]

Bases: object

A multiprocessing-safe counter.

Parameters

init (int, optional) – Counter starts at init (default: 0)

increment(n=1)[source]

Obtain a lock before incrementing, since += isn’t atomic.

Parameters

n (int, optional) – Increment counter by n (default: 1)

increment_and_get_value(n=1) → int[source]

Obtain a lock before incrementing, since += isn’t atomic, and return the new counter value.

Parameters

n (int, optional) – Increment counter by n (default: 1)

property value: int
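
A minimal sketch from a single process, just to show the API; it assumes increment_and_get_value returns the count after incrementing, as its name and int return type suggest:

    from deepnog.utils.sync import SynchronizedCounter

    counter = SynchronizedCounter(init=0)
    counter.increment()                             # +1 under a lock
    new_value = counter.increment_and_get_value(5)  # +5, returns the new count
    print(new_value, counter.value)                 # expected: 6 6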