deepnog.utils package

deepnog.utils.bio module

deepnog.utils.bio.parse(p: pathlib.Path, fformat: str = 'fasta', alphabet=None) → Iterator[source]

Parse a possibly compressed sequence file.

Parameters
  • p (Path or str) – Path to sequence file

  • fformat (str) – File format supported by Biopython's SeqIO.parse, e.g. “fasta”

  • alphabet (any) – Pass alphabet to SeqIO.parse

Returns

it – The SeqIO.parse iterator yielding SeqRecords

Return type

Iterator
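
A minimal usage sketch; the file name is hypothetical, and passing a plain string relies on the documented "Path or str" parameter:

    from deepnog.utils.bio import parse

    # Iterate over the records of a (possibly compressed) FASTA file.
    # "proteins.faa.gz" is a placeholder for an actual sequence file.
    for record in parse("proteins.faa.gz", fformat="fasta"):
        print(record.id, len(record.seq))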

deepnog.utils.config module

deepnog.utils.config.get_config(config_file: Optional[Union[pathlib.Path, str]] = None) → Dict[source]

Get a config dictionary.

If no file is provided, look up the path in the DEEPNOG_CONFIG environment variable. If this fails, load a default config file (lacking any user customization).

The config contains the available models (databases, taxonomic levels). Additional config may be added in future releases.
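
A minimal sketch of loading the configuration; the exact keys of the returned dictionary depend on the shipped config file and are not specified here:

    from deepnog.utils.config import get_config

    # Uses DEEPNOG_CONFIG or the bundled default, since no file is given.
    config = get_config()
    print(config)  # lists the available databases and taxonomic levels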

deepnog.utils.io_utils module

Author: Roman Feldbauer

Date: 2020-02-19

Description:

Input/output helper functions

deepnog.utils.io_utils.create_df(class_labels: list, preds: torch.Tensor, confs: torch.Tensor, ids: List[str], indices: List[int], threshold: Optional[float] = None)[source]

Creates one dataframe storing all relevant prediction information.

The rows in the returned dataframe have the same order as the original sequences in the data file. The first column of the dataframe represents the position of the sequence in the data file.

Parameters
  • class_labels (list) – Stores the class name corresponding to each output node of the network.

  • preds (torch.Tensor, shape (n_samples,)) – Stores the index of the output node with the highest activation.

  • confs (torch.Tensor, shape (n_samples,)) – Stores the confidence in the prediction.

  • ids (list[str]) – Stores the (possibly empty) protein labels extracted from the data file.

  • indices (list[int]) – Stores the unique indices of sequences mapping to their position in the file.

  • threshold (float) – If given, prediction labels and confidences are set to ‘’ if the confidence in the prediction is below the threshold.

Returns

df – Stores prediction information about the input protein sequences. Duplicates (defined by their sequence_id) have been removed from df.

Return type

pandas.DataFrame
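
A sketch with hypothetical class labels, predictions, and sequence IDs, following the documented parameter types:

    import torch
    from deepnog.utils.io_utils import create_df

    class_labels = ["COG0001", "COG0002"]       # one label per output node
    preds = torch.tensor([0, 1, 0])             # most active output node per sequence
    confs = torch.tensor([0.97, 0.42, 0.88])    # prediction confidences
    ids = ["seq_a", "seq_b", "seq_c"]           # protein labels from the data file
    indices = [0, 1, 2]                         # positions in the data file

    # Predictions with confidence below 0.5 are blanked out ('').
    df = create_df(class_labels, preds, confs, ids, indices, threshold=0.5)
    print(df)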

deepnog.utils.io_utils.get_data_home(data_home: Optional[str] = None, verbose: int = 0) → pathlib.Path[source]

Return the path of the deepnog data dir.

This folder is used for large files that cannot go into the Python package on PyPI, etc. For example, the network parameters (weights) files may be larger than 100 MiB. By default, the data dir is set to a folder named ‘deepnog_data’ in the user home folder. Alternatively, it can be set by the ‘DEEPNOG_DATA’ environment variable or programmatically by giving an explicit folder path. If the folder does not already exist, it is automatically created.

Parameters
  • data_home (str | None) – The path to deepnog data dir.

  • verbose (int) – Verbosity level of logging.

Notes

Adapted from scikit-learn's sklearn.datasets.get_data_home.
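
A minimal sketch; the resolved folder depends on the DEEPNOG_DATA environment variable and the user's home directory:

    from deepnog.utils.io_utils import get_data_home

    # Resolve (and create, if necessary) the deepnog data directory.
    data_home = get_data_home(verbose=1)
    print(data_home)  # typically ~/deepnog_data unless DEEPNOG_DATA overrides it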

deepnog.utils.io_utils.get_weights_path(database: str, level: str, architecture: str, data_home: Optional[str] = None, download_if_missing: bool = True, verbose: int = 0) → pathlib.Path[source]

Get path to neural network weights.

This is a path on local storage. If the corresponding files are not present, download from remote storage. The default remote URL can be overridden by setting the environment variable DEEPNOG_REMOTE.

Parameters
  • database (str) – The orthologous groups database. Example: eggNOG5

  • level (str) – The taxonomic level within the database. Example: 2 (for bacteria)

  • architecture (str) – Network architecture. Example: deepnog

  • data_home (str, optional) – Specify another download and cache folder for the weights. By default all deepnog data is stored in ‘$HOME/deepnog_data’ subfolders.

  • download_if_missing (boolean, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.

  • verbose (int) – Verbosity level of logging.

Returns

weights_path – Path to the file containing the network weights.

Return type

Path
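
A sketch using the example values from the parameter list above; the first call may trigger a download:

    from deepnog.utils.io_utils import get_weights_path

    # Fetch the weights for the eggNOG5 bacterial (level 2) deepnog model.
    weights_path = get_weights_path(
        database="eggNOG5",
        level="2",
        architecture="deepnog",
        download_if_missing=True,
    )
    print(weights_path)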

deepnog.utils.logger module

deepnog.utils.logger.get_logger(initname: str = 'deepnog', verbose: int = 0) → logging.Logger[source]

This function provides a nicely formatted logger.

Parameters
  • initname (str) – The name of the logger that appears in the log records.

  • verbose (int) – Increasing levels of verbosity.

References

Shamelessly stolen from phenotrex
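
A minimal sketch; the logger name is arbitrary, and the mapping of the verbose value onto logging levels is not specified here:

    from deepnog.utils.logger import get_logger

    logger = get_logger(initname="my_analysis", verbose=3)
    logger.info("Starting predictions ...")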

deepnog.utils.metrics module

deepnog.utils.metrics.estimate_performance(df_true: pandas.DataFrame, df_pred: pandas.DataFrame) → Dict[source]

Calculate various model performance measures.

Parameters
  • df_true (pandas.DataFrame) – The ground truth labels. DataFrame must contain ‘sequence_id’ and ‘label’ columns.

  • df_pred (pandas.DataFrame) – The predicted labels. DataFrame must contain ‘sequence_id’ and ‘prediction’ columns.

Returns

perf

Performance estimates:
  • macro_precision

  • micro_precision

  • macro_recall

  • micro_recall

  • macro_f1

  • micro_f1

  • accuracy

  • mcc (Matthews correlation coefficient)

Return type

dict
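
A sketch with hypothetical ground-truth and prediction tables, using the required column names; the returned dictionary is indexed by the keys listed above:

    import pandas as pd
    from deepnog.utils.metrics import estimate_performance

    df_true = pd.DataFrame({"sequence_id": ["s1", "s2", "s3"],
                            "label": ["COG0001", "COG0002", "COG0001"]})
    df_pred = pd.DataFrame({"sequence_id": ["s1", "s2", "s3"],
                            "prediction": ["COG0001", "COG0002", "COG0002"]})

    perf = estimate_performance(df_true, df_pred)
    print(perf["accuracy"], perf["macro_f1"])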

deepnog.utils.network module

Author: Roman Feldbauer

Date: 2020-02-19

Description:

Various utility functions

deepnog.utils.network.count_parameters(model, tunable_only: bool = True) → int[source]

Count the number of parameters in the given model.

Parameters
  • model (torch.nn.Module) – PyTorch model (deep network)

  • tunable_only (bool, optional) – Count only tunable network parameters

References

https://stackoverflow.com/questions/49201236/check-the-total-number-of-parameters-in-a-pytorch-model
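
A minimal sketch; any torch.nn.Module works, so a small linear layer stands in for a deepnog network:

    import torch
    from deepnog.utils.network import count_parameters

    model = torch.nn.Linear(in_features=10, out_features=2)
    print(count_parameters(model))                      # tunable parameters only
    print(count_parameters(model, tunable_only=False))  # all parameters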

deepnog.utils.network.load_nn(architecture: Union[str, Sequence[str]], model_dict: Optional[dict] = None, phase: str = 'eval', device: Union[torch.device, str] = 'cpu', verbose: int = 0)[source]

Import NN architecture and set loaded parameters.

Parameters
  • architecture (str or list-like of two str) – If single string: name of neural network module and class to import. E.g. ‘deepnog’ will load deepnog.models.deepnog.deepnog. Otherwise, separate module and class name of deep network to import. E.g. (‘deepthought’, ‘DeepNettigkeit’) will load deepnog.models.deepthought.DeepNettigkeit.

  • model_dict (dict, optional) – Dictionary holding all parameters and hyper-parameters of the model. Required during inference, optional for training.

  • phase (['train', 'infer', 'eval']) – Set the network to training or inference/evaluation mode, which affects gradient storage, dropout, etc.

  • device ([str, torch.device]) – Device to load the model into.

  • verbose (int) – Increasingly verbose logging

Returns

model – Neural network object of type architecture with parameters loaded from model_dict and moved to device.

Return type

torch.nn.Module
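
A sketch of loading the ‘deepnog’ architecture for inference. It assumes the weights file returned by get_weights_path is a torch checkpoint readable with torch.load, which is not stated above:

    import torch
    from deepnog.utils.io_utils import get_weights_path
    from deepnog.utils.network import load_nn

    weights_path = get_weights_path("eggNOG5", "2", "deepnog")
    # Assumption: the downloaded weights file can be read with torch.load.
    model_dict = torch.load(weights_path, map_location="cpu")

    model = load_nn("deepnog", model_dict=model_dict, phase="eval", device="cpu")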

deepnog.utils.network.set_device(device: Union[str, torch.device]) → torch.device[source]

Set device (CPU/GPU) depending on user choice and availability.

Parameters

device ([str, torch.device]) – Device set by user as an argument to DeepNOG call.

Returns

device – Object containing the device type to be used for prediction calculations.

Return type

torch.device
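
A minimal sketch; how an unavailable GPU is handled (fallback or error) is decided by the function and not restated here:

    import torch
    from deepnog.utils.network import set_device

    device = set_device("cpu")       # explicit CPU
    # device = set_device("cuda")    # GPU, subject to availability
    assert isinstance(device, torch.device)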

deepnog.utils.sync module

Author: Roman Feldbauer

Date: 2020-02-19

Description:

Parallel processing helpers

class deepnog.utils.sync.SynchronizedCounter(init: int = 0)[source]

Bases: object

A multiprocessing-safe counter.

Parameters

init (int, optional) – Counter starts at init (default: 0)

increment(n=1)[source]

Obtain a lock before incrementing, since += isn’t atomic.

Parameters

n (int, optional) – Increment counter by n (default: 1)

increment_and_get_value(n=1) → int[source]

Obtain a lock before incrementing, since += isn’t atomic, and return the new counter value.

Parameters

n (int, optional) – Increment counter by n (default: 1)

property value: int
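
A minimal sketch from a single process, just to show the API; it assumes increment_and_get_value returns the count after incrementing, as its name and int return type suggest:

    from deepnog.utils.sync import SynchronizedCounter

    counter = SynchronizedCounter(init=0)
    counter.increment()                             # +1 under a lock
    new_value = counter.increment_and_get_value(5)  # +5, returns the new count
    print(new_value, counter.value)                 # expected: 6 6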