deepnog.utils package
deepnog.utils.bio module
- deepnog.utils.bio.parse(p: pathlib.Path, fformat: str = 'fasta', alphabet=None) → Iterator
Parse a possibly compressed sequence file.
- Parameters
p (Path or str) – Path to sequence file
fformat (str) – File format supported by Biopython's Bio.SeqIO.parse, e.g. “fasta”
alphabet (any) – Pass alphabet to SeqIO.parse
- Returns
it – The SeqIO.parse iterator yielding SeqRecords
- Return type
Iterator
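The function hands the actual parsing to Bio.SeqIO.parse. The stdlib-only sketch below illustrates the "possibly compressed" part: it sniffs the two gzip magic bytes and opens the file in text mode either way. Whether deepnog detects compression by magic bytes or by file extension is an assumption here, and the helpers open_maybe_gzip and iter_fasta are hypothetical. iter_fasta is a minimal FASTA reader standing in for SeqIO and yields (id, sequence) pairs instead of SeqRecords.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple, Union

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(p: Union[Path, str]):
    """Open a file in text mode, transparently decompressing it if gzipped."""
    p = Path(p)
    with open(p, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(p, "rt")
    return open(p, "rt")


def iter_fasta(p: Union[Path, str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a plain or gzipped FASTA file."""
    with open_maybe_gzip(p) as handle:
        rec_id, chunks = None, []
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if rec_id is not None:
                    yield rec_id, "".join(chunks)
                # The ID is the first whitespace-separated token of the header.
                rec_id = line[1:].split()[0] if len(line) > 1 else ""
                chunks = []
            else:
                chunks.append(line)
        if rec_id is not None:
            yield rec_id, "".join(chunks)
```

With Biopython installed, the handle returned by open_maybe_gzip could be passed directly to Bio.SeqIO.parse(handle, "fasta") instead of iter_fasta.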
deepnog.utils.config module
- deepnog.utils.config.get_config(config_file: Optional[Union[pathlib.Path, str]] = None) → Dict
Get a config dictionary.
If no file is provided, look for its path in the DEEPNOG_CONFIG environment variable. If this fails, load a default config file (lacking any user customization).
The config lists the available models (databases, levels). Additional settings may be added in future releases.
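The lookup order described above can be sketched as a small path resolver. This assumes that "fails" means the variable is unset or points to a missing file; resolve_config_path and the default file name are hypothetical, and parsing the file into a dictionary is left out.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_config_path(config_file: Optional[Union[Path, str]] = None,
                        default: Union[Path, str] = "deepnog_config.yml") -> Path:
    """Pick the config file: explicit argument, then $DEEPNOG_CONFIG, then default."""
    if config_file is not None:
        return Path(config_file)
    env_path = os.environ.get("DEEPNOG_CONFIG")
    if env_path and Path(env_path).is_file():
        return Path(env_path)
    # Fall back to the config shipped with the package (no user customization).
    return Path(default)
```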
deepnog.utils.io_utils module
Author: Roman Feldbauer
Date: 2020-02-19
Description:
Input/output helper functions
- deepnog.utils.io_utils.create_df(class_labels: list, preds: torch.Tensor, confs: torch.Tensor, ids: List[str], indices: List[int], threshold: Optional[float] = None)
Creates one dataframe storing all relevant prediction information.
The rows in the returned dataframe have the same order as the original sequences in the data file. The first column of the dataframe holds the position of each sequence in the data file.
- Parameters
class_labels (list) – Class names corresponding to the output nodes of the network.
preds (torch.Tensor, shape (n_samples,)) – Index of the output node with the highest activation for each sequence
confs (torch.Tensor, shape (n_samples,)) – Confidence in each prediction
ids (list[str]) – The (possibly empty) protein labels extracted from the data file.
indices (list[int]) – Unique indices of the sequences, mapping to their position in the file
threshold (float) – If given, prediction labels and confidences are set to ‘’ whenever the confidence in a prediction is below threshold.
- Returns
df – Stores prediction information about the input protein sequences. Duplicates (defined by their sequence_id) have been removed from df.
- Return type
pandas.DataFrame
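A plain-Python sketch of the row logic described above, using lists instead of tensors and a list of dicts instead of a DataFrame. The column names index, sequence_id, prediction and confidence, and keeping the first occurrence of a duplicated sequence_id, are assumptions; create_rows_sketch is a hypothetical name.

```python
from typing import List, Optional


def create_rows_sketch(class_labels: List[str], preds: List[int], confs: List[float],
                       ids: List[str], indices: List[int],
                       threshold: Optional[float] = None) -> List[dict]:
    """Plain-Python stand-in for create_df: one dict per unique sequence, in file order."""
    rows = []
    for idx, seq_id, pred, conf in zip(indices, ids, preds, confs):
        label, confidence = class_labels[pred], conf
        # Below the threshold, the prediction is blanked out rather than dropped.
        if threshold is not None and conf < threshold:
            label, confidence = "", ""
        rows.append({"index": idx, "sequence_id": seq_id,
                     "prediction": label, "confidence": confidence})
    rows.sort(key=lambda r: r["index"])  # restore the order of the data file
    seen, unique = set(), []
    for r in rows:  # keep only the first occurrence of each sequence_id
        if r["sequence_id"] not in seen:
            seen.add(r["sequence_id"])
            unique.append(r)
    return unique
```

With pandas, the last two steps correspond to df.sort_values("index").drop_duplicates(subset="sequence_id"), since drop_duplicates keeps the first occurrence by default.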
- deepnog.utils.io_utils.get_data_home(data_home: Optional[str] = None, verbose: int = 0) → pathlib.Path
Return the path of the deepnog data dir.
This folder is used for large files that cannot go into the Python package on PyPI etc. For example, the network parameter (weights) files may be larger than 100 MiB. By default the data dir is set to a folder named ‘deepnog_data’ in the user home folder. Alternatively, it can be set by the ‘DEEPNOG_DATA’ environment variable or programmatically by passing an explicit folder path. If the folder does not already exist, it is created automatically.
- Parameters
data_home (str | None) – The path to deepnog data dir.
verbose (int) – Log or not.
Notes
Adapted from SKLEARN_DATAHOME.
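A minimal sketch of the resolution described above. It assumes the precedence explicit argument, then DEEPNOG_DATA, then ~/deepnog_data, following the scikit-learn pattern; get_data_home_sketch is a hypothetical name and logging is omitted.

```python
import os
from pathlib import Path
from typing import Optional


def get_data_home_sketch(data_home: Optional[str] = None) -> Path:
    """Resolve the deepnog data dir: argument, then $DEEPNOG_DATA, then ~/deepnog_data."""
    if data_home is None:
        data_home = os.environ.get("DEEPNOG_DATA",
                                   os.path.join("~", "deepnog_data"))
    path = Path(data_home).expanduser()
    path.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    return path
```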
- deepnog.utils.io_utils.get_weights_path(database: str, level: str, architecture: str, data_home: Optional[str] = None, download_if_missing: bool = True, verbose: int = 0) → pathlib.Path
Get path to neural network weights.
This is a path on local storage. If the corresponding files are not present, they are downloaded from remote storage. The default remote URL can be overridden by setting the environment variable DEEPNOG_REMOTE.
- Parameters
database (str) – The orthologous groups database. Example: eggNOG5
level (str) – The taxonomic level within the database. Example: 2 (for bacteria)
architecture (str) – Network architecture. Example: deepnog
data_home (str, optional) – Specify another download and cache folder for the weights. By default all deepnog data is stored in ‘$HOME/deepnog_data’ subfolders.
download_if_missing (bool, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.
verbose (int) – Log or not
- Returns
weights_path – Path to file of network weights
- Return type
Path
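The check-then-download behaviour can be sketched as follows. The on-disk layout <data_home>/<database>/<level>/<architecture>.pth, the placeholder DEFAULT_REMOTE URL and the injectable fetch callable are all hypothetical; the real file names and default remote are not documented here. The fetch parameter exists only so the download step can be swapped out, and defaults to urllib.request.urlretrieve.

```python
import os
import urllib.request
from pathlib import Path
from typing import Callable, Optional

# Hypothetical placeholder; the real default remote URL is not given here.
DEFAULT_REMOTE = "https://example.org/deepnog"


def get_weights_path_sketch(database: str, level: str, architecture: str,
                            data_home: Optional[str] = None,
                            download_if_missing: bool = True,
                            fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
                            ) -> Path:
    """Return a local weights path, downloading the file first if necessary."""
    home = Path(data_home or os.path.join("~", "deepnog_data")).expanduser()
    # Hypothetical layout: <data_home>/<database>/<level>/<architecture>.pth
    relative = Path(database) / str(level) / f"{architecture}.pth"
    local = home / relative
    if not local.exists():
        if not download_if_missing:
            raise IOError(f"Weights not found locally: {local}")
        # DEEPNOG_REMOTE overrides the default remote location.
        remote = os.environ.get("DEEPNOG_REMOTE", DEFAULT_REMOTE).rstrip("/")
        local.parent.mkdir(parents=True, exist_ok=True)
        fetch(f"{remote}/{relative.as_posix()}", str(local))
    return local
```

A second call with the same arguments finds the cached file and does not download again.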
deepnog.utils.logger module
- deepnog.utils.logger.get_logger(initname: str = 'deepnog', verbose: int = 0) → logging.Logger
This function provides a nicely formatted logger.
- Parameters
initname (str) – The name of the logger to show up in log.
verbose (int) – Increasing levels of verbosity
References
Shamelessly stolen from phenotrex
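A sketch of such a logger built on the standard logging module. The mapping of verbose 0, 1 and ≥2 to WARNING, INFO and DEBUG and the format string are assumptions, not necessarily what deepnog uses; get_logger_sketch is a hypothetical name.

```python
import logging
import sys


def get_logger_sketch(initname: str = "deepnog", verbose: int = 0) -> logging.Logger:
    """Return a formatted logger; verbose 0/1/>=2 map to WARNING/INFO/DEBUG (assumed)."""
    logger = logging.getLogger(initname)
    if verbose >= 2:
        level = logging.DEBUG
    elif verbose == 1:
        level = logging.INFO
    else:
        level = logging.WARNING
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate output when called repeatedly
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger
```

Because logging.getLogger returns the same object for the same name, repeated calls only adjust the level and never attach a second handler.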
deepnog.utils.metrics module
- deepnog.utils.metrics.estimate_performance(df_true: pandas.DataFrame, df_pred: pandas.DataFrame) → Dict
Calculate various model performance measures.
- Parameters
df_true (pandas.DataFrame) – The ground truth labels. DataFrame must contain ‘sequence_id’ and ‘label’ columns.
df_pred (pandas.DataFrame) – The predicted labels. DataFrame must contain ‘sequence_id’ and ‘prediction’ columns.
- Returns
perf – Performance estimates with the keys macro_precision, micro_precision, macro_recall, micro_recall, macro_f1, micro_f1, accuracy, and mcc.
- Return type
dict
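The listed measures can be computed without pandas or scikit-learn, as in the sketch below. It takes {sequence_id: label} mappings in place of the two DataFrames and assumes an inner join on sequence_id, macro averages over the union of true and predicted labels, and 0.0 on zero division; these choices follow scikit-learn's defaults but are assumptions about deepnog. The MCC uses the multiclass form implemented by sklearn.metrics.matthews_corrcoef. estimate_performance_sketch is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, Hashable


def estimate_performance_sketch(y_true: Dict[str, Hashable],
                                y_pred: Dict[str, Hashable]) -> Dict[str, float]:
    """Compute the metrics listed above from {sequence_id: label} mappings."""
    ids = sorted(set(y_true) & set(y_pred))  # inner join on sequence_id
    t = [y_true[i] for i in ids]
    p = [y_pred[i] for i in ids]
    n = len(ids)
    labels = list(set(t) | set(p))
    tp = Counter(a for a, b in zip(t, p) if a == b)
    n_true, n_pred = Counter(t), Counter(p)

    def ratio(num, den):
        return num / den if den else 0.0  # zero division yields 0.0

    prec = [ratio(tp[c], n_pred[c]) for c in labels]
    rec = [ratio(tp[c], n_true[c]) for c in labels]
    f1 = [ratio(2 * a * b, a + b) for a, b in zip(prec, rec)]
    correct = sum(tp.values())
    # With exactly one label per sequence, micro P, R and F1 all equal accuracy.
    micro = ratio(correct, n)
    # Multiclass MCC from the confusion-matrix row and column sums.
    cov_tp = correct * n - sum(n_pred[c] * n_true[c] for c in labels)
    cov_pp = n * n - sum(n_pred[c] ** 2 for c in labels)
    cov_tt = n * n - sum(n_true[c] ** 2 for c in labels)
    den = math.sqrt(cov_pp * cov_tt)
    k = len(labels)
    return {
        "macro_precision": ratio(sum(prec), k),
        "micro_precision": micro,
        "macro_recall": ratio(sum(rec), k),
        "micro_recall": micro,
        "macro_f1": ratio(sum(f1), k),
        "micro_f1": micro,
        "accuracy": micro,
        "mcc": cov_tp / den if den else 0.0,
    }
```

With the DataFrames documented above, the alignment step would be df_true.merge(df_pred, on="sequence_id"), comparing the label and prediction columns.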
deepnog.utils.network module
Author: Roman Feldbauer
Date: 2020-02-19
Description:
Various utility functions
- deepnog.utils.network.count_parameters(model, tunable_only: bool = True) → int
Count the number of parameters in the given model.
- Parameters
model (torch.nn.Module) – PyTorch model (deep network)
tunable_only (bool, optional) – Count only tunable network parameters
References
https://stackoverflow.com/questions/49201236/check-the-total-number-of-parameters-in-a-pytorch-model
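The idiom from the referenced answer sums numel() over model.parameters(), optionally skipping parameters with requires_grad=False. The sketch below (hypothetical name count_parameters_sketch) relies only on those two attributes, which every parameter of a torch.nn.Module has, so it can be exercised with stand-in objects as well.

```python
def count_parameters_sketch(model, tunable_only: bool = True) -> int:
    """Sum element counts over model.parameters(), optionally only trainable ones."""
    return sum(p.numel() for p in model.parameters()
               if p.requires_grad or not tunable_only)
```

For example, torch.nn.Linear(4, 2) has a 2 × 4 weight matrix plus 2 biases, so the count is 10.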
- deepnog.utils.network.load_nn(architecture: Union[str, Sequence[str]], model_dict: Optional[dict] = None, phase: str = 'eval', device: Union[torch.device, str] = 'cpu', verbose: int = 0)
Import NN architecture and set loaded parameters.
- Parameters
architecture (str or list-like of two str) – If a single string: name of both the neural network module and the class to import. E.g. ‘deepnog’ will load deepnog.models.deepnog.deepnog. Otherwise, the module name and the class name of the deep network to import, given separately. E.g. (‘deepthought’, ‘DeepNettigkeit’) will load deepnog.models.deepthought.DeepNettigkeit.
model_dict (dict, optional) – Dictionary holding all parameters and hyper-parameters of the model. Required during inference, optional for training.
phase (['train', 'infer', 'eval']) – Put the network in training or inference (= evaluation) mode, which affects gradient storage, dropout, etc.
device ([str, torch.device]) – Device to load the model onto.
verbose (int) – Increasingly verbose logging
- Returns
model – Neural network object of type architecture with parameters loaded from model_dict and moved to device.
- Return type
torch.nn.Module
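The name resolution described for the architecture parameter can be sketched with importlib. Only the lookup of the class is shown; instantiating it, loading the parameters from model_dict, moving it to the device and switching phase are omitted because the model_dict layout is not documented here. The helpers resolve_architecture and load_class are hypothetical.

```python
import importlib
from typing import Sequence, Tuple, Union


def resolve_architecture(architecture: Union[str, Sequence[str]],
                         package: str = "deepnog.models") -> Tuple[str, str]:
    """Map 'name' or ('module', 'Class') to (dotted module path, class name)."""
    if isinstance(architecture, str):
        module_name = class_name = architecture  # same name for module and class
    else:
        module_name, class_name = architecture
    return f"{package}.{module_name}", class_name


def load_class(architecture, package: str = "deepnog.models"):
    """Import the module and return the network class (not yet instantiated)."""
    module_path, class_name = resolve_architecture(architecture, package)
    return getattr(importlib.import_module(module_path), class_name)
```

On a torch.nn.Module instance, the phase would then be applied with model.train() for training and model.eval() for inference/evaluation.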
- deepnog.utils.network.set_device(device: Union[str, torch.device]) → torch.device
Set device (CPU/GPU) depending on user choice and availability.
- Parameters
device ([str, torch.device]) – Device requested by the user as an argument to the DeepNOG call.
- Returns
device – Object containing the device type to be used for prediction calculations.
- Return type
torch.device
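A sketch of the choice "depending on user choice and availability". The accepted strings 'auto', 'gpu', 'cuda' and 'cpu', and raising an error when a GPU is requested but unavailable, are assumptions about deepnog's behaviour. The function choose_device_name is hypothetical and takes the availability flag as a parameter so it runs without PyTorch; with PyTorch, that flag would come from torch.cuda.is_available() and the result would be wrapped as torch.device(name).

```python
def choose_device_name(device: str, cuda_available: bool) -> str:
    """Return 'cpu' or 'cuda' for a user request of 'auto', 'cpu', 'gpu' or 'cuda'."""
    device = device.lower()
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    if device in ("gpu", "cuda"):
        if not cuda_available:
            raise RuntimeError("A GPU was requested, but CUDA is not available.")
        return "cuda"
    if device == "cpu":
        return "cpu"
    raise ValueError(f"Unknown device: {device!r}")
```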
deepnog.utils.sync module
Author: Roman Feldbauer
Date: 2020-02-19
Description:
Parallel processing helpers
- class deepnog.utils.sync.SynchronizedCounter(init: int = 0)
Bases: object
A multiprocessing-safe counter.
- Parameters
init (int, optional) – Counter starts at init (default: 0)
- increment(n=1)
Obtain a lock before incrementing, since += isn’t atomic.
- Parameters
n (int, optional) – Increment counter by n (default: 1)
- increment_and_get_value(n=1) → int
Obtain a lock before incrementing, since += isn’t atomic.
- Parameters
n (int, optional) – Increment counter by n (default: 1)
- property value: int
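A common way to build such a counter is an integer in shared memory (multiprocessing.Value) guarded by a multiprocessing.Lock, so the read-modify-write of += cannot interleave between processes. The sketch below follows that pattern with the same method names; whether deepnog's class is implemented this way is an assumption, and SynchronizedCounterSketch is a hypothetical name.

```python
import multiprocessing as mp


class SynchronizedCounterSketch:
    """Counter shared between processes; every update happens under a lock."""

    def __init__(self, init: int = 0):
        self._value = mp.Value("i", init)  # C int in shared memory
        self._lock = mp.Lock()

    def increment(self, n: int = 1) -> None:
        with self._lock:  # += is a non-atomic read-modify-write
            self._value.value += n

    def increment_and_get_value(self, n: int = 1) -> int:
        with self._lock:  # read the result before another process can change it
            self._value.value += n
            return self._value.value

    @property
    def value(self) -> int:
        return self._value.value
```

The counter has to reach worker processes at creation time, e.g. as an argument to multiprocessing.Process. Sending it through a Pool task queue fails, because shared-memory values and locks may only be shared between processes through inheritance.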