deepnog.io

Author: Roman Feldbauer

Date: 2020-02-19

Description:

Input/output helper functions

deepnog.io.create_df(class_labels, preds, confs, ids, indices, threshold=None, verbose=3)[source]

Creates one dataframe storing all relevant prediction information.

The rows in the returned dataframe have the same order as the original sequences in the data file. The first column of the dataframe holds the position of each sequence in the data file.

Parameters
  • class_labels (list) – Stores the class name corresponding to each output node of the network.

  • preds (torch.Tensor, shape (n_samples,)) – Stores the index of the output node with the highest activation.

  • confs (torch.Tensor, shape (n_samples,)) – Stores the confidence in the prediction.

  • ids (list[str]) – Stores the (possibly empty) protein labels extracted from the data file.

  • indices (list[int]) – Stores the unique indices of the sequences, mapping each to its position in the file.

  • threshold (int) – If given, the prediction label and confidence are set to ‘’ for every sequence whose prediction confidence is below threshold.

  • verbose (int) – If greater than 0, emits a warning when duplicates are detected.

Returns

df – Stores prediction information about the input protein sequences. Duplicates (defined by their sequence_id) have been removed from df.

Return type

pandas.DataFrame
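The post-processing described above (label lookup, thresholding, ordering by file position, duplicate removal) can be sketched without any dependencies. This is only an illustration, not the deepnog implementation: the function name build_prediction_records, the record keys other than sequence_id, the keep-first rule for duplicates, and the use of plain lists instead of torch tensors and a pandas dataframe are all assumptions.

```python
import warnings


def build_prediction_records(class_labels, preds, confs, ids, indices,
                             threshold=None, verbose=3):
    """Hypothetical stand-in for create_df that returns a list of dicts."""
    rows = []
    for pred, conf, seq_id, idx in zip(preds, confs, ids, indices):
        label = class_labels[int(pred)]  # map output-node index to class name
        conf = float(conf)
        if threshold is not None and conf < threshold:
            # Below threshold: blank out both label and confidence.
            label, conf = "", ""
        rows.append({"index": idx, "sequence_id": seq_id,
                     "prediction": label, "confidence": conf})

    # Restore the order of the sequences in the data file.
    rows.sort(key=lambda row: row["index"])

    # Drop duplicates by sequence_id, keeping the first occurrence
    # (the keep-first rule is an assumption of this sketch).
    seen, unique = set(), []
    for row in rows:
        if row["sequence_id"] not in seen:
            seen.add(row["sequence_id"])
            unique.append(row)

    n_duplicates = len(rows) - len(unique)
    if n_duplicates and verbose > 0:
        warnings.warn(f"Removed {n_duplicates} duplicate sequence id(s).")
    return unique


records = build_prediction_records(
    class_labels=["COG0001", "COG0002"],
    preds=[1, 0, 1],
    confs=[0.9, 0.4, 0.8],
    ids=["protB", "protA", "protB"],
    indices=[1, 0, 2],
    threshold=0.5,
    verbose=0,
)
for record in records:
    print(record)
```

In this toy input, protA falls below the threshold and keeps its row with empty label and confidence, while the second protB entry is dropped as a duplicate.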

deepnog.io.get_data_home(data_home: str = None) → pathlib.Path[source]

Return the path of the deepnog data dir.

This folder holds large files that cannot be shipped in the Python package on PyPI. For example, the network parameter (weights) files may be larger than 100 MiB. By default, the data dir is a folder named ‘deepnog_data’ in the user home folder. Alternatively, it can be set by the ‘DEEPNOG_DATA’ environment variable, or programmatically by giving an explicit folder path. If the folder does not already exist, it is created automatically.

Parameters

data_home (str | None) – The path to deepnog data dir.

Notes

Adapted from scikit-learn (SKLEARN_DATAHOME).
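The lookup order described above can be sketched with the standard library alone. The function name resolve_data_home is hypothetical, and the precedence of an explicit argument over the ‘DEEPNOG_DATA’ variable is an assumption; the description only states that both options exist.

```python
import os
from pathlib import Path


def resolve_data_home(data_home=None):
    """Hypothetical sketch of the data dir lookup, not the deepnog code."""
    if data_home is None:
        # Fall back to the environment variable, then to ~/deepnog_data.
        data_home = os.environ.get("DEEPNOG_DATA",
                                   Path.home() / "deepnog_data")
    path = Path(data_home).expanduser()
    # Create the folder (and any missing parents) if it does not exist yet.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling resolve_data_home("/tmp/my_deepnog") would create that folder if necessary and return it as a pathlib.Path; calling it without arguments falls back to the environment variable or the home-folder default.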

deepnog.io.get_weights_path(database: str, level: str, architecture: str, data_home=None, download_if_missing=True) → pathlib.Path[source]

Get path to neural network weights.

This is a path on local storage. If the corresponding file is not present locally, it is downloaded from remote storage. The default remote URL can be overridden by setting the environment variable DEEPNOG_REMOTE.

Parameters
  • database (str) – The orthologous groups database. Example: eggNOG5

  • level (str) – The taxonomic level within the database. Example: 2 (for bacteria)

  • architecture (str) – Network architecture. Example: deepencoding

  • data_home (string, optional) – Specify another download and cache folder for the weights. By default all deepnog data is stored in ‘~/deepnog_data’ subfolders.

  • download_if_missing (boolean, default=True) – If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.

Returns

weights_path – Path to file of network weights

Return type

pathlib.Path
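The cache-then-download behaviour described above might look roughly like the following sketch. Everything specific in it is an assumption made for illustration: the function name locate_weights, the folder layout <data_home>/<database>/<level>/<architecture>.pth, the placeholder remote URL, and the injectable fetch callable. Only the DEEPNOG_REMOTE variable and the IOError on a missing file come from the description.

```python
import os
import urllib.request
from pathlib import Path

# Placeholder only; this is NOT the real default remote of deepnog.
PLACEHOLDER_REMOTE = "https://example.org/deepnog"


def locate_weights(database, level, architecture, data_home,
                   download_if_missing=True,
                   fetch=urllib.request.urlretrieve):
    """Hypothetical sketch: return the local weights path, fetching if needed."""
    # Assumed layout: <data_home>/<database>/<level>/<architecture>.pth
    relative = Path(database) / level / f"{architecture}.pth"
    weights_path = Path(data_home) / relative

    if weights_path.is_file():
        return weights_path  # already cached locally
    if not download_if_missing:
        raise IOError(f"Weights not available locally: {weights_path}")

    # The environment variable overrides the default remote URL.
    remote = os.environ.get("DEEPNOG_REMOTE", PLACEHOLDER_REMOTE).rstrip("/")
    weights_path.parent.mkdir(parents=True, exist_ok=True)
    fetch(f"{remote}/{relative.as_posix()}", weights_path)
    return weights_path
```

Passing fetch as a parameter keeps the sketch testable offline: a stub that writes a dummy file can replace the real download.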