deepnog.learning package

deepnog.learning.inference module

Author: Roman Feldbauer

Date: 2020-02-19

Description:

Predict orthologous groups of protein sequences.

deepnog.learning.inference.predict(model, dataset, device='cpu', batch_size=16, num_workers=4, verbose=3)

Use model to predict zero-indexed labels of dataset.

Also communicates with the ProteinIterators used to load the data, in order to log how many sequences were skipped because their sequence ids were empty.

Parameters

model (nn.Module) – Trained neural network model.
dataset (ProteinIterableDataset) – Data to predict protein families for.
device (str or torch.device) – Device on which the model runs.
batch_size (int) – Number of proteins forwarded through the neural network at once.
num_workers (int) – Number of workers for data loading.
verbose (int) – Define verbosity.

Returns

preds (torch.Tensor, shape (n_samples,)) – Stores the index of the output node with the highest activation.
confs (torch.Tensor, shape (n_samples,)) – Stores the confidence in each prediction.
ids (list[str]) – Stores the (possibly empty) protein labels extracted from the data file.
indices (list[int]) – Stores the unique index of each sequence, which maps to its position in the file.
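
The relation between preds and confs can be illustrated without DeepNOG itself. The following is a minimal sketch, not the library's implementation: it assumes that the confidence is the softmax probability of the winning output node (the description above does not state how the confidence is computed), and both the helper summarize_outputs and the toy activations are hypothetical.

```python
import math


def summarize_outputs(activations):
    """Return (preds, confs) for a list of per-sequence activation lists.

    Hypothetical helper: preds holds the zero-indexed output node with the
    highest activation; confs holds its softmax probability (an assumption).
    """
    preds, confs = [], []
    for row in activations:
        # Zero-indexed position of the largest activation.
        best = max(range(len(row)), key=row.__getitem__)
        # Subtract the maximum before exponentiating for numerical stability.
        exps = [math.exp(a - row[best]) for a in row]
        preds.append(best)
        confs.append(exps[best] / sum(exps))
    return preds, confs


# Toy activations for two sequences and three classes.
preds, confs = summarize_outputs([[0.1, 2.0, -1.0], [3.0, 0.0, 0.0]])
```

Under this assumption, each entry of confs lies in (0, 1], and entries of preds, confs, ids, and indices at the same position describe the same sequence.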

deepnog.learning.training module

Author: Roman Feldbauer

Date: 2020-06-03

Description:

Training deep networks for protein orthologous group prediction.

deepnog.learning.training.fit(architecture, module, cls, training_sequences, validation_sequences, training_labels, validation_labels, *, data_loader_params: Optional[dict] = None, iterable_dataset: bool = False, n_epochs: int = 15, shuffle: bool = False, learning_rate: float = 0.01, learning_rate_params: Optional[dict] = None, l2_coeff: Optional[float] = None, optimizer_cls=torch.optim.Adam, device: Union[str, torch.device] = 'auto', tensorboard_dir: Union[None, str] = 'auto', log_interval: int = 100, random_seed: Optional[int] = None, save_each_epoch: bool = True, out_dir: Optional[pathlib.Path] = None, experiment_name: Optional[str] = None, config_file: Optional[str] = None, verbose: int = 2) → deepnog.learning.training.train_val_result

Train and validate a model with the given data and hyperparameters.

Parameters

architecture (str) – Network architecture, must be available in deepnog/models
module (str) – Python module containing the network definition (inside deepnog/models/).
cls (str) – Python class name of the network (inside deepnog/models/{module}.py).
training_sequences (str, Path) – File with training set sequences
validation_sequences (str, Path) – File with validation set sequences
training_labels (str, Path) – File with class labels (orthologous groups) of training sequences
validation_labels (str, Path) – File with class labels (orthologous groups) of validation sequences
data_loader_params (dict) – Parameters passed to PyTorch DataLoader construction
iterable_dataset (bool, default False) – Use an iterable dataset that does not load all sequences in advance. This saves memory and avoids the delay at start, but it impairs random sampling, which then requires a shuffle buffer.
n_epochs (int) – Number of training passes over the complete training set
shuffle (bool) – Shuffle the training data. This does NOT shuffle the complete data set, which would require holding all sequences in memory. Instead, sequences are drawn from a shuffle buffer (default size: 2**16).
learning_rate (float) – Learning rate, the central hyperparameter of deep network training. Values that are too high may cause training to diverge, while values that are too low result in slow learning.
learning_rate_params (dict) – Parameters passed to the learning rate Scheduler.
l2_coeff (float) – If not None, regularize training by L2 norm of network weights
optimizer_cls – Class of PyTorch optimizer
device (str or torch.device) – Use either 'cpu' or 'cuda' (GPU) for training/validation.
tensorboard_dir (str) – Save online learning statistics for tensorboard in this directory.
log_interval (int, optional) – Print intermediary results after log_interval minibatches
random_seed (int) – Set a random seed for numpy/pytorch for reproducible results.
save_each_epoch (bool) – Save the network after each training epoch
out_dir (Path) – Path to the output directory used to save models during training
experiment_name (str) – Prefix of model files saved during training
config_file (str) – Override path to config file, e.g. for custom models in unit tests
verbose (int) – Increasing levels of messages
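
The shuffle buffer mentioned for the shuffle parameter can be sketched in plain Python. This is a generic illustration of the technique, not DeepNOG's own code: the function name buffered_shuffle and its seed argument are hypothetical, and only the default buffer size of 2**16 is taken from the description above.

```python
import random


def buffered_shuffle(iterable, buffer_size=2**16, seed=None):
    """Yield items in approximately random order using a bounded buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in iterable:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Source exhausted: emit the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer


# Stream ten items through a buffer that holds only four at a time.
shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
```

Only buffer_size items are held in memory at any time, which is why the result is a local shuffle rather than a uniform permutation: an item can be emitted early only if it has already entered the buffer.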

Returns

results –

A namedtuple containing:

the trained deep network model
training dataset
evaluation statistics
the ground truth labels (y_true)
the predicted labels (y_pred).

Return type

namedtuple
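
A call to fit might be assembled as in the following sketch. Only the parameter names come from the signature above. The values for architecture, module, and cls, all file names, and the helper functions build_fit_arguments and run_training are assumptions made for illustration and should be adapted to the actual setup.

```python
from pathlib import Path


def build_fit_arguments(data_dir):
    """Collect arguments for fit(); all values below are placeholders."""
    data_dir = Path(data_dir)
    return dict(
        architecture="deepnog",  # assumed to be available in deepnog/models
        module="deepnog",        # assumed module name inside deepnog/models/
        cls="DeepNOG",           # assumed class name inside that module
        training_sequences=data_dir / "train.fa",
        validation_sequences=data_dir / "val.fa",
        training_labels=data_dir / "train_labels.csv",
        validation_labels=data_dir / "val_labels.csv",
        n_epochs=15,
        learning_rate=0.01,
        device="cpu",
        random_seed=0,
        out_dir=data_dir / "models",
        experiment_name="example",
    )


def run_training(data_dir):
    """Run training; requires DeepNOG to be installed."""
    # Imported lazily so the sketch can be loaded without DeepNOG.
    from deepnog.learning.training import fit
    return fit(**build_fit_arguments(data_dir))


fit_arguments = build_fit_arguments("data")
```

Calling run_training("data") would return the namedtuple described above. Its field names are not listed here, so inspecting the returned object (for example via its _fields attribute, which every namedtuple provides) is the safest way to access the model and the statistics.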