deepnog.learning package
deepnog.learning.inference module
Author: Roman Feldbauer
Date: 2020-02-19
Description:
Predict orthologous groups of protein sequences.
- deepnog.learning.inference.predict(model, dataset, device='cpu', batch_size=16, num_workers=4, verbose=3)
Use model to predict zero-indexed labels of dataset.
Also communicates with the ProteinIterators used to load the data, in order to log how many sequences were skipped because their sequence ids were empty.
- Parameters
model (nn.Module) – Trained neural network model.
dataset (ProteinIterableDataset) – Data to predict protein families for.
device ([str, torch.device]) – Device of model.
batch_size (int) – Forward batch_size proteins through neural network at once.
num_workers (int) – Number of workers for data loading.
verbose (int) – Define verbosity.
- Returns
preds (torch.Tensor, shape (n_samples,)) – Stores the index of the output-node with the highest activation
confs (torch.Tensor, shape (n_samples,)) – Stores the confidence in the prediction
ids (list[str]) – Stores the (possibly empty) protein labels extracted from the data file.
indices (list[int]) – Stores the unique indices of sequences mapping to their position in the file
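The relationship between preds and confs can be illustrated with a minimal sketch in plain Python. It assumes that the confidence is the softmax probability of the winning output node; the documentation above does not state how confs is computed, so treat that as an assumption. The helper name top_prediction and the example activations are hypothetical and not part of deepnog.

```python
import math


def top_prediction(activations):
    """Return (index, confidence) of the output node with the highest activation.

    Assumption: confidence is the softmax probability of the winning node.
    This is an illustration, not deepnog's actual implementation.
    """
    peak = max(activations)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    # Zero-indexed position of the largest activation.
    index = max(range(len(activations)), key=activations.__getitem__)
    return index, exps[index] / total


# Hypothetical raw activations for a batch of two sequences and three classes.
batch = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.0]]
results = [top_prediction(row) for row in batch]
preds = [index for index, _ in results]
confs = [conf for _, conf in results]
```

Here preds holds one zero-indexed label per sequence and confs holds one value in (0, 1] per sequence, mirroring the shape (n_samples,) of the tensors returned by predict.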
deepnog.learning.training module
Author: Roman Feldbauer
Date: 2020-06-03
Description:
Training deep networks for protein orthologous group prediction.
- deepnog.learning.training.fit(architecture, module, cls, training_sequences, validation_sequences, training_labels, validation_labels, *, data_loader_params: Optional[dict] = None, iterable_dataset: bool = False, n_epochs: int = 15, shuffle: bool = False, learning_rate: float = 0.01, learning_rate_params: Optional[dict] = None, l2_coeff: Optional[float] = None, optimizer_cls=torch.optim.Adam, device: Union[str, torch.device] = 'auto', tensorboard_dir: Union[None, str] = 'auto', log_interval: int = 100, random_seed: Optional[int] = None, save_each_epoch: bool = True, out_dir: Optional[pathlib.Path] = None, experiment_name: Optional[str] = None, config_file: Optional[str] = None, verbose: int = 2) → deepnog.learning.training.train_val_result
Train and validate a model with the given data and hyperparameters.
- Parameters
architecture (str) – Network architecture, must be available in deepnog/models
module (str) – Python module containing the network definition (inside deepnog/models/).
cls (str) – Python class name of the network (inside deepnog/models/{module}.py).
training_sequences (str, Path) – File with training set sequences
validation_sequences (str, Path) – File with validation set sequences
training_labels (str, Path) – File with class labels (orthologous groups) of training sequences
validation_labels (str, Path) – File with class labels (orthologous groups) of validation sequences
data_loader_params (dict) – Parameters passed to PyTorch DataLoader construction
iterable_dataset (bool, default False) – Use an iterable dataset that does not load all sequences in advance. While this saves memory and avoids the loading delay at start, it impairs random sampling, which then requires a shuffle buffer.
n_epochs (int) – Number of training passes over the complete training set
shuffle (bool) – Shuffle the training data. This does NOT shuffle the complete data set, which requires having all sequences in memory, but uses a shuffle buffer (default size: 2**16), from which sequences are drawn.
learning_rate (float) – Learning rate, the central hyperparameter of deep network training. Too high values may lead to diverging solutions, while too low values result in slow learning.
learning_rate_params (dict) – Parameters passed to the learning rate Scheduler.
l2_coeff (float) – If not None, regularize training by L2 norm of network weights
optimizer_cls – Class of PyTorch optimizer
device (torch.device) – Use either ‘cpu’ or ‘cuda’ (GPU) for training/validation.
tensorboard_dir (str) – Save online learning statistics for tensorboard in this directory.
log_interval (int, optional) – Print intermediary results after log_interval minibatches
random_seed (int) – Set a random seed for numpy/pytorch for reproducible results.
save_each_epoch (bool) – Save the network after each training epoch
out_dir (Path) – Path to the output directory used to save models during training
experiment_name (str) – Prefix of model files saved during training
config_file (str) – Override path to config file, e.g. for custom models in unit tests
verbose (int) – Increasing levels of messages
- Returns
results – A namedtuple containing:
the trained deep network model
training dataset
evaluation statistics
the ground truth labels (y_true)
the predicted labels (y_pred).
- Return type
namedtuple
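The shuffle buffer mentioned for the shuffle and iterable_dataset parameters can be sketched in plain Python. This is a generic illustration of the technique, not deepnog's implementation: the function name shuffle_buffer, its arguments, and the tiny buffer size are hypothetical. Sequences are drawn at random from a bounded buffer, so the full data set never has to be held in memory, at the price of only approximately random ordering.

```python
import random


def shuffle_buffer(items, buffer_size, seed=None):
    """Yield items in approximately random order using a bounded buffer.

    buffer_size must be at least 1. Only buffer_size items are held
    in memory at any time.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill phase: collect items until the buffer is full.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and store the new item in its slot.
        slot = rng.randrange(buffer_size)
        yield buffer[slot]
        buffer[slot] = item
    # Input exhausted: emit the remaining buffered items in random order.
    rng.shuffle(buffer)
    yield from buffer


# Hypothetical usage: stream ten sequence indices through a buffer of four.
shuffled = list(shuffle_buffer(range(10), buffer_size=4, seed=0))
```

Every item is emitted exactly once, but an item can only be emitted once it has entered the buffer, so early outputs come from early inputs. A larger buffer (the documented default is 2**16) brings the order closer to a full shuffle.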