Deepnog New Models and Architectures

deepnog is developed with extensibility in mind and allows additional models to be plugged in (for different taxonomic levels or different orthology databases). It also supports adding new network architectures.

To register a new network architecture, we recommend an editable installation with pip, as described in Installation from Source.

Training scripts

Starting with v1.2.0, deepnog ships with functions for training custom models. Suppose we are training a DeepNOG model for eggNOG 5, level 1239 (Firmicutes):

deepnog train \
    -a "deepnog" \
    -o /path/to/output/ \
    -db "eggNOG5" \
    -t "1239" \
    --shuffle \
    train.faa.gz \
    val.faa.gz \
    train.csv.gz \
    val.csv.gz

Run deepnog train --help for additional options.

In order to assess the new model’s quality, run the following commands:

deepnog infer \
    -a "deepnog" \
    -w /path/to/output/MODEL_FILENAME.pth \
    -o /path/to/output/assignments.csv \
    --test_labels test.csv.gz \
    test.faa.gz
cat /path/to/output/assignments.performance.csv

This provides several performance measures, including accuracy and macro-averaged precision and recall.
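To make the term "macro-averaged" concrete, the following minimal sketch computes accuracy together with macro-averaged precision and recall from two label lists. It is a plain-Python illustration, not deepnog's own implementation; the function name `macro_metrics` and the example labels are hypothetical, and the set of classes is taken from the true labels only.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists.

    Hypothetical helper for illustration; not part of deepnog.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    true_pos = Counter()        # correct predictions per class
    n_pred = Counter(y_pred)    # how often each class was predicted
    n_true = Counter(y_true)    # how often each class truly occurs
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1

    accuracy = sum(true_pos.values()) / len(y_true)

    # Macro averaging: compute the score per class, then take the
    # unweighted mean, so rare classes count as much as frequent ones.
    classes = sorted(n_true)
    precision = sum(
        true_pos[c] / n_pred[c] if n_pred[c] else 0.0 for c in classes
    ) / len(classes)
    recall = sum(true_pos[c] / n_true[c] for c in classes) / len(classes)
    return accuracy, precision, recall


# Made-up orthologous group labels for four sequences
truth = ["COG0001", "COG0001", "COG0002", "COG0003"]
preds = ["COG0001", "COG0002", "COG0002", "COG0003"]
acc, prec, rec = macro_metrics(truth, preds)
```

Because each class contributes equally to the macro average, a model that performs well only on large orthologous groups will score noticeably lower on these measures than on plain accuracy.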

Register new models

New models that use an existing network architecture, whether for additional taxonomic levels in eggNOG 5 or for a different orthology database, must be placed in the deepnog data directory. Its location is set by the DEEPNOG_DATA environment variable (default: $HOME/deepnog_data).

The directory looks like this:

| deepnog_data
| ├── eggNOG5
| │   ├── 1
| │   │   └── deepnog.pth
| │   └── 2
| │       └── deepnog.pth
| ├── ...
|
|

To add a root-level model for “MyOrthologyDB”, we place the serialized PyTorch parameters like this:

| deepnog_data
| ├── eggNOG5
| │   ├── 1
| │   │   └── deepnog.pth
| │   └── 2
| │       └── deepnog.pth
| ├── MyOrthologyDB
| │   └── 1
| │       └── deepnog.pth
| ├── ...
|
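The placement step can also be scripted. The sketch below copies a trained weights file into the layout shown above, that is `<data directory>/<database>/<level>/<architecture>.pth`, and reads the data directory from DEEPNOG_DATA with the documented default as a fallback. The helper name `install_weights` is hypothetical and not part of deepnog.

```python
import os
import shutil
from pathlib import Path


def install_weights(weights_file, database, level, architecture="deepnog",
                    data_dir=None):
    """Copy a .pth file to <data_dir>/<database>/<level>/<architecture>.pth.

    Hypothetical helper for illustration; not part of deepnog.
    """
    if data_dir is None:
        # DEEPNOG_DATA if set, otherwise the documented default location
        data_dir = os.environ.get("DEEPNOG_DATA",
                                  Path.home() / "deepnog_data")
    target = Path(data_dir) / database / str(level) / f"{architecture}.pth"
    # Create the database and level directories if they do not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(weights_file, target)
    return target
```

For instance, `install_weights("/path/to/output/MODEL_FILENAME.pth", "MyOrthologyDB", 1)` would produce the `MyOrthologyDB/1/deepnog.pth` entry from the tree above.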

Register new network architectures

Create a Python module deepnog/models/<my_network.py>. You can use deepnog.py as a template. A new architecture MyNetworkA might look like this:

import torch.nn as nn


class MyNetworkA(nn.Module):
    """ A revolutionary network for orthology prediction. """

    def __init__(self, model_dict):
        super().__init__()
        # Required hyperparameters (a missing key raises KeyError)
        param1 = model_dict['param1']
        param2 = model_dict['param2']
        # Optional hyperparameter with a default value
        param3 = model_dict.get('param3', 0.)
        ...

    def forward(self, x):
        ...
        return x

When the new module is in place, also edit deepnog/config/deepnog_config.py to expose the new network to the user:

architecture:
  netA:
    module: my_network
    class: MyNetworkA
    param1: 'settingXYZ'
    param2:
      - 2
      - 4
      - 8
    param3: 150
    # ... all hyperparameters required for class init

  deepnog:
    module: deepnog
    class: DeepNOG
    encoding_dim: 10
    kernel_size:
      - 8
      - 12
      - 16
      - 20
      - 24
      - 28
      - 32
      - 36
    n_filters: 150
    dropout: 0.3
    pooling_layer_type: 'max'

The new network can now be used in deepnog by passing -a netA.
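An entry such as netA carries everything needed to build the network: the module name, the class name, and the hyperparameters. As a rough sketch of how such an entry could be resolved into an object, the code below imports the named module, looks up the class, and passes the remaining keys as `model_dict`. This is an assumption made for illustration, not deepnog's actual loading code; `build_from_entry` is a hypothetical name, and a standard-library class stands in for a real network so that the example runs without PyTorch.

```python
import importlib


def build_from_entry(entry, package=None):
    """Instantiate the class named in one architecture entry (a dict).

    Hypothetical helper for illustration; not part of deepnog.
    """
    hyperparams = dict(entry)  # copy, so the caller's dict stays untouched
    module_name = hyperparams.pop("module")
    class_name = hyperparams.pop("class")
    if package is not None:
        # e.g. package="deepnog.models" gives "deepnog.models.my_network"
        module_name = f"{package}.{module_name}"
    cls = getattr(importlib.import_module(module_name), class_name)
    # The remaining keys play the role of `model_dict` in __init__
    return cls(hyperparams)


# Stand-in entry: OrderedDict accepts a single dict argument, like MyNetworkA
entry = {"module": "collections", "class": "OrderedDict",
         "param1": "settingXYZ", "param2": [2, 4, 8], "param3": 150}
obj = build_from_entry(entry)
```

The sketch shows why every hyperparameter read in `__init__` needs a matching key in the configuration entry: a key that is missing there would surface as a `KeyError` when the class is instantiated.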

Assuming we want to compare deepnog to netA, we add the trained network parameters like this:

| deepnog_data
| ├── eggNOG5
| │   ├── 1
| │   │   ├── deepnog.pth
| │   │   └── netA.pth
| │   └── 2
| │       ├── deepnog.pth
| │       └── netA.pth
| ├── MyOrthologyDB
| │   └── 1
| │       ├── deepnog.pth
| │       └── netA.pth
| ├── ...
|

Finally, expose the new models to the user by modifying deepnog/config/deepnog_config.py again. The relevant section is database.

database:
  eggNOG5:
    # taxonomic levels
    - 1
    - 2
    - 1236
    - 1239        # Example 1: Add (or uncomment) this line if you created a Firmicutes model
  MyOrthologyDB:  # Example 2: Add (or uncomment) this line and the following if you
    - 1           #            created a model for the '1' level of MyOrthologyDB.

Notes:

Currently, a level must be provided even if the database does not use levels. In that case, simply use a placeholder such as 1.
Indentation matters in this configuration: each level must be nested under its database, as shown above.
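Since a model is only usable when both its configuration entry and its weights file exist, a quick consistency check can catch a missing file or a mistyped level. The sketch below takes the database section as a plain dictionary (as it might look after parsing the configuration) and lists the expected weight files that are absent. The function name `missing_weights` is hypothetical, and the file layout follows the directory trees shown earlier.

```python
from pathlib import Path


def missing_weights(database_cfg, architectures, data_dir):
    """List expected weight files that do not exist under data_dir.

    Hypothetical helper; expects {database: [levels]} and architecture names.
    """
    missing = []
    for database, levels in database_cfg.items():
        for level in levels:
            for arch in architectures:
                path = Path(data_dir) / database / str(level) / f"{arch}.pth"
                if not path.is_file():
                    missing.append(path)
    return missing


if __name__ == "__main__":
    # Mirrors the example configuration above
    cfg = {"eggNOG5": [1, 2, 1236, 1239], "MyOrthologyDB": [1]}
    data_dir = Path.home() / "deepnog_data"
    for path in missing_weights(cfg, ["deepnog", "netA"], data_dir):
        print("missing:", path)
```

An empty result means that every database, level, and architecture combination named in the configuration has a matching `.pth` file.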