Quick Start Example
The following example shows how to predict protein orthologous groups
with the command line interface of deepnog
as well as with the Python API.
Please make sure you have installed deepnog
(see the installation instructions).
CLI Usage Example
Using deepnog from the command line is the simplest and preferred way of interacting with the deepnog package.
Here, we assign orthologous groups (OGs) of proteins using a model trained on the eggNOG 5.0 database and using only bacterial OGs (default settings), and redirect the output from stdout to a file:
deepnog infer input.fa > assignments.csv
Alternatively, the output file and other settings can be specified explicitly like so:
deepnog infer input.fa --out prediction.csv -db eggNOG5 --tax 2
For a detailed explanation of flags and further settings, please consult the User Guide.
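When deepnog is driven from a Python script rather than a shell, the same invocation can be assembled as an argument list and passed to subprocess.run. The following is a minimal sketch, not part of deepnog itself: the helper names build_infer_command and run_infer are hypothetical, only the flags shown above (--out, -db, --tax) are used, and the deepnog executable is assumed to be on the PATH.

```python
import subprocess
from typing import List, Optional


def build_infer_command(
    fasta: str,
    out: str,
    database: Optional[str] = None,
    tax: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for `deepnog infer` (hypothetical helper)."""
    cmd = ["deepnog", "infer", fasta, "--out", out]
    if database is not None:
        cmd += ["-db", database]
    if tax is not None:
        cmd += ["--tax", str(tax)]
    return cmd


def run_infer(fasta: str, out: str, database: Optional[str] = None,
              tax: Optional[int] = None) -> subprocess.CompletedProcess:
    """Run deepnog; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_infer_command(fasta, out, database, tax), check=True)


# Only print the command here; call run_infer(...) to actually execute it.
print(" ".join(build_infer_command("input.fa", "prediction.csv", "eggNOG5", 2)))
```

Passing the arguments as a list avoids shell quoting issues with unusual file names.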
Note that deepnog masks predictions below a certain confidence threshold.
The default confidence threshold of 0.99, which is baked into the model,
can be overridden from the command line interface:
deepnog infer input.fa --confidence-threshold 0.8 > assignments.csv
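To make the effect of the threshold concrete, the sketch below illustrates masking in plain Python. It is not deepnog's internal code: the function name mask_prediction is hypothetical, the label and confidence value in the example are invented, and the strict "less than" comparison is an assumption based on the wording "below" the threshold.

```python
from typing import Optional, Tuple


def mask_prediction(
    label: str,
    confidence: float,
    threshold: float = 0.99,
) -> Tuple[Optional[str], Optional[float]]:
    """Return (label, confidence), or (None, None) if confidence < threshold.

    Illustration only; the strict comparison is an assumption.
    """
    if confidence < threshold:
        return None, None
    return label, confidence


# Invented example values: the same prediction under two thresholds.
strict = mask_prediction("COG0001", 0.85)                  # default threshold 0.99
relaxed = mask_prediction("COG0001", 0.85, threshold=0.8)  # overridden threshold
print(strict, relaxed)
```

Lowering the threshold keeps more assignments at the cost of admitting less certain ones.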
The output comma-separated values (CSV) file assignments.csv then looks something like:
sequence_id,prediction,confidence
WP_004995615.1,COG5449,0.99999964
WP_004995619.1,COG0340,1.0
WP_004995637.1,COG4285,1.0
WP_004995655.1,COG4118,1.0
WP_004995678.1,COG0184,1.0
WP_004995684.1,COG1137,1.0
WP_004995690.1,COG0208,1.0
WP_004995697.1,,
WP_004995703.1,COG0190,1.0
The file contains a single line for each protein in the input sequence file, and the following fields:
sequence_id, the name of the input protein sequence.
prediction, the name of the predicted protein OG. Empty if masked by confidence threshold.
confidence, the confidence value (0-1 inclusive) that deepnog ascribes to this assignment. Empty if masked by confidence threshold.
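Because masked rows have empty prediction and confidence fields, downstream code has to treat them separately from assigned rows. The following is a minimal sketch using only the Python standard library; the function name read_assignments is hypothetical, and the sample is a subset of the rows shown above.

```python
import csv
import io

# A subset of the example output shown above, including one masked row.
SAMPLE = """sequence_id,prediction,confidence
WP_004995615.1,COG5449,0.99999964
WP_004995690.1,COG0208,1.0
WP_004995697.1,,
WP_004995703.1,COG0190,1.0
"""


def read_assignments(handle):
    """Split rows into assigned {id: (OG, confidence)} and a list of masked ids."""
    assigned, masked = {}, []
    for row in csv.DictReader(handle):
        if row["prediction"]:
            assigned[row["sequence_id"]] = (row["prediction"], float(row["confidence"]))
        else:
            # Empty fields mean the prediction fell below the confidence threshold.
            masked.append(row["sequence_id"])
    return assigned, masked


assigned, masked = read_assignments(io.StringIO(SAMPLE))
print(len(assigned), "assigned;", "masked:", masked)
# prints: 3 assigned; masked: ['WP_004995697.1']
```

For a real output file, pass open('assignments.csv', newline='') instead of the in-memory sample.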
API Example Usage
import torch
from deepnog.data import ProteinIterableDataset
from deepnog.inference import predict
from deepnog.utils import create_df, get_config, get_weights_path, load_nn, set_device
PROTEIN_FILE = '/path/to/file.faa'
DATABASE = 'eggNOG5'
TAX = 2
ARCH = 'deepnog'
CONF_THRESH = 0.99
# load protein sequence file into a ProteinIterableDataset
dataset = ProteinIterableDataset(PROTEIN_FILE, f_format='fasta')
# Construct the path to the saved parameters of the deepnog model
weights_path = get_weights_path(
    database=DATABASE,
    level=str(TAX),
    architecture=ARCH,
)
# Set up device for prediction
device = set_device('auto')
torch.set_num_threads(1)
# Load neural network parameters
model_dict = torch.load(weights_path, map_location=device)
# Lookup where to find the chosen network
config = get_config()
module = config['architecture'][ARCH]['module']
cls = config['architecture'][ARCH]['class']
# Load neural network model and class names
model = load_nn((module, cls), model_dict, device)
class_labels = model_dict['classes']
# perform prediction
preds, confs, ids, indices = predict(
    model=model,
    dataset=dataset,
    device=device,
    batch_size=1,
    num_workers=1,
    verbose=3,
)
# Construct results (a pandas DataFrame)
df = create_df(
    class_labels=class_labels,
    preds=preds,
    confs=confs,
    ids=ids,
    indices=indices,
    threshold=CONF_THRESH,  # mask predictions below the confidence threshold defined above
    verbose=3,
)