Deepnog CLI Documentation

Invocation:

deepnog infer SEQUENCE_FILE [options] > assignments.csv

Basic Commands

These options may be commonly tuned for a basic invocation for orthologous group assignment.

positional arguments:
  SEQUENCE_FILE         File containing protein sequences for classification.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -db {eggNOG5, cog2020}, --database {eggNOG5, cog2020}
                        Orthologous group/family database to use. (default:
                        eggNOG5)
  -t {1,2,[]}, --tax {1,2}
                        Taxonomic level to use in specified database
                        (1 = root, 2 = bacteria) (default: 2)
  -o FILE, --out FILE   Store orthologous group assignments to output file.
                        Per default, write predictions to stdout. (default: None)
  -c FLOAT, --confidence-threshold FLOAT
                        The confidence value below which predictions are
                        masked by deepnog. By default, apply the confidence
                        threshold saved in the model if one exists, and else
                        do not apply a confidence threshold. (default: None)

Advanced Commands

These options are unlikely to require manual tuning for the average user.

--verbose INT       Define verbosity of DeepNOGs output written to stdout
                    or stderr. 0 only writes errors to stderr which cause
                    DeepNOG to abort and exit. 1 also writes warnings to
                    stderr if e.g. a protein without an ID was found and
                    skipped. 2 additionally writes general progress
                    messages to stdout.3 includes a dynamic progress bar
                    of the prediction stage using tqdm. (default: 3)
-ff STR, --fformat STR
                    File format of protein sequences. Must be supported by
                    Biopythons Bio.SeqIO class. (default: fasta)
-of {csv,tsv,legacy}  --outformat {csv,tsv,legacy}
                    The file format of the output file produced by
                    deepnog. (default: csv)
-d {auto,cpu,gpu}, --device {auto,cpu,gpu}
                    Define device for calculating protein sequence
                    classification. Auto chooses GPU if available,
                    otherwise CPU. (default: auto)
-nw INT, --num-workers INT
                    Number of subprocesses (workers) to use for data
                    loading. Set to a value <= 0 to use single-process
                    data loading. Note: Only use multi-process data
                    loading if you are calculating on a gpu (otherwise
                    inefficient)! (default: 0)
-a {deepnog}, --architecture {deepnog}
                    Network architecture to use for classification.
                    (default: deepnog)
-w FILE, --weights FILE
                    Custom weights file path (optional) (default: None)
-bs INT, --batch-size INT
                    The batch size determines how many sequences are
                    processed by the network at once. If 1, process the
                    protein sequences sequentially (recommended
                    on CPUs). Larger batch sizes speed up the inference and
                    training on GPUs. Batch size can influence the
                    learning process.
--test_labels TEST_LABELS_FILE
                    Measure model performance on a test set.
                    If provided, this file must contain the ground-truth
                    labels for the provided sequences.
                    Otherwise, only perform inference.