Deepnog CLI Documentation

Invocation:

deepnog SEQUENCE_FILE [options] > predictions.csv

Basic Commands

These options may be commonly tuned for a basic invocation for OG prediction.

positional arguments:
  SEQUENCE_FILE         File containing protein sequences for classification.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -o FILE, --out FILE   Store orthologous group predictions to outputfile. Per
                        default, write predictions to stdout. (default: None)
  -c FLOAT, --confidence-threshold FLOAT
                        The confidence value below which predictions are
                        masked by deepnog. By default, apply the confidence
                        threshold saved in the model if one exists, and else
                        do not apply a confidence threshold. (default: None)

Advanced Commands

These options are unlikely to require manual tuning for the average user.

--verbose INT       Define verbosity of DeepNOGs output written to stdout
                    or stderr. 0 only writes errors to stderr which cause
                    DeepNOG to abort and exit. 1 also writes warnings to
                    stderr if e.g. a protein without an ID was found and
                    skipped. 2 additionally writes general progress
                    messages to stdout.3 includes a dynamic progress bar
                    of the prediction stage using tqdm. (default: 3)
-ff STR, --fformat STR
                    File format of protein sequences. Must be supported by
                    Biopythons Bio.SeqIO class. (default: fasta)
-of {csv,tsv,legacy}  --outformat {csv,tsv,legacy}
                    The file format of the output file produced by
                    deepnog. (default: csv)
-d {auto,cpu,gpu}, --device {auto,cpu,gpu}
                    Define device for calculating protein sequence
                    classification. Auto chooses GPU if available,
                    otherwise CPU. (default: auto)
-db {eggNOG5}, --database {eggNOG5}
                    Orthologous group/family database to use. (default:
                    eggNOG5)
-t {1,2}, --tax {1,2}
                    Taxonomic level to use in specified database
                    (1 = root, 2 = bacteria) (default: 2)
-nw INT, --num-workers INT
                    Number of subprocesses (workers) to use for data
                    loading. Set to a value <= 0 to use single-process
                    data loading. Note: Only use multi-process data
                    loading if you are calculating on a gpu (otherwise
                    inefficient)! (default: 0)
-a {deepencoding}, --architecture {deepencoding}
                    Network architecture to use for classification.
                    (default: deepencoding)
-w FILE, --weights FILE
                    Custom weights file path (optional) (default: None)
-bs INT, --batch-size INT
                    Batch size used for prediction.Defines how many
                    sequences should be forwarded in the network at once.
                    With a batch size of one, the protein sequences are
                    sequentially classified by the network without
                    leveraging parallelism. Higher batch-sizes than the
                    default can speed up the prediction significantly if
                    on a gpu. On a cpu, however, they can be slower than
                    smaller ones due to the increased average sequence
                    length in the convolution step due to zero-padding
                    every sequence in each batch. (default: 1)