Command Line Options

The command line options for running HBDesigner are described in detail below. For general advice on how to use these options to maximize HBDesigner's success rate, see the Usage Advice section at the end of this page.

Required and optional arguments for running HBDesigner. For help on individual arguments, run with the --help flag or refer to the documentation.

usage: run_hbdesigner [-h] --pdb PDB [--design_model {design_002,design_020}]
                      [--cpu] [--out_dir OUT_DIR] [--n_workers N_WORKERS]
                      [--n_samples N_SAMPLES] [--top_k TOP_K]
                      [--n_res {2,3,4,5,6}] [--T_range T_RANGE T_RANGE]
                      [--min_burial MIN_BURIAL] [--guide_res GUIDE_RES]
                      [--guide_radius GUIDE_RADIUS] [--guide_seq GUIDE_SEQ]
                      [--max_BUNs MAX_BUNS] [--max_BUPHs MAX_BUPHS]
                      [--min_sat MIN_SAT] [--max_hb_energy MAX_HB_ENERGY]
                      [--symm_chains SYMM_CHAINS] [--symm_file SYMM_FILE]
                      [--sel_chains SEL_CHAINS] [--min_core_res MIN_CORE_RES]
                      [--anchor_res ANCHOR_RES] [--max_hb_score MAX_HB_SCORE]
                      [--omit_chains OMIT_CHAINS] [--omit_AA OMIT_AA]
                      [--seed SEED] [--design_model_ckpt DESIGN_MODEL_CKPT]
                      [--packing_model_ckpt PACKING_MODEL_CKPT]

Named Arguments

--pdb

Relative file path to and name of the PDB file to process. Example: --pdb /path/to/1ABC.pdb

--design_model

Possible choices: design_002, design_020

Design model to use. Default is ‘design_020’ (moderate noise), but ‘design_002’ (low noise) is also available.

Default: 'design_020'

--cpu

Run inference only on CPU by loading CPU-specific YAML configs. This is not recommended, as it will be much slower than running on GPU, but is available for users without access to a CUDA-enabled GPU. By default, the script will attempt to run on GPU if available.

Default: False

--out_dir

Relative path to output directory for saving new files. Defaults to current working directory. If the given directory does not exist, it will be created. Output files include designed HBNet PDB files and a summary CSV file with design metrics for each network.

Default: current working directory

--n_workers

Number of workers for parallelization during packing. Default is 1. More workers will speed up predictions; typical values are between 8 and 24. The number available depends on the number of CPU nodes you have allocated.

Default: 1

--n_samples

Number of unique samples to generate before packing/scoring. Default is 100. Typical values are 100-500, but higher values can be used if initial runs fail to form networks. More samples increase diversity but also increase inference time.

Default: 100

--top_k

Number of unique samples to keep after ranking. Default is 5; typical values are 5-25.

Default: 5

--n_res

Possible choices: 2, 3, 4, 5, 6

Size of the desired HBNet, i.e. the number of residues you want in your completed network(s). Default is 2. Valid range is 2-6 (inclusive).

Default: 2

--T_range

Temperature range for sampling. Default is [0.1, 1.0]. Lower temperatures will yield more conservative designs, while higher temperatures will yield more diverse designs.

Default: [0.1, 1.0]

--min_burial

Minimum burial for designable positions, as calculated by Rosetta’s sidechain neighbor algorithm. Defaults to 0.0. See https://docs.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/ResidueSelectors/ResidueSelectors#residueselectors_conformation-dependent-residue-selectors_layerselector or https://docs.rosettacommons.org/docs/latest/rosetta_basics/scoring/BuriedUnsatPenalty#algorithm for more information.

Default: 0.0

--guide_res

Guide residues. Off by default. The model will calculate the Cb-centroid of these residues and place the guide atom near the centroid. Uses PDB chain/resnum format. Example: ‘A12,B13,B49’
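As a rough illustration of the chain/resnum format and of what a Cb-centroid is, the sketch below parses a guide residue string and averages some made-up Cb coordinates. The helper names `parse_res_list` and `centroid` and the coordinates are hypothetical and are not part of HBDesigner; how the tool actually parses the string or places the guide atom is not documented here.

```python
import re

def parse_res_list(spec):
    """Split 'A12,B13,B49' into [('A', 12), ('B', 13), ('B', 49)].

    Assumes a single-letter chain ID followed by an integer residue number;
    PDB insertion codes are not handled in this sketch.
    """
    residues = []
    for token in spec.split(","):
        match = re.fullmatch(r"([A-Za-z])(-?\d+)", token.strip())
        if match is None:
            raise ValueError(f"not in chain/resnum format: {token!r}")
        residues.append((match.group(1), int(match.group(2))))
    return residues

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Made-up Cb coordinates (Angstrom), keyed by (chain, resnum), for illustration only.
cb_coords = {
    ("A", 12): (0.0, 0.0, 0.0),
    ("B", 13): (6.0, 0.0, 0.0),
    ("B", 49): (0.0, 3.0, 9.0),
}

guide_res = parse_res_list("A12,B13,B49")
guide_point = centroid([cb_coords[r] for r in guide_res])
print(guide_res)
print(guide_point)
```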

--guide_radius

Hard distance constraint on designable positions based on Cb distance (Angstrom) from guide atom. Off by default. Typically not necessary, but can be used to force location of designed network in difficult cases.

Default: 1000000.0
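The effect of a hard Cb-distance cutoff can be sketched as a simple filter. This is only an illustration under assumptions: the function name `within_radius`, the coordinates, and the inclusive comparison (`<=`) are hypothetical and may not match HBDesigner's internals. It also shows why the very large default value behaves as "off".

```python
import math

def within_radius(cb_coords, guide_point, radius):
    """Return residue keys whose Cb atom lies within `radius` Angstrom of guide_point."""
    return [
        key for key, xyz in cb_coords.items()
        if math.dist(xyz, guide_point) <= radius
    ]

# Made-up Cb coordinates (Angstrom), keyed by (chain, resnum), for illustration only.
cb_coords = {
    ("A", 12): (1.0, 0.0, 0.0),   # 1 Angstrom from the guide point
    ("A", 40): (0.0, 8.0, 0.0),   # 8 Angstrom from the guide point
    ("B", 13): (3.0, 4.0, 0.0),   # 5 Angstrom from the guide point
}
guide_point = (0.0, 0.0, 0.0)

near = within_radius(cb_coords, guide_point, 6.0)
everything = within_radius(cb_coords, guide_point, 1000000.0)
print(near)
print(everything)
```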

--guide_seq

Guide sequence. Default is ‘X,X,X’ (3 unknowns). Options include ‘S,T’ (one SER, one THR), ‘T,X’ (one THR, one UNK), ‘S|T,N’ (one SER or THR, one ASN) etc.
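One way to read the guide sequence syntax is as a comma-separated list of positions, each holding one or more `|`-separated alternatives. The sketch below makes that reading concrete; `parse_guide_seq` is a hypothetical helper, and HBDesigner's own parser may treat edge cases differently.

```python
def parse_guide_seq(spec):
    """Turn 'S|T,N' into [{'S', 'T'}, {'N'}].

    'X' (unknown) is kept as-is, so 'T,X' becomes [{'T'}, {'X'}].
    """
    positions = []
    for slot in spec.split(","):
        options = {aa.strip().upper() for aa in slot.split("|")}
        if "" in options:
            raise ValueError(f"empty residue in guide sequence: {spec!r}")
        positions.append(options)
    return positions

print(parse_guide_seq("S|T,N"))
print(parse_guide_seq("X,X,X"))
```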

--max_BUNs

Maximum buried unsatisfied hydrogen atoms (BUNs, buried but not participating in a hydrogen bond) to allow in the network. Default is 0. Raising this will make filtering more permissive.

Default: 0

--max_BUPHs

Maximum buried unsatisfied polar hydrogen atoms (BUPHs, buried but not participating in a hydrogen bond) to allow in the network. Default is 5. Raising this will make filtering more permissive.

Default: 5

--min_sat

Minimum saturation score to allow in the network. Default is 0.5. Raising this will make filtering more strict.

Default: 0.5

--max_hb_energy

Maximum hydrogen bond energy to allow in the network. Default is 0.0. Raising this will make filtering more permissive.

Default: 0.0

--symm_chains

Option to symmetrize output networks after design for convenience. Specify symmetric chains as ‘A,B;C,D’ to symmetrize A with B and C with D. If you wanted to symmetrize all four together, the input would be ‘A,B,C,D’. By default, no symmetrization will be performed.
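The grouping syntax can be read as semicolon-separated groups of comma-separated chain IDs. The following sketch, using a hypothetical `parse_symm_chains` helper that is not part of HBDesigner, shows how the two examples above would split under that assumption.

```python
def parse_symm_chains(spec):
    """Split 'A,B;C,D' into [['A', 'B'], ['C', 'D']].

    Each semicolon-separated group lists chains to be symmetrized together.
    """
    groups = []
    for group in spec.split(";"):
        chains = [c.strip() for c in group.split(",") if c.strip()]
        if len(chains) < 2:
            raise ValueError(f"a symmetry group needs at least two chains: {group!r}")
        groups.append(chains)
    return groups

print(parse_symm_chains("A,B;C,D"))
print(parse_symm_chains("A,B,C,D"))
```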

--symm_file

Symmetry file (.symm) to use for symmetrization. Only used for strict symmetry. You can use Rosetta’s make_symmdef_file.pl script to generate a .symm file from your PDB.

--sel_chains

Option to select specific chain(s) to run HBDesigner on. Format: ‘A,C’. Off by default (will use all chains). This option removes the unselected chains from the input file before it is passed to HBDesigner. HBDesigner then grafts the removed chains back in, so the output still contains every chain from the input.

--min_core_res

Minimum number of core residues required for each network. Defaults to 0.

Default: 0

--anchor_res

Comma-separated list of residues to use as anchor residues during design, in PDB chain/resnum format. Example: ‘A12,B13,B49’. All networks will contain the anchor residue(s).

--max_hb_score

Maximum energy score to allow for returned networks. More negative values are more strict. Default is 0.0.

Default: 0.0

--omit_chains

Comma-separated list of chains to omit from design. Residues in these chains will not be eligible for use in designed networks. Example: A,C. This option allows HBDesigner to see all of the chains (even the omitted ones) but prevents HBDesigner from using residues in the omitted chains to form the hydrogen bonding network. This is different from the --sel_chains option, which removes the omitted chains from the input file entirely (i.e. HBDesigner will not even see the omitted chains).

--omit_AA

Comma-separated list of amino acids to omit from design. Residues of these types will not be eligible for use in designed networks. Example: R,K means no ARG or LYS allowed.

--seed

Random seed for reproducible sampling. Default is no seed. Note that due to the use of parallelization and Rosetta scoring, results may not be exactly reproducible even with a set seed. Running with parallelization turned off (‘n_workers’ set to 0) will increase the likelihood of similar results between runs, but small energy changes will still be seen from Rosetta.

--design_model_ckpt

Path to and file name of custom design model checkpoint. If not specified, will use default checkpoint for the specified design model (e.g. ‘/path/to/model_weights/design_020’).

--packing_model_ckpt

Path to and file name of custom packing model checkpoint. If not specified, will use default checkpoint.

Usage Advice

At larger --n_res, packing is harder, so you will get fewer good designs per --n_samples. This means you might want to increase --n_samples when increasing --n_res. Here is a good place to start:

--n_res=2, --n_samples=100
--n_res=3, --n_samples=200
--n_res=4, --n_samples=500
--n_res=5, --n_samples=500
--n_res=6, --n_samples=1000
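The starting points above can be captured in a small lookup when scripting runs. This is a minimal sketch: `SUGGESTED_N_SAMPLES` and `build_command` are hypothetical names, the file and directory names are placeholders, and only flags documented on this page are used. The command is printed rather than executed.

```python
import shlex

# Suggested --n_samples for each --n_res, copied from the list above.
SUGGESTED_N_SAMPLES = {2: 100, 3: 200, 4: 500, 5: 500, 6: 1000}

def build_command(pdb, n_res, out_dir=None):
    """Assemble a run_hbdesigner argument list with the suggested sample count."""
    if n_res not in SUGGESTED_N_SAMPLES:
        raise ValueError("--n_res must be between 2 and 6 (inclusive)")
    argv = [
        "run_hbdesigner",
        "--pdb", pdb,
        "--n_res", str(n_res),
        "--n_samples", str(SUGGESTED_N_SAMPLES[n_res]),
    ]
    if out_dir is not None:
        argv += ["--out_dir", out_dir]
    return argv

argv = build_command("1ABC.pdb", n_res=4, out_dir="hbnets")
print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run(argv, check=True)` once HBDesigner is installed; treat the sample counts as starting points and raise them if initial runs fail to form networks.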

Smaller amino acids, especially SER and THR, have notably higher success rates. This means that, if you don’t care which amino acids are in your network, you can get higher success rates by fixing one position to a small residue, e.g. --guide_seq S,X,X or --guide_seq T,X,X.