Ensemble Prediction

predict_ensemble predicts different protein conformations from an input MSA by running AlphaFold2, through the ColabFold implementation, with different MSA subsampling parameters. The arguments and how to use them effectively are described below.
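As a rough illustration of such a sweep, the sketch below expands a --seq_pairs string and a number of predictions into individual runs. It assumes, hypothetically, that every [max_seq, extra_seq] pair is run once per seed and that seeds are numbered 0 to n-1. The helper name expand_runs is invented for this example and is not part of predict_ensemble.

```python
import ast
from itertools import product


def expand_runs(seq_pairs: str, n_seeds: int) -> list[tuple[int, int, int]]:
    """Hypothetical helper: turn '[[max_seq, extra_seq], ...]' plus a seed
    count into (max_seq, extra_seq, seed) tuples, one per prediction.

    Assumption: each subsampling pair is combined with every seed.
    """
    # literal_eval parses the nested list text without executing any code
    pairs = ast.literal_eval(seq_pairs)
    runs = []
    # product() pairs every subsampling setting with every seed index;
    # unpacking raises ValueError if a pair does not have exactly two values
    for (max_seq, extra_seq), seed in product(pairs, range(n_seeds)):
        runs.append((int(max_seq), int(extra_seq), seed))
    return runs


runs = expand_runs("[[16, 32], [32, 64]]", n_seeds=3)
print(len(runs))
print(runs[0])
```

Under this assumption the total number of predictions is the number of pairs times the number of seeds, which matters when choosing between cpu and gpu.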

Command-Line Arguments

  • --config_file (str):

    Path to the configuration file. If not provided, the script will default to config.json in the current directory.

  • --jobname (str):

    The name of the job. This is used to organize output directories and files.

  • --msa_path (str):

    Path to the .a3m MSA file. If not provided, the script will automatically generate it based on the output_path and jobname.

  • --output_path (str):

    Directory path where the prediction results will be saved.

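  • --seq_pairs (str):

    A list of [max_seq, extra_seq] pairs in the format [[max_seq1, extra_seq1], [max_seq2, extra_seq2], …]. Each pair defines one MSA subsampling setting: max_seq is the maximum number of sequences used as MSA cluster centers, and extra_seq is the maximum number of additional sequences passed to AlphaFold2's extra-MSA stack.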

  • --seeds (int, nargs='+'):

    Specifies the number of predictions to run. The default is 10.

  • --save_all (bool):

    Saves all of the prediction outputs as pickled files.

  • --platform (str):

    The platform to run the predictions on, either cpu or gpu. The default is cpu.

  • --subset_msa_to (int):

    Subset the input MSA to the specified number of sequences.

  • --msa_from (str):

    The MSA building tool used to generate the input MSA, either jackhmmer or mmseqs2.
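An .a3m file stores each sequence as a header line starting with > followed by its aligned sequence, with the query first. The sketch below shows one plausible way to subset such a file to N sequences, as --subset_msa_to describes. It assumes the first N records are kept in file order (query included) and that leading # lines are preserved; the actual selection strategy used by predict_ensemble may differ, and subset_a3m is a hypothetical name.

```python
def subset_a3m(text: str, n: int) -> str:
    """Hypothetical sketch: keep the first n records of an .a3m string.

    Leading lines starting with '#' are kept unchanged; every other record
    is a '>' header plus one or more sequence lines.
    """
    if n < 1:
        raise ValueError("n must be at least 1 to keep the query sequence")
    preamble, records = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#") and not records:
            preamble.append(line)  # header line(s) before the first record
        elif line.startswith(">"):
            records.append([line])  # a '>' line starts a new record
        elif records:
            records[-1].append(line)  # sequence line(s) of the current record
    kept = records[:n]  # assumption: keep records in file order
    lines = preamble + [line for record in kept for line in record]
    return "\n".join(lines) + "\n"


msa = ">query\nMKV\n>hit1\nMRV\n>hit2\nM-V\n"
print(subset_a3m(msa, 2))
```

Keeping records in file order guarantees the query survives the subset; a random or diversity-based selection would need to treat the query separately.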

Usage Examples

Example 1: Using a Configuration File

If you have a configuration file named config.json, you can run the script as follows:

predict_ensemble --config_file config.json
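The same run can also be configured with individual flags instead of a configuration file. The sketch below assembles such a command from Python using only the arguments documented above; the job name, paths, and values are placeholders, build_command is a hypothetical helper, and it assumes --seq_pairs accepts the list text shown in its description.

```python
import json
import shlex


def build_command(jobname: str, msa_path: str, output_path: str,
                  seq_pairs: list[list[int]], seeds: int = 10,
                  platform: str = "cpu") -> list[str]:
    """Hypothetical helper: assemble a predict_ensemble call from the documented flags."""
    if platform not in ("cpu", "gpu"):
        raise ValueError("platform must be 'cpu' or 'gpu'")
    return [
        "predict_ensemble",
        "--jobname", jobname,
        "--msa_path", msa_path,
        "--output_path", output_path,
        # json.dumps yields text such as "[[16, 32], [32, 64]]"
        "--seq_pairs", json.dumps(seq_pairs),
        "--seeds", str(seeds),
        "--platform", platform,
    ]


cmd = build_command("my_protein", "msas/my_protein.a3m", "results",
                    [[16, 32], [32, 64]], seeds=5, platform="gpu")
# shlex.join quotes the --seq_pairs value so it can be pasted into a shell
print(shlex.join(cmd))
# To launch the run (requires predict_ensemble on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with the brackets and spaces in the --seq_pairs value.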