MSA Generation Tool Documentation

This documentation describes two Python scripts that automate the generation of multiple sequence alignments (MSAs) with either jackhmmer or mmseqs2.

JackHMMeR Multiple Sequence Alignment

Overview

This script generates a multiple sequence alignment (MSA) using jackhmmer. It takes a target sequence in FASTA format and writes the resulting MSA in a format compatible with tools such as colabfold_batch.

Key Features:
- Configurable via command-line arguments or a JSON configuration file.
- Supports the use of a RAM disk for improved performance.
- Saves the resulting MSA in a specified output directory.

Usage

To run the script, use the following command:

jackhmmer_msa --config_file <path_to_config> --sequence_path <path_to_fasta> --output_path <output_dir> [optional arguments]

Command-Line Arguments:
- --config_file: Path to the JSON configuration file.
- --sequence_path: Path to the FASTA file containing the target sequence.
- --output_path: Path to save the results.
- --jobname: The job name (optional).
- --homooligomers: Number of copies of the protein (optional).
- --use_ramdisk: Whether to use a RAM disk for the process (optional, requires root access).

Example

jackhmmer_msa --config_file config.json --sequence_path input.fasta --output_path ./results
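
For scripted runs, the same command line can be assembled from Python. The sketch below only builds the argument list from the flags documented above. The helper name build_msa_command is hypothetical and not part of either tool, and the sketch assumes the jackhmmer_msa executable is available on PATH.

```python
import shlex
from typing import List, Optional


def build_msa_command(
    executable: str,
    config_file: str,
    sequence_path: str,
    output_path: str,
    jobname: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for jackhmmer_msa or mmseqs2_msa.

    Hypothetical helper: it uses only flags listed in this documentation.
    """
    cmd = [
        executable,
        "--config_file", config_file,
        "--sequence_path", sequence_path,
        "--output_path", output_path,
    ]
    # --jobname is optional, so it is appended only when provided.
    if jobname is not None:
        cmd += ["--jobname", jobname]
    return cmd


cmd = build_msa_command("jackhmmer_msa", "config.json", "input.fasta", "./results")
print(shlex.join(cmd))  # shell-safe rendering of the command for logging
```

Passing the list to subprocess.run(cmd, check=True) would execute it; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.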

mmseqs2 Multiple Sequence Alignment

Overview

This script generates a multiple sequence alignment (MSA) using mmseqs2. Like the jackhmmer script, it takes a target sequence in FASTA format and writes the resulting MSA to the output directory.

Key Features:
- Configurable via command-line arguments or a JSON configuration file.
- Saves the resulting MSA and associated files in the specified output directory.

Usage

To run the script, use the following command:

mmseqs2_msa --config_file <path_to_config> --sequence_path <path_to_fasta> --output_path <output_dir> [optional arguments]

Command-Line Arguments:
- --config_file: Path to the JSON configuration file.
- --sequence_path: Path to the FASTA file containing the target sequence.
- --output_path: Path to save the results.
- --jobname: The job name (optional).

Example

mmseqs2_msa --config_file config.json --sequence_path input.fasta --output_path ./results

Configuration Files

Both scripts allow configuration through a JSON file. Below is an example configuration file:

{
    "sequence_path": "input.fasta",
    "output_path": "./results",
    "jobname": "prediction_run",
    "homooligomers": 1,
    "use_ramdisk": false,
    "tmp_dir": "./tmp"
}
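
This documentation does not state how values from the JSON file interact with values given on the command line. The sketch below illustrates one common pattern, under the assumption that command-line values override the JSON file; the actual precedence in these scripts may differ. The function merge_settings is hypothetical.

```python
import json
from typing import Any, Dict


def merge_settings(config_text: str, cli_args: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay command-line values on top of JSON configuration values.

    Assumption: a CLI value of None means "not given on the command line",
    so the value from the configuration file is kept.
    """
    settings = json.loads(config_text)
    for key, value in cli_args.items():
        if value is not None:
            settings[key] = value
    return settings


# Same content as the example configuration file above.
config_text = """
{
    "sequence_path": "input.fasta",
    "output_path": "./results",
    "jobname": "prediction_run",
    "homooligomers": 1,
    "use_ramdisk": false,
    "tmp_dir": "./tmp"
}
"""

merged = merge_settings(
    config_text,
    {"output_path": "./other_results", "jobname": None},
)
print(merged["output_path"], merged["jobname"])
```

Here output_path comes from the command line while jobname falls back to the value in the file.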

Output

Both scripts save the resulting MSA in the specified output directory. The MSA file can be used for downstream analyses, such as structure prediction or further alignment refinement.
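
As a quick sanity check before downstream use, the sequences in the MSA can be counted. The sketch below assumes the MSA is a FASTA-style text file (such as A3M) in which each record starts with a ">" header line; this documentation does not name the exact format or file name, so both are assumptions. The function read_fasta_like is hypothetical.

```python
from typing import Iterable, List, Tuple


def read_fasta_like(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA-style lines into (header, sequence) pairs.

    Assumption: lines starting with '#' are metadata and are skipped.
    """
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up two-sequence alignment, for illustration only.
example = [">target", "MKTAYIAK", ">hit_1", "MKTA-IAK"]
records = read_fasta_like(example)
print(len(records), "sequences; first header:", records[0][0])
```

To apply it to a real result, pass an open file handle instead of the example list, for instance read_fasta_like(open(path)) with path pointing at the MSA file in the output directory.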