fast_conformation.ensemble_analysis package
Submodules
fast_conformation.ensemble_analysis.analysis_utils module
- fast_conformation.ensemble_analysis.analysis_utils.auto_select_2d_references(references_dataset_path, analysis_type)
Automatically select two 2D references based on the most distant mode representatives.
Parameters: references_dataset_path (str): Path to the CSV file containing reference data. analysis_type (str): The type of analysis (e.g., ‘tmscore’, ‘rmsd’) to use for mode selection.
Returns: tuple: Paths to the two selected reference structures.
- fast_conformation.ensemble_analysis.analysis_utils.create_directory(path)
Create a directory at the specified path. If the directory already exists, it is removed and recreated.
Parameters: path (str): The path to the directory.
Raises: OSError: If the directory cannot be created.
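The remove-and-recreate behaviour described above can be sketched with the standard library. This is a minimal illustration, not the package source; the helper name `recreate_directory` is hypothetical.

```python
import os
import shutil


def recreate_directory(path):
    """Remove `path` if it already exists, then create it empty."""
    if os.path.isdir(path):
        # Deletes the directory together with everything inside it.
        shutil.rmtree(path)
    # Raises OSError if the directory cannot be created.
    os.makedirs(path)
```

Because an existing directory is deleted first, pointing such a helper at a folder that holds earlier results discards them.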
- fast_conformation.ensemble_analysis.analysis_utils.load_config(config_file)
Load a JSON configuration file.
Parameters: config_file (str): The path to the configuration file.
Returns: dict: The configuration as a dictionary.
Raises: FileNotFoundError: If the file does not exist. json.JSONDecodeError: If the file does not contain valid JSON.
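A minimal sketch of JSON loading that matches the documented return type and exceptions. The helper name `read_json_config` is hypothetical, and the real function may do more than this.

```python
import json


def read_json_config(config_file):
    """Parse a JSON file and return its contents as a dictionary."""
    # open() raises FileNotFoundError when the path does not exist.
    with open(config_file, "r", encoding="utf-8") as handle:
        # json.load() raises json.JSONDecodeError on malformed JSON.
        return json.load(handle)
```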
- fast_conformation.ensemble_analysis.analysis_utils.load_frames(file_list)
Load a list of molecular dynamics files as MDAnalysis Universes.
Parameters: file_list (list): A list of file paths to load.
Returns: list: A list of MDAnalysis Universes.
- fast_conformation.ensemble_analysis.analysis_utils.load_pdb_files_as_universe(folder_path, reindex)
Load all PDB files in the specified folder as a Universe, using the first PDB file as the topology.
Parameters: folder_path (str): Path to the folder containing PDB files. reindex (int or None): If provided, reindex the PDB files so the first residue matches this index.
Returns: MDAnalysis.Universe: The loaded Universe.
Raises: FileNotFoundError: If no PDB files are found in the folder.
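The file-discovery step implied above might look like the following sketch. The helper name `list_pdb_files` is hypothetical, and both the `*.pdb` pattern and the alphabetical sort order are assumptions; the documentation only states that the first PDB file serves as the topology.

```python
import glob
import os


def list_pdb_files(folder_path):
    """Return the PDB files in `folder_path`, sorted by name."""
    pdb_files = sorted(glob.glob(os.path.join(folder_path, "*.pdb")))
    if not pdb_files:
        raise FileNotFoundError(f"No PDB files found in {folder_path}")
    # Under this sketch, pdb_files[0] would be the topology file.
    return pdb_files
```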
- fast_conformation.ensemble_analysis.analysis_utils.load_predictions(predictions_path, seq_pairs, jobname, starting_residue)
Load predictions from a set of PDB files as MDAnalysis Universes.
Parameters: predictions_path (str): Path to the directory containing the predictions. seq_pairs (list): A list of sequence pairs for the predictions. jobname (str): The job name associated with the predictions. starting_residue (int or None): The starting residue index for reindexing.
Returns: dict: A dictionary of Universes and associated metadata.
- fast_conformation.ensemble_analysis.analysis_utils.load_predictions_json(predictions_path, seq_pairs, jobname)
Load pLDDT scores from JSON files associated with predictions.
Parameters: predictions_path (str): Path to the directory containing the predictions. seq_pairs (list): A list of sequence pairs for the predictions. jobname (str): The job name associated with the predictions.
Returns: dict: A dictionary of pLDDT scores for each prediction.
- fast_conformation.ensemble_analysis.analysis_utils.parabola(x, a, b, c)
Calculate the value of a parabola given the coefficients.
Parameters: x (float or array-like): The independent variable. a (float): Coefficient for the quadratic term. b (float): Coefficient for the linear term. c (float): Constant term.
Returns: float or array-like: The value of the parabola at x.
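The documented parameters correspond to the quadratic a·x² + b·x + c. A sketch consistent with that signature follows; the name `parabola_sketch` is hypothetical, to keep it distinct from the package function.

```python
def parabola_sketch(x, a, b, c):
    """Evaluate a*x**2 + b*x + c for a scalar or an array-like x."""
    return a * x**2 + b * x + c


# Quadratic with roots at x = 1 and x = 2.
value_at_root = parabola_sketch(2.0, 1.0, -3.0, 2.0)  # 4 - 6 + 2 = 0.0
```

Written this way, the same expression also works element-wise on NumPy arrays, which suits the "float or array-like" types listed above.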
- fast_conformation.ensemble_analysis.analysis_utils.reorder_frames_by(frames, values)
Reorder trajectory frames based on associated values (e.g., RMSD).
Parameters: frames (MDAnalysis.Universe): The Universe containing the trajectory frames to reorder. values (list): A list of values associated with each frame.
Returns: MDAnalysis.Universe: A new Universe with frames reordered by the associated values.
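The core of reordering by per-frame values is an argsort. The sketch below shows only that index computation. Ascending order is an assumption, since the documentation does not state the direction, and the helper name `order_by_values` is hypothetical.

```python
def order_by_values(values):
    """Return frame indices sorted by ascending value (ties keep input order)."""
    return sorted(range(len(values)), key=lambda index: values[index])


rmsd_per_frame = [2.4, 0.8, 1.6, 0.8]
frame_order = order_by_values(rmsd_per_frame)  # [1, 3, 2, 0]
```

Writing the frames out in `frame_order` would then yield a trajectory that moves monotonically along the chosen metric.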
- fast_conformation.ensemble_analysis.analysis_utils.save_traj(universe, traj_output_path, jobname, max_seq, extra_seq, traj_format, ordered)
Save the trajectory of a Universe to a file.
Parameters: universe (MDAnalysis.Universe): The Universe containing the trajectory to save. traj_output_path (str): The directory where the trajectory file will be saved. jobname (str): The job name associated with the trajectory. max_seq (str): The maximum sequence associated with the trajectory. extra_seq (str): The extra sequence associated with the trajectory. traj_format (str): The format to save the trajectory in (e.g., ‘pdb’). ordered (str): A description of the ordering of the frames (default is ‘raw’).
Returns: None
fast_conformation.ensemble_analysis.pca module
- fast_conformation.ensemble_analysis.pca.pca_from_ensemble(jobname, prediction_dicts, output_path, align_range, analysis_range, n_clusters, widget)
Perform Principal Component Analysis (PCA) on an ensemble of molecular dynamics predictions and generate plots.
Parameters: jobname (str): The name of the job or analysis. prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes. output_path (str): The directory where the analysis results and plots will be saved. align_range (str): Atom selection string for alignment of trajectories (MDAnalysis selection syntax). analysis_range (str): Atom selection string for PCA analysis (MDAnalysis selection syntax). n_clusters (int): The number of clusters to form using K-Means clustering. widget (object): A widget object for displaying plots interactively.
Returns: None
This function performs the following steps:
1. Aligns the trajectories based on the provided alignment range.
2. Runs PCA on the aligned trajectories, transforming the coordinates into principal components.
3. Performs K-Means clustering on the first two principal components (PC1 and PC2).
4. Fits a parabola to the PC1 and PC2 data and calculates the R² score for the fit.
5. Generates interactive scatter plots with clustering information and fitted curve using the provided widget.
6. Saves the plots and PCA data to the specified output directory.
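The R² score in step 4 can be illustrated with plain Python. This sketch assumes the parabola coefficients have already been fitted. The helper name `r_squared` and the sample values are hypothetical, and the package may compute the score with a library routine instead.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical PC1/PC2 values that lie exactly on PC2 = 0.5 * PC1**2 - 1.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
pc2 = [0.5 * x**2 - 1.0 for x in pc1]
fitted = [0.5 * x**2 - 1.0 for x in pc1]
score = r_squared(pc2, fitted)  # perfect fit, so 1.0
```

A score near 1 indicates that the ensemble traces a curved path in the PC1/PC2 plane, while a score near 0 means the parabola explains little of the spread.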
fast_conformation.ensemble_analysis.rmsd module
- fast_conformation.ensemble_analysis.rmsd.build_dataset_rmsd_modes(results_dict, input_dict)
Build a dataset from the RMSD mode analysis results and save it as a CSV file.
Parameters: results_dict (dict): A dictionary containing the results of the RMSD mode analysis. input_dict (dict): A dictionary containing job-related metadata (jobname, analysis range, output path, etc.).
Returns: None: The function saves the dataset as a CSV file in the specified output directory.
- fast_conformation.ensemble_analysis.rmsd.calculate_rmsd(u: Universe, ref: Universe | None = None, align_range: str = 'backbone', analysis_range: str = 'backbone') -> dict
Calculate the Root Mean Square Deviation (RMSD) of a given molecular dynamics trajectory.
Parameters: u (mda.Universe): The MDAnalysis Universe containing the trajectory to be analyzed. ref (mda.Universe): The reference Universe for RMSD calculation. If None, the first frame of u is used. align_range (str): The atom selection string for alignment of the trajectory (default is “backbone”). analysis_range (str): The atom selection string for RMSD calculation (default is “backbone”).
Returns: dict: A dictionary containing the RMSD values for each frame with keys ‘frame’, align_range, and analysis_range.
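For reference, the quantity being reported is the root of the mean squared atomic displacement between two matched coordinate sets. The sketch below computes it without any superposition, whereas the documented function first aligns on align_range. The helper name `rmsd_no_fit` is hypothetical.

```python
import math


def rmsd_no_fit(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples, no alignment."""
    if len(coords_a) != len(coords_b):
        raise ValueError("Both structures must have the same number of atoms")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mobile = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
deviation = rmsd_no_fit(reference, mobile)  # sqrt(4 / 2) = sqrt(2)
```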
- fast_conformation.ensemble_analysis.rmsd.rmsd_kde(rmsd_data: list, input_dict: dict, widget) -> dict
Perform Kernel Density Estimation (KDE) on RMSD data and identify the most distant modes.
Parameters: rmsd_data (list): A list of RMSD values. input_dict (dict): A dictionary containing job-related metadata (jobname, max_seq, extra_seq, analysis range, etc.). widget (object): A widget object for interactive plotting.
Returns: dict: A dictionary containing information about the identified modes, including their indices, values, and densities.
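The idea of locating modes with a KDE and keeping the two most distant ones can be sketched in pure Python. Everything here is illustrative: the fixed bandwidth, the evaluation grid, the sample RMSD values, and the helper names are assumptions, and the package may rely on a library KDE with automatic bandwidth selection.

```python
import math


def gaussian_kde_on_grid(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples) / norm
        for g in grid
    ]


def find_modes(grid, density):
    """Return (value, density) pairs at the local maxima of the density."""
    return [
        (grid[i], density[i])
        for i in range(1, len(grid) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    ]


def most_distant_modes(modes):
    """Return the lowest-valued and highest-valued modes."""
    ordered = sorted(modes)  # tuples sort by value first
    return ordered[0], ordered[-1]


# Hypothetical bimodal RMSD values (in Angstrom).
rmsd_values = [1.0, 1.1, 0.9, 1.05, 4.0, 4.1, 3.9, 4.05]
grid = [i * 0.05 for i in range(101)]  # 0.0 to 5.0
density = gaussian_kde_on_grid(rmsd_values, grid, bandwidth=0.3)
modes = find_modes(grid, density)
low_mode, high_mode = most_distant_modes(modes)
```

With these sample values the two modes fall close to 1 and 4, one per cluster of frames.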
- fast_conformation.ensemble_analysis.rmsd.rmsd_mode_analysis(prediction_dicts, input_dict, ref1d, widget)
Perform 1D RMSD mode analysis for each prediction in the provided dictionary.
Parameters: prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes. input_dict (dict): A dictionary containing job-related metadata (jobname, analysis range, alignment range, etc.). ref1d (str): Path to the reference PDB file for RMSD calculation. widget (object): A widget object for interactive plotting.
Returns: dict: The updated prediction_dicts with calculated RMSD data and identified modes.
fast_conformation.ensemble_analysis.rmsf module
- fast_conformation.ensemble_analysis.rmsf.build_dataset_rmsf_peaks(jobname, results_dict, output_path, engine)
Build a dataset from detected RMSF peaks and save it as a CSV file.
Parameters: jobname (str): The name of the job or analysis. results_dict (dict): A dictionary containing detected peaks for each prediction. output_path (str): Directory where the analysis results and dataset will be saved. engine (str): The name of the engine used for the analysis (e.g., AlphaFold2, OpenFold).
Returns: None: The function saves the dataset as a CSV file in the specified output directory.
- fast_conformation.ensemble_analysis.rmsf.calculate_rmsf_and_call_peaks(jobname, prediction_dicts, align_range, output_path, peak_width, prominence, threshold, widget)
Calculate Root Mean Square Fluctuation (RMSF) and detect peaks for multiple molecular dynamics predictions.
Parameters: jobname (str): The name of the job or analysis. prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes. align_range (str): Atom selection string for alignment of trajectories (MDAnalysis selection syntax). output_path (str): Directory where the analysis results and plots will be saved. peak_width (int): Minimum width of peaks in the RMSF data to be considered. prominence (float): Prominence required for a peak in the RMSF data. threshold (float): Threshold for peak detection based on the standard deviation of RMSF values. widget (object): A widget object for displaying plots interactively.
Returns: dict: The updated prediction_dicts with detected peaks added to each prediction’s data.
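One plausible reading of the standard-deviation threshold is a cutoff of mean + threshold × standard deviation applied to local maxima. The sketch below implements only that rule. It ignores peak_width and prominence, the cutoff formula is an assumption rather than the package's confirmed behaviour, and the helper name `call_peaks_sketch` is hypothetical.

```python
import statistics


def call_peaks_sketch(rmsf, threshold):
    """Return indices of local maxima above mean + threshold * std."""
    cutoff = statistics.mean(rmsf) + threshold * statistics.pstdev(rmsf)
    return [
        i
        for i in range(1, len(rmsf) - 1)
        if rmsf[i] > cutoff and rmsf[i - 1] < rmsf[i] >= rmsf[i + 1]
    ]


# Hypothetical per-residue RMSF profile with two flexible regions.
rmsf_profile = [0.5, 0.6, 0.5, 0.7, 3.0, 0.6, 0.5, 0.6, 2.5, 0.5]
peaks = call_peaks_sketch(rmsf_profile, threshold=1.0)  # [4, 8]
```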
- fast_conformation.ensemble_analysis.rmsf.calculate_rmsf_multiple(jobname, prediction_dicts, align_range, output_path, widget)
Calculate Root Mean Square Fluctuation (RMSF) for multiple molecular dynamics predictions and plot the results.
Parameters: jobname (str): The name of the job or analysis. prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes. align_range (str): Atom selection string for alignment of trajectories (MDAnalysis selection syntax). output_path (str): Directory where the analysis results and plots will be saved. widget (object): A widget object for displaying plots interactively.
Returns: None
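RMSF itself is the per-atom root mean square deviation from that atom's average position across frames. A minimal sketch on already-aligned coordinates follows; the helper name `rmsf_per_atom` and the nested-list input layout are hypothetical.

```python
import math


def rmsf_per_atom(frames):
    """frames: list of frames, each a list of (x, y, z) tuples per atom."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    fluctuations = []
    for atom in range(n_atoms):
        # Average position of this atom over all frames.
        mean = [sum(frame[atom][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msf = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        fluctuations.append(math.sqrt(msf))
    return fluctuations


two_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf_per_atom(two_frames)  # [0.0, 1.0]
```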
- fast_conformation.ensemble_analysis.rmsf.plot_plddt_line(jobname, plddt_dict, output_path, custom_start_residue, widget)
Plot pLDDT scores across residues for multiple predictions.
Parameters: jobname (str): The name of the job or analysis. plddt_dict (dict): A dictionary containing pLDDT scores for each prediction. output_path (str): Directory where the analysis results and plots will be saved. custom_start_residue (int): A custom starting residue number for plotting. widget (object): A widget object for displaying plots interactively.
Returns: None
- fast_conformation.ensemble_analysis.rmsf.plot_plddt_rmsf_corr(jobname, prediction_dicts, plddt_dict, output_path, widget)
Plot the correlation between pLDDT scores and RMSF values for each prediction.
Parameters: jobname (str): The name of the job or analysis. prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes. plddt_dict (dict): A dictionary containing pLDDT scores for each prediction. output_path (str): Directory where the analysis results and plots will be saved. widget (object): A widget object for displaying plots interactively.
Returns: None
fast_conformation.ensemble_analysis.tmscore module
- fast_conformation.ensemble_analysis.tmscore.build_dataset_tmscore_modes(results_dict, input_dict)
Build a dataset from detected TM-score modes and save it as a CSV file.
Parameters: results_dict (dict): A dictionary containing TM-score mode analysis results for each prediction. input_dict (dict): A dictionary containing job-related metadata (jobname, output path, etc.).
Returns: None: The function saves the dataset as a CSV file in the specified output directory.
- fast_conformation.ensemble_analysis.tmscore.run_tmscore(folder_path, custom_ref)
Runs TM-score for all PDB files in a specified folder against a reference structure.
Parameters: folder_path (str): Path to the folder containing PDB files. custom_ref (str): Path to the custom reference PDB file. If None, the first PDB in the folder is used.
Returns: dict: A dictionary containing the TM-scores and corresponding frame indices.
- fast_conformation.ensemble_analysis.tmscore.slice_models(universe, selection, temp_traj_path)
Extracts a specific selection from each frame of a molecular dynamics trajectory and saves the selection as individual PDB files.
Parameters: universe (MDAnalysis.Universe): The MDAnalysis Universe containing the trajectory. selection (str): Atom selection string (MDAnalysis selection syntax) to be extracted. temp_traj_path (str): Path to the temporary directory where the PDB files will be saved.
Returns: None
- fast_conformation.ensemble_analysis.tmscore.tmscore_kde(tmscore_data: list, input_dict: dict, slice_predictions, widget) -> dict
Performs Kernel Density Estimation (KDE) on TM-score data to identify modes (peaks) and finds the most distant modes for analysis.
Parameters: tmscore_data (list): List of TM-scores. input_dict (dict): A dictionary containing job-related metadata (jobname, max_seq, extra_seq, output path, etc.). slice_predictions (str): Description of the slice selection used for predictions (optional). widget (object): A widget object for displaying plots interactively.
Returns: dict: A dictionary containing the detected modes, their indices, and densities.
- fast_conformation.ensemble_analysis.tmscore.tmscore_mode_analysis(prediction_dicts, input_dict, custom_ref, slice_predictions, widget)
Perform TM-score mode analysis on multiple molecular dynamics predictions.
Parameters: prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes. input_dict (dict): A dictionary containing job-related metadata (jobname, output path, etc.). custom_ref (str): Path to the custom reference PDB file. If None, the first PDB in the folder is used. slice_predictions (str): Description of the slice selection used for predictions (optional). widget (object): A widget object for displaying plots interactively.
Returns: dict: The updated prediction_dicts with TM-score modes added to each prediction’s data.
- fast_conformation.ensemble_analysis.tmscore.tmscore_wrapper(mobile, reference)
Runs the TM-score comparison between a mobile structure and a reference structure.
Parameters: mobile (str): Path to the mobile PDB file. reference (str): Path to the reference PDB file.
Returns: float: The TM-score value between the mobile and reference structures.
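For orientation, the returned value follows the standard TM-score definition: a length-normalised sum over aligned residue pairs with the distance scale d0 = 1.24·(L − 15)^(1/3) − 1.8. The sketch below evaluates that sum for one fixed superposition. It is not the wrapper itself, which operates on PDB file paths, and it does not search over superpositions as the TM-score program does. The helper name `tm_score_from_distances` is hypothetical.

```python
def tm_score_from_distances(distances, target_length):
    """TM-score sum for given aligned-pair distances (in Angstrom)."""
    if target_length <= 15:
        raise ValueError("This sketch does not handle very short chains")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("This sketch does not handle very short chains")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Identical structures: every aligned pair is at distance 0.
perfect = tm_score_from_distances([0.0] * 100, target_length=100)  # 1.0
```

Each residue pair contributes between 0 and 1, so the score stays in the interval (0, 1] and reaches 1 only for a perfect match over the full target length.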
fast_conformation.ensemble_analysis.traj module
- fast_conformation.ensemble_analysis.traj.save_trajs(prediction_dicts, input_dict, reorder, traj_format)
Saves molecular dynamics trajectories for a set of predictions, with optional reordering based on analysis results.
Parameters: prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes. input_dict (dict): A dictionary containing job-related metadata (jobname, analysis range, output path, etc.). reorder (str): Specifies the method of reordering the trajectory frames before saving. Options include ‘pca’, ‘tmscore’, or other analysis range names. If None, the trajectories are saved without reordering. traj_format (str): The format to use for saving the trajectory files (e.g., ‘pdb’, ‘xtc’).
Returns: dict: The updated prediction_dicts with saved trajectories.
fast_conformation.ensemble_analysis.twodrmsd module
- class fast_conformation.ensemble_analysis.twodrmsd.TwodRMSD(prediction_dicts, input_dict, widget, ref_gr=None, ref_alt=None)
Bases: object
A class to perform 2D RMSD analysis on molecular dynamics simulations.
Attributes:
- prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes.
- input_dict (dict): A dictionary containing job-related metadata (jobname, analysis range, etc.).
- ref_gr (str or None, optional): The reference structure file path for the first RMSD calculation (default is None).
- ref_alt (str or None, optional): The reference structure file path for the second RMSD calculation (default is None).
- filtering_dict (dict): A dictionary to store data related to the filtering of RMSD values.
- clustering_dict (dict): A dictionary to store data related to the clustering of 2D RMSD values.
- widget (object): A widget object to handle the plotting of the analysis results.
Methods:
- calculate_2d_rmsd(trial): Calculate 2D RMSD for a given trial.
- fit_and_filter_data(rmsd_2d_data, n_stdevs): Fit a parabola to the 2D RMSD data and filter points based on the standard deviation threshold.
- show_filt_data(rmsd_2d_data): Plot the 2D RMSD data along with the fitted curve and filtered points.
- plot_filtering_data(rmsd_2d_data): Generate and save a plot of the filtered 2D RMSD data with the fitted curve.
- cluster_2d_data(rmsd_2d_data, n_clusters): Perform clustering on the filtered 2D RMSD data and store clustering results.
- plot_and_save_2d_data(output_path): Plot the clustered 2D RMSD data, save the plot, and return a DataFrame with the clustering information.
- get_2d_rmsd(rmsd_mode_df_path, n_stdevs, n_clusters, output_path): Execute the full 2D RMSD analysis for all trials, including fitting, filtering, clustering, and saving results.
- calculate_2d_rmsd(trial)
Calculate 2D RMSD for a given trial.
Parameters:
- trial (str): The identifier for the trial being analyzed.
Returns:
- rmsd_2d_data (np.ndarray): A 2D array of RMSD values against two reference structures.
- cluster_2d_data(rmsd_2d_data, n_clusters)
Perform clustering on the filtered 2D RMSD data and store clustering results.
Parameters:
- rmsd_2d_data (np.ndarray): A 2D array of RMSD values.
- n_clusters (int): Number of clusters to form.
Returns: None
- fit_and_filter_data(rmsd_2d_data, n_stdevs)
Fit a parabola to the 2D RMSD data and filter points based on the standard deviation threshold.
Parameters:
- rmsd_2d_data (np.ndarray): A 2D array of RMSD values.
- n_stdevs (int): Number of standard deviations to use for filtering the data.
Returns: None
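The filtering step can be pictured as discarding points whose vertical distance from the fitted parabola exceeds n_stdevs standard deviations of the residuals. The sketch below assumes the coefficients are already fitted and uses the population standard deviation. Both choices, the helper name `filter_by_residual`, and the sample points are assumptions rather than the class's confirmed behaviour.

```python
import statistics


def filter_by_residual(points, coefficients, n_stdevs):
    """Split (x, y) points into kept and dropped by residual size."""
    a, b, c = coefficients
    residuals = [y - (a * x**2 + b * x + c) for x, y in points]
    spread = statistics.pstdev(residuals)
    kept, dropped = [], []
    for point, residual in zip(points, residuals):
        if abs(residual) <= n_stdevs * spread:
            kept.append(point)
        else:
            dropped.append(point)
    return kept, dropped


# Hypothetical data on y = x**2 with a single outlier at x = 5.
points = [(x, x * x) for x in range(10)]
points[5] = (5, 40)
kept, dropped = filter_by_residual(points, (1, 0, 0), n_stdevs=2)
```

Here the nine on-curve points are kept and the outlier is dropped, so later clustering sees only frames that follow the fitted trend.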
- get_2d_rmsd(rmsd_mode_df_path, n_stdevs, n_clusters, output_path)
Execute the full 2D RMSD analysis for all trials, including fitting, filtering, clustering, and saving results.
Parameters:
- rmsd_mode_df_path (str): The path to the RMSD mode data file.
- n_stdevs (int): Number of standard deviations to use for filtering the data.
- n_clusters (int): Number of clusters to form.
- output_path (str): The path where the results will be saved.
Returns: None
- plot_and_save_2d_data(output_path)
Plot the clustered 2D RMSD data, save the plot, and return a DataFrame with the clustering information.
Parameters:
- output_path (str): The path where the plot will be saved.
Returns:
- df (pd.DataFrame): A DataFrame containing the clustering information.
fast_conformation.ensemble_analysis.twotmscore module
- class fast_conformation.ensemble_analysis.twotmscore.TwoTMScore(prediction_dicts, input_dict, widget, ref_gr=None, ref_alt=None, slice_predictions=None)
Bases: object
A class to perform 2D TM-Score analysis on molecular dynamics simulations.
Attributes:
- prediction_dicts (dict): A dictionary containing prediction data with associated MDAnalysis Universes.
- input_dict (dict): A dictionary containing job-related metadata (jobname, analysis range, etc.).
- slice_predictions (str or None): A selection string for slicing the predictions (default is None).
- ref_gr (str or None): The reference structure file path for the first TM-Score calculation (default is None).
- ref_alt (str or None): The reference structure file path for the second TM-Score calculation (default is None).
- filtering_dict (dict): A dictionary to store data related to the filtering of TM-Score values.
- clustering_dict (dict): A dictionary to store data related to the clustering of 2D TM-Score values.
- widget (object): A widget object to handle the plotting of the analysis results.
Methods:
- slice_models(universe, selection, temp_traj_path): Static method to slice models according to a given selection and save the results.
- tmscore_wrapper(mobile, reference): Static method to run the TM-Score command and return the TM-Score value.
- run_tmscore(folder_path, custom_ref): Run TM-Score on all PDB files in a folder against a custom reference structure.
- calculate_2d_tmscore(trial): Calculate 2D TM-Score for a given trial.
- fit_and_filter_data(tmscore_2d_data, n_stdevs): Fit a parabola to the 2D TM-Score data and filter points based on the standard deviation threshold.
- plot_filtering_data(tmscore_2d_data): Generate and save a plot of the filtered 2D TM-Score data with the fitted curve.
- cluster_2d_data(tmscore_2d_data, n_clusters): Perform clustering on the filtered 2D TM-Score data and store clustering results.
- plot_and_save_2d_data(output_path): Plot the clustered 2D TM-Score data, save the plot, and return a DataFrame with the clustering information.
- get_2d_tmscore(tmscore_mode_df_path, n_stdevs, n_clusters, output_path): Execute the full 2D TM-Score analysis for all trials, including fitting, filtering, clustering, and saving results.
- calculate_2d_tmscore(trial)
Calculate 2D TM-Score for a given trial.
Parameters:
- trial (str): The identifier for the trial being analyzed.
Returns:
- tmscore_2d_data (np.ndarray): A 2D array of TM-Score values against two reference structures.
- cluster_2d_data(tmscore_2d_data, n_clusters)
Perform clustering on the filtered 2D TM-Score data and store clustering results.
Parameters:
- tmscore_2d_data (np.ndarray): A 2D array of TM-Score values.
- n_clusters (int): Number of clusters to form.
Returns: None
- fit_and_filter_data(tmscore_2d_data, n_stdevs)
Fit a parabola to the 2D TM-Score data and filter points based on the standard deviation threshold.
Parameters:
- tmscore_2d_data (np.ndarray): A 2D array of TM-Score values.
- n_stdevs (int): Number of standard deviations to use for filtering the data.
Returns: None
- get_2d_tmscore(tmscore_mode_df_path, n_stdevs, n_clusters, output_path)
Execute the full 2D TM-Score analysis for all trials, including fitting, filtering, clustering, and saving results.
Parameters:
- tmscore_mode_df_path (str): The path to the TM-Score mode data file.
- n_stdevs (int): Number of standard deviations to use for filtering the data.
- n_clusters (int): Number of clusters to form.
- output_path (str): The path where the results will be saved.
Returns: None
- plot_and_save_2d_data(output_path, widget)
Plot the clustered 2D TM-Score data, save the plot, and return a DataFrame with the clustering information.
Parameters:
- output_path (str): The path where the plot will be saved.
- widget (object): A widget object to handle the plotting of the analysis results.
Returns:
- df (pd.DataFrame): A DataFrame containing the clustering information.
- plot_filtering_data(tmscore_2d_data)
Generate and save a plot of the filtered 2D TM-Score data with the fitted curve.
Parameters:
- tmscore_2d_data (np.ndarray): A 2D array of TM-Score values.
Returns: None
- run_tmscore(folder_path, custom_ref)
Run TM-Score on all PDB files in a folder against a custom reference structure.
Parameters:
- folder_path (str): The path to the folder containing PDB files.
- custom_ref (str): The custom reference structure file path.
Returns:
- tmscore_dict (dict): A dictionary containing the TM-Score results for each frame.
- static slice_models(universe, selection, temp_traj_path)
Static method to slice models according to a given selection and save the results.
Parameters:
- universe (MDAnalysis.Universe): The Universe containing the trajectory data.
- selection (str): The selection string for slicing the models.
- temp_traj_path (str): The path where the sliced models will be saved.
Returns: None
- static tmscore_wrapper(mobile, reference)
Static method to run the TM-Score command and return the TM-Score value.
Parameters:
- mobile (str): The path to the mobile structure (the structure being compared).
- reference (str): The path to the reference structure.
Returns:
- tmscore (float): The TM-Score value from the comparison.