Slow featurization when running inference for same input with multiple seeds #675

Description

@y1zhou

In run_alphafold.py, the call stack is main -> process_fold_input -> predict_structure -> featurisation.featurise_input, which calls data_pipeline.process_item once per seed on the same inputs. Within WholePdbPipeline.process_structure (the workhorse of process_item), the random seed is used only once, towards the end of the method, in features.RefStructure.compute_features. Could the rest of process_structure be moved outside the for-loop, since it seems to generate the same features for every seed anyway? Happy to work on a draft PR if this is the case. Thanks!
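
To make the idea concrete, here is a minimal, self-contained sketch of hoisting the seed-independent work out of the per-seed loop. It does not use the real AlphaFold 3 code: `expensive_seed_independent`, `cheap_seed_dependent`, `featurise_naive` and `featurise_hoisted` are hypothetical stand-ins, and the sketch assumes the seed really is only consumed in the final step and that the shared features are never mutated afterwards.

```python
import random

# Counts how often the expensive, seed-independent step runs.
calls = {"expensive": 0}


def expensive_seed_independent(sequence: str) -> dict:
    """Hypothetical stand-in for the part of featurisation that ignores the seed."""
    calls["expensive"] += 1
    return {"length": len(sequence), "tokens": list(sequence)}


def cheap_seed_dependent(shared: dict, seed: int) -> dict:
    """Hypothetical stand-in for the single step that consumes the seed."""
    rng = random.Random(seed)
    # Shallow copy so each seed gets its own dict; the shared values are
    # only read here, never modified in place.
    features = dict(shared)
    features["ref_noise"] = [rng.random() for _ in range(shared["length"])]
    return features


def featurise_naive(sequence: str, seeds: list[int]) -> list[dict]:
    """Repeats the expensive step for every seed (the pattern described above)."""
    return [
        cheap_seed_dependent(expensive_seed_independent(sequence), seed)
        for seed in seeds
    ]


def featurise_hoisted(sequence: str, seeds: list[int]) -> list[dict]:
    """Runs the expensive step once, then only the seeded step per seed."""
    shared = expensive_seed_independent(sequence)
    return [cheap_seed_dependent(shared, seed) for seed in seeds]
```

Under these assumptions both functions return identical features, but the hoisted version runs the expensive step once instead of once per seed. If any downstream step modified the shared features in place, a deep copy per seed would be needed instead of the shallow one used here.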

Metadata

Labels: enhancement