The PSP dataset (Additional Parts) is publicly available under the Creative Commons Attribution 4.0 License (CC-BY-4.0).

The original PSP dataset, which was used to train the MEGAFold-monomer model, can be found at [http://ftp.cbi.pku.edu.cn/psp/](http://ftp.cbi.pku.edu.cn/psp/). In addition, this directory provides paired MSA files, which are needed to train models for protein complexes. These files were used to train both the GRASP and MEGAFold-multimer models.

If you use the original monomer PSP dataset, please cite: Liu, S. et al., PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction. arXiv:2206.12240 (2022). doi: 10.48550/arXiv.2206.12240.

If you use the PSP dataset (Additional Parts) in this directory, please also cite: Xie, Y. et al., Integrating various Experimental Information to Assist Protein Complex Structure Prediction by GRASP. bioRxiv:2024.09.16.613256 (2024). doi: 10.1101/2024.09.16.613256.

The dataset is organized as follows:

```
PSP/
├── paired_msa_tar/      # 177 GB of .pkl packages containing paired MSA data
│   └── sample_data/     # A sample .pkl file from the paired_msa_tar package
                         # Each .pkl file contains the paired MSA and deletion_matrix
                         # for all unique sequences of a given protein complex
```
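
Because each `.pkl` file holds the paired MSA and `deletion_matrix` for one complex, a quick way to get oriented is to load the sample file and print its top-level structure. The following is a minimal sketch, not an official loader: the internal layout of the pickles (key names, nesting, array types) is not documented here, so the code only reports what it finds. The helper names `describe` and `summarize_pickle` are hypothetical, and the file path is supplied by the user. If the pickles store NumPy arrays, NumPy must be installed for them to unpickle.

```python
import pickle
import sys
from pathlib import Path


def describe(obj, depth=0, max_depth=2):
    """Return a short, nested description of a loaded object.

    Dictionaries are expanded down to max_depth levels; everything else is
    reduced to a type name plus a shape or length where one is available.
    """
    if isinstance(obj, dict) and depth < max_depth:
        return {str(k): describe(v, depth + 1, max_depth) for k, v in obj.items()}
    shape = getattr(obj, "shape", None)  # present on array-like objects
    if shape is not None:
        return f"{type(obj).__name__} shape={tuple(shape)}"
    if isinstance(obj, (list, tuple, str, bytes)):
        return f"{type(obj).__name__} len={len(obj)}"
    return type(obj).__name__


def summarize_pickle(path):
    """Load one .pkl file and describe its contents.

    Only unpickle files from a source you trust, since pickle can execute
    arbitrary code while loading.
    """
    with Path(path).open("rb") as handle:
        data = pickle.load(handle)
    return describe(data)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_pkl.py <path to a .pkl file from sample_data/>
    print(summarize_pickle(sys.argv[1]))
```

Running the script on the sample file shows which keys and array shapes are actually present before you commit to downloading the full 177GB package.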