Family-wide analysis of integrin structures predicted by AlphaFold2

Recent advances in protein structure prediction using AlphaFold2, known for its high efficiency and accuracy, have opened new avenues for comprehensive analysis of all structures within a single protein family. In this study, we evaluated the capabilities of AlphaFold2 in analyzing integrin structures. Integrins are heterodimeric cell surface receptors composed of a combination of 18 α and 8 β subunits, resulting in a family of 24 different members. Both α and β subunits consist of a large extracellular domain, a short transmembrane domain, and typically, a short cytoplasmic tail. Integrins play a pivotal role in a wide range of cellular functions by recognizing diverse ligands. Despite significant advances in integrin structural studies in recent decades, high-resolution structures have only been determined for a limited subset of integrin members, thus limiting our understanding of the entire integrin family. Here, we first analyzed the single-chain structures of 18 α and 8 β integrins in the AlphaFold2 protein structure database. We then employed the newly developed AlphaFold2-multimer program to predict the α/β heterodimer structures of all 24 human integrins. The predicted structures show a high level of accuracy for the subdomains of both α and β subunits, offering high-resolution structural insights for all integrin heterodimers. Our comprehensive structural analysis of the entire integrin family unveils a potentially diverse range of conformations among the 24 members, providing a valuable structure database for studies related to integrin structure and function. We further discuss the potential applications and limitations of the AlphaFold2-derived integrin structures.


Introduction
Integrins are cell surface receptors that recognize a variety of extracellular or cell surface ligands, enabling communication between the cell's interior and exterior [1]. The human integrin family consists of 24 members, formed through the combination of 18 α and 8 β subunits (Fig. 1). These 24 α/β integrin heterodimers are either widely distributed or specifically expressed in particular cell types. As a result, they serve universal or specialized functions in cellular processes related to cell adhesion and migration. Based on ligand or cell specificity, integrins can be categorized into subfamilies, including RGD (Arg-Gly-Asp) receptors, collagen receptors, laminin receptors, and leukocyte-specific receptors [1]. Integrins play pivotal roles in various diseases such as thrombosis, inflammation, and cancer, rendering them attractive therapeutic targets for small-molecule or antibody inhibitors [2][3][4]. Since their discovery in the early 1980s, research into integrin structure and function has been a continuous area of interest [5]. [...] I domains combine to form the ligand binding site (Fig. 1A-C). In contrast, for α I-containing integrins, the α I domain is responsible for ligand binding (Fig. 1D-F). Integrin ectodomains can also be divided into the headpiece (containing the head and upper legs) and lower leg domains (Fig. 1C). In the past few decades, structural studies of integrins have unveiled a conformation-dependent activation and ligand binding mechanism, involving transitions among at least three conformational states [6,7]. The bent conformation with closed headpiece represents the resting state of integrin (Fig. 1A, D), while the extended closed headpiece and extended open headpiece represent the intermediate and high-affinity active states, respectively (Fig. 1B-C, E-F).
The conformational transition of integrin can be initiated by the binding of intracellular activators, including talin and kindlin, to β CT, leading to inside-out signaling, or by the binding of extracellular ligands, resulting in outside-in signaling (Fig. 1A-F) [8,9]. However, it is worth noting that the conformation-dependent activation model was derived mainly from structural studies of the highly regulated β 2 and β 3 integrins, which are primarily expressed in blood cells [6,7]. Given the limited structural information available for most integrin members, it remains uncertain whether the current model of integrin conformational changes can be applied to the entire integrin family.
Since the publication of the first high-resolution crystal structure of the α V β 3 ectodomain in 2001 [10], substantial efforts have been dedicated to determining integrin structures using a variety of methods, including crystallography, negative-stain electron microscopy (EM), nuclear magnetic resonance (NMR), and more recently, cryogenic EM (cryo-EM). However, high-resolution structure information remains limited to only a few integrin members, including α V β 3 [10][11][12], 10 [20,[39][40][41][42][43]. The recent breakthrough in protein structure prediction using the artificial intelligence-based AlphaFold2 program has provided a powerful tool for analyzing previously challenging-to-determine protein structures with a remarkable level of accuracy [44]. We conducted an analysis of the predicted atomic structure models of single-chain 18 α and 8 β integrins that are available in the AlphaFold2 database (Fig. 1G). Moreover, using the recently developed AlphaFold2-multimer program [45], we predicted the structures of all 24 human integrin α/β heterodimers. Our structural analysis of the entire integrin family revealed potential conformational diversity across its 24 members, with the identification of previously unknown structural features. Our study compiled a comprehensive database of integrin structures that can serve as a valuable resource for guiding functional and structural studies. Despite the limitations of predicted structures, these findings underscore the efficacy of AlphaFold2 in the family-wide structure prediction of large and complex proteins.

Databases and software
The single-chain structures of 18 α and 8 β integrin subunits were downloaded from the AlphaFold2 database (https://alphafold.ebi.ac.uk). Typically, one GPU (--gres=gpu:1) and 100 GB of memory (--mem=100gb) were requested to run AlphaFold2. The maximum job running time was set to 48 h (--time=48:00:00). To run AlphaFold2-Multimer for structure prediction of integrin heterodimers, an input fasta file containing the sequences of both integrin α and β subunits was provided. The multimer prediction function was enabled with the option "--model_preset (m)=multimer". Full-length or extracellular domain structures of integrin heterodimers without signal peptides were predicted without or with templates by setting the parameter "--max_template_date(-t)=2000-05-14" or "--max_template_date(-t)=2023-01-01", respectively. For integrin α 6 β 4 structure prediction, the large cytoplasmic tail of β 4 was truncated after KGRDV to simplify the prediction. The top-ranked models were selected for further analysis.
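The job settings described above can be collected into a batch script. The following is a minimal sketch, not the authors' actual script: it assumes a SLURM scheduler (the resource options quoted in the text are SLURM-style) and a local AlphaFold2 installation launched through `run_alphafold.py` with its `--fasta_paths` and `--output_dir` options. The launcher path, file names, and sequences are hypothetical placeholders, and site-specific options such as database paths are left out.

```python
def format_heterodimer_fasta(alpha_id, alpha_seq, beta_id, beta_seq, width=60):
    """Return FASTA text with the alpha chain first and the beta chain second."""
    lines = []
    for seq_id, seq in ((alpha_id, alpha_seq), (beta_id, beta_seq)):
        lines.append(f">{seq_id}")
        # Wrap each sequence at a fixed line width.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def build_batch_script(fasta_path, output_dir, max_template_date="2000-05-14",
                       launcher="run_alphafold.py"):
    """Return the text of a SLURM batch script for one multimer prediction.

    `launcher` is a hypothetical path. Database-path options that a real
    installation requires are site-specific and deliberately omitted.
    """
    return "\n".join([
        "#!/bin/bash",
        "#SBATCH --gres=gpu:1",       # one GPU
        "#SBATCH --mem=100gb",        # 100 GB of memory
        "#SBATCH --time=48:00:00",    # 48 h wall-time limit
        "",
        f"python {launcher} \\",
        f"  --fasta_paths={fasta_path} \\",
        f"  --output_dir={output_dir} \\",
        "  --model_preset=multimer \\",
        f"  --max_template_date={max_template_date}",
        "",
    ])


# Placeholder sequences, NOT real integrin sequences.
example_fasta = format_heterodimer_fasta("alpha", "MKT" * 30, "beta", "MAG" * 25)
example_script = build_batch_script("alpha_beta.fasta", "predictions")
```

Passing `max_template_date="2023-01-01"` instead of the default gives the template-enabled variant of the same job.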

Comparison of AlphaFold2 predicted integrin structures
The single chain

Flow Cytometry Analysis of LIBS (Ligand Induced Binding Site) mAb binding
The LIBS rat mAb 9EG7 (cat# 553715, BD Biosciences) was used to measure the conformational extension of β 1 integrin. The mouse mAb MAR4 (cat# 555442, BD Biosciences) was used to measure total surface expression of β 1 integrin. HEK293T cells were grown in complete DMEM (cat# 10-017-CV, Corning) supplemented with 10% fetal bovine serum (FBS) (cat# F2442, Sigma-Aldrich). Cells were maintained in a 37°C incubator with 5% CO2. Flow cytometry analysis of integrin expression and LIBS mAb binding was performed as described previously [47]. In brief, HEK293T cells were transfected with EGFP-tagged α integrin constructs plus β 1 integrin. At 48 hours post-transfection, the cells were detached, washed, and resuspended in HBSGB buffer (25 mM HEPES pH 7.4, 150 mM NaCl, 2.75 mM glucose, 0.5% BSA) containing 1 mM Ca2+/Mg2+ or 0.1 mM Ca2+ plus 2 mM Mn2+. Cells were incubated with 5 μg/ml of either 9EG7 mAb or MAR4 for 15 min, followed by an additional 15 min incubation with 10 μg/ml Alexa Fluor 647-conjugated goat anti-rat IgG (cat# A-21247, Invitrogen) or goat anti-mouse IgG (cat# A-21235, Invitrogen). Surface binding of mAb was measured with a BD Accuri C6 flow cytometer (BD Biosciences). The results are presented as a normalized mean fluorescence intensity (MFI), calculated as the MFI of 9EG7 binding (recognizing extended β 1 ) expressed as a percentage of the MFI of MAR4 binding (recognizing total β 1 ). The plot was generated with Prism 9 software.
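The normalization described above is a simple ratio. A minimal sketch of the arithmetic follows; the function names and the numerical values are illustrative assumptions, not data from this study, and any background subtraction or gating done upstream is not modeled.

```python
def normalized_mfi(mfi_9eg7, mfi_mar4):
    """Return 9EG7 MFI (extended beta1) as a percentage of MAR4 MFI (total beta1)."""
    if mfi_mar4 <= 0:
        raise ValueError("MAR4 MFI must be positive to normalize against it")
    return 100.0 * mfi_9eg7 / mfi_mar4


def mean_normalized_mfi(pairs):
    """Average the normalized MFI over replicate (9EG7, MAR4) measurements."""
    values = [normalized_mfi(m9, mm) for m9, mm in pairs]
    return sum(values) / len(values)


# Hypothetical replicate measurements: (9EG7 MFI, MAR4 MFI).
replicates = [(250.0, 1000.0), (300.0, 1000.0)]
print(mean_normalized_mfi(replicates))
```

Normalizing to MAR4 binding corrects for differences in total β 1 surface expression between transfections, so the value reflects the fraction of extended β 1 rather than the expression level.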

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. A PyMOL session file, named "integrin-alphafold2-structure.pse", that includes the predicted structures of integrin heterodimers was deposited online as supplementary material.
We extracted the structures of 18 human α integrins from the AlphaFold2 protein structure database. These integrin structure models in the AlphaFold2 database were predicted based on the full-length single-chain amino acid sequence, encompassing the signal peptide, ectodomain, TM and CT domains (Fig. 1G). The predicted models showed high accuracy in the domain structures, as indicated by the high predicted local distance difference test (pLDDT) scores (Fig. 1G). To compare the overall conformation of α integrin ectodomains, we superimposed all structures onto the calf-2 domain of α IIb , vertically oriented them relative to the cell membrane, and adjusted their rotations to display the position of the β-propeller domain relative to the membrane (Fig. 2). We also superimposed the AlphaFold2 structures with the experimental structures of [...] α V displayed a sharply bent conformation, nearly identical to their crystal structures (Fig. 2A). However, the α 5 AlphaFold2 structure exhibited a more bent conformation than its half-bent cryo-EM structure (Fig. 2A). Among the three laminin receptors, only α 6 adopted a sharply bent conformation like the RGD receptors, while α 3 and α 7 showed a half-bent conformation (Fig. 2B). The α 4 and α 9 integrins also adopted a bent conformation similar to the RGD receptors (Fig. 2C).
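The pLDDT scores mentioned above can be read directly from a downloaded model, because AlphaFold2 database PDB files store the per-residue pLDDT in the B-factor column. The sketch below assumes PDB-format input and uses standard fixed-column parsing; the function names are illustrative, and the residue range standing for a domain must be supplied by the user, since no domain boundaries are given in the text.

```python
def plddt_by_residue(pdb_text, chain_id="A"):
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # PDB columns 13-16
        if atom_name != "CA" or line[21] != chain_id:
            continue
        residue_number = int(line[22:26])  # PDB columns 23-26
        scores[residue_number] = float(line[60:66])  # PDB columns 61-66
    return scores


def mean_plddt(scores, start, end):
    """Average pLDDT over residues start..end (inclusive)."""
    values = [v for res, v in scores.items() if start <= res <= end]
    if not values:
        raise ValueError("no residues found in the requested range")
    return sum(values) / len(values)
```

For example, `mean_plddt(plddt_by_residue(open("model.pdb").read()), 1, 450)` would report the average confidence over a user-chosen range, where both the file name and the range are hypothetical.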
Interestingly, all four α integrins of collagen receptors exhibited a more extended than bent conformation, with α 10 in a nearly fully extended conformation (Fig. 2D). The five leukocyte-specific [...]
Previous structural studies have revealed that the extension of α integrin occurs at the interface between the thigh and calf-1 domains, where a disulfide-bonded knob, known as the genu, is located (Fig. 1C, F). We conducted a sequence alignment of all α integrins at the junction between the thigh and calf-1 domains (Fig. 4A). To illustrate the interface between the thigh and calf-1 domains in a bent conformation, we used the structure of α IIb as an example (Fig. 4B). Interfacial residues are highlighted in red in the sequence alignment (Fig. 4A) and shown as red sticks in the structure (Fig. 4B). The sequence alignment reveals that the interfacial residues, as well as the putative N-glycan sites, are not highly conserved (Fig. 4A). Some [...] (Fig. 4A). However, no signature sequences appear to indicate a preference for a bent or extended conformation.

The
The integrin β subunit extends at the junction between I-EGF-1 and I-EGF-2 (Fig. 1C, F). Sequence alignment of the eight human β integrins in this region showed no obvious residue conservation, except for the typical disulfide bonds of EGF domains (Fig. 4C).
In the bent conformation of β 3 integrin (Fig. 4D), the interface between I-EGF-1 and I-EGF-2 is considerably smaller compared to the interface between the thigh and calf-1 in α IIb (Fig. 4B), suggesting that it is unlikely to play a major role in maintaining the bent structure. However, the length of C1-C2 loop in I-EGF-2 domain has been shown to regulate integrin extension [48]. Notably, a landmark disulfide bond is absent in the I-EGF-1 domain of β 8 (Fig. 4C), which may contribute, at least in part, to the distinct conformational regulation of β 8 integrin.

α 10 integrin prefers an extended conformation on cell surface
Among the 18 α integrins, the AlphaFold2 structure of α 10 reveals an extended conformation (Fig. 2D). This extended structure was also observed in α 10 from other species, such as mouse, rat, and zebrafish (Fig. 5A). Notably, we identified a conserved putative N-glycan site (N839 in human) at the interface between the thigh and calf-1 domains of α 10 integrin (Fig. 5A). It is plausible that N-glycans at this site may interfere with the bent conformation (Fig. 5B).
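Putative N-glycan sites such as the one noted above are conventionally identified by the N-X-S/T sequon, where X is any residue except proline. The sketch below scans a sequence for that pattern; it assumes this conventional definition (the text does not state how the sites were identified), and the example sequence is a toy string, not an integrin sequence.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead consumes only the Asn, so adjacent sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Toy sequence: N2 (N-G-S) and N9 (N-A-T) match; N6 (N-P-T) does not.
print(find_sequons("MNGSANPTNAT"))
```

A sequon match only marks a candidate site; whether the site is actually glycosylated has to be established experimentally.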
It is important to note that α 10 integrin only forms a heterodimer with β 1 integrin. The AlphaFold2 structures of human, mouse, cat, and chicken β 1 integrins all exhibit a half-bent conformation (Fig. 5C). To measure the conformation of α 10 β 1 on the cell surface, we used mAb 9EG7, which recognizes the β 1 I-EGF-2 epitope that remains concealed in the bent conformation (Fig. 5C) [49]. This antibody reports on [...] (Fig. 5D). Notably, Mn2+ did not further increase 9EG7 binding to α 10 -EGFP/β 1 cells (Fig. 5D). Mutating the putative N-glycan site at the interface between the thigh and calf-1 domains of α 10 integrin (α 10 -N839Q) did not affect 9EG7 binding (Fig. 5D). These data suggest that α 10 integrin maintains a constitutively extended conformation on the cell surface.
[...] multimer module for structure prediction of all 24 human integrin heterodimers. To prevent potential model bias arising from the templates of experimental integrin structures, we set the template search date to the year 2000, predating the reporting of any integrin heterodimer structures. Impressively, AlphaFold2-multimer successfully predicted the structures of all 24 integrin heterodimers (Fig. 6). These structures were categorized based on their ligand or cell specificity. To facilitate conformational comparisons, all structures were superimposed onto the α IIb calf-2 domain and individually oriented to align their ectodomains vertically with respect to the cell membrane (Fig. 6). Overall, the inter-subunit interfaces, including those of the α β [...] Among the RGD receptors, all adopted a bent conformation, except α 5 β 1 , which exhibited a half-bent conformation (Fig. 6A).
Similarly, all laminin receptors, including α 7 β 1 , displayed a bent conformation (Fig. 6B). Interestingly, the four collagen receptors, including α 10 β 1 , exhibited a half-bent structure (Fig. 6C), in contrast to the AlphaFold2 structure of single-chain α 10 showing a more extended conformation (Fig. 2D). Within the leukocyte-specific integrins, only α E β 7 appeared to be more extended (Fig. 6D). The AlphaFold2 structures of α 4 β 1 and α 9 β 1 showed a half-bent conformation (Fig. 6E). It is worth noting that the relative orientation of the TM domains to the cell membrane was not correctly predicted for most of the structures.
As nine of the integrin structures predicted by AlphaFold2-multimer show artificial interactions between TM-CT and ectodomains (Fig. 7A), we asked whether such interactions have any impact on the overall integrin structure prediction. We performed AlphaFold2-multimer modeling of the nine integrin structures in the absence of the TM-CT sequences. Surprisingly, the resulting structures closely matched those containing TM-CT domains (Fig. 7B), indicating that the modeling of the ectodomain and that of the TM-CT domains do not influence each other during the structure calculation by AlphaFold2-multimer.
We proceeded to investigate whether the provision of structure templates to AlphaFold2-multimer had any impact on the calculation of integrin structures. For this test, we selected [...] (Fig. 8A-D), closely resembling the crystal structure of bent α IIb β 3 (Fig. 8A). For [...] (Fig. 8A). Indeed, the α 5 β 1 -2000 structure closely resembles the α 5 β 1 cryo-EM structure (Fig. 8A). Similarly, the [...] (Fig. 8C). In [...] (Fig. 8D). These findings suggest that the inclusion of template structures can significantly influence the outcomes of structure prediction by AlphaFold2-multimer.

Structures of integrin TM-CT domains
Despite the apparent simplicity of the sequences and structures of integrin TM and CT domains, experimental structure determination has been limited to the α [...] human integrins, we identified conserved features at the TM, membrane-proximal (MP), and membrane-distal (MD) regions (Fig. 9A). These conserved sequence features were analyzed in the 24 integrin TM-CT structures calculated by AlphaFold2-multimer (Fig. 9B). Structure alignment revealed a high degree of structural similarity among the heterodimers at the TM domain, highlighting the conserved GXXXG motif in α and the conserved small G/A residue in β at the α/β TM interface (Fig. 9B). In the CT MP regions, the conserved GFFKR motif in the 18 α integrins all adopted a reverse turn conformation. The [...] (Fig. 9B). The conserved β CT Asp residue is positioned proximal to the conserved Arg residue in the α GFFKR motif (Fig. 9B), which was proposed to form a salt bridge interaction [50]. The CT MD regions of both α and β subunits exhibited diverse disordered conformations, including the conserved NPXY motif responsible for binding talin (Fig. 9B). Despite not including any integrin TM-CT structure templates during the AlphaFold2-multimer calculation, the predicted α IIb β 3 TM-CT structure closely resembled the experimentally determined structure (Fig. 9C). Additionally, we performed AlphaFold2-multimer modeling for the α IIb β 3 TM-CT structure in the absence of the ectodomain, which showed a similar TM interface as the model generated along with the ectodomain (Fig. 9D). These results suggest that AlphaFold2-multimer is capable of accurately predicting integrin TM structures.
[...] β 1 that were predicted to be bent by AlphaFold2, suggesting that α 7 integrin may be prone to becoming extended. These results imply that the overall integrin conformations predicted by AlphaFold2, especially for the underexplored integrins, can serve as reference structures for proposing functional assays.

Discussion
AlphaFold2-multimer has demonstrated remarkable success in accurately predicting the structures of protein complexes, including those with transient interactions, multiple subunits, and large interfaces [51,52]. Here, we have shown that the AlphaFold2-multimer algorithm successfully predicts the structures of large complexes of integrin heterodimers. Furthermore, AlphaFold2 has exhibited excellent performance in predicting both fragment and full-length integrin structures. For instance, AlphaFold2-multimer's modeling of integrin ectodomains remains consistent regardless of the presence or absence of TM-CT domains. Similarly, AlphaFold2-multimer effectively modeled the heterodimeric structures of integrin TM-CT domains, even in the absence of the ectodomain. Additionally, AlphaFold2 modeling may capture the conformational heterogeneity and intrinsically disordered regions, as observed in the predicted structures of integrin cytoplasmic domains.
Despite its relatively high accuracy, the integrin structures predicted by AlphaFold2 have apparent limitations. Notably, these predictions do not incorporate glycan structures and essential metal ions, both of which are vital components for integrin structure and function. Furthermore, the relative orientation between ectodomain and TM-CT domains cannot be correctly modeled by AlphaFold2. Additionally, our observations indicate that AlphaFold2-multimer predictions can be influenced by the homologous structures, potentially introducing bias when modeling the overall conformation of integrins. Therefore, we recommend conducting integrin structure calculations with AlphaFold2-multimer in the absence of template structures.

Conclusions
In summary, our family-wide structural analysis of integrins using AlphaFold2 showcases the remarkable capabilities of AlphaFold2 in modeling complex structures. The comprehensive structure database containing all 24 integrin heterodimers can be used as high-resolution structure resources for advancing both structural and functional studies within the integrin family.

Funding
This work was supported by grant R01 HL131836 (to J. Zhu) from the National Heart, Lung, and Blood Institute of the National Institutes of Health.

Declaration of Competing Interest
None declared.

Acknowledgement
We thank the Research Computing Center at the Medical College of Wisconsin for providing the help and resources in running the AlphaFold2 program.
bioRxiv preprint, this version posted September 17, 2023; https://doi.org/10.1101/2023.05.02.539023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.