Activation pathway of a G protein-coupled receptor uncovers conformational intermediates as targets for allosteric drug design

G protein-coupled receptors (GPCRs) are the most common proteins targeted by approved drugs. A complete mechanistic elucidation of large-scale conformational transitions underlying the activation mechanisms of GPCRs is of critical importance for therapeutic drug development. Here, we apply a combined computational and experimental framework integrating extensive molecular dynamics simulations, Markov state models, site-directed mutagenesis, and conformational biosensors to investigate the conformational landscape of the angiotensin II (AngII) type 1 receptor (AT1 receptor) — a prototypical class A GPCR—activation. Our findings suggest a synergistic transition mechanism for AT1 receptor activation. A key intermediate state is identified in the activation pathway, which possesses a cryptic binding site within the intracellular region of the receptor. Mutation of this cryptic site prevents activation of the downstream G protein signaling and β-arrestin-mediated pathways by the endogenous AngII octapeptide agonist, suggesting an allosteric regulatory mechanism. Together, these findings provide a deeper understanding of AT1 receptor activation at an atomic level and suggest avenues for the design of allosteric AT1 receptor modulators with a broad range of applications in GPCR biology, biophysics, and medicinal chemistry.


Structures
To confirm the activation pathway of the AT1 receptor discovered by our extensive MD simulations, we also collected currently published class A GPCR structures, measured their activation parameters (distance between L 5.55 and N 7.46 ; angle among V 2.41 , S 6.47 , and F 6.34 ), and projected them to the free energy landscape. Supplementary Data file 2 shows the statistical data of all collected GPCRs. We defined the receptor structures with inverse agonist-or antagonist-bound forms as inactive conformations, only agonist-bound forms as active conformations, and both agonist and G protein-, βarrestin-, or nanobody-bound forms as fully active conformations.

Supplementary Note 3: Unusual GPCR Structures on the Free Energy Landscape
Several class A GPCRs have unique activation mechanisms. Thus, they are outliers in Fig. 2B and we show the specific structures in this Supplementary Note. As shown in Supplementary Fig. 3A, the inactive human platelet-activating factor receptor (PAFR) (PDB ID: 5ZKP) has a TM2 outward movement which increases the angle among 6.34, 6.47, and 2.41 1 . For the P2Y12R (PDB ID: 4NTJ) in Supplementary Fig. 3B, the pose of TM6 is distinctive. N 6.34 is on the edge of ICL3 and moves outwards, while C 6.47 becomes close to the center of TM bundles 2 . Thus, the activation angle is larger than other inactive structures.
In Supplementary Fig. 4, we show the comparison among two active CB1R and one active CB2R structures. CB1R (PDB ID: 5XR8) and CB2R (PDB ID: 6KPC) are GPCR-AM-841 complexes, while CB1R (PDB ID: 6KQI) has both an agonist CP55940 and a negative allosteric modulator ORG27569 bound. 6KQI and 6KPC sample the same inactive-like conformation, whereas 5XR8 adopts an active-like conformation due to their different compositions.
As for the fully active structures, the most obvious outliers are rhodopsin structures ( Supplementary Fig. 5). During activation. Rhodopsin receptor undergoes a limited TM5-TM7 inward movement and has a larger distance between residue positions at 5.55 and 7.46. Thus, they cluster at the right of the active cloud.

Supplementary Note 4: Distribution of G protein-, β-arrestin-, and nanobody-bound GPCR Complex on the Free Energy Landscape
We selected the G protein-, β-arrestin-, and nanobody-bound GPCR complexes from Supplementary Data file 1 and projected them on the free energy landscape ( Supplementary Fig. 6). G protein-bound non-rhodopsin GPCR structures are mostly distributed in the active cloud, while the structures of rhodopsin-G protein complex situate at the right of the active basin. Gs protein has a bulkier α5 helix, which causes a Currently, only four non-rhodopsin structures with β-arrestin bound (6U1N, 6PWC, 6UP7, and 6TKO) have been solved, so the distribution tendency is not clear.
However, it is inferred that β-arrestin binding leads to a similar TM6 displacement with Gi/o binding rather than Gs binding 3 .
As for the nanobody-bound receptor complexes, some of them cluster in the position close to GPCR-Gs complexes because they are Gs-mimic nanobodies.
Nanobody-bound receptor structures can sample both the β-arrestin-and G proteinbound structure clusters. Remarkably, some nanobodies can stabilize the intermediate (nanobody 6 for succinate receptor SUCNR1) or inactive (nanobody 6 for κ-OR and nanobody 60 for β2AR) structures. These receptor structures are located at a common position for corresponding structures without nanobody.  To guarantee the accuracy of our analysis, we set 8 ns as the lag time of our system.
According to the shape of the AT 1 receptor activation pathway, we further clustered the microstates into three macrostates using the Perron Cluster Cluster Analysis (PCCA+) algorithm. Through the Chapman-Kolmogorov test shown in Supplementary   Fig. 7B, the transition probability estimated by MSM is highly close to the practical transition process 6 . Thus, our MSM estimation is validated in both microstates and macrostates.   Fig. 9). Supplementary Fig. 9A shows the movement of ECL2 and TM6 towards the AngII pocket during activation, which finally closes the pocket. In Supplementary Fig. 9B, H8 generally moves upwards during activation in order to accommodate downstream transducers. In the inactive structure, H8 forms a large angle with TM7, which is unable to accommodate a ligand in the inactive structure. Moreover, the active macrostate has a tight space between TM1, TM7, and H8 that a pocket cannot be formed. Thus, a potential cryptic pocket may exist among TM1, TM7, and H8 in the intermediate state.

Supplementary Note 8: Secondary Structure of Different Macrostates
Using the mdtraj package, we extracted structures corresponding to each macrostate into trajectories (see Methods). These trajectories conclude snapshots far more than single representative structures and provide more general information for different AT1 receptor states during simulations. Thus, we applied the DSSP algorithm to reflect the flexibility of the whole structure. The output secondary structure classification of each residue in its snapshot course was shown in Supplementary Fig. 10.
Overall, the AT1 receptor maintained its seven helical architecture, and the macrostates showed limited fluctuation when compared with themselves; this confirmed the accuracy of our classification. However, the active AT1 receptor had a more stable ECL2 between TM4 and TM5 than both the inactive and intermediate states,  shown on the right. The figures were drawn by VMD.

Transmission of Each State
Community analysis was also conducted to elucidate the signal transmission in different AT1 receptor. During the community analysis, the community networks of distinct AT1 receptor were determined as sets of nodes connected by weighted edges. Cα atom for each residue was defined as a node, while nodes are connected if the minimum distance between the residues were lower than 4.5 Å for at least 75% of the representative trajectories. Floyd-War-shall algorithm calculated the optimal paths among all pairs of nodes 8 . Then, we counted the pairwise optimal paths between nodes and set them as the betweenness. A community was formed if and only if a group of nodes are more intraconnected with each other than inter-connected with other nodes. With the help of the Girvan-Newman algorithm, communities were further optimized to maximize the modularity measure 9 . Of note, communities with too few (fewer than three) residues or only with connection to one other community were merged to the nearby communities.
The final community distributions for AT1 receptors are shown in Supplementary Fig.   11.
The global complexity of connection was decreased upon activation, but the key interactions among the areas were promoted in the active state. For instance, the connections between C3 (intracellular TM3 and TM5), C4 (ICL2 and around TM5 and TM6), and C5 (intracellular TM2 and TM4), which represent the interaction of the transducer pocket with other AT1 receptor regions, were enhanced in the active state.
This implies that some unnecessary interactions are quenched during activation, but the connections that transmit the activation signal to the G protein pocket are stronger. The absence of interactions between C4 and C5 in the intermediate state may reflect a transition state to form new connections. In addition, H8 was not an individual community in the active AT1 receptor but merged itself with TM1, which reflects that it couples with other parts more during activation.  Environment. To place ligand in the pocket, the triangle matcher algorithm and London dG score were applied to obtain 60 poses. Then, we set the receptor rigid and used GBVI/WSA dG score in the refinement step. We chose the best-scored pose for system setup in the output 10 poses.
Following the system setup steps in Methods, we constructed olmesartan and AngII systems for simulation. The force field of olmesartan was generated by CHARMM General Force Field 10 . Minimization, heating, and equilibrium processes were the same as settings in Methods. Then, 2 μs conventional MD simulations were finished to further relax the system.
Gaussian accelerated MD (GaMD) is an enhanced sampling algorithm in simulation and successfully used in GPCR systems 11,12 . During GaMD, a harmonic boost potential is added to the system when the system potential V(r) is lower than where k is the harmonic force constant. Ethresh and k are adjustable parameters determined by two criteria. One is the extra energy does not change the previous energy sequence of each conformation, the other is the boost potential decreases the energy distinction between conformations. Thus, the range of k and Ethresh is set by where Vmax and Vmin are the system maximum and minimum potential energies,

respectively. To make Supplementary Equation 5 valid, Supplementary Equation 6
should be guaranteed.
where k0 is determined by Supplementary Equation 7.
In Supplementary Equation 7, Vavg and are the average and standard variation of potential energies, while 0 is a user-defined upper limit.
Our independent three rounds of GaMD runs started with randomized initial atomic velocities. We boosted total potential and dihedral energy in simulation. The average and standard variation of energy were calculated every 400 ps, while the upper limit was 6.0 kcal/mol. In each run, 26-ns short conventional MD simulation was firstly applied to evaluate potential energies for acceleration parameters, then a 50-ns equilibration run with the boost energy was employed. The production runs are 1 μs and the sampling timestep was 100 ps.
After simulations, we also calculated the activation parameters (distance between the Cα atoms of L 5.55 and N 7.46 , the angle between the Cα atoms of F 6.34 , S 6.47 , and V 2.41 ) for each system. The corresponding free energy landscapes were shown in Supplementary Fig. 13. In the AngII binding, the TM6 angle of the AT1 receptor largely adopted more than 55° and the tendency fitted the outward movement of TM6, which resembles to the active state. On the contrary, olmesartan binding rendered the AT1 receptor less active in the TM6 angle and moved the free-energy landscape downwards.
As for the distance index, major conformers (darkest part) of the AngII-bound AT1

Supplementary Note 12: Intermediate-specific Interactions in the Other Two States
We compared the three macrostates and found several intermediate-specific microswitches ( Supplementary Fig. 14).  Fig. 14A). In addition, the TM6 outward movement generates a hydrogen bond between I 6.37 and Y 5.58 in the intermediate state ( Supplementary Fig. 14B). TM6 movement also causes the formation of a hydrophobic network among M 6.38 , W 5.62 , and F 6.34 in the intracellular intermediate structure ( Supplementary Fig. 14C). As for the H8 movement, the hydrophobic contacts among V 1.53 , V 1.56 , and F 8.50 only exist in the intermediate state ( Supplementary Fig. 14D). Supplementary Fig. 14

Supplementary Note 15: TICA Analysis of Trajectories
To build an MSM based on tICA, we firstly aligned structures on the first one to elucidate the influence of translation and rotation, then we featurized the Cartesian coordinates in trajectories to decrease the dimension. After the alignment finished by CPPTRAJ, the "add_backbone_torsions" function provided by PyEMMA was used to extract all backbone phi/psi angles during simulation, reflecting the global movement of the membrane-embedded AT1 receptor 7,13 . Next, considering the great ability of tICA to describe slow dynamics in simulation, the "coordinates_tica" method was applied for dimensionality reduction to 2-dimension 14 16B) proved that the 8-macrostate model matches the practical transformation that happened in simulation. Next, representative trajectories for macrostates were extracted by mdtraj using snapshots close to k-means centers. According to Sij calculated by Eq.
(5), the structure with the most Sij on each trajectory was identified as the representative structure, which was projected onto Fig. 5A according to its activation parameter. With the help of the MFPT algorithm, the transition time among macrostates was determined by the "msm.estimate_markov_model.mfpt" method.

Supplementary Note 20: Clustered Mutation Experiment of P6 in the AT 1 Receptor
In order to confirm the existence of P6, we first designed 3 direct mutation clusters around P6 and tested the corresponding influence on Gq and β-arrestin 2 signal, respectively. As shown in Supplementary Fig. 22 where i reflects the residue index. ε is associated with the normal modes ( ) and (P) indicates the state of a protein.
where σ represents a vector of Gaussian variables with variance