ELIHKSIR Web Server: Evolutionary Links Inferred for Histidine Kinase Sensors Interacting with Response Regulators

Two-component systems (TCS) are signaling machinery that consist of a histidine kinases (HK) and response regulator (RR). When an environmental change is detected, the HK phosphorylates its cognate response regulator (RR). While cognate interactions were considered orthogonal, experimental evidence shows the prevalence of crosstalk interactions between non-cognate HK–RR pairs. Currently, crosstalk interactions have been demonstrated for TCS proteins in a limited number of organisms. By providing specificity predictions across entire TCS networks for a large variety of organisms, the ELIHKSIR web server assists users in identifying interactions for TCS proteins and their mutants. To generate specificity scores, a global probabilistic model was used to identify interfacial couplings and local fields from sequence information. These couplings and local fields were then used to construct Hamiltonian scores for positions with encoded specificity, resulting in the specificity score. These methods were applied to 6676 organisms available on the ELIHKSIR web server. Due to the ability to mutate proteins and display the resulting network changes, there are nearly endless combinations of TCS networks to analyze using ELIHKSIR. The functionality of ELIHKSIR allows users to perform a variety of TCS network analyses and visualizations to support TCS research efforts.


Introduction
Two-component systems (TCSs) are ubiquitous in bacteria and archaea and are the key signaling transduction machineries for sensing and responding to the environment. TCSs consist of sets of interaction signaling partners, histidine kinases (HKs) that phosphorylate their cognate response regulators (RRs). Interactions, however, are often not one-to-one. Multiple HKs can interact with multiple RRs. Identifying relevant interactions among TCS is an important task that has been addressed experimentally only for a limited number of organisms.
We advanced the study of interaction specificity in TCS by creating a model based on amino acid coevolution at the interface of HKs and RRs. Our Direct Coupling Analysis (DCA) [1] based interaction model not only confirms known cognate partners [2] but also reveals novel interactions in multiple organisms. We uncovered a TCS network in Synechococcus elongatus regulating cyanobacterial circadian clock and confirmed important master regulators [3]. Our model is also able to predict functional mutations to modulate binding specificity between partners, such as PhoQ and PhoP [4] or even design new interactions between non-cognate, interspecies TCS proteins, such as the EnvZ from Escherichia coli and Spo0F from Bacillus subtilis [5]. Another application of this model is the identification of crosstalk across signaling networks and the influence of mutation in the topology of the network. Figure 1 illustrates a section of statistical couplings in a protein sequence and highlights two of the most common applications, the identification of physical contacts in a protein [6,7] or the identification and quantification of interactions between multiple proteins [8,9]. Statistical couplings for sequence position and residue type are inferred from the MSA for the protein family using the DCA method. High couplings indicate significant interactions between sequence positions. These couplings can be used to infer physical contacts within a single protein structure, or to infer the interaction interface and strength between two proteins.
We decided to make this model and tools available to the scientific community in an interactive web server that facilitates the analysis and prediction of TCS networks as well as the exploration of the effects of mutation in these proteins prior to experimental work. We named the service Evolutionary Links Inferred for Histidine Kinase Sensors Interacting with Response regulators (ELIHKSIR) and it can be accessed at https://elihksir.org.
In recent years, online repositories of sequence data have seen a large influx of sequences and are painting a more refined picture of protein families. Using these data, one can construct global probabilistic models that verify the observed statistics and relate them to inter-residual couplings. Cheng et al. [2] have used these probabilistic models to introduce an objective function H speci f ic TCS ( σ) to describe the specificity (fitness) of the interaction between a response regulator and a histidine kinase partner by a scalar score using a sequence σ from a linked multiple sequence alignment (MSA). For completeness, we reproduce the introduction of H speci f ic TCS ( σ) here. Using the set of sequences { σ}, we can create a global probabilistic model P( σ) to find a given amino acid sequence σ in a protein family by the following: with a general Hamiltonian H( σ) and the partition function Z to verify normalization for the probabilities. A sufficient form for H( σ) [10] is given by the large-q Potts Model [11]: with the coupling matrix e ij (a i , a j ) between two sequence sites a i , a j at sequence positions i and j; and the local field h i (a i ) at the site a i at sequence position i. The sites a can have q = 21 different states for amino acid and sequence gap composition. The entries of the coupling matrix e ij (a i , a j ) and the local fields h i (a i ) encode preferences for sequence compositions at positions i and j. The inference of the coupling matrix e ij (a i , a j ) and the local fields h i (a i ) is a non-trivial task. Several methods exist to do so [1,12,13]. We inferred the couplings using mean field DCA (mfDCA), which is fast and accurate at predicting interaction specificity in TCS. From these coupling parameters, we can introduce and create objective functions to measure varying effects. In the Material and Methods, we introduce an objective function H speci f ic TCS ( σ) that is sensitive to sequence mutations and linked to protein interaction specificity. For the calculation of H speci f ic TCS ( σ), we need full access to the couplings e ij (a i , a j ) and local fields h i (a i ). Throughout the process, we consider these as constant and created a database that our server uses internally to calculate new values for the H speci f ic TCS ( σ) score in a mutation event. Figure 2 gives an overview of the entire process of the ELIHKSIR web server. The MSA for our system is created by concatenating the HisKA domain section of the Pfam [14] Histidine Kinase (HK) family (Pfam:PF00512) [15] and the REC domain of the Response Regulator (RR) family (PF00072) [16], which contains information for thousands of organisms. Furthermore, we collect metadata for each organism and sequence pairs through the Uniprot database [17]. From this, we calculate the coupling matrices e ij (a i , a j ) and the local fields h i (a i ). These parameters allow us to calculate a score for the interaction specificity H TCS . The data are visualized in a web interface with interactive heatmaps.
ELIHKSIR is a user-friendly and accessible tool that displays TCS signaling networks. The breadth of the web server allows for analysis of TCS networks in both common and uncommon species and strains. Table 1 summarizes the number of organisms and interaction partners available. Users can easily search for their organism of interest, view TCS specificity networks for the whole organism, and view all possible interactions for an HK or RR of interest. This capability allows researchers with restricted computational resources to analyze signaling networks. Some common use cases of ELIHKSIR's features include identifying cross-talk interactions between non-cognate HKs and RRs, comparing specificity of different HK-RR pairs, and comparing differences in signaling networks between species and/or strains. In addition to browsing and exporting wild-type TCS networks, mutations may be introduced into HKs and/or RRs, for which all interaction specificity scores are recalculated and displayed. This allows users to predict network-wide changes in specificity after introducing a mutation. Further applications include testing mutants for desired change(s) in specificity, guiding engineering of TCS proteins with interaction or insulation requirements, and viewing changes in specificity for new or uncommon clinical and environmental variants. With these capabilities, ELIHKSIR is an effective tool for a variety of researchers who interface with TCS proteins and signaling.  [14] Histidine Kinase (HK) family (Pfam:PF00512) [15] and Response Regulator (RR) family (PF00072) [16]. (b) From this MSA coupling matrices are generated with mfDCA [1]. From these couplings, we are able to calculate a numeric score using the equation shown. This equation formally describes how Hamiltonian scores are generated for each HK-RR pair and is equivalent to Equation (3). The data are displayed in a web interface with interactive heatmaps. The user has an elaborate menu available to explore the data by creating mutations to sequence positions. The default heatmap legend is more sensitive towards the outer extremes of the values, coloring strongly negative (favorable) or positive values (unfavorable).

Validation
Validation of the ELIHKSIR web server was performed through detailed investigation using three model organisms: Escherichia coli, Synechococcus elongatus, and Enterococcus faecalis. True positive specificity predictions were defined by either positive selection and/or negative selection for a cognate pair. Positive selection is defined as an HK having its highest specificity with a single RR. Negative selection is defined as an RR having poor specificity across all HKs but having its relative highest specificity with an HK. False negatives were defined as selection towards a noncognate partner that is greater than that of the cognate partner, in which both positive and negative selection fail to identify the cognate pair. Only cognate pairs in which the HK contains a HisKA domain were evaluated. For E. coli, there were fourteen true positives and three false negatives for seventeen cognate pairs, shown in Figure A1. For S. elongatus, there were five true positives and one false negative for the six cognate pairs, shown in Figure A2. For E. faecalis, there were seven true positives and one false negative for the eight cognate pairs, shown in Figure A3. The resulting sensitivity and accuracy is 0.84.
DCA identifies coevolving residues at the HK-RR interface for HisKA and REC domains that have been used to accurately predict the structure of the HK-RR complex [18]. Out of the top 20 DCA-identified interfacial couplings, 10 are present in the 3DGE structure, as shown in Figure A4b. Information about all 3DGE interfacial contacts is present in the DCA-generated couplings and local fields ( Figure A4a). Couplings are scored by their direct information (DI) value as defined by DCA (Table A2). Thus, higher DI values indicate that these couplings are more important for HK-RR interactions. When utilizing DCA couplings for the calculation of Hamiltonian values, only couplings present on the structurally verified HK-RR interface are used. This ensures auxiliary information obtained through DCA does not impact the Hamiltonian values, and thus, does not impact the resulting specificity score.
The interface is aligned for each TCS pair during the construction of the MSA, which was performed using a hidden Markov model. The sequences displayed in ELIHK-SIR are the aligned residues and gaps. Predictions made based on HK and RR sequences only consider residues which align with their respective protein family. Insertions and deletions are not considered in the alignment of the interface and may result in deviations in the three dimensional structure of the resulting signaling complex. The model assumes no changes in the three dimensional structure of the HK-RR interface during evaluation of different TCS pairs.

Mutations
A key functionality of the ELIHKSIR server is the ability to interactively perform in silico mutations on a HK-RR pair. In the mutation screen, as shown in Figure 3b, the full MSA of a pair is shown with a visual clue to the histidine kinase region and the response regulator region. Any part of the MSA can be transformed and the changes in a HK or RR become applied globally. The heatmap is also updated accordingly. Gaps can be introduced as '-' characters. As the mutation values are run against a tabulated database for the positions and amino acid type, the total length of the MSA has to remain at 176 amino acids. Insertions are not possible in the model unless they occur in gap regions.
Only a subset of the positions in the genetic sequence correspond to an actual interfacial residue of the protein interface between Thermotoga maritima class I HK853 and the response regulator RR468, (PDB ID: 3DGE). Because of this, not every change in the sequence performed by a user will translate into a change in the specificity score. Furthermore, some types of amino acids can play similar roles in a specific residue position. In this case, the model accounts for this and only reflects minor or no changes in the total score.
An interesting application of the mutation user interface is shown in Figure 4, the rewiring of specificity. By transferring portions of a sequence from one cognate pair to another cognate pair, interaction properties can be discovered or lost. In this specific example, a portion of amino acids positions 70 to 80 transferred from ntrC into the same position in the cusR response regulator creates cross-talk with a new interacting partner qseC, while maintaining the initial interaction cognate partner cusS. Alternatively, introducing the same sequence positions from the response regulator qseB into cusR is entirely sufficient to rewire the entire interaction and create an exclusively positive selection towards qseC.  (a) One of the many use cases for the web server is the exploration and in silico change of specificity. In this example, we identify the response regulator cusR as the interaction partner of the histidine kinase cusS indicated by the lowest value in our Hamiltonian. (b) The transfer of a significant sequence portion of the response regulator ntrC does not disrupt the initial interaction and introduces cross-talk through a second interaction partner. (c) Alternatively, the introduction of a sequence portion of the response regulator qseB into cusR disrupts the initial interaction and rewires the interaction towards qseC.

Data Export
The user has three options to export data from ELIHKSIR. First, the user may export a PNG image, as shown in Figure 3a of the entirety of the heatmap in PNG format by clicking on the Export to PNG button on the left panel once a heatmap has been displayed. This will generate a PNG image of the heatmap on a transparent background and download it onto the user's machine. The image will also include the labels and legend. When selecting an n × m-sized subselection in a heatmap, the user is presented with the choice to display the subselection as a new heatmap. Second, the user may export a PNG image of a histogram as shown in Figure 3c of a row of response regulator and histidine kinase pairs that corresponds to a desired histidine kinase by clicking on the Export to PNG button that is located inside the opened histogram. The histogram export will also include the names of each response regulator. Finally, the user may export a CSV representation of the user's arbitrary selections of the cells of the heatmap. After the user makes selections of the cells on the heatmap, the Export to CSV button on the right panel can be clicked to download a file that contains a comma delimited list of the user's selections. All these methods of exporting will take into consideration the mutated Hamiltonian values, if any, of the response regulator and histidine kinase pairs.

Negative Selection
An important concept highlighted by the server is that of negative selection. Not only are interaction partners indicated by strong couplings and a highly negative score for a TCS pair, but equally by high interaction scores with each partner except one. In this case, the interaction with a marginal advantage will be the strongest interaction and may facilitate signal transduction. Hence, we differentiate by either positive selection and/or negative selection for the cognate pair, where positive selection is defined as an HK having the highest specificity for its cognate RR and where negative selection is defined as the cognate HK having the highest specificity out of all HKs for a given RR. Figure 5 highlights this for two different cases in E. coli (ECOLI). Besides the heatmap, a good indicator for the interactions is a look at the histograms (Figure 5b) of interaction strengths, which are, for this purpose, available through the server. In cusR, a single interaction between cusR and the histidine kinase cusS is dominant (Figure 5b top). In rcsB, the majority of interactions are reported as less specific. Even though the interaction between rcsB and the histidine kinase rcsC is not reported as very specific, it will be the dominant interaction for rcsB. In cusR, a single interaction between cusR and the histidine kinase cusS is dominant. This is a case of positive selection between two interacting partners. In rcsB, the majority of interactions are reported as having a low specificity. Even though the interaction between rcsB and the histidine kinase rcsC is not reported as having a high specificity, it will be the dominant interaction for rcsB as there is no stronger interaction partner for signal transduction. This is an example of negative selection. (b) Histogram view for the response regulators cusR and rcsB. From these histograms, it becomes clear that cusR-cusS (top) and rcsB-rcsC (bottom) are the dominant interactions.

Characterization of Cognate Specificity
Through both mutational and computational analyses, the interface between the HisKA domain and the REC response regulator domain has been shown to control specificity of TCS interactions [19]. In Figure 6, this finding is confirmed for 14 out of 17 cognate pairs shown for E. coli. In Figure 7, this finding is confirmed for all eight cognate pairs shown for M. tuberculosis. While predictions of interaction specificity have been previously demonstrated, ELIHKSIR presents specificity scores for all HisKA HK and RR pairs in thousands of organisms, defining specificity landscapes. These specificity landscapes can then be used to determine favorable interactions through identification of pairs exhibiting positive and/or negative selection. When assessing cognate pairs, the prevalence of interactions either partially or solely characterized by negative selection becomes apparent. In the validation process, 54.8% of detected cognate pairs exhibited both positive and negative selection and 19.4% of detected cognate pairs were characterized by negative selection only. Negative selection is important for preventing cross-talk and ensuring orthogonality [20], but results indicate that it may be a main or contributing determinant of many cognate interactions. It is unclear if other attributes or domains contribute to reinforcement of specificity for cognate pairs detected by negative selection only.
By identifying whether cognate interactions are maintained by positive and/or negative selection, users can explore how deletion of TCS proteins may affect gene expression. Experimental deletion of the cognate RR in a pair regulated by negative selection may result in a noncognate RR being phosphorylated by the HK. In deletion experiments, it may be useful to understand how removal of TCS proteins may affect overall expression. Furthermore, some TCS proteins are encoded for on plasmids. Understanding how the presence or lack of plasmid-encoded TCS proteins on organisms' genetic expression may be important for the study of antibiotic resistance and plant cell transformation by bacteria [21].
It is important to note that, in many proteins, HisKA domains are accompanied by an HATPase_c domain, which is responsible for binding ATP and transferring its γ -phosphate to the HisKA domain. Aside from its ATPase activity, the HATPase_c domain alone can act as a histidine kinase [22]. It is unknown whether the HATPase_c domain itself encodes specificity or is partially responsible for specificity in certain cognate TCS pairings. Further analysis of the HATPase_c domain as well as other histidine kinase domains could reveal additional residues and mechanisms controlling TCS orthogonality.

Exploration of Non-Cognate Interactions
The ELIHKSIR web server allows for exploration and visualization of signaling networks. Using the displayed heatmap, users may identify crosstalk interactions in signaling networks. Non-cognate, crosstalk interactions are common in signaling networks and may influence the expression patterns in organisms. H TCS scores can be used to identify non-cognate, crosstalk interactions. Non-cognate interactions may be predicted by high specificity for a non-cognate partner as shown in Figures 7b-d and 6b,d. Any negative score indicates some level of encoded specificity. While scores near zero indicate no encoded specificity, TCS non-cognate partners with scores near zero may still interact due to shared attributes present in all TCS proteins, shown in Figures 6c and 7b. TCS non-cognate pairs in which shared TCS attributes are partially removed have positive specificity scores, indicating low specificity. These methods of identifying possible interactions may be used across all available organisms, allowing for users to investigate crosstalk interactions within specific, and possibly uncommon, species or strains.
TCS pairs in which the RR has a cognate HK of a different family than HisKA have low specificity, but may still interact are shown in Figures 7b,d,f and 6b,d,f. The ability to interact despite very low specificity indicates there may be activity of HATPase_c in phosphorylation of non-cognate RRs whose cognates belong to other HK families since HATPase_c is present in both HisKA and HisKA3 family HKs.  [23]. Noncognate interactions are assessed. (b) BarA phosphorylates cusR, narL, and narP, in which the scores are −5.723, 2.390, and 3.491 respectively. The score for barA-cusR indicates that phosphorylation occurs due to high specificity for its noncognate partner. Phosphorylation of narL and narP are characterized in (f). (c) PhoR phosphorylates cpxR, in which the score is −0.037. A score near zero indicates diminished specificity, while still retaining attributes shared among all TCS pairs. (d) BaeS phosphorylates glrR, rssB, and cheY, in which the scores are −1.264, 3.998, and 4.605. The score for baeS-glrR indicates that phosphorylation occurs due to increased specificity for a noncognate partner. Phosphorylation of rssB is characterized in (g). Phosphorylation of cheY can be described similarly to (f), as its cognate HK utilizes a different family of HK than HisKA. (e) Cognate, crosstalk, and average non-cognate scores are shown for each HK. (f) HKs narQ and narX are not shown as they utilize a HisKA3 family HK, rather than HisKA. Their RRs, narL and narP, have low specificity for all HKs utilizing the HisKA domain. This leads narL and narP to be nonspecific for HisKA family HKs. Despite a lack of specificity, crosstalk is observed. (g) RssB is an orphan RR that can be phosphorylated by multiple HKs.  In Figure 6g, we observe an orphan RR that exhibits low specificity for many HKs and has been phosphorylated by HKs with low predicted specificity. Aside from the possibility of HATPase_c domain contributions, it is possible that low specificity for orphan RRs is favorable as it promotes promiscuity. In the case of rssB in E. coli, phosphorylation is important for function [25,26]. Therefore, promiscuity of rssB could ensure maintenance of function throughout the E. coli life cycle. Using similar reasoning, one can identify potential interactions with orphan HKs and RRs. Information yielded from analysis of orphan TCS proteins may assist in describing their role in organisms' life cycles, environmental stress responses, and expression patterns. Utilizing predicted orphan TCS protein interactions could be useful in the study of antibiotic resistance in bacteria, response to environmental metals and compounds in archaea, or plant response to drought.

Revealing Interaction Specificity for Mutation and Variation
After mutating a protein residue, specificity scores are recalculated and the heatmap is updated. This reveals how mutation(s) change interaction specificity with all possible TCS partners. A feature that becomes important when scientists would like to assess the network effect of mutations as opposed to single pairwise interactions. The ELIHKSIR web server also separates organisms by strain, allowing interaction specificities to be compared between different strains of the same organism. Accessibility of specificity predictions for different mutants and strains may reveal differences in TCS signaling of clinical and environmental variants and may assist in the engineering of sensory kinases and response regulators as it has been shown in previous studies [5].

MSA Construction
Raw HMM profiles for HisKA and REC were obtained through Pfam's hidden Markov models (HMM) [27,28]. Then, the profile was searched using Hmmer's hmmsearch against the TrEMBL database. HKs with a sequence gap of 5 residues or larger were excluded from the MSA. The resulting HisKA domain MSA was 67 residues in length and contained 111,032 sequences utilized in the ELIHKSIR web server. RRs with a sequence gap of 6 residues or larger were excluded from the MSA. The resulting REC domain MSA was 112 residues in length and contained 225,616 sequences utilized in the ELIHKSIR web server. Cognate HK-RR pairs were concatenated and used for the generation of couplings and local fields using mfDCA, where cognate is defined by having adjacent loci [29]. The resulting cognate MSA was 179 residues in length and contained 10,091 sequences. A number of 25 iterations of random concatenation of each HK to a random RR was used to generate a scrambled MSA. The resulting MSA was 179 residues in length and contained 16,363,100 sequences.

mfDCA Evolutionary Couplings and Hamiltonian Scores
Mean field DCA (mfDCA) [1] was used to infer the coevolutionary parameters from conjugated multiple sequence alignments (MSAs) of cognate HK-RR sequences and scrambled HK-RR sequences. The resulting coupling parameters and local field parameters were utilized in the calculation of Hamiltonian scores. In order to quantify changes on the Hamiltonian H(S), Cheng et al. introduced a score H TCS as follows: for a specific pair between a sequence HK A and RR A of sequence lengths L HK A and L RR A with the coupling matrix e ij (A i , A j ) between two sequence sites A i , A j at sequence positions i and j; and the local field h i (A i ) at the site A i at sequence position i. L HK A is 67 for the HisKA domain and L RR A is 112 for the REC domain. The couplings are only taken within a pair distance r ij < c = 12Å of a native contact, expressed by a function Θ(x) = 1 for all x > 0 and Θ(x) = 0 for x ≤ 0. The contact map of the native interfacial pairs is given by the 3D resolved structure of protein interface Thermotoga maritima class I HK853 with its cognate, RR468, (PDB ID: 3DGE). This interface is used as a template for the spatial complex. Equation (3) is used to calculate energies H TCS and H 0 TCS at interface positions, where H TCS is calculated using cognate couplings and local fields and H 0 TCS is calculated using scrambled couplings and local fields. H 0 TCS is generated using the large-q Potts Hamiltonian model on the scrambled MSA which is constructed by completing 25 rounds of concatenation of any of m HKs in the data set with any of n RRs in the dataset: To find H specific TCS , Hamiltonian energies calculated from shared attributes present in all HK-RR pairs must be removed from the specific HK-RR pair being evaluated: where the resulting H specific TCS represents the interaction specificity strength between the HK and RR. Therefore, this energy function could be used to predict the interaction preference between any HK and RR. Additionally, an updated H specific TCS score, after incorporating a mutation in the MSA, serves a reference for the effect of the mutation on binding specificity strength. The updated H specific TCS is generated by performing the same calculations presented in Equations (3) and (5). Ranges for H specific TCS values are varied between organisms and strains where a positive score indicates a loss of shared encoded TCS attributes, a negative score indicates encoded specificity, and a score of zero indicates a presence of all shared TCS attributes but diminished encoded specificity. When qualifying potential interactions, users should compare H specific TCS for different TCS pairs belonging to the same organism. One should consider more negative values to have increased encoded specificity, zero values to be capable of interacting with other TCS proteins without encoded specificity in the HisKA domain, and positive values to exhibit insulation of HisKA and REC domains.

Software
The web server has a custom-built front end running React [30] for enhanced user experience with custom components. The back-end is serving data through REST [31] endpoints. Upon mutation, the scores are looked up from a pre-computed table. The python source code for the calculation of H TCS is accessible via the web server. Details on public endpoints can be found in Appendix A.

Conclusions
The ELIHKSIR web server is a valuable tool for analyzing TCS specificity landscapes in a growing list of 6412 species and strains of bacteria, 65 species and strains of archaea, and 188 species and strains of eukaryotes. This allows users to find potential cross-talk interactions and characterize existing orthogonality for many organisms across different kingdoms. For each organism, heatmaps and histograms of TCS networks are easily accessed, displayed, and exported. Furthermore, the ability to compute, display, and export changes in specificity for mutated HK or RR proteins allows users to explore potential interactions and visualize changes in specificity over an entire signaling network. This ability can assist in the analysis of engineered mutants, clinical and environmental variants, and cross-talk behavior. While ELIHKSIR is useful for interactions between HisKA family HKs and the REC domain of RRs, there exist other HK families in which the ELIHKSIR model does not evaluate. Building and validating models to predict specificity for other families of HK would further assist TCS research. Even though ELIHKSIR only displays specificity scores for HisKA and REC domains, these domains are critical in determining specificity for many TCS interactions, as demonstrated by the 6,272,607 HK-RR pairs evaluated. Due to the ability to mutate each protein and recalculate network-wide specificity scores, there are nearly endless possibilities of HK-RR pairs to evaluate using ELIHKSIR. The accessibility, breadth, and functionality of ELIHKSIR allows a variety of researchers (both computational and experimental) to harness TCS specificity predictions, supporting research efforts through a tool that did not previously exist. Data Availability Statement: Data presented in this work is available at the ELIHKSIR web server at https://elihksir.org/.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A
Data of the server can be accessed in a programmatic way through two REST endpoints as described in Table A1. The all organisms endpoint api/list returns a list of all the organisms currently accessible through ELIHKSIR. The return value will contain the names (ORGANISM_NAMES::STRING), UNIPROT ID (ORGANISM_UNIPROT_ID::STRING), and the numeric identifier/primary key (ORGANISM_ID::INT) for each organism. By using the numeric identifiers obtained from the list endpoint further meta data and information, along with the scores for each interacting pair, can be obtained through the api/pairs endpoint.