Abstract
Understanding SARS-CoV-2 evolution and host immunity is critical to control COVID-19 pandemics. At the core is an arms-race between SARS-CoV-2 antibody and angiotensin-converting enzyme 2 (ACE2) recognition, a function of the viral protein spike. Mutations in spike impacting antibody and/or ACE2 binding are appearing worldwide, with the effect of mutation synergy still incompletely understood. We engineered 25 spike-pseudotyped lentiviruses containing individual and combined mutations, and confirmed that E484K evades antibody neutralization elicited by infection or vaccination, a capacity augmented when complemented by K417N and N501Y mutations. In silico analysis provided an explanation for E484K immune evasion. E484 frequently engages in interactions with antibodies but not with ACE2. Importantly, we identified a novel amino acid of concern, S494, which shares a similar pattern. Using the already circulating mutation S494P, we found that it reduces antibody neutralization of convalescent and post-immunization sera, particularly when combined with E484K and N501Y. Our analysis of synergic mutations provides a landscape for hotspots for immune evasion and for targets for therapies, vaccines and diagnostics.
One-Sentence Summary Amino acids in SARS-CoV-2 spike protein implicated in immune evasion are biased for binding to neutralizing antibodies but dispensable for binding the host receptor angiotensin-converting enzyme
Main Text
Severe acute respiratory syndrome coronavirus–2 (SARS-CoV-2) is the virus responsible for the pandemic of coronavirus disease 2019 (COVID-19) 1 that has caused more than 140 million infections and provoked the death of over 3 million people (as of April 18, 2021). A pertinent question is whether viral evolution will permit escaping immunity developed by natural infection or by vaccination. The answer to this multi-layered and complex question depends on the type, duration and heterogeneity of the selective pressures imposed by the host and by the environment to the virus, but also on the rate and phenotypic impact of the mutations the virus acquires. The central question is whether the virus is able to select mutations that escape host immunity while remaining efficient to replicate in the host 2–4. Identifying hotspots of viral immune evasion is critical for preparedness of future interventions to control COVID-19.
SARS-CoV-2 is an enveloped virus, characterized by displaying spike proteins at the surface 5. Spike is critical for viral entry 6 and is the primary target of vaccines and therapeutic strategies, as this protein is the immunodominant target for antibodies 7–11. Spike is composed of S1 and S2 subdomains. S1 contains the N-terminal (NTD) and receptor-binding (RBD) domains, and the S2 contains the fusion peptide (FP), heptad repeat 1 (HR1) and HR2, the transmembrane (TM) and cytoplasmic domains (CD) 12. S1 leads to the recognition of the angiotensin-converting enzyme 2 (ACE2) receptor and S2 is involved in membrane fusion 6, 13, 14. Spike binds to ACE2 displayed at the host cell surface, followed by proteolytic cleavage to yield S1 and S2 fragments 6. Interestingly, the spike protein oscillates between distinct conformations to recognize and bind different host factors 15, 16. In the prefusion state, the RBD domain alternates between open (‘up’) and closed (‘down’) conformations 5, 17. For binding to the ACE2 receptor, the RBD is transiently exposed in the ‘up’ conformation. However, most potent neutralizing antibodies elicited by natural infection 9, 11, 18 and by vaccination 7, 10 bind spike in the closed (‘down’) conformation. Understanding how each amino acid of spike dynamically interacts with either antibodies or ACE2 receptor may reveal the amino acid residues that are more prone to suffer mutations driving immunological escape without affecting viral entry.
Viral mutations may affect host-pathogen interactions in many ways: affect viral spread, impact virulence, escape natural or vaccine-induced immunity, evade therapies or detection by diagnostic tests, and change host species range 19, 20. Therefore, it is critical to survey circulating variants and assess their impact in the progression of SARS-CoV-2 dynamics in the population, in real time. Being a novel virus circulating in the human population, SARS-CoV-2 evolution displayed a mutational pattern of mostly neutral random genetic drift until December 2020. In fact, the D614G mutation in the spike protein was amongst the few epidemiologically significant variants, resulting in increased transmissibility without affecting the severity of the disease 21, 22. However, from the end of 2020, three divergent SARS-CoV-2 lineages evolved into fast-spreading variants that became known as variants of concern B.1.1.7 [United Kingdom (UK)] 23, B.1.351 [South Africa (SA)] 24 and P.1 (Brazil) 25. More recently, other lineages were added to this list: B.1.427, B.1.429 [California, United States of America (USA)], B.1.617 or B.1.617.1-3 [India] 23, 26, 27, as well as several lineages of interest that carry the mutation E484K, including P.2 28 and B.1.1.7 containing E484K. These variants present alterations in the spike protein that change its properties: increase transmissibility and virulence 29, 30 and/or escape immunity developed by natural infection 31, 32, therapies 33–35, vaccination 36, detection 37; and change the host species range 38. Mutations in the RBD of spike may severely affect viral replication and host immune response since, as explained above, this region is responsible for binding to ACE2 and is immunodominant. Three mutations D614G, N501Y and L452R are associated with an increased ACE2 binding affinity in humans and increased viral transmission 21, 22, 29, 39. Y453F was associated with a mink-to-human adaptation in cluster 5 40. L452R and N439K are associated with a modest reduction in antibody-dependent neutralization by immune sera, whilst variants containing E484K display a reduction that is moderate to substantial 41. Variants, however, contain several other mutations. Interestingly, some mutations are convergent whilst others are unique in lineages 42. For example, in B.1.351 and P.1 lineages, mutations in amino acid residues at positions 18 (L18F) and at position 417 (S417N, or S414T in some P.1 cases) were observed. A two amino acid deletion at position 69-70 in the S protein is observed in cluster 5 and lineages B.1.1.7, B.1.525 and B.1.258 43. The appearance of shared mutations in distinct and rapidly spreading SARS-CoV-2 lineages suggests that these mutations, either alone or in combination, may provide fitness advantage. In principle, mutations in the NTD, furin cleavage site or HR1 may affect the function of spike, as well as any mutation that results in conformational rearrangements of the RBD. Therefore, it is important to understand how synergetic interactions in spike collectively change the RBD.
Serum neutralizing antibodies were shown to develop upon natural SARS-CoV-2 infection 44 or vaccination 45, 46, and last at least several months 47–49. The alarm caused by the emergence of SARS-CoV-2 variants with the potential to escape host immunity acquired through infection or vaccination has been confirmed by a series of reports about lineages B.1.351 and P.1. 31, 32, 36, 50, 51. These raise concerns on whether a reduction in vaccine efficacy could result in re-infections and delay the reduction in mortality caused by circulating SARS-CoV-2. There is, therefore, the urgent need to develop mutation-tolerant vaccines (and biopharmaceuticals) targeting the skype protein. In this sense, it is critical to determine what makes mutations well tolerated for the viral lifecycle whilst efficiently escaping immunity. In this work, we engineered spike-pseudotyped viruses and analyzed individual and combined mutations that convergently appeared in different lineages, over time, and across several geographic locations, to determine their synergetic effects on neutralizing-antibody responses. We then used available structures of the complexes spike-antibodies and spike-ACE2 to determine the distance between each amino acid residue in the complex and the frequency of interactions of each amino acid residue of the RBD with either ACE2 or antibodies. We found a moderate reduction in neutralizing potency of sera against SARS-CoV-2 spike-pseudotyped lentivirus containing single mutations at position 484 (E484K) and 494 (S494P). Interestingly, the reduction became substantial with the addition of synergetic mutations K417N/N501Y or E484K/N501Y to E484K or S494P, respectively. In addition, we show that the amino acid residues at positions 484 and 494 frequently engage in binding to antibodies but not in binding to the receptor ACE2. Our work suggests that the amino acid residues at the RBD that are more dispensable for binding to ACE2 can more promptly evolve immune escape mutants if the amino acid residue substitution severely alters the binding specificity of the antibody.
Results
Construction of spike-pseudotyped viruses containing single or combined mutations in spike
Geographical distribution of mutations and associated prevalence is important, not only to track viral evolution and dynamics of lineages circulating worldwide, but also to identify recurrent mutations and ultimately tailor preventive measures to contain the virus. We used the pipeline https://github.com/wtsi-hpag/covidPileup [github.com] to trace single nucleotide polymorphisms (SNPs) in a given country or region. We explored a comprehensive data set containing 416,893 sequences of SARS-CoV-2 downloaded from GISAID to unravel the dispersal history and dynamics of spike mutations observed in SARS-CoV-2 viral lineages in the geographic locations in different continents divided as Australia (AUS), UK, European Union (EU), SA and USA, and integrated the data with viral circulation in these areas, measured by the number of new cases per week (fig. S1). Most regions were selected based on the appearance of specific lineages and variants of concern. Australia was included because of the restrictive measures of entering the country, closeness to Asia, geographical dispersion and for constituting an almost independent evolutionary landscape. These geographical locations were selected based on availability of sequenced samples during the period analyzed (from the 29th of December 2019 until week 5 of 2021). Based on the reference genome NC_45512 (Wuhan-Hu-1, 29903 bases), the pipeline detected 3,766,497 SNPs and 2209 indels covering 25443 genome locations. For this study, we selected only mutations in spike because this protein is responsible for viral recognition of ACE2 receptor 6, 12, 14, 52 and for inducing neutralizing antibodies in the host 10. Of note, antibodies with neutralizing capacity were shown to bind to the NTD, but mostly to the RBD of spike 2, 11, 50, 53, 54. Many reports showed how single mutations change host neutralizing capacity and demonstrated that variants of concern B.1.351, P.1., or others including the mutation E484K, are able to escape immunity 31, 35, 41, 55. Whilst mutations on the RBD may sterically block binding of spike to ACE2, other mutations in spike may affect the conformation of the protein thereby impacting antibody recognition. Given this, we broaden our selection of mutations to include variants of concern and of interest (up to all mutations in lineages B.1.1.7, B.1.351, P.1., B.1.427/ B.1.429, table S1), high prevalence (fig. S2), or convergent appearance in different lineages over time. In terms of prevalence, the S477N mutation was accompanied by a peak of incidence in Australia from week 23-36 of 2020 and reached a prevalence of 20% worldwide (figs. S1 and S2). The L18F alone or combined with A222V was also accompanied by a peak in incidence in the UK and EU in weeks 38-45 of 2020, and in South Africa from week 42-to present and reached 20% incidence worldwide. Other single mutations include L452R and E484K, with a global prevalence of up to 6% worldwide; D839Y with peaks up to 2.5% during weeks 7-19 of 2020; Q675H, from week 23-present; deletion of amino acid residues at positions 69/70 combined with Y543H from week 39-51 of 2020; and D936Y from week 9-37 of 2020 (figs. S1 and S2). The phenotypic evaluation of SARS-CoV-2 mutations comprises several layers and includes the interaction of the virus with the host, disease severity and epidemiology. In this study, we evaluated evasion of host neutralizing antibodies.
All mutations engineered in this study, and the domain in spike in which they occur, are highlighted in Fig. 1B and C, detailed in table S1 and shown at the structural level in figs. S3-S5. In addition, to understand how interactions in spike collectively change the RBD, we evaluated individual or a combination of mutations on spike associated with variants of concern (up to all defining mutations, shown in table S1 and Fig. 2A-D) and have overall produced 25 different spike-pseudotyped viruses (listed in table S1). The mutations in the RBD region used in this study are highlighted also in the open (‘up’) conformation (Fig. 1C), showing the different conformational changes that each amino acid residue undergoes in this region.
The different spike versions served to engineer lentiviral spike-pseudotyped particles through a three-plasmid approach: a plasmid harboring the lentiviral genome, lacking Gag-Pol and envelope proteins, and encoding a GFP reporter; a Gag-Pol expression plasmid; and a plasmid expressing SARS-CoV-2 spike protein (or VSV-G protein, as control). Cells infected by these lentiviruses express GFP and can be easily quantified with high-throughput analytical methods.
Neutralization assays on spike-pseudotyped viruses to evaluate synergetic effects of mutations
For the neutralization assay, we developed a 293T cell line expressing the human ACE2 receptor (fig. S6). The specificity of the assay was assessed using an anti-spike polyclonal antibody and an RBD peptide that competes with the spike protein for viral entry in SARS-CoV-2 spike-pseudotyped but not in VSV G-pseudotyped lentivirus that was used in parallel (fig. S7).
We characterized the serum antibodies from 12-16 health care workers, infected with SARS-CoV-2 during spring/summer 2020, for their ability to neutralize our spike-pseudotyped mutant lentiviruses. These sera were tested by ELISA for their anti-spike content and classified as low (up to endpoint titer 1:150), medium (endpoint titer 1:450) and high (endpoint titer from 1:1350) (table S2). We also used 4 negative sera as control (one pre-pandemic and 3 contemporary). Representative neutralization curves of the WT (614G) and original (614D) strains, as well as variants of concern B.1.1.7, B.1.351 and P.1 (containing all defining mutations, listed in table S1) are shown in Fig. 2E and the complete set of neutralization curves in fig. S8. For each neutralization curve, we calculated the half-maximal neutralization titer (NT50), defined as the reciprocal of the dilution at which infection was decreased by 50% (Fig. 2F and fig. S9). Consistent with the literature, the ELISA titer was associated with the neutralizing titer of sera (table S2) 48, 56.
In this paper, we used the Wilcoxon statistical test when comparing NT50s to understand if variants are significantly different from the wild-type. However, to define whether variants are capable of evading neutralization, we use the 4-fold criteria, similar to what has been done with influenza virus, where differences in hemagglutination assays greater than 4-fold highlight the need for vaccine update 36, 57, 58. Data show that, although the spike including all defining mutations of B.1.1.7 exhibited a modest reduction in neutralization titers relative to WT spike, it was still efficiently neutralized by convalescent sera, as observed before 35, 59. Conversely, the complete set of defining mutations of B.1.351 and P.1 consistently showed a reduction in neutralization capacity higher than 4-fold and are considered significantly different from the WT virus (Fig. 2G). We then analysed a series of single, double and triple mutations (Fig. 2H and figs. S6 and S7). We found that the mutations associated with B. 1.1.7 variant N501Y, Δ69-70/N501, Δ69-70/N501+P681H behave as WT spike. The mutation E484K alone reduced sera neutralization capacity by 3.6-fold [ratio NT50 mutant/NT50 WT of .028±0.37 (geometric mean ± SD)], and inclusion of additional mutations (K417N and N501Y) further decreased the neutralization (0.08±0.05), emphasizing the role of synergic mutations on immune evasion and underscoring the need for evaluating single and combined mutations on variants 51. Other mutations at the NTD, RBD, FP, HR1, TM and CD did not significantly alter sera neutralization efficiency. However, we observed a reduction in neutralization in some sera against N439K mutant (0.53±0.45) as reported in 41, which was not observed when Δ69-70 was added (Fig. 2H), and also against L452R mutant (0.48±0.23) that is associated with the most recent variants of concern B.1.427/9, as reported in 60. Importantly, we tested BNT162b2-elicited plasmas (collected after the first and second dosage of the Pfizer–BioNTech vaccine) for their capacity to neutralize the spike-pseudotyped particles expressing WT spike protein or mutants bearing all defining mutations of the variants of concern B.1.1.7, B.1.351 and P.1. Amongst 10 individuals, plasma collected 12 days after the administration of the first vaccine dose displayed anti-spike IgG titers from 1:450 to 1:12150 (table S3), and only one of them exhibited neutralizing capacity (table S3 and fig. S10). BNT162b2-elicited plasma collected 12 days after the administration of the 2nd dose had very high levels of anti-spike IgG (titers 1:36450 −1:109350), higher than any sera collected upon natural infection, and all exhibited neutralizing activity against the variants of concern B. 1.351 and P.1, although with significantly lower efficiency than against the wild type or the B.1.1.7 variant.
Collectively, these data show that the spike-pseudotyped lentiviral particles used in this study, in which GFP is the output and is measured by high throughput and cheap methodologies, is suitable to quantitatively assess how single and synergetic mutations in full-length spike affect neutralizing-antibody responses.
Frequency of contacts spike-antibodies and spike-ACE2
To determine which spike protein residues are more frequently targeted by neutralizing antibodies, we analyzed 57 structures retrieved from the protein data bank (PBD) containing the spike protein (or only the RBD region) bound to antibodies (table S4). Our results revealed that the receptor binding motif (RBM), in RBD, is the region with the highest frequency of contacts (Fig. 4A and B), consistent with what has been documented 11, 61–64. This region is important for the SARS-CoV-2 infection since it mediates contacts with ACE2 14, 65, 66 and, for this reason, the binding of antibodies to this region would hinder ACE2 binding and, consequently, the entry of the virus in the cell.
In addition to contacts with the RBD, we also identified other regions of the spike protein with antibody binding sites. The antibodies FC05 and DH1050.1 bind to segments in the N-terminal region (between residues 143-152 and 246-257), region identified as an antigenic site 54, and 2G12 binds to residues 936-941 in the stalk (Fig. 4B and C).
To choose the cut-off above which to select the most important residues for antibody binding, we built a histogram (fig. S11). Three distinct groups appear: the first includes residues that have an interaction probability below 20%, the second includes residues whose interaction is between 20% and 45% and the last corresponds to frequent binders (interaction probability over 45%). Within this last group, we find Y449, L455, F456, E484, F486, N487, Y489, Q493, S494, and Y505 residues (Fig. 4A and B). More than half of these residues have hydrophobic side chains, showing that, within the batch of antibodies we studied, the most common binding mode is through hydrophobic contacts. These residues are important for antibody binding, which means that mutations within this group may enable the virus to escape antibodies from patients previously exposed to the WT or another variant, or people vaccinated using WT sequences of the S protein. In fact, prior studies have shown that mutations on most of these residues influence antibody escape 2, 67. For residues L455, F456 and F486, mutations for less hydrophobic residues have been shown to reduce binding by polyclonal antibodies 67 and mutants at site 487 escape human monoclonal antibodies COV2-2165 and COV2-2832(2, 63). Interestingly, our results also show that there is a high prevalence of antibodies binding to E484 (∼50%, Fig. 4B).
To be able to escape antibodies while maintaining the ability to efficiently infect cells, the mutations introduced must destabilize the interaction with antibodies while keeping a high binding affinity for ACE2. Thus, we reason that mutations occurring in residues that are relevant for antibody binding but not very relevant for the interaction with ACE2 can be advantageous for the virus. To gain further insights into this subject, we performed molecular dynamics (MD) simulations of the RBD bound to ACE2. These allowed us to determine which RBD residues are relevant for binding to ACE2, and the persistence of these interactions (Fig. 4D). Combining the information on antibody and ACE2 binding allows us to predict which are the most relevant amino acid residues (high affinity for antibodies and low affinity for ACE2). Using these criteria, two residues stand out: E484 and S494 (Fig. 4E, IV quadrant circled in red). These results are consistent with the evidence that E484 is an important mutation site and bring to light a new relevant site: S494. Mutations in S494 have been found in circulating strains, and the S494P mutation was shown to reduce the binding by polyclonal plasma antibodies 67 whilst having no 4 or modest effect 68 in RBD-ACE2 binding. To elucidate the role of this mutation in antibody neutralization, we used our neutralization assay and found that mutation S494P alone leads to a reduction in neutralization by covalence sera that is significant (Fig. 4F) despite not reaching our 4-fold criteria (ratio NT50 of mutant in relation to WT of 0.39±0.28, Fig. 4G). This value was further reduced when this mutation was combined with N501Y (0.28±0.21, Fig. 4G), or when combined with E484K (0.25±0.32, Fig. 4F-G). Interestingly, the E484K/S494P/N501Y triple mutant reduced neutralization over the 4-fold criteria applied (0.17±0.34, Fig. 4F-G). We also tested the ability of these mutants to evade neutralization by post-immunization sera, but the effect was less pronounced (Fig. 4H-I). In fact, S494P alone did not significantly reduce neutralization, becoming significant when N501Y (0.70±0.4), or E484K (0.5±0.3) mutations were introduced. The triple mutant displayed the more significant reduction in neutralization, despite not reaching the 4-fold difference (0.34±0.18, Fig. 4H-I). Importantly, S494P mutation appeared independently on several occasions and is under positive selection 68 and is now a variant in monitoring, especially when combined with E484K or N501Y 23, 69. Our results indicate that the S494P mutation can facilitate the escape of the virus from antibodies, maintaining the ability to bind to the receptor and enter host cells, and should be surveilled worldwide.
Some residues that are important for antibody binding are also involved in frequent interactions with ACE2. Seven of these residues remained bound to ACE2 throughout the whole simulation in all the replicates: L455, F456, F486, N487, Y489, Q493 and Y505 (Fig. 4D, top residues in quadrant II). This group includes residue F456, which was found to be a hotspot for escape mutations 67. Although this may seem contradictory, the same study shows that mutations in this residue were very rarely found in nature and, because of lack of prevalence, were not analyzed 67. We hypothesize that the low mutation rate for this site may be due to the high relevance of this residue for RBD-ACE2 binding. In fact, deep mutational analysis shows that when mutating F546 to any amino acid residue besides its SARS-CoV-1 counterpart (a leucine), the ACE2 binding affinity is reduced 4. Even though a viral strain with a single mutation at the site 456 could escape antibodies, it would be unfit to subsist in nature, due to a reduced ability to infect host cells.
Together, the analysis presented here is supported by experimental evidence on mutation-driven binding to ACE2 and escape to antibodies 2–4, as well was with the emergence of natural variants 31, 35, 41, 55, 70–72, and may be used to predict mutation hotspots.
Discussion
It is critical to understand how SARS-CoV-2 will evolve and if it will escape host immunity, which could pose challenges to vaccination and render therapies ineffective.
We developed spike-pseudotyped lentiviral particles for high-throughput quantitation that express GFP upon entering cells, contributing to the toolkit of neutralizing assay methodologies 73–75. Our method is suitable to assess the neutralization activity of sera/plasma from individuals infected naturally or vaccinated, and to screen for antiviral drugs that block viral entry, such as therapeutic antibodies, in biosafety level 2 settings, which greatly facilitates the procedure and broadens its usage. It can be easily adapted to include single and multiple mutations, as observed in the present work for sera neutralizing activity (Figs. 2). Our results with BNT162b2-elicited plasma (Fig. 3) agree with previous publications showing resistance to neutralization by B.1.351 and P.1 lineages 31, 32, 36, 50, 76 and validate our neutralization assay. The fact that only mutations containing the E484K substitution promote immune escape, although this effect increases with synergic mutations that improve viral binding to ACE2 (K417N and N501Y), may be a consequence of lack in immunological selective pressure, a recognized driver of evolution 58, 77–79, as a large proportion of the population remains susceptible to SARS-CoV-2 infection. However, the emergence of this escape mutant suggests that the continued circulation of the virus may, in the future, impose further immunological constraints and result in viral evolution, as seen for influenza A virus 58, 80. How changes in SARS-CoV-2 will impact the circulation of the virus is not known, and the future will also elucidate whether vaccination will shape evolution of SARS-CoV-2 and permit reinfections. At the moment, reinfections are considered rare events 81, 82, but reports that SARS-CoV-2 escape mutants were shown to drive resurgence of cases upon natural infection 26, 83 are indicative that viral dynamics in the population may change 29, 84. In agreement, in this study and in reports by others, a reduction in neutralization activity by vaccination elicited plasma against variants of concern B.1.351 and P.1 was observed 31, 32, 36, 50, 76. Therefore, the development of methods able to predict hotspots of immune evasion are in demand. The surveillance of circulating strains across time, the evaluation of the type and duration of host immune responses upon natural infection and vaccination, and the identification of antibodies that efficiently control SARS-CoV-2 are essential measures to understand and control SARS-CoV-2 infection and host response 85–87. We argue that the structural information on complexes between antibodies, or ACE2, and spike variants may be integrated with mutational maps and their phenotypic characterization 2–4 to predict hotspots of immune evasion.
In this work, we analyzed by neutralization assays how a full-length spike carrying single or multiple mutations dispersed throughout the protein affected the conformation of the RBD. With this assay, we probe how antibodies from sera of infected patients or plasma collected from vaccinated people after administration of the 1st and 2nd dose of the vaccine block viral entry. We used structural information on the protein complexes spike-antibodies and spike-ACE2 to define the non-overlapping frequency of interactions using a cut-off value of 4.5 Å distance to define a contact. Given the idea that only a few amino acid residues are involved in protein-protein interfaces 88, including in antibody-antigen recognition, we aimed at identifying which amino acid residues in the RBD may evolve antibody escape mutants without affecting viral entry. With this approach, we identified two amino acid residues at the RBD-positions 484 and 494 (Fig. 4E) - that frequently engage in interactions with antibodies but not with ACE2. We observed that the E484 is relevant for antibody binding (Fig. 2G), in agreement with our and previous experimental results, and with the occurrence of this mutation in two variants that are becoming highly prevalent and were shown drive resurgence of infection in sites with high levels of seroconversion 2, 36, 57, 71, 72, 83. Additionally, we found that mutations in residue 494 may be problematic, either alone or combined with synergetic mutations, as it reduces neutralization competency of convalescent sera, and thus facilitates antibody escape without substantially altering the affinity for the ACE2 binding (Fig. 4E-G and fig. S12). The mutation S494P has been found in nature with a prevalence of 0.81% of all sequenced genomes (week 11 of 2021, fig. S2) and has already been observed in combination with mutation E484K (reported on the 22nd October 2020). Of note, current prevalence of B.1.351 and of P.1 is 1.9% and 0.47%, respectively, being in the same range as S494P. We posit that the prevalence of mutations in amino acids 484, 494 and 501 will increase with viral circulation and/or spike seroconversion and should be surveilled worldwide. We observed a similar pattern relative to the substitution L454R/Q in India, in which the E484K mutation was acquired posteriorly. At the moment, however, with the majority of the population still susceptible to viral infection, mutations that increase viral transmission, such as N501Y 84, have a selective advantage.
Our approach has some caveats. Regarding the neutralization assays, and despite spike being the immunodominant protein targeted by antibodies 7–11, other viral proteins may contribute, even if in a small proportion to neutralization activity in vivo, which we are unable to detect using spike-pseudotyped particles. In addition, the sera/plasma we used was obtained at a fixed time interval. Future experiments should repeat the analysis using sera/plasma of people infected with known genotypes, or from different geographical areas and obtained at different times along the epidemics dynamic. Finally, we inspected a handful of mutations and not a comprehensive mutational map across spike 2, 4. However, we traced several mutations across spike in single and multiple combinations, and thus, we conclude that our analysis is relevant to inform on mutation-driven escape viruses to antibodies. The analysis of frequency of interactions between spike-antibodies or ACE2 has some issues associated as well. The pool of complexes spike protein/RBD and antibodies analyzed here is not necessarily representative of the universe of antibodies found in COVID-19 patients, since it is limited by the availability of structural information and is biased to monoclonal antibodies (table S4). All the antibodies analyzed are neutralizing antibodies, which is not the case in nature 54. Additionally, many structural studies have focused on the interaction with the RBD, while other regions have not been so exhaustively analyzed, which creates a bias in terms of the number of RBD binding antibodies with known structure 14, 65, 66. Limitations of the methods used, such as crystallographic contacts, can also influence the results. Nevertheless, its combination with data from the molecular simulation and from experiments helps to predict and explain the emergence of new mutations.
In conclusion, our analysis of the spike protein-antibody contacts revealed that (within the available data set) the receptor binding motif (RBM), in the receptor binding domain (RBD), is an important region for antibody binding, which makes sense, since antibody binding to this region can prevent the interaction with ACE2 and ultimately impair viral infection. Our results are in very good agreement with experimental data on mutation-driven escape to polyclonal antibodies and with the emergence of natural variants, which were reported to re-circulate in previously exposed people 83. Thus, this analysis represents a simple strategy to predict mutation hotspots that should be kept under surveillance. The predictions will become even better once the pool and diversity of available structures increases.
Methods
Usage of COVID-19 pipeline for geographical SARS-CoV-2 variation analysis
We have used a pipeline for COVID-19 variation analysis using whole genome sequences. CovidPileup can be downloaded from https://github.com/wtsi-hpag/covidPileup. The pipeline includes SNP and indel calling and tracing specific SNPs in a given country or region. By incorporating metadata, it is also possible to assign tags such as collecting time, age and sex to each identified SNP or indel. To allow quick search and data processing, a multi-layered indexed data structure has been designed. We assume the size of the reference is G and the number of sequenced samples is N. The first index implemented is the genome reference coordinates which points to SNP or indel locations while the second index stores the number of SNPs and the third index records the number of countries SNPs are associated with. When a search is pursued on a location, one can quickly get the information on the numbers of samples containing variation and countries on this location. Specifically, if all the SNPs are from one country or one region (such as EUR or UK), the variant will be a specific or unique SNP/indel. Statistics of COVID-19 variations was performed on CovidPileup files, based on weeks (week 1, Dec 29th 2019 - Jan 5th 2020). Some samples with reported problems were excluded, according to the publicly available document from nextstrain https://github.com/nextstrain/ncov/blob/master/defaults/exclude.txt. Data for new positive cases per week is based on WHO Coronavirus Dashboard (https://covid19.who.int/WHO-COVID-19-global-data.csv). The following regions of world were defined: EUR (EU region, samples with tag ‘France’, ‘Germany’, ‘Italy’, ‘Belgium’, ‘Netherlands’, ‘Austria’, ‘Denmark’, ‘Sweden’, ‘Norway’, ‘Poland’, ‘Portugal’, ‘Switzerland’, ‘CzechRepublic’, ‘Iceland’, ‘Luxembourg’, ‘Greece’, ‘Hungary’, ‘Slovenia’, ‘Slovakia’, ‘Malta’, ‘Croatia’, ‘Slovenia’, ‘Latvia’, ‘Lithuania’, ‘Estonia’, ‘Bulgaria’, ‘Cyprus’ and ‘Ireland’), UK (Samples with tag ‘UnitedKingdom’, ‘England’,’NorthernIreland’, ‘Scotland’, and ‘Wales’), Australia, USA and South Africa. Prevalence of amino acid residue genotypes of spike proteins were counted and shown in fig. S1. Minor genotypes with < 1% prevalence or supported by < 3 samples were grouped as “other genotype”.
Cells and plasmids
Human hepatocellular carcinoma HuH-7 cells (a kind gift from Dr Colin Adrain, Instituto Gulbenkian de Ciência, Portugal), Human Embryonic Kidney 293T (provided by Prof Paul Digard, Roslin Institute, UK) and 293ET cells (from Dr Colin Adrain) were maintained in Dulbecco’s Modified Eagle Medium (DMEM, Gibco, 21969035) supplemented with 10 % fetal bovine serum (FBS, Gibco, 10500064), 1% penicillin/streptomycin solution (Biowest, L0022) and 2mM L-glutamine (ThermoFisher, 25030024), at 37°C and 5% CO2 atmosphere. Plasmids used in this study were obtained as follows: pLEX.MSC was purchased from Thermo Scientific; psPAX2 and pVSV.G were a kind gift from Dr Luís Moita (Instituto Gulbenkian de Ciência, Portugal); pEGFP-N1 was provided by Dr Colin Crump (University of Cambridge, UK); and pCAGGS containing the SARS-Related Coronavirus 2, Wuhan-Hu-1 Spike Glycoprotein Gene (pCAGGS-SARS-CoV-2-S, NR-5231) was obtained through BEI Resources.
Cloning and S protein mutations
To facilitate incorporation of S protein into lentiviral pseudovirions, the C-terminal 19 amino acids encompassing the ER-retention domain of the cytoplasmic tail of S protein were deleted. This construct, named SARS-CoV-2 Strunc, was generated by PCR amplification of the C-terminal region of S without amino acids 1255-1273 (a premature STOP codon was added in the reverse primer), using pCAGGS-SARS-CoV-2-S as template (primers in table S7). The amplified sequence was cloned back into pCAGGS-SARS-CoV-2-S, between AgeI and SacI restriction sites, generating the expression vector pCAGGS-SARS-CoV-2-Strunc. Mutations were introduced in the spike protein by site-directed mutagenesis, with the Quickchange Multi Site-Directed Mutagenesis kit (Agilent) following the manufacturer’s instructions (primers in table S7). Lentiviral reporter plasmid pLEX-GFP was produced by PCR amplifying GFP from pEGFP-N1 (primers in Table 1) and cloning the insert into BamHI–XhoI restriction sites in the multi-cloning site of pLEX.MCS vector. For lentiviral plasmid pLEX-ACE2 production, human ACE2 coding sequence was amplified from Huh7 cDNA and cloned into pLEX.MCS, using XhoI and MluI restriction sites (primers in table S7).
Production of 293T cells stably expressing human ACE2 receptor
To produce VSV-G pseudotyped lentiviruses encoding the human ACE2, 293ET cells were transfected with pVSV-G, psPAX2 and pLEX-ACE2 using jetPRIME (Polyplus), according to manufacturer’s instructions. Lentiviral particles in the supernatant were collected after 3 days and were used to transduce 293T cells. Three days after transduction, puromycin (Merck, 540411) was added to the medium, to a final concentration of 2.5 μg/ml, to select for infected cells. Puromycin selection was maintained until all cells in the control plate died and then reduced to half. The 293T-Ace2 cell line was passaged six times before use and kept in culture medium supplemented with 1.25 μg/ml puromycin. ACE2 expression was evaluated by flow cytometry (fig. S6).
Production and titration of spike pseudotyped lentiviral particles
To generate spike pseudotyped lentiviral particles, 3×106 293ET cells were co-transfected with 8.89ug pLex-GFP reporter, 6.67ug psPAX2, and 4.44ug pCAGGS-SARS-CoV-2-S WT or mutants (or pVSV.G, as a control), using jetPRIME according to manufacturer’s instructions. The virus-containing supernatant was collected after 3 days, concentrated 10 to 20-fold using Lenti-XTM Concentrator (Takara, 631231), aliquoted and stored at −80°C. Pseudovirus stocks were titrated by serial dilution and transduction of 293T-Ace2 cells. At 24h post transduction, the percentage of GFP positive cells was determined by flow cytometry, and the number of transduction units per mL was calculated.
Flow cytometry
Flow cytometry was performed as in 89. In brief, HEK 293T and 293T-ACE2 cells were prepared for flow cytometry analysis by detaching from the wells with trypsin, followed by fixation with 4% paraformaldehyde). For analysis of ACE2 expression, cells were stained with a primary antibody against ACE2 (4µg/ml, R&D Systems, catalog no. AF933) followed by a secondary antibody labelled with Alexa Fluor 568 (1:1000; Life Technologies). Analysis of cell populations was performed in a Becton Dickinson (BD) LSR Fortessa X-20 equipped with FACS Diva and FlowJo (BD, Franklin Lakes, NJ) software’s.
Human convalescent sera and post-vaccination plasma/serum
Venous blood was collected by standard phlebotomy from health care providers who contracted COVID-19 during spring/summer 2020, as tested by RT-PCR on nasopharyngeal swabs. All participants provided informed consent to take part in the study. This study “Fatores de susceptibilidade genética e proteçao imunológica à COVID-19” was approved on the 25th of May 2020 by the Ethics committee of the Centro Hospitalar Lisboa Ocidental, in compliance with the Declaration of Helsinki, and follows international and national guidelines for health data protection. Serum was prepared using standard methodology and stored at −20°C.
Peripheral blood from health care workers was collected by venipuncture into EDTA tubes at day 12 and day 33 post-immunization with Pfizer BTN162b2 mRNA vaccine and immediately processed. Alternatively, post-vaccination serum was collected at day 22 and day 51 post-immunization with Pfizer BTN162b2 mRNA vaccine. All participants provided informed consent and all procedures were approved by the ethics committee of Centro Hospitalar Lisboa Central, in accordance with the provisions of the Declaration of Helsinki and the Good Clinical Practice guidelines of the International Conference on Harmonization. Peripheral blood was diluted in PBS 1x (VWR), layered on top of biocoll (Biowest) and centrifuged at 1200g for 30 min without break. Plasma was collected to cryotubes and stored at −80°C ultra-low freezer until subsequent analysis.
ELISA assay
Direct ELISA was used to quantify IgG anti-full-length Spike in convalescent sera. The antigen was produced as described in 90. The assay was adapted from 91 and semi-automized to measure IgG in a 384-well format, according to a protocol to be detailed elsewhere. For titer estimation, sera were serially diluted 3-fold starting in a 1:50 dilution and cut off was defined by pre-pandemic sera (mean + 2 standard deviation).
ELISA assay on post-vaccination plasma and serum was performed based on the protocol 92 and modified as described in Gonçalves et al. 93. Briefly, 96 well plates (Nunc) were coated with 50 µl of trimeric spike protein at 0.5 µg/mL and incubated overnight at 4°C. On the following day, plates were washed three times with 0.1% PBS/Tween20 (PBST) using an automatic plate washer (ThermoScientific). Plates were blocked with 3% bovine serum albumin (BSA) diluted in 0.05% PBS/T and incubated 1h at room temperature. Samples were diluted using 3-fold dilution series starting at 1:50 and ending at 1:10,9350 in 1% BSA-PBST/T and incubated 1h at room temperature. Plates were washed three times as previously and goat anti-human IgG-HRP secondary antibodies (Abcam, ab97215) were added at 1:25,000 and incubated 30 min at room temperature. Plates were washed three times and incubated ∼7min with 50 µl of TMB substrate (BioLegend). The reaction was stopped with 25µl of 1M phosphoric acid (Sigma) and read at 450nm on a plate reader (BioTek). Each plate contained 6 calibrator samples from two high-, two medium-, and two low-antibody producer from adult individuals collected at Hospital Fernando Fonseca that were confirmed positive for SARS-CoV-2 by RT-PCR from nasopharyngeal and/or oropharyngeal swabs in a laboratory certified by the Portuguese National Health Authorities 93. As negative control, we used pre-pandemic plasma samples obtained from healthy donors collected prior July 2019. The endpoint titer was defined as the last dilution before the absorbance dropped below OD450 of 0.15.
Neutralization assay
Heat-inactivated serum and plasma samples were four-fold serially diluted over 7 dilutions, beginning with a 1:30 initial dilution. Dilutions were then incubated with spike pseudotyped lentiviral particles for 1h at 37°C. The mix was added to a pre-seeded 96 well plate of 293T-ACE2 cells, with a final MOI of 0.2. At 48h post-transduction, the fluorescent signal was measured using the GloMax Explorer System (Promega). The relative fluorescence units were normalized to those derived from the virus control wells (cells infected in the absence of plasma or serum), after subtraction of the background in the control groups with cells only. The half-maximal neutralization titer (NT50), defined as the reciprocal of the dilution at which infection was decreased by 50%, was determined using four-parameter nonlinear regression (least squares regression without weighting; constraints: bottom=0) (GraphPad Prism 9). To assess the specificity of our assay, an anti-Spike antibody neutralization assay and an RBD competition assay were performed in SARS-CoV-2 S pseudotyped and vesicular stomatitis virus (VSV) G pseudotyped lentiviral particles, in parallel (fig. S9). The anti-Spike antibody neutralization assay was performed as above, starting with a concentration of 50µg/ml (Thermo Fisher Scientific, PA5-81795) and three-fold serially diluting over 7 dilutions. For the RBD competition assay, 293T-ACE2 cells were pre-incubated with three-fold serial dilutions of SARS-CoV-2 spike’s receptor binding domain, starting with a concentration of 400 µg/ml. After 1h incubation at 37°C, supernatant was aspirated and pseudoviruses were added at an MOI of 0.2. The half maximal inhibitory concentration (IC50) was determined using four-parameter nonlinear regression (least squares regression without weighting; constraints: bottom=0) (GraphPad Prism 9).
Spike protein-antibody contact analysis
In order to determine which residues of the SARS-CoV-2 spike protein contribute the most for the binding of antibodies, we studied 57 PDB structures containing the S protein (or only the RBD region) bound to antibodies (Table S1). These complexes were chosen based on their availability in the PDB repositorium 94 and are all neutralizing antibodies. We first used the MDAnalysis library 95, 96 to identify the residues of the antibody and the spike protein located in the interface between the two proteins. This was done for all PDB structures using in-house Python scripts. To determine the relevance of each spike protein/RBD residue in the binding, a distance cut-off value of 4.5 Å was applied as a criterion (i.e., a contact is observed when two residues have a minimum distance lower than 4.5 Å). Finally, the numerical python library (NumPy 97) was used to calculate the frequency of contact for each spike protein residue with an antibody residue, from the sample of PDB structures analyzed.
MD simulation of the RBD-ACE2 complex
Molecular dynamics (MD) simulations of the RBD bound to the ACE2 protein were performed with the GROMACS 2020.3 package 98, using the Amber14sb 99 force field, starting from the 6m0j structure 12, in a truncated dodecahedron box filled with water molecules (minimum of 1.2 nm between protein and box walls). The TIP3P water model 100 was used and the total charge of the system (−23, including the constitutive Zn- and Cl- ions bound to ACE2) was neutralized with 23 Na+ ions. Additional Na+ and Cl+ ions were added to the solution to reach an ionic strength of 0.1M. The system was energy-minimized using the steepest descent method for a maximum of 50000 steps using position restraints on the heteroatom positions by restraining them to the crystallographic coordinates using a force constant of 1000 kJ/mol in the X, Y and Z positions. Before performing the production runs, an initialization process was carried out in 5 stages of 100 ps each. Initially, all heavy-atoms were restrained using a force constant of 1000 kJ/mol/nm, and at the final stage only the only C-α atoms were position-restrained using the same force constant. In the first stage, the Berendsen temperature algorithm 101 was used to initialise and maintain the simulation at 300 K, using a temperature coupling constant of 0.01 ps, without pressure control. The second stage continues to use the Berendsen temperature algorithm but with a force constant of 0.1 ps. The third stage kept the same temperature control settings, but introduced pressure coupling with the Berendsen pressure algorithm 101 with a pressure coupling constant of 5.0 ps and applied isotropically. The fourth stage changed the temperature algorithm to V-rescale 102, with a temperature coupling constant of 0.1 ps, and the pressure algorithm to Parrinello-Rahman 103 with a pressure coupling constant of 5.0 ps. The fifth stage is equal to the fourth stage, but position restraints are only applied to C-α atoms. During the simulation, the equations of motion were integrated using a timestep of 2 fs. The temperature was maintained at 300 K, using the V-rescale 102 algorithm with a time constant of 0.1 ps, and the pressure was maintained at 1 bar using the Parrinello−Rahman 103 pressure coupling algorithm, with a time constant of 5 ps; pressure coupling was applied isotropically. Long-range electrostatic interactions were treated with the PME 104, 105 method, using a grid spacing of 0.12 nm, with a cubic interpolation. The neighbor list was updated every twenty steps and the cutoff scheme used was Verlet with 0.8nm as the real space cut-off radius. All bonds were constrained using the LINCS algorithm 106. The system was simulated for 8 µs in 5 replicates (to a total of 40 µs).
RBD-ACE2 contacts frequency
In order to determine the residues that contribute the most for the interaction between RBD and ACE2, and the persistence of these interactions, we performed a contact analysis throughout the simulation. We started by eliminating the first equilibration µs of all replicates. The MDAnalysis library 95, 96 was then used to pinpoint the residues of the RBD that are in contact with the ACE2 protein. A distance cut-off value of 4.5 Å was applied as a criterion (i.e., a contact is observed when two residues have a minimum distance lower than 4.5 Å). Finally, we determined the percentage of time for which a given RBD residue is at less than 4.5 Å of ACE2.
Homology-based models of S protein variants
Homology-based models of all the variants analyzed in this study were generated using the software Modeller 107, version 9.23, using the structure of the wild type enzyme (PDB code: 6XR8) 15 as a template. The protocol used only optimizes the atoms belonging to the mutated residues and the residues that are located within a 5 Å radius from these residues, maintaining the remaining atoms fixed with the coordinates found in the template structure. The optimization parameters were set to the software default values. The final model corresponds to the one with the lowest value of the DOPE function out of 20 generated structures.
Funding
This project is supported by the grants PTDC/CCI-BIO/28200/2017 and FCT RESEARCH4COVID 19 (Ref 580) awarded by the Fundação para a Ciência e a Tecnologia (FCT, Portugal) and also by Fundação Calouste Gulbenkian - Instituto Gulbenkian de Ciência (FCG-IGC, Portugal). The work of JD benefited from COVID-19 emergency funds 2020 from Fundação Gulbenkian de Ciência and from Câmara Municipal de Oeiras. The work of HS was supported by ESCMID and by Gilead Génese (PGG/009/2017) grants. MJA is funded by FCT 2020.02373.CEECIND; MA is funded by a post-doctoral working contract; FF is supported by the grant PTDC/BIA-CEL/32211/2017; JG and HS are supported by Fundação para a Ciência e Tecnologia (FCT) through PD/BD/128343/2017 and CEECIND/01049/2020.
Author contributions
MJA and CS conceived and supervised the study. All authors designed and conducted experiments; all authors carried out experiments and analyzed the data; MJA, MA, RL, CMS, DL, MV wrote the manuscript; all authors contributed to editing the manuscript.
Competing interests
Authors declare that they have no competing interests.
Data and materials availability
All data are available in the main text or the supplementary materials.
Supplementary Materials
Acknowledgments
The authors acknowledge Sean Whelan (Washington University School of Medicine St Louis) for providing reagents, João F. Viana and his team for organizing sera collection at Centro Hospitalar Lisboa Ocidental in Portugal, and the IGC COVID-19 team (Patrícia C. Borges, Christian Diwo, Paula Matoso, Vanessa Malheiro, Lígia A. Gonçalves, Nádia Duarte, Ana Brennand, Lindsay Kosack and Onome Akpogheneta) for processing and ranking samples, for this paper and during early settings. We are grateful to the Unit of Genomics of the Instituto Gulbenkian de Ciência for technical support, sample processing and data collection. The authors thank Isabel Gordo and Mónica Dias (Instituto Gulbenkian de Ciência) for helpful discussions andRute Castro and Paula M. Alves (Instituto de Biologia Experimental e Tecnológica, Universidade Nova de Lisboa, Portugal) for providing the antigen used in the ELISA and in the RBD competition assay.
Footnotes
We have updated Figure 4 of the manuscript to include more spike-pseudotyped mutant particles comprising a combination of the mutations S494P, E484K and N501Y. These were tested with convalescent and post-immunization sera.