The Eukaryotic Linear Motif resource for
Functional Sites in Proteins
Functional site class:
NLS classical Nuclear Localization Signals
Functional site description:
Many nuclear proteins possess a nuclear localisation signal (NLS) that is recognised by the importer protein importin-alpha. The NLS motif is primarily composed of basic residues and is found in two main variants: a monopartite form and a bipartite form with two short basic segments separated by a flexible linker. Importin-alpha is itself an adaptor for the nuclear transport receptor importin-beta. The latter is docked on the cytosolic side of the nuclear pore via repetitive FG, FxFG and GLFG linear motifs found in several nucleoporin proteins (FG-Nups) (Terry,2009). The cargo-loaded importin complexes translocate through the nuclear pore while remaining attached to the flexible FG-Nups. Finally, binding of RanGTP to importin-beta drives cargo release, with importin-alpha still bound to nucleoporins located on the nucleoplasmic side. Importin-alpha must then be returned to the cytosol to repeat the process.
ELMs with same func. site: TRG_NLS_Bipartite_1  TRG_NLS_MonoCore_2  TRG_NLS_MonoExtC_3  TRG_NLS_MonoExtN_4 
ELM Description:
The classical monopartite nuclear localisation signal (mNLS) binds to the major site of importin-alpha. It always has a short basic cluster of lysines and arginines. Originally, the main positions were assigned as P1-P5. Now, the four positions P2-P5 are considered to form the core, with three of these positions always occupied by basic residues. Complementary charges and the sizes of the individual binding pockets in importin-alpha strictly control the basic side-chain preferences at P2 and P3. The P2 position is critical for anchoring the motif and must be occupied by a basic residue. Furthermore, if P2 is arginine, P3 must be lysine, whereas if P2 is lysine, P3 may be arginine or lysine. At least one of P4 and P5 must be a basic residue; at these positions there are also preferences for hydrophobic residues or proline and against acidic residues. Outside the core motif, there are additional preferences which, though weaker, clearly play a role in NLS binding affinity, especially at P1, P6 and P7. P1 and P6 both have a preference for basic, Pro and hydrophobic residues. Acidic residues are almost never found in P1 and P6, whereas P7 has a clear preference for acidic residues. For example, in c-myc (320-PAAKRVKLD-328), the lysine at P5 forms an intramolecular salt bridge with the aspartate at P7 (position 328), neutralizing the charge of the P5 residue and thereby stabilizing it (Conti,2000). If positions with weaker preferences are all unfavourable, it is likely that P2-P5 must be four consecutive basic residues to achieve the required binding energy (possible alternative: Pro in P4). The rejection of acidic residues in most positions around the NLS may allow some NLS activities to be regulated by phosphorylation, as Ser and Thr residues are quite often found in the non-core positions. For ease of understanding, the monopartite NLS has been split into three regular expressions which collectively capture the core motif and the adjacent preferences.
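The core rules described above (basic residue at P2, Arg at P2 forcing Lys at P3, at least one basic residue among P4 and P5) can be written as a small check. This is a minimal Python sketch and not ELM code: the function name `core_satisfies_rules` is hypothetical, and treating acidic residues at P4/P5 as forbidden is an assumption borrowed from the `[^DE]` classes of the pattern, since the prose only states a preference against them.

```python
BASIC = set("KR")
ACIDIC = set("DE")


def core_satisfies_rules(core: str) -> bool:
    """Check a four-residue P2-P5 core against the rules stated in the text.

    Hypothetical helper for illustration; `core` is P2, P3, P4, P5 in order.
    """
    if len(core) != 4:
        raise ValueError("core must contain exactly four residues (P2-P5)")
    p2, p3, p4, p5 = core.upper()

    # P2 anchors the motif and must be basic.
    if p2 not in BASIC:
        return False
    # If P2 is Arg, P3 must be Lys; if P2 is Lys, P3 may be Arg or Lys.
    if p2 == "R" and p3 != "K":
        return False
    if p2 == "K" and p3 not in BASIC:
        return False
    # At least one of P4 and P5 must be basic.
    if p4 not in BASIC and p5 not in BASIC:
        return False
    # Assumption: acidic residues at P4/P5 are rejected outright,
    # mirroring the [^DE] classes in the regular expression.
    if p4 in ACIDIC or p5 in ACIDIC:
        return False
    return True
```

For instance, the c-myc core KRVK (P2-P5 of PAAKRVKLD) passes this check, whereas RRKK fails because Arg at P2 is followed by Arg at P3.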
Pattern: [^DE]((K[RK])|(RK))(([^DE][KR])|([KR][^DE]))(([PKR])|([^DE][DE]))
Pattern Probability: 0.0007252
Present in taxon: Eukaryota
Interaction Domain:
Arm (PF00514) Armadillo/beta-catenin-like repeat (Stoichiometry: 1 : 1)
PDB Structure: 1Q1T
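
The pattern given for this entry can be tried directly with Python's `re` module. The sketch below is an illustration under stated assumptions, not the matching procedure used by ELM: the capturing groups of the original pattern are rewritten as non-capturing groups (which accepts the same strings), a lookahead is used so that overlapping hits are reported, and the function name `scan` and its `first_residue` parameter are hypothetical.

```python
import re

# Pattern from this entry, with groups made non-capturing.
MONO_EXT_C = r"[^DE](?:K[RK]|RK)(?:[^DE][KR]|[KR][^DE])(?:[PKR]|[^DE][DE])"

# A zero-width lookahead lets overlapping matches be found at every start.
_FINDER = re.compile("(?=(" + MONO_EXT_C + "))")


def scan(sequence: str, first_residue: int = 1):
    """Return (start, end, matched_text) tuples with 1-based inclusive positions.

    `first_residue` is the residue number of sequence[0] (hypothetical
    convenience parameter for scanning sequence fragments).
    """
    hits = []
    for m in _FINDER.finditer(sequence.upper()):
        text = m.group(1)
        start = first_residue + m.start(1)
        hits.append((start, start + len(text) - 1, text))
    return hits


if __name__ == "__main__":
    # SV40 large T antigen fragment from the instance table; numbering the
    # first residue as 121 is inferred from the listed 127-133 span.
    for hit in scan("QHSTPPKKKRKVEDPKDFPS", first_residue=121):
        print(hit)
```

On this fragment the scan reports two overlapping hits, PKKKRK (126-131) and KKKRKVE (127-133); the second is the span listed for SV40 large T antigen in the instance table.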
o Abstract
Proteins are synthesised in the cytosol, so they must travel across membranes to reach other cell compartments. Proteins which function in the nucleus pass through the nuclear envelope via the nuclear pores. Small proteins might be capable of diffusing through the pores, although the process is likely to be inefficient. Therefore, almost all well studied nuclear proteins are transported into the nucleus using active translocation through the pore. This requires a targeting motif to bind the transport machinery. Proteins that enter the nucleus in preformed complexes may not require their own targeting motif (Dingwall,1982). Nevertheless, it is clear that most proteins do specify their own import signal and, of these, the vast majority have an NLS that binds to importin-alpha (also termed karyopherin-alpha). Importin-alpha constitutes a multiprotein family in Metazoa with generally similar but perhaps not identical binding specificities (Mason,2009). Importin-alpha is an adaptor protein for importin-beta which interacts with nuclear pore components to effect transport into the nucleus. Several other proteins such as snurportin and transportin may be considered as specialised importin-beta adaptors for specific cargoes like snRNP complexes, while some proteins may interact directly with importin-beta to effect their import (e.g. the viral protein HIV Rev (Henderson,1997)).
A striking feature of protein transfer through the nuclear pore is that key roles are played by several linear motifs. Besides the NLS for nuclear import, the CRM1-binding NES motif is found in proteins that are re-exported, while the FG, FxFG and GLFG repeating motifs are found in a subset of nucleoporins (FG-Nups) and are docking sites for the transfer complexes (Terry,2009). The FG-Nups are large, predominantly natively disordered proteins that line the inner pore, extending projections into both the cytosol and nucleoplasm. Importin-beta makes multivalent interactions with motifs in the FG-Nups. Importin-beta stays docked to the FG-Nups while it translocates through the pore. Additional regulatory interactions involving globular interfaces, e.g. with the RAN GTPase, effect allosteric rearrangements of the transport receptors. Overall the nuclear transfer systems are highly cooperative, which is the hallmark of robust cellular operations requiring multiple regulatory inputs to carefully control each step in the process.
The "classical" or "conventional" importin-alpha binding NLS is found in the majority of nuclear proteins. It was one of the earliest linear motifs to be described. The dominating feature of the motif is its highly basic nature. It exists in two main variants: a "monopartite" form (mNLS) with a single cluster of basic amino acids, originally found in the sequences of the large T Antigen (SV40) and E1A (Adenovirus) (Kalderon,1984; Smith,1985), and a "bipartite" form (bNLS) with two basic clusters separated by a short linker region, first analysed in Nucleoplasmin (Dingwall,1991). NLS motifs can be found anywhere within a protein sequence provided that the location is well exposed: in practice, they are nearly always in segments of native disorder. Possession of an NLS is sufficient for targeting a protein located in the cytosol into the nucleus. As a consequence, proteins whose localization is normally restricted to the cytoplasm can be experimentally directed to the nucleus by fusing an NLS motif onto them. Nuclear import may be very fast but can also be slow, particularly if it is regulated. For example, the NLS of the Polyomavirus VP1 protein mediated nuclear localization within one minute (Chang,1992). In contrast, for the Adenovirus E1A protein, nuclear localization required up to 30 minutes (Lyons,1987).
In nuclear proteins, sequences matching the mNLS are more common than for the bNLS. Structures of NLS peptides in complex with importin-alpha show that the site binding the mNLS also binds the second basic cluster of the bNLS. In the ELM NLS entries, we will refer to this binding site as the "major importin site", in contrast to its interacting counterpart, the "major NLS site" (and similarly to the "minor importin site" and the "minor NLS site" respectively). The NLS itself is known to bind in a predominantly extended conformation (Fontes,2003).
Although basic amino acids predominate in the NLS, other amino acids can contribute to the binding interaction, resulting in sufficient variation that accurate definition of motif patterns has not been straightforward. Historically, five positions P1-P5 were used to designate the major importin site (Fontes,2000). Earlier NLS pattern search applications either assumed a fuzzy distribution of the basic residues at these five positions (Nakai,1999) or else used very restricted definitions (Chelsky,1989). Although the fuzzy characterization continued to be used for designing pattern search algorithms (Seiler,2006), it remained unsatisfying from a biochemical point of view. The availability of several crystal structures of NLS peptides in complex with importin-alpha (Conti,2000; Fontes,2003; Tarendeau,2007) has led to a better understanding, with the positions P2-P5 now designated as the core of the major importin site. The binding interaction is anchored at positions P2 and P3: these are the only positions where basic amino acids are obligatory. In positions P4 and P5, basic residues are preferred to the extent that one of these positions must be Arg or Lys. Pro and hydrophobic residues are reasonable substitutes at P4 and P5. As P4 is the position of the NLS core where most amino acid variation is allowed, we regard [KR][KR]X[KR] to be stronger than [KR][KR][KR]X. Examination of binding peptides, mutational studies and sequence alignments of NLS-containing proteins indicates that adjacent positions modulate the major site through the presence or absence of favoured residues. When a hydrophobic residue is found at P4 or P5, favoured residues must be present either in the positions preceding P2-P5 or else in positions P6 and P7. As a third option, if these adjacent positions lack favoured residues, then the NLS must be bipartite, with a pair of basic residues occupying the minor site.
There are also strongly disfavoured amino acids - in particular the acidic residues (Asp, Glu) appear to be rejected from P1, P4, P5 and P6.
In principle, it would be possible to assemble the NLS amino acid preferences into a single regular expression. However, this would be overly complicated and difficult to understand. Therefore for the NLS in ELM, the residue preferences have been split into four patterns, three mNLS and one bNLS. Often a given NLS will match multiple ELM NLS patterns.
For proteins that shuttle in and out of the nucleus, it is likely that the NLS can be conditionally inactivated; otherwise they would be reimported into the nucleus immediately, without executing their cytosolic function. Conditionally inactive NLSes may also be required for the many transcription factors used in transient signalling pathways that are initially targeted to the cytosolic side of the plasma membrane (Fabbro,2003). Upon cell surface receptor stimulation, they are transferred to the nucleus, where they transiently regulate target genes before being ubiquitinated and destroyed by the proteasome. Such proteins must be able to remain outside the nucleus indefinitely in the absence of stimulation, and therefore the NLS is likely to be activated only when the transfer signal is made. Most probably, the main way to control NLS availability is by phosphorylation of the motif at positions that cannot be negatively charged, e.g. P1, P4, P5 and P6, and possibly at other nearby residue positions (Fontes,2003; Sorokin,2007).
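The idea that phosphorylation at charge-intolerant positions could switch an NLS off suggests a simple screen: list Ser/Thr residues at P1, P4, P5 and P6 of a matched motif. The Python sketch below is illustrative only. It assumes that the first residue of a matched string is P1 (an inference from the pattern layout, not stated in the entry), the function name `phospho_candidates` is hypothetical, and a reported residue is merely a candidate position, not evidence of actual phosphorylation.

```python
# Positions the text describes as unable to tolerate a negative charge.
CHARGE_INTOLERANT = (1, 4, 5, 6)


def phospho_candidates(motif: str):
    """Return (position_label, residue) pairs for Ser/Thr at P1, P4, P5, P6.

    Assumes motif[0] is P1, motif[1] is P2, and so on (hypothetical mapping
    inferred from the regular expression of this entry).
    """
    motif = motif.upper()
    candidates = []
    for pos in CHARGE_INTOLERANT:
        if pos <= len(motif) and motif[pos - 1] in "ST":
            candidates.append(("P%d" % pos, motif[pos - 1]))
    return candidates


if __name__ == "__main__":
    # RKRSRSD is the stretch of the avrBs3 fragment QTRASSRKRSRSDRAVTGPS
    # that fits the pattern over seven residues (our reading of the table row).
    print(phospho_candidates("RKRSRSD"))
    print(phospho_candidates("AKRVKLD"))  # c-myc motif: no Ser/Thr at these positions
```

Under these assumptions the avrBs3 stretch yields serines at P4 and P6, while the c-myc motif yields no candidates.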
o 23 Instances for TRG_NLS_MonoExtC_3
Acc. | Gene name | Start | End | Subsequence | Logic | #Ev. | Organism
D2KNN0 | ankA | 1059 | 1064 | SPTITVMKKKVKPQVPTRTS | TP | 2 | Anaplasma phagocytophilum
P14727 | avrBs3 | 1020 | 1025 | RILQASGMKRAKPSPTSTQT | TP | 3 | Xanthomonas euvesicatoria
P14727 | avrBs3 | 1066 | 1072 | QTRASSRKRSRSDRAVTGPS | TP | 3 | Xanthomonas euvesicatoria
A0A1B4JR46 | A8H35_06200 | 155 | 160 | SEDDEGKKKKKAKKGGKDKK | TP | 3 | Burkholderia thailandensis
Q5ZUS4 | legAS4 | 53 | 58 | KSKFKFSQRKAKKKGPGMTH | TP | 6 | Legionella pneumophila subsp. pneumophila str. Philadelphia 1
Q62315 | Jarid2 | 105 | 110 | SDFEEGPSRKRPRLQAQRKF | TP | 2 | Mus musculus (House mouse)
P38398 | BRCA1 | 503 | 508 | RPLTNKLKRKRRPTSGLHPE | TP | 3 | Homo sapiens (Human)
P25054 | APC | 2048 | 2053 | ECISSAMPKKKKPSRLKGDN | TP | 2 | Homo sapiens (Human)
P19838 | NFKB1 | 360 | 365 | IKDKEEVQRKRQKLMPNFSD | TP | 2 | Homo sapiens (Human)
P18870 | JUN | 252 | 258 | RIAASKCRKRKLERIARLEE | TP | 2 | Gallus gallus (Chicken)
P05411 | JUN | 225 | 231 | RIAASKSRKRKLERIARLEE | TP | 2 | Avian sarcoma virus 17
P03255 | Early E1A 32 | 284 | 289 | EDLLNEPGQPLDLSCKRPRP | TP | 2 | Human adenovirus 5
P03096 | Minor capsid | 313 | 318 | TPTWATVIEEDGPQKKKRRL | TP | 2 | Murine polyomavirus strain A2
P03073 | Large T antig | 191 | 196 | PPRTPVSRKRPRPAGATGGG | TP | 2 | Murine polyomavirus strain A2
P02293 | HTB1 | 30 | 35 | KTSTSTDGKKRSKARKETYS | TP | 2 | Saccharomyces cerevisiae (Baker's yeast)
P01106 | MYC | 322 | 328 | RKDYPAAKRVKLDSVRVLRQ | TP | 3 | Homo sapiens (Human)
P06668 | virD2 | 31 | 37 | NQLEYLSRKGKLELQRSARH | TP | 2 | Agrobacterium tumefaciens
 | | 12 | 17 | RRRPRRSQRKRPPTPWPTSQ | TP | 4 | Human T-cell lymphotrophic virus type 1 (strain ATK)
P04591 | gag | | | | | |
P07270 | PHO4 | 156 | 161 | SNSSPYLNKRRGKPGPDSAT | TP | 3 | Saccharomyces cerevisiae S288c
Q04206 | RELA | 301 | 307 | RHRIEEKRKRTYETFKSIMK | TP | 3 | Homo sapiens (Human)
P38398 | BRCA1 | 502 | 507 | ERPLTNKLKRKRRPTSGLHP | TP | 3 | Homo sapiens (Human)
P03070 | Large T antig | 127 | 133 | QHSTPPKKKRKVEDPKDFPS | TP | 3 | Simian virus 40
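
A quick consistency check on instance rows is to ask whether the listed subsequence contains a stretch of the annotated length (End - Start + 1) that fits the entry's pattern. The following Python sketch does this for four rows copied from the instance list. It is an assumption-laden illustration, not an ELM tool: the helper names `windows_matching` and `row_is_consistent` are hypothetical, and because the offset of each subsequence within its protein is not given, every window of the right length is tested.

```python
import re

# Pattern from this entry, with groups made non-capturing.
PATTERN = re.compile(
    r"[^DE](?:K[RK]|RK)(?:[^DE][KR]|[KR][^DE])(?:[PKR]|[^DE][DE])"
)

# (accession, start, end, subsequence) copied from the instance list.
ROWS = [
    ("P03070", 127, 133, "QHSTPPKKKRKVEDPKDFPS"),
    ("P01106", 322, 328, "RKDYPAAKRVKLDSVRVLRQ"),
    ("D2KNN0", 1059, 1064, "SPTITVMKKKVKPQVPTRTS"),
    ("P03255", 284, 289, "EDLLNEPGQPLDLSCKRPRP"),
]


def windows_matching(subsequence: str, length: int):
    """Return every window of `length` residues that the pattern matches in full."""
    return [
        subsequence[i:i + length]
        for i in range(len(subsequence) - length + 1)
        if PATTERN.fullmatch(subsequence[i:i + length])
    ]


def row_is_consistent(start: int, end: int, subsequence: str) -> bool:
    """True if some window of the annotated length fits the pattern."""
    return bool(windows_matching(subsequence, end - start + 1))


if __name__ == "__main__":
    for acc, start, end, sub in ROWS:
        print(acc, row_is_consistent(start, end, sub),
              windows_matching(sub, end - start + 1))
```

Using `fullmatch` on fixed-length windows, rather than a single left-to-right search, avoids missing a seven-residue instance when a shorter six-residue match begins one residue earlier.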
Please cite: The Eukaryotic Linear Motif resource: 2022 release. (PMID:34718738)

ELM data can be downloaded & distributed for non-commercial use according to the ELM Software License Agreement