Over the past decades, global climate change and improper agricultural practices have caused serious environmental problems such as soil salinization and acidification, soil erosion, drought and outbreaks of insect pests, which severely challenge agricultural production worldwide. At the same time, global crop productivity must increase to meet the growing demand for grain products from a continuously increasing world population. It has become an urgent task for agricultural scientists to create novel crop varieties with better agronomic traits such as high yield and grain quality, high nutrient use efficiency, and good acclimation capacity to abiotic and biotic stresses. Conventional breeding strategies such as genetic hybridization have long been widely used to create new crop varieties and have achieved tremendous success, but these processes are commonly time-consuming and labor-intensive and therefore struggle to satisfy modern plant breeding objectives.
The rapid development of molecular biology and transgenic technologies has made it feasible to genetically modify traits of interest, and many transgenic plants harboring beneficial agronomic traits have been generated, such as golden rice, a GM rice that accumulates β-carotene in the grain. In contrast to the widespread use of transgenic technology in basic and applied crop research in the laboratory, commercialization of GM crops is still strictly regulated by governments such as those of China, Japan and European countries because of intense concerns about the biosafety of GM crops. Therefore, only a very small portion of GM crops have been released to date.
Targeted genome engineering refers to technologies used for site-specific genome modifications including gene knockout, knockin and transcriptional regulation. Site-specific recombination systems, ZFNs, TALENs and CRISPR/Cas9 are the representative technologies of targeted genome engineering. In comparison with conventional transgenic technology, genome editing has clear advantages such as easy design and construction, high precision and efficiency in modifying the genomic loci responsible for traits of interest, the capacity to stack multiple genes of interest simultaneously, and the generation of descendants without transgenic elements. Therefore, targeted genome engineering has attracted extensive attention from plant scientists and breeders and has been rapidly adopted in crop improvement, especially since the emergence of the CRISPR/Cas9 system in 2013. In this review, we summarize recent progress in targeted genome engineering and its application in the genetic modification of crop plants, and propose perspectives for future research on genome editing-based crop improvement.
2. Site-Specific Recombination System
Site-specific recombination refers to the reaction between two DNA molecules that is catalyzed by specific enzymes (recombinases) at their cognate sequence pairs or target sites. This recombination requires three components: a recombinase such as Cre, which is responsible for DNA editing, and two DNA partners such as the LoxP cognate sites recognized by the recombinase. The recombinases can be divided into two families, the tyrosine recombinase family and the serine recombinase family (Table 1). The tyrosine recombinases contain a conserved tyrosine active site and catalyze DNA rearrangement via formation and resolution of a Holliday junction intermediate, while the serine recombinases contain a conserved serine active site and catalyze site-specific DNA recombination through a concerted, four-strand cleavage and rejoining mechanism.
Among these site-specific recombination systems, Cre/LoxP is the most commonly used. It is composed of the Cre recombinase and the 34-bp LoxP sequences recognized by Cre. When the Cre recombinase is expressed, recombination events occur in cells harboring LoxP recognition sites in their genomes. In general, Cre/LoxP-derived recombination has three possible outcomes, inversion, translocation or excision, depending on the initial arrangement of the LoxP recombination sites (Figure 1). An inversion event or an excision event can occur when the recombination sites are located on the same chromosome with the opposite orientation or the same orientation, respectively. A translocation event can result from the exchange of DNA segments when the two recombination sites are located on separate chromosomes with the same orientation.
There are two main applications of site-specific recombination systems in the genetic modification of crop plants: removal of undesirable transgenic elements such as selectable marker genes, and site-specific integration of genes of interest. For example, by using Cre/LoxP-mediated recombination, the selectable marker gene HPT has been successfully eliminated from transgenic mustard plants with insect resistance. Elimination of selectable marker genes has also been reported in rice, potato, tomato and other crops. The Cre/LoxP system has also been adopted in the construction of maize and rice minichromosomes, in which genes of interest are expected to be stacked without limitation. The Flp/Frt recombination system has likewise been reported to eliminate selectable marker genes in plants such as rice and maize. Representative GM crops recently generated via site-specific recombination are summarized in Table 2.
Although targeted genome editing has been achieved with different site-specific recombination tools, several disadvantages still limit their application in current crop improvement, such as incomplete removal of transgenic elements, complicated vector design, time-consuming rounds of transformation and genetic hybridization, and relatively low targeting efficiency.
Table 1. Representative recombinases responsible for site-specific recombination.
Figure 1. Cre/LoxP-mediated site-specific recombination. (a) Cre/LoxP-mediated inversion event can occur when two LoxP sites are located on the same chromosome with opposite orientation. (b) Cre/LoxP-mediated translocation event can occur when two LoxP sites are located on separate chromosomes with the same orientation. (c) Cre/LoxP-mediated excision event can occur when two LoxP sites are located on the same chromosome with the same orientation.
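The arrangement rules in Figure 1 can be written down as a small decision function. The following is only an illustrative sketch: the function names and the list-of-segments representation are hypothetical and not taken from any cited work, and the toy model ignores the LoxP sites themselves and all reaction kinetics.

```python
def cre_loxp_outcome(same_chromosome: bool, same_orientation: bool) -> str:
    """Classify the expected Cre/LoxP event from the site arrangement (Figure 1)."""
    if same_chromosome:
        # Figure 1(c): same orientation -> excision; Figure 1(a): opposite -> inversion
        return "excision" if same_orientation else "inversion"
    # Figure 1(b) only describes separate chromosomes with the same orientation
    return "translocation" if same_orientation else "not described"


def recombine(segments, first, last, same_orientation):
    """Toy model of excision or inversion on a list of labelled segments.

    `first` and `last` are the list indices of the first and last segment lying
    between the two LoxP sites (a hypothetical representation, not real coordinates).
    """
    before = segments[:first]
    between = segments[first:last + 1]
    after = segments[last + 1:]
    if same_orientation:
        return before + after              # excision: the flanked segments are lost
    return before + between[::-1] + after  # inversion: flanked segments are reversed


if __name__ == "__main__":
    print(cre_loxp_outcome(same_chromosome=True, same_orientation=False))
    print(recombine(list("ABCDE"), 1, 3, same_orientation=True))
```

In marker-removal applications, the selectable marker would correspond to the segments between two same-orientation sites, which is why excision is the relevant outcome there.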
3. ZFN Technology
ZF proteins are a common group of DNA-binding proteins in eukaryotic organisms. Each ZF is composed of about 30 amino acids in a conserved β-β-α configuration. Its DNA-binding ability is determined by the specific amino acids present on the surface of the α-helix, which vary in specificity from one ZF to another. Based on this specific DNA-binding trait, a targeted genome editing platform, ZFNs, has been constructed (Figure 2). A ZFN system is composed of two arrays of ZF proteins and a nuclease such as Fok I. Each array of ZF proteins is linked to a subunit of Fok I, and Fok I becomes active only after the two arrays have bound to the DNA sites of interest and the two Fok I subunits have dimerized. Several strategies are used for the assembly of ZFNs. The first is modular assembly, a strategy based on a library of ZFs with well-known DNA-binding specificities. ZFNs can also be designed with web-based tools, assembled by combining randomized multi-finger libraries with specificity screening, or obtained from companies.
In general, there are two types of DNA editing by ZFNs (Figure 2). The first type is targeted gene knockout, the purpose of which is to create a null mutant by disrupting the gene of interest at the DNA level. For example, HIV-1 resistance has been achieved in primary T cells and hematopoietic stem/progenitor cells by ZFN-mediated knockout of CC chemokine receptor 5. A maize ipk1 mutant line has also been generated by ZFN-mediated gene knockout, leading to a modified phytate biosynthesis pathway in the resulting maize plants. The other type is targeted gene knockin, the purpose of which is to create organisms expressing genes of interest by introducing exogenous genes at a specific site of the genome or by repairing the mutation sites of endogenous genes. For example, ZFNs have been applied to repair mutation sites closely associated with diseases such as haemophilia B, sickle-cell disease and Parkinson’s disease. In tobacco BY2 cells, a functional GFP gene has been successfully introduced by ZFNs into both a pre-integrated defective reporter construct and an endogenous locus. In addition to targeted gene knockout and knockin, ZFs can be linked with transcription factors to regulate gene expression at the transcriptional level. Table 3 summarizes representative GM crops recently generated via the ZFN system.
Table 2. The application of site-specific recombination system in targeted gene modifications of crop plants.
Figure 2. Zinc finger (ZF)-mediated regulation of gene expression. (a) ZF nuclease (ZFN)-mediated genome editing. Arrays of ZF proteins recognize specific DNA sequences in the genome, and double-strand breakages (DSBs) are generated by the linked Fok I nuclease. The DSBs are repaired either by non-homologous end joining (NHEJ) to produce knockout events or by homologous recombination (HR) to produce gene knockin events. (b) ZF-mediated transcriptional regulation of gene expression. By linking with specific transcription factors or repressors, ZFs can also be used to regulate gene expression at the transcriptional level.
Despite the successes of ZF-associated technologies in previous studies, disadvantages such as the complexity of assembly, context-dependent binding specificity and relatively low targeting efficiency have not been well addressed and thus limit the application of these technologies in current research.
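Because Fok I must dimerize, a ZFN pair needs two binding sites on opposite strands separated by a short spacer. The sketch below shows how such paired sites could be located in a sequence. It is a simplified illustration under stated assumptions: the function names are hypothetical, the 5-7 bp spacer range is an assumed default rather than a value given in this review, and binding is modelled as an exact string match, which ignores the context-dependent specificity mentioned above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_zfn_pair_sites(genome, left_site, right_site, spacer_range=(5, 7)):
    """List (start, spacer_length) for placements where both ZF arrays can bind.

    left_site  : half-site read 5'->3' on the top strand
    right_site : half-site read 5'->3' on the bottom strand
    spacer_range is an ASSUMED window that lets the two Fok I subunits dimerize.
    """
    genome = genome.upper()
    left = left_site.upper()
    right_on_top = reverse_complement(right_site)  # how the right site looks on the top strand
    hits = []
    for start in range(len(genome)):
        if not genome.startswith(left, start):
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            right_start = start + len(left) + spacer
            if genome.startswith(right_on_top, right_start):
                hits.append((start, spacer))
    return hits


if __name__ == "__main__":
    left, right = "GCGTGGGCG", "GAGGAAGTT"  # made-up 9-bp half-sites (three ZFs each)
    genome = "TT" + left + "ACGTAC" + reverse_complement(right) + "AA"
    print(find_zfn_pair_sites(genome, left, right))
```

Each 9-bp half-site in the usage example stands for an array of three ZFs recognizing one triplet each; longer arrays would simply use longer half-sites.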
4. TALEN Technology
TALEs are a group of natural proteins from the plant-pathogenic bacterial genus Xanthomonas. They bind specific DNA regions in the plant genome through a series of 33-35 amino acid repeat domains, each of which recognizes a single base pair. The binding specificity of TALEs depends on the RVD, the two highly variable amino acids at positions 12 and 13 of each TALE repeat (Table 4). TALEs can therefore be designed to recognize specific DNA sequences in the genome. By adding a nuclease such as Fok I, the TALEN system has been developed to make DSBs at pre-selected genomic sites and thereby generate knockout or knockin edits (Figure 3).
Table 3. The application of zinc finger nucleases (ZFNs) or ZF transcription factors in targeted gene modifications of crop plants.
Table 4. The corresponding nucleotide of amino acid pairs in transcriptional activator-like effector (TALE).
For example, the effector-binding element for AvrXa7 is located in the promoter of the bacterial blight susceptibility gene Os11N3, and TALEN-mediated disruption of this element has generated rice with resistance to bacterial blight. In another report, rice fragrance has been improved by TALEN-mediated disruption of BADH2; the resulting defective badh2 allele leads to the accumulation of 2AP, a major fragrance compound in rice. TALE designer transcription factors have been used to regulate the OCT4 and NANOG loci by targeting their enhancers in mammalian cells, leading to stimulation or inhibition of the reprogramming of somatic cells into induced pluripotent cells, while similar applications have not yet been introduced into plants. Table 5 summarizes representative GM crops recently generated via TALENs.
Figure 3. Transcription activator-like effector (TALE)-mediated regulation of gene expression. (a) TALE nuclease (TALEN)-mediated gene knockout event based on non-homologous end joining (NHEJ) mechanism or gene knockin event based on homologous recombination (HR) mechanism. (b) TALE-mediated transcriptional regulation of gene expression. By linking with specific transcription factors, TALEs can also be used to regulate gene expression transcriptionally. The binding specificity of each TALE is dependent on repeat-variable di-residue (RVD) composition.
The single-base recognition of TALE DNA-binding repeats affords greater design flexibility than triplet-confined ZF proteins, but cloning the large, highly repetitive TALE arrays remains a great challenge.
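The one-repeat-per-base logic of TALE design can be illustrated with a short lookup. This is a sketch only: the RVD-to-nucleotide pairs used here (NI for A, HD for C, NG for T, NN for G and, less specifically, A) are the commonly cited code and are assumed to agree with Table 4, whose contents are not reproduced in this text; the function names are hypothetical.

```python
# Commonly cited RVD code (ASSUMED to match Table 4).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
BASES_FOR_RVD = {"NI": "A", "HD": "C", "NG": "T", "NN": "GA"}  # NN tolerates G or A


def design_rvd_array(target: str) -> list:
    """Return one RVD per base of the target, in 5'->3' order."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def rvd_array_binds(rvds, dna: str) -> bool:
    """Check whether every repeat's RVD accepts the base at its position."""
    dna = dna.upper()
    if len(rvds) != len(dna):
        return False
    return all(base in BASES_FOR_RVD[rvd] for rvd, base in zip(rvds, dna))


if __name__ == "__main__":
    array = design_rvd_array("TCGAT")
    print(array)
    print(rvd_array_binds(array, "TCAAT"))  # NN also accepts A at the third position
```

The length of the returned array equals the length of the target, which is why arrays for typical target sites become long and repetitive to clone.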
5. CRISPR/Cas9 System
CRISPR/Cas is an adaptive immune system that defends bacteria and archaea against invading viral and plasmid DNA. Scientists have found three types of CRISPR/Cas systems (types I, II and III) across a range of microbes, and each type includes a Cas protein and the corresponding CRISPR arrays. These CRISPR arrays are composed of repeat sequences interspaced by spacer sequences, which are non-repetitive and derived from short segments of viral or plasmid DNA, often called protospacer sequences. After the CRISPR arrays are transcribed into crRNAs, Cas proteins are directed to the target sites and make DSBs, resulting in the degradation of foreign genetic material.
Table 5. The application of transcription activator-like effector nuclease (TALEN) technology in targeted gene modifications of crop plants.
Figure 4. The basic components of a clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein-9 nuclease (Cas9) (CRISPR/Cas9) system and its application in targeted genome editing. (a) CRISPR/Cas9 or (b) CRISPR/Cas9n-mediated gene knockout event based on non-homologous end joining (NHEJ) mechanism or gene knockin event based on homologous recombination (HR) mechanism after the occurrence of Cas9 or Cas9n-induced double-strand breakages (DSBs). (c) CRISPR/dCas9-mediated transcriptional regulation of gene expression. By being fused with specific transcription factors, dCas9 can regulate gene expression transcriptionally via RNA-directed binding to specific genome sites. Targeted gene repression can also be achieved with dCas9 alone in association with specific sgRNAs. The binding specificity of the CRISPR/Cas9 system depends on a seed sequence, which is an approximately 12-base sequence upstream of the PAM sequence and must match the crRNA or sgRNA.
Among the three CRISPR systems, the type II CRISPR system was the first to be engineered for targeted genome editing in eukaryotic organisms (Figure 4). In this system, a DNA sequence bearing a 5’-NGG-3’ PAM is recognized either by a duplex of two non-coding RNAs, a crRNA and a tracrRNA, or by an sgRNA, which is a synthetic fusion of crRNA and tracrRNA, and is then cleaved by the Cas9 protein complexed with the crRNA-tracrRNA duplex or the sgRNA. The Cas9 protein from Streptococcus pyogenes (SpCas9) has two nuclease motifs, HNH and RuvC, which play critical roles in generating DSBs at target sites. The target specificity of CRISPR/Cas9 is determined by a seed sequence, which is an approximately 12-base sequence immediately upstream of the PAM and must match the sequence of the crRNA or sgRNA. Compared with the previously designed ZFNs and TALENs, the CRISPR/Cas9 system has further advantages such as simplicity of design and assembly, high targeting efficiency and versatility of application, and is thus expected to be a powerful tool for targeted genome editing (Table 6).
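The seed-sequence rule described above can be expressed as a simple check on a candidate site. The sketch below is illustrative only: function names are hypothetical, the guide is written as DNA in the same sense as the protospacer, and requiring a perfect seed plus an NGG PAM is a deliberately simplified criterion, not a validated off-target predictor.

```python
def split_mismatches(guide: str, protospacer: str, seed_length: int = 12):
    """Return (PAM-distal mismatches, seed mismatches) for two equal-length strings.

    The seed is taken as the last `seed_length` bases, i.e. the bases
    immediately upstream of the PAM.
    """
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must have the same length")
    guide = guide.upper().replace("U", "T")
    protospacer = protospacer.upper()
    seed_length = min(seed_length, len(guide))
    boundary = len(guide) - seed_length
    distal = sum(g != p for g, p in zip(guide[:boundary], protospacer[:boundary]))
    seed = sum(g != p for g, p in zip(guide[boundary:], protospacer[boundary:]))
    return distal, seed


def is_candidate_target(guide: str, protospacer: str, pam: str, seed_length: int = 12) -> bool:
    """Simplified rule: 5'-NGG-3' PAM present and no mismatch inside the seed."""
    _, seed = split_mismatches(guide, protospacer, seed_length)
    pam = pam.upper()
    return seed == 0 and len(pam) == 3 and pam[1:] == "GG"


if __name__ == "__main__":
    guide = "GACGTTACCGGATCAAGCTT"     # made-up 20-nt guide
    distal_variant = "T" + guide[1:]   # mismatch far from the PAM
    seed_variant = guide[:-1] + "A"    # mismatch next to the PAM
    print(is_candidate_target(guide, distal_variant, "TGG"))
    print(is_candidate_target(guide, seed_variant, "TGG"))
```

Under this rule a PAM-distal mismatch is tolerated while a mismatch in the seed is not, which mirrors why the seed region dominates target specificity.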
Table 6. Characteristic comparison of different targeted genome engineering systems.
CRISPR/Cas9 was first reported to be efficient for targeted genome editing in mammalian cells at the single-gene level, the multi-gene level and even the genome-wide level. Thereafter, this system was adopted for gene knockout or knockin in plants such as Arabidopsis, tobacco, sorghum, rice, and wheat. By changing the amino acids at positions 10 and 840 from aspartate (D) and histidine (H), respectively, to alanine (A), scientists have developed two variants of Cas9: Cas9n (D10A), which has nickase activity, and dCas9 (D10A and H840A), which has no catalytic activity. The two variants have been further developed into the CRISPR/Cas9n and CRISPR/dCas9 systems. Some Cas9-derived base editor systems have also been established by fusing an adenine or a cytidine deaminase to a Cas9n protein. Unlike the classical Cas9 system, the base editor systems catalyze two kinds of nucleotide replacement, from adenine (A) to guanine (G) or from cytosine (C) to thymine (T), depending on the deaminase that is linked to Cas9n. Applications of these Cas9-derived systems have been reported in targeted genome editing. For example, in a study comparing the regulation of gene expression by CRISPR/dCas9 and TALE designer transcription factor systems, the TALE activator system performed better in activating gene expression, while repression by CRISPR/dCas9 was similar to or better than that by the TALE repressor. The successful use of base editors has also been well documented in crop plants such as rice, tomato, potato, wheat, maize, rapeseed and watermelon. To date, CRISPR/Cas9-mediated genome editing has been widely used for crop improvement, and a number of GM crops with desired traits have been created (Table 7).
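The two base-editing reactions described above (C to T and A to G) can be mimicked on a sequence string. This is a toy sketch under explicit assumptions: the editing window (positions 4 to 8, counted from the PAM-distal end) is an assumed default that is not stated in this review, every editable base in the window is converted, and the function name and editor labels "CBE" and "ABE" are used here only for illustration.

```python
# Source base and product base for each editor class.
CONVERSIONS = {
    "CBE": ("C", "T"),  # cytidine deaminase fused to Cas9n
    "ABE": ("A", "G"),  # adenine deaminase fused to Cas9n
}


def base_edit(protospacer: str, editor: str, window=(4, 8)) -> str:
    """Return the protospacer after idealized base editing.

    Positions are 1-based and counted from the PAM-distal end.
    `window` is an ASSUMED activity window, not a value from this review.
    """
    source, product = CONVERSIONS[editor]
    start, end = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if start <= position <= end and base == source:
            edited.append(product)  # deaminated base is read as the product base
        else:
            edited.append(base)
    return "".join(edited)


if __name__ == "__main__":
    site = "ACGTACGTACGTACGTACGT"
    print(base_edit(site, "CBE"))
    print(base_edit(site, "ABE"))
```

Real editors act with varying efficiency across the window and can produce bystander edits, so the output should be read as the set of possible changes rather than a predicted genotype.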
Table 7. The application of CRISPR/Cas9 in targeted gene modifications of crop plants.
Recently, some orthologues of the classical Cas9 (SpCas9), such as SaCas9 from Staphylococcus aureus and St1Cas9 from Streptococcus thermophilus, have been identified in bacteria and adopted for targeted genome modifications of Arabidopsis, tobacco and rice (Table 8). Targeting efficiencies of SaCas9 and St1Cas9 comparable to that of SpCas9 have also been documented in plants. Although all three Cas9 proteins belong to the type II CRISPR/Cas system, they recognize different PAM sequences when catalyzing DSBs at target sites (5’-NNGRRT-3’ for SaCas9 and 5’-NNGGAA-3’ for St1Cas9 instead of 5’-NGG-3’ for SpCas9). Considering that the choice of target sites is constrained to a great extent by PAM sequences, the addition of SaCas9 and St1Cas9 to the CRISPR toolbox greatly expands the coverage of available target sites in plant genomes and promotes CRISPR-mediated crop improvement. In addition to the type II CRISPR systems mentioned above, a CRISPR/Cpf1 platform (also named CRISPR/Cas12a) has been developed, several aspects of which are quite distinct from the Cas9 systems. For instance, only a crRNA is needed to guide the Cpf1-crRNA complex to target sites in the plant genome; the guide RNA is therefore shortened from approximately 100 nt in Cas9 systems to 42 nt, which facilitates its synthesis. The PAM sequence of Cpf1 is 5’-TTTN-3’, and Cpf1-mediated DSBs generate 4-bp overhangs at the 5’ ends of the cleavage sites, whereas Cas9 cleavage produces blunt ends. The cohesive ends produced by Cpf1 offer potential advantages over blunt ends, such as improved knockin efficiency and an increased possibility of multiple editing events. The CRISPR/Cpf1 system is thus considered a promising tool and has been applied in the genetic modification of crop plants (Table 8). We expect this system to be widely used for crop improvement in the near future.
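How PAM requirements constrain the set of available target sites can be seen by scanning a sequence for each nuclease's PAM. The sketch below is illustrative and rests on several assumptions: it searches the given strand only, it uses a single default protospacer length of 20 nt for every nuclease, it places the PAM 3' of the protospacer for the Cas9 orthologues and 5' of it for Cpf1, and the function and dictionary names are hypothetical. The PAM strings themselves are the ones quoted in the text above.

```python
import re

# IUPAC codes needed for the PAMs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

# (PAM, side of the protospacer on which the PAM lies)
PAMS = {
    "SpCas9": ("NGG", "3prime"),
    "SaCas9": ("NNGRRT", "3prime"),
    "St1Cas9": ("NNGGAA", "3prime"),
    "Cpf1": ("TTTN", "5prime"),
}


def find_targets(sequence: str, nuclease: str, protospacer_length: int = 20):
    """Return (window_start, protospacer, pam) tuples found on the given strand.

    `protospacer_length` is an ASSUMED uniform default; overlapping sites are
    reported because the pattern is wrapped in a lookahead.
    """
    pam, side = PAMS[nuclease]
    pam_regex = "".join(IUPAC[code] for code in pam)
    spacer_regex = "[ACGT]{%d}" % protospacer_length
    if side == "3prime":
        pattern = "(?=(%s)(%s))" % (spacer_regex, pam_regex)
    else:
        pattern = "(?=(%s)(%s))" % (pam_regex, spacer_regex)
    hits = []
    for match in re.finditer(pattern, sequence.upper()):
        first, second = match.group(1), match.group(2)
        protospacer, pam_seq = (first, second) if side == "3prime" else (second, first)
        hits.append((match.start(), protospacer, pam_seq))
    return hits


if __name__ == "__main__":
    seq = "TTTA" + "GATTACAGATTACAGATTAC" + "AGG"
    for name in PAMS:
        print(name, len(find_targets(seq, name)))
```

Running the scan with several nucleases on the same sequence gives a rough feel for how adding orthologues with different PAMs enlarges the pool of candidate sites.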
Table 8. The newly-developed CRISPR systems and their applications in crop improvement.
The emergence and rapid development of targeted genome engineering technologies have paved a new way for crop breeding. To date, four generations of genome editing platforms have become available, from site-specific recombination systems to ZFN, TALEN and CRISPR/Cas9 technologies. CRISPR/Cas9 is now considered more advantageous than the other three and has been the most widely used in the genetic modification of crops. Many new crop germplasms that do not exist in nature have been efficiently created via CRISPR-mediated knockout, knockin, transcriptional activation and transcriptional repression of genes of interest.
However, challenges remain in the current CRISPR platform. Thus far, most reported modifications in crops are CRISPR-mediated gene knockouts, while knockin events are rare and usually occur at low efficiency, even though knockin is very useful for crop breeding because it can confer novel traits that do not exist in crops in nature by editing existing alleles or adding new ones. CRISPR/dCas9 is considered a promising tool for transcriptional regulation of gene expression, particularly for genes with highly methylated promoter regions, but available data on gene modification of this kind are quite limited, which constrains the improvement of CRISPR-based transcriptional regulation systems and their applications in crops. Transformation and tissue culture are crucial for CRISPR-mediated genome editing, yet their efficiencies remain low for most crop plants. Last but not least, off-target editing is still a serious concern for plant scientists, although numerous studies have provided evidence for the precision of CRISPR-mediated genome editing in plants. All of these challenges should be addressed in future studies in order to promote the application of CRISPR systems in crop improvement.
This work was supported by the Natural Science Foundation of Guangdong Province (2018A030310446), the China Postdoctoral Science Foundation (2017M612741), the Guangdong Innovation Research Team Fund (2014ZT05S078), the National Natural Science Foundation of China (31600982), and the Shenzhen High-Level Talents Research Fund (827/000256).
11N3: 11 nodulin 3
AAD1: aryloxyalkanoate dioxygenase 1
AATCB: cecropin B gene with a signal peptide sequence AAT from the α-1-antitrypsin
ABCC6: ATP-binding cassette sub-family C member 6
AG: agamous gene
ALS1: acetolactate synthase 1
BADH2: betaine aldehyde dehydrogenase 2
BAR: phosphinothricin acetyl transferase
BEL-230: bentazon sensitive lethal 230
BFP: blue fluorescent protein
Cas9: CRISPR-associated protein 9 nuclease
CCD4a: carotenoid cleavage dioxygenase 4a
CFP: cyan fluorescent protein
COMT: caffeic acid O-methyltransferase
Cre: cyclization recombination enzyme
CRISPR: clustered regularly interspaced short palindromic repeats
crRNA: CRISPR RNA
CYP97A4: cytochrome P450-type carotenoid hydroxylases 97A4
DCL1a: dicer-like 1a
DEP1: dense and erect panicle 1
DGT28: dow glyphosate tolerance
DIPM1: DspA/E-interacting protein of Malus × domestica 1
DL: drooping leaf
DREB2: DRE-binding protein 2
DSB: double-strand breakage
DSM2: drought-hypersensitive mutant 2
eIF4E: eukaryotic initiation factor 4E
EPFL9: epidermal patterning factor like 9
EPSPS: 5-enolpyruvylshikimate-3-phosphate synthase
ER1: Erysiphe pisi resistance 1
ERF922: ethylene responsive factor 922
FAD2: oleate desaturase 2
Fim: fimbrial subunit
Flp: yeast 2-micron circle plasmid recombinase
Frt: Flp-recognition target
FT4: flowering locus T 4
GASR7: gibberellic acid-stimulated transcript-related 7
GAT: glyphosate acetyltransferase
GBSS: granule-bound starch synthase
GFP: green fluorescent protein
GL2: GLABRA 2
GM: genetically modified
GW2: grain width 2
HAK1: high-affinity potassium transporter 1
HIV-1: human immunodeficiency virus type 1
HPT: hygromycin phosphotransferase
HR: homologous recombination
IdnDH: idonate dehydrogenase
IPK1: inositol 1,3,4,5,6-pentakisphosphate 2-kinase
IPT: isopentenyl transferase
KASII: 3-ketoacyl-ACP synthase II
LOB1: lateral organ boundaries 1
LoxP: integration/crossover site on phage P1 genome
LTP9.4: lipid transfer protein 9.4
MAPK3: mitogen-activated protein kinase 3
MLO: mildew resistance locus O
MTL: MATRILINEAL phospholipase
MYB44: v-myb avian myeloblastosis viral oncogene homolog 44
NANOG: Nanog homeobox
nCBP-1: novel cap binding protein 1
NFXL1: nuclear factor-X like 1
NHEJ: non-homologous end joining
NOS: nopaline synthase
NPTII: neomycin phosphotransferase II
OCT4: octamer-binding transcription factor 4
PAM: protospacer-adjacent motif
PAT: phosphinothricin acetyltransferase
PDR6: pleiotropic drug resistance 6
PDS: phytoene desaturase
PMI: phosphomannose isomerase
RFP: red fluorescent protein
RLK-798: receptor-like kinases 798
ROC5: rice outermost cell-specific gene 5
RVD: repeat-variable di-residues
RVI6: resistance to Venturia inaequalis 6
SBE1: starch-branching enzyme 1
sgRNA: single guide RNA
SPL14: SQUAMOSA promoter binding protein-like 14
SSR2: sterol side chain reductase 2
STF1: soybean TGACG-motif binding factor 1
SUR: sulfonylurea receptor
TALE: transcriptional activator-like effector
TALEN: transcriptional activator-like effector nuclease
TB1: teosinte branched 1
TC: tocopherol cyclase
tracrRNA: trans-activating crRNA
VINV: vacuolar invertase
YFP: yellow fluorescent protein
ZF: zinc finger
ZFN: zinc finger nuclease