Received 10 November 2015; accepted 28 March 2016; published 31 March 2016
The estuarine fish Fundulus heteroclitus (Atlantic killifish) is an important ecological model for studying regional and local adaptation  . Killifish are relatively non-migratory and have large effective population (Ne)  sizes which provide evolutionary arenas for natural selection to occur. Their extensive genetic diversity coupled with their occurrence in varying environmental conditions has led to adaptive variation in several physiological and biochemical traits, including susceptibility to toxic xenobiotics   -  . The extensive phenotypic variation and genetic diversity exhibited by killifish make them an ideal candidate for studying evolutionary biology including questions of quantitative genetic variation.
Since the post-genomic era, the molecular toolbox for F. heteroclitus has expanded from a few genetic loci allowing for population genetic studies   to numerous loci distributed across the genome  -  . As the repertoire of genetic markers increases, it is becoming more feasible to perform genetic association studies using Fundulus as a model. However, a genetic linkage map has not yet been developed for this species, prohibiting localization of chromosomal regions containing the genetic variation responsible for phenotypic traits.
To further advance F. heteroclitus as a functional genetic model organism, we developed enriched microsatellite libraries and identified 330 novel microsatellites. Parental (P) and Second Filial (F2) generations of three genetic crosses were genotyped with the microsatellites reported herein, existing microsatellites  -  , and single nucleotide polymorphisms (SNPs)   . Based on the segregation patterns of polymorphic loci in the recombinant generations, we constructed the first genetic linkage map of the killifish genome. Comparisons between this killifish genetic map and published genomic assemblies for medaka (Oryzias latipes) and zebrafish (Danio rerio) suggest relatively high degrees of synteny among fish species, and provide additional resources to test evolutionary models and hypotheses.
2. Methods and Materials
2.1. Killifish Collection and Breeding Design
Parental (P) fish for genetic crosses were collected from New Bedford Harbor (NBH), Massachusetts, USA, and Block Island (BI), Rhode Island, USA, and were maintained as previously described  . Following a classic F2-intercross design, BI males were crossed with NBH females to generate the first filial generation (F1). Once sexually mature, F1 siblings were crossed (F1 × F1) to generate the recombinant F2 generation. F2 larvae from each of three independent pedigrees were collected and used to construct an integrated consensus map of the killifish genome. All research including fish collection, culture, and breeding have been approved and adhere to the guidelines set forth by USEPAs Institutional Animal Care and Use Committee (IACUC).
Genomic libraries enriched for microsatellites were developed following the methods of Hamilton et al.  and Glenn et al.  . Briefly, genomic DNA extracted from eight larval fish (DNeasy, Qiagen) was pooled and digested with Hae III and Rsa I (NEB) and ligated to double-stranded linkers (SNX-f-5’-CTAAGGCCTT- GCTAGCAGAAGC-3’and SNX-r-5’-pGCTTCTGCTAGCAAGGCCTTAGAAAA-3’). Microsatellite probes (5’-biotinylated) were used to capture fragments containing microsatellite repeats on streptavidin-coated beads. Microsatellite-enriched single stranded DNA was amplified using the SNX-f primer to recover double-stranded DNA. PCR amplification products of clones (TOPO 2.1, Invitrogen) were sequenced (ABI 3730 Genetic Analyser) and screened for microsatellites. Primers for candidate microsatellites were designed using Primer 3  .
Parental fish (n = 6) and 95 F2 embryos from each of three genetic crosses (total n = 291) were genotyped with the microsatellites developed herein, as well as with previously published markers including microsatellites  -  , and SNPs   . Polymerase chain reactions and genotyping were performed according to Waits and Nebert, 2011  , for microsatellite markers, and according to Williams and Oleksiak, 2011  , for SNPs.
2.4. Linkage Mapping
Linkage groups were identified based on estimated two-point logarithm of odds (LOD) scores and Haldane distances. For each genetic cross, a minimum LOD score of 3.0 and a maximum recombination rate (rf) of 0.5 were used as threshold values to determine linkage groups  . Within each linkage group, marker order was determined by examining all pairs of markers, starting with the pair having the highest maximum likelihood and incrementally adding markers (most strongly linked first) always choosing the highest log likelihood and the best insertion point . Linkage maps were then improved and validated by perturbation of the initial maps in a systematic manner  . First, the “flips” algorithm was used to evaluate all possible permutations of marker order in a sliding window of fixed size (window = 5) on the initial map. Initial maps were replaced with improved maps as identified  . These maps were then interrogated by “polishing,” which displaces each marker in all possible intervals in search of a map having a better log-likelihood  . For the consensus map, markers in common among the three families were used to anchor an integrated map. Linkage groups were identified and an initial framework map was built using stringent criteria (LOD > 10, rf < 0.2), which were then incrementally relaxed (to LOD ≥ 3.0, rf ≤ 0.5). Marker order within each consensus linkage group was determined as described above.
2.5. Marker Identity
To identify microsatellite and SNP marker identities, an annotated draft of the F. heteroclitus genome (supported by National Science Foundation grant, DEB-1120512) was queried. Specifically, marker sequences were assigned to annotated killifish gene models by performing nucleotide BLAST searches against the killifish genome sequence. Markers were assigned a gene name when the marker sequence fell within an annotated gene model and the match between the marker sequence and the gene model sequence spanned at least 150 bp and exceeded 96% identity.
2.6. Comparative Genome Analysis
Syntenic relationships were explored for the consensus linkage map against two other teleost species for which an annotated genome assembly is available: medaka (Oryzias latipes)  and zebrafish (Danio rerio)  . Synteny with Tilapia (Oreochromis niloticus) was also explored, however, the structure of tilapia genome is less well defined precluding the comparison. Orthologs were assigned by reciprocal BLAST and predicted proteins from the annotated killifish gene models were compared to the zebrafish and medaka non-redundant protein databases (NCBI). Orthology was confirmed if the BLAST search of zebrafish/medaka genes against the killifish genome returned the original killifish gene model as the primary hit and exhibited an e-value < 1e−5. To validate and augment the list of orthologs, an additional query was performed by blasting the individual marker sequences against sequence data for D. rerio and O. latipes (Ensembl).
3.1. Novel Microsatellite Markers
Approximately 1200 clones were sequenced from the microsatellite-enriched libraries and surveyed for tandem repeats with sufficient flanking sequence for oligonucleotide design. One-third of the sequences contained di-, tri-, or tetra-nucleotide repeats that met these criteria. Of these, 330 candidate loci were selected for primer design and evaluated for amplification success and polymorphism in the parental generation used to establish genetic crosses. In the parental generation of the three genetic crosses, 59, 65, and 94 loci, respectively, proved polymorphic and Informative. A X2 (chi-square) goodness-of-fit test for deviation from expected 1:2:1 Mendelian ratios identified a small proportion of markers showing segregation distortion (Supplementary Data, Table 1(a)) in the 285 F2 larvae genotyped (following Bonferonni correction, α = 0.00015). Moderate departures from these frequencies are not unusual and may indicate the presence of partially lethal alleles. Gross departures from these frequencies may indicate problematic markers or null alleles  .
3.2. Linkage Mapping
Linkage maps were constructed for each of the three mapping families. A total of 75, 88, and 105 loci clustered into 16, 21, and 23 linkage groups (Lg), respectively (Supplementary Data, Table 1(b)). Genetic loci incommon among the three families were used to anchor and join the individual maps into one integrated consensus map (Figure 1). At a threshold of LOD ≥ 3.0 and a recombination frequency ≤ 0.5, twenty-four linkage groups were identified. Together, the 24 linkage groups contain 240 loci covering 2300 cM total. Individual linkage groups range from 35 to 195 cM with an average marker resolution of 9.5 cM.
Overall, the integrated consensus map is in good agreement with the maps based on individual genetic crosses. Marker order is conserved between the consensus map and individual maps with few exceptions. Exceptions include the relative locations of Fhe_2179 and Fhe_495 in cross A, which are reversed relative to cross B, C, and the consensus map. This discrepancy is likely due to heterogeneity in recombination rates between source popula- tions and the extreme segregation distortion of Fhe_2179 (p = 2.27E−10). Other loci with inconsistent positioning due to segregation distortion in at least one of the three genetic crosses include: Fhe_2254 (p = 5.88E−19), Fhe_2277 (p = 7.31E−06), and Fhe_2027 (p = 2.50E−06). Inter-marker distances in the consensus map are
Figure 1. Consensus genetic linkage map for Fundulus heteroclitus. Detailed information for the microsatellite and SNP markers (gene, DNA sequence, map position, source publication) used to construct linkage maps can be found in Supplementary Data, Table 1. All novel microsatellite loci reported herein can be accessed at GenBank at the NCBI (GenBank: KT001557-KT001910).
also consistent (±5 cM) with individual based maps barring three exceptions: xTC15606_316 and xTC19271_848 (Lg-9 consensus, Lg-7 genetic cross B) are separated by 35.6 cM on the consensus map and 15.5 cM on the individual map; Fhe_2151 and Fhe_2133 (Lg-19 consensus, Lg-13 genetic cross C) are separated by 84 cM on the individual map and 49.6 cM on the consensus map; and xTC21107_340 and Fhe_2069 (Lg-5 consensus, Lg-9 genetic cross C) are separated by 20.5 cM on the consensus map and 13.8 cM on the individual map.
3.3. Comparative Genome Analysis
Of the 240 loci comprising the consensus linkage map, 181 loci fell within a gene model in the draft Fundulus genome (Supplementary Data, Table 1(a)) and were assigned identities. Four additional loci were named based on significant hits to the zebrafish and medaka genome assemblies. In order to perform a comparative genome analysis, zebrafish and medaka orthologs and their genomic coordinates were identified (NCBI; non- redundant protein database) based on the predicted protein sequences of the Fundulus gene models. The identified homologous genes were used to determine homologous chromosomes in medaka and zebrafish (Figure 2). Comparing killifish and medaka, 10 killifish linkage groups (2, 20, 10, 9, 6, 11, 21, 3, 16, 1) exhibit 1:1 relationships with medaka chromosomes and for 7 other linkage groups (5, 19, 7, 4, 17, 15, 8, 22), >75% of killifish loci fall on a single medaka chromosome with only a single locus on a second medaka chromosome. For LG4, 7 loci fall on medaka CHR10, 2 on CHR18, and one on CHR21. Only one medaka chromosome (CHR2) does not share homologous loci with the consensus Fundulus linkage map. A comparative coordinate analysis of loci on the Fundulus linkage map and medaka genome assembly reveal conservation of gene order for approximately 70% of shared loci as well as multiple rearrangements (Figure 3).
Figure 2. Homologous linkage map-to-chromosomal relationships between F. heteroclitus and medaka (a) or zebrafish (b). In each case, the F. heteroclitus linkage groups (LG) are displayed in rows and the zebrafish/medaka chromosomes are displayed in columns. The numbers in each cell represent number of homologous loci between the F. heteroclitus consensus linkage map and the other model fish species genomes.
Figure 3. Comparison of gene order between the F. heteroclitus linkage map and the medaka or zebrafish genomes. Map distances (cM) and genome coordinates (Mb) are the accumulated sum of the F. heteroclitus linkage groups (Lg1-Lg24) and additive physical length for medaka chromosomes 1-24 (a) and zebrafish chromosomes 1-25 (b). Each data point represents a gene and its Cartesian coordinates. An abrupt change in coordinates between adjacent genes indicates a rearrangement.
Homologous chromosomes in zebrafish are less apparent. Three killifish linkage groups (12, 8, 22) exhibit 1:1 relationships with zebrafish chromosomes and for 7 other linkage groups (2, 9, 15, 4, 7, 16, 10), >65% of killifish loci fall on a single medaka chromosome with multiple loci located on up to three additional zebrafish chromosomes (Figure 2). Overall, at least 22 of 25 zebrafish chromosomes contain loci from between two and five Fundulus linkage groups. A comparative coordinate analysis of loci of the Fundulus linkage map and zebrafish genome assembly reveals a higher proportion of rearrangements in gene order (Figure 3) than observed when comparing to Medaka.
The rapid advancement of molecular-genetic technologies has made genome-wide quantitative genetic and population genetic surveys readily assessable. Molecular markers distributed across the genome can be used to assess critical ecological and evolutionary issues such as resilience, local adaptation, population connectivity, and invasion. In this study we report several novel co-dominant microsatellite markers which we have leveraged in combination with many previously published molecular markers to construct the first genetic linkage map for the small estuarine fish model, Fundulus heteroclitus. By integrating independent linkage maps from three F2- intercrosses, we developed a consensus map containing twenty-four linkage groups, consistent with the number of chromosomes (2N = 48) reported for F. heteroclitus   . Comparing the individual and consensus linkage maps, there are minor inconsistencies in relative marker position in approximately 4% of loci. The recombinant larvae genotyped from each family to create the linkage maps are descendant from parental generations established by crossing males and females from two different populations. Since the recombination histories of different populations may vary, slight inconsistencies in recombination frequencies when comparing the maps are expected  . This is not an uncommon phenomena   and does not compromise the utility of a composite map   . Differential recombination rates between sexes are also common, with females often having more recombination and longer genetic maps than males  . However, the recombinant individuals were not sexually mature when genotyped precluding us from ascertaining male-female linkage variance among loci.
Overall, the 24 Fundulus linkage groups represent 2300 centimorgans (cM) of recombinant genomic space, intermediate in size relative to the current linkage maps for medaka (1401.5 cM)  and zebrafish (3192 cM)  . A large degree of synteny is observed between the consensus killifish linkage map and the medaka or zebrafish physical genome assemblies with the highest degree of syntenty between medaka and Fundulus (Figures 2-4).
The higher degree of synteny between medaka and Fundulus genome organization is consistent with their taxonomic classifications. All three teleost species are euteleostein with Fundulus and medaka containing 24 chromosomes (2N = 48) and representing the subdivision neoteleostei versus zebrafish with 25 chromosomes (2N = 50) in the subdivision ostariophysi. Several comparative phylogenetic  and phylogenomic   assessments of teleost fish support this taxonomic structure and relative evolutionary distances. Earlier theories explaining genome evolution suggest synteny is an artifact of ancestral linkages that have not yet been disrupted by random micro- and macro-rearrangements such as indels, inversions and translocations  . It has since become apparent that gene distribution in eukaryotic genomes is not always random  . For example, co-regu- lated genes and their regulatory elements often exhibit strict conservation of both gene order and linkage  -  .
Teleost ﬁsh species provide an outstanding model to study evolution at the genetic and genomic levels. Their genomes exhibit considerable plasticity in size, chromosome number   , and in the expression of phenotypic traits     . An extensive molecular tool-box exists for zebrafish and the Japanese medaka including annotated genome assemblies, genetic maps   , and commercially available microarray technologies. In both species, gene function can be studied routinely in vivo via trans-genesis, “morpholino” gene- knockdowns, and large-scale mutagenesis   . Like zebrafish and medaka, genetic and genomic resources for F. heteroclitus are emerging at rapid pace. The novel polymorphic microsatellite loci and genetic map reported herein will contribute to the growing F. heteroclitus molecular toolbox. For example, the microsatellites
Figure 4. An example of syntenic relationships between F. heteroclitus linkage groups and their respective homologous chromosomes in the zebrafish and medaka genomes.
have utility for population genetic studies and the genetic map can serve as a tool for quantitative trait analysis (QTL) of the adaptive variation observed in F. heteroclitus  by localizing chromosomal regions containing candidate genes contributing to phenotypic traits.
In conclusion, our genetic linkage map, in conjunction with other emerging genetic and genomic tools, will elevate the status of F. heteroclitus within the ranks of small fish genetic models.
We thank Marjorie Oleksiak for contributing single nucleotide polymorphism reagents for inclusion on the linkage map. We also thank Jeffery Markert for thoughtful comments which lead to the improvement of this manuscript. Data interpretation was aided by reference to a preliminary draft of the F. heteroclitus genome sequence which was supported by funding from the National Science Foundation (DEB-1120512). The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Supplementary Data Table 1(a) and Table 1(b) can be viewed online at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/JNZKCJ