Research Article

Journal of Agricultural, Life and Environmental Sciences. 31 December 2025. 307-322
https://doi.org/10.22698/jales.20250026

ABSTRACT



Introduction

Doubled haploid (DH) technology enables the rapid development of completely homozygous (2n) lines by generating haploids (n) through artificial crossings with a haploid-inducer line, followed by chromosome doubling (Prasanna et al., 2012). The first widely used inducer, Stock 6, exhibited a 2.9% haploid induction rate (Coe, 1959). Subsequent advances, including genome editing, led to the development of 24 improved inducer lines, including 19SN952452, which has an induction rate exceeding 15% (Delzer et al., 2024; Ha et al., 2024). In South Korea, tropically adapted inducer lines (TAILs), which are subtropical DH inducers, have been introduced for maize variety development and used to develop field, silage, and waxy maize lines (Ryu et al., 2022).

When integrated with genome-wide analysis, DH technology has become a powerful tool for breeding and selection. Large-scale SNP datasets enable detailed assessment of linkage disequilibrium (LD) patterns, haplotype structure, and overall genetic variation. However, several studies have reported reduced genetic variation, including allele loss, in DH-derived populations (Strigens et al., 2013; Zeitler et al., 2020). Although inducing DH lines from F2 rather than F1 plants can partially alleviate these bottlenecks, offering cumulative selection responses as high as 4%-6% (Bernardo, 2009), this does not necessarily translate into higher heritability or genetic gain (Showkath Babu et al., 2022). Thus, molecular-level evaluation of DH lines is essential within breeding programs. Despite this need, molecular genetic analyses of DH waxy maize lines are limited, which restricts the development of the core SNP marker sets needed for efficient variety management and genetic diversity assessment. Given the large volume of SNPs generated in modern genomic workflows, selecting a small, representative set of core SNP markers is essential for practical use in breeding, germplasm characterization, and resource management (Du et al., 2019; Fujii et al., 2013; Li et al., 2019; Varshney et al., 2008). Existing DH-based studies often examine isolated genetic metrics, such as LD decay, genetic distance, or population structure, without providing comprehensive whole-genome evaluations of genetic diversity, LD structure, and core marker selection strategies using large-scale SNP datasets. Furthermore, most previous studies have relied on field corn or experimental DH populations, limiting the relevance of their findings to domestically bred Korean waxy maize DH resources.

Therefore, in this study, we aimed to overcome these limitations by characterizing LD extension and diversity reduction in DH-derived waxy maize lines. Furthermore, we sought to identify a practical, representative core SNP marker set from genome-wide SNP data. To achieve this, we quantified population genetic characteristics of DH lines, compared three marker selection algorithms, and evaluated their applicability in breeding programs.

Materials and Methods

Plant Materials

A total of 155 waxy maize lines were used in this study, including 26 elite lines used for variety development (Ilyas et al., 2023; Park et al., 2009; Sa et al., 2010), 18 inbred lines derived from selfed full-sib heterotic group populations (Park et al., 2016), and 111 lines developed through DH technology (Table 1). In 2016, DH lines were induced from the hybrid Mibaek 2 (Park et al., 2007) and from Gangwonchal 39. In 2017, segregating generations derived from the half-sib heterotic group served as breeding material. In 2018, hybrid combinations classified as resistant or susceptible based on inoculation with Fusarium subglutinans and F. temperatum, the causal agents of corn stalk rot, were used to generate additional DH lines.

Table 1

Summary of the 155 waxy maize lines used for SNP selection

Group | Number of lines | Source
Elite lines | 10 | Quality lines
Elite lines | 16 | HW lines, KL103
Inbred | 2 | Purple and black inbreds
Inbred | 8 | WaxyPop 09 (FS) A
Inbred | 8 | WaxyPop 09 (FS) B
16DHW | 6 | Gangwonchal#39
16DHW | 23 | Mibaek#2
17DHW | 30 | WaxyPop (HS) A
17DHW | 28 | WaxyPop (HS) B
18DHW | 24 | Stalk rot population (HW9/11)
Total | 155 |

DNA extraction and genotyping data acquisition

Young leaf tissue collected from pot-grown plants was freeze-dried (IlShinBioBase, South Korea) and homogenized using a TissueLyser II (Qiagen, Germany), and genomic DNA was extracted on the QIAsymphony SP platform with the QIAsymphony DSP DNA Mini Kit (Qiagen) according to the manufacturer’s instructions. Genotyping was performed using the Illumina MaizeSNP50K BeadChip (Illumina, USA). After manufacturer-recommended filtering and removal of SNPs not mapped to waxy maize chromosomes, 49,037 SNPs were obtained. Further filtering excluded markers with ≥ 10% missing data, monomorphic loci, and SNPs with ≥ 20% heterozygosity, resulting in a final dataset of 21,981 SNPs for downstream analysis.
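The three post-chip filters above can be expressed compactly. The sketch below is a minimal Python illustration, not the authors' pipeline; it assumes genotypes are held in a lines × SNPs array coded 0/1/2 (copies of one allele) with np.nan for missing calls, and the function name `snp_qc_mask` is hypothetical.

```python
import numpy as np

def snp_qc_mask(geno, max_missing=0.10, max_het=0.20):
    """Boolean mask of SNPs (columns) that pass the three QC filters.

    geno: lines x SNPs array coded 0/1/2 (copies of one allele), np.nan = missing.
    A SNP is dropped if its missing rate is >= max_missing, if it is
    monomorphic, or if its heterozygosity is >= max_het.
    """
    geno = np.asarray(geno, dtype=float)
    n_lines = geno.shape[0]
    n_called = (~np.isnan(geno)).sum(axis=0)
    missing_rate = (n_lines - n_called) / n_lines
    with np.errstate(invalid="ignore", divide="ignore"):
        # Allele frequency and heterozygosity among called genotypes only
        p = np.nansum(geno, axis=0) / (2.0 * n_called)
        het_rate = (geno == 1).sum(axis=0) / n_called
    polymorphic = (p > 0) & (p < 1)   # NaN (no calls at all) compares False
    return (missing_rate < max_missing) & polymorphic & (het_rate < max_het)
```

Applying `geno[:, snp_qc_mask(geno)]` keeps only the passing SNP columns. The thresholds are strict inequalities so that a SNP at exactly 10% missing data or 20% heterozygosity is removed, matching the "≥" wording above.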

Marker selection methods

To select core SNP markers suitable for waxy maize, three previously reported marker selection strategies were applied and compared. First, the index-based method was implemented using the Shannon entropy formula (Dou et al., 2023).

$$H = -\sum_{i=1}^{N} p_i \ln(p_i)$$

In this formula, N denotes the number of alleles or genotypes, and p_i represents the frequency of the i-th allele or genotype. Haplotypes were constructed iteratively from candidate SNPs, and the stepwise increase in the Shannon index (SI) was used to evaluate each newly added marker: at each iteration, each remaining candidate marker was added in turn to the current haplotype set, and the marker producing the greatest increase in SI was selected as the next core marker. Candidate SNPs whose addition did not increase the number of distinguishable haplotypes were excluded to improve algorithmic efficiency. The procedure terminated when all line pairs were distinguished or when the index no longer increased.

The second method, described by Wu et al. (2021), used random sampling of SNPs to calculate Euclidean genetic distances among all lines and to identify subsets that maximized the minimum pairwise genetic distance (random sampling with genetic distance, RS). The number of markers (k) ranged from 10 to 100. For each k, 6.0 × 10⁵ subsets were sampled, increasing to 1.0 × 10⁷ subsets for k ≥ 20. If classification power did not improve after a predefined number of iterations, the algorithm moved to the next value of k; when no further improvement was observed with increasing k, the best-performing subset was selected.

The third method employed a fitness function incorporating genetic distance (gd) and physical distance (pd) to distribute markers uniformly across chromosomes (Rousselle et al., 2015). Genetic map positions from the IBM and LHRF populations in MaizeGDB were used, and only SNPs with both physical and genetic coordinates were retained. To avoid bias introduced by specific cross combinations, polymorphisms present only in HW3 and HW9 were excluded. The optimization was iterated to identify the marker panel that consistently minimized the fitness value.
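The Shannon-index (SI) procedure described above is a greedy forward search. The following Python sketch is a hypothetical re-implementation under simplifying assumptions (complete genotypes, or missing calls coded with one fixed placeholder such as -1; ties broken by the lower SNP index); it is not the CoreSNP pipeline of Dou et al. (2023), and the helper names are illustrative.

```python
import math
from collections import Counter

def shannon_index(haplotypes):
    """H = -sum_i p_i ln(p_i) over the frequencies of distinct haplotypes."""
    n = len(haplotypes)
    return -sum((c / n) * math.log(c / n) for c in Counter(haplotypes).values())

def greedy_si_selection(geno):
    """Greedy forward selection of core SNPs by Shannon-index gain.

    geno: list of per-line genotype lists (lines x SNPs).
    Returns the indices of the selected SNPs in the order they were added.
    """
    n_lines, n_snps = len(geno), len(geno[0])
    haps = [()] * n_lines                 # current multi-locus haplotype per line
    selected, current_h = [], 0.0
    candidates = set(range(n_snps))
    while candidates and len(set(haps)) < n_lines:
        n_distinct = len(set(haps))
        best, best_h = None, current_h
        for j in sorted(candidates):
            trial = [h + (geno[i][j],) for i, h in enumerate(haps)]
            if len(set(trial)) == n_distinct:
                # j is constant within every current haplotype class, so it can
                # never split a class later either: drop it permanently.
                candidates.discard(j)
                continue
            h = shannon_index(trial)
            if h > best_h:
                best, best_h = j, h
        if best is None:                  # no remaining SNP raises the index
            break
        selected.append(best)
        candidates.discard(best)
        haps = [h + (geno[i][best],) for i, h in enumerate(haps)]
        current_h = best_h
    return selected
```

The permanent exclusion step is safe because a SNP that splits none of the current haplotype classes is constant within each class, and later classes are only refinements of the current ones.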

Statistical Analysis

For each SNP, the frequency p of one allele was estimated after excluding missing genotypes, and q = 1 - p. Minor allele frequency was defined as MAF = min(p, q), polymorphism information content as PIC = 2pq(1 - pq), and expected heterozygosity as He = 2pq. We calculated the mean number of alleles per group and identified group-specific alleles, that is, alleles found only within a given group. Population differentiation (Fst) was estimated with the Weir and Cockerham (1984) method using the hierfstat package in R (R Core Team, 2023). For phylogenetic analysis, a genotype matrix was constructed for each marker set. Inter-line genetic identity (I) was calculated and transformed to -ln(I) to obtain Nei’s genetic distance (Nei, 1972). Neighbor-joining (NJ) trees were constructed from the resulting distance matrix using the ape package in R. To validate the NJ results, we calculated the nearest-neighbor purity (NN-purity), defined as the proportion of tips whose nearest neighbor belonged to the same group (Degen et al., 2017). A null distribution was generated from 1,000 random permutations of group labels, from which the mean, standard deviation, and Monte Carlo p-value were estimated. For principal coordinate analysis (PCoA), Nei’s genetic distance matrix was analyzed by eigenvalue decomposition using the ape package. The variance explained by positive eigenvalues was calculated, and the first two coordinates (PCoA1 and PCoA2) were used for visualization. When negative eigenvalues were substantial, a Cailliez or Lingoes correction was applied.
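For a biallelic SNP, the PIC expression above follows from 1 - (p² + q²) - 2p²q² = 2pq - 2p²q² = 2pq(1 - pq). A short Python sketch of the per-SNP metrics, assuming a 0/1/2 genotype matrix with np.nan for missing calls (the function name is hypothetical), is:

```python
import numpy as np

def diversity_metrics(geno):
    """Per-SNP MAF, PIC and He for a lines x SNPs 0/1/2 matrix (np.nan = missing)."""
    geno = np.asarray(geno, dtype=float)
    n_called = np.sum(~np.isnan(geno), axis=0)
    p = np.nansum(geno, axis=0) / (2.0 * n_called)  # frequency of the counted allele
    q = 1.0 - p
    maf = np.minimum(p, q)
    he = 2.0 * p * q
    pic = 2.0 * p * q * (1.0 - p * q)   # biallelic PIC = 1 - (p^2 + q^2) - 2 p^2 q^2
    return maf, pic, he
```

Group-level summaries of the kind reported later would be locus-wise means over the rows of one group, for example `np.nanmean(diversity_metrics(geno[group_rows])[2])` for He, where `group_rows` is a hypothetical index array for that group.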

LD was calculated for all pairs of SNPs on the same chromosome separated by up to 5 Mb. To reduce the computational load for visualization, distance-based bin subsampling was applied (10-kb bins, up to 300 pairs per bin). The SNP matrix was processed in blocks of 512 SNPs, and pairwise r² was calculated as the squared Pearson correlation coefficient between genotype vectors, retaining only SNP pairs with at least 12 samples genotyped at both loci. Haplotype frequencies were estimated using the expectation-maximization (EM) algorithm, and statistical significance was assessed with Fisher’s exact test followed by Benjamini–Hochberg correction. LD decay was visualized by plotting median r² values within 50-kb physical distance bins. Mean LD distance, the average physical distance between SNP pairs with r² ≥ 0.10, was summarized in heatmaps across groups and chromosomes. SNP-specific MAF values were matched to the LD values to compare LD decay under increasing MAF thresholds (≥ 0.00, 0.05, 0.10, and 0.20) in the 18DHW group. Group-wide LD distributions were visualized using violin plots to compare medians and variances. All numerical statistics (median r² and mean LD distance) were derived from full-matrix calculations, whereas the figures present subsampled summaries (Benjamini and Hochberg, 1995; Pe’er et al., 2006; Remington et al., 2001; Stephens et al., 2001; VanLiere and Rosenberg, 2008). All data analyses were conducted in R.
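The r² definition and the LD-decay summary above can be sketched as follows. This is an illustrative Python version, assuming genotypes coded 0/1/2 with np.nan for missing calls; it computes one pair at a time rather than in 512-SNP blocks and omits the EM haplotype estimation and the Fisher/Benjamini–Hochberg testing. Function names are hypothetical.

```python
import numpy as np

def pairwise_r2(x, y, min_valid=12):
    """Squared Pearson correlation of two SNP genotype vectors (0/1/2, np.nan = missing).

    Returns np.nan when fewer than `min_valid` lines are called at both SNPs or
    when either SNP is constant among those lines (r is then undefined).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~np.isnan(x) & ~np.isnan(y)
    if ok.sum() < min_valid:
        return np.nan
    xs, ys = x[ok], y[ok]
    if xs.std() == 0 or ys.std() == 0:
        return np.nan
    return float(np.corrcoef(xs, ys)[0, 1] ** 2)

def ld_decay(dist_bp, r2, bin_bp=50_000):
    """Median r^2 per physical-distance bin, keyed by the bin's lower edge (bp)."""
    dist_bp = np.asarray(dist_bp)
    r2 = np.asarray(r2, dtype=float)
    keep = ~np.isnan(r2)                       # skip pairs that failed the filters
    d, v = dist_bp[keep], r2[keep]
    bins = (d // bin_bp) * bin_bp
    return {int(b): float(np.median(v[bins == b])) for b in np.unique(bins)}
```

A block-wise implementation would vectorize the correlation over a 512-SNP slice; the per-pair version is kept here for readability.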

Results and Discussion

The SNP markers used in this study ranged from 1,485 to 3,485 per chromosome after filtering. Chromosome 1 contained the highest number of markers (3,485), whereas chromosome 10 contained the lowest (1,485). The physical distance between adjacent markers ranged from 86 to 100 kb, with an average of 94 kb. Mean heterozygosity across chromosomes was 1.4%, with chromosome 4 showing the highest value (2.0%) and chromosome 6 the lowest (1.0%) (Table 2).

Table 2

Chromosome-wide SNP filtering and genomic statistics in waxy maize lines

Chromosome | Number of raw SNPs | Number of filtered SNPs | Mean distance of filtered SNPs (bp) | Heterozygosity rate (%)
1 | 7,783 | 3,485 | 86,397 | 1.3
2 | 5,667 | 2,497 | 95,216 | 1.6
3 | 5,525 | 2,509 | 92,387 | 1.2
4 | 5,411 | 2,428 | 99,612 | 2.0
5 | 5,349 | 2,500 | 87,096 | 1.1
6 | 3,935 | 1,757 | 96,272 | 1.0
7 | 4,062 | 1,890 | 93,183 | 1.4
8 | 4,231 | 1,864 | 94,053 | 1.6
9 | 3,598 | 1,566 | 99,923 | 1.4
10 | 3,476 | 1,485 | 100,391 | 1.2
Overall 1) | 49,037 | 21,981 | 94,453 | 1.4

1)Overall: Raw and filtered SNP counts represent totals across chromosomes. Mean SNP distance and heterozygosity rate represent averages across chromosomes.

Among the genetic diversity metrics, elite lines showed the highest number of group-specific alleles (1,432) and the greatest expected heterozygosity (He = 0.315) (Table 3). He was also relatively high in the inbred (heterotic full-sib) group (0.268) and the 17DHW group (0.273). In contrast, the 18DHW population, derived from crosses between stalk rot-resistant and susceptible lines, had the lowest MAF, PIC, and He of all groups, although its number of group-specific alleles (112) was the second highest.

Table 3

Genetic diversity metrics across groups. Values represent locus-wise means for MAF, PIC, He, and the number of group-specific alleles

Group MAF PIC He Specific alleles
Elite line 0.234 0.254 0.315 1,432
Inbred 0.202 0.214 0.268 51
16DHW 0.160 0.174 0.216 97
17DHW 0.204 0.219 0.273 91
18DHW 0.152 0.141 0.182 112

Abbreviations: MAF: minor allele frequency; PIC: polymorphism information content; He: expected heterozygosity. Specific alleles refer to alleles observed exclusively within each group.

Phylogenetic analysis based on Nei’s genetic distance revealed that most lines formed well-defined, cohesive clusters (Fig. 1). Elite lines were broadly separated into two groups, whereas the inbred lines were more widely dispersed and generally positioned between the DH groups. Some elite and inbred lines clustered very closely on the tree and presumably originated from a common source during line development. The NN-purity was 86.5%, significantly higher than the permutation baseline of 23.5% ± 3.9% (mean ± SD; p < 0.001), strongly supporting a well-defined, robust group structure.

https://cdn.apub.kr/journalsite/sites/ales/2025-037-04/N0250370404/images/ales_37_04_04_F1.jpg
Fig. 1.

Neighbor-joining tree of 155 inbred lines based on Nei’s genetic distance. The nearest-neighbor purity was 86.5%, substantially exceeding the permutation baseline (23.5% ± 3.9%; 1,000 label permutations), indicating strong, non-random clustering consistent with group structure.
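The NN-purity test described in the Methods can be sketched in a few lines. In this Python illustration, the nearest neighbour of each line is taken from a pairwise distance matrix (for example Nei's distance, or patristic distances read off the NJ tree; which one was used is an assumption here), and the Monte Carlo p-value uses the common (1 + b)/(1 + n_perm) form, also an assumption. The function name is hypothetical.

```python
import numpy as np

def nn_purity_test(dist, labels, n_perm=1000, seed=0):
    """NN-purity and a label-permutation null distribution.

    dist: n x n symmetric distance matrix; labels: group label of each line.
    Returns (observed purity, null mean, null SD, Monte Carlo p-value).
    """
    d = np.array(dist, dtype=float)        # copy so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)            # a tip cannot be its own neighbour
    nn = d.argmin(axis=1)                  # ties resolved by the lowest index
    labels = np.asarray(labels)

    def purity(lab):
        # Fraction of lines whose nearest neighbour carries the same label
        return float(np.mean(lab[nn] == lab))

    observed = purity(labels)
    rng = np.random.default_rng(seed)
    null = np.array([purity(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(null.mean()), float(null.std(ddof=1)), float(p_value)
```

Because the neighbour structure depends only on the distances, it is computed once and reused for every permutation; only the labels are shuffled.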

The elite and inbred groups showed rapid LD decay, whereas the 18DHW group maintained r² values around 0.5 even at distances up to 5 Mb, a pronounced long-range LD pattern (Fig. 2a). To examine whether allele-frequency differences contributed to this extended LD in the 18DHW population, we evaluated LD estimates under increasing MAF thresholds (Fig. 2b). Group-wide LD distributions were also summarized to compare LD patterns among groups (Fig. 2c). To confirm that these patterns were not artifacts of relatedness among lines or of LD-induced dependence among SNPs, identity-by-descent (IBD) was estimated using an EM-based maximum likelihood method with an LD-pruned SNP set (r² ≤ 0.20). The maximum estimated pairwise IBD value (π) reached 0.997. PCA coordinates before and after LD pruning showed strong concordance (Procrustes t0 = 0.971), indicating that the inferred population structure is robust despite elevated pairwise IBD and LD-related dependence. Even when the MAF threshold was increased in the 18DHW group, the overall LD decay curve remained largely unchanged, and the violin plot showed consistently high r² values across the distribution, indicating that many SNP pairs exhibit strong LD. These results suggest that the long-distance LD observed in this population cannot be attributed solely to statistical fluctuations arising from rare variants and is instead consistent with underlying genetic features of the group (Flint-Garcia et al., 2003; Remington et al., 2001). More detailed analyses, such as removing duplicate genotypes, explicitly modeling relatedness using IBD estimates, or clumping physically adjacent SNPs, may be necessary to investigate this extended LD architecture further.

https://cdn.apub.kr/journalsite/sites/ales/2025-037-04/N0250370404/images/ales_37_04_04_F2.jpg
Fig. 2.

Linkage disequilibrium (LD) patterns across groups. (a) LD decay curves showing the median pairwise r² within 50-kb physical distance bins for each group; the horizontal dashed line indicates r² = 0.10 as a reference threshold. (b) LD decay in the 18DHW group stratified by minor allele frequency (MAF) thresholds (MAF ≥ 0.00, 0.05, 0.10, 0.20), illustrating the expected increase in r² under stricter MAF cutoffs. (c) Violin plots depicting the distribution of pairwise r² values in each group; embedded boxplots indicate the median (central line) and interquartile range (box).
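The Procrustes concordance reported above compares PCA coordinates computed before and after LD pruning. A minimal Python sketch of a symmetric Procrustes correlation is given below; we assume that the reported t0 corresponds to this statistic (as reported by the protest function of the R package vegan), and the permutation test for its significance is omitted. The function name is hypothetical.

```python
import numpy as np

def procrustes_t0(a, b):
    """Symmetric Procrustes correlation between two ordinations of the same lines.

    Both configurations are centred and scaled to unit Frobenius norm; the best
    rotation/reflection of b onto a then gives a correlation equal to the sum of
    the singular values of a.T @ b (1 = identical up to rotation, reflection,
    scale and shift).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.linalg.svd(a.T @ b, compute_uv=False).sum())
```

Allowing reflections matters for PCA comparisons, because the sign of each principal axis is arbitrary between runs.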

Analysis of the mean LD distance per chromosome showed that the elite and inbred groups had shorter mean LD distances, ranging from 2,200 to 2,298 kb, whereas the 18DHW group showed the largest mean LD distance, reaching 2,438 kb on chromosome 1 (Fig. 3). Hao et al. (2015) reported LD lengths of 1,500-2,000 kb in a Chinese waxy maize population, which is comparable to our results. Previous studies have also shown LD expansion in DH lines derived from maize landraces, in which genetic diversity was markedly reduced by allele loss at both the nucleotide and haplotype levels (Zeitler et al., 2020). These findings are consistent with our results and suggest that the use of DH technology may contribute to LD expansion, underscoring the need to monitor and maintain diversity in breeding programs that rely heavily on DH lines. Although the 18DHW group maintained clear long-distance LD in the decay curve, its mean LD distance was only approximately 2,395 kb. This discrepancy reflects differences in the statistical properties of the two LD metrics and mirrors patterns reported in plant-population studies (Brazier and Glémin, 2022): the mean LD distance is kept relatively low by the abundance of short-range LD pairs (Remington et al., 2001), whereas strong LD is still maintained across certain long-distance genomic regions (Epstein et al., 2024; Palaisa et al., 2004). This phenomenon can be explained by reduced effective population size, inbreeding, or selection acting on genomic segments with low recombination rates (Bukowski et al., 2018). Therefore, selecting an optimized marker set requires strategies that reduce SNP redundancy caused by long-range LD while preserving marker information content (Carlson et al., 2004; Takeuchi et al., 2005) and minimizing the total number of markers required (Paschou et al., 2010; Rosenberg et al., 2003).

https://cdn.apub.kr/journalsite/sites/ales/2025-037-04/N0250370404/images/ales_37_04_04_F3.jpg
Fig. 3.

Heat map of mean linkage disequilibrium (LD) distance (kb) by group and chromosome. Each tile represents the average physical distance between SNP pairs with r² ≥ 0.10. Numeric labels within tiles indicate the corresponding mean LD distance (kb).
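The mean LD distance summarized in this heat map can be computed directly from the pairwise results. The Python sketch below assumes one array entry per SNP pair (chromosome, physical distance in bp, r²), with np.nan for pairs that failed the sample-size requirement; the function name is hypothetical.

```python
import numpy as np

def mean_ld_distance_kb(chrom, dist_bp, r2, threshold=0.10):
    """Mean physical distance (kb) of SNP pairs with r^2 >= threshold, per chromosome."""
    chrom = np.asarray(chrom)
    dist_bp = np.asarray(dist_bp, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    strong = r2 >= threshold          # NaN compares False, so failed pairs are ignored
    out = {}
    for c in np.unique(chrom):
        sel = strong & (chrom == c)
        out[c.item()] = float(dist_bp[sel].mean() / 1000) if sel.any() else float("nan")
    return out
```

Because every pair above the threshold counts equally, many short-range pairs pull this mean down even when some long-range pairs remain in strong LD, which is the contrast with the decay curves discussed above.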

To characterize differences among groups, we estimated pairwise population differentiation (Fst) and Nei’s genetic distance (Table 4). The 18DHW and 16DHW groups were the most genetically distant (Nei’s distance = 0.285), whereas the 17DHW and inbred groups showed the lowest Fst and genetic distance (both 0.009), reflecting the shared breeding line used in their development. Although the 18DHW group had low within-group genetic diversity (Table 3), its high genetic distances and Fst values relative to the other groups suggest that genetic differentiation may have increased in response to selection for disease resistance. This pattern aligns with previous reports showing that repeated selection within a group decreases internal diversity while increasing differentiation from other groups (Ledesma et al., 2023). The elite group had the highest within-group genetic diversity (Table 3) but showed intermediate levels of differentiation from the other groups. This pattern is reminiscent of U.S. commercial maize breeding, where a limited set of founder lines was repeatedly used to develop improved varieties, so that founder haplotypes occur at high frequency across derived germplasm (Coffman et al., 2020; Van Heerwaarden et al., 2012). Collectively, these findings indicate that increasing genetic diversity should be a priority in breeding programs. For DH populations under strong selection pressure, incorporating external founder lines or haplotypes from different LD blocks will be essential to maintain diversity and ensure long-term breeding progress.

Table 4

Pairwise genetic differentiation (Fst) and genetic distance among elite lines, inbreds, and DH groups

Group Elite line Inbred 16DHW 17DHW 18DHW
Elite line - 0.036 0.138 0.050 0.185
Inbred 0.037 - 0.171 0.009 0.283
16DHW 0.136 0.166 - 0.180 0.283
17DHW 0.047 0.009 0.186 - 0.264
18DHW 0.187 0.277 0.285 0.280 -

Values above the diagonal are pairwise Fst estimates; values below the diagonal are Nei’s genetic distances
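Nei's (1972) standard distance in the lower triangle can be computed from group allele frequencies: with J_X and J_Y the mean (over loci) probability that two alleles drawn within group X or Y are identical, and J_XY the same probability for one allele from each group, I = J_XY / sqrt(J_X J_Y) and D = -ln(I). A minimal Python sketch for biallelic SNPs, assuming per-SNP allele frequencies are already available for each group (the function name is hypothetical), is:

```python
import numpy as np

def nei_distance(px, py):
    """Nei's (1972) standard genetic distance from per-SNP allele frequencies of two groups."""
    px = np.asarray(px, dtype=float)
    py = np.asarray(py, dtype=float)
    jx = np.mean(px ** 2 + (1 - px) ** 2)          # within-group identity, group X
    jy = np.mean(py ** 2 + (1 - py) ** 2)          # within-group identity, group Y
    jxy = np.mean(px * py + (1 - px) * (1 - py))   # between-group identity
    return float(-np.log(jxy / np.sqrt(jx * jy)))
```

Applying the same function to the allele frequencies implied by each line's own genotype (0, 0.5, or 1) would give a line-level distance of the kind used for the NJ trees; this per-line use is our assumption, not a documented step. The Weir and Cockerham Fst in the upper triangle is not reproduced here.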

This intergroup differentiation pattern suggests that core marker selection should prioritize group discrimination and genome-wide coverage over simple diversity metrics. Based on these molecular analyses, we applied three marker selection methods, namely the Shannon index (SI), random sampling with genetic distance (RS), and map-based balanced (bin) approaches, to derive highly informative SNP panels.

Ten markers were selected through SI-based screening, which distinguished the haplotypes of all 155 lines (Fig. 4a). The RS method achieved a maximum classification ability of 99.98% (Fig. 4b). RS fell short of 100% because three line pairs (18DHW025-18DHW026, 16DHW01-16DHW04, and 16DHW25-16DHW29) were identical across all non-missing loci, and another pair (16DHW25–16DHW29) differed at only one SNP.

https://cdn.apub.kr/journalsite/sites/ales/2025-037-04/N0250370404/images/ales_37_04_04_F4.jpg
Fig. 4.

Discrimination saturation curves as a function of the number of SNPs (k). The y-axis shows the proportion of pairwise sample comparisons that are distinguishable. (a) Shannon index (SI) selection evaluated by haplotype separation. (b) Random sampling with genetic distance (RS) selection evaluated using Euclidean genetic distance. The dashed line indicates the maximum discrimination achieved for each curve (SI: 100.0%; RS: 99.98%).
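The RS procedure and the discrimination measure on the y-axis of Fig. 4 can be sketched as below. This Python illustration draws random k-SNP subsets and keeps the one with the largest minimum pairwise Euclidean distance, breaking ties by the proportion of distinguishable line pairs; the tie-break, the fixed number of draws, and the requirement for complete genotypes are assumptions, and the outer loop over k (10 to 100) with its stopping rule is left out. Function names are hypothetical.

```python
import numpy as np

def _pair_distances(sub):
    """Euclidean distances for all line pairs (upper triangle) of a lines x SNPs matrix."""
    diff = sub[:, None, :] - sub[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    return d[np.triu_indices(len(sub), k=1)]

def discrimination_rate(sub):
    """Proportion of line pairs that differ at one or more of the selected SNPs."""
    return float(np.mean(_pair_distances(np.asarray(sub, dtype=float)) > 0))

def random_sampling_select(geno, k, n_samples=1000, seed=0):
    """Best of `n_samples` random k-SNP subsets.

    Subsets are ranked by the minimum pairwise Euclidean distance, with the
    discrimination rate as a tie-break (useful when identical lines force the
    minimum to 0). geno must be complete (no missing values).
    """
    g = np.asarray(geno, dtype=float)
    rng = np.random.default_rng(seed)
    best_cols, best_key = None, (-1.0, -1.0)
    for _ in range(n_samples):
        cols = rng.choice(g.shape[1], size=k, replace=False)
        d = _pair_distances(g[:, cols])
        key = (float(d.min()), float(np.mean(d > 0)))
        if key > best_key:
            best_cols, best_key = np.sort(cols), key
    return best_cols, best_key
```

Pairs of lines that are identical at every SNP keep the discrimination rate below 1 for any subset, which is why the RS curve saturates below 100%.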

We visualized the physical positions of the markers selected by the SI, RS, and bin-based methods across all 10 chromosomes (Fig. 5). Because the SI and RS methods selected few SNPs, their markers covered only a subset of chromosomes, whereas the bin-based selection produced broad chromosomal coverage as the number of markers increased. These results demonstrate that the bin-based method has a particular advantage for genome-wide representativeness, while also highlighting that each selection method has distinct strengths and weaknesses in classification performance. Notably, markers selected by all three methods overlapped on chromosome 1, which may partly reflect the fact that chromosome 1 carries the largest number of filtered SNPs (3,485; Table 2). This biased distribution may also reflect chromosome-specific structural features, such as variation in recombination rate, gene density, or repetitive elements, and further investigation of these features could help explain the uneven distribution and clustering of selected markers. Overall, our results suggest that core SNP marker selection should balance strong classification performance with uniform genomic coverage to construct an informative and representative marker panel.

https://cdn.apub.kr/journalsite/sites/ales/2025-037-04/N0250370404/images/ales_37_04_04_F5.jpg
Fig. 5.

Chromosomal distribution of SNPs selected using each marker selection method: SI (n = 10), RS (n = 21), 15bin (n = 30), 20bin (n = 39), and 25bin (n = 47). Gray ticks indicate the filtered whole-genome SNP background; colored bars correspond to the SNPs selected using each method as shown in the legend.
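Coverage differences like those in Fig. 5 can be quantified with a simple summary: how many chromosomes a panel touches and the largest gap between adjacent selected markers on each chromosome. The following Python sketch assumes a panel given as (chromosome, position in bp) pairs on chromosomes numbered 1 to 10; it is an illustrative diagnostic, not part of the published selection methods, and the name is hypothetical.

```python
from collections import defaultdict

def panel_coverage(panel, n_chrom=10):
    """Summarise genome coverage of a marker panel of (chromosome, position_bp) pairs.

    Returns the number of chromosomes carrying at least one marker and, for each
    chromosome 1..n_chrom, the largest gap (bp) between adjacent selected markers
    (None when the chromosome has fewer than two markers).
    """
    by_chr = defaultdict(list)
    for chrom, pos in panel:
        by_chr[chrom].append(pos)
    max_gap = {}
    for chrom in range(1, n_chrom + 1):
        pos = sorted(by_chr.get(chrom, []))
        max_gap[chrom] = max((b - a for a, b in zip(pos, pos[1:])), default=None)
    return len(by_chr), max_gap
```

A panel such as SI (n = 10) would show many chromosomes with `None`, whereas the bin-based panels would fill in more chromosomes as the number of markers grows.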

To evaluate the selection efficiency of the marker panels, we calculated standard genetic diversity metrics (Table 5). The SI-based approach generated the smallest panel (10 markers) but had the highest values for all three metrics (MAF, PIC, and expected heterozygosity), because entropy-based selection favors markers with high MAF. The 21-marker RS panel showed intermediate values for all three metrics. The bin-based panels, which select markers solely on the basis of genetic and physical distance, showed lower diversity than the full filtered SNP set for the 15bin and 20bin panels and comparable diversity for the 25bin panel. The 20bin panel (39 markers) had the lowest values for all metrics, suggesting that distance-based selection inadvertently included SNPs with rare alleles or redundant LD information.

Table 5

Summary statistics for SNP marker sets obtained using each selection strategy. Values represent the number of selected markers and mean diversity indices (MAF, PIC, and expected heterozygosity (He)).

Method 1) | Number of markers | MAF | PIC | He
Filtered | 21,981 | 0.221 | 0.240 | 0.299
SI | 10 | 0.374 | 0.329 | 0.434
RS | 21 | 0.254 | 0.275 | 0.343
15bin | 30 | 0.176 | 0.209 | 0.253
20bin | 39 | 0.148 | 0.187 | 0.223
25bin | 47 | 0.208 | 0.245 | 0.299

1)Methods: SI: Shannon index-based selection; RS: random sampling with genetic distance maximization; 15bin/20bin/25bin: genome-wide binning by genetic/physical coordinates targeting 15, 20, or 25 bins. Abbreviations: MAF: minor allele frequency; PIC: polymorphism information content; He: expected heterozygosity.

We examined phylogenetic separation using NJ trees constructed from Nei’s genetic distance (Fig. 6). The SI panel provided good individual-level resolution but limited discrimination among groups, with frequent intermixing of elite and DH lines. By contrast, the RS and bin-based marker sets enabled clearer group-level differentiation, with DH lines forming distinct clusters that reflect their differing genetic backgrounds. Elite lines, however, were distributed across diverse positions within the inbred and DH clusters, consistent with their broad use across breeding programs and substantial genomic contribution, likely because of shared founders or repeated use in line development. These results indicate that a marker panel of appropriate size can effectively capture population structure and classify lines within breeding programs.

https://cdn.apub.kr/journalsite/sites/ales/2025-037-04/N0250370404/images/ales_37_04_04_F6.jpg
Fig. 6.

Neighbor-joining trees of 155 inbred lines based on pairwise Nei’s genetic distances computed using SNP panels selected using each method: SI (n = 10), RS (n = 21), 15bin (n = 30), 20bin (n = 39), and 25bin (n = 47).

Additionally, we evaluated the population structure of the 155 lines using PCoA based on each selected SNP panel (Fig. 7). Analysis using the filtered whole-genome markers revealed clear separation among groups and among lines within groups, indicating effective capture of intergroup genetic variation. In contrast, the SI-based panel did not clearly separate the breeding groups: although SI markers were highly polymorphic, they did not maximize between-group contrast, resulting in weaker cluster differentiation. The RS panel, although derived from random sampling, achieved better group separation than SI because it was selected to maximize genetic distances among lines and thus captured more intergroup variation. Among the bin-based panels, the 25bin panel produced the most distinct clustering and clearest group separation, the 15bin panel exhibited only limited group-level resolution, and the 20bin panel showed the lowest classification ability, with most samples clustering near the origin. Because the 20bin panel (39 markers) contains more markers than the 15bin panel (30 markers), these results indicate that increasing the number of markers does not guarantee improved PCoA resolution in bin-based methods. Because DH lines exhibit low expected heterozygosity and long-range LD, many SNPs provide redundant information; therefore, selecting uncorrelated markers distributed across the genome is essential for accurately capturing relative genetic distances among lines with a small marker set. Furthermore, SNPs selected solely on the basis of diversity do not always provide strong discriminative power (Nei, 1972; Rosenberg et al., 2003; Sodedji et al., 2021). Previous studies have shown that a small number of highly informative SNPs, identified through measures such as PCA loadings or Fst, can effectively replicate the population structure captured by large genome-wide SNP datasets (Flint-Garcia et al., 2003). Future marker selection strategies should incorporate these considerations.

https://cdn.apub.kr/journalsite/sites/ales/2025-037-04/N0250370404/images/ales_37_04_04_F7.jpg
Fig. 7.

Principal coordinate analysis (PCoA) of Nei’s genetic distances among individuals using SNP panels generated using each selection method. Points are colored by group. Axis labels indicate the proportion of genetic distance variance explained by the first two coordinates for each panel.
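The PCoA described in the Methods is an eigenvalue decomposition of the double-centred squared distance matrix. The Python sketch below follows that recipe and reports the variance explained relative to the sum of positive eigenvalues; unlike the ape analysis, it applies no Cailliez or Lingoes correction. The function name is hypothetical.

```python
import numpy as np

def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns (coordinates on the first n_axes, proportion of variance each axis
    explains relative to the sum of positive eigenvalues). Negative eigenvalues
    are ignored rather than corrected.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.full((n, n), 1.0 / n)      # centring matrix
    b = -0.5 * j @ (d ** 2) @ j                   # Gower double-centred matrix
    eigval, eigvec = np.linalg.eigh(b)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]              # largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    coords = eigvec[:, :n_axes] * np.sqrt(np.clip(eigval[:n_axes], 0, None))
    explained = eigval[:n_axes] / eigval[eigval > 0].sum()
    return coords, explained
```

When most samples collapse near the origin, as reported for the 20bin panel, the leading eigenvalues are small relative to the total and the first two axes explain little of the distance variance.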

Summary

The application of DH technology in waxy maize breeding enables rapid development of fully homozygous lines but also reduces genetic diversity and increases the extent of LD. To improve the efficiency of DH-based breeding programs, we evaluated molecular characteristics across groups with different breeding histories and assessed their phylogenetic structure and genetic diversity. Core marker selection was performed using three complementary algorithms. Although SI markers were highly diverse, they provided insufficient group-level discrimination, limiting their classification utility. RS markers showed superior discriminatory power because they were selected to maximize genetic distance. Although the bin-based method offered lower classification power, it provided balanced genome-wide representation. Overall, the 21-marker RS panel effectively captured both line-level classification and population structure. When combined with markers that enhance chromosomal coverage and diversity, this panel can improve the accuracy and efficiency of routine genotyping in breeding programs. We plan to further evaluate the classification performance of the selected markers on large-scale genotyping platforms, including applications to DH lines and to lines derived from segregating generations. This core SNP panel provides a ready-to-use tool for quality control and resource management, such as line identification, variety protection, core population management, and seed purity testing. Ultimately, these tools will support the establishment of a core waxy maize germplasm set and contribute to the development of a next-generation waxy maize breeding platform.

Acknowledgements

This research was supported by the Local Specialty Crop Technology Development Project (RS-2024-00437799) of the Rural Development Administration.

References

1

Benjamini, Y., Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 57:289-300.

10.1111/j.2517-6161.1995.tb02031.x
2

Bernardo, R. (2009) Should maize doubled haploids be induced among F1 or F2 plants?. Theor Appl Genet 119:255-262.

10.1007/s00122-009-1034-1
3

Brazier, T., Glémin, S. (2022) Diversity and determinants of recombination landscapes in flowering plants. PLoS Genet 18:e1010141.

10.1371/journal.pgen.101014136040927PMC9467342
4

Bukowski, R., Guo, X., Lu, Y., Zou, C., He, B., Rong, Z., Wang, B., Xu, D., Yang, B., Xie, C., Fan, L., Gao, S., Xu, X., Zhang, G., Li, Y., Jiao, Y., Doebley, J. F., Ross-Ibarra, J., Lorant, A., Xu, Y. (2018) Construction of the third-generation Zea mays haplotype map. GigaScience 7:1-12.

10.1093/gigascience/gix13429300887PMC5890452
5

Carlson, C. S., Eberle, M. A., Rieder, M. J., Yi, Q., Kruglyak, L., Nickerson, D. A. (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 74:106-120.

10.1086/38100014681826PMC1181897
6

Coe, E. H. Jr. (1959) A line of maize with high haploid frequency. Am Nat 93:381-382.

10.1086/282098
7

Coffman, S. M., Hufford, M. B., Andorf, C. M., Lübberstedt, T. (2020) Haplotype structure in commercial maize breeding programs in relation to key founder lines. Theor Appl Genet 133:547-561.

10.1007/s00122-019-03486-y
8

Degen, B., Blanc-Jolivet, C., Stierand, K., Gillet, E. (2017) A nearest neighbor approach based on genetic distance for assigning individual trees to geographic origin. Forensic Sci Int Genet 27:132-141.

10.1016/j.fsigen.2016.12.011
9

Delzer, B., Liang, D., Szwerdszarf, D., Rodriguez, I., Mardones, G., Elumalai, S., Johnson, F., Nalapalli, S., Egger, R., Burch, E., Meier, K., Wei, J., Zhang, X., Gui, H., Jin, H., Guo, H., Yu, K., Liu, Y., Breitinger, B., Kelliher, T. (2024) Elite, transformable haploid inducers in maize. Crop J 12:314-319.

10.1016/j.cj.2023.10.016

Dou, T., Wang, C., Ma, Y., Chen, Z., Zhang, J., Guo, G. (2023) CoreSNP: an efficient pipeline for core marker profile selection from genome-wide SNP datasets in crops. BMC Plant Biol 23:580.

10.1186/s12870-023-04609-w (PMID: 37986037; PMCID: PMC10662547)

Du, H., Yang, J., Chen, B., Zhang, X., Zhang, J., Yang, K., Geng, S., Wen, C. (2019) Target sequencing reveals genetic diversity, population structure, core-SNP markers, and fruit shape-associated loci in pepper varieties. BMC Plant Biol 19:1-16.

10.1186/s12870-019-2122-2 (PMID: 31870303; PMCID: PMC6929450)

Epstein, R., Wheeler, J., Hubisz, M., Sun, Q., Bukowski, R., Zhai, J., Lai, W.-Y., Buckler, E., Pawlowski, W. P. (2024) The maize recombination landscape evolved during domestication. bioRxiv 2024.11.04.621928 (preprint posted November 4, 2024).

10.1101/2024.11.04.621928

Flint-Garcia, S. A., Thornsberry, J. M., Buckler IV, E. S. (2003) Structure of linkage disequilibrium in plants. Annu Rev Plant Biol 54:357-374.

10.1146/annurev.arplant.54.031902.134907

Fujii, H., Ogata, T., Shimada, T., Endo, T., Iketani, H., Shimizu, T., Yamamoto, T., Omura, M. (2013) Minimal marker: an algorithm and computer program for identifying minimal sets of discriminating DNA markers for efficient variety identification. J Bioinform Comput Biol 11:1250022.

10.1142/S0219720012500229

Ha, V. G., Moon, H., So, Y.-S. (2024) Maize doubled haploid technology: a new breeding paradigm. Korean J. Breed. Sci. 56:471-489.

10.9787/KJBS.2024.56.4.471

Hao, D., Zhang, Z., Cheng, Y., Chen, G., Lu, H., Mao, Y., Shi, M., Huang, X., Zhou, G., Xue, L. (2015) Identification of genetic differentiation between waxy and common maize by SNP genotyping. PLoS ONE 10:e0142585.

10.1371/journal.pone.0142585 (PMID: 26566240; PMCID: PMC4643885)

Ilyas, M. Z., Park, H., Jang, S. J., Cho, J., Sa, K. J., Lee, J. K. (2023) Association mapping for evaluation of population structure, genetic diversity, and physiochemical traits in drought-stressed maize germplasm using SSR markers. Plants 12:4092.

10.3390/plants12244092 (PMID: 38140419; PMCID: PMC10747078)

Ledesma, A., Ribeiro, F. A. S., Uberti, A., Edwards, J., Hearne, S., Frei, U., Lübberstedt, T. (2023) Molecular characterization of doubled haploid lines derived from different cycles of the Iowa Stiff Stalk Synthetic (BSSS) maize population. Front Plant Sci 14:1226072.

10.3389/fpls.2023.1226072 (PMID: 37600186; PMCID: PMC10433169)

Li, P., Su, T., Yu, S., Wang, H., Wang, W., Yu, Y., Zhang, D., Zhao, X., Wen, C., Zhang, F. (2019) Identification and development of a core set of informative genic SNP markers for assessing genetic diversity in Chinese cabbage. Hortic Environ Biotechnol 60:411-425.

10.1007/s13580-019-00138-4

Nei, M. (1972) Genetic distance between populations. Am Nat 106:283-292.

10.1086/282771

Palaisa, K., Morgante, M., Tingey, S., Rafalski, A. (2004) Long-range patterns of diversity and linkage disequilibrium surrounding the maize Y1 gene are indicative of an asymmetric selective sweep. Proc Natl Acad Sci U S A 101:9885-9890.

10.1073/pnas.0307839101 (PMID: 15161968; PMCID: PMC470768)

Park, J.-S., Sa, K.-J., Park, K. J., Jang, J.-S., Lee, J. K. (2009) Genetic variation of parental inbred lines for Korean waxy corn hybrid varieties revealed by SSR markers. Korean J. Breed. Sci. 41:106-114.

Park, K.-J., Park, J.-Y., Ryu, S.-H., Goh, B.-D., Seo, J.-S., Min, H.-K., Jung, T.-W., Huh, C.-S., Ryu, I.-M. (2007) A new waxy corn hybrid cultivar, “Mibaek 2,” with good eating quality and lodging resistance. Korean J. Breed. Sci. 39:108-109.

Park, K.-J., Park, J.-Y., Seo, Y.-H., Ryu, S.-H., Choi, J.-K., Kim, H.-Y. (2016) Anthocyanin-rich purple waxy corn single cross hybrid ‘Cheongchunchal’. Korean J. Breed. Sci. 48:541-546.

10.9787/KJBS.2016.48.4.541

Paschou, P., Lewis, J., Javed, A., Drineas, P. (2010) Ancestry informative markers for fine-scale individual assignment to worldwide populations. J Med Genet 47:835-847.

10.1136/jmg.2010.078212

Pe’er, I., Chretien, Y. R., De Bakker, P. I. W., Barrett, J. C., Daly, M. J., Altshuler, D. M. (2006) Biases and reconciliation in estimates of linkage disequilibrium in the human genome. Am J Hum Genet 78:588.

10.1086/502803 (PMID: 16532390; PMCID: PMC1424697)

Prasanna, B. M., Chaikam, V., Mahuku, G. (2012) Doubled haploid technology in maize breeding: Theory and practice. CIMMYT, Mexico D.F., Mexico.

R Core Team (2023) R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org

Remington, D. L., Thornsberry, J. M., Matsuoka, Y., Wilson, L. M., Whitt, S. R., Doebley, J., Kresovich, S., Goodman, M. M., Buckler IV, E. S. (2001) Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc Natl Acad Sci USA 98:11479-11484.

10.1073/pnas.201394398 (PMID: 11562485; PMCID: PMC58755)

Rosenberg, N. A., Li, L. M., Ward, R., Pritchard, J. K. (2003) Informativeness of genetic markers for ancestry inference. Am J Hum Genet 73:1402-1422.

10.1086/380416 (PMID: 14631557; PMCID: PMC1180403)

Rousselle, Y., Jones, E., Charcosset, A., Moreau, P., Robbins, K., Stich, B., Knaak, C., Flament, P., Karaman, Z., Martinant, J. P., Fourneau, M., Taillardat, A., Romestant, M., Tabel, C., Bertran, J., Ranc, N., Lespinasse, D., Blanchard, P., Kahler, A., Smith, S. (2015) Study on essential derivation in maize: III. Selection and evaluation of a panel of single nucleotide polymorphism loci for use in European and North American germplasm. Crop Sci 55:1170-1180.

10.2135/cropsci2014.09.0627

Ryu, S. H., Choi, J. K., Kim, M. J., Han, J. H., Wang, S. H., Kim, H. Y., Kim, K. S., Namgung, M., Park, J. Y., Park, K. J. (2022) Introduction of doubled haploid technology and maize inbred line development. J Agri Life Environ Sci 34:248-256.

10.22698/jales.20220025

Sa, K. J., Park, J. Y., Park, K. J., Lee, J. K. (2010) Analysis of genetic diversity and relationships among waxy maize inbred lines in Korea using SSR markers. Genes Genom 32:375-384.

10.1007/s13258-010-0025-6

Showkath Babu, B. M., Lohithaswa, H. C., Triveni, G., Mallikarjuna, M. G., Mallikarjuna, N., Balasundara, D. C., Anand, P. (2022) Comparative assessment of genetic variability realized in doubled haploids induced from F1 and F2 plants for response to Fusarium stalk rot and yield traits in maize (Zea mays L.). Agronomy 13:100.

10.3390/agronomy13010100

Sodedji, F. A. K., Agbahoungba, S., Agoyi, E. E., Kafoutchoni, M. K., Choi, J., Nguetta, S. P. A., Assogbadjo, A. E., Kim, H. Y. (2021) Diversity, population structure, and linkage disequilibrium among cowpea accessions. Plant Genome 14:e20113.

10.1002/tpg2.20113

Stephens, M., Smith, N. J., Donnelly, P. (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978-989.

10.1086/319501 (PMID: 11254454; PMCID: PMC1275651)

Strigens, A., Schipprack, W., Reif, J. C., Melchinger, A. E. (2013) Unlocking the genetic diversity of maize landraces with doubled haploids opens new avenues for breeding. PLoS ONE 8:e57234.

10.1371/journal.pone.0057234 (PMID: 23451190; PMCID: PMC3579790)

Takeuchi, F., Yanai, K., Morii, T., Ishinaga, Y., Taniguchi-Yanai, K., Nagano, S., Kato, N. (2005) Linkage disequilibrium grouping of single nucleotide polymorphisms (SNPs) reflecting haplotype phylogeny for efficient selection of tag SNPs. Genetics 170:291.

10.1534/genetics.104.038232 (PMID: 15716494; PMCID: PMC1449737)

Van Heerwaarden, J., Hufford, M. B., Ross-Ibarra, J. (2012) Historical genomics of North American maize. Proc Natl Acad Sci U S A 109:12420-12425.

10.1073/pnas.1209275109 (PMID: 22802642; PMCID: PMC3412004)

VanLiere, J. M., Rosenberg, N. A. (2008) Mathematical properties of the r² measure of linkage disequilibrium. Theor Popul Biol 74:130.

10.1016/j.tpb.2008.05.006 (PMID: 18572214; PMCID: PMC2580747)

Varshney, R. K., Thiel, T., Sretenovic-Rajicic, T., Baum, M., Valkoun, J., Guo, P., Grando, S., Ceccarelli, S., Graner, A. (2008) Identification and validation of a core set of informative genic SSR and SNP markers for assessing functional diversity in barley. Mol Breeding 22:1-13.

10.1007/s11032-007-9151-5

Weir, B. S., Cockerham, C. C. (1984) Estimating F-Statistics for the analysis of population structure. Evolution 38:1358.

10.2307/2408641

Wu, X., Wang, B., Wu, S., Li, S., Zhang, Y., Wang, Y., Li, Y., Wang, J., Wu, X., Lu, Z., Li, G. (2021) Development of a core set of single nucleotide polymorphism markers for genetic diversity analysis and cultivar fingerprinting in cowpea. Legume Science 3:e93.

10.1002/leg3.93

Zeitler, L., Ross-Ibarra, J., Stetter, M. G. (2020) Selective loss of diversity in doubled-haploid lines derived from European maize landraces. G3 (Bethesda) 10:2497-2506.

10.1534/g3.120.401196 (PMID: 32467127; PMCID: PMC7341142)