DNA studies of endangered or extinct species often rely on ancient or degraded remains. The majority of ancient DNA (aDNA) extraction protocols focus on skeletal elements, with skin and hair samples rarely explored. Like DNA from bones and teeth, DNA extracted from historical or ancient skin and fur samples is extremely fragmented and has low endogenous content due to natural degradation processes. Thus, the development of effective DNA extraction methods is required for these materials. Here, we compared the performance of two DNA extraction protocols (commercial and custom laboratory aDNA methods) on hair and skin samples ranging from decades-old museum specimens to Iron Age archaeological material. We found that, apart from the impact of sample-specific taphonomic and handling history on the quantity and quality of preserved DNA, skin yielded more endogenous DNA than hair across the samples and protocols tested. While both methods recovered DNA from ancient soft tissue, the laboratory method performed better overall in terms of DNA yield and quality, which was primarily due to the poorer performance of the commercial binding buffer in recovering aDNA.
The survival of ancient DNA (aDNA) molecules in historical and ancient materials provides direct evolutionary information that can be used to reconstruct the dynamics of past species, populations, and ecosystems (Ermini et al., 2015; Orlando et al., 2011; Pedersen et al., 2015). However, most aDNA studies have concentrated on hard skeletal material such as teeth and bones, which are more likely to be preserved than soft tissues. Therefore, most aDNA extraction protocols have focused on obtaining DNA from pulverized teeth or bones (Dabney et al., 2013; Damgaard et al., 2015; Gamba et al., 2016; Nieves-Colón et al., 2018). Soft tissue remains are also an important source material for genetic research, and in some cases may be the only material available for the study of rare or recently extinct animals (Fulton et al., 2012; Hung et al., 2013).
While commercially available DNA extraction kits tailored to specific applications offer convenience and uniformity, a silica-based DNA extraction protocol optimized for the recovery of short DNA fragments from ancient skeletal material has been widely adopted in multiple aDNA labs (Dabney et al., 2013, with later variations in Korlević et al., 2015 and Rohland et al., 2018). This method has successfully recovered ancient DNA from a variety of samples, such as a ~400 000-year-old archaic human (Meyer et al., 2014), ~22 000-year-old giant panda from Guangxi (Ko et al., 2018), and ~8 500-year-old humans from Fujian (Yang et al., 2020) in southern China. An extension of this method for DNA recovery in sediment has also been established (Rohland et al., 2018), but its efficacy with soft tissue and hair has yet to be fully explored.
To identify an accessible and practical method for retrieving DNA from historical or ancient soft tissue, we compared the efficiencies of the widely applied aDNA extraction method first reported by Dabney et al. (2013) (referred to hereafter as the Lab protocol) and a commercially available kit for recovering DNA from modern soft tissue (DNeasy Tissue Extraction kit by Qiagen). Results showed that the Lab protocol was a suitable method for working with ancient skin samples. Furthermore, although the lysis buffers behaved similarly between the two methods, the superior results of the DNA binding buffer in the Lab protocol made it preferable to the tissue extraction kit in recovering aDNA from soft tissue.
In the present study, we obtained samples from two black snub-nosed monkeys (Rhinopithecus strykeri; ca. 30 and 50 years old, respectively, stored in the Museum of Yunnan University, China; with hair and skin samples obtained from both individuals) and one Tonkin snub-nosed monkey (Rhinopithecus avunculus; decades-old skin sample stored in a museum in Vietnam) (Table 1). The snub-nosed monkeys (Rhinopithecus spp.) of China and Vietnam are among the world’s rarest and most endangered primates (2000 International Union for Conservation of Nature (IUCN) Red List of Threatened Species, URL: http://www.iucnredlist.org/) and are confined to extremely limited areas in isolated regions (Liedigk et al., 2012). Due to their scarcity and protected status, DNA research on these and similar animals often relies on archival material from museums, such as preserved skin or fur. Additionally, skin samples were recovered from three 3 100-2 400-year-old dogs (Canis lupus familiaris) kept at the Xinjiang Institute of Cultural Relics and Archaeology, China.
Sample preparation was performed in a clean room at the Laboratory of Molecular Paleontology, Institute of Vertebrate Paleontology and Paleoanthropology (IVPP), Chinese Academy of Sciences, Beijing, China. All laboratory procedures were conducted using contamination precautions, including full body protection, bleach decontamination, and UV irradiation of tools and work areas before and between use. All consumables were UV irradiated for 20-40 min.
Samples were cut into <1 mm³ pieces (~12-41 mg for skin, ~1-11 mg for hair) using sterilized scissors and then placed into 2.0 mL DNA LoBind tubes (Supplementary Table S1). One extraction blank was included for each protocol tested. To eliminate surface contaminants and inhibitors, samples were cleaned with 1.0 mL 70% ethanol. After adding the ethanol, samples were vortexed for 1 min at maximum speed and spun for 1 min at 13 200 r/min, with the supernatant subsequently removed. This cleanup step was repeated three times. After the final cleanup step, the tube was left open for 5 min at 40 °C to allow complete ethanol evaporation.
DNA extraction typically involves two steps: lysis and purification (Gamba et al., 2016). The lysis step lyses cells and denatures protein complexes, and the purification step separates DNA from biological and chemical contaminants. We aimed to compare combinations of different lysis and binding buffers. First, we used buffers from the Qiagen DNeasy Tissue Extraction kit (Valencia, USA) (lysis buffer: proteinase K-Buffer ATL; purification buffers: AL and 96%-100% ethanol (binding buffer), AW1, and AW2) (method KK: following the manufacturer’s guidelines using the “Purification of Total DNA from Animal Tissues (Spin-Column Protocol)”). Second, we prepared lysis and binding buffers in the laboratory following Dabney et al. (2013) with slight modifications to the binding buffer volume and sodium acetate concentration (Lab protocol: method LL, see Supporting Material). In total, we tested four combinations: KK (all buffers from commercial kit), KL (lysis buffer from commercial kit, self-made binding buffer from laboratory), LK (self-made lysis buffer from laboratory, binding buffer from kit), and LL (self-made lysis and binding buffers from laboratory). The experimental workflow comparing the performance of the four extraction methods is summarized in Figure 1A.
For library preparation, we used 40% of each extract (12 μL of 30 μL for kit binding buffer; 20 μL of 50 μL for Lab binding buffer), including extraction blanks. We constructed double-stranded sequencing libraries from all samples (including extraction blanks) following the protocols of Meyer & Kircher (2010) with modifications by Kircher et al. (2012), and eluted the libraries with TE buffer to a final volume of 40 μL. All libraries were treated with uracil-DNA-glycosylase (UDG) and endonuclease (Endo VIII) to remove characteristic aDNA deamination (Briggs et al., 2007). Additional non-template library controls were also included to monitor contamination during library preparation and sequencing. We quantified 1:200 dilutions of each adapter-ligated library using quantitative real-time polymerase chain reaction (qRT-PCR) on an Agilent Technologies Stratagene Mx3005P system (Agilent Technologies, USA). Reactions were run for each library at final volumes of 26 μL with the following conditions: 12.5 μL of Maxima SYBR Green qPCR Master Mix (2X), without ROX (Fermentas), 1.25 μL of 10 μmol/L Sol_iPCR_P7 primer, 1.25 μL of 10 μmol/L Sol_PCR_P5 primer, 10.0 μL of ddH2O, and 1 μL of library dilution. Reactions were heated to 95 °C for 15 min, followed by 45 cycles of 95 °C for 15 s, 60 °C for 20 s, and 72 °C for 40 s, followed by a final dissociation step of 95 °C for 1 min, 55 °C for 30 s, and 95 °C for 30 s. Analysis of qRT-PCR data focused on cycle threshold (Ct) values, which represent the number of qRT-PCR cycles required for the fluorescent signal to exceed background levels. Ct values were averaged across all replicates per library (Supplementary Table S1). Non-template controls were also included to monitor background fluorescence levels. All libraries were dual indexed and amplified using AccuPrime Pfx DNA polymerase (Life Technologies, USA) for 10-30 cycles according to the qRT-PCR results.
Sample-specific indices were introduced into the P5 and P7 adaptors during library amplification to identify each library with its respective sample (Kircher et al., 2012). Dual-indexed libraries were purified using the Qiagen MinElute PCR purification kit (Valencia, USA), and library concentrations were determined using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA) and a DNA 1000 chip on the Agilent Bioanalyzer 2100 (Agilent Technologies, USA). Shotgun libraries were paired-end sequenced (2×75 bp) on the Illumina MiSeq platform using a MiSeq Reagent kit v.3 (150 cycles) at the Laboratory of Molecular Paleontology, IVPP, Beijing, China. To allow comparisons of results across all samples, the molecule number per microliter based on the qRT-PCR results (Supplementary Table S1, column J) was normalized to the molecule number per milligram of the sample material (Supplementary Table S1, column K) after taking the DNA elution and input volumes into account. The unique endogenous molecule number per milligram (Supplementary Table S1, column M) is derived from the endogenous rate in the “Endogenous” column (Supplementary Table S1, column L) and the molecule number per milligram.
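The normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the exact formula behind Supplementary Table S1 is not given in the text, the function and parameter names are hypothetical, and the example numbers are invented. The sketch assumes that the qRT-PCR estimate refers to molecules per microliter of library, that the library total is scaled up by the ratio of elution volume to input volume, and that the unique endogenous number is the per-milligram value multiplied by the endogenous fraction.

```python
def molecules_per_mg(molecules_per_ul, library_volume_ul,
                     elution_volume_ul, input_volume_ul, sample_mass_mg):
    """Scale a per-microliter molecule estimate to molecules per mg of sample.

    Assumed (hypothetical) model:
      library total = molecules/uL * library volume
      extract total = library total * (elution volume / input volume)
      per mg        = extract total / sample mass
    """
    if input_volume_ul <= 0 or sample_mass_mg <= 0:
        raise ValueError("input volume and sample mass must be positive")
    library_total = molecules_per_ul * library_volume_ul
    extract_total = library_total * (elution_volume_ul / input_volume_ul)
    return extract_total / sample_mass_mg


def unique_endogenous_per_mg(per_mg, endogenous_fraction):
    """Multiply molecules per mg by the endogenous fraction (0 to 1)."""
    if not 0.0 <= endogenous_fraction <= 1.0:
        raise ValueError("endogenous_fraction must be between 0 and 1")
    return per_mg * endogenous_fraction


# Invented example: 20 uL of a 50 uL Lab-buffer extract, 40 uL library, 20 mg skin.
per_mg = molecules_per_mg(1000.0, 40.0, 50.0, 20.0, 20.0)
endogenous = unique_endogenous_per_mg(per_mg, 0.3)
```

Because both binding buffers contributed 40% of the extract to the library (12 of 30 μL and 20 of 50 μL), the volume correction in this sketch is the same factor of 2.5 for all four methods.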
The Illumina sequence reads were merged and adapters were trimmed using leeHom v.1.1.5 (Renaud et al., 2014). To compare sequencing results across different extraction treatments and to control for differences in sequencer output, 0.1 million reads were randomly selected from all samples. The reads were mapped to the whole genome references of Rhinopithecus bieti (ASM169854v1) and Canis lupus familiaris (CanFam3.1) according to their species of origin. Mapping was performed using BWA v.0.5.10 (Li & Durbin, 2010) (bwa bam2bam -t 5 -g Reference -n 0.01 -l 16500 -o 2 Input.bam). Duplicate reads were identified using bam-rmdup v.0.2 (https://github.com/mpieva/biohazard-tools), which removes PCR duplicates from BAM files and calls a consensus for each duplicate set. Quality filtering (minimum Q30) was performed with SAMtools v.1.5 (Li et al., 2009). Damage patterns of down-sampled BAM files were characterized using mapDamage v.2.0.2 (Ginolhac et al., 2011; Jónsson et al., 2013). Library complexity estimates were generated using preseq v.2.0 (Daley & Smith, 2013) on down-sampled BAM files.
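The random down-sampling step can be illustrated with a short sketch. The text does not say which tool was used to select the 0.1 million reads, so the reservoir sampling below is an assumed stand-in, and the function name and seed parameter are hypothetical.

```python
import random


def downsample_reads(reads, n, seed=0):
    """Return up to n reads drawn uniformly at random (reservoir sampling).

    Libraries with fewer than n reads are returned unchanged, in order.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    reservoir = []
    for i, read in enumerate(reads):
        if i < n:
            reservoir.append(read)
        else:
            # Keep the new read with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = read
    return reservoir


# Invented example: 250 000 read identifiers reduced to 100 000.
subset = downsample_reads((f"read_{k}" for k in range(250_000)), 100_000)
```

Reservoir sampling needs only one pass over the reads and never holds more than n of them in memory, which suits streamed sequencing output.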
Figure 1 Experimental design, DNA yield of each sample, and sequence content of library bar plots of this study
Table 1 Samples used in this study
The DNA yields were evaluated through fluorometric quantification of purified extracts (Figure 1B; Supplementary Table S2). Based on the four protocols tested, we made the following general observations: among tissues, skin samples yielded more DNA than hair samples, measured as unique endogenous molecules per milligram of dried material (Figure 1B; Supplementary Table S1), and 20th century samples yielded more DNA than ancient samples (Figure 1B); among the different extraction combinations, those using the Lab DNA binding buffer, i.e., LL and KL, generated higher DNA yields regardless of which lysis buffer was used (Figure 1B; Supplementary Figure S1A). All blanks resulted in DNA yields at least three orders of magnitude below the average yields of the samples (Supplementary Table S1).
All samples in this study had average DNA fragment lengths shorter than 85 bp. For the ancient dog samples from Xinjiang, the average DNA fragment lengths were shorter than 53 bp. This small size is characteristic of ancient DNA and consistent with the expectations for degraded remains (Briggs et al., 2007; Dabney et al., 2013; Meyer et al., 2014) (Supplementary Table S2). Skin samples extracted with LK resulted in a smaller average DNA fragment size, especially for the sample Skin-6 (Supplementary Figure S2A). For the three monkey samples, the different methods showed similar read length distributions (Supplementary Figure S3). For both tissues, a scatter plot of average DNA fragment lengths versus average percent of GC content could not distinguish between the results generated with each method. However, we could clearly distinguish skin samples by individual irrespective of the method used (Supplementary Figure S4C).
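The two summary statistics plotted against each other above can be computed as in the following sketch. The function name is hypothetical, and the GC percentage is pooled over all bases rather than averaged per read, which is an assumption because the text does not specify the averaging.

```python
def mean_length_and_gc(sequences):
    """Return (mean fragment length in bp, pooled GC percentage)."""
    sequences = list(sequences)
    if not sequences:
        raise ValueError("at least one sequence is required")
    total_length = sum(len(seq) for seq in sequences)
    if total_length == 0:
        raise ValueError("sequences must not all be empty")
    # Count G and C case-insensitively across all reads.
    gc_bases = sum(seq.upper().count("G") + seq.upper().count("C")
                   for seq in sequences)
    mean_length = total_length / len(sequences)
    gc_percent = 100.0 * gc_bases / total_length
    return mean_length, gc_percent


# Invented toy reads, far shorter than real merged fragments.
mean_len, gc = mean_length_and_gc(["ACGT", "GGCC", "AT"])
```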
All DNA extracts were treated with the USER enzyme mix to reduce nucleotide misincorporations resulting from postmortem cytosine deamination reactions. All reads were characterized by nucleotide misincorporation patterns, and DNA damage parameters were quantified using mapDamage2.0 (Jónsson et al., 2013). The C>T and G>A misincorporation rates were largely reduced across all read positions, except the first and last two positions (Supplementary Figure S5). The three DNA damage parameters examined (λ, δD, δS) were similar among samples regardless of the extraction method used (Supplementary Figure S6). Most samples had high probabilities (>0.70) of C to T and G to A misincorporations caused by DNA damage at the first and last position of each fragment (Supplementary Table S2), typical of ancient DNA (Briggs et al., 2007).
We obtained between 84 899 and 293 570 reads for each DNA library after shotgun sequencing. To maintain consistency between sequencing depths, we aligned 0.1 million randomly selected reads per library against the whole genome references (Rhinopithecus bieti, ASM169854v1, or Canis lupus familiaris, CanFam3.1), except for three samples that had under 0.1 million reads. The reads from blanks were mapped to both Homo sapiens (hg19) and Rhinopithecus bieti (ASM169854v1). The percentage of uniquely mapped endogenous content was calculated as the proportion of unique reads mapped to the reference (after duplicate removal and quality filtering) over the total number of down-sampled reads (Supplementary Table S2). We estimated that most monkey skin samples had >30% uniquely mapped endogenous content, while monkey hair samples had <14%, and all ancient dog skin samples had <1.3% (Supplementary Table S2). The comparisons of uniquely mapped endogenous content among the different methods are shown in Figure 1B-e, where KL and LL-treated libraries contained higher levels than KK and LK. Only 0-4 reads of the extraction blanks could be aligned to human or monkey reference sequences, demonstrating almost no detectable primate contamination during the experiment.
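The endogenous content calculation defined above reduces to a single ratio, sketched here with a hypothetical function name and invented counts.

```python
def endogenous_percent(unique_mapped_reads, total_downsampled_reads):
    """Percentage of down-sampled reads that map uniquely to the reference.

    unique_mapped_reads: reads left after duplicate removal and Q30 filtering.
    total_downsampled_reads: reads in the down-sampled library (e.g. 100 000).
    """
    if total_downsampled_reads <= 0:
        raise ValueError("total_downsampled_reads must be positive")
    if not 0 <= unique_mapped_reads <= total_downsampled_reads:
        raise ValueError("unique_mapped_reads must lie between 0 and the total")
    return 100.0 * unique_mapped_reads / total_downsampled_reads


# Invented example: 30 000 unique mapped reads out of 100 000 sampled reads.
skin_example = endogenous_percent(30_000, 100_000)
```

Using the down-sampled total as the denominator, rather than the number of mapped reads, keeps the value comparable across libraries sequenced to different depths.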
We next characterized library clonality (Supplementary Table S2), i.e., molecular complexity of the extract. Higher clonality among different treatments from the same starting material denotes a loss of unique molecules during treatment. Sequence clonality (measured as the fraction of mapped reads that are PCR duplicates) ranged from 0.03% to 2.04% (Supplementary Table S2). The clonality percentage differed among the extraction methods, and the LK method showed more overall clonality than any other method, indicating that lab lysis buffer combined with the kit binding buffer resulted in the greatest loss of DNA (Figure 1B-f). The relationship between clonality and uniquely mapped endogenous content is shown in Supplementary Figure S7A.
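As a rough illustration of the clonality measure, the sketch below treats mapped reads with identical coordinates and strand as PCR duplicates. This is a simplifying assumption: bam-rmdup, the tool used in the study, also calls a consensus for each duplicate set, which is not reproduced here, and the function name and example alignments are hypothetical.

```python
from collections import Counter


def clonality_percent(alignments):
    """Percentage of mapped reads that are duplicates of another read.

    alignments: iterable of hashable (chrom, start, end, strand) tuples,
    one per mapped read. Reads sharing a tuple count as one molecule.
    """
    counts = Counter(alignments)
    mapped = sum(counts.values())
    if mapped == 0:
        raise ValueError("no mapped reads supplied")
    duplicates = mapped - len(counts)  # every copy beyond the first
    return 100.0 * duplicates / mapped


# Invented example: four mapped reads, two of which share coordinates.
example = [("chr1", 10, 60, "+"), ("chr1", 10, 60, "+"),
           ("chr1", 100, 150, "-"), ("chr2", 5, 40, "+")]
example_clonality = clonality_percent(example)
```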
To determine which method produced higher complexity libraries, we used the c_curve function in preseq to estimate the number of distinct reads recovered for each library (Supplementary Figure S2C). High complexity libraries have a larger proportion of distinct reads mapped to different parts of the reference genome, resulting in more of the reference being covered with a single sequencing experiment. In contrast, low complexity libraries have a large proportion of reads mapped to the same sites and thus may have strong bias and high redundancy (Head et al., 2014). Low complexity, if present across all methods, can be evidence of either fewer starting endogenous DNA molecules in a sample or inefficient DNA recovery of the methods used. The lower complexity of the LK method for hair libraries supported the higher clonality results for this method. No differences in complexity were observed among the other methods (Supplementary Figure S7B).
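The idea behind a complexity curve can be sketched by subsampling the observed reads and counting distinct molecules at increasing depths. This is not preseq's implementation: the single random shuffle, the function name, and the toy molecule labels are all assumptions made for illustration.

```python
import random


def complexity_curve(molecule_ids, step, seed=0):
    """Return [(reads sampled, distinct molecules seen), ...].

    molecule_ids: one hashable label per read; duplicates share a label.
    step: record a point every `step` reads, plus a final point.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    ids = list(molecule_ids)
    random.Random(seed).shuffle(ids)  # subsample without replacement
    seen = set()
    curve = []
    for depth, molecule in enumerate(ids, start=1):
        seen.add(molecule)
        if depth % step == 0 or depth == len(ids):
            curve.append((depth, len(seen)))
    return curve


# Invented toy library: six reads drawn from three distinct molecules.
toy_curve = complexity_curve(["a", "a", "b", "c", "c", "c"], step=2)
```

A curve that flattens early indicates that additional sequencing mostly returns molecules already seen, which is the pattern expected for a low complexity library.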
Based on the above results, we found that all tested methods were able to retrieve DNA from archaeological and archival skin and hair samples, which were several decades to 2 000-3 000 years old. These and similar soft tissue samples represent an abundant source of animal material for historical and ancient DNA research in museums. For each method, we evaluated raw DNA yields and endogenous reads, recovered after shotgun Illumina sequencing from parallel paired extractions, and characterized differences in base pair composition, sequence read complexity, postmortem damage profiles, and average read lengths. Overall, we found that skin performed better than hair with respect to DNA yield (Figure 1B), which is not surprising given that neither method is optimized for DNA recovery from hair. Previous analyses of ancient or historical hair have included dithiothreitol (DTT) in lysis buffers to denature keratin by reducing disulfide bonds (e.g., Gilbert et al., 2004, 2007; Rasmussen et al., 2010, 2011). The Qiagen DNeasy Tissue kit “user optimized” protocol for hair also recommends the addition of DTT. Thus, we recommend that applications involving ancient or historical hair take this into consideration. For the different methods, our results suggested that KL and LL were similarly efficient at DNA recovery, with the kit performing slightly better on most samples (Supplementary Figure S8). Furthermore, KK and LK resulted in lower comparative DNA yield (Figure 1B-d; Supplementary Figure S1A), which could be due to the reduced efficiency of the DNA binding buffer in the extraction kit for the recovery of small molecules, typical of ancient DNA. The libraries sequenced using either lysis buffer followed by the aDNA-specific DNA binding buffer, i.e., LL and KL, showed higher average uniquely mapped endogenous content (Figure 1B-e; Supplementary Figure S1B).
We note, however, some sample and experimental variation among our results, as several individual sample-method combinations identified in Figure 1B performed contrary to the overall trends and averages. All hair samples had extremely low uniquely mapped endogenous content (<15%), and average DNA fragment sizes were under 80 bp. This suggests that either the LL or KL method is best for the recovery of informative aDNA from skin samples, and more specialized protocols should be explored based on previous work involving DNA recovery from ancient hair (e.g., Gilbert et al., 2004, 2007; Rasmussen et al., 2010, 2011). The DNA damage patterns of our samples are consistent with those of authentic aDNA sequences (Briggs et al., 2007; Dabney et al., 2013; Overballe-Petersen et al., 2012) and showed no differences among methods (Supplementary Figure S6).
Our results indicate that the principal difference between the commercial tissue kit and custom extraction protocol for the recovery of DNA from ancient soft tissue was the binding buffer used during purification. The composition of the DNA binding buffers for silica-based purification methods largely influences DNA recovery results (Supplementary Figure S8). The importance of a high isopropanol content in DNA binding buffers to recover shorter DNA fragments, as is used in the aDNA protocol explored here, has been shown in Glocke & Meyer (2017). In contrast, the ethanol used in the kit binding buffer favors the recovery of larger molecules, as indicated in the accompanying material. Nucleic acids are less soluble in isopropanol than ethanol, so the use of isopropanol in binding buffers may increase the precipitation of lower concentrations of nucleic acids and the precipitation of DNA at higher temperatures. The use of 10 mL of binding buffer for each sample in the laboratory method versus the 400 μL volume of the kit may also minimally increase binding, with larger volumes increasing the contact time between the extract and silica membrane; however, this was unexplored in the current study. Thus, we consider the observed differences in DNA extraction efficiency between the methods to be largely explained by chemical differences in the reagents. We conclude that the widely used aDNA extraction protocol optimized for the recovery of DNA from ancient skeletal material can also be used to recover DNA from ancient preserved soft remains, and caution against the use of the modern DNeasy Tissue Extraction kit for this purpose, although the lysis buffer from the kit coupled with a high isopropanol binding buffer may give satisfactory results. These findings can be used to expand the utility of standard aDNA protocols to include soft tissue samples.
DATA AVAILABILITY
Genetic data are available at the National Genomics Data Center (https://bigd.big.ac.cn/gsa/) under GSA accession No. CRA003705 and National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/) under SRA No. PRJNA721285.
SUPPLEMENTARY DATA
Supplementary data to this article can be found online.
COMPETING INTERESTS
The authors declare that they have no competing interests.
AUTHORS’ CONTRIBUTIONS
Q.M.F. received the funding and conceived the study. P.C., Q.Y.D., and X.T.F. performed the experiments and conducted the primary analyses. M.Z., Q.M.F., E.A.B., H.R.W., A.M.S.K., Y.C.L., X.W.M., Y.Q.W., C.R., T.N., W.X., H.W., and L.Y. helped with data collection and analyses. M.Z., Q.M.F., and E.A.B. drafted the manuscript. All authors discussed, critically revised, and approved the final version of the manuscript.