Following the Trace of HVS II Mitochondrial Region Within the Nine Iranian Ethnic Groups Based on Genetic Population Analysis

The Iranian gene pool is seen as an important human genetic resource for investigating the region connecting Mesopotamia and the Iranian plateau. The main objective of this study was to explore gene flow in nine Iranian ethnic/subpopulation groups (402 samples) by examining mtDNA HVS2 sequence variations. This then allowed us to detect mtDNA HVS2 sequence mutations in two independent thalassemia and cystic fibrosis patient sample groups. The patient groups did not explicitly belong to any of the aforementioned nine subpopulations. Across all subpopulations, the haplogroups B4a1c3a, H2a2a1, N10b, H2a2a2, and J1 were predominant. High haplogroup diversity along with admixture of exotic groups was observed. The Arab subpopulation was shown to be independent from the others, and only a very distant relationship was found between the Arab and Azeri groups. The thalassemia patient group, which represented an almost random sample of most Iranian ethnic groups, revealed few significant differences (P < 0.05) in its HVS2 sequences. The IVS II-I (G → A) mutation in the β-globin gene of thalassemia patients was highly significant. Since the thalassemia patients in the present study carry many unique haplotypes, we can begin to comprehend the importance of mtDNA in this disease and the need for more studies in this context.


Introduction
Mitochondria have their own genetic material, called mitochondrial DNA (mtDNA) (Galanello and Origa 2010). In total, at least 12 features distinguish human mtDNA from its nuclear counterpart (Chial and Craig 2008). Human mtDNA is a maternally inherited, circular, double-stranded DNA molecule comprising 16,569 bp, with approximately 10³ to 10⁴ copies present per cell. mtDNA has an almost 5-15 times higher mutation rate than the nuclear genome (Bircan et al. 2018). The D-loop region, which has the highest rate of polymorphisms, consists of hypervariable region I (HVRI) [positions 15971 to 16414 in the mtDNA genome] and hypervariable region II (HVRII) [positions 15 to 389], and has a unique triple-stranded structure (Ngili et al. 2012; Nicholls and Minczuk 2014; Ubyaan et al.). mtDNA alterations can result in a variety of single- and multi-system disorders affecting various organs, including the brain, heart, kidneys and skeletal muscles (Jamali et al. 2016).
Iran can be seen as a staging area for the Neolithic Agricultural Revolution and was home to some of the world's earliest empires. Located in Asia, Iran consists of diverse ethnic and linguistic groups, namely the Arabs, Armenians, Assyrians, Azeris, Baluchis, Gilaks, Mazandaranis, Kurds, Lurs, Persians, Turkmen, and Zoroastrians. Based on mtDNA haplogroup diversity, it has been suggested that in populations living in Iran and its neighboring regions, e.g. Turkey, Georgia, and Central Asia, the main gene flow has been from west (the Fertile Crescent) to east (Pakistan), a hypothesis that is also supported by genome-wide association studies (GWAS) (Karimi et al. 2018; Di Cristofaro et al. 2013). To date, two studies have used complete mtDNA variation to reveal the genetic structure, expansion patterns and population movements of Iranian subpopulations (Schönberg et al. 2011; Derenko et al. 2013). However, further data will improve the significance of genetic estimates, especially when there is no prior information regarding specific ethnicity.
Thalassemia, the most common hereditary anemia and a common single-gene disorder in humans, is caused by more than 300 mutations in the human β-globin locus and manifests in three primary clinical forms: major, intermediate and minor (Galanello and Origa 2010). Iran is one of the major centers for the prevalence of thalassemia. Due to the high consanguinity in Iran's different ethnic subpopulations, it is estimated that there are more than three million thalassemia carriers (4-8%) in Iran (De Sanctis et al. 2017). The thalassemia gene frequency is high and varies considerably among Iranian geographical locations, with double the country average rate in the Mazandaran, Sistan and Baluchistan, Fars, Hormozgan, and Kerman provinces and half the average rate in the Tehran, East Azerbaijan, Khorasan, Hamadan, Yazd, and West Azerbaijan regions. There are more than 47 different β-globin gene mutations, of which the most predominant is the IVS II-I (G → A) mutation (Rezaee et al. 2012; Salehi et al. 2010).
Cystic fibrosis (CF) is an autosomal recessive disorder that is considered the most common cause of pancreatic insufficiency in children and one of the most important causes of chronic lung disease (Najafi et al. 2015). The disease is caused by mutations in the CF transmembrane conductance regulator (CFTR) gene, which regulates the activity of other chloride and sodium channels at the epithelial cell surface (Brzezinski et al. 2011; Rafeeq and Murad 2017). A heterogeneous mutation spectrum for the CFTR gene in 60 CF patients has been reported in Iran (Elahi et al. 2006). However, the most pronounced mutation, ΔF508 (p.F508del), represented only 16% of the expected mutated alleles. Mutations in the CFTR gene were detected at a rate of 81.9%, the highest recorded mutation detection rate for this gene (Alibakhshi et al. 2008). Dissecting the genetics underlying CF in Iranian patients is appealing because the most frequent Iranian mutations are reportedly not included in a commonly reported set of CF mutations (Elahi et al. 2006). This underlines the importance of identifying geographic/ethnic-specific mutations in Iran. Exploring mutations in Iranian (sub)populations using either mtDNA or nuclear genomes is therefore an appealing route for investigation. The objectives of this study were to decipher the mtDNA relationships among nine Iranian subpopulations/ethnic groups (Arab, Armani, Azeri, Bandari, Gilak, Jewish, Kurd, Lur, and Fars), e.g. haplotype sharing and nucleotide variation, and to further examine genetic mutations in two independent thalassemia and CF patient sample groups.

Clinical Subpopulation Haplogroups
The IVS II-I (G → A) mutation was the most diverse thalassemia mutation in our study. This suggests that the IVS II-I (G → A) mutation is most likely an endemic mutation found in the entire Iranian population (assuming the thalassemia group represents an unbiased random sample of the entire population). The H (20.8%), B (16.7%), and W (16.7%) groups were the most frequent haplogroups in thalassemia patients with the IVS II-I (G → A) mutation (Table 2). Overall, the H (27.9%) and B (14.8%) haplogroups represented the highest frequencies in the thalassemia group; H is by far the most common mtDNA lineage in West Eurasian human populations. However, as noted above, the H haplogroup has been predominant in the Middle East, Western and Eastern Europe, the Caucasus, Central Asia and Africa. The most common haplogroup found in the CF patients was the U haplogroup (21.4%).
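The percentages reported above are relative frequencies of haplogroup assignments within a patient group. A minimal sketch of that tabulation is given here; the function name `haplogroup_frequencies` and the list `calls` are hypothetical illustration material, not the study's records or software.

```python
from collections import Counter


def haplogroup_frequencies(calls):
    """Return {haplogroup: percent of individuals}, most frequent first."""
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no haplogroup calls supplied")
    # most_common() orders by descending count; ties keep first-seen order.
    return {hg: round(100.0 * n / total, 1) for hg, n in counts.most_common()}


# Hypothetical macro-haplogroup calls for ten individuals (NOT study data).
calls = ["H", "H", "H", "B", "B", "W", "U", "J", "H", "B"]
freqs = haplogroup_frequencies(calls)
print(freqs)
```

The same function could be applied per subpopulation or per mutation class, provided each individual carries exactly one haplogroup call.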

Discussion
Despite the larger sample size of the Fars subpopulation, the highest number of unique variants belonged to the thalassemia patients (23 variants), of which 8 were novel. This might indicate that the thalassemia patient group can be treated as an independent group here, comprising people from different ethnic groups. The Fars and Bandari subpopulations showed 17 and 14 unique variants, respectively. However, only 4 unique allelic variants were observed in the independent CF group. Previous studies in Iranian subpopulations have shown that thalassemia patients had the highest frequency of the IVS II-I (G → A) mutation, especially in the northern, central, and some areas of southern Iran (Karimi et al. 2000, 2002; Rezaee et al. 2012). There is some evidence that the IVS II-I (G → A) mutation entered from Iraq and then spread within Iran (Al-Allawi et al. 2006; Rezaee et al. 2012). Additionally, in some of Iran's neighboring countries, including Kuwait, Iraq, the eastern province of Saudi Arabia, Azerbaijan, Jordan, and Syria, the IVS II-I (G → A) mutation is reported to be the most frequent one seen in thalassemia patients (De Sanctis et al. 2017). It is most likely that the IVS II-I (G → A) mutation in thalassemia patients is not only the most frequent, but also the most diverse.
The observed mtDNA haplogroup distribution in our study confirmed the previously characterized global distribution. For instance, the ancestral L, H, and U haplogroups are predominant in Africa, the H, U and J haplogroups in the Middle East, the D, B, M, F and N haplogroups in East Asia, and the H, D, C and U haplogroups are [...] (Chen et al. 1995; Comas et al. 2004; Lott et al. 2013; Rishishwar and Jordan 2017). Also, the H, U, T and J haplogroups are the predominant ones in Europe (Lott et al. 2013; Rishishwar and Jordan 2017; Torroni et al. 1996). However, A, B, C and D constitute the majority of haplogroups in North, South and Central American populations, with the N and B haplogroups the most predominant in Australia and Oceania (Lott et al. 2013; Rishishwar and Jordan 2017) (Table 2). Interestingly though, previous investigations in the Caucasus and East Europe disclosed that the U haplogroup (with 22% frequency) was the most common one in these regions.

Table 3 Fst values of subpopulation comparisons (below diagonal). Pluses (+) and minuses (−) display the existence and non-existence of significant differences amongst subpopulations (above diagonal) (P value < 0.05)

Johoud
The Johoud subpopulation is scattered across the cities of Iran, and its members mostly inhabit Shiraz, Isfahan, Hamadan, Yazd, Kerman, Rafsanjan, Sirjan and Borujerd. The two haplogroups most prominent in the Middle East, H (25%) and J (17.5%), displayed high frequencies in the Johoud subpopulation in our study (Table 2).

Kurd and Lur
The Kurd and Lur subpopulations are predominantly residents of the western regions of Iran. In our study, the J haplogroup (24% for the Kurd and 36% for the Lur populations) was the most common one in both populations. As a matter of fact, the J haplogroup has proven to be one of the most prevalent haplogroups in Middle Eastern populations.

Fars
The Fars subpopulation, the largest population of Iran, mostly inhabits the east (especially the northeast) and the center of Iran, with lower frequencies in the southern regions. Originally, the Fars subpopulation is mainly located in the cities of Mashhad, Isfahan, Kerman, Shiraz and Yazd. The H (19.4%) and N (13.3%) haplogroups were the most frequent ones in the Fars population. As mentioned above, the H haplogroup is the most frequent one in the Middle East.

Biochemical Genetics (2022) 60:987-1006

Table 4 displays the major subpopulation haplogroups and their regional distribution. Despite the low number of CF haplotypes, we identified 10 out of 22 macro-haplogroups, which are also seen among the main African, Middle Eastern, Caucasus and East European haplogroups (Table 2). Interestingly, our results displayed high haplogroup diversity along with admixture of the European, American and East Asian populations. The secular trade along the great Silk Road, which extends from Xian in China through the Indian subcontinent to Iran and the Eastern Mediterranean, along with the invasions of the Mongols (1220 A.D.) and the Tatars (1380-87 A.D.), could be historical events that resulted in such diverse and admixed Iranian haplogroups (De Sanctis et al. 2017; Rezaee et al. 2012). A previous study of African, East Asian and European populations showed tightly coherent patterns of mtDNA haplogroup distributions, whereas the Indian and American populations showed more divergent haplogroups, consistent with their admixed origins (Rishishwar and Jordan 2017). The Indian populations, like the Iranian subpopulations in our investigation, have been shown to be formed of a combination of European and Asian mtDNA haplogroups, along with relatively ancient human immigration and admixture events (Kivisild et al. 1999; Moorjani et al. 2013). As Table 2 shows, we found some ancient African L haplogroups in our haplotypes, which likely reflects the expansion of these haplogroups into the Near East as well as migration between Africa and the Near East (Maca-Meyer et al. 2003). However, as in the Iranian ethnic populations, the tremendous haplogroup diversity in thalassemia and CF patients could be the result of admixture of different global human populations due to multiple historical events and of the high consanguinity in different subpopulations.
The high haplotype and nucleotide diversities point to high genetic diversity in the sequences of all subpopulations. Previous investigations in Iranian subpopulations revealed high frequencies of thalassemia in the Bandari, Gilak and Fars subpopulations and a low frequency in the Azeri population (De Sanctis et al. 2017). Other studies have revealed probable ancient and recent gene flow between Iranian populations and the Indian subcontinent and the Arabian Peninsula (Derenko et al. 2013). However, certain mtDNA haplogroups indicate barriers to gene flow due to two major Iranian deserts and the Zagros mountain range (Derenko et al. 2013). Additionally, the results from MDS indicate the evolutionary genetic distances among the groups. With the unprecedented growth of whole-genome data and reliable computational power, genomic simulations and statistical inference in lieu of model-based approaches will make a finer-scale view of the mtDNA of the nine groups possible. We expect that these tools, together with more representative samples of each ethnic group, will give a clearer view of their genomic proximity. These methods may also help to integrate data from nongenetic factors and sources, which should help to make accurate inferences from genetic data and improve our interpretation of ancient events. Interdisciplinary approaches will be essential as we continue to move forward in disentangling human evolutionary history. The interdisciplinary nature of genetic anthropology places us in an ideal position to take advantage of these approaches.
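The haplotype and nucleotide diversities mentioned in this paragraph are summary statistics of a set of aligned sequences. The sketch below uses the standard textbook estimators (Nei's haplotype diversity and the mean number of pairwise differences per site); it is an assumption that these match what was computed in the study, since the software is not named in this section, and the toy sequences are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's estimator: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean number of differing sites per site over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy aligned sequences (NOT study data): three distinct haplotypes among four samples.
toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TCGTACGA"]
print(haplotype_diversity(toy), nucleotide_diversity(toy))
```

Both functions treat every position equally and ignore gaps and ambiguous bases, which a production analysis would need to handle explicitly.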

This study has its limitations. We did not calculate the statistical power of our mtDNA data (i.e. 1 − β, the probability of rejecting the null hypothesis when it is actually false). As we were constrained by the small number of mtDNA sequences per ethnic group, this may have affected the final results. Although the Iranian population is made up of geographically, ethnically and linguistically diverse groups, likely created by prehistoric post-glacial expansions, a common underlying gene pool and gene flow among them is expected. In our study this was clearly observed. Assuming a common underlying gene pool and gene flow, there were no genetic differences among the studied subpopulations. Due to issues such as sampling schemes, retracing the gene flow of the present Iranian populations has been strongly restricted. To elaborate further on these results, comparative analyses of paternal (Y-Chromosome) and maternal (mtDNA) lineages in these subpopulations need to be done in the future. The normalized ratio of the Y-Chromosome FST distance with respect to the total distance (Y-Chromosome RST + mtDNA) could also help shed more light in this context. However, a contrast between the results of mtDNA and Y-Chromosome has been observed (Badro et al. 2013).
In summary, the subpopulations displayed high genetic diversity along with diverse haplogroups. The IVS II-I (G → A) mutation was not only the most common thalassemia mutation, but also the most diverse one across all subpopulations. The most significant haplogroups were B4a1c3a, H2a2a1, N10b, H2a2a2 and J1. Our results showed high haplogroup diversity along with admixture of the common European, American and East Asian haplogroups, which could be a result of historical events such as the secular trade along the great Silk Road and the invasions of the Mongols (1220 A.D.) and the Tatars (1380-87 A.D.). In fact, we could say that the multi-ethnic population context and complex historical events of Iran resulted in the mtDNA heterogeneity. A few subpopulations, especially the Azeri and Arab groups, were shown to be conserved and display signs of genetic isolation. The thalassemia group, which, like the CF group, is assumed to constitute a random sample of the total Iranian population, revealed differences from the Arab, Azeri, and Bandari subpopulations, indicating that this disease is more common in the other subpopulations (i.e. Armani, Gilak, Johoud, Kurd, Lur, Fars). Given the high number of unique and novel nucleotide variations in the thalassemia group, we suggest that these variations may be associated with this disease in our study.
Thalassemia, which is heterogeneous at the molecular level, is caused by many mutations (more than 300) leading to a failure of hemoglobin, and has a relatively high prevalence in Iran. For CF, one of the most common autosomal recessive disorders around the world, the findings indicate that although ΔF508 is the most frequent mutation in populations such as Caucasians, it does not play a pivotal role in Iranian CF groups. This study had no means to associate the prevalence of the aforementioned diseases with mtDNA sequence information, but tried to pinpoint the matrilineal lineage of the CF and thalassemia groups relative to the other ethnic groups in terms of shared mtDNA.

Ethical Considerations
The study was approved and monitored by the ethics committee of the National Institute of Genetic Engineering and Biotechnology (IR.NIGEB.EC.1398.12.3.E). All methods were carried out in accordance with relevant guidelines and regulations. All participants gave written informed consent to participate in the study and for their anonymized data to be used for statistical analysis and dissemination; none of them were under 18 years of age.
[...] Table 1. The study was conducted in accordance with the methodology of previous studies (Akbari et al. 2008; Derakhshandeh et al. 2008; Najmabadi et al. 2001; Rezaee et al. 2012), e.g. sample collection, DNA extraction, primers, PCR amplification and sequencing to extract HVSI and HVSII D-loop data. Briefly, total DNA samples were extracted from blood using a QIAamp DNA mini kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The quality and purity of the extracted DNA were measured using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).

PCR Amplification
The mtDNA was amplified using a set of specific primers (numbering according to NCBI Accession No NC_012920.1): PF, 5′-ATC ATT GGA CAA GTA GCA TC-3′ (15,791-15,810 bp) and PR, 5′-GAG CTG CAT TGC TGC GTG CT-3′ (780-761 bp). Polymerase chain reaction (PCR) amplification was carried out using TEMPase Hot Start 2× Master Mix A BLUE (Ampliqon, Odense, Denmark) in a final reaction volume of 50 μL, containing 100 ng of DNA, 0.32 μL of each primer (10 pmol), 25 μL of TEMPase 2× Master Mix, and 23.2 μL of RNase-free water. PCR amplification was performed with the following program: pre-PCR incubation at 95 °C for 15 min; 35 cycles of denaturation at 95 °C for 20 s, annealing at 60 °C for 45 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 5 min. The specific amplification of a 1550 bp fragment was confirmed by 1.5% agarose gel electrophoresis.
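Because the forward primer lies near the end of the circular reference numbering and the reverse primer just past position 1, the expected product size has to be counted across the origin. A minimal sketch of that arithmetic follows, assuming the 16,569 bp genome length given in the Introduction and the primer 5′ coordinates quoted above; the function name is hypothetical.

```python
MTDNA_LENGTH = 16569  # human mtDNA length as stated in the Introduction


def circular_amplicon_length(fwd_start, rev_start, genome_length=MTDNA_LENGTH):
    """Number of bases from the forward primer's 5' base to the reverse
    primer's 5' base (both inclusive), walking forward around the circle."""
    for pos in (fwd_start, rev_start):
        if not 1 <= pos <= genome_length:
            raise ValueError("coordinates must lie within 1..genome_length")
    # The modulo handles products that wrap past the last numbered position.
    return (rev_start - fwd_start) % genome_length + 1


# PF starts at 15,791; PR starts at 780 on the opposite strand.
length = circular_amplicon_length(15791, 780)
print(length)
```

With these coordinates the sketch gives 1559 bp, which is consistent with the roughly 1550 bp band described for the agarose gel.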

Sequencing
PCR products were sequenced by direct DNA sequencing (Bioneer, South Korea). The sequencing results were analyzed using CodonCode Aligner 6.0.2 software (CodonCode, Centerville, MA, USA, https://www.codoncode.com/aligner/new60.htm). The sequences were compared to the revised Cambridge Reference Sequence (rCRS) (Accession No NC_012920.1) using the BLAST sequence analysis tool (NCBI, Bethesda, MD, USA). The sampling scheme of the thalassemia and CF patient groups was quite different from that of the nine subpopulations mentioned above, in that the individuals constituting these samples were assumed to be an independent random sample of the Iranian population rather than explicitly drawn from a given ethnic group. Therefore, eleven groups were genetically compared, i.e. nine subpopulations and two patient groups.
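Comparing a sample with the rCRS amounts to listing the positions at which the two differ. The sketch below is a simplified stand-in for that step, not the BLAST workflow used in the study: it assumes the sample has already been aligned to the reference segment with no insertions or deletions, and the function name and toy sequences are hypothetical.

```python
def substitutions(reference, sample, start=1):
    """List single-base differences as '<ref base><position><sample base>'.

    `start` is the reference coordinate of the first base of both strings.
    Positions where the sample base is 'N' (uncalled) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("reference and sample must be aligned to equal length")
    found = []
    for offset, (ref_base, smp_base) in enumerate(zip(reference.upper(), sample.upper())):
        if smp_base != "N" and ref_base != smp_base:
            found.append(f"{ref_base}{start + offset}{smp_base}")
    return found


# Toy 10-base strings (NOT real rCRS sequence), numbered from position 47.
ref_toy = "GATCACAGGT"
sample_toy = "GATCGCAGAT"
variants = substitutions(ref_toy, sample_toy, start=47)
print(variants)
```

Real D-loop data contain length variants, so an alignment-aware tool is still needed before a position-by-position comparison of this kind is meaningful.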

Statistical Analyses
To compare all the sequenced mtDNA D-loop samples to other datasets and to the reference sequence, we edited the sequences and reduced their size to leave the region spanning nucleotides 47 to 766 of the reference sequence. This segment includes [...]

Fig. 2 The flowchart of data collection, laboratory processes and statistical analysis. We collected 402 samples from nine Iranian subpopulations and two patient groups. After DNA extraction, PCR amplification, sequencing, and quality check of sequenced samples, we identified haplotypes and haplogroups. Also, we performed AMOVA to compare studied subpopulations
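The trimming step described under Statistical Analyses, which keeps only reference positions 47 to 766, can be sketched as a coordinate-based slice. This assumes each read is already in reference coordinates without insertions or deletions; the function name and the `seq_start` parameter (the reference position of the read's first base) are hypothetical.

```python
def trim_to_region(seq, first=47, last=766, seq_start=1):
    """Return the bases covering reference positions first..last (inclusive)."""
    if first > last:
        raise ValueError("first must not exceed last")
    seq_end = seq_start + len(seq) - 1
    if first < seq_start or last > seq_end:
        raise ValueError("sequence does not cover the requested region")
    # Convert 1-based inclusive reference coordinates to a Python slice.
    return seq[first - seq_start:last - seq_start + 1]


# Toy read (NOT study data): 46 flanking bases, 720 target bases, 14 flanking bases.
read = "A" * 46 + "C" * 720 + "G" * 14
trimmed = trim_to_region(read)
print(len(trimmed))
```

Trimming every sample to the same window keeps diversity and distance statistics comparable across groups, since all sequences then have the same 720 positions.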