Description
A total of 2,762 LASI-DAD participants, including 22 trios (mother-father-child), were sequenced at MedGenome, Inc. (Bangalore, India) to an average read depth of 30x. Individuals were sampled from 18 states across India, with a median sample size of 157 individuals per state. The raw whole-genome sequences were sent to the Genome Center for Alzheimer’s Disease (GCAD) at the University of Pennsylvania for joint calling and quality control. After quality control filtering, 2,679 samples and 73.2 million autosomal bi-allelic variants remained, including 67.1 million single nucleotide variants (SNVs) and 6.04 million insertion-deletions (indels).
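The quality-control figures above can be expressed as proportions with a few lines of arithmetic. The following is a minimal Python sketch that uses only the counts reported in this description. The variable names and the `pct` helper are illustrative and are not part of any GCAD or ADSP tool. Because the percentages are computed from the rounded totals, they are approximations and were not derived from the underlying data.

```python
# Summary counts as reported in this dataset description (rounded values).
SEQUENCED = 2_762        # LASI-DAD participants sequenced at MedGenome
PASSED_QC = 2_679        # samples passing GCAD quality control
SNVS_M = 67.1            # single nucleotide variants, in millions
INDELS_M = 6.04          # insertion-deletions, in millions
TOTAL_VARIANTS_M = 73.2  # autosomal bi-allelic variants passing QC, in millions


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of sequenced samples retained after QC.
sample_pass_rate = pct(PASSED_QC, SEQUENCED)
# Composition of the variants that passed QC.
snv_share = pct(SNVS_M, TOTAL_VARIANTS_M)
indel_share = pct(INDELS_M, TOTAL_VARIANTS_M)

if __name__ == "__main__":
    print(f"Samples passing QC: {sample_pass_rate}%")
    print(f"SNVs: {snv_share}% of variants; indels: {indel_share}%")
```

By this arithmetic, about 97% of sequenced samples passed quality control, and SNVs make up roughly 92% of the retained variants.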
The dataset includes individuals born in 23 different states who speak at least 26 different languages and come from both rural (63%) and urban (37%) areas. Participants belong to various caste groups as recognized by the Indian government: 4% from Scheduled Tribes, 18% from Scheduled Castes, and 44% from Other Backward Classes (OBC). Males and females were recruited in nearly equal numbers, and females make up 52% of participants. For many analyses, individuals were grouped by birth location into six major geographic regions: North (n=555), West (n=385), Central (n=373), South (n=715), North-East (n=73), and East (n=530). Most analyses of this dataset are performed on the 2,620 individuals who passed quality control checks, after first-degree relatives were excluded.
Whole-genome data for these samples are located in the ADSP dataset (ng00067).