IGVBrowser–a genomic variation resource from diverse Indian populations

Narang, Ankita; Roy, Rishi Das; Chaurasia, Amit; Mukhopadhyay, Arijit; Mukerji, Mitali; Indian Genome Variation Consortium; Dash, Debasis

doi:10.1093/database/baq022

Abstract

The Indian Genome Variation Consortium (IGVC) project, an initiative of the Council for Scientific and Industrial Research, is the first large-scale comprehensive study of the Indian population. One of its major aims is to study and catalog the variations in nearly a thousand candidate genes related to diseases and drug response, both for predictive marker discovery and founder identification and to address questions of ethnic diversity, migrations and the extent of relatedness with other world populations. Phase I of the project aimed to provide a set of reference populations representing the entire genetic spectrum of India in terms of language, ethnicity and geography; Phase II aimed to provide variation data on candidate genes and genome-wide neutral markers in this reference set of populations. We report here the development of IGVBrowser, which provides the allele and genotype frequency data generated in the IGVC project. The database harbors 4229 SNPs from more than 900 candidate genes in contrasting Indian populations. Analysis shows that most of the markers are from genic regions. Further, a large fraction of the genes are implicated in cardiovascular, metabolic, cancer and immune system-related diseases. Thus, the IGVC data provide baseline variation data for the Indian population for the study of genetic diseases and pharmacology. The database also houses data on ∼50 000 (Affy 50 K array) genome-wide neutral markers in these reference populations. In IGVBrowser one can analyze and compare genomic variations in Indian populations with those reported in HapMap, along with annotation information from various primary data sources.

Database URL: http://igvbrowser.igib.res.in

Introduction

The Indian population, representing one-sixth of the world population, has been a global melting pot of human diversity. It includes all the world's major linguistic groups, and its populations have been shaped by different waves of migration and admixture (1, 2). Further, strict mating patterns have produced several endogamous populations, which makes India an important resource for mapping genes (3). The Indian Genome Variation Consortium (IGVC) project, an initiative of the Council for Scientific and Industrial Research (CSIR), was set up to develop a database of genomic variations in the Indian population for predictive marker discovery in complex diseases such as diabetes, asthma, neuropsychiatric, infectious and cardiovascular disorders, and in response to drugs (4). Phase I of the project was conducted to determine the extent of genetic differentiation in India. Toward this end, genotype data for 405 SNPs from 75 genes and a 4.2 Mb contiguous region of chromosome 22 were studied in 55 contrasting populations (4, 5). These populations were identified from 4 major linguistic groups, namely Austro-Asiatic (AA), Tibeto-Burman (TB), Indo-European (IE) and Dravidian (DR), spanning 6 geographical regions of habitat (N, north; NE, north-east; W, west; E, east; S, south; C, central) and different ethnic groups (LP, large population, caste; IP, isolated population, tribes; SP, special population, religious groups). Five genetically distinct clusters were identified, and a set of 24 populations representing these clusters was selected for Phase II of the project. In Phase II, 3824 SNPs from 834 candidate genes as well as ∼50 000 (Affy 50 K array) genome-wide neutral markers have been genotyped using the Illumina, Sequenom and Affymetrix platforms. This initiative lays the foundation for integrating global genotype-to-phenotype data (6) with Indian population data and for developing a federated database.
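The three-way classification above (linguistic group, geographical region, ethnic category) can be captured in a small lookup. The following is a minimal sketch only: the hyphen-separated label form `DR-S-IP` and the function name `describe_population` are assumptions made for illustration and are not specified in this text; the abbreviations themselves are those listed above.

```python
# Abbreviations as given in the text.
LINGUISTIC = {
    "AA": "Austro-Asiatic",
    "TB": "Tibeto-Burman",
    "IE": "Indo-European",
    "DR": "Dravidian",
}
GEOGRAPHY = {
    "N": "north", "NE": "north-east", "W": "west",
    "E": "east", "S": "south", "C": "central",
}
ETHNIC = {
    "LP": "large population (caste)",
    "IP": "isolated population (tribe)",
    "SP": "special population (religious group)",
}

def describe_population(label):
    """Expand a hypothetical 'LING-GEO-ETHNIC' label into readable terms."""
    parts = label.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three hyphen-separated codes, got {label!r}")
    ling, geo, ethnic = parts
    # A KeyError here means the label uses a code not listed in the text.
    return {
        "linguistic_group": LINGUISTIC[ling],
        "region": GEOGRAPHY[geo],
        "ethnic_category": ETHNIC[ethnic],
    }

# Hypothetical label, used only to exercise the lookup.
info = describe_population("DR-S-IP")
```

Keeping the three code tables separate mirrors the way the reference populations are described, so an unknown code fails loudly instead of being silently mislabeled.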

Data Source and Organization

To address the need for a comprehensive online resource that lets users visualize IGVC data together with integrated information about SNPs from different resources, we have developed IGVBrowser (Figure 1).

Figure 1.


A representative example of IGVBrowser. The distribution of markers from IGVC data in a 2.41 Mb region of human chromosome 1 is displayed along with annotation data from different resources.

IGVBrowser houses genotype data on samples that were recruited in the IGVC project. The database includes (i) the final validated dataset from 1871 samples in Phase I, comprising 405 autosomal SNPs spanning 75 genes, including 90 SNPs from a 5.2 Mb region of chromosome 22, from 55 diverse endogamous Indian populations (3); (ii) the Phase II dataset of 3824 SNPs spanning 834 genes in 545 samples from 24 IGVdb populations and (iii) ∼50 000 (Affy 50K XbaI array) neutral markers in 26 populations. The Phase II populations are a subset of the populations genotyped in Phase I. The web-based tool SNPper (http://snpper.chip.org/) was used to classify the 4229 markers in Phase I and Phase II according to their location in genic regions (Figure 2). Similarly, DAVID (http://david.abcc.ncifcrf.gov/) was used to classify the genes containing these markers according to gene–disease association class (Figure 3) and their mapping to various KEGG pathways (Figure 4). We report that a large fraction of the genes are implicated in cardiovascular, metabolic, cancer and immune system-related diseases. Thus, the IGVC data provide baseline variation data for the Indian population for the study of genetic diseases and pharmacology.
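Allele and genotype frequencies of the kind the browser reports can be derived from per-sample genotype calls at a biallelic SNP. The sketch below illustrates that arithmetic under stated assumptions: the genotype calls are invented for the example and are not IGVC data, and the function names are hypothetical rather than part of IGVBrowser.

```python
from collections import Counter

def genotype_frequencies(genotypes):
    """Return the relative frequency of each unordered genotype, e.g. 'AG'."""
    # Sort the two alleles so that 'GA' and 'AG' count as the same genotype.
    counts = Counter("".join(sorted(g)) for g in genotypes)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def allele_frequencies(genotypes):
    """Return the relative frequency of each allele across all chromosomes."""
    # Each diploid genotype contributes two alleles to the count.
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}

# Hypothetical genotype calls for one SNP in one population.
calls = ["AA", "AG", "GA", "GG", "AA", "AG", "AA", "AG"]
geno = genotype_frequencies(calls)   # AA: 3/8, AG: 4/8, GG: 1/8
allele = allele_frequencies(calls)   # A: 10/16, G: 6/16
```

Computing the two tables separately keeps the distinction clear: genotype frequencies are per individual, whereas allele frequencies are per chromosome, so the denominators differ by a factor of two.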

Figure 2.


Pie chart depicting the distribution of SNPs in IGVC according to genomic location. More than 50% of the SNPs lie in intronic regions and 15% are in coding exons.

Figure 3.


Bar graph showing the functional annotation of candidate genes in IGVC according to gene–disease association.

Figure 4.


Bar graph showing the mapping of candidate genes to significant pathways (after Bonferroni correction) of the KEGG Pathway Database.

IGVBrowser also includes HapMap SNP genotype data from Phases I + II and III of the HapMap project (http://hapmap.ncbi.nlm.nih.gov/downloads/gbrowse/2009-02_phaseII+III/gff/), based on the NCBI B36 assembly and dbSNP b126, for 4 populations: Yoruba from Ibadan, Nigeria (YRI); Japanese in Tokyo, Japan (JPT); Han Chinese in Beijing, China (CHB); and CEPH (Utah residents with ancestry from northern and western Europe) (CEU). Additional annotation information, including cytogenetic positions, links to pathway annotations in the Reactome knowledgebase and mRNA sequences, was retrieved from HapMap in General Feature Format (GFF). Annotation data in tab-delimited format for non-coding RNA genes and pseudogenes, OMIM-associated genes, miRBase and snoRNABase, simple repeats and the Database of Genomic Variants were downloaded from the UCSC genome annotation database (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database), based on build hg18.
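GFF records are tab-delimited lines with nine columns (sequence, source, feature type, start, end, score, strand, phase and attributes). As a minimal sketch of how such a line can be read, the parser below keeps the ninth column as a raw string, since attribute syntax differs between GFF versions. The example record, including its source name and SNP identifier, is invented for illustration and is not taken from the HapMap or IGVC files.

```python
from typing import NamedTuple, Optional

class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: str
    attributes: str  # left unparsed; syntax depends on the GFF version

def parse_gff_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF record; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # A '.' in the score column means that no score is recorded.
    score = None if cols[5] == "." else float(cols[5])
    return GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      score, cols[6], cols[7], cols[8])

# Hypothetical record: a single-base SNP feature on chromosome 1.
example = "chr1\tIGVC\tSNP\t1000\t1000\t.\t+\t.\tSNP rs0000001"
feature = parse_gff_line(example)
```

Rejecting lines that do not have exactly nine columns catches the most common conversion error, where spaces are used in place of tabs.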

Database structure, implementation and accessibility

The browser is built on the Generic Genome Browser (GBrowse v1.69), a widely used platform-independent genome annotation viewer developed by Stein et al. (7) as part of the Generic Model Organism Database (GMOD) project (http://www.gmod.org). GBrowse combines a database with an interactive web page for displaying genomic information, and provides data interoperability across systems running the same software. Integrated annotation data from primary sources such as NCBI, UCSC and HapMap have been linked with variation data from different ethnic populations in India. The compiled data, processed into GFF format, and the complete human genome sequence as plain text files were loaded into a MySQL relational database management system using a GBrowse script. IGVBrowser provides users with an interactive display of the genetic variation data. A user can query by chromosomal region of interest, reference SNP ID, HGNC symbol, pathway name or any other unique feature recognized by the database. Researchers can also upload their own data in GFF format and view it alongside the data available in IGVBrowser. The semantic zooming feature of GBrowse allows better interactive viewing in IGVBrowser. In addition, the resource is linked to sequence analysis servers maintained by NCBI and UCSC. Online data analysis plugins allow text dumps of visible features in a number of standard formats and also facilitate the download of the sequence corresponding to a selected region.
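A query by chromosomal region amounts to selecting the features whose coordinates overlap the requested interval. The following sketch shows that logic on an in-memory list; it is not how GBrowse or its MySQL backend implements the search. The `chromosome:start..end` query form, the function names and the feature tuples are assumptions made for this example.

```python
import re
from typing import List, Tuple

# Matches a region such as "chr1:900..2000" (assumed query syntax).
REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)$")

Feature = Tuple[str, int, int, str]  # (chromosome, start, end, name)

def parse_region(text: str) -> Tuple[str, int, int]:
    """Split a region string into chromosome, start and end."""
    # Thousands separators such as "2,000" are tolerated and removed.
    match = REGION.match(text.replace(",", "").strip())
    if match is None:
        raise ValueError(f"not a region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is greater than end in {text!r}")
    return match["chrom"], start, end

def features_in_region(features: List[Feature], region: str) -> List[Feature]:
    """Return the features that overlap the region (inclusive coordinates)."""
    chrom, start, end = parse_region(region)
    return [f for f in features
            if f[0] == chrom and f[1] <= end and f[2] >= start]

# Hypothetical features; names and positions are invented.
snps = [
    ("chr1", 1000, 1000, "snpA"),
    ("chr1", 5000, 5000, "snpB"),
    ("chr2", 1200, 1200, "snpC"),
]
hits = features_in_region(snps, "chr1:900..2,000")
```

A linear scan is adequate for a handful of user-uploaded features; a database-backed browser would normally rely on indexed coordinate columns instead.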

Future directions

Indian Genome Variation data will be very useful for the dissection of common complex diseases and in pharmacogenomic studies. Frequency profiles of markers in disease- or drug-related genes generated through the IGVC are being used to identify at-risk chromosomes and founders, for LD-based mapping, to trace the history of diseases, in pharmacogenetics and as reference populations for mapping relatedness (3,4,5,8–19). The interactive web browser, IGVBrowser, has been created as a central repository for current and future datasets on Indian populations and is being made accessible in the public domain. The web browser has been made dynamic to allow periodic updates. A possible integration of IGVBrowser with HGVbaseG2P (20) could enable researchers to make cross-study comparisons among different populations of the world in disease–gene association studies.

Funding

The Indian Genome Variation project was funded by the Council for Scientific and Industrial Research programmes CMM0016 and SIP0006. Funding for IGVBrowser and the open access charge is provided by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 (the GEN2PHEN project).

Conflict of interest. None declared.

Acknowledgements

The authors would like to thank Meenakshi Anurag and Pankaj Kumar for structuring the manuscript and Gajinder Pal Singh for correcting the draft and providing valuable suggestions.

References

1. Habib I. People's History of India (1) Prehistory, 2001. Aligarh Historians Society and Tulika Books, Aligarh

2. Habib I. People's History of India (2) The Indian Civilisation, 2001. Aligarh Historians Society and Tulika Books, Aligarh

3. Bahl S, Ahmed I, Mukerji M. Utilizing linkage disequilibrium information from Indian Genome Variation Database for mapping mutations: SCA12 case study, J. Genet., 2009, vol. 88 (pg. 55-60)

4. Indian Genome Variation Consortium. The Indian Genome Variation database (IGVdb): a project overview, Hum. Genet., 2005, vol. 118 (pg. 1-11)

5. Indian Genome Variation Consortium. Genetic landscape of the people of India: a canvas for disease gene exploration, J. Genet., 2008, vol. 87 (pg. 3-20)

6. Thorisson GA, Muilu J, Brookes AJ. Genotype-phenotype databases: challenges and solutions for the post-genomic era, Nat. Rev. Genet., 2009, vol. 10 (pg. 9-18)

7. Stein LD, Mungall C, Shu S, et al. The generic genome browser: a building block for a model organism system database, Genome Res., 2002, vol. 12 (pg. 1599-1610)

8. Sinha S, Arya V, Agarwal S, et al. Genetic differentiation of populations residing in areas of high malaria endemicity in India, J. Genet., 2009, vol. 88 (pg. 77-80)

9. Kumar J, Garg G, Kumar A, et al. Single nucleotide polymorphisms in homocysteine metabolism pathway genes: association of CHDH A119C and MTHFR C677T with hyperhomocysteinemia, Circ. Cardiovasc. Genet., 2009, vol. 2 (pg. 599-606)

10. Biswas A, Sadhukhan T, Majumder S, et al. Evaluation of PINK1 variants in Indian Parkinson's disease patients, Parkinsonism Relat. Disord., 2010, vol. 16 (pg. 167-171)

11. Bhattacharjee A, Banerjee D, Mookherjee S, et al. Leu432Val polymorphism in CYP1B1 as a susceptible factor towards predisposition to primary open-angle glaucoma, Mol. Vis., 2008, vol. 14 (pg. 841-850)

12. Gupta A, Maulik M, Nasipuri P, et al. Molecular diagnosis of Wilson disease using prevalent mutations and informative single-nucleotide polymorphism markers, Clin. Chem., 2007, vol. 53 (pg. 1601-1608)

13. Saha A, Mukherjee S, Maulik M, et al. Evaluation of genetic markers linked to hemophilia A locus: an Indian experience, Haematologica, 2007, vol. 92 (pg. 1725-1726)

14. Mahajan A, Chavali S, Ghosh S, et al. Allelic heterogeneity of molecular events in human coagulation factor IX in Asian Indians. Mutation in brief #965. Online, Hum. Mutat., 2007, vol. 28 pg. 526

15. Sinha S, Mishra SK, Sharma S, et al. Polymorphisms of TNF-enhancer and gene for FcgammaRIIa correlate with the severity of falciparum malaria in the ethnically diverse Indian population, Malar. J., 2008, vol. 7 pg. 13

16. Prasher B, Negi S, Aggarwal S, et al. Whole genome expression and biochemical correlates of extreme constitutional types defined in Ayurveda, J. Transl. Med., 2008, vol. 6 pg. 48

17. Sinha S, Qidwai T, Kanchan K, et al. Variations in host genes encoding adhesion molecules and susceptibility to falciparum malaria in India, Malar. J., 2008, vol. 7 pg. 250

18. Biswas A, Maulik M, Das SK, et al. Parkin polymorphisms: risk for Parkinson's disease in Indian population, Clin. Genet., 2007, vol. 72 (pg. 484-486)

19. HUGO Pan-Asian SNP Consortium. Mapping human genetic diversity in Asia, Science, 2009, vol. 326 (pg. 1541-1545)

20. Thorisson GA, Lancaster O, Free RC, et al. HGVbaseG2P: a central genetic association database, Nucleic Acids Res., 2009, vol. 37 (pg. D797-D802)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
