Urgent Topic for Discussion: Format of globin gene cluster SNP databases
So far I have 6 SNP sets for the globin gene clusters (including our set at NKUA), and they are ALL DIFFERENT!!
I will outline the fields that each data set includes and then we can begin our discussion:
The P09 NKUA (my) data set includes:
Only beta globin gene. Traditional nucleotide number of base substitution (and change), HGVS number (which I may have got wrong!), method of detection, approximate frequency in the population.
Marina Kleanthous & Xenia Feleki (P01 CING):
Only beta globin gene. Traditional nucleotide number of base substitution (and change), HGVS number, Method of detection.
George Patrinos (P08 AG):
The whole beta globin cluster, with an unknown number (?) of SNPs apparently copied from dbSNP, each SNP with an “rs” number. Maybe this is the most “correct” approach but, for me at least, I cannot fathom how to relate the rs number to the globin gene regions themselves. George, please can you enlighten us in the simplest of terms?
John Old (P05 ORH), data set:
Alpha and beta globin genes, consisting of 2 extensive lists of SNPs found in individual samples, e.g. DNA id, ethnic group, traditional nucleotide number of base substitution (and change), genotype of sample and beta gene mutation (if present) [is it in cis or trans to the SNP?].
Piero Giordano & Kees Harteveld (P18 LUMC):
The whole beta globin gene cluster. Despite the Dutch column headings, I think that the data set includes gene region name (although some explanation is needed, e.g. what is “preGframe” or “F2” or “promAgextra”!?), fragment co-ordinates according to NCBI, nucleotide substitution and exact SNP position (“positie”) according to NCBI.
Joseph and Alex (P07 UoM) data set:
Beta globin gene region plus XmnI site. I had requested that the data be submitted as a table listing the SNP, position, frequency in population etc., so this data set will have to be converted (but let us wait for the consensus on what the table should include first).
In order to begin the discussion I will ask and answer some questions:
Q a) What is the aim of this database?
Response to Qa) I THINK THE AIM SHOULD BE AN EASILY ACCESSIBLE REFERENCE TO i) IDENTIFY WHETHER A BASE CHANGE IN A SAMPLE IS POLYMORPHIC OR PATHOLOGICAL AND ii) LOCATE SNPs THAT MAY BE USEFUL FOR LINKAGE ANALYSIS.
Q b) What nucleotide numbering system should we use to identify each SNP, to preclude ambiguity?
Response to Qb) CLEAR ID OF EACH SNP USING BOTH NCBI CLUSTER NUMBERING AND “rs” NUMBER IF AVAILABLE.
Q c) What other fields are relevant?
Response to Qc) ETHNIC GROUP AND RELEVANT FREQUENCY? [THE METHOD FIELD IS PROBABLY IRRELEVANT IN THE END...]
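To make the proposal above concrete, here is one possible shape for a single unified record, sketched in Python. This is only a starting point for discussion, not an agreed format: the class name `SnpRecord`, the field names and the `key()` helper are all my own assumptions, and the position and rs values in the usage note are invented placeholders, not real SNPs. The idea is that the NCBI cluster position plus the base change identifies the SNP unambiguously, and the “rs” number, traditional number, HGVS name, ethnic group and frequency are optional extras.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import re


@dataclass(frozen=True)
class SnpRecord:
    """One SNP entry in a hypothetical unified format (all names are assumptions)."""
    cluster: str                            # "alpha" or "beta" globin gene cluster
    ncbi_position: int                      # position in the NCBI cluster reference sequence
    ref_base: str                           # base in the reference sequence
    alt_base: str                           # substituted base
    rs_id: Optional[str] = None             # dbSNP "rs" number, if available
    traditional_name: Optional[str] = None  # traditional nucleotide numbering
    hgvs_name: Optional[str] = None         # HGVS nomenclature
    ethnic_group: Optional[str] = None
    frequency: Optional[float] = None       # frequency in that group, 0.0 to 1.0

    def __post_init__(self) -> None:
        # Basic sanity checks so that obviously malformed entries are rejected early.
        if self.cluster not in ("alpha", "beta"):
            raise ValueError(f"unknown cluster: {self.cluster!r}")
        if self.ref_base not in "ACGT" or len(self.ref_base) != 1:
            raise ValueError(f"bad reference base: {self.ref_base!r}")
        if self.alt_base not in "ACGT" or len(self.alt_base) != 1:
            raise ValueError(f"bad substituted base: {self.alt_base!r}")
        if self.rs_id is not None and not re.fullmatch(r"rs\d+", self.rs_id):
            raise ValueError(f"bad rs number: {self.rs_id!r}")
        if self.frequency is not None and not 0.0 <= self.frequency <= 1.0:
            raise ValueError(f"frequency out of range: {self.frequency!r}")

    def key(self) -> Tuple[str, int, str, str]:
        """Identify the SNP by cluster position and base change, independent of the lab."""
        return (self.cluster, self.ncbi_position, self.ref_base, self.alt_base)
```

With something like this, two entries from different centres, e.g. `SnpRecord("beta", 12345, "C", "T", rs_id="rs0000000")` and `SnpRecord("beta", 12345, "C", "T", ethnic_group="Greek", frequency=0.1)` (placeholder values), give the same `key()` and so can be recognised as the same SNP even though each centre filled in different optional fields.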
SHALL I CIRCULATE THE DATA SETS THAT EVERYONE SENT ME SO THAT EVERYONE CAN SEE WHAT I AM TALKING ABOUT? WHOEVER IS INTERESTED, PLEASE LET ME KNOW...
And let the discussion begin……
Regards,
Jan
NKUA