Thursday, October 28, 2004

Database : GeneCards

The GeneCards database can be described as a portal that automatically harvests information and links from various gene-centric databases. A short list of the databases queried by GeneCards: GDB, OMIM, HUGO, LocusLink, SWISS-PROT, GeneLoc, Ensembl, InterPro, BLOCKS, KEGG pathways, UniGene... to name just a few. Links to these databases are available from each "GeneCard", a web page aggregating information about a specific gene and its products. As you can see, the quantity of data available for each gene is quite impressive.

Let's take an example: TP53, whose protein product is involved in apoptosis and cell cycle control, and which is frequently mutated in cancers. The first thing you notice is something VERY useful for further data mining: aliases and synonyms. The list IS extensive for most genes I know. When doing microarray data analysis, this feature is invaluable for PubMed-driven analysis of protein-protein interactions.
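To make the alias point concrete, here is a minimal sketch of how an alias list could be turned into a PubMed co-occurrence query for two genes. The function names are made up for this example, the alias lists are short illustrative subsets typed in by hand (not a GeneCards export), and the only PubMed feature assumed is the standard `[Title/Abstract]` field tag. It only builds the query string; it does not contact PubMed.

```python
def alias_clause(aliases, field="Title/Abstract"):
    """OR together every alias of one gene, each quoted and field-tagged."""
    terms = [f'"{alias}"[{field}]' for alias in aliases]
    return "(" + " OR ".join(terms) + ")"


def cooccurrence_query(aliases_a, aliases_b):
    """Match abstracts mentioning any alias of gene A AND any alias of gene B."""
    return alias_clause(aliases_a) + " AND " + alias_clause(aliases_b)


# Illustrative alias subsets, typed in by hand for this sketch.
tp53_aliases = ["TP53", "p53", "LFS1"]
mdm2_aliases = ["MDM2", "HDM2"]

query = cooccurrence_query(tp53_aliases, mdm2_aliases)
print(query)
```

Without the aliases, a search on the official symbols alone would miss every paper that only says "p53" or "HDM2", which is exactly why the list matters.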

Next comes the (cute but kinda useless) genomic location; you can see where the gene is located and on which chromosome. Honestly, I have yet to find a use for this; putting non-crucial information at the top of the page still puzzles me.
Protein analysis follows, with general size (in amino acids), post-translational modifications, 3D structures (if any), cellular localization, etc. Domain analysis via InterPro and BLOCKS can help identify genes with similar functions.

Gene Ontology, an index characterizing the molecular and biological functions of genes, helps to define which cellular processes the gene could be playing a role in. Links to the KEGG pathways in which the gene's product is known to be implicated are also available. In this particular case, we can see that TP53 encodes a protein with roles in apoptosis, DNA damage response and cell cycle control.

Expression analysis in human tissues could have been a VERY interesting feature. Sadly, I find the implementation lacking and not very useful; expression data from Affy arrays, SAGE and northern blots is provided for different tissues ranging from brain to kidney... First, I've never been a fan of whole-tissue analysis with modern transcriptome quantification techniques. With such a broad range of cell types, you just average and dilute your signal, which gives no specific information, really. So TP53 is expressed in the brain... in neurons? Microglial cells? Astrocytes? Each of those, or just 2 of the 3? It gets worse when you compare one tissue with another, or when you do differential expression analysis (following a treatment) on a whole tissue.
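A quick back-of-the-envelope sketch of the dilution problem, with every number invented for illustration (the cell-type fractions and expression levels below are assumptions, not measurements): if the whole-tissue signal is roughly the average of each cell type's expression weighted by how much of the tissue it makes up, a strong change in a minor cell type almost vanishes.

```python
def tissue_signal(fractions, expression):
    """Whole-tissue signal as the fraction-weighted average over cell types."""
    return sum(fractions[cell] * expression[cell] for cell in fractions)


# Hypothetical tissue composition (fractions sum to 1) and expression levels.
fractions = {"neurons": 0.5, "astrocytes": 0.4, "microglia": 0.1}
control = {"neurons": 100.0, "astrocytes": 100.0, "microglia": 100.0}
# After a hypothetical treatment, only microglia respond, with a 4-fold induction.
treated = {"neurons": 100.0, "astrocytes": 100.0, "microglia": 400.0}

fold_change = tissue_signal(fractions, treated) / tissue_signal(fractions, control)
print(f"microglia: 4.00x, whole tissue: {fold_change:.2f}x")
```

Under these made-up numbers a 4-fold induction in microglia shows up as roughly a 1.3-fold change at the tissue level, which is easy to lose in the noise.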

Links to various gene/mRNA/protein sequences are provided, along with similarity data (with other proteins and across organisms), known SNPs (single nucleotide polymorphisms) and known mutations causing disorders or diseases. This can be useful for anything from designing specific primers to evolution studies, among other things (finding diseases related to your Favorite Protein is great for justifying your work in grant applications too, I hear).

I wish most genomics analysis software linked to GeneCards directly... you get far more useful data in one place than by linking to 90 different specialized databases. The only problem with this excellent resource: the website is usually slow (at least for me) and searches can take quite a long time (relatively speaking). This is particularly annoying when you want to look up many genes... A downloadable (SQL) local version would be welcome :(
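To show what I have in mind with a local SQL version, here is a purely hypothetical sketch using SQLite. GeneCards does not publish this schema; the `gene` and `alias` tables, the `resolve` function and the handful of rows are all invented for the example. The point is only that resolving a whole gene list to official symbols becomes a local, near-instant operation.

```python
import sqlite3

# In-memory database standing in for a hypothetical local GeneCards dump.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene  (symbol TEXT PRIMARY KEY, location TEXT);
    CREATE TABLE alias (alias TEXT, symbol TEXT REFERENCES gene(symbol));
""")
# A few illustrative rows, typed in by hand.
conn.executemany("INSERT INTO gene VALUES (?, ?)",
                 [("TP53", "17p13.1"), ("MDM2", "12q15")])
conn.executemany("INSERT INTO alias VALUES (?, ?)",
                 [("p53", "TP53"), ("LFS1", "TP53"), ("HDM2", "MDM2")])


def resolve(names):
    """Map each name (official symbol or alias) to its official symbol, or None."""
    result = {}
    for name in names:
        row = conn.execute(
            "SELECT symbol FROM gene WHERE symbol = ? "
            "UNION SELECT symbol FROM alias WHERE alias = ?",
            (name, name),
        ).fetchone()
        result[name] = row[0] if row else None
    return result


# FOO1 is a made-up name that should not resolve to anything.
resolved = resolve(["p53", "HDM2", "TP53", "FOO1"])
print(resolved)
```

With something like this, a list of a few hundred microarray hits could be resolved in one go instead of one slow web search at a time.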
