Wednesday, October 27, 2004

Coming soon: Useful Databases column

Data mining. If you get involved in microarray data analysis, you'll do lots of it. And it's VERY easy to get lost. Say you have a list of 200 differentially expressed genes (which is far from unusual). If you're lucky and experienced, you may know the function of 5-10% of those... now your job is to interpret the biological significance of the dataset. More often than not, genes have non-descriptive names and may go by several different names across publications, so it can get quite complicated to extract relevant information about what is going on in the studied cell population.

Tomorrow I'm gonna start a new column on the databases I use to get an idea of gene/protein function, and to do microarray interpretation in general. Sadly, data mining is still 90% "manual" database searching (as you may know, I'm working to fix that, among other things)... but at least there are specialized databases that gather information from multiple sources. The first database I'll "review" is probably GeneCards, because it's the most useful (in my opinion). Right now I'm tired as hell... so stay tuned!

Side note: Science is all about patience. Because you have to restart, reconsider, redesign, rethink and redo. A lot. Biology doesn't always behave the way you expect!

