Sunday, October 31, 2004

VLinux 1.0 - Portable Linux-based toolbox

VLinux, a Knoppix-based Linux distro offering a platform for most bioinformatics needs (well, on the DNA sequence / protein analysis side), has just released version 1.0. It boots from a CD (no installation required) and is quite complete. To name just a few, it contains EMBOSS, RasMol, ClustalW, gene prediction, primer design and some phylogeny software. Sadly, no microarray analysis software is included (Bioconductor would have been nice). Nonetheless, it can prove useful to carry this portable toolbox with you, especially in situations where you don't have access to your laptop. BioKnoppix is another distro of the same genre, offering a different package set.

Saturday, October 30, 2004

Bioinformatics news

BioInform is a bioinformatics news website (which is kinda rare). I found one headline interesting, but quickly noticed you have to 'subscribe' to read the actual news (nothing is available, not even a preview of a few sentences). I then searched for the price of the subscription, and eventually found it (it's hidden somewhere in the PDF form, not the web-based one): 95$ (which I assume to be US$) for 12 issues. So for 100 bucks, you get 3 months of 'news', most of which is more financial than bioinformatics related.

Thursday, October 28, 2004

Database : GeneCards

The GeneCards database can be defined as a portal that automatically harvests information and links from various gene-centric databases. A short list of databases queried by GeneCards : GDB, OMIM, HUGO, LocusLink, SWISS-PROT, GeneLoc, Ensembl, InterPro, BLOCKS, KEGG pathways, UniGene... to name just a few. Links to these databases are available from each "GeneCard", a webpage aggregating information about a specific gene and its products. As you can see, the quantity of data available for each gene is quite impressive.

Let's take an example : TP53, a protein involved in apoptosis / cell cycle control, frequently mutated in cancers. The first thing you notice is something VERY useful for further data mining : aliases and synonyms. The list IS extensive for most genes I know. When doing microarray data analysis, this feature is invaluable for PubMed-driven analysis of protein-protein interactions.
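To make the alias point concrete, here is a minimal sketch of how a list of synonyms could be turned into a single PubMed search string. The function name is made up for illustration, the alias list is only a small hand-typed sample (not the full GeneCards list), and the only thing assumed about PubMed is its `[Title/Abstract]` field tag.

```python
def build_pubmed_query(aliases, field="Title/Abstract"):
    """OR together every alias of a gene into one PubMed search string.

    Hypothetical helper: duplicates are dropped case-insensitively so that
    "p53" and "P53" do not both end up in the query.
    """
    seen = set()
    terms = []
    for alias in aliases:
        cleaned = alias.strip()
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue
        seen.add(key)
        terms.append(f'"{cleaned}"[{field}]')
    return "(" + " OR ".join(terms) + ")"


# Small illustrative sample of TP53 aliases, typed by hand.
tp53_aliases = ["TP53", "p53", "LFS1", "TRP53", "P53"]
query = build_pubmed_query(tp53_aliases)
print(query)
```

Searching with every alias at once is the whole point: a paper that only ever says "p53" would be missed by a query on the official symbol alone.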

Next comes the (cute but kinda useless) genomic location; you can see where the gene is located and on which chromosome. Honestly I have yet to find a use for this; putting non-crucial information at the top of the page still puzzles me.
Protein analysis follows, with general size (in amino acids), post-translational modifications, 3D structures (if any), cellular localization, etc. Domain analysis via InterPro and BLOCKS can help identify genes with similar functions.

Gene Ontology, an index characterizing the molecular and biological functions of genes, helps to define in which cellular processes the gene could be playing a role. Links to the KEGG pathways in which the gene's product is known to be involved are also available. In this particular case, we can see that TP53 is a protein with apoptotic, DNA damage and cell cycle control properties.

Expression analysis in human tissues could have been a VERY interesting feature. Sadly, I find the implementation lacking and not very useful; expression data coming from Affy arrays, SAGE and northern blots is provided for different tissues ranging from brain to kidneys... First, I've never been a fan of tissue analysis with modern transcriptome quantification techniques. With such a broad range of cell types, you just average and dilute your signal, which gives no specific information, really. So TP53 is expressed in the brain... in neurons? microglial cells? astrocytes? Each of those, or just 2 of the 3? It gets worse when you compare one tissue with another, or when you do differential expression analysis (following a treatment) on a whole tissue.
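A quick back-of-the-envelope sketch of that dilution effect. All the numbers are invented for illustration (the cell-type fractions are not real brain composition data), and it assumes every cell type has the same baseline expression of the gene, which is a big simplification.

```python
def whole_tissue_fold_change(cell_fractions, fold_changes):
    """Fold change measured on a whole-tissue sample.

    Assumes equal baseline expression in every cell type, so the tissue
    signal is just the fraction-weighted average of per-cell-type fold
    changes. Cell types missing from fold_changes are treated as unchanged.
    """
    if abs(sum(cell_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("cell fractions must sum to 1")
    return sum(
        fraction * fold_changes.get(cell_type, 1.0)
        for cell_type, fraction in cell_fractions.items()
    )


# Made-up composition: microglia are only 10% of the sample.
brain = {"neurons": 0.45, "astrocytes": 0.45, "microglia": 0.10}
# Made-up treatment effect: a 10-fold induction, in microglia only.
treatment = {"microglia": 10.0}

observed = whole_tissue_fold_change(brain, treatment)
print(round(observed, 2))  # 1.9
```

Under these assumptions, a 10-fold change in one cell type shows up as less than 2-fold on the whole tissue, and nothing in the number tells you which cells are responsible.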

Links to various gene/mRNA/protein sequences are provided, along with similarity data (with other proteins and across organisms), known SNPs (single nucleotide polymorphisms) and known mutations causing disorders / diseases. This can be useful for stuff ranging from specific primer design to evolution studies, among other things (finding diseases related to your Favorite Protein is great for justifying your work in grant applications too, I heard).

I wish most genomics analysis software linked to GeneCards directly... you get much more useful data in one place than by linking to 90 different, specialized databases. Only problem with this excellent resource : the website is usually slow (at least for me) and searches can take quite a long time (relatively speaking). This is particularly annoying when you want to look up many genes... A downloadable (SQL) local version would be welcome :(

Wednesday, October 27, 2004

Coming soon : Useful databases Column

Data mining. If you get involved in microarray data analysis, you'll do lots of it. And it's VERY easy to get lost. Say you have a list of 200 differentially expressed genes (which is far from unusual). If you're lucky and experienced, you may know the function of 5-10% of those... now your job is to interpret the biological significance of the dataset. More often than not, genes have quite non-descriptive names, and may be identified by multiple names in publications; it can get quite complicated to extract relevant information about what is going on in the studied cell population.

Tomorrow I'm gonna start a new column on the databases I use to get an idea about gene/protein function, and to do microarray interpretation in general. Sadly, data mining is still 90% "manual" database searching (as you may know, I'm working to fix that, among other things)... at least specialized databases gathering information from multiple sources exist. The first database I'll "review" is probably GeneCards, because it's the most useful (in my opinion). Right now I'm tired as hell... so stay tuned!

Monday, October 25, 2004

Support Firefox!

Bioinformatics Jobs - Get serious

An impressive link collection used to successfully look for a bioinformatics position.

Saturday, October 23, 2004

Been away... and new Affy stuff!

I can't go to sleep without posting some fresh news however, so here you are :

Affymetrix put out a press release today announcing its ENCODE array, destined to be used in the ENCODE project (the next logical step after the Human Genome Project). ENCODE stands for "Encyclopedia Of DNA Elements" and is a pilot project aiming to characterize selected non-coding regions (1% of known "junk" DNA) of the genome. The new array (named ENCODE01) can be used for de novo transcription mapping, chromatin immunoprecipitation (ChIP) assays, and methylation studies. It's a tool that will help us understand these non-coding areas, which can be promoters, enhancers, silencers, insulators, or something else completely new. Biology still has secrets to reveal, stay tuned.

Wednesday, October 20, 2004

Human Genome : We're almost finished, part III!

Scientists released today in Nature their analysis of the most recent 'finished' version of the human genome. For those who didn't follow this closely, a 'working draft' of the genome was released in June 2000. As far as the general public was concerned, the sequencing was finished, but that's not quite the case: 150,000 gaps remained to be closed in this draft. The most recent version still has 341 gaps... which is a major improvement. Scientists judge that these holes in the sequence cannot be closed by the brute-force approach used for the rest of the genome; it'll require more research and technology development.

Some interesting facts :

- It seems that we have fewer genes than expected. 20,000-25,000, down from an estimate of ~35,000 4 years ago, and an estimate of 100,000 before the release of the first draft. Why we should have more genes than a rodent is obscure to me; the same functions are there, size doesn't matter. Superiority complex, I guess.

- 1000 new genes arose since our divergence from rodents 75 million years ago, and we lost at least 33 genes, which are still in our genome but are non-functional (pseudogenes).

2800 researchers worldwide were involved in the sequencing. A great example of international collaboration... The human genome sequence achievement was one of the most important factors leading to the hype surrounding bioinformatics. Such a massive amount of data can't be analyzed by other means! New insights will surface from this knowledge...

Read the whole story.

Tuesday, October 19, 2004

Microarrays + Neural Networks + Cancer = Coolness

Stumbled on some fresh news showing the potential of microarrays and bioinformatics. Researchers at the National Cancer Institute used a neural network approach (more on this in a bit) coupled to microarray expression profiling to predict the clinical outcome of a conventional treatment against a certain type of cancer (neuroblastoma).

Artificial neural networks are loosely modeled on the way your neurons work. They are pattern recognition algorithms that can 'learn', given a training sample set with known outcomes, how to predict the outcome of new samples. In this case, they fed the network with microarray expression profiles from neuroblastoma patients who had a 'good' (no signs of cancer rebound after 3 years) or 'bad' (died of the disease) outcome following conventional treatment. The microarrays used quantify more than 25000 genes (and probably ESTs). When the network was sufficiently trained, they fed it some other samples and asked it to predict the clinical outcome of the treatment (which was known by the scientists, but not by the neural network). 88% accuracy was achieved, which is outstanding. After optimization, they reduced the list of critical genes necessary to correctly predict the outcome to 19. Fewer genes to measure = less costly and easier to use in a clinical assay. They are now doing more validation, but this approach could be used in the near future by physicians to give you a more appropriate treatment, should you have this type of cancer and be predicted to be non-responsive to the conventional therapy.
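To give a feel for the "train on known outcomes, then predict unseen samples" idea, here is a toy sketch with a single artificial neuron. This is NOT the network the NCI researchers used (theirs is far bigger and I don't know its details); the 3-gene "expression profiles" and their outcomes are invented, and the function names are my own.

```python
import math


def train_neuron(profiles, outcomes, epochs=500, learning_rate=0.5):
    """Train one sigmoid neuron by gradient descent on labelled profiles.

    profiles: list of expression vectors (one float per gene)
    outcomes: 1 for a 'good' outcome, 0 for a 'bad' one
    """
    weights = [0.0] * len(profiles[0])
    bias = 0.0
    for _ in range(epochs):
        for profile, outcome in zip(profiles, outcomes):
            activation = bias + sum(w * x for w, x in zip(weights, profile))
            predicted = 1.0 / (1.0 + math.exp(-activation))
            error = outcome - predicted
            # Nudge each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, profile)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, profile):
    """Return 1 ('good') or 0 ('bad') for an unseen expression profile."""
    activation = bias + sum(w * x for w, x in zip(weights, profile))
    return 1 if activation > 0 else 0


# Invented training set: gene 0 is high in 'good' patients, gene 1 in 'bad'
# ones, gene 2 is uninformative noise.
training_profiles = [
    [0.90, 0.10, 0.5], [0.80, 0.20, 0.4], [0.85, 0.15, 0.6],
    [0.10, 0.90, 0.5], [0.20, 0.80, 0.6], [0.15, 0.85, 0.4],
]
training_outcomes = [1, 1, 1, 0, 0, 0]

weights, bias = train_neuron(training_profiles, training_outcomes)

# Samples the neuron has never seen during training.
print(predict(weights, bias, [0.95, 0.05, 0.5]))
print(predict(weights, bias, [0.05, 0.95, 0.5]))
```

After training, the weight on the noise gene stays small compared to the two informative ones, which is roughly the intuition behind shrinking a 25000-gene profile down to a short list of genes that actually carry the prediction.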

Read about this cool story on

Monday, October 18, 2004

Nonlinear Dynamics = Profit!

Nonlinear Dynamics announced today its results for the year and it turns out they made a good amount of money (>1 million) this year, which is a record for them. Good news in the bioinformatics industry is always welcome! They're mostly known for their 2D protein gel analysis software (Progenesis) which we use at the lab... it's the best, but it's also VERY expensive (more than 100k CAN). I guess that's the problem with such a limited market, you gotta charge more to turn a profit... kinda like GeneSpring, one of the best microarray data analysis software packages, which costs 3000$ (CAN) / seat / year...

Having to fear massive open source development when you're not well established (i.e. you're not Microsoft) requires good nerves and a truckload of money. But these examples show that bioIT companies can be profitable; you just have to target a very specific field of bioIT and offer an outstanding solution to the problems encountered in this field. That being said, to me the majority of future bioinformatics jobs are in the academic market. Maybe it's because I'm biased in this regard (never worked in industry). Only time will tell.

Sunday, October 17, 2004

Bioinformatics : past and future trends

Came across an excellent article outlining when and how the bioinformatics hype started, current trends, what you should do if you want to get in the field, realistic training/job expectations... these are all questions that get asked VERY often by people interested in the field. You can read this excellent primer here (

Friday, October 15, 2004

Mission accomplished

According to these guys, the virus is able to package a genome big enough for my needs, but it will probably kill viral titers.

Wednesday, October 13, 2004

Had a wonderful idea this weekend to study viral latency... sadly it would require adding 1.2-1.5 kb of stuff to the HIV-1 genome, which I'm not sure is feasible due to capsid space constraints. 2 copies of the genome are packaged in each virion, and space is limited... I'm not quite sure what the maximum capacity is :| I'm not thrilled at the idea of doing lots of complex molecular biology manipulations (which require time... often a lot more than you predicted) to find out the thing just won't package!

I'm looking for ways to increase my exposure somehow. Had a mixed experience with sciforums... a post I made there attracted one person (IdleMind) with whom it was productive to exchange ideas, and lots of Nazi-style comments along the lines of "Put all people with AIDS in camp and let them die".

Tuesday, October 12, 2004

Hacking Biology

I always get this fuzzy feeling while doing molecular biology... especially when doing new DNA constructions. When you get to make a cell (or a bacterium, or a virus) express a protein of your choice, it's special. But when you get to really modify an organism in order to make things work the way you intended, it's really nice. I think it's the best aspect of working in molecular biology related fields; you get to experiment with nature itself, discover things that no one has seen before, and if you're lucky enough, help people live a better life with your findings. Could we ask for more?

Doing bioinformatics is nifty in this context, because you get to draw parallels between biology and informatics. The best part comes when I begin to think of biology in a programming kind of way, or the opposite. "Reprogramming" a cell is akin to hacking Nature's (or God's, or whoever's) work. A cell is a cryptic system we don't know much about. The tools we can use to interact with it are often indirect, because of the microscopic scale of life; think of it as a black box. You input something, you analyze the output, but you don't really get to see what's happening inside; deduction and logic are your friends.

The funny part is that we are made of the system we are analyzing. Imagine if computers became sentient, and got to analyze themselves (now what would they think when one of their kind lost power? That it went to CPU heaven? Philosophical question, too late to answer).

And all this is in the best interest of Mankind. At least, when greed doesn't get in the way (Monsanto anyone? I'm totally FOR GM crops... but these guys should burn in Hell. Seriously.)

Monday, October 11, 2004

Last, if you wanna know the state of the bioinformatics industry in Canada, a (kinda) recent article on does a good job describing our strengths / weaknesses as a country, major obstacles to company creation, etc. A really good read, much more realistic than what the media used to publish during the 'bioinformatics bubble' 2-3 years ago.

Saturday, October 09, 2004

Nobel Peace Prize winner

Here we go again. This year's Nobel Peace Prize winner used her newly acquired icon status to say some crazy things about HIV-1. Apparently, western 'mad scientists' created HIV-1 to wipe out the black people of Africa. Reread that last line. Yeah. Right. HIV-1 was found in a plasma sample taken from a man in 1959 (Zhu, Tuofu, Bette Korber, Andre J. Nahmias, et al. "An African HIV-1 Sequence from 1959 and Implications for the Origin of the Epidemic", Nature, 1998). According to this page :

"Analysis in 1998 of the plasma sample from 1959 was interpreted as suggesting that HIV-1 was introduced into humans around the 1940s or the early 1950s, which was earlier than had previously been suggested. Other scientists have suggested that it could have been even longer, perhaps around 100 years or more ago."

So I guess some mad scientist from the west (according to her) created HIV-1 out of boredom. Before the discovery of DNA's structure (1953), restriction enzymes (1968), DNA sequencing (1975) and PCR (1983). Without these tools it's impossible to 'engineer' a virus. Or maybe it was an evil government with a special agenda against black people living in Africa. Ignore the fact that SIV (simian immunodeficiency virus), EIAV (equine infectious anemia virus) and FIV (feline immunodeficiency virus) existed ages before that. I guess we could have used them as 'templates'? During the Cold War, we would have targeted Africa, not the Russians or whoever else... because I guess we hate black people? Sufficient reason to invest billions of dollars to 'engineer' a deadly virus which we have no cure for. Damn good plan.

Some facts used to 'support' her point of view :

"Some say that AIDS came from the monkeys, and I doubt that because we have been living with monkeys (since) time immemorial, others say it was a curse from God, but I say it cannot be that."
"It's true that there are some people who create agents to wipe out other people. If there were no such people, we could have not have invaded Iraq"
"Us black people are dying more than any other people in this planet"
"Why has there been so much secrecy about AIDS? When you ask where did the virus come from, it raises a lot of flags. That makes me suspicious"
"AIDS (is) not a curse from God to Africans or the black people...It is a tool to control them designed by some evil-minded scientists, but we may not know who particularly did."
"Why is the rest of the world just watching, doing nothing while Africans are being wiped out? The rest of the world has abandoned us"

Sorry, but this kind of person makes me sick. Kinda like when the South African President didn't believe HIV-1 causes AIDS while his country was the most affected by the epidemic, and refused international aid... very useful to your people, sir.

BiomedCentral - Bioinformatics

I talked about BioMed Central before, with their "open-source" style of publishing (which is a very good thing). If you want to familiarize yourself with the kind of work a bioinformaticist is involved in, or just check recent papers published in the field, they also have a whole bioinformatics section. 5 volumes have been published so far, beginning in 2000, all 100% open access. Be sure to check this out!

Thursday, October 07, 2004

C# and my programming project.

Been reading documentation on C# to update my skills from C++... as far as I can see, it's a little better with small improvements here and there, but nothing radical. I have mixed feelings about the 'departure' of pointers (ok, they're still accessible if I really want them): it's harder to make my program take down the whole system this way, but I kinda liked the intellectual challenge of pointer management. Nostalgia I guess :)

Been looking at the NCBI toolkit too, in part because I plan to program an application that could (partially) automate the tedious post-microarray data mining. Inferring gene-gene interaction networks from publications in PubMed (not unlike Bibliosphere) coupled with other sources (Gene Ontology, protein domains, sequence similarity, etc.) is the plan. It's the bioinformatics part of my PhD project and still at the conceptual phase, but it would really save time if I can manage to do it.
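Just to illustrate the PubMed part of the idea, here is a naive sketch that builds a network from co-mentions: two genes get an edge every time they show up in the same abstract. This is only my simplest-possible reading of the concept, not how Bibliosphere works; the "abstracts" are toy sentences I wrote for the example rather than real PubMed records, and the function name is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_network(abstracts, gene_symbols):
    """Count how often each pair of gene symbols shares an abstract.

    Returns a Counter mapping (gene_a, gene_b) pairs (alphabetically
    ordered) to the number of abstracts mentioning both.
    """
    # Whole-word, case-insensitive match so "TP53" does not hit "TP53BP1".
    patterns = {
        gene: re.compile(r"\b" + re.escape(gene) + r"\b", re.IGNORECASE)
        for gene in gene_symbols
    }
    edges = Counter()
    for text in abstracts:
        mentioned = sorted(g for g, p in patterns.items() if p.search(text))
        for pair in combinations(mentioned, 2):
            edges[pair] += 1
    return edges


# Toy sentences standing in for abstracts.
abstracts = [
    "MDM2 binds TP53 and targets it for degradation.",
    "TP53 induces CDKN1A expression after DNA damage.",
    "Loss of TP53 or MDM2 amplification is frequent in tumours.",
]
network = cooccurrence_network(abstracts, ["TP53", "MDM2", "CDKN1A"])
for (gene_a, gene_b), count in network.most_common():
    print(gene_a, gene_b, count)
```

Co-mention says nothing about the kind of interaction (or whether there is one at all), which is exactly why coupling it with the other sources is part of the plan.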

Wednesday, October 06, 2004

Slashdot article

Mildly interesting article on Slashdot (re)covering resistance to HIV-1 in CCR5delta32 homozygotes... some good, informed comments in the thread (some by me). The general public's opinion is also always interesting to see on such topics! See it here.

Tuesday, October 05, 2004

Saturday, October 02, 2004

Got mail from a reader that I would like to share, because it reflects very well the general desensitization to HIV/AIDS in developed countries. People don't see it as a threat anymore, because they don't see anybody dying of it. Because of this, they tend to protect themselves less... etc.

Here it is, with my answer following :

> 2 - Do you think the amount of money poored into AIDS research is in proportion to that of other deadly diseases, considering the fact that fewer people in the US have died of AIDS than die every year of cancer? Even when taking global statistics into consideration, AIDS kills far fewer people than the flu, or malaria. Even diarrhea causes as many deaths as AIDS does, yet it receives no research funding to speak of.

If you look at death rates in Canada / United States, AIDS sure receives a lot of $$ per death, because the epidemic in developed countries is almost negligible compared to Africa / India. In Africa, 25,000,000 people (66% of the world's total, estimated) are infected, with 3 million new infections and 2.2 million deaths last year alone.
In some African countries, almost 50% of the working population (18-65) is dying.

Diarrhea is related to poor water quality, which is not easy to fix on a global scale. Drugs exist for malaria prevention... but there's always the problem of (guess what?) money, and countries being far too poor to afford them. Cancer sure kills lots of people (and gets its fair share of research dollars), but it isn't infectious and affects (mostly) old people who usually have other problems. AIDS can affect anyone; even children. AIDS also has the potential to destabilize whole regions politically, leading to civil wars etc.

See the World Health Organization report for last year for more information

Some more facts derived from this report :

In the worst-affected countries of eastern and southern Africa, if current infection rates continue and there is no large-scale treatment programme, up to 60% of today's 15-year-olds will not reach their 60th birthday.

In seven African countries where HIV prevalence is more than 20%, the average life expectancy of a person born between 1995 and 2000 is now 49 years – 13 years lower than in the absence of AIDS.

In Swaziland, Zambia and Zimbabwe, without antiretroviral programmes, average life expectancy is predicted to drop below 35.

Food for thought... don't be Developed-World-centric ;)