Exercise for Bachelor students on using #ENCODE data in a browser

Almost two months ago, I asked on Twitter:

To which I got no answer. I had searched the web for such an exercise beforehand, without success.

The context is that I teach « Bioinformatics for genomics » to 3rd year Bachelor of Biology students, who do not know Unix, and I wanted to replace an old 2-hour hands-on exercise, which characterized a Fugu conserved non-coding element, with a new one. The new exercise should illustrate the wealth of functional genomics data available through a web browser, and how to use it to answer biological questions.

Finding a good example was not easy. As Dan Graur would probably like to point out, most of the genes or genomic regions of some interest that I looked into had so much signal of various types that interpretation was very difficult. So I turned to a data-driven starting point, and considered the genes with the most tissue-specific expression based on mouse and human RNA-seq (which we are analyzing anyway). Several of these had very boring profiles (I need a story that students can follow in 2 hours), but in the end I believe we found a nice example.

Here is the exercise in its present form; we will probably modify it a bit next year. Feel free to reuse it with citation. Feedback to improve it, or ideas for a better example, are welcome.


Amelogenin regulates biomineralization, especially in tooth development. In eutherian mammals, the gene copies on the X and Y chromosomes have diverged, following the transposition of one copy into an intron of the gene encoding Rho GTPase-activating protein 6. In this exercise, we are going to use available genomic and functional genomic data, especially from the ENCODE consortium, to compare the regulation of these three genes: AMELX, the X chromosome copy; AMELY, the Y chromosome copy; and ARHGAP6, the GTPase-activating protein gene.

Go to the UCSC Genome Browser and find the genomic region containing AMELX in the human genome.

Zoom out 3X to see the gene in a broader context.
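For teachers who want to hand out ready-made links, the two steps above can be sketched in a few lines of Python. This is only an illustration, under assumptions that are not in the exercise itself: the helper names `browser_url` and `zoom_out` are hypothetical, the assembly `hg38` is an arbitrary default (use whichever assembly you teach with), the coordinates in the usage note are placeholders rather than the real AMELX position, and the zoom is assumed to widen the window symmetrically around its midpoint.

```python
from urllib.parse import urlencode

# Entry point of the UCSC Genome Browser track display.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(position, db="hg38"):
    """Build a browser link for a gene symbol or a chr:start-end range.

    The default assembly is an assumption made for this sketch.
    """
    return UCSC_HGTRACKS + "?" + urlencode({"db": db, "position": position})


def zoom_out(start, end, factor):
    """Widen the window [start, end] by `factor` around its midpoint.

    Coordinates are 1-based and inclusive; the start never goes below 1.
    """
    width = end - start + 1
    new_width = width * factor
    center = (start + end) // 2
    new_start = max(1, center - new_width // 2)
    new_end = new_start + new_width - 1
    return new_start, new_end
```

For example, `browser_url("AMELX")` gives a link that searches for the gene by name, and `zoom_out(100000, 100999, 3)` turns a placeholder 1 kb window into a 3 kb window around the same midpoint.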

We will first consider histone modification marks. Click on « ENCODE Regulation » under « Regulation » to tune the presentation of the ENCODE tracks. Change all histone marks to « full » view. Back in the main view, right-click on each « Layered H… » track, choose « Configure », and set « Overlay method » to « none ».

1. Which histone marks have peaks in the neighborhood of the AMELX gene?
2. In which cell lines?

Zoom out more, until you see the full ARHGAP6 gene. Compare the histone marks which characterize the AMELX and ARHGAP6 genes.

3. Are AMELX and ARHGAP6 active in the same cell lines?

As you did for the histone marks, display the detailed transcription factor ChIP-seq results.

4. Are AMELX and ARHGAP6 regulated by the same transcription factors?

In a separate browser tab, find the genomic region containing AMELY in the human genome. Use settings similar to those used for the AMELX region.

5. How do the activity and regulation of AMELY compare to those of AMELX?

Zoom out 100X for both regions (AMELX and AMELY); this might be slow.

6. Do you see more transcripts on the X or the Y?
7. Are the histone marks and transcription factor binding density consistent with this difference?
8. How do you interpret these observations in terms of biology of the X and Y chromosomes?
