Mouse Retina SAGE Library
Introduction and Use of the Database
The database on this site comprises primarily the data from Blackshaw et al. 2004 and Blackshaw et al. 2001. The data are organized so that one can find the SAGE tags or in situ hybridization images represented therein. (For more information about SAGE, see http://www.sagenet.org/contact/index.html or http://www.ncbi.nlm.nih.gov/SAGE.)

One can search by entering a gene name, a Unigene number, a SAGE tag sequence, an in situ hybridization probe accession number, or a human or mouse genetic map location. One can also perform batch searches using Unigene or accession numbers, in which case the output will be the sum of the reliable tags for those genes. A search returns the Unigene entries that might match your query, and you then choose the desired Unigene. The chosen Unigene is matched to an in situ hybridization image made from an RNA probe that originated from a particular cDNA clone, which has a specific accession number associated with it. (Many genes do not have associated images.) In addition, the search provides the SAGE tags associated with the matching Unigene.

The SAGE tags are only 14 bp long, so they cannot uniquely identify a genomic location. They are therefore matched to the transcriptome. A match to any location in an expressed sequence gives the "All tags" set of data. However, various algorithms have been devised to make a best-guess match, based on the fact that most SAGE tags should originate from the 3'-most NlaIII site of a polyA+ RNA (http://www.ncbi.nlm.nih.gov/SAGE). "Reliable" tags take this into account. You can choose whether to view all tags or reliable tags using the pull-down menu. You may also see a number in parentheses after a reliable tag. This number indicates that the reliable tag also matches one or more other Unigenes. Clicking the number takes you to the other Unigene(s) that have that tag as a reliable tag.
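The difference between "All tags" and "Reliable" tags can be illustrated with a minimal sketch. It is not the matching algorithm used by NCBI or by this site. It assumes that NlaIII cuts at CATG and that the 14 bp tag consists of CATG plus the 10 bases that follow it. The function names and the example transcript are hypothetical.

```python
NLAIII_SITE = "CATG"  # NlaIII recognition sequence
TAG_LENGTH = 14       # tag length given in the text; assumed here to include CATG


def all_tags(transcript):
    """Return the tag at every NlaIII site, in 5' to 3' order."""
    seq = transcript.upper()
    tags = []
    pos = seq.find(NLAIII_SITE)
    while pos != -1:
        tag = seq[pos:pos + TAG_LENGTH]
        if len(tag) == TAG_LENGTH:  # skip sites too close to the 3' end
            tags.append(tag)
        pos = seq.find(NLAIII_SITE, pos + 1)
    return tags


def reliable_tag(transcript):
    """Return the tag at the 3'-most NlaIII site that yields a full tag, or None."""
    seq = transcript.upper()
    pos = seq.rfind(NLAIII_SITE)
    while pos != -1:
        tag = seq[pos:pos + TAG_LENGTH]
        if len(tag) == TAG_LENGTH:
            return tag
        # Site too close to the 3' end: look for the next site upstream.
        pos = seq.rfind(NLAIII_SITE, 0, pos + len(NLAIII_SITE) - 1)
    return None


# Made-up transcript with two NlaIII sites and a short polyA tail.
example = "GG" + "CATG" + "AAACCCGGGTTTA" + "CATG" + "TTTTGGGGCC" + "AAAAAA"
print(all_tags(example))
print(reliable_tag(example))
```

In this sketch, a tag found at the upstream site would appear only in the "All tags" view, while the tag at the 3'-most site is the one treated as reliable.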
Dynamic Nature of SAGE Tag and Gene Relationships
It is important to note that the relationships among Unigenes, probes (i.e., accession numbers), and SAGE tags are dynamic. NCBI changes the assignment of accession numbers to Unigenes and also changes the assignment of SAGE tags to Unigenes. Thus, in one Unigene build, a SAGE tag that matches Unigene X may point to an in situ image for a probe assigned to Unigene X; in a later build, the same SAGE tag may be assigned to Unigene Y, while the accession number of the probe may remain with Unigene X or even switch to Unigene Z. We have built this database to reflect the changes over time in such assignments.

If you are using the Tables and Figures from the PLoS paper, take note of whether the data you are using are based upon LocusLink accession numbers, probe accession numbers, SAGE tags, or Unigene IDs. The accession numbers of the probes do not change, and thus the relationship of a probe to an image will not change. The SAGE tags themselves will also never change, only their relationships to Unigenes and perhaps images. Finally, the probe accession numbers are mapped onto LocusLink accession numbers (i.e., NM numbers) using a fixed, BLAST-verified set of mappings current as of 3/31/04, and thus the relationship of probes to gene identities does not change. A summary of the data mapping relationships is shown below:

[Figure: summary of the data mapping relationships]

The small number of probes that do not have corresponding LocusLink IDs are mapped directly onto their Unigene IDs. If a search in a given category does not produce a result, try searching for the gene and images using another field.

SAGE Data for Comparison
The SAGE tag data include data from SAGE libraries from other sources for purposes of comparison. All tag counts are normalized to 100,000 tags per library for ease of comparison. The retinal SAGE libraries, as well as the hypothalamus, contain from 50,000 to 60,000 tags. Other libraries vary in number of tags.
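The normalization to 100,000 can be illustrated with a short sketch. It assumes that each raw tag count is scaled by 100,000 divided by the total number of tags in its library. The function name and the example tags and counts are made up for illustration and are not taken from the database.

```python
def normalize_library(tag_counts, scale=100_000):
    """Scale raw tag counts so that the library total equals `scale`."""
    total = sum(tag_counts.values())
    if total == 0:
        return {tag: 0.0 for tag in tag_counts}
    return {tag: count * scale / total for tag, count in tag_counts.items()}


# Made-up library of 50,000 tags: a tag seen 25 times becomes 50 per 100,000.
raw = {"CATGTTTTGGGGCC": 25, "CATGAAACCCGGGT": 49_975}
print(normalize_library(raw)["CATGTTTTGGGGCC"])
```

Under this assumption, a count from a library of 50,000 to 60,000 tags is roughly doubled, which is why counts from libraries of different sizes can be compared directly.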
Libraries derived from mouse cerebellum were generated by Dr. Gregory Riggins's lab and are available at the NCBI as GSM767, GSM787, and GSM788. The 3T3 fibroblast library was generated by Dr. Victor Velculescu and is available at http://www.sagenet.org. The cerebral cortex libraries were produced by Dr. Seong-Seng Tan's lab (Gunnersen et al. 2002; reference below).

Clusters
The retinal SAGE data were clustered using several different clustering algorithms. The algorithm whose output best matched the gene relationships known from classical studies of retinal gene expression patterns was the new Poisson-based clustering algorithm of Cai et al. (2004). The clusters computed by this method are shown in Supplemental Table 3. By clicking on a Unigene in this Table, one can access the images and other information for that gene provided on this site.

Clusters of related in situ hybridization gene expression patterns were also created. These were based on the location of the in situ hybridization signal within the retinal layers over time during development. Supplemental Table 6 contains the full list of expression patterns generated by visual inspection followed by subjective evaluation of the similarities of patterns ("user annotated clusters"). Supplemental Table 7 has the full list of cellular expression clusters generated by clustering software that used as input the evaluation of signal intensity in the various retinal layers ("machine generated clusters").

A fourth clustering method is shown on the page itself and uses a nearest neighbor clustering algorithm. (This method, and therefore these clusters, are not included in the PLoS paper.) The nearest neighbor clustering algorithm computes a similarity score between the selected gene and all others in the database, then returns the n most similar genes. The similarity score is the Euclidean distance between two 20-dimensional vectors corresponding to the sums of the reliable tags by library for each gene, so a smaller score indicates greater similarity.
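The nearest neighbor lookup can be sketched as follows. This is an illustration, not the code running on the site. It assumes that normalization is applied to each gene's vector across libraries and uses the population standard deviation. The function names and the toy 3-library profiles (standing in for the 20-dimensional vectors) are hypothetical.

```python
import math


def zscore(vector):
    """Rescale a vector to mean 0 and standard deviation 1 (population SD)."""
    n = len(vector)
    mean = sum(vector) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in vector) / n)
    if sd == 0:
        return [0.0] * n  # flat profile: no variation to rescale
    return [(x - mean) / sd for x in vector]


def nearest_genes(query, profiles, n=5, normalized=False):
    """Return the n genes closest to `query` as (gene, distance) pairs.

    `profiles` maps each gene to its per-library sums of reliable tags.
    """
    prepare = zscore if normalized else list
    q = prepare(profiles[query])
    scored = []
    for gene, vector in profiles.items():
        if gene == query:
            continue
        scored.append((math.dist(q, prepare(vector)), gene))
    scored.sort()  # smallest distance first
    return [(gene, distance) for distance, gene in scored[:n]]


# Toy profiles with 3 libraries instead of 20.
profiles = {
    "geneA": [1, 2, 3],
    "geneB": [2, 4, 6],   # same shape as geneA at twice the level
    "geneC": [1, 2, 4],
    "geneD": [10, 0, 0],
}
print(nearest_genes("geneA", profiles, n=2))
print(nearest_genes("geneA", profiles, n=1, normalized=True))
```

With raw sums, geneC is closest to geneA because its absolute counts are similar. After normalization, geneB becomes the closest match because it has the same expression pattern at a different overall level.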
In Raw Data mode, the actual sums are used; in Normalized Data mode, the sums are first normalized (mean = 0, standard deviation = 1). This clustering method allows the user to choose the number of most closely related genes to examine. Input this number using the pull-down menu. This method was set up by Griffin Weber (weber@fas.harvard.edu).

References
Blackshaw S, Harpavat S, Trimarchi J, Cai L, Huang H, Kuo W, Lee K, Fraioli R, Cho S, Yung R, Asch E, Wong W, Ohno-Machado L, Weber G, Cepko CL. Genomic analysis of mouse retinal development. PLoS Biology 2(9) (2004).
Blackshaw S, Fraioli RE, Furukawa T, Cepko CL. Comprehensive analysis of photoreceptor gene expression and the identification of candidate retinal disease genes. Cell 107:579-589 (2001).
Cai L, Huang H, Blackshaw S, Liu JS, Cepko C, Wong W. Clustering analysis of SAGE data: a Poisson approach. Genome Biology, in press (2004).
Gunnersen JM, Augustine C, Spirkoska V, Kim M, Brown M, Tan SS. Global analysis of gene expression patterns in developing mouse neocortex using serial analysis of gene expression. Mol Cell Neurosci 19(4):560-573 (2002).

Please address all questions about the web site to Griffin Weber at weber@fas.harvard.edu.