Welcome to PIGED!

About this site

This site provides a web front end for identifying conserved sequences in the upstream non-coding regions of annotated prokaryotic genome sequences available from the National Center for Biotechnology Information (NCBI). The most recent update includes the completely sequenced prokaryotic genomes available as of November 2004. To see our list of included genomes, click here.

To return to this page at any time, follow the PIGED Help link in the left frame.

About the data

Upstream intergenic regions were mined from the genome sequences using a Perl program called IGSpy2.3 and stored in a relational database, which can be accessed in several ways (with links in the left frame):

1) SynSearch – Upstream intergenic sequences can be obtained for any gene of interest by keying in the desired gene synonym in FASTA header format (including the leading “>”). Gene synonyms are listed in the genome sequence .ptt files at NCBI.

2) COGSearch – Gene synonyms can be obtained for orthologous genes as determined at the Clusters of Orthologous Groups (COG) web site at NCBI (here). Intergenic sequences can then be obtained by copying and pasting the COGSearch results into SynSearch.

3) SeqSearch – A query sequence is searched against the database, and the synonym and full sequence of each matching intergenic (IG) region are returned in FASTA format.

4) Microarray Data – Provides a link to prokaryotic microarray experimental data at the NCBI Gene Expression Omnibus (GEO). Data can be clustered and analyzed at GEO, resulting in groups of synonyms for accessing sequences of interest on this web site.

IGSpy2.3 itself, and statistics of intergenic sequence retrieval using this program for each prokaryotic genome, are available using the links in the left frame.
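To make the idea of an "upstream intergenic region" concrete, the following is a minimal Python sketch of how such regions could be cut from a genome given gene coordinates of the kind listed in a .ptt file. It is not IGSpy2.3 (which is a Perl program) and may differ from it in important details; the names `Gene`, `upstream_regions`, and `to_fasta` are hypothetical, and the sketch assumes a linear sequence with 1-based inclusive coordinates and ignores circular genomes.

```python
from typing import Dict, Iterable, NamedTuple

class Gene(NamedTuple):
    # Hypothetical record; fields mirror what a .ptt file provides.
    synonym: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: str   # "+" or "-"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def upstream_regions(genome: str, genes: Iterable[Gene]) -> Dict[str, str]:
    """Map each gene synonym to the sequence between the gene and its
    nearest neighbour on the upstream side (assumption: linear genome).
    Genes that abut or overlap their upstream neighbour are skipped."""
    ordered = sorted(genes, key=lambda g: g.start)
    regions = {}
    for i, gene in enumerate(ordered):
        if gene.strand == "+":
            # Upstream lies to the left: from the end of any earlier gene.
            left = max((g.end for g in ordered[:i]), default=0)
            region = genome[left:gene.start - 1]
        else:
            # Upstream lies to the right; report it on the gene's own strand.
            right = min((g.start for g in ordered[i + 1:]),
                        default=len(genome) + 1)
            region = revcomp(genome[gene.end:right - 1])
        if region:
            regions[gene.synonym] = region
    return regions

def to_fasta(regions: Dict[str, str]) -> str:
    """Format regions as FASTA, one '>synonym' header per sequence."""
    return "\n".join(f">{syn}\n{seq}" for syn, seq in regions.items())
```

The FASTA header written by `to_fasta` has the same “>synonym” shape that SynSearch expects as input.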

About data analysis

The research motivation behind the creation of this site stems from an interest in the evolution of gene regulation. A position-dependent weight matrix scoring approach is available, using previously available open source programs (Studholme, D.J. and Pau, R.N. (2003) BMC Microbiology 3:24-33).

Following sequence retrieval, a ClustalW sequence alignment can be performed on the results of any sequence query by clicking "Send Results to ClustalW" at the bottom of the page. A “consensus” scan can then be performed by clicking the link at the top of the ClustalW alignment page. Here, a gap-free region of the alignment (<100 characters in length) is chosen, using the ruler at the top of the page, and converted to a position-dependent weight matrix by entering its start and stop positions. The user then chooses what to scan: either an organism’s complete genome sequence or only its intergenic sequences, as parsed from that genome using its .ptt file. Lastly, a cut-off value is entered to determine the range of output. Although the cut-off value is empirically determined and depends on both the weight matrix and the genome sequence chosen, the authors have found values from 70 to 90 to be useful over numerous searches.
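The general technique described above (build a weight matrix from a gap-free alignment window, then scan a sequence and keep matches above a cut-off) can be sketched in Python as follows. This is an illustration only, not the programs used by this site: the log-odds weighting, the pseudocount, the uniform background, and the rescaling of scores to a 0-100 range are all assumptions made here so that a cut-off such as 70-90 has a plain meaning, and the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def build_matrix(aligned, start, stop, pseudocount=1.0):
    """Build a log-odds weight matrix from alignment columns start..stop
    (1-based, inclusive). Assumes a uniform 0.25 background frequency."""
    columns = [seq[start - 1:stop].upper() for seq in aligned]
    if any("-" in col for col in columns):
        raise ValueError("the chosen alignment region must be gap-free")
    total = len(columns) + pseudocount * len(ALPHABET)
    matrix = []
    for i in range(stop - start + 1):
        counts = Counter(col[i] for col in columns)
        matrix.append({base: math.log2((counts[base] + pseudocount) / total / 0.25)
                       for base in ALPHABET})
    return matrix

def percent_score(matrix, window):
    """Score a window, rescaled so the worst possible match is 0 and the
    best is 100 (an assumed scale). Unknown bases get the column minimum."""
    raw = sum(col.get(base, min(col.values())) for col, base in zip(matrix, window))
    lowest = sum(min(col.values()) for col in matrix)
    highest = sum(max(col.values()) for col in matrix)
    if highest == lowest:
        return 100.0
    return 100.0 * (raw - lowest) / (highest - lowest)

def scan(matrix, sequence, cutoff):
    """Scan both strands; return (score, strand, position, match) tuples
    with score >= cutoff. Position is the 1-based forward-strand
    coordinate of the leftmost base of the match."""
    width = len(matrix)
    sequence = sequence.upper()
    hits = []
    for strand, seq in (("+", sequence), ("-", revcomp(sequence))):
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            score = percent_score(matrix, window)
            if score >= cutoff:
                pos = i + 1 if strand == "+" else len(seq) - i - width + 1
                hits.append((round(score, 1), strand, pos, window))
    return sorted(hits, reverse=True)
```

For example, `scan(build_matrix(["TTGACA", "TTGACA", "TTGACT"], 1, 6), "CCTTGACAGG", 90)` reports a single match on the forward strand. Raising the cut-off narrows the output to closer matches; lowering it admits weaker ones, which is why a suitable value depends on the matrix and the sequence being scanned.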

About the output

Scans of intergenic regions produce output listing, for each match, the score, strand, genome sequence position, matching sequence, proximate gene synonym, and proximate gene description. Scans of total genome sequences return output listing the score, strand, genome sequence position, and matching sequence.
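For readers who want to post-process scan results, the two output layouts can be modelled with one record type, sketched below in Python. The class name `Hit`, the tab-delimited layout, and the rule used to pick the "proximate" gene (the gene whose start or end coordinate is closest to the match position) are assumptions for illustration; the site's own output format and definition of proximity may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass
class Hit:
    # Fields present for every scan.
    score: float
    strand: str
    position: int
    match: str
    # Fields present only for intergenic scans.
    synonym: Optional[str] = None
    description: Optional[str] = None

    def as_row(self) -> str:
        """Render the hit as one tab-delimited line (assumed layout)."""
        fields = [f"{self.score:.1f}", self.strand, str(self.position), self.match]
        if self.synonym is not None:
            fields += [self.synonym, self.description or ""]
        return "\t".join(fields)

# A gene is given as (synonym, description, start, end); hypothetical layout.
GeneInfo = Tuple[str, str, int, int]

def annotate(hit: Hit, genes: Sequence[GeneInfo]) -> Hit:
    """Return a copy of the hit labelled with the closest gene
    (assumed meaning of 'proximate'). Returns the hit unchanged
    if no genes are supplied."""
    if not genes:
        return hit
    synonym, description, _, _ = min(
        genes,
        key=lambda g: min(abs(g[2] - hit.position), abs(g[3] - hit.position)),
    )
    return Hit(hit.score, hit.strand, hit.position, hit.match, synonym, description)
```

A whole-genome hit renders four columns, and the same hit passed through `annotate` renders six, matching the two field lists given above.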