dbTEU: Protein Database for Trace Element Utilization

Help & FAQ

1. Background

Trace elements are needed in minute quantities for proper growth, development, and physiology of all organisms. These dietary minerals include iron (Fe), zinc (Zn), copper (Cu), molybdenum (Mo), tungsten (W), nickel (Ni), cobalt (Co), manganese (Mn), chromium (Cr), vanadium (V), selenium (Se), iodine (I), and possibly other trace elements. Trace elements provide proteins with unique coordination, catalytic, and electron transfer properties. These properties are employed by organisms in key functions in a variety of pathways, resulting in the dependence of organisms on various trace elements.

The majority of trace elements are metals. Fe and Zn are thought to be the most abundant transition metal ions that are used by all organisms. Other metals, such as Cu, Mo, Ni and Co, are utilized by various metalloproteins in a wide range of organisms in all three domains of life. Most metals are directly used as cofactors, whereas Mo and Co are mainly used to form the molybdopterin cofactor (Moco) and cobalamin (vitamin B12), respectively. The use of Se is quite different from that of any other trace element, as it is mainly used in the form of selenocysteine (Sec, the 21st amino acid), which is found in a number of selenoproteins in the three domains of life.

Recently, genome-wide comparative genomic approaches have been used for the analysis of trace element utilization in a variety of genomes. We previously analyzed the utilization of five trace elements, Cu, Mo, Ni, Co and Se, in sequenced organisms at the level of both utilization traits (such as high-affinity transporters and cofactor biosynthesis pathways) and metalloproteomes/selenoproteomes. We are interested in these elements because they are utilized by many organisms in all three domains of life but are characterized by a limited number of user proteins. The analysis of their utilization patterns may provide important information about the function and evolution of trace elements.

Here, we created a protein database (dbTEU) for the utilization of the five trace elements examined. This database is designed to facilitate studies of the roles of these trace elements in individual organisms. Further data will be included for additional genomes and trace elements.

2. Database content

Currently, the database contains ~16,500 protein sequences from more than 700 sequenced organisms among the three domains of life. These proteins include high-affinity transporter systems and/or user proteins (metalloproteins and selenoproteins) for the five trace elements (Click here for a complete list of all known transporters and user proteins for each trace element). Users can browse all trace element-related protein sequences in any single organism. An NCBI link is also available for most proteins for further details. Users can also retrieve data by selecting a specific trace element, organism, protein category (transporter or user) or protein family.

3. Database browse

The "Browse organism" page contains all sequenced organisms among the three domains of life. Each domain is further divided into a variety of phyla or clades. Users can click on an organism name to retrieve all trace element-related proteins detected in the selected organism.

Output format

Trace element: related trace element;

Protein category: transporter or user;

Protein family: protein family name;

NCBI accession number: link to NCBI protein database;

Sequence: link to protein sequence;

Structural information (PDB): BLAST search against PDB database (at NCBI);

Functional association (STRING): link to the STRING database.

Proteins are sorted by "Trace element", "Protein category" and "Protein family".
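
The sort order described above can be pictured as a multi-key sort. The following is a minimal Python sketch, not the database's own code: the field names (`element`, `category`, `family`) and the sample rows are hypothetical and chosen only to mirror the output columns.

```python
# Hypothetical records; the field names mirror the output columns listed above.
records = [
    {"element": "Se", "category": "user", "family": "Glutathione peroxidase"},
    {"element": "Cu", "category": "user", "family": "Plastocyanin"},
    {"element": "Cu", "category": "transporter", "family": "CopA"},
    {"element": "Cu", "category": "user", "family": "Cu-Zn superoxide dismutase"},
]


def sort_records(rows):
    """Sort by trace element, then protein category, then protein family."""
    return sorted(rows, key=lambda r: (r["element"], r["category"], r["family"]))


sorted_rows = sort_records(records)
for row in sorted_rows:
    print(row["element"], row["category"], row["family"], sep="\t")
```

Using a tuple as the sort key means that ties on "Trace element" are broken by "Protein category", and remaining ties by "Protein family".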

4. View protein family

The "View protein family" page contains an interactive view of all known protein families that use these trace elements. Users can click on each group name to see the underlying organization and relationships.

5. Database search

The "Search" page provides users with three options to query the database.

Option 1: There are three steps:

Step 1. Select trace element: select single or multiple trace elements from Cu, Mo, Ni, Co and Se;
Step 2. Select an organism: select one or all organisms;
Step 3. Select protein category: select transporter and/or user protein.

Option 2: There are two steps:

Step 1. Select an organism: select one or all organisms;
Step 2. Select a protein family: select one or all protein families that are related to trace element utilization (Click here for a complete list of all of these protein families).

Option 3: Keyword search:

Users enter the keywords they are interested in. Multiple keywords should be separated by commas. For example, to search for all Cu users in Homo sapiens, the input should be "Homo sapiens, Cu user". A comma acts as a logical "and".
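
To illustrate how a comma-separated query combines terms with "and", here is a small Python sketch. It is an assumption about the behavior, not the site's actual implementation: the record fields, the sample rows, and the case-insensitive substring matching are all hypothetical.

```python
def parse_keywords(query):
    """Split a comma-separated query into lowercase terms, dropping blanks."""
    return [term.strip().lower() for term in query.split(",") if term.strip()]


def matches(record, terms):
    """Return True only if every term occurs in the record's text (logical AND)."""
    text = " ".join(str(value) for value in record.values()).lower()
    return all(term in text for term in terms)


# Hypothetical records for illustration only.
records = [
    {"organism": "Homo sapiens", "category": "Cu user",
     "family": "Cu-Zn superoxide dismutase"},
    {"organism": "Homo sapiens", "category": "Se user",
     "family": "Glutathione peroxidase"},
    {"organism": "Escherichia coli", "category": "Cu user",
     "family": "Multicopper oxidase"},
]

terms = parse_keywords("Homo sapiens, Cu user")
hits = [r for r in records if matches(r, terms)]
for hit in hits:
    print(hit["organism"], hit["category"], hit["family"], sep="\t")
```

Only the first sample record satisfies both terms; the other two each match one term and are therefore excluded.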

The results will be generated according to the fields users choose (Note: The output format is the same as that of "Database browse"). For example, if a user selects "Cu" and "Se" for "Select trace element", "All" for "Select an organism", and "Transporter" and "User protein" for "Select protein category", the query result will be all the Cu transporters, all Cu-dependent proteins and all selenoproteins identified in all sequenced organisms (Note: there is no known Se transporter).
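
The three-step selection of Option 1 can be read as a filter on three fields. The sketch below reproduces the worked example in Python under stated assumptions: the `Protein` class, the `search` function, and the sample entries are hypothetical and do not reflect the database's internal schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    organism: str
    element: str   # one of Cu, Mo, Ni, Co, Se
    category: str  # "transporter" or "user"
    family: str


def search(proteins, elements, organism="All", categories=("transporter", "user")):
    """Keep proteins matching the selected elements, organism and categories."""
    wanted_elements = set(elements)
    wanted_categories = set(categories)
    return [
        p for p in proteins
        if p.element in wanted_elements
        and (organism == "All" or p.organism == organism)
        and p.category in wanted_categories
    ]


# Hypothetical entries for illustration only.
proteins = [
    Protein("Homo sapiens", "Cu", "transporter", "Ctr1"),
    Protein("Homo sapiens", "Cu", "user", "Cu-Zn superoxide dismutase"),
    Protein("Homo sapiens", "Se", "user", "Glutathione peroxidase"),
    Protein("Escherichia coli", "Mo", "user", "Nitrate reductase"),
]

# "Cu" and "Se", all organisms, both categories.
results = search(proteins, elements=["Cu", "Se"])
for p in results:
    print(p.organism, p.element, p.category, p.family, sep="\t")
```

In this sample the result holds Cu transporters, Cu users and Se users, but no Se transporter, simply because no such entry exists in the data.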

6. BLAST search

The "BLAST" page provides users with NCBI BLAST tools to search against the database.

To use BLAST search, a user needs to input a query sequence (a protein sequence for blastp, or a nucleotide sequence for blastx), select a BLAST program (currently only blastp and blastx) and select a dataset (the whole database) to search against. Users can also modify other default settings, such as the E-value, the low complexity filter and the number of alignments.

On the result page, a graphical overview of the alignments is shown at the top. Each homolog of the query protein is shown with its protein description, accession number, related trace element and category, source organism and E-value.
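
As a rough illustration of how the E-value setting affects a hit list, the Python sketch below keeps only hits at or below a cutoff and orders them from most to least significant. The hit records, accession placeholders and the `filter_hits` helper are hypothetical; the sketch does not call BLAST itself.

```python
def filter_hits(hits, max_evalue=1e-5):
    """Keep hits with E-value <= max_evalue, most significant first."""
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: h["evalue"])


# Hypothetical hits; the accessions are placeholders, not real entries.
hits = [
    {"accession": "EXAMPLE_1", "element": "Cu", "category": "user",
     "organism": "Homo sapiens", "evalue": 3e-40},
    {"accession": "EXAMPLE_2", "element": "Cu", "category": "user",
     "organism": "Escherichia coli", "evalue": 0.5},
    {"accession": "EXAMPLE_3", "element": "Cu", "category": "transporter",
     "organism": "Homo sapiens", "evalue": 1e-12},
]

significant = filter_hits(hits, max_evalue=1e-5)
for h in significant:
    print(h["accession"], h["element"], h["category"], h["organism"], h["evalue"], sep="\t")
```

A smaller cutoff returns fewer, more reliable homologs; a larger one admits more distant or spurious matches.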

7. Database statistics

The "Statistics" page shows the general distribution of the number of proteins that belong to different categories:

Co user: Co/B12-dependent proteins;

Cu transporter: Cu transport systems;

Cu user: Cu-dependent proteins;

Mo transporter: Mo transport systems;

Mo user: Mo-dependent proteins;

Ni user: Ni-dependent proteins;

Ni/Co transporter: Ni/Co transport families;

Se user: Selenoproteins.

Users can click on the number that follows each category to retrieve all the sequences (Note: The output format is the same as that of "Database browse"). The "Statistics" page will update automatically if additional sequences are added to the database.
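
The per-category numbers can be thought of as a simple tally over all entries. The Python sketch below shows one way to compute such a tally, assuming hypothetical records with `element` and `category` fields; the shared Ni/Co transporter group is represented here by the element value "Ni/Co", which is an assumption made for illustration.

```python
from collections import Counter


def category_counts(records):
    """Count entries per label such as 'Cu user' or 'Ni/Co transporter'."""
    return Counter(f'{r["element"]} {r["category"]}' for r in records)


# Hypothetical records for illustration only.
records = [
    {"element": "Cu", "category": "user"},
    {"element": "Cu", "category": "user"},
    {"element": "Cu", "category": "transporter"},
    {"element": "Ni/Co", "category": "transporter"},
    {"element": "Se", "category": "user"},
]

counts = category_counts(records)
for label in sorted(counts):
    print(f"{label}: {counts[label]}")
```

Recomputing the tally whenever entries are added keeps the numbers in step with the data, which matches the automatic update described above.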

For a list of protein families in each group, please click here.

8. Database download

Users can download either the whole database or a specific subset of sequences.

9. Some frequently asked questions (FAQs)

How do I retrieve all trace element-related proteins in Escherichia coli or human?

Go to the "Browse organism" page and click on "Escherichia coli" or "Homo sapiens". You will get all proteins that are related to the utilization of the five trace elements.

How do I retrieve all trace element-dependent proteins in Escherichia coli or human?

Go to the "Search" page, select all five trace elements for the "Select trace element" option, select "Escherichia coli" or "Homo sapiens" from the organism list, select "user protein" for "Select protein category", and click on the "Search" button. You will get all trace element-dependent proteins in the selected organism.

How do I submit new data to the dbTEU database?

Currently we do not allow users to directly add new entries into the dataset. If you want to submit new sequences, please contact Vadim Gladyshev or Yan Zhang. In addition, the database will be updated periodically.