Detecting remote homologs with BLAST and PSI-BLAST

Bioinformatics & Molecular Modelling

For all questions, illustrate your answers fully: describe what you did at every step and include the output you obtained.
Embed all relevant output from online servers in your submitted work, as appropriate, together with a written commentary that fully illustrates the work you carried out. For all parts, describe how you obtained your data by stating the bioinformatics portal used and the search strategy. Please quote the accession numbers of all database files used.

Q1

Iron exists in the body as Fe2+, ferrous iron, or Fe3+, ferric iron. Ferrous iron is reactive and easily oxidised to ferric iron, and so there are a number of proteins that catalyse the oxidation of Fe2+ to Fe3+. These have ferroxidase activity.
Your task is to identify the proteins in the human genome that are known to have ferroxidase activity.

a) Use the NCBI, EBI or Ensembl portals to retrieve one file for each of the several different ferroxidases found in humans. Each file should contain the complete mRNA sequence (it is not necessary to include the sequences in your answer).

b) Compile a table, similar to the one from tutorial 1, comparing the sequence elements of the mRNA of each different ferroxidase that you find. Include an extra column indicating the length of the protein. Comment on your findings.

c) Retrieve files for two genes encoding different, but homologous, ferroxidase proteins and compare the structure of the genes (it is not necessary to include the sequences in your answer). Comment on your findings.

For all parts, describe how you obtained your data by stating the bioinformatics portal used and the search strategy. Accession numbers of all sequence files must be given. Any references used should be cited in your answer. Expect to retrieve fewer than 10 files. Pay attention to units in the table.

30 marks

Q2

The tumour suppressor protein p53 functions by inducing growth arrest or apoptosis. Here we aim to learn about the relationship between the human protein sequence (UniProt code P04637) and the p53 sequences of a number of other species.

(i) First, identify the domains present within the human p53 sequence using one of the domain databases discussed in lecture 2. State the range of amino acids within each domain.

(ii) Using UniProt, locate the full-length human p53 sequence. Then run a BLAST search with this sequence against the Swiss-Prot protein database and identify p53 sequences from 7 other species that are closely similar to the human sequence (make sure they are full-length sequences).

(iii) Give the accession number for each protein sequence identified, together with the species. Give the percentage identity of each of the 7 sequences to the human sequence. State the E-value and the length of each sequence.

(iv) Run a multiple sequence alignment of all 8 sequences using the program Clustal Omega. Submit your sequence alignment. How many positions in the multiple sequence alignment are fully conserved across all species?

(v) Display both the cladogram and phylogram trees for the aligned sequences and submit with the assessment.

Briefly discuss the evolutionary relationship between the 8 species as indicated by the phylogram and cladogram. Which species is most closely related to humans?
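
The counts asked for in parts (iii) and (iv) can be checked by hand with a short script. The following is a minimal Python sketch, not part of the assignment: the function names and the toy sequences are invented for illustration, and the sequences are not real p53 data. It assumes the aligned sequences have already been copied out of the alignment as equal-length strings with "-" for gaps. Note that tools differ in how they define percentage identity (BLAST, for example, reports identities over the full alignment length including gapped columns), so values computed here may differ slightly from server output.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over columns where neither sequence has a gap.

    This is one common definition; it is an assumption of this sketch and
    is not necessarily the definition used by BLAST or Clustal Omega.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def fully_conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence has the same residue.

    Columns containing a gap in any sequence are not counted as conserved.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have equal length")
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        if "-" not in column and len(set(column)) == 1:
            conserved.append(index)
    return conserved


# Toy aligned sequences, invented for illustration only (NOT real p53 data).
toy_alignment = [
    "MEEPQ-SDPSV",
    "MEESQ-SDISL",
    "MEEPQASD-SV",
]

print(len(fully_conserved_columns(toy_alignment)), "fully conserved columns")
print(percent_identity(toy_alignment[0], toy_alignment[1]), "% identity")
```

In a Clustal Omega alignment, fully conserved columns are marked with "*" in the consensus line, so the count from a script like this can be cross-checked against the number of asterisks.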

35 marks

Q3

Detecting remote homologs with BLAST and PSI-BLAST.
The NCBI website (https://www.ncbi.nlm.nih.gov) gives the option to run both BLAST and PSI-BLAST for a query protein sequence. For this question you need to use the NCBI website to run both BLAST and PSI-BLAST.
The enzyme adenosine deaminase (UniProt accession number P00813) and the enzyme imidazolonepropionase (UniProt accession number P42084) perform a similar function and are remote homologs, both belonging to the SCOP superfamily metallo-dependent hydrolase. The two sequences have a percentage identity of only 18%.
Perform a protein-protein BLAST search with the adenosine deaminase sequence (UniProt accession number P00813) against the UniProtKB/Swiss-Prot database. Search the results for the imidazolonepropionase enzyme (UniProt accession number P42084). Now repeat your search using PSI-BLAST and compare your results with those obtained from protein-protein BLAST.
Discuss what you observe from the BLAST and PSI-BLAST searches. Discuss which of the two search methods proved more effective and why. Include output as appropriate to illustrate your answer. Include the pairwise alignment of the 18% identical sequences P00813 and P42084 obtained from your output.
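
As background for the comparison, PSI-BLAST differs from protein-protein BLAST in that it builds a position-specific scoring matrix (PSSM) from the hits of each round and uses that profile as the query in the next round. The following is a heavily simplified Python sketch of that idea only, not NCBI's implementation: it assumes a uniform background frequency and a flat pseudocount, whereas PSI-BLAST uses sequence weighting, substitution-matrix-derived pseudocounts and observed background frequencies. The function names and the toy aligned segments are invented for illustration and are not taken from P00813 or P42084.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned segments.

    Assumes a uniform background (1/20 per residue), which is a
    simplification made for this sketch.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned_seqs):
        # Count only standard residues; gaps and other symbols are ignored.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_sequence(pssm, seq):
    """Sum the position-specific scores of an ungapped segment.

    The segment must have the same length as the profile and contain
    only the 20 standard amino acids.
    """
    if len(seq) != len(pssm):
        raise ValueError("segment length must match profile length")
    return sum(column[res] for column, res in zip(pssm, seq))


# Toy aligned hits, invented for illustration only (NOT real enzyme data).
toy_hits = ["HVHLD", "HIHLD", "HAHFD", "HVHID"]
pssm = build_pssm(toy_hits)

# A segment that keeps the conserved positions scores higher than one that does not.
print(score_sequence(pssm, "HLHVD"))
print(score_sequence(pssm, "AAAAA"))
```

The point of the sketch is that positions conserved across the collected hits receive high scores for the conserved residue, while variable positions are scored more tolerantly; this is why a profile can be more sensitive to remote homologs than a single query sequence scored with one general substitution matrix.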
