Answers to "Exercise: Protein databases"
The numbers were found using UniProt on Feb 11, 2015 (release 2015_02).
Simple text mining
- How many hits do you find?
- How many of these hits are from Swiss-Prot?
- Can you identify the correct hit (i.e. see which one is actually human insulin and not something else)?
It's P01308 / INS_HUMAN (among the first ten hits).
QUESTION 2: How many hits are now left? How many of these are from Swiss-Prot?
1606 and 872
QUESTION 3: How many hits are now left? How many of these are from Swiss-Prot?
194 and 60
QUESTION 4: How many hits are now left?
- How did you do this?
By adding NOT name:receptor to the query box.
- How many hits are now left?
The contents of UniProt
- How many references are there?
- Why do you think insulin is such a highly investigated protein?
Because it is linked to a common and serious disease (diabetes) and used as a drug.
- Where do you find insulin?
It is secreted from the cell (this is stated just below the section heading). Under GO - Cellular component you can find additional locations, such as the endoplasmic reticulum lumen, but these are temporary stages on the way to secretion.
- Why do you think it is found there?
Because it is a hormone - it has to travel through the bloodstream to influence other cells.
QUESTION 8: How long are the signal peptide and the propeptide, respectively?
24 and 31 amino acids.
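Both lengths follow from the inclusive start and end positions in the feature table. The sketch below shows the arithmetic. It assumes the signal peptide is annotated at positions 1-24 and the propeptide at 57-87 in P01308; these coordinates are not given in the exercise text and should be checked against the entry itself.

```python
# Feature coordinates are 1-based and inclusive, as in UniProt feature tables.
# ASSUMPTION: the positions below are not taken from the exercise text;
# verify them against the P01308 entry before relying on them.
FEATURES = {
    "signal peptide": (1, 24),
    "propeptide": (57, 87),
}


def feature_length(start: int, end: int) -> int:
    """Return the length of a feature with inclusive 1-based coordinates."""
    return end - start + 1


lengths = {name: feature_length(*span) for name, span in FEATURES.items()}

for name, n in lengths.items():
    print(f"{name}: {n} amino acids")
```

The "+ 1" matters because both end positions count as part of the feature.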
QUESTION 9: Which positions are in β-sheet conformation in insulin?
Positions 26-29, 48-50, 74-76, and 98-101.
Other databases linked from Swiss-Prot
No questions asked here.
QUESTION 10: How many proteins did you find, and what was the search string (the text in the search field)?
QUESTION 11: How many proteins do you find now, and what has the search string changed into?
annotation:(type:signal AND evidence:experimental)
QUESTION 12: How many proteins do you find now, and what is the search string?
annotation:(type:signal AND evidence:experimental) AND organism:"Homo sapiens (Human) "
QUESTION 13 a: How many proteins are there in UniProt from Neisseria gonorrhoeae with the default TaxID?
QUESTION 13 b: How many proteins are there in UniProt from Neisseria gonorrhoeae in total (all strains and subspecies)?
34762 (almost six times as many!)
QUESTION 13 c: What does the search string look like now?
taxonomy:"Neisseria gonorrhoeae "
QUESTION 14: How many proteins of maximum length 10 do you find?
length:[1 TO 10]
QUESTION 15: How many proteins are now left?
length:[1 TO 10] AND existence:"evidence at protein level"
QUESTION 16: How many proteins are now left?
length:[1 TO 10] AND existence:"evidence at protein level" AND fragment:no
QUESTION 17: How many human non-fragment proteins of maximum length 10 do you find in UniProt?
length:[1 TO 10] AND existence:"evidence at protein level" AND fragment:no AND organism:"Human "
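The search strings for Questions 14 to 17 grow by one AND clause at each step. The sketch below rebuilds them to show that pattern. The clauses are copied verbatim from the answers above, and the helper name and_query is hypothetical, not part of any UniProt tool.

```python
def and_query(*clauses: str) -> str:
    """Join query clauses with AND, as the answers above chain them."""
    return " AND ".join(clauses)


# One clause is added per question (14, 15, 16, 17).
# The organism clause is copied exactly as it appears in the answer to
# Question 17, including the space before the closing quote.
steps = [
    'length:[1 TO 10]',
    'existence:"evidence at protein level"',
    'fragment:no',
    'organism:"Human "',
]

# queries[i] is the full search string after i + 1 filters have been applied.
queries = [and_query(*steps[:i]) for i in range(1, len(steps) + 1)]

for number, query in zip(range(14, 18), queries):
    print(f"QUESTION {number}: {query}")
```

Each AND clause can only narrow the result set, so the hit count should never increase from one question to the next.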
QUESTION 18: Here they are in FASTA format:
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens PE=1 SV=1
CEGHSHDHGA
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens PE=1 SV=1
CEHSHDGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens PE=1 SV=1
AGEPKLDAGV
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens PE=1 SV=1
TKPR
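As a check on the length filter, the five records for Question 18 can be parsed and measured. This is a minimal sketch using a hand-written parser with the records pasted in as a string; the function name parse_fasta is hypothetical, and a real workflow would more likely read a downloaded file with an established FASTA library.

```python
FASTA = """\
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens PE=1 SV=1
CEGHSHDHGA
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens PE=1 SV=1
CEHSHDGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens PE=1 SV=1
AGEPKLDAGV
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens PE=1 SV=1
TKPR
"""


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[header] += line
    return records


records = parse_fasta(FASTA)

# The accession is the second '|'-separated field of a UniProt header.
lengths = {header.split("|")[1]: len(seq) for header, seq in records.items()}

for accession, n in lengths.items():
    print(accession, n)
```

Every sequence is at most 10 residues long, which agrees with the length:[1 TO 10] filter.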