Answers to "Exercise: Protein databases"
The numbers were obtained from UniProt on Feb 14, 2014.
Simple text mining
- How many hits do you find?
- How many hits are from Swiss-Prot? (tip: Click on "Show only reviewed")
- Can you identify the correct hit (i.e. see which one is actually human insulin and not something else)?
It's P01308 / INS_HUMAN (among the first ten hits).
QUESTION 2: How many hits are now left (still only in Swiss-Prot)?
QUESTION 3: How many hits are now left (still only in Swiss-Prot)?
QUESTION 4: How many hits are now left?
- How did you do this?
By adding NOT name:receptor to the query box.
- How many hits are now left?
The content of Swiss-Prot
- How many references are there?
- Why do you think insulin is such a highly investigated protein?
Because it is linked to a common and serious disease (diabetes) and used as a drug.
- Where do you find insulin?
It is secreted from the cell. This is indicated both in "Subcellular location" under General annotation (Comments) and in "Cellular component" under Ontologies.
- Why do you think it is found there?
Because it is a hormone - it has to travel through the bloodstream to influence other cells.
QUESTION 8: How long is the signal peptide and the propeptide, respectively?
24 and 31 amino acids.
QUESTION 9: Which positions are in β-sheet conformation in insulin?
Positions 26-29, 48-50, 74-76, and 98-101.
Note: As some of you may have noticed, positions 74-76 are within the propeptide and cannot participate in a beta-sheet in mature insulin. Insulin in its biologically active state is a homodimer (where each subunit consists of an A- and a B-chain), and the beta-sheet is formed between the two subunits, with positions 48-50 from each B-chain forming the strands.
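The reasoning in the note can be checked with a small sketch that maps each β-strand range onto the regions of the precursor. The region boundaries below are not stated in these answers; they are assumed from the feature table of the P01308 entry, and the helper names are made up for illustration.

```python
# Region boundaries of human preproinsulin (P01308).
# ASSUMPTION: taken from the UniProt feature table, not from the answers above.
REGIONS = {
    "signal peptide": (1, 24),
    "B chain": (25, 54),
    "propeptide (C-peptide)": (57, 87),
    "A chain": (90, 110),
}

# Beta-strand ranges as given in the answer to Question 9.
STRANDS = [(26, 29), (48, 50), (74, 76), (98, 101)]


def region_length(name):
    """Length of a region, counting both end positions."""
    lo, hi = REGIONS[name]
    return hi - lo + 1


def region_of(start, end):
    """Name of the region that fully contains start..end, or None."""
    for name, (lo, hi) in REGIONS.items():
        if lo <= start and end <= hi:
            return name
    return None


# Map every strand to the region it lies in.
strand_regions = {strand: region_of(*strand) for strand in STRANDS}

for (start, end), name in strand_regions.items():
    print(f"{start}-{end}: {name}")
```

Under these assumed boundaries the region lengths agree with the answer to Question 8, and the strand at 74-76 is the only one that lies in the propeptide.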
Other databases linked from Swiss-Prot
No questions asked here.
QUESTION 10: How many proteins did you find, and what was the search string (the text in the Query box)?
QUESTION 11: How many proteins do you find now, and what has the search string changed into?
annotation:(type:location "secreted" confidence:experimental)
QUESTION 12: How many proteins do you find now, and what is the search string?
annotation:(type:location "secreted" confidence:experimental) AND organism:"Human "
QUESTION 13 a: How many proteins are there in UniProt from Neisseria gonorrhoeae with the default TaxID?
QUESTION 13 b: How many proteins are there in UniProt from Neisseria gonorrhoeae in total (all strains and subspecies)?
32316 (almost 20 times as many!)
QUESTION 13 c: What does the search string look like now?
taxonomy:485 or taxonomy:"Neisseria gonorrhoeae ", depending on where you look.
QUESTION 14: How many proteins of maximum length 10 do you find?
length:[1 TO 10]
QUESTION 15: How many proteins are now left?
length:[1 TO 10] AND existence:"evidence at protein level"
QUESTION 16: How many proteins are now left?
length:[1 TO 10] AND existence:"evidence at protein level" AND fragment:no
QUESTION 17: How many human non-fragment proteins of maximum length 10 do you find in UniProt?
length:[1 TO 10] AND existence:"evidence at protein level" AND fragment:no AND organism:"Human "
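The search strings for Questions 14 to 17 grow by one clause per step, each joined with AND. A minimal sketch of that pattern follows; the clause texts are copied verbatim from the answers above and reflect the query syntax of the UniProt site at the time, while `build_query` is a hypothetical helper, not part of any UniProt tool.

```python
def build_query(clauses):
    """Join filter clauses with AND, as the query box does when filters are stacked."""
    return " AND ".join(clauses)


# Clauses in the order they are added in Questions 14-17 (copied from the answers).
clauses = [
    "length:[1 TO 10]",
    'existence:"evidence at protein level"',
    "fragment:no",
    'organism:"Human "',
]

# One query per question: the first n clauses combined.
steps = [build_query(clauses[:n]) for n in range(1, len(clauses) + 1)]

for number, query in zip(range(14, 18), steps):
    print(f"QUESTION {number}: {query}")
```

Because every clause is combined with AND, each step can only keep or reduce the number of hits, which is why the questions ask how many proteins are "left".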
QUESTION 18: Here they are in FASTA format:
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens PE=1 SV=1
CEGHSHDHGA
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens PE=1 SV=1
CEHSHDGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens PE=1 SV=1
AGEPKLDAGV
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens PE=1 SV=1
TKPR
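As a sanity check, the FASTA records can be parsed to confirm that every sequence is at most 10 residues long. This is a minimal sketch using only the standard library; `parse_fasta` is a hypothetical helper written for this example, and the records are pasted in as a string rather than downloaded.

```python
# The five records from the answer to Question 18, one header and one sequence line each.
FASTA = """\
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens PE=1 SV=1
CEGHSHDHGA
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens PE=1 SV=1
CEHSHDGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens PE=1 SV=1
AGEPKLDAGV
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens PE=1 SV=1
TKPR
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


records = parse_fasta(FASTA)

# Map accession (second field of the "sp|ACC|NAME" identifier) to sequence length.
lengths = {header.split("|")[1]: len(seq) for header, seq in records}

for accession, length in lengths.items():
    print(accession, length)
```

The accession is taken from the second `|`-separated field of the header, which follows the `sp|accession|entry name` layout visible in the records themselves.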