Genomics: Analysis and Algorithms - Exercise 5

DNA vs. PROTEIN

This is the sequence of an evolutionarily conserved yeast gene whose plant homolog you are looking for.

>Scsec19 
GGGATTGTAGATGTAGTTTCAACACGTCGGCTGATTTATCCCGATTTTGTTAGTAGAAAAGGTTCTACTT 
CATTCTTGCTTGAGACGTCGTCCCATCAAATTTCTAACATAGTCTTTTTTCAAGGAAGGATATTTTTCAA 
AGCAGGACTGCAATTAGTCTTTTCCTTTTCTTTACTCCCCTTCCATCATAACTGTTAGTGAATAACCACT 
TATATAGCATAACACAATGGATCAAGAAACAATAGACACTGACTACGACGTGATTGTCTTAGGTACCGGT 
ATTACCGAATGTATCTTATCTGGTTTACTCTCTGTAGATGGAAAAAAGGTATTACATATTGACAAGCAAG 
ACCATTATGGTGGCGAAGCTGCTTCTGTGACCTTATCTCAATTGTATGAAAAATTTAAACAAAATCCGAT 
CAGTAAAGAGGAACGGGAGTCCAAGTTTGGTAAAGATAGAGATTGGAATGTCGACTTAATTCCTAAATTC 
CTGATGGCCAATGGTGAGCTGACAAATATTTTAATACATACCGATGTGACCAGATATGTCGATTTCAAGC 
AAGTTTCTGGCTCCTACGTTTTTAAGCAAGGCAAAATTTACAAAGTGCCAGCTAATGAAATAGAAGCCAT 
TTCATCGCCATTGATGGGTATTTTTGAAAAACGTAGAATGAAGAAATTTTTAGAATGGATTAGCTCTTAC 
AAAGAAGATGACTTGTCCACTCATCAAGGATTAGACTTAGACAAGAATACCATGGATGAAGTGTATTATA 
AATTTGGGTTAGGCAATTCTACCAAAGAATTCATCGGTCATGCAATGGCTTTATGGACCAATGATGACTA 
CTTACAACAACCTGCTAGGCCATCGTTTGAGAGGATTTTGTTATATTGCCAAAGTGTTGCCCGTTACGGT 
AAATCACCTTATTTGTATCCTATGTATGGGTTAGGCGAACTTCCACAAGGATTTGCTCGTTTGTCGGCTA 
TTTACGGTGGTACTTACATGCTAGACACTCCAATTGATGAAGTATTGTATAAAAAAGACACAGGAAAATT 
TGAAGGGGTCAAGACTAAGCTGGGAACTTTCAAGGCCCCATTGGTTATTGCTGATCCAACTTATTTTCCC 
GAAAAATGTAAATCTACTGGTCAAAGAGTTATTAGAGCCATCTGTATTCTTAACCATCCAGTTCCGAACA 
CCAGTAACGCGGATTCTTTACAAATTATTATCCCACAAAGCCAACTGGGAAGGAAAAGCGATATATACGT 
TGCGATTGTTTCAGATGCGCATAACGTTTGCTCCAAGGGTCACTATTTAGCAATTATTTCTACAATCATT 
GAAACTGATAAACCACATATAGAATTAGAGCCTGCTTTCAAACTTCTGGGACCAATCGAAGAAAAATTCA 
TGGGAATTGCCGAATTATTTGAACCAAGAGAAGACGGCTCTAAGGATAACATTTACTTATCCAGATCATA 
CGACGCATCCTCTCATTTCGAATCCATGACTGACGATGTTAAAGATATTTACTTCAGAGTAACAGGCCAC 
CCATTAGTTCTAAAACAAAGACAAGAACAAGAAAAGCAGTAAATTCATACCTTTACGACTAAAGCAGCAA 
TTGGAGGGTAAACTTATTTTTTCC
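
Over a distance as large as yeast to plants, protein sequences are usually better conserved than the underlying DNA, so it can help to extract the open reading frame and search with its translation. The following is a minimal sketch, not a prescribed solution: it assumes the standard genetic code, an ORF defined as the stretch from the first ATG to the next in-frame stop codon, and an intron-free gene. The function names and the short test sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; codons with unknown letters become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def longest_orf(dna):
    """Return (strand, frame, protein) of the longest ATG...stop ORF
    found in any of the six reading frames."""
    dna = dna.upper()
    best = ("+", 0, "")
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # Drop the last piece: it is not terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best[2]):
                    best = (strand, frame, segment[start:])
    return best


# Hypothetical toy input: an ATG...TAG ORF starting in frame 2.
print(longest_orf("CCATGGATCAAGAAACATAGGC"))
```

The resulting protein string can then be pasted into BLASTP or TBLASTN against plant sequences and the hits compared with those from a plain nucleotide search.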

Matrices

Compare the following two sequences using BLASTP or FASTA. Test the effect of different substitution matrices (BLOSUM62 vs. PAM30 vs. PAM70) and of different gap penalties (11,1 vs. 6,2).

Sequences:

 >gi_13397640 unknown protein, Brassica napus
 MSSAPSPGTGSPPSPPSNSTTTTPPPASAPPPTTPSSPPPPSTIPTSPPPSSRSTPSAPPPSPPTPSTPG
 SPPPLPQPSPPAPTTPGSPPAPVTPPTRNPPPSVPGPPSNPSREGGSPRPPSSPSPPSPSSDGLSTGVVV
 GIAIGGVALLVIVTLICLLCKKKRRRDEEDAYYVPPPPPPGPKAGGPYGGQQQQWRQQNATPPSDHVVTS
 LPPPPKAPSPPRQPPPPPPPPFMSSSGGSDYSDRPVLPPPSPGLVLGFSKSTFTYEELARATNGFSEANL
 LGQGGFGYVHKGVLPSGKEVAVKQLKVGSGQGEREFQAEVEIISRVHHRHLVSLVGYCIAGAKRLLVYEF
 VPNNNLELHLHGEGRPTMEWSTRLKIALGSAKGLSYLHEDCNPKIIHRDIKASNILIDFKFEAKVADFGL
 AKIASDTNTHVSTRVMGTFGYLAPEYAASGKLTEKSDVFSFGVVLLELITGRRPVDANNVYVDDSLVDWA
 RPLLNRASEQGDFEGLADAKMNNGYDREEMARMVACAAACVRHSARRRPRMSQIVRALEGNVSLSDLNEG
 MRPGQSNVYSSYGGSTDYDSSQYNEDMKKFRKMALGTQEYNATGEYSNPTSDYGLYPSGSSSEGQTTREM
 EMGKIKRTGQGYSGPSL
 >gi_1345852_sp_P41242|MATK_MOUSE Megakaryocyte-associated tyrosine-protein kinase
 MARRSSRVSWLAFEGWESRDLPRVSPRLFGAWHPAPAAARMPTRWAPGTQCMTKCENSRPKPGELAFRKG
 DMVTILEACEDKSWYRAKHHGSGQEGLLAAAALRHGEALSTDPKLSLMPWFHGKISGQEAIQQLQPPEDG
 LFLVRESARHPGDYVLCVSFGRDVIHYRVLHRDGHLTIDEAVCFCNLMDMVEHYTKDKGAICTKLVKPRR
 KQGAKSAEEELAKAGWLLDLQHLTLGAQIGEGEFGAVLQGEYLGQKVAVKNIKCDVTAQAFLDETAVMTK
 LQHRNLVRLLGVILHHGLYIVMEHVSKGNLVNFLRTRGRALVSTSQLLQFALHVAEGMEYLESKKLVHRD
 LAARNILVSEDLVAKVSDFGLAKAERKGLDSSRLPVKWTAPEALKNGRFSSKSDVWSFGVLLWEVFSYGR
 APYPKMSLKEVSEAVEKGYRMEPPDGCPGSVHTLMGSCWEAEPARRPPFRKIVEKLGRELRSVGVSAPAG
 GQEAEGSAPTRSQDP
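
To get a feeling for the two gap penalty settings before running the alignments, it can help to tabulate what a gap of a given length costs under each of them. The sketch below assumes that each pair is (existence, extension) in the BLAST convention, where a gap of length k costs existence + extension * k. The function name gap_cost is made up for illustration.

```python
def gap_cost(existence, extension, length):
    """Affine gap cost, assuming the BLAST convention:
    existence + extension * length."""
    return existence + extension * length


# Compare the two settings from the exercise for a few gap lengths.
for k in (1, 3, 5, 10):
    print(k, gap_cost(11, 1, k), gap_cost(6, 2, k))
```

Under this assumption the two settings break even at a gap length of 5: shorter gaps are cheaper with 6,2, longer gaps are cheaper with 11,1. This is worth keeping in mind when comparing how the alignments bridge the long proline-rich N-terminal region of the Brassica sequence.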

Training in biological thinking

You have just received the DNA sequence of the coding part of a bacterial gene from a sequencing service, with a note that the sequencing did not go very well and that errors and frameshifts can be expected in the sequence.

>rc-m2-30
TGCCCTGCGCCGCGCTATTCGACGCCATCATGGACTGCCTGAAGGAGCATGGCGAGGTGCGCACCATTCG
CGTGGCTGCGGCGGACGTGAACGGGGTGGCAACGGGTAAGCGCATACCCGCACGTTTCGCAAGCAAGGTT
TTTTCCGAGGGAACACGGTAACCGTTCTCGGTGATGAACCTCGACATCTGGGGCGAGGACATCGAGGAAA
GTCCGCTGGTTTTTGAAACCGGCCTCTGCGATGGCCTGTTGCGCGCGACCGAGAGGCCCTTCATGCCGAT
GCCCTGGCTCGACCCACCGACGGCGCTACTGCCGATCTGGATGTATCACATGGATGGCCGCCCCTATTCG
GCCCTCCACGGCAGGCGCTGGCGGCGGTCAAGGACCGCTACACCGTAAAGGGCCTGACGGGCGTGGTGGC
GACGGAACTTGAAGCTGCTGTGATCGACGACAGCGGCACGATTCTGCGCGTGCCGCCCTCGCCCCGTTCC
GGCAAGCGCCGCACCGGGCCCGAAATCCTGTCGCTGCGTTCGCTTGACGCCTTTGACGGCTTCTTCACCG
CGCTTTACGCGGCCTGCGAGGTGATGGACATTCCGGCAGATATGGCGATTTCCGAAACCGCCTCGGGGCA
GTTCGAGATCAACCTGATGCAGTAGGCCGATCCGCGGAAGTCCGCCGATGACACCTGGCTGTTCAAGATG
CTGGTCAAGGGTCTGGCGCGGCAGCACGGCTATGCCGCCTCGCCCATGGCGAAACCCAATGATCTGTGGT
CGGGCAACGGGATGCGCGGGCATTTCTCGACCCTCGATCAGAACGGCGAAAACATCTTCAACCTGGGCAC
CGAAAAGGGCTCGGATGCGTTGCTGTCCGCGGTGGCGGGCTATCTGGCGGCGCTGCCGGGACCGACGCTG
ATCTTTGCGGTGGTTCAGAACAGCTACACCCAGCAGGTGCCCAATGCCCGTGTGTCTACGCGAATTGTCT
GGGCCTATGAGAACCGCGCGGGGTTTTTGCGGATCCCGTCTTCGGGGCACGCGGCGCGGCGGATCGAGCA
GTCGGGTGGCGTGGGGCGACGTGAACCCCTATCTGATAATCGCCGCTAGCCTTGGTGCGGCGCTGGTCGG
GCTCGTAGACAAAATGGTCCCCGACGAGCCGATCGTCGACAACGCTGATGCGAAATATCTGCCGCACCCG
CCCGCAACGTGGAAACTCGAGATAACCCTGTTCGACAGCTGCCCGCTGATCAAGCGCATCTTTGTAGAAG
AGCTGATCGAGAACTCCCTGATGACCAAGCGTTAGGAGATCCACTACATGGCGGCGCTGTCCGAAGAGTA
GCAGACCGAGCTTTACCTCGCCATCGTCGCCCTGCGTGATCGCGTACCGAC
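
One possible way to localize a frameshift is to list the long stop-free stretches in each reading frame: if the open stretch ends in one frame and continues in another, the error probably lies where the two overlap. The sketch below only scans the three forward frames and only treats TAA, TAG and TGA as stops. The function name, the min_codons threshold and the toy sequence are assumptions made for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}


def open_stretches(dna, min_codons=30):
    """Return (frame, start, end) for every stop-free stretch of at least
    min_codons codons in the three forward frames, sorted by start.
    Coordinates are 0-based and end is exclusive."""
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] in STOPS:
                if (pos - start) // 3 >= min_codons:
                    hits.append((frame, start, pos))
                start = pos + 3  # next stretch begins after the stop codon
            pos += 3
        # Stretch running to the end of the sequence (partial gene).
        if (pos - start) // 3 >= min_codons:
            hits.append((frame, start, pos))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical toy input: coding in frame 0, then one inserted base
# at position 12 shifts the coding region into frame 1.
print(open_stretches("ATGCTGATTAACTCTAACTGACCAT", min_codons=4))
# prints [(0, 0, 18), (1, 7, 25)]
```

In the toy example the open stretch moves from frame 0 to frame 1, and the overlap (positions 7 to 18) contains the inserted base. A stop codon in the middle of an otherwise open frame would instead point to a substitution error. For the real sequence, a BLASTX search gives similar information, because hits to the same protein reported in different frames indicate a frameshift between them.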
Time-stamp: <2023-11-08 14:20:46 (hpaces)>