INTRODUCTION TO DATABASES
By:
• PUNEET
• NEERAJ
• KARTIK
• VARUN
INDEX/CONTENTS
• Introduction
• Data & Information
• Database
• Biological Databases
• Types of Databases
  - Primary Databases
  - Secondary Databases
  - Composite Databases
• References
INTRODUCTION
DATA & INFORMATION
DATA
Data are raw, unorganized facts that need to be processed.
Example: Each student's test score is one piece of data.
INFORMATION
When data are processed, organized, structured, or presented in a given context so as to make them useful, the result is called information.
Example: The average score of a class, or of the entire school, is information that can be derived from the given data.
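The test-score example above can be sketched in a few lines of Python. The student names and scores are invented for illustration; only the idea (individual scores are data, the derived average is information) comes from the slide.

```python
from statistics import mean

# Raw data: individual test scores (hypothetical values for illustration).
scores = {"Asha": 72, "Ben": 85, "Chen": 90, "Divya": 65}

# Information: the same data summarized in a context that makes it useful.
class_average = mean(scores.values())
top_student = max(scores, key=scores.get)

print(f"Class average: {class_average}")
print(f"Highest score: {top_student}")
```

Each entry in `scores` says little on its own; the average and the top scorer are what a teacher would actually act on.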
COMPARISON BETWEEN DATA & INFORMATION

Definition (Oxford Dictionaries)
  Data: Facts and statistics collected together for reference or analysis.
  Information: Facts provided or learned about something or someone; data as processed, stored, or transmitted by a computer.
Refers to
  Data: Raw data.
  Information: Analyzed data.
Description
  Data: Qualitative or quantitative variables that can be used to make ideas or conclusions.
  Information: A group of data which carries news and meaning.
In the form of
  Data: Numbers, letters, or a set of characters.
  Information: Ideas and inferences.
Collected via
  Data: Measurements, experiments, etc.
  Information: Linking data and making inferences.
Represented in
  Data: A structure, such as tabular data, a data tree, a data graph, etc.
  Information: Language, ideas, and thoughts based on the data.
Interrelation
  Data: Information that is collected.
  Information: Data that has been processed.
TYPES OF DATA & INFORMATION
Databases are categorized based on the data type. A few examples are listed below:

1. Sequence of biomolecules, viz. DNA, RNA, proteins
   Examples: GenBank, EMBL, DDBJ, Swiss-Prot, PIR
   Weblinks:
   (i) www.ncbi.nlm.nih.gov/genbank/
   (ii) https://www.ebi.ac.uk/embl/
   (iii) www.ddbj.nig.ac.jp/
   (iv) http://web.expasy.org/docs/swiss-prot_guideline.html
   (v) http://pir.georgetown.edu/
2. Bio-molecular structures
   Example: PDB
   Weblink: http://www.rcsb.org/pdb/home/home.do
3. Bibliography/scientific literature **
   Examples: PubMed, Scopus (search engine)
   Weblinks:
   (i) www.ncbi.nlm.nih.gov/pubmed
   (ii) www.scopus.com
4. Patent databases
   Example: USPTO
   Weblink: www.uspto.gov/
5. Metabolic pathways / molecular interactions
   Example: KEGG
   Weblink: http://www.genome.jp/kegg/pathway.htm
DATABASE???
A database is a collection of data organized so that it can be accessed in various ways.
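As a minimal sketch of "organized data, accessible in various ways", the snippet below uses Python's built-in sqlite3 module with an in-memory database. The table name, columns, and rows are assumptions made up for illustration; they are not from the slides.

```python
import sqlite3

# An in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, subject TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("Asha", "Biology", 72),
        ("Asha", "Chemistry", 80),
        ("Ben", "Biology", 88),
        ("Ben", "Chemistry", 70),
    ],
)

# Access path 1: look up every record for one student.
asha_rows = conn.execute(
    "SELECT subject, score FROM scores WHERE student = ? ORDER BY subject",
    ("Asha",),
).fetchall()

# Access path 2: summarize the same data per subject.
subject_averages = dict(
    conn.execute("SELECT subject, AVG(score) FROM scores GROUP BY subject")
)

print(asha_rows)
print(subject_averages)
```

The same stored rows answer two different questions without being reorganized, which is the practical benefit of keeping data in a structured store.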
WHAT ARE BIOLOGICAL DATABASES???
Biological databases serve a critical purpose in the collation and organization of data related to biological systems.
They provide computational support and a user-friendly interface that let a researcher analyse biological data meaningfully.
TYPES OF DATABASES
• Primary Databases
• Secondary Databases
PRIMARY DATABASES
• Contain bio-molecular data in its original form.
• Experimental results are submitted directly into the database by researchers, and the data are essentially archival in nature.
• Once given a database accession number, the data in primary databases are never changed.
• Examples: GenBank, EMBL, and DDBJ for DNA/RNA sequences; SWISS-PROT and PIR for protein sequences; and PDB for molecular structures.
GenBank
A database from NCBI that includes sequences from publicly available resources.
http://www.ncbi.nlm.nih.gov/genbank/
EMBL
• European Molecular Biology Laboratory
• Nucleic acid database from EBI (European Bioinformatics Institute)
• Produced in collaboration with DDBJ and GenBank
• Search engine: SRS (Sequence Retrieval System)
http://www.ebi.ac.uk/
DDBJ
• DNA Data Bank of Japan
• Started in 1986 in collaboration with GenBank
• Produced and maintained at NIG (National Institute of Genetics)
http://www.ddbj.nig.ac.jp/
SWISS-PROT
• Annotated sequence database established in 1986
• Consists of sequence entries made up of different line formats
• Similar format to EMBL
• http://us.expasy.org/sprot/sprot-top.html
• http://www.ebi.ac.uk/uniprot/
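To illustrate what "different line formats" means, here is a simplified sketch of reading one flat-file entry. In the Swiss-Prot flat-file format each line begins with a two-character line code (for example ID, AC, DE, SQ) and an entry ends with a `//` line. The entry text, accession, and the function name `parse_entry` are made up for illustration, and real entries contain many more line types than this sketch handles.

```python
# A made-up entry written in the style of the Swiss-Prot flat-file format.
ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;          12 AA.
AC   X00000;
DE   Hypothetical example protein.
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;
     MKTAYIAKQRQI
//
"""


def parse_entry(text):
    """Group an entry's lines by their two-character line code."""
    fields = {}
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # terminator line ends the entry
            break
        code, content = line[:2], line[5:].strip()
        if in_sequence:
            # Lines after SQ carry the sequence itself and have a blank code.
            sequence_parts.append(content.replace(" ", ""))
            continue
        fields.setdefault(code, []).append(content)
        if code == "SQ":
            in_sequence = True
    fields["sequence"] = "".join(sequence_parts)
    return fields


entry = parse_entry(ENTRY)
print(entry["AC"], entry["sequence"])
```

Because every line announces its own type, a parser can pick out accession numbers, descriptions, or the sequence without understanding the rest of the entry.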
PIR
• Protein Information Resource
• A division of the National Biomedical Research Foundation (NBRF) in the U.S.
• One can search for entries or run a sequence similarity search at the PIR site.
http://pir.georgetown.edu/
TrEMBL
• Translated EMBL (European Molecular Biology Laboratory)
• Computer-annotated supplement of SWISS-PROT.
• Contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.
http://www.ebi.ac.uk/trembl/
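The "translation" step mentioned above, turning a nucleotide coding sequence into a protein sequence, can be sketched with the standard genetic code. This is only an illustration of the idea, not TrEMBL's actual pipeline; the function name `translate` and the example sequences are assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    dna = coding_sequence.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATAAGGG"))
```

Here `ATG GCC AAA` translates to Met-Ala-Lys and the following `TAA` stop codon ends translation, so the trailing `GGG` is ignored.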
COMPOSITE DATABASES
• Collection of various primary database sequences
• Renders sequence searching highly efficient as it searches multiple resources
• Examples: NRDB (Non-Redundant Database), OWL, MIPSX, SWISS-PROT + TrEMBL
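A rough sketch of how a composite, non-redundant collection might be assembled: records from several primary sources are merged, and identical sequences are kept once while remembering where each was found. The source names, accessions, and sequences below are hypothetical, and real composite databases apply their own, more elaborate redundancy criteria.

```python
def merge_non_redundant(*sources):
    """Merge (source_name, records) pairs, keeping each distinct sequence once."""
    merged = {}
    for source_name, records in sources:
        for accession, sequence in records:
            key = sequence.upper()  # treat case differences as the same sequence
            entry = merged.setdefault(key, {"sequence": key, "found_in": []})
            entry["found_in"].append(f"{source_name}:{accession}")
    return list(merged.values())


# Hypothetical records from two primary sources.
source_a = ("SourceA", [("A001", "MKTAYIAK"), ("A002", "GDSGGPLV")])
source_b = ("SourceB", [("B100", "mktayiak"), ("B101", "CQGDSGGP")])

merged = merge_non_redundant(source_a, source_b)
for entry in merged:
    print(entry["sequence"], entry["found_in"])
```

Four input records collapse to three entries, so a single search over the merged set covers both sources without scoring the duplicated sequence twice.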
SECONDARY DATABASES
• Contain data derived from the results of analysing primary data
• Manually created or automatically generated
• Contain more relevant and useful information structured to specific requirements
• Examples: PROSITE, PRINTS, BLOCKS, Pfam
SECONDARY DATABASES

Secondary database | Primary source     | Information stored
PROSITE            | SWISS-PROT         | Regular expressions
BLOCKS             | PROSITE/PRINTS     | Aligned motifs (blocks)
PRINTS             | OWL (composite DB) | Aligned motifs
Pfam               | SWISS-PROT         | Hidden Markov models
Profile            | SWISS-PROT         | Weighted matrices (profiles)
PROSITE
Families of proteins
Can search using regular expressions
Similar to Unix commands using wildcards, etc.
E.g., [AC]-x-V-x(4)-{ED}
Interpreted as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
Families exhibit these patterns
So we can search over families
http://ca.expasy.org/prosite/
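The pattern on this slide can be turned into an ordinary regular expression and run against a sequence. The converter below is a minimal sketch covering only the common pattern elements (`x`, `[...]`, `{...}`, repeat counts, and the `<`/`>` end anchors); the function name `prosite_to_regex` and the test sequences are made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {ED} means "any residue except E or D".
        element = element.replace("{", "[^").replace("}", "]")
        # x(4) means "repeat four times"; (2,4) style ranges map the same way.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x is the wildcard; residues are uppercase letters.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- and C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
print(regex)

match = re.search(regex, "MKAGVLLLLKR")
print(match.group(), match.span())
```

The slide's pattern becomes `[AC].V.{4}[^ED]`, which matches `AGVLLLLK` in the first sequence; replacing the final `K` with `E` would make the pattern fail, as the `{ED}` element requires.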
BLOCKS
• Motifs/blocks are created by automatically detecting the most conserved regions of each protein family.
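As a toy illustration of "detecting the most conserved regions", the sketch below scores each column of an ungapped alignment by the fraction of sequences sharing its most common residue, then picks the highest-scoring window. This simple identity score is an assumption chosen for clarity and is not the method the BLOCKS database itself used; the alignment is invented.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue, per column."""
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def most_conserved_block(alignment, width):
    """Return the start index and aligned segments of the best-scoring window."""
    scores = column_conservation(alignment)
    best_start = max(
        range(len(scores) - width + 1),
        key=lambda start: sum(scores[start:start + width]),
    )
    return best_start, [seq[best_start:best_start + width] for seq in alignment]


# Hypothetical ungapped alignment of three family members.
alignment = [
    "MKTGDSGGPLV",
    "ARSGDSGGPAI",
    "LQAGDSGGPVL",
]

start, block = most_conserved_block(alignment, width=6)
print(start, block)
```

The six fully identical columns form the block, while the variable flanking residues are left out.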
PRINTS
• Most protein families are characterized not by one, but by several conserved motifs
• Fingerprints are groups of conserved motifs excised from sequence alignments
• Taken together, they provide diagnostic family signatures. They are the basis of the PRINTS database, and are stored in the form of aligned motifs.
• Information about protein families is entered manually
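The idea that several motifs taken together identify a family can be sketched as follows: a sequence is accepted only if every motif of the fingerprint is found, in order. Representing motifs as regular expressions is a simplification assumed here for brevity (PRINTS stores aligned motifs, not regular expressions), and the motifs and sequences are invented.

```python
import re


def match_fingerprint(sequence, motifs):
    """Return (motif, start) hits if all motifs occur in order, else None."""
    hits = []
    position = 0
    for motif in motifs:
        found = re.compile(motif).search(sequence, position)
        if found is None:
            return None  # one missing motif means no fingerprint match
        hits.append((motif, found.start()))
        position = found.end()
    return hits


# Hypothetical three-motif fingerprint.
fingerprint = ["G.SGGP", "C[QK]GD", "W"]

print(match_fingerprint("AAGDSGGPLLCQGDTTW", fingerprint))
print(match_fingerprint("AAGDSGGPLLTTW", fingerprint))
```

The second sequence contains the first and last motifs but not the middle one, so it is rejected; a single-motif search would have accepted it.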
Pfam
Maintained by the Sanger Centre (Cambridge)
Protein families aligned using hidden Markov models (HMMs)
Given a new sequence, find the families into which it might fit
Sequence coverage
11912 families
Split into Pfam-A (high quality) and Pfam-B (low quality)
http://pfam.sanger.ac.uk/
PRIMARY VS SECONDARY DATABASES
REFERENCES
• Class notes
• Jin Xiong, Essential Bioinformatics
• file:///C:/Users/student/Downloads/DATABASES%20IN%20BIOINFORMATICS.pdf
• https://www.ebi.ac.uk/training/online/course/bioinformatics-terrified/what-database/relational-databases/primary-and-secondary-databases
• http://www.diffen.com/difference/Data_vs_Information
• Google images

Primary and secondary databases ppt by puneet kulyana
