Protein Structure and Function
Proteins are composed of 20 amino acids (each specified within mRNA by a triplet of nucleotides called a codon), which are linked by peptide bonds to form polypeptide chains. The three-dimensional structure of a protein is determined by its amino acid sequence (the primary protein structure). Local interactions between parts of a polypeptide chain cause it to fold into regular substructures (e.g., α-helix, β-pleated sheet, turns, and loops), known as the secondary protein structure. A water-soluble protein may then fold into a compact three-dimensional structure with a nonpolar core (the tertiary protein structure). Further, proteins with multiple polypeptide chains may display a specific orientation and arrangement of subunits (the quaternary protein structure). Several functional groups (e.g., alcohols, carboxamides, carboxylic acids, thioethers, thiols, and various basic groups) may attach to a protein and affect its folding and function. Proteins may also interact with each other or with other macromolecules to create complex assemblies with additional functions (e.g., DNA replication, cell signal transmission).
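The codon-to-amino-acid relationship mentioned above can be illustrated with a minimal Python sketch. The codon table below is deliberately partial (only the standard-code codons needed for the example), and the function name `translate` and the test sequence are hypothetical choices made for illustration.

```python
# Partial standard genetic code: only the codons used in this illustration.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "AAA": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def translate(mrna):
    """Read an mRNA string in triplets from the first AUG to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        # Codons missing from the partial table are reported as "Xaa" (unknown).
        residue = CODON_TABLE.get(mrna[i:i + 3], "Xaa")
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide

print("-".join(translate("GGAUGUUUGGCGCUAAAUAAGG")))
```

Each triplet maps to one residue, so the order of codons in the mRNA fixes the primary structure of the polypeptide.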
Directly related to their three-dimensional shapes, proteins perform many different functions in the body, including enzymatic catalysis, transport, storage, mechanical support (cytoskeleton or connective tissues), immune protection, movement (as hinges, springs, or levers), transmission of nerve impulses, and control of cell growth and differentiation.
Purifying one or a few proteins from a complex mixture (e.g., cells, tissues, or whole organisms) is essential for characterizing the function, structure, post-translational modification, and interactions of the protein(s) of interest. Typically, protein purification requires disruption of cells (when the proteins of interest, such as integral membrane proteins, are not secreted into solution), removal of other substances, further separation of proteins, and final concentration. These steps should ideally be carried out at low temperature (4°C) to reduce the protein denaturation that takes place after proteins are released from cells.
For disruption of the cells containing the protein, several procedures may be considered: (i) repeated freezing and thawing, (ii) sonication, (iii) homogenization by high pressure (e.g., French press), (iv) homogenization by grinding (e.g., bead mill), and (v) permeabilization by detergents [e.g., Triton X-100, CHAPS, or sodium dodecyl sulfate (SDS); the latter is somewhat destructive to cell membrane proteins] and/or enzymes (e.g., lysozyme). It is important to carry out cell disruption quickly and to keep the cell lysate cool to slow down or prevent protein digestion by endogenous proteases. Further, one or more protease inhibitors may be included in the lysis buffer prior to cell disruption. Addition of DNase may also help reduce the viscosity that the cell lysate acquires from its high DNA content.
For removal of other substances, centrifugation helps separate proteins and other soluble compounds (which remain in the supernatant) from cell debris (in the pellet). Use of a density gradient (e.g., sucrose, glycerol, or Percoll, which is a silica-based density gradient medium) during centrifugation offers another option for separation of proteins from other substances.
For further separation of proteins, various procedures that exploit differences in protein size, physicochemical properties, binding affinity, and biological activity can be utilized. These include ammonium sulfate [(NH4)2SO4] precipitation and subsequent dialysis to remove ammonium sulfate (this technique is extremely helpful in reducing the overall preparation volume), chromatography (e.g., size exclusion, ion exchange, affinity based on lectin, antibody, His-tag/Strep-tag), high-performance liquid chromatography (HPLC)/reversed-phase chromatography/hydrophobic interaction chromatography (based on polarity/hydrophobicity), sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE, based on size and solubility), and immunoprecipitation.
Finally, purified proteins may be concentrated by lyophilization (commonly performed after HPLC), ultrafiltration (using selectively permeable membranes), precipitation (e.g., isoelectric point precipitation; miscible solvents such as ethanol or methanol; polymers such as dextrans and polyethylene glycols), and flocculation (polyelectrolytes such as alginate, carboxymethylcellulose, polyacrylic acid, tannic acid, and polyphosphates; polyvalent metallic ions such as Ca2+, Mg2+, Mn2+, or Fe2+).
The quality (integrity) and quantity of purified proteins are determined using several techniques to ensure their suitability for subsequent studies on their function, structure, post-translational modification, and interactions.
The quantity of purified proteins is often evaluated by the Bradford total protein assay and by spectrophotometry (absorbance at 280 nm, A280; note that residual imidazole may also absorb at 280 nm and cause inaccurate readings of protein concentration).
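As a rough illustration of how an A280 reading is converted to a concentration, the sketch below applies the Beer-Lambert law (A = ε · c · l). It assumes a commonly used sequence-based estimate of the molar extinction coefficient at 280 nm (5500 per tryptophan, 1490 per tyrosine, and 125 per disulfide bond, in M^-1 cm^-1); the function names and the example numbers are hypothetical and are not taken from the text.

```python
def molar_extinction_280(sequence, n_disulfides=0):
    """Estimate epsilon at 280 nm (M^-1 cm^-1) from a one-letter amino acid sequence.

    Assumes the widely used approximation based on Trp, Tyr, and cystine content.
    """
    return (5500 * sequence.count("W")
            + 1490 * sequence.count("Y")
            + 125 * n_disulfides)

def concentration_mg_per_ml(a280, epsilon, mw_da, path_cm=1.0):
    """Beer-Lambert law: c (mol/L) = A / (epsilon * l); g/L is numerically mg/mL."""
    molar = a280 / (epsilon * path_cm)
    return molar * mw_da

# Hypothetical protein: epsilon = 50,000 M^-1 cm^-1, molecular weight = 25,000 Da.
print(concentration_mg_per_ml(a280=0.5, epsilon=50000, mw_da=25000))
```

Because contaminants such as imidazole also absorb at 280 nm, a value obtained this way is only as reliable as the blank used for the measurement.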
The size (molecular weight) and integrity of purified proteins are assessed by SDS-PAGE with Coomassie blue or silver staining, and by immunoprecipitation and western blot if specific antibodies are available. Moreover, the molecular weight and purity of purified proteins may be ascertained by HPLC, ESI-MS, and MALDI-TOF-MS.
Protein sequencing is a technique for determining the amino acid sequence of a protein (or peptide), which helps uncover the protein identity and its post-translational modifications. A number of protocols are available, including N-terminal sequencing (by reacting the peptide with a reagent such as 1-fluoro-2,4-dinitrobenzene, dansyl chloride, or phenylisothiocyanate that selectively labels the terminal amino acid, hydrolyzing the protein, and determining the amino acid by thin-layer chromatography or high-pressure liquid chromatography and comparison with standards; this offers a more accurate approach for N-terminus analysis than the Edman degradation), C-terminal sequencing (by adding carboxypeptidases to a solution of the protein, taking samples at regular intervals, and determining the terminal amino acid by analyzing a plot of amino acid concentrations against time; this helps verify the primary structures of proteins predicted from DNA sequences and detect any post-translational processing of gene products from known codon sequences), Edman degradation (by breaking any disulfide bridges in the protein with 2-mercaptoethanol and preventing the bonds from re-forming with iodoacetic acid, separating and purifying the individual chains of the protein complex if there is more than one, determining the amino acid composition and terminal amino acids of each chain, breaking each chain into fragments under 50 amino acids long with trypsin, pepsin, or cyanogen bromide, separating and purifying the fragments, determining the sequence of each fragment, repeating with a different pattern of cleavage, and constructing the sequence of the overall protein; this allows discovery of the ordered amino acid composition of a protein up to approximately 50 amino acids long), blocked N-terminal sequencing with MALDI-ISD, peptide mapping, peptide mass fingerprinting, de novo sequencing, amino acid composition analysis, extinction coefficient determination, and differential scanning calorimetry (DSC).
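The fragmentation step with trypsin, and the peptide masses that peptide mass fingerprinting compares against a database, can be sketched in Python as follows. This is a simplified model under stated assumptions: trypsin is taken to cleave after lysine (K) or arginine (R) unless the next residue is proline (P), missed cleavages and modifications are ignored, and masses are monoisotopic. The function names and the test sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (N-terminal H, C-terminal OH)

def tryptic_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

for pep in tryptic_digest("MGKAARPLKW"):
    print(pep, round(peptide_mass(pep), 4))
```

In a fingerprinting experiment, the list of measured peptide masses would be matched against lists computed in this way for each candidate protein; repeating the digest with a different cleavage rule gives overlapping fragments for assembling the full sequence.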
Polyclonal and Monoclonal Antibodies
Antibody (Ab, also known as immunoglobulin or Ig) is a Y-shaped glycoprotein generated mainly by plasma cells (which differentiate from naive B lymphocytes) as part of the host humoral immune response against invading microbial pathogens. Made up of two large heavy chains and two small light chains, the Y-shaped antibody contains binding sites on its tips (variable domains in the antigen-binding fragment or Fab region, consisting of a light chain and part of a heavy chain) with specificity for an epitope on a microbial antigen, with the goal of inhibiting microbial invasion and survival and/or activating macrophages to destroy the invading microbe. The base (crystallizable fragment or Fc region, consisting of heavy chain only) of the Y-shaped antibody determines its isotype (A, D, E, G, or M).
Secreted by different B-cell lineages, polyclonal antibodies (pAb) represent a collection of immunoglobulins that specifically recognize different epitopes of an antigen. Laboratory production of polyclonal antibodies involves (i) antigen preparation, (ii) adjuvant selection (e.g., Freund’s, alum, Ribi adjuvant system, and TiterMax), (iii) animal selection (e.g., mouse, rabbit, or goat), (iv) injection, and (v) blood serum extraction.
By contrast, monoclonal antibodies (mAb) come from a single B-cell lineage and recognize a single epitope of an antigen. In comparison with pAb, which binds to multiple epitopes, mAb demonstrates monovalent affinity and binds to the same epitope of an antigen. Laboratory production of mAb, which was first described by Georges Köhler and César Milstein in 1975, involves (i) immunization of mice with antigen, (ii) fusion of mouse spleen cells (B cells) with myeloma cells, (iii) selection of hybridomas secreting antibodies with the desired specificity, and (iv) bulk production of mAb in culture or in mice (ascites fluid). Recent advances in mAb production include phage display, single B-cell culture, single-cell amplification from various B-cell populations, and single plasma cell interrogation [31-33].
Protein Synthesis and Expression
Proteins may be synthesized in vitro through a cell-free system or expressed in prokaryotic and eukaryotic cell-based systems.
For in vitro protein synthesis, cell extracts containing RNA polymerase, ribosomes, tRNA, and ribonucleotides are utilized. However, due to its low expression levels and high cost, this system has limited practical value.
For protein expression in cell-based systems, the gene encoding the protein of interest is cloned into a plasmid or other vector (e.g., bacteriophage lambda, baculovirus, retrovirus, adenovirus, and artificial chromosome), transformed into a prokaryotic or eukaryotic host [bacteria (e.g., Escherichia coli strain BL21, Pseudomonas fluorescens), yeast (e.g., Saccharomyces cerevisiae, Pichia pastoris), fungi (Aspergillus, Trichoderma, Myceliophthora thermophila C1), insect cells (e.g., Sf9, Sf21 from Spodoptera frugiperda, Hi-5 from Trichoplusia ni, and Schneider 2 and Schneider 3 cells from Drosophila melanogaster), and mammalian cells (Chinese hamster ovary cell, mouse myeloma lymphoblastoid NS0 cell, HeLa, human embryonic kidney HEK 293 cell, human embryonic retinal Crucell’s PER.C6 cell, human amniocyte glycotope, and CEVEC cells)], and subsequently induced to produce the protein.
Inside the cells, DNA is transcribed into mRNA, which undergoes post-transcriptional modifications (including the addition of a 5' cap and a 3' poly(A) tail to the 5' and 3' ends of the pre-mRNA, respectively, as well as the removal of introns via RNA splicing) before translation into protein.
After translation, the protein (polypeptide) goes through post-translational modifications (>200 types known to date) to become biologically active. These include (i) cleavage (hydrolysis of peptide bonds by proteases leading to a shortened protein with altered function), (ii) addition of chemical groups (through methylation, acetylation, and phosphorylation), (iii) addition of complex molecules (through glycosylation, including N-linked glycosylation and O-linked glycosylation), and (iv) formation of intramolecular bonds (e.g., disulfide bond/bridge between two cysteine amino acids in the oxidizing environment of the endoplasmic reticulum).
Techniques for examining protein modifications include glycosylation analysis (N-glycan profiling, O-glycan profiling, N-glycan site occupation analysis, O-glycan site occupation analysis, glycopeptide analysis, sialic acid analysis), phosphorylation analysis, deamidation and oxidation analysis, analysis of disulfide bridges and free sulfhydryl groups, N-acetylation analysis, methylation analysis, ubiquitination analysis, sumoylation analysis, lipidation analysis, S-nitrosylation analysis, N-myristoylation analysis, S-palmitoylation analysis, and S-prenylation analysis.
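Before an experimental N-glycan site occupation analysis, candidate N-linked glycosylation sites are often listed from the sequence alone. The sketch below assumes the commonly cited consensus sequon N-X-S/T (where X is any residue except proline), which is not stated in the text above; a match only marks a potential site, not an occupied one. The function name and test sequence are hypothetical.

```python
import re

# Asparagine followed by any residue except proline, then serine or threonine.
# The lookahead lets overlapping candidate sites be reported as well.
SEQUON = re.compile(r"N(?=[^P][ST])")

def candidate_n_glyc_sites(sequence):
    """Return 1-based positions of asparagines in an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence)]

print(candidate_n_glyc_sites("MNGTANPSNKSNN"))
```

Whether a listed asparagine actually carries a glycan has to be confirmed experimentally, for example by the glycopeptide analysis mentioned above.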
Protein interactions with other proteins, DNA, RNA, and other molecules occur constantly within and between cells and underpin the proper running of various biological processes. In the past, studies on biological processes were limited to single molecules and their individual interactions due to the lack of suitable technologies. Recent advances in molecular biology have enabled simultaneous analyses of multiple molecules and their interactions, yielded valuable insights into complex cellular mechanisms in both health and disease, and enabled the discovery of putative protein targets for therapeutic purposes.
Protein-protein interactions (PPI) are physical and biochemical events between two or more protein molecules, which are modulated by the electrostatic forces, hydrogen bonding, and hydrophobic effect of individual proteins. Current approaches for analyzing PPI are grouped into three broad categories: in vitro [tandem affinity purification-mass spectrometry (TAP-MS), affinity chromatography, co-immunoprecipitation, protein microarray, protein-fragment complementation, phage display, X-ray crystallography, NMR spectroscopy], in vivo [yeast two-hybrid (Y2H), synthetic lethality], and in silico [ortholog-based sequence approach, domain-pair-based sequence approach, structure-based approach, gene neighborhood, gene fusion, in silico two-hybrid (I2H), phylogenetic tree, phylogenetic profile, gene expression]. Use of these techniques facilitates in silico mapping of PPI revealed by in vitro or in vivo studies and also experimental confirmation of computationally identified protein interaction networks.
Protein-DNA interactions (PDI) arise from electrostatic interactions (salt bridges), dipolar interactions (hydrogen bonding, H-bonds), entropic effects (hydrophobic interactions), and dispersion forces (base stacking) between related proteins and DNA and play an indispensable part in gene activation and other biological processes. As DNA-binding proteins (e.g., transcription factors, polymerases, nucleases, histones) contain DNA-binding domains (e.g., zinc finger, helix-turn-helix, leucine zipper) with specific or general affinity for single- or double-stranded DNA, current approaches for PDI analysis center on binding site characterization and gene activation. Binding site characterization examines the binding preferences of a given protein for various DNA sequences in vitro, while gene activation study further investigates the unique sequences bound by proteins (e.g., transcription factors) in the cellular context. Techniques utilized frequently for PDI studies include electrophoretic mobility shift assay (EMSA, which assesses the degree of affinity or specificity of the interaction between protein and known DNA probes), DNase footprinting assay (which identifies the specific site of binding of a protein to DNA), chromatin immunoprecipitation (ChIP, which identifies the sequence of DNA fragments that bind to a known transcription factor), ChIP-Seq (a combination of ChIP and high-throughput sequencing), ChIP-chip (a combination of ChIP and microarrays), yeast one-hybrid system (Y1H, which identifies protein that binds to a particular DNA fragment), bacterial one-hybrid system (B1H, which identifies protein that binds to a particular DNA fragment), and X-ray crystallography (which gives a detailed atomic view of protein-DNA interactions) [38,39].
Protein-RNA interactions (PRI) result from electrostatic interactions, hydrogen bonding, hydrophobic interactions, and base stacking between related proteins and RNA and are necessary for transportation of mRNA into the cytoplasm of eukaryotic cells and for the formation of the translation machinery. Common techniques for PRI analysis are RNA electrophoretic mobility shift assay (which detects protein-RNA interactions through changes in migration speed during gel electrophoresis), RNA pull-down assay (which selectively extracts a protein-RNA complex from a sample), oligonucleotide-targeted RNase H protection assay (which detects RNA and RNA fragments in cell extracts and helps map protein-RNA interactions), and fluorescent in situ hybridization co-localization (which detects the position and abundance of an RNA and a protein in a cell or tissue sample). It should be noted that both RNA and protein have to be correctly folded to allow proper binding, and special care must be taken to avoid introduction of RNases into the assay during PRI analysis.