Do proteins really exist?

Reflections on enzyme kinetics, might-as-well-be-infinite regress, and the Death Star... from an accidental epistemological antireductionist.

I recently spent a few very happy months of study leave apprenticed to a leading systems biology group at Manchester (MCISB), learning how to use a modelling package (Copasi), how to source enzyme parameter data and how to build a very simple metabolic model. Very briefly, the model consisted of the uptake and intracellular handling of short-chain fatty acids, and their beta-oxidation in the mitochondrion. A couple of things struck me as I progressed through this exercise. The first was how accessible Copasi is, and how easy it is for even a non-computer-geek, non-coder, non-mathematician to pick up and use. The second big lesson for me was how sparse enzyme kinetic data are, and furthermore how generally under-modelled even core biochemical pathways are (which is why a behind-the-curve oldster such as me could get the chance to do something novel with modelling in a pathway as well known as beta-oxidation). The modelling will, I am sure, experience rapid infill, but the lack of kinetic data is an issue. I soon became reliant on digging through fairly old papers to get Km values, Vmax values and so on. Younger scientists looking for a steady career for the next few years would do well to get good at enzyme kinetics - it’s still all to be done.

Oxidation of butyrate is a five-enzyme pathway in the mitochondrion. Butyrate is ligated to coenzyme A by medium-chain acyl-CoA synthetase and then competes with other medium-chain fatty acyl-CoAs for the four steps of mitochondrial beta-oxidation. Simple, right? Once my model was built and fitted to the data, the next step for me was to examine what effect the product/output had. With this done, the next step, and the one of particular interest to me, was to model what the effect of acetylation of key proteins in the pathway might be. A recent paper in Science (1) has suggested that multiple enzymes in a range of very well known pathways, such as glycolysis, the TCA cycle, beta-oxidation and beyond, are acetylated proteins. The paper further examined a sample protein for one step in each pathway, and demonstrated that the target enzyme was acetylated and that this had a regulatory effect, increasing or decreasing activity (in enzyme terms, altering the kcat). In the case of one of the beta-oxidation proteins, EHHADH, this was associated with an increased kcat for the forward reaction (the sharp-minded will recall these enzymes can work in both directions). Similar effects were found for other enzymes, and the acetylations were often substrate driven, so that glucose, long-chain fatty acids (LCFAs) and mixed amino acids all had an effect. This seems a powerful way of achieving differential regulation of cell metabolic pathways. One of the missed opportunities in the paper, to my mind, was investigating the effects of substrates not entering a pathway, or of competing substrates. For example, it’s all well and interesting to show that glucose upregulates glycolysis, but it is not really surprising. What does glucose do to beta-oxidation and, reciprocally, what do LCFAs do to glycolysis?
Anyhow, my big interest is really in acetylation of proteins. I’ve shown potent effects of acetylation on transcription factors and cytoskeletal proteins, and even the casual reader of biochemical or molecular papers will know that phosphorylation profoundly alters protein function. Acetyl-CoA is on the one hand the product of beta-oxidation, but it is also an input substrate, along with each of the beta-oxidation enzymes, for acetyltransferases, which can thereby regulate the pathway. This seemed like a very interesting feedback or feedforward loop, an exciting chance to get on-trend in modelling and to get some genuine insight into the regulation of metabolism.
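
To make the kinetic terms concrete, here is a minimal Python sketch of a Michaelis-Menten rate law in which an alternative substrate competes for the same enzyme, and in which acetylation is represented as nothing more than a fold change in kcat. Every number here (the kcat, Km, concentrations and the two-fold acetylation effect) is a hypothetical placeholder rather than a measured value or a parameter from the model described above, and the function name is purely illustrative.

```python
def rate(kcat, enzyme, s, km, competitor=0.0, km_competitor=1.0):
    """Michaelis-Menten rate for substrate s when an alternative
    substrate competes for the same active site.

    v = kcat * E * (S/Km) / (1 + S/Km + C/Km_c)
    With competitor = 0 this reduces to the ordinary
    v = kcat * E * S / (Km + S).
    """
    return kcat * enzyme * (s / km) / (1.0 + s / km + competitor / km_competitor)


# Hypothetical placeholder values, chosen only to keep the arithmetic simple.
KCAT = 10.0        # turnover number, per second
ENZYME = 1.0       # enzyme concentration, micromolar
KM = 50.0          # Km for the substrate of interest, micromolar
SUBSTRATE = 50.0   # substrate concentration, micromolar
ACETYL_FOLD = 2.0  # assumed fold change in kcat on acetylation

v_plain = rate(KCAT, ENZYME, SUBSTRATE, KM)
v_acetylated = rate(KCAT * ACETYL_FOLD, ENZYME, SUBSTRATE, KM)
v_competed = rate(KCAT, ENZYME, SUBSTRATE, KM,
                  competitor=100.0, km_competitor=50.0)

print(v_plain, v_acetylated, v_competed)
```

With these placeholder numbers, doubling kcat doubles the flux through the step, while a competing substrate at twice its own Km halves it. A single PTM therefore enters a kinetic model as one altered parameter, which is straightforward until there are many PTMs per enzyme.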

And then I hit a really rather large problem.

When I looked in the online modification database www.phosphosite.org, I found that the five enzymes in fact have a total of 111 post-translational modifications (PTMs) noted and demonstrated empirically: MACS has 9, ACAD has 8, EHHADH has 34, SCHAD-1 has 28 and ACAT-1 has 32. If one makes the unsafe assumption that each PTM is a binary possibility and is independent of the other PTMs, the number of states that the pathway can exist in is 2^111. The simplest way to compute this would be with a binary star topology of 111 nodes. Derek Gatherer has calculated that, at the limits of current computational processing, the processor time required for a 193-node state space (to study the Bacillus sporulation network, which has 193 genes) would be about 10^48 seconds, a time somewhat greater than the age of the universe. Even with the ultimate laptop, one would need to carpet the planet in laptops, or maybe develop a Death Star-sized facility, in order to consider addressing this problem. And let’s just remind ourselves that this really is a simple problem of five enzymes and one substrate. We’ve not even got into the multiple effects of competing substrates of different acyl chain lengths, or into ternary, quaternary or higher-order states at a residue (for example, lysine can be acetylated, propionylated, butyrylated, methylated, dimethylated or trimethylated: seven states, counting the unmodified residue). Nor does the count take into account weighting of effects, in that some PTMs may have greater or lesser (non-equal) effects, or may exist in greater or lesser abundance. In all the excitement we’ve forgotten that these proteins are not just enzymes but also substrates for acetyltransferases, deacetylases, phosphatases, kinases and so on, and that the affinity and activity of those enzymes for their substrates will be affected by the substrates’ PTM status at non-target residues... and oh so rapidly we are hitting a problem which is not infinite regress, but which might as well be infinite regress, as it is neither tractable nor computable.
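
The arithmetic behind that state count is easy to check. The following minimal Python sketch uses the PTM counts quoted above and the same unsafe assumption of independent, binary modifications. The seven-state variant is a deliberately crude illustration, since it assumes every site could take all seven lysine states, which is certainly not true of real sites.

```python
# PTM counts per enzyme, as quoted in the text above.
ptm_counts = {
    "MACS": 9,
    "ACAD": 8,
    "EHHADH": 34,
    "SCHAD-1": 28,
    "ACAT-1": 32,
}

total_sites = sum(ptm_counts.values())

# Unsafe assumption from the text: each PTM is binary and independent.
binary_states = 2 ** total_sites

# Crude illustration only: pretend every site had seven possible states.
seven_state_states = 7 ** total_sites

print(f"total PTM sites:       {total_sites}")
print(f"binary states:         {binary_states:.2e}")
print(f"seven states per site: {seven_state_states:.2e}")
```

Python integers have arbitrary precision, so the exact values are available if wanted. The binary case comes to roughly 2.6 x 10^33 states, and the seven-state caricature is a 94-digit number.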

A day or two after rambling on about this problem to colleagues, I came across this paper (2) by Gatherer, which classifies my concerns as part of the epistemological anti-reductionist school (it’s nice to know you’re not alone in being kept awake at night by this stuff, but it takes the wind out of your sails when you realise you’re not the first down the route). I don’t think the BBSRC or EPSRC will be gagging to give an average scientist like me enough money to build a planet-sized computer (or a series of them), so perhaps a pragmatic approach is better. The modelling folk using kinetic data seem fairly happy to work with net parameter data (but are pretty excited about the possibility of building in a one-variable parameterization, for example a phosphorylation). However, as I’m coming to this from the point of view of someone primarily interested in PTMs, a different approach will be needed. The systems biology community are having a go at how this might be addressed in this paper (3). Establishing ways of describing multistate species seems pre-emptively sensible, but I did struggle to get past the number of words devoted to the best font for describing different levels of modification...

I’ve previously blogged about Rosen’s (M,R). An engaging feature of the (M,R) is that every biochemical, including the macromolecules, is a metabolite: a product of metabolism, converted from input masses. This got me questioning just how useful the concept of a protein is. The SBML approach to PTMs/multistates is to consider all PTMs on a backbone as contributors to a multiple-state species. However, in trying to retractablize (I made that one up, can you tell?) enzyme biology, I wonder whether it may be much more useful to stop thinking of proteins as multistate species. In fact, I wonder whether it may be useful to stop thinking about proteins altogether. This is the analogy: AMP, ADP and ATP each differ by a single phosphate residue, but no one dreams of describing these as separate states of the same molecule. By the same rationale, MACS, phospho-MACS and diphospho-acetyl-MACS should, I believe, be considered not as states of the same protein but as points along a biochemical pathway of MACS metabolism, just as progressive modifications to a mass feature in the TCA cycle or glycolysis. Treating the protein as just another metabolite or pathway has particular value in eroding one of the flawed assumptions I made earlier, that all modifications are equally likely and that they are independent. It’s likely that many PTMs are dependent on a precursor series of events (other PTMs) and exist within the context of a sequence, just like any biochemical pathway. This is certainly the case for p53 (if you can keep up with that literature), where acetylation is dependent on phosphorylation and is more potently activating in terms of promoting DNA binding. It’s also likely that some PTMs occur at low stoichiometry and may have relatively low importance in the function of the backbone (for example those present in the cellular pool, but only involved in directing the backbone to the right compartment).
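
To see how much the pathway view could shrink the problem, here is a minimal Python sketch comparing two extreme bookkeeping assumptions for the PTM counts quoted earlier in the post. In the first, every PTM is binary and independent. In the second, the PTMs on each enzyme occur in one strict order, like steps along a linear pathway, so an enzyme with n sites has only n + 1 species. The strictly linear order is an assumption made purely for illustration. Real PTM networks presumably sit somewhere between the two extremes, with branches and partially ordered steps.

```python
from math import prod

# PTM counts per enzyme, as quoted earlier in the post.
ptm_counts = {
    "MACS": 9,
    "ACAD": 8,
    "EHHADH": 34,
    "SCHAD-1": 28,
    "ACAT-1": 32,
}


def independent_species(n_sites):
    """Every site is either modified or not, independently of the rest."""
    return 2 ** n_sites


def ordered_species(n_sites):
    """Sites are modified in one fixed order (illustrative assumption),
    so the species are: unmodified, 1 step along, ..., n steps along."""
    return n_sites + 1


for name, n in ptm_counts.items():
    print(f"{name:8s} independent: {independent_species(n):>12d}   "
          f"ordered: {ordered_species(n):>3d}")

# Joint states of the whole five-enzyme pathway under each assumption.
independent_total = prod(independent_species(n) for n in ptm_counts.values())
ordered_total = prod(ordered_species(n) for n in ptm_counts.values())

print(f"pathway, independent: {independent_total:.2e}")
print(f"pathway, ordered:     {ordered_total}")
```

Under the strictly ordered assumption the five enzymes together have about three million joint states rather than 2^111, and each individual enzyme is a short chain of species of the kind metabolic modellers already handle routinely.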

Thus we arrive at a much more tractable position: rather than addressing and investigating the combinatorial explosion, we are studying a metabolic pathway (which we’re quite good at), looking at a series of modifications that yields a biochemically important species (an enzyme) with properties (catalytic activity, substrate affinity). Closely related species are of interest as they may have slightly shifted properties or altered directionality. It is, however, only by questioning the value of the concept of “a protein” that we can arrive at the “protein-as-metabolome” or “protein-as-pathway” concept that will enable us to study, ironically, er... proteins. The removal of one hierarchy (metabolites versus proteins) and its replacement with a sequential hierarchy (the pathway) offers the chance to make tractable the explosion of data on PTMs being yielded at ever-accelerating rates by high-throughput approaches. Such approaches are, in the absence of good biochemistry and in the face of incomputability, of only questionable use in elevating our understanding of protein function and cellular metabolism.

Further Reading:

  1. Zhao et al. (2010) Regulation of cellular metabolism by protein lysine acetylation. Science 327: 1000-1004
  2. Gatherer (2010) So what do we really mean when we say that systems biology is holistic? BMC Systems Biology 4: 22
  3. Oellrich et al. (2010) Multistate and Multicomponent species (multi). http://sbml.org/images/8/8d/Multi_2010_November_29.pdf