Music Translation of Tertiary Protein Structure

Music Translation of Tertiary Protein Structure: Auditory Patterns of the Protein Folding
Riccardo Castagna1,*, Alessandro Chiolerio1, and Valentina Margaria2

1 LATEMAR - Politecnico di Torino, Dipartimento di Scienza dei Materiali ed Ingegneria Chimica, Corso Duca degli Abruzzi 24, 10129 Torino, Italy. Tel.: +39 011 0907381, riccardo.castagna@polito.it
2 Independent Researcher
Abstract. We have translated a genome-encoded protein sequence into musical notes and created a polyphonic harmony that takes into account its tertiary structure. We did not use a diatonic musical scale to obtain a pleasant sound, focusing instead on the spatial relationship between amino acids placed close together in the 3-dimensional protein fold. The result is a musical translation of the real morphology of the protein, which opens the challenge of bringing the rules of musical harmony into the proteomic research field.
Keywords: Bioart, Biomusic, Protein Folding, Bioinformatics.
1 Introduction
In recent years, several approaches have been investigated to introduce biology to a wider, younger and non-technical audience [2, 10]. Accordingly, bio-inspired art (Bioart) represents an interdisciplinary field devoted to reducing the boundaries between science, understood as a strictly rational discipline, and emotional experience. By stimulating human senses, such as sight, touch and hearing, scientists and artists together attempt not just to communicate science but also to create new perspectives and new inspirations for scientific information analysis based on the rules of serendipity [13]. Bio-inspired music (Biomusic) is a branch of Bioart representing a well-developed approach with both educational and purely scientific aims. In fact, due to the affinity between genome biology and musical language, considerable effort has been dedicated to the conversion of the genetic information code into musical notes to reveal new auditory patterns [4, 6, 7, 10-12, 14, 15].
The first work introducing Biomusic [10] attempted to translate DNA sequences into music by directly converting the four DNA bases into four notes. The initial goal was to create an acoustic method to minimize the distress of handling the increasing amount of base sequencing data. One advantage of this approach was that the DNA sequences were easily recognized and memorized, but from an aesthetic and artistic point of view it gave a poor result due to the lack of musicality and rhythm. Other approaches to converting DNA sequences into music were based on the codon reading frame and on mathematical analysis of the physical properties of each nucleotide [6, 7]. Unfortunately, because of the structure and nature of DNA, all these attempts gave rise to note sequences lacking musical depth. Since DNA is based on four nucleotides (Adenine, Cytosine, Guanine, Thymine), a long and unstructured repetition of just four musical notes is not enough to create a musical composition. As a natural consequence, scientists focused their attention on proteins, instead of DNA, with the aim of obtaining a reasonable, pleasant and rhythmic sound that can faithfully represent genomic information.
Proteins are polymers of twenty different amino acids that fold into specific spatial conformations, driven by non-covalent interactions, to perform their biological function. They are characterized by four distinct levels of organization: primary, secondary, tertiary and quaternary structure. The primary structure refers to the linear sequence of the different amino acids, determined by the translation of the DNA sequence of the corresponding gene. The secondary structure, instead, refers to regular local sub-structures, named alpha helix (α-helix) and beta sheet (β-sheet). The way the α-helices and β-sheets fold into a compact globule describes the tertiary structure of the protein. The correct folding of a protein is strictly interconnected with the external environment and is essential to execute the molecular function (see Figure 1). Finally, the quaternary structure represents a larger assembly of several protein molecules or polypeptide chains [3].
A number of studies have dealt with the musical translation of pure protein sequences [4, 15]. For example, Dunn and Clark used algorithms and the secondary structure of proteins to translate amino acid sequences into musical themes [4]. Another example of protein conversion into music is the approach used by Takahashi and Miller [15]. They translated the primary protein structure into a sequence of notes, and then expressed each note as a chord of a diatonic scale.
* Corresponding author.
C. Di Chio et al. (Eds.): EvoApplications 2011, Part II, LNCS 6625, pp. 214–222, 2011. © Springer-Verlag Berlin Heidelberg 2011
[Figure 1 labels: Amino Acid Sequence (Primary Structure); protein folding; Folded Protein (Tertiary Structure)]
Fig. 1. Protein structures: a linear sequence of amino acids, shown on the left, folds into a specific spatial conformation, driven by non-covalent interactions
Moreover, they introduced rhythm into the composition by analyzing the abundance of a specific codon in the corresponding organism and relating this information to note duration. However, the use of the diatonic scale and the device of building chords on a single note gave results that partially satisfy the listener from a musical point of view but, unfortunately, are not a reliable representation of the complexity of the molecular organization of the protein.
Music is not a mere linear sequence of notes. Our minds perceive pieces of music on a level far higher than that. We chunk notes into phrases, phrases into melodies, melodies into movements, and movements into full pieces. Similarly, proteins only make sense when they act as chunked units. Although a primary structure carries all the information for the tertiary structure to be created, it still "feels" like less, for its potential is only realized when the tertiary structure is actually physically created [11].
Consequently, a successful approach to the musical interpretation of protein complexity must take into account at least the tertiary structure, and cannot be based only on the primary or secondary structure.
2 Method
Our pilot study focused on the amino acid sequence of chain A of the Human Thymidylate Synthase A (ThyA), to allow a comparison with the most recent work published on this subject [15]. The translation of amino acids into musical notes was based on the Bio2Midi software [8], using a chromatic scale to avoid any kind of filter on the result. The protein 3-dimensional (3D) crystallographic structure was obtained from the Protein Data Bank (PDB). The spatial position of each atom composing the amino acids was recovered from the PDB textual structure (http://www.rcsb.org/pdb/explore/explore.do?structureId=1HVY). This file was loaded into a Matlab environment, together with other useful indicators such as the nature of each atom and its sequential position in the chain, and the corresponding amino acid type and its sequential position in the protein. A Matlab script was written to translate the structure into music, as described below. We adopted an approach based on the computation of the centre of mass of each amino acid, which was used as the basis for all subsequent operations. The centre of mass was computed by considering every atom composing the amino acid, its position in 3D space and its mass: the mass-weighted mean position of the atoms represents the amino acid centre of mass.
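The centre-of-mass step can be illustrated with a minimal Python sketch. This is not the authors' Matlab script: the function name `residue_centres_of_mass`, the tuple layout of the input and the small table of approximate atomic masses are assumptions made for illustration, and reading the atoms from a PDB file is left out.

```python
from collections import defaultdict

# Approximate standard atomic masses (u) for elements common in proteins.
# Assumption: only these elements occur in the input.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def residue_centres_of_mass(atoms):
    """Return {residue_index: (x, y, z)} with the mass-weighted mean
    position of the atoms of each residue.

    atoms: iterable of (residue_index, element, x, y, z) tuples
    (a hypothetical layout, e.g. parsed from the ATOM records of a PDB file).
    """
    # Per residue: sum of m*x, sum of m*y, sum of m*z, total mass.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for res, element, x, y, z in atoms:
        m = ATOMIC_MASS[element]
        acc = sums[res]
        acc[0] += m * x
        acc[1] += m * y
        acc[2] += m * z
        acc[3] += m
    return {
        res: (sx / m, sy / m, sz / m)
        for res, (sx, sy, sz, m) in sums.items()
    }
```

The result is expressed in the same length unit as the input coordinates.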
3 Results
3.1 Distance Matrix: Musical Chords
The first important output obtained from the algorithm was the distance matrix, containing the lengths of the vectors connecting each pair of amino acids. The matrix is symmetrical by definition and features an interesting block structure (see Figure 2). The symmetry follows simply from the fact that the distance between the i-th and the j-th amino acid is the same as that between the j-th and the i-th. The block structure is probably due to amino acid clustering in portions of the primary chain.
Fig. 2. Distance matrix. X and Y axis: sequential number of amino acid; Z axis: distance in pm (both vertical scale and colour scale).
We sought spatial correlations between non-nearest-neighbour amino acids, hence ignoring those amino acids which are placed close to one another along the primary sequence. By spatial correlation, we mean the closest distance between non-obviously linked amino acids. Running along the sequence, the Matlab script looked for three spatial correlations (i.e. three minimal distances) involving three different amino acids (two by two), as sketched in Figure 3. The two couples for every position in the primary chain were then stored and the corresponding distance distribution was extracted and plotted (see Figure 4). These spatial correlations, or distances, are referred to as first, second and third order.
Fig. 3. Sketch of the Matlab script operation
Fig. 4. Distance distributions. Histograms depicting the distance distribution in the three orders (X axis: distance in pm; Y axis: number of amino acid couples). Selection rules avoid the nearest-neighbour amino acids. Increasing the cut-off to third order makes it possible to sample the second peak of the bi-modal distribution (diagonal lines).
The parallelism between the sequence order and the discretization order, ubiquitous in every field of digital information theory, emerges from the spatial description of the protein: the more precise the observation of the amino acids in the proximity of a certain point, the higher the order necessary to include every spatial relation. The same concept applies to digital music: the higher either the bit-rate (as in MP3 encoding) or the sampling frequency (as in CD-DA), the higher the fidelity. The upper limit is an order equal to n, the number of amino acids composing the protein.
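One possible reading of the distance matrix and of the search for minimal non-neighbour distances is sketched below in Python. It is an illustration under stated assumptions rather than the authors' Matlab script: the function names, the parameter `k` (number of orders kept) and the parameter `min_separation` (how far apart along the chain two residues must be to count as non-neighbours) are hypothetical, since the text does not give the exact selection rule.

```python
import math


def distance_matrix(centres):
    """Symmetric matrix of Euclidean distances between residue centres."""
    n = len(centres)
    return [
        [math.dist(centres[i], centres[j]) for j in range(n)]
        for i in range(n)
    ]


def closest_non_neighbours(dist, k=3, min_separation=2):
    """For each residue i, list the k closest residues j whose chain
    separation |i - j| is at least min_separation, as (j, distance)
    pairs sorted by increasing distance (first, second, ... order)."""
    result = []
    for i, row in enumerate(dist):
        candidates = [
            (d, j) for j, d in enumerate(row)
            if abs(i - j) >= min_separation
        ]
        candidates.sort()
        result.append([(j, d) for d, j in candidates[:k]])
    return result
```

With `k=3`, the entries at positions 0, 1 and 2 of each list play the role of the first-, second- and third-order correlations for that position in the chain.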
3.2 Note Intensity: Musical Dynamics
The note intensity is given in a range between 0 and 99. We assumed that the closer the couple of amino acids, the higher the intensity of the musical note. In order to play each order with an intensity comparable to the first order, characterized by the closest couples which may be found in the whole protein structure, we performed a normalization of the distance data within each order. In this way, normalized distance data, multiplied by 99, give the correct intensity scale.
3.3 Angle Distribution: Musical Rhythm
The primary sequence was also analyzed to extract the degree of folding, a measure of the local angle between segments ideally connecting the centres of mass of subsequent amino acids. Proteins composed of extended planar portions (β-sheets) tend to have an angular distribution centred around 180°. The angle distribution was extracted (see Figure 5) and parameterized as the note length: the more linear the chain, the shorter the note. This step gave rhythm to the generated music. In this way, the musical rhythm is intended as the velocity of an imaginary visitor running along the primary sequence. We would like to point out that this conversion features a third-order cut-off, meaning that the spatial description fidelity is based on three spatial relations for each amino acid position; the higher the cut-off, the higher the sound quality.
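The intensity rule of Section 3.2 can be sketched as follows. The text does not specify the normalization, so this Python example assumes an inverted min-max normalization within one order, chosen so that the closest couple receives the maximum intensity of 99; the function name `intensities` is hypothetical.

```python
def intensities(distances, max_level=99):
    """Map the distances of one order to integer intensities in
    0..max_level, with the shortest distance giving the loudest note.

    Assumption: inverted min-max normalization within the order.
    """
    d_min, d_max = min(distances), max(distances)
    if d_max == d_min:
        # All couples equally close: play them all at full intensity.
        return [max_level] * len(distances)
    span = d_max - d_min
    return [round(max_level * (d_max - d) / span) for d in distances]
```

Applying the function separately to the first-, second- and third-order distances keeps the three voices at comparable loudness, which is the stated purpose of normalizing within each order.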
Fig. 5. Angular distribution. Histogram showing the angular distribution of the vectors linking the amino acids along the chain A of the human Thymidylate Synthase A (ThyA), used to codify each note length.
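The angle-to-rhythm step can be illustrated with a short Python sketch. It assumes that the local angle is measured at each interior residue between the vectors pointing to the previous and to the next centre of mass, so that a straight stretch gives 180°, and that the note length decreases linearly with the angle. The duration bounds `shortest` and `longest` (in beats) and both function names are hypothetical.

```python
import math


def bend_angles(centres):
    """Angle in degrees at each interior point of the chain, between the
    vectors pointing to the previous and to the next point.
    A perfectly straight chain gives 180.
    Assumption: consecutive points are distinct (non-zero segments)."""
    angles = []
    for prev, cur, nxt in zip(centres, centres[1:], centres[2:]):
        u = [p - c for p, c in zip(prev, cur)]
        v = [n - c for n, c in zip(nxt, cur)]
        cos_t = sum(a * b for a, b in zip(u, v)) / (
            math.hypot(*u) * math.hypot(*v)
        )
        # Clamp against rounding errors before taking the arc cosine.
        cos_t = max(-1.0, min(1.0, cos_t))
        angles.append(math.degrees(math.acos(cos_t)))
    return angles


def note_lengths(angles, shortest=0.25, longest=1.0):
    """Linear map: 180 degrees (straight chain) -> shortest note,
    0 degrees (fully folded back) -> longest note."""
    return [longest - (longest - shortest) * a / 180.0 for a in angles]
```

A linear map is only one way to honour the rule that a more linear chain gives a shorter note; any monotonically decreasing function of the angle would do the same.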
3.4 Output: Musical Score/Notation
Finally, a conversion from each amino acid to the corresponding note was performed, generating an ASCII textual output that can be converted to a MIDI file with the GNMidi software [9]. Since amino acid properties influence the protein folding process, we adopted Dunn's translation method [4], which is based on the water solubility of amino acids. The most insoluble residues were assigned pitches in the lowest octave, the most soluble, including the charged residues, were in the highest octave, and the moderately insoluble residues were given the middle range. Thus, pitches ranged over two octaves of a chromatic scale. After that, the MIDI files of the three orders were loaded into the Ableton Live software [1] and assigned to MIDI instruments. We chose to assign the first-order sequence of musical notes to a lead instrument (Glockenspiel) and to use a string emulator to play the other two sequences (see Figure 6). In this way it is possible to discern the primary sequence from the related musical texture that represents the amino acids involved in the 3D structure (see Additional Data File).
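The idea of a solubility-ordered chromatic pitch table can be sketched in Python as follows. This is not Dunn's actual table, which is not reproduced in the text: as a stand-in for water solubility the sketch uses the Kyte-Doolittle hydropathy index, ranks the twenty amino acids from most to least hydrophobic, and assigns consecutive semitones upwards from a base MIDI note. The function names and the `base_note` value are hypothetical.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
# Assumption: used here only as a proxy for water insolubility.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pitch_table(base_note=48):
    """Map each one-letter amino acid code to a MIDI note number.
    The most hydrophobic residue gets base_note; each following residue
    in the ranking is one semitone higher (ties broken alphabetically)."""
    ranked = sorted(HYDROPATHY, key=lambda aa: (-HYDROPATHY[aa], aa))
    return {aa: base_note + rank for rank, aa in enumerate(ranked)}


def sequence_to_pitches(sequence, table=None):
    """Translate a one-letter amino acid sequence into MIDI note numbers."""
    if table is None:
        table = pitch_table()
    return [table[aa] for aa in sequence]
```

Twenty consecutive semitones fit within two octaves, consistent with the pitch range described above; writing the notes out to a MIDI file is a separate step not covered by this sketch.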
Fig. 6. Score of the musical translation of the chain A of the human Thymidylate Synthase A (ThyA). The three instruments represent respectively the primary sequence (Glockenspiel) and the two different amino acids correlated in the tertiary structure (String Bass, Viola).
4 Discussion
We obtained polyphonic music by translating the amino acid sequence of a peptide (chain A of ThyA) into musical notes and arranging them in chords according to their spatial relationships (see Figure 7). To our knowledge, this is the first time that a team of scientists and musicians has created polyphonic music that describes the entire 3D structure of a bio-molecule.
ThyA chain A sequence shown in the figure:
PPHGELQYLGQIQHILRCGVRKD DRTGTGTLSVFGMQARYSLRDEF PLLTTKRVFWKGVLEELLWFIKGS TNAKELSSKGVKIWDANGSRDFL DSLGFSTREEGDLGPVYGFQWRH FGAEYRDMESDYSGQGVDQLQR VIDTIKTNPDDRRIIMCAWNPRDL PLMALPPCHALCQFYVVNSELSC QLYQRSGDMGLGVPFNIASYALLT YMIAHITGLKPGDFIHTLGDAHIYL NHIEPLKIQLQREPRPFPKLRILRK VEKIDDFKAEDFQIEGYNPHPTIK MEMAV
Residues labelled in the figure: P273, L74, E30
Fig. 7. From tertiary protein structure to musical chord. The primary structure of ThyA (chain A), on the top right of the figure, folds into its tertiary structure (a, b). In yellow, an example of the amino acids composing a musical chord: E30, L74 and P273 are non-obviously linked amino acids according to our 3D spatial analysis criteria.
Previous works attempting to translate a protein into music focused on the primary or secondary protein structure and used different devices to obtain polyphonic music. Instead, the Matlab script we developed analyzes the PDB file that contains the spatial coordinates of each atom composing the amino acids of the protein. From the computed distances and other useful geometrical properties between non-adjacent amino acids, it generates a MIDI file that encodes the 3D structure of the protein as music. In this way, the polyphonic music contains all the crucial information necessary to describe a protein, from its primary to its tertiary structure. Moreover, our analysis is fully reversible: by applying the same translation rules that are used to generate the music, one can store, position by position, the notes (i.e. the amino acids) and obtain their distances. A first-order musical sequence does not give enough information to recover the true protein structure, because there is more than one possible way to draw the protein. By contrast, our approach, based on a third-order musical sequence, has three times as much data and describes one and only one solution to the problem of placing the amino acids in 3D space.
5 Conclusions
Our work represents an attempt to communicate the complexity of the 3D protein structure to a wider audience, based on a rhythmic musical rendering. Biomusic can be a useful educational tool to depict the mechanisms that give rise to intracellular vital signals and determine cell fate. The possibility of hearing the relations between amino acids and protein folding could help students and a non-technical audience to understand the different facets and rules that regulate cellular processes.
Moreover, several examples of interdisciplinary projects have demonstrated that the use of a heuristic approach, sometimes perceived by the interacting audience as a game, can lead to interesting and useful scientific results [2, 5, 16]. We hope to bring the rules of musical harmony into the proteomic research field, encouraging a new generation of protein folding algorithms. Protein structure prediction, despite all the efforts and the development of several approaches, remains an extremely difficult and unresolved undertaking. We do not exclude that, in the future, musicality could be one of the driving indicators for protein folding investigation.
Acknowledgments. The authors would like to acknowledge Smart Collective (www.smart-collective.com) and Prof. Fabrizio Pirri (Politecnico di Torino) for their support.
References
1. Ableton Live, http://www.ableton.com
2. Cyranoski, D.: Japan plays trump card to get kids into science. Nature 435, 726 (2005)
3. Dobson, C.M.: Protein folding and misfolding. Nature 426(6968), 884–890 (2003)
4. Dunn, J., Clark, M.A.: Life music: the sonification of proteins. Leonardo 32, 25–32 (1999)
5. Foldit, http://fold.it/portal
6. Gena, P., Strom, C.: Musical synthesis of DNA sequences. XI Colloquio di Informatica Musicale, 203–204 (1995)
7. Gena, P., Strom, C.: A physiological approach to DNA music. In: CADE 2001, pp. 129–134 (2001)
8. Gene2Music, http://www.mimg.ucla.edu/faculty/miller_jh/gene2music/home.html
9. GNMidi, http://www.gnmidi.com
10. Hayashi, K., Munakata, N.: Basically musical. Nature 310(5973), 96 (1984)
11. Hofstadter, D.R.: Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, New York (1979) ISBN 0465026567
12. Jensen, E., Rusay, R.: Musical representations of the Fibonacci string and proteins using Mathematica. Mathematica J. 8, 55 (2001)
13. Mayor, S.: Serendipity and cell biology. Mol. Biol. Cell 21(22), 3808–3870 (2010)
14. Ohno, S., Ohno, M.: The all pervasive principle of repetitious recurrence governs not only coding sequence construction but also human endeavor in musical composition. Immunogenetics 24, 71–78 (1986)
15. Takahashi, R., Miller, J.: Conversion of amino acid sequence in proteins to classical music: search for auditory patterns. Genome Biology 8(5), 405 (2007)
16. The Space Game, http://www.thespacegame.org