
Parrot Virtual Machine

A Book from English Wikibooks

PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Thu, 26 Sep 2013 13:32:25 UTC

Contents
Articles
Wikibooks:Collections Preface
Introduction

Introduction To Parrot
Introduction
Building Parrot
Running Parrot

Programming For Parrot


Parrot Programming
Parrot Assembly Language
Parrot Intermediate Representation
Parrot Magic Cookies
Multithreading and Concurrency
Exception Handling
Classes and Objects
The Parrot Debugger

Parrot Compiler Tools


Parrot Compiler Tools
Parrot Grammar Engine
Not Quite Perl
Optables and Expressions
Advanced PGE
Building A Compiler
HLL Interoperation

Parrot Hacking
Parrot Internals
IMCC and PIRC
Run Core
Memory and Garbage Collection
PMC System
String System
Exception Subsystem
IO Subsystem
JIT and NCI
Parrot Embedding
Extensions
Packfiles

Appendices
PIR Reference
PASM Reference
PAST Node Reference
Languages on Parrot
HLLCompiler Class
Command Line Options
Built-In PMCs
Bytecode File Format
VTABLE List

"Squaak" Language Tutorial


Squaak Tutorial Introduction
Poking in Compiler Guts
Squaak Details and First Steps
PAST Nodes and More Statements
Variable Declaration and Scope
Scope and Subroutines
Operators and Precedence
Hash Tables and Arrays
Wrap-Up and Conclusion

Resources and Licensing


Resources
Licensing

References
Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses
License

Wikibooks:Collections Preface

This book was created by volunteers at Wikibooks (http://en.wikibooks.org).

What is Wikibooks?
Started in 2003 as an offshoot of the popular Wikipedia project, Wikibooks is a free, collaborative wiki website dedicated to creating high-quality textbooks and other educational books for students around the world. In addition to English, Wikibooks is available in over 130 languages, a complete listing of which can be found at http://www.wikibooks.org.

Wikibooks is a "wiki", which means anybody can edit the content there at any time. If you find an error or omission in this book, you can log on to Wikibooks to make corrections and additions as necessary. All of your changes go live on the website immediately, so your effort can be enjoyed and utilized by other readers and editors without delay.

Books at Wikibooks are written by volunteers, and can be accessed and printed for free from the website. Wikibooks is operated entirely by donations, and a certain portion of proceeds from sales is returned to the Wikimedia Foundation to help keep Wikibooks running smoothly. Because of the low overhead, we are able to produce and sell books for much cheaper than proprietary textbook publishers can. This book can be edited by anybody at any time, including you. We don't make you wait two years to get a new edition, and we don't stop selling old versions when a new one comes out.

Note that Wikibooks is not a publisher of books, and is not responsible for the contributions of its volunteer editors. PediaPress.com is a print-on-demand publisher that is also not responsible for the content that it prints. Please see our disclaimer for more information: http://en.wikibooks.org/wiki/Wikibooks:General_disclaimer

What is this book?


This book was generated by the volunteers of Wikibooks, a team of people from around the world with varying backgrounds. The people who wrote this book may not be experts in the field; some may not even have a passing familiarity with it. The result is that some information in this book may be incorrect, out of place, or misleading. For this reason, you should never rely on a community-edited Wikibook when dealing with matters of medical, legal, financial, or other importance. Please see our disclaimer for more details on this.

Despite the warning of the last paragraph, however, books at Wikibooks are continuously edited and improved. If errors are found, they can be corrected immediately. If you find a problem in one of our books, we ask that you be bold in fixing it. You don't need anybody's permission to help or to make our books better. Wikibooks runs on the assumption that many eyes can find many errors, and many able hands can fix them. Over time, with enough community involvement, the books at Wikibooks will become very high-quality indeed.

You are invited to participate at Wikibooks to help make our books better. As you find problems in your book, don't just complain about them: log on and fix them! This is a kind of proactive and interactive reading experience that you probably aren't familiar with yet, so log on to http://en.wikibooks.org and take a look around at all the possibilities. We promise that we won't bite!


Who are the authors?


The volunteers at Wikibooks come from around the world and have a wide range of educational and professional backgrounds. They come to Wikibooks for different reasons and perform different tasks: some Wikibookians are prolific authors, some are perceptive editors, some are fancy illustrators, others diligent organizers. Some find and remove spam, vandalism, and other nonsense as it appears. Most Wikibookians perform a combination of these jobs.

It's difficult to say who the authors are for any particular book, because so many hands have touched it and so many changes have been made over time. It's not unheard of for a book to have been edited thousands of times by hundreds of authors and editors. You could be one of them too, if you're interested in helping out.

Wikibooks in Class
Books at Wikibooks are free, and with the proper editing and preparation they can be used as cost-effective textbooks in the classroom or by independent learners. In addition to using a Wikibook as a traditional read-only learning aid, it can also become an interactive class project. Several classes have come to Wikibooks to write new books and improve old books as part of their normal course work. In some cases, the books written by students one year are used to teach students in the same class the next year. Books written here can also be used in classes around the world by students who might not be able to afford traditional textbooks.

Happy Reading!
We at Wikibooks have put a lot of effort into these books, and we hope that you enjoy reading and learning from them. We want you to keep in mind that what you are holding is not a finished product but instead a work in progress. These books are never "finished" in the traditional sense, but they are ever-changing and evolving to meet the needs of readers and learners everywhere. Despite this constant change, we feel our books can be reliable and high-quality learning tools at a great price, and we hope you agree. Never hesitate to stop in at Wikibooks and make some edits of your own. We hope to see you there one day. Happy reading!

Introduction To Parrot

Introduction
What Is Parrot?
Parrot is a virtual machine (VM), similar to the Java VM and the .NET VM. However, unlike those two, which are designed for statically-typed languages like Java or C#, Parrot is designed for use with dynamically-typed languages such as Perl, Python, Ruby, and PHP. The Parrot VM itself is written in the C programming language, which means that, in theory, it is portable to a large number of different computer architectures and operating systems. It is written to be modular and easily extensible.

Programmers can write in any of the languages for which a Parrot-capable compiler exists. Modules written in one language, such as Perl, can transparently interoperate with modules originally written in any of the other languages supported by Parrot. This easy interoperability and native support for cutting-edge dynamic programming features make Parrot an important tool for next-generation language designers and implementers.

It is precisely because Parrot is intended to support so many diverse high-level languages that it has developed such a general and feature-rich architecture. Much of the Parrot architecture is still under active development, so those parts cannot be properly discussed in this book quite yet. Once Parrot reaches a stable release and more details are set in stone, this book will be able to provide more comprehensive coverage.

History of Parrot
The Parrot project was born from the Perl 6 development project. As such, the history of Parrot, at least its early history, is closely tied to the history of Perl 6. In fact, once you understand just how large and ambitious Perl 6 is, you'll start to understand why Parrot must have all the features it has.

It was famously said of version 5 of the Perl programming language that "nothing can parse Perl but perl": the perl executable was the only program that could reliably parse the Perl programming language. There were two reasons for this. First, the Perl language didn't follow any formal specification; the behavior of the perl interpreter was the definitive documentation for the behavior of Perl. Second, the Perl programming language allowed the use of source filters, programs which could modify source code prior to execution. This means that to reliably parse and understand a Perl program, you needed to be able to execute its source filters reliably. The only program that could do both was perl.

The next planned version of Perl, Perl 6, was intended to be a major rewrite of the language. In addition to standardizing and bringing sanity to all the features which had slowly entered the language grammar, it was decided that Perl 6 would be a formal specification first, and implementations of that specification later.

The name "Parrot" was first used as an April Fools' joke: a hoax story claimed that the Perl and Python languages (which are competitors, and which were both undergoing major redesigns) were going to merge into a single language named Parrot. The idea proved a powerful one, and when a project was started to create a virtual machine capable of running not only Perl 6 but also Python and other dynamic languages, the name Parrot was a perfect fit. The first release of Parrot, 0.0.1, came in September 2001. The development team has since prepared a stable point release on the third Tuesday of every month.


The Parrot Foundation


The Parrot Foundation was established in mid-2008 to serve as an advocate for Parrot. The Parrot Foundation is a non-profit charity organization in the United States, and donations to the foundation are tax-deductible. Prior to the creation of the Parrot Foundation, Parrot was managed and overseen by the Perl Foundation, a historical relationship stemming from the fact that Parrot was originally intended to be just the backend for the Perl 6 programming language. Since Parrot has grown beyond that, and now aims to treat all high-level dynamic programming languages equally, it was decided that the project should become separate from the Perl Foundation. Parrot's website is http://www.parrot.org.

Who Is This Book For?


This book is for readers at the intermediate to advanced level with a solid background in computer programming. Perl Programming would be a good start, although a background in any dynamic language would be helpful. Having a background in Compiler Construction, Regular Expressions, or the compiler-building tools Lex and Yacc would also be a benefit. For the sections about Parrot hacking, a background knowledge of C Programming is required.

What Will We Cover?


This book serves as, at the least, a basic introduction to the Parrot Virtual Machine. We will cover basic programming for Parrot in the lowest-level languages that it supports: PIR and PASM. We will also discuss one of the greatest strengths of the Parrot platform, the Parrot Compiler Tools (PCT), which enable compilers for higher-level languages like Perl and Python to be written easily. Later sections delve into the Parrot internals and discuss how Parrot works and how to contribute code to the Parrot development project. Extensive reference materials at the end of the book keep track of the information that is most necessary for developers.
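As a first taste of the low-level PIR language covered later, the sketch below writes a minimal PIR hello-world program to a file. Actually running it would require a built parrot binary, which this snippet does not assume, so we only create and display the file here.

```shell
# Write out a minimal PIR program (covered in detail in later chapters).
# Running it needs a built parrot binary, so this just creates the file.
cat > hello.pir <<'EOF'
.sub 'main' :main
    say "Hello, Parrot!"
.end
EOF
cat hello.pir
```

With a built parrot, you would run it as `parrot hello.pir`.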

Where To Get More Information


The definitive source for Parrot information and documentation is the Parrot project website, http://www.parrot.org. Parrot programmers, hackers, and enthusiasts also chat in the Parrot IRC chatroom [1].

How To Get Involved In Parrot Development


The Parrot development process is large and varied. Depending on skill level, there are many opportunities for a person to get involved in Parrot development. Here are some examples:

If you are good at C programming
If you know C programming, help is always needed to work on Parrot. In addition to normal development tasks, there are bug reports to resolve, compile errors to fix, new platforms to port to, and optimizations to perform. Parrot needs to be ported to many different systems, and it needs to be properly tested on all of them.

If you are good with Perl programming
Many of the Parrot build tools are written in Perl 5. There is also a massive development effort to support the Perl 6 project: an intermediate language called Not Quite Perl (NQP), which is similar to Perl 6 but lacks many of its features, is used to implement compilers for higher-level languages. If you are good with Perl and are willing to learn Perl 6 and NQP, there is a lot of compiler-implementation work that needs to be done.

If you are good with system administration
Parrot needs to be built and tested regularly. People are always needed who are willing to perform regular builds and tests of Parrot. If you are willing to set up an automated build bot to perform regular builds and tests, that's even better.

If you can write
This book needs your help, and anybody can edit it. There are also a number of other book-writing projects concerning Parrot that are looking for active authors and editors. The more that is written about Parrot, the more new users will be able to learn about it.

If you don't fall cleanly into any of these categories, there are other opportunities to help as well. This might be a good opportunity for you to learn a new skill, like programming in Perl 6, PIR, or NQP. If you are interested in writing or editing, you can help with this wikibook too!

Parrot Developers
There are several different roles that people have taken up in Parrot development, even though there is no centralized management hierarchy. Volunteers tend to fulfill the roles that they enjoy and have skill at.

Architect
The Parrot Architect, currently Allison Randal, is in charge of laying out the overall design specifications for Parrot. The architect has the final say in important decisions and is responsible for ensuring that design documents are up to date. By laying out the overall requirements of the system, the architect enables other volunteers to contribute to the areas where they are most interested.

Pumpking
The Pumpking has oversight of the Parrot source repository and is also the lead developer. The Pumpking defines coding standards that all contributors must follow, and helps to coordinate other contributors.

Release Managers
Parrot has a schedule of making releases approximately once a month. The release managers oversee this process and ensure that releases are high quality. Release managers control when new features can be added and when code should be frozen for debugging. Pre-release debugging sessions are very productive and important periods for Parrot development, and ensure that many bugs get fixed between each release.

Committer
A committer is a person with write access to the Parrot SVN repository. Committers typically have submitted several patches and participated in Parrot-related discussions.

Metacommitter
A metacommitter is a person who has write access to the Parrot SVN repository and is also capable of promoting new committers. The architect and the Pumpking are automatically metacommitters, but there are several others too.

Among the above groups there are other designations as well, because many committers tend to focus their efforts on a relatively small portion of the Parrot development effort.

Core Developer
A person who works on Parrot internals, typically one or two subsystems. Core developers need to be skilled in C programming, and also need to work with many development utilities written in Perl.

Compiler Developer
These developers, like the core developers, work on the internals of Parrot, typically by writing lots of C code. In contrast to core developers, however, they focus their effort on the various compiler front-ends such as IMCC, PIRC, PGE, or TGE.

High-Level Language Developer
A high-level language developer is a person working to implement a high-level language on Parrot. Even though they have commit access to the whole repository, many high-level language developers focus only on a single language implementation. High-level language developers need to be skilled in PCT and many of the Perl 6-based development tools for HLLs.

Build Managers
Build managers help to create and maintain tools that other developers rely on.

Testers
Testers create and maintain a suite of many thousands of tests to verify the operation of Parrot, its subsystems, its compilers, and the high-level languages that run on it.

Platform Porters
A platform porter ensures that Parrot can be built on multiple platforms. Porters must build and test Parrot on different platforms, and also create and distribute pre-compiled installation packages for them.

This certainly isn't an exhaustive list of possible roles, either. If you have programming skills but don't know whether you fit well into any of the designations above, your help is still needed.

Resources
http://www.parrotcode.org/docs/intro.html
http://www.parrotcode.org/docs/roadmap.html
http://www.parrotcode.org/docs/parrothist.html

References
[1] irc://irc.perl.org/Parrot


Building Parrot
Obtaining Parrot
The most recent development release of Parrot can be downloaded from CPAN [1]. Development of Parrot is managed through the SVN repository at http://svn.parrot.org/parrot/. The most up-to-date version of Parrot can be obtained from https://svn.parrot.org/parrot/trunk/ via svn checkout.

Building Parrot From Source


Parrot is currently available as a source code download, although some people are trying to maintain precompiled versions for download as well. These versions are typically available for Windows, Cygwin, Debian, and Red Hat. Other binary distributions may be added in the future. Instructions for installing a precompiled binary distribution of Parrot vary depending on the particular platform and the method in which it was bundled; consult the accompanying documentation for any distribution you download. This page will not discuss these particular distributions, only the method of building Parrot from the original source code.

Currently, the Parrot build process requires make, a working C compiler, and a working Perl 5 installation (version 5.8 or higher). Without these, it will not be possible for you to build Parrot. On a Windows platform, substitute the freely-available nmake for make. Automated testing is performed on a variety of systems with various combinations of these tools, and any particular revision should compile properly. If you have problems compiling Parrot on your system, send an email to the Parrot Porters mailing list with details of the problems, and one of the Parrot developers will try to help fix it.
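A quick way to check for the prerequisites named above is to probe for each tool on your PATH. This is only a sketch: the tool names used are the common Unix defaults, and your system may differ (for example, gcc or clang instead of cc, or nmake on Windows).

```shell
# Check whether the build prerequisites (Perl, make, a C compiler)
# are available; tool names are the common Unix defaults.
for tool in perl make cc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done | tee prereq_report.txt
```

Any line marked "missing" points at something to install before running the build.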


Configure.pl
Notice that Configure.pl has a capitalized first letter. This is an important distinction on Unix and Linux systems, which are case-sensitive.

The first step in building Parrot is to run the Configure.pl script, which performs some basic tests on your system and produces a makefile. To automatically invoke Configure.pl with the most common options, run the program Makefile.pl instead. The configuration process performs a number of tests on your system to determine some important parameters. These tests may take several minutes on some systems, so be patient. In addition, configuration creates a number of platform-specific code files for your system. Without these generated files in place, the build process cannot proceed.

After Configure.pl is finished executing, you should have a file named Makefile (with no suffix). From the shell, go to the Parrot directory and type the command "make" (or "nmake" on Windows). This starts the process of building Parrot, which could take several minutes because there are a number of steps. We will discuss some of these steps in a later section.
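The two-step flow described above, a configure step that generates a Makefile and a make step that consumes it, can be illustrated with a toy example. This is not Parrot's real Configure.pl; the generated Makefile below is a trivial stand-in written just to show the hand-off between the two steps.

```shell
# Toy illustration of configure-then-make: a "configure" step writes a
# Makefile, then make consumes it. (Stand-in for the real Configure.pl.)
mkdir -p demo_build
# The "configure" step: generate a Makefile (recipe lines need a tab).
printf 'all:\n\t@echo "pretend build complete"\n' > demo_build/Makefile
# The "make" step: consume the generated Makefile, if make is available.
if command -v make >/dev/null 2>&1; then
  make -C demo_build
else
  echo "make not installed; skipping the build step"
fi
```

In a real Parrot tree the equivalent sequence is simply `perl Configure.pl` followed by `make`.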

MANIFEST
The root directory of Parrot contains a file called MANIFEST. MANIFEST contains a list of all necessary files in the Parrot repository. If you add a new file to the Parrot source tree, make sure to add that file to MANIFEST. Configure.pl checks MANIFEST to ensure all files exist properly before attempting to build.

Configure.pl Options
Depending on what tasks you want to perform, or how you are using Parrot, there are a number of options that can be specified to Configure.pl. These options may change the makeup of several generated files, including the Makefile. Here, we will list some of these options:
--help               Shows a help message.
--version            Prints version information about Configure.pl.
--verbose            Prints extra information to the console.
--fatal              If any step fails, kill Configure.pl immediately and do not run additional tests.
--silent             No output to the console.
--nomanicheck        Do not check the file MANIFEST to ensure all files exist.
--languages          Specify a comma-separated list of languages to build also, after Parrot has been built.
--ask                Ask the user for answers to common questions, instead of running probes.
--test               Test the configuration tools first, then Configure.pl, then the build tools. Use --test=configure to test the configuration tools and then run Configure.pl. Use --test=build to run Configure.pl and then also test the build tools.
--debugging          Set --debugging=0 to turn off debugging. Debugging is on by default.
--inline             Specify whether your C compiler supports inline code using the C inline keyword.
--optimize           Compile Parrot using compiler optimizations, and a few other speed-up tricks. Creates a faster bird, but may expose more errors and failures. Use --optimize=(flags) to specify compiler optimization flags to use.
--parrot_is_shared   Link Parrot dynamically to libparrot, instead of linking statically.
--m=32               On a 64-bit platform, compile a 32-bit Parrot.
--profile            Turn on profiling. Only used with the GCC compiler, for now.
--cage               Turn on additional warnings, for the Cage Cleaners.

--cc                 Specify the compiler to use. For instance, --cc=gcc for the GCC compiler, and --cc=cl for Microsoft's C++ compiler. Use --ccflags to specify any additional compiler flags, and --ccwarn to turn on any additional warnings.

Here are some more options:
1. To build Parrot with a C++ compiler, use --cxx to specify the compiler to use.
2. Use --libs to specify any additional libraries to link Parrot with.
3. Use --link to specify a linker.
4. Use --linkflags to send options to the linker.
5. Use --ld to select a loader.
6. Use --ldflags to send flags to the loader.
7. Use --make to specify what make utility to use.
--intval, --floatval, --opcode
                     Set the C data types to use for each value. Notice that --intval and --opcode must be the same, or strange errors may result.
--ops                Specify any optional OPS files to build.
--pmc                Specify any optional PMC files to build.
--without-gmp        Build Parrot without GMP.
--without-gdbm       Build Parrot without GDBM.
--without-opengl     Build Parrot without OpenGL support.
--without-crypto     Build Parrot without the cryptography library.
--icu-config         Specify a location for the Unicode ICU library on your system.
--without-icu        Build Parrot without ICU and Unicode support.
--maintainer         Compile IMCC's tokenizer and parser using Lex and Yacc (or equivalent). Use --lex to specify the name of the lexer, and --yacc to specify the name of the parser.
--miniparrot         Build miniparrot.
--prefix             Specify a path prefix.
--exec-prefix        Specify an execution path prefix.
--bindir             The directory for binary executable files on your system.
--sbindir            The system admin executables folder.
--libexecdir         Program executables folder.
--datadir            Read-only data directory for machine-independent data.
--sysconfdir         Read-only data that is machine dependent.
--sharedstatedir     Modifiable architecture-independent data directory.
--localstatedir      Modifiable architecture-dependent data directory.
--libdir             Object code directory.
--includedir         Folder for compiler include files.
--oldincludedir      C header file directory for old versions of GCC.
--infodir            Info documentation directory.
--mandir             Man pages documentation folder.


Parrot Executable
After the build process you should have, among other things, an executable file for Parrot. On Windows systems this will be named parrot.exe; on other systems it will typically be named parrot, with no suffix. Two other programs of interest are created: miniparrot.exe and libparrot.dll. These files will be named somewhat differently if you are not on a Windows system.

Make Targets
For readers who are not familiar with the make program: it is a program which can automatically determine how to build a software project from source code files. In a makefile, you specify a list of dependencies and the method for producing one file from others; make then determines the order and method to build your project. make has targets, which means a single makefile can have multiple goals. For Parrot, a number of targets have been defined which can help with building, debugging, and testing. Here is a list of some of the make targets:
make            Builds Parrot from source. Only rebuilds components that have changed since the last build.
make clean      Removes all the intermediate files that are left over from the build process. Cleans the directory tree so that Parrot can be completely rebuilt.
make realclean  Completely removes all temporary files, all intermediate files, and all makefiles. After a make realclean command, you will need to run Configure.pl again.
make test       Builds Parrot, if needed, and runs the test suite on it. If there are errors in the test results, you can try to fix them yourself, or you can submit a bug report to the Parrot developers. This is always appreciated.
make fulltest   Builds Parrot, if needed, and runs the test suite on every run core. This can be a very time-consuming operation, and is typically only performed prior to a new release.
make smoke      Performs smoke testing. This runs the Parrot test suite and attempts to transmit the test results directly to the Parrot development servers. Smoke test results help the developers keep track of the systems where Parrot is building correctly.

Submitting Bugs and Patches


As we mentioned above, smoke testing is an easy way for you to help submit information about Parrot on your system. Since Parrot is supposed to support so many different computer architectures and operating systems, it can be difficult to know how Parrot is performing on all of them. Besides smoke testing, there are a number of ways that you can submit a bug report to Parrot. If you are a capable programmer, you may be interested in trying to make fixes and submit patches as well.

Resources
http://www.parrotcode.org/docs/gettingstarted.html

References
[1] http://www.parrotcode.org/release/devel

Running Parrot
Parrot can be run from the command line in a number of modes with a number of different options. There are three forms of input that Parrot can work with directly: Parrot Assembly Language (PASM), which is a low-level human readable assembly language for the virtual machine, Parrot Intermediate Representation (PIR) which is a syntactic overlay on PASM with nicer syntax for some expressions, and Parrot Bytecode (PBC) which is a compiled binary input format. PIR and PASM are converted to PBC during normal execution. Only PBC can be executed by Parrot directly. The compilation stage to convert PIR or PASM to PBC takes some time, and can be done separately. We'll be talking about these processes a little later.

Parrot Information
To get information about the current Parrot version, type:

parrot -V

To get a list of command-line options and their purposes, type:

parrot -h

We'll discuss all the various command-line options later in this book, but it's always good to have multiple resources when a question pops up.

File Types
Files that end in .pbc are treated as Parrot bytecode files and are executed immediately. Files that end in .pir or .pasm are treated as PIR or PASM source code files, respectively, and interpreted. To compile PIR or PASM into bytecode, use the -o switch, as such:

parrot -o output.pbc input.pir
parrot -o output.pbc input.pasm

Notice that if we use a .pasm file extension for the output, we can output PASM instead of PBC:

parrot -o output.pasm input.pir

To force output of PBC even if the output file does not have a .pbc extension, use the --output-pbc switch. To run the generated PBC file after you generate it, use the -r switch. To force a file to be run as PASM regardless of the file extension, use the -a switch. To force a file to be run as a PBC file, regardless of the file extension, use the -c switch.


Runtime Options
Parrot can operate with a number of additional options too.

Optimizations
Optimizations take time to perform, but increase the execution speed of the resulting program. For short, sloppy, one-time programs, extensive optimizations might not make much sense: you would spend more time optimizing the program than you would spend executing it. However, for programs which are run frequently, which are very large, or which must run continuously with good performance, optimizations can be valuable. Compile a program once with optimizations, and the optimized bytecode can be saved to disk, never needing to be optimized again (unless Parrot gains better optimizations). Parrot has multiple optimization options, depending on the extensiveness of the optimizations to be performed. Each can be activated using a command-line switch of the form -Ox, where x is a character representing the type of optimization to perform:
-O0          No optimizations (the default mode).
-O1 or -O    Optimizations without life info (e.g. branches).
-O2          Optimizations with life info.
-Op          Rewrite I and N PASM registers, most used first.
-Ot          Select the fastest runcore (default with -O1 and -O2).
-Oc          Turn on the optional/experimental tail call optimizations.

Life info is an analysis step in which code and data are traced to determine control flow patterns and the lifetimes of local variables. Knowing the regions where certain variables are used and not used enables registers to be reused instead of having to allocate new ones. Knowing when certain code is unreachable enables the optimizer to ignore it completely.
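The idea of live ranges can be illustrated with a small sketch. The following Python code is illustrative only (Parrot's optimizer is written in C and operates on PASM registers): it computes first-use/last-use ranges for a toy instruction list, then reuses registers whose ranges have ended.

```python
# Illustrative sketch of "life info": compute live ranges for toy
# virtual registers, then reuse registers whose range has ended.

def live_ranges(code):
    """Map each register to (first_use, last_use) instruction indexes."""
    ranges = {}
    for i, (dest, srcs) in enumerate(code):
        for r in (dest,) + tuple(srcs):
            first, _ = ranges.get(r, (i, i))
            ranges[r] = (first, i)
    return ranges

def allocate(code, ranges):
    """Greedy allocation over a toy pool of four registers."""
    free = ["r0", "r1", "r2", "r3"]
    mapping, active = {}, {}
    for i, (dest, srcs) in enumerate(code):
        for r in (dest,) + tuple(srcs):
            if r not in mapping:
                mapping[r] = free.pop()
                active[r] = ranges[r][1]
        # any register whose last use is this instruction becomes free
        for r, last in list(active.items()):
            if last <= i:
                free.append(mapping[r])
                del active[r]
    return mapping

# Each instruction: (destination register, (source registers))
code = [("a", ("x",)),   # a = x + 1
        ("y", ("a",)),   # y = a * 2   (last use of a)
        ("b", ("y",)),   # b = y - 3   (b can reuse a's register)
        ("z", ("b",))]
ranges = live_ranges(code)
mapping = allocate(code, ranges)
```

In this toy sequence, five virtual registers fit into two real ones, because the live ranges of x and y, and of a and b, do not overlap.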

Run Cores
The run core is the central loop of the Parrot program, and there are several different runcores available that specify the performance and capabilities of Parrot. Runcores determine how parrot executes the bytecode instructions that are passed into the interpreter. Runcores can perform certain tasks such as bounds-checking, testing, or debugging. Other runcores have been optimized to operate extremely quickly. Implementation details about the various cores can be found in src/runops_cores.c. Different cores can be activated by passing particular switches at the command-line. The sections below will discuss the various runcores, what they do, how they work, and how to activate them.


Basic Cores
Slow Core
The default "slow" core treats all ops as individual C functions. Each function is called, and returns the address of the next instruction. Many cores, such as the tracing and debugging cores, are based on the slow core design.

Fast Core
The fast core is a bare-bones core that does not perform any special operations such as tracing, debugging, or bounds-checking.

Computed Goto Core
Computed goto is a feature of some compilers that allows a goto instruction to target a variable containing the address of a label, rather than a fixed label. By caching the addresses of all labels in an array, a jump can be made directly to the necessary instruction. This avoids the overhead of multiple subroutine calls, and can be very quick on platforms that support it. For more information about the workings of the computed-goto runcore, see the generated file src/ops/core_ops_cg.c.

Switch Core
The switch core uses the standard C switch and case structure to select the next operation to run. At each iteration, a switch is performed, and each case represents one of the ops. After the op has been performed, control flow jumps back to the top of the switch and the cycle repeats. Switch statements, especially those that use many consecutive values, are typically converted by the compiler into jump tables which perform very similarly to computed-goto jumps.
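The difference between these dispatch strategies can be sketched in Python (illustrative only; the real cores are C code, and computed goto in particular relies on a GCC extension with no direct Python equivalent). The slow-core style looks each op up and calls it; the computed-goto style resolves every op to its handler once, up front, so execution just indexes into an array:

```python
# Two dispatch styles for the same toy op stream. The handler
# functions stand in for Parrot's C op functions.

handlers = {"noop": lambda: "noop", "end": lambda: "end"}
program = ["noop", "noop", "end"]

# Slow-core style: a lookup and a full call for every op executed.
slow_trace = [handlers[name]() for name in program]

# Computed-goto style: resolve each op name to its handler once,
# caching the "addresses" in an array; execution then jumps straight
# to each handler with no per-op lookup.
jump_table = [handlers[name] for name in program]
fast_trace = [handler() for handler in jump_table]
```

Both styles execute the same ops; the jump table simply removes the per-op lookup from the inner loop.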

Variant Cores
The above cores are the basic designs upon which other specialized cores are based.

mod_parrot
Some members of the Parrot team have developed an extension for the Apache webserver that allows Parrot to be used to generate server-side content. The result of this work is mod_parrot, which can be used to produce web sites using PIR or PASM. This has limited usefulness by itself. However, mod_parrot allows the creation of additional modules for languages with compilers that target Parrot. One notable module of this kind, mod_perl6, is a bytecode module that runs on top of mod_parrot. More information about mod_parrot is available at its website: http://www.parrot.org/mod_parrot

17

Programming For Parrot


Parrot Programming
The Parrot Virtual Machine (PVM) can be programmed using a variety of languages, scripts, and techniques. This versatility can be confusing at first. Here are some ways that Parrot can be programmed:

1. Parrot Assembly Language (PASM). This is the lowest-level human-readable way to program Parrot, and is very similar to traditional assembly languages.
2. Parrot Intermediate Representation (PIR). This is a higher-level language which is easier to program in than PASM, and significantly more common.
3. Not Quite Perl (NQP). This is a bare-bones partial implementation of the Perl 6 language, which is designed for bootstrapping. It is higher-level than PIR and has many of the features and capabilities of Perl 6. At the moment NQP is not fully featured and must be compiled separately into bytecode before it can be run on Parrot.
4. Custom Languages. Using the Parrot Compiler Tools (PCT), new dynamic languages can easily be implemented on Parrot. Once a parser and libraries have been created for a language, that language can be used to program Parrot.

Many common programming languages, including Perl 6, Python, and JavaScript (ECMAScript), are being implemented on Parrot. We will discuss more languages in a later section.

Programming Steps
There are a number of different methods to program Parrot, as we see in the list above. However, different programming methods require different steps. Here, we will give a very brief overview of some of the ways you can program Parrot.

PASM and PIR
A program written in PASM or PIR, such as Foo.pasm or Bar.pir, can be run in one of two different ways. It can be interpreted directly by typing (on most Windows and Unix/Linux systems):

./parrot Foo.pasm

or

./parrot Bar.pir

This will run Parrot in interpreter mode. However, we can compile these programs down to Parrot Bytecode (PBC) using the following flags:

./parrot -o Foo.pbc Foo.pasm
./parrot -o Bar.pbc Bar.pir

Of course, you can name the output files anything you want. Once you have a PBC file, you can run it like this:

./parrot Foo.pbc

NQP

NQP must be compiled down to PIR using the NQP compiler. This is located in the compilers/nqp directory of the Parrot repository.

High Level Languages
To program Parrot in a higher-level language than NQP or PIR, such as Perl 6, Python, or Ruby, there must first be a compiler available for that language. To run the file Foo.pl, for example (".pl" is the file extension for Perl programs), you would type:

./parrot languages/perl6/perl6.pbc Foo.pl

This runs the Perl 6 compiler on Parrot, and passes the file name Foo.pl to the compiler. To output a file into PIR or PBC, you would use the --target= option to specify an output format.


Virtual Machines?
One term that we are going to use frequently in this book is "virtual machine", or VM for short. It's worth discussing now what exactly a VM is. Before talking about virtual machines, let's consider actual computer hardware first. In an ordinary computer system, a native machine, there is a microprocessor which takes instructions and performs the necessary actions. Those instructions are written in a high-level language and compiled into the binary machine code that the processor uses. The problem with this is that different types of processors use different machine code, and to get a program to run on different platforms it needs to be recompiled for each.

A virtual machine, unlike a regular computer processor, is built in software, not hardware. The virtual machine is written in a high-level language and compiled to machine code as usual. However, programs that run on the virtual machine are compiled into bytecode instead of machine code. This bytecode runs on top of the virtual machine, and the virtual machine converts it into processor instructions. Here is a table that summarizes some of the important differences between a virtual machine and a native machine:
Implementation
  Native machine: hardware. Virtual machine: software.
Speed of execution
  Native machine: fast. Virtual machine: slow.
Programming
  Native machine: must compile every program into native machine code. Virtual machine: must only compile the virtual machine into native machine code; everything else is compiled to bytecode.
Portability
  Native machine: every program must be recompiled on every new hardware platform. Virtual machine: programs only need to be compiled into bytecode once, and can run anywhere a VM is installed.
Extensibility
  Native machine: impossible. Virtual machine: virtual machines can be improved, extended, patched, and added to over time.
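The bytecode idea can be made concrete with a small sketch. The following Python toy interpreter is hypothetical (it is not Parrot's design, and its three ops are invented for illustration), but it shows the essential shape: bytecode is just data, and a software loop dispatches each instruction.

```python
# A toy stack-based virtual machine (hypothetical; not Parrot's design).
# The "bytecode" is a list; each op function does its work and returns
# the position of the next instruction.

def op_push(vm, pc):
    vm["stack"].append(vm["code"][pc + 1])   # operand follows the opcode
    return pc + 2

def op_add(vm, pc):
    b, a = vm["stack"].pop(), vm["stack"].pop()
    vm["stack"].append(a + b)
    return pc + 1

def op_halt(vm, pc):
    return -1                                # stop the central loop

OPS = {"push": op_push, "add": op_add, "halt": op_halt}

def run(code):
    vm = {"code": code, "stack": []}
    pc = 0
    while pc >= 0:                   # the VM's central dispatch loop
        pc = OPS[code[pc]](vm, pc)
    return vm["stack"]

# The same "bytecode" runs unchanged wherever this VM is installed.
result = run(["push", 2, "push", 3, "add", "halt"])
```

Only the interpreter itself would need to be recompiled for a new processor; the bytecode list stays the same.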



Parrot Assembly Language
The Parrot Virtual Machine (PVM) operates on a special-purpose bytecode format. All instructions in PVM are converted into bytecode instructions to be operated on by the virtual machine. In the same way that ordinary assembly languages have a one-to-one correspondence with the underlying machine code words, the Parrot bytecode words have a similar correspondence with Parrot Assembly Language (PASM). PASM is very similar to traditional assembly languages, except that its instructions provide access to many of the dynamic and high-level features of the Parrot system.

Instruction Types and Operands


Internally to Parrot, there are many different instructions. Some instructions are just variations of each other with the same behavior but different arguments. For instance, there are instructions for:

add_n_n_n
add_i_i_i
add_i_i_ic
add_n_n_nc

The letters after the name of the instruction specify what kinds of operands that instruction requires. add_i_i_ic takes an integer (i) and an integer constant (ic) and returns an integer (i). add_n_n_n takes two floating point numbers and returns a floating point number. In PASM, when you write the following statement:

add $I0, $I1, $I2

Parrot looks up the appropriate instruction from the list, add_i_i_i, and calls it. The user sees one instruction, "add", but Parrot actually has multiple instructions and decides automatically which to use. If you type the following into Parrot:

add $P0, $I1, $I2

you will get an error message that there is no such instruction add_p_i_i. This should help you debug your programs.
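The lookup described above can be sketched as follows. This is a Python illustration only: the real resolution happens inside Parrot, and KNOWN_OPS here is a tiny hypothetical subset of the op list.

```python
# Resolving a short op name plus operand types into a full internal
# instruction name, as the text describes. KNOWN_OPS is a small
# hypothetical subset of Parrot's op list.

KNOWN_OPS = {"add_i_i_i", "add_i_i_ic", "add_n_n_n", "add_n_n_nc"}

def resolve(op, operand_types):
    """operand_types: one code per operand, e.g. ("i", "i", "ic")."""
    full = op + "_" + "_".join(operand_types)
    if full not in KNOWN_OPS:
        raise LookupError("no such instruction: " + full)
    return full

name = resolve("add", ("i", "i", "i"))   # what 'add $I0, $I1, $I2' uses
```

Asking for resolve("add", ("p", "i", "i")) fails with the same kind of "no such instruction add_p_i_i" error the text mentions.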

Parrot Assembly Basics


Parrot is a register-based virtual machine. There is no fixed set of registers that must be instantiated before use: the virtual machine creates registers as they are needed, and rearranges them as it makes sense to do so. Register names are lexically scoped, so register "$P0" in one function is not necessarily the same data location as register "$P0" in another function. All registers start with a "$" sign. Following the "$", called the "sigil", there is a letter that denotes the data type of the register, followed by the register number. There are four types of data items, each with a unique register character identifier. These are:

String
String registers start with an "S". String registers can be named things like "$S0" or "$S100".

Integer
Integer registers start with an "I". Integer registers can be named things like "$I0" or "$I56".

Number
Floating point number registers, which can hold a floating point number, start with the letter "N". These registers can be named things like "$N0" or "$N354".

PMC
PMCs are advanced object-oriented data types, and a PMC register can be used to hold many different kinds of data. PMC registers start with a "P" identifier, and can be named things like "$P0" or "$P35".


Basic Statements
A basic PASM statement contains an optional label, an instruction mnemonic, and a series of comma-separated arguments. Here is an example:

my_label: add_n $P0, $P1, $I1

In this example the add_n instruction performs addition on two registers and stores the result in a third. The values from $P1 and $I1 are added together, and the result is stored in $P0. Notice that the operands are different types: one of the arguments and the result are both PMC registers, but the second operand is an integer register, while add_n is a floating point instruction. Parrot automatically handles data type conversions as necessary when performing instructions like this. The only requirement is that conversion between the two data types is possible; if it is, Parrot handles the details. In some cases, however, automatic type conversion is not possible, and in these cases Parrot raises an exception.

Directives
PASM has few available directives.

.pcc_sub
This directive defines the start of a new subroutine.

Resources
http://www.parrotcode.org/docs/pdd/pdd06_pasm.html



Parrot Intermediate Representation
The Parrot Intermediate Representation (PIR) is similar in many respects to the C programming language: it's higher-level than assembly language, but still very close to the underlying machine. The benefit of using PIR is that it's easier to program in than PASM, while still exposing all of the low-level functionality of Parrot. PIR has two purposes in the world of Parrot. The first is to serve as a target for automatic code generators from high-level languages: compilers for high-level languages emit PIR code, which can then be interpreted and executed. The second is to be a low-level human-readable programming language in which basic components and Parrot libraries can be written. In practice, PASM exists only as a human-readable direct translation of Parrot's bytecode, and is rarely used by humans to program directly. PIR is used almost exclusively to write low-level software for Parrot.

PIR Syntax
PIR syntax is similar in many respects to older programming languages such as C or BASIC. In addition to PASM-like operations, there are control structures and arithmetic operations which simplify the syntax for human readers. All PASM is legal PIR code; PIR is little more than an overlay of nicer syntax over the raw PASM instructions. When available, you should always use PIR's syntax instead of PASM's for ease. Even though PIR has more features and better syntax than PASM, it is not itself a high-level language. PIR is still very low-level and is not really intended for building large systems. So many other tools are available to language and application designers on Parrot that PIR only needs to be used in a small subset of areas. Eventually, enough tools might be created that PIR never needs to be used directly.

PIR and High-Level Languages


PIR is designed to help implement higher-level languages such as Perl, TCL, Python, Ruby, and PHP. As we've discussed before, high-level languages (HLLs) are related to PIR in two possible ways:

1. We write a compiler for the HLL using the language NQP and the Parrot Compiler Tools (PCT). This compiler is then converted to PIR, and then to Parrot bytecode.
2. We write code in the HLL and compile it. The compiler converts the code into a tree-like intermediate representation called PAST, then to another representation called POST, and finally to PIR code. From here, the PIR can be interpreted directly, or it can be further compiled to Parrot bytecode.

PIR, therefore, has features that help to enable writing compilers, and it also has features that support the HLLs that are written using those compilers.

Comments
Similarly to Perl, PIR uses the "#" symbol to start comments. Comments run from the # until the end of the current line. PIR also allows the use of POD documentation in files. We'll talk about POD in more detail later.

Subroutines
Subroutines start with the .sub directive and end with the .end directive. We can return values from a subroutine using the .return directive. Here is a short example of a function that takes no parameters and returns an approximation of pi:

.sub 'GetPi'
    $N0 = 3.14159
    .return($N0)
.end

Notice that the subroutine name is written in single quotes. This isn't a requirement, but it's very helpful and should be done whenever possible. We'll discuss the reasons for this below.


Subroutine Calls
There are two methods to call a subroutine: direct and indirect. In a direct call, we call a specific subroutine by name:

$N1 = 'GetPi'()

In an indirect call, we call a subroutine through a string that contains the name of that subroutine:

$S0 = 'GetPi'
$N1 = $S0()

The problem arises when we start to use named variables (which we will discuss in more detail below). Consider the following snippet, where we have a local variable called "GetPi":

GetPi = 'MyOtherFunction'
$N0 = GetPi()

In this snippet, do we call the function "GetPi" (since we made the call GetPi()) or do we call the function "MyOtherFunction" (since the variable GetPi contains the value 'MyOtherFunction')? The short answer is that we would call the function "MyOtherFunction", because local variable names take precedence over function names in these situations. However, this is a little confusing, isn't it? To avoid this confusion, there are some conventions that people use to make this easier:
$N0 = GetPi()      Used only for indirect calls
$N0 = 'GetPi'()    Used for all direct calls

By sticking with this convention, we avoid possible confusion later on.

Subroutine Parameters
Parameters to a subroutine can be declared using the .param directive. Here are some examples:

.sub 'MySub'
    .param int myint
    .param string mystring
    .param num mynum
    .param pmc mypmc

The .param directives must be at the top of the function. You may not put comments or other code between the .sub and .param directives. Here is the same example as above, written incorrectly:

23
.sub 'MySub'
    # These are my params:
    .param int myint
    .param string mystring
    .param num mynum
    .param pmc mypmc

Wrong! The comment may not appear between the .sub directive and the .param directives.

Named Parameters
Parameters that are passed in a strict order like we've seen above are called positional arguments. Positional arguments are differentiated from one another by their position in the function call. Putting positional arguments in a different order will produce different effects, or may cause errors. Parrot supports a second type of parameter, a named parameter. Instead of passing parameters by their position, parameters are passed by name and can be in any order. Here's an example:

.sub 'MySub'
    .param int yrs :named("age")
    .param string call :named("name")
    $S0 = "Hello " . call
    $S1 = "You are " . yrs
    $S1 = $S1 . " years old"
    print $S0
    print $S1
.end

.sub main :main
    'MySub'("age" => 42, "name" => "Bob")
.end

In the example above, we could have easily reversed the order too:

.sub main :main
    'MySub'("name" => "Bob", "age" => 42)    # Same!
.end

Named arguments can be a big help because you don't have to worry about the exact order of variables, especially as argument lists get very long.
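Readers who know Python may recognize the idea: Python's keyword arguments behave the same way, as in this sketch of the MySub example above (analogy only).

```python
# Python analogy for PIR named parameters: arguments are matched by
# name, so the order at the call site does not matter.

def my_sub(age, name):
    return "Hello %s. You are %d years old." % (name, age)

a = my_sub(age=42, name="Bob")
b = my_sub(name="Bob", age=42)   # same call, different order
```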

Optional Parameters
Functions may declare optional parameters, which the caller may or may not specify. To do this, we use the :optional and :opt_flag modifiers:

.sub 'Foo'
    .param int bar     :optional
    .param int has_bar :opt_flag

In this example, the parameter has_bar will be set to 1 if bar was supplied by the caller, and will be 0 otherwise. Here is some example code that takes two numbers and adds them together. If the second argument is not supplied, the first number is doubled:

.sub 'AddTogether'
    .param num x
Parrot Intermediate Representation .param num y :optional .param int has_y :opt_flag if has_y goto ive_got_y y = x ive_got_y: $N0 = x + y .return($N0) .end And we will call this function with 'AddTogether'(1.0, 1.5) 'AddTogether'(3.0) #returns 2.5 #returns 6.0


Slurpy Parameters
A subroutine can take any number of arguments, which can be loaded into an array. Parameters which can accept a variable number of input arguments are called :slurpy parameters. Slurpy arguments are loaded into an array PMC, and you can loop over them inside your function if you wish. Here is a short example:

.sub 'PrintList'
    .param pmc list :slurpy
    print list
.end

.sub 'PrintOne'
    .param pmc item
    print item
.end

.sub main :main
    PrintList(1, 2, 3)    # Prints "1 2 3"
    PrintOne(1, 2, 3)     # Prints "1"
.end

Slurpy parameters absorb the remainder of all function arguments. Therefore, a slurpy parameter should only be the last parameter of a function. Any parameters after a slurpy parameter will never take any values, because all arguments passed for them will be absorbed by the slurpy parameter instead.
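Python's *args parameters are the same idea as :slurpy, which may help if you already know Python (analogy only):

```python
# Python analogy for :slurpy parameters: a trailing *args parameter
# absorbs all remaining arguments into one sequence.

def print_list(*items):        # like a single :slurpy parameter
    return list(items)

def print_one(item, *rest):    # extra arguments land in the slurpy tail
    return item

collected = print_list(1, 2, 3)
first = print_one(1, 2, 3)
```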

Flat Argument Arrays


If you have an array PMC that contains data for a function, you can pass in the array PMC. The array itself will become a single argument, which will be loaded into a single array PMC in the function. However, if you use the :flat keyword when calling a function with an array, this will pass each element of the array into a different parameter. Here is an example function:

.sub 'ExampleFunction'
    .param pmc a
    .param pmc b
    .param pmc c

    .param pmc d :slurpy

We have an array called x that contains three Integer PMCs: [1, 2, 3]. Here are two examples:
Function Call                        Parameters
'ExampleFunction'(x, 4, 5)           a = [1, 2, 3]
                                     b = 4
                                     c = 5
                                     d = []
'ExampleFunction'(x :flat, 4, 5)     a = 1
                                     b = 2
                                     c = 3
                                     d = [4, 5]
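Python's argument unpacking gives the same two behaviors as the table above (analogy only; *x plays the role of x :flat):

```python
# Python analogy for :flat: passing the array as one argument versus
# spreading its elements across the parameters.

def example(a, b, c, *d):
    return a, b, c, list(d)

x = [1, 2, 3]
without_flat = example(x, 4, 5)    # x arrives as a single parameter
with_flat = example(*x, 4, 5)      # like x :flat, elements spread out
```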


Variables
Local Variables
Local variables can be defined using the .local directive, using a similar syntax to that used for parameters:

.local int myint
.local string mystring
.local num mynum
.local pmc mypmc

In addition to local variables, in PIR you can use the registers for data storage as well.

Namespaces
Namespaces are constructs that allow the reuse of function and variable names without causing conflicts with previous incarnations. Namespaces are also used to keep the methods of a class together, without causing naming conflicts with functions of the same names in other namespaces. They are a valuable tool in promoting code reuse and decreasing naming pollution. In PIR, namespaces are specified with the .namespace directive. Namespaces may be nested using a key structure:

.namespace ["Foo"]
.namespace ["Foo";"Bar"]
.namespace ["Foo";"Bar";"Baz"]

The root namespace can be specified with an empty pair of brackets:

.namespace []    # Right! Enters the root namespace
.namespace       # WRONG! Brackets are required!

Strings
Strings are a fundamental datatype in PIR, and are incredibly flexible. Strings can be specified as quoted literals, or as "Heredoc" literals in the code.

Heredocs
Heredoc string literals have become a common tool in modern programming languages for specifying very long multi-line string literals. Perl programmers will be familiar with them, but so will most shell programmers and even modern .NET programmers. Here is how a Heredoc works in PIR:

$S0 = << "TAG"

Parrot Intermediate Representation This is part of the Heredoc string. Everything between the '<< "TAG"' is treated as a literal string constant. This string ends when the parser finds the end marker. TAG Heredocs allow long multi-line strings to be entered without having to use lots of messy quotes and concatenation operations.


Encodings and Charsets


Quoted string literals can be specified to use a particular character set or encoding.

File Includes
You can include an external PIR file in your current file using the .include directive. For example, if we wanted to include the file "MyLibrary.pir" in our current file, we would write:

    .include "MyLibrary.pir"

Notice that the .include directive is a raw text-substitution function. A file of PIR code is not self-contained the way you might expect from some other languages. For instance, one problem that occurs relatively commonly among new users is namespace overflow. Consider two files, A.pir and B.pir:
    # A.pir
    .namespace ["namespace 2"]

    # B.pir
    .namespace ["namespace 1"]
    # here, we are in "namespace 1"
    .include "A.pir"
    # here we are in "namespace 2"

The .namespace directive from file A overflows into file B, which is counterintuitive for most programmers.
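The effect can be simulated in a few lines of Python: a naive scanner (invented here for illustration) tracks the current namespace across textually concatenated source, showing how A.pir's directive leaks into B.pir:

```python
# .include is textual: directives in the included file change parser
# state in the including file.  This sketch scans concatenated source
# the way a simple PIR reader would, tracking the current namespace.
def current_namespace(lines):
    ns = "[]"                            # start in the root namespace
    for line in lines:
        if line.startswith(".namespace"):
            ns = line.split(None, 1)[1]  # remember the latest directive
    return ns

a_pir = ['.namespace ["namespace 2"]']
b_pir = ['.namespace ["namespace 1"]'] + a_pir  # .include "A.pir" inlines A
print(current_namespace(b_pir))  # ["namespace 2"] -- A's namespace leaked
```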

Classes and Methods


We'll devote a lot of time to classes and object-oriented programming later in this book. However, since we've already talked about namespaces and subroutines a little bit, we can lay some groundwork for those later discussions. A class in PIR consists of a namespace for that class, an initializer, a constructor, and a series of methods. A "method" is exactly the same as an ordinary subroutine except for three differences:

1. It has the :method flag.
2. It is called using "dot notation": Object.Method().
3. The object that is used to call the method (on the left side of the dot) is stored in the "self" variable in the method.

To create a class, we first need to create a namespace for that class. In the simplest classes, we then just create the methods. We will talk about initializers and constructors later, but for now we'll stick to a simple class that uses neither of these:

    .namespace ["MathConstants"]

    .sub 'GetPi' :method
        $N0 = 3.14159
        .return($N0)
    .end

    .sub 'GetE' :method
        $N0 = 2.71828
        .return($N0)
    .end

With this class (which we would probably store in "MathConstants.pir" and include into our main file), we can write the following things:

    .local pmc mathconst
    mathconst = new 'MathConstants'
    $N0 = mathconst.'GetPi'()   # $N0 contains the value 3.14159
    $N1 = mathconst.'GetE'()    # $N1 contains the value 2.71828

We'll explain more of the messy details later, but this should be enough to get you started.


Control Statements
PIR is a low-level language, and so it doesn't support the high-level control structures (if/else blocks, while loops, and so on) that programmers may be used to. PIR supports two types of control flow: conditional and unconditional branches. Unconditional branches are handled by the goto instruction. Conditional branches use the goto instruction as well, but accompany it with an if or unless statement; the jump is only taken if the if-condition is true or the unless-condition is false.
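To see how a high-level loop maps onto these branches, here is a Python sketch of a tiny instruction interpreter running a while loop that has been lowered to goto-style instructions. The opcode names and program layout are invented for illustration; they are not Parrot's:

```python
# A loop such as "while i < limit: total += i; i += 1" lowered to
# label/goto-style instructions, executed by a mini-interpreter.
def run(program, env):
    pc = 0
    labels = {ins[1]: n for n, ins in enumerate(program) if ins[0] == "label"}
    while pc < len(program):
        op = program[pc]
        if op[0] == "label":
            pass                              # labels are no-ops at runtime
        elif op[0] == "goto":                 # unconditional branch
            pc = labels[op[1]]
        elif op[0] == "if_goto":              # conditional branch (less-than)
            if env[op[1]] < env[op[2]]:
                pc = labels[op[3]]
        elif op[0] == "add":
            env[op[1]] += env[op[2]]
        pc += 1
    return env

program = [
    ("label", "top"),
    ("if_goto", "i", "limit", "body"),  # if i < limit goto body
    ("goto", "done"),
    ("label", "body"),
    ("add", "total", "i"),
    ("add", "i", "one"),
    ("goto", "top"),
    ("label", "done"),
]
env = run(program, {"i": 0, "limit": 5, "total": 0, "one": 1})
print(env["total"])  # 0+1+2+3+4 = 10
```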

HLL Namespace
Each HLL compiler has a namespace that is the same as the name of that HLL. For instance, if we were writing a compiler for Perl, we would create the namespace .namespace ["Perl"]. If we are not writing a compiler, but instead writing a program in pure PIR, we are in the default namespace .namespace ["Parrot"]. To create a new HLL compiler, we use the .HLL directive to set the current default HLL namespace:

    .HLL "mylanguage", "mylanguage_group"

Everything that is in the HLL namespace is visible to programs written in that HLL. For example, if we have a PIR function "Foo" that is in the "PHP" namespace, a program written in PHP can call the Foo function as if it were a regular PHP function. This may sound a little bit complicated; here is a short example:
    # PIR code
    .namespace ["perl6"]

    .sub 'AddTwo'
        .param int a
        .param int b
        $I0 = a + b
        .return($I0)
    .end

    # Perl 6 code
    $x = AddTwo(4, 5);

To simplify, we can write simply .namespace (without the brackets) to return to the current HLL namespace.



Multimethods
Multimethods are groups of subroutines which share the same name. For instance, the subroutine "Add" might have different behavior depending on whether it is passed a Perl 5 Floating point value, a Parrot BigNum PMC, or a Lisp Ratio. Multiple dispatch subroutines are declared like any other subroutine in PIR, except they also have the :multi flag. When a Multi is invoked, Parrot loads the MultiSub PMC object with the same name, and starts to compare parameters. Whichever subroutine has the best match to the accepted parameter list gets invoked. The "best match" routine is relatively advanced. Parrot uses a Manhattan distance to order subroutines by their closeness to the given list, and then invokes the subroutine at the top of the list. When sorting, Parrot takes into account roles and multiple inheritance. This makes it incredibly powerful and versatile.
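The distance-based selection can be sketched in Python. This is a deliberately simplified model with invented type names and distance values, not Parrot's actual MMD algorithm:

```python
# Multiple dispatch by "distance": each candidate scores how far the
# given argument types are from its declared parameter types, summed
# across all parameters (a Manhattan distance); the closest wins.

# Distance from an argument's type to a declared parameter type:
# 0 for an exact match, 1 for a parent class, large for no relation.
MRO = {"Integer": ["Integer", "Scalar"], "Float": ["Float", "Scalar"]}

def type_distance(arg_type, param_type):
    try:
        return MRO[arg_type].index(param_type)
    except ValueError:
        return 100  # effectively "no match"

def dispatch(candidates, arg_types):
    def total(sig):
        return sum(type_distance(a, p) for a, p in zip(arg_types, sig))
    return min(candidates, key=lambda c: total(c[0]))

candidates = [
    (("Integer", "Integer"), "add_int"),
    (("Float", "Float"), "add_float"),
    (("Scalar", "Scalar"), "add_generic"),
]
print(dispatch(candidates, ("Integer", "Integer"))[1])  # add_int
print(dispatch(candidates, ("Integer", "Float"))[1])    # add_generic
```

Note how mixed argument types fall through to the most generic candidate: the exact-match candidates each accumulate a huge distance on the mismatched parameter, so the (Scalar, Scalar) signature wins with a total distance of 2.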

MultiMethods, MultiSubs, and other key words


The vocabulary on this page might start to get a little bit complicated. Here we will list a few terms which are used to describe things in Parrot.

Subroutine
    A basic block of code with a name and a parameter list.
Method
    A basic block of code which belongs to a particular class and can be called on an object of that class. Methods are just subroutines with an extra implicit self parameter.
Multi Dispatch
    Where multiple subroutines have the same name, and Parrot selects the best one to invoke.
Single Dispatch
    Where there is only one subroutine with the given name, and Parrot does not need to do any fancy sorting or selecting.
MultiSub
    A PMC type that stores a collection of subroutines which can be invoked by name and sorted/searched by Parrot.
MultiMethod
    Same as a MultiSub, except it is called as a method instead of a subroutine.

PIR Macros and Constants


PIR allows text-replacement macro functionality, similar in concept (but not in implementation) to that of C's preprocessor. PIR does not, however, have preprocessor directives to support conditional compilation.

Macro Constants
Constant values can be defined with the .macro_const keyword. Here is an example:

    .macro_const PI 3.14

    .sub main :main
        print .PI    # Prints "3.14"
    .end

A .macro_const can be an integer constant, a floating point constant, a string literal, or a register name. Here's another example:

    .macro_const MyReg S0
    .macro_const HelloMessage "hello world!"

    .sub main :main
        .MyReg = .HelloMessage
        print .MyReg
    .end

This allows you to give names to common constants, strings, or registers.


Macros
Basic text-substitution macros can be created using the .macro and .endm keywords to mark the start and end of the macro, respectively. Here is a quick example:

    .macro SayHello
        print "Hello!"
    .endm

    .sub main :main
        .SayHello
        .SayHello
        .SayHello
    .end

This example, as should be obvious, prints out the word "Hello!" three times. We can also give our macros parameters, to be included in the text substitution:

    .macro CircleCircumference(r)
        $N0 = r * 3.14
        $N0 = $N0 * 2
        print $N0
    .endm

    .sub main :main
        .CircleCircumference(5)
        .CircleCircumference(10)
    .end

Macro Local Variables


What if we want to define a temporary variable inside the macro? Here's a first attempt:

    .macro PrintSomething
        .local string something
        something = "This is a message"
        print something
    .endm

    .sub main :main
        .PrintSomething
        .PrintSomething
    .end

After we do the text substitution, we get this:

    .sub main :main
        .local string something
        something = "This is a message"
        print something
        .local string something
        something = "This is a message"
        print something
    .end

After the substitution, we've declared the variable something twice! Instead of that, we can use the .macro_local declaration to create a variable with a unique name that is local to the macro:

    .macro PrintSomething
        .macro_local string something
        something = "This is a message"
        print something
    .endm

Now, the same function translates to this after the text substitution:

    .sub main :main
        .local string main_PrintSomething_something_1
        main_PrintSomething_something_1 = "This is a message"
        print main_PrintSomething_something_1
        .local string main_PrintSomething_something_2
        main_PrintSomething_something_2 = "This is a message"
        print main_PrintSomething_something_2
    .end

Notice how the local variable declarations are now unique: each generated name incorporates the name of the enclosing sub, the name of the macro, the name of the variable, and a counter. This is a reusable approach that doesn't cause any naming collisions.
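The renaming scheme itself is easy to model. Here is a Python sketch (the expand helper is invented for illustration) that mangles a macro's local names the way .macro_local does:

```python
# Hygienic macro expansion by name mangling: each expansion of a macro
# renames its declared locals to a unique name, the way .macro_local
# renames "something" to "main_PrintSomething_something_1".
import itertools

_counter = itertools.count(1)

def expand(sub_name, macro_name, body_lines, macro_locals):
    """Textually expand a macro body, mangling its declared locals."""
    n = next(_counter)
    mapping = {var: f"{sub_name}_{macro_name}_{var}_{n}" for var in macro_locals}
    out = []
    for line in body_lines:
        for var, mangled in mapping.items():
            line = line.replace(var, mangled)
        out.append(line)
    return out

body = ['.local string something',
        'something = "This is a message"',
        'print something']
first = expand("main", "PrintSomething", body, ["something"])
second = expand("main", "PrintSomething", body, ["something"])
print(first[0])   # .local string main_PrintSomething_something_1
print(second[0])  # .local string main_PrintSomething_something_2
```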


Resources
http://docs.parrot.org/parrot/latest/html/docs/pdds/pdd19_pir.pod.html

Parrot Magic Cookies




Polymorphic Containers (PMCs)
Polymorphic Containers (PMCs) -- which were previously known as 'Parrot Magic Cookies' -- are one of the fundamental data types of Parrot, and are one of the most powerful and flexible data types available. A PMC is very much like a class object, with data storage and associated class methods. PMCs include all aggregate data types including arrays, associative arrays (Hashes), Exceptions, Structures, and Objects. Parrot comes with a core set of PMCs, but new PMCs can be added for use with specific programs or languages. PMCs are written in a C-like language that we will call "PMC Script" and compiled. PMCs can be built-in to Parrot directly, or they can be written separately and loaded in later. PMCs which are loaded at runtime are called "dynamic PMCs", or DYNPMCs for short.

Writing PMCs in C
PMC definitions are written in a C-like language that is translated to C code using a special PMC compiler program called pmc2c.pl. Once converted to C code, the PMCs are included in the Parrot build process.

The PMC Compiler


The PMC Compiler, pmc2c.pl has a number of tasks to perform. It converts the PMC into legal C syntax, inserts the function names in the appropriate tables, and exports information about the PMC and its methods to the rest of the Parrot system.

PMC Script
The script language used to write a PMC is based on C. In fact, it's mostly C with a few additional keywords and constructs. The PMC compiler converts PMC files into C code for compilation. All standard ANSI C 89 code is acceptable for use in PMC files. Here we will list some of the additions.

PMC Class Definition


All the methods and vtables of the PMC must be enclosed in a PMC class declaration:

    pmclass NAME {
    }

In addition to just giving the name of the PMC, you can specify single inheritance too:

    pmclass NAME is SUPERNAME {
    }

Here SUPERNAME is the name of the parent PMC class. In your PMC vtable methods you can use the SUPER keyword to access the vtable methods of the parent class. You can also allocate an additional storage area called a PMC_EXT using the need_ext keyword. PMC_EXT is an additional structure that can be allocated to help with special operations, such as sharing between multiple interpreters. If the PMC is not automatically thread safe, you should add a PMC_EXT.



    Specifier             Meaning
    is SUPERNAME          Specifies the parent class, if any.
    need_ext              Needs a PMC_EXT for special handling.
    abstract              The class is abstract and cannot be instantiated.
    no_init               The PMC does not have an init vtable method for Parrot to call.
                          Normally, Parrot calls the init method when the PMC is first
                          created; if you don't need that, use no_init.
    provides INTERFACE    INTERFACE is one of the standard interfaces, and the PMC can be
                          used as if it were an object of that type. The interfaces
                          include "array" and "hash".

Helper Functions
Like ordinary C, you can define additional functions to help with your calculations. These functions should be written in ordinary C (without any special keywords or values) and should be defined outside of the pmclass definition.

Defining PMC Attributes


PMCs can be given a custom set of data field attributes using the ATTR keyword. ATTR allows the PMC to be extended to contain custom data structures that are automatically managed by Parrot's memory subsystem. Here's an example:

    pmclass Foo {
        ATTR INTVAL bar;
        ATTR PMC    baz;
        ...
    }

The attributes are stored in a custom data structure that can be accessed using a macro with the same name as the PMC, but in all upper-case:

    Parrot_Foo_attributes * attrs = PARROT_FOO(SELF);
    attrs->bar = 7;               /* it's an INTVAL */
    attrs->baz = pmc_new( ... );  /* it's a PMC */

Notice how the type name of the attributes structure is Parrot_, followed by the name of the PMC with the same capitalization as is used in the pmclass definition, followed by _attributes. The macro that returns this structure is PARROT_ followed by the name of the PMC in all caps.

VTABLEs and Methods


Note: The VTABLE interface, and the specific functions in a vtable, are subject to change before the Parrot 1.0 release. PMCs can supply definitions for any number of VTABLE interfaces. Any interfaces not defined will fall back to a default implementation which throws an error. VTABLE interfaces must all follow a pre-defined format, and attempting to define a VTABLE interface that is not one of the normal interfaces, or that does not use the same parameter list and return value as the normal interfaces, will throw an error. The parameters for all VTABLE and METHOD declarations may be either INTVAL, FLOATVAL, STRING, or PMC, as these are the only values which can be passed from PIR code. VTABLE interfaces are defined with the VTABLE keyword, and methods on the PMC can be defined with the METHOD keyword.



VTABLES
All PMCs have a standard API, an interface that they share in common with all other PMCs. This standard interface is called a VTABLE. A VTABLE is a list of about 150 standard functions, called "VTABLE Interfaces" that implement basic, common, behavior for PMCs. All PMCs implement all these interfaces, although if one is not explicitly provided it can inherit from a parent PMC class, or it can default to throwing an exception. VTABLE methods can be defined in one of two ways, in the .pmc using the C-like PMC language, or in PIR using the :vtable function qualifier. VTABLEs correspond to some basic operations that can be performed on any object, such as arithmetic, class operations, casting operations (to INTVAL, FLOATVAL, STRING, or PMC), and other common operations. Regardless of how the VTABLE method is defined, they must have very specific names.

Writing VTABLE Interfaces


VTABLE functions all have fixed names and parameter lists. When implementing a new VTABLE method, you must strictly conform to these, or there could be several compilation errors and warnings. For a list of all vtable methods and their expected function signatures, you can check the header file include/parrot/vtables.h. Inside a VTABLE method, several keywords are available:

SELF
    the current PMC
INTERP
    the Parrot interpreter
SUPER
    the parent PMC class

You can also reference other methods or vtable methods of the current PMC using standard dot notation:

    SELF.VTABLE_OR_METHOD_NAME()

If you want to default all or part of your processing to the super class (if you have one), you can use the SUPER() function to do that. Any vtable method that you do not implement will automatically default to the super class (if any) or to the default parent class.

Methods
In addition to VTABLEs, a PMC may supply a series of custom interface functions called methods to supply additional functionality. Notice that methods are not integrated into the PIR operators or PASM opcodes in the same way that VTABLE methods are. Methods can be written in the C-like PMC script for individual PMCs, or they can be written in PIR for user-defined PMC subclasses.

Invoking Methods
Once a method has been defined, it can be accessed in a PMC file using the PCCINVOKE command.



VTABLE List
A complete list of all vtable methods is located in the appendix.

Resources
Built-In PMC Appendix
http://www.parrotcode.org/docs/pdd/pdd04_datatypes.html
http://www.parrotcode.org/docs/pdd/pdd17_pmc.html
http://www.parrotcode.org/docs/pmc2c.html
http://www.parrotcode.org/docs/pmc.html

Multithreading and Concurrency


Multithreading Requirements
Multithreading is an area where interpreted languages such as Perl 5 have had some problems in the past. Perl 5, as a prime example, had a very buggy and non-robust implementation of threads that was not scalable. In contrast to Perl 5, Perl 6 is including advanced multithreading and concurrency features in the core language. To support all these advanced features of the Perl 6 language, Parrot must provide those same features as well.

Coroutines
While not technically "multithreading", coroutines represent a simple form of concurrency that will find many interesting uses as they catch on with mainstream programmers. Coroutines are like subroutines that use a yield statement instead of a return statement. The .yield statement causes the coroutine to return a value to the caller. However, .yield does not cause the coroutine to exit, disappear, or lose its current state. The next time the caller calls the coroutine, the coroutine will pick up where it left off the last time. Here is an example:

    .sub 'MyCoroutine'
        .yield(1)
        .yield(2)
        .return(3)
    .end

    .sub 'MyCaller'
        $I0 = 'MyCoroutine'()
        say($I0)
        $I0 = 'MyCoroutine'()
        say($I0)
        $I0 = 'MyCoroutine'()
        say($I0)
        .return()
    .end

The output of the function MyCaller will be:

    1
    2
    3

Coroutines are immediately useful in places where a persistent state needs to be maintained, but where it might be difficult to coordinate that state between multiple callers or multiple threads. One such example is file or database access, where a coroutine can serialize accesses and maintain persistent state information about the size of the file and the locations of certain features in the file. Coroutines are also called "continuations" in other places, or even "continuation sandwich" in others.
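Python's generators behave much like these coroutines and can illustrate the control flow; this is a Python analogue, not Parrot code:

```python
# A generator "yields" values without losing its state, just as a PIR
# coroutine does with .yield; each next() resumes where it left off.
def my_coroutine():
    yield 1
    yield 2
    yield 3          # final value; the generator is exhausted after this

gen = my_coroutine()
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3
```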


Coroutine Parameters
Parameters to a coroutine are only passed the first time the coroutine is called, or any time the coroutine is called after having executed a .return statement. Here is an example:

    .sub 'MyCoroutine'
        .param int a
        .yield(a)
        .yield(a)
        .yield(a)
        .return(a)
    .end

    .sub 'main' :main
        $I0 = 'MyCoroutine'(1)   # the "call"
        say $I0
        $I0 = 'MyCoroutine'(2)   # "continuation"
        say $I0
        $I0 = 'MyCoroutine'(3)   # "continuation"
        say $I0
        $I0 = 'MyCoroutine'(4)   # "continuation"
        say $I0
    .end

This code will print the following result, even though the parameter to the MyCoroutine function changes with each call:

    1
    1
    1
    1

This is because only the first call to MyCoroutine is actually a function call that creates the local parameters. The other calls to MyCoroutine are simply continuations, not calls. A continuation does not create new parameter variables.
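Python generators show the same effect: a generator's arguments are bound once, when it is created, and later resumptions ignore any new values. (In Python the caller resumes the same generator object rather than re-invoking it by name, but the behavior of the parameter is analogous.)

```python
# The argument a is bound once, at the initial "call"; each later
# resumption reuses it, just like the coroutine parameter above.
def my_coroutine(a):
    yield a
    yield a
    yield a

gen = my_coroutine(1)                      # the "call": a is bound to 1
values = [next(gen), next(gen), next(gen)] # three "continuations"
print(values)  # [1, 1, 1]
```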



Returning and Yielding


A return call in a coroutine causes it to return a value to the caller and to destroy its current lexical scope, as usual. A yield call, however, will return a value without destroying the current lexical scope. Yielding thus allows the coroutine to maintain its current state and resume its execution the next time it is called.

Continuation and Coroutine PMCs


This is where the heart of the coroutine system is located: the Coroutine PMC. The Coroutine PMC is an object that stores the state of the coroutine so that it can be called multiple times in a row. A Continuation PMC, which is a superclass of the Coroutine, is a stored interpreter state. The coroutine operates by storing a continuation when a .yield directive is called, and invoking that continuation when the coroutine is called again.

Threads
Internally, Parrot supports multiple different methods to perform threading. Luckily, all of these different methods are abstracted and the details are hidden from the HLL programmer. Parrot's concurrency system is modular, so new multithreading technologies can be added to Parrot later, and HLL programmers will be able to benefit from these changes and additions without having to make any changes to their code.

Managing Threads
The scheduler is an instance of the Scheduler PMC type. However, you don't use the Scheduler PMC directly; Parrot uses it for you. There are four basic operations that can be used to create and manage a new thread. In Parrot, threads are all abstracted behind the Task PMC. To create a new thread, you first create a new Task PMC, and then you add it to the scheduler.
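The task-and-scheduler shape can be sketched in Python. This is an invented miniature, not Parrot's API: tasks are objects handed to a scheduler, which decides when and where to run them:

```python
# A minimal task/scheduler sketch: create a task, add it to the
# scheduler, and let the scheduler run it on a worker thread --
# the same shape as creating a Task PMC and adding it to Parrot's
# scheduler.
from queue import Queue
from threading import Thread

class Task:
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
    def run(self):
        return self.fn(*self.args)

class Scheduler:
    def __init__(self):
        self.queue = Queue()
        self.results = []
    def add(self, task):
        self.queue.put(task)           # hand the task to the scheduler
    def run_all(self):
        def worker():                  # single worker drains the queue
            while not self.queue.empty():
                self.results.append(self.queue.get().run())
        t = Thread(target=worker)
        t.start()
        t.join()

sched = Scheduler()
sched.add(Task(lambda x: x * 2, 21))
sched.add(Task(print, "hello from a task"))
sched.run_all()
print(sched.results[0])  # 42
```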

Resources
http://www.parrotcode.org/docs/pdd/pdd25_concurrency.html
Synopsis 17

Exception Handling


Exceptions
Exceptions are errors that are raised in a program. However, unlike an ordinary error, which causes the program to terminate unexpectedly, exceptions are controlled and can be recovered from without having to restart the program. In Parrot, exceptions are objects. This means you create an exception using the new keyword, and manipulate an exception using methods on that object. Before we discuss exceptions any further, we need to cover some terminology. Readers who are familiar with exceptions in other programming languages can probably skip these definitions.

Throw
    To throw an exception means to create an exception object. Once an exception has been created, the system enters a kind of "panic state" where it attempts to fix the exception. If the exception cannot be fixed, the program terminates.
Raise
    To raise an exception is the same as to throw one.
Handler
    A handler is a routine which can fix an exception. When an exception is raised and the system enters panic mode, it looks for a handler. If there is a handler available, the exception is sent to that handler. If no handlers are available to handle the exception, the system will terminate.
Catch
    A handler that receives an exception object is said to "catch" it. Whenever an exception is thrown, a handler should be available to catch it. Again, if no handlers are available to catch an exception, Parrot will exit.
Rethrow
    Not all handlers are equipped to handle all exceptions. If a handler catches an exception that it cannot fix, it can optionally rethrow that exception. Rethrowing causes the system to search for a different handler.

Creating an Exception
Creating a new exception object in PIR is deceptively simple:

    $P0 = new 'Exception'

Exceptions are hash-like, which means they have named fields. One such field is the '_message' field, which contains the name of the exception. Exception handlers will check the name to determine if they can handle a particular exception, or if it needs to be rethrown.



Creating a Handler
In Parrot, a handler is a label that the system jumps to in the event of an exception. These labels are stored in a stack structure. The exception handler on the top of the stack receives the exception, but if it rethrows, the exception will be propagated down through the exception stack until it is finally handled.
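A handler stack with rethrowing can be modeled in a few lines of Python (the handler names and the None-means-rethrow convention are invented for illustration):

```python
# A sketch of a handler stack: the newest handler sees the exception
# first; a handler that cannot fix it "rethrows" by returning None,
# passing the exception down the stack until someone handles it.
def dispatch(handlers, exception):
    for handler in reversed(handlers):   # top of stack first
        result = handler(exception)
        if result is not None:           # handled; stop searching
            return result
    raise RuntimeError("unhandled exception: " + exception)

def io_handler(exc):
    return "fixed by io_handler" if exc == "io_error" else None  # rethrow

def catch_all(exc):
    return "caught by catch_all"         # handles anything

handlers = [catch_all, io_handler]       # io_handler pushed last (top)
print(dispatch(handlers, "io_error"))    # fixed by io_handler
print(dispatch(handlers, "type_error"))  # caught by catch_all
```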

Resources
http://www.parrotcode.org/docs/pdd/pdd23_exceptions.html

Classes and Objects


We briefly discussed some class and object PIR code earlier, and in this chapter we are going to go into more detail about it. As we mentioned before, classes have four basic components: a namespace, an initializer, a constructor, and methods. A namespace is important because it tells the virtual machine where to look for the methods of the object when they are called. If I have an object of class "Foo", and I call the "Bar" method on it:

    .local pmc myobject
    myobject = new "Foo"
    myobject.'Bar'()

The virtual machine will see that myobject is a PMC object of type Foo, and then will look for the method 'Bar' in the namespace 'Foo'. In short, the namespace helps to keep everything together.
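This namespace-based method lookup can be modeled in Python; the dictionary-of-namespaces below is an illustrative model, not Parrot's internals:

```python
# How namespace-based method lookup works, in miniature: the object
# carries its class name, and the VM finds the method by name in that
# class's namespace table.
namespaces = {
    "Foo": {"Bar": lambda self: "Bar called on a " + self["class"]},
}

def call_method(obj, method_name):
    ns = namespaces[obj["class"]]   # find the object's namespace
    return ns[method_name](obj)     # look up and invoke the method

myobject = {"class": "Foo"}
print(call_method(myobject, "Bar"))  # Bar called on a Foo
```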

Initializers
An initializer is a function that is called at the beginning of the program to set up the class. PIR doesn't have a syntax for declaring information about the class directly; you have to use a series of opcodes and statements to tell Parrot what your class looks like. This means that you need to create the various data fields in your class (called "attributes" here), and set up relationships with other classes. Initializer functions tend to follow this format:

    .namespace

    .sub 'onload' :anon :init :load
        # class setup code goes here
    .end

The :anon flag means that the name of the function will not be stored in the namespace, so you don't end up with all sorts of name pollution. Of course, if the name of the function isn't stored, it can be difficult to make additional calls to this function, although that doesn't matter if we only want to call it once. The :init flag causes the function to run as soon as Parrot initializes the file, and the :load flag causes the function to run as soon as the file is loaded, if it is loaded as an external library. In short: we want this function to run as soon as possible, and we only want it to run once. Notice also that we want the initializer to be declared in the HLL namespace.



Making a New Class


We can make a new class with the newclass opcode. To create a class called "MyClass", we would write an initializer that does the following:

    .sub 'initmyclass' :init :load :anon
        newclass $P0, 'MyClass'
    .end

We can also simplify this using PIR syntax:

    .sub 'initmyclass' :init :load :anon
        $P0 = newclass 'MyClass'
    .end

In the initializer, the register $P0 contains a reference to the class object. Any changes or additions that we want to make to the class need to be made through this class reference variable.

Creating new class objects


Once we have a class object (the output of the newclass opcode), we can create or "instantiate" objects of that class. We do this with the new keyword:

    .local pmc myobject
    myobject = new $P0

Or, if we know the name of the class, we can write:

    .local pmc myobject
    myobject = new 'MyClass'

Subclassing
We can set up a subclass/superclass relationship using the subclass command. For instance, if we want to create a class that is a subclass of the built-in PMC type "ResizablePMCArray", and we want to call this subclass "List", we would write:

    .sub 'onload' :anon :load :init
        subclass $P0, "ResizablePMCArray", "List"
    .end

This creates a class called "List" which is a subclass of the "ResizablePMCArray" class. Notice that, like the newclass instruction above, we store a reference to the class in the PMC register $P0. We'll use this reference to modify the class in the sections below.



Adding Attributes
Attributes can be added to the class by using the add_attribute keyword with the class reference that we received from the newclass or subclass keywords. Here, we create a new class 'MyClass', and add two data fields to it, 'name' and 'value':

    .sub 'initmyclass' :init :load :anon
        newclass $P0, 'MyClass'
        add_attribute $P0, 'name'
        add_attribute $P0, 'value'
    .end

We'll talk about accessing these attributes below.

Methods
Methods, as we mentioned earlier, have three major differences from subroutines: the way they are flagged, the way they are called, and the fact that they have a special self variable. We know already that methods should use the :method flag. :method indicates to Parrot that the other two differences (the dot-based calling convention and the "self" variable) need to be implemented for the method. Some methods will also use the :vtable flag as well, and we will discuss that below.

We want to create a stack class. The stack has "push" and "pop" methods. Luckily, Parrot has push and pop instructions available that can operate on array-like PMCs (like the "ResizablePMCArray" PMC class). However, we need to wrap these PIR instructions into functions or methods so that they can be used from our high-level language (HLL). Here is how we can do that:

    .namespace

    .sub 'onload' :anon :load :init
        subclass $P0, "ResizablePMCArray", "Stack"
    .end

    .namespace ["Stack"]

    .sub 'push' :method
        .param pmc arg
        push self, arg
    .end

    .sub 'pop' :method
        pop $P0, self
        .return($P0)
    .end

Now, if we had a language compiler for Java on Parrot, we could write something similar to this:

    Stack mystack = new Stack();
    mystack.push(5);
    System.out.println(mystack.pop());

The example above would print the value "5" at the end. If we look at the same example in a language like Perl 5, we would have:

    my $stack = Stack::new();
    $stack->push(5);
    print $stack->pop();

This, again, would print out the number "5".


Accessing Attributes
If our class has attributes, we can use the setattribute and getattribute instructions to write and read those attributes, respectively. If we have a class 'MyClass' with data attributes 'name' and 'value', we can write setter and accessor methods for these:

    .sub 'set_name' :method
        .param pmc newname
        $S0 = 'name'
        setattribute self, $S0, newname
    .end

    .sub 'set_value' :method
        .param pmc newvalue
        $S0 = 'value'
        setattribute self, $S0, newvalue
    .end

    .sub 'get_name' :method
        $S0 = 'name'
        $P0 = getattribute self, $S0
        .return($P0)
    .end

    .sub 'get_value' :method
        $S0 = 'value'
        $P0 = getattribute self, $S0
        .return($P0)
    .end



Constructors
The constructor is the function that we call when we use the new keyword. The constructor initializes the data object attributes, and maybe performs some other bookkeeping tasks as well. A constructor must be a method named 'new'. Besides the special name, the constructor is like any other method, and can get or set attributes on the self variable as needed.

Resources
http://www.parrotcode.org/docs/pdd/pdd15_objects.html
http://www.parrotcode.org/docs/pdd/pdd21_namespaces.html

The Parrot Debugger


Parrot's debugger used to be a separate program that was not always built by default. However, work is under way to add a debugging runloop directly to the Parrot executable itself. As work progresses on this project, more information will be included here.

Resources
http://www.parrotcode.org/docs/debug.html
http://www.parrotcode.org/docs/debugger.html


Parrot Compiler Tools


Parrot Compiler Tools
The first section of this book covered some of the basics of the Parrot platform, and the various features that Parrot provides for use with other high level languages. It is important to notice that Parrot provides more features and capabilities than most individual languages require. This is because Parrot aims to be a platform to support multiple high-level dynamic programming languages, each of which have diverse feature sets. Some of the most recent versions of these programming languages, such as Perl 6 and Python 3000 have very interesting feature sets planned that cannot be supported well by any other existing interpreter or virtual machine platform. PIR and Parrot programming so far has been relatively low-level, but the goal of Parrot is to support high-level languages. To facilitate this goal, Parrot provides tools that compiler-designers can use to quickly and easily create the advanced language features that next-generation languages like Python 3000 and Perl 6 need. These are, collectively, known as the Parrot Compiler Tools (PCT). The PCT are a set of tools that people can use to quickly and easily implement new programming languages on the Parrot platform. We will talk about them in this chapter and some of the following chapters too.

Parsing and Compiling: How Parrot Works


Parrot is designed to be a highly modular system. This means that many components can be interchanged as needed. Some of these changes need to be specified at compile time, but others can be performed at runtime. Inputting a program to Parrot goes through multiple steps. Here is a brief overview of these:

Parser and Lexer
    The first stage of Parrot is the parser and lexer ("lexer" is short for "lexical analyzer"). We will discuss the operations of these components in more detail later in this chapter, and in future chapters. The parser and lexer read input code in PIR or PASM and convert it into a data representation called an abstract syntax tree (AST). An AST is a way to represent program instructions that is very easy for a computer to work with.

Compiler
    The compiler unit converts information in the AST into Parrot bytecode format. Bytecode is a set of instructions in binary machine language. From here, Parrot can execute the bytecode directly, or it can save the bytecode to disk and execute it later.

Optimizer
    The optimizer takes the generated bytecode and attempts to make it smaller, faster, and more efficient. Bytecode that has been properly optimized will typically execute faster than non-optimized bytecode.

JIT Compiler
    Short for "Just In Time", the JIT compiler attempts to convert Parrot bytecode into native machine code. This will typically bring large speed increases, but is highly platform-dependent and does not yet work on all systems.

Interpreter

Once a program has been converted into bytecode, that bytecode is loaded into the interpreter, where it is executed.

This is just a very brief overview of these components; we will discuss them in more detail in later chapters. It is worth noting here, however, that many of these components are modular and can be swapped out if you would like to use a different one. For instance, if you already have a parser written for a particular language, instead of having to rewrite the parser using PCT, you can load your existing parser into Parrot. Of course, you will probably need to make modifications to ensure that your custom parser outputs a proper AST, but that's a small price to pay to avoid having to completely rewrite your language parser from the ground up.
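The flow of stages described above can be visualized as a chain of functions, each consuming the previous stage's output. The following is a toy Python sketch of such a pipeline. It is an illustration only, not Parrot's actual API; all of the function names and data shapes here are invented for the example, and only mirror the stages named in the text:

```python
# Toy illustration of a parse -> compile -> optimize -> interpret pipeline.
# None of these names are Parrot APIs; they only mirror the stages above.

def parse(source):
    # Parser/lexer stage: turn source text into a crude "AST"
    # (here, simply a list of words).
    return {"ast": source.split()}

def compile_to_bytecode(ast):
    # Compiler stage: turn the AST into "bytecode"
    # (here, numbered instructions).
    return [(i, node) for i, node in enumerate(ast["ast"])]

def optimize(bytecode):
    # Optimizer stage: drop useless instructions (here, the word "nop").
    return [ins for ins in bytecode if ins[1] != "nop"]

def interpret(bytecode):
    # Interpreter stage: "execute" by collecting instruction payloads.
    return [payload for _, payload in bytecode]

program = parse("print nop hello")
result = interpret(optimize(compile_to_bytecode(program)))
# result is ["print", "hello"]: the "nop" was optimized away
```

The point of the sketch is the shape of the data flow, not the content of any one stage: each stage accepts the previous stage's output, which is what makes the components swappable.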


PCT Design Process


PCT includes a number of tools and design steps necessary to create a compiler for a new programming language. Here is a brief look at the steps required to create a new compiler:

1. Create a language shell
2. Create a grammar file
3. Create a grammar actions file
4. Create necessary classes, built-in functions, and PMCs
5. Create the driver program

Once you have your compiler, you can use it to run programs written in your high-level language. Here are the steps involved in running your compiler:

1. Parse the input program, using your grammar, into a Parrot Abstract Syntax Tree (PAST)
2. Compile the PAST into a Parrot Opcode Syntax Tree (POST)
3. Compile the POST into Parrot Bytecode (PBC) or PIR
4. Run the PBC or PIR on Parrot

This should give you a rough idea of what needs to be done to create a compiler, and how a compiler operates. We'll elaborate on each of these steps in this chapter and the next few chapters in this section.

Creating a Language Shell


A new language shell has a number of components. There are the grammar and action files that we've mentioned, but you also need a driver program to create the HLLCompiler object and start the compilation. Also, if you want to have any built-in functions or classes, you will need to write them. To simplify the whole process, you will want a makefile to handle all the build steps for your language. Luckily, there is a tool available to automate this setup: mk_language_shell.pl.

mk_language_shell.pl is a Perl 5 program that creates all the necessary files for a new language compiler, and fills those files with some helpful default code. It is located in the tools/dev/ directory under the Parrot root. To run this program from your shell, go to the Parrot root directory and type:

    tools/dev/mk_language_shell.pl <LANGUAGE_NAME> <PATH>

Here, <LANGUAGE_NAME> is the name of your new language, and <PATH> is the directory where you want it to be stored. By convention, all language projects are stored in the languages/ directory; using this directory makes it easier for other build tools to find your language. For example, if we wanted to create a new language called "mylanguage", we could write:

    tools/dev/mk_language_shell.pl mylanguage languages/mylanguage

This creates all the necessary files, including a makefile for your language project. Notice that many of these default files, including the makefile, will need to be edited or modified as time goes on. As practice, you may want to open the makefile and see how things are being built. If you've never seen a makefile before, this is your opportunity to learn what they are and how they work.


Grammars and Actions


Note: PGE actually is the Perl 6 grammar engine. The Perl 6 compiler calls PGE to handle grammar rules in Perl 6 source code.

Grammars, typically files with a ".pg" file extension, are compiled using the Parrot Grammar Engine (PGE). PGE is an implementation of the Perl 6 rules engine for Parrot. PGE uses a recursive descent parser, although certain components such as expressions can be parsed using a bottom-up parser for efficiency. If you have read the book on Compiler Construction, this should make some sense to you. If not, the details of the parser are not particularly important at this point.

Unfortunately, there is a little bit of terminology that we need to cover before we can go any further. Readers who are familiar with grammars and parsers can skip this section. Everybody else should read through it, because the information is valuable and pertinent.

A tool called a lexical analyzer reads the input file and converts chunks of text into things called "tokens". Tokens are then arranged into particular patterns, called "rules", by the parser. When a rule is successfully applied to a set of input tokens, the rule is said to "match" the input. Think of a token as a word in a sentence: alone, a single word might not have much meaning, but if you put multiple words together into a sentence, the intended meaning becomes clear. A parser takes a group of tokens and tries to form them into a "sentence", a known pattern. If a valid pattern of tokens is found, the parser succeeds.

At each step of the parsing process, the parser receives a token from the lexical analyzer. If the parser has enough tokens to make a valid pattern, it succeeds. If it doesn't have enough information to form a valid pattern, it requests the next token and tries again. Large patterns are divided up into smaller patterns: tokens are combined into small patterns, and small patterns are combined into larger patterns. Eventually, the whole code file is reduced to a single pattern and the parser exits.

At each step the parser may optionally perform an action using information in the token. The parser associates particular actions with different token types: the action performed on an open-parenthesis token is not going to be the same as the action performed on a close-parenthesis token. In the case of PGE, actions are functions, typically written in PIR or NQP, that create a PAST node. PAST nodes are stored in a large tree that represents the input. This is called the parse tree. When the parser reaches its final match and succeeds, the parse tree is passed to the next stages of the toolkit for processing and eventual conversion into Parrot bytecode.

Implementing a new language on Parrot, as we mentioned earlier, is broken into a number of parts:

* Write a grammar file using Perl 6 grammar rules
* Write a grammar actions file using NQP
* Write a driver program in PIR
* Write built-in functions, classes, and PMCs, using PIR (or C, for the PMCs)

Once you create your language shell, all of these files will be produced for you. All you need to do is fill in your grammar and actions into the necessary files, write the rest of the necessary built-in code, and you should have a working compiler. Once you have modified these files to do what you need them to, there is an additional optional step that you should take: Write a series of test modules to verify that your language operates properly. We will discuss testing and test harnesses later. We will discuss writing parsers and action files in the next few chapters.


The Driver Program


The driver program, which is the main entry point to your compiler, has a number of tasks to perform. The first and most important job is to create a compiler object for the high-level language in question, and pass the command-line arguments to that compiler object. The compiler object is an HLLCompiler object, and the HLLCompiler class contains all the necessary methods for parsing command-line arguments and initializing the compiler. For more information about the HLLCompiler class, see the Appendix.

A driver program's tasks, in no particular order, are:

1. Specify a :main function, which starts the program.
2. Create an HLLCompiler object for the given high-level language (HLL).
3. Specify any additional details on the HLLCompiler object, to change the operation of the compiler prior to the parsing stage.
4. Include the necessary libraries of classes and built-in functions that the language needs to operate. For most languages, this will include at least one library-loading routine capable of loading additional libraries into Parrot for use with the HLL and programs written in it.
5. Declare any global variables that will be used by the parser, or by HLL programs.

In addition to these, there may be other tasks that the language designer wishes to perform inside the main driver program.

Getting Help
When you are writing your new language compiler, there are a number of places you can go to get help. The Parrot repository contains all the current Parrot documentation, in POD format. Perl 5 programmers will be familiar with POD, but other readers might not be. POD is a simple documentation format that is treated like multi-line comments in Perl code. Special programs like pod2html can be used to convert POD files into other file types for presentation, such as HTML.

There are many languages in the languages/ directory. If you are trying to implement a particular feature for your language, chances are good that you can find an existing example of how another language has implemented that feature.

One excellent tool to use, especially when you are constructing PAST node trees or writing functions in PIR, is Parrot's --target= option. This option lets you specify an output dump format. For instance, if you go to the languages/perl6/ directory, you can type:

    ../../parrot perl6.pbc --target=pir

This command will output the PIR for any Perl 6 instructions that you type in. These options belong to Parrot itself, so they work for all languages, not just Perl 6. Here are some of the targets you may want to try:

* pir: prints the resulting PIR from the code
* pasm: prints the resulting PASM code
* past: prints the PAST node tree generated from the code
* parse: prints a parse tree of the code

Try all these, and see what kinds of results you get using different languages. If you have looked for help in the POD documentation and in the existing code examples, it might be time to find a real human to ask. Parrot developers and enthusiasts congregate in the #parrot [1] (irc.perl.org) chatroom. Perl 6 developers and enthusiasts congregate in the #perl6 [2] (freenode) chatroom. Other resources and methods of contact are available at http://www.parrotcode.org/resources.html


Resources
http://www.parrotcode.org/docs/compiler_faq.html

References
[1] irc:/ / irc. perl. org/ parrot [2] irc:/ / irc. freenode. net/ perl6

Parrot Grammar Engine


Parrot Grammar Engine
The Parrot Grammar Engine (PGE) is an implementation of Perl 6's regular expression syntax on Parrot. The Perl 6 regular expression syntax is much more powerful and expressive than the regular expression syntax in Perl 5, and since the regular expression implementations in many other languages are based on the Perl 5 implementation, Perl 6's syntax is more powerful than those as well. The Perl 6 regular expression and grammar syntax is very similar to parser generators like the Perl 5 module Parse::RecDescent, or the parser-generator programs Yacc and Bison. Unlike Yacc and Bison, however, the Perl 6 grammar engine uses a recursive descent algorithm instead of an LALR one. The differences are not important to the casual reader, but we will go into more detail about them later. Perl 6 grammars are combinations of lexical analyzers (which break the source code into tokens) and parsers (which combine tokens into patterns). The lexical analysis portion of the grammar is the regular expression engine; the parser portion is the rules system that we will describe here.

Parsers and Terminology


Before we talk about PGE any further, we need to discuss some more terminology, and we also need to take a quick look at the algorithms that parsers use. By understanding the algorithms and their limitations, you, the compiler designer, will be able to employ these tools more effectively.

Input to a compiler is a source code file, written in a particular high-level language, which consists of various components. Consider the following statement in the C programming language:

    int x = add_both(5, 4);

This statement needs to be broken up into smaller chunks that represent its individual components. A chunk, as we have already seen, is called a "token". The regular expression engine, the lexical analyzer, is used to break the input into tokens by iteratively matching the input against known regular expressions. Using this method, the statement above can be broken down into the following tokens:

    KEYWORD     "int"
    IDENTIFIER  "x"
    OPERATOR    "="
    IDENTIFIER  "add_both"
    PAREN       "("
    INTEGER     "5"
    OPERATOR    ","
    INTEGER     "4"
    PAREN       ")"

Notice that the whitespace has been removed from the list, so only the "important" stuff is turned into a token. Each token has two important pieces of information: the type (KEYWORD, IDENTIFIER, and so on) and a value ("int", "x", and so on). The parser can arrange the token types into a pattern, and the actions can use the token values to construct the PAST parse tree. An ordered group of tokens is called a string if it matches a given input pattern for the parser. The set of all strings which are valid for a compiler is called the language, and the rules that specify this language are called a grammar.

The third part of a compiler is the back end. The back end, also called the target code generator, takes the intermediate data representation of the input source (the parse tree) and converts it into the target language. In our case, the target language is PIR or PBC.

All compilers are formed from these three key components: the lexical analyzer, the parser, and the back end. Parrot provides regular expressions for the lexical analyzer, and it handles all the target language generation. The only component you, as a compiler designer, need to design is the parser. However, there is a lot of information about parsers that we need to know first.

There are two methods to parse a string of input tokens. First, we can do a top-down parse, where we start with our highest-level token and try to find a match. Second, we can do a bottom-up parse, where we start with the small tokens and try to combine them into bigger and bigger tokens until we have the highest level. The ultimate goal of both parsing methods is to reduce the string of input tokens into a single match token.
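The lexical-analysis step just described can be sketched in a few lines of Python. This is an illustration only, not part of Parrot (PGE drives its lexing with Perl 6 rules, not a table like this); the token names follow the list above, and the ordering of the patterns is an assumption of the sketch:

```python
import re

# One regular expression per token type, tried in this order.
# KEYWORD comes before IDENTIFIER so that "int" is not misread.
TOKEN_SPEC = [
    ("KEYWORD",    r"\bint\b"),
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("PAREN",      r"[()]"),
    ("OPERATOR",   r"[=,;]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(source):
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":          # whitespace is discarded, as in the text
            tokens.append((kind, match.group()))
    return tokens

tokens = tokenize("int x = add_both(5, 4)")
# tokens reproduces the (type, value) list shown above
```

Running this on the example statement yields exactly the nine (type, value) pairs from the table, which is all a lexical analyzer does: it converts a flat character stream into typed tokens for the parser.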


Bottom-Up Parse
Note: parser generators such as Yacc or Bison produce bottom-up parsers.

As a simple bottom-up parse example, we will start at the left side of our string of tokens and try to combine small tokens into bigger ones. What we want, in the end, is a token that represents the entire input. We do this by utilizing two fundamental operations: shift and reduce. A shift operation reads a new input token into the parser. A reduce operation turns a matched pattern of small tokens into a single larger token. Let's put this method into practice on our token string:

    KEYWORD     "int"
    IDENTIFIER  "x"
    OPERATOR    "="
    IDENTIFIER  "add_both"
    PAREN       "("
    INTEGER     "5"
    OPERATOR    ","
    INTEGER     "4"
    PAREN       ")"

When we realize that the tokens "int" and "x" form a variable declaration, we reduce the first two tokens into a single declaration token:

    DECLARATION
    OPERATOR    "="
    IDENTIFIER  "add_both"
    PAREN       "("
    INTEGER     "5"
    OPERATOR    ","
    INTEGER     "4"
    PAREN       ")"

Now we move through the line, shifting tokens into the parser from left to right. We can't see anything to reduce until we reach the closing parenthesis. Then we can reduce the open and close parentheses, and all the tokens between them, into an argument list:

    DECLARATION
    OPERATOR       "="
    IDENTIFIER     "add_both"
    ARGUMENT_LIST

Now, we know that an identifier followed by an argument list is a function call. We reduce these two tokens to a single function-call token:

    DECLARATION
    OPERATOR       "="
    FUNCTION_CALL

And finally, skipping a few steps, we can reduce this to an assignment statement:

    ASSIGNMENT_STATEMENT

Every time we reduce a set of small tokens into a bigger token, we add them to the tree: the small tokens become children of the bigger token. The type of parser we are describing here is called a "shift-reduce" parser, because those are the actions it performs. A subset of shift-reduce parsers that is useful for arithmetic expressions is the operator precedence parser, which we will talk about more below.

Note: Shift-reduce parsers are prone to a certain type of error called a shift-reduce conflict. A shift-reduce conflict occurs when a new input token could cause the parser either to shift or to reduce; in other words, the parser has more than one option for the input token. A grammar that contains possible shift-reduce conflicts is called an ambiguous grammar. While there are ways to resolve such a conflict, it is often better to redesign the language grammar to avoid them entirely. For more information on this, see the Resources section at the bottom of the page.
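The shift-reduce walkthrough above can be animated with a small Python sketch. Hard-coding the reduction rules as fixed-length patterns is a simplification invented for this example (a real table-driven parser does considerably more), but the shift and reduce operations are exactly the ones described in the text:

```python
# A toy shift-reduce parser for the token sequence worked through above.
# Each rule maps a fixed sequence of stack-top token types to one new type.
RULES = [
    (("KEYWORD", "IDENTIFIER"), "DECLARATION"),
    (("PAREN", "INTEGER", "OPERATOR", "INTEGER", "PAREN"), "ARGUMENT_LIST"),
    (("IDENTIFIER", "ARGUMENT_LIST"), "FUNCTION_CALL"),
    (("DECLARATION", "OPERATOR", "FUNCTION_CALL"), "ASSIGNMENT_STATEMENT"),
]

def parse(token_types):
    stack = []
    for token in token_types:
        stack.append(token)                  # shift: read one input token
        reduced = True
        while reduced:                       # reduce as long as possible
            reduced = False
            for pattern, result in RULES:
                n = len(pattern)
                if tuple(stack[-n:]) == pattern:
                    del stack[-n:]           # pop the matched small tokens
                    stack.append(result)     # push the single larger token
                    reduced = True
                    break
    return stack

types = ["KEYWORD", "IDENTIFIER", "OPERATOR", "IDENTIFIER",
         "PAREN", "INTEGER", "OPERATOR", "INTEGER", "PAREN"]
final = parse(types)
# final is ["ASSIGNMENT_STATEMENT"]: the whole input reduced to one token
```

Tracing the stack during this run reproduces the intermediate states shown above: DECLARATION appears after the second shift, ARGUMENT_LIST after the last parenthesis, and the run ends with the single ASSIGNMENT_STATEMENT token.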


Top-Down Parse
A top-down parser is a little bit different from a bottom-up parser. We start with the highest-level token, and we try to make a match by testing for smaller patterns. This process can be inherently inefficient, because we must often test many different patterns before we can find one that matches. However, we gain the ability to avoid shift-reduce conflicts, and also gain a certain robustness, because our parser attempts multiple options before giving up.

Let's say that we have a string of tokens, and a top-level definition for a STATEMENT token. Here is the definition in a format called a context-free grammar:

    STATEMENT := ASSIGNMENT | FUNCTION_CALL | DECLARATION

This is a simple example, of course, and certainly not enough to satisfy a language like C. The ":=" symbol is analogous to the words "is made of". The vertical bar "|" is the same as saying "or". So the statement above says "a statement is made of an assignment, or a function call, or a declaration". We have this grammar rule that tells us what a statement is, and we have our string of input tokens:

    KEYWORD     "int"
    IDENTIFIER  "x"
    OPERATOR    "="
    IDENTIFIER  "add_both"
    PAREN       "("
    INTEGER     "5"
    OPERATOR    ","
    INTEGER     "4"
    PAREN       ")"

The top-down parser will try each alternative in the STATEMENT definition to see if it matches. If one doesn't match, it moves to the next one and tries again. So we try the ASSIGNMENT rule, which is defined like this:

    ASSIGNMENT := (VARIABLE_NAME | DECLARATION) '=' (EXPRESSION | FUNCTION_CALL)

Parentheses are used to group things together. In English, this statement says "an assignment is a variable name or a declaration, followed by '=', followed by an expression or a function call". So, to satisfy ASSIGNMENT, we try VARIABLE_NAME first, and then we try DECLARATION. The string "int x" is a declaration and not a simple variable name, so VARIABLE_NAME fails and DECLARATION succeeds. Next, the '=' matches. To satisfy the last group, we try EXPRESSION first, which fails, and then we try FUNCTION_CALL, which succeeds. We proceed this way, trying alternatives and slowly descending to smaller and smaller tokens, until we've matched the entire input string. Once we have matched the last input token, if we have also satisfied the top-level rule, the parser succeeds.

This type of parser is called a "recursive descent" parser, because we recurse into each subrule and slowly descend from the top of the parse tree to the bottom. Once the last subrule succeeds, the whole match succeeds and the parser returns. In this process, when a rule matches, we create a node in our PAST tree. Because we test all subrules before a rule succeeds, the child nodes are created before the higher-level nodes. This builds the parse tree from the bottom up, even though we started at the top of the tree and moved down.

Top-down parsers can be inefficient because the parser will attempt to match patterns that obviously cannot succeed. However, there are techniques we can use to "prune" the tree of possibilities, by directing the parser toward certain paths or stopping it from descending into branches that cannot match. We can also prevent the parser from backtracking out of subrules back into larger rules, which helps to reduce unnecessary repetition.
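The recursive descent idea just described can be sketched compactly in Python: each grammar rule becomes a function that either consumes tokens or restores the input position and fails, so that alternatives can be tried in order. This is an illustration invented for this book's example, not PGE's implementation; the grammar here is a simplified version of the STATEMENT/ASSIGNMENT rules in the text:

```python
# Toy recursive descent over token types, mirroring the grammar above.
# `pos` is a one-element list so subrules can mutate and restore it.
TOKENS = ["KEYWORD", "IDENTIFIER", "OPERATOR", "IDENTIFIER",
          "PAREN", "INTEGER", "OPERATOR", "INTEGER", "PAREN"]

def term(expected, pos):
    # Match a single token of the expected type, advancing on success.
    if pos[0] < len(TOKENS) and TOKENS[pos[0]] == expected:
        pos[0] += 1
        return True
    return False

def attempt(rule, pos):
    # Try a subrule; restore the position (backtrack) if it fails.
    saved = pos[0]
    if rule(pos):
        return True
    pos[0] = saved
    return False

def declaration(pos):       # DECLARATION := KEYWORD IDENTIFIER
    return term("KEYWORD", pos) and term("IDENTIFIER", pos)

def arguments(pos):         # ARGUMENTS := INTEGER ',' INTEGER
    return (term("INTEGER", pos) and term("OPERATOR", pos)
            and term("INTEGER", pos))

def function_call(pos):     # FUNCTION_CALL := IDENTIFIER '(' ARGUMENTS ')'
    return (term("IDENTIFIER", pos) and term("PAREN", pos)
            and arguments(pos) and term("PAREN", pos))

def lvalue(pos):            # LVALUE := VARIABLE_NAME | DECLARATION
    # Try the bare variable name first; backtrack and try a declaration.
    return (attempt(lambda p: term("IDENTIFIER", p), pos)
            or attempt(declaration, pos))

def statement(pos):         # STATEMENT := LVALUE '=' FUNCTION_CALL
    return (lvalue(pos) and term("OPERATOR", pos)
            and attempt(function_call, pos))

pos = [0]
ok = statement(pos) and pos[0] == len(TOKENS)
```

The lvalue function shows the mechanism the text describes: VARIABLE_NAME (a bare IDENTIFIER) is tried first, fails on the leading KEYWORD, the position is restored, and DECLARATION is tried next and succeeds.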


Rules And Actions


A recursive descent parser, like the one used in PGE, is a top-down parser. This means it attempts to start at the highest-level definition and work its way down to match the given input. The top-level rule is always called TOP. After that, we can create as many other rules as we need in order to specify our entire grammar. Let's start by creating a very simple recursive descent parser to try to parse the input that we gave earlier:

    int x = add_both(5, 4);

Here is part of our basic parser:

    rule TOP {
        <assignment> | <function_call> | <declaration>
    }

    rule assignment {
        <lvalue> '=' <rvalue>
    }

    rule lvalue {
        <variable_name> | <variable_declaration>
    }

    rule rvalue {
        <expression> | <function_call>
    }

    rule function_call {
        <identifier> '(' <arguments> ')'
    }

    rule variable_declaration {
        'int' <identifier>
    }

This is only part of the parser, but the idea should be clear. We define each rule in terms of constants or other rules. The angle brackets "<" and ">" indicate that we need to match a subrule. Single quotes indicate a literal string constant.

Basic Rules
Rules have a couple basic operators that we can use, some of which have already been discussed. People who are familiar with regular expressions will recognize most of them.


*  ("zero or more of")
    Example: <number>* '.' <number>
    Accepts a string with zero or more numbers, followed by a period, followed by one number. Example: 1234.5

+  ("one or more of")
    Example: <letter>+ <number>
    One or more letters, followed by a number. Examples: abcde5, ghij6

?  ("one or zero")
    Example: <number>? '.' <number>+
    An optional number followed by a period and a string of one or more numbers. Examples: 1.234 or .234

[ ]  (grouping)
    Example: <letter> [<letter> | <number>]*
    A letter followed by any number of letters or numbers. Examples: a123, ident, wiki2txt

We have already discussed the or operator "|". Here are some examples.

Decimal numbers:

    rule decimal_numbers {
        <digit>* '.' <digit>+ | <digit>+ '.' <digit>*
    }

Function parameters:

    rule function_parameters {
        '(' [ <identifier> [ ',' <identifier> ]* ]? ')'
    }
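The decimal_numbers rule has the same shape as a classic regular expression. As a cross-check, here is an equivalent pattern written with Python's re module. This is an analogy only; PGE rules are not Python regexes, and the anchoring with \A and \Z is an assumption made to test whole strings:

```python
import re

# Equivalent of:  <digit>* '.' <digit>+  |  <digit>+ '.' <digit>*
decimal = re.compile(r"\A(?:\d*\.\d+|\d+\.\d*)\Z")

# The rule accepts "1.234", ".234", and "12." but rejects "12" and ".".
matches = [bool(decimal.match(s))
           for s in ["1.234", ".234", "12.", "12", "."]]
# matches is [True, True, True, False, False]
```

Note how the alternation mirrors the rule exactly: at least one side of the period must carry one or more digits, so a lone "." fails both branches.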

Actions
As it successfully matches rules, PGE creates a special "match object", which contains information about the match. This match object can be sent to a function written in PIR or NQP for processing. The processing function that receives the match object is called the action. Each rule in the grammar has an associated action function with the same name. When we have completed a successful match, we need to send the match object to the action method. We do this with the special symbol {*}. {*} sends the current match object, not necessarily the complete one. You can use {*} multiple times in a rule to invoke the action method multiple times, if needed. Here is an example:

    rule k_r {
        {*}              # Calls the action method with an empty match object
        'hello' {*}      # Calls the action method after matching 'hello'
        'world' {*}      # Calls the action method after matching 'hello' 'world'
    }

There are two important points to remember about action methods:

1. The parser moves from left to right, top to bottom. In the k_r example above, if the input were "hello johnny", the action method would be called the first two times, but the rule would fail to match the word "world" and the action method would not be called the third time.

2. The parser returns after a successful match, even if there are more possibilities to try. Consider the example below, where only one of the action methods can be called, depending on the result of the alternation. If the input is "hello franky", the action method only gets called for the {*} after 'franky'. After that matches, the parser returns and does not try to match 'johnny'.

    rule say_hello {
        'hello'
        [
        | 'tommy'  {*}
        | 'franky' {*}
        | 'johnny' {*}
        ]
    }

When there is a list of possibilities, it can be very helpful to know which action method was called, because we may want to treat certain matches differently from others. To distinguish matches, we use a special comment syntax:

    rule say_hello {
        'hello'
        [
        | 'tommy'  {*}    #= tommy
        | 'franky' {*}    #= franky
        | 'johnny' {*}    #= johnny
        ]
    }

The special "#=" symbol is not a regular comment. Instead, it supplies a second parameter for the action method, called a "key". If you use "#=", the action method will receive two arguments: the match object and the key. You can then test the value of the key to determine how to treat the match object. We will discuss this more in the next chapter, about NQP.


Resources
Synopsis 5: Regexes and Rules [1]
Synopsis 6: Subroutines [2]

References
[1] http:/ / perlcabal. org/ syn/ S05. html [2] http:/ / perlcabal. org/ syn/ S06. html


Not Quite Perl


Not Quite Perl
Note: The source for the NQP compiler on Parrot can be found in the compilers/nqp directory in the Parrot repository. Not Quite Perl (NQP) is an implementation of a subset of the Perl 6 language which was originally intended to help bootstrap the implementation of Perl 6. In other words, the Perl 6 developers are writing the Perl 6 compiler in a subset of the Perl 6 language itself. This bootstrapping was accomplished by first writing a small NQP compiler using PIR. After the NQP compiler was completed, programs could then be written in NQP instead of having to write them entirely in PIR. NQP is not just a tool reserved for use with Perl 6, however. Other languages are using NQP as a light-weight implementation language. A major benefit to NQP is that it does not rely on any external code libraries which would be subject to change over time. Because of its small footprint, however, NQP tends to lack many features of higher-level programming languages, and learning to program without using some common constructs can be challenging at first.

Variables in NQP
Here we are going to discuss some of the basics of NQP programming. Experienced Perl programmers, even programmers who are familiar with Perl 5 but not necessarily Perl 6, will find most of this to be a simple review. NQP is not Perl 5 or Perl 6; this point cannot be stressed enough. There are a lot of features from Perl that are missing in NQP. Sometimes this means you need to do some tasks the hard way.

In NQP, we use the := operator, which is called the bind operator. Unlike normal variable assignment, bind does not copy the value from one "container" to another. Instead, it creates a link between the two variables, and they are, from that point forward, aliases of the same container. This is similar to the way copying a pointer in C does not copy the data being pointed to.

Variables in NQP typically have one of three basic types: scalars, arrays, and hashes. Scalars are single values, like an integer, a floating-point number, or a string. Arrays are lists of scalars that are accessed with an integer index. A hash is a list of scalars that uses a string, called a key, for indexing. All variable names have a sigil in front of them. A sigil is a punctuation symbol like "$", "@", or "%" that tells the type of the variable.

Scalar variables have a "$" sigil. The following are examples of scalar values:

    $x := 5;
    $mystring := "string";
    $pi := 3.1415;

Arrays use the "@" sigil. We can use arrays like this:

    @myarray[1] := 5;
    @b[2] := @a[3];

Notice that NQP does not have a list context like Perl 6 has. This means you can't do a list assignment, like:

    @b := (1, 2, 3);    # WRONG!
    $b := (1, 2, 3);    # CORRECT

NQP is designed to be bare-bones: as little as is needed to support development of Perl 6. The assignment above could also be written:

    @b[0] := 1;
    @b[1] := 2;
    @b[2] := 3;

We'll discuss this in more detail a little further down the page. Hashes are prefixed with the "%" sigil:

    %myhash{'mykey'} := 7;
    %mathconstants{'pi'} := 3.1415;
    %mathconstants{'2pi'} := 2 * %mathconstants{'pi'};

Hashes, for people who aren't familiar with Perl, are also known as dictionaries (in Python) or associative arrays. Basically, they are like arrays, but with string indices instead of integer indices.
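The aliasing behavior of the bind operator described above can be illustrated by analogy in Python, where assigning a list binds a second name to the same object rather than copying it. This is an analogy only; NQP's := is its own operator, not Python assignment:

```python
# Binding-style aliasing: b and a refer to the same list object,
# much like two NQP variables bound to one container.
a = [1, 2, 3]
b = a              # analogous to NQP's :=  (b aliases a's container)
b[0] = 99          # mutating through b is visible through a

# Copying, by contrast, produces an independent container.
c = list(a)        # a shallow copy of a
c[1] = -1          # does not affect a

# a is now [99, 2, 3]: it saw b's change but not c's.
```

This is the same distinction the text draws with C pointers: copying the pointer (binding) shares the data, while copying the data produces an independent value.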


Where's My List Context?


As we mentioned before, there is no such thing in NQP as "array context", which a Perl 5 programmer might have expected. One of the big features of the Perl language is that it is context-aware, and it treats things differently depending on whether you are in scalar or array context. Without this, it really isn't Perl. That's why they call it NQP: it's Perl-ish, but not quite Perl. In NQP you cannot write either of the following:

    @a := (1, 2, 3);                  # Wrong!
    %b := ("a" => "b", "c" => "d");   # Wrong!

Lexical And Global Variables


All variables (hashes, scalars, and arrays) can be declared to be lexical with the keyword "my", or global with the keyword "our". For those readers who have read the sections on PIR, "my" variables correspond to the .lex directive and the store_lex and find_lex instructions; "our" variables correspond to the set_global and find_global instructions. Here's an example. This NQP code:

    my $x;
    my @y;
    my %z;

translates (roughly) into this PIR code:

    set_lex "$x", ""
    $P1 = new 'ResizablePMCArray'
    set_lex "@y", $P1
    $P2 = new 'Hash'
    set_lex "%z", $P2

Likewise, for "our", this NQP code:

    our $x;
    our @y;
    our %z;

translates (roughly) into this PIR code:

    set_global "$x", ""
    $P1 = new 'ResizablePMCArray'
    set_global "@y", $P1
    $P2 = new 'Hash'
    set_global "%z", $P2


NQP Control Constructs


NQP has all the high-level control constructs that are missing in PIR. We have loops and If/Then/Else branches in a way that PIR does not have. Because this is a Perl-like language, the loops that NQP does have are varied and relatively high-level.

Branching Constructs
In terms of branches, we have If/Then/Else:

    if ($key eq 'foo') {
        # THEN DO SOME FOO STUFF
    }
    elsif ($key eq 'bar') {
        # THEN DO THE BAR-RELATED STUFF
    }
    else {
        # OTHERWISE DO THIS
    }

NQP also supports an Unless/Then/Else form, which works like if with the test reversed.

Looping Constructs
For A "For" loop iterates over a list and sets $_ to the current index, as in perl5. There's no c-style loop with STARTING_POINT and STEP_ACTION in NQP, although there is a similar construct in both Perl 5 and Perl 6. Here is a basic for loop: for (1,2,3) { Do something with $_ } Translated exactly into this PIR code: .sub 'for_statement' .param pmc match .local pmc block, past $P0 = match['EXPR'] $P0 = $P0.'item'() $P1 = match['block'] block = $P1.'item'() block.'blocktype'('sub') .local pmc params, topic_var params = block[0] $P3 = get_hll_global ['PAST'], 'Var' topic_var = $P3.'new'('name'=>'$_', 'scope'=>'parameter') params.'push'(topic_var) block.'symbol'('$_', 'scope'=>'lexical') $P2 = get_hll_global ['PAST'], 'Op'

Not Quite Perl $S1 = match['sym'] past = $P2.'new'($P0, block, 'pasttype'=>$S1, 'node'=>match) match.'result_object'(past) .end You can also iterate over the keys of a hash like so: for (keys %your_hash) { DO SOMETHING WITH %your_hash{$_} } where keys %your_hash creates a list of all of the keys in %your_hash, and iterates through this list setting $_ to hold the current key. While "While" loops are similar to for loops. In NQP, a while loop looks like this: while(EXIT_CONDITION) { LOOP_CONTENTS } Which roughly becomes in PIR: loop_top: if(!EXIT_CONDITION) goto loop_end LOOP_CONTENTS goto loop_top loop_end: Do/While A "do/while" loop is similar to a while loop except that the condition is tested at the end of the loop and not at the beginning. This means that the loop is always executed at least once, and possibly more times if the condition is not satisfied. In NQP: do { LOOP_CONTENTS } while(EXIT_CONDITION); In PIR: loop_top: LOOP_CONTENTS if(!EXIT_CONDITION) goto loop_end goto loop_top loop_end:
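The do/while shape shown above, with the test at the bottom so the body always runs at least once, can be emulated in languages that lack a do/while construct. This Python sketch (invented for illustration) mirrors the PIR translation directly:

```python
# Emulating:  do { LOOP_CONTENTS } while (runs < 3);
# The body executes once before the condition is ever checked,
# exactly like the loop_top/loop_end PIR shape above.
runs = 0
condition_checks = 0
while True:
    runs += 1                  # LOOP_CONTENTS
    condition_checks += 1
    if not (runs < 3):         # test at the bottom of the loop
        break
# runs is 3: the body ran on the pass where the condition finally failed
```

Contrast this with a plain while loop, where a false condition on entry would mean the body never runs at all; here the body is guaranteed one execution regardless of the condition.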

56

Not Quite Perl

57

Operators
NQP supports a small set of operators for manipulating variables.
Operator +, *, / % $( ... ) @( ... ) %( ... ) ~ eq ne := >, <, >=, <=, ==, != Purpose Scalar addition and subtraction Scalar multiplication and division integer modulus Convert the argument into a scalar Treat the argument as an array Treat the argument as a hash String concatenation String equality comparison String inequality comparison binding Equality and inequality operators

The Match Object, Defaults, and Hashes


When a grammar rule matches and the {*} rule is performed, a special type of hash object called the match object is generated and passed to the associated NQP method. This match object is given the special name $/. You can name it something different if you like, but you would lose a lot of the power that makes the $/ variable so special. Ordinarily when you reference an object in a hash, you would use { } curly brackets. For example: my %hash; %hash{'key'} = "value"; When you want to call a value from a hash reference, you would have to do something even more complex: $hashref->{'key'} = "value"; Note: In NQP (and in Perl 6) angle-brackets magically "auto quote" what's inside them. So you can write <field> instead of {'field'} or <'field'>. However, with the special default match object, you can use < > angle brackets instead. So, instead of writing $/->{'key'} We can write the less-verbose: $<key> The keys of the hash object correspond to the names of the subrules used in the grammar. So, if we had the grammar rule: rule my_rule { <first> <second> <third> <andmore> } Our match object would have the fields:

Not Quite Perl $<first> $<second> $<third> $<andmore> If we have multiples of any one field, such as: rule my_rule { <first> <second> <first> <second> } Now, $<first> and $<second> are both two-item arrays. Also, we can extend this behavior to repetition operators in the grammar: rule my_rule { <first>+ <second>* } Now, both $<first> and $<second> are arrays whose length indicate how many items were matched by each. You can use the + operator or the scalar() function to get the number of items matched.

58

Examples
Example: Word Detection
We want to make a simple parser that detects the words "Hello" or "Goodbye". If either of these words are entered, we want to print out a success message and the word. If neither word was entered, we print an error. To pick out words in our input, we will use the built-in subrule <ident>. rule TOP { <ident> $ {*} } In this grammar rule we are looking for a single identifier (which will be a word, for our purposes), followed by the end of the file. Once we have these, we create our match object and we call our Action method: method TOP($/) { if($<ident> eq "Hello") { say("success! we found Hello"); } elsif($<ident> eq "Goodbye") { say("success! we found Goodbye"); } else { say("failure, we found: " ~ $<ident>); } make PAST::Stmts.new(); }

Not Quite Perl Since the HLLCompiler class expects our action method to return a PAST node, we must create and return an empty stmts node. When we run this parser on input it will have three possible outcomes: 1. We've received a "Hello" or a "Goodbye", and the system will print a success method. 2. We've received a different word, and we will receive an error message. 3. We've received too many words, not enough words, or something that isn't a word. This will cause a parse error. Try it!

59

Example: Oct2Bin
Here is a simple example that shows how to make a program to convert octal numbers into binary. We start with a basic language shell from mk_language_shell.pl Grammar File: grammar Oct2Bin::Grammar is PCT::Grammar; rule TOP { <octdigit>+ [ $ || <panic: Syntax error> ] {*} } token octdigit {'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'} Action File: class Oct2Bin::Grammar::Actions; method TOP($/) { my @table; @table[0] := '000'; @table[1] := '001'; @table[2] := '010'; @table[3] := '011'; @table[4] := '100'; @table[5] := '101'; @table[6] := '110'; @table[7] := '111'; my $string := ""; for $<octdigit> { $string := $string ~ @table[$_]; } say( $string ); make PAST::Stmts.new( ); } Notice how in our actions file we had to instantiate the look-up table one element at a time? this is because NQP does not have a complete understanding of arrays. Notice also that we have our TOP method return an empty PAST::Stmts node, to suppress warnings from PCT that there are no PAST nodes.

Not Quite Perl

60

PIR Actions
NQP isn't the only way for writing action methods to accompany a grammar. It's an attractive tool for a number of reasons, but it isn't the only option. Action methods can also be written in PIR or PASM. This is how the NQP compiler itself is implemented. Here is an example of how a PIR action might look: .sub 'block' :method .param pmc match .param string key .local pmc past $P0 = get_hll_global ['PAST'], 'Stmts' past = $P0.'new'('node' => match) ... match.'result_object'(past) # make $past; .end

Optables and Expressions


Optables and Expressions
Top-down parsing has some problems. First, it's slow because it may need to try many alternatives before it finds a match. Each time it fails on a match, it needs to return to the previous rule and try again. Another problem is that parse trees can become very large for statements that have lots of little tokens, operator precedence, and many subrules. Consider a simple mathematical expression "1 * 2 + 5" and the following grammar: EXPRESSION := TERM ADD_OP TERM TERM := (NUMBER MUL_OP NUMBER) | NUMBER ADD_OP := '+' | '-' MUL_OP := '*' | '/' Our equation above creates a very complex parse tree: EXPRESSION TERM NUMBER 2 MUL_OP "*" NUMBER 5 ADD_OP "+" TERM NUMBER 5 For things like a mathematical expression, we would like to use a bottom-up parser instead of a top-down one. A special type of bottom-up parser, called an operator precidence table parser is particularly efficient and well-suited for these tasks. To create an operator precedence table, PGE has a special tool called an optable.

Optables and Expressions

61

Creating Optables
An optable is created by creating a rule that is optable. Here is an example: rule expression is optable { ... } The ... is not an omission, it's part of the rule. If you don't have the ... three dots, your rule will not work. This is called the yada yada yada operator, and is used to define a function prototype. PGE will instantiate the function for you, using the options you specify. The keyword is optable tells the parser to convert to bottom-up mode to parse expressions

Operator Prototypes
In an optable, tokens are specified using the proto keyword instead of the rule or token keywords. A proto definition must define several things: The name and type of the operator, the precidence of the operator, and the PIR code that is used to perform the particular operation. Here are some examples: proto 'infix:+' is precidence('1') is pirop('add') { ... } proto 'prefix:-' is tighter('infix:+') is pirop('neg') { ... } proto 'postfix:++' is equal('prefix:-') { ... } These examples show how to parse three common operators: A "+" between two values, a negative value for a negative number (such as "-123") and a post-increment operator ("x++"). There are lots of important keywords here, and this isn't even all the keywords you will need to produce a full expression parser. Operator Types There are four types of operator: prefix, postfix, infix, and circumfix. Here are some examples:
Type Prefix postfix infix circumfix Example prefix:postfix:* infix:+ circumfix:() Use -123 x* x+y (x)

postcircumfix postcircumfix:[] x[y] ternary ternary:? : x?y:z

Operator Precedence
Bottom-up parsers, as we mentioned before, suffer from a particular problem called shift-reduce conflicts. A shift-reduce conflict occurs when a parser cannot determine whether to shift in a new input token, or reduce the tokens it already has. Here is an example from the C programming language: -x + 2 * y We know, because we've been taught about operator precedence since we were children, that the expression is grouped like this: (-x) + (2 * y) However, a computer doesn't know this unless we tell it what precedence the operators follow. For instance, without knowing the order in which operators act, a parser might group the expression on a first-come, first-served basis like this:

Optables and Expressions ((-x) + 2) * y Or maybe it would shift in all tokens and start reducing from the right: -(x + (2 * y) The point is, we need to create rules to tell the operator precedence parser how to parse these conflicts. Here are the precedence rules that we can specify:
Code is precidence('1') Effect Sets a basic precidence level. The number in the quotes is arbitrary. The important part is that this is a constant, and that other operators can be defined in relation to this.

62

is equiv('OPERATOR') Here, the word OPERATOR will be replaced by an existing operator, such as is operator('infix:+') This shows that one operator has the same precedence as another operator. All operators with the same precedence are evaluated from left-to-right. is tighter('OPERATOR') is looser('OPERATOR') The operator binds tighter than OPERATOR. In the expression -x + y, we expect the prefix:- to bind tighter than the infix:+. Similar to is tighter(), but the opposite. Shows that the operator binds less tightly than OPERATOR

We didn't show this in the example, but it's also possible to determine whether operators act to the left or to the right. We define this with the is assoc keyword. The following table summarizes these rules:
Associativity is assoc('right') is assoc('left') is assoc('list') is assoc('chain') is assoc('none') Explanation

Operator Actions
There are three ways to specify how an operator behaves. The first is to associate the operator with a PIR opcode. The second is to associate the operator with a PAST node type. The third and final way to specify the actions of an operator is to create a function. The first two methods can be performed with the is pirop() and is pasttype() specifiers, respectively.

is pirop Operators
For simple operations, it doesn't make any sense to create and call a function. This is especially true for simple arithmetic operations which can be performed very quickly using PASM opcodes that Parrot already defines. Use the code is pirop('opname') to implement an operator as a single opcode.

Optables and Expressions

63

is pasttype Operators
Some operators, such as logical operators, can be best implemented using types of nodes that PAST already provides. Use the is pasttype('TYPE') form to specify the type of PAST node to use.

Operators as Functions
By default, unless you specify an operator as is pirop or is pasttype, the operator will default to being a function. Most operators are therefore defaulted to be functions, which is very good for languages which intend to allow operator overloading.

Proto Regexes
We will use the term "Proto", "Proto Regex" or "Protoregex" on occasion in this book, and it is used often in the documentation for Perl 6 as well. Protos are like rules except they offer something similar to multiple dispatch: You can have multiple rules of the same name that depend on the operands passed to it. By declaring an operator as a proto, you are giving it a name and you are also allowing future additions to be made to the language. This is an imporant feature for languages which utilize operator overloading. Protos are defined with the proto keyword. They can be used in either the top-down or bottom-up parser, although because of their most common use as a support mechanism for operator overloading, they are most often used in the bottom-up operator precidence parser. To define a new proto, use the form: proto NAME OPTIONS { ... } In this declaration, the { ... } is a literal piece of code: you actually must type the three dots (ellipsis) between the curly brackets. What this does is alerts the parser that this is not an implementation of the rule itself, it is actually just a signature or a forward-declaration for other functions. For instance, we can define a proto MyProto: proto MyProto { ... } And in another file we can define functions to implement this: .sub MyProto # Insert the parameter list and function logic here .sub .sub MyProto # Another MyProto! # Insert a different parameter list and different function logic here .end

Parsing Terms
Terms are the values between operators. In the following expression: 1 + 2 * 3 The numbers 1, 2, and 3 are all terms. Terms are parsed using the ordinary rules, and are loaded into the operator precedence table using the following syntax: proto 'term:' is precedence('1') is parsed(&term) { ... } Of course, you can set the precedence however you see fit, but it's typically best to make terms parse as tightly as possible. The is parsed() attribute specifies that terms are to be parsed using a rule named "term". You can define this term using ordinary rule syntax. The & symbol is the sigil that NQP uses to refer to a subroutine. In this

Optables and Expressions case, the subroutine in question is the grammar method that is intended to parse terms (but it can conceivably be anything else). We will learn more about NQP and sigils in the following chapter. Terms are the way the parser switches back from bottom-up to top-down parsing.

64

Expression and Term Methods


Once we've created our parser, we need to create methods associated with them to construct the necessary PAST nodes for the parse tree. Lucky for us, these methods are easy to create, and typically the best option is to use a basic template from another language implementation. Terms are defined like any other rule, and don't require any special syntax. Expressions, however, do require a little bit of special attention.

Advanced PGE
Advanced PGE
We've already looked at some of the basics of parser constructing using PGE and NQP. In this chapter we are going to give a more in-depth look at some of the features of the grammar engine that we haven't seen yet. Some of these more advanced features, such as inline PIR code, assertions, function calls and built-in token types will make the life of a compiler designer much easier, but are not needed for most basic tasks.

regex, token and proto


A regex is a high-level matching operation that allows backtracking. A token is a low-level matching operation that does not allow backtracking. A proto is like a regex but allows multiple dispatch. Think of a proto declaration as being a prototype or signature that several functions can match.

Inline PIR Sections


PIR can be embedded directly into both PGE grammar files and NQP files. This is important to fill in some gaps that NQP cannot handle due to its limitations. It is also helpful to insert some active processing into a grammar sometimes, to be able to direct the parser in a more intelligent way. In NQP, PIR code can be inlined using the PIR statement, followed by a quoted string of PIR code. This quoted string can be in the form of a perl-like "qw< ... >" type of quotation, if you think that looks better. In PGE, inline PIR can be inserted using double-curly-brackets "{{ ... }}". Once in PIR mode, you can access the current match object by calling $Px = find_global "$/" (where $Px is any of the valid PIR registers where x is a number).

Advanced PGE

65

Built-In Token Types


PGE has basic default values of certain rules already defined to help with parsing. However, you can redefine these to be something else, if you don't like the default behavior.

Calling Functions
functions or subroutines are an integral part of modern programming practices. As such, support for them is part of the PAST system, and is relatively easy to implement. We're going to cover a little bit of necessary background information first, and then we will discuss how to put all the pieces together to create a system with usable subroutines.

return Described
In Parrot control flow, especially return operations from subroutines, are implemented as special control exceptions. The reason why it is done as an exception and not as a basic .return() PIR statement is a little bit complicated. Many languages allow for nested lexical scopes, where variables defined in an "inner" scope cannot be seen, accessed, or modified by statements in the "outer" scope. In most compilers, this behavior is enforced by the compiler directly, and is invisible when the code is converted to assembly and machine languages. However PIR is like an assembly language for the Parrot system, and it's not possible to hide things at that level. All local variables are local to the entire subroutine and cannot be localized to a single part of a subroutine. To implement nested scopes, Parrot instead uses nested subroutine

Returns and Return Values


Functions can be made to return a value use the "return" PAST.op type. The return system is based on a control exception. Exceptions, as we've discussed before, move control flow to a specified location called the "exception handler". In terms of a return exception, the handler is the code directly after the original function call. The return values (currently, the return PAST node only allows a single return value) are passed as exception data items and are retrieved by the control exception handler. All of these details are generally hidden from the programmer, and you can treat a return PAST node exactly like you would expect. You pass a return value, if any, to the return PAST node. The current function ends and its scope is destroyed. Control flow returns to the calling function, and the return value from the function is made available.

Advanced PGE

66

Assertions
MetaSyntactic Assertions
You can call a function from within a rule using the <FUNC( )> format.

Non-Capturing Assertions
Use <. > form to create a match object that does not capture its contents.

Indirect Rules
A rule of the form <$ >, which can be a string or some other data, is converted into a regular expression and then run.

Character Classes
Rules of the form <[ ]> contain custom character classes. Rules with <-[ ]> are complimented character classes.

Built-in Assertions
<?before>, <!before> <?after>, <!after> <?same>, <!same> <.ws> <?at()>, <!at()>

Partial Matches
You can specify a partial match, a match which attempts to match as much as possible and never fails, with the <* > form.

Recursive Calls
You can recurse back into subrules of the current match rule using the <~~ > rule.

Resources
Synopsis 5 (rules) [1] Synopsis 6 (Subroutines) [2]

References
[1] http:/ / dev. perl. org/ perl6/ doc/ design/ syn/ S05. html [2] http:/ / dev. perl. org/ perl6/ doc/ design/ syn/ S06. html

Building A Compiler

67

Building A Compiler
HLL Compilers
Now that we know NQP and PGE, we have the tools that we need to start writing a compiler for our favorite language. Let's review the steps to creating a compiler: 1. 2. 3. 4. 5. Write Grammar using PGE Write Actions using NQP Actions create PAST tree PCT converts PAST into PIR Parrot converts PIR into bytecode.

After step 5, if we have done everything correctly, we should have a working compiler. Of course, there is a lot of work to do before we reach step 5. Lucky for us, however, steps 4 and 5 are automated by the build process, so we only need to worry about 1, 2, and 3. In the Tutorials section we are going to have several language-building tutorials that you can follow.

Creating a PAST Tree


PAST nodes are objects that your NQP Action methods must create. PCT automatically inserts these nodes into the tree once you create them. PAST nodes are complex objects with an array and a hash component. The array stores a list of references to the children nodes, and the hash stores information about the node itself. Creating the past tree means that we create nodes in lower-rules, and insert them into the array for higher rules. As we go, we set the necessary options for each rule, so that PCT can generate the appropriate code. There is a reference for the various types of PAST nodes available in the appendix. PAST::Node is the base class from which the other PAST node types are derived. The other types are PAST::Stmts, PAST::Val, PAST::Var, PAST::Op, and PAST::Block. PAST::Op PAST::Op nodes are operations that need to be performed. PAST::Val PAST::VAL nodes are constant values like integers or strings PAST::Var PAST::Var nodes are variables PAST::Block PAST::Block nodes are used for lexically scoping variables and defining subroutines. They keep other nodes together. PAST::Stmts PAST::Stmts nodes are just groups of other nodes, and perform no task besides basic organization. Once you have created the necessary node, you insert it into the parse tree with the keyword make.

Building A Compiler

68

The Match Object


The Match object is a special object that contains a hash. The match object hash contains information about the matched rule. Each element in the hash is named after one of the sub-rules in the rule. For instance, if we have the following rule: rule sentence { <adjective> <noun> <adverb> <verb> } Then the match rule would have entries "adjective", "noun", "adverb", and "verb". The value of the hash for each of these keys are the values that were returned from the subrules using the make command, if any. So, we would have the following fields: $<adjective> $<noun> $<adverb> $<verb> We would then use the values of each of these to produce and make a node for the sentence rule. If each of these fields return PAST nodes of their own, we should push those

Examples
We are going to show some very small examples here to demonstrate the basic compiler construction method. More advanced tutorials are provided in the Tutorials section.

Example: Basic Calculator


Problem We want to create a basic calculator program that can perform addition and subtraction on integers. The calculator should print the result to the screen. Solution We start by running mk_language_shell.pl to create a basic language framework. This also produces a builtin function called say. We are going to use the say function to print results to the screen. We are going to use a basic top-down parser for this, we are not going to use an optable. We create a basic grammar for our calculator: rule TOP { <expression> {*} } rule expression { <term> <operation> <term> {*} } token operation { '+' | '-' } token term { \d+ }

Building A Compiler We now need to create two actions, one for TOP and the other for expression. The TOP method should take the value of the parsed expression and pass that to the say function. The expression function should generate the necessary PIR operations, and insert the term values into these operations. Here is a basic actions file to do this: method TOP($/) { my $past := PAST::Op.new(:pasttype('inline')); my $expr := $( $<expression> ); $past.inline('say(%0)'); $past.unshift($expr); make $past; } method expression($/) { my $left := $( $<term>[0] ); my $right := $( $<term>[1] ); my $op := $( $<operation> ); my $pirop := "sub_n"; if $op eq "+" { my $pirop := "add_n"; } my $past := PAST::Op.new(:pasttype('pirop'), :pirop($pirop)); $past.unshift($left); $past.unshift($right); make $past; }

69

HLL Interoperation

70

HLL Interoperation
HLL Interoperability
Parrot was designed not just for Perl6, even though that was one of the bigger driving forces initially. Parrot is being designed and implemented to be a virtual machine for all dynamic programming languages (and even a few statically-typed ones too). The ultimate goal is to be able to combine tools and libraries written in various languages together, and allow developers to write different parts of a project in the language that makes the most sense to that project. Parrot makes interoperability easy, but it's ultimately the responsibility of the language designers to make sure that their languages play nicely with others.

Sharing Data
Vtables: Standard Interfaces
All PMC objects implement the standard vtable functions, and these are to be the primary interface for dealing with foreign or exotic data types. If you receive a Ratio object from common LISP, you might not know what it's methods are, or what it's internal storage structure is when you use it from Perl 6. However, by calling the standard vtable methods, you can interact with it in an easy way.

Sharing Functions
Multi Methods
Like many modern programming languages, Parrot allows function name overloading. Called multiple method dispatch (MMD), Parrot's function call system is very powerful and flexible. In the MMD system, multiple functions in a single namespace can have the same name, so long as they have different call signatures. A call signature specifies the number and type of parameters and return values that the function expects. If we call a function: (a, b, c, d) = Foo(x, y, z) Parrot only calls a function Foo if it has three inputs and four outputs. Multiple dispatch isn't just used for functions, it's also used for opcodes. A program (or, more likely an HLL) can add new opcodes with the same name as an existing opcode, so long as it uses a different function signature. This allows a very powerful and flexible way to customize your system. We will talk more about MMD in a later chapter.

HLL Interoperation

71

Translation Interfaces
The idea of including some sort of translation interface, functions that would automatically convert calls from one HLL to appropriate calls to another, has been attractive to some developers. However, there is a large associated development cost because for n languages we would require interfaces to translate to and from them all. Because of this additional complexity cost, it is not recommended to handle data or function sharing using interfaces.

Documentation
The most important thing a language developer can do is document their interfaces. It is important to document what data types a function expects and what data types it returns. This way, a person using these libraries from a different languages know how to interact with these functions and objects. Plus, documentation may expose some functions or objects as being overly difficult for reuse. This will help deter developers from using a complicated interface. It will also help to expose to the library designers ways to simplify their library.

72

Parrot Hacking
Parrot Internals
Parrot Development Process
The Parrot development project is a large and complex project with multiple facets. Here is an overview of some key points about the Parrot build process. Some of the points here have not been discussed before, but we will covert them in this or later chapters: The build environment is configured using the Configure.pl program. This program is written, like many of the build tools for Parrot, in Perl 5. Configure.pl determines options on your system including which compiler you are using, which Make program (if any) you are using, what platform-specific libraries are required, etc. PMCs are written in a C-like script which is compiled into C code using the PMC Compiler. The PMC Compiler will produce C code and associated header files for all PMCs, and will register the PMCs into the Parrot PMC table. Opcodes are written in a C-like script which is compiled into C, just like PMCs are. The syntax of Opcode files is similar in some respects to that used for PMCs, but is different in many ways too. Ops files are converted into C before being compiled into machine code. Native Call Interface (NCI) function signatures must be converted into C functions prior to compilation using the NCI compiler Just-In-Time operations must be converted into C code for compilation into native code. The parsers for PASM and PIR are written in Lex/Bison. These need to be compiled into C files for compilation. The constant string converter converts CONST_STRING declarations into string constants at compile time. This saves a lot of time at execution. The Makefile automates the build process by compiling all the PMCs, Compiling all the C files, building the executables and libraries, etc. In this chapter we are going to give an overview of some of the components of the Parrot Virtual Machine, later chapters will discuss the various Parrot subsystems including many of the processes that we've described above. 
The chapters in this section are all going to discuss Parrot hacking and development. If you aren't interesting in helping with Parrot development, you can skip these chapters.

Parrot Repository
Here is the general structure of the Parrot Repository, as far as source code is concerned:

Major Parrot Components


PASM and PIR Parsers
There are two parsers for PIR available. The first is IMCC, which is used currently but is inefficient, and the other is PIRC which is more efficient but not stable yet. The long-term plan is for PIRC to become the predominant PIR parser by the time the 1.0 version of Parrot is released. Both IMCC and PIRC are written in the C programming language with parsers written in Lex and Yacc. PIRC and IMCC act as front-ends to two other Parrot components: the bytecode compiler and the interpreter.

Parrot Internals

73

Bytecode Compiler and Optimizer


The bytecode compiler is the portion of Parrot which is responsible for converting input symbols (in the form of PASM or PIR) into Parrot bytecode. This bytecode, once compiled, can be run on Parrot quickly and efficiently. Another related Parrot component is the bytecode optimizer which is responsible for low-level optimizations of Parrot bytecode.

Interpreter
While the bytecode compiler takes input symbols from PIRC or IMCC and converts them into a bytecode for storage and later execution, the interpreter uses these symbols to execute the program directly. This means that there is no intermediate step of compilation, and a script can be execute quickly without having to be compiled.

Subsystems
I/O Subsystem
The I/O subsystem controls reading and writing operations to the console, to files, and to the operating system. Much of this functionality is being performed in special PMCs.

Regular Expression Engine


The regular expression engine is used to provide fast regular expressions for Parrot programs. The functionality of this engine is most obviously expressed in PGE, but is also available in other places as well. Perl 6 regular expressions, on which this engine is based, differ significantly from Perl 5 regular expressions and their variants.

Garbage Collector and Memory Management


The memory management subsystem is designed to allocate and organize memory for use with Parrot and programs which run on top of Parrot. The garbage collector detects when allocated memory is no longer being used and returns that memory to the pool for later allocation.

APIs
Embedding API
Parrot is not just an executable program, it's also a linkable library called libparrot. libparrot can be linked to other programs, and a Parrot interpreter object can be called from inside that program. An entire embedding API has been created to allow libparrot to communicate with other programs.

Extensions API
Parrot can be extended by using dynamic libraries, such as linux .so files, or Windows .dll files. These extensions must interact with Parrot in a safe and controlled way. For this, the Extensions API was written to given extensions a communications channel into the heart of Parrot.

Parrot Hacking
The next several chapters are going to look at the individual components of Parrot. We will discuss the software architectures and operations of each component. As we have already seen, Parrot itself is written using the C programming language, although individual components (such as the opcodes, PMCs, and other features) are written in special domain-specific languages and later translated into C code. Some higher-level functionality, such as PCT

Parrot Internals is writtin in PASM and PIR too. Parsers for PIR are written using a combination of Lex and Yacc. Programming for Parrot is typically going to require a good knowledge of the C programming language, but also a good understanding of Perl 5. this is because Perl 5 is used to write all the development tools which control the build process for Parrot.


Resources
http://www.parrotcode.org/docs/pdd/pdd01_overview.html

IMCC and PIRC


IMCC and PIRC are two separate parsers for PIR and PASM. Both are written using Lex and Yacc.

IMCC
IMCC, the Intermediate Code Compiler, is the current front-end for Parrot that reads PASM and PIR code. It's a relatively old subsystem, and has proven difficult to extend and maintain. IMCC will likely remain the official parser for PIR and PASM until the Parrot 1.0 release. However, the plan for the future is to move to a more extensible and more manageable parser, such as PIRC. IMCC is written using a combination of a Lex tokenizer and a Yacc parser. It includes a number of components, such as register allocators and code optimizers.

PIRC
PIRC is, in theory anyway, a better alternative for parsing PIR and PASM than IMCC is. However, PIRC is not currently complete, and does not yet offer the feature set of IMCC. There have been two different implementations of PIRC: one is a hand-written recursive descent parser, and the other is based on a multi-stage Lex/Yacc parser. The old hand-written recursive descent version is obsolete and no longer maintained, so we will only talk about the new implementation here. PIRC is divided into three different parsers:

1. A parser for "heredoc" constructs, strings which are embedded directly into the file.
2. A macro parser and text replacer: a preprocessor for handling macros and constants.
3. A parser for the rest of PIR and PASM.

Breaking things up like this keeps each parser much simpler, and reduces the number of states and conditionals that a single large parser would require.

PIRC Current Status


PIRC is intended to replace IMCC as the primary front-end to Parrot. However, this changeover might not be made until after the 1.0 release of Parrot. In other words, it's an important development, but the Parrot developers aren't going to wait around for it.

Run Core


We've discussed run cores earlier, but in this chapter we are going to get into a much deeper discussion of them. Here, we are going to talk about opcodes, and the special opcode compiler that converts them into standard C code. We will also look at how these opcodes are translated by the opcode compiler into different forms, and we will see the different runcores that perform these opcodes.

Opcodes
Opcodes are written using a special syntax which is a mix of C and dedicated keywords. Opcodes are converted by the opcode compiler, tools/dev/ops2c.pl, into the formats necessary for the different run cores. The core opcodes for Parrot are all defined in src/ops/, in files with a *.ops extension. Opcodes are divided into different files, depending on their purpose:

  Ops file           Purpose
  bit.ops            bitwise logical operations
  cmp.ops            comparison operations
  core.ops           basic Parrot operations: private internal operations, control flow, concurrency, events and exceptions
  debug.ops          ops for debugging Parrot and HLL programs
  experimental.ops   ops which are being tested and which might not be stable; do not rely on these ops
  io.ops             ops to handle input and output to files and the terminal
  math.ops           mathematical operations
  object.ops         ops to deal with object-oriented details
  obscure.ops        ops for obscure and specialized trigonometric functions
  pic.ops            private opcodes for the polymorphic inline cache; do not use these
  pmc.ops            opcodes for creating and dealing with PMCs, including common operations for array-like PMCs (push, pop, shift, unshift) and hash-like PMCs
  set.ops            ops to set and load registers
  stm.ops            ops for software transactional memory, the inter-thread communication system for Parrot; in practice these ops are not used directly (use the STMRef and STMVar PMCs instead)
  string.ops         ops for working with strings
  sys.ops            operations to interact with the underlying system
  var.ops            ops to deal with lexical and global variables



Writing Opcodes
Ops are defined with the op keyword, and their bodies work similarly to C source code. Here is an example:

    op my_op () {
    }

Alternatively, we can use the inline keyword as well:

    inline op my_op () {
    }

We define the input and output parameters using the keywords in and out, followed by the type of the parameter. If an input parameter is used but not altered, you can declare it as inconst. The types can be PMC, STR (strings), NUM (floating-point values) or INT (integers). Here is an example function prototype:

    op my_op(out NUM, in STR, in PMC, in INT) {
    }

That op takes a string, a PMC, and an integer, and returns a number. Notice how the parameters do not have names. Instead, they correspond to numbers:

    op my_op(out NUM, in STR, in PMC, in INT)
                 ^       ^       ^       ^
                 |       |       |       |
                 $1      $2      $3      $4

Here's another example, an operation that takes three integer inputs, adds them together, and returns an integer sum:

    op sum(out INT, in INT, in INT, in INT) {
        $1 = $2 + $3 + $4;
    }

NUMs are converted into ordinary floating-point values, so they can be passed directly to functions that require floats or doubles. Likewise, INTs are just basic integer values, and can be treated as such. PMCs and STRINGs, however, are complex values. You can't pass a Parrot STRING to a library function that requires a null-terminated C string. The following is bad:

    #include <string.h>

    op my_str_length(out INT, in STR) {
        $1 = strlen($2);    /* WRONG! */
    }

Advanced Parameters
When we talked about the types of parameters above, we weren't being entirely complete. Here is the full list of direction qualifiers that you can use in your op:



  direction   meaning                                            example
  in          the parameter is an input                          op my_op(in INT)
  out         the parameter is an output                         op pi(out NUM) { $1 = 3.14; }
  inout       the parameter is both an input and an output       op increment(inout INT) { $1 = $1 + 1; }
  inconst     the input parameter is constant; not modified      op double_const(out INT, inconst INT) { $1 = $2 + $2; }
  invar       the input parameter is a variable, like a PMC      op my_op(invar PMC)

For an inconst parameter, the corresponding argument in PIR must be a constant:

    $I0 = double_const 5    # numeric literal "5" is a constant

The type of the argument can also be one of several options:

  type     meaning                       example
  INT      integer value                 42 or $I0
  NUM      floating-point value          3.14 or $N3
  STR      string                        "Hello" or $S4
  PMC      PMC variable                  $P0
  KEY      hash key                      ["name"]
  INTKEY   integer index                 [5]
  LABEL    location in code to jump to   jump_here:

OP naming and function signatures


You can have many ops with the same name, so long as they have different parameters. The two following declarations are okay:

    op my_op (out INT, in INT) {
    }

    op my_op (out NUM, in INT) {
    }

The ops compiler converts these op declarations into C function declarations similar to the following:

    INTVAL op_my_op_i_i(INTVAL param1) {
    }

    NUMBER op_my_op_n_i(INTVAL param1) {
    }

Notice the "_i_i" and "_n_i" suffixes at the end of the function names? This is how Parrot ensures that function names are unique in the system, to prevent compiler problems. It is also an easy way to look at a function signature and see what kinds of operands it takes.


Control Flow
An opcode can determine where control flow moves after it has completed executing. For most opcodes, the default behavior is to move to the next instruction in memory. However, there are many ways to alter control flow, some of them quite new and exotic. Several keywords can be used to obtain the address of an operation; we can then goto that instruction directly, or store the address and jump to it later.
  Keyword      Meaning
  NEXT()       jump to the next opcode in memory
  ADDRESS(a)   jump to the opcode given by a; a is of type opcode_t*
  OFFSET(a)    jump to the opcode at offset a from the current position; a is typically of type LABEL
  POP()        get the address at the top of the control stack (this feature is being deprecated; eventually Parrot will be stackless internally)

The Opcode Compiler


The opcode compiler is located at tools/dev/ops2c.pl, although most of its functionality lives in a number of included libraries, such as Parrot::OpsFile, Parrot::Ops2c::*, and Parrot::OpsTrans::*. We'll look at the different runcores in the section below. Suffice it to say that different runcores require the opcodes to be compiled into different formats for execution. The job of the opcode compiler is therefore relatively complex: it must read in the opcode description files and output syntactically correct C code in several different formats.

Dynops: Dynamic Opcode Libraries


The ops we've been talking about so far are all standard built-in ops. These aren't the only ops available, however: Parrot also allows dynamic op libraries to be loaded at runtime. Dynops are dynamically-loadable op libraries. They are written almost exactly like the regular built-in ops, but they are compiled separately into a library and loaded into Parrot at runtime using the .loadlib directive.

Run Cores
Runcores decode and execute the stream of opcodes in a PBC file. In the simplest case, a runcore is a loop that takes each bytecode value, gathers the parameter data from the PBC stream, and passes control to the opcode routine for execution. There are several different runcores. Some are practical and simple; some use special tricks and compiler features to optimize for speed. Some perform useful ancillary tasks such as debugging and profiling, and some serve no purpose beyond satisfying basic academic interest.



Basic Cores
Slow Core

In the slow core, each opcode is compiled into a separate function. Each opcode function takes two arguments: a pointer to the current opcode, and the Parrot interpreter structure. All arguments to the opcodes are parsed and stored in the interpreter structure for retrieval. This core is, as its name implies, very slow. However, it is conceptually simple and very stable. For this reason, the slow core is used as the base for some of the specialty cores we'll discuss later.

Fast Core

The fast core is exactly like the slow core, except it doesn't do the bounds checking and explicit context updating that the slow core does.

Switched Core

The switched core uses a gigantic C switch { } statement to handle opcode dispatch, instead of individual functions. The benefit is that a function call is not needed for each opcode, which saves on the number of machine-code instructions necessary to execute an opcode.

Native Code Cores


JIT Core

Exec Core

Advanced Cores
The two cores that we're going to discuss next rely on a special feature of some compilers called computed goto. In standard ANSI C, labels are control-flow targets and are not treated like first-class data items. However, compilers that support computed goto allow labels to be treated like pointers: stored in variables and jumped to indirectly.

    void *my_label = &&THE_LABEL;
    goto *my_label;

The computed goto cores compile all the opcodes into a single large function, and each opcode corresponds to a label in that function. These labels are all stored in a large array:

    void *opcode_labels[] = {
        &&opcode1,
        &&opcode2,
        &&opcode3,
        ...
    };

Each opcode value can then be used as an index into this array as follows:

    goto *opcode_labels[current_opcode];

Computed Goto Core

The computed goto core uses the mechanism described above to dispatch the various opcodes. After each opcode is executed, the next opcode in the incoming bytecode stream is looked up in the table and dispatched from there.

Predereferenced Computed Goto Core

In the predereferenced computed goto core, the bytecode stream is preprocessed to convert opcode numbers into their respective label addresses. This means they don't need to be looked up on each dispatch; each opcode can be jumped to directly, as if it were a label. Keep in mind that the dispatch mechanism must run after every opcode, and in large programs there can be millions of opcode executions; even small savings in the number of machine-code instructions between opcodes can make big differences in speed.


Specialty Cores
GC Debug Core

Debugger Core

Profiling Core

Tracing Core

Memory and Garbage Collection


Arenas and Pools
Parrot allocates memory in large blocks called arenas. Each arena has enough space to hold exactly N items of size M, typically tightly packed together in consecutive memory locations. Some collectors may also store additional metadata in the arena. Multiple arenas of the same type are stored as a linked list inside a pool. A pool may contain no arenas, one arena, or multiple arenas; all arenas in a given pool contain objects of the same size, although each arena may hold a different number of objects. The Parrot interpreter structure, which maintains state for the entire virtual machine, contains a pointer to the arena_base structure. The arena base structure contains pointers to all the various pools, and a few other important data items that are used for memory management.

Garbage Collectors
Parrot has a number of garbage collectors available, which can be selected at compile time using compiler directives. The most mature and robust collector at this time is a simple mark & sweep collector, GC_MS. The different collectors can be activated or deactivated prior to compilation in the file include/Parrot/settings.h. That file contains a number of options that you can set to customize the behavior of Parrot.

Mark & Sweep Collector


The Mark & Sweep (MS) collector is the only collector for Parrot that is currently mature and stable enough for regular use. However, this collector is the cause of several performance problems in Parrot, and there are active efforts under way to replace it. The mark & sweep collector will likely be deprecated or removed entirely from Parrot by the 1.0.0 release.



Incremental Tricolor Collector


Development is currently under way on a new garbage collection scheme which uses a tricolor marking algorithm to mark data objects. The tricolor algorithm will enable an incremental behavior in the collector so that the "stop the world" behavior of the MS collector can be avoided for large memory pools.

Writing new Collectors


Writing a new collector is as simple as implementing a particular interface. This interface is defined in detail in PDD 09, but we will cover some of the basics here.

Resources
http://www.parrotcode.org/docs/memory_internals.html http://www.parrotcode.org/docs/pdd/pdd09_gc.html

PMC System
PMCs
We've discussed PMCs already -- in the Parrot Virtual Machine/Polymorphic Containers (PMCs) chapter -including how to define new PMC types using the PMC compiler, and how to use them in PIR programs. This chapter is going to go into more detail about how PMCs are actually used in Parrot, including memory management of PMCs, morphing PMCs, and interfacing with PMCs.

PMC System Overview


The PMC data structure is deceptively simple, and is designed to be extensible enough for general-purpose data and functionality. Here is the definition of the PMC structure and the associated PMC_EXT structure:

    struct PMC {
        Parrot_UInt     flags;
        VTABLE         *vtable;
        DPOINTER       *data;
        struct PMC_EXT *pmc_ext;
    };

    typedef struct PMC_EXT {
        DPOINTER     *data;
        PMC          *_metadata;
        struct _Sync *_synchronize;
        PMC          *_next_for_GC;
    } PMC_EXT;

As we can see from these definitions, the PMC itself is very small. Much of the information about a PMC, including all its various methods and VTABLE interfaces, is stored behind the ->vtable pointer. The VTABLE structure is a very large structure that contains function pointers for all the various VTABLE interfaces.



PMC Data
Every PMC type also contains a pointer to a data structure that's specific to that PMC. These data structures are defined based upon the inheritance hierarchy of the PMC and the various attributes it has been defined with. For instance, this PMC definition:

    pmclass MyPmc {
        ATTR INTVAL    a;
        ATTR FLOATVAL  b;
        ATTR STRING   *c;
        ATTR PMC      *d;
        ...
    }

will turn into this C data structure definition:

    typedef struct Parrot_MyPmc_attributes {
        INTVAL    a;
        FLOATVAL  b;
        STRING   *c;
        PMC      *d;
    } Parrot_MyPmc_attributes;

This structure is stored in the ->data pointer, which should always be accessed using the PMC_data macro. That way, if the PMC structure definition ever changes, all code that uses the macro properly will be updated automatically when the macro is updated. Here is an example of an init VTABLE function that uses these attributes:

    VTABLE void init() {
        Parrot_MyPmc_attributes *p =
            mem_allocate_typed(Parrot_MyPmc_attributes);
        p->a = 0;
        p->b = 0.0;
        p->c = NULL;
        p->d = PMCNULL;
        PMC_data(SELF) = p;
    }

There is another macro, formed from the word PARROT and the name of the PMC in all capital letters, which retrieves the data structure and casts it properly (so your compiler doesn't warn about using pointers without a cast):

    Parrot_MyPmc_attributes *attr = PARROT_MYPMC(SELF);



PObjects
C isn't a class-based (or "object-oriented") language, but many lessons from OO programming methodologies have been adapted for use in Parrot's code base. PMCs, STRINGs, and a few other data types are based off the definition of a "PObj", also known as a "Buffer":

    typedef struct Buffer {
        Parrot_UInt flags;
    } Buffer;

Notice how the fields at the start of the Buffer match the first fields of the PMC? All structures that begin with these same leading fields are said to be "PObject isomorphic". For short, we say that all PObject-isomorphic structures are simply "PObjects", and there are many types of PObjects. The memory management system, for example, can test the flags of any PObject to determine what type of memory object it is. A PMC may optionally contain a PMC_EXT structure, which adds additional functionality. PMC_EXT allows a PMC to be shared between multiple threads, or multiple Parrot interpreters, without introducing data contention. PMC_EXT also allows a PMC to contain a hash of metadata (attribute-value pairs), which are typically added as attributes in PIR.

PMC Management
PMCs are allocated from two special pools: a PMC pool and a constant PMC pool. Constant PMCs are considered immutable and everlasting, and so are never modified nor collected by the garbage collector. STRINGs are likewise allocated from either a string pool or a constant string pool; the same relationship applies, and constant strings are never modified and never collected. PMC_EXT structures are not currently managed by the memory management subsystem. However, since PMC_EXTs are assigned to PMCs in a one-to-one relationship, we always know we can free one when its PMC is freed. In terms of garbage collection, PMCs are one of the only aggregate data types in Parrot: STRINGs do not contain pointers to other data items of interest to the garbage collector. Stack chunks, which are used internally in some structures, are PObjs and are also aggregates, but they are marked separately by the collector and are not treated as aggregates directly.

VTables
VTables provide a standard interface to PMCs of all types. For every PMC there is a series of standard operations that you can perform (or attempt to perform); however, not all PMCs support all VTABLE operations.

VTable Types
VTables are complicated data items that contain, in addition to a large series of function pointers, a number of data items to support PMCs. One of the data items is a class PMC, a PMC that represents a particular PMC class. Another item is an enum that differentiates between all PMC classes.

Resources
http://www.parrotcode.org/docs/vtables.html

String System


Strings Overview

Exception Subsystem
Exception handling has become a staple of most modern programming languages. Parrot, since it intends to host many such languages, must support a robust exception system. Not only does Parrot use exceptions for error handling and recovery, it also encourages the use of control-flow exceptions to implement the high-level control-flow features of those languages. This means the exception subsystem is one of the most important subsystems for language implementers to become familiar with.

Exceptions: The Basics


Exceptions are broken down into two primary parts: the exception object and the exception handler. Exception handlers are like subroutines in many ways, and must be specially registered with Parrot before they can be used. When Parrot detects an error, it creates an exception object containing information about the error, a continuation for the current control-flow state, and a few other pieces of information. Exceptions, like most everything else in Parrot, are PMCs, and can be stored, manipulated, and used like other PMCs. Once the exception object is created, Parrot looks through its list of handlers and passes the exception object to each. Handlers are stored in a stack-like structure, and the most recently registered handler gets the first look at the exception object. A handler can do one of several things. First, it can handle the exception: it fixes the error, calls the return continuation stored in the exception object, and returns control flow to where it was before the error occurred. Second, it can re-throw the exception, passing it to the next handler in the stack. Third, it can ignore the exception. Exceptions which are ignored, or for which no handlers are available, cause Parrot to exit.

IO Subsystem


Resources
http://www.parrotcode.org/docs/pdd/pdd22_io.html

JIT and NCI


JIT, the just-in-time compiler, is a system which compiles Parrot opcodes into native machine code for faster execution. NCI, the native call interface, is a subsystem that allows Parrot to communicate with compiled library functions. These two systems are very important for Parrot and, as we shall see, they are closely related to one another.

NCI

Parrot Embedding
Embedding Parrot
Because the Parrot Virtual Machine is modular, we can link the libparrot library into other executables, creating programs which contain a Parrot interpreter object. One simple use for such a technology would be to create an executable file which contains data in the form of precompiled bytecode and a simple instantiation of the Parrot interpreter to create a standalone executable for a particular programming language. This is called Native Execution, and we will discuss it in more detail below.

Resources
http://www.parrotcode.org/docs/pdd/pdd10_embedding.html http://www.parrotcode.org/docs/native_exec.html

Extensions


Extending Parrot
Parrot is not just an executable VM; it is also a dynamically-linkable library that exports a Parrot API. The API allows add-ons to be developed that extend the functionality of Parrot. There are two basic APIs or, more precisely, a single API that can be divided into two distinct categories: functions that have access to the internals of Parrot, and functions that do not. In general, most extensions will not need deep access to Parrot's internal structures, and most should not rely on them: Parrot's internal structures are subject to change, and relying on the precise layout of one could cause compatibility problems for your extension later on.

Packfiles
Parrot bytecode files are called "packfiles" internally. Routines to get information from and feed information to a packfile are stored in src/packfile.c. Some things that can affect the way a file is stored are:
1. Endianness. Some computers are "little endian" and some are "big endian"; this refers to the order in which the bytes of a multi-byte value are stored. Instead of picking one or the other as a universal default, Parrot writes packfiles in the local byte order and translates foreign packfiles when they are loaded.
2. Value sizes. Things like pointers and INTVALs are different sizes on different computers, so Parrot must translate between 16-bit, 32-bit, and 64-bit values for these and other things. Likewise, FLOATVALs may be 32-bit, 64-bit, or 128-bit, and need to be translated.

Serialization
HLL code is most often used on the computer where it is first compiled. To that end, Parrot is optimized to write packfiles using local settings. If a packfile is read which has been created on some other computer, Parrot must translate it internally so that it can be run on your computer. This translation process adds extra execution overhead, but it only needs to be run once on your computer to get things into the proper local format.

Freeze and Thaw


Serialization occurs through two interfaces: freeze and thaw. Freezing is the process of converting a PMC or other type of data into a format ready for insertion into a packfile. Thawing is the process of reading data out of a packfile and recreating the PMC or other data object. PMCs have optional freeze and thaw vtable methods, although good defaults are available if the PMC does not use any additional storage. If a PMC does use additional storage, especially storage allocated from the system using malloc, it must supply custom freeze/thaw methods to store and retrieve that data.


Appendices
PIR Reference
Resources
http://www.parrotcode.org/docs/pdd/pdd19_pir.html

PASM Reference
Resources
http://docs.parrot.org/parrot/latest/html/docs/pdds/draft/pdd06_pasm.pod.html http://docs.parrot.org/parrot/latest/html/docs/pdds/draft/pdd05_opfunc.pod.html http://docs.parrot.org/parrot/latest/html/ops.html

PAST Node Reference


Resources
http://www.parrotcode.org/docs/pdd/pdd26_ast.html

Languages on Parrot


There are a number of programming languages being implemented on Parrot, some of which are nearing functional completion, some which are still in active development, and some which have been started but are now abandoned. Interested developers may want to help join in the development effort with some of these languages, adopt an abandoned language project, or start a new language project entirely. As of the 1.0.0 release, all language implementations except toy and example languages will be developed and maintained outside of the central Parrot repository. Where available, locations to external project pages will be provided.

Language Projects
Rakudo (Perl 6)
Rakudo is the name of the Perl 6 implementation on Parrot. This is not the only implementation of Perl 6, however. Rakudo development is test-driven: a gigantic suite of tests for the Perl 6 language has been developed over the years, and the progress of the Rakudo interpreter is measured by the number of specification tests, or "spectests", that pass. There is no straightforward way to measure the percentage progress of the project, because the total number of tests changes regularly as well. Rakudo is under active development by several volunteers, and some developers have even received funding to work on Rakudo more regularly.

abc
A basic calculator language.

C99
The implementation of the C programming language, following the C99 specification, has a number of purposes. C is a statically-typed language, so it isn't necessarily the best candidate for implementation on the dynamically-typed Parrot. However, there are a number of benefits to the Parrot project in having a C parser available: the C99 language parser is being used, at least in part, to help automate the process of generating NCI function signatures for new libraries and extensions. This is under active development by volunteers, some of whom have been funded.

Cardinal (Ruby)
An implementation of Ruby

Jako
A language derived from C and Perl

Pipp (PHP)
Pipp is a recursive acronym for "Pipp is Parrot's PHP". This language implementation was previously named "Plumhead", shorthand for "Plum-headed Parakeet". Pipp is maintained on GitHub at pipp [1]. The project seems halted: the website is down and the last commit was on 2009-07-22.



Partcl (TCL)
ParTCL is the TCL compiler for Parrot. The ParTCL project lives at http://code.google.com/p/partcl/

Translator Projects
Projects to translate to or from Parrot Bytecode.

Resources
http://www.parrot.org/languages/

References
[1] http://wiki.github.com/bschmalhofer/pipp

HLLCompiler Class
The HLLCompiler class is used to help instantiate and operate a compiler for a high-level language written for Parrot. HLLCompiler coordinates the execution of the HLL grammar, and controls the conversion of the HLL from PAST to POST, from POST to PIR, and ultimately to Parrot bytecode. This page is going to serve as a brief reference to the HLLCompiler class, and a description of how the class is used to create a compiler for an HLL.

Command Line Options




Resources
http://www.parrotcode.org/docs/running.html

Built-In PMCs
Parrot ships with a number of built-in PMC data types, which means these standard types are always available. This page serves as a reference to those PMC types; we will not attempt to cover the PMC types added specifically by other HLLs, libraries, or programs. (For information on using these PMC types, and on defining new PMC types, see the Parrot Virtual Machine/Polymorphic Containers (PMCs) chapter.) Each entry in this list should (A) contain a link to the relevant PMC documentation, and (B) provide a brief overview of the PMC and its methods.

AddrRegistry
http://docs.parrot.org/parrot/latest/html/src/pmc/addrregistry.pmc.html

Array
http://www.parrotcode.org/docs/pmc/pmc/array.html A simple array class that serves as the base class for the other array PMCs. This type of PMC is rarely used directly; instead, more versatile array PMC types, such as ResizablePMCArray, are used. Array specifies an interface that all other array classes must share, and provides a number of default behaviors that other array-like PMCs may fall back on.

BigInt
http://docs.parrot.org/parrot/latest/html/src/pmc/bignum.pmc.html A PMC type for storing an arbitrarily large number, or a number with arbitrary precision. Not currently implemented.

Boolean
http://docs.parrot.org/parrot/latest/html/src/pmc/boolean.pmc.html A boolean True/False PMC.

Bound_NCI
http://www.parrotcode.org/docs/pmc/pmc/bound_nci.html



Capture
http://docs.parrot.org/parrot/latest/html/src/pmc/capture.pmc.html

Closure
http://www.parrotcode.org/docs/pmc/pmc/closure.html

Compiler
http://www.parrotcode.org/docs/pmc/pmc/compiler.html A Compiler PMC for a particular language. Can be used to convert an HLL into PIR and eventually into Parrot Bytecode.

Complex
http://www.parrotcode.org/docs/pmc/pmc/complex.html A PMC for Complex numbers.

Continuation
http://www.parrotcode.org/docs/pmc/pmc/continuation.html A Continuation PMC allows Parrot to take a snapshot of the current state of the system to return to later.

Coroutine
http://www.parrotcode.org/docs/pmc/pmc/coroutine.html A sub-like PMC that implements a coroutine.

Default
http://www.parrotcode.org/docs/pmc/pmc/default.html

Deleg_PMC
http://www.parrotcode.org/docs/pmc/pmc/deleg_pmc.html

Delegate
http://www.parrotcode.org/docs/pmc/pmc/delegate.html

Enumerate
http://www.parrotcode.org/docs/pmc/pmc/enumerate.html

Env
http://www.parrotcode.org/docs/pmc/pmc/env.html Allows access to the system's environment variables, as a hash.



Eval
http://www.parrotcode.org/docs/pmc/pmc/eval.html

Exception
http://www.parrotcode.org/docs/pmc/pmc/exception.html An Exception PMC holds information about system errors for recovery.

Exception_Handler
http://www.parrotcode.org/docs/pmc/pmc/exception_handler.html A sub-like routine that catches and resolves exceptions

Exporter
http://www.parrotcode.org/docs/pmc/pmc/exporter.html

File
http://docs.parrot.org/parrot/latest/html/src/dynpmc/file.pmc.html A read/write interface for files

FixedBooleanArray
http://www.parrotcode.org/docs/pmc/pmc/fixedbooleanarray.html An array of fixed size of Boolean values.

FixedFloatArray
http://www.parrotcode.org/docs/pmc/pmc/fixedfloatarray.html An array of fixed size for FLOATVAL floating point numbers

FixedPMCArray
http://www.parrotcode.org/docs/pmc/pmc/fixedpmcarray.html An array of fixed size for PMC values

FixedStringArray
http://www.parrotcode.org/docs/pmc/pmc/fixedstringarray.html An array of fixed size for STRING values


Float
http://www.parrotcode.org/docs/pmc/pmc/float.html A floating point number PMC. Used similarly to a FLOATVAL, except has methods and vtable methods. FLOATVALs become Float PMCs when they are promoted to become a PMC.

Hash
http://www.parrotcode.org/docs/pmc/pmc/hash.html A hash, also known as a "dictionary" or an "associative array". Like an array but indexed with strings instead of integers
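A brief PIR sketch of Hash usage (the exact spelling of the new instruction varies across Parrot versions):

```
.sub 'main' :main
    .local pmc h
    h = new 'Hash'
    h['answer'] = 42         # store a value under a string key
    $I0 = h['answer']        # fetch it back as an integer
    say $I0
.end
```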

Integer
http://www.parrotcode.org/docs/pmc/pmc/integer.html A basic integer number PMC. Used similarly to an INTVAL, but with methods and vtable methods. INTVALS become Integer PMCs when they are promoted to become a PMC.

IntList
http://www.parrotcode.org/docs/pmc/pmc/intlist.html A simple list, or array, of integers.

Iterator
http://www.parrotcode.org/docs/pmc/pmc/iterator.html An Iterator PMC provides a stateful counter that enables you to iterate over the items in one of the array classes, one at a time.
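The iter opcode produces an Iterator over an aggregate; a hedged PIR sketch of the usual idiom:

```
.sub 'main' :main
    .local pmc arr, it
    arr = new 'ResizablePMCArray'
    push arr, 10
    push arr, 20
    it = iter arr            # create an Iterator over the array
  loop:
    unless it goto done      # the Iterator is false when exhausted
    $P0 = shift it           # fetch the next element
    say $P0
    goto loop
  done:
.end
```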

Key
http://www.parrotcode.org/docs/pmc/pmc/key.html A value, typically a string, which is used to look up values in a hash.

LexInfo
http://www.parrotcode.org/docs/pmc/pmc/lexinfo.html

LexPad
http://www.parrotcode.org/docs/pmc/pmc/lexpad.html

ManagedStruct
http://www.parrotcode.org/docs/pmc/pmc/managedstruct.html A low-level structure whose memory is allocated and automatically deallocated by Parrot. Extends UnManagedStruct, but adds automatic memory collection.


MultiArray
http://www.parrotcode.org/docs/pmc/pmc/multiarray.html

MultiSub
http://www.parrotcode.org/docs/pmc/pmc/multisub.html A collection of subroutines with the same name. In Multiple Method Dispatch (MMD) the parameters of the function called determine which subroutine from the collection to call.

Namespace
http://www.parrotcode.org/docs/pmc/pmc/namespace.html Implements a Parrot namespace. Contains information about variables, subroutines, coroutines, and MultiSubs that are stored in that namespace.

NCI
http://www.parrotcode.org/docs/pmc/pmc/nci.html A native call function PMC. Stores interface information to a function which has been written in C.

Null
http://www.parrotcode.org/docs/pmc/pmc/null.html A PMC with a null value.

Object
http://www.parrotcode.org/docs/pmc/pmc/object.html

OrderedHash
http://www.parrotcode.org/docs/pmc/pmc/orderedhash.html

OS
http://www.parrotcode.org/docs/pmc/pmc/os.html

Pair
http://www.parrotcode.org/docs/pmc/pmc/pair.html An association of a Key PMC with a PMC value. Hashes are typically implemented as an array of Pair PMCs

ParrotClass
http://www.parrotcode.org/docs/pmc/pmc/parrotclass.html


ParrotInterpreter
http://www.parrotcode.org/docs/pmc/pmc/parrotinterpreter.html An interface to the interpreter structure.

ParrotIO
http://www.parrotcode.org/docs/pmc/pmc/parrotio.html A read/write interface to the console

ParrotLibrary
http://www.parrotcode.org/docs/pmc/pmc/parrotlibrary.html A dynamically-loaded library object.

ParrotObject
http://www.parrotcode.org/docs/pmc/pmc/parrotobject.html

ParrotRunningThread
http://www.parrotcode.org/docs/pmc/pmc/parrotrunningthread.html

ParrotThread
http://www.parrotcode.org/docs/pmc/pmc/parrotthread.html A PMC that stores information about a thread

Pmethod_test
http://www.parrotcode.org/docs/pmc/pmc/pmethod_test.html

Pointer
http://www.parrotcode.org/docs/pmc/pmc/pointer.html

Random
http://www.parrotcode.org/docs/pmc/pmc/random.html

Ref
http://www.parrotcode.org/docs/pmc/pmc/ref.html

ResizableBooleanArray
http://www.parrotcode.org/docs/pmc/pmc/resizablebooleanarray.html A resizable array to store Boolean values


ResizableFloatArray
http://www.parrotcode.org/docs/pmc/pmc/resizablefloatarray.html A resizable array to store floating point values.

ResizableIntegerArray
http://www.parrotcode.org/docs/pmc/pmc/resizableintegerarray.html A resizable array to store integer values

ResizablePMCArray
http://www.parrotcode.org/docs/pmc/pmc/resizablepmcarray.html A resizable array to store PMC values
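A short PIR sketch of growing and shrinking a ResizablePMCArray (a minimal example, assuming a recent-enough Parrot):

```
.sub 'main' :main
    .local pmc arr
    arr = new 'ResizablePMCArray'
    push arr, 1              # grows the array as needed
    push arr, 2
    $I0 = elements arr       # number of elements, here 2
    $P0 = pop arr            # removes and returns the last element
    say $P0
.end
```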

ResizableStringArray
http://www.parrotcode.org/docs/pmc/pmc/resizablestringarray.html A resizable array to store Strings

RetContinuation
http://www.parrotcode.org/docs/pmc/pmc/retcontinuation.html A return continuation. Like a regular Continuation PMC, but can only be used once. Can be promoted to a Continuation using the Clone vtable method.

Role
http://www.parrotcode.org/docs/pmc/pmc/role.html An abstract role, or interface, for a class. Specifies actions and properties of a class, but cannot be instantiated

SArray
http://www.parrotcode.org/docs/pmc/pmc/sarray.html

SharedRef
http://www.parrotcode.org/docs/pmc/pmc/sharedref.html

Slice
http://www.parrotcode.org/docs/pmc/pmc/slice.html


SMOP_Attribute
http://www.parrotcode.org/docs/pmc/pmc/smop_attribute.html

SMOP_Class
http://www.parrotcode.org/docs/pmc/pmc/smop_class.html

STMLog
http://www.parrotcode.org/docs/pmc/pmc/stmlog.html

STMRef
http://www.parrotcode.org/docs/pmc/pmc/stmref.html

STMVar
http://www.parrotcode.org/docs/pmc/pmc/stmvar.html

String
http://www.parrotcode.org/docs/pmc/pmc/string.html A PMC to contain a string value. Like a STRING value, but has methods and vtable methods. STRINGS become String PMCs when they are promoted to PMCs.

Sub
http://www.parrotcode.org/docs/pmc/pmc/sub.html A Parrot subroutine. Implements a basic subroutine (using the sub command in PIR), but also serves as a base class for more intricate sub-like classes
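In PIR, a Sub PMC is created with the .sub directive; a small sketch of defining and calling one by name:

```
.sub 'add'
    .param int a
    .param int b
    $I0 = a + b
    .return ($I0)
.end

.sub 'main' :main
    $I0 = 'add'(2, 3)
    say $I0                  # 5
.end
```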

Super
http://www.parrotcode.org/docs/pmc/pmc/super.html A parent PMC class, to support multiple inheritance.

Timer
http://www.parrotcode.org/docs/pmc/pmc/timer.html

TQueue
http://www.parrotcode.org/docs/pmc/pmc/tqueue.html

Undef
http://www.parrotcode.org/docs/pmc/pmc/undef.html An undefined PMC with no usable type.


UnmanagedStruct
http://www.parrotcode.org/docs/pmc/pmc/unmanagedstruct.html A low-level structure which the programmer must manage manually. Parrot does not automatically collect memory allocated for the struct.

Version
http://www.parrotcode.org/docs/pmc/pmc/version.html

Bytecode File Format


Packfile Header Format
Packfiles start with a standard header that gives some information about how Parrot was configured when it wrote the file. This configuration information is used to determine how the reading Parrot must translate the incoming packfile.

Resources
http://www.parrotcode.org/docs/parrotbyte.html

VTABLE List
Vtable name: Description

absolute: Returns the absolute value of the PMC, as a PMC
add_attribute: Adds an attribute to the PMC object. Attributes are typically stored in the pmc->pmc_ext->_metadata field.
add_method: Adds a new method to the PMC's class
add_parent
add_role
add_vtable_override
assign_pmc: Assigns a PMC value to the PMC
assign_string_native: Assigns a string to the PMC
bitwise_not
bitwise_nots
can
clone
clone_pmc
decrement: Decrements the integer value of the PMC by 1
defined: Determines if the PMC is defined
defined_keyed
defined_keyed_int
defined_keyed_str
delprop: Deletes a property from the PMC
destroy: Destroys the PMC
does
does_pmc
elements
exists_keyed
exists_keyed_int
exists_keyed_str
find_method
freeze
get_attr
get_bignum: Gets the BigNum representation of the PMC
get_bool: Gets the boolean representation of the PMC
get_class
get_integer: Gets the integer representation of the PMC
get_integer_keyed
get_integer_keyed_int
get_integer_keyed_str
get_iter
get_namespace
get_number: Gets the floating-point representation of the PMC
get_number_keyed
get_number_keyed_int
get_number_keyed_str
get_pmc: Gets the PMC representation of the PMC
get_pmc_keyed
get_pmc_keyed_int
get_pmc_keyed_str
get_pointer
get_pointer_keyed
get_pointer_keyed_int
get_pointer_keyed_str
get_repr
get_string: Gets the string representation of the PMC
get_string_keyed
get_string_keyed_int
get_string_keyed_str
getprop: Gets the value of a certain property from the PMC
getprops
i_absolute
i_bitwise_not
i_bitwise_nots
i_logical_not
i_neg
increment: Increments the integer value of the PMC by 1
init: Initializes the PMC. This method is called when the new keyword is used to create a new PMC.
init_pmc
inspect
inspect_str
instantiate
invoke: The invoke vtable method is called when the PMC is called like a function. In the following code:

    .local pmc mypmc
    mypmc = new 'MyPMCType'
    mypmc()

the invoke vtable method is called on the last line, where the PMC is treated as a function call. For string functions, for example, the string class uses the value of the string to look up a function with the same name, and then invokes that function. Subroutine PMCs invoke the given function when they are called.
is_same
isa
isa_pmc
logical_not
mark: Marks the PMC and all its children as alive for the memory manager. This prevents children of the PMC from being collected prematurely by the garbage collector.
morph
name
neg
new_from_string
nextkey_keyed
nextkey_keyed_int
nextkey_keyed_str
pop_float: If the PMC is an array, pops a floating point value off the top of it
pop_integer: If the PMC is an array, pops an integer value off the top of it
pop_pmc: If the PMC is an array, pops a PMC value off the top of it
pop_string: If the PMC is an array, pops a string value off the top of it
push_float: If the PMC is an array, pushes a floating point value onto the top of it
push_integer: If the PMC is an array, pushes an integer onto the top of it
push_pmc: If the PMC is an array, pushes a PMC onto the top of it
push_string: If the PMC is an array, pushes a string onto the top of it
remove_attribute: Removes an attribute from the PMC
remove_method
remove_parent
remove_role
remove_vtable_override
set_attr: Sets an attribute value for the given PMC
set_attr_keyed
set_attr_keyed_str
set_bignum_int
set_bignum_num
set_bignum_str
set_bool: Sets the value of the PMC as if it were a boolean
set_integer_keyed
set_integer_keyed_int
set_integer_keyed_str
set_integer_native: Sets the value of the PMC as if it were an integer
set_number_keyed
set_number_keyed_int
set_number_keyed_str
set_number_native: Sets the value of the PMC as if it were a floating point value
set_number_same
set_pmc: Sets the value of a PMC to the value of another PMC
set_pmc_keyed
set_pmc_keyed_int
set_pmc_keyed_str
set_pointer
set_pointer_keyed
set_pointer_keyed_int
set_pointer_keyed_str
set_string_keyed
set_string_keyed_int
set_string_keyed_str
set_string_native: Sets the value of the PMC as if it were a string
set_string_same
setprop
share
share_ro
shift_float: If the PMC is an array, removes and returns a floating point value from the front of it
shift_int: If the PMC is an array, removes and returns an integer from the front of it
shift_pmc: If the PMC is an array, removes and returns a PMC from the front of it
shift_string: If the PMC is an array, removes and returns a string from the front of it
slice
splice
substr
substr_str
thaw
thawfinish
type
type_keyed
type_keyed_int
type_keyed_str
unshift_float: If the PMC is an array, prepends a floating point value to the front of it
unshift_integer: If the PMC is an array, prepends an integer to the front of it
unshift_pmc: If the PMC is an array, prepends a PMC to the front of it
unshift_str: If the PMC is an array, prepends a string to the front of it
visit


"Squaak" Language Tutorial


Squaak Tutorial
Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

This tutorial for the toy language "Squaak" was first presented on http://www.parrotblog.org. The tutorials have been released into the public domain by the author, and so have been adapted for use here on Wikibooks. The initial upload revisions for each of these pages are released into the public domain. Subsequent edits made to these pages are released under the GFDL.

File names used in these pages are relative to the Parrot source distribution. "PDD" are the Parrot Design Documents, a series of explanatory documents which are located in POD format in the Parrot repository, or on the http://www.parrot.org website. For more information about downloading the Parrot repository, see the chapter on Building Parrot.

These pages have been changed as necessary to fit Wikibooks, and are being improved like any other book or module. If you appreciate this tutorial and would like to leave comments for its original author, you can do so at http://www.parrotblog.org.

Notice: Since this tutorial has been released, a number of changes have been made to the underlying tools: PCT, PGE, and NQP. Because of this, the code discussed in the tutorial might not be accurate and will likely not operate correctly anymore. These pages will be updated eventually to reflect these changes, but they are not currently up to date.


Introduction
Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

Introduction
This is the first episode in a tutorial series on building a compiler with the Parrot Compiler Tools. If you're interested in virtual machines, you've probably heard of the Parrot virtual machine. Parrot is a generic virtual machine designed for dynamic languages. This is in contrast with the Java virtual machine (JVM) and Microsoft's Common Language Runtime (CLR), both of which were designed to run static languages. Both the JVM and Microsoft (through the Dynamic Language Runtime -- DLR) are adding support for dynamic languages, but their primary focus is still static languages.

High Level Languages


The main purpose of a virtual machine is to run programs. These programs are typically written in some High Level Language (HLL). Some well-known dynamic languages (sometimes referred to as scripting languages) are Lua, Perl, PHP, Python, Ruby, and Tcl. Parrot is designed to be able to run all these languages. Each language that Parrot hosts needs a compiler to parse the syntax of the language and generate Parrot instructions. If you've never implemented a programming language (and maybe even if you have implemented a language), you might consider writing a compiler a bit of a black art. I know I did when I became interested. And you know what? It is. Compilers are complex programs, and implementing a language can be very difficult. The Facts: 1) Parrot is suitable for running virtually any dynamic language known, but before doing so, compilers must be written, and 2) writing compilers is rather difficult.

The Parrot Compiler Toolkit


Enter the Parrot Compiler Toolkit (PCT). In order to make Parrot an interesting target for language developers, the process of constructing a compiler should be supported by the right tools. Just as any construction task becomes much easier if you have the right tools (you wouldn't build a house using only your bare hands, would you?), the same is true for constructing a compiler. The PCT was designed to do just that: provide powerful tools to make writing a compiler for Parrot childishly easy. This tutorial will introduce the PCT by demonstrating the ease with which a (simple) language can be implemented for Parrot. The case study language is not as complex as a real-world language, but this tutorial is written to whet your appetite and show the power of the PCT. This tutorial will also present some exercises which you can explore in order to learn more details of the PCT not covered in this tutorial.


Squaak: A Simple Language


For our case study language, named Squaak, we will implement a full-fledged compiler that can compile a program from source into Parrot Intermediate Representation (PIR), or run the PIR immediately. It can also be used as a command-line interpreter. Squaak demonstrates some common language constructs, but at the same time is lacking some other, seemingly simple features. For instance, our language will not have return, break or continue statements (or equivalents in your favorite syntax). Squaak will have the following features:
- global and local variables
- basic types: integer, floating-point and strings
- aggregate types: arrays and hash tables
- operators: +, -, /, *, %, <, <=, >, >=, ==, !=, .., and, or, not
- subroutines and parameters
- assignments and various control statements, such as "if" and "while"

As you can see, a number of common (more advanced) features are missing. Most notable are:
- classes and objects
- exceptional control statements such as break and return
- advanced control statements such as switch
- closures (nested subroutines and accessing local variables in an outer scope)
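To give a flavor of the language, here is a small program in the spirit of the feature list above. The syntax shown is illustrative only; the precise grammar is introduced later in the tutorial:

```
# hypothetical Squaak snippet (illustrative syntax)
total = 0
i = 1
while i <= 10 do
    total = total + i
    i = i + 1
end
if total == 55 then
    say("sum computed")
end
```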

The Compiler Tools


The Parrot Compiler Tools we'll use to implement Squaak consist of the following parts:
- Parrot Grammar Engine (PGE). The PGE is an advanced engine for regular expressions. Besides regexes as found in Perl 5, it can also be used to define language grammars, using Perl 6 syntax. (Check the references for the specification.)
- Parrot Abstract Syntax Tree (PAST). The PAST nodes are a set of classes defining generic abstract syntax tree nodes that represent common language constructs.
- HLLCompiler class. This class is the compiler driver for any PCT-based compiler.
- Not Quite Perl (6) (NQP). NQP is a lightweight language inspired by Perl 6 and can be used to write the methods that must be executed during the parsing phase, just as you can write actions in a Yacc/Bison input file.

Getting Started
For this tutorial, it is assumed you have successfully compiled parrot (and maybe even run the test suite). If you browse through the languages directory in the Parrot source tree, you'll find a number of language implementations. Most of them are not complete yet; some are actively maintained and others aren't. If, after reading this tutorial, you feel like contributing to one of these languages, you can check out the mailing list or join IRC (see the references section for details). The languages subdirectory is the right spot to put our language implementation.

Parrot comes with a special Perl 5 script to generate the necessary files for a language implementation. Before it can be run, parrot must be installed for development. From parrot's root directory, type:

$ make install-dev

In order to generate a new directory containing files for our language, type (assuming you're in parrot's root directory):

$ perl tools/dev/mk_language_shell.pl Squaak

(Note: if you're on Windows, you should use backslashes.) This will generate the files in a directory "squaak", and use the name "Squaak" as the language's name. After this, go to this directory and type:

$ parrot setup.pir test

This will compile the generated files and run the test suite. If you want more information on what files are being generated, please check out the references at the end of this episode. Note that we didn't write a single line of code, and already we have the basic infrastructure in place to get us started. Of course, the generated compiler doesn't even look like the language we will be implementing, but that's OK for now. Later we'll adapt the grammar to accept our language.

Now you might want to actually run a simple script with this compiler. Launch your favorite editor, and put in this statement:

say "Squaak!";

Save the file (for instance as test.sq) and type:

$ parrot squaak.pbc test.sq

This will run Parrot, specifying squaak.pbc as the file to be run by Parrot, which takes a single argument: the file test.sq. If all went well, you should see the following output:

$ parrot squaak.pbc test.sq
Squaak!

Instead of running a script file, you can also run the Squaak compiler as an interactive interpreter. Run the Squaak compiler without specifying a script file, and type the same statement as you wrote in the file:

$ parrot squaak.pbc
say "Squaak!";

which will print:

Squaak!


What's Next?
This first episode of this tutorial is mainly an overview of what will be coming. Hopefully you now have a global idea of what the Parrot Compiler Tools are, and how they can be used to build a compiler targeting Parrot. If you want to check out some serious usage of the PCT, check out Rakudo (Perl 6 on Parrot) or Pynie (Python on Parrot). The next episodes will focus on the step-by-step implementation of our language, including the following topics:
- structure of PCT-based compilers
- using PGE rules to define the language grammar
- implementing operator precedence using an operator precedence table
- using NQP to write embedded parse actions
- implementing language library routines

In the meantime, experiment for yourself. You are welcome to join us on IRC (see the References section for details). Any feedback on this tutorial is appreciated.


Exercises
The exercises are provided at the end of each episode of this tutorial. In order to keep the length of this tutorial somewhat acceptable, not everything can be discussed in full detail. The answers and/or solutions to these exercises will be posted several days after the episode.

Problem 1: Advanced Interactive Mode

Launch your favorite editor and look at the file src/Squaak/Compiler.pm, still in the directory squaak. This file is written in Not Quite Perl ("NQP") and is compiled to src/gen_compiler.pir when you run parrot setup.pir test. It contains the setup for the compiler. The class HLL::Compiler defines methods to set a command-line banner and prompt for your compiler when it is running in interactive mode. For instance, when you run Python in interactive mode, you'll see:

Python 2.6.5 (r265:79063, Apr 1 2010, 05:28:39)
[GCC 4.4.3 20100316 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

or something similar (depending on your Python installation and version). This text is called the command line banner. And while running in interactive mode, each line will start with:

>>>

which is called a prompt. For Squaak, we'd like to see the following when running in interactive mode (of course you can change this according to your personal taste):

$ ../../parrot squaak.pbc
Squaak for Parrot VM.
>

Add code to the file Compiler.pm to achieve this. Hint: Note that only double-quoted strings in NQP can interpret escape-characters such as '\n'.

Answer

Given the hints that were provided, it was probably not too hard to find the solution, which is shown below. This code can be found in the file Compiler.pm. The relevant lines are the calls to commandline_banner and commandline_prompt.
INIT {
    Squaak::Compiler.language('Squaak');
    Squaak::Compiler.parsegrammar(Squaak::Grammar);
    Squaak::Compiler.parseactions(Squaak::Actions);
    Squaak::Compiler.commandline_banner("Squaak for Parrot VM.\n");
    Squaak::Compiler.commandline_prompt('> ');
}

There are more options provided by the HLL::Compiler class that you can set; you should explore them all.


References
- Parrot mailing list: parrot-dev@lists.parrot.org
- IRC: join #parrot on irc.perl.org
- Getting started with PCT: docs/pct/gettingstarted.pod
- Parrot Abstract Syntax Tree (PAST): docs/pct/past_building_blocks.pod
- Operator Precedence Parsing with PCT: docs/pct/pct_optable_guide.pod
- Perl 6/PGE rules syntax: Synopsis 5 [1]

Poking in Compiler Guts


Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

In the first episode, we introduced the Parrot Compiler Tools, and generated a very simple language using a shell script provided with the Parrot distribution. We also announced Squaak, a simple programming language developed specially for this tutorial. Squaak will be the case study to show how PCT can be used as a very effective set of tools to implement a language for Parrot. A list of features of Squaak was specified. If you felt lucky, you might even have tried to do the exercise at the end of the previous episode.

In this episode, we will take a closer look at the generated compiler. We shall check out the different stages of the compilation process, and show what's going on in PCT-based compilers.

Under The Hood


Remember how we invoked our compiler in the previous episode? We can pass a file, or invoke the compiler without a command line argument, in which case our compiler enters the interactive mode. Consider the first case, passing the file test.sq, just as we did before:

$ parrot squaak.pbc test.sq

When invoking our compiler like this, the file test.sq is compiled and the generated code (bytecode) is executed immediately by Parrot. How does this work, you might wonder. The interpretation of a script is done through a series of transformations, starting at the script source and ending in a format that can be executed by Parrot. Compilers built with the PCT (based on the HLL::Compiler class) can take a target option, to show one of the intermediate representations. This option can have the following values, corresponding to the four default compilation phases of an HLLCompiler object:

--target=parse
--target=past
--target=post
--target=pir

This is an example of using the target option set to "parse", which will print the parse tree of the input to stdout:

$ parrot squaak.pbc --target=parse test.sq

In interactive mode, giving this input:

say 42;

will print this parse tree:

"parse" => PMC 'Regex;Match' => "say 42;\n" @ 0 {
    <statementlist> => PMC 'Regex;Match' => "say 42;\n" @ 0 {
        <statement> => ResizablePMCArray (size:1) [
            PMC 'Regex;Match' => "say 42" @ 0 {
                <statement_control> => PMC 'Regex;Match' => "say 42" @ 0 {
                    <EXPR> => ResizablePMCArray (size:1) [
                        PMC 'Regex;Match' => "42" @ 4 {
                            <integer> => PMC 'Regex;Match' => "42" @ 4 {
                                <decint> => PMC 'Regex;Match' => "42" @ 4
                                <VALUE> => \parse[0][0]
                            }
                        }
                    ]
                    <sym> => PMC 'Regex;Match' => "say" @ 0
                }
            }
        ]
    }
}

When changing the value of the target option, the output changes into a different representation of the input. Why don't you try that right now?

So, a HLL::Compiler object has four compilation phases:
1. parsing
2. construction of a Parrot Abstract Syntax Tree (PAST)
3. construction of a Parrot Opcode Syntax Tree (POST)
4. generation of Parrot Intermediate Representation (PIR)

After compilation, the generated PIR is executed immediately. If your compiler needs additional stages, you can add them to your HLL::Compiler object. For Squaak, we will not need this, but for details, check out compilers/pct/src/PCT/HLLCompiler.pir.

We shall now discuss each compilation phase in more detail. The first two phases, parsing the input and construction of the PAST, are executed simultaneously. Therefore, these are discussed together.


Parse phase: match objects and PAST construction


During the parsing phase, the input is analyzed using Perl 6's extended regular expressions, known as Rules (see Synopsis 5 for details). When a rule matches some input string, a so-called Match object is created. A Match object is a combined array and hashtable, implying it can be indexed by integers as well as strings. As rules typically consist of other (sub)rules, it is easy to retrieve a certain part of the match. For instance, this rule:

rule if_statement {
    'if' <expression> 'then' <statement> 'end'
    {*}
}

has two other subrules: expression and statement. The match object for the rule if_statement represents the whole string from if to end. When you're interested only in the expression or statement part, you can retrieve that by indexing the match object by the name of the subrule (in this case, expression and statement, respectively).

During the parse phase, the PAST is constructed. There is a small set of PAST node types, for instance, PAST::Var to represent variables (identifiers, such as print), PAST::Val to represent literal values (for instance, "hello" and 42), and so on. Later we shall discuss the various PAST nodes in more detail.

Now, you might wonder, at which point exactly is this PAST construction happening? This is where the special {*} symbol comes in, just below the string 'if' in the if_statement rule shown above. These special markers indicate that a parse action should be invoked. Such a parse action is just a method that has the same name as the rule in which it is written (in this case: if_statement). So, during the parsing phase, several parse actions are executed, each of which builds a piece of the total PAST representing the input string. More on this will be explained later.

The Parrot Abstract Syntax Tree is just a different representation of the same input string (your program being compiled). It is a convenient data structure to transform into something different (such as executable Parrot code) but also to do all sorts of analysis, such as compile-time type checking.
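As a hedged sketch, a parse action for an if_statement rule like the one above might look like this in NQP. The method and helper names follow the PCT conventions of the time (the $( ... ) helper retrieves the PAST node built for a subrule match); details changed in later versions:

```
method if_statement($/) {
    my $past := PAST::Op.new( :pasttype('if'),
                              $( $<expression> ),   # condition subtree
                              $( $<statement> ),    # 'then' branch subtree
                              :node($/) );
    make $past;
}
```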

PAST to POST
After the parse phase during which the PAST is constructed, the HLL::Compiler transforms this PAST into something called a Parrot Opcode Syntax Tree (POST). The POST representation is also a tree structure, but these nodes are on a lower abstraction level. For instance, on the PAST level there is a node type to represent a while statement (constructed as PAST::Op.new( :pasttype('while') ) ). The template for a while statement typically consists of a number of labels and jump instructions. On the POST level, the same while statement is represented by a set of nodes, each representing one instruction or a label. Therefore, it is much easier to transform a POST into something executable than when this is done from the PAST level. Usually, as a user of the PCT, you don't need to know details of POST nodes, which is why this will not be discussed in further detail. Use the target option to see what a POST looks like.


POST to PIR
In the fourth (and final) stage, the POST is transformed into Parrot Intermediate Representation (PIR). As mentioned, transforming a POST into something executable is rather straightforward, as POST nodes already represent individual instructions and labels. Again, normal usage of the PCT does not require you to know any details about this transformation.

And now for the good news...


We established the general data flow of PCT-based compilers, which consists of four stages:
1. source to parse tree
2. parse tree to PAST
3. PAST to POST
4. POST to PIR

where we noted that the first two are done during the parse stage. Now, as you're reading this tutorial, you're probably interested in using the PCT for implementing Your Favorite Language for Parrot. We already saw that a language grammar is expressed in Perl 6 Rules. What about the other transformations? Well, earlier in this episode we mentioned the term parse actions, and that these actions create PAST nodes. After you have written a parse action for each grammar rule, you're done!
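Putting the two halves together, a grammar rule and its parse action might be sketched as follows, in PGE rules and NQP respectively (an illustrative pairing following the conventions described above, not code from the generated compiler):

```
# in the grammar file (Perl 6 rules):
rule statement {
    'say' <expression> ';'
    {*}
}

# in the actions file (NQP):
method statement($/) {
    make PAST::Op.new( $( $<expression> ),   # argument subtree
                       :pasttype('call'),
                       :name('say'),
                       :node($/) );
}
```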

Say what?
That's right. Once you have correctly constructed a PAST, your compiler can generate executable PIR, which means you just implemented your first language for Parrot. Of course, you still need to implement any language-specific libraries, but that's beside the point. PCT-based compilers already know how to transform a PAST into a POST, and how to transform a POST into PIR. These transformation stages are already provided by the PCT.

What's next?
In this episode we took a closer look at the internals of a PCT-based compiler. We discussed the four compilation stages, which transform an input string (a program, or script, depending on your definition) into a PAST, a POST and finally executable PIR. The next episodes are where the fun stuff is: we will be implementing Squaak for Parrot. Piece by piece, we will implement the parser and the parse actions. Finally, we shall demonstrate John Conway's "Game of Life" running on Parrot, implemented in Squaak.

Exercises
Starting in the next episode, the exercises will be more interesting. For now, it would be useful to browse through the source files of the compiler, and see if you understand the relation between the grammar rules in Grammar.pg and the methods in Actions.pm. It's also useful to experiment with the --target option described in this episode. If you don't know PIR, now is the time to do some preparation for that. There's sufficient information to be found on PIR; see the References section for details. In the meantime, if you have any suggestions, questions and whatnot, don't hesitate to leave a comment.


References
1. PIR language specification: docs/pdds/pdd19_pir.pod
2. PIR book: docs/book/pir

Squaak Details and First Steps


Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

In the previous episodes we introduced the Parrot Compiler Tools (PCT). Starting from a high-level overview, we quickly created our own little scripting language called Squaak, using a Perl script provided with Parrot. We discussed the general structure of PCT-based compilers, and each of the default four transformation phases. This third episode is where the fun begins. In this episode, we shall introduce the full specification of Squaak. In this and following episodes, we will implement this specification step by step, in small increments that are easy to digest. Once you get a feel for it, you'll notice implementing Squaak is almost trivial, and most importantly, a lot of fun! So, let's get started!

Squaak Grammar
Without further ado, here is the full grammar specification for Squaak. This specification uses the following meta-syntax:

    statement     indicates a non-terminal, named "statement"
    {statement}   indicates zero or more statements
    [step]        indicates an optional step
    'do'          indicates the keyword 'do'

Below is Squaak's grammar. The start symbol is program.

    program              ::= {stat-or-def}

    stat-or-def          ::= statement | sub-definition

    statement            ::= if-statement
                           | while-statement
                           | for-statement
                           | try-statement
                           | throw-statement
                           | variable-declaration
                           | assignment
                           | sub-call
                           | do-block

    block                ::= {statement}

    do-block             ::= 'do' block 'end'

    if-statement         ::= 'if' expression 'then' block
                             ['else' block] 'end'

    while-statement      ::= 'while' expression 'do' block 'end'

    for-statement        ::= 'for' for-init ',' expression [step]
                             'do' block 'end'

    step                 ::= ',' expression

    for-init             ::= 'var' identifier '=' expression

    try-statement        ::= 'try' block 'catch' identifier block 'end'

    throw-statement      ::= 'throw' expression

    sub-definition       ::= 'sub' identifier parameters block 'end'

    parameters           ::= '(' [identifier {',' identifier}] ')'

    variable-declaration ::= 'var' identifier ['=' expression]

    assignment           ::= primary '=' expression

    sub-call             ::= primary arguments

    primary              ::= identifier postfix-expression*

    postfix-expression   ::= key | index | member

    key                  ::= '{' expression '}'

    index                ::= '[' expression ']'

    member               ::= '.' identifier

    arguments            ::= '(' [expression {',' expression}] ')'

    expression           ::= expression {binary-op expression}
                           | unary-op expression
                           | '(' expression ')'
                           | term

    term                 ::= float-constant
                           | integer-constant
                           | string-constant
                           | array-constructor
                           | hash-constructor
                           | primary

    hash-constructor     ::= '{' [named-field {',' named-field}] '}'

    named-field          ::= string-constant '=>' expression

    array-constructor    ::= '[' [expression {',' expression}] ']'

    binary-op            ::= '+' | '-' | '*' | '/' | '%' | '>=' | '<='
                           | '>' | '<' | '==' | '!=' | '..' | 'and' | 'or'

    unary-op             ::= 'not' | '-'

Gee, that's a lot, isn't it? Actually, this grammar is rather small compared to "real world" languages such as C, not to mention Perl 6. No worries though: we won't implement the whole thing at once, but in small steps. What's more, the exercises section contains enough exercises for you to learn to use the PCT yourself! Solutions are included after each exercise, but try to figure them out on your own first (you really only need a couple of hours).


Semantics
Most of the Squaak language is straightforward; the if-statement, for instance, executes exactly as you would expect. When we discuss a grammar rule (for its implementation), a semantic specification will be included. This saves us from writing a complete language manual, which would take many pages.

Interactive Squaak
Although the Squaak compiler can be used in interactive mode, there is one caveat to note. When defining a local variable using the 'var' keyword, this variable will be lost in any subsequent commands. The variable is only available to other statements within the same command (a command being the set of statements entered before you press enter). This has to do with the code generation by the PCT, and will be fixed at a later point. For now, just remember it doesn't work.

Let's get started!


In the rest of this episode we will implement the basic parts of the grammar, such as the basic data types and assignments. At the end of this episode, you'll be able to assign simple values to (global) variables. It ain't much, but it's a very important first step. Once these basics are in place, you'll notice that adding a certain syntactic construct becomes a matter of minutes.

First, open your editor and open the files src/Squaak/Grammar.pm and src/Squaak/Actions.pm. The former implements the parser using Perl 6 rules, and the latter contains the parse actions, which are executed during the parsing stage. In the file Grammar.pm, you'll see the top-level rule, named "TOP". It's located at, ehm... the top. When the parser is invoked, it will start at this rule (a rule is nothing else than a method of the grammar class). When we generated this language (in the first episode), some default rules were defined. Now we're going to make some small changes, just enough to get us started. Firstly, change the statement rule to this:

    rule statement {
        <assignment>
        {*}
    }

and add these rules:

    rule assignment {
        <primary> '=' <expression>
        {*}
    }

    rule primary {
        <identifier>
        {*}
    }

    token identifier {
        <!keyword> <ident>
        {*}
    }

    token keyword {
        ['and'  |'catch'|'do'   |'else' |'end'  |'for'
        |'if'   |'not'  |'or'   |'sub'  |'throw'|'try'
        |'var'  |'while']>>
    }

Now, change the rule "value" into this (renaming it to "expression"):

    rule expression {
        | <string_constant>  {*}    #= string_constant
        | <integer_constant> {*}    #= integer_constant
    }

Rename the rule "integer" as "integer_constant", and "quote" as "string_constant" (to better match our language specification).

Phew, that was a lot of information! Let's have a closer look at some things that may look unfamiliar. The first new thing is in the rule "identifier". Instead of the "rule" keyword, you see the keyword "token". In short, a token doesn't skip whitespace between the different parts specified in the token, while a rule does. For now, it's enough to remember to use a token if you want to match a string that doesn't contain any whitespace (such as literal constants and identifiers), and use a rule if your string does (and should) contain whitespace (such as an if-statement). We shall use the word "rule" in a general sense, which could refer to a token. For more information on rules and tokens (and there's a third type, called "regex"), take a look at Synopsis 5.

In token "identifier", the first subrule is called an assertion. It asserts that an "identifier" does not match the rule keyword. In other words, a keyword cannot be used as an identifier. The second subrule is called "ident", which is a built-in rule in the class PCT::Grammar, of which this grammar is a subclass.

In token "keyword", all keywords of Squaak are listed. At the end there's a ">>" marker, which indicates a word boundary. Without this marker, an identifier such as "forloop" would wrongly be disqualified, because the part "for" would match the rule keyword, and the part "loop" would match the rule "ident". However, as the assertion <!keyword> is false (as "for" could be matched), the string "forloop" could not be matched as an identifier. The required presence of the word boundary prevents this.

The last rule is "expression". An expression is either a string-constant or an integer-constant. Either way, an action is executed. However, when the action is executed, it does not know what the parser matched; was it a string-constant, or an integer-constant?
Of course, the match object can be checked, but consider the case where you have 10 alternatives, then doing 9 checks only to find out the last alternative was matched is somewhat inefficient (and adding new alternatives requires you to update this check). That's why you see the special comments starting with a "#=" character. Using this notation, you can specify a key, which will be passed as a second argument to the action method. As we will see, this allows us to write very simple and efficient action methods for rules such as expression. (Note there's a space between the #= and the key's name).
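The effect of these keys can be modelled in a couple of lines of Python (an illustration only; the names are made up, and this is not PCT code): the action receives the key of the matched alternative and uses it directly, instead of testing every alternative in turn.

```python
def expression_action(match, key):
    """Keyed parse action: 'key' names the alternative that matched,
    so one lookup replaces a chain of which-subrule-matched checks."""
    return match[key]

# a stand-in "match object" for input 42: only the matched alternative is set
m = {"integer_constant": 42}
print(expression_action(m, "integer_constant"))  # prints 42
```

Adding a new alternative then only requires a new #= key; the dispatch logic itself never changes.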

Testing the Parser


It is useful to test the parser before writing any action methods. This can save you a lot of work; if you write the actions immediately after writing the grammar rules, and only later find out that your parser must be updated, then your action methods probably need to be updated as well. In Episode 2 we saw the target command line option. In order to test the parser, the "parse" target is especially helpful. When specifying this option, your compiler will print the parse tree of your input string, or print a syntax error. It is wise to test your parser with both correct and incorrect input, so you know for sure your parser doesn't accept input that it shouldn't.


And... Action!
Now that we have implemented the initial version of the Squaak grammar, it's time to implement the parse actions we mentioned before. The actions are written in a file called src/Squaak/Actions.pm. If you look at the methods in this file, here and there you'll see that the Match object ($/), or rather, hash fields of it (like $<statement>), is evaluated in scalar context, by writing "$( ... )". As mentioned in Synopsis 5, evaluating a Match object in scalar context returns its result object. Normally the result object is the matched portion of the source text, but the special make function can be used to set the result object to some other value. This means that each node in the parse tree (a Match object) can also hold its PAST representation. Thus we use the make function to set the PAST representation of the current node in the parse tree, and later use the $( ... ) operator to retrieve the PAST representation from it.

To recap: the Match object ($/) and any subrules of it (for instance $<statement>) represent the parse tree; of course, $<statement> represents only the parse tree for what the <statement> rule matched. So, any action method has access to the parse tree that the equally named grammar rule matched, as the Match object is always passed as an argument. Evaluating a parse tree in scalar context yields the PAST representation (obviously, this PAST object should be set using the make function).

If you're following this tutorial, I highly advise you to get your feet wet and do the exercises. Remember, learning and not doing is not learning (or something like that :-). This week's exercises are not that difficult, and after doing them, you'll have implemented the first part of our little Squaak language.

What's next?
In this episode we introduced the full grammar of Squaak, and took the first steps to implement this language. The first, and currently only, statement type is assignment. We briefly touched on how to write the action methods that are invoked during the parsing phase. In the next episode, we shall take a closer look at the different PAST node types, and implement some more parts of the Squaak language. Once we have all basic parts in place, adding statement types will be rather straightforward. In the meantime, if you have any questions or are stuck, don't hesitate to leave a comment or contact me.

Exercises
This episode's exercises are simple enough to get started on implementing Squaak.

Problem 1
Rename the action methods according to the name changes we made to the grammar rules. So, "integer" becomes "integer_constant", "value" becomes "expression", and so on.

Problem 2
Look at the grammar rule for statement. A statement currently consists of an assignment. Implement the action method "statement" to retrieve the result object of this assignment and set it as statement's result object using the special make function. Do the same for rule primary.

Solution

    method statement($/) {
        make $( $<assignment> );
    }

Note that at this point, the rule statement doesn't define different #= keys for each type of statement, so we don't declare a parameter $key. This will be changed later.

    method primary($/) {
        make $( $<identifier> );
    }

Problem 3
Write the action method for the rule identifier. As a result object of this "match", a new PAST::Var node should be set, taking as name a string representation of the match object ($/). For now, you can set the scope to 'package'. See "pdd26: ast" for details on PAST::Var nodes.

Solution

    method identifier($/) {
        make PAST::Var.new( :name(~$/),
                            :scope('package'),
                            :node($/) );
    }

Problem 4
Write the action method for assignment. Retrieve the result objects for "primary" and for "expression", and create a PAST::Op node that binds the expression to the primary. (Check out pdd26 for PAST::Op node types, and find out how you do such a binding.)

Solution

    method assignment($/) {
        my $lhs := $( $<primary> );
        my $rhs := $( $<expression> );
        $lhs.lvalue(1);
        make PAST::Op.new( $lhs, $rhs,
                           :pasttype('bind'),
                           :node($/) );
    }

Note that we set the lvalue flag on $lhs. See PDD26 for details on this flag.

Problem 5
Run your compiler on a script or in interactive mode. Use the target option to see what PIR is being generated on the input "x = 42".

Solution

    .namespace
    .sub "_block10"
        new $P11, "Integer"
        assign $P11, 42
        set_global "x", $P11
        .return ($P11)
    .end

The first two lines of code in the sub create an object to store the number 42; the third line stores this number as "x". The PAST compiler will always generate an instruction to return the result of the last statement, in this case $P11.


Some Notes
Help! I get the error message "no result object". This means that the result object was not set properly (duh!). Make sure each action method is invoked (check each rule for a "{*}" marker), that there is an action method for that rule, and that "make" is used to set the appropriate PAST node. Note that not all rules have action methods; the "keyword" rule, for instance, has none (there's no point in that).

While we're constructing parts of Squaak's grammar, we'll sometimes take a shortcut by forgetting about certain rules for a while. For instance, you might have noticed we're ignoring float-constants right now. That's OK. When we need them, these rules will be added.

References
pdd26: ast
synopsis 5: Rules
docs/pct/*.pod

PAST Nodes and More Statements


Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

The previous episode introduced the full grammar specification of Squaak, and we finally started working on the implementation. If you're doing the exercises, you currently have basic assignments working; strings and integers can be assigned to (global) variables. This episode will focus on the implementation of some statement types and explain a few bits about the different PAST node types.

Parrot Abstract Syntax Tree


A Parrot Abstract Syntax Tree (PAST) represents a program written in Squaak (or any other Parrot-ported language), and consists of nodes. In the previous episode, we already saw nodes to represent string and integer literals, identifiers and "operator" nodes (PAST::Op), in our case assignment. Other operators represent other high-level language constructs such as conditional statements, loops, and subroutine invocation. Depending on the node type, a PAST node can take child nodes. For instance, a PAST node to represent an if-statement can have up to three child nodes. The first child node represents the condition; if true, the second child node is evaluated. If the condition evaluates to false, and there's a third child node, this third child node is evaluated (the else part). If the PAST represents a subroutine invocation, the child nodes are evaluated in a different way. In that case, the first child node represents the subroutine that is to be invoked (unless the :name attribute was set on this node, but more

on that in a later episode), and all other child nodes are passed to that subroutine as arguments. It generally doesn't matter which PAST node type the children are. For instance, consider a language in which a simple expression is a statement:

    42

You might wonder what kind of code is generated for this. Well, it's really very simple: a new PAST::Val node is created (of a certain type, for this example that would be 'Integer'), and the value is assigned to this node. It might seem a bit confusing to write something like this, as it doesn't really do anything (note that this is not valid Squaak input):

    if 42 then "hi" else "bye" end

But again, this works out correctly; the "then" and "else" blocks are compiled to instructions that load that particular literal into a PAST::Val node and leave it there. That's fine, if your language allows such statements. The point I'm trying to make is that all PAST nodes are equal: you don't need to think about the node types when you set a node as a child of some other parent node. Each PAST node is compiled into a number of PIR instructions.


Go with the control-flow


Now that you know a bit more about PAST nodes, let's get our hands dirty and implement some more statement types. In the rest of this episode, we'll handle if-statements and throw-statements.

If-then-else
The first statement we're going to implement is the if-statement. An if-statement typically has three parts (but this of course depends on the programming language): a conditional expression, a "then" part and an "else" part. Implementing this in Perl 6 rules and PAST is almost trivial:

    rule if_statement {
        'if' <expression> 'then' <block>
        ['else' $<else>=<block> ]?
        'end'
        {*}
    }

    rule block {
        <statement>*
        {*}
    }

    rule statement {
        | <assignment>    {*}    #= assignment
        | <if_statement>  {*}    #= if_statement
    }

Note that the optional else block is stored in the match object's "else" field. If we hadn't written this $<else>= part, then <block> would have been an array, with block[0] the "then" part, and block[1] the optional else part. Assigning the optional else block to a different field makes the action method slightly easier to read.

Also note that the statement rule has been updated; a statement is now either an assignment or an if-statement. As a result, the action method statement now takes a key argument. The relevant action methods are shown below:

    method statement($/, $key) {
        # get the field stored in $key from the $/ object,
        # and retrieve the result object from that field.
        make $( $/{$key} );
    }

    method block($/) {
        # create a new block, set its type to 'immediate',
        # meaning it is potentially executed immediately
        # (as opposed to a declaration, such as a
        # subroutine definition).
        my $past := PAST::Block.new( :blocktype('immediate'),
                                     :node($/) );
        # for each statement, add the result
        # object to the block
        for $<statement> {
            $past.push( $( $_ ) );
        }
        make $past;
    }

    method if_statement($/) {
        my $cond := $( $<expression> );
        my $then := $( $<block> );
        my $past := PAST::Op.new( $cond, $then,
                                  :pasttype('if'),
                                  :node($/) );
        if $<else> {
            $past.push( $( $<else>[0] ) );
        }
        make $past;
    }

That's easy, huh? First, we get the result objects for the conditional expression and the "then" part. Then, a new PAST::Op node is created, and the :pasttype is set to 'if', meaning this node represents an if-statement. Then, if there is an "else" block, this block's result object is retrieved and added as the third child of the PAST node. Finally, the result object is set with the make function.

Result objects
At this point it's wise to spend a few words on the make function, the parse actions, and how the whole PAST is created by the individual parse actions. Have another look at the action method if_statement. In the first two lines, we request the result objects for the conditional expression and the "then" block. When were these result objects created? How can we be sure they're there? The answer lies in the order in which the parse actions are executed. The special {*} symbol that triggers a parse action invocation is usually placed at the end of the rule. For this input

string: "if 42 then x = 1 end", this implies the following order:

1. parse TOP
2. parse statement
3. parse if_statement
4. parse expression
5. parse integer
6. create PAST::Val( :value(42) )
7. parse block
8. parse statement
9. parse assignment
10. parse identifier
11. create PAST::Var( :name('x') )
12. parse integer
13. create PAST::Val( :value(1) )
14. create PAST::Op( :pasttype('bind') )
15. create PAST::Block (in action method block)
16. create PAST::Op( :pasttype('if') )
17. create PAST::Block (in action method TOP)

As you can see, PAST nodes are created in the leaves of the parse tree first, so that later, action methods higher in the parse tree can retrieve them.
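This bottom-up order can be mimicked with a small Python toy (illustrative only, not PCT code): each "action" fires after the actions of its children, so it only ever combines result objects that already exist.

```python
order = []

def action(name, *child_results):
    """A stand-in parse action: record the invocation order and
    return a (name, children) pair playing the role of a PAST node."""
    order.append(name)
    return (name, list(child_results))

# mimic the post-order traversal for "if 42 then x = 1 end"
cond   = action("integer")                    # PAST::Val for 42
lhs    = action("identifier")                 # PAST::Var for x
rhs    = action("integer")                    # PAST::Val for 1
assign = action("assignment", lhs, rhs)       # PAST::Op :pasttype('bind')
block  = action("block", assign)              # PAST::Block
if_st  = action("if_statement", cond, block)  # PAST::Op :pasttype('if')
print(order)
```

By the time the if_statement action runs, the nodes for its condition and its block have already been made, exactly as in the numbered list above.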


Throwing Exceptions
The grammar rule for the "throw" statement is really quite easy, but it's useful to discuss the parse action, as it shows the use of generating custom PIR instructions. First the grammar rule:

    rule throw_statement {
        'throw' <expression>
        {*}
    }

I assume you know how to update the "statement" rule by now. The throw statement will compile down to Parrot's "throw" instruction, which takes one argument. In order to generate a custom Parrot instruction, the instruction can be specified in the :pirop attribute when creating a PAST::Op node. Any child nodes are passed as arguments to this instruction, so we need to pass the result object of the expression being thrown as a child of the PAST::Op node representing the "throw" instruction.

    method throw_statement($/) {
        make PAST::Op.new( $( $<expression> ),
                           :pirop('throw'),
                           :node($/) );
    }


What's Next?
In this episode we implemented two more statement types of Squaak. You should now have a general idea of how and when PAST nodes are created, and how they can be retrieved from sub (parse) trees. In the next episode we'll take a closer look at variable scope and subroutines. In the meantime, I can imagine some things are not too clear. In case you're lost, don't hesitate to leave a comment, and I'll try to answer (as far as my knowledge goes).

Exercises
Problem 1
We showed how the if-statement was implemented. The while-statement and try-statement are very similar. Implement these. Check out pdd26 to see what PAST::Op nodes you should create.

Solution
The while-statement is straightforward:

    method while_statement($/) {
        my $cond := $( $<expression> );
        my $body := $( $<block> );
        make PAST::Op.new( $cond, $body,
                           :pasttype('while'),
                           :node($/) );
    }

The try-statement is a bit more complex. Here are the grammar rules and action methods.

    rule try_statement {
        'try' $<try>=<block>
        'catch' <exception>
        $<catch>=<block>
        'end'
        {*}
    }

    rule exception {
        <identifier>
        {*}
    }

    method try_statement($/) {
        ## get the try block
        my $try := $( $<try> );

        ## create a new PAST::Stmts node for the catch
        ## block; note that no PAST::Block is created,
        ## as this currently has problems with the
        ## exception object. For now this will do.
        my $catch := PAST::Stmts.new( :node($/) );
        $catch.push( $( $<catch> ) );

        ## get the exception identifier;
        ## set a declaration flag, the scope,
        ## and clear the viviself attribute.
        my $exc := $( $<exception> );
        $exc.isdecl(1);
        $exc.scope('lexical');
        $exc.viviself(0);

        ## generate instruction to retrieve the exception
        ## object (the exception message, that is passed
        ## automatically in PIR, is stored into $S0 but
        ## not used).
        my $pir := "    .get_results (%r, $S0)\n"
                 ~ "    store_lex '" ~ $exc.name() ~ "', %r";
        $catch.unshift( PAST::Op.new( :inline($pir),
                                      :node($/) ) );

        ## do the declaration of the exception
        ## object as a lexical here:
        $catch.unshift( $exc );

        make PAST::Op.new( $try, $catch,
                           :pasttype('try'),
                           :node($/) );
    }

    method exception($/) {
        our $?BLOCK;
        my $past := $( $<identifier> );
        $?BLOCK.symbol( $past.name(), :scope('lexical') );
        make $past;
    }

Instead of putting "identifier" after the "catch" keyword, we made it a separate rule, with its own action method. This allows us to insert the identifier into the symbol table of the current block (the try-block) before the catch block is parsed. First the PAST node for the try block is retrieved. Then, the catch block is retrieved and stored into a PAST::Stmts node. This is needed so that we can make sure that the instructions that retrieve the exception object come first in the exception handler.

Then, we retrieve the PAST node for the exception identifier. We set its scope and a flag telling the PAST compiler this is a declaration, and we clear the viviself attribute. The viviself attribute is discussed in a later episode; if you haven't read that yet, just keep in mind that the viviself attribute (if set) will make sure all declared variables are initialized. We must clear this attribute here, to make sure that this exception object is not initialized, because that will be done by the instruction that retrieves the thrown exception object, discussed next.

In PIR, we can use the .get_results directive to retrieve a thrown exception. You could also generate the get_results instruction (note the missing dot), but this is much easier. Currently, in PIR, when retrieving the exception object, you must always specify both a variable (or register) for the exception object, and a string variable (or register) to store the exception message. The exception message is actually stored within the exception object. We use $S0 to store the exception message, and we'll ignore it after that. Just remember for now that if you want to retrieve the exception object, you must also specify a place to store the exception message.

There is no special PAST node to generate these instructions, so we use a so-called inline PAST::Op node. We store the instructions to be generated into a string and store that in the inline attribute of a PAST::Op node. Once created, this node is unshifted onto the PAST::Stmts node representing the exception handler. After that, the declaration is stored in that PAST::Stmts node, so that this declaration comes first. Finally, we have the block representing the try block, and a PAST::Stmts node representing the exception handler. Both are used to create a PAST::Op node whose pasttype is set to the built-in "try" type.
Problem 2
Start Squaak in interactive mode, and specify the target option to show the generated PIR instructions. Check out what instructions and labels are generated, and see if you can recognize which instructions make up the conditional expression, which represent the "then" block, and which represent the "else" block (if any).

Solution

    > if 1 then else end
    .namespace
    .sub "_block16"
        new $P18, "Integer"
        assign $P18, 1
        ## this is the condition:
        if $P18, if_17
        ## this is invoking the else-block:
        get_global $P21, "_block19"
        newclosure $P21, $P21
        $P20 = $P21()
        set $P18, $P20
        goto if_17_end
        ## this is invoking the then-block:
      if_17:
        get_global $P24, "_block22"
        newclosure $P24, $P24
        $P23 = $P24()
        set $P18, $P23
      if_17_end:
        .return ($P18)
    .end

    .namespace
    .sub "_block22" :outer("_block16")
        .return ()
    .end

    .namespace
    .sub "_block19" :outer("_block16")
        .return ()
    .end


References
PDD26: AST
docs/art/*.pod for good introductions to PIR

Variable Declaration and Scope


Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

Episode 4 discussed the implementation of some statement types, such as the if-statement. In this episode we'll talk about variable declarations and scope handling. It's going to be a long story, so take your time to read this episode.

Globals, locals and default values


Squaak variables have one of two scopes: either they're global, or they're local. In order to create a global variable, you just assign some expression to an identifier (which hasn't been declared as a local). Local variables, on the other hand, must be declared using the var keyword. In other words, at any given point during the parsing phase, we have a list of variables that are known to be local variables. When an identifier is parsed, it is looked up and, if found, its scope is set to local. If not, its scope is assumed to be global. When using an uninitialized variable, its value is set to an object called Undef. Some examples are shown below.

    x = 42       # x was not declared, so it is global
    var k = 10   # k is local and initialized to 10
    a + b        # neither a nor b was declared;
                 # both default to the value "Undef"


Scoping and Symbol Tables


Earlier we mentioned the need to store declared local variables. In compiler jargon, such a data structure to store declarations is called a symbol table. For each individual scope, there's a separate symbol table. Squaak has a so-called do-block statement, which is defined below.

    rule do_block {
        'do' <block> 'end'
        {*}
    }

Each do-block defines a new scope; local variables declared between the "do" and "end" keywords are local to that block. An example to clarify this is shown below:

    do
        var x = 1
        print(x)      # prints 1
        do
            var x = 2
            print(x)  # prints 2
        end
        print(x)      # prints 1
    end

So, each do/end pair defines a new scope, in which any declared variables hide variables with the same name in outer scopes. This behavior is common in many programming languages. The PCT has built-in support for symbol tables; a PAST::Block object has a method symbol that can be used to enter new symbols and query the table for existing ones. In PCT, a PAST::Block object represents a scope. There are two blocktypes: immediate and declaration. An immediate block can be used to represent the blocks of statements in a do-block statement, for instance:

    do block end

When executing this statement, block is executed immediately. A declaration block, on the other hand, represents a block of statements that can be invoked at a later point. Typically these are subroutines. So, in this example:

    sub foo(x)
        print(x)
    end

a PAST::Block object is created for the subroutine foo. The blocktype is set to declaration, as the subroutine is defined, not executed (immediately). For now you can forget about the blocktype, but now that I've told you, you'll recognize it when you see it. We'll come back to it in a later episode.
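The shadowing behaviour described above is easy to model in Python (for illustration only; PAST::Block's symbol method differs in detail): each block carries its own table, lookups walk from the innermost block outwards, and a name found nowhere is treated as a global.

```python
class Block:
    """One symbol table per scope; 'parent' links to the enclosing block."""
    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent

    def declare(self, name):               # models 'var name = ...'
        self.symbols[name] = "lexical"

    def scope_of(self, name):
        block = self
        while block is not None:           # walk outwards through the scopes
            if name in block.symbols:
                return "lexical"           # declared somewhere: a local
            block = block.parent
        return "package"                   # never declared: a global

outer = Block()
outer.declare("x")                         # do var x = 1 ...
inner = Block(parent=outer)                # ... nested do-block ...
inner.declare("x")                         # the inner x shadows the outer one
print(inner.scope_of("x"))                 # prints lexical
print(inner.scope_of("y"))                 # prints package (undeclared: global)
```

This is exactly the decision the parse actions have to make for every identifier: lexical if some enclosing block declared it, package otherwise.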


Implementing Scope
So, we know how to use global variables, how to declare local variables, and that PAST::Block objects represent scopes. How do we make our compiler generate the right PIR instructions? After all, Parrot must handle a global variable differently from a local one. When creating PAST::Var nodes to represent the variables, we must know whether the variable is local or global. So, when handling variable declarations (of local variables; globals are not declared), we need to register the identifier as a local in the current block's symbol table. First, we'll take a look at the implementation of variable declarations.

Variable declaration
The following is the grammar rule for variable declarations. This is a type of statement, so I assume you know how to extend the statement rule to allow for variable declarations.

    rule variable_declaration {
        'var' <identifier> ['=' <expression>]?
        {*}
    }

A local variable is declared using the var keyword, and has an optional initialization expression. If the latter is missing, the variable's value defaults to the undefined value called Undef. Let's see what the parse action looks like:

    method variable_declaration($/) {
        # get the PAST for the identifier
        my $past := $( $<identifier> );
        # this is a local (it's being defined)
        $past.scope('lexical');
        # set a declaration flag
        $past.isdecl(1);
        # check for the initialization expression
        if $<expression> {
            # use the viviself clause to add
            # an initialization expression
            $past.viviself( $($<expression>[0]) );
        }
        else {
            # no initialization, default to "Undef"
            $past.viviself('Undef');
        }
        make $past;
    }

Well, that wasn't too hard, was it? Let's analyze what we just did. First we retrieved the PAST node for the identifier, which we then decorated by setting its scope to lexical (a local variable is said to be lexically scoped, hence "lexical"), and setting a flag indicating that this node represents a declaration (isdecl). So, besides representing variables in other statements (for instance, assignments), a PAST::Var node is also used as a declaration statement.

Earlier in this episode we mentioned the need to register local variables in the current scope block when they are declared. So, when executing the parse action for variable_declaration, there should already be a PAST::Block node around that can be used to register the symbol being declared. As we learned in Episode 4, PAST nodes are created in a depth-first fashion; the leaves are created first, and then the nodes "higher" in the parse tree. This implies that a PAST::Block node is created after the statement nodes (of which variable_declaration is one) that will become the children of the block. In the next section we'll see how to solve this problem.


Implementing a scope stack


In order to make sure that a PAST::Block node is created before any statements are parsed (and their parse actions are executed -- these might need to enter symbols in the block's symbol table), we add a few extra parse actions. Let's take a look at them.

    rule TOP {
        {*}                                    #= open
        <statement>*
        [ $ || <.panic: syntax error> ]
        {*}                                    #= close
    }

We now have two parse actions for TOP, which are differentiated by an additional key parameter. The first parse action is executed before any input is parsed, which makes it particularly suitable for any initialization actions you might need. The second action (which was already there) is executed after the whole input string is parsed. Now we can create a PAST::Block node before any statements are parsed, so that when we need the current block, it's there (somewhere; later we'll see where exactly). Let's take a look at the parse action for TOP.

    method TOP($/, $key) {
        our $?BLOCK;
        our @?BLOCK;
        if $key eq 'open' {
            $?BLOCK := PAST::Block.new( :blocktype('declaration'),
                                        :node($/) );
            @?BLOCK.unshift($?BLOCK);
        }
        else { # key is 'close'
            my $past := @?BLOCK.shift();
            for $<statement> {
                $past.push( $( $_ ) );
            }
            make $past;
        }
    }

Let's see what's happening here. When the parse action is invoked for the first time (when $key equals "open"), a new PAST::Block node is created and assigned to a strange-looking (if you don't know Perl, like me. Oh wait, this is Perl. Never mind..) variable called $?BLOCK. This variable is declared with "our", which means that it is a package variable. This means that the variable is shared by all methods in the same package (or class), and, equally important, that the variable is still around after the parse action is done. Please refer to the Perl 6 specification [1] for more semantics of "our". The variable $?BLOCK holds the current block. After that, this block is unshifted onto another funny-looking variable, called @?BLOCK. This variable has a "@" sigil, meaning it is an array. The unshift method puts its argument at the front of the list. In a sense, you could think of the front of this list as the top of a stack. Later we'll see why this stack is necessary. This @?BLOCK variable is also declared with "our", meaning it's also package-scoped. However, as we call a method on this variable, it must already have been created; otherwise we'd be invoking a method on an undefined ("Undef") variable. So, this variable should be created before parsing starts. We can do this in the compiler's main program, squaak.pir. Before doing so, let's take a quick look at the "else" part of the parse action for TOP, which is executed after the whole input string is parsed. The PAST::Block node is retrieved from @?BLOCK, which makes sense, as it was created in the first part of the method and unshifted onto @?BLOCK. Now this node can be used as the final result object of TOP. So, now that we've seen how to use the scope stack, let's have a look at its implementation.
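The open/close dance can be mimicked outside Parrot to show the invariant: a block is pushed onto the scope stack before its child statements are processed, and popped (becoming the result object) afterwards. This is an illustrative Python sketch; Block and top_action are made-up names, not PCT API:

```python
# Illustrative sketch (not PCT code): the "open" action creates the block
# and pushes it onto the scope stack before any statements are parsed;
# the "close" action pops it and fills in the children.
class Block:
    def __init__(self):
        self.children = []

block_stack = []        # plays the role of @?BLOCK; index 0 is innermost

def top_action(key, statements=None):
    if key == "open":
        block = Block()                # create the block up front...
        block_stack.insert(0, block)   # ...so child actions can find it
        return block
    else:                              # key == "close"
        block = block_stack.pop(0)     # the block created at "open"
        block.children.extend(statements)
        return block                   # this becomes the result object

top_action("open")
result = top_action("close", statements=["stmt1", "stmt2"])
print(result.children)    # ['stmt1', 'stmt2']
print(block_stack)        # [] -- the stack is balanced again
```

The point of creating the block at "open" time is exactly what the text describes: any action method that runs while the statements are being parsed can reach the innermost block at index 0.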


A List Class
We'll implement the scope stack as a ResizablePMCArray object. This is a built-in PMC type. However, this built-in PMC does not have any methods; in PIR it can only be used as an operand of the built-in shift and unshift instructions. In order to allow us to write these as method calls, we create a new subclass of ResizablePMCArray. The code below creates the new class and defines the methods we need.

     1  .namespace
     2  .sub 'initlist' :anon :init :load
     3      subclass $P0, 'ResizablePMCArray', 'List'
     4      new $P1, 'List'
     5      set_hll_global ['Squaak';'Grammar';'Actions'], '@?BLOCK', $P1
     6  .end
     7  .namespace ['List']
     8  .sub 'unshift' :method
     9      .param pmc obj
    10      unshift self, obj
    11  .end
    12  .sub 'shift' :method
    13      shift $P0, self
    14      .return ($P0)
    15  .end


Well, here you have it: part of the small amount of PIR code you need to write for the Squaak compiler (there's some more for some built-in subroutines; more on that later). Let's discuss this code snippet in more detail (if you know PIR, you can skip this section). Line 1 resets the namespace to the root namespace in Parrot, so that the sub 'initlist' is stored in that namespace. The sub 'initlist', defined in lines 2-6, has some flags: :anon means that the sub is not stored by name in the namespace, implying it cannot be looked up by name. The :init flag means that the sub is executed before the main program (the "main" sub) is executed. The :load flag makes sure that the sub is executed if this file was compiled and loaded by another file through the load_bytecode instruction. If you don't understand this, no worries; you can forget about it for now. In any case, we know for sure there's a List class when we need it, because the class creation is done before running the actual compiler code. Line 3 creates a new subclass of ResizablePMCArray, called "List". This results in a new class object, which is left in register $P0, but it's not used after that. Line 4 creates a new List object and stores it in register $P1. Line 5 stores this List object by the name "@?BLOCK" (that name should ring a bell now...) in the namespace of the Actions class. The semicolons in between the several key strings indicate nested namespaces. So, lines 4 and 5 are important, because they create the @?BLOCK variable and store it in a place that can be accessed from the action methods in the Actions class. Lines 7-11 define the unshift method, which is a method in the "List" namespace. This means that it can be invoked as a method on a List object. As the sub is marked with the :method flag, it has an implicit first parameter called "self", which refers to the invocant object. The unshift method invokes Parrot's unshift instruction on self, passing the obj argument as the second operand. So, obj is unshifted onto self, which is the List object itself. Finally, lines 12-15 define the "shift" method, which does the opposite of "unshift": it removes the first element and returns it to its caller.


Storing Symbols
Now we've set up the necessary infrastructure to store the current scope block, and we've created a data structure that acts as a scope stack, which we will need later. We'll now go back to the parse action for variable_declaration, because we didn't yet enter the declared variable into the current block's symbol table. We'll see how to do that now. First, we need to make the current block accessible from the method variable_declaration. We've already seen how to do that, using the "our" keyword. It doesn't really matter where in the action method we enter the symbol's name into the symbol table, but let's do it at the end, after the initialization stuff. Naturally, we only enter the symbol if it's not there already; duplicate variable declarations (in the same scope) should result in an error message (using the panic method of the match object). The code to be added to the method variable_declaration then looks like this:

    method variable_declaration($/) {
        our $?BLOCK;
        # get the PAST node for identifier
        # set the scope and declaration flag
        # do the initialization stuff
        # cache the name into a local variable
        my $name := $past.name();
        if $?BLOCK.symbol( $name ) {
            # symbol is already present
            $/.panic("Error: symbol " ~ $name ~ " was already defined.\n");
        }
        else {
            $?BLOCK.symbol( $name, :scope('lexical') );
        }
        make $past;
    }
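Stripped of the PAST details, the duplicate check boils down to a membership test in the current scope's table only. This hypothetical Python sketch (Block and declare are illustrative names, not the PAST::Block API) shows the shape of it:

```python
# Hypothetical sketch of the duplicate-declaration check: only the
# CURRENT scope is consulted, so shadowing an outer variable is fine,
# but redeclaring in the same scope is an error.
class Block:
    def __init__(self):
        self.symbols = {}

    def declare(self, name):
        if name in self.symbols:
            raise SyntaxError("Error: symbol " + name + " was already defined.")
        self.symbols[name] = {"scope": "lexical"}

block = Block()
block.declare("x")
try:
    block.declare("x")       # second declaration in the same scope
except SyntaxError as e:
    print(e)                 # Error: symbol x was already defined.
```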


What's Next?
With this code in place, variable declarations are handled correctly. However, we didn't update the parse action for identifier, which creates the PAST::Var node and sets its scope; currently all identifiers' scope is set to 'package' (which means it's a global variable). As we already covered a lot of material in this episode, we'll leave this for the next episode. In the next episode, we'll also cover subroutines, which is another important aspect of any programming language. Hope to catch you later!

Exercises
Problem 1

In this episode, we changed the action method for the TOP rule; it is now invoked twice: once at the beginning of the parse, and once at the end. The block rule, which defines a block to be a series of statements, represents a new scope. This rule is used in, for instance, the if-statement (the then-part and else-part), the while-statement (the loop body) and others. Update the parse action for block so it is invoked twice: once before parsing the statements, during which a new PAST::Block is created and stored onto the scope stack, and once after parsing the statements, during which this PAST node is set as the result object. Make sure $?BLOCK always points to the current block. In order to do this exercise correctly, you should understand well what the shift and unshift methods do, and why we didn't implement methods to push and pop, which are more appropriate words in the context of a (scope) stack.

Solution

Keeping the current block up to date: Sometimes we need to access the current block's symbol table. In order to be able to do so, we need a reference to the "current block". We do this by declaring a package variable called "$?BLOCK", declared with "our" (as opposed to "my"). This variable will always point to the "current" block. As blocks can nest, we use a "stack", on which newly created blocks are stored. Whenever a new block is created, we assign it to $?BLOCK and store it onto the stack, so that the next time a new block is created, the "old" current block isn't lost. Whenever a scope is closed, we pop the current block off the stack and restore the previous "current" block.

Why unshift/shift and not push/pop?: When we're talking about stacks, it would seem logical to talk about stack operations such as "push" and "pop". Instead, we use the operations "unshift" and "shift". If you're not a Perl programmer (such as myself), these names might not make sense. However, it's pretty easy.
Instead of pushing a new object onto the "top" of the stack, you unshift objects onto this stack. Just see it as an old school bus with only one entrance (at the front of the bus). Pushing a new person means taking the first free seat when entering, while unshifting a new person means everybody moves (shifts) one place to the back, so the new person can sit in the front seat. You might think this is not as efficient (more stuff is moved around), but that's not really true (actually, I guess (and certainly hope) the shift and unshift operations are implemented more efficiently than the bus metaphor suggests; I don't know how they are implemented). So why unshift/shift, and not push/pop? When restoring the previous "current block", we need to know exactly where it is (at what position). It would be nice to be able to always refer to the "first passenger on the bus", instead of the last person. We know how to reference the first passenger (it's on seat no. 0 (it was designed by an IT guy)); we don't really know the seat number of the last person: s/he might sit in the middle, or at the back. I hope it's clear what I mean here... otherwise, have a look at the code and try to figure out what's happening:

    method block($/, $key) {
        our $?BLOCK;
        our @?BLOCK;
        if $key eq 'open' {
            $?BLOCK := PAST::Block.new( :blocktype('immediate'),
                                        :node($/) );
            @?BLOCK.unshift($?BLOCK);
        }
        else {
            my $past := @?BLOCK.shift();
            $?BLOCK := @?BLOCK[0];
            for $<statement> {
                $past.push( $( $_ ) );
            }
            make $past;
        }
    }


References
[1] http://perlcabal.org/syn/S02.html#Names

Scope and Subroutines


Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

In Episode 5, we looked at variable declarations and implementing scope. We covered a lot of information then, but did not tell the full story, in order to keep that post short. In this episode we'll address the missing parts, which will also result in implementing subroutines.

Variables
In the previous episode, we entered local variables into the current block's symbol table. As we've seen earlier, using the do-block statement, scopes may nest. Consider this example:

    do
        var x = 42
        do
            print(x)
        end
    end

In this example, the print statement should print 42, even though x was not declared in the scope where it is referenced. How does the compiler know it's still a local variable? That's simple: it should look in all scopes, starting at the innermost. Only when the variable is found in some scope should its scope be set to "lexical", so that the right instructions are generated. The solution I came up with is shown below. Please note that I'm not 100% sure this is the "best" solution, as my personal understanding of the PAST compiler is limited. So, while this solution works, I may be teaching you the wrong "habit". Please be aware of this.

    method identifier($/) {
        our @?BLOCK;
        my $name  := ~$<ident>;
        my $scope := 'package'; # default value
        # go through all scopes and check if the symbol
        # is registered as a local. If so, set scope to
        # local.
        for @?BLOCK {
            if $_.symbol($name) {
                $scope := 'lexical';
            }
        }
        make PAST::Var.new( :name($name),
                            :scope($scope),
                            :viviself('Undef'),
                            :node($/) );
    }

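The lookup logic — scan every scope on the stack, innermost first, and fall back to global — can be sketched outside Parrot in a few lines of Python (class and method names here are illustrative, not PCT API):

```python
# Illustrative sketch (not PCT code): decide an identifier's scope by
# scanning a stack of symbol tables, innermost scope first.
class Block:
    def __init__(self):
        self.symbols = {}

    def symbol(self, name):
        return self.symbols.get(name)

    def declare(self, name):
        self.symbols[name] = {"scope": "lexical"}

def scope_of(name, block_stack):
    # block_stack[0] is the innermost (current) scope, like @?BLOCK
    for block in block_stack:
        if block.symbol(name):
            return "lexical"
    return "package"             # not declared anywhere: assume global

outer, inner = Block(), Block()
outer.declare("x")
stack = [inner, outer]           # innermost scope first
print(scope_of("x", stack))      # lexical -- found in an enclosing scope
print(scope_of("y", stack))      # package -- undeclared, so global
```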

Viviself
You might have noticed the viviself attribute before. This attribute will result in extra instructions that will initialize the variable if it doesn't exist. As you know, global variables spring into life automatically when they're used. Earlier we mentioned that uninitialized variables have a default value of "Undef": the viviself attribute does this. For local variables, we use this mechanism to set the (optional) initialization value. When the identifier is a parameter, the parameter will be initialized automatically if it doesn't receive a value when the subroutine it belongs to is invoked. Effectively this means that all parameters in Squaak are optional!


Subroutines
We already mentioned subroutines and introduced the PAST::Block node type. We also briefly mentioned the blocktype attribute that can be set on a PAST::Block node, which indicates whether the block is to be executed immediately (for instance, a do-block or if statement) or represents a declaration (for instance, a subroutine). Let us now look at the grammar rule for subroutine definitions:

    rule sub_definition {
        'sub' <identifier> <parameters>
        <statement>*
        'end'
        {*}
    }

    rule parameters {
        '(' [<identifier> [',' <identifier>]* ]? ')'
        {*}
    }

This is rather straightforward, and the action methods for these rules are quite simple, as you will see. First, however, let's have a closer look at the rule for sub definitions. Why is the sub body defined as <statement>* and not as a <block>? Surely a subroutine defines a new scope, and that was already covered by <block>? Well, you're right about that. However, as we will see, by the time a new PAST::Block node would be created, we are too late! The parameters would already have been parsed, but not entered into the block's symbol table. That's a problem, because parameters are most likely to be used in the subroutine's body, and as they would not be registered as local variables (which they are), any usage of a parameter would not be compiled down to the right instructions for fetching it. So, how do we solve this in an efficient way? The solution is simple. The only place where parameters live is the subroutine's body, represented by a PAST::Block node. So why don't we create the PAST::Block node in the action method for the parameters rule? By doing so, the block is already in place, and the parameters are registered as local symbols right in time. Let's look at the action methods.
    method parameters($/) {
        our $?BLOCK;
        our @?BLOCK;
        my $past := PAST::Block.new( :blocktype('declaration'),
                                     :node($/) );
        # now add all parameters to this block
        for $<identifier> {
            my $param := $( $_ );
            $param.scope('parameter');
            $past.push($param);
            # register the parameter as a local symbol
            $past.symbol($param.name(), :scope('lexical'));
        }
        # now put the block into place on the scope stack
        $?BLOCK := $past;
        @?BLOCK.unshift($past);
        make $past;
    }

    method sub_definition($/) {
        our $?BLOCK;
        our @?BLOCK;
        my $past := $( $<parameters> );
        my $name := $( $<identifier> );
        # set the sub's name
        $past.name( $name.name() );
        # add all statements to the sub's body
        for $<statement> {
            $past.push( $( $_ ) );
        }
        # and remove the block from the scope
        # stack and restore the current block
        @?BLOCK.shift();
        $?BLOCK := @?BLOCK[0];
        make $past;
    }

First, let's check out the parse action for parameters. A new PAST::Block node is created, and then we iterate over the list of identifiers (which may be empty), each representing a parameter. After retrieving the result object for a parameter (which is just an identifier), we set its scope to "parameter" and add it to the block object. After that, we register the parameter as a symbol in the block object, specifying the scope as "lexical". Parameters are just a special kind of local variable; there's no difference between a parameter and a declared local variable in a subroutine, except that a parameter will usually be initialized with a value that is passed when the subroutine is invoked. After handling the parameters, we set the current block (referred to by our package variable $?BLOCK) to the PAST::Block node we just created, and push it onto the scope stack (referred to by our package variable @?BLOCK). After the whole subroutine definition is parsed, the action method sub_definition is invoked. This retrieves the result object for parameters, which is the PAST::Block node that will represent the sub. After retrieving the result object for the sub's name, we set the name on the block node and add all statements to the block. After this, we pop this block node off the scope stack (@?BLOCK) and restore the current block ($?BLOCK). Pretty easy, huh?


Subroutine invocation
Once you've defined a subroutine, you'll want to invoke it. In the exercises of Episode 5, we already gave some tips on how to create the PAST nodes for a subroutine invocation. In this section, we'll give a complete description. First we'll introduce the grammar rule.

    rule sub_call {
        <primary> <arguments>
        {*}
    }

Not only does this allow you to invoke subroutines by their name; you can also store subroutines in an array or hash field and invoke them from there. Let's take a look at the action methods, which are really quite straightforward.

    method sub_call($/) {
        my $invocant := $( $<primary> );
        my $past     := $( $<arguments> );
        $past.unshift($invocant);
        make $past;
    }

    method arguments($/) {
        my $past := PAST::Op.new( :pasttype('call'),
                                  :node($/) );
        for $<expression> {
            $past.push( $( $_ ) );
        }
        make $past;
    }

The result object of the sub_call method should be a PAST::Op node (of type 'call'), which contains a number of child nodes: the first one is the invocant object, and all remaining children are the arguments to that sub call. In order to "move" the result objects of the arguments to the sub_call method, we create the PAST::Op node in the method arguments, which is then retrieved by sub_call. In sub_call, the invocant object is set as the first child (using unshift). This is all too easy, isn't it? :-)
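The node shape described here — a 'call' node whose first child is the invocant and whose remaining children are the arguments — can be sketched in Python (OpNode and the action names are illustrative, not the PAST API):

```python
# Sketch of building the call node: arguments_action builds a 'call' node
# holding the argument children; sub_call_action then unshifts the
# invocant in front, just like the action methods above.
class OpNode:
    def __init__(self, pasttype):
        self.pasttype = pasttype
        self.children = []

def arguments_action(exprs):
    past = OpNode("call")
    past.children.extend(exprs)      # all arguments become children
    return past

def sub_call_action(invocant, arguments_node):
    arguments_node.children.insert(0, invocant)   # invocant goes first
    return arguments_node

call = sub_call_action("foo", arguments_action(["1", "x"]))
print(call.pasttype)    # call
print(call.children)    # ['foo', '1', 'x']
```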

What's Next?
In this episode we finished the implementation of scope in Squaak, and implemented subroutines. Our language is coming along nicely! In the next episode, we'll explore how to implement operators and an operator precedence table for efficient expression parsing. In the meantime, should you have any problems or questions, don't hesitate to leave a comment!

Exercises
Problem 1

By now you should have a good idea of how scope is implemented in Squaak. We haven't implemented the for-statement yet, as it needs proper scope handling. Implement it now. Check out Episode 3 for the BNF rules that define the syntax of the for-statement. When implementing it, you will run into the same issue as we did when implementing subroutines and parameters. Use the same trick for the implementation of the for-statement.

Solution

First, let us look at the BNF of the for-statement:

    for-statement ::= 'for' for-init ',' expression [step] 'do' block 'end'
    step          ::= ',' expression
    for-init      ::= 'var' identifier '=' expression


It's pretty easy to convert this to Perl 6 rules:

    rule for_statement {
        'for' <for_init> ',' <expression> <step>?
        'do' <block> 'end'
        {*}
    }

    rule step {
        ',' <expression>
        {*}
    }

    rule for_init {
        'var' <identifier> '=' <expression>
        {*}
    }

Pretty easy, huh? Let's take a look at the semantics. A for-loop is just another way to write a while-loop, but much easier in certain cases. This:

    for var <ident> = <expr1>, <expr2>, <expr3> do
        <block>
    end

corresponds to:

    do
        var <ident> = <expr1>
        while <ident> <= <expr2> do
            <block>
            <ident> = <ident> + <expr3>
        end
    end

If <expr3> is absent, it defaults to the value "1". Note that the step expression (expr3) should be positive; the loop condition contains a <= operator. When you specify a negative step expression, the loop variable will only decrease in value, which will never make the loop condition false (unless it overflows, but that's a different issue; it might even raise an exception in Parrot; I do not know). Allowing negative step expressions introduces more complexity, which I felt was not worth the trouble for this tutorial language.
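The desugaring above can be expressed directly as executable Python, which is a handy way to check the semantics (the function and parameter names are illustrative, not part of any Parrot API):

```python
# Executable sketch of the for -> while desugaring described above.
def squaak_for(expr1, expr2, expr3, body):
    i = expr1              # var <ident> = <expr1>  (local to the loop)
    while i <= expr2:      # the loop condition uses <=
        body(i)            # <block>
        i = i + expr3      # <ident> = <ident> + <expr3>

collected = []
squaak_for(1, 5, 2, collected.append)
print(collected)           # [1, 3, 5]
```

Note how a negative step would indeed never terminate here, which is exactly why the tutorial language disallows it.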

Note that the loop variable <ident> is local to the for-loop; this is expressed in the equivalent while-loop by the surrounding do/end pair: a new do/end pair defines a new (nested) scope, and after the end keyword, the loop variable is no longer visible. Let's implement the action method for the for-statement. As was mentioned in the exercise description, we're dealing with the same situation as with subroutine parameters. In this case, we're dealing with the loop variable, which is local to the for-statement. Let's check out the action method for for_init:

    method for_init($/) {
        our $?BLOCK;
        our @?BLOCK;
        ## create a new scope here, so that we can
        ## add the loop variable to this block here,
        ## which is convenient.
        $?BLOCK := PAST::Block.new( :blocktype('immediate'),
                                    :node($/) );
        @?BLOCK.unshift($?BLOCK);
        my $iter := $( $<identifier> );
        ## set a flag that this identifier is being declared
        $iter.isdecl(1);
        $iter.scope('lexical');
        ## the identifier is initialized with this expression
        $iter.viviself( $( $<expression> ) );
        ## enter the loop variable into the symbol table.
        $?BLOCK.symbol($iter.name(), :scope('lexical'));
        make $iter;
    }

So, just as we created a new PAST::Block for the subroutine in the action method for parameters, we create a new PAST::Block for the for-statement in the action method that declares the loop variable. (Guess why we made for-init a subrule, and didn't just put "var <ident> = <expression>" in the rule for the for-statement.) This block is the place to live for the loop variable. The loop variable is declared, initialized using the viviself attribute, and entered into the new block's symbol table. Note that after creating the new PAST::Block object, we put it onto the scope stack. Now, the action method for the for-statement itself is quite long, so I'll just embed my comments, which makes reading it easier.
    method for_statement($/) {
        our $?BLOCK;
        our @?BLOCK;

First, get the result object of the for-statement initialization rule; this is the PAST::Var object representing the declaration and initialization of the loop variable.

        my $init := $( $<for_init> );


Then, create a new node for the loop variable. Yes, another one (besides the one that is currently contained in the PAST::Block). This one is used when the loop variable is updated at the end of the code block (each iteration). The difference with the other one is that it doesn't have the isdecl flag, and it doesn't have a viviself clause, which would result in extra instructions checking whether the variable is null (and we know it's not, because we initialize the loop variable).

        ## cache the name of the loop variable
        my $itername := $init.name();
        my $iter     := PAST::Var.new( :name($itername),
                                       :scope('lexical'),
                                       :node($/) );

Now, retrieve the PAST::Block node from the scope stack, and push all statement PAST nodes onto it.

        ## the body of the loop consists of the statements written by the
        ## user and the increment instruction of the loop iterator.
        my $body := @?BLOCK.shift();
        $?BLOCK  := @?BLOCK[0];
        for $<statement> {
            $body.push($($_));
        }

If there was a step, we use that value; otherwise, we assume a default step size of "1". Negative step sizes won't work, but if you Feel Lucky, you could go ahead and try. It's not that hard, it's just a lot of work, and I'm too lazy for that now... ehm, I mean, I leave it as the proverbial exercise to the reader.

        my $step;
        if $<step> {
            my $stepsize := $( $<step>[0] );
            $step := PAST::Op.new( $iter, $stepsize,
                                   :pirop('add'),
                                   :node($/) );
        }
        else { ## default is increment by 1
            $step := PAST::Op.new( $iter,
                                   :pirop('inc'),
                                   :node($/) );
        }

The incrementing of the loop variable is part of the loop body, so add the incrementing statement to $body.

        $body.push($step);

The loop condition uses the <= operator, and compares the loop variable with the maximum value that was specified.

        ## while loop iterator <= end-expression
        my $cond := PAST::Op.new( $iter, $( $<expression> ),
                                  :name('infix:<=') );

Now we have the PAST for the loop condition and the loop body, so create a PAST node to represent the (while) loop.


        my $loop := PAST::Op.new( $cond, $body,
                                  :pasttype('while'),
                                  :node($/) );

Finally, the initialization of the loop variable should go before the loop itself, so create a PAST::Stmts node to do this:

        make PAST::Stmts.new( $init, $loop,
                              :node($/) );
    }

Wow, we've done it! This was a good example of how to implement a non-trivial statement type using PAST.


Operators and Precedence


Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

Up till now, we've implemented a great deal of the Squaak language. We've seen assignments, control-flow statements, variable declarations and scope, and subroutines and invocation. Our expressions have been limited so far to singular values, such as string literals and integer constants. In this episode, we'll enhance Squaak so it can handle operators, so you can construct more complex expressions.

Operators, precedence and parse trees


We will first briefly introduce the problem that recursive-descent parsers (which is what PCT-generated parsers are) have when parsing expressions. Consider the following mini-grammar, which implements a very basic calculator.

    rule TOP {
        <expression>*
    }

    rule expression {
        <term>
    }

    rule term {
        <factor> [ <addop> <factor> ]*
    }

    token addop { '+' | '-' }


    rule factor {
        <value> [ <mulop> <value> ]*
    }

    token mulop { '*' | '/' | '%' }

    rule value {
        | <number>
        | '(' <expression> ')'
    }

This basic expression grammar implements operator precedence by taking advantage of the nature of a recursive-descent parser (if you haven't seen the term before, google it). However, the big disadvantage of parsing expressions this way is that the parse trees can become quite large. Perhaps more importantly, the parsing process is not very efficient. Let's take a look at some sample input. We won't show the parse trees as we did in Episode 2, but just an outline.

The input 42 results in this parse tree:

    TOP
     expression
      term
       factor
        value
         number
          42

As you can see, parsing this single number invokes 6 grammar rules before the actual digits are matched. Not that bad, you might think.

The input "1 + 2" results in this parse tree (we ignore the operator for now):

    TOP
     expression
      term
       factor
       |  value
       |   number
       |    1
       factor
        value
         number
          2

Only a few more grammar rules are invoked; not really a problem either.

The input "(1 + 2) * 3" results in this parse tree:

    TOP
     expression
      term
       factor
        value
        |  expression
        |   term
        |    factor
        |    |  value
        |    |   number
        |    |    1
        |    factor
        |     value
        |      number
        |       2
        value
         number
          3

Right; 16 grammar rules just to parse this simple input. I'd call that slightly inefficient. The point is that implementing operator precedence using a recursive-descent parser is somewhat problematic, and given that there are better methods to parse expressions like these, not the way to go. Check out this nice explanation [1] or google it.
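To make the cost concrete, here is a small hypothetical recursive-descent evaluator for the same calculator grammar that counts rule invocations as it parses (Python, not PGE; the TOP rule is omitted, and the token rules addop/mulop are not counted):

```python
# Tiny recursive-descent parser for the calculator grammar above, counting
# rule invocations to show how quickly they add up (illustrative code).
import re

class Parser:
    def __init__(self, text):
        self.toks = re.findall(r"\d+|[()+\-*/%]", text)
        self.pos = 0
        self.calls = 0                    # number of rule invocations

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expression(self):
        self.calls += 1
        return self.term()

    def term(self):                       # handles + and -, as in the grammar
        self.calls += 1
        v = self.factor()
        while self.peek() in ("+", "-"):
            op = self.toks[self.pos]; self.pos += 1
            v = v + self.factor() if op == "+" else v - self.factor()
        return v

    def factor(self):                     # handles *, / and %
        self.calls += 1
        v = self.value()
        while self.peek() in ("*", "/", "%"):
            op = self.toks[self.pos]; self.pos += 1
            w = self.value()
            v = v * w if op == "*" else (v / w if op == "/" else v % w)
        return v

    def value(self):
        self.calls += 1
        if self.peek() == "(":
            self.pos += 1                 # consume '('
            v = self.expression()
            self.pos += 1                 # consume ')'
            return v
        self.pos += 1
        return int(self.toks[self.pos - 1])

p = Parser("(1 + 2) * 3")
print(p.expression(), p.calls)            # 9 11
```

Eleven rule invocations (plus the token rules) for a three-number expression, compared with four for a bare number: the bookkeeping grows much faster than the input.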


Bottom-up parsing and stacks: operator tables


I would like to explain how bottom-up parsing works for expressions (or bottom-up parsers in general; Yacc/Bison are parser generators that generate bottom-up parsers for your grammar specification), taking operator precedence into account. However, it's been about 6 years since I did this in a CS class, and I don't remember the particular details. If you really want to know, check out the links at the end of the previous section; it's actually worth it. For now, I'll just assume you know what the problem is, and introduce the solution for PCT-based compilers immediately. At some point when parsing your input, you might encounter an expression. At this point, we'd like the parser to switch from top-down to bottom-up parsing. The Parrot Grammar Engine supports this, and it is used as follows:

    rule expression is optable { ... }

Note that we used the word "expression" here, but you can name it anything. This declares that, whenever an expression is needed, the bottom-up parser is activated. Of course, this "optable" must be populated with the operators we need to be able to parse. This is done by declaring operators as follows:

    proto 'infix:*' is tighter('infix:+') { ... }

This defines the operator "*" (the "infix:" prefix tells the operator parser that this is an infix operator; there are other types, such as prefix, postfix and others). The "is tighter" clause says that the "*" operator has a higher precedence than the "+" operator. As you might have guessed, there are also clauses to declare equivalent precedence ("is equiv") and lower precedence ("is looser"). It is very important to spell all clauses, such as "is equiv", correctly (for instance, not "is equil"), otherwise you might get a cryptic error message when trying to run your compiler. See the references section for the optable guide, which has more details on this.
Of course, the expression parser does not just parse operators; it must also parse the operands. So, how do we declare the most basic entity that represents an operand? It can be anything, from a basic integer constant to a function call, or even a function definition (though adding two function definitions doesn't really make sense, does it?). The operands are parsed in recursive-descent fashion, so somewhere the parser must switch back from bottom-up (expression parsing) to top-down. To declare this "switch-back" point, write:

    proto 'term:' is tighter('prefix:-') is parsed(&term) { ... }

The name "term:" is a built-in name of the bottom-up operator parser; it is invoked every time a new operand is needed. The "is parsed" clause tells the parser that "term" (which happens to look like "term:", but you could have named it anything else) parses the operands. Note: it is very important to add an "is tighter" clause to the declaration of the "term:" rule; otherwise your expression parser will not work! My knowledge here is a bit limited, but I usually define it as "is tighter" relative to the tightest operator defined.
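To build intuition for what the optable does, here is a tiny precedence-climbing expression parser in Python. This is only an illustration of the bottom-up technique; it is not how PGE is implemented, and the token set and precedence table are made up for the example:

```python
import re

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}   # higher number binds tighter

def tokenize(src):
    # integers, the four operators, and parentheses
    return re.findall(r'\d+|[+\-*/()]', src)

def parse(tokens, min_prec=0):
    # the "term:" equivalent: an operand is a number or a parenthesized expression
    tok = tokens.pop(0)
    if tok == '(':
        left = parse(tokens, 0)
        tokens.pop(0)  # discard ')'
    else:
        left = int(tok)
    # bottom-up loop: keep consuming operators at or above the minimum precedence
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        right = parse(tokens, PREC[op] + 1)  # +1 gives left associativity
        left = (op, left, right)
    return left

print(parse(tokenize('1 + 2 * 3')))  # ('+', 1, ('*', 2, 3))
```

Note that "1 + 2 * 3" comes out correctly nested with a handful of recursive calls, instead of one rule application per grammar level per operand.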


Squaak Operators
We have defined the entry and exit points of the expression (bottom-up) parser; now it's time to add the operators. Let's have a look at Squaak's operators and their precedence. The operators are listed with decreasing precedence (high-precedence operators at the top). (I'm not sure how this precedence table compares to other languages; some operators may have a different precedence relative to other operators than you're used to. At least the mathematical operators are organized according to standard math rules.)

    unary "-"
    unary "not"
    *  /  %
    +  -  ..
    <  <=  >=  >  !=  ==
    and
    or

(".." is the string concatenation operator.) Besides defining an entry and exit point for the expression parser, you need to define some operator as a reference point, so that other operators' precedence can be defined relative to it. My personal preference is to declare the operator with the lowest precedence as the reference point. This can be done like this:

    proto 'infix:or' is precedence('1') { ... }

Now, other operators can be defined:

    proto 'infix:and'  is tighter('infix:or')   { ... }
    proto 'infix:<'    is tighter('infix:and')  { ... }
    proto 'infix:+'    is tighter('infix:<')    { ... }
    proto 'infix:*'    is tighter('infix:+')    { ... }
    proto 'prefix:not' is tighter('infix:*')    { ... }
    proto 'prefix:-'   is tighter('prefix:not') { ... }

Note that some operators are missing. See the exercises section for this. For more details on the use of the optable, check out docs/pct/pct_optable_guide.pod in the Parrot repository.


Short-circuiting logical operators


Squaak has two logical operators: and and or; and results in true if and only if both operands evaluate to true, while or results in true if at least one of its operands evaluates to true. Both operators are short-circuiting, which means that they don't evaluate both operands when that's unnecessary. For instance, if the first operand of the and operator evaluates to false, then there's no need to evaluate the second operand, as the final result of the and-expression cannot become true anymore (remember: both operands must evaluate to true).

Let's think about how to implement this. When evaluating an and-expression, we first evaluate the first operand, and only if it's true does it make sense to evaluate the second operand. This behavior looks very much like an if-statement, doesn't it? In an if-statement, the first child is always evaluated, and if true, the second child (the "then" block) is evaluated (remember, the third child -- the "else" clause -- is optional). It would be great to be able to implement the and operator using a PAST::Op( :pasttype('if') ) node. Well, you can, using the "is pasttype" clause! Here's how:

    proto 'infix:and' is tighter('infix:or') is pasttype('if') { ... }

So what about the or operator? When evaluating an or-expression, the first operand is evaluated. If it evaluates to true, then there's no need to evaluate the second operand, as the result of the or-expression is already true! Only if the first operand evaluates to false is it necessary to evaluate the second child. Mmmmm... what we're saying here is: unless the first operand evaluates to true, evaluate the second child. Guess what pasttype you'd need for that!
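The desugaring of and/or into conditionals can be sketched in Python (an analogy only; Squaak's truthiness rules may differ). Each operand is wrapped in a thunk so we can observe which ones actually get evaluated:

```python
def short_and(lhs, rhs):
    """'a and b' behaves like an if: evaluate rhs only when lhs is true."""
    left = lhs()
    return rhs() if left else left       # the 'if' pasttype

def short_or(lhs, rhs):
    """'a or b' behaves like an unless: evaluate rhs only when lhs is false."""
    left = lhs()
    return left if left else rhs()       # the 'unless' pasttype

calls = []
def trace(name, value):
    def thunk():
        calls.append(name)
        return value
    return thunk

print(short_and(trace("a", False), trace("b", True)))  # False
print(calls)  # ['a'] -- "b" was never evaluated
```

The trace shows the short-circuit: with a false first operand, the second thunk is never called, exactly what the if/unless node shapes guarantee.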

Operators, PAST types and PIR instructions


In the previous section, we introduced the "pasttype" clause. This means that for that operator (for instance, the "and" operator we discussed), a PAST::Op( :pasttype('if') ) node is created. What happens if you don't specify a pasttype? In that case a default PAST::Op node is created, whose default pasttype is 'call'. In other words, a PAST::Op node is created that calls the declared operator. For instance, the "infix:+" operator results in a call to the subroutine "infix:+". This means you'd need to implement a subroutine for each operator.

Now, that's a bit of a shame. Obviously, some languages have very exotic semantics for the "+" operator, but many languages just want to use Parrot's built-in add instruction. How do we achieve that? Instead of adding a "pasttype" clause, specify a "pirop" clause. The "pirop", or "PIR operator", clause tells the code generator what instruction should be generated: instead of generating a subroutine invocation with the operands as arguments, it will generate the specified instruction with the operator's operands as arguments. Neat, huh? Let's look at an example:

    proto 'infix:+' is tighter('infix:<') is pirop('n_add') { ... }

This specifies the "n_add" instruction, which tells Parrot to create a new result object instead of changing one of the operands. Why not just the "add" instruction (which takes two operands, updating the first), you might think? Well, if you leave out this "is pirop" stuff, this will be generated:

    $P12 = "infix:+"($P10, $P11)

You see, three registers are involved. As mentioned before, PCT does not do any optimizations. Therefore, instead of the subroutine call above, it just emits the following:

    n_add $P12, $P10, $P11

which means that the PMCs in registers $P10 and $P11 are added and assigned to a newly created PMC, which is stored in register $P12.


To circumfix or not to circumfix


Squaak supports parenthesized expressions. Parentheses can be used to change the order of evaluation in an expression, just as you have probably seen in other languages. Besides infix, prefix and postfix operators, you can define circumfix operators, which are specified with a left and a right delimiter. This is an ideal way to implement parenthesized expressions:

    proto 'circumfix:( )' is looser('infix:+') is pirop('set') { ... }

By default, a subroutine invocation would be generated for each operator, in this case a call to "circumfix:( )". However, we are merely interested in the expression that has been parenthesized; the subroutine would merely return the expression. Instead, we can use the pirop attribute to specify what PIR operation should be generated; in this case that is the "set" operation, which sets one register to the contents of another.

This solution works fine, except that "set" instructions are a bit of a waste. What happens is that the contents of some register are just copied to another register, which is then used in further code generation. This "set" instruction might as well be optimized away, but currently there are no optimizations implemented in the PCT. There is an alternative solution: add the parenthesized expression as an alternative of term. The grammar rule term then ends up as:

    rule term {
        | <float_constant> {*}      #= float_constant
        | <integer_constant> {*}    #= integer_constant
        | <string_constant> {*}     #= string_constant
        | <primary> {*}             #= primary
        | '(' <expression> ')' {*}  #= expression
    }

Of course, although we save one generated instruction, the parser will be slightly less efficient, for reasons discussed at the beginning of this episode. You are free to decide for yourself how to implement this; this section just explains both methods. At some point, optimizations will be implemented in the PCT; I suspect "useless" instructions (such as the "set" instruction we just saw) will then be removed.

Expression parser's action method


For all grammar rules we introduced, we also introduced an action method that is invoked after the grammar rule was done matching. What about the action method for the optable? Naturally, there must be some actions to be executed. Well, there are, but to be frank, I cannot explain them to you. Every time I needed the action method for an optable, I just copied it from an existing actions file. Of course, the action method's name should match the name of the optable (the rule that has the "is optable" clause). So, here goes:

    method expression($/, $key) {
        if ($key eq 'end') {
            make $( $<expr> );
        }
        else {
            my $past := PAST::Op.new( :name($<type>),
                                      :pasttype($<top><pasttype>),
                                      :pirop($<top><pirop>),
                                      :lvalue($<top><lvalue>),
                                      :node($/) );
            for @($/) {
                $past.push( $($_) );
            }
            make $past;
        }
    }

What's Next?
This episode covered the implementation of operators, which allows us to write complex expressions. By now, most of our language is implemented, except for one thing: aggregate data structures. This will be the topic of Episode 8. We will introduce the two aggregate data types, arrays and hashtables, and see how we can implement them. We'll also discuss what happens when we pass such aggregates as subroutine arguments, and how they differ from the basic data types.

Exercises
Problem 1

Currently, Squaak only has grammar rules for integer and string constants, not floating-point constants. Implement this grammar rule. A floating-point number consists of zero or more digits, followed by a dot and at least one digit, or at least one digit followed by a dot and any number of digits. Examples are: 42.0, 1., .0001. There may be no whitespace between the individual digits and the dot. Make sure you understand the difference between a "rule" and a "token".

Hint: currently, the Parrot Grammar Engine (PGE), the component that "executes" the regular expressions (your grammar rules), matches alternative subrules in order. This means that this won't work:

    rule term {
        | <integer_constant>
        | <float_constant>
        ...
    }

because when given the input "42.0", "42" will be matched by <integer_constant>, and the dot and "0" will remain. Therefore, put the <float_constant> alternative in rule term before <integer_constant>. At some point, PGE will support longest-token matching, so that this issue will disappear.

Solution

    token float_constant {
        [
        | \d+ '.' \d*
        | \d* '.' \d+
        ]
        {*}
    }
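The ordered-alternation pitfall from the hint is easy to demonstrate with an ordinary regex engine. Here it is in Python's re module, purely as an illustration (PGE syntax differs, but the ordered-alternative behavior is the same):

```python
import re

# The float pattern from the solution: digits '.' digits, at least one digit total.
float_pat = r'\d+\.\d*|\d*\.\d+'
int_pat = r'\d+'

# Trying the integer alternative first "wins" with a partial match on "42.0":
wrong = re.match(f'({int_pat})|({float_pat})', '42.0')
print(wrong.group())          # '42' -- the dot and '0' are left over

# Putting the float alternative first fixes it:
right = re.match(f'({float_pat})|({int_pat})', '42.0')
print(right.group())          # '42.0'

# The float pattern accepts all three example literals:
for text in ['42.0', '1.', '.0001']:
    print(bool(re.fullmatch(float_pat, text)))  # True for all three
```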

Problem 2

Implement the missing operators: (binary) "-", "<=", ">=", "==", "!=", "/", "%", "or"

Solution

For the sake of completeness (and easy copy-pasting), here's the list of operator declarations as I wrote them for Squaak:

    rule expression is optable { ... }

    proto 'infix:or'   is precedence('1')       is pasttype('unless') { ... }
    proto 'infix:and'  is tighter('infix:or')   is pasttype('if')     { ... }

    proto 'infix:<'    is tighter('infix:and')  { ... }
    proto 'infix:<='   is equiv('infix:<')      { ... }
    proto 'infix:>'    is equiv('infix:<')      { ... }
    proto 'infix:>='   is equiv('infix:<')      { ... }
    proto 'infix:=='   is equiv('infix:<')      { ... }
    proto 'infix:!='   is equiv('infix:<')      { ... }

    proto 'infix:+'    is tighter('infix:<')    is pirop('n_add')    { ... }
    proto 'infix:-'    is equiv('infix:+')      is pirop('n_sub')    { ... }
    proto 'infix:..'   is equiv('infix:+')      is pirop('n_concat') { ... }

    proto 'infix:*'    is tighter('infix:+')    is pirop('n_mul')    { ... }
    proto 'infix:%'    is equiv('infix:*')      is pirop('n_mod')    { ... }
    proto 'infix:/'    is equiv('infix:*')      is pirop('n_div')    { ... }

    proto 'prefix:not' is tighter('infix:*')    is pirop('n_not')    { ... }
    proto 'prefix:-'   is tighter('prefix:not') is pirop('n_neg')    { ... }

    proto 'term:'      is tighter('prefix:-')   is parsed(&term)     { ... }


References
docs/pct/pct_optable_guide.pod

References
[1] http://epaperpress.com/oper/download/oper.pdf

Hash Tables and Arrays


Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

Welcome to Episode 8! This is the second-to-last episode in this tutorial. After this episode, we'll have a complete implementation of our Squaak language. This episode focuses on aggregate data structures: arrays and hashtables. We'll discuss the syntax to assign to them and to construct them. We'll see that implementing the action methods is really easy, almost trivial. After that, we'll make some notes on aggregates as arguments, and how they differ from the basic data types when passing them around as subroutine arguments.

Arrays and Hashtables


Besides basic data types such as integer, floating-point and string, Squaak has two aggregate data types: array and hashtable. An array is an object that can store a sequence of values. The values in this sequence can be of different types, unlike some languages that require all elements of an array to be the same type. An example of using arrays is shown below:

    grades[0] = "A"
    grades[1] = "A+"
    grades[2] = "B+"
    grades[3] = "C+"

A hashtable stores key-value pairs; the key is used as index to store a value. Keys must be string constants, but the value can be of any type. An example is shown below:

    lastnames{"larry"} = "wall"
    lastnames{"allison"} = "randal"


Array constructors
Just as there are integer literals (42) and string literals ("hello world") that can be assigned to variables, you can have array literals. Below is the grammar rule for this:

    rule array_constructor {
        '[' [ <expression> [',' <expression>]* ]? ']'
        {*}
    }

Some examples are shown below:

    foo = []
    bar = [1, "hi", 3.14]
    baz = [1, [2, 3, 4] ]

The first example creates an empty array and assigns this to foo. The second example shows the construction of three elements, assigning the array to bar. Note that the elements of one array can be of different types. The third example shows the construction of nested arrays. This means that element baz[1][0] evaluates to the value 2 (indexing starts at 0).

Hashtable constructors
Besides array literals, Squaak supports hashtable literals, which can be constructed through a hashtable constructor. The syntax for this is expressed below:

    rule hash_constructor {
        '{' [ <named_field> [',' <named_field>]* ]? '}'
        {*}
    }

    rule named_field {
        <string_constant> '=>' <expression>
        {*}
    }

Some examples are shown below:

    foo = {}
    bar = { "larry" => "wall", "allison" => "randal" }
    baz = { "a" => { "b" => 42 } }

The first line creates an empty hashtable and assigns it to foo. The second creates a hashtable with two fields, "larry" and "allison"; their respective values are "wall" and "randal". The third line shows that hashtables can be nested, too. There, a hashtable is constructed that has one field, called "a", whose value is another hashtable, containing a field "b" that has the value 42.


Implementation
You might think implementing support for arrays and hashtables looks rather difficult. Well, it's not. Actually, the implementation is rather straightforward. First, we're going to update the grammar rule for primary:

    rule primary {
        <identifier> <postfix_expression>*
        {*}
    }

    rule postfix_expression {
        | <index> {*} #= index
        | <key> {*}   #= key
    }

    rule index {
        '[' <expression> ']'
        {*}
    }

    rule key {
        '{' <expression> '}'
        {*}
    }

A primary object is now an identifier followed by any number of postfix expressions. A postfix expression is either a hashtable key or an array index. Allowing any number of postfix expressions lets arrays and hashtables nest inside each other, allowing us to write, for instance:

    foo{"key"}[42][0]{"hi"}

Of course, you as a Squaak programmer must make sure that foo is actually a hashtable, and that foo{"key"} yields an array, and so forth. Implementing this is actually quite simple. First, let us see how to implement the action method index:

    method index($/) {
        my $index := $( $<expression> );
        my $past := PAST::Var.new( $index,
                                   :scope('keyed'),
                                   :viviself('Undef'),
                                   :vivibase('ResizablePMCArray'),
                                   :node($/) );
        make $past;
    }

First, we retrieve the PAST node for expression. Then, we create a keyed variable access operation, by creating a PAST::Var node and setting its scope to 'keyed'. If a PAST::Var node has keyed scope, then its first child is evaluated as the aggregate object, and the second child is evaluated as the index on that aggregate. But wait! The PAST::Var node we just created has only one child!
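The :viviself and :vivibase attributes control autovivification: what to create when the indexed element, or the container itself, is missing. Python's collections.defaultdict gives a rough feel for this behavior (an analogy only, not how Parrot implements it):

```python
from collections import defaultdict

# vivibase ~ which aggregate to create when the container is missing;
# viviself ~ which value to produce when the indexed element is missing.
# Here, None plays the role of 'Undef' for missing elements.
store = defaultdict(lambda: None)

print(store["absent"])        # None -- reading a missing key "vivifies" it
store["grades"] = ["A", "B+"]
print(store["grades"][1])     # B+
```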

Here's where the updated action method for primary comes in. This is shown below.

    method primary($/) {
        my $past := $( $<identifier> );
        for $<postfix_expression> {
            my $expr := $( $_ );
            $expr.unshift( $past );
            $past := $expr;
        }
        make $past;
    }

First, the PAST node for identifier is retrieved. Then, for each postfix expression, we get the PAST node and unshift the (current) $past onto it. Effectively, the (current) $past is set as the first child of $expr. And you know what $expr contains: that's the keyed variable access node that was created in the action method index. After that, $past is set to $expr; either there's another postfix expression, in which case this $past will be set as the first child of that next postfix expression, or the current $past is set as the result object.
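The unshift loop in primary builds a left-nested chain of keyed accesses; evaluating such a chain is a left fold over the postfix expressions. A Python sketch of the same evaluation strategy (illustration only; eval_chain and the sample data are made up for the example):

```python
from functools import reduce

# Each postfix expression contributes one key or index; evaluating the chain
# is a left fold: the result so far becomes the aggregate for the next access.
def eval_chain(scope, name, postfixes):
    return reduce(lambda agg, key: agg[key], postfixes, scope[name])

scope = {"foo": {"key": [0, {"hi": "found"}]}}
print(eval_chain(scope, "foo", ["key", 1, "hi"]))  # found
```

This mirrors foo{"key"}[1]{"hi"}: each step's result is fed in as the aggregate of the next keyed access, just as each $past becomes the first child of the next PAST::Var node.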


Implementing Constructors
To implement the array and hashtable constructors, we're going to take advantage of the Parrot Calling Conventions (PCC). The PCC supports, amongst others, optional parameters, named parameters and slurpy parameters. If you're Dutch, you might think that slurpy parameters make a lot of noise ("slurpen" is a Dutch verb meaning drinking carefully, which you usually do if your beverage is hot, making noise in the process), but you would be wrong. A slurpy parameter will store all remaining arguments that have not yet been stored in other parameters (implying that there can only be one slurpy positional parameter, and it should come after all normal positional parameters). Parrot will automatically create an aggregate to store these remaining arguments. Besides a positional slurpy parameter, you can also define a named slurpy parameter, which will store all remaining named arguments, after all normal named arguments have been stored. You might be confused by now. Let's look at an example, as this issue is worth a few brain cells to store.

    .sub foo
        .param pmc a
        .param pmc b
        .param pmc c :slurpy
        .param pmc k :named('x')
        .param pmc l :named('y')
        .param pmc m :named :slurpy
    .end

    foo(1, 2, 3, 4, 6 :named('y'), 5 :named('x'), 7 :named('p'), 8 :named('q') )

This will result in the following mapping:

    a: 1
    b: 2
    c: {3, 4}
    k: 5
    l: 6
    m: {"p"=>7, "q"=>8}

So, after the positional parameters (a, b), c is declared as a slurpy parameter, storing all remaining positional arguments. Parameters k and l are declared as named parameters, which have the respective names "x" and "y"; using these names, values can be passed. After the named parameters, there's the parameter m, which is flagged as both named and slurpy. This parameter will store all remaining named arguments that have not yet been stored by the normal named parameters.

The interesting parameters for us are "c" and "m". For the positional slurpy parameter, Parrot creates an array, while for the named slurpy parameter a hashtable is created. This happens to be exactly what we need! Implementing the array and hash constructors becomes trivial:

    .sub '!array'
        .param pmc fields :slurpy
        .return (fields)
    .end

    .sub '!hash'
        .param pmc fields :named :slurpy
        .return (fields)
    .end

Array and hashtable constructors can then be compiled into subroutine calls to the respective Parrot subroutines, passing all fields as arguments. (Note that these names start with a "!", which is not a valid Squaak identifier. This prevents us from calling these subs in normal Squaak code.)
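Python's *args and **kwargs behave like Parrot's positional and named slurpy parameters, so the mapping above can be checked directly in Python. This is an analogy only; the make_array/make_hash helpers are hypothetical Python stand-ins for the !array and !hash subs:

```python
def foo(a, b, *c, x, y, **m):
    # a, b: normal positional; c: positional slurpy;
    # x, y: named; m: named slurpy
    return a, b, c, x, y, m

print(foo(1, 2, 3, 4, y=6, x=5, p=7, q=8))
# (1, 2, (3, 4), 5, 6, {'p': 7, 'q': 8})

# Under this model, the '!array' and '!hash' constructors are one-liners:
def make_array(*fields):
    return list(fields)

def make_hash(**fields):
    return fields

print(make_array(1, "hi", 3.14))   # [1, 'hi', 3.14]
print(make_hash(larry="wall"))     # {'larry': 'wall'}
```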

Basic data types and Aggregates as arguments


All data types, both basic and aggregate, are represented by Parrot Magic Cookies (PMCs). The PMC is one of the four built-in data types that Parrot can handle; the others are integer, floating-point and string. Currently, the PCT can only generate code to handle PMCs, not the other basic data types. Parrot has registers for each of its four built-in data types. The integer, floating-point and string registers store the actual data value, while PMC registers store a reference to the PMC object.

This has consequences for how PMCs are handled when passed as arguments. When passing a PMC as an argument, the invoked subroutine gets access to the PMC reference; in other words, PMCs are passed by reference. This means that the subroutine can change the original argument that was passed by the caller. Of course, what happens to those references depends on the instructions generated for the subroutine. In Squaak, basic data values cannot be changed by an invoked subroutine: when assigning a new value to a parameter, a whole new object is created and bound to the parameter identifier, and no changes are made to the original argument. Aggregate data types are handled differently, however. When an invoked subroutine assigns to an array index or hashtable field of a parameter, the original argument is affected. In other words, basic data types have by-value semantics, while aggregate data types have by-reference semantics. A short example to demonstrate this:

    sub foo(a, b, c)
        a = 42
        b[0] = 1
        c{"hi"} = 2
    end

    var a = 0
    var b = []
    var c = {}
    foo(a, b, c)
    print(a, b[0], c{"hi"} )   # prints 0, 1, 2
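The same by-value versus by-reference behavior can be reproduced in Python, where rebinding a name never affects the caller but mutating a shared list or dict does. An analogy only; note that Python needs an existing list slot where Squaak would autovivify one:

```python
def foo(a, b, c):
    a = 42           # rebinds the local name only: caller's a is unchanged
    b[0] = 1         # mutates the shared list: caller sees it
    c["hi"] = 2      # mutates the shared dict: caller sees it

a, b, c = 0, [None], {}
foo(a, b, c)
print(a, b[0], c["hi"])   # 0 1 2
```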

What's Next?
This was the last episode discussing implementation details needed to make Parrot run Squaak. After doing this episode's exercises, your implementation should be fairly complete. The next episode will be the last of this series, in which we'll recap what we did and demonstrate our language with a nice demo program.

Exercises
Problem 1

We've shown how to implement keyed variable access for arrays, by implementing the action method for index. The same principle can be applied to keyed access for hashtables. Implement the action method for key.

Solution

    method key($/) {
        my $key := $( $<expression> );
        make PAST::Var.new( $key,
                            :scope('keyed'),
                            :vivibase('Hash'),
                            :viviself('Undef'),
                            :node($/) );
    }

Problem 2

Implement the action methods for array_constructor and hash_constructor. Use a PAST::Op node and set the pasttype to 'call'. Use the "name" attribute to specify the names of the subs to be invoked (e.g., :name("!array") ). Note that all hash fields must be passed as named arguments. Check out PDD26 for doing this, and look for a "named" method.

Solution

    method named_field($/) {
        my $past := $( $<expression> );
        my $name := $( $<string_constant> );
        ## the passed expression is in fact a named argument,
        ## use the named() accessor to set that name.
        $past.named($name);
        make $past;
    }

    method array_constructor($/) {
        ## use the parrot calling conventions to create
        ## an array, using the "anonymous" sub !array
        ## (which is not a valid Squaak name)
        my $past := PAST::Op.new( :name('!array'),
                                  :pasttype('call'),
                                  :node($/) );
        for $<expression> {
            $past.push( $($_) );
        }
        make $past;
    }

    method hash_constructor($/) {
        ## use the parrot calling conventions to create
        ## a hash, using the "anonymous" sub !hash
        ## (which is not a valid Squaak name)
        my $past := PAST::Op.new( :name('!hash'),
                                  :pasttype('call'),
                                  :node($/) );
        for $<named_field> {
            $past.push( $($_) );
        }
        make $past;
    }

Problem 3

We'd like to add a little bit of syntactic sugar for accessing hashtable keys. Instead of writing foo{"key"}, I'd like to write foo.key. Of course, this only works for keys that do not contain spaces and such. Add the appropriate grammar rule (call it "member") that enables this syntax, and write the associated action method. Make sure this member name is converted to a string. Hint: use a PAST::Val node for the string conversion.

Solution

    rule postfix_expression {
        | <key> {*}    #= key
        | <member> {*} #= member
        | <index> {*}  #= index
    }

    rule member {
        '.' <identifier>
        {*}
    }

    method member($/) {
        my $member := $( $<identifier> );
        ## x.y is syntactic sugar for x{"y"},
        ## so stringify the identifier:
        my $key := PAST::Val.new( :returns('String'),
                                  :value($member.name()),
                                  :node($/) );
        ## the rest of this method is the same
        ## as method key() above.
        make PAST::Var.new( $key,
                            :scope('keyed'),
                            :vivibase('Hash'),
                            :viviself('Undef'),
                            :node($/) );
    }
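The foo.key sugar of Problem 3 has a close Python analogue: attribute access that falls back to keyed access. An illustration only, not part of the Squaak implementation:

```python
class Record(dict):
    """A dict where foo.key is sugar for foo["key"]."""
    def __getattr__(self, name):
        # __getattr__ is called only when normal attribute lookup fails,
        # so real attributes and methods still work as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

foo = Record()
foo["key"] = 42
print(foo.key)      # 42
print(foo["key"])   # 42
```

As in Squaak, this only works for keys that are valid identifiers; anything else still needs the explicit foo["..."] form.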


Wrap-Up and Conclusion


Squaak Language Tutorial
Episode 1: Introduction
Episode 2: Poking in Compiler Guts
Episode 3: Squaak Details and First Steps
Episode 4: PAST Nodes and More Statements
Episode 5: Variable Declaration and Scope
Episode 6: Scope and Subroutines
Episode 7: Operators and Precedence
Episode 8: Hashtables and Arrays
Episode 9: Wrap-Up and Conclusion

Welcome to the final episode of the Parrot Compiler Tools tutorial! Let's review the previous episodes and summarize this tutorial.

Review
In Episode 1, we introduced the Parrot Compiler Tools (PCT), gave a high-level feature overview of Squaak, the case study language that we are implementing in this tutorial, and we generated a language shell that we use as a foundation to implement Squaak. Episode 2 discussed the general structure of PCT-based compilers. After this, we described each of the four default compilation stages: parse phase, parse tree to PAST, PAST to POST and POST to PIR. We also added a command line banner and command line prompt to the interactive language shell. In Episode 3, we introduced the full grammar of the Squaak language. After this, we started implementing the first bits, after which we were able to generate code for (simple) assignments. In Episode 4 we discussed the construction of Parrot Abstract Syntax Tree nodes in more detail, after which we implemented the if-statement and throw-statement. Episode 5 focused on variable declarations and variable scope. We implemented the necessary infrastructure to handle global and local variables correctly. In Episode 6 we continued the discussion of scope, but now in the context of subroutines. After this we implemented subroutine invocation. Episode 7 extended our grammar to handle complex expressions that allows us to use arithmetic and other operators. We discussed how to use PCT's built-in support for handling operator precedence. In the previous episode, Episode 8, we discussed the grammar and action methods for handling the aggregate data types of Squaak: arrays and hashes. We also touched on the topic of argument passing by reference and by value. If you followed the tutorial and did the exercises, your implementation should be complete. Although a lot of the implementation was discussed, some parts were left as the proverbial exercise to the reader. This is to stimulate you to get your hands dirty and figure out things for yourself, while the text contained enough hints (in my opinion) to

solve the given problems. Sure enough, this approach requires you to spend more time and think for yourself, but I think you're reading all this stuff to learn something. The extra time spent is well worth it, in my opinion. Now it's time to see what we can do with this language. Squaak is more than just the average calculator example, which is often provided in beginners' discussions of parsers; it's a complete programming language.


What's Next?
This is the last episode of the Parrot Compiler Tools tutorial. We showed how we implemented a complete language for the Parrot virtual machine in only a few hundred lines of source code. Surely, this must be the proof that the PCT really is an effective toolkit for implementing languages. At the moment of writing, the PCT still lacks efficient support for certain language constructs. Therefore, we focused on the parts that are easy to build with the PCT. Once the PCT is feature complete, there's bound to be another tutorial on advanced features. Think of object-oriented programming, closures, coroutines, and advanced control-flow such as return statements. Most of them can be done already, but are too complex for this tutorial's level.

The Game of Life


You might have noticed that Squaak looks a bit like Lua, although it does differ in some points. This is not entirely accidental. The distribution of the Lua source code contains an example called "life.lua", which implements Conway's "Game of Life". This is a nice demonstration program, and it's easy to port it to Squaak. Its implementation is shown below. Run it, and enjoy!

    ## John Conway's Game of Life
    ## Implementation based on life.lua, found in Lua's distribution.
    ##
    var width = 40           # width of "board"
    var height = 20          # height of "board"
    var generation = 1       # generation counter
    var numgenerations = 50  # how often should we evolve?

    ## initialize board to all zeroes
    sub initboard(board)
        for var y = 0, height do
            for var x = 0, width do
                board[y][x] = 0
            end
        end
    end

    ## spawn new life in board, at position (left, top),
    ## the life data is stored in shapedata, and shape width and
    ## height are specified.
    sub spawn(board, left, top, shapew, shapeh, shapedata)
        for var y = 0, shapeh - 1 do
            for var x = 0, shapew - 1 do
                board[top + y][left + x] = shapedata[y * shapew + x]
            end
        end
    end

    ## calculate the next generation.
    sub evolve(thisgen, nextgen)
        var ym1 = height - 1
        var y   = height
        var yp1 = 1
        var yi  = height
        while yi > 0 do
            var xm1 = width - 1
            var x   = width
            var xp1 = 1
            var xi  = width
            while xi > 0 do
                var sum = thisgen[ym1][xm1] + thisgen[ym1][x] + thisgen[ym1][xp1]
                        + thisgen[y][xm1]                     + thisgen[y][xp1]
                        + thisgen[yp1][xm1] + thisgen[yp1][x] + thisgen[yp1][xp1]
                nextgen[y][x] = sum == 2 and thisgen[y][x] or sum == 3
                xm1 = x
                x   = xp1
                xp1 = xp1 + 1
                xi  = xi - 1
            end
            ym1 = y
            y   = yp1
            yp1 = yp1 + 1
            yi  = yi - 1
        end
    end

    ## display thisgen to stdout.
    sub display(thisgen)
        var line = ""
        for var y = 0, height do
            for var x = 0, width do
                if thisgen[y][x] == 0 then
                    line = line .. "-"
                else
                    line = line .. "O"
                end
            end
            line = line .. "\n"
        end
        print(line, "\nLife - generation: ", generation)
    end

    ## main program
    sub main()
        var heart   = [1,0,1,1,0,1,1,1,1]
        var glider  = [0,0,1,1,0,1,0,1,1]
        var explode = [0,1,0,1,1,1,1,0,1,0,1,0]

        var thisgen = []
        initboard(thisgen)
        var nextgen = []
        initboard(nextgen)

        spawn(thisgen, 3, 5, 3, 3, heart)
        spawn(thisgen, 5, 4, 3, 3, glider)
        spawn(thisgen, 25, 10, 3, 4, explode)

        while generation <= numgenerations do
            evolve(thisgen, nextgen)
            display(thisgen)
            generation = generation + 1
            ## prevent switching nextgen and thisgen around,
            ## just call evolve with arguments switched.
            evolve(nextgen, thisgen)
            display(nextgen)
            generation = generation + 1
        end
    end

    ## start here.
    main()

Note the use of a subroutine "print". Check out the file src/builtins/say.pir, and rename the sub "say" (which was generated by the language shell creation script) to "print".
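The one-line update in evolve() is Conway's standard cell rule, encoded with a Lua-style and/or trick. Written out in Python for clarity (an illustration of the rule itself, not a port of the program):

```python
def next_cell(alive, neighbor_sum):
    # Conway's rule: with exactly 2 live neighbors a cell keeps its state,
    # with exactly 3 it is alive in the next generation, otherwise it dies.
    return alive if neighbor_sum == 2 else (1 if neighbor_sum == 3 else 0)

print([next_cell(1, 2), next_cell(0, 2), next_cell(0, 3), next_cell(1, 4)])
# [1, 0, 1, 0]
```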



Solution
If you don't feel like doing exercises or just want to see what it looks like without doing any trouble, here's what it looks like (this is life generation 9):

    [40x20 ASCII board: '-' marks dead cells and 'O' marks live cells, showing
    the heart, glider and explode shapes after nine generations]

    Life - generation: 9

But really, it doesn't compare to seeing this program run on Parrot :-)

Exercises
Squaak was designed to be a simple language, offering enough features to get some work done while still keeping it simple. Of course, after reading this tutorial, you are an expert too ;-) If you feel like adding more features, here are some suggestions:
- Implement prefix and postfix increment/decrement operators, allowing you to write "generation++" instead of "generation = generation + 1".
- Implement augmenting assignment operators, such as "+=" and friends.
- Extend the grammar to allow multiple variable declarations in one statement, allowing you to write "var x = 1, y, z = 3". Of course, the initialization part should still be optional. How do you make sure that the identifier and initialization expression are kept together?
- Implement a mechanism (such as an "import" statement) to include or load another Squaak file, so Squaak programs can be split into multiple files. The PCT does not have any support for this, so you'll need to write a bit of PIR to do it.
- Improve the for-statement to allow a negative step. Note that the loop condition becomes more complex when doing so.
Note that these are only suggestions; I did not implement them myself, so I won't have a solution for you at the end.
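For the multiple-declaration exercise, one possible direction is to split the declaration rule in two, so that each identifier stays paired with its optional initializer. The sketch below follows the style of the tutorial's PGE grammar, but the rule names declaration_item and the exact shape are assumptions, not a tested implementation:

```
# Sketch: a comma-separated list of declaration items after "var".
rule variable_declaration {
    'var' <declaration_item> [',' <declaration_item>]* {*}
}

# Each item keeps its identifier and (optional) initializer together,
# which answers the question of how the two stay associated.
rule declaration_item {
    <identifier> ['=' <expression>]? {*}
}
```

The corresponding action method would then loop over the declaration_item matches, emitting one PAST::Var node per item, instead of handling a single identifier directly.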


Final words and Acknowledgments


By now, you should have gotten a good impression of the PCT, and you should be able to work on other languages targeting Parrot. Currently, work has been done on ECMAScript, Python, Ruby and, of course, Perl 6. Most of them are not complete yet (hint, hint). I hope you enjoyed reading this tutorial and learned enough to feel confident about working on other (existing) languages targeting Parrot. The Perl 6 implementation can still use more contributors! Many thanks to all who read this tutorial and provided me with hints, tips and feedback. Thank you for reading!


Resources and Licensing


Resources
These blog posts are released into the public domain and can be adapted directly:
http://www.parrotblog.org/2008/03/targeting-parrot-vm.html
http://www.parrotblog.org/2008/03/episode-2-poking-in-compiler-guts.html
http://www.parrotblog.org/2008/03/episode-3-squaak-details-and-first.html
http://www.parrotblog.org/2008/03/episode-4-past-nodes-and-more.html
http://www.parrotblog.org/2008/03/episode-5-variable-declaration-and.html
http://www.parrotblog.org/2008/03/episode-6-scope-and-subroutines.html
http://www.parrotblog.org/2008/03/episode-7-operators-and-precedence.html
http://www.parrotblog.org/2008/03/episode-8-hashtables-and-arrays.html

Other Resources
Parrot Documentation [1]
Parrot Design Docs [2]
Parrot Project Home [3]

References
[1] http://www.parrotcode.org/docs/
[2] http://www.parrotcode.org/docs/pdd/
[3] http://www.parrotcode.org

Licensing


Licensing
The text of this book is released under the following license:
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License."

Parrot, its source code, and the tools required to build it are released under the terms of the Artistic License 2.0.

Compatibility with Artistic 2.0


Parrot and most of its documentation library are released under the terms of the Artistic 2.0 license. For maximum compatibility, and for the ability to reuse content developed here on Wikibooks elsewhere in the Parrot ecosystem, some users have opted to explicitly dual-license their contributions to this book under the GFDL and Artistic 2.0:
Whiteknight (Andrew Whitworth)
Contributions to this book by the users listed above may be reused under the Artistic 2.0 license where necessary. This is not a condition of editing this book, and edits made by users who have not listed themselves above are to be considered GFDL-only.

Article Sources and Contributors



Wikibooks:Collections Preface Source: http://en.wikibooks.org/w/index.php?oldid=2347851 Contributors: Adrignola, Jomegat, Magesha, Martin Kraus, Mike.lifeguard, RobinH, Whiteknight
Introduction Source: http://en.wikibooks.org/w/index.php?oldid=1381261 Contributors: Jkeenan, Whiteknight, Yosri
Introduction Source: http://en.wikibooks.org/w/index.php?oldid=1381261 Contributors: Jkeenan, Whiteknight, Yosri
Building Parrot Source: http://en.wikibooks.org/w/index.php?oldid=1688198 Contributors: Jkeenan, KenJackson.US, Whiteknight, Yosri
Running Parrot Source: http://en.wikibooks.org/w/index.php?oldid=1373737 Contributors: Dallas1278, Whiteknight, Yosri
Parrot Programming Source: http://en.wikibooks.org/w/index.php?oldid=2176469 Contributors: Ayardley, Jkeenan, KenJackson.US, ReiniUrban, Van der Hoorn, Whiteknight, Yosri, 2 anonymous edits
Parrot Assembly Language Source: http://en.wikibooks.org/w/index.php?oldid=1227227 Contributors: Whiteknight, Yosri
Parrot Intermediate Representation Source: http://en.wikibooks.org/w/index.php?oldid=2286724 Contributors: Whiteknight, Yosri, 4 anonymous edits
Parrot Magic Cookies Source: http://en.wikibooks.org/w/index.php?oldid=1597840 Contributors: Adrignola, Jrtayloriv, Whiteknight, Yosri, 1 anonymous edits
Multithreading and Concurrency Source: http://en.wikibooks.org/w/index.php?oldid=1275298 Contributors: Whiteknight, Yosri, 4 anonymous edits
Exception Handling Source: http://en.wikibooks.org/w/index.php?oldid=1238614 Contributors: Whiteknight, 1 anonymous edits
Classes and Objects Source: http://en.wikibooks.org/w/index.php?oldid=2346012 Contributors: Mastersrp, Whiteknight, 1 anonymous edits
The Parrot Debugger Source: http://en.wikibooks.org/w/index.php?oldid=1266967 Contributors: Whiteknight, 1 anonymous edits
Parrot Compiler Tools Source: http://en.wikibooks.org/w/index.php?oldid=2146153 Contributors: Cinchent, Jrtayloriv, Whiteknight, 2 anonymous edits
Parrot Grammar Engine Source: http://en.wikibooks.org/w/index.php?oldid=2146156 Contributors: Afbach, Cinchent, Furrykef, Panic2k4, Whiteknight, 4 anonymous edits
Not Quite Perl Source: http://en.wikibooks.org/w/index.php?oldid=1705277 Contributors: Jrtayloriv, Whiteknight, 7 anonymous edits
Optables and Expressions Source: http://en.wikibooks.org/w/index.php?oldid=1935729 Contributors: Jrtayloriv, Whiteknight, 3 anonymous edits
Advanced PGE Source: http://en.wikibooks.org/w/index.php?oldid=2146157 Contributors: Cinchent, Whiteknight, 1 anonymous edits
Building A Compiler Source: http://en.wikibooks.org/w/index.php?oldid=1828126 Contributors: Tedkat, Whiteknight, 2 anonymous edits
HLL Interoperation Source: http://en.wikibooks.org/w/index.php?oldid=1274012 Contributors: Whiteknight
Parrot Internals Source: http://en.wikibooks.org/w/index.php?oldid=1275313 Contributors: Whiteknight, 1 anonymous edits
IMCC and PIRC Source: http://en.wikibooks.org/w/index.php?oldid=2249684 Contributors: Whiteknight, 2 anonymous edits
Run Core Source: http://en.wikibooks.org/w/index.php?oldid=1258809 Contributors: Whiteknight, 1 anonymous edits
Memory and Garbage Collection Source: http://en.wikibooks.org/w/index.php?oldid=1459282 Contributors: Van der Hoorn, Whiteknight, 2 anonymous edits
PMC System Source: http://en.wikibooks.org/w/index.php?oldid=2372318 Contributors: DavidCary, Whiteknight, 1 anonymous edits
String System Source: http://en.wikibooks.org/w/index.php?oldid=1274912 Contributors: Whiteknight
Exception Subsystem Source: http://en.wikibooks.org/w/index.php?oldid=1443486 Contributors: Van der Hoorn, Whiteknight
IO Subsystem Source: http://en.wikibooks.org/w/index.php?oldid=1238629 Contributors: Whiteknight, 1 anonymous edits
JIT and NCI Source: http://en.wikibooks.org/w/index.php?oldid=1238630 Contributors: Whiteknight, 1 anonymous edits
Parrot Embedding Source: http://en.wikibooks.org/w/index.php?oldid=1238631 Contributors: Whiteknight, 1 anonymous edits
Extensions Source: http://en.wikibooks.org/w/index.php?oldid=1266974 Contributors: Whiteknight, 1 anonymous edits
Packfiles Source: http://en.wikibooks.org/w/index.php?oldid=1274888 Contributors: Whiteknight
PIR Reference Source: http://en.wikibooks.org/w/index.php?oldid=1238633 Contributors: Whiteknight, 1 anonymous edits
PASM Reference Source: http://en.wikibooks.org/w/index.php?oldid=2020376 Contributors: Benabik, Whiteknight, 1 anonymous edits
PAST Node Reference Source: http://en.wikibooks.org/w/index.php?oldid=1238635 Contributors: Whiteknight, 1 anonymous edits
Languages on Parrot Source: http://en.wikibooks.org/w/index.php?oldid=2140991 Contributors: Whiteknight, 6 anonymous edits
HLLCompiler Class Source: http://en.wikibooks.org/w/index.php?oldid=1377590 Contributors: Whiteknight, 1 anonymous edits
Command Line Options Source: http://en.wikibooks.org/w/index.php?oldid=1238638 Contributors: Whiteknight, 1 anonymous edits
Built-In PMCs Source: http://en.wikibooks.org/w/index.php?oldid=2372324 Contributors: DavidCary, Whiteknight, 1 anonymous edits
Bytecode File Format Source: http://en.wikibooks.org/w/index.php?oldid=1274885 Contributors: Whiteknight, 1 anonymous edits
VTABLE List Source: http://en.wikibooks.org/w/index.php?oldid=1356142 Contributors: Whiteknight
Squaak Tutorial Source: http://en.wikibooks.org/w/index.php?oldid=2210477 Contributors: MC Scared of Bees, Whiteknight, 1 anonymous edits
Introduction Source: http://en.wikibooks.org/w/index.php?oldid=1957002 Contributors: Whiteknight, Wtachi, 7 anonymous edits
Poking in Compiler Guts Source: http://en.wikibooks.org/w/index.php?oldid=1756857 Contributors: Adrignola, Jrtayloriv, Whiteknight, Wtachi
Squaak Details and First Steps Source: http://en.wikibooks.org/w/index.php?oldid=2080516 Contributors: Amoe, Jrtayloriv, Whiteknight, 1 anonymous edits
PAST Nodes and More Statements Source: http://en.wikibooks.org/w/index.php?oldid=1635846 Contributors: Jrtayloriv, Whiteknight, 1 anonymous edits
Variable Declaration and Scope Source: http://en.wikibooks.org/w/index.php?oldid=1635849 Contributors: Jrtayloriv, Whiteknight, 2 anonymous edits
Scope and Subroutines Source: http://en.wikibooks.org/w/index.php?oldid=1635672 Contributors: Jrtayloriv, Whiteknight

Article Sources and Contributors


Operators and Precedence Source: http://en.wikibooks.org/w/index.php?oldid=1180380 Contributors: Whiteknight
Hash Tables and Arrays Source: http://en.wikibooks.org/w/index.php?oldid=1635438 Contributors: Whiteknight, 1 anonymous edits
Wrap-Up and Conclusion Source: http://en.wikibooks.org/w/index.php?oldid=1180383 Contributors: Whiteknight
Resources Source: http://en.wikibooks.org/w/index.php?oldid=1147623 Contributors: Whiteknight
Licensing Source: http://en.wikibooks.org/w/index.php?oldid=1278706 Contributors: Whiteknight


Image Sources, Licenses and Contributors



Image:Wikibooks-logo-en-noslogan.svg Source: http://en.wikibooks.org/w/index.php?title=File:Wikibooks-logo-en-noslogan.svg License: logo Contributors: User:Bastique, User:Ramac et al.
File:Heckert GNU white.svg Source: http://en.wikibooks.org/w/index.php?title=File:Heckert_GNU_white.svg License: Creative Commons Attribution-Sharealike 2.0 Contributors: Aurelio A. Heckert <aurium@gmail.com>

License

Creative Commons Attribution-Share Alike 3.0
http://creativecommons.org/licenses/by-sa/3.0/
