Scannerless parsing
From Wikipedia, the free encyclopedia

Scannerless parsing (also called lexerless parsing) refers to the use of a single formalism to express
both the lexical and context-free syntax used to parse a language.

This parsing strategy is suitable when a clear lexer-parser distinction is unneeded. Examples of when this
is appropriate include TeX, most wiki grammars, makefiles, and simple per-application control languages.
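As an illustration only, the following sketch parses sums of integers directly from characters. There is no separate tokenizing pass: whitespace, digits, and the "+" operator are all handled by the same set of parsing functions. The toy grammar and the function names (parse_sum, parse_number, skip_spaces) are assumptions made for this example and do not come from any particular tool.

```python
DIGITS = "0123456789"


def skip_spaces(text, pos):
    """Advance past blanks and tabs; layout is handled by the parser itself."""
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    return pos


def parse_number(text, pos):
    """number := digit+, read character by character."""
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == start:
        raise SyntaxError(f"expected a digit at offset {start}")
    return int(text[start:pos]), pos


def parse_sum(text):
    """sum := number ('+' number)*, with optional layout between parts."""
    pos = skip_spaces(text, 0)
    value, pos = parse_number(text, pos)
    pos = skip_spaces(text, pos)
    while pos < len(text) and text[pos] == "+":
        pos = skip_spaces(text, pos + 1)
        rhs, pos = parse_number(text, pos)
        value += rhs
        pos = skip_spaces(text, pos)
    if pos != len(text):
        raise SyntaxError(f"unexpected character at offset {pos}")
    return value
```

Here parse_sum("12 + 30") evaluates to 42. A scanner-based design would first turn the input into NUMBER and PLUS tokens and discard the whitespace before any parsing function ran.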

Advantages

Only one metasyntax is needed
Non-regular lexical structure is handled easily
"Token classification" is unneeded, which removes the need for design accommodations such as "the lexer hack" and reserved words (such as "while" in C)
Grammars can be compositional (can be merged without human intervention) [1]
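The point about token classification can be sketched with a hypothetical toy language whose statements are "let name" and "print name". Because nothing classifies the input into tokens beforehand, the words "let" and "print" are only special at the start of a statement and remain usable as names. The language, the NAME pattern, and parse_statement are assumptions made for this illustration.

```python
import re

# Hypothetical toy language: statement := ("let" | "print") " " name
NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
STATEMENT_WORDS = ("let", "print")


def parse_statement(text):
    """Return (statement_word, name); context alone decides what a word means."""
    for word in STATEMENT_WORDS:
        prefix = word + " "
        if text.startswith(prefix):
            name = text[len(prefix):]
            # In name position, any identifier is accepted, including
            # strings that act as statement words elsewhere.
            if NAME.fullmatch(name) is None:
                raise SyntaxError(f"expected a name after {word!r}")
            return (word, name)
    raise SyntaxError("expected 'let' or 'print'")
```

With this sketch, parse_statement("let print") returns ("let", "print"). A conventional lexer would typically have classified "print" as a keyword token before the parser saw it, so the same input would be rejected unless the grammar made special provision for it.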
Disadvantages

Because lexical scanning and syntactic parsing are combined, the resulting parser tends to be harder to understand and debug for more complex languages
Most parsers of character-level grammars are nondeterministic
There is no guarantee that the language being parsed is unambiguous
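The ambiguity problem can be made concrete with a small sketch. Assume a character-level grammar Words := Word | Word Words, Word := letter+, with no separator required between words and no longest-match rule. The function below (count_parses, a name chosen for this example) counts how many distinct derivations such a grammar gives for an input.

```python
from functools import lru_cache


def count_parses(text):
    """Count derivations of text under Words := Word | Word Words, Word := letter+."""

    @lru_cache(maxsize=None)
    def words_from(start):
        total = 0
        for end in range(start + 1, len(text) + 1):
            if not text[start:end].isalpha():
                break  # a Word may contain letters only
            if end == len(text):
                total += 1  # the final Word reaches the end of the input
            else:
                total += words_from(end)  # more Words must follow
        return total

    return words_from(0) if text else 0
```

For the three-letter input "abc" this yields 4 derivations (abc, a·bc, ab·c, a·b·c). A conventional scanner resolves this silently by always taking the longest match; a character-level grammar has to state that choice explicitly.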

Required extensions

Unfortunately, when parsed at the character level, most popular programming languages are no longer
strictly context-free. Visser identified five key extensions to classical context-free syntax which handle
almost all common non-context-free constructs arising in practice:
Follow restrictions, a limited form of "longest match"
Reject productions, a limited form of negative matching (as found in boolean grammars)
Preference attributes to handle the dangling else construct in C-like languages
Per-production transitions rather than per-nonterminal transitions in order to facilitate:
  Associativity attributes, which prevent a self-reference in a particular production of a nonterminal from producing that same production
  Precedence/priority rules, which prevent self-references in higher-precedence productions from producing lower-precedence productions
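The first two extensions can be sketched as filters over the candidate matches of an unrestricted rule. The code below is a simplified illustration, not how SGLR or any other listed tool implements them; the keyword set and all function names are assumptions made for this example.

```python
KEYWORDS = {"if", "else", "while"}  # hypothetical reserved words


def identifier_ends(text, pos):
    """Every offset `end` such that text[pos:end] matches Id := letter+."""
    ends = []
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
        ends.append(end)
    return ends


def follow_ok(text, end):
    """Follow restriction: an Id may not be directly followed by a letter."""
    return end == len(text) or not text[end].isalpha()


def not_rejected(text, pos, end):
    """Reject production: strings that are keywords are excluded from Id."""
    return text[pos:end] not in KEYWORDS


def restricted_identifier_ends(text, pos):
    """Candidate matches that survive both filters."""
    return [
        end
        for end in identifier_ends(text, pos)
        if follow_ok(text, end) and not_rejected(text, pos, end)
    ]
```

On the input "whilex", the unrestricted rule offers six possible identifier ends, the follow restriction keeps only the longest one (offset 6), and "whilex" is not a keyword, so it is accepted. On "while x", the only candidate that passes the follow restriction is "while" itself, which the reject filter then removes, leaving no identifier at that position.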

Implementations

SGLR is a parser for the modular Syntax Definition Formalism SDF, and is part of the ASF+SDF Meta-Environment and the Stratego/XT program transformation system.
JSGLR, a pure Java implementation of SGLR, also based on SDF.
TXL supports character-level parsing.
dparser generates ANSI C code for scannerless GLR parsers.
Spirit allows for both scannerless and scanner-based parsing.

SBP is a scannerless parser for boolean grammars (a superset of context-free grammars), written in Java.
Wormhole has an option to generate scannerless GLR parsers in various languages.

Notes

[1] This is because parsing at the character level makes the language recognized by the parser a single
context-free language defined on characters, as opposed to a context-free language of sequences of
strings in regular languages. Some lexerless parsers handle the entire class of context-free languages,
which is closed under composition.

Further reading

Visser, E. (1997b). Scannerless generalized-LR parsing. Technical Report P9707, Programming Research
Group, University of Amsterdam.


This page was last modified on 27 September 2010 at 05:46.

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. See Terms of Use for
details.
Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
