
Concept and implementation of the programming language and translator for embedded systems, based on machine code decompilation and equivalence between source and executable code

Samir Ribić
Elektrotehnički fakultet, Sarajevo
megaribi@epn.ba
Advisor: dr Adnan Salihbegović, Elektrotehnički fakultet, Sarajevo

Abstract

This thesis investigates the possibility of developing a programming language translator that is heavily based on decompilation. Instead of being kept as source code, the program will be kept as native machine code, but with the help of a specialized editor it will be transparently visible as a high-level language program.

1. Introduction

Most programming language translators are realized as pure compilers, pseudo compilers, interpreters, or hybrid translators.

Pure compilers translate from high-level languages to machine or assembly language. Compared with all of the other approaches, this one generates the fastest executable code [3], and it is therefore the most common choice on large computer systems. It is suitable for embedded systems too, if the translation is performed on application development computers. However, it is not the most suitable method for translating on the system where the program is executed, because of the memory required for both source and executable code (memory on many embedded systems is limited) and because of compilation time.

Interpreters are slower than compilers, but they make no distinction between source and object code. They analyze the source code and, based on it, call different subroutines. They do not require translation of the whole program when just one line of code is changed. The disadvantages of this concept are slower execution and the need to keep the routine library and the complete interpreter in memory.

Pseudo compilers behave similarly to compilers. They translate a high-level language to pseudo machine code. This code requires a special interpreter, and both the source and the pseudo code have to be kept. Alternatively, there are just-in-time compilers, which internally convert pseudo machine code to native code.

Hybrid translators are designed in several ways. The best known is the translator for the programming language FORTH. This language introduced the translation of subroutines (called words), immediately after they are entered, into a compact code called indirect threaded code [7], from which the source code can be reconstructed. However, this code is not native machine code; it is usually interpreted. Furthermore, FORTH is very hard to understand because of its reverse Polish notation.

The process opposite to compilation is decompilation, i.e. translation from machine code or pseudo machine code to a high-level language. This area is far less developed than compilation. The success of the process varies depending on the computer architecture, the source language, the format of the translated code, the optimization level, etc. Although less developed than compilation, decompilation has had growing success in recent years [4].

2. The proposed approach

Compilation, editing and decompilation, although regarded as related [6], are usually separate processes. In most implementations decompilation is not an integral part of the translator, and even when it is, it usually decompiles not machine code but pseudo machine code or some other compact format.

This thesis researches the possibility of realizing a high-level language translator on a quite uncommon basis, one that unites some good properties of interpreted and compiled languages. These properties are: a single code image in main or external memory (with no separation into source and object parts), no need to translate the complete program, the execution speed of native machine code, and independence from interpreters and included libraries. The programming language needs to

Proceedings of the 13th Working Conference on Reverse Engineering (WCRE'06)


0-7695-2719-1/06 $20.00 © 2006
have the standard elements of procedural programming languages (expressions in algebraic notation with brackets and priorities, conditions, arrays, loops, subroutines, and different data types).

The specialized editor takes the role of a classic compiler or interpreter. This editor handles not only the program code being entered, but also the program's representation in memory. Every entered line is incrementally translated to machine code, and the high-level ASCII representation is removed from memory. However, the machine code is written in such a way that the statement can be recovered in a form very close to the original. When the user wants to edit or list a line of the program, the line is decompiled and shown on the screen in the high-level language. The editor does not need to be in memory during program execution.

3. Comparison with other approaches

The difference between the proposed approach and the existing ones is shown in Table 1:

Approach                          ASCII source code as          Intermediate code as    Machine code as
                                  separate entity               separate entity         separate entity
Compiler                          Yes                           No                      Yes
Compiler with intermediate code   Yes                           Yes                     Yes
Source level interpreter          Yes                           No                      No
Pseudo compiler                   Yes                           Yes                     No
Compact interpreter, hybrids      No                            Yes                     No
Just in time compilers            No (at least near             Yes                     Yes (although often
                                  final user)                                           unseen)
Proposed approach                 No (but it is                 No                      Yes
                                  transparently visible)

Table 1: Comparison of different approaches

As the table shows, the proposed approach differs from the existing ones because it uses neither intermediate code nor ASCII-based source code, but fast native machine code for data and program representation. At the same time, the program code is displayed and edited in an easily understandable, classic high-level language.

4. Research areas

The issues investigated in the thesis are:
- Resolving ambiguity
- Specific instruction sets on some architectures
- Generating expressions and recognizing operator priority
- Code optimization, and problems of ambiguity
- Implementation of some programming languages: a simple custom language for very small controllers and a language close to ANSI C
- Syntax and semantics
- Memory organization of programs
- Local and global variable naming
- Relocatability, and code insertion and deletion
- Patching of executables on some operating systems

5. Areas of usage and evaluation

While this approach has some disadvantages (poor portability is the most serious one), it has some good properties of both interpreted and compiled languages. It is expected to be useful on low-memory computer systems, where the savings gained by not keeping both source and object code in memory are a significant advantage. Other possible uses are easier maintenance of programs that need to be frequently patched and updated, and open source applications in situations where maintaining both source and binary versions is difficult.

The approach will be tested through concrete, real-life implementations of the editor-translator on different platforms [1] [2] and compared with classic approaches in terms of execution speed, memory consumption and compilation speed, for different small and medium-sized test cases.

6. References

[1] Dave Jaggar, ARM Architectural Reference Manual, Prentice Hall, London, 1996.
[2] Intel Corporation, IA-32 Architecture Software Developer's Manual, Intel, Mt. Prospect, 2002.
[3] Steven Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997.
[4] Cristina Cifuentes, Reverse Compilation Techniques, PhD Dissertation, Queensland University of Technology, Department of Computer Science, Brisbane, 1994.
[6] Mads Sig Ager, O. Danvy, M. Goldberg, A Symmetric Approach to Compilation and Decompilation, BRICS, University of Aarhus, Aarhus, 2002.
[7] Robert B. K. Dewar, Indirect Threaded Code, Communications of the ACM, Volume 18, pages 330–331, 1975.
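To make the translate-on-entry and decompile-on-listing cycle of Section 2 more concrete, the following Python sketch models it at a very small scale. It is an illustration under loudly stated assumptions, not the thesis implementation: the "machine code" is a hypothetical stack instruction set (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`, `DIV`, `STORE`) rather than native ARM or IA-32 code, statements are limited to `name = expression` with `+ - * /`, brackets, names and unsigned integers (no unary minus), and `compile_line` and `decompile_line` are hypothetical names. The decompiler also touches the research area of expression priority: it re-inserts only the brackets that operator priority requires.

```python
# Minimal sketch of the editor cycle: translate a line on entry, keep only
# the translated code, and decompile it again for listing.
# ASSUMPTION: the "machine code" here is a hypothetical stack instruction set
# (PUSH, LOAD, ADD, SUB, MUL, DIV, STORE), not native ARM or IA-32 code.
import re

PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
SYMBOLS = {opcode: sym for sym, opcode in OPCODES.items()}
ATOM = 3  # priority of a variable or constant: never needs brackets


def tokenize(text):
    """Split a line into numbers, names, operators, brackets and '='."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()=]", text)


def compile_line(line):
    """Translate 'name = expression' into stack code (shunting-yard)."""
    tokens = tokenize(line)
    if len(tokens) < 3 or tokens[1] != "=":
        raise ValueError("expected: name = expression")
    code, ops = [], []
    for tok in tokens[2:]:
        if tok.isdigit():
            code.append(("PUSH", int(tok)))
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                code.append((OPCODES[ops.pop()], None))
            ops.pop()  # discard the matching "("
        elif tok in PRIORITY:
            # Operators are left-associative: emit pending operators of
            # equal or higher priority before pushing the new one.
            while ops and ops[-1] != "(" and PRIORITY[ops[-1]] >= PRIORITY[tok]:
                code.append((OPCODES[ops.pop()], None))
            ops.append(tok)
        else:
            code.append(("LOAD", tok))
    while ops:
        code.append((OPCODES[ops.pop()], None))
    code.append(("STORE", tokens[0]))
    return code


def decompile_line(code):
    """Rebuild the statement, inserting only the brackets priority needs."""
    stack = []  # entries: (text, priority of its outermost operator)
    for op, arg in code:
        if op == "PUSH":
            stack.append((str(arg), ATOM))
        elif op == "LOAD":
            stack.append((arg, ATOM))
        elif op == "STORE":
            text, _ = stack.pop()
            return f"{arg} = {text}"
        else:
            sym = SYMBOLS[op]
            p = PRIORITY[sym]
            right_text, right_p = stack.pop()
            left_text, left_p = stack.pop()
            if left_p < p:
                left_text = f"({left_text})"
            # A right operand of equal priority keeps its brackets, so that
            # recompiling the listed text yields exactly the same code.
            if right_p <= p:
                right_text = f"({right_text})"
            stack.append((f"{left_text} {sym} {right_text}", p))
    raise ValueError("code does not end with STORE")


if __name__ == "__main__":
    program = []  # only translated code is stored, never the source text
    for source in ["x = a + b * (c - d)", "y=(a*b)+c"]:
        program.append(compile_line(source))
    for line_code in program:  # "listing" the program
        print(decompile_line(line_code))
```

Run as a script, it prints `x = a + b * (c - d)` and `y = a * b + c`. The second line shows that only a form very close to the original is recovered, since spacing and redundant brackets are not stored. Keeping the brackets on an equal-priority right operand is a design choice of this sketch: listing a line and re-entering it unchanged reproduces identical code. A translator working on native code would additionally face the ambiguity, optimization and relocation issues listed in Section 4.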
