
7 Interpreting Numeral Systems

§7.1 Introduction
Math is often referred to as a unifying language of all humans, something that everyone can
understand. However, there is a surprising number of ways to count, whether it’s due to
civilizations counting in different bases or simply using different symbols or structures for
labeling their numbers, possibly even counting different objects with different numbers.
The purpose of this handout is to outline an approach for solving numeral-system problems
on NACLO, and to provide information on how the different regions of the world count and
write numbers. This handout, like most of the handouts in this class, goes deeper than
relying on intuition, providing practical strategies and algorithms to solve problems.
Problems involving interpreting numbers have become a staple of NACLO Round 1
tests over the past few years, with one of the seven or eight Round 1 questions being based on
numeral systems. While there is some variation between questions of this type, they follow
a specific layout, which makes it possible to solve a “general” numeral problem on NACLO
consistently.

§7.2 Identifying Numeral System Questions


Numeral system questions on NACLO can be identified by the presence of numbers in the
dataset, the (usually tabular or listed, and often symbolic) information provided in the problem.
These questions involve several key elements that are important in solving them:

1. The language introduction at the start of most problems is a quick sentence or paragraph
that describes the language the numeral system belongs to. It contains the geographical
region where the numeral system is from, as well as the language’s name (we refer to the
language as the target language) and how many people speak it. The last two are usually
largely irrelevant; NACLO chooses languages that aren’t widely spoken for its
translation-based questions. Most of the languages that appear on NACLO are spoken by
fewer than 1 million people (and for those with more speakers that do appear on NACLO,
the problem often focuses on some unconventional jargon). Occasionally the origin of the
language is provided; you can take advantage of this if you are familiar with the
languages the target language is derived from.

2. As mentioned previously, the dataset for numeral system questions almost always takes
the form of either a “phrase bank” along with the numbers the phrases match, or a
set of mathematical expressions with given values (all written in the target language).

3. Finally, the questions. These are what you have to provide answers to, and are what
you will be graded on during NACLO. The key difference between NACLO and other
competitions (such as those in math or physics) is that the information needed to solve
the question is given in the question itself. While providing correct answers to the questions
given is what gives you points, in reality, the task at hand is to be able to confidently
translate any numerical phrase similar to the ones given with ease by figuring out the
inner workings of the numbers in the language. You’ve been able to solve the question

Everaise Academy (2020) Computational Linguistics

successfully if you can interpret the numbers almost as confidently as you can in your
native language. More on how to approach this later...

Let’s start with a motivating example which can elucidate the format of NACLO numerical
system problems:

Example 7.1 (2020 NACLO Round 1).


We make some quick observations:

1. The target language seems to be Breton, a Celtic language spoken in Brittany
(northwestern France), across the English Channel from the United Kingdom. It is only
distantly related to the more familiar European languages (not very useful).

2. As this problem is giving us expressions, the words with mathematical symbols between
them can be assumed to be numbers.

3. We notice that the questions provide us with more equations; although some terms are
missing, this can give us a better idea of how numbers are formed in this language. The
irregular triwec’h is a promising start into understanding how the numeral system works.

§7.3 The Common Types of Numeral Systems


Before we begin to try to solve the example problem, we briefly explain what numeral systems
exist (or have existed) in the world. We’ll also explain some general templates of how numerical
systems behave that you should keep in mind while solving problems.

Basic Base System

Remark. This is an idea that you may be familiar with.

Definition 7.2. A base system satisfies the following:

1. There exists some integer base, powers of which are “building blocks” of a number (most
of us count in base-10 in the modern world).

2. There are digits which serve as multipliers for the powers of the base when writing a
number. For example, in base-10, the number 324 is really $3 \cdot 10^2 + 2 \cdot 10^1 + 4 \cdot 10^0$.

Base systems don’t appear as much in NACLO compared to other numeral systems because
the majority of human languages did not develop a basic base system naturally. However, it is
important to comment on a few key ideas. Numeral systems that are basic base systems come
in a few types:

1. Some will have words for powers of the base. This means that if we’re in some imaginary
base-8 language named “Avatarese”, and we’re trying to represent $5 \cdot 8^2 + 4 \cdot 8^1 + 8^0$, if
the word for $8^2$ is “appa”, the word for $8^1$ is “sakka”, and the word for $8^0$ is “toph”, then
this number might be written as “kak appa sam sakka glig toph”, where “kak”, “sam”,
and “glig” are the digits 5, 4 and 1. It depends on the language whether $1 \cdot 8^k$ is
just the word for $8^k$, or the word for 1 followed by the word for $8^k$.

2. Some might take those words for powers of the base to be assumed to be there (hence
they’re omitted). This means, that in the example we had above, one would just write
“kak sam glig”.

3. It is also very common for seemingly random words to be included for no apparent reason.
This happens a lot through the evolution of languages, which is part of the reason why
some languages have such “oddities” in their numeral systems (a good example is French).
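The first two conventions in the list above can be sketched in a few lines of Python. This is only an illustration of the imaginary “Avatarese” example: the word lists come from the text, while the function names and the `explicit_powers` flag are made up here, and only the digits 5, 4 and 1 are covered because those are the only digit words the example gives.

```python
# Words for 8^0, 8^1, 8^2 as given in the Avatarese example.
POWER_WORDS = ["toph", "sakka", "appa"]
# Only the digit words named in the example; other digits are unknown.
DIGIT_WORDS = {5: "kak", 4: "sam", 1: "glig"}


def to_base_digits(n: int, base: int) -> list[int]:
    """Return the digits of n in the given base, most significant first."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1] or [0]


def avatarese(n: int, explicit_powers: bool = True) -> str:
    """Spell n in base 8, with or without the words for the powers."""
    digits = to_base_digits(n, 8)
    words = []
    for position, digit in enumerate(digits):
        words.append(DIGIT_WORDS[digit])  # KeyError for digits the text never names
        if explicit_powers:
            words.append(POWER_WORDS[len(digits) - 1 - position])
    return " ".join(words)


n = 5 * 8**2 + 4 * 8**1 + 8**0
print(avatarese(n))                         # powers written out (type 1)
print(avatarese(n, explicit_powers=False))  # powers assumed (type 2)
```

The same digits are produced either way; the only difference between the two types is whether the power words are spoken.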


Remark. Regions which use basic base systems are for the most part those which are
involved in modern society. Very few languages spoken in regions that are fairly isolated
have a pure base system.

Base System with Irregular Bases

Remark. This is by far the most common numeral system you will see on the NACLO.

Definition 7.3. An irregular base system satisfies the following:

1. The concept of positional value still exists, with digits applying to certain “values”.

2. However, instead of being powers of some base, these values are fixed quantities that are
somehow important to a civilization. (For example, a civilization can have words for 1,
5, 15, and 50, essentially expressing things in terms of these. They would use a greedy
algorithm, which essentially means that when representing a number, they figure out how
many 50s go into it, then how many 15s go into the remainder, and so on. 93 would be
represented as 50 + 2 · 15 + 2 · 5 + 3 · 1.)

3. The ideas regarding whether or not the words for positional values are written, and
whether or not filler words are used, remain similar to the basic base system.
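The greedy procedure described in the definition can be written out as a short sketch. The function name is chosen here for illustration; the positional values 1, 5, 15, 50 and the number 93 are the ones used in the example above.

```python
def greedy_decompose(n: int, values: list[int]) -> list[tuple[int, int]]:
    """Split n into (count, positional value) pairs, largest value first."""
    parts = []
    for value in sorted(values, reverse=True):
        count, n = divmod(n, value)  # how many of this value fit, and what is left
        if count:
            parts.append((count, value))
    return parts


for count, value in greedy_decompose(93, [1, 5, 15, 50]):
    print(f"{count} x {value}")
```

Because 1 is among the positional values, the remainder always reaches 0, so every positive integer gets a representation.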

To understand why such odd choices of position values were made, it is often sufficient
to simply investigate the human body itself. The reason that so many cultures have 5 as a
positional value is because that’s how many fingers we have on one hand. Some African and
Pacific Islander cultures count in irregular bases with position values such as 23, 27, and 32,
which while seeming rather arbitrary choices, actually count all the key parts of the body or of
the face; instead of counting on their fingers they simply touch various parts of their body.

Remark. Here are some common irregular base systems that appear in the world:

1. Base 2 - This is most commonly used in computer science, but is also used in the
British Imperial System for measuring volume.

2. Base 5 - This is one of the more popular choices and is used by civilizations in
Australia, southern Africa, and western parts of Latin America. It usually comes with
the standard positions 20, 10, 5, 1 and digits from 1 to 4.

3. Base 6 - More common in the Pacific Islands, this numeral system operates on
powers of 6 with additional positional values dedicated to key multiples of 6 such as
12 and 18.

4. Base 12 - This one appears everywhere. You can find it on almost every continent,
and there often exists a “position value” word in such a language for 60.

5. Base 15 - Used in some Pacific Island regions, it hinges on the fact that 15 is a
multiple of 5: 5 is often included as a positional value.

6. Base 20 - This is extremely widespread, especially in Central America, with 1, 5, 10,
and 20 being the main positions. The difference from base 5 is that it avoids powers
of 5.


Unordered Numeral Systems


Perhaps the most basic type of numeral system is simply letting some word represent a value.
Definition 7.4. An unordered numeral system satisfies the following:
1. It has some set of vocabulary words that mean certain quantities (let’s say “appa” is 13,
“ganji” is 4 and “set” is 1).

2. When trying to form a more complicated number, one simply adds up the basic quantities
(since 45 = 13 + 13 + 13 + 4 + 1 + 1, it would be “appa appa appa ganji set set”).
It should be clear why this kind of system is inefficient, but it provides a clear way of writing
numbers with a small collection of vocabulary.
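A small sketch makes both directions of this system concrete. The vocabulary is the made-up one from the definition; the function names are chosen here, and spelling a number by always using the largest word first is an assumption for illustration, since the definition does not say how a speaker picks among equal sums.

```python
# The made-up vocabulary from Definition 7.4.
VOCAB = {"appa": 13, "ganji": 4, "set": 1}


def value(phrase: str) -> int:
    """Reading a phrase is just adding up its words; order does not matter."""
    return sum(VOCAB[word] for word in phrase.split())


def spell(n: int) -> str:
    """Spell n using the largest available word first (an assumption)."""
    words = []
    for word, quantity in sorted(VOCAB.items(), key=lambda item: -item[1]):
        count, n = divmod(n, quantity)
        words.extend([word] * count)
    return " ".join(words)


print(spell(45))
print(value("set ganji appa appa set appa"))  # same words, shuffled
```

Shuffling the words leaves the value unchanged, which is exactly what “unordered” means here.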

Remark. Some examples of this include Kharosthi (which uses 1, 2, 3, 4, 10, 20, 100, 1000 as
its vocabulary), as well as the Telefol, Oksapmin, Kalam, Kobon, and Ngiti languages (found
in Africa and New Guinea and based on counting body parts; they use the numbers from
1 to 23, 27, or 32 as their quantities, simply adding them to form bigger numbers).

Taking out a list of all known human languages and memorizing their numeral systems isn’t
the best idea, yet thinking in terms of these systems will end up being useful in most numeral
system questions.

Remark. Almost all number systems have 1 as a positional value. Don’t try
searching for 0 in most number systems; it’s largely absent, since 0 is simply
denoted by the lack of a digit/position value.

§7.4 Strategy to Solve Questions

Suggestion. Here’s a method for solving systematically through pattern-finding:

1. Step 1: What words repeat in similar positions, or are there lots of words with common
endings?

2. Step 2: How many distinct words are there? Try to list them.

3. Step 3: What is the “order” of how numbers are written? Is it evident that there
exist phrases built out of words that have “smaller numerical meaning”?

4. Step 4: What kind of numeral system?

5. Step 5: Which words correspond to which numbers?
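Steps 1 and 2 are mechanical enough to sketch in code. The snippet below is a rough illustration, not a required tool: it takes a handful of the Breton phrases quoted in §7.5, splits them on spaces and hyphens, and counts repeated words and shared endings. The choice to compare the last three letters of each word is an arbitrary assumption made for this sketch.

```python
import re
from collections import Counter

# A few phrases quoted in the Breton walkthrough in Section 7.5.
PHRASES = [
    "unan ha pevar-ugent",
    "pevar ha hanter kant",
    "c'hwezek ha pevar-ugent",
    "tri-ugent",
    "daouzek",
]


def tokens(phrase: str) -> list[str]:
    """Split a phrase into words on spaces and hyphens."""
    return re.split(r"[ -]", phrase)


# Step 1: which words repeat?
counts = Counter(token for phrase in PHRASES for token in tokens(phrase))

# Step 2: list the distinct words, then look for shared endings among them.
distinct = sorted(counts)
endings = Counter(word[-3:] for word in distinct if len(word) > 3)

print(counts.most_common(3))
print(distinct)
print(endings.most_common(1))
```

On this small sample the repeated words are “ha”, “pevar”, and “ugent”, and the only shared ending is “zek”, which matches the observations made by hand in §7.5.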

Let’s go more in depth on what exactly this means.


1. Step 1: What words repeat in similar positions? Position values are often words, and
different position values are often separated by fillers synonymous to “and”. These are
occasionally also found on the end of a word or position value. If you can find words or
symbols that repeat in similar positions, you have found evidence of a base system and
should start thinking of what that word represents. Maybe it means 1, maybe it means
“and”, maybe it means “double everything to the right”. You may be looking at:


• Recurring endings of words, or words themselves, that symbolize positional
values (the things irregular base systems are based on).
• Recurring words that could be synonyms of “and” or separators.
• Recurring words that are special operators, similar to prefixes or postfixes, meaning
half or double. For example, a “half twelve” could be interpreted as a “six.”
Examine simple possibilities first; don’t jump to complicated conclusions immediately in
your pattern-finding (e.g. the civilization probably didn’t base their numeral system on
a Caesar cipher; they probably didn’t even have those. KEEP IT SIMPLE). Couldn’t
find any pattern? Don’t worry, just move on and cycle back here if you’ve discovered
something new.

2. Step 2: How many distinct words are there? Try to list them. The idea here is that if
they have 6 distinct words that are used, the base system is probably not a basic base 12
system, as that would require at least 11 digits. Use common sense in what base systems
are plausible. Make a vertical/column list of all the words that appear. This helps not
only organize the information you’ve been given in a problem, but naturally sets up the
next steps. When doing this you might encounter:
• “Root words” that appear in multiple different words. While it’s entirely possible
that these might just be different words with different meanings, try to see if it can be
explained by possibly adding prefixes or suffixes that denote place value in different
ways to a common root.
• Words and suffixes that were talked about in Step 1. Mark these words as “special” and
list them separately from the other words you encounter.
• If you have your list of suspected digits and suspected positional values, you have a
great piece of evidence to try creating theories about what kind of numeral system
is being used (keep it simple).
• Often the “teens” have interesting special properties in languages. For example,
unlike twenty-seven, seventeen is not teny-seven. Try reading odd endings or unusual
orderings of symbols attached to regular digits as markers of the teens (11-19).
• Another key thing to note is that numbers that are just one word are almost always
either just a positional value or a digit. When given singular words that have a
numerical meaning, try to assign them to digits or positional values.

3. Step 3: What is the “order” of how numbers are written? Is it evident that there exist
phrases built out of words that have “smaller numerical meaning”? Alright, no matter
what number system we’re dealing with, there’s got to be some “systematic” way that the
population that speaks the target language constructs their numbers. However, it just
so happens that this construction also gives an “order” to the phrase representing the
number. You’re most likely looking for either:
• Place values increasing from left to right (e.g. 1 + 1 · 10 + 5 · 20)
• Place values decreasing from left to right (e.g. 5 · 20 + 1 · 10 + 1)
When considering the order of the problem, really try to look out for the “digit-position
value” pairs, essentially things that would translate numerically to 4·10 or 3·20 or whatever
positional value being scaled by a digit. Either way, you should try to categorize what
you think are digits, what you think are place values, and what you think are filler words
in more depth.


4. Step 4: What kind of numeral system? Figure out if this looks more like a base system
or an unordered system, or something else. Note the following:
• Most systems are irregular base systems; this is simply the most common scenario,
and should be examined first unless you have strong evidence otherwise.
• The combination of 1, 5, 10, 20 forms a VERY common numeral system. Try this
out if you’re struggling for ideas.
• In general, numeral systems consisting of the powers of some number (like 5 or 6),
together with the double, quadruple, and/or half of each of those powers, cover a
large share of number systems. Try this out.
• If you are given seemingly irrelevant words to translate in the problem, try to think
about them in a numerical way. Note that humans have used body parts as a primary
tool of counting for a while, so maybe numbers of fingers on a hand, on a person
would be useful. Note that this generalization would provide natural bases for the
irregular base system based on body parts.
5. Step 5: Which words correspond to which numbers? This is a very delicate step. Quick
disclaimer, you probably made some form of wrong assumption (don’t worry you’ll find
it later) in steps 1 through 4, so naturally you might not be able to pair the words you
identified as “digits” with the digits you want them to be and that’s okay. Ask yourself
the following:
• Did I make any assumptions in my digit pairing (you must’ve started somewhere)
which weren’t necessarily right? Go try different values for those assumptions.
• Are my numbers just not fitting the patterns at all no matter what I try? Maybe
it’s time to circle back to Steps 1-4; this is a cyclical process.
It’s important to clarify how you should try to find this “perfect matching” between your
digits and the words in the target language you suspect are digits. Don’t just try random
permutations of the digits (assigning them to some random words) and see if it works.
A much more fruitful approach is to try assuming “Word A” is some digit, let’s say 3.
Then go from there and ask the questions:
• Which other equations/phrases use the word I just assumed was 3? What does
assuming that word means 3 let me conclude in these other phrases/equations?
• If you’re dealing with a word bank, assuming that some word is 3 means that other
words probably do not mean three. What does that tell you?
If you seem to be getting good results (no contradictions, things seem to be falling in
place), try assuming that some other word is some other digit, or use the logical chains
developed by asking yourself the questions just outlined. But importantly, you should
feel just as free to drop an assumption that is not giving you results as you did when you
picked it up. If you’re feeling sparks of intuition which are telling you that a certain word
is a certain digit (maybe you speak another language, or are a natural genius), go pursue
those in the method described above! But remember, just because something feels like it
should be a certain digit doesn’t mean it is, so again: you should feel just as free to drop
an assumption that is not giving you results as you did when you picked it up.
As mentioned before, this process is cyclic. Very few finish a problem in one continuous streak
of intuition that gives them only right answers. Your job is simply to make judgments that are
as educated as you can, and to pursue other cases in a smart way. For those of you with a
computer science background, your thought process should default to depth-first search until
you reach contradictions or, hopefully, the right answer.
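For readers who want to see the depth-first idea spelled out, here is a minimal backtracking sketch. Everything in it is hypothetical: the three words “ka”, “mu”, “ti” and the two clues are invented for illustration and do not come from any NACLO problem, and the helper names are chosen here. The point is only the shape of the search: assume a value, check it against every clue that can already be checked, go deeper if nothing breaks, and drop the assumption as soon as it does.

```python
from typing import Callable, Dict, Iterable, List, Optional

Assignment = Dict[str, int]
Clue = Callable[[Assignment], bool]


def clue(needed: List[str], check: Callable[[Assignment], bool]) -> Clue:
    """Wrap a check so it only fires once all the words it needs have values."""
    def test(assignment: Assignment) -> bool:
        if not all(word in assignment for word in needed):
            return True  # cannot contradict anything yet
        return check(assignment)
    return test


def solve(words: List[str], candidates: Iterable[int], clues: List[Clue],
          assignment: Optional[Assignment] = None) -> Optional[Assignment]:
    """Depth-first search: assume, check, go deeper, back out on contradiction."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(words):
        return dict(assignment)
    word = words[len(assignment)]
    for value in candidates:
        if value in assignment.values():
            continue  # another word already means this number
        assignment[word] = value
        if all(c(assignment) for c in clues):
            result = solve(words, candidates, clues, assignment)
            if result is not None:
                return result
        del assignment[word]  # drop the assumption as freely as it was picked up
    return None


# Hypothetical puzzle: "ka x ka = mu" and "mu + ka = ti", all distinct digits 1-9.
CLUES = [
    clue(["ka", "mu"], lambda a: a["ka"] * a["ka"] == a["mu"]),
    clue(["ka", "mu", "ti"], lambda a: a["mu"] + a["ka"] == a["ti"]),
]
solution = solve(["ka", "mu", "ti"], range(1, 10), CLUES)
print(solution)
```

On paper you will rarely enumerate this exhaustively; the sketch just shows why dropping a failed assumption and trying the next case is a normal part of the method, not a sign that something went wrong.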


§7.5 Explaining the Example


Keeping the previous sections in mind, let’s return to our example with the Breton
numbers. We will follow the process presented above as closely as possible.

1. Step 1: What words repeat in similar positions, or are there lots of words with common
endings? We notice that “ha” repeats a whole lot, almost always in the middle of a
phrase, and it’s a pretty short word, suggesting that it is an “and” type of word used
to join smaller numbers into bigger numbers.
Some other interesting repeating words are “warn” and “ugent”. The fact that “ugent”
always appears on the right side of the “ha”, and most of the time with some other word
in front of it, suggests that it is a word with positional value. “warn” has the interesting
property of not being connected to “ugent” by a dash “-”, unlike all the other cases. This
suggests that it has some special property; we just don’t know it yet. “warn” seems to
function like a “ha”, but only for “ugents” without any number attached to them.
We note that “zek” is a very common ending; we’ll keep that in mind.

2. Step 2: How many distinct words are there? Try to list them. Here’s my list:
Digits: tri, daou, pemp, unan, pevar, seizh, c’hwec’h
Position values/modifiers: kant, hanter, warn, ugent, ha, -zek
We note that the first class of words all behave very similarly: they can exist on their own,
be attached to an “-ugent”, or have a “-zek” added to the end of them. The fact that they
have such a homogeneous type of usage, and the fact that there are 7 of them (a lot), imply
that these are most likely digits.
We separate “hanter” and “kant” from the rest based on the idea that they aren’t “-zek”ed
anywhere, aren’t used with “ugent”, and appear in the places where we would
usually expect an “ugent”, so they are probably in a similar class of positional values.

3. Step 3: What is the “order” of how numbers are written? Is it evident that there exist
phrases built out of words that have “smaller numerical meaning”? We already kind of
addressed this, but again, we noticed the common “digit” + “ha” + “digit” + “-ugent”
structure as well as the “digit” + “-zek” structure, which seem to provide us with an
increasing ordering in place value from left to right. Clearly we can see the digits
combining to form bigger compound words.

4. Step 4: What kind of numeral system? Well, we found around 7 digits (we aren’t too
sure, after all we’re simply investigating at the moment), so going with our “assume the
simplest case” strategy we assume this is somehow based on a base-10 system. The most
likely option is the classic 1, 5, 10, 20 system, so we’ll try fitting into that first.

5. Step 5: Which words correspond to which numbers?


Alright, time to try stuff out. Looking at our digit list, we note that “unan”, “daou”, and
“tri” exist. With virtually any understanding of prefixes, one might speculate that these
are similar to the prefixes “un”, “duo”, “tri”, which mean 1, 2, and 3 respectively. We
don’t know this for certain, but it definitely seems like a reasonable speculation; after all,
Breton is a European language and shares ancestry with the languages those prefixes come from.
We now go down the rabbit hole. A simple mathematical insight is that multiplication
is more restrictive than addition in the sense that if a + b = c, there is no independent
correlation between a and c or b and c, however if a · b = c, then both a|c and b|c. We


hence try to satisfy the multiplication expressions first, and pay more attention to them.
For example, the second expression gives us that:
nav × nav = unan ha pevar-ugent
So the square of nav is some multiple of ugent plus 1? We can assume that ugent isn’t
5, as then why would we need 7 digits? We could have then just gotten away with using
four. (This is a Step 2 argument.) So maybe ugent is some multiple of 10. But unan is
1! What squares of digits have 1 as a last digit? 81 = 1 + 80 = 1 + 8 · 10. We seem to
have struck a great stockpile of information: not only does nav mean 9, but pevar means
8 and ugent means 10. Apart from this, the fact that nav sounds like nine only reinforces
our claim.
If we look at the line where kant divided by daou is hanter kant, we remember that
irregular base systems often use positional values and modifiers that are divided by two,
and looking at the other phrases, the assumption that kant and hanter kant are positional
values seems supported, with hanter kant being half of a kant.
We look for equations that have values we already know inside of them, and find:
nav × c’hwec’h = pevar ha hanter kant
We assume that since the left side is a multiple of nine, and since c’hwec’h must be bigger
than 3 (1, 2, 3 are taken by the words we assumed them to be earlier, remember),
kant must be a big number, maybe 100 or 50 or something. Anyways, we’re trying to find
a multiple of 9 that ends with a pevar, or an 8. 18? That’s too small though. Is there
anything else? Wait, maybe we read something wrong; let’s plug it in again. It’s still not
working. It can’t be 18 as c’hwec’h isn’t 2. What went wrong?
We have just hit our first contradiction. This was inevitable, but now that we hit it, we
have some time to reflect. When did we make an assumption? If we go back, everything
seemed very logical...the kant and hanter kant stuff... the nav squared stuff...wait. In our
nav squared expression, we made an assumption. The fact that so many things lined up
so perfectly blinded us to the fact that we missed a case. While nav is still probably 9,
pevar-ugent may not be “eight 10’s”, but in fact “four 20’s”, making pevar a 4 and ugent
a 20. Let’s check this in the expression that just gave us a contradiction. Our right side
becomes 54 so....c’hwec’h is probably six! This doesn’t give contradictions, let’s move on!
Let’s look at:
c’hwezek × c’hwec’h = c’hwezek ha pevar-ugent
We think that c’hwec’h is 6, and “ha” is additive, so we can subtract a c’hwezek from
both sides to get that 5 times c’hwezek is pevar-ugent, or 80 by our previous assumptions.
So dividing, does this mean c’hwezek is 16? That’s odd. Wait. Wait. c’hwec’h is 6 and
adding a -zek to it makes it 16? If we look at:
daouzek × pemp = tri-ugent
Using this knowledge, daouzek should be 12 and tri-ugent should be 60, meaning pemp is
5? This doesn’t contradict anything we’ve figured out so far, which is very promising. We
have already found the words for the numbers from 1-6, figured out the words for 20, 50,
and 100, and learned how to form the teens by adding a -zek to the last digit. By analyzing
some more of the equations, we reach the conclusion that 7 is seizh, warn is used for numbers
between 21 and 39, and our understanding of the problem is complete. If we think about it, the
general numeral system going on here is actually base-20; we made a wrong assumption
in assuming it was base-10. Essentially, to make big numbers, one adds a multiple
of 20 to some number from 1-19 (or adds nothing if the big number is a perfect

multiple of 20), and for exceptionally large numbers hanter kant and kant are used (50
and 100) to save time pronouncing. The idea of making numbers in the teens by simply
adding a -zek to a number from 1 to 9 seems amazingly simple yet effective.
Solve the questions in the problem set below with this method to reinforce your
understanding. In addition, submit your answers to the “Breton Numbers” questions to
practice answering the questions themselves once you have done the hard part of figuring
out how the number system works.
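As a sanity check on the walkthrough, the conclusions can be encoded and tested against the four equations quoted above. This is a sketch under stated assumptions, not a full Breton numeral parser: the word values are the ones derived in the text, the table of -zek stems covers only the two teens that appear in the quoted equations, “warn” is left out because no full phrase using it is quoted here, and all function and variable names are made up for this example.

```python
# Word values as concluded in the walkthrough above.
LEXICON = {
    "unan": 1, "daou": 2, "tri": 3, "pevar": 4, "pemp": 5,
    "c'hwec'h": 6, "seizh": 7, "nav": 9, "ugent": 20, "kant": 100,
}
# Stems as they appear before -zek in the quoted equations (assumption:
# the stem may be a shortened form of the digit, as in c'hwezek).
TEEN_STEMS = {"daou": 2, "c'hwe": 6}


def word_value(word: str) -> int:
    """Value of a single word, including digit-ugent and digit+zek compounds."""
    if word in LEXICON:
        return LEXICON[word]
    if word.endswith("-ugent"):
        return word_value(word[: -len("-ugent")]) * 20
    if word.endswith("zek"):
        return 10 + TEEN_STEMS[word[: -len("zek")]]
    raise KeyError(word)


def phrase_value(phrase: str) -> int:
    """Add up the parts of a phrase; 'ha' just joins them."""
    total = 0
    for word in phrase.replace("hanter kant", "hanter-kant").split():
        if word == "ha":
            continue
        total += 50 if word == "hanter-kant" else word_value(word)
    return total


# (left factor, right factor, product) as quoted in the text.
EQUATIONS = [
    ("nav", "nav", "unan ha pevar-ugent"),
    ("nav", "c'hwec'h", "pevar ha hanter kant"),
    ("c'hwezek", "c'hwec'h", "c'hwezek ha pevar-ugent"),
    ("daouzek", "pemp", "tri-ugent"),
]
checks = [phrase_value(a) * phrase_value(b) == phrase_value(c)
          for a, b, c in EQUATIONS]

# The case missed at first: every (digit, place value) reading of
# "pevar-ugent" that satisfies 1 + digit * place == 81.
readings = [(d, u) for u in (5, 10, 20) for d in range(1, 10) if 1 + d * u == 81]

print(checks)
print(readings)
```

The second list shows both readings that fit the nav × nav equation on its own, “eight 10’s” and “four 20’s”; only the second survives the other equations, which is the contradiction the walkthrough ran into.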


§7.6 Problems
Problem 7.1 (2019 NACLO Round 1).


Problem 7.2 (2018 NACLO Round 1).


Problem 7.3 (2017 NACLO Round 1).


Problem 7.4 (2016 NACLO Round 1).
