Understanding Elasticsearch

Data In Elasticsearch
Elasticsearch is document oriented, meaning that it stores entire objects,
or documents. It uses JSON as the serialization format for documents.
Each document belongs to a type, and types live inside an index; each
document has one or more fields. This is how the hierarchy compares to that
of a relational database:
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
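As a rough illustration of the hierarchy above, the sketch below models one document as a Python dictionary and serializes it to JSON. The index name, type name, id, and every field are hypothetical, chosen only for this example.

```python
import json

# Hypothetical document: field names and values are illustrative only.
document = {
    "title": "No anime no life",
    "views": 42,
    "published": "2015-01-01",
    "author": {"name": "Jane Doe"},
}

# Hypothetical location, mirroring index -> type -> document.
location = {"index": "blog", "type": "post", "id": "1"}

# JSON is the serialization format used for documents.
serialized = json.dumps(document, sort_keys=True)
print(location)
print(serialized)
print(sorted(document))  # the document's top-level fields
```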
While working with Elasticsearch you will come across the term index quite
a lot, and its meaning depends on the context, so let's clarify it before
moving on.
• index (noun): A place where documents are stored; in other words, it is
like a database in a relational database.
• index (verb): To store a document in an index, much like an INSERT
statement in a relational database.
• inverted index: The structure Elasticsearch uses to speed up retrieval. It
plays a role similar to an index, such as a UNIQUE index, on a column in a
relational database.
Interact With Elasticsearch

Java API
Elasticsearch provides a Java API that comes with two built-in clients you
can use in your code.
1. Node client: joins the cluster as a non-data node. It doesn't hold any
data itself, but it knows what data lives on which node in the cluster, and
can forward requests directly to the correct node.
2. Transport client: the lighter-weight transport client can be used to send
requests to a remote cluster. It doesn't join the cluster itself, but simply
forwards requests to a node in the cluster.
Both clients talk to the cluster over port 9300. A cluster is a group of
nodes with the same cluster name that work together to share data and
to provide failover and scale. A node is a running instance of Elasticsearch.
RESTful API
Besides the Java API, Elasticsearch also provides a RESTful API that uses
JSON over HTTP on port 9200. A request has the following form:
curl -XGET 'http://localhost:9200/_search?search_type=count' -d '
{
  "query": {
    "match_all": {}
  }
}'
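The same request can be sketched from code. The example below only builds the HTTP request with Python's standard library and does not send it; the helper name build_search_request and the Content-Type header are assumptions added for illustration, and the host is the local default used in the curl call.

```python
import json
import urllib.request


def build_search_request(host="http://localhost:9200"):
    """Build (but do not send) the match_all search from the curl example."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        url=host + "/_search?search_type=count",
        data=body,
        headers={"Content-Type": "application/json"},  # assumption
        method="GET",
    )


request = build_search_request()
print(request.get_method(), request.full_url)

# Sending it requires a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```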
Document Mapping
Elasticsearch supports the following simple field types:
• String: string
• Whole number: byte, short, integer, long
• Floating point: float, double
• Boolean: boolean
• Date: date
Besides these core types, Elasticsearch also supports custom mappings
using object notation.
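A mapping for these field types might look like the sketch below, written as a Python dictionary that serializes to the JSON body of a mapping request. It assumes the index and type layout described in this article; the type name post and all field names are hypothetical.

```python
import json

# Hypothetical mapping for a hypothetical "post" type.
mapping = {
    "mappings": {
        "post": {
            "properties": {
                "title": {"type": "string"},
                "views": {"type": "long"},
                "rating": {"type": "float"},
                "draft": {"type": "boolean"},
                "published": {"type": "date"},
                # An inner object is described by nesting another
                # "properties" block (object notation).
                "author": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}},
                },
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```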
Exact values & Full Text
Data in Elasticsearch can be broadly divided into two types: exact values and
full text. Exact values are exactly what they sound like: the exact
value "Foo" is not the same as the exact value "foo". Full text, on the other
hand, refers to textual data, usually written in some human language. Exact
values are easy to query. The decision is binary: a value either matches the
query or it doesn't, similar to a WHERE clause in SQL. Querying full-text data
is different. Instead of asking "Does this document match the query?" it asks
"How well does this document match the query?". Full-text search is what
sets Elasticsearch apart from a traditional SQL database. To facilitate these
types of queries on full-text fields, Elasticsearch first analyzes the text, then
uses the results to build an inverted index.
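The difference between the two questions can be sketched in a few lines. The function names are hypothetical, and toy_relevance is a deliberately naive score (the fraction of query terms found in the text), not the relevance calculation Elasticsearch performs.

```python
def exact_match(stored, query):
    """Exact values: a yes/no decision, like a WHERE clause."""
    return stored == query


def toy_relevance(text, query):
    """Full text: how well does the text match? (toy score only)"""
    text_terms = set(text.lower().split())
    query_terms = query.lower().split()
    if not query_terms:
        return 0.0
    hits = sum(1 for term in query_terms if term in text_terms)
    return hits / len(query_terms)


print(exact_match("Foo", "foo"))
print(toy_relevance("No anime no life", "anime word"))
```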
Analysis & Analyzer
Analysis is the process of:
1. tokenizing a block of text into individual terms suitable for use in an
inverted index.
2. normalizing these terms into a standard form to improve their
“searchability” or recall.
The two steps above are performed by an analyzer, which is a combination of
three functions:
1. Character filters: tidy up the string before tokenization, for example
by stripping out HTML or converting "&" characters to the word "and".
2. Tokenizer: splits the text into terms whenever it encounters
whitespace or punctuation.
3. Token filters: change terms (e.g. lowercasing "Quick"), remove terms (e.g.
stopwords like "a", "and", "the") or add terms (e.g. synonyms like
"jump" and "leap").
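The three stages above can be sketched as a small pipeline. This is a toy model under stated assumptions, not the real analysis chain: the stopword set and synonym table are just the samples from the list, and the regular expressions are simplifications.

```python
import re

STOPWORDS = {"a", "and", "the"}   # sample stopwords from the list above
SYNONYMS = {"jump": ["leap"]}     # sample synonym from the list above


def char_filter(text):
    """Strip HTML tags and spell out '&' before tokenization."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")


def tokenizer(text):
    """Split on runs of whitespace or punctuation."""
    return [token for token in re.split(r"\W+", text) if token]


def token_filters(tokens):
    """Lowercase terms, drop stopwords, and add synonyms."""
    terms = []
    for token in tokens:
        term = token.lower()
        if term in STOPWORDS:
            continue
        terms.append(term)
        terms.extend(SYNONYMS.get(term, []))
    return terms


def analyze(text):
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<p>The Quick fox & the dog jump</p>"))
```

Note that in this sketch the "&" becomes "and" in the character filter and is then dropped again as a stopword by the token filter, which shows how the stages build on each other.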
Elasticsearch ships with a number of built-in analyzers, including these four:
• Standard analyzer (default): splits the text on word boundaries, as
defined by the Unicode Consortium, and removes most punctuation.
Finally, it lowercases all terms.
• Simple analyzer: splits the text on anything that isn't a letter, and
lowercases the terms.
• Whitespace analyzer: splits the text on whitespace. It doesn't
lowercase.
• Language analyzers: available for many languages. They are able to
take the peculiarities of the specified language into account. For
instance, the english analyzer comes with a set of English stopwords
(common words like "and" or "the" that don't have much impact on
relevance), which it removes, and it is able to stem English words
because it understands the rules of English grammar.
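To make the differences concrete, here is a rough Python approximation of the first three analyzers. These functions are stand-ins written for this comparison, not the real implementations; in particular, standard_like only imitates Unicode word-boundary splitting with a simple regular expression, and no language analyzer is sketched.

```python
import re


def standard_like(text):
    """Rough stand-in: keep runs of word characters, lowercased."""
    return re.findall(r"\w+", text.lower())


def simple_like(text):
    """Split on anything that is not a letter, then lowercase."""
    return [term.lower() for term in re.findall(r"[^\W\d_]+", text)]


def whitespace_like(text):
    """Split on whitespace only; case is preserved."""
    return text.split()


sample = "Wi-Fi 6 is FAST"
print(standard_like(sample))
print(simple_like(sample))
print(whitespace_like(sample))
```

On this sample, only the whitespace variant keeps "Wi-Fi" and "FAST" intact, and only the simple variant drops the digit.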
Full-text queries understand how each field is defined, so they can do the
right thing. When you query a full-text field, the query applies the same
analyzer to the query string to produce the correct list of terms to search for.
By default, an analyzer is applied to every field of type string, which turns
that field into a full-text field.
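Why the query string must go through the same analyzer can be shown with a toy example. The analyze function here is a hypothetical stand-in (whitespace split plus lowercasing), and the analyze_query flag exists only to demonstrate what goes wrong when the query is left unanalyzed.

```python
def analyze(text):
    """Toy analyzer: split on whitespace and lowercase."""
    return [token.lower() for token in text.split()]


# Terms stored at index time, after analysis.
indexed_terms = set(analyze("No anime no life"))


def matches(query, analyze_query=True):
    """Check whether any query term is among the indexed terms."""
    terms = analyze(query) if analyze_query else query.split()
    return any(term in indexed_terms for term in terms)


print(matches("ANIME"))                       # query analyzed like the field
print(matches("ANIME", analyze_query=False))  # raw query, no analysis
```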
Inverted Index
Elasticsearch uses a structure called an inverted index, which is designed to
allow very fast full-text searches. An inverted index consists of a list of all
the unique words that appear in any document and, for each word, a list of the
documents in which it appears. For example, take two documents, each
containing a field with one of the following strings:
1. "No anime no life"
2. "The term anime is a japanese word for animation video"
Splitting each string on whitespace, without any normalization, results in
the inverted index shown in the following table.
| Term      | Doc 1 | Doc 2 |
| No        | x     |       |
| anime     | x     | x     |
| no        | x     |       |
| life      | x     |       |
| The       |       | x     |
| term      |       | x     |
| is        |       | x     |
| a         |       | x     |
| japanese  |       | x     |
| word      |       | x     |
| for       |       | x     |
| animation |       | x     |
| video     |       | x     |
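The table can be reproduced with a short sketch. It assumes plain whitespace splitting with no lowercasing, which is why "No" and "no" stay separate terms; the function name build_inverted_index is hypothetical.

```python
from collections import defaultdict

docs = {
    1: "No anime no life",
    2: "The term anime is a japanese word for animation video",
}


def build_inverted_index(documents):
    """Map each whitespace-separated term to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


inverted = build_inverted_index(docs)
for term, doc_ids in inverted.items():
    print(term, sorted(doc_ids))
```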
Now if we search for the terms anime and word, we get two results (hits),
because the term anime appears in both Doc 1 and Doc 2. Because the
term word appears only in Doc 2, Doc 2 gets a higher score and so a higher
position in the result list.
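This ranking can be sketched by counting, for each document, how many query terms it contains. The counting score is a simplification chosen for illustration and is not how Elasticsearch computes relevance; the dictionary holds just three rows of the table above.

```python
from collections import Counter

# Three rows of the inverted index table: term -> ids of matching docs.
inverted = {
    "anime": {1, 2},
    "word": {2},
    "life": {1},
}


def search(query):
    """Return (doc_id, score) pairs, best match first."""
    scores = Counter()
    for term in query.split():
        for doc_id in inverted.get(term, ()):
            scores[doc_id] += 1
    return scores.most_common()


print(search("anime word"))
```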
Conclusion
In this part we covered basic Elasticsearch concepts: how it stores and
structures data, and how analysis is performed on data to create the inverted
index that is later used to perform full-text queries. In the next part we will
dive into searching and take a look at the Query DSL, which is a powerful
feature of Elasticsearch.
