
Elasticsearch

● Elasticsearch is a data store and a search engine
● It ingests unstructured data and stores it in a form that is highly optimized for
language-based (full-text) searches
● It is free and open-source software
● Elasticsearch is built in Java and uses Apache Lucene for indexing and searching
● Apache Lucene is only a library, and working with it directly is complex
● Elasticsearch hides that complexity behind its APIs
● The APIs are RESTful HTTP APIs that use JSON as the data exchange format
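
As a rough illustration of the last point, the sketch below builds (but does not send) an HTTP request with a JSON body using only Python's standard library. The address localhost:9200, the path /playlist/kpop/1, the sample fields, and the helper name build_request are assumptions made for this example.

```python
import json
import urllib.request


def build_request(method, path, body=None, base_url="http://localhost:9200"):
    """Build an HTTP request for a REST-style JSON API without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


# Hypothetical document; the field names are illustrative only.
req = build_request("PUT", "/playlist/kpop/1", {"title": "Example song", "year": 2014})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it requires a running server: urllib.request.urlopen(req)
```

Every operation described in these notes follows the same pattern: an HTTP verb, a path, and (optionally) a JSON body.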

Why Elasticsearch?
● You can query the data in many different ways.
● It lets you analyze billions of records.

Elasticsearch basic concepts


● Document
● Type
● Index
● Node
● Cluster
● Shards
● Replicas
● API

Document
A document is the basic unit of information.
It is expressed in JSON.
Each document has multiple fields.

Type
A type is defined for documents that have a
common set of fields.
It is a logical partition of an index.
Each type has multiple documents.
Note: mapping types were deprecated in Elasticsearch 6.x
and removed in later versions.

Index
An index is a collection of documents having
similar characteristics.
Each index has multiple types.

Node
A node is a single instance of the Elasticsearch
server, which stores the data.
Each node can hold multiple indices.

Cluster
A cluster is a collection of one or more nodes that
work together.
This distributed design makes it easy to handle
data that is too large for a single node.

Shard
Elasticsearch allows you to subdivide your
index into multiple pieces which are called
shards.
Each shard is a fully-functional and
independent “index” which can be hosted on
any node within the cluster.
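
One way to picture how documents are spread over shards is hash-based routing: hash a routing value (by default the document id) and take the remainder by the number of primary shards. The sketch below is a simplified model under that assumption. The function name pick_shard is hypothetical, and CRC32 is only a stand-in for the hash function Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing_key, num_primary_shards):
    """Map a routing key (by default the document id) to a shard number."""
    # Stand-in hash: Elasticsearch uses its own hash function, not CRC32.
    hashed = zlib.crc32(routing_key.encode("utf-8"))
    return hashed % num_primary_shards


num_shards = 3
placement = {doc_id: pick_shard(doc_id, num_shards) for doc_id in ["1", "2", "3", "4", "5"]}
print(placement)
```

Because the result depends on the number of primary shards, this model also suggests why that number is fixed when an index is created.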

Replica
Elasticsearch provides replicas.
A replica is an additional copy of a shard
and can serve queries just like the original
shard.
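
To make the shard and replica counts concrete, the sketch below builds the settings body that could be sent when creating an index and computes how many shard copies result. The setting names number_of_shards and number_of_replicas are the index settings Elasticsearch uses; the helper names and the values 3 and 1 are assumptions for illustration.

```python
import json


def index_settings(primary_shards, replicas_per_primary):
    """Build an index-creation body with shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }


def total_shard_copies(primary_shards, replicas_per_primary):
    """Count every primary shard plus all of its replicas."""
    return primary_shards * (1 + replicas_per_primary)


body = index_settings(3, 1)
print(json.dumps(body))
print(total_shard_copies(3, 1))  # 3 primaries + 3 replicas = 6
```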

API
The Elasticsearch API takes the form of RESTful HTTP APIs (GET, PUT, DELETE) that use
JSON as the data exchange format.

Data Storage Mechanism in Elasticsearch
The act of storing data in Elasticsearch is called indexing. An Elasticsearch cluster can contain
multiple indices, which in turn contain multiple types. These types hold multiple documents, and
each document has multiple fields.
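
The hierarchy described above can be pictured with a toy in-memory model. This is only a sketch of the index > type > document > field nesting, not of how Elasticsearch stores data internally; the names cluster and index_document and the sample values are hypothetical.

```python
from collections import defaultdict

# index name -> type name -> document id -> document (a dict of fields)
cluster = defaultdict(lambda: defaultdict(dict))


def index_document(index, doc_type, doc_id, document):
    """Store a document; return True if it is new, False if it replaced an existing one."""
    created = doc_id not in cluster[index][doc_type]
    cluster[index][doc_type][doc_id] = document
    return created


print(index_document("playlist", "kpop", 1, {"title": "Song A", "year": 2014}))  # True
print(index_document("playlist", "kpop", 1, {"title": "Song A", "year": 2015}))  # False
print(cluster["playlist"]["kpop"][1]["year"])  # 2015
```

Indexing the same id twice overwrites the earlier document, which is why the second call reports False.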

Types of Elasticsearch APIs


Elasticsearch APIs use HTTP requests
● Document API
● Search API
● Index API
● Cluster API
● Aggregations

Document API
1. SINGLE DOCUMENT API
● Index API (PUT /playlist/kpop/1, that is, PUT /index/type/id)
● Get API (GET /playlist/kpop/1)
● Update API (POST /playlist/kpop/1/_update; a plain PUT /playlist/kpop/1 replaces the whole document)
● Delete API (DELETE /playlist/kpop/1)
2. MULTI-DOCUMENT API
● Multi Get API
● Bulk API
● Delete By Query API
● Update By Query API
● Reindex API
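
Of the multi-document APIs, the Bulk API has the least obvious request format: the body is newline-delimited JSON, with an action line followed by the document itself. The sketch below builds such a body for "index" actions. The helper name build_bulk_body and the sample documents are assumptions, and the "_type" key used by older versions is left out.

```python
import json


def build_bulk_body(index, docs):
    """Serialize (doc_id, document) pairs as newline-delimited JSON 'index' actions."""
    lines = []
    for doc_id, doc in docs:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The Bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_body("playlist", [(1, {"title": "Song A"}), (2, {"title": "Song B"})])
print(payload, end="")
```

A body like this would be sent with POST to the _bulk endpoint.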

Search API
A search can be run by passing parameters directly in the request's Uniform Resource
Identifier (URI). The main parameters are:

Parameter        Description
q                Specifies the query string
lenient          When set to true, format-based errors are ignored
fields           Restricts the response to selected fields
sort             Sorts the results
timeout          Restricts the search time
terminate_after  Restricts the response to a specific number of documents in each shard
from             Specifies the starting offset of the hits to return
size             Specifies the number of hits to return

1. MULTI INDEX API (GET /playlist,my_playlist/_search?q=2014)
2. MULTI TYPE API (GET /playlist/_search?q=2017)
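
A URI search like the ones above can be assembled programmatically. The sketch below joins several index names with commas and encodes the parameters into the query string. The helper name search_url and the base address are assumptions for this example.

```python
from urllib.parse import urlencode


def search_url(indices, query, params=None, base_url="http://localhost:9200"):
    """Build a URI search URL across one or more indices."""
    path = "/" + ",".join(indices) + "/_search"
    all_params = {"q": query}
    all_params.update(params or {})
    return base_url + path + "?" + urlencode(all_params)


url = search_url(["playlist", "my_playlist"], "2014", {"from": 0, "size": 5})
print(url)
# http://localhost:9200/playlist,my_playlist/_search?q=2014&from=0&size=5
```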

Elasticsearch Advanced concepts


● Query DSL
● Mapping
● Analysis
● Modules
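
As a small taste of the first item, Query DSL expresses a search as a JSON body instead of URI parameters. The sketch below builds a bool query that combines a full-text match with a range filter. The field names about and age are borrowed from the employee documents used later in these notes, and the helper name match_with_min_age is hypothetical.

```python
import json


def match_with_min_age(field, text, min_age):
    """Full-text match on one field, filtered to documents with age >= min_age."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                "filter": [{"range": {"age": {"gte": min_age}}}],
            }
        }
    }


body = match_with_min_age("about", "rock albums", 30)
print(json.dumps(body, indent=2))
```

A body like this is sent to the _search endpoint of an index.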

Implementing Elasticsearch using Python

Installation on Windows

Step1: Install the latest Java version

Step2: Go to https://www.elastic.co/downloads
Step3: Click Download to get the zip file
Step4: Once the file is downloaded, unzip it and extract the contents
Step5: Go to elasticsearch-x.y.z > bin
Step6: Inside the bin folder, find the elasticsearch.bat file and double-click it to start the
Elasticsearch server

Step7: Wait for the server to start

Step8: Open a browser and go to localhost:9200 to check whether the server is running

Step9: If the browser shows a JSON response with the cluster name and version, everything is fine.
Step10: Finally, add the Sense (beta) plugin, which acts as a developer's interface to
Elasticsearch
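
The browser check in Step8 can also be done from Python. The sketch below requests the server's root URL and returns the JSON it sends back, or None if nothing answers. The function name fetch_server_info and the 2-second timeout are assumptions for this example.

```python
import json
import urllib.request


def fetch_server_info(url="http://localhost:9200", timeout=2.0):
    """Return the server's JSON banner as a dict, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, HTTP errors and timeouts;
        # ValueError covers replies that are not valid JSON.
        return None


info = fetch_server_info()
if info is None:
    print("Elasticsearch does not appear to be reachable")
else:
    print("Server responded, version:", info.get("version", {}).get("number"))
```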

In Python Code

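pip install django-elasticsearch-dsl
pip install elasticsearch

# Import the Elasticsearch client class
from elasticsearch import Elasticsearch

# Connect to the Elasticsearch cluster running locally
# (newer client versions may expect a full URL such as 'http://localhost:9200')
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])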

Defining a document
e1 = {
    "first_name": "Kiran",
    "last_name": "Kumar",
    "age": 22,
    "about": "Love to play volleyball",
    "interests": ['sports', 'music'],
}
print(e1)
{'first_name': 'Kiran', 'last_name': 'Kumar', 'age': 22, 'about': 'Love to play
volleyball', 'interests': ['sports', 'music']}

Inserting a document
# Store e1 in index 'megacorp' under type 'employee' with id 1
res = es.index(index='megacorp', doc_type='employee', id=1, body=e1)

e2 = {
    "first_name": "Jane",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to collect rock albums",
    "interests": ["music"],
}
e3 = {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build cabinets",
    "interests": ["forestry"],
}

# res['created'] is True for a new document and False when an existing id was overwritten
# (newer Elasticsearch versions report this in res['result'] instead)
res = es.index(index='megacorp', doc_type='employee', id=2, body=e2)
print(res['created'])

res = es.index(index='megacorp', doc_type='employee', id=3, body=e3)
print(res['created'])

False
True

Retrieving a document
res = es.get(index='megacorp', doc_type='employee', id=3)

# The full response includes metadata such as _index, _id and _version
print(res)
{'_type': 'employee', '_source': {'interests': ['forestry'],
'age': 35, 'about': 'I like to build cabinets', 'last_name':
'Fir', 'first_name': 'Douglas'}, '_index': 'megacorp',
'_version': 1, 'found': True, '_id': '3'}

# The stored document itself is under '_source'
print(res['_source'])
{'interests': ['forestry'], 'age': 35, 'about': 'I like to
build cabinets', 'last_name': 'Fir', 'first_name': 'Douglas'}

Deleting a document
res = es.delete(index='megacorp', doc_type='employee', id=3)
print(res['result'])

deleted


Implementing ELK stack using Django


https://www.codementor.io/samueljames/using-django-with-elasticsearch-logstash-and-kibana-elk-stack-9l4fwx138

Step1: Install the latest Java version


$ sudo apt install default-jdk

$ java -version

Setting JAVA_HOME environment variable


$ cd /usr/lib/jvm/
$ ls
Note the folder name of the Java package (in my case it is java-11-openjdk-amd64).

$ gedit ~/.bashrc

Scroll down to the end of the file and append the lines below:

JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
export JAVA_HOME
PATH="$PATH:$JAVA_HOME/bin"
export PATH

Save the file.

$ source ~/.bashrc
$ echo $JAVA_HOME
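
Beyond echoing the variable, a short script can confirm that JAVA_HOME is usable. The sketch below assumes the usual convention that JAVA_HOME points at the JDK directory, that is, the one containing bin/java. The function name java_home_problem is hypothetical.

```python
import os


def java_home_problem(environ=None):
    """Return a message describing what is wrong with JAVA_HOME, or None if it looks fine."""
    environ = os.environ if environ is None else environ
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return "JAVA_HOME does not point to a directory: " + java_home
    if not os.path.exists(os.path.join(java_home, "bin", "java")):
        return "no bin/java found under " + java_home
    return None


print(java_home_problem() or "JAVA_HOME looks fine")
```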

Step2: Install Elasticsearch

Install the apt-transport-https package, which is needed to fetch packages from the
Elasticsearch apt repository on Ubuntu:
$ sudo apt-get install apt-transport-https

Download and install the public signing key


$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Save the repository definition to /etc/apt/sources.list.d/elastic-7.x.list:


$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

Install the Elasticsearch Debian package with:


$ sudo apt-get update && sudo apt-get install elasticsearch

Step 3: Running Elasticsearch

Running Elasticsearch with systemd

$ sudo /bin/systemctl daemon-reload


$ sudo /bin/systemctl enable elasticsearch.service
$ sudo systemctl start elasticsearch.service

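By default, the Elasticsearch service does not log information to the systemd journal. To
enable journal logging, remove the --quiet option from the ExecStart command line in the
elasticsearch.service file.
To list journal entries for the elasticsearch service starting from a given time: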
$ sudo journalctl --unit elasticsearch --since "2016-10-30 18:17:16"
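
If the same log check is scripted, the start time can be formatted from a datetime instead of being typed by hand. The sketch below only builds the argument list and does not run it. The helper name journalctl_command is an assumption for this example.

```python
from datetime import datetime


def journalctl_command(unit, since=None):
    """Build the argument list for a journalctl call, optionally limited by start time."""
    command = ["journalctl", "--unit", unit]
    if since is not None:
        command += ["--since", since.strftime("%Y-%m-%d %H:%M:%S")]
    return command


cmd = journalctl_command("elasticsearch", datetime(2016, 10, 30, 18, 17, 16))
print(cmd)
# ['journalctl', '--unit', 'elasticsearch', '--since', '2016-10-30 18:17:16']
# To run it (typically needs sudo): subprocess.run(["sudo"] + cmd, check=True)
```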

Open localhost:9200 in a browser


Step 4: Installing Kibana

$ sudo apt-get update && sudo apt-get install kibana

$ sudo /bin/systemctl daemon-reload

$ sudo /bin/systemctl enable kibana.service

$ sudo systemctl start kibana.service

$ sudo journalctl --unit kibana

Open localhost:5601 in a browser
