Speakers:
Yash Badiani - Practice Lead, Big Data Analytics & AI/ML, CIGNEX Datamatics
Bhavin Shah – Technical Architect, CIGNEX Datamatics
Yash has a DW/BI and Big Data background and has experience in architecting, designing & implementing large end-to-end Big Data Analytics & Machine Learning Platforms. Yash has extensive experience with proprietary, Open Source & Cloud-based data management technologies and works towards building the right solution for customers to derive business insights from their data for competitive advantage & better customer service. Yash has delivered several webinars on areas such as NoSQL Databases (MongoDB & Elasticsearch), Machine Learning & succeeding on Big Data Platform builds.

Bhavin has 11+ years of global experience in architecting, designing & implementing scalable enterprise applications using Web, Data Management & Big Data Technologies. Bhavin has exposure to the full-stack application & product development life cycle involving UI, Services & Backend development using RDBMS & NoSQL databases such as MarkLogic & Elasticsearch with Web Development frameworks such as Java/J2EE/Spring/Hibernate.
3 CIGNEX Datamatics Confidential www.cignex.com
Agenda
• Use Case
• Key Requirements
• Architectural Considerations
• Our Approach
– Platform Evaluation
– Extensibility
– Security
– Design
– Performance Testing & Tuning
– Deployment & Monitoring
• Key Takeaways
Use case (diagram): logs from application frameworks (J2EE microservice, Angular app, .NET app) are acquired, parsed, stored, and then analyzed/visualized, with monitoring, a raw log archive, and backup alongside.
Functional
• Acquire data from disparate application frameworks (Docker, .NET, Java, etc.)
• Ability to visualize and analyze application access patterns / failures in real time
• User Management
Non Functional
• Better control on customization through an Open Source solution
• Scalable to acquire high volumes of log events (~3K-5K events/sec)
• Support 75+ concurrent users
Architectural considerations (diagram): Deployments; Platform Supported, 3rd Party Plugin, or Custom Scripts.
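Platform evaluation (three candidate platforms, in the slide's column order)
Criterion | Platform 1 | Platform 2 | Platform 3
TLS support (Docker) | Yes | Yes; supports UDP, security not off the shelf | Yes; TCP+TLS support is available with the TCP driver
Ease of querying data | Yes; its own query language | Yes; similar to Lucene | Yes; Lucene query syntax
Open Source | No; priced per GB of log data and retention | Yes | Yes
Customization | No | No | Yes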
Conclusion: ELK, due to its flexibility and because it is a fully integrated log management solution that supports log shipping, indexing, storage, and visualization.
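As a rough illustration of the Lucene query syntax noted in the evaluation, the sketch below builds an Elasticsearch search body around a `query_string` clause. The field names (`app`, `level`, `@timestamp`), the service name, and the time window are hypothetical and do not come from the deck.

```python
import json


def build_failure_query(app_name, level="ERROR", since="now-15m", size=50):
    """Build a search body that uses Lucene query-string syntax.

    Field names (app, level, @timestamp) are hypothetical examples.
    """
    lucene = f'app:"{app_name}" AND level:{level}'
    return {
        "query": {
            "bool": {
                # Free-text Lucene expression, as typed into a search bar.
                "must": [{"query_string": {"query": lucene}}],
                # Restrict the search to recent events only.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(build_failure_query("orders-service"), indent=2))
```

The same Lucene expression (`app:"orders-service" AND level:ERROR`) could be typed directly into a Kibana search bar, which is part of what makes querying approachable for users.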
Conclusion: TCPSocketAppender, because it is secure and has a minimal footprint on the source system (Java-based applications).
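The idea behind a socket-based log shipper can be sketched in a few lines. The example below is a Python analogue, not the Java TCPSocketAppender itself: it serializes each event as one JSON line and sends it over a TLS-wrapped TCP connection. The event fields, host, port, and CA file are assumptions made for illustration.

```python
import json
import socket
import ssl
from datetime import datetime, timezone


def encode_event(message, level="INFO", source="demo-app"):
    """Serialize one log event as a newline-terminated JSON line."""
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "source": source,  # hypothetical field naming the emitting app
        "message": message,
    }
    return (json.dumps(event) + "\n").encode("utf-8")


def ship_events(host, port, payloads, ca_file=None):
    """Send pre-encoded events over a single TLS-wrapped TCP connection."""
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            for payload in payloads:
                tls.sendall(payload)
```

A call such as `ship_events("logstash.example.internal", 5000, [encode_event("service started")])` would deliver one event, assuming a listener on that (hypothetical) host and port accepts TLS connections and newline-delimited JSON.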
Evaluation Criteria
No Yes
Open Source Yes
• Need X-Pack for security • Free license
Conclusion: Search Guard, because its free license includes security features with user management.
Architecture Design
• ELK 6.1.0 Basic license as framework
• TCPSocketAppender as log shipper
• Search Guard for security between ELK nodes (6.1.0-21.0 for Elasticsearch and 6.1.0-10.0 for Kibana)
• Load balancer (HAProxy) before Logstash and Kibana for high availability
• Monitoring via X-Pack free license
• Raw log file backup and ingestion to Hadoop
• Day-wise index for easy backup and restore
• Curator 5.4 for Elasticsearch index snapshot and restore
• OpenSSL for secure connection between source systems and Logstash
• Log rotation of ELK log files
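A day-wise index makes backup and restore simpler because each index maps to exactly one calendar day. The sketch below illustrates that idea only; in the design above this job belongs to Curator. The `applogs-` prefix and the retention window are hypothetical.

```python
from datetime import date, timedelta


def daily_index_name(day, prefix="applogs-"):
    """Return the index name for one calendar day, e.g. applogs-2018.01.15."""
    return f"{prefix}{day:%Y.%m.%d}"


def indices_older_than(index_names, today, retention_days, prefix="applogs-"):
    """Pick day-wise indices whose date falls before the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # ignore indices that do not follow the naming scheme
        year, month, day = (int(part) for part in name[len(prefix):].split("."))
        if date(year, month, day) < cutoff:
            expired.append(name)
    return sorted(expired)
```

With one index per day, selecting what to snapshot, restore, or drop becomes a matter of comparing dates in index names rather than filtering documents inside a large index.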
Architecture (diagram): ELK 6.0 with X-Pack free license, on premise.
• Sources: microservice in a Docker container on a VM (Console Appender, Socket Appender, FileBeat); Angular-based UI (UI app browser logs via POST API); .NET application (log file in JSON format, FileBeat)
• Transport: TCP and HTTPS secured with OpenSSL, through a load balancer to Logstash 1 and Logstash 2
• Elasticsearch: Data Nodes 1-3 and ES client nodes, with Search Guard; Curator
• Kibana 1 and Kibana 2 behind a load balancer, with Search Guard for Kibana; NFS for user management (UM); user management via Search Guard
• Data warehouse via HDFS
2 | 4000 | 500 | 2000 | 3k | 1000811, 1000927 | 16.68 | 4000000 | 3798548 | 201452 | 3996.527
2 | 4000 | 500 | 2000 | 3k | 1000771, 1000940 | 16.68 | 4000000 | 3954304 | 45696 | 3996.581
2 | 4000 | 500 | 2000 | 3k | 1000743, 1001010 | 16.68 | 4000000 | 4000000 | 0 | 3996.497
2 | 4000 | 6000 | 2000 | 3k | 12002839, 12004548 | 200.06 | 48000000 | 48000000 | 0 | 3998.769
2 | 4000 | 7500 | 2000 | 3k | 15003381, 15006113 | 250.07 | 60000000 | 60000000 | 0 | 3998.735
Threshold
Tuned Parameters:
Queue size of TCPSocketAppender: 32768 | Flush interval of Logstash file output plugin: 1 sec
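The last columns of the ingestion test rows can be reproduced with simple arithmetic, which helps when reading the results. The sketch below assumes, since the slide's column headers did not survive, that each row lists two sender durations in milliseconds, the events sent, and the events received; that reading is an interpretation, not something the deck states.

```python
def summarize_run(durations_ms, sent, received):
    """Derive loss and throughput figures for one load-test row.

    Column meanings are an assumption: the slide's headers are missing.
    """
    lost = sent - received
    # Rate implied by each sender's duration, then averaged across senders.
    rates = [sent / (ms / 1000.0) for ms in durations_ms]
    return {
        "lost": lost,
        "loss_pct": 100.0 * lost / sent,
        "events_per_sec": sum(rates) / len(rates),
    }


if __name__ == "__main__":
    first_row = summarize_run([1000811, 1000927], 4_000_000, 3_798_548)
    print(first_row)
```

Under that reading, the first row gives 201452 lost events (about 5% of those sent) at roughly 3996.5 events/sec, matching the figures in the row, while the later rows show zero loss at a similar rate.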
250 | 5 | 10 | 5 | 1 | 67 ms (5188 bytes) | 267 ms (1353651 bytes) | 304 ms (1560852 bytes) | 3 seconds | ~7
3750 | 25 | 10 | 5 | 3 | 81 ms (5188 bytes) | 298 ms (1353651 bytes) | 354 ms (1560852 bytes) | 3 seconds | ~30
40000 | 200 | 50 | 4 | 1 | 382 ms (802 bytes) | 593 ms (1180774 bytes) | 417 ms (37414 bytes) | 3 seconds | ~48
100000 | 500 | 50 | 4 | 1 | 386 ms (802 bytes) | 608 ms (1180770 bytes) | 423 ms (37413 bytes) | 3 seconds | ~120
200000 | 1000 | 75 | 4 | 1 | 500 ms (803 bytes) | 593 ms (1180768 bytes) | 410 ms (37414 bytes) | 3 seconds | ~239
Throughput
Tuned Parameters:
Elastic Shard Configuration 3:1 | Kibana Default Search Page Size - 200
Elasticsearch
4000 events/sec ingestion throughput
800 million log events
Data size: 750 GB
Logstash
Kibana
Search requests from 75 concurrent users
Search on 30 days/750GB of data
Avg. 500 milliseconds response time
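These headline numbers can be turned into rough sizing figures. The sketch below assumes that the 800 million events and the 750 GB describe the same 30-day data set and that 1 GB is 10^9 bytes; neither assumption is stated explicitly in the deck, so the results are only indicative.

```python
def sizing_summary(total_events, total_bytes, days, peak_events_per_sec):
    """Derive rough per-event and per-day figures from headline totals."""
    return {
        # Average stored footprint of one event.
        "avg_event_bytes": total_bytes / total_events,
        # Average daily volume over the retention window.
        "events_per_day": total_events / days,
        "gb_per_day": total_bytes / days / 10**9,
        # Sustained rate if events arrived evenly over the window.
        "avg_events_per_sec": total_events / days / 86400,
        # Time needed to ingest everything at the peak rate.
        "hours_at_peak": total_events / peak_events_per_sec / 3600,
    }


if __name__ == "__main__":
    for key, value in sizing_summary(800_000_000, 750 * 10**9, 30, 4000).items():
        print(f"{key}: {value:,.1f}")
```

Under these assumptions each event occupies a little under 1 KB, the cluster grows by about 25 GB per day, and loading the full data set at 4000 events/sec takes roughly 56 hours, which is useful context when planning a benchmark run.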
Design
• Deployments: Open Source Tools (Ansible, Chef, Puppet) or Custom Scripts (Shell Scripts)
• Secure Transmission: Platform Supported (X-Pack), 3rd Party Plugin (Search Guard), or Custom Scripts (Shell Scripts)
Criterion | Ansible | Chef | Puppet | Shell Scripts
Able to pull deployment artifacts from local system | Yes | Yes | Yes | Yes
Able to pull deployment artifacts from central repository (Git) | Yes | Yes | Yes | Yes
One click installation facility on local host | Yes | Yes | Yes | Yes
One click installation facility on remote host | Yes | Yes | Yes | Yes
Ease of adding new node to existing cluster (install & configure) | Yes | Yes | Yes | Yes
Ease of upgrading existing node | Yes | Yes | Yes | Yes
Start/stop/restart of any ELK node | Yes | Yes | Yes | Yes
Security of functional user credentials (elastic/kibana etc.) | Yes | Yes | Yes | Yes
Recommended by ELK | Yes | Yes | Yes | -
Ease of deployment tool maintenance | Yes | Moderate | Complex | -
Client agent installation required | No | Yes | Yes | -
Open source/licensed | Open Source | Open Source | Open Source | Open Source
Real time analysis of failures leading to quicker resolution & higher availability
High Availability & Scalability to handle a high volume of log events (4000 events/sec)
Ability to create customized log event processing pipelines for each source
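One way to picture per-source processing pipelines is a lookup from source type to parser. The sketch below is only an illustration in Python, not the Logstash configuration used in the project; the source names (`dotnet`, `java`) and the text log layout are hypothetical.

```python
import json
import re

# Hypothetical text layout: "<date> <time> <LEVEL> <message>"
JAVA_LINE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def parse_json_line(line):
    """Parse a log line that is already a JSON object."""
    return json.loads(line)


def parse_java_line(line):
    """Parse a plain-text line; keep the raw text if it does not match."""
    match = JAVA_LINE.match(line)
    if match is None:
        return {"message": line, "parse_error": True}
    return match.groupdict()


# One parser per source type, so each source can evolve independently.
PARSERS = {
    "dotnet": parse_json_line,
    "java": parse_java_line,
}


def process(source, line):
    """Run one raw line through the parser registered for its source."""
    event = PARSERS[source](line)
    event["source"] = source
    return event
```

For example, `process("java", "2018-01-15 10:00:00 ERROR boom")` yields an event with separate `ts`, `level`, and `message` fields, while a .NET line in JSON format passes through with only the `source` field added.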
• Evaluate ALL aspects (Platform, Component, Performance, Deployment, Operations) in depth (Requirements to Evaluation Criteria to Tool Options)
• Consider all technology options (Cloud, Open Source, Proprietary) before finalizing the stack
• Design in detail for Security, Extensibility, Availability, Performance, Auditing, Deployment & Operations
• Gather volume details & plan ahead for sizing and performance. Benchmarking truly helps
• Faster Big Data Analytics with MongoDB
• Building Scalable Big Data Analytics Platform driving business ROI
• Text (Document) Classification using Machine Learning
Contact Us