Introduction to Hazelcast IMDG

CONTENTS
• MAVEN DEPENDENCIES
• A SIMPLE SPRING BOOT CLUSTER MEMBER
Hazelcast is a clustered, in-memory data-grid that uses sharding for data distribution and supports monitoring.

[...] references, the ID-generator, and a Countdown-latch. CRDTs (conflict-free replicated data types) are being added, starting with the PN-counter.
DZONE.COM/REFCARDZ
[...] and backups. When members leave, the backups become the primary data partitions and new backups are created on the remaining members.

Monitoring in distributed systems is critical. The lack of monitoring is the first step toward failure. Headless systems are often not well understood and are sometimes ignored. Hazelcast supports JMX and a management console, so easy monitoring is available; you see issues coming in advance of major problems and you can set alerting thresholds that will allow the system to call for help.

WHAT'S AN IMDG FOR?
It's for almost any programming task. Broadly speaking, the three major areas are caching, distributed processing, and distributed messaging. The primary benefits to applications are big and fast data. Big data is good; big, fast data is awesome. Start small and grow enormously.

BEFORE YOU START
There are only a few things you need to get going:
• Java 8 is probably the most widely used JDK/JRE right now and is the preferred one to start with.

[...] client is a Java program where you create a HazelcastInstance. That will load the launch and the framework, and form a cluster with any compatibly configured members it finds on the network (depending on your network and discovery configuration).

DEDICATED MEMBER
A dedicated member is a Hazelcast process dedicated to storage and a few other things. It won't run your code, except for a few server-side-specific instances: entry processors, Executor Tasks (Callables and Runnables), event code (Listeners and Interceptors), and persistence code (MapLoader and MapStore).

The advantage of this approach over the embedded model is that scalability will always become more important than simplicity. With this, you can scale your storage fleet separately from your client fleet. If your storage demands soar but the processing doesn't, you just scale these members. If you introduce new processing demands for the same, or similar, data loads, you just add clients.

LITE MEMBER
Lite members are interesting: they join the cluster, unlike clients that just make a client-specific TCP connection. They [...]
All the dependencies for Hazelcast (any edition) are available on public Maven repositories, in addition to the Hazelcast download page.

• Server: The server will be in one of two forms, hazelcast or hazelcast-all; the latter also includes client dependencies.
• Client: The Hazelcast client is generally included from hazelcast-client, and that's the only addition to your client app build.

PROGRAMMING MODELS
Hazelcast is a toolkit. There are common patterns you can employ, but really, it's just Java. You can design your own infrastructure to meet your needs in any way you see fit. Here are some common deployment models:

EMBEDDED MEMBER
Embedded members are really the easiest way to get started and, for some things, they may be all that you need. An embedded [...]

Clients are Java programs that include the client (i.e. hazelcast-client) JAR in their build, read or create config that helps them find a cluster, and typically perform the widest scope of client requests. These will be in your web clients, your command-line tools, or anywhere you need to interface your systems with Hazelcast. Don't think, though, that because they're clients, you're going to be doing all your processing there. Well-written clients will use server-side constructs (particularly entry processors, aggregation, and executor tasks) to delegate processing requests from single-threaded clients onto a massively scalable clustered storage and processing environment.

Not everything will be delegated to the back-end, of course. Many, many clients simply require extremely low latency access to fast, big data that isn't changed too often and isn't changed (ideally) by separate clients (i.e. sticky sessions are good). For these, near-caches are extremely effective. Each member can host (within its process space) potentially large subsets of data that are being actively managed by the cluster. We're talking mostly about the open-source version here, but it's worth noting that IMDG Enterprise HD will allow off-heap near-caches, giving you low latency access to potentially many gigabytes of near-cache data in each client. This has a broad range of applications across industries; real-time inventory for e-commerce and fraud detection for credit card processors are two. Note that in neither of these is the data static; that's not a requirement. But the data is read much more often than it's changed, making both of these ideal cases for near-caching.

A SIMPLE SPRING BOOT CLUSTER MEMBER
So, finally a little more code. All of this is on GitHub, so you can look at the POM (pom.xml) file there. It's basic: you just need the spring-boot parent entry and the Hazelcast dependency (from above). Make your main class a Spring Boot application with the @SpringBootApplication annotation. I'm adding the @Configuration annotation so I can have one class that serves up the beans and executes them.

CONFIGURING HAZELCAST
So, you have already run code; why talk about configuration now? Because the simple examples use all the defaults. While they're interesting to run, you wouldn't really go much past [...]

[...] that can be used, or thresholds on the heap utilization that will trigger eviction. In addition to evicting on space, you can set an expiration interval on your data; you decide up front.

A SLIGHTLY MORE ROBUST SERVER
We can do better on the server code.

    package com.hazelcast.tao.gettingstarted;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;

    @SpringBootApplication
    public class ServerV1
    {
        private final static Logger L = LoggerFactory.getLogger(ServerV1.class);

        public static void main(String[] args)
        {
            L.info("Calling the Spring Application 'run' method");
            SpringApplication.run(ServerV1.class, args);
        }
    }
Spring Boot will create a default instance of Hazelcast, which may not give you what you want. Having a bean for the config and one for an instance can be useful. Here's the CommandLineRunner that makes the Spring app work:

    // [...] class declaration not captured in the source
    {
        @Override
        public void run(String... args)
        {
            Object bean = applicationContext.getBean("hazelcastInstance");
            HazelcastInstance member = (HazelcastInstance) bean;
            System.out.println("this was all that was needed to start a member");
            IMap<String, String> map = member.getMap("foo");
            for (int i = 0; i < 10; i++)
            {
                map.put("key:" + i, "value: " + i + ":: " + new Date().toString());
            }
            l.debug("at startup, map size: {}", map.size());
        }

        @Override
        public void setApplicationContext(ApplicationContext applicationContext)
        // [...] method body not captured in the source
    }

SIMPLE MAP ACCESS
This part is easy: a Hazelcast IMap is a java.util.Map, so you can take existing code for the Java Collections API and just repurpose it. Here's a little code showing how easy that can be:

    // [...] declaration of the first method not captured in the source
        Map<String, String> myMap = new ConcurrentHashMap<>();
        String key = "SomeKey";
        String value = "Just a random string";
        myMap.put(key, value);
        L.info("getting key {} yields {}", key, myMap.get(key));
    }

    public void hzMethod()
    {
        // wait - shouldn't this be an 'IMap'?
        Map<String, String> myMap = hazelcastInstance.getMap("myMap");
        String key = "SomeKey";
        String value = "Just a random string";
        myMap.put(key, value);
        // [...]

In that bit, there's a method that creates a Map and uses it. In the second method (hzMethod), the only change was to use the injected Hazelcast instance (injected via annotation) to get a
reference to a distributed map in the IMDG. There's no magic: Hazelcast is designed so that you can swap it in that easily, using the familiar Collections API. But back to that comment for a second... shouldn't the declaration have been IMap, not Map? It depends: It could be, but it doesn't need to be. Hazelcast Maps implement the java.util.Map interface, so that's valid, but maybe not useful. In a minute, we're going to use some Hazelcast-specific methods on the Map, and to make those visible, you want to change the declaration. If you're just doing put, get, size, remove, and all of those, then no. One interesting note on that: It's easy to forget that "put" returns the old mapping, as it inserts the new. Think about that in a network environment: When you do a "put", Hazelcast (conforming to the contract) returns the old mapping over the network, incurring serialization for no reason because nobody ever looks at it. Hazelcast has added a "set" method that works like "put", save that it doesn't return the value. This may seem like small stuff, but think about a heavily utilized production environment getting a surge of requests; you're busy and half of that flavor of network traffic is stuff you're never going to look at. Change two letters in your code and the network traffic drops, possibly by lots.

Keep in mind, however, that there are differences. It's a distributed Map: aside from security configuration, other clients/threads can use the same Map. If you test the size of a new in-process Map that you create in your thread, the size will be "0". When you get a reference to a distributed collection from the IMDG, it will create it (if required), or return a reference to an existing collection if it's already been created. This can be a very powerful feature: you can pre-populate a collection from a persistent store or any other data-source. Your client code will be smaller and simpler because you can make assumptions about it. If you're using a Map for a scratchpad cache, however, keep in mind that you may want to create unique map instances or manage data so that your thread [...]

[...] times (you set the expiry interval in config). You should be careful about using near caches for things that are updated frequently and very careful about using them for things that are updated from multiple points. For a web application with sticky sessions, you should be able to count on certain objects being in only one client process; that's a good scenario.

SIMPLE QUERY OPERATIONS
SQL QUERIES
Hazelcast is not an SQL database or an SQL query tool, but it provides a workable, robust subset of SQL query functionality. It's accessible for developers. If you have an SQL background, this is nothing; if you don't, it's still pretty intuitive. The SqlPredicate encapsulates the where clause of a query. Since you're dealing with purely in-memory data, this is going to be very fast.

    public void sqlQueryDemo()
    {
        IMap<Integer, Employee> employees = hazelcastInstance.getMap("employees");
        Employee emp = new Employee();
        emp.setId(Integer.valueOf(1));
        emp.setFirstName("John");
        emp.setLastName("Doe");
        emp.setAge(new Random(System.currentTimeMillis()).nextInt(99));
        emp.setDeptId(Integer.valueOf(13));

        // put the dummy employee in the map using employee-id as the map key
        employees.set(emp.getId(), emp);

        Predicate sqlPredicate = new SqlPredicate(String.format("lastName = '%s'", emp.getLastName()));
        Collection<Employee> matching = employees.values(sqlPredicate);

        // wildcards are supported, too - look for last names starting
        // with 'D' using the same 'values' call.
        sqlPredicate = new SqlPredicate("lastName
        // [...] rest of this statement was cut off in the source
    }
This shows how easy it is to bridge an SQL background with the IMDG SQL-like query. The caveat here is that out-of-the-box, IMDG is not a good tool for joins because of the nature of the data. We split it up because it's big data, and because it's big data, joining it back together is more complex. I mentioned Jet earlier; this would be a useful tool for that.

PREDICATE QUERIES: CRITERIA API
For Java developers who really never liked SQL, there's also a pure Java approach to querying the IMDG: the Criteria API.

    // [...]
    Integer.valueOf(99));
    Predicate<Integer, Employee> dept = equal("deptId", Integer.valueOf(13));

    // and is a variadic method, so you can just keep
    // adding predicates and get the logical 'and' of all of them
    Predicate<Integer, Employee> ageEtAl = and(age, lastName, dept);

    Predicate<Integer, Employee> lastName = equal("lastName", "Doe");
    Predicate<Integer, Employee> age = greaterThan("age", Integer.valueOf(99));
    Predicate<Integer, Employee> dept = equal("deptId", Integer.valueOf(13));
    Predicate<Integer, Employee> ageEtAl = and(age, lastName, dept);

    // construct a paging predicate to get max of 10 objects per
    // [...]

AGGREGATIONS
Aggregations (or, as they're now called, "fast aggregations") allow data query and transformation to be dispatched to the cluster. It can be extremely effective. Keeping with the Employee class and the "employees" Map from the other examples, let's do a quick and dirty aggregation. You could just do the aggregation across the entire entry set of the Map, but using a predicate to filter, or map, the objects before the aggregator does the reduction on them will prove more effective. Department 13 is used to represent group-W. You know those people; they're everywhere.
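If the filter-then-reduce idea is new to you, it may help to see it with no cluster at all. What follows is only a JDK sketch under loud assumptions: the nested lists stand in for partitions held by different members, the tiny Employee class and the AverageAge helper are hypothetical stand-ins, and nothing here touches the Hazelcast API. It just shows the three steps an average-age aggregation goes through: accumulate per partition, combine the partial results, and aggregate the final value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class AggregationSketch {

    /** Hypothetical stand-in for the Employee class used in the text. */
    static final class Employee {
        final int age;
        final int deptId;
        Employee(int age, int deptId) { this.age = age; this.deptId = deptId; }
    }

    /** Keeps a running sum and count; one instance per simulated member. */
    static final class AverageAge {
        long sum = 0L;
        long count = 0L;

        void accumulate(Employee e) { sum += e.age; count++; }

        void combine(AverageAge other) { sum += other.sum; count += other.count; }

        Double aggregate() {
            return count == 0 ? null : Double.valueOf((double) sum / count);
        }
    }

    /** Filter first, accumulate per partition, then combine and reduce on the caller. */
    static Double averageAge(List<List<Employee>> partitions, Predicate<Employee> filter) {
        AverageAge result = new AverageAge();
        for (List<Employee> partition : partitions) {
            AverageAge partial = new AverageAge();  // in a cluster: runs where the data lives
            for (Employee e : partition) {
                if (filter.test(e)) {
                    partial.accumulate(e);
                }
            }
            result.combine(partial);                // in a cluster: runs on the caller
        }
        return result.aggregate();
    }

    /** Two simulated partitions with made-up data. */
    static List<List<Employee>> samplePartitions() {
        return Arrays.asList(
                Arrays.asList(new Employee(30, 13), new Employee(55, 7)),
                Arrays.asList(new Employee(50, 13), new Employee(40, 13)));
    }

    public static void main(String[] args) {
        Double avg = averageAge(samplePartitions(), e -> e.deptId == 13);
        System.out.println("average age in department 13: " + avg);
    }
}
```

The point of filtering before accumulating is that entries outside department 13 never contribute any work to the reduction, and only the small partial results (a sum and a count) need to travel back to the caller.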
    // [...] start of the aggregator declaration not captured in the source
        protected long sum = 0L;
        protected long count = 0L;

        @Override
        public void accumulate(Map.Entry<Integer, Employee> entry)
        {
            count++;
            sum += entry.getValue().getAge();
        }

        @Override
        public void combine(Aggregator aggregator)
        {
            this.sum += this.getClass().cast(aggregator).sum;
            this.count += this.getClass().cast(aggregator).count;
        }

        @Override
        public Double aggregate()
        {
            if (count == 0)
            {
                return null;
            }
            double dsum = (double) sum;
            return Double.valueOf(dsum / count);
        }
    };

    // find the average age of employees in department 13
    Double avgAge = employees.aggregate(ageAggregator, deptPredicate);
    L.info("average age: {}", avgAge);
    }
    }

The code is pretty self-explanatory. The predicate will ensure that only matching elements from the distributed map are included in the calculation. The aggregator will "accumulate" data (examining the matching subset and adding the age into the sum), but where does that happen? The accumulate call is called on each storage member (i.e. not the clients and not Lite members); it's passed each filtered (by deptPredicate) matching entry and it accumulates the raw values. Note that these run in parallel on each member involved. Because the data is partitioned across members and only a filtered subset is processed, it's going to be very fast. In the second phase, each of these aggregator instances is returned to the caller for processing: the instance of the anonymous aggregator class examines each returned aggregator (instances of the same anonymous class) and combines all the raw results. In that part of the code, because this wasn't a concrete class, it's necessary to call the class.cast() method, to allow access to the count and the sum. In the final step, also in the calling member, the aggregate method is invoked and performs the simple calculation of average age. This is really not much code for that kind of power.

There is also a very complete set of built-in aggregators for things like min, max, count, and avg of different numeric types, and distinct for any comparable type. They can be used with very little setup, like this:

    // get a list of distinct last names
    Set<String> lastNames = employees
        .aggregate(Aggregators.<Map.Entry<Integer, Employee>, String>)

Implicit there was a static import of distinct.

ENTRY PROCESSORS
Entry processors are pretty cool. You can do highly efficient in-place processing with a minimum of locking. Consider what people often end up doing to work with remote objects: lock a key, fetch the value, mutate the value, put it back (in a finally block), and unlock the key. That's four network calls to start with (three if you're only looking at the data and not updating the central source). Your objects may be large and incur significant cost in terms of CPU and network for serialization and transport.

Entry processors allow you to dispatch a key-based "task" object across the LAN, directly to the member owning a key, where it is executed in a lock-free, thread-safe fashion. Hazelcast has an interesting threading model that allows this to happen.

Here's a brain-dead simple entry processor example, but it's still a really useful approach:

    @Component
    @Profile("client")
    public class EntryProcessorRunner implements CommandLineRunner, ApplicationCon[...]
    {
        private ApplicationContext applicationContext;

        @Override
        public void run(String... args) throws Exception
        {
            HazelcastInstance instance = (HazelcastInstance) applicationContext.ge[...]
            IMap<String, String> demo = instance.getMap("demo");
            String key = "someKey";
            demo.set(key, "Just a String value...");
            demo.executeOnKey(key, new DemoEntryProcessor());
            EntryProcessor<String, String> asyncProcessor = new DemoOffloadableEnt[...]
            demo.submitToKey(key, asyncProcessor);
            ExecutionCallback<String> callback = new AsynchCallbackDemo();
            // [...]
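To put a number on the lock, fetch, mutate, put, unlock pattern, here is a small in-process analogy. It's a sketch, not Hazelcast code: a ConcurrentHashMap stands in for the remote map, a ReentrantLock stands in for the key lock, and the method names are made up for illustration. The first method counts each interaction that would be a network call against a real cluster; the second hands the update logic to the map in a single call, which is the same "send the code to the data" idea an entry processor is built on.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

public class InPlaceUpdateSketch {

    /**
     * The pattern from the text. Returns how many separate lock/map
     * interactions it took; against a remote map each would be a network call.
     */
    static int lockFetchMutatePut(ConcurrentMap<String, Integer> map,
                                  ReentrantLock keyLock, String key) {
        int interactions = 0;
        keyLock.lock();                                   // lock the key
        interactions++;
        try {
            Integer current = map.get(key);               // fetch the value
            interactions++;
            int updated = (current == null ? 0 : current) + 1;  // mutate locally
            map.put(key, updated);                        // put it back
            interactions++;
        } finally {
            keyLock.unlock();                             // unlock the key
            interactions++;
        }
        return interactions;
    }

    /** One interaction: the update function travels to the entry and runs atomically there. */
    static int updateInPlace(ConcurrentMap<String, Integer> map, String key) {
        map.compute(key, (k, v) -> v == null ? 1 : v + 1);
        return 1;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();
        int slow = lockFetchMutatePut(hits, new ReentrantLock(), "someKey");
        int fast = updateInPlace(hits, "someKey");
        System.out.println("interactions: " + slow + " vs " + fast
                + ", final value: " + hits.get("someKey"));
    }
}
```

Both methods leave the map in a correct state; the difference is the number of trips and how much data moves. With large values and a real network in between, that difference is where the savings come from.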
Because nothing is free, there's a bit of server-side configuration:

    # configure how long hz will wait for invocation resources
    hazelcast.backpressure.backoff.timeout.millis=60000

Back pressure is a topic worth some consideration. Hazelcast is using its threading model to execute these, so there's a limit to how many invocations can be in flight at any one point in time. The absolute number doesn't matter, as that would depend upon the size of your cluster and the number of CPUs/cores/physical threads. What's likely to be interesting is how many can be queued up for one partition: by default, mutating entry processors operate on a partition thread. In configuring your cluster, you know how many physical threads you have, so you can configure the partition thread count to be a somewhat sensible number. Too few and you'll have contention; too many (one-to-one sounds ideal, though it rarely is) and you won't get good resource utilization.

RUNNABLE TASKS
These are simply Java runnable objects that are dispatched to one or more members. Keep in mind that the salient part of the signature is public void run(), i.e. nothing is returned. The way that they're dispatched to the members is very flexible; it can be one member, all members, a member that owns a key, or members selected by an attribute you have set on them.

Here's an example of running something on the member that owns a key:

    @Component("runnableDemo")
    @Profile("client")
    public class RunnableDemoCaller implements CommandLineRunner, ApplicationConte[...]

    @Component("loggingRunnable")
    public class LoggingRunnable implements Runnable, HazelcastInstanceAware
    {
        <snip>
        private HazelcastInstance hazelcastInstance;

        @Override
        public void run()
        {
            l.info("into run, cluster size: {}", getHazelcastInstance().getCluster[...]
        }
        <snip>
    }

This code wasn't particularly profound, but there's one cool aspect to it: you can direct processing to a member that owns a key (or other members) and process that key and/or other keys in multiple maps. So, complex manipulation may be performed outside your client, eliminating multiple network round-trips. The data need not all come from one member, either; there are no restrictions on that. It can be a significant performance boost to design your data so that related items are all within one node; then, this kind of task will tend not to make network calls but will not be restricted from doing so.

CALLABLE TASKS
As with runnable tasks, callable tasks are dispatched to one or more members, but offer more options for things like bringing back data. Here's a really simple callable that will be dispatched to a member, log some noise to show it ran, and return the partition count. There are better ways to monitor or manage partitions, but this should just show how you get a value, easily, from a member.
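Getting a value back rests on the standard java.util.concurrent Callable and Future types, so it can help to see that part on its own first. The sketch below is JDK-only and makes some assumptions: a local thread pool stands in for the cluster, the member names are just labels, and FixedCountTask is a hypothetical task that returns whatever number it was given (271 is used because it is Hazelcast's default partition count). It shows one way to submit a task per "member" and then walk the resulting futures to collect each value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableSketch {

    /** Hypothetical task: returns a fixed number, standing in for a partition count. */
    static final class FixedCountTask implements Callable<Integer> {
        private final int count;
        FixedCountTask(int count) { this.count = count; }
        @Override
        public Integer call() { return count; }
    }

    /** Submits one task per label, then blocks on each future and collects the results. */
    static Map<String, Integer> submitAndCollect(Map<String, Callable<Integer>> tasksByMember) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Map<String, Future<Integer>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<Integer>> e : tasksByMember.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            Map<String, Integer> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> e : futures.entrySet()) {
                results.put(e.getKey(), e.getValue().get());  // waits for that task to finish
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("a task failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Callable<Integer>> tasks = new LinkedHashMap<>();
        tasks.put("member-1", new FixedCountTask(271));
        tasks.put("member-2", new FixedCountTask(271));
        System.out.println(submitAndCollect(tasks));
    }
}
```

Submitting everything first and only then calling get() lets the tasks run concurrently; calling get() right after each submit would serialize them.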
    <snip> - imports. They should be just what you'd expect

        @Override
        public Integer call()
        {
            Member member = hazelcastInstance.getCluster().getLocalMember();
            String fmt = "member listening on %s:%d";
            String whoAmI = String.format(fmt,
                    member.getSocketAddress().getHostName(),
                    member.getSocketAddress().getPort());
            PartitionService service = hazelcastInstance.getPartitionService();
            boolean memberIsSafe = service.isLocalMemberSafe();
            // [...]

    public void callableTaskDemoMembers(Set<Member> members)
    {
        IExecutorService executorService = hazelcastInstance.getExecutorService("default");
        Map<Member, Future<Integer>> futures = executorService.submitToMembers(partitionReporter, members);
        // process these
    }

EVENTS
There are lots of events, but for just right now, let's stick to the data events: listeners and interceptors. This is still a fairly big topic, so let's talk about a workable subset of it. Within data events, there are what are called map events and entry events. Map events are called for map-level changes, specifically [...]
Adding the entry-added listener could be done in config, but here's how to use the Java API to do it:

    public void addEntryAddedListener(IMap<String, String> myMap)
    {
        myMap.addEntryListener(new MyEntryAddedListener<>(), true);
    }

This code will add the entry listener, listening only for entries being added. The boolean parameter tells Hazelcast that the value should be available (i.e. getValue()) in the entry event that's going to be delivered to the listener.

Clients may add these listeners, also, in addition to client lifecycle events, cluster membership events and distributed object creation/deletion. So, they may be notified of their own client lifecycle: starting, started, shutting down, and shutdown; they may be notified of membership changes or storage members joining and leaving, and they may be notified of distributed object creation or destruction: Maps, caches, queues, and all. A word of caution, though: High-volume activity will create high volumes of events. Look carefully at your resources, like client CPU/RAM and especially network. Think in a distributed perspective and put the listener where it needs to be, not simply where it seems convenient.

CONCLUSION
This is just a little of what you can do with Hazelcast. Hazelcast has been doing distributed systems for some time now; it is deliberately designed to deliver performance and simplicity. You can be up-and-running in minutes and rolling out production-quality code that looks an awful lot like your Java collections code. It's a fun environment for programmers. A little Java can be all you need on the server side, then you can cut loose with Java, .NET, C++, Node.js, Python, Go, or Scala, and that list is going to grow as new languages emerge.
DZone, Inc.
150 Preston Executive Dr. Cary, NC 27513
888.678.0399 919.678.0300

DZone communities deliver over 6 million pages each month to more than 3.3 million software developers, architects and decision makers. DZone offers something for everyone, including news, tutorials, cheat sheets, research guides, feature articles, source code and more. "DZone is a developer's dream," says PC Magazine.

Copyright © 2018 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.