
272 BROUGHT TO YOU IN PARTNERSHIP WITH

Introduction to Hazelcast IMDG

CONTENTS

• WHAT'S AN IMDG FOR?
• BEFORE YOU START
• MAVEN DEPENDENCIES
• A SIMPLE SPRING BOOT CLUSTER MEMBER
• SIMPLE QUERY OPERATIONS

WRITTEN BY TOM OCONNELL


SENIOR SOLUTIONS ARCHITECT, HAZELCAST

Hazelcast is a clustered, in-memory data grid that uses sharding for data distribution and supports monitoring.

Clustering refers to how some network-centric software remains resilient and highly available. You can start processes, either members or clients, as you need them, and they find each other to form a consistent whole. Members join and the load spreads out; members terminate and the load is absorbed by others.

In-memory storage is an ideal use case for Hazelcast. You can scale the storage in a number of ways. For IMDG open-source, you pick your JVM size based on your own testing and tuning. There are no good portable recommendations for that; you just have to test. You also pick the number of backups. More than one backup is probably not necessary. Consider adding persistence if multiple backups are needed.

The term "data grid" means a lot of different things, and marketing terminology can obscure this, but basically, it's data held in caches that are available for retrieval by members or clients, plus processing (either in-place or in different processes) that can support events, triggers, transformations: basically, anything you can think up and code.

Hazelcast IMDG provides a robust, wide array of distributed processing possibilities. There are Maps, Queues, Lists, and Sets, all extending the collections classes. There are also RingBuffers and Multimaps. Alongside Queues, there are Topics and Reliable Topics for more messaging options. The available concurrency utilities include Locks, Semaphores, Atomic-longs, Atomic-references, the ID-generator, and a Countdown-latch. CRDTs (conflict-free replicated data types) are being added, starting with the PN-counter.

Sharding, which Hazelcast calls partitioning, is a means of horizontally partitioning the data across multiple member processes. You can think of the Hazelcast shards (partitions) as the hash buckets of a distributed hash Map. Each cluster uses a configured number of partitions; the default is 271, which often doesn't need to be changed. A single-member deployment would get all the partitions, without backups. When a second member joins, roughly half the partitions are transferred to that new member, and backups are created at that time. As each subsequent member joins, a basically equal fraction of the partitions are transferred to that member, both primary data
and backups. When members leave, the backups become the primary data partitions and new backups are created on the remaining members.

Monitoring in distributed systems is critical. The lack of monitoring is the first step toward failure. Headless systems are often not well understood and are sometimes ignored. Hazelcast supports JMX and a management console, so easy monitoring is available; you see issues coming in advance of major problems and you can set alerting thresholds that will allow the system to call for help.

WHAT'S AN IMDG FOR?
It's for almost any programming task. Broadly speaking, the three major areas are caching, distributed processing, and distributed messaging. The primary benefits to applications are big and fast data. Big data is good; big, fast data is awesome. Start small and grow enormously.

BEFORE YOU START
There are only a few things you need to get going:
• Java 8 is probably the most widely used JDK/JRE right now and is the preferred one to start with.
• IDEs: Eclipse, IntelliJ, and NetBeans all work well with Hazelcast/Maven. The sample code in the GitHub repo was tested in Eclipse, as well as from the command line, so it's an easy import.
• Maven is one of the popular build tools used with Hazelcast. There is also the option of using Gradle.

MAVEN DEPENDENCIES
All the dependencies for Hazelcast (any edition) are available on public Maven repositories, in addition to the Hazelcast download page.
• Server: The server will be in one of two forms: hazelcast, or hazelcast-all, which also includes client dependencies.
• Client: The Hazelcast client is generally included from hazelcast-client, and that's the only addition to your client app build.

PROGRAMMING MODELS
Hazelcast is a toolkit. There are common patterns you can employ, but really, it's just Java. You can design your own infrastructure to meet your needs in any way you see fit. Here are some common deployment models:

EMBEDDED MEMBER
Embedded members are really the easiest way to get started and for some things, they may be all that you need. An embedded client is a Java program where you create a HazelcastInstance. That will load and launch the framework, and form a cluster with any compatibly configured members it finds on the network (depending on your network and discovery configuration).

DEDICATED MEMBER
A dedicated member is a Hazelcast process dedicated to storage and a few other things. It won't run your code, except for a few server-side-specific constructs: entry processors, Executor Tasks (Callables and Runnables), event code (Listeners and Interceptors), and persistence code (MapLoader and MapStore).

The advantage of this approach over the embedded model is scalability, which eventually becomes more important than simplicity. With this, you can scale your storage fleet separately from your client fleet. If your storage demands soar but the processing doesn't, you just scale these members. If you introduce new processing demands for the same, or similar, data loads, you just add clients.

LITE MEMBER
Lite members are interesting: they join the cluster, unlike clients that just make a client-specific TCP connection. They do not, however, host data. They are for a small number of advanced things: they may be used for class-loading or as high-performance compute clients. You could, for example, direct runnable and callable tasks to Lite members if they require data that's spread across the cluster for some kind of computation or processing.

CLIENTS
Clients are Java programs that include the client (i.e. hazelcast-client) JAR in their build, read or create config that helps them find a cluster, and typically perform the widest scope of client requests. These will be in your web clients, your command-line tools, or anywhere you need to interface your systems with Hazelcast. Don't think, though, that because they're clients, you're going to be doing all your processing there. Well-written clients will use server-side constructs (particularly entry processors, aggregation, and executor tasks) to delegate processing requests from single-threaded clients onto a massively scalable clustered storage and processing environment.

Not everything will be delegated to the back-end, of course. Many, many clients simply require extremely low latency access to fast, big data that isn't changed too often and isn't changed (ideally) by separate clients (i.e. sticky sessions are good). For these, near-caches are extremely effective. Each member can host (within its process space) potentially large subsets of data that are being actively managed by the cluster. We're talking
mostly about the open-source version here, but it's worth noting that IMDG Enterprise HD will allow off-heap near-caches, giving you low latency access to potentially many gigabytes of near-cache data in each client. This has a broad range of applications across industries; real-time inventory for e-commerce and fraud detection for credit card processors are two. Note that in neither of these is the data static; that's not a requirement. But the data is read much more often than it's changed, making both of these ideal cases for near-caching.

A SIMPLE SPRING BOOT CLUSTER MEMBER
So, finally a little more code. All of this is on GitHub, so you can look at the POM (pom.xml) file there. It's basic: you just need the spring-boot parent entry and the Hazelcast dependency (from above). Make your main class a Spring Boot application with the @SpringBootApplication annotation. I'm adding the @Configuration annotation so I can have one class that serves up the beans and executes them.

CONFIGURING HAZELCAST
So, you have already run code, so why talk about configuration now? Because the simple examples use all the defaults. While they're interesting to run, you wouldn't really go much past "hello world" with that.

The main things you'll want to change are the network configuration (particularly the join configuration) and the definition of Maps and caches.

The join configuration dictates how members find each other. It starts with multicast, which probably won't fly in most production environments, as multicast traffic is generally frowned upon by network administrators. TCP discovery is great, if you know the addresses and they don't change. A data center with dedicated hardware works well. In the cloud, however, this falls apart: you don't know the IPs up front and they're pretty much guaranteed to change. There are cloud-based cluster discovery plugins that will make this work well and easily, but that's too much for this intro.

Maps (IMap) are the workhorse of distributed storage data-structures, and caches (ICache) are basically the same thing but for the JCache (JSR107) API. By default, you can create a Map using default configuration and you get a container with no limitations on it. The data is not bounded, it doesn't expire, and it isn't evicted; the problems with that should be obvious. The system will store and retain data until it runs out of memory and that's ugly. For any IMDG, you want to be careful about managing memory, and Hazelcast has lots of options, e.g. using a number of objects in the cache, an absolute size of memory that can be used, or thresholds on the heap utilization that will trigger eviction. In addition to evicting on space, you can set an expiration interval on your data; you decide up front.

A SLIGHTLY MORE ROBUST SERVER
We can do better on the server code.

    package com.hazelcast.tao.gettingstarted;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;

    @SpringBootApplication
    public class ServerV1
    {
        private final static Logger L = LoggerFactory.getLogger(ServerV1.class);

        public static void main(String[] args)
        {
            L.info("Calling the Spring Application 'run' method");
            SpringApplication.run(ServerV1.class, args);
        }
    }

Here's a concise server. Everything that it's going to do is going to be injected by Spring Boot. It relies on some configuration code:

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    @Configuration
    public class ServerConfig
    {
        // cluster group name and password; members must agree on both to join
        @Bean
        public Config config()
        {
            Config config = new Config();
            config.getGroupConfig().setName("dev");
            config.getGroupConfig().setPassword("dev-pass");
            return config;
        }

        // the member itself, built from the Config bean above
        @Bean
        public HazelcastInstance hazelcastInstance(Config config)
        {
            HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
            return instance;
        }
    }
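
The eviction and expiration ideas described under CONFIGURING HAZELCAST can be pictured with plain JDK collections. The sketch below is only an analogy, not Hazelcast code or configuration: the class name BoundedExpiringCache, its set/get methods, and the explicit nowMillis parameter are all assumptions made for illustration. It shows a map that evicts the least recently used entry once a maximum entry count is exceeded, and that treats entries as expired after a fixed time-to-live.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical plain-JDK analogy: a size-bounded map with LRU eviction and a time-to-live. */
class BoundedExpiringCache<K, V> {

    /** A value together with the time at which it stops being valid. */
    private static final class Timed<V> {
        final V value;
        final long expiresAtMillis;

        Timed(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<K, Timed<V>> entries;

    BoundedExpiringCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true keeps the least recently used entry first
        this.entries = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxEntries; // evict on space
            }
        };
    }

    /** Stores a value; the caller passes the current time so the sketch stays deterministic. */
    synchronized void set(K key, V value, long nowMillis) {
        entries.put(key, new Timed<>(value, nowMillis + ttlMillis));
    }

    /** Returns the value, or null if it was never stored, was evicted, or has expired. */
    synchronized V get(K key, long nowMillis) {
        Timed<V> timed = entries.get(key);
        if (timed == null) {
            return null;
        }
        if (nowMillis >= timed.expiresAtMillis) {
            entries.remove(key); // expire on read
            return null;
        }
        return timed.value;
    }

    synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        BoundedExpiringCache<String, String> cache = new BoundedExpiringCache<>(2, 1000);
        cache.set("a", "1", 0);
        cache.set("b", "2", 0);
        cache.get("a", 10);      // touch "a" so "b" becomes least recently used
        cache.set("c", "3", 20); // third entry exceeds the bound, so "b" is evicted
        System.out.println(cache.get("b", 30));   // null (evicted)
        System.out.println(cache.get("a", 30));   // 1
        System.out.println(cache.get("a", 1000)); // null (expired)
    }
}
```

In a real cluster these limits would be set in the map configuration rather than coded by hand; the point here is only what "bounded", "evicted", and "expired" mean for the data.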

Spring Boot will create a default instance of Hazelcast, which may not give you what you want. Having a bean for the config and one for an instance can be useful. Here's the CommandLineRunner that makes the Spring app work:

    package com.hazelcast.tao.gettingstarted;

    import java.util.Date;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.boot.CommandLineRunner;
    import org.springframework.context.ApplicationContext;
    import org.springframework.context.ApplicationContextAware;
    import org.springframework.stereotype.Component;

    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    @Component("commandLineRunner")
    public class ServerRunner implements CommandLineRunner, ApplicationContextAware
    {
        private final static Logger l = LoggerFactory.getLogger(ServerRunner.class);

        private ApplicationContext applicationContext;

        @Override
        public void run(String... args)
        {
            Object bean = applicationContext.getBean("hazelcastInstance");
            HazelcastInstance member = (HazelcastInstance) bean;
            System.out.println("this was all that was needed to start a member");
            IMap<String, String> map = member.getMap("foo");
            for (int i = 0; i < 10; i++)
            {
                map.put("key:" + i, "value: " + i + ":: " + new Date().toString());
            }
            l.debug("at startup, map size: {}", map.size());
        }

        public ApplicationContext getApplicationContext()
        {
            return applicationContext;
        }

        @Override
        public void setApplicationContext(ApplicationContext applicationContext)
        {
            this.applicationContext = applicationContext;
        }
    }

SIMPLE MAP ACCESS
This part is easy: a Hazelcast IMap is a java.util.Map, so you can take existing code for the Java Collections API and just repurpose it. Here's a little code showing how easy that can be:

    package com.hazelcast.tao.gettingstarted.port;

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.HazelcastInstanceAware;

    public class MapClientDemo implements HazelcastInstanceAware
    {
        private HazelcastInstance hazelcastInstance;

        private final static Logger L = LoggerFactory.getLogger(MapClientDemo.class);

        public void oldMethod()
        {
            Map<String, String> myMap = new ConcurrentHashMap<>();
            String key = "SomeKey";
            String value = "Just a random string";
            myMap.put(key, value);

            L.info("getting key {} yields {}", key, myMap.get(key));
        }

        public void hzMethod()
        {
            // wait - shouldn't this be an 'IMap'?
            Map<String, String> myMap = hazelcastInstance.getMap("myMap");
            String key = "SomeKey";
            String value = "Just a random string";
            myMap.put(key, value);

            L.info("getting key {} yields {}", key, myMap.get(key));
        }

        // getters, setters, random noise follow.
    }

In that bit, there's a method that creates a Map and uses it. In the second method (hzMethod), the only change was to use the injected Hazelcast instance (injected via annotation) to get a
reference to a distributed map in the IMDG. There's no magic: Hazelcast is designed so that you can swap it in that easily, using the familiar Collections API. But back to that comment for a second... shouldn't the declaration have been IMap, not Map? It depends: It could be, but it doesn't need to be. Hazelcast Maps implement the java.util interfaces, so that's valid, but maybe not useful. In a minute, we're going to use some Hazelcast-specific methods on the Map, and to make those visible, you want to change the declaration. If you're just doing put, get, size, remove, and all of those, then no. One interesting note on that: It's easy to forget that "put" returns the old mapping, as it inserts the new. Think about that in a network environment: When you do a "put", Hazelcast (conforming to the contract) returns the old mapping over the network, incurring serialization for no reason because nobody ever looks at it. Hazelcast has added a "set" method that works like "put", save that it doesn't return the value. This may seem like small stuff, but think about a heavily utilized production environment getting a surge of requests; you're busy and half of that flavor of network traffic is stuff you're never going to look at. Change two letters in your code and the network traffic drops, possibly by lots.

Keep in mind, however, that there are differences. It's a distributed Map: aside from security configuration, other clients/threads can use the same Map. If you test the size of a new in-process Map that you create in your thread, the size will be "0". When you get a reference to a distributed collection from the IMDG, it will create it (if required), or return a reference to an existing collection if it's already been created. This can be a very powerful feature: you can pre-populate a collection from a persistent store or any other data-source. Your client code will be smaller and simpler because you can make assumptions about it. If you're using a Map for a scratchpad cache, however, keep in mind that you may want to create unique map instances or manage data so that your thread doesn't collide with other clients.

NEAR CACHE ACCESS
Hazelcast supports second-level, or edge, caching in client processes and refers to it as near caching. Near caches are almost transparent to your code, but there are things you need to be aware of. Each mapping in the near cache is fundamentally managed by the IMDG member that owns the "master" copy of the data. It may be cached on multiple clients in multiple apps, and each caching client app may define its own policy for managing updates. The data in your near cache may be stale, so you probably shouldn't cache things for overly long times (you set the expiry interval in config). You should be careful about using near caches for things that are updated frequently and very careful about using them for things that are updated from multiple points. For a web application with sticky sessions, you should be able to count on certain objects being in only one client process; that's a good scenario.

SIMPLE QUERY OPERATIONS
SQL QUERIES
Hazelcast is not an SQL database or an SQL query tool, but it provides a workable, robust subset of SQL query functionality. It's accessible for developers. If you have an SQL background, this is nothing; if you don't, it's still pretty intuitive. The SqlPredicate encapsulates the where clause of a query. Since you're dealing with purely in-memory data, this is going to be very fast.

    public void sqlQueryDemo()
    {
        IMap<Integer, Employee> employees = hazelcastInstance.getMap("employees");
        Employee emp = new Employee();
        emp.setId(Integer.valueOf(1));
        emp.setFirstName("John");
        emp.setLastName("Doe");
        emp.setAge(new Random(System.currentTimeMillis()).nextInt(99));
        emp.setDeptId(Integer.valueOf(13));

        // put the dummy employee in the map using employee-id as the map key
        employees.set(emp.getId(), emp);

        Predicate sqlPredicate =
                new SqlPredicate(String.format("lastName = '%s'", emp.getLastName()));
        Collection<Employee> matching = employees.values(sqlPredicate);

        // wildcards are supported, too - look for last names starting
        // with 'D' using the same 'values' call.
        sqlPredicate = new SqlPredicate("lastName like 'D%'");
        matching = employees.values(sqlPredicate);

        // compound predicates work, as well
        sqlPredicate =
                new SqlPredicate("lastName like 'D%' and age between 21 and 30");
        matching = employees.values(sqlPredicate);

        // this could go on, but it's a pretty robust subset of sql
        // functionality.
        sqlPredicate = new SqlPredicate("deptId not in (13, 23, 33)");
        matching = employees.values(sqlPredicate);
    }
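
To make the where clauses in the SQL QUERIES listing concrete, here is a minimal plain-JDK sketch of what each one selects. It does not use the Hazelcast API at all: the class WhereClauseSketch, its simplified Employee stand-in, the sample rows, and the local values helper are assumptions made for illustration, and java.util.function.Predicate is used in place of Hazelcast's own Predicate type. It assumes the usual SQL meaning of the operators, in particular that BETWEEN includes both bounds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical local analogue of the SQL-style where clauses, using only the JDK. */
class WhereClauseSketch {

    /** Simplified stand-in for the article's Employee class (illustrative fields only). */
    static final class Employee {
        final String lastName;
        final int age;
        final int deptId;

        Employee(String lastName, int age, int deptId) {
            this.lastName = lastName;
            this.age = age;
            this.deptId = deptId;
        }
    }

    // lastName like 'D%'
    static final Predicate<Employee> LAST_NAME_LIKE_D = e -> e.lastName.startsWith("D");

    // age between 21 and 30 (both bounds included, as in SQL)
    static final Predicate<Employee> AGE_BETWEEN_21_AND_30 = e -> e.age >= 21 && e.age <= 30;

    // deptId not in (13, 23, 33)
    static final Predicate<Employee> DEPT_NOT_IN_13_23_33 =
            e -> !Arrays.asList(13, 23, 33).contains(e.deptId);

    /** Keeps only the rows that the where clause accepts. */
    static List<Employee> values(List<Employee> all, Predicate<Employee> where) {
        List<Employee> matching = new ArrayList<>();
        for (Employee e : all) {
            if (where.test(e)) {
                matching.add(e);
            }
        }
        return matching;
    }

    /** Made-up sample rows for the demonstration. */
    static List<Employee> sampleData() {
        return Arrays.asList(
                new Employee("Doe", 25, 13),
                new Employee("Dunn", 45, 23),
                new Employee("Smith", 28, 7));
    }

    public static void main(String[] args) {
        List<Employee> all = sampleData();
        // wildcard match: Doe and Dunn
        System.out.println(values(all, LAST_NAME_LIKE_D).size());
        // compound predicate: only Doe is both a 'D' name and aged 21 to 30
        System.out.println(values(all, LAST_NAME_LIKE_D.and(AGE_BETWEEN_21_AND_30)).size());
        // exclusion list: only Smith is outside departments 13, 23, 33
        System.out.println(values(all, DEPT_NOT_IN_13_23_33).size());
    }
}
```

The difference in the IMDG is where the filtering runs: the predicate is evaluated on the members that hold the data, and only matching entries travel back to the caller.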

This shows how easy it is to bridge an SQL background with the IMDG SQL-like query. The caveat here is that out-of-the-box, IMDG is not a good tool for joins because of the nature of the data. We split it up because it's big data, and because it's big data, joining it back together is more complex. I mentioned Jet earlier; this would be a useful tool for that.

PREDICATE QUERIES: CRITERIA API
For Java developers who really never liked SQL, there's also a pure Java approach to querying the IMDG: the Criteria API.

    public void criteriaAPIDemo()
    {
        IMap<Integer, Employee> employees = hazelcastInstance.getMap("employees");

        <snip> same employee as before

        // create a couple of predicates and then the 'and' of them
        Predicate<Integer, Employee> lastName = equal("lastName", "Doe");
        Predicate<Integer, Employee> age = greaterThan("age", Integer.valueOf(99));
        Predicate<Integer, Employee> dept = equal("deptId", Integer.valueOf(13));

        // 'and' is a variadic method, so you can just keep
        // adding predicates and get the logical 'and' of all of them
        Predicate<Integer, Employee> ageEtAl = and(age, lastName, dept);

        Collection<Employee> matching = employees.values(ageEtAl);
    }

One thing that, looking at this, really needs to be mentioned, if only briefly, is memory in the query client. Hazelcast query operations can quickly retrieve very large volumes of data. If you had, say, 20 storage members and queried the cluster bringing back 1GB or so of data from each, you'd be looking at 20GB of data from one query. You may or may not have that much memory available in your client. The fix for that is paging predicates. These predicates wrap (logically subsume the functions of) other predicates so that you get the same logical comparisons, the same filtering, wrapped in a container that lets you bring results back in batch sizes that you specify. It's as though you're reading a printed book and seeing one page at a time.

    Predicate<Integer, Employee> lastName = equal("lastName", "Doe");
    Predicate<Integer, Employee> age = greaterThan("age", Integer.valueOf(99));
    Predicate<Integer, Employee> dept = equal("deptId", Integer.valueOf(13));
    Predicate<Integer, Employee> ageEtAl = and(age, lastName, dept);

    // construct a paging predicate to get max of 10 objects per page, based on
    // the previously constructed predicate.
    PagingPredicate<Integer, Employee> pagedResults = new PagingPredicate<>(ageEtAl, 10);
    Collection<Employee> oldPeople = employees.values(pagedResults);
    while (!oldPeople.isEmpty())
    {
        // work with this page, then advance the predicate to the next one
        pagedResults.nextPage();
        oldPeople = employees.values(pagedResults);
    }

This was only taking the previously used predicate and returning the results in paged sets, with the page size set to 10 in this example.

AGGREGATIONS
Aggregations (or, as they're now called, "fast aggregations") allow data query and transformation to be dispatched to the cluster. It can be extremely effective. Keeping with the Employee class and the "employees" Map from the other examples, let's do a quick and dirty aggregation. You could just do the aggregation across the entire entry set of the Map, but using a predicate to filter, or map, the objects before the aggregator does the reduction on them will prove more effective. Department 13 is used to represent group-W. You know those people; they're everywhere.

This is going to create an anonymous class, which is quick and easy and highly effective for this.

    public void simpleAverageAgeAggregation(IMap<Integer, Employee> employees)
    {
        Predicate<Integer, Employee> deptPredicate = equal("deptId", Integer.valueOf(13));
        Aggregator<Map.Entry<Integer, Employee>, Double> ageAggregator
                = new Aggregator<Map.Entry<Integer, Employee>, Double>()
        {
            <snip> - serialVersionUID: don't forget this is going over the wire
            in serialized format.
            protected long sum = 0L;
            protected long count = 0L;

            // runs on each storage member, once per matching entry
            @Override
            public void accumulate(Map.Entry<Integer, Employee> entry)
            {
                count++;
                sum += entry.getValue().getAge();
            }

            // runs in the caller, merging the partial results from each member
            @Override
            public void combine(Aggregator aggregator)
            {
                this.sum += this.getClass().cast(aggregator).sum;
                this.count += this.getClass().cast(aggregator).count;
            }

            @Override
            public Double aggregate()
            {
                if (count == 0)
                {
                    return null;
                }
                double dsum = (double) sum;
                return Double.valueOf(dsum / count);
            }
        };

        // find the average age of employees in department 13
        Double avgAge = employees.aggregate(ageAggregator, deptPredicate);
        L.info("average age: {}", avgAge);
    }
    }

The code is pretty self-explanatory. The predicate will ensure that only matching elements from the distributed map are included in the calculation. The aggregator will "accumulate" data, examining the matching subset and adding the age into the sum, but where does that happen? The accumulate call is made on each storage member (i.e. not the clients and not Lite members); it's passed each matching entry (filtered by deptPredicate) and it accumulates the raw values. Note that these run in parallel on each member involved. Because the data is partitioned across members and only a filtered subset is processed, it's going to be very fast. In the second phase, each of these aggregator instances is returned to the caller for processing: the instance of the anonymous aggregator class examines each returned aggregator (instances of the same anonymous class) and combines all the raw results. In that part of the code, because this wasn't a concrete class, it's necessary to call the class.cast() method, to allow access to the count and the sum. In the final step, also in the calling member, the aggregate method is invoked and performs the simple calculation of average age. This is really not much code for that kind of power.

There is also a very complete set of built-in aggregators for things like min, max, count, and avg of different numeric types, and distinct for any comparable type. They can be used with very little setup, like this:

    // get the set of distinct last names
    Set<String> lastNames = employees
            .aggregate(Aggregators.<Map.Entry<Integer, Employee>, String>distinct("lastName"));

Implicit in that snippet is the distinct aggregator, which can also be brought in with a static import.

ENTRY PROCESSORS
Entry processors are pretty cool. You can do highly efficient in-place processing with a minimum of locking. Consider what people often end up doing to work with remote objects: lock a key, fetch the value, mutate the value, put it back (in a finally block), and unlock the key. That's four network calls to start with, or three if you're only looking at the data and not updating the central source. Your objects may be large and incur significant cost in terms of CPU and network for serialization and transport.

Entry processors allow you to dispatch a key-based "task" object across the LAN, directly to the member owning a key, where it is executed in a lock-free, thread-safe fashion. Hazelcast has an interesting threading model that allows this to happen.

Here's a brain-dead simple entry processor example, but it's still a really useful approach:

    @Component
    @Profile("client")
    public class EntryProcessorRunner implements CommandLineRunner, ApplicationContextAware
    {
        private ApplicationContext applicationContext;

        @Override
        public void run(String... args) throws Exception
        {
            HazelcastInstance instance = (HazelcastInstance) applicationContext.getBean(<snip>);
            IMap<String, String> demo = instance.getMap("demo");
            String key = "someKey";
            demo.set(key, "Just a String value...");

            demo.executeOnKey(key, new DemoEntryProcessor());

            EntryProcessor<String, String> asyncProcessor = new DemoOffloadableEntryProcessor();
            demo.submitToKey(key, asyncProcessor);
            ExecutionCallback<String> callback = new AsynchCallbackDemo();
            demo.submitToKey(key, asyncProcessor, callback);
        }
        <snip>
    }

Here's the entry processor that is called in this:

    public class DemoEntryProcessor extends AbstractEntryProcessor<String, String>
    {
        <snip>
        @Override
        public Object process(Entry<String, String> entry)
        {
            String key = entry.getKey();
            String value = entry.getValue();
            l.info("in-place processing called for {}::{}", key, value);
            SimpleDateFormat sdf = new SimpleDateFormat(dtFormat);
            entry.setValue(String.format("This value was modified at %s -- %s", sdf<snip>));
            return null;
        }
        <snip>
    }

Here, we describe the simplest case, but as simple as it is, this is a very powerful tool to have. From your client, you can dispatch computation to your servers: no locking, no contention, efficient use of the LAN. But you're waiting for the result to be returned, so it may not seem like a really big deal. Hazelcast also has a number of options to run synchronously, asynchronously, and on multiple entries.

    executeOnKey(Object, EntryProcessor);                    // synch
    submitToKey(Object, EntryProcessor);                     // asynch
    submitToKey(Object, EntryProcessor, ExecutionCallback);  // asynch
    executeOnEntries(EntryProcessor);                        // synch
    executeOnEntries(EntryProcessor, Predicate);             // synch
    executeOnKeys(Set, EntryProcessor);                      // synch

This is a really useful set of calls. The first one, executeOnKey, does exactly that; it makes one direct call to the key owner and executes synchronously on that entry. The next two execute asynchronously, so your client code doesn't need to wait for long-running operations. A word of warning, though: long-running entry processors can be truly evil. Implement the Offloadable interface in your entry processor to tell Hazelcast to move the operation off of the default threading structure.

Here's an async call, similar in function to the first.

    public class DemoOffloadableEntryProcessor extends AbstractEntryProcessor<String, String>
            implements Offloadable
    {
        <snip>
        public final static String OFFLOADABLE_EXECUTOR = "default";

        @Override
        public Object process(Entry<String, String> entry)
        {
            String key = entry.getKey();
            String value = entry.getValue();
            l.info("in-place processing called for {}::{}", key, value);
            SimpleDateFormat sdf = new SimpleDateFormat(dtFormat);
            String newValue = String.format("This value was modified at %s -- %s", <snip>);
            entry.setValue(newValue);
            return newValue;
        }

        // names the executor that runs this processor instead of the partition thread
        @Override
        public String getExecutorName()
        {
            return OFFLOADABLE_EXECUTOR;
        }
        <snip>
    }

We recommend submitting the processing, tagging it with a callback, and going on your way. This could be used if you have a stream of data requests that need to be initiated without the caller needing to see a result. If you're going to do this, be sure to read ahead to the section on back pressure.

The execution callback executes in the caller's process space, so it's your notification that the invocation is complete. This one just logs, like this:

    package com.hazelcast.tao.gettingstarted.executors;

    <snip> - imports

    public class SimpleExecutionCallback<V> implements ExecutionCallback<V>
    {
        private final static Logger L = LoggerFactory.getLogger(SimpleExecutionCallback.class);

        @Override
        public void onFailure(Throwable throwable)
        {
            L.error("execution failed - {}", throwable.getMessage());
        }

        @Override
        public void onResponse(V value)
        {
            L.info("processing complete - {}", value);
        }
    }

Because nothing is free, there's a bit of server-side configuration
to go with this. Back pressure, both in the client and in server
members, is important when using asynchrony. Configuring it
can be done with system properties like this:

# the default is false, you should enable it for async stuff
hazelcast.backpressure.enabled=true

# that's the default, look at your hardware resources
# before increasing it.
hazelcast.backpressure.max.concurrent.invocations.per.partition=100

# this one really just relates to making async backups,
# not async data operations,
# but I thought I'd put it here for completeness. If
# the system cannot keep up with
# async backup requests, they will periodically turn
# into sync backups based on
# this window. (millis, of course)
hazelcast.backpressure.syncwindow=1000

# configure how long hz will wait for invocation resources
hazelcast.backpressure.backoff.timeout.millis=60000

Back pressure is a topic worth some consideration. Hazelcast is
using its threading model to execute these, so there's a limit to
how many invocations can be in flight at any one point in time.
The absolute number doesn't matter, as that would depend
upon the size of your cluster and the number of CPUs/cores/
physical threads. What's likely to be interesting is how many
can be queued up for one partition; by default, mutating
entry processors operate on a partition thread. In configuring
your cluster, you know how many physical threads you have, so
you can configure the partition thread count to be a somewhat
sensible number. Too few and you'll have contention; too many
(one-to-one sounds ideal, though it rarely is) and you won't get
good resource utilization.

RUNNABLE TASKS
These are simply Java runnable objects that are dispatched to
one or more members. Keep in mind that the salient part of the
signature is public void run(), i.e. nothing is returned. The
way that they're dispatched to the members is very flexible; it
can be one member, all members, a member that owns a key, or
members selected by an attribute you have set on them.

Here's an example of running something on the member that
owns a key:

@Component("runnableDemo")
@Profile("client")
public class RunnableDemoCaller implements
        CommandLineRunner, ApplicationConte
{
    private ApplicationContext applicationContext;

    @Override
    public void run(String... args) throws Exception
    {
        HazelcastInstance client = (HazelcastInstance)
                applicationContext.getB
        IExecutorService executorService = client.
                getExecutorService("default"
        executorService.executeOnAllMembers(new
                LoggingRunnable());
    }
    <snip>
}

The Runnable object is pretty ordinary, with the caveat
that it needs to be serializable. This runnable is also
HazelcastInstanceAware so that when it's set up on the target
node, the Hazelcast framework will inject the correct instance, so
that it can communicate with the cluster.

@Component("loggingRunnable")
public class LoggingRunnable implements Runnable,
        HazelcastInstanceAware
{
    <snip>
    private HazelcastInstance hazelcastInstance;

    @Override
    public void run()
    {
        l.info("into run, cluster size: {}",
                getHazelcastInstance().getCluster
    }

    <snip>
}

This code wasn't particularly profound, but there's one cool
aspect to it: you can direct processing to a member that owns
a key (or other members) and process that key and/or other keys
in multiple maps. So, complex manipulation may be performed
outside your client, eliminating multiple network round-trips.
The data need not all come from one member, either; there
are no restrictions on that. It can be a significant performance
boost to design your data so that related items are all within one
node; then this kind of task will tend not to make network calls,
but will not be restricted from doing so.

CALLABLE TASKS
As with runnable tasks, callable tasks are dispatched to one or more
members, but offer more options for things like bringing back data.
The PartitionReporter callable that follows is really simple: it is
dispatched to a member, logs some noise to show it ran, and returns
the partition count. There are better ways to monitor or manage
partitions, but this should just show how you get a value, easily,
from a member.
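Callable tasks use the standard java.util.concurrent.Callable interface, so the value-returning pattern can be tried without a cluster first. What follows is a minimal JDK-only sketch: a local thread pool stands in for the members, and the class name CallableSketch and the "member-N" labels are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// JDK-only stand-in: a local pool plays the role of the cluster members.
final class CallableSketch {

    // Submits one callable per label, then waits for and collects each value.
    static Map<String, Integer> collect(Map<String, Callable<Integer>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // First submit everything, so the tasks can run concurrently.
            Map<String, Future<Integer>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<Integer>> e : tasks.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            // Then block on each future in turn to pick up its result.
            Map<String, Integer> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> e : futures.entrySet()) {
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("task failed", ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

Submitting all tasks before calling get() on any of them keeps the work parallel; calling get() right after each submit would serialize it.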


package com.hazelcast.tao.gettingstarted.port;

<snip> - imports. They should be just what you'd expect

public class PartitionReporter implements
        Callable<Integer>, HazelcastInstanceAware
{
    <snip> - local data, ctor and such

    @Override
    public Integer call()
    {
        Member member = hazelcastInstance.getCluster()
                .getLocalMember();
        String fmt = "member listening on %s:%d";
        String whoAmI = String.format(fmt,
                member.getSocketAddress()
                        .getHostName(),
                member.getSocketAddress()
                        .getPort());
        PartitionService service =
                hazelcastInstance.getPartitionService();
        boolean memberIsSafe = service.isLocalMemberSafe();

        String memberSafety = memberIsSafe ? "" : "not ";
        boolean clusterIsSafe = service.isClusterSafe();
        String clusterSafety = clusterIsSafe ? "" : "not ";
        int partitionCount = service.getPartitions()
                .size();
        fmt = "executing in member @ {} - cluster is {}safe, " +
                "member is {}safe, hosting {} partitions";
        L.debug(fmt, whoAmI, clusterSafety, memberSafety,
                partitionCount);
        return Integer.valueOf(partitionCount);
    }

    <snip> - getters, setters
}

That's just looking at the member, which you might want to
do. Importantly, if you wanted to get/set/remove data, run a
query, or perform any other Hazelcast operation, you can do that
from that code. The call is easy. As with the collections, it draws
heavily from the Java Executors API:

public void callableTaskDemoMembers(Set<Member> members)
        throws InterruptedException, ExecutionException
{
    Callable<Integer> partitionReporter = new PartitionReporter();

    IExecutorService executorService =
            hazelcastInstance.getExecutorService("default");

    Map<Member, Future<Integer>> futures =
            executorService.submitToMembers(partitionReporter,
                    members);
    // process these
}

EVENTS
There are lots of events, but for just right now, let's stick to the
data events: listeners and interceptors. This is still a fairly big
topic, so let's talk about a workable subset of it. Within data
events, there are what are called map events and entry
events. Map events are called for map-level changes, specifically
clear() or evictAll(). Entry events are called after changes
to map entries, and there's an interesting set of those changes:
there are events for entries being added, removed, updated, and
evicted. This isn't a listener for get, however. The entry events
are only for changes. In that context, it makes sense that there's
an interceptor for "get". Listeners may be added in a number of
ways: they can be added in configuration or programmatically.
In addition to that, they can be added from a member or a client.
Within each member, you get local events: events triggered
by (after) data-mutating events within that JVM. Here's a simple
example of listening for added entries:

public static class MyEntryAddedListener<K, V>
        implements EntryAddedListener<K, V>
{
    @Override
    public void entryAdded(EntryEvent<K, V> event)
    {
        String whoami = event.getMember()
                .getSocketAddress()
                .getHostName() + ":"
                + event.getMember()
                .getSocketAddress()
                .getPort();
        L.trace("member: {} - added - key: {}, value: {}", whoami,
                event.getKey(), event.getValue());
    }
}


Adding the entry-added listener could be done in config, but
here's how to use the Java API to do it:

public void addEntryAddedListener(IMap<String, String> myMap)
{
    myMap.addEntryListener(new MyEntryAddedListener<>(), true);
}

This code will add the entry listener, listening only for entries
being added. The boolean parameter tells Hazelcast that the
value should be available (i.e. getValue()) in the entry event
that's going to be delivered to the listener.

Clients may add these listeners, also, in addition to client
lifecycle events, cluster membership events and distributed
object creation/deletion. So, they may be notified of their
own client lifecycle: starting, started, shutting down, and
shutdown; they may be notified of membership changes or
storage members joining and leaving, and they may be notified
of distributed object creation or destruction: Maps, caches,
queues, and all. A word of caution, though: High-volume activity
will create high volumes of events. Look carefully at your
resources, like client CPU/RAM and especially network. Think in a
distributed perspective and put the listener where it needs to be,
not simply where it seems convenient.

CONCLUSION
This is just a little of what you can do with Hazelcast. Hazelcast
has been doing distributed systems for some time now; it is
deliberately designed to deliver performance and simplicity. You
can be up-and-running in minutes and rolling out production-
quality code that looks an awful lot like your Java collections
code. It's a fun environment for programmers. A little Java
can be all you need on the server side, then you can cut loose
with Java, .NET, C++, Node.js, Python, Go, or Scala, and that list
is going to grow as new languages emerge.

Written by Tom OConnell, Senior Solutions Architect, Hazelcast


Tom OConnell is a Senior Solutions Architect at Hazelcast. He is interested in Java, distributed computing topics, and
IMDG in particular.

DZone communities deliver over 6 million pages each
month to more than 3.3 million software developers,
architects and decision makers. DZone offers something for
everyone, including news, tutorials, cheat sheets, research
guides, feature articles, source code and more. "DZone is a
developer’s dream," says PC Magazine.

DZone, Inc.
150 Preston Executive Dr. Cary, NC 27513
888.678.0399 919.678.0300

Copyright © 2018 DZone, Inc. All rights reserved. No part of this publication
may be reproduced, stored in a retrieval system, or transmitted, in any form
or by means electronic, mechanical, photocopying, or otherwise, without
prior written permission of the publisher.
