Deepak Mali
RIGI PUBLICATION
Versatile Java
By
Deepak Mali
PREFACE
This book wouldn't have seen the light of day without the
blessings of my parents, Prof. S.L. Mali and Mrs. Sita, and the
constant support and encouragement of my wife, Mrs. Shikha,
who is herself a computer enthusiast and helped me a great deal with
the proofreading and corrections that made the book better. My
sisters Dr. Sunita, Er. Vinita and Dr. Chetna have been my
constant source of inspiration throughout the journey.
I will be happy to learn about any errors that may have crept
in, as well as suggestions for further improvement of the
book. My email ID is ssm.deepak@gmail.com
TABLE OF CONTENTS
2. Concurrency in Java 22
CHAPTER 1 - FUNCTIONAL PROGRAMMING
USING LAMBDA EXPRESSIONS
Getting Started
@Override
public void run() {
    System.out.println("Hello There");
}
A thread invoking the above Runnable target can be spawned using the
lines of code below; the lambda form and the anonymous class form can
both be passed to a Thread and started:

new Thread(() -> System.out.println("Hello Lambda")).start();

new Thread(new Runnable() {
    @Override
    public void run() {
        System.out.println("Runnable");
    }
}).start();
Console Output is:
Hello Lambda
Runnable
button.addActionListener(new ActionListener() {
    public void actionPerformed(ActionEvent event) {
        System.out.println("button clicked");
    }
});
button.addActionListener(event -> System.out.println("button clicked"));
Here a custom interface (we will get into functional interfaces later)
is defined to represent an add method.
button.addActionListener(event->System.out.println(name));
(2)
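The definition of that interface does not survive in this text; a minimal
sketch of what such a single-method interface and its lambda implementation
might look like (the names Adder and AdderDemo are illustrative) is:

@FunctionalInterface
interface Adder {
    int add(int a, int b);
}

public class AdderDemo {
    public static void main(String[] args) {
        Adder adder = (a, b) -> a + b;       // the lambda supplies the add implementation
        System.out.println(adder.add(2, 3)); // prints 5
    }
}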
Functional Interfaces
@FunctionalInterface, abstract method accept(Object)
public interface Consumer<T>
Represents an operation that accepts a single input argument and
returns no result.

@FunctionalInterface, abstract method test(Object)
public interface Predicate<T>
Represents a predicate (boolean-valued function) of one argument.

@FunctionalInterface, abstract method apply(Object)
public interface Function<T,R>
Represents a function that accepts one argument and produces a result.

@FunctionalInterface, abstract method get()
public interface Supplier<T>
Represents a supplier of results.

@FunctionalInterface, abstract method Function.apply(Object)
public interface UnaryOperator<T> extends Function<T,T>
Represents an operation on a single operand that produces a result of
the same type as its operand.

@FunctionalInterface, abstract method BiFunction.apply(Object, Object)
public interface BinaryOperator<T> extends BiFunction<T,T,T>
Represents an operation upon two operands of the same type, producing a
result of the same type as the operands.
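A minimal sketch showing each of these interfaces in use (all values and
variable names are illustrative):

import java.util.function.*;

public class FunctionalInterfaceDemo {
    public static void main(String[] args) {
        Consumer<String> printer = s -> System.out.println(s);
        Predicate<String> isEmea = region -> region.equals("EMEA");
        Function<String, Integer> length = s -> s.length();
        Supplier<String> defaultRegion = () -> "APAC";
        UnaryOperator<String> upper = s -> s.toUpperCase();
        BinaryOperator<Integer> sum = (a, b) -> a + b;

        printer.accept("EMEA");                      // EMEA
        System.out.println(isEmea.test("EMEA"));     // true
        System.out.println(length.apply("Trade1"));  // 6
        System.out.println(defaultRegion.get());     // APAC
        System.out.println(upper.apply("trade1"));   // TRADE1
        System.out.println(sum.apply(2, 3));         // 5
    }
}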
long count = 0;
Iterator<Trade> trades_iterator = trades.iterator();
while (trades_iterator.hasNext()) {
    String trade_region = trades_iterator.next().getTrade_region();
    if (trade_region.equals("EMEA"))
        count++;
}

The same counting can be expressed much more concisely in the functional
programming paradigm. Thus, the above code can be modified as below:

long l = trades.stream()
        .filter(trade -> trade.isTradeRegion("EMEA"))
        .count();
collect(toList())

List<String> trades_collected = Stream.of("Trade1", "Trade2", "Trade3")
        .collect(Collectors.toList());
map

List<String> trades_mapped = Stream.of("trade1", "trade2", "trade3")
        .map(s -> s.toUpperCase())
        .collect(Collectors.toList());

reduce

int reduce_operation_value = Stream.of(1, 2, 3)
        .reduce(0, (a, b) -> a + b);

Console Output: 6
Method References
A method reference is an abbreviated notation for a lambda
expression. We can also use the abbreviated notation for creating a new
object (a constructor reference) in the following way: ClassName::new.

trades.stream().collect(Collectors.averagingLong(Trade::getTransaction_amount));
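A minimal sketch of the three common flavours of method references,
using a hypothetical Trade type defined only for this illustration:

import java.util.function.Function;
import java.util.function.Supplier;

public class MethodReferenceDemo {

    // Hypothetical Trade type used only for illustration.
    static class Trade {
        private final long transaction_amount;
        Trade() { this(0L); }
        Trade(long amount) { this.transaction_amount = amount; }
        long getTransaction_amount() { return transaction_amount; }
    }

    public static void main(String[] args) {
        Supplier<Trade> factory = Trade::new;                          // constructor reference
        Function<String, Integer> parse = Integer::parseInt;           // static method reference
        Function<Trade, Long> amount = Trade::getTransaction_amount;   // instance method reference

        System.out.println(parse.apply("42"));             // 42
        System.out.println(amount.apply(factory.get()));   // 0
    }
}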
Element Ordering
List<Integer> numbers = asList(1, 2, 3, 4);
List<Integer> sameOrder = numbers.stream().collect(toList());

Set<Integer> numbers = new HashSet<>(asList(4, 3, 2, 1));
List<Integer> sameOrder = numbers.stream()
        .collect(toList());

// sorting ensures encounter order
List<Integer> ordered = numbers.stream()
        .sorted()
        .collect(toList());
trades.stream().collect(Collectors.averagingLong(Trade::getTransaction_amount));
We can also group the data which the Stream has given us using the
groupingBy() method.

// the below code groups the Stream data based on the Trader.
trades.stream().collect(Collectors.groupingBy(Trade::getTrader));
String result = Stream.of("Trade1", "Trade2", "Trade3")
        .collect(Collectors.joining(",", "[", "]"));
Optional<String> a = Optional.of("a");
System.out.println(a.get());
Optional<String> b =Optional.empty();
System.out.println(b.isPresent());
Optional<String> c = Optional.ofNullable(null);
if(c.isPresent())
System.out.println(c.get());
Console Output : a
false
<No Output>
Parallel Stream Operations
ArrayList<Trade> list_of_trades = new ArrayList<Trade>();

for (int i = 0; i <= 100; i++)
    list_of_trades.add(new Trade(1000000 + i, "APAC"));
for (int i = 101; i <= 200; i++)
    list_of_trades.add(new Trade(1000000 + i, "EMEA"));
for (int i = 201; i <= 300; i++)
    list_of_trades.add(new Trade(1000000 + i, "NA"));
for (int i = 301; i <= 400; i++)
    list_of_trades.add(new Trade(1000000 + i, "JAPAN"));
ConcurrentMap<Long, List<Trade>> map = list_of_trades.parallelStream()
        .filter(r -> r.getRegion().equals("EMEA"))
        .collect(Collectors.groupingByConcurrent(Trade::getTrade_id));
class Trade {

    private long trade_id;
    private String region;

    public Trade(long trade_id, String region) {
        super();
        this.trade_id = trade_id;
        this.region = region;
    }

    public long getTrade_id() { return trade_id; }
    public void setTrade_id(long trade_id) { this.trade_id = trade_id; }

    public String getRegion() { return region; }
    public void setRegion(String region) { this.region = region; }
}
CHAPTER 2 - CONCURRENCY IN JAVA
Getting Started
As you can see, both concepts are very similar and this
similarity has increased with the development of multicore
processors.
Synchronization and Immutable Objects
There are different mechanisms to get synchronization in a
concurrent system. The most popular mechanisms from a
theoretical point of view are:
An example of an immutable object is the String class in Java. When you
assign a new value to a String object, you are creating a new string.
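A minimal sketch illustrating this behaviour:

public class ImmutableStringDemo {
    public static void main(String[] args) {
        String original = "abc";
        String other = original;
        other = other + "def";         // builds a brand-new String; "abc" is untouched

        System.out.println(original);  // abc
        System.out.println(other);     // abcdef
    }
}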
Possible Problems in Concurrent Applications
Data Race
You can have a data race (also named race condition) in your
application when you have two or more tasks writing a shared variable
outside a critical section, that is to say, without using any
synchronization mechanism.
package com.packt.java.concurrency;
public class Account {
private float balance;
public void modify (float difference) {
float value=this.balance;
this.balance=value+difference;
}
}
Imagine that two different tasks execute the modify() method on
the same Account object. Depending on the order of execution
of the statements in the tasks, the final result can vary. Suppose
that the initial balance is 1000 and the two tasks call
the modify() method with 1000 as a parameter. The final result
should be 3000, but if both tasks execute the first statement at
the same time and then the second statement at the same time,
the final result will be 2000. As you can see,
the modify() method is not atomic and the Account class is not
thread-safe.
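One possible fix, shown here only as a sketch: declaring modify() (and any
reader of balance) as synchronized makes the read-modify-write sequence
atomic, so two tasks can no longer interleave inside it.

public class Account {

    private float balance;

    // Only one thread at a time can execute modify() on a given Account,
    // which removes the data race described above.
    public synchronized void modify(float difference) {
        float value = this.balance;
        this.balance = value + difference;
    }

    public synchronized float getBalance() {
        return balance;
    }
}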
Deadlock
Detection: The system has a special task that analyzes the
state of the system to detect if a deadlock has occurred. If it
detects a deadlock, it can take action to remedy the problem.
For example, finishing one task or forcing the liberation of a
resource.
Livelock
A livelock occurs when you have two tasks in your system that are
always changing their states due to the actions of the other.
Consequently, they are in a loop of state changes and unable to
continue.
Resource starvation
Fairness is the solution to this problem. All the tasks that are
waiting for a resource must get access to it within a given period
of time. One option is to implement an algorithm that takes into
account the time that a task has been waiting for a resource
when it chooses the next task that will hold the resource.
However, fair implementation of locks requires additional
overhead, which may lower your program throughput.
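A minimal sketch of this trade-off using the JDK's ReentrantLock, whose
constructor flag requests exactly such a fairness policy (the class and
field names are illustrative):

import java.util.concurrent.locks.ReentrantLock;

public class FairCounter {

    // Passing true requests a fair lock: the longest-waiting thread is
    // granted the lock next, at the cost of lower throughput.
    private final ReentrantLock lock = new ReentrantLock(true);
    private long counter;

    public void increment() {
        lock.lock();
        try {
            counter++;
        } finally {
            lock.unlock();
        }
    }
}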
Priority inversion
Executors
ThreadPoolExecutor: This is a class that allows you to get an
executor with a pool of threads and optionally define a
maximum number of parallel tasks
package com.java8.tutorial.threads;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class ExecutorTest {

    public static void main(String[] args) {
        Future<String> future = null;
        ExecutorService executorService = Executors.newFixedThreadPool(20);

        // loop bound chosen only to complete the fragment
        for (int counter = 0; counter < 5; counter++) {
            future = executorService.submit(new Callable<String>() {
                @Override
                public String call() {
                    return Thread.currentThread().getName() + " Asynchronous Callable ";
                }
            });
            try {
                System.out.println(future.get() + counter);
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
        executorService.shutdown();
    }
}
package com.java8.tutorial.threads;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

public class ForkJoinTest {

    public static void main(String[] args) {
        // array size chosen only to complete the fragment
        int[] arr_int = new int[10000];
        for (int i = 0; i < arr_int.length; i++)
            arr_int[i] = i;

        ForkJoinPool pool = new ForkJoinPool();
        long final_ans = pool.invoke(new Task(0, arr_int.length, arr_int));
        System.out.println(final_ans);
    }
}

class Task extends RecursiveTask<Long> {

    int lower_index;
    int upper_index;
    int[] int_array;

    static final int UPPER_BOUND = 2000;

    Task(int low, int high, int[] arr) {
        lower_index = low;
        upper_index = high;
        int_array = arr;
    }

    @Override
    protected Long compute() {
        if (upper_index - lower_index <= UPPER_BOUND) {
            long sum = 0;
            for (int i = lower_index; i < upper_index; i++)
                sum += int_array[i];
            System.out.println("Simple Computation");
            return sum;
        } else {
            int mid = (lower_index + upper_index) / 2;
            Task left = new Task(lower_index, mid, int_array);
            Task right = new Task(mid, upper_index, int_array);
            left.fork();
            long rightans = right.compute();
            long leftans = left.join();
            System.out.println("Fork Join is executed ");
            return leftans + rightans;
        }
    }
}
To overcome those problems, the producer thread must wait
until it is notified that the previously produced data item has
been consumed, and the consumer thread must wait until it is
notified that a new data item has been produced. The code below
shows you how to accomplish this task
via wait() and notify().
package com.java8.tutorial.threads;
public class ProducerConsumer {

    private char c;
    private volatile boolean writable = true;

    public synchronized void setC(char c) {
        while (!writable) {
            try {
                wait();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        this.c = c;
        writable = false;
        notify();
    }
    public synchronized char getC() {
        while (writable) {
            try {
                wait();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        writable = true;
        notify();
        return c;
    }

    public static void main(String[] args) {
        ProducerConsumer pc = new ProducerConsumer();
        Producer p = new Producer(pc);
        Consumer c = new Consumer(pc);
        p.start();
        c.start();
    }
}

class Producer extends Thread {
    ProducerConsumer pc;

    Producer(ProducerConsumer pc) {
        this.pc = pc;
    }

    public void run() {
        // the character range is an assumption made to complete the fragment
        for (char c = 'A'; c <= 'Z'; c++) {
            pc.setC(c);
            System.out.println("Producer" + c);
        }
    }
}

class Consumer extends Thread {
    ProducerConsumer pc;

    Consumer(ProducerConsumer pc) {
        this.pc = pc;
    }

    public void run() {
        char c;
        do {
            c = pc.getC();
            System.out.println("Consumer" + c);
        } while (c != 'Z');
    }
}
Phasers
39
The java.util.concurrent.Phaser class implements a phaser.
Because this class is thoroughly described in its Javadoc, I'll
point out only a few constructors and methods:
package com.java8.tutorial.threads;
import java.util.concurrent.Phaser;
publicclass PhasorTest {
40
Phaser phasor =new Phaser();
phasor.register();
System.out.println(phasor.getPhase());
new
PhasorTest().addPhasorTask(phasor,2000);
new
PhasorTest().addPhasorTask(phasor,4000);
new
PhasorTest().addPhasorTask(phasor,6000);
phasor.arriveAndDeregister();
Thread.sleep(10000);
System.out.println(phasor.getPhase());
phasor.register();
publicvoid run ()
System.out.println (Thread.currentThread().getName());
phasor.arriveAndAwaitAdvance();
41
try {
Thread.sleep(sleeptime);
}catch(InterruptedException e)
{
e.printStackTrace();
}.start();
42
(such as ConcurrentModificationException and ArrayIndexOutOfBoundsException),
there may be silent data loss, or your program may even get stuck in an
endless loop.
package com.java8.tutorial.threads;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
try {
blocking_queue.put(i);
} catch (InterruptedException e) {
e.printStackTrace();
44
}
};
service.execute(producer);
Integer j = 0;
do {
try {
j=blocking_queue.take();
} catch (InterruptedException e) {
e.printStackTrace();
while (j != 100);
service.shutdown();
};
service.execute(consumer);
Completion Service
45
A completion service is an implementation of
the java.util.concurrent.CompletionService<V> interface that
decouples the production of new asynchronous tasks (a
producer) from the consumption of the results of completed
tasks (a consumer). V is the type of a task result.
package com.java8.tutorial.threads;
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
46
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
ExecutorService es = Executors.newFixedThreadPool(10);
CompletionService<BigDecimal> cs =
new ExecutorCompletionService<BigDecimal>(es);
cs.submit(new CalculateE(17));
cs.submit(new CalculateE(170));
System.out.println(result.get());
System.out.println();
result = cs.take();
47
System.out.println(result.get());
es.shutdown();
this.lastIter = lastIter;
@Override
48
BigDecimal res = BigDecimal.ONE.divide(factorial,
mc);
result = result.add(res);
return result;
if (n.equals(BigDecimal.ZERO))
return BigDecimal.ONE;
else
return n.multiply(factorial(n.subtract(BigDecimal.ONE)));
49
collect is one terminal operation that generates a result that could be
either a simple object, an array, a collection, a map, or anything
else.
public class MapCollect {

    public static void main(String[] args) {
        Path path = FileSystems.getDefault().getPath("C://Users//malidee//thread_strem.txt");
        try {
            List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
            List<String[]> records = lines.parallelStream()
                    .skip(1)
                    .map(t -> t.toUpperCase())
                    .map(l -> l.split(";"))
                    .collect(Collectors.toList());
            Iterator<String[]> iter = records.iterator();
            while (iter.hasNext()) {
                String[] str = iter.next();
                for (int i = 0; i < str.length; i++)
                    System.out.print(str[i]);
                System.out.println();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
CHAPTER 3 - NETWORK PROGRAMMING USING
JAVA
Getting Started
The important factors that have been the driving forces for more
network applications include the availability of faster networks
with greater bandwidth. This has made it possible to transmit
wider ranges of data, such as video streams. In recent years, we
have seen an increase in connectivity, whether it has been for
new services, more extensive social interactions, or games.
Knowing how to develop network applications is an important
development skill.
Network Addressing
InetAddress address = InetAddress.getByName("www.google.com");
System.out.println(address);
System.out.println("CanonicalHostName: "
+ address.getCanonicalHostName());
System.out.println("HostAddress: " +
address.getHostAddress());
System.out.println("HostName:"+address.getHostName());
53
getNetworkInterfaces: This provides an enumeration of
available interfaces
try {
Enumeration<NetworkInterface> interfaceEnum =
NetworkInterface.getNetworkInterfaces();
for(NetworkInterface element :
Collections.list(interfaceEnum)) {
System.out.printf("%-8s %-32s\n",
element.getName(), element.getDisplayName());
// Handle exceptions
54
such as HTTP, or FTP. For example, the following two URLs
use different protocols. The first one uses the HTTPS protocol,
and the second one uses the FTP protocol:
https://www.packtpub.com/
ftp://speedtest.tele2.net/
[scheme:] scheme-specific-part
There are many schemes that are used with a URI, including:
55
HTTP: This is commonly used for websites
URI("https://www.packtpub.com/books/content/support");
uri = new
URI("https","en.wikipedia.org","/wiki/URL_normalization",
"Normalization_process");
There are several ways of creating a URL instance. The easiest
is to simply provide the URL of the site as the argument of the
56
class' constructor. This is illustrated here where a URL instance
for the Packtpub website is created:
"News-Center/index.php");
57
InetAddress names[] =
InetAddress.getAllByName("www.google.com");
for(InetAddress element : names) {
System.out.println(element);
}
NIO Support
try {
URL url = new URL("http://www.google.com");
URLConnection urlConnection = url.openConnection();
BufferedReader br = new BufferedReader(
new InputStreamReader(urlConnection.getInputStream()));
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
}
br.close();
} catch (IOException ex) {
// Handle exceptions
}
59
We can rework the previous example to illustrate the use of
channels and buffers. The URLConnection instance is created
as before. We will create a ReadableByteChannel instance and
then a ByteBuffer instance, as illustrated in the next example.
The ReadableByteChannel instance allows us to read from the
site using its read method. A ByteBuffer instance receives data
from the channel and is used as the argument of
the read method. The buffer created holds 64 bytes at a time.
try {
    InputStream inputStream = urlConnection.getInputStream();
    ReadableByteChannel channel = Channels.newChannel(inputStream);
    ByteBuffer buffer = ByteBuffer.allocate(64);
    while (channel.read(buffer) > 0) {
        System.out.println(new String(buffer.array()));
        buffer.clear();
    }
    channel.close();
} catch (IOException ex) {
    // Handle exceptions
}
61
continue executing and not block. Using this object, you can use
one of the following methods:
62
protocol, such as the Hypertext Transfer Protocol (HTTP), is
used. For simpler architectures, a series of text messages are
sent back and forth.
/*
63
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package simpleechoserver;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.logging.Level;
import java.util.logging.Logger;
/**
*
* @author deepakmali
*/
public class SimpleEchoServer {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
try {
64
System.out.println("Waiting for the Connection....");
System.out.println("Connection Established");
System.out.println(reader.readLine());
String inputLine;
while((inputLine=reader.readLine())!=null){
System.out.println(inputLine);
writer.write(inputLine);}
Logger.getLogger(SimpleEchoServer.class.getName()).log(Lev
el.SEVERE, null, ex);
}
}
/*
* To change this license header, choose License Headers in
Project Properties.
65
* To change this template file, choose Tools | Templates
*/
package simpleechoclient;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.Scanner;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;
/**
* @author deepakmali
66
*/
/**
*/
try {
System.out.println(localAddress.getHostAddress());
try {
System.out.println("Connected to Server
...."+clientSocket.getRemoteSocketAddress());
67
while(true){
try {
return reader.readLine();
catch(IOException e)
{return null;
}; */
//System.out.println(inputline);
if ("quit".equalsIgnoreCase(readline))
break;
writer.println(readline);
68
}
Logger.getLogger(SimpleEchoClient.class.getName()).log(Lev
el.SEVERE, null, ex);
Logger.getLogger(SimpleEchoClient.class.getName()).log(Lev
el.SEVERE, null, ex);
public WebServer() {
System.out.println("Webserver Started");
while (true) {
69
Socket remote = serverSocket.accept();
System.out.println("Connection made");
ex.printStackTrace();
new WebServer();
Mac users may encounter an error when using port 80. Use
port 3000 or 8080 instead. Threads are concurrently executing
sequences of code within a process. In Java, a thread is created
using the Thread class. The constructor's argument is an object
that implements the Runnable interface. This interface consists
of a single method: run. When the thread is started using
the start method, a separate program stack is created for the new
thread, and the run method executes on this stack. When
the run method terminates, the thread terminates.
The ClientHandler class, shown next, implements
the Runnable interface. Its constructor is passed the socket
representing the client. When the thread starts, the run method
executes. The method displays starting and terminating
messages. The actual work is performed in
the handleRequest method:
    public ClientHandler(Socket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        System.out.println("ClientHandler started for " + this.socket);
        handleRequest(this.socket);
        System.out.println("ClientHandler terminated for " + this.socket + "\n");
    }
71
In the code that follows, the input and output streams are
created and the first line of the request is read.
The StringTokenizer class is used to tokenize this line. When
the nextToken method is invoked, it returns the first word of the
line, which should correspond to an HTTP method:
new InputStreamReader(socket.getInputStream()));) {
StringTokenizer tokenizer =
new StringTokenizer(headerLine);
...
} catch (Exception e) {
e.printStackTrace();
72
The next code sequence handles the GET method. A message is
displayed on the server side to indicate that a GET method is
being processed. This server will return a simple HTML page.
The page is built using the StringBuilder class, where
the append methods are used in a fluent style.
The sendResponse method is then invoked to actually send the
response. If some other method was requested, then a 405 status
code is returned:
if (httpMethod.equals("GET")) {
responseBuffer
.append("</html>");
} else {
73
If we wanted to handle other methods, then a series of else-if
clauses can be added. To further process the GET method, we
will need to parse the remainder of the initial request line. The
following statement will give us a string that we can process:
String statusLine;
new DataOutputStream(socket.getOutputStream());) {
...
out.close();
74
// Handle exception
if (statusCode == 200) {
+ responseString.length() + "\r\n";
out.writeBytes(statusLine);
out.writeBytes(serverHeader);
out.writeBytes(contentTypeHeader);
out.writeBytes(contentLengthHeader);
out.writeBytes("\r\n");
out.writeBytes(responseString);
out.writeBytes(statusLine);
75
out.writeBytes("\r\n");
} else {
out.writeBytes(statusLine);
out.writeBytes("\r\n");
public HTTPClient() {
try {
InetAddress serverInetAddress =
InetAddress.getByName("127.0.0.1");
76
try (OutputStream out = connection.getOutputStream();
BufferedReader in =
new BufferedReader(new
InputStreamReader(
connection.getInputStream()))) {
sendGet(out);
System.out.println(getResponse(in));
ex.printStackTrace();
...
new HTTPClient();
77
bytes. The String class's getBytes method returns this array of
bytes:
try {
out.write("GET /default\r\n".getBytes());
out.write("User-Agent: Mozilla/5.0\r\n".getBytes());
ex.printStackTrace();
try {
String inputLine;
response.append(inputLine).append("\n");
return response.toString();
78
} catch (IOException ex) {
ex.printStackTrace();
return "";
79
A MulticastServer class is declared next, where
a DatagramSocket instance is created. The try-catch blocks will
handle exceptions as they occur:
try {
The body of the try block uses an infinite loop to create an array
of bytes to hold the current date and time. Next,
an InetAddress instance representing the multicast group is
created. Using the array and the group address,
a DatagramPacket is instantiated and used as an argument
to the DatagramSocket class' send method. The date and time
sent are then displayed. The server then pauses for one second:
while (true) {
buffer = dateText.getBytes();
DatagramPacket packet;
group, 8888);
serverSocket.send(packet);
try {
Thread.sleep(1000);
// Handle exception
81
the client must use the same group address and port number.
Before it can receive messages, it must join the group using
the joinGroup method. In this implementation, it receives 5 date
and time messages, displays them, and then terminates.
The trim method removes leading and trailing white space from
a string. Otherwise, all 256 bytes of the message will be
displayed:
InetAddress group =
InetAddress.getByName("224.0.0.0");
socket.joinGroup(group);
byte[] buffer = new byte[256];
DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
socket.receive(packet);
String received = new String(packet.getData());
System.out.println(received.trim());
82
}
socket.leaveGroup(group);
// Handle exception
Scalability
this.clientSocket = clientSocket;
83
}
...
The main method creates the server socket as before, but when
a client socket is created, the client socket is used to create a
thread, as shown here:
while (true) {
clientSocket = serverSocket.accept();
ThreadedEchoServer tes =
new ThreadedEchoServer(clientSocket);
new Thread(tes).start();
// Handle exceptions
}
84
The actual work is performed in the run method as shown next.
It is essentially the same implementation as the original echo
server, except that the current thread is displayed to clarify
which threads are being used:
@Override
+ Thread.currentThread() + "]");
new InputStreamReader(
clientSocket.getInputStream()));
clientSocket.getOutputStream(), true)) {
String inputLine;
out.println(inputLine);
}
System.out.println("Client [" + Thread.currentThread()
+ " connection terminated");
85
} catch (IOException ex) {
// Handle exceptions
}
}
Thread-per-request
Thread-per-connection
Thread-per-object
Network Security
86
start with a brief overview of many of these terms. In later
sections of this chapter, we will go into more details about the
ones that are relevant to our discussion.
87
Server and client authentication
Data encryption
Data integrity
try {
SSLServerSocketFactory.getDefault();
ServerSocket serverSocket =
88
ssf.createServerSocket(8000);
System.out.println("SSLServerSocket Started");
socket.getOutputStream(), true);
new InputStreamReader(
socket.getInputStream()))) {
System.out.println(line);
out.println(line);
br.close();
System.out.println("SSLServerSocket Terminated");
// Handle exceptions
89
// Handle exceptions
System.out.println("SSLClientSocket Started");
SSLSocketFactory sf =
(SSLSocketFactory) SSLSocketFactory.getDefault();
socket.getOutputStream(), true);
new InputStreamReader(
socket.getInputStream()))) {
90
while (true) {
if ("quit".equalsIgnoreCase(inputLine)) {
break;
out.println(inputLine);
br.readLine());
System.out.println("SSLServerSocket Terminated");
}
}
}
If we executed this server followed by the client, they will abort
with a connection error. This is because we have not provided a
set of keys that the applications can share and use to protect the
data passed between them.
91
Within the Java SE SDK's bin directory is a program
titled keytool. This is a command-level program that will
generate the necessary keys and store them in a key file. In
Windows, you will need to bring up a command window and
navigate to the root directory of your source files. This directory
will contain the directory holding your application's package.
You will also need to set the path to the bin directory using a
command that is similar to the following one. This command is
needed to find and execute the keytool application:
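The actual commands are not reproduced in this text. A sketch of what
they might look like on Windows, assuming a JDK 8 installation directory
and an illustrative key alias and password, is:

set path=%path%;C:\Program Files\Java\jdk1.8.0\bin

keytool -genkey -keyalg RSA -alias mykey -keystore keystore.jks -storepass 123456

Running keytool -genkey then asks the interactive questions shown below.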
[Unknown]: packt
[Unknown]: publishing
[Unknown]: home
92
What is the name of your State or Province?
[Unknown]: calm
[Unknown]: me
[no]: y
With the keystore created, you can run the server and client
applications. How these applications are started depends on
how your projects have been created. You may be able to
execute it from an IDE, or you may need to start them from a
command window.
java -Djavax.net.ssl.keyStore=keystore.jks -
Djavax.net.ssl.keyStorePassword=123456
packt.SSLServerSocket
java -Djavax.net.ssl.trustStore=keystore.jks -
Djavax.net.ssl.trustStorePassword=123456
packt.SSLClientSocket
93
If you want to use an IDE, then use the equivalent settings for
your runtime command arguments. The following one
illustrates one possible interchange between the client and the
server. The output of the server window is shown first, followed
by that of the client:
SSLServerSocket Started
SSLServerSocket Terminated
SSLClientSocket Started
SSLServerSocket Terminated
94
CHAPTER 4 - MICROSERVICES USING SPRING
CLOUD
Getting Started
95
The issue here is that earlier we used to support only web interfaces
as the client interfaces, but now we have a proliferation of client
technologies. The figure below depicts the scene. The controllers
which were designed for the web interfaces were not designed for these
other channels. There are also many newer and more capable backend
technologies. A monolithic code base is also tough to manage for larger
teams working in parallel. Microservices allow teams to work on
separate business functionalities (cross-functional teams). We are
also stuck in monolithic applications in case we want to use
better technologies for some pieces of the functionality.
Advantages :
96
Drawbacks:
97
their communication should be dumb. Thus, the advantages of
microservices are as follows:
Advantages:
Disadvantages :
Spring Boot
98
generation. It gives easier dependency management. It is about
automatic configuration and provides different build options.
We can go to start.spring.io to use the Spring Initializr to generate a
project that can be used in any IDE. The following demonstrates the
setup screen.
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<dependencies>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>springloaded</artifactId>
<version>1.2.0.RELEASE</version>
</dependency>
</dependencies>
</plugin>
100
We can add the dependency spring-boot-starter-web in the pom to
turn it into a web application and run it as a Java application. The
test controller we may write looks like below:
package demo.controller;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class HelloController {

    @RequestMapping("/hi")
    public String hello() {
        return "hello";
    }
}
It will respond back to localhost:8080/hi with hello. The following
actions are taken here (Spring Boot will launch Tomcat on its own):
102
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-thymeleaf</artifactId>
@Controller
public class HelloController {
    @RequestMapping("/hi/{name}")
    public String hello(@PathVariable String name, Map<String, Object> model) {
        model.put("name", name);
        return "hello";
    }
}
package demo.domain;
103
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
@Entity
@Id
@GeneratedValue
Long id;
String name;
String position;
public Player() {
super();
this();
this.position = position;
this.name = name;
return id;
104
}
this.id = id;
return name;
this.name = name;
return position;
this.position = position;
package demo.domain;
import java.util.Set;
import javax.persistence.CascadeType;
105
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.OneToMany;
@Entity
@Id
@GeneratedValue
Long id;
String name;
String location;
String mascotte;
@OneToMany(cascade=CascadeType.ALL)
@JoinColumn(name="teamId")
Set<Player> players;
public Team() {
super();
106
public Team(String location, String name, Set<Player>
players) {
this();
this.name = name;
this.location = location;
this.players = players;
return id;
this.id = id;
return name;
this.name = name;
107
return location;
this.location = location;
return mascotte;
this.mascotte = mascotte;
return players;
this.players = players;
package demo.controller;
108
import org.springframework.stereotype.Controller;
import
org.springframework.web.bind.annotation.RequestMapping;
@RestController
@RequestMapping("/hi")
return team;
109
In case we need the response in XML, annotate the domain
class with JAXB. Next we will add some JPA capability by
adding the spring-boot-starter-data-jpa dependency (<artifactId>spring-boot-
starter-data-jpa</artifactId>), which adds Spring transaction
management, Spring ORM, Hibernate/EntityManager and Spring
Data JPA. It will not bring a database driver.
For the DAO layer, we can provide the interface and Spring
Data will implement it for us.
<groupId>org.hsqldb</groupId>
<artifactId>hsqldb</artifactId>
110
We need to create the repository interfaces for Team and Player as
follows. CrudRepository comes from the Spring Data repository
package.
package demo.repository;

import org.springframework.data.repository.CrudRepository;
import org.springframework.data.rest.core.annotation.RestResource;

import demo.domain.Team;

@RestResource(path="teams", rel="team")
public interface TeamRepository extends CrudRepository<Team, Long> {
}
In the application class, we can declare the CrudRepository for
Team and Player and we can autowire them as follows.
111
The controller code can be written to fetch the Team based on
the id or name .
package demo.controller;
import
org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import
org.springframework.web.bind.annotation.RequestMapping;
import
org.springframework.web.bind.annotation.RestController;
import demo.domain.Team;
import demo.repository.TeamRepository;
@RestController
@RequestMapping("/teams")
@RequestMapping("/teams/{id}")
return teamRepository.findOne(id);
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-rest</artifactId>
114
Thus, when the application starts, the RestResource annotations
are interpreted and Spring Data creates the controllers and
request mappings. In case we provide the CrudRepository for
Player, the output will return us the links for players as follows:
package demo.repository;

import org.springframework.data.repository.CrudRepository;
import org.springframework.data.rest.core.annotation.RestResource;

import demo.domain.Player;

@RestResource(path="players", rel="player")
public interface PlayerRepository extends CrudRepository<Player, Long> {
}
115
Spring Cloud
116
etc. Netflix became a trailblazer in cloud computing and chose to
publish many general-use technologies as open source projects. The
Spring team realized that Netflix had addressed quite a few key
architectural items in the cloud environment, and made the work
already done by the Netflix team easily consumable. Spring Cloud
projects are based on Spring Boot. We need to use the
spring-cloud-starter-parent and the spring-cloud-starter dependencies.
118
Fig 4.5 Spring Config
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-config-server</artifactId>
spring:
cloud:
config:
server:
git:
uri: https://github.com/malidee/Microservices-With-
Spring-Student-Files
searchPaths: ConfigData
# "native" is used when the native profile is active, for
local tests with a classpath repo:
119
native:
searchLocations: classpath:offline-repository/
server:
port: 8001
3. In the application class, just add the @EnableConfigServer
annotation.
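A minimal sketch of such an application class (the class name is
illustrative):

package demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.config.server.EnableConfigServer;

@SpringBootApplication
@EnableConfigServer
public class ConfigServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(ConfigServerApplication.class, args);
    }
}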
120
Service Discovery with Spring Cloud Eureka
Service discovery means that the client has discovered the other
clients though registering .When we use microservices ,there is
a huge number of inter service calls.Service discovery provides
a single lookup service .Possible solutions in the space are
Eureka ,Zookeeper ,etc .Eureka comes from Netflix .Eureka
provides a lookup server .Client services register with
Eureka.by providing meta data on host ,port,health
indicatorURL ,etc. and sends heatbeats to Eureka .By enabling
the @EnableEurekaServer on the spring boot application
configuration class , we can make a eureka server (as follows)
.Generally multiple eureka servers run simultaneously and they
share a state with each other .
121
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-eureka-server</artifactId>
eureka:
client:
serviceUrl:
defaultZone:
http://localhost:8010/eureka/,http://localhost:8011/eureka/,http:/
/localhost:8012/eureka/,http://localhost:8013/eureka/
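A minimal sketch of the Eureka server application class itself (the
class name is illustrative):

package demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

@SpringBootApplication
@EnableEurekaServer
public class EurekaServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(EurekaServerApplication.class, args);
    }
}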
122
Fig 4.6 Spring Cloud Ribbon
123
1. Use the Spring Cloud starter parent as the parent pom.
124
In the Spring configuration classes, we need to put the
annotation @EnableFeignClients so that during application start the
Feign libraries can provide a runtime implementation of whatever we
have configured. The diagram below illustrates this, and a small
client sketch follows it. Feign integrates very well with Ribbon
and Eureka.
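A minimal sketch of such a Feign client interface. The service id
"teams-service" is purely illustrative, and depending on the Spring
Cloud release the annotation lives either in
org.springframework.cloud.netflix.feign (older releases, as assumed
here) or org.springframework.cloud.openfeign (newer releases):

package demo.client;

import org.springframework.cloud.netflix.feign.FeignClient;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

import demo.domain.Team;

@FeignClient("teams-service")
public interface TeamClient {

    @RequestMapping(method = RequestMethod.GET, value = "/teams/{id}")
    Team getTeam(@PathVariable("id") Long id);
}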
126
We can fine tune the failure detection and recovery mode as
follows:
127
CHAPTER 5 - BIG DATA TECHNOLOGY (HADOOP)
Getting Started
The larger the data, the better the result of the data analytics
algorithm. A simple algorithm on a large data set will
yield a better result than a sophisticated algorithm on a smaller data
set. The more attributes there are, the better the result. The data
should be written once and read many times. As per the IDC's sixth
annual study, data will increase 300 times from 2005 to 2020, from
130 exabytes to 40,000 exabytes, and 33% of the data will be useful,
as compared to 25% today. Distributed computing is the core concept
behind Hadoop.
Advantages of Hadoop
Hadoop Releases
X = major release
Y = minor release
Z = point release
130
Fig 5.2 Features in different releases
HDFS
131
The idea behind the large block size is to keep the seek time at about
1% of the transfer time. For example, with a 10 ms seek time and a
transfer rate of 100 MB/s, a block of roughly 100 MB takes about one
second to transfer, so the seek overhead stays near 1%.
HDFS Architecture
132
The name node is backed up by a remote NFS mount. There is also a
secondary name node to which the name node can transfer the edit
logs, in case space is a constraint, through checkpoints.
The name node and secondary name node are Java programs. In
case the name node fails, the administrator has to reboot the
name node. In case of failure, the machine running the
secondary name node is often the best candidate for the name node.
The secondary name node in this case takes the information
from the NFS mount before starting to act as the name node. As a rule
of thumb, 1000 MB of main memory per million storage blocks is
required for the name node.
133
hardware failures, and we should always spend more on the
name node. The name node is backed up with a remote NFS mount, and
there is also a secondary name node feature. It doesn't
function like a name node; its only purpose is to merge the edit
logs, write them to the file system, and create checkpoints on the
combined namespace and edit log. The name node and secondary
name node are Java programs. In case of failure of the name
node, the Hadoop admin has to boot up a new name node; in case of
failure, the secondary name node machine is often the best
choice to be booted as the name node. In the later releases,
the name node has been made more resilient. The name node should
have enough main memory to manage the pool of data blocks in
the cluster; 1000 MB per million storage blocks is the thumb
rule.
The HDFS client is a JVM that has to run on the node that uses HDFS.
The client first communicates with the name node, and the name
node makes certain checks, such as whether the file already exists and
whether the client has the correct permission levels (dfs.replication
is set to 3 by default).
The name node returns the data nodes the blocks are to be copied to.
The client connects to the first data node and asks it to form the
subsequent pipeline to copy the data onto the other data nodes. Each
data node acknowledges the copy, and this goes on till the file is
copied onto HDFS.
In order to write the data on the data nodes, the first node chosen is
either the node on which the client resides or a node on a given rack,
such that the node is not overly loaded. The second node is chosen off
the rack, and the third node is chosen on the same rack on which the
second node was chosen, which forms the pipeline. Selection and
replication happen behind the scenes. Node distance is related to the
bandwidth.
135
Fig 5.5 HDFS Write
A block on a different rack but in the same data centre has distance = 4.
The client sends the read request to the name node. The name
node sends back the data node locations for the first few blocks,
returning the blocks starting with the closest node and moving to the
farthest. The client then starts to read the blocks one by one. In
case a data node fails, the client makes a note of it and that node is
not picked for later reads.
136
Fig 5.6 HDFS Read
A new name node can be added, and the file tree structure and the
block pool can be divided amongst the name nodes. Thus each
name node has to manage only the pool of blocks it is
associated with and not the complete pool. Data nodes can be
associated with multiple name nodes. Name nodes won't
communicate with each other, and failure of one won't affect the
other.
137
Fig 5.7 HDFS Federation
High Availability
It is the time taken to come back to a stable state in case of a
name node failure. To address this, a standby name node is always
running. The primary name node and the standby name
node share the namespace and the edit logs via a highly
available NFS mount. In the future, ZooKeeper will be used to
transition from the primary to the standby one. Data nodes
must send reports to both the name nodes. The standby node
fences the primary node when it takes over, wherein the standby
node revokes the shared access and disables the network port.
138
Map reduce
A split is a fixed chunk of data that acts as an input to a map
task. Blocks belong to the HDFS world and splits belong to the
Map Reduce world. All map tasks run in parallel and produce
output. All the results are merged, shuffled and act as an input
to the reduce job. The whole job execution is controlled by the Job
Tracker and the Task Trackers. The name node in HDFS is
analogous to the job tracker in map reduce, and a data node in HDFS
is analogous to a task tracker in map reduce. Task trackers run the
map and reduce jobs on the data nodes. The Job Tracker schedules
tasks for the task trackers. The Job Tracker and task trackers run as
Java processes; they are not hardware. It is wiser to spend on the
hardware that runs the task trackers.
139
Fig 5.8 Map Reduce Paradigm
The map jobs get their inputs locally from the data nodes;
this is called data locality, otherwise latency is added by the
network. Hence the optimal size of a block is equal to the split
size. Map tasks write their output on the local disk and not on
HDFS with replications, since it is only an intermediate result.
The job tracker cleans the map output only after the successful
completion of the reduce job. All the map outputs are merged,
sorted and partitioned. The reducer won't get the data locally; it
is fetched over the network. The reducer's output is written
to HDFS for reliability.
140
Fig 5.10 Map Reduce Classes
The input to the map is a split, and a split will contain many records.
The map function has input in the form of keys and values and
output in the form of keys and values as well. The key is unique
to every record. Map produces one or more key value pairs. The
input to the map will have unique keys; the output may have non-
unique keys. This is helpful since we will sort the values
based on key values and would like to make sense of values
with the same keys. One particular reducer must get all the
values for a particular key. This is managed by Hadoop, and the
programmer need not manage anything here. The input to the
reducer is a list of values associated with a single key. The sequence
of values is not important here. The output of the reducer
will be sorted since it receives the inputs in a sorted manner.
The output of the reducer depends on the inputs to the map and
subsequently on the keys in the input to the map.
Err 1
Forgive 1
Human 1
Is 2
To 2
142
Fig 5.11 Map Reduce functioning
Let's take an example of the code for writing the map and
reduce functions.
import java.io.IOException;
import java.util.StringTokenizer;
The following are the import statements for the key and value
datatypes. These are Hadoop datatypes. LongWritable is
similar to Long in Java, Text is analogous to String
in Java, and IntWritable is a datatype similar to int in Java.
143
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
Every map class will extend the Mapper class and override the
map method.
LongWritable and Text are the input key and value data types,
followed by the output key and value data types.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken().toLowerCase());
            if (Character.isAlphabetic(word.toString().charAt(0))) {
                context.write(word, one);
            }
        }
    }
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
if (args.length != 2) {
System.exit(-1);
The driver class sets up the job so that Hadoop can take over from that
point and execute it as specified by the programmer.
job.setJarByClass(WordCount.class);
job.setJobName("Word Count");
We set the input and output file paths for the job through
command-line arguments.
147
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
We finally set the output key and value parameters by calling
setOutputKeyClass and setOutputValueClass. These are the output
key and value types of the reducer.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
The input is supplied via the input file name and the output
directory contains the output .
public class WordCountWithCombiner extends Configured
        implements Tool {
@Override
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCountMapper.class);
job.setCombinerClass(WordCountReducer.class);
job.setReducerClass(WordCountReducer.class);
149
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
returnjob.waitForCompletion(true) ? 0 : 1;
System.exit(exitCode);
151
network. Deserialization is the reverse process, where the byte
stream is converted back into structured objects. All the
communication between map and reduce happens via RPC, or
remote procedure calls.
Partition function
@Override
job.setJarByClass(getClass());
returnjob.waitForCompletion(true) ? 0 : 1;
}
152
public static void main(String[] args) throws Exception {
System.exit(exitCode);
job.setMapperClass(Mapper.class);
153
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setPartitionerClass(HashPartitioner.class);
job.setNumReduceTasks(1);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setInputFormatClass(TextInputFormat.class);
returnjob.waitForCompletion(true) ? 0 : 1;
154
runs the map and reducer jobs . Task tracker sends reports to
the Job Tracker periodically.
YARN
155
Client
YARN Client
The job gets submitted to the job client. The job client requests a
new application ID. It checks whether the output directory already
exists; if it does, it will throw an error and stop right there. It
verifies the input directory and copies the resources to HDFS
with a high replication factor. It finally submits the application to
the Resource Manager. As soon as the Resource Manager picks up
the new job, it will contact a Node Manager to start a new
container and launch a new Application Master for the job.
The Application Master creates an object for bookkeeping
and task management purposes. It retrieves the splits
from HDFS and creates one task per split. Then the Application
Master decides how to run the map reduce tasks. If the job is
small, it may be run on the same JVM itself; such small jobs
are known as uber tasks. If the job is not uber, the Application
Master requests the Resource Manager to allocate the resources.
The scheduler considers data locality during assignment; in case it
is not able to find a node which is rack local, it allocates any node
randomly. The Application Master then contacts the Node Manager to
launch a container for task execution. Then the YARN child is
launched; it runs on a separate JVM. The YARN child retrieves all the
resources from HDFS, localizes them and runs the map reduce
task. The YARN child sends a progress report every 3 seconds to the
Application Master, which aggregates the progress and updates
the client directly. In the completion phase, the Application Master
and the task containers clean up the intermediate data, and the
Application Master terminates itself on job completion. There can be
some failure scenarios, namely
157
Fig 5.14 YARN functioning
Pig
158
Embedded mode (Pig commands can be embedded into a
Java Program)
2. Environment to run the Pig programs - this is a tar file that needs
to be installed on the client node and translates the Pig queries
into Map Reduce jobs. The environment supports the following two
types of execution: a. Local mode execution (runs on a single JVM)
b. Map Reduce mode (translates to a map reduce program, connects to
Hadoop and runs it on the Hadoop cluster)
159
Hive
160
drop command for external tables, the entry gets deleted from
the metastore and the data still resides there. The operations
discussed are shown below:
Sqoop
163
CHAPTER 6 - CRYPTO-ALGORITHMS IN JAVA
Getting Started
Crypto Algorithms
Java 1.8 includes all the state-of-the-art crypto algorithms. We use
Apache Commons Codec, which includes encodings such as hex and Base64
that most Java platforms do not include. The Java cryptography
framework uses the factory pattern, in the form of MessageDigest, for
creating hashes.
Simple Hash
package javacryptography;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
164
import java.util.logging.Level;
import java.util.logging.Logger;
/**
*
* @author deepakmali
*/
public class SimpleHash {
/**
* @param args the command line arguments
*/
    public static void main(String[] args) {
        // TODO code application logic here
        String str = "Test String";
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(str.getBytes("UTF-8"));
            System.out.println(hash.length);
        } catch (NoSuchAlgorithmException ex) {
            Logger.getLogger(SimpleHash.class.getName()).log(Level.SEVERE, null, ex);
        } catch (UnsupportedEncodingException ex) {
            Logger.getLogger(SimpleHash.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
Input and output are both in byte format for the hash function, and
Commons Codec will display the result of the hash function. The
digest method produces the output directly. The method
getBytes() is very error prone, since it depends on the default
encoding of the JVM that is running. It is good to mention the encoding
we want to use with the getBytes() method - in our case it is UTF-8.
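A small sketch of the difference:

String str = "Test String";
byte[] platformDefault = str.getBytes();   // depends on the JVM's default charset
byte[] explicitUtf8 = str.getBytes(java.nio.charset.StandardCharsets.UTF_8); // same bytes on every platform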
Lets now write a code to verify the MD5 checksum for the
apache-codec file downloaded from apache website .We will
create the hash value for the file .We will create the helper
method to read the bytes from the file .It will read a file or any
other input stream and return a byte array. Following is the
snippet.
package javacryptography;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
166
import java.security.NoSuchAlgorithmException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.codec.binary.Hex;
/**
* @author deepakmali
*/
String path =
"C:\\Users\\deepakmali\\Desktop\\Aadhar\\commons-codec-
1.10-bin.zip";
try {
MessageDigest md =MessageDigest.getInstance("MD5");
System.out.println(hash.length);
System.out.println (hash_encoded);
Logger.getLogger(SimpleHash.class.getName()).log(Level.SE
VERE, null, ex);
package javacryptography;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
/**
* @author deepakmali
*/
        byte[] buffer = new byte[1024];
        int length;
        while (true) {
            length = is.read(buffer);
            if (length < 0)
                break;
            bos.write(buffer, 0, length);
        }
        is.close();
        return bos.toByteArray();
We want to work stream-based in case the file is huge, so we will now
work on the file stream hash. We make use of an index-based method to
update the digest. In cryptography, message authentication codes are
abbreviated as MAC. We should not confuse them with the Ethernet MAC
address. A MAC can also be realized using hash functions as a basis.
In SSL, an HMAC is used to build a MAC and is used to hash a secret
together with a document. The secret can be an arbitrary byte vector
or password based. We need to attach the secret to the hash using an
init method. We will still get a 16-byte output, but it is now a
password-based hash (a MAC). We can use other hash functions, and
once the password is changed the result is going to change. For the
MAC, we can use the SHA-256 algorithm. Crypto specialists have
created the algorithm, which is very mature and easy to use. The
following are the snippets for the file stream hash and the MAC.
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.codec.binary.Hex;
/**
* @author deepakmali
*/
String path =
"C:\\Users\\deepakmali\\Desktop\\Aadhar\\commons-codec-
1.10-bin.zip";
170
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            FileInputStream fis = new FileInputStream(path);
            byte[] buffer = new byte[1024];
            int length;
            while (true) {
                length = fis.read(buffer);
                if (length < 0)
                    break;
                md.update(buffer, 0, length);
            }
            fis.close();
            byte[] hash = md.digest();
            String hash_encoded = Hex.encodeHexString(hash);
            System.out.println(hash.length);
            System.out.println(hash_encoded);
        } catch (IOException | NoSuchAlgorithmException ex) {
            Logger.getLogger(SimpleHash.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
package javacryptography;
import java.io.UnsupportedEncodingException;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Hex;
/**
*
* @author deepakmali
*/
public class SimpleMac {

    public static void main(String[] args) {
        try {
            // Mac mac = Mac.getInstance("HMACMD5");
            Mac mac = Mac.getInstance("HmacSHA256");
            // "secret" is a placeholder password used only for illustration
            SecretKeySpec key = new SecretKeySpec("secret".getBytes("UTF-8"), "HmacSHA256");
            mac.init(key);
            byte[] hash = mac.doFinal("Test String".getBytes("UTF-8"));
            String hash_encoded = Hex.encodeHexString(hash);
            System.out.println(hash.length);
            System.out.println(hash_encoded);
        } catch (NoSuchAlgorithmException ex) {
            Logger.getLogger(SimpleMac.class.getName()).log(Level.SEVERE, null, ex);
        } catch (UnsupportedEncodingException ex) {
            Logger.getLogger(SimpleMac.class.getName()).log(Level.SEVERE, null, ex);
        } catch (InvalidKeyException ex) {
            Logger.getLogger(SimpleMac.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
173
PBKDF2
package pbkdf2_test;
/**
*
* @author deepakmali
*/
import java.util.Arrays;
/**
* This class demonstrates the usage of PBKDF2 secure
password hash creation
* and allows to play around with the input parameters (***)
* for better understanding.
*
174
* For production usage, please note the "TODO" tags!
*
* Works for Java and Android.
*
*/
public class PBKDF2Example {
public static void main(String[] args) throws Exception{
String demo_password = "Password";
//TODO: Has to be
//1)generated by secure random
//2) individual per case
//3) with appropriate length
byte[] demo_salt = {
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,
//1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,1
6, //***
};
175
//demo_size+=8; //***
//TODO: Salt + Iteration count
+ Size have to be stored with the hash to recalculate the hash
later
// -> Iteration count + Size
can be constant values, the salt has to be stored individually
//--------------
//---CREATION---
//--------------
PBKDF2 hashgen = new PBKDF2();
hashgen.init(demo_password.getBytes
("UTF8"), demo_salt, demo_iterationcount, demo_size);
byte[] rawhash =
hashgen.generateDerivedParameters();
*
org.bouncycastle.crypto.generators.PKCS5S2ParametersGenera
tor
177
* HmacSHA512 is included in JRE 1.8++. This class also runs
below 1.8.
* (http://www.bouncycastle.org)
* <p>
178
* copies of the Software, and to permit persons to whom the
Software is
* <p>
* <p>
* SOFTWARE.
179
*
* <p>
* href=http://www.rsasecurity.com/rsalabs/pkcs/pkcs-
5/index.html> RSA's PKCS5
* Page</a>
*/
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
public PBKDF2() {
try {
180
hMac = Mac.getInstance("HmacSHA512");
} catch (Exception e) {
e.printStackTrace();
state = new
byte[hMac.getMacLength()];
if (c <= 0) {
throw new
IllegalArgumentException("iteration count must be at least 1.");
if (S != null) {
hMac.update(S, 0, S.length);
hMac.update(iBuf, 0, iBuf.length);
hMac.doFinal(state, 0);
181
hMac.update(state, 0, state.length);
hMac.doFinal(state, 0);
out[outOff + j] ^= state[j];
int outPos = 0;
int pos = 3;
while (++iBuf[pos] == 0) {
--pos;
182
}
outPos += hLen;
return outBytes;
keysize = keysize / 8;
this.password = password;
this.salt = salt;
this.iterationCount = iterationCount;
this.keysize = keysize;
byte[] dKey =
generateDerivedKey(keysize);
183
if (dKey.length > keysize){
dKey = dk2;
return dKey;
184
CHAPTER 7 - MODULARITY IN JAVA
Getting Started
Meaning of Modularity
185
Pre-Java9 Era
188
had to be untangled. Going forward, this effort will definitely
pay off in terms of development speed and increased flexibility
for the JDK.
Module Descriptors
module java.prefs {
requires java.xml;
exports java.util.prefs;
189
Let's move on to the body of the module descriptor
for java.prefs. Code in java.prefs uses code from java.xml to
load preferences from XML files. This dependency must be
expressed in the module descriptor. Without this dependency
declaration, the java.prefs module would not compile (or run).
Such a dependency is declared with the requires keyword
followed by a module name, in this case java.xml. The implicit
dependency on java.base may be added to a module descriptor.
Doing so adds no value, similar to how you can (but generally
don't) add "import java.lang.String" to code using Strings.
Readability
Accessibility
A type in one module is accessible to code in another module only if the
type is public, the containing package is exported by the first module,
and the second module reads the first.

Provided nothing else breaks, an existing application can be compiled and
executed exactly as with Java 8. Simply add the artifact (and its
dependencies, if it has any) to the classpath and call main. And voilà, it
runs!
module com.infoq.monitor {
}
jdeps-module ServiceMonitor.jar
This will list the packages our application uses and, more
importantly, from which modules they
come: java.base, java.logging, java.sql, javafx.base,
javafx.controls, javafx.graphics.
module com.infoq.monitor {
    requires java.logging;
    requires java.sql;
    requires javafx.base;
    requires javafx.controls;
    requires javafx.graphics;
    // no packages to export
}

jar -c \
    --file=mods/com.infoq.monitor.jar \
    --main-class=com.infoq.monitor.Monitor \
In stark contrast to the old model, there is a whole new
sequence to launch the application. We use the new -mp switch
to specify where to look for modules, and -m to name the one
we want to launch:
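A sketch of what that launch might look like with the early-access Jigsaw switches described above (module path in mods, com.infoq.monitor as the initial module):

java -mp mods -m com.infoq.monitor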
Kinds Of Modules
All Java code depends on Object and virtually all code uses
basic features like threading and collections. These types can be
found in java.base, which plays a special role; it is the only
module that the module system inherently knows about, and
since all code depends on it, all modules automatically read it.
Module Jars
Module Path
Module Graph
For our example, that looks like this:
Splitting Modules
com.infoq.monitor.stats
com.infoq.monitor.db
com.infoq.monitor.ui
com.infoq.monitor

ServiceMonitor
    src
        com.infoq.monitor
            com ...
            module-info.java
        com.infoq.monitor.db
            com ...
            module-info.java
        com.infoq.monitor.stats
            com ...
            module-info.java
        com.infoq.monitor.ui
            com ...
            module-info.java
module com.infoq.monitor.stats {
requires java.logging;
exports com.infoq.monitor.stats.get;
}
To reiterate the accessibility rules: non-public types in
com.infoq.monitor.stats.get and all types in other packages are
completely hidden from all other modules. Even the exported
package is only visible to modules that read this one.
module com.infoq.monitor.db {
requires java.logging;
requires java.sql;
exports com.infoq.monitor.db.write;
}
module com.infoq.monitor.ui {
requires javafx.base;
requires javafx.controls;
requires javafx.graphics;
exports com.infoq.monitor.ui.launch;
exports com.infoq.monitor.ui.show;
}
Now that we have covered the actual functionality, we can turn our
attention to the main module, which wires all of the parts together. This
module requires each of our three modules, plus Java's logging facilities.
Since the UI module requires the modules depending on it to work with
JavaFX properties, the main module depends on javafx.base as well. And
since the main module is not used by anybody else, it has no API to export.
module com.infoq.monitor {
requires com.infoq.monitor.stats;
requires com.infoq.monitor.db;
requires com.infoq.monitor.ui;
requires java.logging;
// no packages to export
}
Our modularization creates a very different module graph:
Like before and exactly as with Java 8, we compile the module's sources,
write the resulting files into a subfolder of classes with the module's
name, and create a JAR in mods. It's a modular JAR because the class files
include the compiled module descriptor module-info.class.
javac \
-mp mods \
-d classes/com.infoq.monitor \
jar -c \
--file=mods/com.infoq.monitor.jar \
--main-class=com.infoq.monitor.Monitor \
The JVM will start by looking for module com.infoq.monitor
because we specified that as our initial module. When it finds
that, it will try to transitively resolve all of its dependencies
inside the universe of observable modules (in this case, our four
application modules and all platform modules).
Implied Readability
So code that wants to call the dependent module might have to
use types from the depended-upon module. But it can't do that
if it does not also read the second module. Hence for the
dependent module to be usable, client modules would all have
to explicitly depend on that second module as well. Identifying
and manually resolving such hidden dependencies would be a
tedious and error-prone task.
Implying Readability
module com.infoq.monitor.ui {
    // expose javafx.base to modules depending on this one
    requires public javafx.base;
    requires javafx.controls;
    requires javafx.graphics;
    exports com.infoq.monitor.ui.launch;
    exports com.infoq.monitor.ui.show;
}

(In the final Java 9 release, requires public became requires transitive.)
module com.infoq.monitor {
requires com.infoq.monitor.stats;
requires com.infoq.monitor.db;
requires com.infoq.monitor.ui;
// we don't need javafx.base anymore to update the UI model
requires java.logging;
// no packages to export
}
With implied readability, any module that reads the first module is also
able to read the second module and, hence, access all the types in that
module's exported packages.
module com.infoq.monitor.db {
requires java.logging;
requires java.sql;
exports com.infoq.monitor.db.write;
}
Alternatively we might be using logging
throughout com.infoq.monitor.db. Then, types
from java.logging appear in many places independent
of Driver and can no longer be considered to be limited to the
boundary of com.infoq.monitor.db and java.sql.
With Jigsaw being cutting edge, the community still has time to
discuss such topics and agree on recommended practices. My
take is that if a module is used on more than just the boundary
to another module it should be explicitly required. This
approach clarifies the system's structure and also future-proofs
the module declaration for refactorings. So as long as our
database module uses logging independently of the SQL
module I would keep it.
Aggregator Modules
module com.infoq.monitor.api {
    requires public com.infoq.monitor.stats;
    requires public com.infoq.monitor.db;
    requires public com.infoq.monitor.ui;
    // implied readability is not transitive
    // so we have to explicitly list `javafx.base`
    requires public javafx.base;
}
module com.infoq.monitor {
requires com.infoq.monitor.api;
requires java.logging;
// no packages to export
}
Services
A module can declare in its descriptor that it provides a service with
provides X with Y, where X is the fully qualified name of the service
interface and Y the fully qualified name of the implementing class. Y needs
to have a public, parameterless constructor so that the module system can
instantiate it.
We are now free to create one or more modules with
implementations for concrete microservices. Let's call these
modules com.infoq.monitor.watch.login, com.infoq.monitor.wa
tch.shipping and so forth. Their module descriptors are as
follows:
module com.infoq.monitor.watch.login {
requires public com.infoq.monitor.watch;
provides com.infoq.monitor.watch.Watcher
with com.infoq.monitor.watch.login.LoginWatcher;
}
module com.infoq.monitor.stats {
requires java.logging;
requires com.infoq.monitor.watch;
// we have to declare which service we are depending on
uses com.infoq.monitor.watch.Watcher;
exports com.infoq.monitor.stats.get;
}
ServiceLoader.load(Watcher.class).forEach(watchers::add);
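For context, a minimal sketch of how a consuming module might gather all Watcher implementations with that call (the watchers list, and the List/ArrayList/ServiceLoader types from java.util, are assumptions about the surrounding code):

List<Watcher> watchers = new ArrayList<>();
ServiceLoader.load(Watcher.class).forEach(watchers::add);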
Fig 7.4 Module Graph with use of Services
Note how the arrows all point towards the new module; a
typical example for the dependency inversion principle.
CHAPTER 8 - MACHINE LEARNING USING SPARK
Getting Started
System.out.println("Total tweets in file : " +
tweetsRDD.count());
/*
//Convert to upper case
JavaRDD<String> ucRDD = tweetsRDD.map(
str -> str.toUpperCase());
//Print upper case lines
for ( String s : ucRDD.take(5)) {
System.out.println(s);
}*/
while(true) {
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
The context starts the embedded Spark instance. We load the data from the
CSV file into a Resilient Distributed Dataset (RDD) and then convert the
data into uppercase using a map function. While the session is active we
can open localhost:4040 in a browser; the Spark web UI will show us what is
actually happening (keeping the session alive can be done by running a
thread in an infinite while loop). Spark is written in Scala, which runs on
the JVM and interoperates with Java.
Fig 8.1 Spark localhost:4040
in a real-time manner. From a programming point of view, it has interfaces
to Scala, Python, R, and Java. Following is a picture of the Spark
ecosystem.
RDD
Internals of Spark
first 2 records in partition 1 and records 3-5 into the second partition.
They will be moved and stored within the executors. The RDD thus created
will be distributed to the worker nodes.
We can run Spark in batch mode (on a per-job basis) or in interactive mode
(similar to an interactive shell).
visited which page, which product they bought, which page they spent most
time on, etc.). So we acquire data, process the data (cleanse, filter,
augment) based on the questions, and perform exploratory data analysis. We
do some machine learning work like correlation analysis, training and test
split, model building, prediction, etc., and based on this we take action.
Feedback will be provided by the actions and then we correct the workflow
to make the process better. RDDs can be created in a number of ways and
operate in a lazy evaluation mode (it means Spark will not load or
transform the data unless an action is performed).
If the data is very large, create HDFS files outside of Spark using Apache
Sqoop and then create RDDs from them; Spark understands Hadoop partitions
very well. Spark also provides simple functions to persist an RDD directly
to a variety of data sinks: text files, JSON, sequence files, and
collections.
Spark Operations Snippets
1. Get the Spark context. This is the main entry point into the Spark
activity.

public static JavaSparkContext getContext() {
    if ( spContext == null ) {
        getConnection();
    }
    return spContext;
}
2. Give the Spark master (2 partitions by default) and the app name, and
create the conf object. After this, Spark gets started.

if ( spContext == null) {
    //Setup Spark configuration
    SparkConf conf = new SparkConf()
            .setAppName(appName)
            .setMaster(sparkMaster);
    System.setProperty("hadoop.home.dir",
            "c:\\spark\\winutils\\");
3. We create a Spark session for Spark SQL.
sparkSession = SparkSession
.builder()
.appName(appName)
.master(sparkMaster)
.config("spark.sql.warehouse.dir", tempDir)
.getOrCreate();
public static JavaRDD<Integer> getCollData() {
    JavaSparkContext spContext =
            SparkConnection.getContext();
    List<Integer> data = Arrays.asList(3,6,3,4,8);
    JavaRDD<Integer> collData =
            spContext.parallelize(data);
    collData.cache();
    return collData;
}

The Spark context's parallelize will convert the list into a JavaRDD (it
makes the number of partitions based on the number of cores).

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
5. Until we perform an action such as count on the RDD, the RDD will not
actually be materialized.

//1
JavaRDD<Integer> collData = DataResource.getCollData();
//2
JavaRDD<String> autoAllData
        = spContext.textFile("data/auto-data.csv");

Every line in the file is stored as a String object and all the String
objects are stored in the RDD.
for ( String s : autoAllData.take(5)) {
    System.out.println(s);
}
6. The below code splits the original CSV and stores it as partitions;
saveAsTextFile() will save each partition into an individual file.

JavaRDD<String> autoAllData
        = spContext.textFile("data/auto-data.csv");
autoAllData.saveAsTextFile("data/auto-data-modified.csv");
Map function
JavaRDD<String> autoAllData
        = spContext.textFile("data/auto-data.csv");
JavaRDD<String> tsvData = autoAllData
        .map(str -> str.replace(",", "\t"));
It will return the tsvData RDD wherein each comma is replaced with a tab
character. The result can be of a different type and the data can be
different; the only constraint is that the number of records will be the
same. We can use an inline function or our own function. We can use map
for data standardization, data type conversion, element-level computations
like computing tax, adding new attributes, data checking, and so on.
FlatMap
newRdd = rdd.flatMap(function)
JavaRDD<String> words
        = toyotaData.flatMap(new FlatMapFunction<String, String>() {
    public Iterator<String> call(String s) {
        return Arrays.asList(s.split(",")).iterator();
    }
});
The first type parameter is the input data type and the second is the
output data type. We need to implement call(), and the return value is an
Iterator of Strings.
Filter
JavaRDD<String> autoAllData
        = spContext.textFile("data/auto-data.csv");
JavaRDD<String> autoData
        = autoAllData.filter( ... );   // filter predicate not shown
JavaRDD<String> toyotaData
        = autoData.filter(str -> str.contains("toyota"));

JavaSparkContext spContext = SparkConnection.getContext();
List<Integer> data = Arrays.asList(3,6,3,4,8);
JavaRDD<Integer> collData = spContext.parallelize(data);
System.out.println("Distinct elements : "
        + collData.distinct().collect());
Set operations
unionRDD = firstRDD.union(secondRDD)
intersectionRDD = firstRDD.intersection(secondRDD)

JavaRDD<String> words1 = spContext.parallelize( ... );
JavaRDD<String> words2 = spContext.parallelize( ... );
ExerciseUtils.printStringRDD(words1.union(words2), 10);
System.out.println("Example for Set operations : Intersection");
ExerciseUtils.printStringRDD(words1.intersection(words2),10);
Actions
The important thing is to take care of the input and return data types. A
sketch of the reduce function is shown below.
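Since the original snippet does not appear in full here, this is a minimal sketch of a reduce call on the collData RDD used earlier (the sum variable name is an assumption):

// Sum all elements of the RDD; the lambda takes two values and returns one
Integer sum = collData.reduce((x, y) -> x + y);
System.out.println("Sum of elements = " + sum);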
Pair RDD
A special kind of RDD that can store key-value pairs. They can be created
through map operations on regular RDDs as well. All transformations for
regular RDDs also apply to pair RDDs. Popular functions for pair RDDs are
mapValues and flatMapValues. For example, mapValues transforms each value
without changing the key; thus we can calculate the tax for a sales amount
under the same key. flatMapValues can generate multiple values with the
same key; for example, if the value for a key is a list, flatMapValues
will create multiple values with the same key.
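A minimal sketch of mapValues on a pair RDD (the salesByRegion data and the 10% tax rate are assumptions, not from the original listing):

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;
import java.util.Arrays;

JavaPairRDD<String, Double> salesByRegion = spContext.parallelizePairs(
        Arrays.asList(new Tuple2<>("EMEA", 100.0), new Tuple2<>("APAC", 200.0)));
// Add 10% tax to every sales amount, keeping the region key unchanged
JavaPairRDD<String, Double> withTax = salesByRegion.mapValues(amount -> amount * 1.10);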
Advanced Spark
Partitioning
An RDD is broken down and distributed across all the nodes. By default the
number of partitions is equal to the number of cores on the entire
cluster. For very large clusters, we need to configure it using the
spark.default.parallelism parameter. It can also be specified during RDD
creation.
Persistence
JavaSparkContext spContext = SparkConnection.getContext();
LongAccumulator sedanCount =
spContext.sc().longAccumulator();
JavaRDD<String> autoOut
JavaRDD<String> wordsList =
spContext.parallelize(
Arrays.asList("hello", "to",
"Spark", "world"), 4);
System.out.println("No. of partitions in
wordsList = " +
wordsList.getNumPartitions());
Spark SQL
tables. We can perform operations like filter, joins, groupBy, aggregate,
and the regular map and reduce operations on DataFrames. Below we will
load a DataFrame and perform operations using Spark SQL. We can convert a
DataFrame into an RDD at any time. We can also construct a DataFrame from
Java objects, RDDs, or CSV files.
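As the referenced listing does not appear in full here, the following is a minimal sketch of the kind of DataFrame operations described (the MAKE column name is an assumption about auto-data.csv; spSession is the session created earlier):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;

Dataset<Row> autoDf = spSession.read()
        .option("header", "true")
        .csv("data/auto-data.csv");
// Group by a column and count, then show the result
autoDf.groupBy(col("MAKE")).count().show();
// A DataFrame can be converted back to an RDD at any time
JavaRDD<Row> autoRdd = autoDf.toJavaRDD();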
Spark Streaming

Fig 8.6 Spark Streaming

Machine Learning with Spark
These attributes show a lot of relationships between the entities. The
process of learning to understand the relationships in the data, and using
a computer to do so, comprises machine learning. Using machine learning we
build a model, which could be a decision tree, etc. These models can be
used to group similar data or predict an outcome. Since machines only
understand numbers, text data needs to be converted to equivalent
numerical representations for ML algorithms to work.
Fig 8.9 Variation and Bias
Types of errors :
-Linear Regression
-Multiple Regression
/*--------------------------------------------------------------------------
Load Data
--------------------------------------------------------------------------*/
Dataset<Row> autoDf = spSession.read()
.option("header","true")
.csv("data/TestCsvWithAutomobilesData.csv");
autoDf.show(5);
autoDf.printSchema();
/*-------------------------------------------------------
-------------------
Cleanse Data
---------------------------------------------------------
-----------------*/
//Convert all data types to double; change missing values to standard ones.
StructType autoSchema = DataTypes
        .createStructType(new StructField[] {
    DataTypes.createStructField("MPG",
            DataTypes.DoubleType, false),
DataTypes.createStructField("CYLINDERS",
DataTypes.DoubleType, false),
DataTypes.createStructField("DISPLACEMENT",
DataTypes.DoubleType, false),
DataTypes.createStructField("HP",
DataTypes.DoubleType, false),
DataTypes.createStructField("WEIGHT",
DataTypes.DoubleType, false),
DataTypes.createStructField("ACCELERATION",
DataTypes.DoubleType, false),
DataTypes.createStructField("MODELYEAR",
DataTypes.DoubleType, false),
DataTypes.createStructField("NAME",
DataTypes.StringType, false)
});
// Handle the special character "?" used for missing HP values
Broadcast<Double> avgHP = spContext.broadcast(80.0);
//Change data frame back to RDD
JavaRDD<Row> rdd1 = autoDf.toJavaRDD().repartition(2);
//Function to map.
JavaRDD<Row> rdd2 = rdd1.map( new
Function<Row, Row>() {
@Override
public Row call(Row iRow) throws Exception {
double hp = (iRow.getString(3).equals("?") ?
avgHP.value()
: Double.valueOf(iRow.getString(3)));
Row retRow =
RowFactory.create( Double.valueOf(iRow.getString(0)),
Double.valueOf(iRow.getString(1)),
Double.valueOf(iRow.getString(2)),
Double.valueOf(hp),
Double.valueOf(iRow.getString(4)),
Double.valueOf(iRow.getString(5)),
Double.valueOf(iRow.getString(6)),
iRow.getString(7)
);
return retRow;
}
});
//Create Data Frame back.
Dataset<Row> autoCleansedDf =
spSession.createDataFrame(rdd2, autoSchema);
System.out.println("Transformed Data :");
autoCleansedDf.show(5);
/*----------------------------------------------------------
----------------
Analyze Data
---------------------------------------------------------
-----------------*/
//Perform correlation analysis
for ( StructField field : autoSchema.fields() ) {
if ( !
field.dataType().equals(DataTypes.StringType)) {
System.out.println(
"Correlation between MPG and " + field.name()
+ " = " +
autoCleansedDf.stat().corr("MPG", field.name()) );
}
}
/*-------------------------------------------------------
-------------------
Prepare for Machine Learning.
---------------------------------------------------------
-----------------*/
//Convert data to labeled Point structure
JavaRDD<Row> rdd3 =
        autoCleansedDf.toJavaRDD().repartition(2);
JavaRDD<LabeledPoint> rdd4 = rdd3.map( new
        Function<Row, LabeledPoint>() {
    @Override
    public LabeledPoint call(Row iRow) throws Exception {
        LabeledPoint lp = new LabeledPoint(iRow.getDouble(0),
                Vectors.dense(iRow.getDouble(2),
                        iRow.getDouble(4),
                        iRow.getDouble(5)));
        return lp;
    }
});
Dataset<Row> autoLp = spSession.createDataFrame(rdd4,
LabeledPoint.class);
autoLp.show(5);
// Split the data into training and test sets (10% held out for
testing).
Dataset<Row>[] splits =
autoLp.randomSplit(new double[]{0.9, 0.1});
Dataset<Row> trainingData = splits[0];
Dataset<Row> testData = splits[1];
/*-------------------------------------------------------
-------------------
Perform machine learning.
---------------------------------------------------------
-----------------*/
//Create the object
LinearRegression lr = new LinearRegression();
//Create the model
LinearRegressionModel lrModel =
lr.fit(trainingData);
//Print out coefficients and intercept for LR
System.out.println("Coefficients: "
+ lrModel.coefficients() + "
Intercept: " + lrModel.intercept());
//Predict on test data
Dataset<Row> predictions =
lrModel.transform(testData);
//View results
predictions.select("label", "prediction",
"features").show(5);
//Compute R2 for the model on test data.
RegressionEvaluator evaluator = new
RegressionEvaluator()
.setLabelCol("label")
.setPredictionCol("prediction")
.setMetricName("r2");
double r2 = evaluator.evaluate(predictions);
System.out.println("R2 on test data = " + r2);
/*---------------------------------------------------------
-----------------
Load Data
---------------------------------------------------------
-----------------*/
Dataset<Row> irisDf = spSession.read()
.option("header","true")
.csv("data/iris.csv");
irisDf.show(5);
irisDf.printSchema();
/*-------------------------------------------------------
-------------------
Cleanse Data
---------------------------------------------------------
-----------------*/
//Convert all data types as double; Change
missing values to standard ones.
//Create the schema for the data to be loaded
into Dataset.
StructType irisSchema = DataTypes
.createStructType(new StructField[] {
DataTypes.createStructField("SEPAL_LENGTH",
DataTypes.DoubleType, false),
DataTypes.createStructField("SEPAL_WIDTH",
DataTypes.DoubleType, false),
DataTypes.createStructField("PETAL_LENGTH",
DataTypes.DoubleType, false),
DataTypes.createStructField("PETAL_WIDTH",
DataTypes.DoubleType, false),
DataTypes.createStructField("SPECIES",
DataTypes.StringType, false)
});
//Change data frame back to RDD
JavaRDD<Row> rdd1 =
irisDf.toJavaRDD().repartition(2);
//Function to map.
JavaRDD<Row> rdd2 = rdd1.map( new
Function<Row, Row>() {
@Override
public Row call(Row iRow) throws Exception {
Row retRow = RowFactory.create(
Double.valueOf(iRow.getString(0)),
Double.valueOf(iRow.getString(1)),
Double.valueOf(iRow.getString(2)),
Double.valueOf(iRow.getString(3)),
iRow.getString(4)
);
return retRow;
}
});
//Create Data Frame back.
Dataset<Row> irisCleansedDf =
spSession.createDataFrame(rdd2, irisSchema);
System.out.println("Transformed Data :");
irisCleansedDf.show(5);
/*-------------------------------------------------------
-------------------
Analyze Data
---------------------------------------------------------
-----------------*/
//Add an index using string indexer.
StringIndexer indexer = new StringIndexer()
.setInputCol("SPECIES")
.setOutputCol("IND_SPECIES");
StringIndexerModel siModel =
indexer.fit(irisCleansedDf);
Dataset<Row> indexedIris =
siModel.transform(irisCleansedDf);
indexedIris.groupBy(col("SPECIES"),col("IND_SPECI
ES")).count().show();
//Perform correlation analysis
for ( StructField field : irisSchema.fields() ) {
if ( !
field.dataType().equals(DataTypes.StringType)) {
System.out.println(
"Correlation between IND_SPECIES and " + field.name()
+ " = " +
indexedIris.stat().corr("IND_SPECIES", field.name()) );
}
}
/*-------------------------------------------------------
-------------------
Prepare for Machine Learning.
---------------------------------------------------------
-----------------*/
//Convert data to labeled Point structure
JavaRDD<Row> rdd3 =
        indexedIris.toJavaRDD().repartition(2);
JavaRDD<LabeledPoint> rdd4 = rdd3.map( new
        Function<Row, LabeledPoint>() {
    @Override
    public LabeledPoint call(Row iRow) throws Exception {
        // label column index assumed: the indexed species (IND_SPECIES)
        LabeledPoint lp = new LabeledPoint(iRow.getDouble(5),
                Vectors.dense(iRow.getDouble(0),
                        iRow.getDouble(1),
                        iRow.getDouble(2),
                        iRow.getDouble(3)));
        return lp;
    }
});
Dataset<Row> irisLp =
spSession.createDataFrame(rdd4, LabeledPoint.class);
irisLp.show(5);
// Split the data into training and test sets (30%
held out for testing).
Dataset<Row>[] splits =
irisLp.randomSplit(new double[]{0.7, 0.3});
Dataset<Row> trainingData = splits[0];
Dataset<Row> testData = splits[1];
/*-------------------------------------------------------
-------------------
Perform machine learning.
---------------------------------------------------------
-----------------*/
//Create the object
// Train a DecisionTree model.
DecisionTreeClassifier dt = new
DecisionTreeClassifier()
.setLabelCol("label")
.setFeaturesCol("features");
// Convert indexed labels back to original
labels.
IndexToString labelConverter = new
IndexToString()
.setInputCol("label")
.setOutputCol("labelStr")
.setLabels(siModel.labels());
labelConverter.transform(rawPredictions));
//View results
System.out.println("Result sample :");
predictions.select("labelStr", "predictionStr",
"features").show(5);
//View confusion matrix
System.out.println("Confusion Matrix :");
predictions.groupBy(col("labelStr"),
col("predictionStr")).count().show();
//Accuracy computation
MulticlassClassificationEvaluator evaluator =
new MulticlassClassificationEvaluator()
.setLabelCol("label")
.setPredictionCol("prediction")
.setMetricName("accuracy");
double accuracy =
evaluator.evaluate(predictions);
System.out.println("Accuracy
= " + Math.round( accuracy * 100) + " %" );
// Keep the program running so we can
checkout things.
ExerciseUtils.hold();
}
-Dimensionality Reduction
-Random Forest
ensemble decision making, we build multiple models, each a decision tree.
Each tree is used to predict an individual result. A vote is taken on all
the results to find the best answer. It is, in a way, a collection of
decision trees. Let's say the dataset contains m samples and n predictors.
We build x trees, each with a different subset of the data. For each tree,
a subset of the m rows and n columns is chosen randomly. For prediction,
new data is passed to each of these x trees and x possible results are
obtained. The most frequent result is the aggregate prediction. It is a
highly accurate method, efficient with a large number of predictors, fully
parallelizable, and very good with missing data. The disadvantage is that
it is a time-consuming and resource-consuming process. For categorical
variables, bias might exist if levels are disproportionate. It is used in
scientific research and medical diagnosis. The following program snippet
demonstrates PCA and random forest in action.
/*--------------------------------------------------------------------------
Load Data
---------------------------------------------------------
-----------------*/
.option("header","true")
.option("sep", ";")
.csv("data/bank.csv");
bankDf.show(5);
bankDf.printSchema();
/*-------------------------------------------------------
-------------------
Cleanse Data
---------------------------------------------------------
-----------------*/
StructType bankSchema = DataTypes
        .createStructType(new StructField[] {
DataTypes.createStructField("OUTCOME",
DataTypes.DoubleType, false),
DataTypes.createStructField("AGE",
DataTypes.DoubleType, false),
DataTypes.createStructField("SINGLE",
DataTypes.DoubleType, false),
DataTypes.createStructField("MARRIED",
DataTypes.DoubleType, false),
DataTypes.createStructField("DIVORCED",
DataTypes.DoubleType, false),
DataTypes.createStructField("PRIMARY",
DataTypes.DoubleType, false),
DataTypes.createStructField("SECONDARY",
DataTypes.DoubleType, false),
DataTypes.createStructField("TERTIARY",
DataTypes.DoubleType, false),
DataTypes.createStructField("DEFAULT",
DataTypes.DoubleType, false),
DataTypes.createStructField("BALANCE",
DataTypes.DoubleType, false),
DataTypes.createStructField("LOAN",
DataTypes.DoubleType, false)
});
//Function to map.
JavaRDD<Row> rdd1 = bankDf.toJavaRDD().repartition(2);
JavaRDD<Row> rdd2 = rdd1.map( new Function<Row, Row>() {
    @Override
    public Row call(Row iRow) throws Exception {
double age =
Double.valueOf(iRow.getString(0));
double outcome =
(iRow.getString(16).equals("yes") ? 1.0: 0.0 );
double single =
(iRow.getString(2).equals("single") ? 1.0 : 0.0);
double married =
(iRow.getString(2).equals("married") ? 1.0 : 0.0);
double divorced =
(iRow.getString(2).equals("divorced") ? 1.0 : 0.0);
double primary =
(iRow.getString(3).equals("primary") ? 1.0 : 0.0);
double secondary =
(iRow.getString(3).equals("secondary") ? 1.0 : 0.0);
double tertiary =
(iRow.getString(3).equals("tertiary") ? 1.0 : 0.0);
double dflt =
(iRow.getString(4).equals("yes") ? 1.0 : 0.0);
//Convert balance to float
double balance =
Double.valueOf(iRow.getString(5));
double loan =
(iRow.getString(7).equals("yes") ? 1.0 : 0.0);
        Row retRow =
                RowFactory.create( outcome, age, single, married, divorced,
                        primary, secondary, tertiary, dflt, balance, loan);
        return retRow;
    }
});
Dataset<Row> bankCleansedDf =
spSession.createDataFrame(rdd2, bankSchema);
bankCleansedDf.show(5);
/*-------------------------------------------------------
-------------------
Analyze Data
---------------------------------------------------------
-----------------*/
//Perform correlation analysis
for ( StructField field : bankSchema.fields() ) {
    if ( !
            field.dataType().equals(DataTypes.StringType)) {
        System.out.println(
                "Correlation between OUTCOME and " + field.name()
                + " = " +
                bankCleansedDf.stat().corr("OUTCOME", field.name()) );
    }
}
/*--------------------------------------------------------------------------
Prepare for Machine Learning.
--------------------------------------------------------------------------*/
JavaRDD<Row> rdd3 =
bankCleansedDf.toJavaRDD().repartition(2);
JavaRDD<LabeledPoint> rdd4 = rdd3.map( new
        Function<Row, LabeledPoint>() {
    @Override
    public LabeledPoint call(Row iRow) throws Exception {
LabeledPoint lp = new
LabeledPoint(iRow.getDouble(0) ,
Vectors.dense(iRow.getDouble(1),
iRow.getDouble(2),
iRow.getDouble(3),
iRow.getDouble(4),
iRow.getDouble(5),
iRow.getDouble(6),
iRow.getDouble(7),
iRow.getDouble(8),
iRow.getDouble(9),
iRow.getDouble(10)));
return lp;
}
});
Dataset<Row> bankLp =
spSession.createDataFrame(rdd4, LabeledPoint.class);
bankLp.show(5);
.setInputCol("label")
.setOutputCol("indLabel");
Dataset<Row> indexedBankLp =
siModel.transform(bankLp);
indexedBankLp.show(5);
//Perform PCA
PCA pca = new PCA()
        .setInputCol("features")
.setOutputCol("pcaFeatures")
.setK(3);
PCAModel pcaModel =
pca.fit(indexedBankLp);
Dataset<Row> bankPCA =
pcaModel.transform(indexedBankLp);
bankPCA.show(5);
Dataset<Row>[] splits =
        bankPCA.randomSplit(new double[]{0.7, 0.3});
Dataset<Row> trainingData = splits[0];
Dataset<Row> testData = splits[1];
/*--------------------------------------------------------------------------
Perform machine learning.
--------------------------------------------------------------------------*/
RandomForestClassifier rf = new
RandomForestClassifier()
.setLabelCol("indLabel")
.setFeaturesCol("pcaFeatures");
// Convert indexed labels back to original labels.
IndexToString labelConverter = new IndexToString()
        .setInputCol("indLabel")
        .setOutputCol("labelStr")
        .setLabels(siModel.labels());
IndexToString predConverter = new IndexToString()
        .setInputCol("prediction")
        .setOutputCol("predictionStr")
        .setLabels(siModel.labels());
RandomForestClassificationModel rfModel =
rf.fit(trainingData);
Dataset<Row> rawPredictions =
rfModel.transform(testData);
Dataset<Row> predictions =
predConverter.transform(
labelConverter.transform(rawPredictions));
//View results
System.out.println("Result sample :");
predictions.select("labelStr", "predictionStr",
"features").show(5);
predictions.groupBy(col("labelStr"),
col("predictionStr")).count().show();
//Accuracy computation
MulticlassClassificationEvaluator evaluator =
new MulticlassClassificationEvaluator()
.setLabelCol("indLabel")
.setPredictionCol("prediction")
.setMetricName("accuracy");
System.out.println("Accuracy
= " + Math.round( accuracy * 100) + " %" );
ExerciseUtils.hold();
-Naïve Bayes
/*-----------------------------------------------------------------
---------
Load Data
---------------------------------------------------------
-----------------*/
StructType smsSchema = DataTypes
        .createStructType(new StructField[] {
DataTypes.createStructField("label",
DataTypes.DoubleType, false),
DataTypes.createStructField("message",
DataTypes.StringType, false)
});
.csv("data/SMSSpamCollection.csv");
smsDf.show(5);
smsDf.printSchema();
/*-------------------------------------------------------
-------------------
Cleanse Data
---------------------------------------------------------
-----------------*/
JavaRDD<Row> rdd1 =
smsDf.toJavaRDD().repartition(2);
//Function to map.
JavaRDD<Row> rdd2 = rdd1.map( new
Function<Row, Row>() {
    @Override
    public Row call(Row iRow) throws Exception {
        Row retRow = ...;   // message-cleansing logic not shown
        return retRow;
    }
});
Dataset<Row> smsCleansedDf =
spSession.createDataFrame(rdd2, smsSchema);
smsCleansedDf.show(5);
/*-------------------------------------------------------
-------------------
---------------------------------------------------------
-----------------*/
// Split the data into training and test sets (30%
held out for testing).
Dataset<Row>[] splits =
smsCleansedDf.randomSplit(new double[]{0.7, 0.3});
/*-------------------------------------------------------
-------------------
---------------------------------------------------------
-----------------*/
.setInputCol("message")
.setOutputCol("words");
.setInputCol("words")
.setOutputCol("rawFeatures");
.setInputCol("rawFeatures")
.setOutputCol("features");
NaiveBayes nbClassifier = new NaiveBayes()
.setLabelCol("label")
.setFeaturesCol("features");
Pipeline pipeline = new Pipeline()
        .setStages(new PipelineStage[]
                {tokenizer, hashingTF, idf, nbClassifier});
// Fit the pipeline on the training split and transform the test split
// to obtain the predictions Dataset used below.
//View results
predictions.show(5);
predictions.groupBy(col("label"),
col("prediction")).count().show();
//Accuracy computation
MulticlassClassificationEvaluator evaluator =
new MulticlassClassificationEvaluator()
.setLabelCol("label")
.setPredictionCol("prediction")
.setMetricName("accuracy");
double accuracy =
evaluator.evaluate(predictions);
System.out.println("Accuracy
= " + Math.round( accuracy * 100) + " %" );
ExerciseUtils.hold();
-K-means Clustering
available, like Euclidean distance and Manhattan distance. The below
diagram shows the stages of clustering.
/*---------------------------------------------------------
-----------------
Load Data
---------------------------------------------------------
-----------------*/
Dataset<Row> autoDf = spSession.read()
.option("header","true")
.csv("data/auto-data.csv");
autoDf.show(5);
autoDf.printSchema();
/*-------------------------------------------------------
-------------------
Cleanse Data convert data type
---------------------------------------------------------
-----------------*/
//Create the schema for the data to be loaded
into Dataset.
StructType autoSchema = DataTypes
.createStructType(new
StructField[] {
DataTypes.createStructField("DOORS",
DataTypes.DoubleType, false),
DataTypes.createStructField("BODY",
DataTypes.DoubleType, false),
DataTypes.createStructField("HP",
DataTypes.DoubleType, false),
DataTypes.createStructField("RPM",
DataTypes.DoubleType, false),
DataTypes.createStructField("MPG",
DataTypes.DoubleType, false)
});
JavaRDD<Row> rdd1 = autoDf.toJavaRDD().repartition(2);
//Function to map.
JavaRDD<Row> rdd2 = rdd1.map( new Function<Row,
Row>() {
@Override
public Row call(Row iRow) throws Exception {
double doors = ( iRow.getString(3).equals("two") ? 1.0 : 2.0);
double body = ( iRow.getString(4).equals("sedan") ? 1.0 : 2.0);
Row retRow = RowFactory.create( doors, body,
Double.valueOf(iRow.getString(7)),
Double.valueOf(iRow.getString(8)),
Double.valueOf(iRow.getString(9)) );
return retRow;
}
});
//Create Data Frame back.
Dataset<Row> autoCleansedDf =
spSession.createDataFrame(rdd2, autoSchema);
System.out.println("Transformed Data :");
autoCleansedDf.show(5);
/*-------------------------------------------------------
-------------------
Prepare for Machine Learning - Perform
Centering and Scaling
---------------------------------------------------------
-----------------*/
Row meanRow =
autoCleansedDf.agg(avg(autoCleansedDf.col("DOORS")),
avg(autoCleansedDf.col("BODY")),
avg(autoCleansedDf.col("HP")),
avg(autoCleansedDf.col("RPM")),
avg(autoCleansedDf.col("MPG")))
.toJavaRDD().takeOrdered(1).get(0) ;
Row stdRow =
autoCleansedDf.agg(stddev(autoCleansedDf.col("DOORS")),
stddev(autoCleansedDf.col("BODY")),
stddev(autoCleansedDf.col("HP")),
stddev(autoCleansedDf.col("RPM")),
stddev(autoCleansedDf.col("MPG")))
.toJavaRDD().takeOrdered(1).get(0) ;
System.out.println("Mean Values : " +
meanRow);
System.out.println("Std Dev Values : " +
stdRow);
Broadcast<Row> bcMeanRow =
spContext.broadcast(meanRow);
Broadcast<Row> bcStdRow =
spContext.broadcast(stdRow);
DoubleAccumulator rowId =
spContext.sc().doubleAccumulator();
rowId.setValue(1);
//Perform center-and-scale and create a vector
JavaRDD<Row> rdd3 =
autoCleansedDf.toJavaRDD().repartition(2);
JavaRDD<LabeledPoint> rdd4 = rdd3.map(
new Function<Row, LabeledPoint>() {
@Override
public LabeledPoint call(Row iRow) throws Exception {
double doors = (bcMeanRow.value().getDouble(0) -
iRow.getDouble(0))
/ bcStdRow.value().getDouble(0);
double body =
(bcMeanRow.value().getDouble(1) - iRow.getDouble(1))
/ bcStdRow.value().getDouble(1);
double hp = (bcMeanRow.value().getDouble(2) -
iRow.getDouble(2))
/ bcStdRow.value().getDouble(2);
double rpm = (bcMeanRow.value().getDouble(3) -
iRow.getDouble(3))
/ bcStdRow.value().getDouble(3);
double mpg = (bcMeanRow.value().getDouble(4) -
iRow.getDouble(4))
/ bcStdRow.value().getDouble(4);
double id= rowId.value();
rowId.setValue(rowId.value()+1);
KMeans kmeans = new KMeans()
.setK(4)
.setSeed(1L);
KMeansModel model =
kmeans.fit(autoVector);
Dataset<Row> predictions =
model.transform(autoVector);
System.out.println("Groupings : ");
predictions.show(5);
System.out.println("Groupings Summary : ");
predictions.groupBy(col("prediction")).count().show();
CHAPTER 9 - GENETIC ALGORITHMS USING
JAVA
Getting started
Artificial Intelligence
slightly different areas, such as modeling the human brain. This
field of research is called artificial neural networks, and it uses
models of the biological nervous system to mimic its learning
and data processing capabilities.
Biological Evolution
Terminology
Search Spaces
101 is only 1 difference away from 111. This is because there is only 1
change required (flipping the 0 to 1) to transition from 101 to 111. This
means these solutions are only 1 space apart in the search space.
Fitness Landscapes
simplification of what would be found in practice. Most real
world applications have multiple values that need optimizing
creating a multi-dimensional fitness landscape.
time frame available. In these conditions, genetic algorithms
and evolutionary algorithms in general, are very effective at
finding feasible, near optimum solutions in a relatively short
time frame.
Fig 9.4 Mutated Solution

Local Optimums
off at a random point in the search space, then attempts to find a
better solution by evaluating its neighbor solutions. When the
hill climber finds a better solution amongst its neighbors, it will
move to the new position and restart the search process again.
This process will gradually find improved solutions by taking
steps up whatever hill it found itself on in the search space
hence the name, hill climber. When the hill climber can no
longer find a better solution it will assume it is at the top of the
hill and stop the search.
runs. This optimization method is relatively easy to implement and
surprisingly effective. Other approaches, such as Simulated Annealing (see
Kirkpatrick, Gelatt, and Vecchi (1983)) and Tabu search (see Glover (1989)
and Glover (1990)), are slight variations of the hill climbing algorithm,
both having properties that can help avoid local optimums.
Fig 9.7 Genetic Algorithm Process II
Mutation Rate
genetic algorithm avoid getting stuck in local optimums, but when it's set
too high it can have a negative impact on the search. This, as was said
before, is because the solutions in each generation are mutated to such a
large extent that they're practically randomized after mutation has been
applied.
Population Size
however they require less computational resources per
generation.
Crossover Rate
Genetic Representations
Fig 9.8 Genetic Algorithm Flow
set. Usually this will be because the algorithm has reached a
fixed number of generations or an adequate solution has
been found.
7: evaluatePopulation(population[generation]);
8: generation++;
9: End loop;
checking. After the class has been created, add a constructor
which accepts the four parameters: population size, mutation
rate, crossover rate, and number of elite members.
package test;
/**
* Lots of comments in the source that are omitted here!
*/
public class GeneticAlgorithm {
private int populationSize;
private double mutationRate;
private double crossoverRate;
private int elitismCount;
/**
* Many more methods implemented later...
*/
}
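The constructor mentioned above is not reproduced in the excerpt; a minimal sketch of what it might look like, simply storing the four parameters:

public GeneticAlgorithm(int populationSize, double mutationRate,
        double crossoverRate, int elitismCount) {
    this.populationSize = populationSize;
    this.mutationRate = mutationRate;
    this.crossoverRate = crossoverRate;
    this.elitismCount = elitismCount;
}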
algorithm and provide a starting point for the application. Name
this class AllOnesGA and define a main method:
package test;
public class AllOnesGA {
public static void main(String[] args) {
// Create GA object
GeneticAlgorithm ga = new GeneticAlgorithm(100,
0.01, 0.95, 0);
// We'll add a lot more here...
}
}
For the time being, we'll just use some typical values for
the parameters, population size = 100; mutation rate = 0.01;
crossover rate = 0.95, and an elitism count of 0 (effectively
disabling it for now). After you have completed your
implementation at the end of the chapter, you can experiment
with how changing these parameters affect the performance of
the algorithm.
fitness, or get the fittest individual in the population, for
example.
package test;
constructors. One constructor accepts an integer (representing
the length of the chromosome) and will create a random
chromosome when initializing the object. The other constructor
accepts an integer array and uses that as the chromosome.
package test;
import java.util.Arrays;
import java.util.Comparator;
for (int individualCount = 0; individualCount <
populationSize; individualCount++) {
Individual individual = new
Individual(chromosomeLength);
this.population[individualCount] = individual;
}
}
return this.population[offset];
}
public double getPopulationFitness() {
return this.populationFitness;
}
addition to holding the individuals, it also stores the population's total
fitness, which will become important later on when implementing the
selection method.
/**
* We still have lots of methods to implement down here...
*/
}
public class AllOnesGA {
public static void main(String[] args){
// Create GA object
GeneticAlgorithm ga = new GeneticAlgorithm(100, 0.01,
0.95, 0);
// Initialize population
Population population = ga.initPopulation(50);
}
}
counting the number of ones found within an individual's
chromosome.
public double calcFitness(Individual individual) {
    // Track the number of correct genes (genes set to 1)
    int correctGenes = 0;
    for (int geneIndex = 0; geneIndex < individual.getChromosomeLength(); geneIndex++) {
        if (individual.getGene(geneIndex) == 1) {
            correctGenes += 1;
        }
    }
    // Calculate fitness
    double fitness = (double) correctGenes
            / individual.getChromosomeLength();
// Store fitness
individual.setFitness(fitness);
return fitness;
}
We also need a simple helper method to loop over every
individual in the population and evaluate them (i.e., call
calcFitness on each individual). Let's call this method
evalPopulation and add it to the GeneticAlgorithm class as well.
It should look like the following, and again you may add it
anywhere:
public void evalPopulation(Population population) {
    double populationFitness = 0;
    for (Individual individual : population.getIndividuals())
        populationFitness += calcFitness(individual);
    population.setPopulationFitness(populationFitness);
}
package test;
public double calcFitness(Individual individual) { }
public void evalPopulation(Population population) { }
}
public boolean isTerminationConditionMet(Population
population) {
for (Individual individual : population.getIndividuals()) {
if (individual.getFitness() == 1) {
return true;
}
}
return false;
}
0.95, 0);
Population population = ga.initPopulation(50);
// Apply crossover
// TODO!
// Apply mutation
// TODO!
// Evaluate population
ga.evalPopulation(population);
In addition to the various selection methods that can be
used during crossover, there are also different methods to
exchange the genetic information between two individuals.
Different problems have slightly different properties and work
better with specific crossover methods. For example, the all-
ones problem simply requires a string that consists entirely of
1s. A string of 00111 has the same fitness value as a string of
10101: they both contain three 1s. With other types of genetic
algorithms, this isn't always the case. Imagine we are trying to
create a string which lists, in order, the numbers one to five. In
this case the string 12345 has a very different fitness value
from 52431. This is because we're not just looking for the
correct numbers, but also the correct order.
as this, a crossover method that respects the order of the genes
is preferable.
public Individual selectParent(Population population) {
// Get individuals
Individual individuals[] = population.getIndividuals();
// Spin the roulette wheel
double populationFitness = population.getPopulationFitness();
double rouletteWheelPosition = Math.random() * populationFitness;
// Find parent
double spinWheel = 0;
for (Individual individual : individuals) {
spinWheel += individual.getFitness();
if (spinWheel >= rouletteWheelPosition) {
return individual;
}
}
return individuals[population.size() - 1];
}
Now that the selection method has been added, the next step
is to create the crossover method using this selectParent( )
method to select the crossover mates. To begin, add the
following crossover method to the GeneticAlgorithm class.
offspring.setGene(geneIndex,
parent2.getGene(geneIndex));
}
}
return newPopulation;
}
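Only the tail of that method survives above; the following is a sketch of a complete uniform-crossover implementation along the lines described, assuming the Population and Individual helpers used earlier (including a Population(int) constructor and a setIndividual method, which are assumptions here):

public Population crossoverPopulation(Population population) {
    // Create new population to hold the offspring
    Population newPopulation = new Population(population.size());
    // Loop over current population by fitness
    for (int populationIndex = 0; populationIndex < population.size(); populationIndex++) {
        Individual parent1 = population.getFittest(populationIndex);
        // Apply crossover to this individual?
        if (this.crossoverRate > Math.random() && populationIndex >= this.elitismCount) {
            // Initialize offspring and find a second parent
            Individual offspring = new Individual(parent1.getChromosomeLength());
            Individual parent2 = selectParent(population);
            // Loop over genome, picking each gene from one of the two parents
            for (int geneIndex = 0; geneIndex < parent1.getChromosomeLength(); geneIndex++) {
                if (0.5 > Math.random()) {
                    offspring.setGene(geneIndex, parent1.getGene(geneIndex));
                } else {
                    offspring.setGene(geneIndex, parent2.getGene(geneIndex));
                }
            }
            newPopulation.setIndividual(populationIndex, offspring);
        } else {
            // Add individual to new population without applying crossover
            newPopulation.setIndividual(populationIndex, parent1);
        }
    }
    return newPopulation;
}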
From here we can implement the crossover function into
our main method in the AllOnesGA class. The entire
AllOnesGA class and main method is printed below; however
the only change from before is the addition of the line that calls
crossoverPopulation( ) below the Apply crossover comment.
package test;
public class AllOnesGA {
public static void main(String[] args) {
// Create GA object
GeneticAlgorithm ga = new GeneticAlgorithm(100,
0.001, 0.95, 0);
// Initialize population
Population population = ga.initPopulation(50);
// Evaluate population
ga.evalPopulation(population);
while (ga.isTerminationConditionMet(population) ==
false) {
// Print fittest individual from population
System.out.println("Best solution: " +
population.getFittest(0).toString());
// Apply crossover
population = ga.crossoverPopulation(population);
// Apply mutation
// TODO
// Evaluate population
ga.evalPopulation(population);
This advice seems moot and over-obvious in this chapter,
but consider a different simple problem where you need to order
the numbers one through six without repeating (i.e., end up with
123456). A mutation algorithm that simply chose a random
number between one and six could yield 126456, using 6
twice, which would be an invalid solution because each number
can only be used once. As you can see, even simple problems
sometimes require sophisticated techniques.
if (this.mutationRate > Math.random()) {
// Get new gene
int newGene = 1;
if (individual.getGene(geneIndex) == 1) {
newGene = 0;
}
// Mutate gene
individual.setGene(geneIndex, newGene);
}
}
}
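For completeness, a sketch of how the surrounding mutatePopulation method might look, again assuming the Population and Individual helpers used earlier (including the setIndividual method, which is an assumption here):

public Population mutatePopulation(Population population) {
    // Initialize new population
    Population newPopulation = new Population(this.populationSize);
    // Loop over current population by fitness
    for (int populationIndex = 0; populationIndex < population.size(); populationIndex++) {
        Individual individual = population.getFittest(populationIndex);
        // Loop over the individual's genes
        for (int geneIndex = 0; geneIndex < individual.getChromosomeLength(); geneIndex++) {
            // Skip mutation for elite individuals
            if (populationIndex >= this.elitismCount) {
                // Does this gene need mutation?
                if (this.mutationRate > Math.random()) {
                    // Flip the gene
                    int newGene = (individual.getGene(geneIndex) == 1) ? 0 : 1;
                    individual.setGene(geneIndex, newGene);
                }
            }
        }
        // Add individual to the new population
        newPopulation.setIndividual(populationIndex, individual);
    }
    return newPopulation;
}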
package test;
public class AllOnesGA {
    public static void main(String[] args) {
        // Create GA object (same parameters as before)
        GeneticAlgorithm ga = new GeneticAlgorithm(100, 0.001, 0.95, 0);
        // Initialize population
Population population = ga.initPopulation(50);
// Evaluate population
ga.evalPopulation(population);
while (ga.isTerminationConditionMet(population) ==
false) {
// Print fittest individual from population
System.out.println("Best solution: " +
population.getFittest(0).toString());
// Apply crossover
population = ga.crossoverPopulation(population);
// Apply mutation
population = ga.mutatePopulation(population);
// Evaluate population
ga.evalPopulation(population);
CHAPTER 10 NATURAL LANGUAGE
PROCESSING
Getting Started
Tokenization
Sentence detection
Classification
Extraction
Examples of APIs
Stanford NLP
LingPipe
Gate
In this book, Apache's OpenNLP will be used for the examples.

What is NLP?
Why NLP?
1. Searching
2. Machine Translation
3. Summation
5. Information Grouping
7. Sentiment analysis
8. Answering queries
9. Speech Recognition
Survey of NLP Tools
referring to words, sometimes called tokens. Morphology is the
study of the structure of words. We will use a number of
morphology terms in our exploration of NLP. However, there
are many ways of classifying words including the following:
special character to reflect scientific notation or numbers of a
specific base.
Finding Sentences
by the use of quotes. With more specialized text, such as tweets and chat
sessions, we may need to consider the use of new lines or completion of
clauses. Punctuation ambiguity is best illustrated by the period. It is
frequently used to demark the end of a sentence. However, it can be used
in a number of other contexts as well, including abbreviations, numbers,
e-mail addresses, and ellipses. Other punctuation characters, such as
question and exclamation marks, are also used in embedded quotes and
specialized text such as code that may be in a document. Periods are used
in a number of situations:
To terminate a sentence
To end an abbreviation
To end an abbreviation and terminate a sentence
For ellipses
For ellipses at the end of a sentence
Embedded in quotes or brackets
Most sentences we encounter end with a period. This makes them easy to
identify. However, when they end with an abbreviation, it is a bit more
difficult to identify them.
The String class' split method can be used to split the text into
sentences. A sample snippet is provided below.
{ System.out.println(string); }
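Only the print statement of that snippet survives above; a minimal sketch of the approach (the sample paragraph text is an assumption, and this simplistic split drops the punctuation and fails on abbreviations, which is exactly the limitation discussed above):

String paragraph = "This is one sentence. This is another sentence! Is this a third?";
String[] sentences = paragraph.split("[.!?]\\s*");
for (String string : sentences) {
    System.out.println(string);
}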
[^.!?\s][^.!?]*(?:[.!?](?!['"]?\s|$)[^.!?]*)*[.!?]?['"]?(?=\s|$)
+ "[^.!?]* # Greedily consume up to punctuation.\n" + "(?: #
Group for unrolling the loop.\n"
Techniques for name recognition
Performance measure:
When working with regular expressions, it is advantageous to
avoid reinventing the wheel. There are many sources for
predefined and tested expressions. One such library can be
found at http://regexlib.com/Default.aspx. We will use several
of the regular expressions in this library for our examples. To
test how well these approaches work, we will use the following
text for most of our examples:
while (matcher.find()) {
    System.out.println(matcher.group() + " [" + matcher.start() + ":" + matcher.end() + "]");
}
The find method will return true when a match occurs. Its group
method returns the text that matches the expression. Its start and
end methods give us the position of the matched
text in the target text. When executed, we will get the following
output:
URL: \\b(https?|ftp|file|ldap)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[A-Za-z0-9+&@#/%=~_|]
    http://example.com [256:274]
E-mail: [a-zA-Z0-9'._%+-]+@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,4}
    rgb@colorworks.com [27:45]
Time: (([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?
    8:00 [217:221]
    4:30 [229:233]
Date: ((0?[13578]|10|12)(-|\\/)(([1-9])|(0[1-9])|([12])([0-9]?)|(3[01]?))(-|\\/)((19)([2-9])(\\d{1})|(20)([01])(\\d{1})|([8901])(\\d{1}))|(0?[2469]|11)(-|\\/)(([1-9])|(0[1-9])|([12])([0-9]?)|(3[0]?))(-|\\/)((19)([2-9])(\\d{1})|(20)([01])(\\d{1})|([8901])(\\d{1})))
    2/25/1954 [315:324]
regularExpressionText =
888-555-2222 [27:39]
CHAPTER 11 PERFORMANCE ENHANCEMENTS
IN JAVA
Getting started
Similar to CMS GC, there is a fail-safe to collect and
compact the entire old generation in dire situations such as
when old generation space is exhausted.
Also, at the end of the remark phase, G1 can identify an optimal set of
old regions to collect.
regions are the same size, and their size does not change during
execution of the JVM. The region size calculation is based on
the average of the initial and maximum Java heap sizes such
that there are about 2000 regions for that average heap size. As
an example, for a 16GB Java heap with -Xmx16g -
Xms16g command-line options, G1 will choose a region size of
16GB/2000 = 8MB.
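For illustration, the region size can also be set explicitly with the -XX:G1HeapRegionSize option; the command below is only an example (the application jar name is hypothetical):

java -XX:+UseG1GC -Xms16g -Xmx16g -XX:G1HeapRegionSize=8m -jar MyApp.jar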
If the initial and maximum Java heap sizes are far apart or if
the heap size is very large, it is possible to have many more than
2000 regions. Similarly, a small heap size may end up with
many fewer than 2000 regions.
generation and contain objects whose size is 50 percent or more
of a region. Until a JDK 8u40 change, humongous regions were
collected as part of the old generation, but in JDK 8u40 certain
humongous regions are collected as part of a young collection.
There is more detail on humongous regions later in this chapter.
The fact that a region can be used for any purpose means
that there is no need to partition the heap into contiguous young
and old generation segments. Instead, G1 heuristics estimate
how many regions the young generation can consist of and still
be collected within a given GC pause time target. As the
application starts allocating objects, G1 chooses an available
region, designates it as an eden region, and starts handing out
memory chunks from it to Java threads. Once the region is full,
another unused region is designated an eden region. The process
continues until the maximum number of eden regions is
reached, at which point a young GC is initiated.
regions are added to the available region set. Old regions
containing live objects are scheduled to be included in a future
mixed collection.
Humongous Objects
this happens, all the regions containing the humongous object
can be reclaimed at once.
Concurrent Cycle
The purpose of the initial-mark phase is to gather all GC
roots. Roots are the starting points of the object graphs. To
collect root references from application threads, the application
threads must be stopped; thus the initial-mark phase is stop-the-
world. In G1, the initial marking is done as part of a young GC
pause since a young GC must gather all roots anyway.
The marking phases must be completed in order to find out
what objects are live so as to make informed decisions about
what regions to include in the mixed GCs. Since it is the mixed
GCs that are the primary mechanism for freeing up memory in
G1, it is important that the marking phase finishes before G1
runs out of available regions. If the marking phase does not
finish prior to running out of available regions, G1 will fall back
to a full GC to free up memory. This is reliable but slow.
Heap Sizing
more aggressive in their decision to increase Java heap size and
by default are targeted to spend less time in GC relative to the
time spent executing the application.