Future Generation Computer Systems 83 (2018) 14–26

journal homepage: www.elsevier.com/locate/fgcs

A GSA based hybrid algorithm for bi-objective workflow scheduling in cloud computing

Anubhav Choudhary, Indrajeet Gupta, Vishakha Singh, Prasanta K. Jana * ,1


Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India

* Corresponding author. E-mail addresses: anubhav.choudhary@live.com (A. Choudhary), indrajeet7830@gmail.com (I. Gupta), vs.make.a.vish@gmail.com (V. Singh), prasantajana@yahoo.com (P.K. Jana).
1 IEEE Senior Member.

Highlights

Proposed an efficient hybrid scheme of GSA and HEFT, called HGSA, for workflow scheduling.

Systematic derivation of a fitness function based on makespan and cost.

A novel and proficient strategy for eliminating inferior agents.

Demonstration of better performance through simulation results and the ANOVA statistical test.

Article info

Article history:

Received 28 February 2017; Received in revised form 12 October 2017; Accepted 3 January 2018; Available online 8 January 2018

Keywords:

Gravitational Search Algorithm; Workflow scheduling; Cost; Makespan; Cost time equivalence

Abstract

Workflow scheduling in cloud computing has drawn enormous attention due to its wide applications in both scientific and business areas. It is an NP-complete problem, and therefore many researchers have proposed heuristic as well as meta-heuristic techniques that consider several issues, such as energy conservation, cost and makespan. However, it is still an open area of research, as most heuristics and meta-heuristics do not fulfill every optimality criterion and produce only near-optimal solutions. In this paper, we propose a meta-heuristic based algorithm for workflow scheduling that considers the minimization of both makespan and cost. The proposed algorithm is a hybridization of the popular meta-heuristic Gravitational Search Algorithm (GSA) and the equally popular heuristic Heterogeneous Earliest Finish Time (HEFT) for scheduling workflow applications. We introduce a new factor called cost time equivalence to make the bi-objective optimization more realistic. We consider the monetary cost ratio (MCR) and the schedule length ratio (SLR) as the performance metrics for comparing the proposed algorithm with existing algorithms. Through rigorous experiments over different scientific workflows, we show the effectiveness of the proposed algorithm over the standard GSA, the Hybrid Genetic Algorithm (HGA) and HEFT. We validate the results with the well-known statistical test Analysis of Variance (ANOVA). In all the cases, the simulation results show that the proposed approach outperforms these algorithms.

1. Introduction

Workflows have wide applications in business as well as in scientific areas such as astronomy, weather forecasting, medicine and bio-informatics. Generally, these workflows are vast in size, as they consist of a large number of independent and/or dependent tasks, and thus they demand a huge infrastructure for their computation, communication, and storage. Clouds [1] provide such an infrastructure for executing workflows on virtualized resources which are provisioned dynamically. However, the allocation of the resources and the order in which the tasks of a given workflow are executed are matters of great importance. This is commonly referred to as the workflow scheduling problem. In fact, workflow scheduling is an NP-complete problem which has been extensively studied for other paradigms, such as grid and cluster computing. It is noteworthy that if there are n tasks in a workflow and m available virtual machines (VMs), then there exist m^n different ways in which the tasks can be mapped to the VM pool. For large values of n and m, finding an optimal solution by a brute-force approach is computationally very expensive. Therefore, a meta-heuristic approach can be very effective for solving this problem. However, every meta-heuristic algorithm has its own merits and demerits. Hybridization of such meta-heuristic approaches has been shown to produce better results [2,3], and therefore this has become a recent trend of research in cloud computing.

Heterogeneous Earliest Finish Time (HEFT) [4] is an efficient heuristic proposed for task scheduling in heterogeneous multiprocessors which is also used in cloud computing [5–7]. This algorithm maps each task, arranged in a priority order, to the VM for which the earliest finish time is minimum. It should be noted that it is essentially a single-objective algorithm which can only optimize the makespan.

The Gravitational Search Algorithm (GSA) [8] is a popular meta-heuristic approach which utilizes the concept of the law of gravitation to find a near-optimal solution. The algorithm starts with a set of random particles, where each particle represents a solution, and the mass of each particle is calculated using a fitness function based on the application. A particle with a higher fitness value has a higher mass and, hence, exerts more force to attract other particles towards it. Eventually, all the particles converge towards an optimal point. GSA is capable of reaching a global optimum faster than other meta-heuristic algorithms and, hence, has a higher convergence rate. Moreover, it provides better results than Central Force Optimization (CFO) and Particle Swarm Optimization (PSO), as demonstrated in [8].

In this paper, we propose a meta-heuristic based algorithm for the workflow scheduling problem which is a hybrid of HEFT and GSA. Specifically, we address the following workflow scheduling problem. Given a workflow consisting of a set of tasks T = {t_1, t_2, ..., t_n} with their computational load and precedence constraints, and also given a set of VMs, V = {v_1, v_2, ..., v_m}, our objective is to map all the tasks to the available VMs so that the entire workflow can be executed in minimum time and at minimum computational cost. The proposed algorithm is presented with an efficient agent representation and a systematic derivation of the fitness function. The algorithm is extensively simulated using scientific workflows of different sizes and is shown to produce better results than other related algorithms such as the Hybrid Genetic Algorithm (HGA), GSA and HEFT. We use ANOVA [9], a statistical test, to validate the simulation results. This test determines whether a given result set differs statistically significantly from other sets of results.

Many algorithms based on meta-heuristic approaches have been proposed for workflow scheduling in cloud computing. For instance, Rodriguez et al. [10] have proposed a PSO based algorithm with the objective of minimizing the execution cost while meeting deadline constraints. Similarly, an HGA has been presented in [3] that also has a single objective, i.e., minimizing the makespan. Many other meta-heuristic approaches have been developed for workflow scheduling, a survey of which can be found in [11]. However, our approach is different from all such approaches and has the following novelty. We consider parameters such as communication bandwidth, the output data size of each task, VM boot time, VM shutdown time and the performance variability of VMs in order to create a more realistic environment for scheduling. Most of these features are absent in the existing works. Moreover, our approach deals with two objectives, in contrast to the single objective of many existing algorithms. The hybridization of GSA with HEFT is also novel in the sense that the benefits of both algorithms are fully exploited. Our contribution can be summarized as follows.

A hybrid algorithm based on GSA and HEFT to minimize makespan and total computational cost.

An efficient agent representation and a systematic derivation of the fitness function.

Introduction of cost time equivalence and a procedure for eradicating the inferior agents.

Demonstration of better performance of the algorithm through extensive simulation and comparison with other heuristic/meta-heuristic based approaches.

Validation of the performance through the statistical test ANOVA.

The rest of the paper is organized as follows. Related works are stated in Section 2. Section 3 explains the application and cloud model. Section 4 describes the terminologies used in the paper and the problem statement. Section 5 presents the proposed work with an illustration. Performance metrics, experimental results, and comparison are discussed in Section 6 followed by Section 7 which concludes the paper.

2. Related works

Many heuristic and meta-heuristic based algorithms have been proposed for workflow scheduling in cloud computing. In this section, we present a short review of some of the works that are relevant to our proposed scheme.

HEFT [4] is a popular heuristic which was initially developed for task scheduling in heterogeneous multiprocessor systems. It is well known that HEFT performs better than many other heuristics, such as [12,13], for task scheduling. However, it considers the minimization of makespan only. An extension of HEFT called the Pareto Optimal Scheduling Heuristic (POSH) was proposed by Su et al. [5] for workflow scheduling in the cloud, to minimize the makespan and the cost of execution. POSH produces acceptable solutions. Nevertheless, the solutions are derived from a constricted search space and thus it may miss better ones. An energy-efficient scheduling scheme with a deadline constraint for heterogeneous cloud environments was proposed in [14]. In this work, a new VM scheduler is developed which is shown to reduce energy consumption in the execution of workflows. The authors claimed to achieve up to a 20% reduction in energy requirement and an 8% improvement in processing capacity. Fard et al. [15] proposed another heuristic called multi-objective list scheduling (MOLS), which provides a general framework for multi-objective static workflow scheduling. It supports four objectives, namely makespan, cost, reliability, and energy, and provides an execution plan based on the selected objectives. Abrishami et al. [16] adopted the Partial Critical Path (PCP) approach for workflow scheduling and designed two algorithms: a one-phase algorithm called IaaS Cloud Partial Critical Paths (IC-PCP) and a two-phase algorithm called IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2). Here, a homogeneous cloud environment is assumed. Recently, Casas et al. [17] have proposed the balanced and file reuse-replication scheduling (BaRRS) algorithm to schedule workflows based on two optimization constraints, i.e., makespan and cost. They have also focused on finding the optimal number of VMs required for a given workflow. However, it has a large computational overhead. Panda et al. [18] have developed a normalization based task scheduling scheme for a heterogeneous multi-cloud environment. This technique provides a way to schedule tasks over multiple cloud providers. In another work [19], they have proposed a modification of the min–min algorithm with an uncertainty parameter for scheduling tasks in a heterogeneous multi-cloud environment. Gupta et al. [20] have also reported a workflow scheduling algorithm for the multi-cloud environment. However, this work focuses more on compute-intensive workflows.

Meta-heuristics are well-known techniques for obtaining near-optimal solutions. For instance, Pandey et al. [21] proposed a PSO based workflow scheduling algorithm for cost optimization. It is designed to consider computational cost and data transmission cost and to provide an execution plan such that the overall cost is minimized. However, this approach has been tested only on limited workflow applications. Jena et al. [22] proposed a multi-objective nested Particle Swarm Optimization (TSPSO) algorithm for workflow scheduling to optimize energy as well as processing time.


Fig. 1. An example of a workflow: nodes represent the tasks and edges represent precedence relations. The numeric value 7 inside node t_2 is the computational load of task t_2 and the edge label 11 is the size of the data generated by t_2.

However, a crucial scheduling factor, i.e., the bandwidth between the VMs, is not considered in this work. Moreover, no experiments were reported on large-scale workflows. Many GA based solutions have also been proposed. Garg et al. [23] proposed a hybrid GA driven by linear programming (LP) for cost optimization in grid computing. This work combines the capabilities of both LP and GA to find a schedule that minimizes the combined cost of all users of the grid. Wang et al. [24] proposed a look-ahead genetic algorithm (LAGA) to optimize both reliability and makespan. But this work focused only on compute-intensive workflows and did not consider the communication time, which is a vital factor for scheduling workflows.

3. Models

This section contains a detailed description of the workflow application model and the cloud server model assumed for the development of the proposed algorithm. The important terminologies used throughout the paper are also described in this section.

3.1. Workflow application model

The workflow application is represented by a Directed Acyclic Graph (DAG) [25,26], G = (T, E), as shown in Fig. 1, where T = {t_1, t_2, ..., t_n} is the set of tasks and E is the set of edges. An edge (t_i, t_j) indicates the precedence relation between the predecessor t_i and the successor t_j. Thus, task t_j cannot start unless task t_i is complete. Each task t_i is labeled with its computational load in million instructions (MI). Also, the label on each edge (t_i, t_j) indicates the size of the output data generated by t_i. This data is required to start the execution of task t_j. The task without any predecessor is termed the entry task (t_entry) and the task without any successor is termed the exit task (t_exit). If there is more than one entry task in the workflow, a new pseudo entry task is created with zero computational load and no output data, and all the entry tasks are connected to this pseudo task. Similarly, a pseudo exit task can be created, if required.

3.2. Cloud server model

We assume a cloud server which contains a set of m VMs, represented by V = {v_1, v_2, v_3, ..., v_m}. Each VM has its own computational power measured in million instructions per second (MIPS). All VMs are fully connected to each other and may reside in one or more physical cloud servers. The time required to transfer the output data from task t_i to t_j is described as the communication overhead time. Note that the communication overhead time is the ratio of the output data size of task t_i to the bandwidth between the VMs. If both t_i and t_j execute on the same VM, then the communication overhead is assumed to be zero.

4. Terminologies and problem statement

4.1. Constraints and assumptions

The notations used in the proposed work are given in Table 1. The proposed algorithm considers the following constraints and assumptions, similar to the work presented in [10].

1. We consider the performance variance of the VMs to calculate the effective CPU cycles available for the execution of the tasks. This variability is due to the heterogeneity and the shared nature of the underlying hardware infrastructure. Based on a survey [27], the overall performance variability of Amazon's EC2 cloud is found to be 24%. For a VM v_j, the performance variance is represented as deg_{v_j}. Thus, using deg_{v_j}, the execution time of task t_i on VM v_j can be written as (a code sketch of this computation follows the list)

ET_{t_i}^{v_j} = \frac{Load(t_i)}{Capacity(v_j) \times (1 - deg_{v_j})} \quad (1)

where ET_{t_i}^{v_j} is the execution time of task t_i on VM v_j, Load(t_i) is the computational load of task t_i and Capacity(v_j) is the computational capacity of VM v_j.

2. The unit chargeable time τ is considered to calculate the cost of execution. If a leased VM is utilized for a time less than τ, it is still charged for the full time period.

3. An initial boot time is always required when a VM is leased, so we consider the VM boot time in the calculation of the makespan. We also consider the VM shutdown time, as it is required to release the provisioned VM.

4. Each VM is assumed to be connected with roughly the same bandwidth.
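For concreteness, the following minimal C++ sketch evaluates Eq. (1); the type and function names (Task, VmSpec, executionTime) are illustrative assumptions of ours and not part of the proposed algorithm.

#include <iostream>

// Hypothetical types; the paper only defines the quantities of Eq. (1).
struct Task   { double loadMI; };        // computational load in MI
struct VmSpec { double capacityMips;     // nominal capacity in MIPS
                double degradation; };   // performance variance, e.g. 0.24

// Eq. (1): ET = Load(t_i) / (Capacity(v_j) * (1 - deg_{v_j}))
double executionTime(const Task& t, const VmSpec& v) {
    return t.loadMI / (v.capacityMips * (1.0 - v.degradation));
}

int main() {
    Task t{7.0};          // e.g. a task with a load of 7 MI, as in Fig. 1
    VmSpec v{3.5, 0.24};  // a 3.5 MIPS VM with 24% performance degradation
    std::cout << executionTime(t, v) << " sec\n";  // prints ~2.63
}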


4.2. Problem formulation

For the sake of the bi-objective problem formulation, we first describe the two important parameters, i.e., makespan and cost, as follows.

Makespan Calculation:

The makespan is also referred to as the total execution time of the entire workflow. It includes the boot time of the leased VMs, the execution times, the data transfer times between VMs and the VM shutdown time. While computing the makespan, it is assumed that a VM cannot execute a task while data is being transferred from it to another VM. The makespan is equal to the sum of the VM boot time, the maximum of VM-time over all VMs and the VM shutdown time. VM-time[i] denotes the last timestamp (starting from zero for each workflow) up to which the VM v_i executes its assigned tasks. We need to add the VM boot time and the VM shutdown time only once, because only the boot time of the first VM and the shutdown time of the last VM contribute to the makespan; the rest are overlapped with other events.

Table 1
Notations and definitions.

Notation     Definition
N            Population size.
n            Number of tasks in a given workflow.
t_i          The ith task.
m            Number of available VMs.
v_j          The jth VM.
X_i          The ith agent.
X_best       The best agent known so far.
fit_i        Fitness value of the ith agent based on its makespan and cost.
M_i          Mass of the ith agent.
α            Determines the weights of makespan and cost in the fitness calculation.
β            The cost makespan equivalence factor. It is a part of the SLA and its value depends on the priority and urgency of the application.
γ            A small constant which regulates the declination of the gravitational constant.
δ            Threshold mass for replacing the inferior agents.
σ            A random variable used in the pricing model.
V_cbase      Base price based on the slowest VM.
deg_{v_j}    Performance degradation of VM v_j.

Therefore, the makespan can be mathematically formulated as follows:

\text{Makespan} = \text{VM-boot-time} + \max_{i=1}^{m} \big(\text{VM-time}[i]\big) + \text{VM-shutdown-time} \quad (2)

Cost Calculation:

In our assumed model, a cloud server consists of VMs with varying computational capacities for different types of workload. ET_{t_i}^{v_j} is the execution time of task t_i on VM v_j, as defined in Eq. (1). Let τ be the unit chargeable time for which the charge of execution of any task is accounted. Let V_cbase be the base price charged for the slowest VM. Then, as per the exponential pricing model [5], the cost of execution of task t_i on VM v_j, denoted cost(t_i, v_j), is formulated as follows:

cost(t_i, v_j) = \sigma \times \left\lceil \frac{ET_{t_i}^{v_j}}{\tau} \right\rceil \times V_{cbase} \times \exp\!\left(\frac{\text{CPU cycles of } v_j}{\text{slowest CPU cycle}}\right) \quad (3)

where σ is a random variable used to generate different combinations of VM pricing and capacity. Let B_{i,j} be a boolean variable such that

B_{i,j} = \begin{cases} 1, & \text{if task } t_i \text{ is assigned to } v_j \\ 0, & \text{otherwise} \end{cases} \quad (4)

Therefore, the total cost of execution for a workflow is defined as

\text{Total Cost} = \sum_{i=1}^{n} \sum_{j=1}^{m} B_{i,j} \times cost(t_i, v_j) \quad (5)
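As an illustration of Eqs. (3)–(5), the following C++ sketch accumulates the total cost of a mapping under the exponential pricing model. The constants kSigma, kTau and kBase are hypothetical values of σ, τ and V_cbase chosen only for the example, and the ceiling reflects assumption 2 of Section 4.1 (a partially used unit is charged in full).

#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Illustrative parameter values (not from the paper).
const double kSigma = 0.5;   // pricing randomness sigma
const double kTau   = 1.0;   // unit chargeable time tau, in seconds
const double kBase  = 0.1;   // base price of the slowest VM, V_cbase

// Eq. (3): cost of a task with execution time 'et' on a VM of capacity
// 'mips', given the slowest capacity in the pool.
double taskCost(double et, double mips, double slowestMips) {
    double unitsUsed = std::ceil(et / kTau);   // charged per full tau
    double rate = kSigma * kBase * std::exp(mips / slowestMips);
    return unitsUsed * rate;
}

// Eqs. (4)-(5): total cost of a mapping M, where M[i] is the VM of task i.
double totalCost(const std::vector<double>& et,    // et[i]: exec time of task i on M[i]
                 const std::vector<double>& mips,  // capacity of each VM
                 const std::vector<int>& M, double slowestMips) {
    double cost = 0.0;
    for (std::size_t i = 0; i < M.size(); ++i)
        cost += taskCost(et[i], mips[M[i]], slowestMips);
    return cost;
}

int main() {
    std::vector<double> mips{2.0, 3.5, 4.5, 5.5};  // VM pool as in Table 3
    std::vector<int> M{0, 2, 1};                   // toy mapping of 3 tasks
    std::vector<double> et{2.5, 1.1, 3.4};         // assumed execution times
    std::cout << totalCost(et, mips, M, 2.0) << "\n";
}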

However, the values of makespan and cost may have different scales, so the value of one attribute may overwhelm the value of the other in agent evaluation. Thus, it is not valid to perform a linear combination directly on the actual values. One intuitive approach is to normalize both makespan and cost in order to scale these values into the same range. Normalization can be done using any of the well-known methods such as min–max normalization, Z-score normalization, etc. However, this solution has a major drawback: if the global minimum and maximum values of makespan and cost change, the relative rank of agents may vary between two consecutive iterations. To resolve this issue, we use the makespan equivalent of the total cost (ME_cost), calculated using Eq. (6), instead of the total cost:

ME_{cost} = \beta \times \text{Total Cost} \quad (6)

Problem Statement:

The objective of the proposed work is to minimize the makespan and the makespan equivalent of the total cost as given in Eqs. (2) and (6), respectively. Therefore, it is wise to minimize their linear combination. The workflow scheduling problem can be formulated as follows:

\text{Minimize } z = \alpha \times \text{Makespan} + (1 - \alpha) \times ME_{cost}

subject to

(i) \sum_{j=1}^{m} B_{i,j} = 1, \quad i = 1, 2, 3, \ldots, n
(ii) 0 \le \alpha \le 1 \quad (7)

Constraint (i) indicates that any task of the workflow can be assigned to one and only one VM, and constraint (ii) limits the range of α, which balances makespan and total cost.
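As a worked instance of Eq. (7), consider the HEFT schedule of the illustration in Section 5.8 (makespan 36.57 s, total cost $50.31) with the parameter values α = 0.5 and β = 1 of Table 3:

\begin{align*}
ME_{cost} &= \beta \times \text{Total Cost} = 1 \times 50.31 = 50.31,\\
z &= \alpha \times \text{Makespan} + (1-\alpha) \times ME_{cost}
   = 0.5 \times 36.57 + 0.5 \times 50.31 = 43.44.
\end{align*}

A schedule with a smaller z, such as the final schedule of Table 7 (32.64 s, $47.71, giving z = 40.18), is therefore preferred.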

5. Proposed work

As our algorithm is a hybrid of GSA and HEFT, we first provide a brief description of both of these algorithms.

5.1. Overview of heterogeneous earliest finish time (HEFT)

The main idea behind HEFT [4] is to schedule the tasks in such a way that the earliest finish time (EFT) is minimized for all the tasks. HEFT is executed in two phases, described as follows.

Phase 1: Calculating the priority of tasks. In this phase, the priority of each task is calculated using the average execution time and the average communication time. The priorities are calculated in a bottom-up manner. The sequence of tasks is generated from higher to lower priority value, satisfying the precedence constraints of the given workflow. The priority of task t_i is given by

pri(t_i) = \overline{w}_i + \max_{t_j \in succ(t_i)} \left( \overline{c}_{i,j} + pri(t_j) \right) \quad (8)

where \overline{w}_i is the average execution time of task t_i over the available VMs and \overline{c}_{i,j} is the average communication time between task t_i and task t_j.

Phase 2: Mapping tasks to VMs. This is the main phase of the algorithm, where the actual mapping of tasks to VMs is performed according to the priority of the tasks. The task with the highest priority is scheduled first, by calculating the earliest start time (EST) and the EFT on all available VMs.
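A minimal C++ sketch of the Phase 1 priority computation of Eq. (8) is given below, assuming the averages \overline{w} and \overline{c} have already been computed; the memoized recursion and all names are ours, not the paper's.

#include <algorithm>
#include <vector>

// Upward-rank priority of Eq. (8). wBar[i] is the average execution time
// of task i; cBar[i][j] the average communication time on edge (i, j);
// 'succ' is the DAG adjacency list; memo caches results (-1 = unknown).
double priority(int i,
                const std::vector<double>& wBar,
                const std::vector<std::vector<double>>& cBar,
                const std::vector<std::vector<int>>& succ,
                std::vector<double>& memo) {
    if (memo[i] >= 0.0) return memo[i];
    double best = 0.0;                       // an exit task has no successors
    for (int j : succ[i])                    // Eq. (8): max over successors
        best = std::max(best, cBar[i][j] + priority(j, wBar, cBar, succ, memo));
    return memo[i] = wBar[i] + best;
}

int main() {
    // Tiny 3-task chain t0 -> t1 -> t2.
    std::vector<double> wBar{2.0, 3.0, 1.0};
    std::vector<std::vector<double>> cBar{{0, 1, 0}, {0, 0, 2}, {0, 0, 0}};
    std::vector<std::vector<int>> succ{{1}, {2}, {}};
    std::vector<double> memo(3, -1.0);
    double p0 = priority(0, wBar, cBar, succ, memo);  // = 2 + (1 + (3 + (2 + 1))) = 9
    (void)p0;
}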


5.2. Overview of gravitational search algorithm (GSA)

The GSA, introduced by Rashedi et al. [8], is based on the law of gravity [28]. It is a population-based search algorithm where each agent is considered as a particle (so we use agent and particle interchangeably throughout the paper) and its fitness value is considered as its mass. Each particle represents a solution of the given problem. The main idea is that the heavier particles, i.e., the superior solutions, do not move much as compared to the lighter particles, i.e., the inferior solutions. All particles apply force on every other particle. As each particle has some mass, its acceleration and velocity can be calculated using the net force, and with the calculated velocity the new position of the particle can be found. When the algorithm terminates, the particle with the highest mass provides the near-optimal solution. The working of the GSA is depicted in Fig. 2.

To use this algorithm for scheduling, we first identify the search space and the particle representation. Then we initialize the population randomly. In each iteration, we calculate the fitness value of all particles using the fitness function defined as per the optimization constraints. Based on the best and worst particles identified, we calculate the masses to update the position of each particle. The gravitational constant, which is used to calculate the velocity and the position, is also updated in each iteration. We repeat all the steps until the algorithm attains a certain termination criterion.

Fig. 2. Working of GSA.

5.3. Agent representation

In the proposed algorithm, a mapping of the tasks of a given workflow to the VMs is considered as an agent. For a workflow having n tasks and a cloud server having m VMs, the ith agent can be represented as a vector of dimension 1 × n, i.e.,

X_i = [x_i^1, x_i^2, x_i^3, \ldots, x_i^n] \quad (9)

where x_i^d represents the VM assigned to task t_d. Note that x_i^d is an integer which lies in the interval [1, m]. Table 2 shows an example of an agent for 8 tasks on 5 VMs. Here, task t_1 is mapped to VM v_1, i.e., task t_1 will be executed on VM v_1, while preserving the precedence constraints.

Table 2
Example of a mapping / agent.

Task    t_1  t_2  t_3  t_4  t_5  t_6  t_7  t_8
VM      1    3    2    1    5    1    3    1
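The following C++ sketch shows one possible realization of this encoding and of the random initialization used in step 1 of the algorithm (Section 5.7); the names Agent and randomAgent are ours.

#include <random>
#include <vector>

// Eq. (9): agent[d] is the VM index assigned to task d.
using Agent = std::vector<int>;

// A randomly generated agent for n tasks and m VMs, with VM indices
// drawn from the paper's interval [1, m].
Agent randomAgent(int n, int m, std::mt19937& rng) {
    std::uniform_int_distribution<int> vm(1, m);
    Agent a(n);
    for (int& x : a) x = vm(rng);
    return a;
}

int main() {
    std::mt19937 rng(42);
    Agent a = randomAgent(8, 5, rng);  // e.g. the 8-task, 5-VM case of Table 2
    (void)a;
}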

5.4. Fitness evaluation

For a given agent, we can always calculate its makespan and its makespan equivalent of total cost using Eqs. (2) and (6), respectively. Let Makespan_i and ME^i_{cost} be the makespan and the makespan equivalent of total cost of the ith agent in the population. The fitness value of the ith agent can then be computed as

\text{Fitness}(fit_i) = \frac{1}{1 + \alpha \times \text{Makespan}_i + (1 - \alpha) \times ME^{i}_{cost}} \quad (10)

The fitness value calculated using Eq. (10) is an absolute fitness value. As its calculation does not require any information about the population, the relative difference between the fitness values of two agents remains the same irrespective of the iteration in which they are present. Agents with lower makespan and lower cost have higher fitness values and can be considered as potential candidates for the final solution, instead of agents with higher makespan or higher cost.


Remark. The workflow scheduling problem is to minimize the linear combination of the makespan and the makespan equivalent of total cost, as described in Section 4.2. As our fitness value is the reciprocal of this quantity, a higher fitness value is desirable.

In the proposed algorithm, we apply min–max normalization in order to scale the fitness values of all the agents in the population into the range [0, 1] and thereby obtain the mass of each agent. The mass of the ith agent is thus given by

\text{Mass}(M_i) = \frac{fit_i - \min_{j=1 \ldots N} (fit_j)}{\max_{j=1 \ldots N} (fit_j) - \min_{j=1 \ldots N} (fit_j)} \quad (11)

We use Eq. (11) as the fitness measure for our proposed work in the simulations.
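A compact C++ sketch of Eqs. (10) and (11) follows; the function names fitness and masses are ours, and the sample values are the makespan/cost pairs of Table 6.

#include <algorithm>
#include <cstddef>
#include <vector>

// Eq. (10): absolute fitness from makespan and ME_cost.
double fitness(double makespan, double meCost, double alpha) {
    return 1.0 / (1.0 + alpha * makespan + (1.0 - alpha) * meCost);
}

// Eq. (11): min-max normalization of the fitness values into masses in [0, 1].
std::vector<double> masses(const std::vector<double>& fit) {
    double lo = *std::min_element(fit.begin(), fit.end());
    double hi = *std::max_element(fit.begin(), fit.end());
    std::vector<double> m(fit.size());
    for (std::size_t i = 0; i < fit.size(); ++i)
        m[i] = (hi > lo) ? (fit[i] - lo) / (hi - lo)
                         : 1.0;  // our own choice for the degenerate case
    return m;
}

int main() {
    // Iterations 1 and 4 of Table 6 with alpha = 0.5, beta = 1.
    std::vector<double> fit{fitness(36.575226, 50.313644, 0.5),
                            fitness(32.642235, 47.711269, 0.5)};
    std::vector<double> m = masses(fit);  // best agent gets mass 1, worst 0
    (void)m;
}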

5.5. Proposed scheduling technique

The proposed HGSA algorithm works in two phases. The first phase focuses only on the optimization of the makespan. In the second phase, it attempts to optimize the cost while trying to maximize the fitness value, which is calculated from both makespan and cost. The result of the first phase guides the particle movement in the GSA of the second phase. This improves the result as compared to a GSA with random initial particles. We use GSA incorporating HEFT with the following steps; a code sketch of Steps 2 and 3 follows the list.

Step 1: The initial population is seeded with the output of the HEFT algorithm. The HEFT heuristic provides guidance to the GSA, which improves the overall performance of the proposed algorithm and helps to generate better solutions in a smaller number of iterations.

Step 2: The best particle identified in the current generation based on the fitness function is preserved. This is done to ensure that the best agent does not get degraded in a future generation.

Step 3: The agents having mass less than the threshold mass (δ) are removed from the current population, as they have very little or no contribution to updating the population. In place of all the removed agents, new agents generated with the help of the best agent identified so far are added to the population. This improves the overall fitness of the population.
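The C++ sketch below illustrates Steps 2 and 3 under our own naming; the paper specifies the replacement rule (copy the best agent and re-map one randomly chosen task), while the surrounding data layout is assumed.

#include <cstddef>
#include <random>
#include <vector>

using Agent = std::vector<int>;  // task -> VM mapping, as in Eq. (9)

// Step 3: any agent whose mass falls below delta is replaced by a copy of
// the preserved best agent (Step 2) with one random task remapped.
void replaceInferior(std::vector<Agent>& pop, const std::vector<double>& mass,
                     const Agent& best, double delta, int m, std::mt19937& rng) {
    std::uniform_int_distribution<int> pos(0, static_cast<int>(best.size()) - 1);
    std::uniform_int_distribution<int> vm(1, m);  // paper's interval [1, m]
    for (std::size_t i = 0; i < pop.size(); ++i) {
        if (mass[i] < delta) {
            pop[i] = best;                // inherit the best mapping
            pop[i][pos(rng)] = vm(rng);   // perturb a single dimension
        }
    }
}

int main() {
    std::mt19937 rng(7);
    Agent best{1, 3, 2, 1, 5, 1, 3, 1};  // the agent of Table 2
    std::vector<Agent> pop(4, Agent(8, 1));
    std::vector<double> mass{0.9, 0.05, 0.5, 0.02};
    replaceInferior(pop, mass, best, /*delta=*/0.1, /*m=*/5, rng);
}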

5.6. Position update of particle

Let us consider a system of N agents. We define the position of the ith agent as

X_i = [x_i^1, x_i^2, x_i^3, \ldots, x_i^n] \quad \text{for } i = 1, 2, 3, \ldots, N \quad (12)

where x_i^d is the position of the ith agent in the dth dimension. Let M_i(k) and G(k) be the mass of the ith agent and the gravitational constant, respectively, in the kth iteration. The force acting on the ith agent due to the jth agent in the dth dimension in the kth iteration is defined as

F_{i,j}^{d}(k) = G(k) \times \frac{M_i(k) \times M_j(k)}{R_{i,j}(k) + \epsilon} \times \left( x_j^d(k) - x_i^d(k) \right) \quad (13)

where ε is a very small constant and R_{i,j}(k) is the Euclidean distance between the ith and jth agents in the kth iteration, defined as

R_{i,j}(k) = \lVert X_i(k), X_j(k) \rVert_2 \quad (14)

We suppose that the total force acting on the ith agent in the dth dimension is a randomly weighted sum of the forces exerted in the dth dimension by the other agents:

F_i^d(k) = \sum_{j=1, j \neq i}^{N} rand_j \times F_{i,j}^d(k) \quad (15)

where rand_j is a random number in the interval [0, 1]. By the law of motion [28], the acceleration of the ith agent in the dth dimension in the kth iteration is given by

a_i^d(k) = \frac{F_i^d(k)}{M_i(k)} \quad (16)

Furthermore, the next velocity of an agent is a fraction of its current velocity added to its acceleration. Therefore, its velocity and position are calculated as follows:

vel_i^d(k+1) = rand_i \times vel_i^d(k) + a_i^d(k) \quad (17)

x_i^d(k+1) = x_i^d(k) + vel_i^d(k+1) \quad (18)

The gravitational constant is initialized to G_0 in the beginning and is reduced as the algorithm proceeds, in order to improve the search accuracy. G(k) is a function of the initial value G_0 and the iteration number k, defined as

G(k) = G_0 \times \left( \frac{k_0}{k} \right)^{\gamma}, \quad \gamma < 1 \quad (19)

where γ is a small constant that regulates the reduction of the gravitational constant.
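The following C++ sketch performs one position update for a single agent following Eqs. (13)–(18). Since each dimension must ultimately name a VM in [1, m], we clamp the continuous position to that interval at the end; the paper does not spell out this discretization step, so treat it as our assumption, as are all names used here.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One update of agent i against all other agents. pos/vel hold continuous
// per-dimension positions and velocities; mass comes from Eq. (11).
void updateAgent(std::vector<std::vector<double>>& pos,
                 std::vector<std::vector<double>>& vel,
                 const std::vector<double>& mass,
                 int i, double G, int m, std::mt19937& rng) {
    const double eps = 1e-9;                    // epsilon of Eq. (13)
    std::uniform_real_distribution<double> rnd(0.0, 1.0);
    const std::size_t n = pos[i].size();
    std::vector<double> force(n, 0.0);
    for (std::size_t j = 0; j < pos.size(); ++j) {
        if (j == static_cast<std::size_t>(i)) continue;
        double R = 0.0;                         // Eq. (14): Euclidean distance
        for (std::size_t d = 0; d < n; ++d)
            R += (pos[i][d] - pos[j][d]) * (pos[i][d] - pos[j][d]);
        R = std::sqrt(R);
        double w = rnd(rng);                    // random weight of Eq. (15)
        for (std::size_t d = 0; d < n; ++d)     // Eq. (13)
            force[d] += w * G * mass[i] * mass[j] / (R + eps)
                          * (pos[j][d] - pos[i][d]);
    }
    for (std::size_t d = 0; d < n; ++d) {
        double a = (mass[i] > 0.0) ? force[d] / mass[i] : 0.0;     // Eq. (16)
        vel[i][d] = rnd(rng) * vel[i][d] + a;                      // Eq. (17)
        pos[i][d] += vel[i][d];                                    // Eq. (18)
        pos[i][d] = std::min(std::max(pos[i][d], 1.0), double(m)); // keep in [1, m]
    }
}

int main() {
    std::mt19937 rng(1);
    std::vector<std::vector<double>> pos{{1, 3, 2}, {4, 1, 4}};
    std::vector<std::vector<double>> vel{{0, 0, 0}, {0, 0, 0}};
    std::vector<double> mass{1.0, 0.4};
    updateAgent(pos, vel, mass, /*i=*/1, /*G=*/5.0, /*m=*/4, rng);
}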

5.7. Algorithm

We start by generating the initial population through a random mapping of the tasks onto the VMs in step 1 of the proposed algorithm, followed by seeding the result of HEFT into the population in step 2. Once the initial population is ready, a set of iterative steps (steps 3 through 16) is applied to each agent of the population to obtain the final result. The first step of each iteration is to calculate the gravitational constant (step 4). Then, in step 5, we compute the fitness value of each agent using Eq. (10). Note that Eq. (10) requires the values of both cost and makespan; Algorithms 2 and 3 can be utilized for calculating these values. Based on the fitness values, we identify the best and the worst agents in step 6 for calculating the mass of all the agents in step 7. In step 8, we update the position of each agent by calculating the net force, the acceleration and the velocity. In the remaining steps, we replace the inferior agents by new agents generated with the help of the best agent known so far. A new agent is generated by mapping one of the tasks to a randomly selected VM, while the rest of the mapping remains the same as in the best agent.

Algorithm 1: Proposed Workflow Scheduling Algorithm

Input: Workflow Application (W) and Cloud Server Specification (CSS)
Output: Task mapping with VMs (M)

1  Initialize population X with N randomly generated agents.
2  Replace one of the agents by the mapping generated by HEFT.
3  for k = 1 to MAX_ITERATION do
4      Compute gravitational constant G(k) using Eq. (19)
5      Compute fitness value fit_i for i = 1, 2, 3, ..., N using Eq. (10)
6      Identify best and worst agents based on the calculated fitness values.
7      Compute mass M_i for i = 1, 2, 3, ..., N using Eq. (11)
8      Update velocity and position of each agent using Eqs. (13) to (18)
9      for i = 1 to N do
10         if M_i < δ then
11             Pos = a random integer from the interval [1, n]
12             x_i^d = x_best^d for d = 1, 2, 3, ..., n
13             x_i^Pos = a random integer from the interval [1, m]
14         end if
15     end for
16 end for
17 Find M corresponding to the best agent based on fit_i for i = 1, 2, 3, ..., N
18 return M

Algorithm 2: Cost-Calculation

Input: Workflow Application (W), Mapping (M), Cloud Server Specification (CSS)
Output: Cost value (Cost)

1  Set Cost = 0
2  for each task t_i ∈ W do
3      Execution_time ET_{t_i}^{v_M[i]} = Load(t_i) / (Capacity(v_M[i]) × (1 − deg_{v_M[i]}))
4      Unit_used = ⌈ET_{t_i}^{v_M[i]} / τ⌉
5      Rate_per_unit = σ × V_cbase × exp(CPU cycles of v_M[i] / slowest CPU cycle)
6      Cost = Cost + Unit_used × Rate_per_unit
7  end for
8  return Cost


Algorithm 3: Makespan-Calculation

Input: Workflow Application (W), Mapping (M), Cloud Server Specification (CSS)
Output: Makespan value (Makespan)

1  for each v_i ∈ V do
2      VM_time[i] = 0
3  end for
4  for each task t_i ∈ W in topological order do
5      if t_i.ParentCount != 0 then
6          Parent_finishtime = max_{t_k ∈ pred(t_i)} (Task_actual_finish_time[k])
7      end if
8      if t_i.ChildCount != 0 then
9          Transfer_time = 0
10         for each task t_j where t_j ∈ succ(t_i) and M[i] ≠ M[j] do
11             if the output data of task t_i is not yet transferred to v_M[j] then
12                 Transfer_time = Transfer_time + t_i.Outputdatasize / Bandwidth
13             end if
14         end for
15     end if
16     Execution_time ET_{t_i}^{v_M[i]} = Load(t_i) / (Capacity(v_M[i]) × (1 − deg_{v_M[i]}))
17     Actual_start_time = max(Parent_finishtime, VM_time[M[i]])
18     Task_actual_finish_time[i] = Actual_start_time + Execution_time + Transfer_time
19     VM_time[M[i]] = Task_actual_finish_time[i]
20 end for
21 Makespan = VM_boot_time + max_{v_i ∈ V} (VM_time[i]) + VM_shutdown_time
22 return Makespan

Fig. 3. Workflow and cloud environment: (a) Montage workflow of 16 tasks; (b) cloud environment.

5.8. An illustration

Consider a Montage workflow [29,30] consisting of 16 tasks, T = {t_1, t_2, ..., t_16}, and a set of 4 VMs, V = {v_1, v_2, v_3, v_4}, as shown in Fig. 3. We have to schedule the workflow on the given VMs, which are fully connected to each other. The output of this illustration is a mapping of the given tasks onto the VMs, optimized in terms of makespan and cost.

Table 3
Parameters used in the illustration.

Parameter                                  Value
Number of VMs                              4
Computational power of the VMs             2.0, 3.5, 4.5 and 5.5 MIPS
Network bandwidth                          1 MBps
Boot time and shutdown time of VM          0.5 sec
Performance variance of VM                 24%
MAX_ITERATION                              10
Population size (N)                        100
Gravitational constant (G_0)               5
Weight of makespan and cost (α)            0.5
Cost time equivalence (β)                  1
Small constant used in gravity (γ)         0.3
Mass threshold for inferior agents (δ)     0.1
Small constant used in force (ε)           10

Table 3 shows the parameters used in this illustration and Table 4 shows the initial population generated, as described in step 1 of our proposed algorithm. We now compute a schedule using HEFT for the given workflow; the resultant schedule is then included in the generated population. To compute the HEFT schedule, we need to find the priorities using Eq. (8). Then, starting from the highest priority, each task is mapped to a VM in such a way that its EFT is minimized. Table 5 shows the priorities as well as the mapping of the tasks to the VMs. The makespan and cost of the schedule generated by HEFT are 36.57 sec and $50.31, respectively. Fig. 4 demonstrates the process of including the mapping generated by HEFT into the initial population by replacing the shaded agent; the selection of the agent to be replaced is purely random. This completes step 2 of the algorithm. Now the current population contains the HEFT-generated agent as well as the randomly generated agents. These agents are processed as described in steps 3 through 16 for a certain number of iterations. Table 6 shows the details of the best agent identified in each iteration based on the calculated fitness value. The resultant schedule, shown in Table 7, has a makespan of 32.64 sec and a cost of $47.71.


Table 4
Initial population of agents.

Agent 1    4 1 1 2 3 4 4 2 3 1 2 3 2 1 1 4
Agent 2    2 3 1 1 1 2 3 4 4 1 1 2 1 3 3 3
...
Agent N    1 2 1 1 1 3 3 4 4 1 2 2 2 2 1 1

Fig. 4. Seeding of HEFT solution into population.

Table 5
Task priority and task mapping by HEFT.

Task    Priority    Virtual machine
t_1     107.71      3
t_2     103.37      2
t_3     107.74      4
t_4     103.24      4
t_5     94.38       2
t_6     94.44       3
t_7     90.26       2
t_8     90.20       4
t_9     90.05       3
t_10    90.08       2
t_11    90.06       4
t_12    90.08       4
t_13    77.89       4
t_14    77.57       4
t_15    2.63        4
t_16    0.28        4

6. Experimental results and comparison

This section presents the simulation results of the proposed algorithm and its comparison with three workflow scheduling algorithms: the standard GSA based approach, HEFT and HGA. Note that, for the sake of comparison, we convert the single objective of the HGA (minimization of makespan) into our bi-objective, keeping all the constraints the same as in the proposed algorithm.

6.1. Experimental setup

The simulations were carried out in a C++ coding environment on an Intel(R) Core(TM) i5-2540M CPU at 2.60 GHz with 4 GB RAM running on a Linux platform. The specifications of the cloud environment, as well as the parameters used for the evaluation of our proposed algorithm, are given in Table 8.

Table 6
Iteration-wise specification of the best agent.

Iteration    Makespan     Cost         Fitness
1            36.575226    50.313644    2.250 × 10^−2
2            32.338966    49.307339    2.391 × 10^−2
3            32.536972    48.014904    2.422 × 10^−2
4            32.642235    47.711269    2.428 × 10^−2
5            32.642235    47.711269    2.428 × 10^−2
6            32.642235    47.711269    2.428 × 10^−2
7            32.642235    47.711269    2.428 × 10^−2
8            32.642235    47.711269    2.428 × 10^−2
9            32.642235    47.711269    2.428 × 10^−2
10           32.642235    47.711269    2.428 × 10^−2

6.2. Performance metrics

We normalize the makespan and the monetary cost similarly to [5] and call them the schedule length ratio (SLR) and the monetary cost ratio (MCR), defined as follows:

SLR = \frac{\text{Makespan}}{\sum_{t_i \in CP} \min_{j=1}^{m} \{ ET_{t_i}^{v_j} \}} \quad (20)

MCR = \frac{\text{Total Cost}}{\sum_{t_i \in CP} \min_{j=1}^{m} \{ cost(t_i, v_j) \}} \quad (21)

The denominators are the summations of the minimum execution time and the minimum monetary cost of the tasks on the critical path (CP), without communication cost. For a given task graph, an algorithm that produces a scheduling plan with lower SLR and lower MCR values is more effective.

We also calculate the normalized fitness value for easy comparison and visualization of the overall quality of the results. We use max-normalization to normalize the absolute fitness value calculated using Eq. (10). After applying normalization, the maximum value is mapped to one and the rest of the values lie in the interval (0, 1]. Mathematically, max-normalization is defined as

\hat{x}_i = \frac{x_i}{\max_{j=1 \ldots N} (x_j)} \quad (22)

where \hat{x}_i is the normalized value of x_i.
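A small C++ sketch of Eqs. (20) and (21) is given below, assuming the per-VM execution times and costs of the critical-path tasks have already been tabulated; the names slr, mcr, etCP and costCP are ours.

#include <algorithm>
#include <vector>

// Eq. (20): etCP[k][j] is the execution time of the k-th critical-path
// task on VM j; the denominator sums the per-task minima.
double slr(double makespan, const std::vector<std::vector<double>>& etCP) {
    double denom = 0.0;
    for (const auto& row : etCP)
        denom += *std::min_element(row.begin(), row.end());
    return makespan / denom;
}

// Eq. (21): same structure with per-task costs costCP[k][j].
double mcr(double totalCost, const std::vector<std::vector<double>>& costCP) {
    double denom = 0.0;
    for (const auto& row : costCP)
        denom += *std::min_element(row.begin(), row.end());
    return totalCost / denom;
}

int main() {
    std::vector<std::vector<double>> etCP{{2.0, 1.5}, {3.0, 4.0}};  // toy CP
    double s = slr(9.0, etCP);  // 9 / (1.5 + 3.0) = 2.0
    (void)s;
}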


Table 7
Resultant schedule of Montage with 16 tasks.

Task    t_1  t_2  t_3  t_4  t_5  t_6  t_7  t_8  t_9  t_10  t_11  t_12  t_13  t_14  t_15  t_16
VM      3    1    4    4    2    3    1    1    3    2     4     4     4     4     4     4

Fig. 5. (a) CyberShake (b) Epigenomics (c) Inspiral (d) Montage (e) SIPHT. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

6.3. Dataset

The proposed algorithm is evaluated on various scientific workflows, as considered by Bharathi et al. [29] and Juve et al. [30]. These workflows are synthesized using the generator program provided by the Pegasus project [31]. It uses information gathered from actual executions of scientific workflows to generate a synthetic workflow, which is therefore a near approximation of a real workflow. We used CyberShake (IO- and network-intensive), Epigenomics (both compute-intensive and network-intensive), Inspiral (compute-intensive), Montage (network-intensive) and SIPHT (IO-intensive) in the simulation. We divide all the workflow types into three categories based on the number of constituent tasks, as shown in Table 9. Each workflow has characteristic features which play pivotal roles in the process of scheduling; the detailed characterization of each workflow can be found in [29]. The topology of the tasks in a given workflow is also a major criterion for scheduling. These workflows have a variety of topological features such as pipeline (yellow task nodes), data aggregation (red task nodes) and data partitioning (green task nodes), as shown in Fig. 5.

6.4. Result analysis and performance evaluation

In this subsection, we evaluate the performance of our proposed algorithm against HEFT, the standard GSA and the HGA with respect to makespan and monetary cost. The MCR, SLR and the normalized fitness, as defined in Section 6.2, are used as the performance metrics for the comparative analysis. Note that lower values of the SLR and MCR are desirable, as they indicate lower makespan and cost, respectively, whereas a higher value of the normalized fitness is preferred. We present the results obtained using the same machine configuration, the same constraints, and the same set of workflow applications (of various sizes and types). Figs. 6–9 show the bar charts for the MCR, SLR and the normalized fitness value, comparing the HGSA, HGA, standard GSA, and HEFT. From the figures, we can observe that the MCR of the proposed HGSA is better than that of HEFT, HGA and GSA for all the workflow categories (small, medium and large). Thus, the performance of the proposed HGSA is better than the others in terms of MCR for all the aforementioned workflows. We also observe that the SLR obtained by the proposed algorithm is much better than that of the GSA and the HGA. However, it is somewhat inferior to that of HEFT. This is due to the fact that HEFT is a single-objective scheduling algorithm which focuses on makespan only.

Table 8
Parameters used during the experiment.

Parameter                                  Value
Network bandwidth                          1 MBps
Boot time and shutdown time of VM          0.5 sec
Performance variance of VM                 24%
MAX_ITERATION                              200
Population size (N)                        500
Gravitational constant (G_0)               5
Weight of makespan (α)                     0.5
Cost time equivalence (β)                  50
Small constant used in gravity (γ)         0.3
Mass threshold for inferior agents (δ)     0.1
Small constant used in force (ε)           10

To calculate the normalized fitness value, we used the two input parameters α and β shown in Table 1. The normalized fitness value shows the overall quality as per the user requirement. From Fig. 9(a)–(c), we observe that the proposed HGSA algorithm performs better than HEFT, HGA and GSA. We get better results using the HGSA even in the case where the SLR is poorer than that of HEFT, as the difference in cost is enough to compensate for the difference in makespan.

6.5. Analysis of variance (ANOVA)

We also conducted hypothesis testing using ANOVA [9]. It is a statistical method which compares the means of two or more groups to determine whether there is a significant difference among them. The test has a null hypothesis (H_0) and an alternate hypothesis (H_1), defined as

H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_n \quad (23)

H_1: \text{the means are not equal} \quad (24)

During the test, if F_stat < F_crit, we fail to reject the null hypothesis and all groups have the same mean. But if F_stat > F_crit, we reject the null hypothesis and accept the alternate hypothesis. If the alternative hypothesis is accepted, we can conclude that at least one of the groups differs significantly from the others.
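For reference, the following C++ sketch computes the one-way ANOVA F statistic used in Tables 10–14, i.e., F = MS_between / MS_within; the sample values in main are illustrative only, and the critical value (3.35 for df = 2, 27 at the 5% level, as in the tables) would be read from an F table.

#include <iostream>
#include <numeric>
#include <vector>

// One-way ANOVA over k groups (here the algorithms), each a vector of runs.
double fStatistic(const std::vector<std::vector<double>>& groups) {
    std::size_t k = groups.size(), N = 0;
    double grand = 0.0;
    for (const auto& g : groups) {
        N += g.size();
        grand += std::accumulate(g.begin(), g.end(), 0.0);
    }
    grand /= N;                                   // grand mean
    double ssb = 0.0, ssw = 0.0;                  // between/within sums of squares
    for (const auto& g : groups) {
        double mean = std::accumulate(g.begin(), g.end(), 0.0) / g.size();
        ssb += g.size() * (mean - grand) * (mean - grand);
        for (double x : g) ssw += (x - mean) * (x - mean);
    }
    double msb = ssb / (k - 1);                   // between-groups mean square
    double msw = ssw / (N - k);                   // within-groups mean square
    return msb / msw;
}

int main() {
    // Toy fitness samples for three algorithms (illustrative values only).
    std::vector<std::vector<double>> g{{3.5, 3.6, 3.4},
                                       {2.9, 3.0, 3.0},
                                       {3.1, 3.2, 3.2}};
    std::cout << "F = " << fStatistic(g) << "\n";
}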


Fig. 6. Results for small sized workflows: (a) monetary cost ratio; (b) schedule length ratio.

Fig. 7. Results for medium sized workflows: (a) monetary cost ratio; (b) schedule length ratio.

Fig. 8. Results for large sized workflows: (a) monetary cost ratio; (b) schedule length ratio.

Fig. 9. Comparison of normalized fitness: (a) small sized workflows; (b) medium sized workflows; (c) large sized workflows.

Table 9
Categories of workflows based on the number of tasks.

Number of tasks    Category
24 to 60           Small
100 to 400         Medium
800 to 2000        Large

The test was performed to compare the standard GSA, the hybrid GA and the hybrid GSA. For this experiment, all three algorithms were executed 10 times for each of the five scientific workflows of various sizes. Tables 10–14 show the results for each workflow of 2000 tasks. As we can see, for all workflows we have F_stat > F_crit. Thus, we can reject the null hypothesis, and the means of all the groups are significantly different. This implies that the performance of the HGSA is better and more consistent than that of the HGA and the GSA.

7. Conclusion

In this paper, we have presented a hybrid gravitational search algorithm for scheduling workflows, with the basic objective of reducing the makespan as well as the cost of execution. The efficiency

Table 10
ANOVA using CyberShake workflow of 2000 tasks.

(a) Summary of input

Group    Count    Sum         Average     Variance
HGSA     10       3.51E−05    3.51E−06    3.44E−16
GSA      10       2.96E−05    2.96E−06    5.10E−17
HGA      10       3.17E−05    3.17E−06    2.01E−16

(b) ANOVA test result

Source of variation    SS          df    MS          F stat     P-value      F crit
Between groups         1.56E−12    2     7.81E−13    3925.06    5.278E−34    3.35
Within groups          5.37E−15    27    1.99E−16
Total                  1.57E−12    29


Table 11
ANOVA using Epigenomics workflow of 2000 tasks.

(a) Summary of input

Group    Count    Sum         Average     Variance
HGSA     10       3.60E−07    3.60E−08    5.90E−20
GSA      10       3.00E−07    3.00E−08    1.30E−20
HGA      10       3.10E−07    3.10E−08    5.90E−20

(b) ANOVA test result

Source of variation    SS          df    MS          F stat     P-value     F crit
Between groups         1.60E−16    2     7.80E−17    1788.54    2.00E−29    3.35
Within groups          1.20E−18    27    4.40E−20
Total                  1.60E−16    29

Table 12
ANOVA using Inspiral workflow of 2000 tasks.

(a) Summary of input

Group    Count    Sum         Average     Variance
HGSA     10       3.90E−06    3.90E−07    3.70E−18
GSA      10       3.50E−06    3.50E−07    4.50E−19
HGA      10       3.60E−06    3.60E−07    2.30E−18

(b) ANOVA test result

Source of variation    SS          df    MS          F stat     P-value     F crit
Between groups         9.60E−15    2     4.80E−15    2238.84    1.00E−30    3.35
Within groups          5.80E−17    27    2.10E−18
Total                  9.70E−15    29

Table 13
ANOVA using Montage workflow of 2000 tasks.

(a) Summary of input

Group    Count    Sum         Average    Variance
HGSA     10       8.30E−05