
Data-Driven Mathematical Modeling and Global Optimization
Framework for Entire Petrochemical Planning Operations

Jie Li*
Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, P. R. China
Artie McFerrin Dept. of Chemical Engineering, Texas A&M University, College Station, TX 77843
Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843
Xin Xiao
Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, P. R. China
Fani Boukouvala* and Christodoulos A. Floudas

Artie McFerrin Dept. of Chemical Engineering, Texas A&M University, College Station, TX 77843
Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843
Baoguo Zhao, Guangming Du, Xin Su, and Hongwei Liu

PetroChina Dushanzi Petrochemical Company, Xinjiang 833600, P. R. China
DOI 10.1002/aic.15220
Published online March 27, 2016 in Wiley Online Library (wileyonlinelibrary.com)
In this work we develop a novel modeling and global optimization-based planning formulation, which predicts product
yields and properties for all of the production units within a highly integrated refinery-petrochemical complex. Distillation is modeled using swing-cut theory, while data-based nonlinear models are developed for other processing units.
The parameters of the postulated models are globally optimized based on a large data set of daily production. Property
indices in blending units are linearly additive and they are calculated on a weight or volume basis. Binary variables
are introduced to denote unit and operation mode selection. The planning model is a large-scale non-convex mixed
integer nonlinear optimization model, which is solved to ε-global optimality. Computational results for multiple case
studies indicate that we achieve a significant profit increase (37–65%) using the proposed data-driven global optimization framework. Finally, a user-friendly interface is presented which enables automated updating of demand, specification, and cost parameters. © 2016 American Institute of Chemical Engineers AIChE J, 62: 3020–3040, 2016
Keywords: refinery, petrochemical, planning, Big-Data, global optimization
Introduction
In the last 20 years, the petrochemical industry has succeeded by creating markets and supplying them with suitable
products used to create goods such as plastics, cosmetics,
lubricants, and paints. Petrochemical production begins in a
refinery that separates crude oils mainly into lighter components such as naphtha, light naphtha, top oil, and liquid fuels
including gasoline, diesel, and jet fuel. The lighter components
are further processed into various petrochemicals such as ethylene, propylene, butadiene, benzene, toluene, xylol, and some
other high-valued products via cracking, butadiene extraction,
hydrotreating, etherification, and polymerization processes.
Additional Supporting Information may be found in the online version of this
article.
*Contributed equally to this work.
Current address of J. Li: School of Chemical Engineering and Analytical Science, The University of Manchester, Manchester.
Correspondence concerning this article should be addressed to X. Xiao and
C. A. Floudas at xxiao@ipe.ac.cn and floudas@tamu.edu.
Nowadays, tight competition, environmental regulations, and
lower profit margins drive the petrochemical industry to
improve planning operations. Optimal planning and scheduling of various operations in a petrochemical plant through
mathematical modeling and global optimization offer significant opportunities for saving costs, increasing profit margins,
and improving energy efficiency and demand satisfaction.
The entire set of petrochemical operations consists of refinery and chemical production operations. The refinery operations can be divided into three components: crude oil
blending and processing, production unit operations, and product blending and distribution.1–7 Chemical plant operations
include extraction, etherification, cracking, isomerization, and
separation units, which increase the complexity and nonlinearity of the planning problem when coupled with the refinery
processes. The entire set of petrochemical planning operations
involves crude blending and distillation, production processing,
production mode selection, flow connections between production units and plants, and pooling and blending operations to
satisfy quality requirements of production units, intermediates,
September 2016 Vol. 62, No. 9
AIChE Journal
and final products. Mathematical modeling of processing, production, pooling, and blending operations may introduce bilinear, quadratic, polynomial, signomial, exponential, and higher
order terms. In addition, the selection of parallel production
units and production modes introduces binary variables.
Hence, the overall problem is a large-scale non-convex mixed
integer nonlinear optimization (MINLP) problem.
The refinery planning problem has received considerable
attention since the introduction of linear programming in
the 1950s.8–10 Research focused on developing different models
and algorithms to solve large-scale industrial problems, leading to commercial software such as RPMS (Refinery and
Petrochemical Modeling System),11 PIMS (Process Industry
Modeling System),12 and GRTMPS (Haverly Systems).13 The
commercially available software can be extended to model
and optimize integrated petrochemical processes; however,
inaccuracy caused by non-rigorous linear models and approximate algorithms may reduce the overall profitability or sacrifice product quality.
Nonlinear models and specialized algorithms have also
been proposed for refinery planning problems.5,14–24 For
instance, Pinto and Moro15 developed a nonlinear planning
model for production planning which allows for the implementation of nonlinear process models and blending relations.
Pinto et al.5 proposed a planning and scheduling model for
refinery operations. They presented a formulation based on
discretization of time for production and distribution scheduling, and their model included features such as sequence-dependent
transition costs of products within an oil pipeline. Li
et al.17 presented a refinery planning model that utilizes simplified empirical nonlinear process models that account for crude characteristics, product yields, and qualities.
Alhajri et al.18 developed a nonlinear model to address the
refinery planning problem. Alattas et al.21 developed a fractionation index based nonlinear model for crude distillation units
(CDUs) and integrated it into the linear refinery planning
model, solving it with nonlinear programming (NLP) solvers
without guaranteeing global optimality. Mouret et al.25
addressed the problem of the integration of refinery planning
and crude-oil scheduling. Menezes et al.23 used the improved
swing cut approach26 to develop a single-period nonlinear
optimization model for oil-refinery production planning
to predict the national overall capacity for different oil
refinery units in Brazil considering four future market scenarios. None of the literature dealing with nonlinear models thus
far guarantees global optimality. A comprehensive
review on refinery planning can be found in Shah et al.6
Another level of complexity which has not been dealt with
sufficiently in the literature so far is the integration and interaction between refineries and chemical plants. Refineries are
typically integrated with chemical plants through exchange of
intermediate streams both from refinery to chemicals and from
chemicals to refinery. Consequently, the operation of the former highly affects the operation of the latter and vice
versa,27–32 and hence planning the refinery operations and
chemical operations separately will not lead to the globally
optimal operating points of the entire integrated petrochemical complex. In Al-Qahtani et al.27,28 the formulation solved for the
coordination between multisite refineries and chemical plants
is an MINLP problem; in Swaty32 a refinery and ethylene plant
are integrated through exchange of intermediate streams using
linear programming; in Gonzalo et al.30 the advantages of the
integration between a refinery and a single hydrocracking unit
are presented; and lastly the RPMS software is used in Baulin
et al.29 for a special case of planning and scheduling of a
refinery-petrochemical complex. To the best of our knowledge, very few theoretical developments and computational
results have been reported for the global optimization of integrated refinery and petrochemical planning problems using
MINLP formulations.
Despite the significant theoretical developments regarding
the detailed modeling of the units present in refinery and petrochemical complexes, such models cannot be used for
enterprise-wide operations due to computational expense. At
the same time, recent technological developments have enabled
the industry to collect and store vast amounts of data from their
processes.33 Data-driven modeling has been used in the industry and the literature, both as a black-box modeling approach
and coupled with theoretical knowledge as hybrid models, to
describe processes and correlations which have not yet been
theoretically explained, or to serve as inexpensive surrogates to
existing expensive models.17,22 Given the existence of vast
amounts of data and the need for inexpensive nonlinear models
which can relate relevant inputs to relevant outputs to describe
planning operations, the role of data-driven modeling can be
extremely valuable. However, when following a data-driven
approach one must be aware of the limitations and challenges,
such as poor extrapolating capabilities, risk of overfitting, and
errors in experimental data, all of which must be taken into
account to develop useful and accurate models. Moreover, a
data-driven approach must not be used blindly: if a theoretical model exists in a form which can be used within a planning formulation, it should be used. Lastly, in this work we
have not used any multivariate analysis techniques, which have
been successfully used in the literature and in industrial practice for data-driven process analysis, monitoring, control, and
multi-mode modeling.34–37 In this work, we aim to develop
input-output correlations between independent sets of input
variables and the output variables of interest, which can be
explicitly incorporated within an MINLP formulation.
In this article, we propose the data-driven model development and integration of nonlinear models to predict product
yields and properties in production units including a CDU, a
vacuum distillation unit, hydrocracking units, catalytic cracking units, ethylene-cracking units, and other processing units
present in a large refinery-petrochemical complex. The yield
and property prediction models for the crude distillation and
vacuum distillation units are developed using swing-cut theory
based on crude assay data. Empirical nonlinear models, which include bilinear and
quadratic terms, are
developed for other processing units. Moreover, property indices in blending units
are linearly additive and calculated on a weight or volume basis,
which introduces bilinear and trilinear terms. We also introduce
binary variables to denote different operation modes for several production units, or parallel production units. The entire
planning model is a non-convex MINLP model, which is
solved to ε-global optimality using the commercial global
optimization solver ANTIGONE.38–41 Finally, a user-friendly
platform is developed to allow the user to modify the planning
model when new data become available or when parameters such as
prices, product demands, specifications, and costs
change. Several large-scale industrial examples are solved
to illustrate the efficiency of our proposed model and global
optimization approach.
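The remark about linearly additive property indices can be illustrated with a minimal sketch. It assumes that a blend property is the flow-weighted average of the inlet properties, with mass flows as weights on a weight basis and mass flow divided by specific gravity as weights on a volume basis. The function name `blend_property` and all numerical values are hypothetical and are not taken from the plant data.

```python
def blend_property(flows, props, spgs=None):
    """Blend property from linearly additive property indices (illustrative).

    flows: mass flow rates of the inlet streams
    props: property value of each inlet stream
    spgs:  specific gravities; if given, blend on a volume basis,
           otherwise blend on a weight basis
    """
    if spgs is None:
        weights = list(flows)                            # weight basis
    else:
        weights = [f / g for f, g in zip(flows, spgs)]   # mass -> volume
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, props)) / total


# Weight basis (e.g., sulfur): two streams with hypothetical values.
sul = blend_property([60.0, 40.0], [0.10, 0.50])

# Volume basis (e.g., RON): the same flows converted with hypothetical SPG values.
ron = blend_property([60.0, 40.0], [90.0, 95.0], spgs=[0.75, 0.80])
```

In a planning model the flows, the inlet properties, and the blend property are all decision variables, so the same relation written without the division contains products of two variables (weight basis) or, once a variable specific gravity enters, of three variables (volume basis). This is one way to see where the bilinear and trilinear terms come from.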
The structure of the article is as follows. The overall planning problem is introduced in section "Problem Definition".
Published on behalf of the AIChE
Figure 1. A schematic diagram of a refinery-petrochemical complex with multiple crude oil sources.
[Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
An overview of the overall framework for data analysis, model
development, and global optimization is described in section
"Overview of Global Optimization Approach". Section
"Industrial Data" describes the type and amount of industrial data
which is used to perform all of the analysis, followed by section
"Yield and Property Prediction Modeling", which describes the
models which were developed for yield and property prediction.
Section "Overall Petrochemical Planning Model" introduces all
of the additional constraints which are required to formulate the
entire petrochemical planning model. The computational platform
developed to facilitate the implementation and usage of the planning model is described in section "Computational Platform",
while computational results for three case studies are presented in
section "Results". Section "Conclusions" presents a summary of
the work and its main outcomes.
Problem Definition
Figure 1 illustrates a representation of a refinery-petrochemical complex which we model in this work; a more
detailed representation of the refinery is shown in Figure 2a
and a representation of the chemical plants is shown in Figure
2b. Figures 2a, b show the main connections and processes
present in the complex studied in this work. Typically there
are C crude oils ($c \in \{1, 2, \ldots, C\}$) which are processed in a
refinery, where each one has different availability, consistency, properties, and price depending on its source. In this
work, a maximum of three different types of crudes are used.
In the entire petrochemical plant, there are in total U
($u \in \{1, 2, \ldots, U\}$) units including processing units, pools, and
blenders. The processing units, which are denoted as $U^{PRO}$,
include CDUs, delayed coking units, hydrotreating units,
hydrocracking units, catalytic cracking units, reforming units,
extraction units, hydrogen generation units, sulfur production
units, ethylene cracking units, polyethylene production units,
polypropylene production units, methanol production units,
ethylene glycol production units, and many other processing
units for the production and refinement of specialty chemicals.
The units are highly connected within the refinery and chemical plants through material streams. The blenders, which are
denoted as $U^{BLD}$, are gasoline, diesel, and kerosene blenders.
The pools are also included in $U^{BLD}$. In the entire petrochemical flowsheet, there is a total of S ($s \in \{1, 2, \ldots, S\}$)
streams. Each unit has a distinct set of inlet and outlet streams.
We use sets $S^{in}_u$ and $S^{out}_u$ to denote all inlet streams and outlet
streams for each unit u, respectively. In other words,
$S^{in}_u = \{s \mid \text{stream } s \text{ is an input to unit } u\}$ and $S^{out}_u = \{s \mid \text{stream } s \text{ is an output from unit } u\}$. In a refinery, we have to consider E ($e \in \{1, 2, \ldots, E\}$) properties such as specific gravity
(SPG), sulfur (SUL), flash point (FLP), carbon residue (CCR),
research octane number (RON), cetane number (CET),
nitrogen content (N2), and others depending on the specifications of the specific company and market. Each inlet and outlet
stream of every unit is characterized by a different subset of
properties which are significant for the specific unit and
stream. We use sets $E^{in}_U(s,u)$ and $E^{out}_U(s,u)$ to denote the properties
that are considered in the inlet and outlet streams s of unit u.
Moreover, several processing units may be operated in M
($m \in \{1, 2, \ldots, M\}$) production modes. Each unit included in
the set $U^{mod}$ has $M_u$ production modes. Different units are
connected to each other by material streams. We define set
UC to denote those unit connections. In other words,
$UC = \{(s,u,s',u') \mid \text{stream } s \text{ from unit } u \text{ goes to stream } s' \text{ of unit } u'\}$. The final products, denoted as $S^{FP}$, are sent to the sales
department, which is denoted as $U^{SAL}$. All final products must
satisfy the minimum [$Dmd^{L}_{S}(s)$] and maximum [$Dmd^{U}_{S}(s)$]
demand requirements of customers, while their prices [Price(s)] can vary in different time periods. Besides meeting the
demand requirements, the final products including gasoline,
diesel, and kerosene must also meet the property specifications. The planning horizon is denoted as H. Finally, it should
be noted that there are multiple catalytic cracking units and
hydrocracking units which can be operated in parallel to produce the same products. We use the set $U^{PAR}$ to denote those parallel units.
Figure 2. (a) A simplified refinery process in a real petrochemical plant. (b) A simplified chemical process in a real
petrochemical plant.
The operation of the entire petrochemical plant involves
decisions such as (a) unit selection among parallel processing
units, (b) production mode selection for multimode units, (c)
magnitude of inlet and outlet stream flow rates for each unit,
(d) blending recipe determination, and (e) final product production, given the amount and properties of crude materials,
prices, demands and specifications for the planning horizon.
With this, the entire petrochemical planning problem
addressed in this article can be stated as:
Given:
1. a planning horizon [0, H];
2. amount of crudes, and their crude assay data;
3. units, their minimum and maximum capacities, suitable
production modes, inlet and outlet streams, limits on
inlet and outlet stream flow rates, properties, and
yields;
4. products, limits on their property specifications, and
demands; and
5. final product prices, raw material costs, and operational
costs for production units.
Figure 3. Overview of the proposed data-driven global optimization framework.
Determine:
1. the amounts of intermediates that are processed;
2. the units that are used for processing and their production modes;
3. the streams that are inputs to each processing unit, flow
rates, and properties;
4. the products that each processing unit should produce,
and the corresponding yields, flow rates, and properties;
and
5. the products that each blender should process and the
blending recipes.
The main assumptions are:
1. the petrochemical plant structure is fixed;
2. all parameters are deterministic;
3. mixing in each pooling or blending unit is perfect and
instantaneous;
4. steady-state operations are considered; and
5. unit operation cost is linearly correlated with inlet and
outlet flow rates.
The objective is to maximize the total profit which is calculated as the revenue from product sales minus cost from raw
materials and unit operations. Next, we introduce the detailed
MINLP formulation for the above problem.
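As a reading aid, the stated objective can be written schematically as below. This is only a sketch under the assumptions listed above (in particular, operating cost linear in inlet and outlet flow rates). The symbols $F^{sale}(s)$, $Cost^{crude}(c)$, $F^{crude}(c)$, $a(s,u)$, and $b(s,u)$ are hypothetical names introduced here for illustration and may differ from the notation of the detailed formulation.

```latex
\max \; Profit =
  \sum_{s \in S^{FP}} Price(s)\, F^{sale}(s)
  \;-\; \sum_{c=1}^{C} Cost^{crude}(c)\, F^{crude}(c)
  \;-\; \sum_{u \in U^{PRO}} \Big[ \sum_{s \in S^{in}_u} a(s,u)\, F^{in}_U(s,u)
        + \sum_{s \in S^{out}_u} b(s,u)\, F^{out}_U(s,u) \Big]
```

The first term is the revenue from product sales, the second the raw material cost, and the third the unit operation cost.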
Overview of Global Optimization Approach

Figure 3 illustrates the overview of the proposed data-driven modeling and global optimization framework for the
formulation of the planning problem for the entire petrochemical complex. In the framework, the first step is to collect data
from the refinery and chemical plants. Then, data preprocessing is carried out, which includes filtering, grouping, outlier
analysis, and interpolation of missing data, to increase the reliability and accuracy of the industrial data. The processed data
sets are then used to develop processing unit operation models
which include yield and property prediction equations. Specifically, a nonlinear parametric form is postulated for each yield
and property correlation of all of the outlet streams of each
unit u. Then all of the yield and property parametric equations
related to u are grouped together along with mass balance constraints and operational bounds to form the nonlinear parameter estimation problem for unit u. The parameter
estimation problem for each unit is an NLP problem which
is solved to global optimality to identify the optimal parameters which best describe the industrial data. This procedure is
followed for all units in set $U^{PRO}$. Subsequently, the MINLP
planning formulation is formed through integration of all of
the developed unit models with additional models for the
blending and pooling units as well as mass balance constraints,
capacity constraints, operational constraints, demand constraints, connectivity equations, utility constraints, and the
objective function. The objective of the overall problem is the
maximization of the total profit from revenues of product sales
minus the cost of raw materials and the cost of operation of all
of the units in the plant. Finally, we use the global optimization solver ANTIGONE38–41 to solve the large MINLP formulation to ε-global optimality.
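The parameter estimation step can be sketched in simplified form. The example below fits a full quadratic yield model in two inputs by ordinary least squares with NumPy. This is a deliberately swapped-in, simpler technique: it omits the mass balance constraints and operational bounds that are part of the per-unit estimation problems described above, and it does not use a global optimization solver. Because the model is linear in its parameters, the unconstrained least-squares solution is nonetheless the global minimizer of the squared error for this simplified case. The function names, the two-input structure, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def quadratic_features(X):
    """Feature columns 1, x1, x2, x1^2, x1*x2, x2^2 for a two-input model."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])


def fit_yield_model(X, y):
    """Unconstrained least-squares fit of the quadratic model parameters."""
    theta, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return theta


# Synthetic, noise-free "daily" data from hypothetical parameters.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))          # e.g., scaled feed flow and feed property
true_theta = np.array([0.30, 0.10, -0.05, 0.02, 0.04, -0.01])
y = quadratic_features(X) @ true_theta            # observed yield of one outlet stream

theta = fit_yield_model(X, y)                     # recovers true_theta on noise-free data
```

Adding the constraints that couple all outlet streams of a unit turns this into a constrained NLP, which is the setting in which a global solver becomes relevant.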
Industrial Data
This work is enabled by the availability of industrial data
which is used to develop hybrid or purely data-based input-output
models for all of the processing units. Specifically, the
data which was provided for this work consists of daily average flow rate and property measurements for the entire set of
streams in all units of operation of the integrated refinery-petrochemical complex. In addition, crude amounts and crude
assay data are provided for the three crude materials used. Handling, analysis, and use of large data sets pose great challenges,
but also create opportunities for efficient decision making in
today's Big-Data era.33 First, we will describe the raw data
which was provided and then we will discuss the challenges,
advantages and limitations of a data-based approach.

Table 1. List of Properties and Relevance

Property                       Abbreviation   Addition basis
Specific gravity               SPG            Volume
Sulfur                         SUL            Weight
Research octane number         RON            Volume
Motor octane number            MON            Volume
Nitrogen                       N2             Weight
Fe content                     Fe             Weight
Carbon residue                 CCR            Weight
Freeze point                   FRP            Volume
Flash point                    FLP            Weight
Smoke point                    SMP            Volume
Reid vapor pressure            RVP            Volume
Aromatics                      ARO            Volume
Olefins                        OLE            Volume
Viscosity at 20 °C, 100 °C     V20, V100      Volume
Cetane number                  CET            Volume
Cold filter plugging point     CFPP           Volume
The plant which is studied in this work uses three types of
crudes (C1, C2, C3) for which the daily processed amount and
crude assay data analysis are provided. The crude assay analysis
provides information such as specific gravity, sulfur content,
nitrogen content, flash point, viscosity, freezing point, and
Reid vapor pressure (Table 1) for different crudes and boiling
ranges of distillates. The flow rate and property data of the
crudes are particularly important for modeling the CDU
using a cutting-point temperature method described in the next
section.
The big data set provided also includes daily measurements of flow rates of each input and output stream for every
unit u present in the entire petrochemical complex. The flow
rate data provided depicts in detail not only the amount that is
produced on a daily basis for each unit, but also the fractions
of each stream that are sent to different processing units,
blending units, pooling units, inventory, and sales on a daily
basis. The flow rate data set does not contain missing points,
since a measurement is provided for each day for a total number of N days within the whole period. Daily averages are used
as data points, which in many cases consist of measurements
taken at different time points during the day. These averages
alleviate the effect of the time delay which may be caused for
a material to flow from crude feeding to downstream processing and blending. In addition, our final goal is the optimization
of the planning operations over a monthly time horizon, thus
such delays are assumed to be negligible. First, we collect and
group the flow rate data into submatrices, each corresponding
to one unit u: $Fdata_U \in \mathbb{R}^{N \times (|S^{in}_U| + |S^{out}_U|)}$, where $|\cdot|$ represents the cardinality of the corresponding set. Moreover, we
are provided with a large set of property data for certain input
and output streams within each unit of the petrochemical plant.
Preprocessing of the property data set is more challenging
since measurements are not provided for every day of operation. This creates missing data gaps and inconsistencies
between connections of different units which we need to overcome to solve the necessary parameter estimation problems.
The first step is to group all of the input properties and output
properties corresponding to all of the input and output streams
of a unit u into a matrix $Edata_U \in \mathbb{R}^{N \times (|E^{in}_U(s,u)| + |E^{out}_U(s,u)|)}$, which contains the same number of rows as
matrix $Fdata_U$.

Table 1 (continued). Relevance to (important for)
Crudes, all products
Gasoline
Gasoline
Crudes, residue streams, vacuum gas oil
Crudes
Kerosene, diesel
Kerosene
Crudes, gasoline, kerosene
Naphtha range boiling below 200 °C
Naphtha range boiling below 200 °C
Diesel
Diesel
Due to measurement errors caused by uncertain tank measurements, inconsistencies between measurements at different
locations in the plant, and human error, there is a need for preprocessing of the flow rate and property data prior to its use
for parameter estimation. First, unreliable measurements were
flagged as erroneous by the company and were immediately
removed from the data set. Next, we performed additional processing of the data on a per-unit basis. We combine the two data matrices $Fdata_U$ and $Edata_U$ into one
matrix $Tdata_U = [Fdata_U \;\; Edata_U]$ (Figure 4). Then, we calculate the daily mass balance by subtracting the total sum of
flow rates of the outlet streams from the total sum of flow rates
of the inlet streams. If this amount exceeds a small percentage
threshold which is accepted as loss for each unit, then the
entire row of matrix $Tdata_U$ is removed. After the above filtering steps, we further removed measurements beyond ±3 standard deviations
from the average of the data set. Due to the amount of the data
available for each stream, we assume that the calculation of
average properties and yields is accurate. Also, the occurrence
of significant outliers was rare (i.e., less than 0.1% of the data
set, if any), thus we believe that this preprocessing was sufficient to alleviate any undesirable effects on the fitted parameters of the models. Any removed data measurements are
treated as missing elements, without removal of the entire row
of the data matrix. Finally, the $Tdata_U$ matrix contains several
missing elements, which are imputed using nearest-neighbor
imputation (knnimpute function in Matlab) to enable the procedure of parameter estimation. This approach uses the nearest
measurement, based on the Euclidean distance of points
among the data set. Using this imputation approach we
assume that missing property measurements are equal to existing measurements which are the most similar to the missing
element row-wise. In other words, missing property measurements are assumed to be equal to existing measurements from
days of operation which are the most similar based on both
property and flow rate information. This assumption has
proven to be very successful given the amount of data which
we have available, covering over 2 years of operation, and also due to
the realization that a plant measurement was typically not
made during days for which the operation was stable and consistent with prior days. Consequently, it is highly probable that
a very similar operation point exists in the data set to impute a
reasonable property measurement.
Figure 4. (a) Data matrix for each unit operation with
flow rate and property data with missing elements. (b) Schematic of outlier removal and
missing data imputation.
[Color figure can be viewed in the online issue, which is
available at wileyonlinelibrary.com.]
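The per-unit preprocessing described above (mass-balance filtering, removal of values beyond three standard deviations, and nearest-neighbor imputation) can be sketched as follows. This is a minimal NumPy illustration under several assumptions, not the Matlab routine used in this work: the 2% loss threshold, the use of the absolute imbalance, the root-mean-square distance over columns observed in both rows, the absence of column scaling, and all function names and data values are hypothetical.

```python
import numpy as np


def filter_mass_balance(T, in_cols, out_cols, max_loss=0.02):
    """Drop rows whose relative inlet/outlet imbalance exceeds max_loss (assumed 2%)."""
    fin = T[:, in_cols].sum(axis=1)
    fout = T[:, out_cols].sum(axis=1)
    keep = np.abs(fin - fout) <= max_loss * fin
    return T[keep]


def mask_outliers(T, n_sd=3.0):
    """Mark entries beyond n_sd standard deviations of their column mean as missing."""
    T = T.copy()
    mean = np.nanmean(T, axis=0)
    sd = np.nanstd(T, axis=0)
    T[np.abs(T - mean) > n_sd * sd] = np.nan
    return T


def impute_nearest(T):
    """Fill each missing entry with the value from the most similar row that has it."""
    filled = T.copy()
    n_rows = T.shape[0]
    for i in range(n_rows):
        for j in np.where(np.isnan(T[i]))[0]:
            best_row, best_dist = None, np.inf
            for k in range(n_rows):
                if k == i or np.isnan(T[k, j]):
                    continue
                shared = ~np.isnan(T[i]) & ~np.isnan(T[k])   # columns observed in both rows
                if not shared.any():
                    continue
                dist = np.sqrt(np.mean((T[i, shared] - T[k, shared]) ** 2))
                if dist < best_dist:
                    best_row, best_dist = k, dist
            if best_row is not None:
                filled[i, j] = T[best_row, j]
    return filled


# Columns: inlet flow, outlet flow 1, outlet flow 2, one property (hypothetical values).
raw = np.array([
    [100.0, 60.0, 39.0, 0.50],
    [100.0, 50.0, 30.0, 0.52],     # 20% imbalance: row is dropped
    [101.0, 61.0, 39.5, np.nan],   # property not measured on this day
    [120.0, 70.0, 49.0, 0.80],
])
clean = impute_nearest(mask_outliers(filter_mass_balance(raw, [0], [1, 2])))
```

In this toy data set the day with the missing property is closest to the first day in terms of flow rates, so it inherits that day's property value.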
Filtering, sorting, and preprocessing of the data require a
significant amount of time and effort; however, they are crucial to
ensure that the quality and reliability of the data are improved.
The processed data contains a reasonable amount of noise due
to uncertainty and variability in data collection by different
operators and equipment throughout the plant. However, due
to the fact that we have in our possession a significant amount
of daily data, we will later show that the trends and correlations between flow rates, properties, and yields can be captured by postulating empirical nonlinear models and
performing parameter estimation using the processed data. In
fact, the fitted parametric models have the ability to smooth out
the noise in the data, while capturing the true trends of the
data, when the parameter estimation is performed correctly
and overfitting is avoided.
There are several limitations of the data-based approach
followed in this work which should be mentioned
here. First, the data-based models have a good predictive ability only within the upper and lower bounds of the
experimental data which was used for their development. Consequently, if there is a significant change in the operation, the
flowsheet superstructure, or the material properties of the
investigated plant, then the model will not be able to capture
the performance of the plant unless new data is collected and
used to update the models. Second, due to certain missing
information regarding properties and operating conditions of
the processes, the data-based models are based only on the
available inputs. However, the aim of this work is to develop a
fast decision-making tool for the petrochemical plant which
captures the nonlinear behavior of the plant based on the information which is available to the decision makers at all times,
and which will be used for decisions at the planning level of
the plant. For this purpose, we have found that a data-driven
approach is very powerful for petrochemical planning operations. Lastly, we must mention that all of the fitted models
were validated against data that was provided for a recent
monthly operation of the plant, which was not used in the
training data set. This validation procedure was performed in
collaboration with the operators of the plant. Depending on
the importance of the property or yield which was predicted
by the models, and the amount of noise that the measurements
were expected to have, the acceptable coefficient of determination ($R^2$) of the predictions
ranged between 0.80 and 0.95.
Figure 5. A schematic diagram of a crude distillation unit
(http://www.cs.mcgill.ca/~rwest/wikispeedia/wpcd/wp/o/Oil_refinery.htm).
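The validation criterion can be illustrated with a short sketch that computes the coefficient of determination on held-out data. It assumes the usual definition $R^2 = 1 - SS_{res}/SS_{tot}$. The function name and the observed and predicted values are hypothetical and do not come from the plant data.

```python
import numpy as np


def r_squared(y_obs, y_pred):
    """Coefficient of determination of predictions against observations."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out yields for one outlet stream and the model predictions.
y_obs = [0.30, 0.32, 0.35, 0.31, 0.34]
y_pred = [0.31, 0.32, 0.34, 0.31, 0.33]
score = r_squared(y_obs, y_pred)

# Accept the model if the score reaches the threshold chosen for this output.
accepted = score >= 0.80
```

The threshold of 0.80 used here is the lower end of the range quoted above; a stricter value would apply to outputs regarded as more important or less noisy.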
Yield and Property Prediction Modeling

All units in the entire petrochemical flowsheet can be classified as CDUs, secondary processing units, pools, and blenders.
We first develop mathematical models for CDUs and secondary processing units, and then describe the models used for
pools and blenders. The secondary processing units include
delayed coking units, hydrotreating units, hydrocracking units,
catalytic cracking units, reforming units, ethylene cracking
units, and a set of units for production and refining of specialty
chemicals. We use swing-cut techniques to develop yield and
property prediction models for CDUs that include an atmospheric distillation unit and a vacuum distillation unit. For the
secondary processing units, we postulate linear or full quadratic parametric models and then perform parameter estimation to find the globally optimal parameters of the models
which best describe the data collected of product yields and
properties. It should be noted that the parameter estimation
problems of all of the yield and property equations associated
with a single processing unit are solved simultaneously
coupled with additional mass balance constraints, to ensure
that the obtained parameters lead to feasible process models.
Before presenting our models, we define a variable $F(s,u,s',u')$
to denote the flow rate of an intermediate outlet stream s from
unit u to unit u' as an intermediate inlet stream s'; $F^{in}_U(s,u)$ to
denote the inlet mass flow rate of stream s to unit u; $F^{out}_U(s,u)$
to denote the outlet mass flow rate of stream s from unit u;
Yield(s,u) to denote the yield of outlet stream s from unit u;
$E^{in}_U(e,s,u)$ as the property e of the inlet stream s to unit u; and
$E^{out}_U(e,s,u)$ as the property e of the outlet stream s from unit u.
Crude distillation unit

Figure 5 illustrates a schematic diagram of a CDU in a
petrochemical plant. We use a set UCDU to denote CDUs.
Different types of crude oils are fed as feedstock into CDUs

for processing after blending, salt removal, and heating. Due
to different boiling point temperatures of mixtures in crude-oil
feedstock, the feedstock is separated into various cuts or distillates such as Dry Gas, LPG, Naphtha, Diesel, Wax Oil, and
Residue, which are withdrawn from the side of the column
and then distributed to downstream processing and treating
units. Therefore, distillation or fractionation models for the
quantity and quality predictions of distillates play an important
role to avoid potential inconsistencies in decision making
within the petrochemical sector.
In the beginning, CDUs were designed and modeled using empirical correlations and past experience.42,43 Later, shortcut methods such as the Fenske-Underwood-Gilliland method were typically used in distillation column design and modeling.21 However, these methods cannot be directly applied to the CDU due to its complexity.21 Although some simulation packages can perform more rigorous calculations for the CDU, these rigorous models cannot be readily incorporated into a planning model because of their high nonlinearity and high computational cost, which often cause convergence failures. Therefore, suitable models that balance simplicity, accuracy, and optimization robustness should be developed for CDU processes. A fixed-yield approach was proposed to model the CDU in the simplest LP refinery planning, where operating conditions are indirectly embedded into the coefficients. Despite its historic popularity and simplicity, the fixed-yield approach optimizes neither the distillation cuts nor their operating conditions. The swing-cut approach17,26,44–47 was developed to determine the different yields and properties of crude cuts. This model, which has been commercialized and successfully implemented in most refineries, is found in many current refinery planning packages such as Aspen PIMS12 and RPMS.11 Recently, the well-known fractionation index48 has been used to predict yields for the refinery planning problem,21,49 which only involves mass or mole flow balances.
In this work, we use the commercialized swing-cut approach to predict distillate yields and properties in a CDU. One of the important inputs for this model is the crude assay data, in which each crude-oil feedstock is decomposed into pseudocomponents or microcuts26 with a predefined true boiling point (TBP) temperature interval spanning the entire crude oil. Usually the temperature range for a crude-oil feedstock in the crude assay data extends from the boiling point of methane to 850°C.16 The TBP temperature intervals of the pseudocomponents or microcuts for the crudes used in this article range from 20 to 50°C. Figure 6 illustrates the correlation of the weight yields for a single crude-oil feedstock. Crude assay data from the petrochemical plant are used to develop correlations of yield and properties as a function of TBP temperature. We define a variable T(c) to denote the TBP temperature of crude c. Based on each crude assay, nonlinear models are developed to correlate the yield of each distillate contributed from each crude with the defined TBP temperature. The generic form of the yield-TBP curve model for a crude-oil feedstock to a CDU is given as follows (Eq. 1),
$$\mathrm{Yield}(c,u) = f\big(T(c)\big) \quad \forall u \in U^{CDU},\ c \in S_u^{in} \tag{1}$$
where, Yield(c,u) is the accumulated yield of crude c to CDU u. The function f is usually a polynomial, an exponential, or a combination of the two. For the crude shown in Figure 6, the accumulated yield is well described by a polynomial function of TBP temperature, given as follows with m = 5:

$$\mathrm{Yield}(c,u) = \sum_{n=0}^{m} a(n,c)\, T(c)^{n} \quad \forall u \in U^{CDU},\ c \in S_u^{in}$$

Figure 6. Crude oil TBP curve for yield with swing cuts illustration (based on crude oil assay data from the real petrochemical plant).
where, a(n,c) is a parameter array which needs to be determined. Parameter a(n,c) is determined using a parameter estimation approach, which will be introduced and explained
later.
Since the cutting temperature is the same for all crude-oil
feedstock, we define Tcut(s) to be the cutting temperature for
distillate s and Yield(s,c,u) as the yield of distillate s contributed from crude c in CDU u.
$$\mathrm{Yield}(s,c,u) = f\big(T^{cut}(s)\big) - \sum_{s'<s,\ s' \in S_u^{out}} \mathrm{Yield}(s',c,u) \quad \forall u \in U^{CDU},\ s \in S_u^{out},\ c \in S_u^{in} \tag{2}$$
For the example in Figure 6, the yield of distillate s contributed from crude c in CDU u is presented as follows,
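$$\mathrm{Yield}(s,c,u) = \sum_{n=0}^{m} a(n,c)\, T^{cut}(s)^{n} - \sum_{s'<s,\ s' \in S_u^{out}} \mathrm{Yield}(s',c,u) \quad \forall u \in U^{CDU},\ s \in S_u^{out},\ c \in S_u^{in}$$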
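The differencing in Eq. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients, distillate names, and cutting temperatures below are hypothetical and are not taken from the plant's crude assay data.

```python
def cumulative_yield(coeffs, temperature):
    """Evaluate sum_n a_n * T**n by Horner's scheme; coeffs[n] multiplies T**n."""
    total = 0.0
    for a in reversed(coeffs):
        total = total * temperature + a
    return total


def cut_yields(coeffs, cut_temperatures):
    """Per-distillate yields for one crude.

    cut_temperatures is a list of (distillate name, cutting temperature)
    ordered from lightest to heaviest. Each yield is the cumulative yield at
    the cutting temperature minus the yields of all lighter distillates.
    """
    yields = {}
    lighter_total = 0.0
    for name, t_cut in cut_temperatures:
        y = cumulative_yield(coeffs, t_cut) - lighter_total
        yields[name] = y
        lighter_total += y
    return yields


# Hypothetical coefficients a(n, c): cumulative wt% as a function of TBP in deg C.
coeffs = [0.0, 0.05, 0.0001]
# Hypothetical cutting temperatures T_cut(s), lightest distillate first.
cuts = [("naphtha", 100.0), ("diesel", 200.0), ("wax_oil", 400.0)]

result = cut_yields(coeffs, cuts)
print(result)
```

Because the cutting temperatures are decision variables in a swing-cut planning model, the same expression appears there as a nonlinear constraint rather than as a direct calculation.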
The final production yield of each distillate [Yield(s,u)] is calculated as the feed-weighted average of the contributions of each crude.

$$\mathrm{Yield}(s,u) = \frac{\sum_{s' \in S_u^{in}} F_U^{in}(s',u)\, \mathrm{Yield}(s,s',u)}{\sum_{s' \in S_u^{in}} F_U^{in}(s',u)} \quad \forall u \in U^{CDU},\ s \in S_u^{out} \tag{3}$$
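As a small numerical illustration of Eq. 3, the sketch below computes the feed-weighted yield of one distillate for a two-crude mix. The crude names, feed rates, and per-crude yields are hypothetical values chosen for the example.

```python
def blended_yield(feed_rates, crude_yields):
    """Flow-weighted yield of one distillate over all crudes fed to a CDU."""
    total_feed = sum(feed_rates.values())
    if total_feed <= 0:
        raise ValueError("total crude feed must be positive")
    weighted = sum(feed_rates[c] * crude_yields[c] for c in feed_rates)
    return weighted / total_feed


# Hypothetical inlet flow rates of each crude (ton/day).
feed = {"crude_A": 600.0, "crude_B": 400.0}
# Hypothetical yield of the same distillate contributed by each crude.
diesel_yield = {"crude_A": 0.30, "crude_B": 0.20}

mix_yield = blended_yield(feed, diesel_yield)
print(mix_yield)
```

In the planning model the crude feed rates are variables, so the product of flow rate and yield in Eq. 3 is one source of bilinear terms.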
Most distillation models in the refinery planning problem using a swing cut approach assume that the properties for the swing cut fractions are constant across their temperature ranges, which is not true and can subsequently result in inaccuracies in the prediction quality.17,26 To address this problem, Li et al.17 used regression models based on crude properties to calculate properties of CDU distillates. Menezes et al.26 divided distillates into distillate cuts and swing cuts with a TBP temperature interval of 10°C. Each swing cut is split into two internal streams, the light going to the lighter final-cut and the heavy moving to the heavier final-cut. Therefore, the light (heavy) stream from the swing cut fraction will be blended with the lighter (heavier) final-cut to improve property predictions. Although the approach of Menezes et al.26 may generate better predictions for distillate properties than that of Li et al.,17 it involves more blending operations and it is better to have smaller TBP temperature intervals (e.g., 10°C) for microcuts in crude assay data to determine their properties accurately. Therefore, we follow the approach of Li et al.17 to develop regression models for property predictions based on crude assay data, to reduce the number of equations and nonlinear terms. The properties of each distillate from each crude are correlated with the average accumulative yield based on each crude assay. We define a variable E(e,s,c,u) to denote a property e of distillate s contributed from crude c in the CDU u. Then, the general property prediction model is presented as follows,

$$E(e,s,c,u) = f\big(\mathrm{MYield}(s,c,u)\big) \quad \forall u \in U^{CDU},\ s \in S_u^{out},\ c \in S_u^{in},\ e \in E_U^{out}(s,u) \tag{4}$$

where, MYield(s,c,u) is the mid-point yield of CDU fractions, which is determined using the following constraints.

$$\mathrm{MYield}(s,c,u) = \sum_{s'<s,\ s' \in S_u^{out}} \mathrm{Yield}(s',c,u) + 0.5\,\mathrm{Yield}(s,c,u) \quad \forall u \in U^{CDU},\ s \in S_u^{out},\ c \in S_u^{in},\ e \in E_U^{out}(s,u) \tag{5}$$

Usually, the function f is a polynomial, an exponential, or a combination of the two. Figure 7 illustrates the relationship between specific gravity and accumulated yield from one crude assay dataset. It is found that we can use a polynomial function to accurately describe this correlation.

Figure 7. Example of the relationship between SPG and accumulated yield based on crude assay data from the actual petrochemical plant.

$$E(e,s,c,u) = \sum_{m=0}^{4} a(m,e,c) \left[ \sum_{s'<s,\ s' \in S_u^{out}} \mathrm{Yield}(s',c,u) + 0.5\,\mathrm{Yield}(s,c,u) \right]^{m} \quad \forall u \in U^{CDU},\ s \in S_u^{out},\ c \in S_u^{in},\ e \in E_U^{out}(s,u)$$

Figure 8 illustrates the correlation between viscosity and accumulated yield from one crude assay. We need to use an exponential function to accurately describe this correlation.

Figure 8. Example of the relationship between viscosity and accumulated yield based on crude assay data from the actual petrochemical plant.

$$E(e,s,c,u) = a_0(e,s,c,u) + a_1(e,s,c,u)\, \exp\big(a_2(e,s,c)\, \mathrm{Yield}(s,c,u)\big) \quad \forall u \in U^{CDU},\ s \in S_u^{out},\ c \in S_u^{in},\ e \in E_U^{out}(s,u) \tag{6}$$

The final property e for distillate s in the distillation unit u is calculated using the following constraints,

$$E_U^{out}(e,s,u) \left( \sum_{c \in S_u^{in}} F_U^{in}(c,u)\, \mathrm{Yield}(s,u) \right) = \sum_{c \in S_u^{in}} F_U^{in}(c,u)\, \mathrm{Yield}(s,c,u)\, E(e,s,c,u) \quad \forall u \in U^{CDU},\ s \in S_u^{out},\ e \in E_U^{out}(s,u) \tag{7}$$

Secondary processing units

The remaining processing units consist of coking, hydrotreating, hydrocracking, reforming, ethylene cracking, and specialty chemicals processing and refining. Rigorous models for coking, cracking, and hydrotreating units can be found in the literature; however, these are not suitable for the current application due to their computational expense and their inability to be connected with their upstream or downstream connecting units through the same set of input–output variables.17,22 For this reason, in this work we have followed a data-based approach to model the remaining processing units. Our main goal is to develop models which will predict the
yields and properties of each output stream for each unit as a
function of inlet flow rates and inlet properties of the specific

unit. Based on the information provided, we assume that a
decision must be made using only flow rate measurements and
property measurements, but no information about the operating variables is provided. In other words, given the amount of
the different streams and their properties that enter a unit, we
need to develop correlations to capture the yields and properties of the outlet streams of that unit using a large historical
data base from this plant. In doing so, we need to take into
account that the outlet yields are connected through a mass
balance constraint, and the outlet properties are correlated to
the yields. For this reason, the parameter estimation of each of
the yield and property equations related to an outlet stream of
a unit should not be performed separately, but simultaneously
for each unit restricted by the operational bounds and the mass
balance constraints.
First, it is necessary to identify the form of the input–output
equations which best describe each of the mappings between
the input variables and the output properties and outlet yields
for each unit. As a general rule, we have assumed that the outlet yield prediction equations are a function of the inlet stream
properties and inlet flow rates (Eq. 8). Outlet property prediction equations are a function of inlet properties, inlet flow rates
and the outlet flow rate of the specific stream of the property
(Eq. 9)
$$\mathrm{Yield}(s,u) = f\big(E_U^{in}(e',s',u),\ F_U^{in}(s',u)\big) \quad \forall u \in U^{PRO},\ u \notin U^{CDU},\ s \in S_u^{out},\ s' \in S_u^{in},\ e' \in E_U^{in}(s',u) \tag{8}$$

$$E_U^{out}(e,s,u) = f\big(F_U^{in}(s',u),\ F_U^{out}(s,u),\ E_U^{in}(e',s',u)\big) \quad \forall u \in U^{PRO},\ u \notin U^{CDU},\ s \in S_u^{out},\ s' \in S_u^{in},\ e' \in E_U^{in}(s',u),\ e \in E_U^{out}(s,u) \tag{9}$$
For each unit in U, it is necessary to develop $|S_u^{out}|$ (denoting the number of elements in set $S_u^{out}$) equations based on Eq. 8 to model each outlet stream, as well as $|E_U^{out}(s,u)|$ equations based on Eq. 9 to represent each of the properties of various
outlet streams of interest. It is very important to be able to
accurately model properties of streams which will be inputs to
downstream models, since these will be further used as inputs
to the models of that unit.
Equations 8 and 9 range from linear to general quadratic,
depending on the nature of the experimental data. Specifically,
if a linear correlation is found to be accurate, then it is used,
otherwise, quadratic and bilinear terms are added to capture
the nonlinear behavior of the data. It is found that higher order
terms were not necessary to capture the industrial data for all
of the secondary processes. Moreover, careful inspection of
the experimental data reveals streams which do not need to be
modeled due to a constant response. Finally, there are cases
where we have mutually exclusive streams which are modeled
separately as different operating modes of a unit and are associated with a binary selection in the optimization problem.
This occurs in the hydrocracking units where it is known that
light diesel and kerosene cannot be produced simultaneously.
For these units the industrial data is separated into the two different modes and two different models are developed using
the same approach described above.
Once the data preprocessing is completed for each unit and
the TdataU matrix is formed, the next important step is the
decision of which subset of inputs will be considered as input
variables to each of the set of outputs of the unit. The complete
set of inputs is denoted as $x_U^{in} = \{F_U^{in},\ F_U^{out},\ E_U^{in}(e,s,u)\}$, including the flow rates of all the inlet and outlet streams and the properties of all the inlet streams. The set of outputs to be modeled is the set of yields for all outlet streams, $\mathrm{Yield}(s^{out},u)$, and the set of properties for all outlet streams, $E_U^{out}(e,s^{out},u)$. Each yield and property equation is a function of a subset of $x_U^{in}$ based on the following rules:
1. all input flow rates are used as input variables in the
yield prediction and property prediction models;
2. all input properties of all input streams are used as
input variables in yield prediction models;
3. only the outlet flow rate corresponding to the same
stream of the outlet property prediction is considered
as an input variable to that property prediction
model;
4. a subset of input properties is used as an input variable
in property prediction models based on theoretical
knowledge, prior experience, and statistical analysis.
For example, if the outlet property to be predicted is
SPG, then only the inlet SPG is found to be sufficient
for a prediction. If the outlet property to be predicted is
viscosity, then inlet viscosity and SPG are found to be
important toward its prediction; and
5. if a unit can be operated in different modes, then the
data set is divided into as many modes of operation
and a different model is developed for each mode of
operation of the unit.
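Rules 1 to 4 can be read as a simple selection procedure. The Python sketch below is one possible encoding under stated assumptions: the tuple labels, the stream names, the property list of the example unit, and the `PROPERTY_INPUTS` table are hypothetical; only the SPG and viscosity entries of that table follow the examples given in rule 4.

```python
# Rule 4 mapping from an outlet property to the inlet properties used to predict it.
# Only the SPG and viscosity (VIS) entries come from the text; any other property
# would need its own entry based on process knowledge and statistical analysis.
PROPERTY_INPUTS = {"SPG": ["SPG"], "VIS": ["VIS", "SPG"]}


def yield_inputs(inlet_streams, inlet_properties):
    """Rules 1 and 2: all inlet flow rates and all inlet properties."""
    inputs = [("F_in", s) for s in inlet_streams]
    inputs += [("E_in", e, s) for s in inlet_streams for e in inlet_properties[s]]
    return inputs


def property_inputs(outlet_stream, outlet_property, inlet_streams, inlet_properties):
    """Rules 1, 3 and 4: inlet flows, the matching outlet flow, selected inlet properties."""
    inputs = [("F_in", s) for s in inlet_streams]
    inputs.append(("F_out", outlet_stream))
    wanted = PROPERTY_INPUTS[outlet_property]
    inputs += [
        ("E_in", e, s)
        for s in inlet_streams
        for e in inlet_properties[s]
        if e in wanted
    ]
    return inputs


# Hypothetical unit with one inlet stream described by five properties.
streams = ["feed"]
props = {"feed": ["SPG", "VIS", "SUL", "CCR", "N2"]}

y_in = yield_inputs(streams, props)
spg_in = property_inputs("diesel", "SPG", streams, props)
print(len(y_in), spg_in)
```

For a unit with several operating modes (rule 5), the same selection would be applied once per mode to the data subset of that mode.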
Once the set of input variables for Eqs. 8 and 9 is identified, the appropriate model must be postulated to describe the data. The set of input variables for each of the yield and property prediction equations based on Eqs. 8 and 9 depends on the number of streams and available inlet properties of the streams of the specific unit. We denote the set of inputs for models based on Eq. 8 as $x_{yield,i}^{in}(s^{out},u) \in x_U^{in}$, $i = 1,\ldots,I_{yield}$, and the general quadratic form is shown in the following equation:

$$\mathrm{Yield}(s^{out},u) = b_{yield,0}(s^{out},u) + \sum_{i=1}^{I_{yield}} b_{yield,i}(s^{out},u)\, x_{yield,i}^{in}(s^{out},u) + \sum_{i=1}^{I_{yield}} \sum_{\substack{j=2 \\ j \ge i}}^{I_{yield}} b_{yield,i,j}(s^{out},u)\, x_{yield,i}^{in}(s^{out},u)\, x_{yield,j}^{in}(s^{out},u)$$
$$\forall u \in U^{PRO},\ u \notin U^{CDU},\ s^{out} \in S_u^{out},\ s' \in S_u^{in},\ e' \in E_U^{in}(s',u) \tag{10}$$
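To make the structure of such a quadratic model concrete, the following Python sketch builds the regressors of a full quadratic model and evaluates a prediction. It assumes the common convention that every product $x_i x_j$ with $i \le j$ is included, which may differ in detail from the index limits printed in Eq. 10; the parameter values are hypothetical.

```python
from itertools import combinations_with_replacement


def quadratic_terms(x):
    """Regressors of a full quadratic model: 1, each x_i, and each x_i*x_j with i <= j."""
    terms = [1.0] + list(x)
    index_pairs = combinations_with_replacement(range(len(x)), 2)
    terms += [x[i] * x[j] for i, j in index_pairs]
    return terms


def predict(params, x):
    """Evaluate the model as the dot product of parameters and regressors."""
    terms = quadratic_terms(x)
    if len(params) != len(terms):
        raise ValueError("expected %d parameters, got %d" % (len(terms), len(params)))
    return sum(b * t for b, t in zip(params, terms))


# Two inputs give 1 + 2 + 3 = 6 regressors: 1, x1, x2, x1*x1, x1*x2, x2*x2.
print(quadratic_terms([2.0, 3.0]))

# Hypothetical parameters: intercept 1 and a single bilinear coefficient on x1*x2.
print(predict([1.0, 0.0, 0.0, 0.0, 1.0, 0.0], [2.0, 3.0]))

# Under this convention a six-input model has 1 + 6 + 21 = 28 parameters.
n_params_six_inputs = len(quadratic_terms([0.0] * 6))
```

The parameter count grows quadratically with the number of inputs, which is one practical reason for restricting each equation to a subset of the available inputs.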
Equation 10 contains bilinear and quadratic terms between the selected inlet properties and inlet flow rates. Similarly, if the set of inputs of Eq. 9 are summarized by variables $x_{e,i}^{in}(s^{out},u) \in x_U^{in}$, $i = 1,\ldots,I_e$, then the general quadratic model which any model of Eq. 9 may take is shown in Eq. 11.

$$E_U^{out}(e,s^{out},u) = b_{e,0}(e,s^{out},u) + \sum_{i=1}^{I_e} b_{e,i}(e,s^{out},u)\, x_{e,i}^{in}(s^{out},u) + \sum_{i=1}^{I_e} \sum_{\substack{j=2 \\ j \ge i}}^{I_e} b_{e,i,j}(e,s^{out},u)\, x_{e,i}^{in}(s^{out},u)\, x_{e,j}^{in}(s^{out},u)$$
$$\forall u \in U^{PRO},\ u \notin U^{CDU},\ s^{out} \in S_u^{out},\ s' \in S_u^{in},\ e' \in E_U^{in}(s',u),\ e \in E_U^{out}(s,u) \tag{11}$$
Figure 9. Yield data: (a) raw industrial data with outliers, (b) filtered data, (c) industrial data vs. predicted data after parameter estimation.

In Eqs. 10 and 11, parameters $b_{yield}$ and $b_e$ are those which must be estimated in order for the developed models to best describe the data. This is achieved by least-squares minimization between the data and model predictions. However, it is realized that each of the equations of a unit u are not
independent due to mass balance constraints and intercorrelations. Consequently, it is not sufficient to estimate the
parameters of each of these models individually through simple parameter estimation. Once each set of inputs and their
postulated equation forms are identified for each of the outlet
streams, simultaneous parameter estimation is performed for
all of the equations of form (10) and (11) associated with u,
subject to the mass balance constraints. Moreover, the model
predictions are restricted within the lower and upper bounds
identified by physical constraints and/or the experimental data.
The parameter estimation optimization problem has the objective of minimization of the sum of all the mean squared errors
(MSE) of each output, between experimental measurements
and predictions (Problem 12). The parameter estimation problems are nonlinear optimization problems (NLP) due to the
objective function in Problem (12), and are solved to global
optimality using solver ANTIGONE.39 The optimal parameters and bounds of each of the input and output variables for
each unit stream are stored in text files in an appropriate format so that they can be used by the planning model which will
be described subsequently.
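$$
\begin{aligned}
\min_{b_e,\, b_{yield}} \quad & \sum_{s^{out}} MSE_{yield}(s^{out},u) + \sum_{e \in s^{out}} MSE_e(e,s^{out},u) \\
\text{s.t.} \quad & \mathrm{Yield}(s^{out},u) = f\big(E_U^{in}(s^{in},u),\ F_U^{in}(s^{in},u),\ b_{yield}\big) \\
& E_U^{out}(e,s^{out},u) = f\big(F_U^{in}(s^{in},u),\ F_U^{out}(s^{out},u),\ E_U^{in}(e,s^{in},u),\ b_e\big) \\
& MSE_{yield}(s^{out},u) = \sum_{N} \big(\mathrm{yield}_N(s^{out},u) - \mathrm{yield}_{N,exp}(s^{out},u)\big)^2 \\
& MSE_e(e,s^{out},u) = \sum_{N} \big(E_{U,N}^{out}(e,s^{out},u) - E_{U,N,exp}^{out}(e,s^{out},u)\big)^2 \\
& \sum_{s^{out}} \mathrm{Yield}(s^{out},u) + \mathrm{Loss}_U = 100 \\
& F^{out}(s^{out},u) = \mathrm{Yield}(s^{out},u) \sum_{s^{in}} F^{in}(s^{in},u) \\
& \mathrm{Yield}^{lo}(s^{out},u) \le \mathrm{Yield}(s^{out},u) \le \mathrm{Yield}^{up}(s^{out},u) \\
& E_U^{out,lo}(e,s^{out},u) \le E_U^{out}(e,s^{out},u) \le E_U^{out,up}(e,s^{out},u)
\end{aligned}
\tag{12}
$$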
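The effect of estimating all yield equations of a unit together can be illustrated with a deliberately simplified Python sketch. It is not the formulation solved in this work: it assumes every outlet yield is linear in a single input, ignores bounds, property equations, and the global solver, and uses made-up data. Under those assumptions the mass-balance-constrained least-squares fit has a closed form: fit each stream by ordinary least squares, then shift every coefficient by an equal share of the balance violation.

```python
def ols_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_with_mass_balance(xs, yields_by_stream, total=100.0):
    """Fit all outlet yields jointly so that predictions sum to `total` for any x.

    With identical regressors for every stream, the constrained minimizer is
    the unconstrained fit minus an equal share of the constraint violation.
    """
    fits = {s: ols_line(xs, ys) for s, ys in yields_by_stream.items()}
    k = len(fits)
    intercept_shift = (sum(a for a, _ in fits.values()) - total) / k
    slope_shift = sum(b for _, b in fits.values()) / k
    return {s: (a - intercept_shift, b - slope_shift) for s, (a, b) in fits.items()}


# Hypothetical noisy yield measurements (wt%) of two outlet streams vs. one input.
xs = [1.0, 2.0, 3.0, 4.0]
data = {
    "light": [30.5, 32.0, 33.5, 35.2],
    "heavy": [69.0, 68.1, 66.4, 65.0],
}

# total would be 100 minus the loss term if a loss were modeled; here loss = 0.
params = fit_with_mass_balance(xs, data, total=100.0)
predicted_sum = sum(a + b * 2.5 for a, b in params.values())
print(params, predicted_sum)
```

Independent fits of the two streams would generally predict yields that do not add up to 100, which is the inconsistency the simultaneous estimation is meant to rule out.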
Once the optimal parameters are identified through the solution of the nonlinear optimization problem (12), it is imperative to add a set of equations to the unit model for
completeness and integration within the planning model.
These additional constraints are described in the next section.
To further clarify the above procedure and also show the
type of data which this work was based on, we describe here
an example of a property prediction and a yield prediction
model for a catalytic cracking unit. In this unit there is only
one inlet stream, five inlet properties describing the inlet
stream and six outlet streams. Thus the input variable set contains 12 variables: $x_U^{in} = \{f1_U^{in},\ f1_U^{out},\ f2_U^{out},\ f3_U^{out},\ f4_U^{out},\ f5_U^{out},\ f6_U^{out},\ e_U^{in}(e1,s,u),\ e_U^{in}(e2,s,u),\ e_U^{in}(e3,s,u),\ e_U^{in}(e4,s,u),\ e_U^{in}(e5,s,u)\}$.
The model for the prediction of a yield of one of the outlet
streams is a function of six variables following the rules
described above (inlet flow rate and inlet properties). A linear
model is not found to be accurate for this prediction, thus a
full quadratic model (Eq. 10) is used to predict the data as
shown in Figure 9.
The model for the prediction of the specific gravity of one
of the outlet streams is a function of the three input variables:
the inlet flow rate, the outlet flow rate of the same stream, and
the input property corresponding to SPG. The function represented by Eq. 11 is used with three inputs and all of the nonlinear terms to produce an accurate prediction for this property.
The data and prediction of this model are shown in Figure 10.
It should be noted that the optimal parameters of both of the
predictions shown in Figures 9 and 10 are estimated through
one parameter estimation model of form 12 for the catalytic
cracking unit, along with all of the other yield and property
prediction functions of the same unit.
Blending and pooling units

Figure 11 illustrates a schematic diagram of a blending or
pooling unit. We use set UBLD to denote blending or pooling
units. Several blending components for instance, sin1, sin2, sin3,
sin4, and sin5 are used as the feedstock of a blender to produce
desired products. The following constraints are proposed to
ensure mass balances in blending and pooling units.
The amount of a blending component s that is used to produce different products should not exceed the total inlet
amount of this component s.
$$F_U^{in}(s,u) \ge \sum_{s' \in S_u^{out}} F(s,u,s',u) \quad \forall u \in U^{BLD},\ s \in S_u^{in} \tag{13}$$

Figure 10. Specific gravity data: (a) raw industrial data with outliers, (b) filtered data, (c) industrial data vs. predicted data after parameter estimation.
Similarly, the amount of product s that is produced should not exceed the total amount from different components.

$$\sum_{s' \in S_u^{in}} F(s',u,s,u) \ge F_U^{out}(s,u) \quad \forall u \in U^{BLD},\ s \in S_u^{out} \tag{14}$$
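A quick feasibility check of Eqs. 13 and 14 for a single blender can be sketched as follows. The component names, product names, and flow values are hypothetical, and the recipe dictionary is just one convenient way to hold the component-to-product flows.

```python
def check_blender_balance(inlet, outlet, recipe, tol=1e-9):
    """Check the two blender balances for one unit.

    inlet[c]        total inlet amount of component c
    outlet[p]       amount of product p produced
    recipe[(c, p)]  amount of component c used to make product p
    """
    # Eq. 13: a component cannot be used beyond its total inlet amount.
    for comp, available in inlet.items():
        used = sum(f for (c, _), f in recipe.items() if c == comp)
        if used > available + tol:
            return False
    # Eq. 14: a product cannot exceed the total amount supplied by components.
    for prod, produced in outlet.items():
        supplied = sum(f for (_, p), f in recipe.items() if p == prod)
        if produced > supplied + tol:
            return False
    return True


# Hypothetical gasoline blender with two components and two grades (ton/day).
inlet = {"reformate": 100.0, "fcc_naphtha": 80.0}
outlet = {"gasoline_92": 110.0, "gasoline_95": 70.0}
recipe = {
    ("reformate", "gasoline_92"): 40.0,
    ("reformate", "gasoline_95"): 60.0,
    ("fcc_naphtha", "gasoline_92"): 70.0,
    ("fcc_naphtha", "gasoline_95"): 10.0,
}

feasible = check_blender_balance(inlet, outlet, recipe)
print(feasible)
```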
Several important properties such as RON, SPG, RVP, sulfur, nitrogen, benzene, aromatics, olefin, flash point, freezing point, cetane number, cold filter plugging point, and smoke point are used in practice (Table 1). Many of these properties (e.g., RVP) involve highly nonlinear mixing rules. However, as noted by Li et al.,24 a linear blending index usually exists and is used for almost every hydrocarbon property with nonlinear mixing correlations. These blending indices are linearly additive on either a volume or a weight basis. Table 1 lists several properties, indices, and their additive bases. To calculate those blending indices which are based on a volume basis, we define a variable $xf(s,u,s',u')$ to represent the volumetric flow rate of component $s$ into blending and pooling units that is used to produce product $s'$. The relationship between the mass and volumetric flow rates is represented as follows,
$$F(s,u,s',u') = xf(s,u,s',u')\, E_U^{in}(e,s,u) \quad \forall u' \in U^{BLD},\ s \in S_u^{in},\ s' \in S_u^{out} \tag{15}$$

The desired product quality is ensured using the following constraints.

$$E_U^{out}(e,s,u) = \frac{\sum_{s' \in S_u^{in}} xf(s',u,s,u)\, E_U^{in}(e,s',u)}{\sum_{s' \in S_u^{in}} xf(s',u,s,u)} \quad \forall u \in U^{BLD},\ s \in S_u^{out},\ e \in E_U^{out}(s,u) \tag{16}$$

$$E_U^{out}(e,s,u) = \frac{\sum_{s' \in S_u^{in}} F(s',u,s,u)\, E_U^{in}(e,s',u)}{\sum_{s' \in S_u^{in}} F(s',u,s,u)} \quad \forall u \in U^{BLD},\ s \in S_u^{out},\ e \in E_U^{out}(s,u) \tag{17}$$

$$E_U^{L}(e,s,u) \le E_U^{out}(e,s,u) \le E_U^{U}(e,s,u) \quad \forall u \in U^{BLD},\ s \in S_u^{out},\ e \in E_U^{out}(s,u) \tag{18}$$

where, $E_U^{in}(e,s,u)$ is the blending index of property e in inlet stream s to blending or pooling unit u. Note that Eq. 16 is for volume-based indices, and Eq. 17 for weight-based indices. Equations 15–17 introduce bilinear terms.

Figure 11. A schematic diagram of a blending or pooling unit.

Overall Petrochemical Planning Model

In real petrochemical operations, the planning horizon is usually considered as 30 days (i.e., 1 month). However, each operation in processing units such as light diesel production and jet fuel production is conducted on a daily basis. In this article, we develop a single-period (i.e., 1 day) planning model for the entire petrochemical plant. In the future, we will divide the entire horizon into t (t = 1, 2, 3, ..., T) periods and propose a multi-period planning model. So far we have described the yield and property prediction models for the CDUs, secondary processing units, blenders, and pools. These must be complemented with capacity constraints, unit selection constraints, production mode selection constraints, and some other operational constraints.
Unit selection
There may be several parallel processing units to produce the same products. To model the selection of those parallel units, we define a binary variable Y(u) as follows,

$$Y(u) = \begin{cases} 1 & \text{if unit } u \text{ is selected} \\ 0 & \text{otherwise} \end{cases} \quad \forall u \in U^{PAR}$$
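The total inlet flow rate to each unit must satisfy its minimum [$CAP_U^{L}(u)$] and maximum [$CAP_U^{U}(u)$] capacity. Then we have,

$$CAP_U^{L}(u)\, Y(u) \le \sum_{s \in S_u^{in}} F_U^{in}(s,u) \le CAP_U^{U}(u)\, Y(u) \quad \forall u \in U^{PAR} \tag{19}$$

$$CAP_U^{L}(u) \le \sum_{s \in S_u^{in}} F_U^{in}(s,u) \le CAP_U^{U}(u) \quad \forall u \notin U^{PAR} \tag{20}$$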
Figure 12. A schematic diagram of the inputs to and outputs from a unit.
Mode selection
In some processing units, several operational modes may be
involved during the entire planning horizon. At each time
(e.g., each day) only one production mode is involved. To
model the selection of different operational modes, the following binary variable x(m,u) is defined.
$$x(m,u) = \begin{cases} 1 & \text{if operational mode } m \text{ in unit } u \text{ is selected} \\ 0 & \text{otherwise} \end{cases} \quad \forall u \in U^{mod},\ m \in M_u$$

Note that the optimization model developed is used for the global optimization of an average daily profit during the entire planning horizon. Therefore, we need to know how many days are used for mode m in the planning horizon. To achieve this, we introduce the following auxiliary binary variable,

$$z(m,u,d) = \begin{cases} 1 & \text{if mode } m \text{ in unit } u \text{ is operated in day } d \\ 0 & \text{otherwise} \end{cases} \quad \forall u \in U^{mod},\ m \in M_u,\ d$$
The total number of days for operational mode m in unit u, which is defined as dm(m,u), can be calculated using the equivalent mathematical form from Floudas (1995) given as follows,

$$dm(m,u) = \sum_{d=1}^{D} 2^{d-1}\, z(m,u,d) \quad \forall u \in U^{mod},\ m \in M_u,\ m < M \tag{21}$$

where, parameter D denotes the number of binary terms needed to represent the total number of days in the planning horizon. For instance, D = 5 is enough to represent 31 days in the planning horizon, since $2^5 - 1 = 31$.
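The binary expansion in Eq. 21 can be checked with a short Python sketch. It follows our reading of the equation, in which the D binaries act as the digits of the day count; the function names are illustrative only.

```python
def days_from_bits(z):
    """Day count from binary digits: z[0] is the d = 1 term with weight 2**0."""
    return sum((2 ** d) * bit for d, bit in enumerate(z))


def bits_from_days(days, num_bits):
    """Inverse mapping: binary digits (least significant first) of a day count."""
    if not 0 <= days < 2 ** num_bits:
        raise ValueError("day count does not fit in the given number of bits")
    return [(days >> d) & 1 for d in range(num_bits)]


# With D = 5 binaries, every integer from 0 to 31 can be represented.
max_days = days_from_bits([1, 1, 1, 1, 1])

# Example: 20 days = 4 + 16, so only the third and fifth digits are 1.
bits_20 = bits_from_days(20, 5)
print(max_days, bits_20)
```

This is why a horizon of about one month needs only five binaries per mode and unit rather than one binary per day.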
If mode m in unit u is operated in day d, then mode m in unit u should be operated.

$$x(m,u) \ge z(m,u,d) \quad \forall u \in U^{mod},\ m \in M_u,\ m < M,\ d \le D \tag{22}$$

Alternatively, if mode m in unit u is not operated in any day d, then mode m in unit u should not be operated.

$$x(m,u) \le \sum_{d=1}^{D} z(m,u,d) \quad \forall u \in U^{mod},\ m \in M_u,\ m < M \tag{23}$$

Note that Eqs. 22 and 23 enforce 0–1 variable x(m,u) to be binary although we define it as 0–1 continuous variable when m < M.
The total number of days assigned for different operational modes should be equal to the planning horizon.

$$\sum_{m \in M_u,\ m=1}^{M} dm(m,u) = DH \cdot Y(u) \quad \forall u \in U^{mod} \tag{24}$$

where, DH is the total number of days in the planning horizon. We also need to impose x(m,u) to be binary when m = M, which are ensured using the following constraints.

$$dm(m,u) \le DH \times x(m,u) \quad \forall u \in U^{mod},\ m = M \tag{25}$$

$$x(m,u) \le dm(m,u) \quad \forall u \in U^{mod},\ m = M \tag{26}$$

Since Eq. 24 imposes dm(m,u) to be an integer variable for m = M, x(m,u) can be guaranteed to be binary for m = M through Eqs. 25 and 26. By doing this, we can reduce the number of binary variables by at least D.
Mass balance for connected units

The inlet stream s to a unit u mainly comes from its prior stream units and the initial inventory of the unit u as shown in Figure 12. The total inlet flow rate of stream s to a unit u should not exceed total flow rate from its prior stream units and its initial inventory, which can be enforced as follows,

$$\sum_{(s',u'):(s',u',s,u) \in UC} F(s',u',s,u) + Inv0(s,u) \ge F_U^{in}(s,u) \quad \forall u,\ s \in S_u^{in} \tag{27}$$

where, Inv0(s,u) denotes the initial inventory for stream s of a unit u.
For some processing units such as hydrocracking and hydrotreating units, H2 is one of the feedstocks. The inlet flow rate of H2 must satisfy its minimum [$ro_H^{L}(s,u)$] and maximum [$ro_H^{U}(s,u)$] ratio requirements.

$$ro_H^{L}(s,u) \sum_{s' \in S_u^{in}} F_U^{in}(s',u) \le F^{in}(s,u) \le ro_H^{U}(s,u) \sum_{s' \in S_u^{in}} F_U^{in}(s',u) \quad \forall u,\ s \in S_u^{in},\ s = \{H_2\} \tag{28}$$

The inlet properties of a stream s to a processing unit u should be the same as the properties of a stream s' from its prior stream unit u'. These restrictions can be imposed using the following constraints,
$$E_U^{out}(e,s',u') = E_U^{in}(e,s,u) \quad \forall u \notin U^{PAR},\ (s',u',s,u) \in UC \tag{29}$$

$$E_U^{out}(e,s',u') \ge E_U^{in}(e,s,u) - E_U^{in,L}(e,s,u)\,[1 - Y(u)] \quad \forall u \in U^{PAR},\ (s',u',s,u) \in UC \tag{30}$$

$$E_U^{out}(e,s',u') \le E_U^{in}(e,s,u) + E_U^{out,U}(e,s',u')\,[1 - Y(u)] \quad \forall u \in U^{PAR},\ (s',u',s,u) \in UC \tag{31}$$

The inlet properties should meet their minimum [$E^{in,L}(e,s,u)$] and maximum [$E^{in,U}(e,s,u)$] requirements as follows,

$$E_U^{in,L}(e,s,u) \le E_U^{in}(e,s,u) \le E_U^{in,U}(e,s,u) \quad \forall u \notin U^{PAR},\ (s',u',s,u) \in UC \tag{32}$$

$$E_U^{in,L}(e,s,u)\, Y(u) \le E_U^{in}(e,s,u) \le E_U^{in,U}(e,s,u)\, Y(u) \quad \forall u \in U^{PAR},\ (s',u',s,u) \in UC \tag{33}$$
The yield prediction model for those units without or with different operational modes can be developed by solving the parameter estimation problem (12). For completeness, the generic correlations for yield and property predictions are given here.

$$\mathrm{Yield}(s,u) = f\big(E_U^{in}(e',s',u),\ F_U^{in}(s',u)\big) \quad \forall u \notin U^{mod} \tag{34}$$

$$\mathrm{Yield}(s,u,m) = f\big(E_U^{in}(e',s',u),\ F_U^{in}(s',u)\big), \quad \text{where } \big(E_U^{in}(e',s',u),\ F_U^{in}(s',u)\big) \in m, \quad \forall u \in U^{mod} \tag{35}$$

Note that for the units with multiple modes, the property and flow rate data are divided into different sets, each corresponding to a different mode of operation. The summation of product yields must be equal to 1 as follows:

$$\sum_{s \in S_u^{out}} \mathrm{Yield}(s,u) + \mathrm{YieldLoss}(u) = 1 \quad \forall u \notin U^{mod} \tag{36}$$

$$\sum_{s \in S_u^{out}} \mathrm{Yield}(s,u,m) + \mathrm{YieldLoss}(u) = 1 \quad \forall u \in U^{mod},\ m \in M_u \tag{37}$$

where, YieldLoss(u) is the loss yield of a unit u.
The outlet flow rate of a product s can be correlated with its yield using the following constraints:

$$\left[ \sum_{s' \in S_u^{in}} F_U^{in}(s',u) \right] \mathrm{Yield}(s,u) = F_U^{out}(s,u) \quad \forall u \in U^{PRO},\ u \notin U^{mod},\ s \in S_u^{out} \tag{38}$$

The average outlet flow rate of each product with different operational modes can be calculated by the following constraints,

$$F_U^{out}(s,u) = \frac{\sum_{m \in M_u,\ m=1}^{M} \left[ \sum_{s' \in S_u^{in}} F_U^{in}(s',u) \right] \mathrm{Yield}(s,u,m)\, dm(m,u)}{DH} \quad \forall u \in U^{PRO} \cap U^{mod},\ s \in S_u^{out} \tag{39}$$

$$E_U^{out}(e,s^{out},u) = b_{e,0}(e,s^{out},u) + \sum_{i=1}^{I_e} b_{e,i}(e,s^{out},u)\, x_{e,i}^{in}(s^{out},u) + \sum_{i=1}^{I_e} \sum_{\substack{j=2 \\ j \ge i}}^{I_e} b_{e,i,j}(e,s^{out},u)\, x_{e,i}^{in}(s^{out},u)\, x_{e,j}^{in}(s^{out},u) \quad \forall u \notin U^{mod} \tag{40}$$

Any product s produced from a unit u or its initial inventory will be sent to its downstream units for further processing as shown in Figure 12. The total amount of product s produced from a unit and its initial inventory should be no less than the total amount of product s that is sent to downstream units.

$$F_U^{out}(s,u) + Inv0(s,u) \ge \sum_{u'} \sum_{s' \in S_{u'}^{in}:(s,u,s',u') \in UC} F(s,u,s',u') \quad \forall u \in U^{PRO} \cup U^{BLD},\ s \in S_u^{out} \tag{41}$$

Demand constraints
Each final product s has its own demand during the planning horizon. The minimum [$Dmd_S^{L}(s)$] and maximum [$Dmd_S^{U}(s)$] demand requirements for each final product s can be constrained as follows,

$$Dmd_S^{L}(s) \le \sum_{s'} \sum_{u':(s',u',s,u) \in UC} F(s',u',s,u) \le Dmd_S^{U}(s) \quad \forall u \in U^{SAL},\ s \in S_u^{in},\ s \in S^{FP} \tag{42}$$

Objective function
The final objective for the entire planning model is to maximize the total profit, which consists of revenue from the sale of final products, raw material purchase cost, and operational cost. The operational cost for each unit u is assumed to be linear with the inlet or outlet stream flow rates of products of this unit. The inlet and outlet streams involved in their operational cost are denoted as $S_u^{UTOC,in}$ and $S_u^{UTOC,out}$, respectively. We define Price(s) as the price of each final product s ($/ton), Cost(s) as the purchase cost for raw material s ($/ton), and UTOC(u) as the operational cost for each production unit u. The total profit is calculated by,

$$
\begin{aligned}
PROFIT = {} & \sum_{u \in U^{SAL}} \sum_{s \in S_u^{in}} \left\{ \mathrm{Price}(s) \left[ \sum_{u'} \sum_{s' \in S_{u'}^{out}:(s',u',s,u) \in UC} F(s',u',s,u) \right] \right\} \\
& - \sum_{u \in U^{PURC}} \sum_{s \in S_u^{out}} \left\{ \mathrm{Cost}(s) \sum_{u'} \sum_{s' \in S_{u'}^{in}:(s,u,s',u') \in UC} F(s,u,s',u') \right\} \\
& - \sum_{u \in U^{PRO}} \left\{ UTOC(u) \left( \sum_{s \in S_u^{UTOC,in}} F_U^{in}(s,u) + \sum_{s \in S_u^{UTOC,out}} F_U^{out}(s,u) \right) \right\}
\end{aligned}
\tag{43}
$$

At this point, we have completed the description of our entire single-period model, which comprises Eqs. 13–43. Detailed representations of the forms of the models and parameter values are provided in Supporting Information.
Computational Platform
To make the developed framework easy to use by a wide

range of users, we have developed a user-friendly Excel-based
computational platform. The platform can communicate with
the planning model developed in GAMS modeling language
by modifying several inputs of the model, but also by reading
the optimal solution obtained from the model to produce final
result tables and graphs. The computational platform is developed using Excel-VBA and allows the user to input various
information through tables in Excel Sheets. Through the platform the user can change (a) crude oil amounts and properties,
tion of the steps of the computational platform architecture is

shown in Figure 13.
Computational Results
Figure 13. Schematic representation of procedure followed by the user using the Excel computational platform.
We solve three case studies (CS1–CS3) from a real petrochemical plant to illustrate the predictive power and superiority of the developed data-driven and global optimization framework and compare our results with those based on an empirical decision-making approach from the actual operation. In these three case studies, 36 raw materials including three different crudes are used to produce 200–273 products including dry gas, LPG, naphtha, gasoline, diesel, jet fuel, benzene, toluene, xylol, sulfur, ethylene, styrene, methane, and many more proprietary products. There are 34 processing units, 5 pools, and 5 blenders. Two parallel hydrocracking units (denoted as HC1 and HC2) and two parallel catalytic cracking units (denoted as FC1 and FC2) with different capacities are used to produce the same products. While two production modes, Naphtha and Reforming Feedstock, can be operated in the CDU, Light Diesel and Jet Fuel production modes can be operated in HC1 and HC2. These two different production modes cannot be operated during the same day. Several important properties including SPG, SUL, CCR, RON, V20, V100, FLP, FRP, CFPP, CEN, ARO, OLE, BEN, SMP, RVP, and N2 are considered (Table 1). The formulated mixed integer nonlinear optimization problem contains on average 11 binary variables, 1150 continuous variables, 1100 equations, and 1050 nonlinear terms. The breakdown of the model statistics for the integrated refinery and chemical plant complex is shown in Figure 14. The values in Figure 14 change slightly for different case studies depending on the number of parallel units as well as the number of variables which are fixed to constant values. The main differences among these three case studies are crude amounts and cost, final product price and demands, and raw material cost. All examples are solved using ANTIGONE 1.1 (refs 38–41) in GAMS 24.2.2 on a Dell OPTIPLEX 960 (Intel Xeon CPU 3.00 GHz, 2 GB RAM) running Linux. The default optimality gap of 1e-04 is used for convergence and the solutions reported did not require the specification of an initial point.
R
(b) intermediate and final streams property specifications/

bounds, (c) demand constraints, (d) capacity constraints, (e)
cost and price parameters, and (f) intermediate streams flow
rate and yield bounds. The user can navigate through the various sheets which contain detailed information about each specific unit through the Main Page but also through clickable
flowsheets of the plant (Figure 13). A set of default bounds for
all of the flow rates and properties are provided in the Excel
platform, and the user is informed that these bounds should
not be expanded because this will decrease the predictive ability of the model. However, the bounds can be reduced to even
a constant value by setting the lower and upper bound to the
desired value. If a variable is fixed to a constant value, the
model which is used for the prediction of this variable is automatically removed from the planning optimization problem to
achieve computational savings and avoid numerical problems.
Once the user has made any modifications through the Excel
platform to reproduce a specific case study, the interface automatically incorporates all of the above information by modifying the planning model code and solves the global
optimization problem to global optimality using ANTIGONE,3841 through a simple click of a button on the main page
of the Excel file. Once the global optimum has been found the
results are sent back to the Excel platform and are presented to
the user in several tables and graphs. A schematic representa3034
DOI 10.1002/aic
Figure 14. Model Statistics of MINLP planning model

for entire petrochemical plant.
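The bound-handling rule described for the Excel platform (default bounds may be tightened but should not be expanded, and a variable whose lower and upper bounds coincide is treated as fixed so that its prediction model can be dropped) can be sketched in Python as follows. This is only an illustration under stated assumptions: the actual platform is implemented in Excel-VBA and GAMS, and the function name `merge_bounds`, the variable labels, and the numbers below are hypothetical.

```python
def merge_bounds(defaults, user, tol=1e-9):
    """Combine default bounds with user-supplied bounds.

    Returns (bounds, fixed, warnings):
      bounds   - {name: (lower, upper)} that would be passed to the model
      fixed    - {name: value} for variables whose bounds coincide; in the
                 workflow described above, their prediction models are removed
      warnings - names whose user bounds extend beyond the default bounds
    """
    bounds, fixed, warnings = {}, {}, []
    for name, (d_lo, d_up) in defaults.items():
        lo, up = user.get(name, (d_lo, d_up))
        if lo > up:
            raise ValueError(f"lower bound exceeds upper bound for {name}")
        if lo < d_lo - tol or up > d_up + tol:
            warnings.append(name)  # expansion: the user should be informed
        bounds[name] = (lo, up)
        if abs(up - lo) <= tol:
            fixed[name] = lo
    return bounds, fixed, warnings


# Hypothetical default and user bounds, for illustration only.
defaults = {"Yield(naphtha,CDU)": (10.0, 25.0), "F_in(crude,CDU)": (0.0, 800.0)}
user = {"Yield(naphtha,CDU)": (18.0, 18.0), "F_in(crude,CDU)": (0.0, 900.0)}

bounds, fixed, warnings = merge_bounds(defaults, user)
print(fixed, warnings)
```

The sketch only warns about expanded bounds rather than rejecting them, mirroring the statement that the user is informed of the consequences.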

Table 2. Model statistics, computational cost (CPU time) for obtaining the global optimum solution, and comparison of actual vs. optimized operation for CS1, CS2, CS3 in US $Million

                                     CS1                   CS2                   CS3
                              Actual   Optimized    Actual   Optimized    Actual   Optimized
Total equations                    -        1067         -        1100         -        1118
Total continuous variables         -        1117         -        1149         -        1160
Total binary variables             -          11         -          11         -          11
Total nonlinear terms              -        1026         -        1053         -        1072
Total CPU cost for GO (s)          -          18         -         195         -         212
Total income                  721.57       742.5    600.80      662.27    747.94      769.57
Total material cost           646.17      655.74    586.84      611.52    669.99      668.67
Total operating cost           39.67      37.536     28.20       30.02     40.30       39.08
Total profit                   35.73       49.22    −14.23       20.73     37.64       61.83
Rel. profit increase (%)           -       37.75         -      245.65         -       64.25

For a fair comparison, the following information in the three case studies is set to values identical to those of the actual operation:
• Amounts, properties, and crude assays of C1, C2, and C3.
• Demand limits for all refinery and chemical final products.
• Prices and costs of final products and raw materials.
• Limits on property specifications.
• Input properties of all raw materials.
• Inventory amounts and properties.
To solve each case study and analyze the results, we use the user interface described in the previous section. The quantities that the model determines from the provided information are:
1. yields, flow rates, and properties of all intermediate and final products;
2. operating modes of multimode units; and
3. operation or non-operation of parallel units.
From the above information, the optimized total profit of the plant is reported and compared to that of the actual operation. Moreover, the feasibility of the obtained optimized solution is verified by the plant operators to ensure that the operation is possible. Table 2 shows a comparison of the total income, total material cost, total operating cost, and total profit of the actual operation and the optimized operation for all three case studies.
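The profit figures in Table 2 follow from the relation profit = income − material cost − operating cost. The short Python check below recomputes them from the tabulated components. The convention used for the relative increase (dividing by the magnitude of the actual profit, which is negative for CS2) is an assumption on our part; with it, the recomputed percentages agree with the reported ones to within rounding of the table entries.

```python
# Values in US $ million from Table 2: (income, material cost, operating cost).
TABLE_2 = {
    "CS1": {"actual": (721.57, 646.17, 39.67), "optimized": (742.5, 655.74, 37.536)},
    "CS2": {"actual": (600.80, 586.84, 28.20), "optimized": (662.27, 611.52, 30.02)},
    "CS3": {"actual": (747.94, 669.99, 40.30), "optimized": (769.57, 668.67, 39.08)},
}


def profit(income, material_cost, operating_cost):
    """Total profit as income minus material and operating costs."""
    return income - material_cost - operating_cost


def relative_increase(actual_profit, optimized_profit):
    """Percent change relative to the magnitude of the actual profit (assumed convention)."""
    return 100.0 * (optimized_profit - actual_profit) / abs(actual_profit)


results = {}
for case, rows in TABLE_2.items():
    p_actual = profit(*rows["actual"])
    p_optimized = profit(*rows["optimized"])
    results[case] = (p_actual, p_optimized, relative_increase(p_actual, p_optimized))
    print(f"{case}: actual {p_actual:.2f}, optimized {p_optimized:.2f}, "
          f"increase {results[case][2]:.1f}%")
```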
Table 2 shows that the global optimization framework was able to identify significantly improved feasible solutions for the entire petrochemical planning problem for the three different case studies. For CS1 the total income is increased in the optimized operation by $20.9 million (from $721.57 million to $742.5 million), the material cost is also increased by $9.57 million, but the operating cost is decreased by $2.1 million. In other words, a higher profit is made by purchasing more raw materials to make more sales, while decreasing the operating cost. In the case of CS2, the total income is dramatically increased by $62 million while the material cost is increased by $24 million and the operating cost is increased slightly by $2 million. For the operation of this month, the global optimization framework suggests that more expenses should be made to achieve a much higher profit. The estimate of the actual profit for this month, based on the information that was provided, suggests that no profit was made during this month. This can be explained by the low amount of crude materials that were processed during this month. Finally, in CS3 the total income is increased by $22 million while the amount spent on raw materials is reduced by $1 million and the operating cost by $1 million. In this case, market demands were satisfied while reducing the purchase and operating costs to achieve a higher profit. Table 2 also summarizes the model statistics and the CPU time required to obtain global optimality.

Figure 15. Actual vs. Optimized relative income contributions (%) for CS1.

Figure 16. Actual vs. Optimized relative cost contributions (%) for CS1.
In the following, we provide a more detailed comparison of the actual operation vs. the optimized operation for each case study separately. To analyze the optimized results and verify their feasibility, it is important to identify the main sources of the improved total profit when comparing the actual operation to the optimized operation. For this purpose we have calculated the percent contributions of each stream, raw material, and unit to the total income, total material cost, and total operating cost, respectively. Then, we rank these contributions in decreasing order and compare them to identify the main differences between the actual and the optimized operation. Due to proprietary information restrictions, we are not able to disclose the names of the chemicals produced by the petrochemical plant.
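The contribution analysis described above (percent share of each stream in a total, ranked in decreasing order, then compared between actual and optimized operation) can be sketched in Python as follows. The function names and the three placeholder products with their values are hypothetical; they are not plant data.

```python
def relative_contributions(amounts):
    """Percent share of each item in the total, ranked in decreasing order."""
    total = sum(amounts.values())
    if total == 0:
        raise ValueError("total is zero; shares are undefined")
    shares = {name: 100.0 * value / total for name, value in amounts.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


def largest_shifts(actual, optimized, top=3):
    """Items whose percent share changes most between actual and optimized operation."""
    share_actual = dict(relative_contributions(actual))
    share_optimized = dict(relative_contributions(optimized))
    names = sorted(set(share_actual) | set(share_optimized))
    shifts = {
        name: share_optimized.get(name, 0.0) - share_actual.get(name, 0.0)
        for name in names
    }
    return sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)[:top]


# Hypothetical income per product (US $ million), for illustration only.
income_actual = {"product_A": 50.0, "product_B": 30.0, "product_C": 20.0}
income_optimized = {"product_A": 40.0, "product_B": 30.0, "product_C": 30.0}

ranking = relative_contributions(income_actual)
shifts = largest_shifts(income_actual, income_optimized)
print(ranking)
print(shifts)
```

Ranking the shifts by absolute value highlights both the streams that gain importance in the optimized plan and those that lose it.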
Case Study 1 (CS1)

In this example, three crudes (C1–C3) are processed in the CDU unit: 64,446 tons of C3, 46,643 tons of C2, and 665,631 tons of C1. Figures 15 and 16 compare the contributions toward the total income (Figure 15) and the contributions toward the total material cost (Figure 16). The values shown in these figures represent the total relative contribution of a specific stream toward the income or cost. This is calculated by dividing the total income or cost attributed to the specific stream by the total income or cost from all streams for this month, respectively. In terms of the total operating cost, we have identified the non-operation of a parallel catalytic cracking unit as the main source of operating cost reduction, since the demands can be satisfied by the operation of a single unit.

In terms of the total income breakdown, we observe that the contribution of Gasoline and Diesel products, Benzene, and Kerosene toward the total income is reduced, while several chemicals such as Styrene, Crude Xylene, and Chemical Toluene are produced and sold. This result is explained by the increased selling prices of the above chemicals when compared to refinery products. Another main difference is the increased amount of Naphtha sold, which previously was used within the chemical plants for the operation of certain processes. Based on the optimized solution, it is preferable to sell Naphtha during the time period of this case study, rather than use it for the operation of units which can be operated to satisfy the demand constraints without it. In terms of the raw material cost contributions (Figure 16), it is clearly observed that less Naphtha and Topped Oil are purchased while more LPG is purchased to achieve the optimized operation, while the remaining streams have very similar contributions in both cases.

Figure 17. Actual vs. Optimized relative income contributions (%) for CS2.

Figure 18. Actual vs. Optimized relative cost contributions (%) for CS2.
Case Study 2 (CS2)

This example involves 68,305 tons of C3, 1790 tons of C2, and 658,191 tons of C1. A similar analysis is performed for this case study, for which the improvements in profit are significantly higher. The main difference between the actual and the optimized solution in terms of the total income is the significant amount of Naphtha which is produced and sold, which is explained by changes in its selling price (Figure 17). Moreover, increased income comes from selling Styrene, MTBE, and Chemical Benzene, which had a smaller contribution in the actual operation. To achieve this production plan, the contribution of LPG and Topped Oil toward the total raw material cost is increased, while the remaining raw material purchases are very similar (Figure 18). Although the total operating and material costs are higher for the optimized case, a profit is generated owing to increased sales.
Figure 19. Actual vs. Optimized relative income contribution (%) for CS3.
Figure 20. Actual vs. Optimized relative cost contribution (%) for CS3.
Case Study 3 (CS3)

This case study has 49,127 tons of C3, 45,441 tons of C2, and 702,691 tons of C1. Figures 19 and 20 summarize the main differences between the actual and optimized operation for case study 3. In terms of total sales, the most important observations are the increased amount of Naphtha which is produced and sold to the market, as well as the increased amount of chemicals such as Xylene and Toluene which are produced and sold to increase the total profit. Moreover, the total material cost is slightly reduced while keeping the same relative contributions toward the total cost. Finally, the total operating cost is reduced by not operating the second parallel catalytic cracking unit.
Conclusions
This work is a novel Big-Data application for developing and optimizing an MINLP model for the planning operations of a large petrochemical plant. The abundance of data allowed for the development of input–output models for all of the processing units of the petrochemical plant by postulating theoretical or empirical models and then identifying the optimal parameters of the models to best describe the existing data. All of the developed models are integrated within a large-scale planning model by introducing additional mass balance constraints, stream connections, and blending and pooling unit equations. Discrete variables are introduced because some units can be operated in different modes, and because multiple parallel units can be used to process the same materials to produce the same products. The entire planning model is a highly non-convex nonlinear model and was solved to global optimality using the global optimization software ANTIGONE. In this work we also briefly introduce a user-friendly computational platform which allows for the easy and fast automated updating, solution, and results analysis of this complex model through Excel. Results for three case studies show that through our approach we can obtain a significant increase in profit when compared to an empirical decision-making practice. This is the first complete study which has used a data-driven approach for planning operations, coupling the invaluable interactive communication between experienced plant operators, data, and global optimization.

Acknowledgments
The authors would like to acknowledge financial support from the Chinese Academy of Sciences Visiting Professorship for Senior International Scientists (Grant No. 2010T2G34-2012T1GY11-Continue to March, 2014) with Science Research and Technology Development Program of PetroChina Company Limited (No. 2012D-3202-0313), and the National Science Foundation (CBET-0827907 and CMMI-08856021). Jie Li is further thankful for support from the National Natural Science Foundation of China (21206174).
Notation
CDU = crude distillation unit
LPG = liquefied petroleum gas
MINLP = mixed-integer nonlinear programming
FC1 = first catalytic cracking unit
FC2 = second catalytic cracking unit
HC1 = first hydrocracking unit
HC2 = second hydrocracking unit
HKU = hydrotreating unit
GHU = gasoline hydrotreating unit
GRU = gasoline reforming unit
ARU = aromatics extraction unit
DCU = delayed coking unit
DHU = diesel hydrotreating unit
ET1 = ethylene cracking unit 1
ET2 = ethylene cracking unit 2
HTU = hydrotreating unit
BU1 = butadiene extraction unit 1
BU2 = butadiene extraction unit 2
ER1 = etherification unit 1
ER2 = etherification unit 2
HT2 = hydrotreating unit
ST2 = styrene unit
Sets
U^{CDU} = crude distillation units
U^{PRO} = processing units
U^{BLD} = blending and pooling units
U^{mod} = units with different production modes
U^{SAL} = sale department or market
U^{PAR} = parallel units that can produce the same products
S^{in}_u = inlet streams to a unit u
S^{out}_u = outlet streams from a unit u
S^{FP} = final products
E^{in}(s,u) = properties of inlet stream s to unit u
E^{out}(s,u) = properties of outlet stream s from unit u
M_u = production modes in unit u
UC = set of unit connections (stream s, unit u, stream s', unit u') such that s from u goes to unit u' as s'
Subscripts
u = unit
s = stream
e = property specification
Superscripts
U = upper limit
L = lower limit
in = inlet
out = outlet
UTOC = involved in the calculation of operational cost
Parameters
H = planning horizon
N = total number of days for which industrial data is available
DH = total number of days in the planning horizon
D = a parameter that is used to represent the total number of days in the planning horizon
Price(s) = price ($/ton) of final product s
Cost(s) = cost ($/ton) of raw material s
UTOC(u) = operational cost ($/ton) of unit u
DmdS^{L}(s) = minimum demand requirement for product s
DmdS^{U}(s) = maximum demand requirement for product s
A(n,s) = a parameter in the correlation for yield prediction
E^{L}(e,s,u) = minimum requirement for property e in stream s from unit u
E^{U}(e,s,u) = maximum requirement for property e in stream s from unit u
CAP^{L}(u) = minimum capacity
CAP^{U}(u) = maximum capacity
Inv0(s,u) = initial inventory of stream s to (from) unit u
ro^{L}(s,u) = minimum ratio requirement for stream s to unit u
ro^{U}(s,u) = maximum ratio requirement for stream s to unit u
Binary variables
Y(u) = 1 if unit u is selected
z(m,u,d) = 1 if mode m in unit u is operated in day d
0–1 Continuous variables
x(m,u) = 1 if production mode m in unit u is selected
Continuous variables
F(s,u,s',u') = mass flow rate of stream s from unit u to unit u' as stream s'
xf(s,u,s',u') = volumetric flow rate of stream s into unit u that is used to produce product s'
F^{in}(s,u) = inlet mass flow rate of stream s to unit u
F^{out}(s,u) = outlet mass flow rate of stream s from unit u
Yield(s,u) = yield of outlet stream s from unit u
YieldLoss(u) = loss yield of a unit u
E^{in}(e,s,u) = property e in the inlet stream s to unit u
E^{out}(e,s,u) = property e of outlet stream s from unit u
Tcut(s) = cutting temperature for distillate s
Yield(s,s',u) = yield of distillate s contributed from crude s' in the crude distillation unit u
E(e,s,s',u) = property e of distillate s contributed from crude s' in the crude distillation unit u
dm(m,u) = total number of days for operational mode m in unit u
Profit = total profit ($)
Literature Cited
1. Jia Z, Ierapetritou M. Mixed-integer linear programming model for gasoline blending and distribution scheduling. Ind Eng Chem Res. 2003;42(4):825–835.
2. Li J, Karimi IA. Scheduling gasoline blending operations from recipe determination to shipping using unit slots. Ind Eng Chem Res. 2011;50(15):9156–9174.
3. Li J, Karimi IA, Srinivasan R. Recipe determination and scheduling of gasoline blending operations. AIChE J. 2010;56(2):441–465.
4. Li J, Li W, Karimi IA, Srinivasan R. Improving the robustness and efficiency of crude scheduling algorithms. AIChE J. 2007;53(10):2659–2680.
5. Pinto JM, Joly M, Moro LFL. Planning and scheduling models for refinery operations. Comp Chem Eng. 2000;24(9–10):2259–2276.
6. Shah NK, Li Z, Ierapetritou MG. Petroleum refining operations: key issues, advances, and opportunities. Ind Eng Chem Res. 2011;50(3):1161–1170.
7. Bengtsson J, Nonås S-L. Refinery planning and scheduling: an overview. In: Bjørndal E, Bjørndal M, Pardalos PM, Rönnqvist M, editors. Energy, Natural Resources and Environmental Economics. Berlin: Springer, 2010:115–130.
8. Symonds GH. Linear Programming: The Solution of Refinery Problems, Vol 8. New York: Esso Standard Oil, 1955.
9. Aronofsky JS, Dutton JM, Tayyabkhan MT. Managerial Planning with Linear Programming: In Process Industry Applications. Michigan: Wiley; 1978.
10. Pelham R, Pharris C. Refinery operation and control: a future vision. Hydrocarb Process. 1996;75(7):89–94.
11. Bonner & Moore. RPMS (Refinery and Petrochemical Modeling System): A System Description [Computer Program]. Houston: Bonner & Moore Management Science; 1979.
12. ASPEN Technology Inc. ASPEN P.I.M.S. System Reference (v7.2.) [computer program]. ASPEN Technology Inc.; 2010.
13. Haverly Systems. GRTMPS [computer program]. Haverly Systems; 2015.
14. Moro LFL, Zanin AC, Pinto JM. A planning model for refinery diesel production. Comp Chem Eng. 1998;22:S1039–S1042.
15. Pinto JM, Moro LFL. A planning model for petroleum refineries. Brazil J Chem Eng. 2000;17:575–586.
16. Kelly JD. Formulating production planning models. Chem Eng Prog. 2004;100:43–50.
17. Li W, Hui C-W, Li A. Integrating CDU, FCC and product blending models into refinery planning. Comp Chem Eng. 2005;29(9):2010–2028.
18. Alhajri I, Elkamel A, Albahri T, Douglas PL. A nonlinear programming model for refinery planning and optimisation with rigorous process models and product quality specifications. Int J Oil Gas Coal Technol. 2008;1(3):283–307.
19. Zhang BJ, Hua B. Effective MILP model for oil refinery-wide production planning and better energy utilization. J Clean Prod. 2007;15(5):439–448.
20. Guyonnet P, Grant FH, Bagajewicz MJ. Integrated model for refinery planning, oil procuring, and product distribution. Ind Eng Chem Res. 2009;48(1):463–482.
21. Alattas AM, Grossmann IE, Palou-Rivera I. Integration of nonlinear crude distillation unit models in refinery planning optimization. Ind Eng Chem Res. 2011;50(11):6860–6870.
22. Mahalec V, Sanchez Y. Inferential monitoring and optimization of crude separation units via hybrid models. Comp Chem Eng. 2012;45:15–26.
23. Menezes BC, Moro LFL, Lin WO, Medronho RA, Pessoa FLP. Nonlinear production planning of oil-refinery units for the future fuel market in Brazil: process design scenario-based model. Ind Eng Chem Res. 2014;53(11):4352–4365.
24. Zhang BJ, Liu K, Luo XL, Chen QL, Li WK. A multi-period mathematical model for simultaneous optimization of materials and energy on the refining site scale. Appl Energ. 2015;143:238–250.
25. Mouret S, Grossmann IE, Pestiaux P. A new Lagrangian decomposition approach applied to the integration of refinery planning and crude-oil scheduling. Comp Chem Eng. 2011;35(12):2750–2766.
26. Menezes BC, Kelly JD, Grossmann IE. Improved swing-cut modeling for planning and scheduling of oil-refinery distillation units. Ind Eng Chem Res. 2013;52(51):18324–18333.
27. Al-Qahtani K, Elkamel A. Multisite facility network integration design and coordination: an application to the refining industry. Comp Chem Eng. 2008;32(10):2189–2202.
28. Al-Qahtani K, Elkamel A. Multisite refinery and petrochemical network design: optimal integration and coordination. Ind Eng Chem Res. 2009;48(2):814–826.
29. Baulin ES, Boronin AB, Khokhlov AS. Rolling detailed short-term planning of oil refineries and petrochemical complexes and optimization model updating. Autom Remote Control. 2015;76(1):139–148.
30. Gonzalo MF, Balseyro IG, Bonnardot J, Morel F, Sarrazin P. Consider integrating refining and petrochemical operations. Hydrocarb Process. 2004;83:61–65.
31. Sadhukhan J, Zhang N, Zhu XX. Analytical optimisation of industrial systems and applications to refineries, petrochemicals. Chem Eng Sci. 2004;59(20):4169–4192.
32. Swaty TE. Consider over-the-fence product stream swapping to raise profitability. Hydrocarb Process. 2002;81:37–42.
33. Qin SJ. Process data analytics in the era of big data. AIChE J. 2014;60(9):3092–3100.
34. Jaeckle CM, Macgregor JF. Product design through multivariate statistical analysis of process data. AIChE J. 1998;44(5):1105–1118.
35. Kourti T, Lee J, Macgregor JF. Experiences with industrial applications of projection methods for multivariate statistical process control. Comp Chem Eng. 1996;20(Suppl. 1):S745–S750.
36. Kourti T, MacGregor JF. Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemomet Intell Lab Syst. 1995;28(1):3–21.
37. Zhang Y, Li S. Modeling and monitoring of nonlinear multi-mode processes. Control Eng Pract. 2014;22:194–204.
38. Misener R, Floudas C. GloMIQO: global mixed-integer quadratic optimizer. J Glob Optim. 2013;57(1):3–50.
39. Misener R, Floudas C. ANTIGONE: algorithms for coNTinuous/integer global optimization of nonlinear equations. J Glob Optim. 2014;59(2–3):503–526.
40. Misener R, Floudas C. A framework for globally optimizing mixed-integer signomial programs. J Optim Theory Appl. 2014;161(3):905–932.
41. Misener R, Smadbeck JB, Floudas CA. Dynamically generated cutting planes for mixed-integer quadratically constrained quadratic programs and their incorporation into GloMIQO 2. Optim Method Software. 2014;30(1):215–249.
42. Packie J. Distillation equipment in the oil refining industry. AIChE Trans. 1941;37(51):51–78.
43. Watkins RN. Petroleum Refinery Distillation. Houston, TX: Gulf Publishing Co., 1979.
44. Brooks RW, Van-Walsem FD, Drury J. Choosing cutpoints to optimize product yields: refining developments: special report. Hydrocarb Process. 1999;78(11):53–60.
45. Guerra OJ, Le Roux GAC. Improvements in petroleum refinery planning: 1. Formulation of process models. Ind Eng Chem Res. 2011;50(23):13403–13418.
46. Guerra OJ, Le Roux GAC. Improvements in petroleum refinery planning: 2. Case studies. Ind Eng Chem Res. 2011;50(23):13419–13426.
47. Zhang J, Zhu XX, Towler GP. A simultaneous optimization strategy for overall integration in refinery planning. Ind Eng Chem Res. 2001;40(12):2640–2653.
48. Geddes RL. A general index of fractional distillation power for hydrocarbon mixtures. AIChE J. 1958;4(4):389–392.
49. Alattas AM, Grossmann IE, Palou-Rivera I. Refinery production planning: multiperiod MINLP with nonlinear CDU model. Ind Eng Chem Res. 2012;51(39):12852–12861.
Manuscript received Dec. 20, 2015, and revision received Feb. 9, 2016.

September 2016 Vol. 62, No. 9

Published on behalf of the AIChE
