
Case Study

More Data, More Problems?

Advice on ensuring data quality and integrity in the age of big data

Topic: Data quality
Author: Gary S. Netherton
Email: gary.netherton@live.com

“In God we trust; all others bring data,” W. Edwards Deming once said. And do we ever bring data:

It’s been estimated that we create about 2.5 quintillion bytes of data per day, VcloudNews reported a few years back. That’s the equivalent of 10 million Blu-ray discs that, if piled high, would almost reach the height of four Eiffel Towers stacked one on top of another.¹ At the same time, it was estimated that about 90% of the world’s data have been created in the last two years, according to multiple sources, including IBM.²

How many rows did you say Minitab could handle?

Recently, Six Sigma Forum Magazine editor Jim Bossert wrote that one of the many challenges the Six Sigma community faces is handling big data.³ As a Six Sigma Black Belt who has pursued semi-formal (Udemy.com and Johns Hopkins University) and personal training in data science (DataCamp.com as well as many book purchases from Amazon), I have attempted to use these data science techniques in my profession as a quality manager and senior process engineer. What I’ve learned is that the basics of Six Sigma’s define, measure, analyze, improve and control (DMAIC) measure phase are applicable, regardless of data size.

Are the data good?


The first question I’ve learned to ask is, “Do I trust the data?” The following is an embarrassing example that explains why.

Earlier in my career, my employer was involved in a dispute with a component supplier regarding its parts that we used to manufacture our finished product. Our service history showed an abnormal frequency of failure. This weighed heavily on our service department, as well as on our warranty costs.

I had been provided a database full of service information that was coded for failures related to this component. During a negotiation with the vendor, the vendor asked about a field in our database that was a code providing summary information about the finished product, including information regarding whether it used that vendor’s component. After the vendor asked a couple of questions, we quickly realized that our data were not completely accurate. As a matter of fact, 5% of the product we were accusing the vendor of causing to fail not only didn’t contain its component, but it also couldn’t

10 |
November 2018   LEAN & SIX SIGMA REVIEW
contain its component because it did not use that kind of device. The situation was analogous to saying that an ink toner cartridge was causing a car to stall.

This led me to begin questioning any data that I analyzed. Later, I found several service jobs that reported replacing components on products that didn’t use those components, just like the earlier example. What started as confidence in the data was quickly proven to be 95% accuracy. Over subsequent months, 95% became 90%, and eventually the team crossed its fingers and prayed the data were even 80% to 85% accurate.

With the significant increase in data and its easy availability, it becomes more critical that quality and process improvement professionals understand the quality of the data. I am a firm believer that having no data is better than using bad data. At least if you have no data, you can begin the search for data, ensuring its integrity.

Six Sigma instruction teaches several data review methods such as measurement system analysis, control charting and gauge calibration. These methods ensure that gauges are useful, the measurement system works and users understand what their data are doing while attempting to explain natural variation versus assignable cause.

When businesses create huge databases (for example, Hadoop or MongoDB) that serve as data stores for organization information, there is often a curtain separating the analyst and the data source. Users may access a data warehouse through a front end, such as IBM Cognos or another customer or user interface, but have no background on exactly where the data came from, which is important to know.

Where did the data come from?

Thanks to the internet making the process easier, quality and process improvement professionals are often tasked with working on data from a multitude of sources that may or may not be in their building, their city or even their country. A shortcut on a computer starts an interface that the professional uses to access the data warehouse or database. It is possible that the data may be identified in a manner to tell the end user where it came from geographically or what facility (for multifacility organizations) provided the data, but there are no guarantees.

Sometimes, the facility providing the data on the product or component is not the same facility that manufactured it. Consider field service data, for instance. According to the Certified Reliability Engineer Handbook, “This information is often impacted by differences in installation, environment, operator procedures and similar factors that make analysis difficult.”⁴

In instances where data are not field service data, it may be real-time data from equipment running in a facility and stored continuously in a database. In these instances, it is nevertheless important that the gauges or devices used are accurate, something that’s accomplished through regular calibration.

Another embarrassing episode in my career involved temperature measurements needed for the appropriate application of a secondary sealant. The sealant was applied in conjunction with a primary sealant that required a minimum temperature to ensure a hermetic seal. The vendor of both sealants had specific guidelines for the application of both to create a successful airtight unit.

During a visit to one of my facilities, I noted an operator applying sealant at a temperature (per the digital readout) 20 degrees Fahrenheit below the desired minimum. When maintenance was called to evaluate the situation, it discovered that the true temperature was 40 degrees Fahrenheit lower than what was being displayed, meaning that the actual temperature was 60 degrees Fahrenheit lower than the desired minimum.

When I confronted the plant leadership about the need to regularly calibrate the equipment, the response was, “We didn’t have time.” While this is often an excuse given in today’s factory settings, it is also a reality in today’s world of manufacturing. Management often wants more from less.

Nevertheless, such situations remind us to question the collected data used for quality or process improvements. Using bad data to make changes can lead to wrong decisions that can cause more problems than if nothing had been done.

How are the data arranged?

Data are often presented in a dirty (that is, not clearly formatted) and inconsistent manner. I once used a data warehouse in which the manufacture date was formatted as 200905 (May 2009), but the service requested date was formatted 20130412 (April 12, 2013). Midway through my tenure at this firm, the latter was changed to 2013-04-12. For the data-wrangling reader, all data were formatted as text and not as a date, making date-related calculations impossible without data-type conversions.

Matters are complicated further when data are presented in a multipage format in a text file. Every page has multiple lines of header information that may or may not be consistent in its layout, and with the meat of the data in the bottom two-thirds of
each page. Compiling this data into a format that would be accepted by Minitab or even Microsoft Excel is not intuitive, and typically is not something many users are used to doing on a regular basis.

While there are several tools available that make cleaning this data less of a burden than even a couple of years ago, there is no easy way to simply import the data into Minitab or other Six Sigma software, at least for the average user. With formal training or self-instruction, there are several powerful tools available from Microsoft that enable data wrangling in an effective manner. I only bring up Microsoft because the Office Suite is typically what an employee is provided by his or her organization. Some organizations provide Minitab, Python or R. In most cases, however, there are several obstacles to getting these nonstandard software packages approved by corporate IT for installation.

Understanding how to handle the data is no less important than the other concerns mentioned so far. Proper data arrangement helps users and analysts go the extra mile to understand the data records and fields within each. Anand Tamboli, the director of emerging technologies at Knewron Technologies, an electronics manufacturer in India, reiterates this when addressing the importance of data mining to Six Sigma:

“Since quality of results is as good as the quality and treatment of data, it is highly recommended to follow the data mining approach religiously while working on the measure and analyze phases.

“While Six Sigma itself contains some of the data mining steps, it does not provide detailed know-how of these steps.”⁵

Tamboli continues to address specific technical steps for data mining, but not before first stating that preprocessing (cleaning) the data is a critical first step.

“Cleaning can remove data with noise and missing information points. It is also necessary to validate integrity of data points in a set. These are essential steps to obtain sanity in the results.”⁶

If only I’d read this earlier in my career, I could have helped save my employer some embarrassment. Expending the extra energy to ensure data quality and integrity is now a given in my work.

What can you do?

REVIEW EXPECTATIONS. What are you trying to collect data for? Is it a critical-to-quality characteristic of the process or a dimension of a component? Why do you want that data? Implementing a measurement system typically requires some capital or, in some cases, some expense funding. Given the needs of the organization, it is important to ensure that the data collected are reliably stored in an accessible manner so that quality and processes can be monitored and analyzed.

After you know why you want data, it is important to understand whether you are collecting the right data. Don’t assume the folks running a process were the ones who established the process. Perhaps they inherited a machine or a collection of equipment with preestablished data collection. That data may or may not be worthwhile based on the needs of the organization or department. If you’re running a process in which the pressure and temperature are important, for example, more than likely it won’t make any sense to measure voltage.

What are the units involved? Degrees Fahrenheit or Celsius? Meters per minute or feet per minute? This may seem like a simple enough issue to address. Identify the units needed and ensure that those are the units used. Purchase a new piece of equipment that goes in your Richmond, VA, plant from a German organization, and all the unit displays on the machine will be in imperial units, right?

This scenario is not only common, but it also often causes headaches because a thorough check is not completed before commissioning the equipment. And it is not unheard of to have equipment that has metric and imperial units, depending on the measure displayed.

Confirming all readings and metrics are correct and in the correct units is vital to successfully running a machine, completing an airplane flight⁷ or landing on Mars.⁸

GO TO THE GEMBA, AND REVIEW CALIBRATION AND PREVENTIVE MAINTENANCE SCHEDULES. Only by properly calibrating measuring devices can you be confident in the correctness of measurements taken by trained personnel using those devices. If a machine setup requires certain speeds and pressures to properly form a part, it behooves the equipment owner to confirm that readouts and displays on the equipment are accurate. Calibration is the only way to truly confirm this. Calibration isn’t just for hand tools such as calipers, micrometers or measuring tapes. Thermocouples and pressure sensors on automated equipment require the same amount of checking to always ensure they display as accurate a value as possible. A regularly scheduled calibration program helps ensure that.

REVIEW OPERATOR TRAINING AND UNDERSTANDING. For labs, “ISO/IEC 17025, subclause 4.1.5 d, requires that your management system ‘have policies and procedures to avoid involvement in any activities that would diminish confidence in its
competence, impartiality, judgment, or operational integrity.’”⁹ This should apply to all operations, as well as certified labs.

A colleague recently pointed out that he had caught an operator on the floor falsifying data. Certain checks were to be completed hourly and the time of sample collection written on the part. The operator had not collected samples regularly. When he finally collected the samples, he did so at one time and backdated some samples, giving the impression that he’d been doing his job. This deception was recognized when my colleague’s quality department found nonconformances in the hourly samples, but none in the product that had allegedly been made when the samples would have been cut. Putting two and two together, the team determined that the sample submission was fraudulent.

The importance of providing operators with required training and needed resources cannot be overstated. Additionally, a third leg to that stool would be holding personnel accountable for compliance with company policies regarding data integrity. While my colleague did not work in a regulated industry, his scenario could have cost his organization significantly with a quality spill and the associated costs (financial and reputational).

REVIEW TOOLS’ ADEQUACY FOR THE JOB. Data collection devices may need to be updated to handle the measurements involved or even for the ability to communicate with a cloud versus a local server. In one instance, a company needed to upgrade a device because it could not generate sufficient heat to melt beads in preparation for measurement, as well as being unable to measure those higher temperatures. This testing and measurement was critical to quality, but the existing equipment could not handle the requirements, a fact that one of the engineering directors discovered when digging into the details of testing a new product.

Never hesitate to question

Above and beyond these recommendations, don’t be afraid to question the data. As an analyst, a Six Sigma belt, or even simply a quality or process improvement professional, do not shy away from becoming intimately familiar with the data presented. Only by having more than a passing familiarity with the data can you increase confidence that following a data analysis process will yield accurate results.

There can be no doubt that developing this increased familiarity is time consuming. Asking questions, doing research and spending more time on the floor with operations can be frustrating, exhausting and certainly appear to be a real drag on resources. Nevertheless, trying to solve a problem with bad data is a recipe for disaster. It can cause problems that, in many unfortunate cases, will not be found before the customer finds them or, in a worst-case scenario, a catastrophe occurs.

With all due respect to Deming, I would modify his quote to the following: “In God we trust, all others bring good data.”

REFERENCES

1. Ben Walker, “Every Day Big Statistics—2.5 Quintillion Bytes of Data Created Daily,” VCloud News, April 5, 2015, https://tinyurl.com/vcloud-big-stats.
2. Ralph Jacobson, “2.5 Quintillion Bytes of Data Created Every Day. How Does CPG and Retail Manage It?” IBM, April 24, 2013, https://tinyurl.com/ibm-jacobson-data.
3. James L. Bossert, “Is Quality 4.0 the End of Six Sigma?” Six Sigma Forum Magazine, Vol. 27, No. 3, May 2016, p. 4.
4. Donald W. Benbow and Hugh W. Broome, The Certified Reliability Engineer Handbook, second edition, ASQ Quality Press, 2013.
5. Anand Tamboli, “An Introduction to Data Mining,” Process Excellence Network, July 19, 2010, https://tinyurl.com/intro-data-mining.
6. Ibid.
7. Richard Witkin, “Jet’s Fuel Ran Out after Metric Conversion Errors,” New York Times, July 30, 1983, https://tinyurl.com/nyt-jet-fuel.
8. Robert L. Hotz, “Mars Probe Lost Due to Simple Math Error,” Los Angeles Times, Oct. 1, 1999, https://tinyurl.com/la-times-mars-errors.
9. Jason Stine, “ISO/IEC 17025: Data Integrity Begins With Employee Integrity,” Quality Digest, Aug. 30, 2010, https://tinyurl.com/qd-data-integrity.

GARY S. NETHERTON is a quality manager at Northwest Door in Puyallup, WA. He holds a bachelor’s degree in electrical engineering from Kettering University (formerly named the GMI Engineering and Management Institute) in Flint, MI. Netherton is an ASQ senior member and an ASQ-certified quality engineer, manager and Black Belt.
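
For the data-wrangling reader, here is a minimal sketch of the data-type conversion mentioned under “How are the data arranged?” It is written in Python purely as an illustration and is not the tooling used at the firm described. The function names `parse_text_date` and `days_in_service` are hypothetical, and the sketch assumes every value arrives as text in one of the three layouts quoted in the article (200905, 20130412 or 2013-04-12). Month-only values are assumed to mean the first day of the month.

```python
from datetime import date, datetime

# The three text layouts named in the article, keyed by string length.
# Treating length as the discriminator is an assumption of this sketch.
_FORMATS = {
    6: "%Y%m",       # "200905"     manufacture date, month only
    8: "%Y%m%d",     # "20130412"   service requested date (older layout)
    10: "%Y-%m-%d",  # "2013-04-12" service requested date (later layout)
}


def parse_text_date(raw: str) -> date:
    """Convert a text value in one of the known layouts to a date.

    Month-only values map to the first day of that month (assumption).
    Anything else raises ValueError instead of being guessed at.
    """
    text = raw.strip()
    fmt = _FORMATS.get(len(text))
    if fmt is None:
        raise ValueError(f"unrecognized date text: {raw!r}")
    return datetime.strptime(text, fmt).date()


def days_in_service(manufactured: str, service_requested: str) -> int:
    """Days between manufacture and the service request."""
    delta = parse_text_date(service_requested) - parse_text_date(manufactured)
    return delta.days
```

With the dates converted, a calculation such as `days_in_service("200905", "2013-04-12")` becomes simple subtraction. Any value that fits none of the three layouts raises an error rather than being guessed at, which keeps a bad record from slipping quietly into the analysis.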
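
The supplier dispute described under “Are the data good?” suggests a simple check to run before trusting failure-coded service data: confirm that each serviced product could contain the component in the first place. The Python sketch below assumes a hypothetical layout in which each service record is a dictionary with a `model` field and the set of models that use the component is known. The names `split_claims`, `job` and `model`, and the sample values, are invented for illustration and do not come from the database in the story.

```python
def split_claims(service_records, models_with_component):
    """Separate plausible component-failure claims from impossible ones.

    A claim is impossible when the serviced model does not use the
    component at all (hypothetical record layout: dicts with a "model" key).
    """
    plausible, impossible = [], []
    for record in service_records:
        if record["model"] in models_with_component:
            plausible.append(record)
        else:
            impossible.append(record)
    return plausible, impossible


# Invented sample data: only models "A100" and "A200" use the component.
records = [
    {"job": 1, "model": "A100"},
    {"job": 2, "model": "B300"},
    {"job": 3, "model": "A200"},
    {"job": 4, "model": "A100"},
]
plausible, impossible = split_claims(records, {"A100", "A200"})
share_impossible = len(impossible) / len(records)
print(f"{len(impossible)} of {len(records)} claims are impossible "
      f"({share_impossible:.0%})")
```

Running a check like this before a negotiation, rather than during one, gives a first estimate of how far the data can be trusted.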
