Database Archiving: How to Keep Lots of Data for a Very Long Time

Ebook626 pages4 hours

Database Archiving: How to Keep Lots of Data for a Very Long Time

Name: Database Archiving: How to Keep Lots of Data for a Very Long Time
Brand: Elsevier Science
Rating: 3.0 (3 reviews)

By Jack E. Olson

Rating: 3 out of 5 stars

3/5

()

Read preview

About this ebook

With the amount of data a business accumulates now doubling every 12 to 18 months, IT professionals need to know how to develop a system for archiving important database data, in a way that both satisfies regulatory requirements and is durable and secure. This important and timely new book explains how to solve these challenges without compromising the operation of current systems. It shows how to do all this as part of a standardized archival process that requires modest contributions from team members throughout an organization, rather than the superhuman effort of a dedicated team.

Exhaustively considers the diverse set of issues—legal, technological, and financial—affecting organizations faced with major database archiving requirements
Shows how to design and implement a database archival process that is integral to existing procedures and systems
Explores the role of players at every level of the organization—in terms of the skills they need and the contributions they can make.
Presents its ideas from a vendor-neutral perspective that can benefit any organization, regardless of its current technological investments
Provides detailed information on building the business case for all types of archiving projects

Skip carousel

LanguageEnglish

PublisherElsevier Science

Release dateJul 28, 2010

ISBN9780080884424

Author

Jack E. Olson

Jack E. Olson is a widely recognized database technology expert. His career includes significant contributions at IBM, BMC, Evoke, and now NEON Enterprise Software, where he serves as Chief Technology Office. Olson is author of Data Quality: The Accuracy Dimension, also published by Morgan Kaufmann. The inventor of record on several patents, he holds a BS from the Illinois Institute of Technology and an MBA from Northwestern University.

Related authors

Skip carousel

Related to Database Archiving

Related ebooks

Skip carousel

Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial
Ebook
Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial
byDavid Hecksel
Rating: 5 out of 5 stars
5/5
Hadoop in Practice
Ebook
Hadoop in Practice
byAlex Holmes
Rating: 0 out of 5 stars
0 ratings
Data Deduplication Approaches: Concepts, Strategies, and Challenges
Ebook
Data Deduplication Approaches: Concepts, Strategies, and Challenges
byTin Thein Thwel
Rating: 0 out of 5 stars
0 ratings
SQL Server 2008 Administration in Action
Ebook
SQL Server 2008 Administration in Action
byRodney C. Colledge
Rating: 0 out of 5 stars
0 ratings
Oracle Database 11g - Underground Advice for Database Administrators: Beyond the basics
Ebook
Oracle Database 11g - Underground Advice for Database Administrators: Beyond the basics
byApril C. Sims
Rating: 0 out of 5 stars
0 ratings
Introduction to Oracle Database Administration
Ebook
Introduction to Oracle Database Administration
byYing Wang
Rating: 5 out of 5 stars
5/5
ASP.NET 4.0 in Practice
Ebook
ASP.NET 4.0 in Practice
byStefano Mostarda
Rating: 0 out of 5 stars
0 ratings
Data Mining: Know It All
Ebook
Data Mining: Know It All
bySoumen Chakrabarti
Rating: 0 out of 5 stars
0 ratings
Microsoft Certified Database Administrator A Complete Guide - 2020 Edition
Ebook
Microsoft Certified Database Administrator A Complete Guide - 2020 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Implementing Database Security and Auditing
Ebook
Implementing Database Security and Auditing
byRon Ben Natan
Rating: 4 out of 5 stars
4/5
Database Design and Relational Theory: Normal Forms and All That Jazz
Ebook
Database Design and Relational Theory: Normal Forms and All That Jazz
byC.J. Date
Rating: 4 out of 5 stars
4/5
Beginning Oracle Database 12c Administration: From Novice to Professional
Ebook
Beginning Oracle Database 12c Administration: From Novice to Professional
byIgnatius Fernandez
Rating: 0 out of 5 stars
0 ratings
Organizing Information: Principles of Data Base and Retrieval Systems
Ebook
Organizing Information: Principles of Data Base and Retrieval Systems
byDagobert Soergel
Rating: 4 out of 5 stars
4/5
PostgreSQL 9 High Availability Cookbook
Ebook
PostgreSQL 9 High Availability Cookbook
byShaun M. Thomas
Rating: 5 out of 5 stars
5/5
Database Security A Complete Guide - 2021 Edition
Ebook
Database Security A Complete Guide - 2021 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Oracle Database Security Interview Questions, Answers, and Explanations: Oracle Database Security Certification Review
Ebook
Oracle Database Security Interview Questions, Answers, and Explanations: Oracle Database Security Certification Review
byEquity Press
Rating: 0 out of 5 stars
0 ratings
An Introduction to Data Base Design
Ebook
An Introduction to Data Base Design
byBetty Joan Salzberg
Rating: 0 out of 5 stars
0 ratings
Oracle: Protect Your Data
Ebook
Oracle: Protect Your Data
byFloribert TCHOKO
Rating: 0 out of 5 stars
0 ratings
user stories A Complete Guide - 2019 Edition
Ebook
user stories A Complete Guide - 2019 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Network Designs A Complete Guide - 2019 Edition
Ebook
Network Designs A Complete Guide - 2019 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Oracle Database Programming using Java and Web Services
Ebook
Oracle Database Programming using Java and Web Services
byKuassi Mensah
Rating: 0 out of 5 stars
0 ratings
Demystifying the Azure Well-Architected Framework: Guiding Principles and Design Best Practices for Azure Workloads
Ebook
Demystifying the Azure Well-Architected Framework: Guiding Principles and Design Best Practices for Azure Workloads
byShijimol Ambi Karthikeyan
Rating: 0 out of 5 stars
0 ratings
WS-BPEL 2.0 Beginner's Guide
Ebook
WS-BPEL 2.0 Beginner's Guide
byMatjaz B. Juric
Rating: 0 out of 5 stars
0 ratings
Logical Data Warehouse A Complete Guide - 2019 Edition
Ebook
Logical Data Warehouse A Complete Guide - 2019 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Pro SQL Server 2019 Administration: A Guide for the Modern DBA
Ebook
Pro SQL Server 2019 Administration: A Guide for the Modern DBA
byPeter A. Carter
Rating: 0 out of 5 stars
0 ratings
A Discipline of Software Engineering
Ebook
A Discipline of Software Engineering
byB. Walraet
Rating: 0 out of 5 stars
0 ratings
Mastering MariaDB
Ebook
Mastering MariaDB
byRazzoli Federico
Rating: 0 out of 5 stars
0 ratings
Microsoft SharePoint 2013 Disaster Recovery Guide
Ebook
Microsoft SharePoint 2013 Disaster Recovery Guide
byPeter Ward
Rating: 0 out of 5 stars
0 ratings
MySQL Cluster 7.5 inside and out
Ebook
MySQL Cluster 7.5 inside and out
byMikael Ronström
Rating: 0 out of 5 stars
0 ratings
Database design A Complete Guide - 2019 Edition
Ebook
Database design A Complete Guide - 2019 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings

Databases For You

Skip carousel

SQL Programming & Database Management For Absolute Beginners SQL Server, Structured Query Language Fundamentals: "Learn - By Doing" Approach And Master SQL
Ebook
SQL Programming & Database Management For Absolute Beginners SQL Server, Structured Query Language Fundamentals: "Learn - By Doing" Approach And Master SQL
byWilliam Sullivan
Rating: 5 out of 5 stars
5/5
MATLAB Machine Learning Recipes: A Problem-Solution Approach
Ebook
MATLAB Machine Learning Recipes: A Problem-Solution Approach
byMichael Paluszek
Rating: 0 out of 5 stars
0 ratings
Learn SQL in 24 Hours
Ebook
Learn SQL in 24 Hours
byAlex Nordeen
Rating: 5 out of 5 stars
5/5
Excel 2021
Ebook
Excel 2021
byJIAYI SIMONDS
Rating: 4 out of 5 stars
4/5
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Ebook
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
byWalter Shields
Rating: 4 out of 5 stars
4/5
Grokking Algorithms: An illustrated guide for programmers and other curious people
Ebook
Grokking Algorithms: An illustrated guide for programmers and other curious people
byAditya Bhargava
Rating: 4 out of 5 stars
4/5
SQL Clearly Explained
Ebook
SQL Clearly Explained
byJan L. Harrington
Rating: 5 out of 5 stars
5/5
Business Intelligence Strategy and Big Data Analytics: A General Management Perspective
Ebook
Business Intelligence Strategy and Big Data Analytics: A General Management Perspective
bySteve Williams
Rating: 5 out of 5 stars
5/5
CompTIA DataSys+ Study Guide: Exam DS0-001
Ebook
CompTIA DataSys+ Study Guide: Exam DS0-001
byMike Chapple
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis
Ebook
Practical Data Analysis
byHector Cuesta
Rating: 4 out of 5 stars
4/5
Data Science Using Python and R
Ebook
Data Science Using Python and R
byChantal D. Larose
Rating: 0 out of 5 stars
0 ratings
Data Stewardship: An Actionable Guide to Effective Data Management and Data Governance
Ebook
Data Stewardship: An Actionable Guide to Effective Data Management and Data Governance
byDavid Plotkin
Rating: 4 out of 5 stars
4/5
Learn SQL Server Administration in a Month of Lunches
Ebook
Learn SQL Server Administration in a Month of Lunches
byDon Jones
Rating: 3 out of 5 stars
3/5
The Visual Imperative: Creating a Visual Culture of Data Discovery
Ebook
The Visual Imperative: Creating a Visual Culture of Data Discovery
byLindy Ryan
Rating: 4 out of 5 stars
4/5
Python Projects for Everyone
Ebook
Python Projects for Everyone
byMohamad Charara
Rating: 0 out of 5 stars
0 ratings
JAVA for Beginner's Crash Course: Java for Beginners Guide to Program Java, jQuery, & Java Programming
Ebook
JAVA for Beginner's Crash Course: Java for Beginners Guide to Program Java, jQuery, & Java Programming
byQuick Start Guides
Rating: 4 out of 5 stars
4/5
Access 2019 For Dummies
Ebook
Access 2019 For Dummies
byLaurie A. Ulrich
Rating: 0 out of 5 stars
0 ratings
Serverless Architectures on AWS, Second Edition
Ebook
Serverless Architectures on AWS, Second Edition
byPeter Sbarski
Rating: 5 out of 5 stars
5/5
SQL Server: Tips and Tricks - 2
Ebook
SQL Server: Tips and Tricks - 2
byPriyanka Agarwal
Rating: 4 out of 5 stars
4/5
Relational Database Design and Implementation
Ebook
Relational Database Design and Implementation
byJan L. Harrington
Rating: 5 out of 5 stars
5/5
Go in Action
Ebook
Go in Action
byErik St. Martin
Rating: 5 out of 5 stars
5/5
Data Governance: How to Design, Deploy and Sustain an Effective Data Governance Program
Ebook
Data Governance: How to Design, Deploy and Sustain an Effective Data Governance Program
byJohn Ladley
Rating: 4 out of 5 stars
4/5
Codeless Data Structures and Algorithms: Learn DSA Without Writing a Single Line of Code
Ebook
Codeless Data Structures and Algorithms: Learn DSA Without Writing a Single Line of Code
byArmstrong Subero
Rating: 0 out of 5 stars
0 ratings
Learning ArcGIS Geodatabases
Ebook
Learning ArcGIS Geodatabases
byHussein Nasser
Rating: 5 out of 5 stars
5/5
Data Science Strategy For Dummies
Ebook
Data Science Strategy For Dummies
byUlrika Jägare
Rating: 0 out of 5 stars
0 ratings
COBOL Basic Training Using VSAM, IMS and DB2
Ebook
COBOL Basic Training Using VSAM, IMS and DB2
byRobert Wingate
Rating: 5 out of 5 stars
5/5
LINUX: Beginner's Crash Course. Your Step-By-Step Guide To Learning The Linux Operating System And Command Line Easy & Fast!
Ebook
LINUX: Beginner's Crash Course. Your Step-By-Step Guide To Learning The Linux Operating System And Command Line Easy & Fast!
byJeremy Li
Rating: 3 out of 5 stars
3/5
Beginning Microsoft Power BI: A Practical Guide to Self-Service Data Analytics
Ebook
Beginning Microsoft Power BI: A Practical Guide to Self-Service Data Analytics
byDan Clark
Rating: 0 out of 5 stars
0 ratings
100+ SQL Queries T-SQL for Microsoft SQL Server
Ebook
100+ SQL Queries T-SQL for Microsoft SQL Server
byIFS Harrison
Rating: 4 out of 5 stars
4/5
Developing High Quality Data Models
Ebook
Developing High Quality Data Models
byMatthew West
Rating: 0 out of 5 stars
0 ratings

Related podcast episodes

Skip carousel

Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
Podcast episode
Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
byInvest Like the Best with Patrick O'Shaughnessy
0 ratings
0% found this document useful
Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42: A Whirlwind Tour Of The PostgreSQL Database (Interview)
Podcast episode
Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42: A Whirlwind Tour Of The PostgreSQL Database (Interview)
byData Engineering Podcast
100%
100% found this document useful
Building Real Time Applications On Streaming Data With Eventador - Episode 129: An interview with Eventador CEO Kenny Gorman about the challenges of building a managed service for streaming data to simplify building real time applications
Podcast episode
Building Real Time Applications On Streaming Data With Eventador - Episode 129: An interview with Eventador CEO Kenny Gorman about the challenges of building a managed service for streaming data to simplify building real time applications
byData Engineering Podcast
0 ratings
0% found this document useful
Level Up Your Data Platform With Active Metadata: A conversation with Atlan co-founder Prukalpa Sankar about the idea of active metadata and how it can reduce the toil involved in managing a data platform
Podcast episode
Level Up Your Data Platform With Active Metadata: A conversation with Atlan co-founder Prukalpa Sankar about the idea of active metadata and how it can reduce the toil involved in managing a data platform
byData Engineering Podcast
0 ratings
0% found this document useful
Building A Cost Effective Data Catalog With Tree Schema - Episode 158: An interview about the Tree Schema data catalog platform and using it to quickly get visibility into your data assets.
Podcast episode
Building A Cost Effective Data Catalog With Tree Schema - Episode 158: An interview about the Tree Schema data catalog platform and using it to quickly get visibility into your data assets.
byData Engineering Podcast
0 ratings
0% found this document useful
Streaming Data Pipelines Made SQL With Decodable: An interview with Eric Sammer about the difficulty of working with streaming engines at a low level of abstraction and how he and his team at Decodable are working to make development of streaming data pipelines as straightforward as writing SQL
Podcast episode
Streaming Data Pipelines Made SQL With Decodable: An interview with Eric Sammer about the difficulty of working with streaming engines at a low level of abstraction and how he and his team at Decodable are working to make development of streaming data pipelines as straightforward as writing SQL
byData Engineering Podcast
0 ratings
0% found this document useful
Database Monitoring & Observability
Podcast episode
Database Monitoring & Observability
byThe Cloudcast
0 ratings
0% found this document useful
426: Will Yarborough - Photo Data Storage Made Simple
Podcast episode
426: Will Yarborough - Photo Data Storage Made Simple
byThe Beginner Photography Podcast
0 ratings
0% found this document useful
Serverless Data APIs
Podcast episode
Serverless Data APIs
byThe Cloudcast
0 ratings
0% found this document useful
State of Containers in the Public Cloud
Podcast episode
State of Containers in the Public Cloud
byThe Cloudcast
0 ratings
0% found this document useful
Data Residency: A Technical Perspective with Skyflow’s Manish Ahluwalia: In this episode, Manish Ahluwalia, the field CTO of Skyflow, discusses the technical aspects of data residency and the usage of a data privacy vault. He explains the concept of data residency and data localization. He noted that with the increasing a...
Podcast episode
Data Residency: A Technical Perspective with Skyflow’s Manish Ahluwalia: In this episode, Manish Ahluwalia, the field CTO of Skyflow, discusses the technical aspects of data residency and the usage of a data privacy vault. He explains the concept of data residency and data localization. He noted that with the increasing a...
byPartially Redacted: Data Privacy, Security & Compliance
0 ratings
0% found this document useful
The Cloudcast #223 - Surprising Facts and Monitoring Docker: Aaron talks with Ilan Rabinovitch (<a href="http://twitter.com/irabinovitch">@irabinovitch</a>, Technical Community & Evangelism @Datadog) at DockerCon EU about his impressions from the Day 1 General Session, 8 Surprising Facts About Real Rocker Adopti...
Podcast episode
The Cloudcast #223 - Surprising Facts and Monitoring Docker: Aaron talks with Ilan Rabinovitch (<a href="http://twitter.com/irabinovitch">@irabinovitch</a>, Technical Community & Evangelism @Datadog) at DockerCon EU about his impressions from the Day 1 General Session, 8 Surprising Facts About Real Rocker Adopti...
byThe Cloudcast
0 ratings
0% found this document useful
Episode 15: Nagios was the Original Call of Duty: Let’s chat about the Cloud and everything in between. The people in this world are pretty comfortable with not running physical servers on their own, but trusting someone else to run them. Yet, people suffer from the psychological barrier of thinking they
Podcast episode
Episode 15: Nagios was the Original Call of Duty: Let’s chat about the Cloud and everything in between. The people in this world are pretty comfortable with not running physical servers on their own, but trusting someone else to run them. Yet, people suffer from the psychological barrier of thinking they
byScreaming in the Cloud
0 ratings
0% found this document useful
S3E24 Cloud Killer: Storj and the Decentralised Storage Movement w. Ben Golub
Podcast episode
S3E24 Cloud Killer: Storj and the Decentralised Storage Movement w. Ben Golub
byBlockchain Won't Save the World
0 ratings
0% found this document useful
Bringing DevOps to the Database with Automation
Podcast episode
Bringing DevOps to the Database with Automation
byThe Cloudcast
0 ratings
0% found this document useful
Autonomous Database on Dedicated Infrastructure: The Oracle Autonomous Database Dedicated deployment is a good choice for customers who want to implement a private database cloud in their own dedicated Exadata infrastructure. That dedicated infrastructure can either be in the Oracle Public Cloud or...
Podcast episode
Autonomous Database on Dedicated Infrastructure: The Oracle Autonomous Database Dedicated deployment is a good choice for customers who want to implement a private database cloud in their own dedicated Exadata infrastructure. That dedicated infrastructure can either be in the Oracle Public Cloud or...
byOracle University Podcast
0 ratings
0% found this document useful
Storage Launches with Brian Schwarz and Sean Derrington: On the podcast this week, our guests Brian Schwarz and Sean Derrington discuss the ins and outs of the new storage launches with your hosts Stephanie Wong and Jenny Brown.
Podcast episode
Storage Launches with Brian Schwarz and Sean Derrington: On the podcast this week, our guests Brian Schwarz and Sean Derrington discuss the ins and outs of the new storage launches with your hosts Stephanie Wong and Jenny Brown.
byGoogle Cloud Platform Podcast
100%
100% found this document useful
Data-baeses: Writing data is easy. You take in the information and put it away for future use. It’s remembering exactly what you wrote and where you put it that’s the challenge. Just like having to look for your keys as you try to rush out the door, getting that data quickly makes all the difference. And when your database is your bestie, it can serve that information faster than you could imagine. Getting a database into shape takes specialized skills. From planning and development to maintenance and rebuilding, it’s a layer of the stack that needs constant attention and evaluation. It can be a performance booster—or an efficiency bottleneck. What does it take to keep your database and the information it stores available to the stack?
Podcast episode
Data-baeses: Writing data is easy. You take in the information and put it away for future use. It’s remembering exactly what you wrote and where you put it that’s the challenge. Just like having to look for your keys as you try to rush out the door, getting that data quickly makes all the difference. And when your database is your bestie, it can serve that information faster than you could imagine. Getting a database into shape takes specialized skills. From planning and development to maintenance and rebuilding, it’s a layer of the stack that needs constant attention and evaluation. It can be a performance booster—or an efficiency bottleneck. What does it take to keep your database and the information it stores available to the stack?
byCompiler
0 ratings
0% found this document useful
Debezium - Capturing Data the Instant it Happens (with Gunnar Morling)
Podcast episode
Debezium - Capturing Data the Instant it Happens (with Gunnar Morling)
byDeveloper Voices
0 ratings
0% found this document useful
#06 - Tech stack of Open Podcast: Which database is best?
Podcast episode
#06 - Tech stack of Open Podcast: Which database is best?
byTOPP - The Open Podcast Podcast
0 ratings
0% found this document useful
Cloud Firestore for Users who are new to Firestore: Brian Dorsey and Mark Mirchandani are talking intro to Firestore this week with fellow Googler Allison Kornher.
Podcast episode
Cloud Firestore for Users who are new to Firestore: Brian Dorsey and Mark Mirchandani are talking intro to Firestore this week with fellow Googler Allison Kornher.
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
Understanding Time-Series Database Patterns
Podcast episode
Understanding Time-Series Database Patterns
byThe Cloudcast
0 ratings
0% found this document useful
The Economics & Beyond of Object Storage
Podcast episode
The Economics & Beyond of Object Storage
byThe Cloudcast
0 ratings
0% found this document useful
Conquering the Last Mile in Data - Caitlin Moorman
Podcast episode
Conquering the Last Mile in Data - Caitlin Moorman
byDataTalks.Club
0 ratings
0% found this document useful
Building ML Apps
Podcast episode
Building ML Apps
byThe Cloudcast
0 ratings
0% found this document useful
Oracle Data Lakehouse: With each passing day, more and more data sources are sending greater volumes of data across the globe. For any organization, this combination of structured and unstructured data continues to be a challenge. Data lakehouses link, correlate, and...
Podcast episode
Oracle Data Lakehouse: With each passing day, more and more data sources are sending greater volumes of data across the globe. For any organization, this combination of structured and unstructured data continues to be a challenge. Data lakehouses link, correlate, and...
byOracle University Podcast
0 ratings
0% found this document useful
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. In this episode Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.
Podcast episode
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. In this episode Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.
byData Engineering Podcast
0 ratings
0% found this document useful
Scalable Databases on Kubernetes
Podcast episode
Scalable Databases on Kubernetes
byThe Cloudcast
0 ratings
0% found this document useful
Episode 8: Decentralized Storage Part 2: Solutions: Part 2 in our series on decentralized storage: Solutions. In this episode, we look at the problems decentralised storage are trying to solve, interesting projects trying to tackle these problems and some of the implications.
Podcast episode
Episode 8: Decentralized Storage Part 2: Solutions: Part 2 in our series on decentralized storage: Solutions. In this episode, we look at the problems decentralised storage are trying to solve, interesting projects trying to tackle these problems and some of the implications.
byZero Knowledge
0 ratings
0% found this document useful
Data Center War Stories with Mike Julian: Mike Julian is the CEO of The Duckbill Group, a company you might be familiar with. Prior to co-founding Duckbill with yours truly, Mike was editor in chief at Monitoring Weekly, principal at Aster Labs, a senior DevOps consultant at Taos, a senior system
Podcast episode
Data Center War Stories with Mike Julian: Mike Julian is the CEO of The Duckbill Group, a company you might be familiar with. Prior to co-founding Duckbill with yours truly, Mike was editor in chief at Monitoring Weekly, principal at Aster Labs, a senior DevOps consultant at Taos, a senior system
byScreaming in the Cloud
0 ratings
0% found this document useful

Skip carousel

MARIADB Optimise And Control Your Databases
Linux Format
Article
MARIADB Optimise And Control Your Databases
Jul 30, 2019
9 min read
Rokoko Studio 2.0
3D World
Article
Rokoko Studio 2.0
Feb 23, 2021
1 min read
Collect And Graph Metrics With Python
Linux Format
Article
Collect And Graph Metrics With Python
May 4, 2021
7 min read
Build A Search And Analytic Engine
Linux Format
Article
Build A Search And Analytic Engine
Mar 10, 2020
7 min read
HotPicks
Linux Format
Article
HotPicks
Dec 15, 2020
13 min read
Getting To Grips With Docker
Linux Format
Article
Getting To Grips With Docker
Jan 12, 2021
9 min read
Filesystems
Linux Format
Article
Filesystems
Nov 16, 2021
1 min read
Hybrid Backup For Business
PC Pro Magazine
Article
Hybrid Backup For Business
Apr 8, 2021
4 min read
Cloud & Hybrid Backup 2022
PC Pro Magazine
Article
Cloud & Hybrid Backup 2022
Aug 7, 2022
4 min read
Become a Mac BACKUP EXPERT
iCreate
Article
Become a Mac BACKUP EXPERT
Oct 6, 2022
9 min read
Back Up To The Future
Linux Format
Article
Back Up To The Future
Jan 14, 2020
8 min read
Where is Streaming Data Stored Temporarily? The Role of Storage in Streaming Media
Techfastly
Article
Where is Streaming Data Stored Temporarily? The Role of Storage in Streaming Media
May 1, 2022
4 min read
Seven Steps To Backup Bliss
PC Pro Magazine
Article
Seven Steps To Backup Bliss
Nov 12, 2020
You certainly don’t, but there are reasons you might choose to. If your backup requirements are simple – say, you have just a single folder of critical files that needs backing up once a week – then any number of free programs will do the job. Indeed
6 min read
The Network NAS appliances 2024
PC Pro Magazine
Article
The Network NAS appliances 2024
Apr 4, 2024
4 min read
Cloud Backup 2023
PC Pro Magazine
Article
Cloud Backup 2023
Jul 6, 2023
4 min read
Observability Of The Kernel And Containers
Linux Format
Article
Observability Of The Kernel And Containers
Apr 4, 2023
Mihalis Tsoukalos is currently working on Time Series. You can reach him at: @mactsouk. For our final delve into eBPF, we’re tackling applications, the kernel and Docker containers. At the end of the day, all Linux machines execute code for applicat
10 min read
Find And Clean Up Your Config Files
Linux Format
Article
Find And Clean Up Your Config Files
Feb 11, 2020
10 min read
On-premises Business Backup
PC Pro Magazine
Article
On-premises Business Backup
Oct 5, 2023
4 min read
Business NAS appliances 2023
PC Pro Magazine
Article
Business NAS appliances 2023
Apr 6, 2023
4 min read
Arq 7 Backup: Uniquely Versatile Local And Online Backup
PCWorld
Article
Arq 7 Backup: Uniquely Versatile Local And Online Backup
Aug 1, 2023
4 min read
Eight Questions To Ask Before Buying External Storage
PC Pro Magazine
Article
Eight Questions To Ask Before Buying External Storage
May 11, 2023
6 min read
PHOTOGENEALOGY: STEP 1 GATHER & BACK-UP YOUR PHOTOS
Family Tree UK
Article
PHOTOGENEALOGY: STEP 1 GATHER & BACK-UP YOUR PHOTOS
Jul 8, 2022
3 min read
Safe & Sound
MacLife
Article
Safe & Sound
May 11, 2017
1 min read
Manage And Read Your System Logs
Linux Format
Article
Manage And Read Your System Logs
Mar 10, 2020
10 min read
Mac 911
MacWorld
Article
Mac 911
Sep 18, 2018
5 min read
Mac 911
MacWorld
Article
Mac 911
Apr 20, 2021
7 min read
Photogenealogy: Step 2
Family Tree UK
Article
Photogenealogy: Step 2
Aug 12, 2022
3 min read
Rediscover Speed With The Redis Revolution
Linux Format
Article
Rediscover Speed With The Redis Revolution
Jul 25, 2023
Credit: https://redis.io Redis is an open-source, in-memory data structure store that has gained popularity R as a highly efficient caching and messaging system. It prioritises speed, efficiency and versatility, making it a top choice for various ap
8 min read
Photogenealogy: Step 5 Your Photo Legacy
Family Tree UK
Article
Photogenealogy: Step 5 Your Photo Legacy
Nov 11, 2022
4 min read
Not End Of Life
Linux Format
Article
Not End Of Life
May 30, 2023
In case you haven’t heard, MySQL 5.7 is going end of life (EOL). The upstream project will stop updates in October and focus on MySQL 8.0. This is a logical decision and they’ve given users ample time to upgrade. But some users and organisations need
1 min read

Related categories

Skip carousel

Reviews for Database Archiving

Rating: 3 out of 5 stars

3/5

3 ratings1 review

Rating: 3 out of 5 stars
3/5
was good overview at a high level. would have been nice to see some use case examples in addition to the guidelines

Book preview

Database Archiving - Jack E. Olson

Olson

Brief Table of Contents

Copyright

Dedication

Preface

Acknowledgments

1. Archiving Basics

Chapter 1. Database Archiving Overview

Chapter 2. The Business Case for Database Archiving

Chapter 3. Generic Archiving Methodology

Chapter 4. Components of a Database Archiving System

2. Establishing a Database Archiving Project

Chapter 5. Origins of a Database Archiving Application

Chapter 6. Resources Needed

Chapter 7. Locating Data

Chapter 8. Locating Metadata

Chapter 9. Data and Metadata Validation

3. Designing Database Archiving Applications

Chapter 10. Designing for Archive Independence

Chapter 11. Modeling Archive Data

Chapter 12. Setting Archive Policies

Chapter 13. Changes to Data Structures and Policies

4. Database Archiving Application Software

Chapter 14. The Archive Data Store

Chapter 15. The Archive Data Extraction Component

Chapter 16. The Archive Discard Component

Chapter 17. The Archive Access Component

5. Administration of the Database Archive

Chapter 18. Ongoing Auditing and Testing

Chapter 19. Managing the Archive Over Time

Chapter 20. Nonoperational Sources of Data

Copyright

Dedication

Preface

Acknowledgments

1. Archiving Basics

Chapter 1. Database Archiving Overview

1.1. A Definition of Database Archiving

1.2. Forms of Data Archiving

1.3. The Data Lifeline

1.4. Types of Data Objects

1.5. Data Retention Requirements Versus Data Archives

1.5.1. Database Configuration Choices

1.5.2. Why Not Keep Data in One Database?

1.5.3. Reasons for Using a Reference Database

1.6. The Database Archives and Other Database Types

Summary

Chapter 2. The Business Case for Database Archiving

2.1. Why Database Archiving is a Problem Today

2.1.1. Regulatory Changes

2.1.2. Data Volume Increases

2.1.3. Data Aging

2.2. Implications of not Keeping Data

2.3. Data Volume Issues

2.4. Change Management

2.5. Current Database Archiving Practices

Summary

Chapter 3. Generic Archiving Methodology

3.1. The Methodology

3.2. Define Motivation for Archiving

3.2.1. Motivators

3.2.2. Motivation Initiation

3.2.3. Documentation

3.3. Identify Objects to Archive

3.3.1. Classification Process

3.4. Determine When to Put Objects in the Archive

3.5. Determine How Long to Keep Objects in the Archive

3.6. Determine What to do with Discarded Objects

Keeping records

3.7. Determine Who Needs Access to Archives and How

3.8. Determine the Form of Archive Objects

3.9. Determine Where the Archive will be Kept

3.9.1. Determining the Consequences of Losing Archive Objects

3.9.2. Determining what to Protect Against

3.9.3. Determining Requirements for Making Archive Objects Accessible

3.9.4. Requirements for the Archive Location

3.10. Determine Operational Processes Needed

3.11. Determine Necessary Administrative Processes

3.12. Determine Required Change Processes

Summary

Chapter 4. Components of a Database Archiving System

4.1. The Database Archiving Organization

4.2. Archive Application Data Gathering

4.3. Archive Application Design

4.4. Archive Data Extraction

4.5. Archive Data Management

4.6. Archive Access

4.7. Archive Administration

Summary

2. Establishing a Database Archiving Project

Chapter 5. Origins of a Database Archiving Application

An archiving organization

5.1. Problems that Lead to Database Archiving Solutions

5.2. Recognizing Problems

5.2.1. External Requirements

5.2.2. Trolling for Problems

5.3. Assignment of Problems for Initial Study

5.4. Initial Problem Study Components

5.4.1. Scoping the Problem

5.4.2. Establishing Basic Requirements

5.4.3. Establishing Archiving Application Goals

5.4.4. Determining That Database Archiving Is Part of the Solution

5.5. Determining the Basic Strategy for the Archiving Application

5.5.1. A Simple, One-Source Strategy

5.5.2. Distributed Applications

5.5.3. Parallel Applications

5.5.4. Retired Applications

5.5.5. Old Data in Active Databases

5.5.6. Anticipation of Future Projects

5.6. The Application Strategy Chart

Summary

Chapter 6. Resources Needed

6.1. People

6.1.1. In-House Participants

6.2. Authority

6.3. Education

6.4. Repository

6.5. Archive Server

6.6. Software Tools

6.7. Disk Storage

Summary

Chapter 7. Locating Data

7.1. Inventorying Data

7.1.1. Process Examination

7.1.2. Database Topologies

7.1.3. Finding Reference Data

7.2. Picking the Archivist's Data

7.2.1. What to Put Into the Archive

7.2.2. What to Purge From Databases

7.3. Documenting Data Sources

Diagrams

Summary

Chapter 8. Locating Metadata

8.1. Metadata Definitions

8.1.1. Business Objects

8.1.2. Data Elements

8.1.3. Data Rows

8.1.4. Tables

8.1.5. Table Relationships

8.1.6. Integrity Rules

8.2. Where to Find Metadata

8.3. Selecting a Version of the Metadata

8.4. Classifying Data

8.4.1. Classification Elements

8.4.2. Protection of Data in the Archive

8.4.3. Laws and Regulations

8.4.4. Additional Information

8.4.5. Granularity of Classification

8.4.6. Who to Get It From

8.5. Documenting Metadata

8.6. Keeping Up with Changes

Summary

Chapter 9. Data and Metadata Validation

9.1. Matching Data to Metadata

9.1.1. Looking at the Data

9.1.2. Comparing Data and Metadata to Reports and Displays

9.2. Assessing Data Quality

9.2.1. Data Profiling

9.2.2. Looking for Data-Cleansing Activities

9.3. Assessing Metadata Quality

9.4. Validating Data Classification

9.5. Documenting Validation Activities

9.6. Repeating Validation Activities

Summary

3. Designing Database Archiving Applications

Chapter 10. Designing for Archive Independence

10.1. Independence from Application Programs

10.1.1. Is operational Data Independent of the Application?

10.1.2. Version Independence

10.1.3. Transforming to an Independent Form

10.1.4. Special Processing Concerns

10.1.5. Virtualizing the Operational Data

10.1.6. Metadata

10.2. Independence from DBMS

10.2.1. Relational Data Sources

10.2.2. Nonrelational Data Sources

10.2.3. Multiple Data Sources

10.2.4. The Archive Data Store

10.3. Independence from Systems

10.4. Independence from Data Formats

Summary

Chapter 11. Modeling Archive Data

11.1. The Source Data Model

11.1.1. Data Source

11.1.2. Business Objects

11.1.3. Reference Tables

11.1.4. Archive Actions

11.1.5. Hierarchies of Business Objects

11.1.6. Data Elements

11.2. The Target Data Model

11.2.1. Business Objects in the Archive

11.2.2. Names

11.2.3. Normalization

11.2.4. Keys

11.2.5. Data Transformations

11.3. Model Representations

Summary

Chapter 12. Setting Archive Policies

12.1. Extract Policies

12.1.1. Selection Criteria for Extraction

12.1.2. Data Disposition on Extraction

12.1.3. When to Execute Extracts

12.2. Archive Storage Policies

12.2.1. Form of Data

12.2.2. Location of Data

12.2.3. Protection from Wrongful Access

12.2.4. Protection from Data Loss

12.3. Archive Discard Policies

12.3.1. Discard Criteria

12.3.2. Exceptions to the Criteria

12.3.3. When to Run Discard

12.3.4. Disposition of Data

12.3.5. Record of What Is Discarded

12.4. Validation and Approval

Summary

Chapter 13. Changes to Data Structures and Policies

13.1. Archiving System Strategies for Handling Metadata Changes

13.1.1. Single Data Definition Strategy

13.1.2. Multiple Data Definition Strategy

13.2. Metadata Change Categories

13.2.1. Minor Metadata Changes

13.2.2. Minor Structural Changes Within an Application

13.2.3. Major Application Restructuring

13.2.4. Upgrades to Packaged Applications

13.3. Changes to the Archive Data Model

13.3.1. Changes to the Extract Model

13.3.2. Changes to the Archive Transformed View

13.4. Managing the Metadata Change Process

13.5. Changes to Archive Policies

13.5.1. Changes to Extract Policies

13.5.2. Changes to Storage Polices

13.5.3. Changes to Discard Policies

13.6. Maintaining an Audit Trail of Changes

Summary

4. Database Archiving Application Software

Chapter 14. The Archive Data Store

14.1. Archive Database Choices

14.1.1. Data Stores

14.1.2. Metadata

14.2. Important Features

14.2.1. Independence from Operational Systems

14.2.2. Ability to Handle a Large Volume of Data

14.2.3. Ability to Store Enriched Metadata

14.2.4. Ability to Store Data in Units with Metadata Affinity

14.2.5. Ability to Store Original Data

14.2.6. Ability to Store Everything Needed

14.2.7. Ability to Add Data Without Disrupting Previously Stored Objects

14.2.8. Independent Access Authority

14.2.9. Search Capabilities

14.2.10. Ability to Query Data Without Looking at the Whole Data Store

14.2.11. Access from Application Programs

14.2.12. Ability to Restore Data to Any Desired Location

14.2.13. Protect from Unauthorized Updates or Deletes

14.2.14. Provide Encryption and Signature Capabilities

14.2.15. Selective delete for Discard

14.2.16. Use Disk Storage Efficiently

14.2.17. Low-Cost Solution

14.3. False Features

14.3.1. Transparent Access to Data from Application Programs

14.3.2. High Performance on Data Access

14.4. How Choices Stack Up

Summary

Chapter 15. The Archive Data Extraction Component

15.1. The Archive Extractor Model

15.1.1. Single-Process Model

15.1.2. Double-Process Model

15.1.3. Triple-Process Model

15.2. Extractor Implementation Approaches

15.2.1. Form of the Program

15.2.2. Online or Batch

15.2.3. Source of Data

15.3. Execution Options

15.4. Integrity Considerations

15.4.1. Interruptions

15.4.2. Locks and Commits

15.4.3. Restarting Versus Starting Over

15.4.4. Workspace Management

15.4.5. Data Errors

15.4.6. Metadata Validation

15.5. Other Considerations

15.5.1. Log Output and Audit Trails

15.5.2. Capacity Planning

15.5.3. Post-Extract Functions

Summary

Chapter 16. The Archive Discard Component

16.1. Data Structure Considerations

16.1.1. Unit of Discard: The Business Object

16.1.2. Scope of Discard

16.1.3. DBMS Data Store

16.1.4. File Data Store

16.2. Implementation Form

16.3. Integrity Considerations

16.3.1. Exclusive Locks

16.3.2. Interruptions

16.3.3. Restart/Startover

16.3.4. Forensic Delete

16.4. Operational Considerations

16.4.1. Discard Policy Considerations

16.4.2. Execution Options

16.4.3. hold check Function

16.4.4. Performance

16.4.5. Execution Authority

16.4.6. Capacity Issues

16.5. Audit Trail

16.5.1. Execution Log

16.5.2. Identification of Deleted Business Objects

16.5.3. Audit Trail

Summary

Chapter 17. The Archive Access Component

17.1. Direct Programming Access

17.2. Access Through Generic Query and Search Tools

17.3. Access Through Report Generators and BI Tools

17.4. Selective Data Unload

17.4.1. The unload Selection Criteria

17.4.2. The unload Package

17.5. Accessing Original Data

Returning data to its original data source

17.6. Metadata Services

Metadata updates

17.7. Other Access Considerations

17.7.1. Access Authorities

17.7.2. Access Auditing

17.7.3. Performance

17.7.4. Data Volumes

Summary

5. Administration of the Database Archive

Chapter 18. Ongoing Auditing and Testing

18.1. Responsibility for Ongoing Auditing and Testing

18.2. Auditing Activities

18.2.1. Access Auditing

18.2.2. Extract Auditing

18.2.3. Discard Auditing

18.2.4. Policy Auditing

18.3. Testing Activities

18.3.1. Testing Access

18.3.2. Testing Metadata Clarity

18.3.3. e-Discovery Readiness Test

18.4. Frequency of Audits and Tests

Summary

Chapter 19. Managing the Archive Over Time

19.1. Managing the Archive Database

19.2. Managing the Archive Repository

19.3. Using Hosted Solutions

19.4. Managing Archive Users

19.4.1. User Management Functions

19.4.2. Different Group of Users

19.4.3. Permissions for Using the Archive Data Store

19.4.4. Permissions for Using the Archive Repository

Summary

Chapter 20. Nonoperational Sources of Data

20.1. Retired Applications

20.2. Data from Mergers and Acquisitions

20.3. e-Discovery Applications

20.4. Business Intelligence Data

20.5. Logs, Audit Trails, and other Miscellaneous Stuff

Summary

Copyright

Morgan Kaufmann Publishers is an imprint of Elsevier

30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. All trademarks that appear or are otherwise referred to in this work belong to their respective owners. Neither Morgan Kaufmann Publishers nor the authors and other contributors of this work have any relationship or affiliation with such trademark owners nor do such trademark owners confirm, endorse or approve the contents of this work. Readers, however, should contact the appropriate companies for more information regarding trademarks and any related registrations.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail:permissions@elsevier.com. You may also complete your request online via the Elsevier homepage (http://www.elsevier.com), by selecting Support & Contact then Copyright and Permission and then Obtaining Permissions.

Library of Congress Cataloging-in-Publication Data

Application submitted.

ISBN: 978-0-12-374720-4

For information on all Morgan Kaufmann publications,

visit our Web site at: www.mkp.com or www.books.elsevier.com

Printed in the United States of America.

08 09 10 11 12 5 4 3 2 1

Dedication

I dedicate this book to my grandchildren: Gavin, Harper, and Dru—and grand they are!

Preface

This book is about archiving database data. It is not a general purpose book about archiving. It is also not a book about archiving any or all kinds of data. It is about archiving data in traditional IT databases (e.g., relational databases, hierarchical databases, and structured data in computer files). Examples of source data for archiving are data in DB2 DBMS systems on IBM mainframe systems, data stored in VSAM KSDS databases on IBM mainframes, and data stored in Oracle DBMS systems on Unix servers. It does not include data in spreadsheets or smaller relational systems such as Microsoft Access databases on desktop computers.

This perspective might seem narrow; however, this is a critically important emerging topic in IT departments worldwide. Actually I should say that the topic has emerged. The need to archive database data is showing up everywhere, although the database tools industry has been slow to provide tools, methodologies, and services to effectively accomplish it. Most IT departments know that they need to have an effective database archiving practice and are trying to figure out how to build one.

The data that needs to be archived from databases is generally the most important business data enterprises keep. It must be retained under very strict rules as specified by laws and common sense. These are not small amounts of data. The archival stores will eventually reach data volumes unheard of even by today's standards.

Archiving is becoming a much more important topic because it is a part of a major shift in thinking about data that is occurring throughout the United States and most of the rest of the world. This shift demands that enterprises manage their data better than they have in the past and manage it in a way that serves constituents other than the operating departments of an enterprise. The constituents I am talking about include auditors (both internal and external), government investigators, customers, suppliers, citizens' groups, and plaintiff lawyers for lawsuits filed against corporations. Data is becoming more public in that it reflects a company's activities and performance. As such, it must stand up to rigid scrutiny and be defended as accurate and authentic.

The constituent base also includes businesspeople within the enterprise who are not trying to assert a wrongdoing or to defend against one but rather who legitimately want to look at historical data to achieve some business goal. Although some will say that this is the role of a data warehouse and business intelligence, the reality is that data warehouse data stores generally do not go back more than two or three years. Sometimes you need to reach back farther, and thus the database archive becomes the logical source.

Archiving data is one facet of this massive change in thinking about managing data. Other parts are database security, database access auditing, data privacy protection, data quality, and data clarity.

The principles of archiving database data are not particularly different from archiving anything else, viewed from the highest level. However, viewed from a low level, the details of database archiving are very different from archiving other types of data. This book drills down into the topics that a database archivist needs to know and master to be effective.

Since there has been no need or attempt to archive database data until recently, there's no large body of experts who can guide an IT department through the process of building an effective database archiving practice. In fact, for most IT departments, there are no experts sitting around and little or no organized practice. Everyone is starting on a new venture that will mature into a standard practice in a few years. For now, on your journey you will undoubtedly encounter many stumbling blocks and learning experiences.

In addition to a lack of experts, there is a lack of educational material that can be used to understand how to become a database archivist. Hopefully this book will reduce that shortcoming.

How this Book is Organized

Before getting too deep into database archiving, it is helpful to begin by looking at the generic process of archiving to determine some basic principles that apply to archiving just about anything. Understanding these concepts will be helpful in later grasping the details of database archiving.

From the basics, the book moves on to discussing how a database archiving project comes into existence, gets organized, and acquires basic goals and policies. This is an area that is poorly understood and yet crucial to the success of database archiving projects.

This discussion is followed by a detailed treatment of designing a solution for a specific application. All the factors that need to come into play are discussed, as are criteria for selecting one path versus another.

A description of software required to execute a database archiving application follows the discussion of database archiving design. The strengths and weaknesses of various approaches and the features that you should look for are all analyzed.

Administration of the application from day to day over many years is then discussed, followed by some additional topics on the fringe of database archiving. Topics include archiving data that is not business critical, the role of the archive in e-discovery, and other ways to view the execution environment.

Organizations that are Likely to Need Database Archiving

This book is about database archiving for large database applications. The types of organizations that would benefit from building a database archiving practice are any that have long-term retention requirements and lots of data. This includes most public companies and those that are private but that work in industries that require retention of data (such as medical, insurance, or banking fields). It also includes educational and government organizations. The book refers to all organizations as enterprises.

Who this Book is for

This book is intended for the archive practitioner, referred to here as a database archive analyst or simply archive analyst. The book assumes that a person has adopted this position as a full-time job and possibly as a new career path. I say new because this topic is not taught in universities, nor have many IT departments been practicing it until recently. Database archiving will become a standard business activity in the next few years, and all IT departments will need experts on the topic to be effective. This generates the need for a new profession, just as data modeling, data warehousing, data quality, and data security have spawned specialist data management positions in the last decade. Those who enter into this field will find it a challenging, complex, and rewarding experience. If you become an expert in database archiving, you will find a large demand for your services.

After reading this book, the database archivist will be armed with a complete understanding of archiving concepts and will be conversant with the major issues that need to be addressed. This knowledge should enable the archivist to establish a database archiving practice, evaluate or develop tools and methodologies, and execute the planning and operational steps for archiving applications.

Other data professionals such as database administrators (DBA), data modelers, application developers, and system analysts will also benefit from the concepts presented here. The requirements for archiving data from databases should be considered when implementing data design, application design, application deployment, and change control. The data archivist is not the only person who will be involved in the process. For example, the database administrator staff will need to understand the impact of the archiving process on their operational systems and the impact that their actions could have on the archiving process. Likewise, data modelers will need to understand the impacts of their design decisions on archived data, particularly when they are changing existing data structures.

IT management will also benefit from reading this book. It will help them understand the principles involved in database archiving and help them in creating and managing an effective organization to control database archiving. An effective organization archives only what is needed to be retained for long periods of time, does not archive too early or too late, and uses storage devices wisely to minimize cost while preserving the integrity and safety of the archived data. If done correctly, archiving can benefit an enterprise and return more value than it costs, thus not becoming simply another expense.

So let's get started.

Acknowledgments

It took a lot of conversations to accumulate the thoughts and ideas that went into this book. There is little literature on the topic of data archiving and no place to go to get education. I started this journey three years ago with the belief that archiving was going to emerge as an important new component of the future data management—and I was right.

I visited a number of IT departments on the way to discuss their needs and understanding of the topic. In the course of doing so, I talked to dozens of IT professionals. I found a strong understanding of the concept and general problems with database archiving. I found little in the way of implemented practices. Those that did exist were primitive and lacking in many ways. Here I want to acknowledge the strong impact of these meetings on my understanding and development of this subject. I cannot list the companies due to confidentiality agreements, but many of you will remember my visits.

I also want to thank the Data Management Association (DAMA) organization for its support of my work. I visited a large number of DAMA regional chapters and gave presentations on the problems involved in managing long-term data retention. All of these meetings added new information and insights into my knowledge of the topic. In particular I want to thank John Schlei and Peter Aiken for aiding me in getting on calendars and in supporting my quest for knowledge. They are both giants in the data management space. I also want to thank all the DAMA regional chapters I visited for adding me to their agendas and for the lively discussions that took place. I visited so many chapters regarding this topic that DAMA International gave me a Community Service Award in 2007.

The company I work for, NEON Enterprise Software, launched a development project to produce software to be used for database archiving. At the time of launch I was not part of the development organization. In the past three years many people in that company have spent many hours with me discussing detailed points regarding this subject. You learn so much more about a technology when you have to produce a comprehensive solution for it. Those in the company who I want to single out for their help and education are John Wright, Ken Kornblum, Kevin Pintar, Dave Moore, Whitney Williams, Bill Craig, Rod Brick, Bill Chapin, Bruce Nye, Andrey Suhanov, Robin Reddick, Barbara Green, Jim Middleton, Don Pate, John Lipinski, Don Odom, Bill Baker, and Craig Mullins. Many others also were involved in the project and were helpful. I must also add Barbara Green, Carla Pleggi, and Sam Armstrong, who helped in getting the draft copies ready for the publisher. As technical as I am, I am a cripple when it comes to using technical writing tools. I needed their help.

I especially want to thank John Moores for his encouragement and support on this project.

Finally I want to acknowledge the patience and understanding of my family in putting up with me while I worked on this book. Without their support this book could not have existed.

Part 1. Archiving Basics

Chapter 1. Database Archiving Overview

Archiving is the process of preserving and protecting artifacts for future use. These artifacts have lived beyond their useful life and are being kept solely for the purpose of satisfying future historical investigations or curiosities that might or might not occur. An archive is a place where these artifacts are stored for long periods of time. They are retained in case someone will want or need them in the future. They are also kept in a manner so that they can be used in the future.

Archiving has existed in many forms for centuries. For example, the United States government employs a national archivist. The Presidential libraries are archives. Newspapers retain archives of all stories printed, since papers began to be published. Museums are archives of interesting objects from the past. Your local police department has an evidence archive. When a collection of items is placed in the cornerstone of a building during construction in anticipation that someone 100 or more years in the future will uncover it (a time capsule), an archive is being created and the creators are acting as archivists.

An archive is created for a specific purpose: to hold specific objects for future reference in the event that someone needs to look at them. The focus is on the objects that are to be included in the archive. Each archive has a specific purpose and stores a specific object type.

The process of archiving follows a common methodology. No matter what you archive, you should go through the same steps. If you leave out one of the steps, you will probably run into problems later on. This generic methodology is discussed in Chapter 3. However, before we go there, it is important to set the scope of this book.

The discussion that follows segregates data archiving into categories that are useful in understanding where this book fits into the broader archiving requirements. It also establishes some basic definitions and concepts that will be used later as we get deeper into the process.

1.1. A Definition of Database Archiving

Database archiving is the act of removing from an operational database selected data objects that are not expected to be referenced again and placing them in an archive data store where they can be accessed if needed. This is a powerful statement for which each part needs to be completely understood.

Data objects

A data object is the unit of information that someone in the future will seek from the archive. It represents a business event or object; it is the actual thing you are archiving. An example is banking transactions, such as deposits and withdrawals from customer accounts. The basic unit of an archive is a single transaction. Each transaction is a discrete object that reaches a point in time when it is ready to be archived. The data stored for a transaction would include all the particulars of the event, such as the account number, date, time, and place the transaction occurred; the dollar amount; and possibly more. This information might be wrapped with additional information, such as the account holder's name and address from the account master record, to make it a more complete record.

Another example of a data object might include an entire collection of data, such as production records for an airplane. The goal might be to archive the records for each individual airplane after it has been built and is ready to be sold. You might move this information into the archive one year after the airplane enters service. Although the archive object may have many different types of data, it still represents data for only a single airplane.

Selected data objects

You will have thousands of similar data objects in an operational database at any given point in time. These objects have unique characteristics in regard to their role in your information-processing systems and business. One characteristic is how long the data objects have been in the database. Some might have been there for months; others were created just a second ago. Some might be waiting for some other event to occur that would update them; some might not. Some might have a special status, others not.

In archiving, not all data objects are ready to be moved to the archive at the same time. The user must develop a policy expressed as selection criteria for determining when items are ready to be moved. Sometimes this policy is as simple as All transactions for which 90 days have passed since the create event occurred. Other times it is more complicated, such as Archive all account information for accounts that have been closed for more than 60 days and do not have any outstanding issues such as uncollected amounts.

The key point is that the database contains a large collection of like data objects. Archiving occurs on a periodic basis—say, once every week. At the time that a specific archive sweep of the database is done, only some of the objects will be ready to be archived. The policy used to select those objects that are ready needs to be defined in data value terms using information contained within or about the objects. The selected objects will probably be scattered all over the database and not contained in a single contiguous storage area or partition.

Removing data objects

The point of archiving is to take objects out of the active, operational environment and place them in a safe place for long-term maintenance. For database data this means that data is written to the archive data store and then deleted from the operational database. The whole point of archiving is to take data out of the operational database and store it safely for future use and reference.

In some instances, covered in later chapters, data that is moved to the archive is also left behind in the operational database and then deleted later. These are special circumstances; however, even here the archived data is frozen in the archive for long-term storage and considered archived. The copy left behind in the operational database is no longer the official record of the object.

Not expected to be referenced again

This is an important point for database archives. You do not want to move data from the operational database too early. The correct time to move it is when the data has a very low probability of being needed again. If there are normal business processes that will need to use this data, the data is probably not ready to be moved. Any subsequent need for the data should be unexpected and a clear exception.

Later chapters will discuss special circumstances in which this rule may be relaxed.

Placing them in an archive data store

The archive data store is a distinctly different data store than the one used for the operational database. The operational databases are designed and tuned to handle high volumes of data; incur high create, update, and delete activity levels; and process many random query transactions over the data. The data design, storage methods, and tuning requirements for the archive data store are dramatically different from those for operational databases. The archive data store will house much larger volumes of data, have no update activity, and infrequent query access.

In addition, other demands for managing the archive data store influence how and where it is to be stored. For example, the need to use low-cost storage and the need to have backup copies in geographically distributed locations are important considerations. Many of these requirements also dictate different storage design decisions as to how and where to store archived data.

The point is that the archive data store is a separate data store from the operational database and has uniquely different design and data management requirements.

Can be referenced again if needed

Even though the archive contains data that is not expected to be accessed again, it is there precisely so that it can be accessed again if needed. It makes no sense to bother archiving data if it cannot be accessed. Archives exist for legal and business requirements to keep data just in case a need arises.

The process of accessing data from archives is generally very different from accessing data for business processes. The reasons for access are different. The type of access is different. The user doing the accessing is also different. Queries against the archive are generally very simple but may involve large amounts of output data.

It is important that the archive query capabilities be strong enough to satisfy all future requirements and not require the user to restore data to the original systems for access through the original applications.

It is critical to understand the likely uses that could arise in the future and the shape of the queries that result. Anticipating these factors will impact decisions you make regarding how and where to keep the archive data store.

1.2. Forms of Data

Enjoying the preview?

Page 1 of 1

Database Archiving: How to Keep Lots of Data for a Very Long Time

About this ebook

Jack E. Olson

Related authors

Related to Database Archiving

Related ebooks

Databases For You

Related podcast episodes

Related articles

Related categories

Reviews for Database Archiving

What did you think?

Book preview

Database Archiving - Jack E. Olson

Brief Table of Contents

Table of Contents

Copyright

Dedication

Preface

How this Book is Organized

Organizations that are Likely to Need Database Archiving

Who this Book is for

Acknowledgments

Chapter 1. Database Archiving Overview

1.1. A Definition of Database Archiving

1.2. Forms of Data