Towards Open Data Initiative: 3 Data Quality Issues You Should Know

Dr. Nurul Akmar Emran (Database Unit, CIT Lab, C-ACT)

"Open data" initiative has been driven by the emergence of "open source" which promotes data
sharing through the creation of a common pool of data. These data can be freely used without
concerns about incompatible restrictions that may have been placed on different datasets by their
owners. One good example of open data is the General Transit Feed Specification (GTFS), which has
gained popularity since its creation in 2005 as a standard data format to describe fixed-route transit
services.
In Malaysia, open data is still in its infancy. An open data portal, launched by the Malaysian
government in 2014 and maintained by MAMPU, is evidence of the official commitment to the open data
initiative in Malaysia. Figure 1 shows the homepage of the portal. In this portal, data related to 17
ministries are made publicly available in formats such as PDF.

Figure 1: The homepage of the open data portal

Unfortunately, only one Malaysian GTFS dataset is openly available to date. Published two years ago by
the RapidKL transportation agency, it covers only the Kuala Lumpur area, as shown in Figure 2.

Figure 2: Malaysian GTFS data published in GTFS Data

Many transportation agencies, especially in the U.S., share their data openly with the general public
so that they can use Google Transit trip planner, a free public transportation planning tool powered by
Google Maps that combines the latest agency data (i.e., transit stop, route, schedule, and fare
information) to ease trip planning. As Google Maps is the largest mapping site in the world, making
public transit information publicly available means the data are easily accessible to millions of
Google users who wish to plan their trips. The need to make transit data publicly available becomes
increasingly important to many countries worldwide as they prepare (and operate) their smart cities. The
demand to improve the quality of life in smart cities is unavoidable as city populations gradually grow,
which unfortunately creates growing urban problems such as traffic and transportation problems. To
prepare citizens for the foreseen challenges of managing these urban problems, many city governments
(such as Johor and Malacca city), politicians and hi-tech companies are adopting smart city
initiatives, inspired by established smart cities such as Seattle (US), Kyoto (Japan) and
Amsterdam (Holland).
In addition, public GTFS data enable software developers to build many other types of transit and
multimodal software applications, including multimodal trip planning, timetable creation, mobile apps,
visualization, accessibility, analysis tools for planning, real-time information, and interactive voice
response (IVR). With these applications, real-time digital information and services can be delivered to
public and private transport users, which will eventually improve the quality of life in smart cities.
Nevertheless, to make transit data openly available through Google Transit, a transportation agency
needs to follow a multi-step process to ensure that only high-quality GTFS feeds are shared. To create an
effective traveller information system, for example, an accurate and up-to-date description of the
routes, stops, and schedules that represent the transit service is required. Three data quality criteria
that are crucial for GTFS data feeds are accuracy (format and values), freshness (up-to-date) and
completeness. As high-quality data feeds are not optional, the process of validating and reviewing the
quality of GTFS data feeds can be time-consuming, especially for feeds that have many quality issues.
This is a challenge that transportation agencies must deal with, and it calls for practical and
affordable solutions. Unless these data quality issues can be overcome, the open data initiative remains
a hurdle for both data owners and data consumers.
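
To make the three criteria concrete, the following is a minimal Python sketch of one check per criterion. It is an illustration under stated assumptions, not the validation process used by Google Transit or by any agency. The file and field names (stops.txt, stop_lat, stop_lon, calendar.txt, end_date) come from the GTFS specification. The function names `read_table` and `check_feed`, and the idea of passing the feed as a dictionary that maps each file name to its text, are hypothetical choices made for this example.

```python
import csv
import io
from datetime import date, datetime

# Files that the GTFS specification requires in every feed. The sketch checks
# only this subset and ignores conditionally required files.
REQUIRED_FILES = ("agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt")


def read_table(text):
    """Parse one GTFS text file (CSV with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def check_feed(files, today):
    """Return a list of quality issues found in a feed.

    `files` maps a GTFS file name to its CSV text (a hypothetical input shape).
    `today` is the date against which freshness is judged.
    """
    issues = []

    # Completeness: every required file is present.
    for name in REQUIRED_FILES:
        if name not in files:
            issues.append(f"completeness: missing {name}")

    # Accuracy (format and values): stop coordinates parse as numbers and
    # fall inside the valid latitude/longitude ranges.
    for row in read_table(files.get("stops.txt", "")):
        stop_id = row.get("stop_id")
        try:
            lat, lon = float(row["stop_lat"]), float(row["stop_lon"])
        except (KeyError, TypeError, ValueError):
            issues.append(f"accuracy: stop {stop_id} has unreadable coordinates")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            issues.append(f"accuracy: stop {stop_id} has out-of-range coordinates")

    # Freshness: the latest end_date in calendar.txt has not passed yet.
    # Dates in GTFS are written as YYYYMMDD.
    calendar = read_table(files.get("calendar.txt", ""))
    if calendar:
        latest = max(datetime.strptime(r["end_date"], "%Y%m%d").date() for r in calendar)
        if latest < today:
            issues.append(f"freshness: all service periods ended by {latest.isoformat()}")

    return issues
```

Calling `check_feed(files, date.today())` on a loaded feed returns an empty list when none of these three checks fails. A real validator would cover many more rules, for example cross-file references between trips, routes and stops.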