FANSI-Tool: An Integrated Software For Floating Data Analytics

25th ITS World Congress, Copenhagen, Denmark, 17-21 September 2018
Paper ID: EU-CP1321
Walid Fourati1*, Bernhard Friedrich1

1. Technische Universität Braunschweig, Germany
* Corresponding author. E-mail address: w.fourati@tu-braunschweig.de
Abstract
Floating Car Data (FCD) are the most accurate and most widely used source of traffic information
originating from mobile sensors. Their growing availability, driven by the abundance of smartphones
and navigation devices, has encouraged traffic researchers and engineers to experiment with their use
in various traffic applications. In most use cases, basic pre-processing steps, in particular
map-matching, are necessary before any insight can be extracted from raw GPS trajectories. FCD
ANalytics SImplified Tool (FANSI-Tool) is a web-based, project- and teamwork-oriented software that
simplifies the import, visualization, querying, and processing of floating data in an interactive
fashion. The tool integrates advanced optimization techniques to minimize the intensive processing
time through thread parallelization, allows algorithms to be edited and applied to different studied
objects with no need to edit queries, and features map and plot representations of individual
trajectories and user-defined insights for a deep understanding of the data and of their potential.
Keywords:
Floating Car Data, Data Analytics, R&D Tooling
Introduction
Mobile sensing is a relatively new source of information for the traffic industry, since most of the
traffic science developed during the 20th century was based on data originating from fixed sensors
such as inductive loop detectors and radars. Although Floating Car Data (FCD) analytics are
increasingly gaining the attention of researchers and consultancies, a persistent need remains for
tooling that deals with the technical particularities of working with FCD and allows the user to
concentrate on traffic applications instead of elementary data handling and pre-processing
operations. This paper presents FCD ANalytics SImplified Tool (FANSI-Tool), a web-based software
that can be hosted in the cloud or on an ordinary local machine of the organization. It brings
together in one place the pre-processing of raw GPS trajectories, algorithm editing, trajectory
visualization (on the map or in plots), and results export.
Floating Car Data

FCD commonly designates crowd-sourced, timestamped GPS positions attached to unique trip IDs. Trip
IDs are usually anonymised by the service delivering the FCD and can be constant through time (in
which case they rather correspond to vehicle IDs). In its most general and minimal format, FCD is a
list of records in a table with the columns: Trip ID, timestamp, longitude, latitude.
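The minimal four-column format described above can be illustrated with a short sketch. The column names, the ISO timestamp format, and the sample rows are assumptions made for illustration; real FCD providers may use different names and encodings.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FcdRecord:
    """One timestamped GPS position belonging to a trip."""
    trip_id: str
    timestamp: datetime
    longitude: float
    latitude: float


def read_fcd(stream):
    """Parse CSV rows with the columns trip_id, timestamp, longitude, latitude."""
    reader = csv.DictReader(stream)
    return [
        FcdRecord(
            trip_id=row["trip_id"],
            timestamp=datetime.fromisoformat(row["timestamp"]),
            longitude=float(row["longitude"]),
            latitude=float(row["latitude"]),
        )
        for row in reader
    ]


# Hypothetical sample data (coordinates roughly in Braunschweig).
SAMPLE = """trip_id,timestamp,longitude,latitude
a1,2018-03-01T08:00:00,10.5200,52.2700
a1,2018-03-01T08:00:05,10.5204,52.2702
b7,2018-03-01T08:00:02,10.5300,52.2650
"""

records = read_fcd(io.StringIO(SAMPLE))
```

Records of different trips are interleaved in the sample, as is typical for raw files; grouping them by trip ID is part of the import process.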
FCD are nowadays generated mainly by smartphones, either by the operating system itself (Android by
Google, iOS by Apple) or by specific apps that provide some service to the user, such as navigation,
in exchange for retrieving and storing the user's position. The other major source of FCD is
navigation devices installed in vehicles, either built-in or after-market.
Although FANSI-Tool was designed for and mainly used with FCD, it is by design not limited to FCD
as defined above. Any series of timestamped GPS positions that are connected to each other by means
of an ID can be imported and processed. Examples include already triangulated mobile phone probe
data obtained from Call Detail Records (CDR) or records from pedestrian and cyclist tracking devices.
Targeted and possible use cases

FCD are progressively gaining a place in professional traffic practice, sometimes being the only
possible source of information for an application. From a data size point of view, applications can
be classified in two categories: local and wide-area applications.
Local applications
Analogously to microscopic traffic modeling, the study in such use cases focuses on vehicle
movements within specific infrastructure objects, such as highway sections, roundabouts,
intersections, corridors or network cut-offs. One common characteristic of such applications is that
spatial filtering of GPS positions inside the studied area takes much more time than map-matching
the corresponding trajectories to the road network. The overall pre-processing time is however much
smaller than for wide-area applications. The most prominent examples of local applications are found
at urban intersections: directional counting, flow and capacity analysis, delay and queue
estimation, and signal program reconstruction. Green wave analysis and coordination quality
assessment are further local applications for urban corridors.
Interurban applications are classic FCD use cases and are currently the main production-level
applications. These consist mainly of speed and travel time estimation in traffic information and
navigation applications. FCD can also be used to identify traffic disturbances, either short-term
(incidents) or long-term (roadworks).
Wide area applications

These are mainly traffic planning use cases such as origin-destination demand analysis and traffic
assignment. For these typical applications, the software contains native routing functions to find
the shortest path and to cover other common routing needs, while it always remains possible to write
any custom routing or traffic assignment approach from scratch.
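As an illustration of what a custom routing function written from scratch could look like, the following is a minimal sketch of Dijkstra's shortest-path algorithm. It is not the software's native routing function; the graph representation (a dictionary of edge costs) and the sample network are assumptions.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict {node: {neighbour: cost}}.

    Returns (total_cost, [nodes]) or (inf, []) if target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            # Walk the predecessor links back to the source.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return float("inf"), []


# Hypothetical network: edge weights are travel times in seconds.
network = {
    "A": {"B": 60, "C": 150},
    "B": {"C": 40, "D": 200},
    "C": {"D": 70},
    "D": {},
}
cost, path = shortest_path(network, "A", "D")
```

With edge weights taken from FCD-based travel times instead of lengths, the same function would return the fastest rather than the shortest route.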
FCD can also be used to estimate average travel time between two specific points of the network or
between urban centers, in order to be compared to regulations such as the German Guidelines for
Integrated Network Structuring from FGSV (2008).
Architecture and functional modules

The software is designed in a client-server architecture. The backend is responsible for the
computationally intensive and memory-heavy tasks. This includes for example the handling, chunking
and processing of raw GPS track files, which can reach several tens of gigabytes per file. The
backend is also responsible for executing the database queries and the data analytics scripts edited
by the user in the frontend. The main processes of the software are described in the following.
Import process
The import process is the most computationally intensive one. It includes tasks such as spatial
filtering of the raw data inside the project’s spatial scope, spatial clustering, and matching the
coordinates of each point to the most probable position on the street network (map-matching). Such
tasks usually also come with large memory requirements. Figure 1 describes the software components
involved in the import process and the ordered dataflow.
The import process is entirely based on queueing for the different steps. Queue management is
event-based, i.e. the different modules listen to each other and react to events, allowing optimal
parallelization. The central process is necessary to control the different queues, which are
theoretically independent: the queue of parallel workers, the map-matching queue, and the database
writing queue.
[Diagram: the frontend sends the import request to the backend. In the backend, parallel workers
read the raw CSV (spatial filtering, group by trajectory, spatial clustering) and pass trajectories
to the central process, which forwards them to the map-matching engine (candidate points,
probability calculation, decision on path) and writes the results to the database. Arrows are
numbered 0 to 5.]
Figure 1 – Components and dataflow of the import process
Before the import process starts, i.e. at the start-up of the backend central process, an
independent process for the map-matching engine is started. The engine loads the road network from
the database into memory (step 0). When the server receives a request from the user interface to
import FCD from a specified raw CSV file within a specified area (black arrow in the figure),
parallel workers are launched to independently process different chunks of the CSV file (step 1).
The parallelization module attempts to maximize the use of the available computing capacity of the
machine, i.e. to keep as many CPU cores as possible occupied during the heavy processing. For this
purpose, the software dynamically calculates the chunk sizes of the raw file so as to adequately
occupy the remaining computing capacity while other workers are running or the map-matching engine
is under high demand.
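The chunk-based parallelization can be sketched as follows. This is a simplified illustration and not the tool's implementation: it uses fixed chunk sizes rather than the dynamic sizing described above, a thread pool stands in for the parallel workers, and `process_chunk` is a hypothetical placeholder for the per-chunk work.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, n_workers, chunks_per_worker=4):
    """Cut rows into roughly equal chunks, several per worker, so that a
    worker that finishes early can pick up another chunk."""
    n_chunks = max(1, n_workers * chunks_per_worker)
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def process_chunk(chunk):
    """Hypothetical placeholder for filtering, grouping and clustering."""
    return len(chunk)


def import_rows(rows, n_workers=None):
    """Process all chunks in a worker pool and collect the results in order."""
    n_workers = n_workers or os.cpu_count() or 1
    chunks = split_into_chunks(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(process_chunk, chunks))


rows = list(range(1000))  # stand-in for CSV rows
counts = import_rows(rows, n_workers=4)
```

Creating more chunks than workers is a simple static way to keep all cores busy when chunks take unequal time. For CPU-bound work in pure Python, a process pool would be needed to obtain real parallelism.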
A parallel worker performs sequential steps on its assigned chunk of CSV rows, each of which
represents a timestamped GPS position of one trajectory. First, the mixed points of all trajectories
are spatially filtered to keep those inside the project spatial scope. The remaining points are then
grouped by trajectory. Although this processing order can eliminate points of a trajectory that lie
outside the spatial scope, and therefore cause map-matching errors or inaccuracies, this side effect
is preferred over spending resources on grouping points of trajectories that are not relevant for
the project. The side effect can be avoided by selecting a sufficiently large area and filtering out
the altered trajectories during the analytics of the map-matched data. For each group of points
belonging to the same trajectory, spatial clustering is then necessary to avoid map-matching errors.
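The filter-then-group order can be sketched as below. The ray-casting point-in-polygon test is a common technique chosen here for illustration; the paper does not state which spatial filter the tool uses, and the tuple layout of the points is an assumption.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) to the right cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_then_group(points, polygon):
    """points: iterable of (trip_id, timestamp, lon, lat).

    Points outside the polygon are dropped first; only the remaining
    points are grouped by trip ID and sorted chronologically.
    """
    groups = defaultdict(list)
    for trip_id, ts, lon, lat in points:
        if point_in_polygon(lon, lat, polygon):
            groups[trip_id].append((ts, lon, lat))
    for trajectory in groups.values():
        trajectory.sort()
    return dict(groups)


# Hypothetical square scope and mixed points of three trips.
scope = [(0, 0), (10, 0), (10, 10), (0, 10)]
points = [
    ("a", 2, 5, 5), ("a", 1, 1, 1), ("a", 3, 12, 5),
    ("b", 1, 20, 20), ("c", 1, 9, 9),
]
grouped = filter_then_group(points, scope)
```

Trip "a" loses its third point, which lies outside the scope. This is exactly the side effect discussed above: the trajectory is truncated at the boundary.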
In fact, especially when a vehicle is standing in a fixed position (e.g. in a congestion queue) and
emitting GPS positions at every time interval, the positioning inaccuracy, usually due to the urban
canyon phenomenon, often produces positions that lie behind previous positions in time, which is a
very improbable movement. This is corrected by identifying points of the same trajectory that are
within a certain tolerance radius, using an algorithm known as Density-Based Spatial Clustering of
Applications with Noise (DBSCAN), first proposed by Ester et al. (1996). The identified points keep
their respective timestamps, but their spatial centroid is assigned as their common position.
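The clustering step can be sketched with a plain-Python version of DBSCAN followed by the centroid assignment. The sketch assumes planar coordinates in metres (i.e. already projected positions); the tolerance radius `eps` and the minimum cluster size `min_pts` are illustrative values, not the tool's settings, and time is ignored when clustering.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on planar (x, y) points. Returns one label per point:
    a cluster index (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point: expand the cluster
    return labels


def snap_to_centroids(track, eps=15.0, min_pts=3):
    """track: list of (timestamp, x, y) in metres. Points in the same cluster
    keep their timestamps but share the cluster centroid as position."""
    xy = [(x, y) for _, x, y in track]
    labels = dbscan(xy, eps, min_pts)
    sums = {}
    for label, (x, y) in zip(labels, xy):
        if label >= 0:
            sx, sy, n = sums.get(label, (0.0, 0.0, 0))
            sums[label] = (sx + x, sy + y, n + 1)
    result = []
    for (ts, x, y), label in zip(track, labels):
        if label >= 0:
            sx, sy, n = sums[label]
            x, y = sx / n, sy / n
        result.append((ts, x, y))
    return result


# Hypothetical track: the vehicle stops near x = 100 m and the GPS jitters.
track = [(0, 0, 0), (5, 50, 0), (10, 100, 0), (15, 98, 1),
         (20, 101, -1), (25, 99, 0), (30, 150, 0)]
snapped = snap_to_centroids(track)
```

The four jittering positions around x = 100 m, including the apparent backward jump to x = 98 m, are replaced by their common centroid, while the moving points are left untouched.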
When a parallel worker has completed its task, it sends a packet of trajectories to the central
process (step 2), which sends them individually to the map-matching engine (step 3). All
trajectories from all chunks must pass through the map-matching queue. Since the number of
trajectories varies from chunk to chunk, so does the processing time, which explains why the
trajectories are not reconstructed in sequential order. The adopted map-matching algorithm is based
on the Hidden Markov Model described in Newson and Krumm (2009), implemented and open-sourced by
BMW Car IT as described by Mattheis (2015). The map-matched trajectories are received by the central
process (step 4) and written to the database (step 5).
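The principle of Hidden Markov Model map-matching can be illustrated with a toy Viterbi search. The sketch does not use the open-source library mentioned above. It follows the general form of the model, with a Gaussian emission term on the distance between a GPS point and a candidate position and an exponential transition term on the difference between straight-line and route distance. The values of `SIGMA` and `BETA`, the two-road network, and the helper functions are assumptions, and normalization constants are omitted.

```python
import math

SIGMA = 5.0   # GPS noise standard deviation in metres (assumed value)
BETA = 10.0   # transition scale in metres (assumed value)


def emission_logp(obs, cand_xy):
    """Log-probability (up to a constant) of observing obs from cand_xy."""
    d = math.dist(obs, cand_xy)
    return -0.5 * (d / SIGMA) ** 2


def transition_logp(obs_a, obs_b, route_dist):
    """Penalize routes much longer than the straight line between fixes."""
    straight = math.dist(obs_a, obs_b)
    return -abs(straight - route_dist) / BETA


def viterbi(observations, candidates_for, route_distance):
    """Most probable candidate sequence.

    candidates_for(obs) -> list of (road_id, (x, y));
    route_distance(cand_a, cand_b) -> driving distance in metres.
    """
    layers = [candidates_for(o) for o in observations]
    score = [emission_logp(observations[0], c[1]) for c in layers[0]]
    back = []
    for t in range(1, len(observations)):
        new_score, pointers = [], []
        for cand in layers[t]:
            options = [
                score[k] + transition_logp(
                    observations[t - 1], observations[t],
                    route_distance(prev, cand))
                for k, prev in enumerate(layers[t - 1])
            ]
            k_best = max(range(len(options)), key=options.__getitem__)
            pointers.append(k_best)
            new_score.append(options[k_best]
                             + emission_logp(observations[t], cand[1]))
        score = new_score
        back.append(pointers)
    # Backtrack from the best final candidate.
    k = max(range(len(score)), key=score.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [layers[t][k][0] for t, k in enumerate(path)]


# Toy network: two parallel roads at y = 0 ("south") and y = 10 ("north").
def candidates_for(obs):
    x, _ = obs
    return [("south", (x, 0.0)), ("north", (x, 10.0))]


def route_distance(a, b):
    along = abs(b[1][0] - a[1][0])
    # Changing road requires a (hypothetical) 500 m detour.
    return along if a[0] == b[0] else along + 500.0


observations = [(0.0, 1.0), (20.0, -1.0), (40.0, 6.0), (60.0, 0.5)]
matched = viterbi(observations, candidates_for, route_distance)
```

The third observation is closer to the northern road, so a nearest-road matcher would switch roads for one point. The transition term makes that detour very improbable, and the whole sequence is matched to the southern road.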
Analytics process
The software allows the user to edit and execute data analytics scripts and to visualize their
results. The user can query the imported FCD from the database using Structured Query Language
(SQL). The results are returned in a tabular format together with a query feedback message (either
query success with the number of rows found, or the error name and its location in the code).
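The query-and-feedback behaviour can be sketched as follows. An in-memory SQLite database stands in for the project database, and the table and column names are hypothetical; the actual schema of the tool is not given in this paper.

```python
import sqlite3


def run_query(connection, sql):
    """Run a query and return (rows, feedback message)."""
    try:
        rows = connection.execute(sql).fetchall()
    except sqlite3.Error as error:
        return [], f"Query failed: {error}"
    return rows, f"Query succeeded: {len(rows)} row(s) found"


# In-memory stand-in for the project database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE fcd (trip_id TEXT, ts INTEGER, edge_id INTEGER, speed_kmh REAL)"
)
db.executemany(
    "INSERT INTO fcd VALUES (?, ?, ?, ?)",
    [("a1", 0, 42, 30.0), ("a1", 5, 42, 34.0),
     ("b7", 2, 42, 50.0), ("b7", 7, 43, 55.0)],
)

# Mean speed per trip on one network edge.
rows, message = run_query(
    db,
    "SELECT trip_id, AVG(speed_kmh) FROM fcd "
    "WHERE edge_id = 42 GROUP BY trip_id ORDER BY trip_id",
)

# A misspelled keyword yields an error message instead of rows.
bad_rows, bad_message = run_query(db, "SELEC * FROM fcd")
```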
SQL data retrieval can be sufficient for some use cases. For advanced analytics, the user can
further process the data using an R environment hosted on the server, which comes with a rich stack
of data analytics and statistics libraries that the user can extend.
[Diagram, two panels. Left: an SQL query (SELECT * FROM FCD) is sent from the frontend to the SQL
client in the backend, which queries the database (step 1) and returns the result table. Right: the
same query (step 1) is followed by an R script in the backend that reads the SQL result (step 2)
and returns its output to the frontend (step 3).]
Figure 2 – Analytics process dataflow with and without additional R script
Figure 2 shows the dataflow of both cases. When extracting the data with SQL, the user can
interactively customize the code to the current project by specifying road edges or GPS coordinates
simply by clicking on the map.
The query results can be exported in CSV format to be further processed in a spreadsheet or used as
input to other software. Additionally, query or analytics results can be displayed on the map when
they are of a geometry datatype. The user interface can display the default R plots in the output
area and offers advanced plotting capabilities through the plotly plotting library, which is
supported in R and JavaScript.
Scripting library
In order to automate as much of the workflow as possible during a data mining or exploration
experiment, the software features a system of scripting libraries (both SQL and R) that is
incrementally enriched and maintained by the users in a collaborative way.
Two types of scripting libraries exist.
• Public scripting library: typically contains generic functions that are not specific to a
particular project but are used recurrently for frequent needs. This includes for example functions
to visualize trajectories and inspect their general information, to extract and visualize trajectory
statistics (histograms), to filter and extract trajectories that use a certain network edge or cross
a certain line on the road, to count the number of trajectories between an origin and a
destination, to calculate and plot the speed on a defined road section, to get the shortest path
between a source and a target, to format the project network graph in a
Cell-Transmission-Model scheme, and many others. The library is accessible to any user in any
project.
• Project scripting library: this library allows users to save scripts customized to the project
(e.g. queries extracting trajectories using a specific road element and information formatted in a
specific way) as well as the advanced analytics developed for a specific need (e.g. a machine
learning algorithm).
User Interface
Figure 3 shows the current beta version of the user interface. The view is divided into four
resizable areas. The upper left is reserved for the map component, the upper right for the tabular
results, the lower left for script editing (SQL and R), and the lower right for the results: R
terminal output, R plots, and Plotly graphs for more advanced plotting.
The map component is based on OpenLayers, the open-source cartography library adopted by the
Open Source Geospatial Foundation. It features multiple layers, each corresponding to a
functionality. The base layer is OpenStreetMap, the community crowdsourced map. It gives the user a
view of the location and scope of the project.
Figure 3 – Screenshot from the graphical user interface
A drawing layer allows the definition of the polygon that forms the project spatial scope. Vector
layers display scripting results of geometric types on the map, where each can be navigated
separately. A satellite image layer imported from the Google Maps API shows the actual geometric
elements of streets for particular local applications, such as when the precise coordinates of a
stop line are needed. Network edges can also be displayed in a separate layer and individually
highlighted and selected for use in the analytics interface, for example to filter trajectories
crossing a certain edge.
The screenshot in Figure 3 showcases the application of traffic signal program reconstruction at
the intersection of Cellerstr. and Neustadtring in Braunschweig, Germany. Trajectories are first
visually examined on the map. The query extracting the trajectories that cross the studied traffic
light and its stop line takes interactive input from the map: the user clicks on the corresponding
road edge and on the stop line to get its coordinates. An R script is then applied to reconstruct
the trajectories and plot them within one cycle time. The plot shows plateaus in the trajectories
before the stop line, reflecting stopping behaviour in the first and last parts of the cycle, which
potentially correspond to a red time. Trajectories crossing directly with a slight slow-down ahead
of the stop line potentially correspond to a crossing during green time (middle part of the cycle).
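The idea of plotting trajectories within one cycle time can be sketched as follows: detect the plateau where a vehicle stands before the stop line, then fold its timestamps modulo the cycle length. This is an illustrative sketch in Python rather than the R script used in the tool. The 90 s cycle length is an assumed, known value, the speed threshold is arbitrary, and timestamps are assumed to be strictly increasing.

```python
def stopped_intervals(trajectory, max_speed=0.5):
    """trajectory: list of (time_s, distance_to_stop_line_m), time increasing.
    Returns (start, end) time pairs during which the vehicle barely moves."""
    intervals = []
    start = None
    for (t0, d0), (t1, d1) in zip(trajectory, trajectory[1:]):
        speed = abs(d1 - d0) / (t1 - t0)
        if speed <= max_speed:
            if start is None:
                start = t0
        elif start is not None:
            intervals.append((start, t0))
            start = None
    if start is not None:
        intervals.append((start, trajectory[-1][0]))
    return intervals


def fold(time_s, cycle_s, offset_s=0.0):
    """Position of a timestamp within the signal cycle."""
    return (time_s - offset_s) % cycle_s


CYCLE_S = 90.0  # assumed cycle length in seconds

# Hypothetical trajectory: the vehicle waits 5 m before the stop line.
trajectory = [(185, 60), (190, 20), (195, 5), (200, 5),
              (205, 5), (210, 0), (215, -40)]

stops = stopped_intervals(trajectory)
folded_stops = [(fold(s, CYCLE_S), fold(e, CYCLE_S)) for s, e in stops]
folded_track = [(fold(t, CYCLE_S), d) for t, d in trajectory]
```

Overlaying the folded stop intervals of many trajectories indicates which part of the cycle is potentially red. Stops are detected before folding because folding can wrap timestamps around the cycle boundary.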
Comparison with alternative products

A recent publication by Ruan et al. (2018), sponsored by Microsoft Research, in the IEEE 34th
International Conference on Data Engineering unveiled a similar product named CloudTP. The
software targets a cloud-based big data architecture (Spark) in rather corporate IT environments.
Its querying capabilities, however, are very basic and almost hard-coded, and the software primarily
addresses production-level pre-processing of trajectories. FANSI-Tool, on the other hand, is
essentially a data-mining and exploration tool intended mainly for research and development, which
at the same time integrates the tedious and CPU-intensive steps of map-matching and pre-processing.
Future developments
The next phase of development will focus on transforming the backend into a RESTful service
platform in which the current web client becomes a playground, existing in parallel with other
application-specific clients that allow more automation of repetitive tasks. Examples of such
clients would be a mobility pattern analysis application, performance measurement of infrastructure
elements, and origin-destination counting, among others.
Acknowledgement
The research leading to the development of this software has received funding from the German
Research Foundation (Deutsche Forschungsgemeinschaft) under the program “Methoden für die
kontinuierliche raum-zeitliche Analyse der Verkehrsqualität in Straßennetzen” (n. FR 1670/7-1).
References
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A Density-Based Algorithm for Discovering
Clusters in Large Spatial Databases with Noise. In KDD-96 Proceedings (pp. 226–231).
FGSV. (2008). Richtlinien für integrierte Netzgestaltung.
Mattheis, S. (2015). Barefoot release – An Open Source Java library for map matching with
OpenStreetMap. Retrieved December 9, 2017, from
http://www.bmw-carit.com/blog/barefoot-release-an-open-source-java-library-for-map-matching-with-openstreetmap/
Newson, P., & Krumm, J. (2009). Hidden Markov Map Matching Through Noise and Sparseness.
Ruan, S., Li, R., Bao, J., He, T., & Zheng, Y. (2018). CloudTP: A Cloud-based Flexible Trajectory
Preprocessing Framework, 3–6.
