discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/301294283
Sparse Mobile Crowdsensing: Challenges and
Opportunities
Leye Wang, Daqing Zhang, Yasha Wang, Chao Chen, Xiao Han, Abdallah Mhamed
ABSTRACT
Sensing cost and data quality are two primary concerns in mobile crowdsensing. In this article, we
propose a new crowdsensing paradigm, sparse mobile crowdsensing, which leverages the spatial and
temporal correlation among the data sensed in different sub-areas to significantly reduce the required number
of sensing tasks allocated, thus lowering overall sensing cost (e.g., smartphone energy consumption and
incentives), yet ensuring the data quality. Sparse mobile crowdsensing applications intelligently select only
a small portion of the target area for sensing while inferring the data of the remaining unsensed area with
high accuracy. We discuss the fundamental research challenges in sparse mobile crowdsensing, and design
a general framework with potential solutions to the challenges. To verify the effectiveness of the proposed
framework, a sparse mobile crowdsensing prototype for temperature and traffic monitoring is implemented
and evaluated. With several future research directions identified in sparse mobile crowdsensing, we expect
that this novel crowdsensing paradigm will stimulate further research interest.
I. INTRODUCTION
With the prevalence of sensor-rich smartphones in recent years, Mobile CrowdSensing (MCS)
has become a promising paradigm for urban sensing applications, such as environment monitoring,
traffic congestion detection, hot spot identification, and public information sharing [1], [2], [3], [4].
Traditional urban sensing applications rely on expensive specialized sensing infrastructure (e.g., air
quality monitoring stations). Relying solely on such infrastructure has pitfalls, notably high
deployment and maintenance cost and a lack of reusability across applications, which hinder the
growth of heterogeneous urban sensing applications to a large scale. Complementary to the traditional
sensing paradigm, MCS leverages the mobility of mobile users, the sensors embedded in mobile phones, and
the existing wireless infrastructure to sense and collect environment data, making it possible to
inexpensively sense various urban data in regions that are not covered by specialized sensing infrastructure.
To obtain high-quality sensed results in MCS applications, a straightforward idea is to recruit enough
participants so as to ensure that their sensed data can cover almost the whole target area. Nevertheless,
this strategy may incur high sensing cost, including overall smartphone energy and network bandwidth
consumption, as well as incentives paid to the participants from the organizer.
Thus, data quality and sensing cost are intrinsically in conflict in MCS. Many existing works
have addressed this issue by minimizing the number of allocated tasks (or recruited
participants) under the quality requirement of full or high coverage of all the sub-areas in a city [1], [2].
However, the number of allocated tasks in these works is still roughly equal to the total number
of sub-areas, which may still incur quite high sensing cost. A question then arises: is it possible to
further reduce the sensing cost by sensing only a small number of the sub-areas, while still guaranteeing
a satisfactory level of data quality for the whole target area? To answer this question, we investigate a
novel MCS paradigm in which only a small part of the city sub-areas are sensed by participants, while the
data of the remaining sub-areas are inferred from the sensed data. Due to the sparsity of the sensed sub-areas,
we call this paradigm Sparse Mobile CrowdSensing (Sparse MCS).
The theoretical feasibility of Sparse MCS lies in the fact that high spatio-temporal correlations exist in
most urban data, e.g., air quality and noise. Such correlations provide the basis for high-quality missing data
inference. Specifically, recent progress in missing data inference algorithms, such as compressive
sensing, could enable Sparse MCS to achieve high inferred data quality [5], [6], [7], [8].
[Figure omitted: the organizer creates an MCS task; individual tasks are allocated to participants, who sense and upload data; the sensed data are integrated into an overall sensed result, e.g., an air quality map. The cube spans all sub-areas over time; one slice is a sensing cycle and one element is a spatio-temporal cell.]
Fig. 1: Overview of the mobile crowdsensing process. A cube is used to illustrate the complete set of
all possible individual tasks, where each individual task is specified by a spatio-temporal cell (a specific
sub-area in a specific cycle). Two dimensions (X and Y) of the cube represent the spatial space (sub-areas)
and the other dimension (Z) represents the temporal space (cycles).
Sparse MCS is expected to bring significant benefits for rapidly growing MCS applications in
smart cities. (1) From the perspective of organizers, Sparse MCS reduces the sensing cost of
each single application. As the number of applications increases, the savings can be dramatic, so
organizers will be highly motivated to adopt Sparse MCS as long as it satisfies their quality requirements.
(2) From the perspective of participants, each participant can accept only a limited number of sensing tasks
due to the constraints on her sensing capacity (e.g., energy consumption, network bandwidth). Thus, as the
number of MCS applications increases, the number of qualified participants available to each application
decreases, and even an organizer who wants to ensure full or high coverage may find it difficult to recruit
enough participants. Sparse MCS reduces the number of participants required by each application, so more
applications can run simultaneously.
To the best of our knowledge, this is the first article to explicitly introduce the Sparse MCS paradigm.
Compared to traditional MCS approaches that ensure a high coverage ratio, Sparse MCS explicitly
introduces data inference into the MCS process and guarantees the quality of the inferred data using only
sparsely sensed data. In this article, we aim to identify and address the key challenges in Sparse MCS. To this end,
we propose a general framework for Sparse MCS applications, implement a prototype to verify its feasibility,
and elaborate on future research opportunities beyond the prototype. We hope that this work will encourage
more research effort in the Sparse MCS area.
and uploads the sensed data to the MCS server. The server finally integrates all the participants’ sensed
data and obtains an overall sensed result, which is, in this case, an urban air quality monitoring map.
Sparse MCS, as a kind of MCS framework, also follows this general MCS process, while its characteristics
are presented as follows:
1) In order to reduce total sensing cost, Sparse MCS allocates individual tasks to only a small subset
of the spatio-temporal cells.
2) In order to achieve satisfactory data quality, Sparse MCS introduces missing data inference into the
server-side data integration process, which attempts to estimate the missing data of unsensed cells
based on the sensed data and spatio-temporal correlations between the cells.
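The two characteristics above can be pictured with a small data structure. The following is only an illustrative sketch: the `SparseSensingMap` name, the grid size, and the sample readings are assumptions made for the example and are not part of the framework itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]  # (sub-area index, cycle index): one spatio-temporal cell


@dataclass
class SparseSensingMap:
    """Holds readings for the sensed cells only; all other cells are unsensed."""
    n_sub_areas: int
    n_cycles: int
    readings: Dict[Cell, float] = field(default_factory=dict)

    def record(self, sub_area: int, cycle: int, value: float) -> None:
        if not (0 <= sub_area < self.n_sub_areas and 0 <= cycle < self.n_cycles):
            raise IndexError("cell outside the target area or time span")
        self.readings[(sub_area, cycle)] = value

    def get(self, sub_area: int, cycle: int) -> Optional[float]:
        # None marks an unsensed cell whose value has to be inferred on the server.
        return self.readings.get((sub_area, cycle))

    def coverage(self) -> float:
        # Fraction of spatio-temporal cells that were actually sensed.
        return len(self.readings) / (self.n_sub_areas * self.n_cycles)


sensing_map = SparseSensingMap(n_sub_areas=5, n_cycles=4)
sensing_map.record(0, 0, 21.5)
sensing_map.record(3, 0, 22.1)
sensing_map.record(1, 1, 21.8)
print(sensing_map.coverage())  # 3 of the 20 cells are sensed
```

Keeping only the sensed cells in a dictionary makes the sparsity explicit: the server-side inference step is responsible for every cell for which `get` returns `None`.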
[Figure omitted: when a new sensing cycle starts, optimal task allocation produces a sparse sensing map; missing data inference turns it into an inferred full sensing map; data quality assessment checks the quality requirement. If the requirement is not satisfied, the loop returns to task allocation; if it is satisfied, task allocation terminates in the current cycle and the final inferred full sensing map is output.]
Fig. 2: General framework for Sparse MCS applications in one sensing cycle.
quality, the allocation cannot be retracted to alter the previous decision. Thus, we need to make a proper
real-time decision when allocating each individual task, so as to use the incentive budget efficiently.
The task allocation problem can be even more challenging as more realistic factors need to be taken into
consideration. First, the task allocation mechanism is sensitive to the design of the incentive mechanism.
For example, if the sensing cost of different cells is not homogeneous, then selecting the optimal cell
combination needs to consider the total sensing cost of all the selected cells, rather than just the number of
cells. Second, the spatio-temporal coverage of all the candidate participants also affects the task allocation.
If the candidates can cover all the spatio-temporal cells, then we can freely select any cell for sensing
because we can always find participants in the selected cell to upload data; however, if this is not the case,
we need to consider user mobility in task allocation to determine whether the selected cell can be covered
by any participant or not.
After a new sensing cycle starts, the first step is optimal task allocation, where we need to decide which
spatio-temporal cell(s) should be selected for allocating the next individual task. The output of this step is
a ‘sparse sensing map’, where the selected cells are marked with sensing tasks assigned to the participants,
and the other cells are left unsensed. Based on the ‘sparse sensing map’, the second step, missing data
inference, attempts to infer the data of unsensed cells to construct an ‘inferred full sensing map’. In the
third step, we conduct the data quality assessment on the ‘inferred full sensing map’ to see whether the current
data quality meets the predefined quality requirement. If the quality requirement is not satisfied, we go
back to the first step, optimal task allocation, to select more cells for sensing; otherwise, the task allocation
iteration terminates and the framework outputs the ‘final inferred full sensing map’ for the current cycle. This
iterative process restarts when the next sensing cycle comes.
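The three-step iteration within one sensing cycle can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the function name `run_sensing_cycle`, its callback parameters, and the toy stand-ins below are hypothetical and only illustrate the control flow, not the prototype's actual components.

```python
from typing import Callable, Dict, List, Set

Cell = int  # index of a sub-area within the current sensing cycle


def run_sensing_cycle(
    all_cells: List[Cell],
    select_cell: Callable[[Dict[Cell, float], Set[Cell]], Cell],
    sense: Callable[[Cell], float],
    infer: Callable[[Dict[Cell, float], List[Cell]], Dict[Cell, float]],
    quality_ok: Callable[[Dict[Cell, float], Dict[Cell, float]], bool],
) -> Dict[Cell, float]:
    """Run one sensing cycle and return the final inferred full sensing map."""
    sensed: Dict[Cell, float] = {}  # the sparse sensing map
    while True:
        unsensed = set(all_cells) - set(sensed)
        if not unsensed:
            return dict(sensed)  # every cell was sensed; nothing is left to infer
        cell = select_cell(sensed, unsensed)   # step 1: task allocation
        sensed[cell] = sense(cell)
        full_map = infer(sensed, all_cells)    # step 2: missing data inference
        if quality_ok(sensed, full_map):       # step 3: data quality assessment
            return full_map


# Toy stand-ins, for illustration only.
truth = {0: 20.0, 1: 20.5, 2: 21.0, 3: 21.5, 4: 22.0}


def pick_lowest_index(sensed, unsensed):
    return min(unsensed)


def mean_fill(sensed, all_cells):
    average = sum(sensed.values()) / len(sensed)
    return {c: sensed.get(c, average) for c in all_cells}


def at_least_two_sensed(sensed, full_map):
    return len(sensed) >= 2


final_map = run_sensing_cycle(
    list(truth), pick_lowest_index, truth.__getitem__, mean_fill, at_least_two_sensed
)
print(final_map)
```

Passing the three steps in as callbacks keeps the loop independent of any particular allocation, inference, or assessment method.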
Next, we describe some potential solutions to address the issues in each step of the framework.
problem under this varying cost scenario (suppose the inference algorithm is compressive sensing), and
propose a randomized cell sampling scheme, in which the probability of selecting each cell is optimized
to ensure that the expected quality of the inferred data can be maximized under a fixed amount of sensing
cost. Another practical issue is that due to the limitation of human mobility, an MCS organizer may not
be able to find any qualified participant in the selected cell to finish an individual task, which can incur
a deadlock in the Sparse MCS framework (Figure 2) as no sensed data can be obtained from the selected
cell. Then, modeling the participants’ mobility may be a must for an effective task allocation mechanism to
avoid selecting such ‘inaccessible’ cells; or a cell re-selection procedure can be triggered if no participant
is found for the previously selected cell, so as to avoid the deadlock in the running process of Sparse MCS.
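The cell re-selection fallback can be sketched as follows. The sketch assumes that the allocator can rank candidate cells by preference and can query whether any participant is currently in a cell; both `ranked_cells` and `has_participant` are hypothetical names introduced for this example.

```python
from typing import Callable, Iterable, Optional


def allocate_with_reselection(
    ranked_cells: Iterable[int],
    has_participant: Callable[[int], bool],
) -> Optional[int]:
    """Return the most preferred cell that some participant can actually sense."""
    for cell in ranked_cells:
        if has_participant(cell):
            return cell
        # No participant in this cell: re-select the next candidate instead of waiting.
    return None  # no accessible cell at all; the caller has to end the allocation


# Toy example: cell 4 is preferred, but only cells 2 and 7 contain participants.
reachable = {2, 7}
chosen = allocate_with_reselection([4, 2, 7], lambda cell: cell in reachable)
print(chosen)
```

Returning `None` when no cell is accessible lets the caller stop the iteration explicitly rather than blocking on data that will never arrive.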
A. Prototype Implementation
Missing data inference: following [6], [7], we formulate the data inference in both the temperature and
traffic monitoring applications as a matrix completion problem (one dimension corresponds to sub-areas, and the
other to cycles). To solve the problem, the temperature monitoring application uses spatio-temporal
compressive sensing (STCS) [7], while the traffic monitoring application uses traffic-specialized compressive
sensing (TRA-CS) for traffic speed estimation [6].
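To make the matrix completion formulation concrete, the sketch below fills the unsensed entries of a small sub-area by cycle matrix. It is not STCS [7] or TRA-CS [6]: it swaps in a much simpler rank-1 alternating least squares fit, and the function name `complete_rank1` and the toy matrix are assumptions made for illustration.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows: sub-areas, columns: cycles, None: unsensed


def complete_rank1(observed: Matrix, iterations: int = 50) -> List[List[float]]:
    """Fill None entries with a rank-1 fit u[i] * v[j] found by alternating least squares."""
    n_rows, n_cols = len(observed), len(observed[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        # Fix v and solve the one-dimensional least-squares problem for each row factor.
        for i in range(n_rows):
            known = [j for j in range(n_cols) if observed[i][j] is not None]
            denom = sum(v[j] ** 2 for j in known)
            if denom > 0:
                u[i] = sum(observed[i][j] * v[j] for j in known) / denom
        # Fix u and solve for each column factor in the same way.
        for j in range(n_cols):
            known = [i for i in range(n_rows) if observed[i][j] is not None]
            denom = sum(u[i] ** 2 for i in known)
            if denom > 0:
                v[j] = sum(observed[i][j] * u[i] for i in known) / denom
    # Keep sensed values as they are; use the low-rank fit only for unsensed cells.
    return [
        [observed[i][j] if observed[i][j] is not None else u[i] * v[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy matrix that is exactly rank 1: row factors (1, 2, 3), column factors (10, 11, 12).
observed = [
    [10.0, 11.0, None],
    [20.0, None, 24.0],
    [None, 33.0, 36.0],
]
filled = complete_rank1(observed)
print([[round(x, 3) for x in row] for row in filled])
```

Real sensing data is only approximately low-rank, so practical algorithms use higher ranks and add spatial and temporal regularization; the toy case only shows why correlated rows and columns let missing cells be recovered.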
Optimal Task Allocation: based on query by committee, we use several inference algorithms, including
K nearest neighbors (KNN), compressive sensing (CS), spatio-temporal compressive sensing (STCS) [7],
and traffic-specialized compressive sensing (TRA-CS) [6], to infer the values of all the unsensed cells, and
then allocate the task to the cell that has the largest variance on these inferred values [13].
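The query-by-committee selection rule can be sketched as below. The committee here consists of two toy estimators (nearest neighbor and inverse-distance weighting over a one-dimensional row of cells), which are stand-ins assumed for the example rather than the KNN, CS, STCS, and TRA-CS algorithms used in the prototype; all function names are hypothetical.

```python
from statistics import pvariance
from typing import Callable, Dict, List

Inferrer = Callable[[Dict[int, float], List[int]], Dict[int, float]]


def nearest_neighbor(sensed: Dict[int, float], targets: List[int]) -> Dict[int, float]:
    # Copy the value of the closest sensed cell.
    return {t: sensed[min(sensed, key=lambda s: abs(s - t))] for t in targets}


def inverse_distance(sensed: Dict[int, float], targets: List[int]) -> Dict[int, float]:
    # Weighted average of all sensed cells, with weights 1 / distance.
    result = {}
    for t in targets:
        weights = {s: 1.0 / abs(s - t) for s in sensed}  # targets are unsensed, so distance >= 1
        result[t] = sum(weights[s] * sensed[s] for s in sensed) / sum(weights.values())
    return result


def select_next_cell(
    sensed: Dict[int, float], unsensed: List[int], committee: List[Inferrer]
) -> int:
    """Pick the unsensed cell on which the committee members disagree the most."""
    predictions = [member(sensed, unsensed) for member in committee]
    return max(unsensed, key=lambda cell: pvariance([p[cell] for p in predictions]))


sensed = {0: 20.0, 3: 21.0, 8: 27.0}
unsensed = [1, 2, 4, 5, 6, 7]
next_cell = select_next_cell(sensed, unsensed, [nearest_neighbor, inverse_distance])
print(next_cell)
```

The intuition is that a cell where the algorithms disagree strongly is one where the current data constrains the estimate least, so sensing it is expected to be most informative.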
Data Quality Assessment: using leave-one-out cross-validation on the sensed cells, we first obtain several
samples of the inference error (each sensed cell yields one sample); based on these samples, we use
Bayesian inference to estimate the probability density function of the inference accuracy [13].
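A minimal sketch of this assessment step is given below. It makes assumptions that the text does not specify: the leave-one-out errors are treated as samples from a normal distribution with a noninformative prior on its mean and variance, and the posterior probability that the mean error is at most a bound `eps` is estimated by Monte Carlo sampling. The function names and the toy inference rule are hypothetical; see [13] for the actual procedure.

```python
import math
import random
from statistics import mean, variance
from typing import Callable, Dict, List


def loo_errors(
    sensed: Dict[int, float], infer_one: Callable[[Dict[int, float], int], float]
) -> List[float]:
    """Hide each sensed cell in turn, infer it from the rest, and record the absolute error."""
    errors = []
    for cell, actual in sensed.items():
        rest = {c: v for c, v in sensed.items() if c != cell}
        errors.append(abs(infer_one(rest, cell) - actual))
    return errors


def prob_error_below(errors: List[float], eps: float, draws: int = 20000, seed: int = 0) -> float:
    """Posterior probability that the mean error is <= eps (normal model, assumed here)."""
    rng = random.Random(seed)
    n = len(errors)  # needs at least two samples
    m, s2 = mean(errors), variance(errors)
    hits = 0
    for _ in range(draws):
        # sigma^2 | data follows a scaled inverse chi-square with n - 1 degrees of freedom.
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, data follows Normal(m, sigma^2 / n).
        mu = rng.gauss(m, math.sqrt(sigma2 / n))
        if mu <= eps:
            hits += 1
    return hits / draws


# Toy example: infer a hidden cell as the mean of the remaining sensed cells.
sensed = {0: 20.0, 1: 20.3, 2: 20.1, 3: 20.4, 4: 20.2}
errors = loo_errors(sensed, lambda rest, cell: sum(rest.values()) / len(rest))
confidence = prob_error_below(errors, eps=0.25)
print([round(e, 3) for e in errors], confidence)
```

Task allocation can then stop once this probability reaches the required level p; absolute errors are not truly normal, so the normal model is a simplification.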
[Fig. 3 (plot omitted): MAE versus sensing coverage (0.1, 0.3, 0.5). Left panel: Temperature, MAE (°C), for STCS, CS, and KNN. Right panel: Traffic, MAE (m/s), for TRA-CS, CS, and KNN.]
B. Experiment Setup
We conduct experiments for the two use cases: temperature (footnote 3) and traffic (footnote 4) monitoring. Detailed
experiment settings are summarized in Table I. Note that the data quality requirement is defined by the
(ε, p)-quality metric [13]: the MAE (mean absolute error) of the inferred temperature (traffic speed)
values should be smaller than ε °C (m/s) in p · 100% of the sensing cycles. In the experiments, ε is set to
0.25 °C (temperature) or 2 m/s (traffic); p is set to 0.9 or 0.95.
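The quality metric can be checked with a few lines of code. This is a small sketch: the function name `meets_quality` and the per-cycle MAE values are made up for illustration.

```python
from typing import Sequence


def meets_quality(mae_per_cycle: Sequence[float], eps: float, p: float) -> bool:
    """True if the MAE is smaller than eps in at least a fraction p of the sensing cycles."""
    within_bound = sum(1 for mae in mae_per_cycle if mae < eps)
    return within_bound / len(mae_per_cycle) >= p


# Ten hypothetical cycles of temperature MAE values (in degrees Celsius).
maes = [0.10, 0.20, 0.24, 0.30, 0.15, 0.22, 0.18, 0.21, 0.19, 0.23]
print(meets_quality(maes, eps=0.25, p=0.9))   # 9 of 10 cycles are within the bound
print(meets_quality(maes, eps=0.25, p=0.95))  # 9 of 10 is below the required 95%
```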
C. Performance Analysis
We first evaluate the performance of the inference algorithms. Figure 3 shows that, in the temperature and
traffic scenarios respectively, STCS and TRA-CS achieve lower inference error than CS and KNN at every
sensing coverage (the proportion of sensed cells) tested. Figure 3 also indicates the data quality
at a given sensing coverage: the MAE of STCS (TRA-CS) when sensing 10%/30%/50% of the spatio-temporal
cells is around 0.25/0.14/0.09 °C (1.75/1.23/0.81 m/s) per cycle. Note that this is an expected error, not a
guarantee. Next, by combining optimal task allocation with data quality assessment, we
verify that our prototype can finish an MCS task with sparsely sensed data while guaranteeing the data
quality.
We compare our prototype with two baselines, RAND and FIX-k. RAND uses random sampling in task
allocation, and FIX-k allocates a fixed number (k) of tasks in each cycle. Like our prototype, both
use STCS/TRA-CS as the inference algorithm. Table II shows that, for a predefined error bound ε, the
actual fraction of the sensing cycles in which the error of our prototype is smaller than ε is similar to the
required p in both scenarios. This verifies that our prototype meets the predefined (ε, p)-quality. So do the two
baselines.

Footnote 3: http://lcav.epfl.ch/op/edit/sensorscope-en
Footnote 4: http://research.microsoft.com/en-us/projects/tdrive/

[Fig. 4 (plot omitted): number of allocated tasks; the legend lists Prototype and RAND.]

TABLE II: Fraction of the cycles whose errors are lower than 0.25 °C (temperature) and 2 m/s (traffic).

                              Prototype   RAND    FIX-k
Temperature (ε = 0.25 °C)
  p = 0.9                     0.915       0.900   0.911 (k = 9)
  p = 0.95                    0.943       0.943   0.949 (k = 12)
Traffic (ε = 2 m/s)
  p = 0.9                     0.895       0.908   0.908 (k = 24)
  p = 0.95                    0.987       0.974   0.960 (k = 28)
Then, we compare the number of allocated tasks in Figure 4. Compared to RAND and FIX-k, our
prototype allocates 11.1–26.5% (11.9–28.5%) fewer tasks while ensuring the same level of quality in
temperature (traffic) monitoring. Specifically, by allocating tasks to cover only 12.9%/15.4% (18.3%/20.0%)
of the spatio-temporal cells, the MAE of temperature (traffic speed) is smaller than 0.25 °C (2 m/s) in 90%/95% of
sensing cycles. This demonstrates that, by leveraging the idea of Sparse MCS, the sensing cost can be
significantly reduced while the overall data quality is ensured.
D. Discussion
The prototype verifies the feasibility and generality of the Sparse MCS framework, while many
opportunities still exist for improvement. Some are listed as follows:
Inference Algorithm. Our prototype includes two algorithms, STCS [7] and TRA-CS [6], to infer missing
temperature and traffic speed values, respectively. To handle more MCS scenarios, different algorithms need
to be designed and incorporated in the future. For instance, to infer missing PM2.5 values precisely in air
quality monitoring, features of points of interest, road networks, and weather may be incorporated into
the algorithm design [9].
User Mobility. The prototype assumes that a participant can always be found in a selected cell to contribute
data; because of user mobility patterns, this may not hold in reality, and future studies on task allocation
should consider this issue. Also, if a sub-area lacks sensed data for a long time because no participant goes there,
specific strategies, e.g., using higher incentives to steer participants to that sub-area, may be necessary to
improve the overall task quality.
Spatio-Temporal Cell Configuration. The configuration of sub-area size and cycle length is application-specific,
so the settings of our experiments need to be adjusted for other MCS tasks. For instance,
in air quality monitoring, the sub-area is relatively large, e.g., 1 km × 1 km [12]. Besides, how to
adaptively configure the spatio-temporal cells in real time to capture rapid changes in the sensed values is
another promising research direction (e.g., automatically shortening the cycle length when the temperature
fluctuates frequently).
VII. CONCLUSION
In this article, a novel MCS paradigm, Sparse MCS, is proposed to significantly reduce the sensing cost
while still guaranteeing the overall data quality for urban sensing. The rationale behind the Sparse MCS
framework is that many kinds of urban sensing data exhibit spatio-temporal correlations, so that the missing
data of unsensed spatio-temporal cells can be inferred from the sparsely sensed data. A prototype of
the Sparse MCS framework is implemented to show the feasibility of the proposed ideas and techniques. In
the future, practical issues in Sparse MCS applications, such as participant mobility, sensed data error,
malicious participants, and user privacy, need to be studied.
ACKNOWLEDGMENTS
The authors would like to thank the reviewers and the editors for their helpful comments and suggestions.
The authors also would like to thank L. Chen, T. Wang, and Z. Tang for insightful discussions and
careful proofreading. This paper was partially supported by the NSFC under Grant No. 61572048,
the MSRA collaboration project, the Fundamental Research Funds for the Central Universities (No.
106112015CDJXY180001), the Open Research Fund Program of Shenzhen Key Laboratory of Spatial Smart
Sensing and Services (Shenzhen University), and the Chongqing Basic and Frontier Research Program (No.
cstc2015jcyjA00016).
REFERENCES
[1] Daqing Zhang, Leye Wang, Haoyi Xiong, and Bin Guo. 4W1H in mobile crowd sensing. IEEE Communications Magazine, 52(8):42–48,
2014.
[2] Bin Guo, Zhu Wang, Zhiwen Yu, Yu Wang, Neil Y Yen, Runhe Huang, and Xingshe Zhou. Mobile crowd sensing and computing: The
review of an emerging human-powered sensing paradigm. ACM Computing Surveys (CSUR), 48(1):7, 2015.
[3] Bin Guo, Huihui Chen, Zhiwen Yu, Xing Xie, Shenlong Huangfu, and Daqing Zhang. FlierMeet: A mobile crowdsensing system for
cross-space public information reposting, tagging, and sharing. IEEE Transactions on Mobile Computing, 14(10):2020–2033, 2015.
[4] Zhiwen Yu, Huang Xu, Zhe Yang, and Bin Guo. Personalized travel package with multi-point-of-interest recommendation based on
crowdsourced user footprints. IEEE Transactions on Human-Machine Systems, 46(1):151–158, 2016.
[5] Giorgio Quer, Riccardo Masiero, Gianluigi Pillonetto, Michele Rossi, and Michele Zorzi. Sensing, compression, and recovery for WSNs:
Sparse signal modeling and monitoring framework. IEEE Transactions on Wireless Communications, 11(10):3447–3461, 2012.
[6] Yanmin Zhu, Zhi Li, Hongzi Zhu, Minglu Li, and Qian Zhang. A compressive sensing approach to urban traffic estimation with probe
vehicles. IEEE Transactions on Mobile Computing, 12(11):2289–2302, 2013.
[7] Linghe Kong, Mingyuan Xia, Xiao-Yang Liu, Min-You Wu, and Xue Liu. Data loss and reconstruction in sensor networks. In Proc.
INFOCOM, pages 1654–1662. IEEE, 2013.
[8] Liwen Xu, Xiaohong Hao, Nicholas D Lane, Xin Liu, and Thomas Moscibroda. More with less: lowering user burden in mobile
crowdsourcing through compressive sensing. In Proc. UbiComp, pages 659–670. ACM, 2015.
[9] Aude Hofleitner, Ryan Herring, Pieter Abbeel, and Alexandre Bayen. Learning the dynamics of arterial traffic from probe data using a
dynamic Bayesian network. IEEE Transactions on Intelligent Transportation Systems, 13(4):1679–1693, 2012.
[10] Thomas Liebig, Zhao Xu, Michael May, and Stefan Wrobel. Pedestrian quantity estimation with trajectory patterns. In Machine Learning
and Knowledge Discovery in Databases, pages 629–643. Springer, 2012.
[11] Krishna Y Kamath, James Caverlee, Zhiyuan Cheng, and Daniel Z Sui. Spatial influence vs. community influence: modeling the global
spread of social media. In Proc. CIKM, pages 962–971. ACM, 2012.
[12] Hsun-Ping Hsieh, Shou-De Lin, and Yu Zheng. Inferring air quality for station location recommendation based on urban big data. In
Proc. KDD, pages 437–446. ACM, 2015.
[13] Leye Wang, Daqing Zhang, Animesh Pathak, Chao Chen, Haoyi Xiong, Dingqi Yang, and Yasha Wang. CCS-TA: quality-guaranteed
online task allocation in compressive crowdsensing. In Proc. UbiComp, pages 683–694. ACM, 2015.
[14] Liwen Xu, Xiaohong Hao, Nicholas D Lane, Xin Liu, and Thomas Moscibroda. Cost-aware compressive sensing for networked sensing
systems. In Proc. IPSN, pages 130–141. ACM, 2015.
[15] Dong Wang, Lance Kaplan, Hieu Le, and Tarek Abdelzaher. On truth discovery in social sensing: A maximum likelihood estimation
approach. In Proc. IPSN, pages 233–244. ACM, 2012.