Streaming Data Preprocessing via Online Tensor Recovery for Large Environmental Sensor Networks

Published: 30 July 2022

Abstract

Low-cost urban environmental sensor networks now make it possible to measure the built and natural environment at a fine-grained scale. However, fine-grained city-scale data analysis is complicated by tedious data cleaning, including removing outliers and imputing missing data. While many methods exist to automatically correct anomalies and impute missing entries, challenges remain for data with large spatiotemporal scales and shifting patterns. To address these challenges, we propose an online robust tensor recovery (OLRTR) method to preprocess streaming high-dimensional urban environmental datasets. A small dictionary that captures the underlying patterns of the data is computed and continually updated as new data arrive. OLRTR enables online recovery for large-scale sensor networks that provide continuous data streams, with lower memory usage than offline batch counterparts. In addition, we formulate the objective function so that OLRTR can detect structured outliers, such as faulty readings that persist over a long period of time. We validate OLRTR on a synthetically degraded National Oceanic and Atmospheric Administration temperature dataset and apply it to the Array of Things city-scale sensor network in Chicago, IL, showing superior results compared with several established online and batch-based low-rank decomposition methods.
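The idea of a small dictionary that is updated as data stream in can be illustrated with a simplified sketch. The code below is not OLRTR: it is a matrix-valued online robust factorization in the spirit of online robust PCA via stochastic optimization, it handles only entry-wise sparse outliers, and it ignores missing entries and the structured outliers the abstract describes. All function names, parameters (`lam1`, `lam2`, `rank`), and default values are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Entry-wise shrinkage: the proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def recover_sample(z, L, lam1, lam2, n_iter=50):
    """Split one sample z into a low-rank part L @ r and sparse outliers e.

    Alternately minimizes
        0.5 * ||z - L r - e||^2 + 0.5 * lam1 * ||r||^2 + lam2 * ||e||_1
    over the coefficients r (ridge solve) and the outliers e (shrinkage).
    """
    rank = L.shape[1]
    e = np.zeros_like(z)
    # Precompute the ridge operator (L^T L + lam1 I)^{-1} L^T.
    G = np.linalg.solve(L.T @ L + lam1 * np.eye(rank), L.T)
    for _ in range(n_iter):
        r = G @ (z - e)
        e = soft_threshold(z - L @ r, lam2)
    return r, e


def update_dictionary(L, A, B, lam1):
    """One block-coordinate-descent pass over the dictionary columns."""
    A_reg = A + lam1 * np.eye(A.shape[0])
    for j in range(L.shape[1]):
        L[:, j] += (B[:, j] - L @ A_reg[:, j]) / A_reg[j, j]
    return L


def online_robust_recovery(stream, rank, lam1=1.0, lam2=0.5, seed=0):
    """Process samples one at a time, keeping only L, A, and B in memory."""
    rng = np.random.default_rng(seed)
    L = A = B = None
    cleaned, outliers = [], []
    for z in stream:
        z = np.asarray(z, dtype=float)
        if L is None:
            # Hypothetical initialization: random dictionary, empty statistics.
            L = rng.standard_normal((z.size, rank))
            A = np.zeros((rank, rank))
            B = np.zeros((z.size, rank))
        r, e = recover_sample(z, L, lam1, lam2)
        cleaned.append(L @ r)       # low-rank estimate of this sample
        outliers.append(e)          # entries flagged as outliers
        A += np.outer(r, r)         # accumulated coefficient statistics
        B += np.outer(z - e, r)     # accumulated outlier-free data statistics
        L = update_dictionary(L, A, B, lam1)
    return np.array(cleaned), np.array(outliers), L
```

In this sketch the state carried between samples is the dictionary `L` and the two accumulators `A` and `B`, whose sizes depend on the sensor dimension and the chosen rank but not on the length of the stream. That is the sense in which an online method of this kind can use less memory than a batch method that must hold the full dataset.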

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 6
December 2022
631 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3543989
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 July 2022
Online AM: 04 May 2022
Accepted: 01 April 2022
Revised: 01 March 2022
Received: 01 September 2021
Published in TKDD Volume 16, Issue 6

Author Tags

  1. Robust tensor recovery
  2. tensor factorization
  3. multilinear analysis
  4. outlier detection
  5. internet of things
  6. urban computing

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Science Foundation
  • USDOT Eisenhower Fellowship program
