Development of a real-time connected corridor data-driven digital twin and data imputation methods

Smart cities -- equipped with connected infrastructure -- receive large volumes of real-time traffic data. The simulation platform developed in this research leverages these high-frequency connected data streams to derive meaningful insights on the current traffic state by providing near real-time corridor performance measures. A data-driven traffic simulation model, i.e., a digital twin, capable of providing environmental and traffic performance measures in near real-time is developed for a connected corridor. The data streams driving the developed simulation model are traffic volumes and signal indications. The research demonstrates the feasibility of the overall connected corridor simulation approach. In addition, investigation of the real-time data streams from the connected corridor revealed the presence of data gaps. Such data gaps can affect the simulation-generated performance measures. This research investigates the sensitivity of the simulated performance measures to data loss and to the data imputation methods developed to fill the detector stream gaps. The impact of data stream gaps on the simulated performance measures depends, in part, on the combination of intersection approaches experiencing data loss. This combination effect can be attributed to both the vehicle volumes observed at these approaches and the ability of the approaches to process additional vehicles. The corridor location of the intersection approaches that have missing data, as well as the travel path of interest, also influences performance measure accuracy. The research demonstrates that to successfully leverage real-time, high-frequency connected corridor data streams for (near) real-time applications, it is crucial to develop data imputation methodologies that can both learn from historically available data and adapt to recent data trends.
In this research, a Long Short-Term Memory (LSTM) recurrent neural network approach, modeling univariate and multivariate time series data, is developed for data imputation. Experiments are conducted to compare the performance of the univariate and multivariate models and to investigate the impact of these imputation approaches on the simulation performance measures. The findings show the potential advantages of a multivariate model over a univariate model for imputation under atypical traffic conditions. Results also suggest that the univariate model performs better when imputing missing data under typical traffic conditions. Future work includes additional development of the model using more training and validation data, along with hyperparameter tuning, to increase the robustness of model performance.
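The distinction between the univariate and multivariate formulations can be illustrated by how the training samples for a sequence model might be assembled. The following is a minimal sketch only, not the method used in this research: the function name `make_windows`, the lookback length of three intervals, the two-detector layout, and the volume values are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[Sequence[float]], target: int, lookback: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Slice a time series into (history window, next value) pairs.

    series   -- one row per time step, one column per detector (assumed layout)
    target   -- column index of the detector whose value is to be predicted
    lookback -- number of past time steps in each input window (assumed value)
    """
    inputs, targets = [], []
    for t in range(lookback, len(series)):
        # Input: the `lookback` rows immediately before time step t.
        inputs.append([list(row) for row in series[t - lookback:t]])
        # Output: the target detector's value at time step t.
        targets.append(series[t][target])
    return inputs, targets


# Hypothetical volume counts for two detectors over five intervals.
volumes = [[10.0, 4.0], [12.0, 5.0], [11.0, 6.0], [15.0, 7.0], [14.0, 8.0]]

# Univariate: each window holds only the target detector's own history.
uni_x, uni_y = make_windows([[row[0]] for row in volumes], target=0, lookback=3)

# Multivariate: each window also holds the other detector's history.
multi_x, multi_y = make_windows(volumes, target=0, lookback=3)
```

Both formulations predict the same target values; they differ only in the number of features per time step that the model sees. In this sketch, a gap in the target detector's stream would be filled with the model's prediction for the missing interval.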
