Research Projects

AI Climate

ongoing

Utilizing AI to enhance knowledge of the environment and climate, specifically in fields such as agriculture or forestry.

Link

Permafrost Peatlands Mapping Modeling

ongoing

This project employs machine learning models to map and predict the extent and characteristics of permafrost and peatlands across the United States. It is conducted under AI-CLIMATE in conjunction with the Jelinski Lab.

CEDAR

ongoing

Carbon Estimation with Deep LeARning (CEDAR) is a project designed to estimate aboveground biomass and carbon stocks in forests by leveraging deep learning models. CEDAR is a collaboration with Chad Babcock's Lab.

Critical MAAS

ongoing

Developing tools to rapidly and accurately assess resources of critical mineral commodities in the United States.

Sponsor: DARPA

HAYSTAC

ongoing

Establishing models of “normal” human movement across times, locations, and people in order to characterize what makes an activity detectable as anomalous.

Sponsor: IARPA

Link

LASI-DAD

ongoing

Exploring human and environment interaction to better understand the determinants of dementia and its impact on society in India.

Physics + ML

ongoing

We work on physics-guided machine learning methods to solve real-world problems in various domains, such as astrophysics and materials science. Previously, we developed a physics-guided neural network, SVPNet, for spatiotemporal predictive learning. SVPNet learns effective physics representations by estimating the error evolution in physics states for correction and by modeling spatially varying physical dynamics to predict future states.

Selected Paper(s):

Modeling Spatially Varying Physical Dynamics for Spatiotemporal Predictive Learning. Lin, Y; Chiang, Y Y; Accepted by ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL). 2023.

GitHub

Aquasense: Time Series Analysis with Sparse Sensor Data

ongoing

The project aims to develop an end-to-end pipeline for real-time water quality monitoring and assessment, including building water sensors, sensor deployment, data collection, and data analysis such as spatiotemporal prediction, to provide solutions for protecting water quality.

AI-based Program for Advancing Research, Education and Extension Activities in Precision Agriculture at PVAMU

ongoing

Link

SCH: Wearables for Health and Disease Knowledge (W4H)

ongoing

Link

Gateway Exposome Coordinating Center (GECC) For AD/ADRD Research

ongoing

Link

Building Long-term, National-scale Spatiotemporal Data Collections from Historical Map Archives

ongoing

Link

MnDOT

completed

Evaluating the performance of different kinds of traffic detection systems in actuated control intersections under a variety of environmental conditions.

Seaport

ongoing

Identifying the target seaport(s) described in a document so that a GEOINT analyst can extract relevant information from the document about the seaport(s).

GeoBERT

ongoing

Building a geolocality-aware natural language model using geographic and linguistic context.

Selected Paper(s):

SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation. Li, Z; Kim, J; Chiang, Y Y; Chen M; Accepted by Empirical Methods in Natural Language Processing (EMNLP). 2022.

GitHub Link

DeepLATTE

completed

Building a spatially-explicit approach for hierarchical forecasting on geolocated multiscale multivariate time-series data

Selected Paper(s):

Lin, Y., Chiang, Y.-Y., Franklin, M., Eckel, P. S., and Ambite, J. L. (November 2020). Building Autocorrelation-Aware Representations for Fine-Scale Spatiotemporal Prediction, In Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 352-361, Sorrento, Italy (9.8% acceptance rate)

GitHub

EarthScan

completed

Incorporating prior knowledge for robust object detection and geographic layer extraction from overhead imagery

Machines Reading Maps

completed

Machines Reading Maps (MRM) is a collaborative project between the Digital Library and Spatial Sciences Institute at the University of Southern California (US) and the Alan Turing Institute (UK). The project is funded by the United States’ National Endowment for the Humanities (NEH) and the United Kingdom’s Arts and Humanities Research Council (AHRC) under the first round of NEH/AHRC New Directions for Digital Scholarship. MRM seeks to normalize map text as a new kind of data that can be used across the humanities and the heritage sector. To do so, MRM will change the way that humanists and heritage professionals interact with digitised map images. Maps constitute a significant body of global cultural heritage, and they are being scanned at a rapid pace in the US and UK. However, most critical investigation of maps continues on a small scale, through close readings of a few maps. Individual maps communicate through visual grammars, supplemented by text. But text on maps, particularly in aggregate, is a nearly untapped source about the construction of knowledge about place (with the notable exception of the GB1900 project, which crowdsourced transcriptions of all labels on the ca. 1900 6-inch Ordnance Survey maps of Britain). While we speak colloquially about reading maps, MRM concretely addresses how to make text on maps an accessible resource. We will make maps searchable and linked to other geospatial data and collections, creating the possibility for humanities research that uses map text as a primary source. Spatial searching will no longer be limited by metadata fields like place of publication, but will instead allow queries based on the labeled, spatial content of visual materials.

Sponsor: National Endowment for the Humanities (NEH)

GitHub Demo Link

Moving Behavior Detection from Trajectory

completed

Mining moving behaviors from trajectories is an important task but often relies on tedious manual work that cannot scale to process large amounts of trajectory data. Here, the moving behavior of a trajectory refers to the activity type describing the purpose of the movement (in space and time) regardless of the spatial and temporal scale of the trajectory (hence the term “multi-scale”). We develop several new capabilities for robustly mining moving behaviors from large multi-scale trajectory data.

Selected Paper(s):

Yue, M., Chiang, Y.-Y., Shahabi, C. (2021). VAMBC: A Variational Approach for Mobility Behavior Clustering. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), pp. 453-469, online
Yue, M., Li, Y., Yang, H., Ahuja, R., Chiang, Y.-Y., and Shahabi, C. (December 2019). DETECT: Deep Trajectory Clustering for Mobility-Behavior Analysis. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), pp. 988–997, Los Angeles, CA, USA

Sponsor: National Geospatial-Intelligence Agency (NGA)

AI-Driven Analytics for Network Operations

completed

This project's overall goal is to develop a data- and AI-driven approach to analyze anonymized data logs (~1 billion records) from the network devices (e.g., WAN routers and LAN switches) that NTT Global Networks (NTT GN) collects, monitors, and manages. The project results include a machine learning approach and its software implementation to detect issues that ultimately give rise to network outages or trouble tickets requiring NTT GN network engineers to investigate and resolve. To this end, we have developed an end-to-end approach to 1) generate baseline representations in the form of real-number feature vectors for capturing network activities and 2) predict events (issues) given recent network activities (e.g., in the last 300 minutes).
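
As a rough illustration of step 1, the sketch below turns timestamped log events from a recent window into a fixed-length real-number vector. The event names, the `(timestamp, event)` record layout, and the count-based encoding are all assumptions made for this example; the project description does not specify the log schema or how the representations are actually built.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event categories; the real log schema is not described in the text.
EVENT_TYPES = ["link_down", "high_cpu", "config_change"]


def window_features(records, now, window_minutes=300):
    """Count events of each type observed in the last `window_minutes` before `now`.

    `records` is an iterable of (timestamp, event_type) pairs.
    """
    start = now - timedelta(minutes=window_minutes)
    counts = Counter(event for ts, event in records if start <= ts <= now)
    # A fixed ordering of event types yields a fixed-length feature vector.
    return [float(counts[event]) for event in EVENT_TYPES]


now = datetime(2024, 1, 1, 12, 0)
records = [
    (now - timedelta(minutes=10), "link_down"),
    (now - timedelta(minutes=200), "high_cpu"),
    (now - timedelta(minutes=299), "high_cpu"),
    (now - timedelta(minutes=400), "config_change"),  # falls outside the window
]
features = window_features(records, now)
```

A vector like `features` could then be passed to any classifier for step 2; which model the project uses is not stated here.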

Sponsor: NTT Global Networks

GitHub

mapKurator

completed

Historical maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). mapKurator addresses the real-world problem of finding and indexing historical map images by automatically extracting their text content and generating a set of metadata linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format supports complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California.
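
To make the example query concrete, here is a minimal sketch that stores linked metadata as subject-predicate-object triples in plain Python and answers "maps covering peaks higher than 1,000 meters in California". The identifiers, predicate names, and elevation values are hypothetical placeholders, not the mapKurator vocabulary or data; a real deployment would more likely hold the triples in an RDF store and query them with SPARQL.

```python
# Toy (subject, predicate, object) triples. All identifiers, predicates,
# and values are hypothetical placeholders for illustration only.
TRIPLES = [
    ("map:001", "covers", "peak:A"),
    ("map:002", "covers", "peak:B"),
    ("map:003", "covers", "peak:C"),
    ("peak:A", "elevation_m", 4000),
    ("peak:A", "located_in", "California"),
    ("peak:B", "elevation_m", 300),
    ("peak:B", "located_in", "California"),
    ("peak:C", "elevation_m", 4000),
    ("peak:C", "located_in", "Washington"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def maps_covering_high_peaks(state, min_elevation_m):
    """Find maps that cover a peak in `state` higher than `min_elevation_m`."""
    result = set()
    for subject, predicate, peak in TRIPLES:
        if predicate != "covers":
            continue
        in_state = state in objects(peak, "located_in")
        high = any(e > min_elevation_m for e in objects(peak, "elevation_m"))
        if in_state and high:
            result.add(subject)
    return sorted(result)


matches = maps_covering_high_peaks("California", 1000)
```

The query joins map metadata with facts held about the peaks themselves, which is the kind of lookup that linking to an external knowledge base makes possible.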

Selected Paper(s):

An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images. Li, Z; Chiang, Y Y; Tavakkol, S; Shbita, B; Uhl, J H; and others. Proceedings of ACM Knowledge Discovery and Data Mining Conference (KDD). 2020.

Sponsor: National Science Foundation (NSF), National Endowment for the Humanities (NEH), Google AI (collaborator)

GitHub Demo

Strabo

completed

Many historical maps exist as scanned images, which contain valuable information difficult to find elsewhere. In this project, we develop an open-source map-processing tool, Strabo. Strabo processes scanned map images to recognize map symbols, extract road geometries, recognize text labels, and associate extracted road geometries with the recognized text labels to generate named road vector data.
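
The last step, associating road geometries with text labels, can be sketched as a nearest-geometry assignment. This is a simplified illustration under stated assumptions: roads are polylines of `(x, y)` points, each label is reduced to a center point, and the nearest road wins. The function names and sample coordinates are hypothetical, and the heuristic is not necessarily the one Strabo implements.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polyline_distance(p, polyline):
    """Distance from p to the closest segment of a polyline (2+ points)."""
    return min(
        point_segment_distance(p, a, b) for a, b in zip(polyline, polyline[1:])
    )


def name_roads(roads, labels):
    """Assign each label's text to the road whose geometry is nearest to it.

    roads: {road_id: [(x, y), ...]}, labels: [(text, (x, y)), ...]
    """
    names = {}
    for text, center in labels:
        nearest = min(roads, key=lambda rid: polyline_distance(center, roads[rid]))
        names[nearest] = text
    return names


roads = {"r1": [(0, 0), (10, 0)], "r2": [(0, 5), (10, 5)]}
labels = [("Main St", (5, 1)), ("Oak Ave", (4, 4.5))]
named = name_roads(roads, labels)
```

A production pipeline would also need to handle label orientation, labels that span several road segments, and ties between nearby roads, none of which this sketch attempts.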

Selected Paper(s):

Generating Named Road Vector Data from Raster Maps. Chiang, Y.; and Knoblock, C. A. In Xiao, N.; Kwan, M.; Goodchild, M. F.; and Shekhar, S., editor(s), Geographic Information Science, volume 7478 of Lecture Notes in Computer Science, pages 57–71. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
Recognizing Text in Raster Maps. Chiang, Y.; and Knoblock, C. A. GeoInformatica, 19(1): 1–27. February 2014.
A General Approach for Extracting Road Vector Data from Raster Maps. Chiang, Y.; and Knoblock, C. A. International Journal on Document Analysis and Recognition (IJDAR), 16(1): 55–81. March 2013.

Sponsor: National Geospatial-Intelligence Agency (NGA), CLS Group (UK), National Science Foundation (NSF), National Endowment for the Humanities (NEH)

GitHub Demo

JonSnow

completed

Air quality models are important for studying the impact of air pollutants on health conditions. Existing work typically relies on area-specific, expert-selected attributes of pollution emissions (e.g., transportation) and dispersion (e.g., meteorology) for building the model for each combination of study areas, pollutant types, and spatiotemporal scales. In this project, we build a data mining approach, JonSnow, which utilizes publicly available OpenStreetMap (OSM) data to automatically generate an air quality model for predicting and forecasting concentrations of any type of pollutant at various temporal scales. Our approach utilizes the PRISMS-DSCIC infrastructure developed at the USC Information Sciences Institute as the data collection, manipulation, and analysis platform. The PRISMS-DSCIC (Pediatric Research using Integrated Sensor Monitoring Systems - Data and Software Coordination and Integration Center) is an NIH-NIBIB (National Institutes of Health - National Institute of Biomedical Imaging and Bioengineering) funded initiative to address pediatric asthma as a chronic disease of childhood. PRISMS-DSCIC is responsible for collecting, storing, integrating, and analyzing real-time environmental, physiological, and behavioral data obtained from heterogeneous sensors and traditional data sources to help researchers predict and prevent asthma attacks efficiently. JonSnow automatically generates (domain-) expert-free models for accurate PM2.5 concentration predictions and forecasting, which can be used to improve air quality studies that traditionally rely on expert-selected input.

Selected Paper(s):

Stripelis, D., Ambite, J. L., Chiang, Y.-Y., Eckel, S. P., and Habre, R. (April 2017). A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma, In Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1407-1408, San Diego, CA, USA
Lin, Y., Mago, N., Gao, Y., Li, Y., Chiang, Y.-Y., Shahabi, C., and Ambite, J. L. (November 2018). Exploiting Spatiotemporal Patterns for Accurate Air Quality Forecasting using Deep Learning. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 359 – 368, Seattle, WA, USA
Lin, Y., Chiang, Y.-Y., Pan F., Stripelis, D., Ambite, J. L., Eckel, S. P., and Habre, R. (November 2017). Mining Public Datasets for Modeling Intra-city PM2.5 Concentrations at a Fine Spatial Resolution. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Article No. 25, Redondo Beach, CA, USA

Sponsor: National Institutes of Health (NIH)

GitHub GitHub

LinkedMap

completed

The major goal of this project is to develop a framework to efficiently extract contents and build semantics for large volumes of maps as well as link the extracted contents through space and time for robust, meaningful change analysis. This framework is intended to exploit existing geographic data to build generic semantic models of geographic phenomena and use the models to extract geographic features from maps, evaluate (the semantic consistency), update the extracted data, and link the data across space and time.

Sponsor: National Science Foundation (NSF)

Link

MINT

completed

The Model INTegration (MINT) project is developing a modeling environment which will significantly reduce the time needed to develop new integrated models while ensuring their utility and accuracy.

Selected Paper(s):

Gil, Y., Garijo, D., Khider, D., Knoblock, C. A., Ratnakar, V., Osorio, M., Vargas, H., Pham, M., Pujara, J., Shbita, B., Vu, B., Chiang, Y.-Y., Feldman, D., Lin, Y., Song, H., Kumar, V., Khandelwal, A., Steinbach, M., Tayal, K., … Shu, L. (2021). Artificial Intelligence for Modeling Complex Systems: Taming the Complexity of Expert Models to Improve Decision Making. ACM Trans. Interact. Intell. Syst., 11(2), 1–49. https://doi.org/10.1145/3453172

Sponsor: Defense Advanced Research Projects Agency (DARPA)

GitHub Link

Bus Arrival Time Estimation

completed

Accurate forecasting of public transportation metrics is critical to the reliability and efficiency of the public transportation system. However, deploying a forecasting system to serve city-level public transportation with long-term forecasting is challenging. In this project, we develop the capability to process the entire Los Angeles Metropolitan Area (LAMA) for long-term forecasting of various public transportation system performance metrics. First, we explore both spatial statistical methods and machine learning methods to estimate traffic flows for the road segments that do not have traffic sensors. Second, we develop methods that allow a deep learning model designed for small networks to forecast traffic for the entire LAMA road network. We also study various training strategies (e.g., teacher forcing) to enable accurate long-term forecasting of traffic flows and bus arrival times. Lastly, we develop an end-to-end deep learning approach that combines the estimation and forecasting of traffic flow with data imputation methods for estimating bus arrival time for each stop in individual bus routes in LAMA. Using the real-world traffic data in the University of Southern California Archived Transportation Data Management System (ADMS), we show that the proposed approach and system can predict bus arrival times with a city-level spatial coverage and a route-level temporal forecasting horizon. We also demonstrate the overall result of the bus arrival time estimation in a web dashboard. This dashboard enables users at all levels of technical skills to benefit from the developed machine learning approach and access valuable information for trip planning, vehicle management, and policymaking.
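
For the first step, estimating traffic flow on road segments without sensors, one simple spatial baseline is inverse distance weighting (IDW): the estimate at an unsensed location is a weighted average of nearby sensor readings, with weights that decay with distance. The sketch below is a generic illustration of that idea, not the method the project adopted; the coordinates, flow values, and `power` parameter are made up for the example.

```python
import math


def idw_estimate(target, observations, power=2.0):
    """Estimate a value at `target` from ((x, y), value) sensor observations.

    Each observation is weighted by 1 / distance**power, so closer sensors
    contribute more to the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in observations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0:
            return value  # target coincides with a sensor location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical sensors, with the unsensed segment midway between them.
observations = [((0.0, 0.0), 100.0), ((2.0, 0.0), 200.0)]
estimate = idw_estimate((1.0, 0.0), observations)
```

IDW uses straight-line distance and ignores road network topology and time of day, which is one reason to compare it against learned models for this task.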

Selected Paper(s):

Nguyen, K., Yang, J., Lin, Y., Lin, J., Chiang, Y.-Y. and Shahabi, C. (November 2018). Los Angeles Metro Bus Data Analysis Using GPS Trajectory and Schedule Data (Demo Paper) In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 560 – 563, Seattle, WA, USA

Sponsor: California Department of Transportation (Caltrans)

Demo

Archived Data Management System

completed

Collaborating with the University of Southern California’s Integrated Media Systems Center (USC IMSC), Los Angeles Metropolitan Transportation Authority (LA Metro), and USC METRANS, we develop a big transportation data warehouse, the Archived Traffic Data Management System (ADMS). ADMS fuses and analyzes very large-scale, high-resolution (both spatial and temporal) traffic sensor data from different transportation authorities in Southern California, including the California Department of Transportation (Caltrans), Los Angeles Department of Transportation (LADOT), California Highway Patrol (CHP), and Long Beach Transit (LBT). This dataset includes both inventory and real-time data, with update rates as high as every 30 seconds, for freeway and arterial traffic sensors (14,500 loop detectors) covering 4,300 miles; automatic vehicle location (AVL) data for 2,000 buses and trains; incidents such as accidents, traffic hazards, and road closures (approximately 400 per day) reported by the Los Angeles Police Department (LAPD) and CHP; and ramp meters. USC IMSC has been continuously collecting and archiving these datasets since 2011. ADMS is the largest traffic sensor data warehouse built so far in Southern California.

Selected Paper(s):

Anastasiou, C., Lin, J., He, C., Chiang, Y.-Y., and Shahabi, C. (November 2019). ADMSv2: A Modern Architecture for Transportation Data Management and Analysis. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Advances on Resilient and Intelligent Cities (ARIC 2019), pp. 25–28, Chicago, IL, USA
Chiang, Y-Y. and Lin, Y. (2020). Design, Development, Testing, and Deployment of GIS Applications. The Geographic Information Science & Technology Body of Knowledge (4th Quarter 2020 Edition), John P. Wilson (Ed.). doi: 10.22224/gistbok/2020.4.2

Sponsor: Los Angeles County Metropolitan Transportation Authority (LA Metro)

Demo