Research Projects

AI Climate  

 ongoing

Utilizing AI to enhance knowledge of the environment and climate, specifically in fields such as agriculture or forestry.


Critical MAAS  

 ongoing

Developing tools to rapidly and accurately assess resources of critical mineral commodities in the United States.

Sponsor: DARPA

Link


HAYSTAC  

 ongoing

Establishing models of “normal” human movement across times, locations, and people in order to characterize what makes an activity detectable as anomalous.

Sponsor: IARPA

Link


LASI-DAD  

 ongoing

Detecting elements of the built environment in India to better understand aging and dementia.


MNDot  

 ongoing

Evaluating the performance of different kinds of traffic detection systems in actuated control intersections under a variety of environmental conditions.


Seaport  

 ongoing

Identifying the target seaport(s) described in a document so that a GEOINT analyst can extract relevant information from the document about the seaport(s).


GeoBERT  

 ongoing

Building a geolocality-aware natural language model using geographic and linguistic context.

Selected Paper(s):

  • SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation. Li, Z; Kim, J; Chiang, Y Y; Chen M; Accepted by Empirical Methods in Natural Language Processing (EMNLP). 2022.

GitHub Link


DeepLATTE  

 completed

Building a spatially explicit approach for hierarchical forecasting on geolocated, multiscale, multivariate time-series data.

Selected Paper(s):

  • Lin, Y., Chiang, Y.-Y., Franklin, M., Eckel, P. S., and Ambite, J. L. (November 2020). Building Autocorrelation-Aware Representations for Fine-Scale Spatiotemporal Prediction, In Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 352-361, Sorrento, Italy (9.8% acceptance rate)

GitHub


EarthScan  

 completed

Incorporating prior knowledge for robust object detection and geographic layer extraction from overhead imagery.


Machines Reading Maps  

 completed

Machines Reading Maps (MRM) is a collaborative project between the Digital Library and Spatial Sciences Institute at the University of Southern California (US) and the Alan Turing Institute (UK). The project is funded by the United States’ National Endowment for the Humanities (NEH) and the United Kingdom’s Arts and Humanities Research Council (AHRC) under the first round of NEH/AHRC New Directions for Digital Scholarship. MRM seeks to normalize map text as a new kind of data that can be used across the humanities and the heritage sector. To do so, MRM will change the way that humanists and heritage professionals interact with digitised map images. Maps constitute a significant body of global cultural heritage, and they are being scanned at a rapid pace in the US and UK. However, most critical investigation of maps continues on a small scale, through close readings of a few maps. Individual maps communicate through visual grammars, supplemented by text. But text on maps, particularly in aggregate, is a nearly untapped source about the construction of knowledge about place (with the notable exception of the GB1900 project, which crowdsourced transcriptions of all labels on the ca. 1900 6-inch Ordnance Survey maps of Britain). While we speak colloquially about reading maps, MRM concretely addresses how to make text on maps an accessible resource. We will make maps searchable and linked to other geospatial data and collections, creating the possibility for humanities research that uses map text as a primary source. Spatial searching will no longer be limited by metadata fields like place of publication, but will instead allow queries based on the labeled, spatial content of visual materials.

Sponsor: National Endowment for the Humanities (NEH)

GitHub Demo Link


Moving Behavior Detection from Trajectory  

 completed

Mining moving behaviors from trajectories is an important task but often relies on tedious manual work that cannot scale to process large amounts of trajectory data. Here, the moving behavior of a trajectory refers to the activity type describing the purpose of the movement (in space and time) regardless of the spatial and temporal scale of the trajectory (hence the term “multi-scale”). We develop several new capabilities for robustly mining moving behaviors from large multi-scale trajectory data.

Selected Paper(s):

  • Yue, M., Chiang, Y.-Y., and Shahabi, C. (2021). VAMBC: A Variational Approach for Mobility Behavior Clustering. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), pp. 453–469, online
  • Yue, M., Li, Y., Yang, H., Ahuja, R., Chiang, Y.-Y., and Shahabi, C. (December 2019). DETECT: Deep Trajectory Clustering for Mobility-Behavior Analysis. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), pp. 988–997, Los Angeles, CA, USA

Sponsor: National Geospatial-Intelligence Agency (NGA)


AI-Driven Analytics for Network Operations  

 completed

This project's overall goal is to develop a data and AI-driven approach to analyze anonymized data logs (~1 billion records) from the network devices (e.g., WAN routers and LAN switches) that NTT Global Networks (NTT GN) collects, monitors, and manages. The project results include a machine learning approach and its software implementation to detect issues that ultimately give rise to network outages or trouble tickets requiring NTT GN network engineers to investigate and resolve. Towards this end, we have developed an end-to-end approach to 1) generate baseline representations in the form of real-number feature vectors for capturing network activities and 2) predict events (issues) given recent network activities (e.g., in the last 300 minutes).
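
As a rough illustration of step 1, the sketch below turns recent log records into a real-number feature vector by counting event types inside a 300-minute look-back window. It is only a count-based stand-in under stated assumptions: the project's actual representations are not described here, and the function name, the `(minute, event_type)` record layout, and the event names are all hypothetical.

```python
from collections import Counter

WINDOW_MINUTES = 300  # look-back window taken from the project description


def window_features(events, now_minute, event_types, window=WINDOW_MINUTES):
    """Count each event type among records in [now_minute - window, now_minute).

    `events` is an iterable of (minute, event_type) pairs (hypothetical layout).
    Returns one float per entry in `event_types`, in the same order.
    """
    counts = Counter(
        event_type
        for minute, event_type in events
        if now_minute - window <= minute < now_minute
    )
    return [float(counts[t]) for t in event_types]


# Hypothetical log records: (minute since start, event type)
logs = [(10, "link_down"), (120, "cpu_high"), (290, "link_down"), (400, "cpu_high")]
types = ["link_down", "cpu_high"]

features_at_300 = window_features(logs, now_minute=300, event_types=types)
features_at_700 = window_features(logs, now_minute=700, event_types=types)
print(features_at_300, features_at_700)
```

A vector like this could then be passed to any classifier for step 2; which model the project used is not stated in this summary.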

Sponsor: NTT Global Networks

GitHub


mapKurator  

 completed

Historical maps typically exist as scanned images without searchable metadata. Existing approaches to making historical maps searchable rely on tedious manual work (including crowdsourcing) to generate the metadata (e.g., geolocations and keywords). mapKurator addresses the real-world problem of finding and indexing historical map images by automatically extracting their text content and generating a set of metadata linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format supports complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California.
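
The example query above can be sketched over a toy set of subject-predicate-object triples. This is a minimal illustration, not mapKurator's implementation: a real deployment would presumably query an RDF store, whereas this uses plain Python tuples, and every identifier and predicate name below is hypothetical.

```python
# Toy linked metadata as (subject, predicate, object) triples.
# All identifiers and predicate names are hypothetical illustration data.
TRIPLES = [
    ("map:001", "covers", "peak:whitney"),
    ("map:002", "covers", "peak:low_hill"),
    ("peak:whitney", "located_in", "California"),
    ("peak:whitney", "elevation_m", 4421),
    ("peak:low_hill", "located_in", "California"),
    ("peak:low_hill", "elevation_m", 300),
]


def values(subject, predicate, triples):
    """Return every object attached to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def maps_covering_high_peaks(triples, region, min_elevation_m):
    """Find maps that cover a feature in `region` above `min_elevation_m`."""
    matches = set()
    for subject, predicate, feature in triples:
        if predicate != "covers":
            continue
        in_region = region in values(feature, "located_in", triples)
        is_high = any(
            elevation > min_elevation_m
            for elevation in values(feature, "elevation_m", triples)
        )
        if in_region and is_high:
            matches.add(subject)
    return sorted(matches)


result = maps_covering_high_peaks(TRIPLES, "California", 1000)
print(result)
```

The point of linking map metadata to an external knowledge base is that the elevation and region facts do not need to be printed on the map itself; they are joined in at query time.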

Selected Paper(s):

  • An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images. Li, Z; Chiang, Y Y; Tavakkol, S; Shbita, B; Uhl, J H; and others Proceedings of ACM Knowledge Discovery and Data Mining Conference (KDD). 2020.

Sponsor: National Science Foundation (NSF), National Endowment for the Humanities (NEH), Google AI (collaborator)

GitHub Demo


Strabo  

 completed

Many historical maps exist as scanned images, which contain valuable information difficult to find elsewhere. In this project, we develop an open-source map-processing tool, Strabo. Strabo processes scanned map images to recognize map symbols, extract road geometries, recognize text labels, and associate extracted road geometries with the recognized text labels to generate named road vector data.

Selected Paper(s):

  • Generating Named Road Vector Data from Raster Maps. Chiang, Y.; and Knoblock, C. A In Xiao, N.; Kwan, M.; Goodchild, M. F; and Shekhar, S., editor(s), Geographic Information Science, volume 7478, of Lecture Notes in Computer Science, pages 57–71. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
  • Recognizing Text in Raster Maps. Chiang, Y.; and Knoblock, C. A GeoInformatica, 19(1): 1–27. February 2014.
  • A General Approach for Extracting Road Vector Data from Raster Maps. Chiang, Y.; and Knoblock, C. A International Journal on Document Analysis and Recognition (IJDAR), 16(1): 55–81. March 2013.

Sponsor: National Geospatial-Intelligence Agency (NGA), CLS Group (UK), National Science Foundation (NSF), National Endowment for the Humanities (NEH)

GitHub Demo


JonSnow  

 completed

Air quality models are important for studying the impact of air pollutants on health conditions. Existing work typically relies on area-specific, expert-selected attributes of pollution emissions (e.g., transportation) and dispersion (e.g., meteorology) for building the model for each combination of study areas, pollutant types, and spatiotemporal scales. In this project, we build a data mining approach, JonSnow, which utilizes publicly available OpenStreetMap (OSM) data to automatically generate an air quality model for predicting and forecasting concentrations of any type of pollutant at various temporal scales. Our approach utilizes the PRISMS-DSCIC infrastructure developed at the USC Information Sciences Institute as the data collection, manipulation, and analysis platform. The PRISMS-DSCIC (Pediatric Research using Integrated Sensor Monitoring Systems - Data and Software Coordination and Integration Center) is an NIH-NIBIB (National Institutes of Health - National Institute of Biomedical Imaging and Bioengineering) funded initiative to address pediatric asthma as a chronic disease of childhood. PRISMS-DSCIC is responsible for collecting, storing, integrating, and analyzing real-time environmental, physiological, and behavioral data obtained from heterogeneous sensors and traditional data sources to help researchers predict and prevent asthma attacks efficiently. JonSnow automatically generates (domain-) expert-free models for accurate PM2.5 concentration predictions and forecasting, which can be used to improve air quality studies that traditionally rely on expert-selected input.

Selected Paper(s):

  • Stripelis, D., Ambite, J. L., Chiang, Y.-Y., Eckel, S. P., and Habre, R. (April 2017). A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma, In Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1407-1408, San Diego, CA, USA
  • Lin, Y., Mago, N., Gao, Y., Li, Y., Chiang, Y.-Y., Shahabi, C., and Ambite, J. L. (November 2018). Exploiting Spatiotemporal Patterns for Accurate Air Quality Forecasting using Deep Learning. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 359 – 368, Seattle, WA, USA
  • Lin, Y., Chiang, Y.-Y., Pan F., Stripelis, D., Ambite, J. L., Eckel, S. P., and Habre, R. (November 2017). Mining Public Datasets for Modeling Intra-city PM2.5 Concentrations at a Fine Spatial Resolution. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Article No. 25, Redondo Beach, CA, USA

Sponsor: National Institutes of Health (NIH)

GitHub GitHub


LinkedMap  

 completed

The major goal of this project is to develop a framework to efficiently extract contents and build semantics for large volumes of maps as well as link the extracted contents through space and time for robust, meaningful change analysis. This framework is intended to exploit existing geographic data to build generic semantic models of geographic phenomena and use the models to extract geographic features from maps, evaluate (the semantic consistency), update the extracted data, and link the data across space and time.

Sponsor: National Science Foundation (NSF)

Link


MINT  

 completed

The Model INTegration (MINT) project is developing a modeling environment which will significantly reduce the time needed to develop new integrated models while ensuring their utility and accuracy.

Selected Paper(s):

  • Gil, Y., Garijo, D., Khider, D., Knoblock, C. A., Ratnakar, V., Osorio, M., Vargas, H., Pham, M., Pujara, J., Shbita, B., Vu, B., Chiang, Y.-Y., Feldman, D., Lin, Y., Song, H., Kumar, V., Khandelwal, A., Steinbach, M., Tayal, K., … Shu, L. (2021). Artificial Intelligence for Modeling Complex Systems: Taming the Complexity of Expert Models to Improve Decision Making. ACM Trans. Interact. Intell. Syst., 11(2), 1–49. https://doi.org/10.1145/3453172

Sponsor: Defense Advanced Research Projects Agency (DARPA)

GitHub Link


Bus Arrival Time Estimation  

 completed

Accurate forecasting of public transportation metrics is critical towards the high reliability and efficiency of the public transportation system. However, deploying a forecasting system to serve city-level public transportation with long-term forecasting is challenging. In this project, we develop the capability to process the entire Los Angeles Metropolitan Area (LAMA) for long-term forecasting of various public transportation system performance metrics. First, we explore both spatial statistical methods and machine learning methods to estimate traffic flows for the road segments that do not have traffic sensors. Second, we develop methods to enable traffic forecasting with a deep learning model designed for small networks for the entire LAMA road network. We also study various training strategies (e.g., teacher forcing) to enable accurate long-term forecasting of traffic flows and bus arrival times. Lastly, we develop an end-to-end deep learning approach that combines the estimation and forecasting of traffic flow with data imputation methods for estimating bus arrival time for each stop in individual bus routes in LAMA. Using the real-world traffic data in the University of Southern California Archived Transportation Data Management System (ADMS), we show that the proposed approach and system can predict bus arrival times with a city-level spatial coverage and a route-level temporal forecasting horizon. We also demonstrate the overall result of the bus arrival time estimation in a web dashboard. This dashboard enables users at all levels of technical skills to benefit from the developed machine learning approach and access valuable information for trip planning, vehicle management, and policymaking.
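
The teacher forcing strategy mentioned above can be illustrated with a minimal multi-step rollout. This is a sketch under stated assumptions, not the project's model: `toy_step` is a hypothetical one-step predictor standing in for a deep learning model, and the numbers are made up. With teacher forcing, each step is fed the observed previous value; without it, each step is fed the model's own previous prediction.

```python
def rollout(step_fn, first_value, targets, teacher_forcing):
    """Predict len(targets) steps ahead with a one-step model.

    With teacher forcing, the next input is the observed target.
    Without it (free running), the next input is the model's own prediction.
    """
    predictions = []
    previous = first_value
    for target in targets:
        prediction = step_fn(previous)
        predictions.append(prediction)
        previous = target if teacher_forcing else prediction
    return predictions


def toy_step(x):
    """Hypothetical one-step model standing in for a trained network."""
    return 0.5 * x + 1.0


observed = [4.0, 2.0, 3.0]  # made-up ground-truth values for the next 3 steps

forced = rollout(toy_step, 2.0, observed, teacher_forcing=True)
free = rollout(toy_step, 2.0, observed, teacher_forcing=False)
print(forced, free)
```

Teacher forcing is normally a training-time device, since observed future values are unavailable when forecasting; at inference the model runs in the free-running mode, which is why errors can compound over long horizons.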

Selected Paper(s):

  • Nguyen, K., Yang, J., Lin, Y., Lin, J., Chiang, Y.-Y. and Shahabi, C. (November 2018). Los Angeles Metro Bus Data Analysis Using GPS Trajectory and Schedule Data (Demo Paper) In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 560 – 563, Seattle, WA, USA

Sponsor: California Department of Transportation (Caltrans)

Demo


Archived Data Management System  

 completed

Collaborating with the University of Southern California’s Integrated Media Systems Center (USC IMSC), Los Angeles Metropolitan Transportation Authority (LA Metro), and USC METRANS, we develop a big transportation data warehouse – the Archived Traffic Data Management System (ADMS). ADMS fuses and analyzes very large-scale, high-resolution (both spatial and temporal) traffic sensor data from different transportation authorities in Southern California, including the California Department of Transportation (Caltrans), Los Angeles Department of Transportation (LADOT), California Highway Patrol (CHP), and Long Beach Transit (LBT). This dataset includes both inventory and real-time data, with update rates as high as every 30 seconds, for freeway and arterial traffic sensors (14,500 loop detectors) covering 4,300 miles; automatic vehicle location (AVL) data for 2,000 buses and trains; incidents such as accidents, traffic hazards, and road closures reported (approximately 400 per day) by LAPD (Los Angeles Police Department) and CHP; and ramp meters. USC IMSC has been continuously collecting and archiving the datasets mentioned above since 2011. ADMS is the largest traffic sensor data warehouse built so far in Southern California.

Selected Paper(s):

  • Anastasiou, C., Lin, J., He, C., Chiang, Y.-Y., and Shahabi, C. (November 2019). ADMSv2: A Modern Architecture for Transportation Data Management and Analysis. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Advances on Resilient and Intelligent Cities (ARIC 2019), pp. 25–28, Chicago, IL, USA
  • Chiang, Y-Y. and Lin, Y. (2020). Design, Development, Testing, and Deployment of GIS Applications. The Geographic Information Science & Technology Body of Knowledge (4th Quarter 2020 Edition), John P. Wilson (Ed.). doi: 10.22224/gistbok/2020.4.2

Sponsor: Los Angeles County Metropolitan Transportation Authority (LA Metro)

Demo