PH.D DEFENCE - PUBLIC SEMINAR

Usage of very high-resolution optical RGB satellite imagery in geo-information extraction for fine-scale map-making

Speaker
Ms. Hu Wenmiao
Advisor
Dr Roger Zimmermann, Professor, School of Computing


30 Apr 2024 Tuesday, 01:00 PM to 02:30 PM

SR12, COM3 01-21

Abstract:

Maps are fundamental elements of any navigation and localization system. They also play critical roles in other fields such as urban planning and city redevelopment. With the fast expansion of urban areas and the increasing complexity of modern cities, traditional mapping techniques cannot meet the need for frequent map updates with enriched map details. Satellite remote sensing is a technique that can conduct large-scale, regular surveys from a distance without additional permission for local access, and it requires little human interaction once the satellites are deployed. It has been used in many areas, such as climate studies and disaster management. With the growing acquisition capacity of high-resolution commercial satellite constellations and increasingly mature post-acquisition processing pipelines, we can purchase ready-to-use geo-referenced very high-resolution (VHR) RGB satellite imagery with a ground sampling distance (GSD) below 1 meter per pixel. Such data provides useful information for fine-scale map-making and updating. As a result, geo-information extraction from satellite imagery for map-making has become an emerging research field in recent years.

In this thesis, we investigate how geo-referenced VHR RGB satellite imagery can be used to extract geo-information to support the creation of fine-scale maps and facilitate map updates. New methods are proposed to better utilize satellite imagery in different scenarios.

First, we discuss how satellite imagery can be used as the main data source for geo-information extraction (target-object segmentation) and focus on providing solutions and a new dataset to alleviate data scarcity. Then, we investigate the possibility of using satellite imagery as a geo-referenced data source to extract location and orientation information for other potential map entities, specifically street-view imagery, with the aim of correctly locating them on digital maps for navigation or for other downstream map-making tasks.

For the first direction, two works are presented. In Chapter 3, we propose GeoPalette and GAN-assisted training to generate synthetic satellite training pairs and use the synthetic data to improve the overall performance of a given task when the real training dataset is limited in sample count and diversity. Our proposed image generators can augment the existing samples at the appearance level and create new samples with novel scene structures that are not covered in the real dataset. For assisted training, we test different approaches to utilizing the synthetic data and provide a set of metrics to shortlist synthetic data from a large pool of synthetic candidates. Our results on road segmentation show that using synthetic training pairs can indeed improve overall performance when the training dataset is limited. Our proposed metrics are more closely aligned with the trained model's performance than the commonly used GAN evaluation metric, Fréchet Inception Distance (FID). We believe the GAN-assisted training scheme can be applied to other satellite geo-information tasks and reduce the effort of creating large datasets. In Chapter 4, we create the context-enriched Grab-Pklot dataset for parking lot detection in urban areas. Grab-Pklot contains 1,344 training samples of 1024 x 1024 pixels, enriched with contextual road and building features from the neighborhood. To the best of our knowledge, Grab-Pklot is the first high-resolution and context-enriched satellite imagery dataset for parking lot detection. We provide a benchmark data fusion baseline that can utilize the road and building features together with satellite imagery. Our experiments show that using the surrounding road and building information on top of satellite imagery improves parking lot detection performance.

For the second direction, we explore cross-view matching between satellite images and street-view images in large-scale search and local-scale search. The goal is to extract accurate location and orientation metadata for the street-view images so that they can be correctly placed on digital maps as a new type of map entity for visual or machine review, navigation, or other downstream map-making applications. Chapter 5 focuses on improving the granularity of orientation extraction for image-retrieval-based approaches in large-scale search over a region of interest without a location prior. We define new reference coordinate systems and metrics to process and evaluate the fine-grained orientation extraction tasks and propose two methods to refine the granularity and accuracy of the extracted results. Both methods give significant improvements in orientation extraction and demonstrate that incorporating fine-grained orientation estimation can further improve geolocalization performance. Chapter 6 studies local-scale search, which restricts the location search to a single satellite reference image. We propose the PetalView feature extractors and multi-scale search to find the fine-grained location and orientation information of the street-view images at the sub-meter and sub-degree level with reduced computational requirements. Additionally, a learnable prior angle mixer is presented. It learns how to fuse an angle prior from other modalities into the similarity-based orientation estimation curve to guide the outcome when noisy orientation information is available. Although our method in this work is designed for local-scale search, we believe it has the potential to be extended in future work to large-scale search without location and orientation priors.
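
As a rough illustration of the idea of fusing a noisy angle prior into a similarity-based orientation curve, the sketch below adds a fixed, hand-set von Mises-shaped log-prior to per-angle similarity scores and picks the best candidate. This is not the learnable prior angle mixer from the thesis: the function name fuse_angle_prior and the parameters kappa and weight are hypothetical, and the fusion rule is a simple stand-in chosen for clarity.

```python
import math


def fuse_angle_prior(similarity, prior_deg, kappa=4.0, weight=0.5):
    """Fuse a noisy orientation prior into a similarity curve (illustrative only).

    similarity: list of scores, one per equally spaced candidate angle in [0, 360).
    prior_deg:  noisy orientation hint in degrees (e.g. from a compass).
    kappa:      hypothetical concentration of the prior; larger means a sharper prior.
    weight:     hypothetical fixed mixing weight; the thesis learns its fusion instead.
    Returns (best_angle_deg, fused_scores).
    """
    n = len(similarity)
    step = 360.0 / n
    fused = []
    for i, score in enumerate(similarity):
        angle = i * step
        # Von Mises-shaped log-prior: 0 at prior_deg, negative elsewhere,
        # and periodic, so 359 degrees and 1 degree are treated as close.
        log_prior = kappa * (math.cos(math.radians(angle - prior_deg)) - 1.0)
        fused.append(score + weight * log_prior)
    best_index = max(range(n), key=fused.__getitem__)
    return best_index * step, fused


# A curve with two equally strong peaks (an ambiguous scene), one bin per degree.
curve = [0.0] * 360
curve[90] = 1.0
curve[270] = 1.0
best_angle, _ = fuse_angle_prior(curve, prior_deg=260.0)
```

In this toy case the similarity curve alone cannot distinguish 90 from 270 degrees, and the prior near 260 degrees tips the decision toward the 270-degree peak while the peak itself, not the prior, sets the final fine-grained angle.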