How to batch upload local images to GEE

part 2 - IMD rainfall example

Posted by Craig Dsouza on Sunday, May 30, 2021 Tags: IMD rainfall GEE   6 minute read

The India Meteorological Department (IMD) has recently published a daily gridded rainfall dataset (0.25 deg) covering the entire period from 1901 until 2020. The reference paper for this dataset states that rainfall records from 6995 rain gauge stations were used in its preparation. A comparison of the dataset (named IMD4) with other gridded rainfall datasets suggests that it is comparable across the country, and that in regions such as the Western Sahyadris it is more realistic because a higher density of rainfall stations was used.

Station data were converted to a grid with the Inverse Distance Weighted (IDW) approach (Shepard, 1968), using a minimum of 1 and a maximum of 4 stations within a radial distance of 1.5 deg around each grid pixel.
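To make the interpolation rule concrete, here is a minimal Python sketch of IDW for a single grid cell. It is not IMD's implementation: the function name, the distance exponent of 2, and the use of plain Euclidean distance in degrees are all assumptions made for illustration. Only the search radius (1.5 deg) and the station limits (1 to 4) come from the description above.

```python
import math


def idw_estimate(cell, stations, radius_deg=1.5, max_stations=4, power=2.0):
    """Estimate rainfall at one grid cell from nearby stations.

    cell: (lon, lat) of the grid cell centre.
    stations: iterable of (lon, lat, rainfall_mm) tuples.
    Returns None when no station lies within the search radius.
    """
    nearby = []
    for lon, lat, rain in stations:
        # Assumption: Euclidean distance in degrees, not great-circle distance.
        dist = math.hypot(lon - cell[0], lat - cell[1])
        if dist <= radius_deg:
            nearby.append((dist, rain))
    if not nearby:
        return None  # the minimum of 1 station is not met
    # Keep only the closest stations, up to the maximum of 4.
    nearby.sort(key=lambda pair: pair[0])
    nearest = nearby[:max_stations]
    # A station exactly on the cell centre determines the value outright.
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / dist ** power for dist, _ in nearest]
    weighted_sum = sum(w * rain for w, (_, rain) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Two stations at equal distance contribute equally; the third is out of range.
example = idw_estimate(
    (77.0, 18.0),
    [(77.0, 18.5, 10.0), (77.5, 18.0, 20.0), (80.0, 18.0, 100.0)],
)
print(example)  # 15.0
```

Because the weights fall off with distance, a dense station network (as in the Western Sahyadris) lets each pixel draw on closer stations that better represent it.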

Here we discuss accessing this data and publishing it to a Google Earth Engine (GEE) Image Collection so that it can be used alongside other remote sensing datasets.

Figure: preview of the final map layer

Prerequisites: clone the following GitHub repo to your PC, then create a Python virtual environment with the following requirements.txt. The packages listed there may be more than you need; alternatively, you can install packages individually in your environment as required. The code here has been tested with Python 3.7.

Table of Contents

  1. Download Historical Data
  2. Convert Grid to Tif
  3. Split Tif
  4. Upload Tif to GCP Bucket
  5. Subprocess loop to upload images

1. Download Historical Data

Figure: IMD website, the source of the dataset

Data can be downloaded directly from the link provided above or programmatically. The IMDHistoricalGrid script uses the Python imdlib library to download historical data from the IMD website. Pass the script the parameters ‘rain’, starting year and ending year for the years you wish to download. Data is available from 1901 until 2020, and each downloaded file is 25 MB.

C:\Users\[Username]\Code\imdgrid> python IMDHistoricalGrid.py rain 2000 2001

This script downloads annual grd files in the following directory structure

  • C > Users > [Username] > Data > imd > rain > 2000.grd,
  • C > Users > [Username] > Data > imd > rain > 2001.grd
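For readers who want to see what the download step involves, the following is a rough sketch rather than the contents of the IMDHistoricalGrid script. The helper names parse_args and download are hypothetical. The imdlib call is get_data with the 'yearwise' file format, which, as I understand the imdlib documentation, writes one grd file per year into a sub-folder named after the variable.

```python
def parse_args(argv):
    """Turn ['rain', '2000', '2001'] into ('rain', 2000, 2001)."""
    variable = argv[0]
    start_year, end_year = int(argv[1]), int(argv[2])
    # Year limits taken from the text above (data available 1901 to 2020).
    if not 1901 <= start_year <= end_year <= 2020:
        raise ValueError("years must satisfy 1901 <= start <= end <= 2020")
    return variable, start_year, end_year


def download(variable, start_year, end_year, data_dir):
    """Download yearly grd files into <data_dir>/<variable>/<year>.grd."""
    # Imported here so that parse_args can be used without imdlib installed.
    import imdlib as imd

    return imd.get_data(
        variable, start_year, end_year,
        fn_format="yearwise",  # one file per year
        file_dir=data_dir,
    )
```

Calling download(*parse_args(['rain', '2000', '2001']), data_dir=...) with your own data folder should then produce the two files listed above.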


2. Convert Grid to Tif

The IMDHistoricalGrid2Tif script uses the imdlib library and rasterio to read the grd files and write tif files. Pass the script the parameters ‘rain’, starting year and ending year as per your needs.

C:\Users\[Username]\Code\imdgrid> python IMDHistoricalGrid2Tif.py rain 2000 2001

You are then left with annual tif files in the following directory structure

  • C > Users > [Username] > Data > imd > rain > tif > 2000.tif,
  • C > Users > [Username] > Data > imd > rain > tif > 2001.tif
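The conversion can be approximated in a few lines, as sketched below. Note the swap: this version leans on imdlib's own open_data and to_geotiff helpers (which rely on rioxarray) instead of calling rasterio directly as the IMDHistoricalGrid2Tif script is described as doing. The function names and the assumption that the output is one multi-band tif per year (one band per day) are mine.

```python
from pathlib import Path


def annual_tif_path(data_dir, variable, year):
    """Location of the annual tif, mirroring the directory layout shown above."""
    return Path(data_dir) / variable / "tif" / f"{year}.tif"


def convert_year(variable, year, data_dir):
    """Read <data_dir>/<variable>/<year>.grd and write the matching annual tif."""
    # Imported here so the path helper works without imdlib installed.
    import imdlib as imd

    # 'yearwise' matches the file layout produced by the download step.
    data = imd.open_data(variable, year, year, "yearwise", data_dir)
    out_path = annual_tif_path(data_dir, variable, year)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # to_geotiff takes the file name and the target directory separately.
    data.to_geotiff(out_path.name, str(out_path.parent))
    return out_path
```

Looping convert_year over the requested range of years would cover the same ground as the command above.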


3. Split Tif

The IMDHistoricalTif2Daily script splits the annual tif files into daily tif files and saves them to one sub-directory per year. Pass the script the parameters ‘rain’, starting year and ending year as per your needs.

C:\Users\[Username]\Code\imdgrid> python IMDHistoricalTif2Daily.py rain 2000 2001

After running this script you are left with daily files in the following directory structure.

  • C > Users > [Username] > Data > imd > rain > tif > 2000 > 20000101.tif,
  • C > Users > [Username] > Data > imd > rain > tif > 2000 > 20000102.tif,
  • C > Users > [Username] > Data > imd > rain > tif > 2000 > 20001231.tif
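As an illustration of the splitting step, here is a hedged sketch using rasterio. It assumes that each annual tif is named after its year and holds one band per calendar day in date order, which is not confirmed by the text above. The function names are hypothetical and this is not the repository's script.

```python
from datetime import date, timedelta
from pathlib import Path


def daily_names(year):
    """File names YYYYMMDD.tif for every day of the year, in calendar order."""
    day, names = date(year, 1, 1), []
    while day.year == year:
        names.append(day.strftime("%Y%m%d") + ".tif")
        day += timedelta(days=1)
    return names


def split_annual_tif(annual_tif, out_dir):
    """Write each band of the annual tif to its own single-band daily tif."""
    # Imported here so daily_names works without rasterio installed.
    import rasterio

    annual_tif, out_dir = Path(annual_tif), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = daily_names(int(annual_tif.stem))  # e.g. 2000.tif -> 2000
    with rasterio.open(str(annual_tif)) as src:
        if src.count != len(names):
            raise ValueError("band count does not match the days in the year")
        profile = src.profile
        profile.update(count=1)  # each output file holds a single band
        for band_index, name in enumerate(names, start=1):
            with rasterio.open(str(out_dir / name), "w", **profile) as dst:
                dst.write(src.read(band_index), 1)
```

The band count check guards against leap-year mismatches, which would otherwise shift every date after 28 February by a day.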


4. Upload Tif to GCP Bucket

First create a Google Cloud Platform (GCP) bucket and set its permissions to public, as described here.

Figure: editing permissions on the bucket to make it public

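Upload the tif files to the GCP bucket with the following command. The -m flag runs the copy in parallel, and *.tif matches only the files in the current directory, so run the command from the folder that holds the files you want to upload.

C:\Users\[Username]\Data\imd\rain\tif> gsutil -m cp *.tif gs://[bucketname]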


5. Subprocess loop to upload images

Upload the images to GEE using the IMDHistoricalGCP2GEE script, which calls the earthengine CLI from within Python to copy daily tif files from your GCP bucket to GEE Assets. The script takes four parameters: year, bucketname, geeusername and geeimagecollectionpath.

C:\Users\[Username]\Code\imdgrid> python IMDHistoricalGCP2GEE.py 2020 [bucketname] [geeusername] [gee-image-collection-path]
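The subprocess loop could look roughly like the sketch below, which builds one earthengine upload image command per day and optionally runs it. This is an illustration, not the IMDHistoricalGCP2GEE script itself. The function names, the users/[geeusername]/[collection]/[date] asset ID layout, and the choice to set system:time_start through --time_start are assumptions. It also assumes the image collection already exists and that the earthengine CLI is authenticated and on your PATH.

```python
import subprocess
from datetime import date, timedelta


def upload_command(day, bucket, gee_user, collection):
    """Build the earthengine CLI call for one daily tif in the bucket."""
    stamp = day.strftime("%Y%m%d")
    return [
        "earthengine", "upload", "image",
        # Assumed asset ID layout; adjust to your own collection path.
        f"--asset_id=users/{gee_user}/{collection}/{stamp}",
        # Sets system:time_start so the collection can be filtered by date.
        f"--time_start={day.isoformat()}",
        f"gs://{bucket}/{stamp}.tif",
    ]


def upload_year(year, bucket, gee_user, collection, dry_run=True):
    """Return (and, unless dry_run, execute) one upload command per day."""
    commands = []
    day = date(year, 1, 1)
    while day.year == year:
        cmd = upload_command(day, bucket, gee_user, collection)
        commands.append(cmd)
        if not dry_run:
            # check=True stops the loop if the CLI reports an error.
            subprocess.run(cmd, check=True)
        day += timedelta(days=1)
    return commands
```

Running with dry_run=True first lets you inspect the generated commands before starting several hundred ingestion tasks. Each upload runs as an asynchronous task on the GEE side, so the loop finishing does not mean the images are already available.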

Finally, delete all images from the GCP bucket after they have been copied to Google Earth Engine Assets.

C:\Users\[Username]\Code\imdgrid> gsutil -m rm gs://[bucketname]/**

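Access the image collection within your GEE code editor with the following demo script.

// Load the uploaded collection; the asset ID layout below is an assumption, replace it with your own path.
var rainfall = ee.ImageCollection('users/[geeusername]/[gee-image-collection-path]');
// Show the mean of all daily images with the default visualisation.
Map.addLayer(rainfall.mean(), {}, 'rainfall');
print('number of images in rainfall collection: ', rainfall.size());
print('first image date ', ee.Date(rainfall.first().get('system:time_start')));
print('last image date ', ee.Date(rainfall.sort('system:time_start', false).first().get('system:time_start')));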


Attribution: much of the code here is inspired by Ujaval Gandhi, Qiusheng Wu and Saswata Nandi.