Search, Batch, and Upload: Planet Basemaps to Descartes Labs
The Descartes Labs Platform is a data refinery paired with large-scale, scalable computational processing. The platform lets you run analyses against its catalog of pre-ingested datasets while also bringing in your own private data archives. In this post, we will upload Planet Basemaps into the Descartes Labs catalog as a private dataset.
I will be using Planet Basemaps made available through Norway’s International Climate and Forests Initiative (NICFI). Learn more and apply for access here. You can also apply for a Descartes Labs account here. This post assumes a few prerequisites: an active Planet account with access to a Basemap, and an active Descartes Labs (DL) account so you can ingest and upload the data into your own DL catalog. This work is the result of my collaboration with Jeremy Malczyk, a Solutions Architect at Descartes Labs.
Setup and Housekeeping
Descartes Labs provides a workbench environment running a JupyterLab instance on top of their data catalog and compute. You can also run everything locally using their Python client and API services. For this example, I will use the workbench, since it comes preinstalled with most of the libraries I need and is integrated into a browser-based, scalable JupyterLab instance.
The first step is to make sure all the libraries are installed and the Planet client is installed and initialized. Everything except the Planet Python client comes preinstalled on the Descartes Labs server where the JupyterLab instance runs.
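If you are following along in the workbench, that setup step looks roughly like the sketch below, run in a notebook cell. This assumes the v1 Planet Python client, which ships a planet init command; check the client documentation if you are on a different version.

```python
# Install the Planet Python client (not preinstalled in the DL workbench),
# then initialize it with your Planet Explorer credentials.
!pip install planet
!planet init  # prompts for the email and password you use with Planet Explorer
```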
Once you enter the email and password you use to log into Planet Explorer, a “client initialized” message is returned, confirming that you are set up.
The next code block imports the libraries of interest. You will also need a geometry as a GeoJSON object. Either upload a GeoJSON file, or create an empty aoi.geojson file and visit geojson.io to draw your area of interest. Use the JupyterLab editor to copy, paste, and save the geometry. You now have a shiny new copy of the geometry in your JupyterLab instance within Descartes Labs. The notebook converts it into a bounding box and then searches for quads within that bounding box.
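As a minimal sketch of that conversion, assuming aoi.geojson holds a single Polygon feature drawn at geojson.io:

```python
import json

# Load the area of interest drawn at geojson.io and saved as aoi.geojson
with open("aoi.geojson") as f:
    aoi = json.load(f)

# Assumes a single-feature Polygon; compute its bounding box
geometry = aoi["features"][0]["geometry"]
coords = [pt for ring in geometry["coordinates"] for pt in ring]
lons = [pt[0] for pt in coords]
lats = [pt[1] for pt in coords]
bbox = (min(lons), min(lats), max(lons), max(lats))  # xmin, ymin, xmax, ymax
```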
Search and Build the Quad List
The Planet Basemap search returns links to quads for the Basemap of interest, based on the mosaic name and geometry you provide. Learn more in our earlier post, “The Scoop on Planet Basemaps.”
The quads are 4096 by 4096 pixel chunks of a single mosaic, which are easier to download.
The code block uses the following inputs to find all quads that intersect your area of interest and assemble the list of download URLs. This relies on you having access through the Planet program you are affiliated with. If you work on forests, you can apply for access to, and once granted permission simply use, the Planet Basemaps for Norway’s International Climate and Forests Initiative (NICFI), which is what we use in this example.
Currently, the notebook requires the following inputs (a sketch of how they come together follows the list):
product name: the product name for the Descartes Labs Catalog
mosaic name: the Planet-provided mosaic name, used to look up the mosaic via Planet's Basemaps API
product description: a description of your product
aoi path: the path to your aoi.geojson file (try /home/jovyan/Your folder name/aoi.geojson, or wherever you saved it)
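Here is one way the search could look using the Planet Basemaps API directly; the notebook’s own helper may differ, and every input value below is a placeholder to replace with your own:

```python
import requests

# Notebook inputs; all values here are illustrative placeholders
product_name = "planet_nicfi_basemap"
mosaic_name = "planet_medres_normalized_analytic_2021-05_mosaic"
product_description = "NICFI normalized analytic basemap, May 2021"
aoi_path = "/home/jovyan/aoi.geojson"

session = requests.Session()
session.auth = ("YOUR_PLANET_API_KEY", "")

# Look up the mosaic by name via the Basemaps API
mosaics = session.get(
    "https://api.planet.com/basemaps/v1/mosaics",
    params={"name__is": mosaic_name},
).json()["mosaics"]
mosaic_id = mosaics[0]["id"]

# Page through every quad intersecting the AOI bounding box from earlier
quad_urls = []
url = f"https://api.planet.com/basemaps/v1/mosaics/{mosaic_id}/quads"
params = {"bbox": ",".join(str(v) for v in bbox)}
while url:
    page = session.get(url, params=params).json()
    quad_urls += [quad["_links"]["download"] for quad in page["items"]]
    url = page["_links"].get("_next")
    params = None  # the _next link already embeds the query parameters

print(f"Found {len(quad_urls)} quads")
```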
The Descartes Labs Catalog
Now that you have gotten started, let’s turn to Jeremy Malczyk from Descartes Labs.
Jeremy, would you mind unpacking what these concepts mean for users, and the hierarchy of things within the Descartes Labs catalog, like Product Name, Product ID, and so on? I know you have documentation on these concepts, but could you give us a quick run-down?
So, images are stored in the Descartes Labs platform within containers called “products” in the platform’s Catalog service, which correspond to collections of imagery that follow a similar structure. These containers/products can be searched and managed programmatically via a Python client, and imagery may be uploaded to the system as individual files or as multi-file packages corresponding to some spatial footprint and time.
Products, their bands, and the images stored within products are referenced in the system by unique identifiers, which DL user interfaces and APIs use to visualize and analyze the pixel data in various ways. Additional optional metadata can be added to document details such as the wavelengths referenced by a band, the resolution of the imagery loaded, or the physical range of values represented by the raw data. The system will then automatically scale these values and warp and resample the imagery as it is accessed. Multiband images, or collections of multiple images, make up a “scene,” which groups multiple images into a single object. The definition of a scene is flexible and can reference anything from an event spanning milliseconds to mosaics of data collected over years.
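To make those identifiers concrete, here is a small hedged sketch using the Descartes Labs catalog client; the product ID shown is hypothetical, and attribute names may vary slightly between client versions:

```python
from descarteslabs.catalog import Product

# Product IDs are unique, namespaced strings; this one is a made-up example
product = Product.get("your_org_id:planet_nicfi_basemap")
print(product.name, product.description)

# Bands hang off the product and carry their own IDs and optional metadata
for band in product.bands():
    print(band.id, band.data_type, band.resolution)
```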
Setting up your catalog product and fine-tuning the dials
A few things have been preset here for the four-band normalized analytic mosaic, but you can change them to fit your needs: the list of band names, the default unit, the data range, the display range (for now we set a fixed min/max range for band display, and recalculate and update it later), and the pixel resolution, depending on your setup.
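A minimal sketch of that product and band setup, assuming the catalog client’s Product and SpectralBand objects; the 0–10000 data range, 0–3000 starting display range, and ~4.77 m pixel size are presets to adjust for your own data:

```python
from descarteslabs.catalog import (
    DataType, Product, Resolution, ResolutionUnit, SpectralBand,
)

# Create the Catalog product using the notebook inputs from above
product = Product(id=product_name, name=product_name)
product.description = product_description
product.save()

# Preset bands for the four-band normalized analytic mosaic
for index, band_name in enumerate(["blue", "green", "red", "nir"]):
    band = SpectralBand(name=band_name, product=product)
    band.band_index = index
    band.data_type = DataType.UINT16
    band.data_range = (0, 10000)     # physical range of the raw data
    band.display_range = (0, 3000)   # fixed starting min/max, updated later
    band.resolution = Resolution(value=4.77, unit=ResolutionUnit.METERS)
    band.save()
```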
Creating the uploader
The Descartes Labs Tasks API scales image ingestion from the Planet archive via serverless cloud functions that asynchronously transfer imagery to the DL Catalog. Given a set of parameters (a remote image URL and the ID of the DL Catalog product to upload to), DL Tasks workers can pull and register up to 500 images simultaneously, with the system automatically scaling the number of compute instances to match the work assigned. Once complete, the system scales back down to conserve resources. You can scale this as needed, and the wait_for_completion call lets the Jupyter notebook wait until all workers have finished. You can also combine this with a webhook (how about a Slack bot that pings you when all your tasks have finished?).
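Here is a rough sketch of what such an uploader could look like. The function name upload_quad, the group name, the Docker image tag, and the acquired date are illustrative assumptions; create_function and wait_for_completion follow the Tasks API described above, but consult the DL documentation for the exact signatures in your client version:

```python
from descarteslabs.client.services.tasks import Tasks

def upload_quad(url, product_id):
    """Pull one quad from Planet and register it in the DL Catalog."""
    import requests
    from descarteslabs.catalog import Image, Product

    # Download the quad GeoTIFF to local disk on the task worker
    filename = "/tmp/quad.tif"
    with open(filename, "wb") as f:
        f.write(requests.get(url).content)

    # Create an image entry and upload the file; the quad ID is pulled
    # (roughly) from the download URL, and the date is a placeholder
    product = Product.get(product_id)
    image = Image(product=product, name=url.split("/")[-2])
    image.acquired = "2021-05-01"
    image.upload(filename).wait_for_completion()

tasks = Tasks()
async_upload = tasks.create_function(
    upload_quad,
    name="planet-quad-upload",
    image="us.gcr.io/dl-ci-cd/images/tasks/public/py3.8:v2021.05.20",
    maximum_concurrency=500,  # up to 500 simultaneous workers
)

# Fan the quad URL list out to the task workers, then block until all finish
results = [async_upload(url, product.id) for url in quad_urls]
async_upload.wait_for_completion()
```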
The Task Monitor
The Task Monitor is useful for keeping an eye on running tasks, though as discussed earlier, you could also automate alerts, or even use the Tasks API endpoints to get information on successful, pending, and failed tasks and workers, check progress, and notify yourself as needed.
Setting band display ranges and visualization
To ensure that visualization takes into account the minimum and maximum value range per band, the visualization range is adjusted after ingestion. In this case, a single image object from the collection is used to get the minima and maxima for each band, and this range is then passed to the catalog product. This provides a good starting point for visualization before adding the imagery to the notebook or browsing it via the catalog.
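A sketch of that adjustment, assuming the scenes API and the geometry and band names defined earlier (a single image is a rough but serviceable proxy for the whole collection):

```python
import descarteslabs as dl
from descarteslabs.catalog import SpectralBand

# Pull one ingested image over the AOI and rasterize its four bands
scenes, ctx = dl.scenes.search(geometry, products=[product.id], limit=1)
arr = scenes[0].ndarray("blue green red nir", ctx)  # shape: (bands, y, x)

# Update each band's display range to that image's observed min/max
for index, band_name in enumerate(["blue", "green", "red", "nir"]):
    band = SpectralBand.get(f"{product.id}:{band_name}")
    band.display_range = (float(arr[index].min()), float(arr[index].max()))
    band.save()
```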
Protip: The notebook visualization also has an autoscale button that allows you to scale the imagery to the band min-max in the field of view visible to the user.
Once your tasks have completed, you can go back to your catalog and browse using the My Products selection. The Open in Viewer button then allows for quick visualization. The minimum and maximum range we computed is now used for the updated visualization and can be modified as needed.
There you have it. This workflow lets you move data quickly: apply for access, create an AOI, and get crunching, not only ingesting data but also analyzing it at scale. We hope this notebook and guide, which you can download here, helps users bring their datasets into the Descartes Labs environment faster and jump straight into analysis. Here is the browsable version of the same Jupyter notebook for use with the Descartes Labs setup.