From Analysis Ready Data to Analysis Engines and Everything in between
With the advent of machine learning and the stream of Earth Observation data (EOD) coming in from different constellations of Earth Observation Systems (EOS), the traditional notion of remote sensing and GIS applications performed on personal laptops and desktop computers has changed. It is tempting to get into conversations about data and forget about how analysis itself has changed with the influx of large volumes of EO data, and with it the experience end users have when they are looking at and analyzing the datasets. Carl Steinitz wrote, “If you’re going to change the world, you might as well do it at the world scale.” With the introduction of machine learning techniques in object detection and classification, it is often possible to perform machine learning at scale.
From the time we started looking at satellite imagery, we took the concept of pre-processing and added it as a step in the overall method or analysis pipeline. This was before we had automated pipelines churning through thousands and millions of images a day. The process of getting imagery into analysis-ready formats such as Cloud Optimized GeoTIFF added further importance to the concept of Analysis Ready Datasets (ARD). So our data seemed ready to go, but the level of readiness needed for analysis by the end user varied: what may be ARD for some agencies and users may need more processing for others. We are at a crossroads between custom ARD requirements and specifications that suit a user’s processing pipeline and a standard that could be used to supply data at a specific level.
The next logical step seemed to be linking a wide array of datasets to analytical engines that are repeatable, scalable and, more importantly, near resource agnostic, serving a similar experience regardless of client hardware and network constraints. In this article I pick a few of the analytical engines that I have used and explored. This is in no way an exhaustive list, and there are many more use cases and examples that I have not explored here.
Google Earth Engine (GEE)
Google Earth Engine is well defined by their heavily cited paper in Remote Sensing of Environment:
Google Earth Engine is a cloud-based platform for planetary-scale geospatial analysis that brings Google’s massive computational capabilities to bear on a variety of high-impact societal issues including deforestation, drought, disaster, disease, food security, water management, climate monitoring and environmental protection.
With over 70,000 users, the platform is constantly updated, and users can join Google Earth Engine using nothing more than an active internet connection and a Google account. With over a thousand tools in its libraries and constant dataset and feature updates based on user requests and input, the system serves as a platform that adapts based on the expertise and extensive use cases of its community. Its well-known outputs range from the first Global Forest Loss Map produced by Hansen to the Global Surface Water Map.
This platform not only lets users work with the 500+ public datasets it already houses, but also lets them bring their own data and analyze it using GEE’s backend. This makes it one of the most easily adaptable and yet one of the most advanced geospatial analysis engines for researchers and non-commercial developers to use. Some of the features include:
- Datasets are updated and maintained frequently, which eliminates the need for the user to search for and ingest datasets. Find all raster datasets here
- Every user account comes with a free 250 GB quota, which allows users to bring in additional datasets such as tables, commercial imagery, drone data and much more.
- A large library and change log, along with very well written documentation and tutorials.
- Owing to the number of users and the uniqueness of their problems, it probably also has one of the largest curated and active Google Groups on geospatial analysis.
- It now allows users to create external apps that do not require you to sign in to Google Earth Engine, making it more accessible. You can read more about their app features here
- GEE also allows you to export TensorFlow Records (TFRecord) and is looking ahead to integration with TensorFlow for custom ML development.
Sign up for an account and give it a whirl. GEE is constantly evolving, and we hope certain things will become better, for example error reporting and error messages, and information about rate limits on the Earth Engine API, quotas and the pool of tasks. GEE is not subject to any Service-Level Agreement (SLA) or deprecation policy, and resources are often shared in a common pool, so there is no direct way of running tasks with priority. As new datasets become available, Earth Engine will remain an invaluable tool, allowing users to be resource agnostic at their end, perform complex analysis in a browser and make it repeatable.
Geospatial Big Data Platform (GBDX) and GBDX notebooks
Conversation with Sean Gorman, Digital Globe
GBDX is a cloud platform that allows users to bring compute processes to Digital Globe assets and perform analysis at scale without the need to move data. GBDX notebooks allow for easier onboarding to GBDX itself and give users a more familiar Jupyter Notebook type environment and feel. They also allow for inline compute and fast prototyping before running large tasks at scale.
GBDX notebooks include open datasets as well as DG assets. The open datasets include Digital Globe Open Data, Sentinel-2, Landsat and IKONOS datasets, which have now been opened up for larger audiences to use. These open datasets are available in the community edition of the GBDX platform.
I wanted to talk a little more about the GBDX community edition since it is the only free tier offered by the GBDX platform.
- Community edition still gives you access to a 6 GB instance with about 20 GB of drive space
- All notebooks created under the community edition are open and not private, so while you can create code repositories online, they are shared under a common open MIT license.
- You can sign up for an account here and play around with some basic tutorials to get started. If you don’t want to sign up but still want to read a tutorial find one here.
So when we come down to the guts of it all, if you are happy with data analysis and prototyping within the open environment, this is a great tool to have. For researchers this may be challenging, since methodology is often not in the public domain before papers are published, so the open MIT license makes it difficult to prototype and test algorithms for publication without making them available to the community. Current limitations include the lack of an efficient way to export derivatives or the open images out of the notebooks as they get large. The platform is still pretty new and constantly evolving, so it is definitely worth getting your feet wet.
Radiant Earth
Conversation with Yonah Bromberg Gaber, Radiant Earth Foundation
Between closed source but free platforms like Google Earth Engine and closed products within a freemium model like GBDX notebooks, Radiant Earth Foundation is taking a completely radical approach. The Radiant Earth Application, another player offering cloud computing infrastructure for geospatial analysis, made its code open source. Being a non-commercial entity also means they are neutral, non-profit and completely open. In theory, this means users can spin up their own version of the Radiant Earth App if needed.
This platform further allows you to maintain your own personal projects and to bring in additional Earth observation and secondary data. The platform was designed with the idea of applying geospatial data analysis for good, allowing users to visualize and quickly analyze Earth Observation data and enabling analytics.
Based on the conversation, the primary target audience for the Radiant Earth App includes the global development community, NGOs, academic researchers and students, and even entrepreneurs. The community at Radiant Earth, along with others, is also doing some amazing work with SpatioTemporal Asset Catalogs (STAC) and an open source machine learning commons (MLHUB). Here are some things to keep in mind:
- The target here is not global analysis but rather more focused, local-scale analysis, with an emphasis on flexibility and ease of use.
- The idea is to not have a steep learning curve, and they have achieved that using tutorials and now with regular webinars that people from around the world can join.
- There are no imposed quotas on analysis and data; the platform relies instead on the fair and free use of available datasets.
While this platform promises to be many things, it does not benefit from large volumes of ingested datasets, and as such the ingest-on-demand model may seem to lag for some users. Similar to GBDX, this platform uses Sentinel-2 and Landsat data but falls short of the hundreds of other datasets already ingested in Google Earth Engine. It is not designed for you to scale to a global analysis, but the projects you create are shareable.
Sentinel Playground
Sentinel Playground functions similarly to GBDX but seems to be functionally closer to Google Earth Engine in terms of JavaScript applications and the analysis results being displayed and generated directly in the browser. Sentinel Playground and Hub are operated by Sinergise and, once again, form a cloud platform for analysis of Sentinel-2A and 2B data and so on. The project is supported by ESA, among other players.
Some things to keep in mind:
- Similar to GBDX, there is a free model, but free users do not get much out of it. You can check their pricing plan here
- It does allow you to create custom analysis scripts similar to Google Earth Engine, but their use is limited to the scope of the plan you subscribe to.
- Pre-calculated indices and tiles make generating a publish-ready image or PDF easy using the generate tool.
- Dataset availability is limited and includes Sentinel-2, Landsat-8 and MODIS.
- Also check out EO Browser, which goes one step further, and their education-focused use cases.
- Check out their open-source Python libraries, which allow many more functions. Here is an example of a start-to-end land classification process.
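To make the idea of an index concrete, here is a minimal sketch of NDVI (Normalized Difference Vegetation Index), a common example of the kind of index such platforms pre-calculate. It uses plain Python and made-up reflectance values rather than any Sentinel Hub library call; on Sentinel-2, the near-infrared and red inputs would typically be bands B08 and B04. The function name, the sample tile and the choice to return 0.0 for empty pixels are all assumptions made for illustration.

```python
def ndvi(nir, red):
    """Return the Normalized Difference Vegetation Index for one pixel.

    NDVI = (NIR - Red) / (NIR + Red), ranging from -1 to 1.
    """
    total = nir + red
    if total == 0:
        # Assumption for this sketch: treat empty pixels as 0.0
        # instead of raising a ZeroDivisionError.
        return 0.0
    return (nir - red) / total


# Hypothetical surface reflectance values for a tiny 2x2 tile.
nir_band = [[0.50, 0.40], [0.30, 0.05]]
red_band = [[0.10, 0.10], [0.30, 0.05]]

# Apply the index pixel by pixel, keeping the tile's row/column layout.
ndvi_tile = [
    [ndvi(n, r) for n, r in zip(nir_row, red_row)]
    for nir_row, red_row in zip(nir_band, red_band)
]

for row in ndvi_tile:
    print([round(value, 2) for value in row])
```

Higher values generally indicate denser green vegetation, which is why a platform that serves this per tile saves the user from downloading both bands and doing the arithmetic locally.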
Though the platform itself is robust, the limited number of assets available and the cost per user would act as barriers to entry, though it does have special programs for educational users within the EU.
Vane: Query Language
One of the platforms that I feel deserves a mention for creating basemaps and bringing a Structured Query Language (SQL) based analysis model to an analytics platform is “Vane”, by the same people who created Open Weather Map. Similar to Sentinel Playground, Vane allows you to choose multiple sensors such as Landsat, Sentinel-2 and MODIS, along with band presets.
Vane has a similar model in terms of enabling the creation of beautiful basemaps, but it also runs compute processes on its infrastructure and lets you share the results with other users. Best part: it is free for most things, or at least I could not find a pricing plan. That being said, the purpose of Vane also includes bringing in datasets with hourly cadence, such as weather data, along with daily or weekly cadence satellite imagery. You can go through their overview here. As with other models, Vane does have limitations in terms of a steep learning curve for custom queries and the number of datasets. Check out their create a basemap in four easy steps tutorial.
There are many more models and analytic engines with different target audiences, and as we start getting more data, it is only a matter of time before local analysis includes elements of the cloud and we send analysis to data rather than bringing the data to analysis. For now, what constitutes a custom ARD and at what stage you wish to use it remain open-ended questions. Advances in image analytics and the evolution of hardware will make asking questions interesting and make the questions themselves more challenging.
If you find this guide useful, please click the clap button👏 to show your support! I hope to add to the article as I discover and explore new platforms.