Caspian Sea (credit: NASA)

Git Stories: Earth Engine Code Commit & Heatmaps

Samapriya Roy
5 min read · Nov 16, 2021

Telling a story, like writing code, happens in steps and strides. On GitHub, those steps show up as commit logs and histories. GitHub also turns your commit history into a contribution calendar, a quick peek at when you contribute code. While this gives your commit history a graphical form, keep in mind that different users commit at different phases of work, and a larger commit log does not necessarily mean more code or better code. Still, it lets users understand their own behavior while writing and saving code: it reveals long breaks, sudden bursts, and coding habits, to say the least.

For this blog, I wanted to see if we could take the commit history from the Earth Engine code editor, where users mostly write their scripts in JS, and plot it as a calendar heatmap. But I wanted to go a step further than just plotting my own history; what’s the fun in that? Over time, we end up with one of three types of privileges over a piece of code. You are either

  • An owner: Owners are essentially the creators of the code. This gives you permission to read, write, delete, and pretty much do anything with your code.
  • A writer: This is the second most powerful role for a repository, because you have both read and write privileges to the code.
  • A reader: This permission is often shared for tutorials, for papers, or simply for anyone to read. It is a great way to push modified code, rather than the saved state of the code you get when you click the Get Link button.

All of these roles include read access, and over time almost every user accumulates access to a lot of globally shared code, examples, and tutorials. A classic example is users/gena/packages, which houses the palettes module and which most users can read. As a result, I wanted to look at how all the repositories to which I have read access commit their code and how that changes over time.

Calendar Contributions Map for Samapriya Roy 2021: GitHub

Basic Assumptions and Setup

With that out of the way, I got curious about effective ways to represent Google Earth Engine (GEE) code commits made within the code editor. For GEE, the following assumptions are taken as true:

  • A commit is recorded every time you save your script in the code editor.
  • The commit history covers only JS code written in the code editor.
  • For users of the Python implementation, commits are probably stored locally or in whichever git system an individual user chooses. These are not captured here, so this is only a subset of the code actually written with GEE.
  • Last but not least, save histories and behavior vary, meaning different users commit at different stages.

It turns out you can access the Git hosting behind the GEE code editor at https://earthengine.googlesource.com/, find your repo, and then download it either with git clone or as a tar-zipped file.
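Once a repository is cloned, its history can be read with plain git. The following is a minimal sketch, not the exact script used for this post: it assumes git is on the PATH, and the function names and the `hash|email|date` log format are illustrative choices.

```python
import subprocess
from datetime import datetime

# One commit per line: hash, author email, strict ISO 8601 author date.
LOG_FORMAT = "%H|%ae|%aI"


def read_log(repo_dir):
    """Return the raw `git log` text for an already cloned repository."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_log(text):
    """Turn raw log text into (hash, author, datetime) tuples."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        sha, author, stamp = line.split("|", 2)
        commits.append((sha, author, datetime.fromisoformat(stamp)))
    return commits
```

Calling `parse_log(read_log(path_to_clone))` on a local clone gives one tuple per save, which is all the later counting steps need.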

Earthengine code source to get to your repo codes

Google Earth Engine Commits

It seems I had access to about 9634 repositories, either shared with me with at least read access or shared globally. These are spread across roughly 7175 unique users (not all users have shared code or commits, and some logs and repositories are empty, as expected). In total I parsed approximately 2.5 million open commits.

I used a Python library called calmap to plot the data, along with pandas to handle the data series with a DatetimeIndex.
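As a rough illustration of that step, the sketch below counts commits per calendar day and hands the result to calmap. The function names, colormap, and output file name are assumptions made for the example, and the plotting imports are kept inside the plotting function so the counting part runs with the standard library alone.

```python
from collections import Counter
from datetime import date


def commits_per_day(commit_dates):
    """Count how many commits fall on each calendar day."""
    return Counter(commit_dates)


def plot_calendar(daily_counts, out_path="commit_heatmap.png"):
    """Draw one calendar heatmap row per year (needs pandas and calmap)."""
    # Imported here so the counting step works without the plotting stack.
    import pandas as pd
    import calmap

    days = sorted(daily_counts)
    series = pd.Series(
        [daily_counts[day] for day in days],
        index=pd.to_datetime(days),  # calmap expects a DatetimeIndex
    )
    fig, _axes = calmap.calendarplot(series, how="sum", cmap="YlGn")
    fig.savefig(out_path, dpi=150)
    return out_path


# Example input: three saves on two different days.
example_counts = commits_per_day(
    [date(2021, 1, 1), date(2021, 1, 1), date(2021, 1, 2)]
)
```

Days with no commits are simply missing from the counter; calmap leaves those cells empty in the plot.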

Animated Moving Frame Commit Heatmaps all users 2015–2021

Some of the obvious things that jump out: there is noticeably less commit activity on weekends, and both the number of repositories and the total commits per day increase steadily over the years.
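The weekend dip can also be checked numerically rather than by eye. This is a small sketch under the assumption that the daily totals are available as a mapping from date to commit count; the helper name is hypothetical.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def mean_by_weekday(daily_counts):
    """Average commits per day of the week, over days that have commits."""
    totals = defaultdict(int)
    days_seen = defaultdict(int)
    for day, count in daily_counts.items():
        name = DAY_NAMES[day.weekday()]  # weekday(): Monday is 0
        totals[name] += count
        days_seen[name] += 1
    return {
        name: totals[name] / days_seen[name]
        for name in DAY_NAMES
        if days_seen[name]
    }
```

Note that days with zero commits are absent from the input, so the averages describe active days only; filling in zeros first would lower the weekend numbers further.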

Commit log (total commits per day for approximately 2.5 million commits)

Commit History vs Operations

While the commit history tells us when commits happen, operations like line additions and deletions shed light on how much actually changes in each commit. The code was modified to look at each operation and sum them by day, creating the same kind of calendar map for date and operations.

Since the number of operations per day can vary widely, the daily total was divided by 100 to help with scaling. So the actual operation counts are the values on the scale bar x 100.
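One way to get such per-day operation totals is from `git log --numstat`, which lists added and deleted line counts per file. The sketch below assumes the log was produced with `--pretty=format:@%aI --numstat`, so each commit starts with an `@`-prefixed date line; that marker and the function name are illustrative assumptions, not necessarily how the original script worked.

```python
from collections import defaultdict
from datetime import datetime

SCALE = 100  # divide daily totals by 100, as described above


def operations_per_day(numstat_text, scale=SCALE):
    """Sum added + deleted lines per day and divide by `scale`."""
    totals = defaultdict(int)
    current_day = None
    for line in numstat_text.splitlines():
        if line.startswith("@"):
            # Commit header: remember which day the following rows belong to.
            current_day = datetime.fromisoformat(line[1:].strip()).date()
            continue
        parts = line.split("\t")
        if current_day is None or len(parts) != 3:
            continue  # blank lines or anything that is not a numstat row
        added, deleted, _path = parts
        # Binary files are reported as "-" and are skipped here.
        if added.isdigit() and deleted.isdigit():
            totals[current_day] += int(added) + int(deleted)
    return {day: total / scale for day, total in totals.items()}
```

Multiplying a plotted value by 100 recovers the raw count of added plus deleted lines for that day.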

GEE operations log (addition and deletion counts per day/100 for approximately 7175 unique users)

Commit History, Commit count and Active Users

I see the next step as looking at the number of unique active users per day who are actually committing code. This counts, for each day through the years, the users who contributed code on that day, and it is a better representation of how the commits and operations are distributed across users.
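Counting unique active users per day comes down to collecting a set of authors for each date. A minimal sketch, assuming commits are available as (author, day) pairs; the names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def active_users_per_day(commits):
    """Count distinct authors per day from (author, day) pairs."""
    authors_by_day = defaultdict(set)
    for author, day in commits:
        authors_by_day[day].add(author)  # a set ignores repeat commits
    return {day: len(authors) for day, authors in authors_by_day.items()}


# Example: one user saving twice on the same day still counts once.
example_users = active_users_per_day(
    [
        ("a@example.com", date(2021, 11, 16)),
        ("a@example.com", date(2021, 11, 16)),
        ("b@example.com", date(2021, 11, 16)),
        ("a@example.com", date(2021, 11, 17)),
    ]
)
```

Because a user who saves fifty times in a day counts once, this view is less sensitive to individual saving habits than raw commit counts.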

User count per day for code commits and operations

Git Stories Recap

It seems I have gotten a bit rusty since my graduate days: my overall commits have gone down, but the number of lines per commit has remained large. Perhaps this also alludes to the fact that many users choose to create a saved-state version of their code to share, rather than sharing scripts in repositories.

It might also point to the fact that sharing an individual script inside a repo is currently not possible, which makes repositories less than optimal for sharing in many cases.

This was an exercise in data science, with the intention of exploring how and when the people I collaborate with, or whose code I have used over the years, write code, and of understanding the story just a bit better. I hope you enjoyed the short read as much as I enjoyed writing and creating this piece.

Samapriya Roy

Remote sensing applications, large scale data processing and management, API applications along with network analysis and geostatistical methods