Help rework code to prevent kernel crash? #166
-
When I run the code below, the kernel crashes after 15 minutes. The code gets temperature data for two adjacent MODIS tiles for the last 5 years and stitches them together, and as written it's hitting the Planetary Computer's memory and processing limits. If anyone could give me suggestions on how to rework it to prevent the kernel crash, I'd really appreciate it. Should I split the code into smaller functions? Break the data down into smaller chunks? Parallelize it? Can Dask help me in any way here? Here's the code:
And this is the error message I get in the Jupyter notebook:
-
Most likely you are hitting the memory limit of the notebook server node. The taskbar at the bottom should give you an indication of whether that's true.

If so, you'll want to find out where the memory usage is spiking. You can either do that manually, by stepping through your code line by line, or use a memory profiler. I've used https://pypi.org/project/memory-profiler/, and https://bloomberg.github.io/memray/ is supposed to be nice.

Once you've determined where the issue is, Dask might be able to help. I'm not sure whether the functions you're using work well with Dask arrays (i.e. without converting them to a single large NumPy array), but the memory profiler should help you figure that out.
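For example, a minimal profiling sketch (the function name and array sizes below are placeholders, since your actual code isn't shown here) could look like this:

```python
# Minimal sketch, not the actual code: `stitch_tiles` and the array sizes
# are placeholders. Decorating the suspected hot spot with @profile makes
# memory-profiler print per-line memory usage when the function runs.
import numpy as np
from memory_profiler import profile


@profile
def stitch_tiles(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    # Stand-in for the real stitching step: concatenating the two tiles
    # holds both inputs plus the combined output in memory at once.
    return np.concatenate([left, right], axis=-1)


if __name__ == "__main__":
    left = np.zeros((10, 1200, 1200), dtype="float32")   # placeholder tile stack
    right = np.zeros((10, 1200, 1200), dtype="float32")
    stitch_tiles(left, right)
```

Running it as `python -m memory_profiler your_script.py` (or `mprof run` followed by `mprof plot` for a memory-over-time chart) shows which lines spike; in a notebook, `%load_ext memory_profiler` plus `%memit` on a single call does something similar. memray works the same way from the command line with `memray run` and `memray flamegraph`.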
-
When I had a similar problem, I fixed it either by lowering the resolution so the reads use overviews (which shrinks the memory footprint) or by making the chunks smaller so each task uses less memory.
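As a rough sketch of what that can look like with odc-stac against the Planetary Computer STAC API (the collection, band name, bbox, and resolution below are placeholder assumptions, not taken from the original code):

```python
# Rough sketch: load both MODIS tiles lazily as one Dask-backed dataset,
# at a coarser-than-native resolution and in modest chunks, then reduce.
import odc.stac
import planetary_computer
import pystac_client

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

items = catalog.search(
    collections=["modis-11A2-061"],       # placeholder: 8-day LST product
    datetime="2018-01-01/2022-12-31",     # last 5 years
    bbox=[-105.0, 35.0, -95.0, 45.0],     # placeholder area spanning both tiles
).item_collection()

ds = odc.stac.load(
    items,
    bands=["LST_Day_1km"],                # placeholder band name
    resolution=2000,                      # coarser than native ~1 km -> smaller arrays
    chunks={"x": 1024, "y": 1024},        # Dask chunks keep each task's memory small
)

# Nothing is in memory yet; .compute() triggers chunk-by-chunk work.
mean_lst = ds["LST_Day_1km"].mean(dim=["x", "y"]).compute()
```

Loading at a coarser-than-native resolution lets GDAL read from the COG overviews where they exist, and the explicit chunking means Dask only materializes one small piece of the stitched mosaic at a time instead of the whole 5-year stack.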