Some columns are captured as JSON strings rather than flat table values, and decoding that JSON takes real effort. The code below helps split such a column into new features with Python.
import json
import pandas as pd

extracted_event_data = pd.io.json.json_normalize(train.event_data.apply(json.loads))
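As a self-contained sketch (the `train` DataFrame and its `event_data` column come from the competition data; the two rows below are made up for illustration), `json_normalize` turns each parsed dict into columns, joining nested keys with dots:

```python
import json
import pandas as pd

# Hypothetical stand-in for the competition's train DataFrame.
train = pd.DataFrame({
    "event_data": [
        '{"level": 1, "coords": {"x": 10, "y": 20}}',
        '{"level": 2, "coords": {"x": 5, "y": 7}}',
    ]
})

# Parse each JSON string, then expand the resulting dicts into columns.
extracted = pd.json_normalize(train["event_data"].apply(json.loads).tolist())
print(sorted(extracted.columns))  # → ['coords.x', 'coords.y', 'level']
```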
def flatten_json(y):
    """Flatten nested dicts/lists into one dict with underscore-joined keys."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

from pandas.io.json import json_normalize
flat = flatten_json(che)
pd.set_option('display.max_colwidth', None)  # -1 is deprecated; None shows full column contents
json_normalize(flat)
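Since `che` is defined elsewhere in the notebook, here is a quick check of `flatten_json` (same function as above, repeated so the snippet runs on its own) against a made-up record, showing how nested dicts and list items collapse into underscore-joined keys:

```python
def flatten_json(y):
    # Same logic as above: recurse into dicts and lists,
    # building underscore-joined key names along the way.
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

# Hypothetical record for illustration only.
record = {"user": {"id": 7, "tags": ["a", "b"]}, "score": 3}
print(flatten_json(record))
# → {'user_id': 7, 'user_tags_0': 'a', 'user_tags_1': 'b', 'score': 3}
```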
For a sample of 100K rows, this code runs in ~12 seconds in a Kaggle Kernel, producing a DataFrame with 136 columns. At that rate, processing the whole train_df will take ~20 minutes.
