Some columns are captured as JSON strings rather than flat table values, and decoding that JSON takes real effort. The code below helps split such a column into new features with Python.
import json
import pandas as pd

extracted_event_data = pd.io.json.json_normalize(train.event_data.apply(json.loads))
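As a self-contained sketch (the `train` DataFrame and its `event_data` column come from the competition data; the two rows below are made up for illustration), `json_normalize` turns each parsed dict into columns, joining nested keys with dots:

```python
import json
import pandas as pd

# Hypothetical stand-in for the competition's train DataFrame.
train = pd.DataFrame({
    "event_data": [
        '{"level": 1, "coords": {"x": 10, "y": 20}}',
        '{"level": 2, "coords": {"x": 5, "y": 7}}',
    ]
})

# Parse each JSON string, then expand the resulting dicts into columns.
extracted = pd.json_normalize(train["event_data"].apply(json.loads).tolist())
print(sorted(extracted.columns))  # → ['coords.x', 'coords.y', 'level']
```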
def flatten_json(y):
    """Flatten nested dicts/lists into one dict with underscore-joined keys."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

from pandas.io.json import json_normalize
flat = flatten_json(che)
pd.set_option('display.max_colwidth', None)  # -1 is deprecated; None shows full column contents
json_normalize(flat)
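Since `che` is defined elsewhere in the notebook, here is a quick check of `flatten_json` (same function as above, repeated so the snippet runs on its own) against a made-up record, showing how nested dicts and list items collapse into underscore-joined keys:

```python
def flatten_json(y):
    # Same logic as above: recurse into dicts and lists,
    # building underscore-joined key names along the way.
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

# Hypothetical record for illustration only.
record = {"user": {"id": 7, "tags": ["a", "b"]}, "score": 3}
print(flatten_json(record))
# → {'user_id': 7, 'user_tags_0': 'a', 'user_tags_1': 'b', 'score': 3}
```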
For a sample of 100K rows, this code runs in ~12 seconds in a Kaggle Kernel, producing a DataFrame with 136 columns. At that rate, processing the whole train_df will take ~20 minutes.
