In this article, we'll explore how to effectively manipulate JSON data structures with nested objects using pandas.
Consider the following JSON structure:
<code class="json">{
  "number": "",
  "date": "01.10.2016",
  "name": "R 3932",
  "locations": [
    {
      "depTimeDiffMin": "0",
      "name": "Spital am Pyhrn Bahnhof",
      "arrTime": "",
      "depTime": "06:32",
      "platform": "2",
      "stationIdx": "0",
      "arrTimeDiffMin": "",
      "track": "R 3932"
    },
    {
      "depTimeDiffMin": "0",
      "name": "Windischgarsten Bahnhof",
      "arrTime": "06:37",
      "depTime": "06:40",
      "platform": "2",
      "stationIdx": "1",
      "arrTimeDiffMin": "1",
      "track": ""
    },
    {
      "depTimeDiffMin": "",
      "name": "Linz/Donau Hbf",
      "arrTime": "08:24",
      "depTime": "",
      "platform": "1A-B",
      "stationIdx": "22",
      "arrTimeDiffMin": "1",
      "track": ""
    }
  ]
}</code>
pandas' json_normalize function allows us to flatten nested objects into a tabular format:
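<code class="python">import json

import pandas as pd

# Load the raw JSON into a Python dict
with open('myJson.json') as data_file:
    data = json.load(data_file)

# One row per entry in "locations"; the top-level fields are carried along as metadata
df = pd.json_normalize(
    data,
    record_path='locations',
    meta=['date', 'number', 'name'],
    record_prefix='locations_',
)</code>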
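This results in a DataFrame with one row per entry in "locations". Each key of the nested objects becomes a column prefixed with locations_, which keeps the nested "name" from colliding with the top-level "name", and the top-level "date", "number", and "name" values are repeated on every row.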
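To make the resulting shape concrete, here is a minimal sketch that runs the same call on a shortened inline copy of the data. The `data` dict below is an assumption standing in for the contents of myJson.json, trimmed to two stations and three keys per station.

```python
import pandas as pd

# Shortened inline stand-in for myJson.json (illustrative, not the full file)
data = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof", "depTime": "06:32", "stationIdx": "0"},
        {"name": "Windischgarsten Bahnhof", "depTime": "06:40", "stationIdx": "1"},
    ],
}

# Flatten: one row per location, top-level fields repeated as metadata
df = pd.json_normalize(
    data,
    record_path="locations",
    meta=["date", "number", "name"],
    record_prefix="locations_",
)

# Inspect the column names and the number of rows
print(df.columns.tolist())
print(len(df))
```

The printed column list should contain the three prefixed location columns alongside "date", "number", and "name", with one row for each station.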
If you would rather end up with a single row per top-level record instead of one row per location, you can combine read_json with pandas' grouping and string concatenation:
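<code class="python">import pandas as pd

# read_json yields one row per entry in "locations", repeating the scalar fields
df = pd.read_json("myJson.json")

# Replace each location dict with just its "name"
df['locations'] = pd.DataFrame(df['locations'].values.tolist())['name']

# Join the location names into one string per (date, name, number) group
df = df.groupby(['date', 'name', 'number'])['locations'].apply(','.join).reset_index()</code>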
This approach concatenates the "locations" values as a comma-separated string for each unique combination of "date", "name", and "number".
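The same comma-separated summary can also be sketched on top of json_normalize instead of read_json. One possible reason to prefer this is that read_json may, by default, try to parse a column named "date" as dates, whereas json_normalize leaves the strings as they are. The inline `data` dict and the names `flat` and `summary` are assumptions for illustration only.

```python
import pandas as pd

# Shortened inline stand-in for myJson.json (illustrative, not the full file)
data = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Spital am Pyhrn Bahnhof", "stationIdx": "0"},
        {"name": "Windischgarsten Bahnhof", "stationIdx": "1"},
        {"name": "Linz/Donau Hbf", "stationIdx": "22"},
    ],
}

# Step 1: flatten to one row per location
flat = pd.json_normalize(
    data,
    record_path="locations",
    meta=["date", "number", "name"],
    record_prefix="locations_",
)

# Step 2: collapse back to one row per (date, name, number),
# joining the station names in their original order
summary = (
    flat.groupby(["date", "name", "number"])["locations_name"]
    .agg(",".join)
    .reset_index()
    .rename(columns={"locations_name": "locations"})
)

print(summary)
```

Because the grouping keys are plain strings here, the "date" value stays exactly as it appears in the JSON.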
In short, json_normalize flattens nested JSON into one row per nested element, while grouping and concatenation collapse those elements into a single summary row. Together they cover the common cases of extracting nested data into a tabular format.