Reading Nested JSON Files as Pandas DataFrames
When working with JSON data that contains nested objects or arrays, you often need to convert it into a flat, tabular format before you can analyze or manipulate it. Pandas provides tools for handling such data.
Scenario:
Consider a JSON file with the following structure:
<code class="json">{
  "number": "",
  "date": "01.10.2016",
  "name": "R 3932",
  "locations": [ { ... }, { ... }, { ... } ]
}</code>
Using json_normalize:
The json_normalize function allows you to flatten nested JSON into a DataFrame. For the given JSON, you can do the following:
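<code class="python">import json

import pandas as pd

# Load the raw JSON into a Python dictionary
with open('myJson.json') as data_file:
    data = json.load(data_file)

# One row per entry in "locations"; date, number and name are carried along as metadata
df = pd.json_normalize(data, 'locations', ['date', 'number', 'name'], record_prefix='locations_')
print(df)</code>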
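This creates a DataFrame with one row per entry in locations. Each field of a location object becomes a column prefixed with locations_, followed by the metadata columns date, number, and name, whose values are repeated on every row.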
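To make the resulting layout concrete, here is a minimal sketch that runs json_normalize on an in-memory dictionary instead of a file. The fields inside each location ("name" and "id") are assumptions made for illustration, since the original structure elides them.

```python
import pandas as pd

# Hypothetical sample data; the "name" and "id" fields of each location are assumed
data = {
    "number": "",
    "date": "01.10.2016",
    "name": "R 3932",
    "locations": [
        {"name": "Station A", "id": 1},
        {"name": "Station B", "id": 2},
        {"name": "Station C", "id": 3},
    ],
}

# record_path="locations" produces one row per location;
# meta lists the top-level fields to repeat on every row
df = pd.json_normalize(
    data,
    record_path="locations",
    meta=["date", "number", "name"],
    record_prefix="locations_",
)

print(df.columns.tolist())
```

In this sample the record_prefix is more than cosmetic: both the top-level object and each location have a "name" field, and without a prefix json_normalize raises a ValueError about the conflicting metadata name.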
Extending to Keep Nested Data:
If you prefer to keep the nested objects intact, you can call read_json directly on the file; no extra parsing parameter is needed. Pandas repeats the scalar fields and produces one row per element of locations, with each cell of the locations column holding one dictionary.
<code class="python">df = pd.read_json("myJson.json")</code>
Alternatively, if you only need the contents of the locations array, pass the list to the DataFrame constructor. Here, data is the dictionary loaded with json.load in the json_normalize example:
<code class="python">df = pd.DataFrame(data['locations'])</code>
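The following sketch shows what keeping the nested data looks like in practice, using a plain read_json call on an in-memory string rather than a file. The location fields ("name" and "id") are assumptions for illustration, and convert_dates=False is passed only so that the date stays the original string.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; the "name" and "id" fields of each location are assumed
raw = """
{
  "number": "",
  "date": "01.10.2016",
  "name": "R 3932",
  "locations": [
    {"name": "Station A", "id": 1},
    {"name": "Station B", "id": 2},
    {"name": "Station C", "id": 3}
  ]
}
"""

# convert_dates=False keeps "date" as the original string instead of parsing it
df = pd.read_json(StringIO(raw), convert_dates=False)

# Each cell of "locations" is still a dictionary
first = df["locations"].iloc[0]

# Expand the dictionaries into their own DataFrame only when needed
locations = pd.DataFrame(df["locations"].tolist())
```

Keeping the dictionaries in one column is convenient when you mostly work with the top-level fields; expanding them with the DataFrame constructor is the better fit when the location fields themselves are what you want to analyze.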
Concatenating Nested Values:
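If you want to join the values in the locations column into a single string per record, you can use the groupby and apply functions. Because ','.join only accepts strings, the locations column must first be reduced to one string per row, for example by taking a single field from each dictionary: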
<code class="python">df = df.groupby(['date', 'name', 'number'])['locations'].apply(','.join).reset_index()</code>
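As a worked illustration of the join, the sketch below builds a small DataFrame by hand, picks one string field out of each location dictionary, and then concatenates those strings per group. The "name" field inside each location is hypothetical; substitute whichever field your data actually contains.

```python
import pandas as pd

# Hypothetical frame with one dictionary per row in "locations"
df = pd.DataFrame({
    "date": ["01.10.2016"] * 3,
    "name": ["R 3932"] * 3,
    "number": [""] * 3,
    "locations": [{"name": "A"}, {"name": "B"}, {"name": "C"}],
})

# ",".join needs strings, so keep only the (assumed) "name" field of each location
df["locations"] = df["locations"].apply(lambda loc: loc["name"])

# Collapse all rows that share date, name and number into one comma-separated string
joined = (
    df.groupby(["date", "name", "number"])["locations"]
    .apply(",".join)
    .reset_index()
)

print(joined)
```

Skipping the extraction step and joining the dictionaries directly would raise a TypeError, because str.join rejects non-string items.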