A Detailed Guide to Handling JSON Data with pandas
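1. Reading JSON data with pandas
pandas reads JSON data with the read_json() function, which accepts a path to a JSON file, a URL that returns JSON, or a file-like object. A JSON string can also be read by wrapping it in io.StringIO; passing a raw JSON string directly is deprecated as of pandas 2.1.
Example 1: reading a JSON file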
import pandas as pd
# Read example.json from the current directory
df = pd.read_json("./example.json")
print(df)
The output is as follows:
         name  age      city
0  Tony Stark   45  New York
1   Steve Job   35  San Fran
2         NaN   78    London
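The contents of example.json are not shown above. As a minimal sketch, assuming the file holds a list of records matching the printed output, the following writes such a file to a temporary directory and reads it back. The record values and the temporary location are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Assumed file contents, inferred from the output shown above.
rows = [
    {"name": "Tony Stark", "age": 45, "city": "New York"},
    {"name": "Steve Job", "age": 35, "city": "San Fran"},
    {"name": None, "age": 78, "city": "London"},
]

with tempfile.TemporaryDirectory() as tmp:
    # Write the records as a JSON array, then load them with read_json().
    path = Path(tmp) / "example.json"
    path.write_text(json.dumps(rows), encoding="utf-8")
    df = pd.read_json(path)

print(df)
```

The JSON null in the third record is loaded as a missing value, which can be checked with df["name"].isna().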
Example 2: reading JSON returned by a URL
import pandas as pd
# Fetch JSON from the given URL and load it into a DataFrame
url = "http://example.com/data.json"
df = pd.read_json(url)
print(df)
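The introduction also mentions JSON strings as an input. A minimal sketch of that case follows, assuming a small inline string (json_text is a name chosen for illustration). The string is wrapped in io.StringIO so that read_json() receives a file-like object, which avoids the deprecation warning that recent pandas versions raise for raw string input.

```python
from io import StringIO

import pandas as pd

# Illustrative JSON text; in practice this might come from an API response body.
json_text = (
    '[{"name": "Tony Stark", "age": 45, "city": "New York"},'
    ' {"name": "Steve Job", "age": 35, "city": "San Fran"}]'
)

# Wrap the string in StringIO so read_json() treats it as a file-like object.
df = pd.read_json(StringIO(json_text))
print(df)
```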
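2. Working with JSON in pandas
2.1 json_normalize()
pandas provides the json_normalize() function, which flattens nested JSON data into a tabular DataFrame so that it is easier to process. In the example below, record_path selects the nested list to expand into rows, and meta lists the top-level fields to repeat on each of those rows.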
import pandas as pd
# Define the nested JSON data to normalize
data = [
{"name": "Tony Stark", "age": 45, "city": "New York", "phone_numbers": [{"type": "home", "number": "1234567"}, {"type": "office", "number": "2345678"}]},
{"name": "Steve Job", "age": 35, "city": "San Fran", "phone_numbers": [{"type": "home", "number": "3456789"}, {"type": "office", "number": "4567890"}]},
]
# Flatten the nested data: one row per phone number, with name, age and city repeated
df = pd.json_normalize(data, record_path=["phone_numbers"], meta=["name", "age", "city"])
print(df)
The output is as follows:
     type   number        name  age      city
0    home  1234567  Tony Stark   45  New York
1  office  2345678  Tony Stark   45  New York
2    home  3456789   Steve Job   35  San Fran
3  office  4567890   Steve Job   35  San Fran
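json_normalize() also flattens nested objects (dicts), not only nested lists. The sketch below uses hypothetical records with an address sub-object, which is not part of the example above. Nested keys become column names joined with a dot by default, and the sep parameter changes the separator.

```python
import pandas as pd

# Hypothetical records with a nested "address" object.
records = [
    {"name": "Tony Stark", "address": {"city": "New York", "zip": "10001"}},
    {"name": "Steve Job", "address": {"city": "San Fran", "zip": "94101"}},
]

# Default: nested keys are joined with "." to form column names.
flat = pd.json_normalize(records)
print(flat.columns.tolist())

# The sep parameter selects a different separator.
flat_underscore = pd.json_normalize(records, sep="_")
print(flat_underscore.columns.tolist())
```

An underscore separator can be convenient when columns are later accessed as attributes, since names containing a dot cannot be used that way.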
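2.2 json.loads()
To work with a JSON string, use json.loads() from the standard library to convert it into Python objects (lists and dicts), which can then be passed to pandas.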
import pandas as pd
import json
# Define the JSON string
json_data = '''
[
{"name": "Tony Stark", "age": 45, "city": "New York"},
{"name": "Steve Job", "age": 35, "city": "San Fran"},
{"name": null, "age": 78, "city": "London"}
]
'''
# Parse the JSON string into Python objects
data = json.loads(json_data)
# Normalize the Python objects into a DataFrame
df = pd.json_normalize(data)
print(df)
The output is as follows:
         name  age      city
0  Tony Stark   45  New York
1   Steve Job   35  San Fran
2         NaN   78    London
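One situation where json.loads() is useful is when the records sit inside a wrapper object, so that only part of the parsed structure should become a DataFrame. The sketch below assumes a hypothetical payload with "status" and "results" keys; these names are illustrative and do not come from any particular API.

```python
import json

import pandas as pd

# Hypothetical response body: the records are nested under "results".
payload = (
    '{"status": "ok", "results": ['
    '{"name": "Tony Stark", "age": 45}, '
    '{"name": "Steve Job", "age": 35}]}'
)

# Parse the whole document, then pass only the list of records to pandas.
obj = json.loads(payload)
df = pd.json_normalize(obj["results"])
print(df)
```

Parsing first keeps the wrapper fields, such as obj["status"], available for checks before the DataFrame is built.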