Question: You have a text file with a specific structure, and you need to build a Pandas DataFrame from it. The file follows this pattern:
Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Livingston (University of West Alabama)[2]
Montevallo (University of Montevallo)[2]
Troy (Troy University)[2]
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]
Tuskegee (Tuskegee University)[5]
...
<State>[edit]
<Region Name 1>
<Region Name 2>
...
Each region row should carry the name of the state it belongs to.
Solution:
<code class="python">import pandas as pd

# Read each line of the file into a single 'Region Name' column.
# sep=';' keeps lines whole, assuming no line contains a semicolon.
df = pd.read_csv('filename.txt', sep=';', names=['Region Name'])

# Rows containing '[edit]' are state headers
is_state = df['Region Name'].str.contains(r'\[edit\]')

# Keep the state name on header rows only, strip '[edit]',
# then forward-fill it down to the region rows that follow
state = (df['Region Name'].where(is_state)
         .str.replace(r'\[edit\]', '', regex=True)
         .ffill())
df.insert(0, 'State', state)

# Drop the state header rows, leaving the columns State and Region Name
df = df[~is_state].reset_index(drop=True)

# Strip the parenthesized university list and footnote markers such as '[2]'
df['Region Name'] = df['Region Name'].str.replace(r'\s*[\(\[].*$', '', regex=True)

print(df)</code>
The resulting DataFrame looks like this:
      State   Region Name
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
11  Arizona        Tucson
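The same result can also be built by walking the lines directly, which avoids relying on a separator character that never appears in the file. The following is a minimal sketch of that alternative, not the method shown in the solution: `parse_regions` is a hypothetical helper name, and the in-memory sample stands in for the real file (the Alaska line is illustrative, since the pattern in the question elides it).

```python
import io
import re

import pandas as pd


def parse_regions(lines):
    """Build a State / Region Name frame from lines in the '<State>[edit]' layout."""
    rows = []
    state = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith('[edit]'):
            # State header: remember it for the region rows that follow
            state = line[:-len('[edit]')].strip()
        else:
            # Region row: drop everything from the first '(' or '[' onward
            region = re.sub(r'\s*[\(\[].*$', '', line)
            rows.append((state, region))
    return pd.DataFrame(rows, columns=['State', 'Region Name'])


# Illustrative sample; with a real file, pass an open file handle instead
sample = io.StringIO(
    "Alabama[edit]\n"
    "Auburn (Auburn University)[1]\n"
    "Florence (University of North Alabama)\n"
    "Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]\n"
    "Alaska[edit]\n"
    "Fairbanks (University of Alaska Fairbanks)[2]\n"
)
df = parse_regions(sample)
print(df)
```

With a file on disk, something like `with open('filename.txt') as f: df = parse_regions(f)` should work the same way, since the function only needs an iterable of lines.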