CSV - Process Local & Remote Files in Python

WBOY
Release: 2024-08-28 18:30:37
Original
303 people have browsed it

CSV - Process Local & Remote Files in Python

Hello coders!

This article presents anopen-source tool that is able to process local and remote CSV files, load and print the information, and later map the column to Django Types. Processing CSV files is usually required when the dataset becomes large, custom reports are unsupported by Excel or full data manipulation via data tables, and API is needed.
The current list of features can be further extended to map CSV files to database tables/models and fully generate dashboard web apps.

Source Code: CSV Processor part of the AppSeed Service (open-source)


Before we start explaining the code and the usage, let's summarize the tool features:

  • load local and remote files
  • print values
  • print detected column types
  • print the mapping types to a Django Model

The CSV parser can be executed via the CLI after cloning the project sources and making it usable as explained in the README. Once the installation is completed, we can call the CVS processor using this one-liner:

$ python manage.py tool_inspect_source -f media/tool_inspect/csv_inspect.json
Copy after login

The tool performs the following tasks:

  • validate the input
  • locate the CSV file (exit with error if not found)
  • loads the information and detects the column types
  • detects the Django column type
  • print the first 10 rows

The same can be applied to local and remote files. For instance, we can analyze the notorious Titanic.cvs by running this one-liner:

$ python manage.py tool_inspect_source -f media/tool_inspect/csv_inspect_distant.json # Output > Processing .\media\tool_inspect\csv_inspect_distant.json |-- file: https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv |-- type: csv Field CSV Type Django Types ----------- ---------- ------------------------------------------ PassengerId int64 models.IntegerField(blank=True, null=True) Survived int64 models.IntegerField(blank=True, null=True) Pclass int64 models.IntegerField(blank=True, null=True) Name object models.TextField(blank=True, null=True) Sex object models.TextField(blank=True, null=True) Age float64 models.FloatField(blank=True, null=True) SibSp int64 models.IntegerField(blank=True, null=True) Parch int64 models.IntegerField(blank=True, null=True) Ticket object models.TextField(blank=True, null=True) Fare float64 models.FloatField(blank=True, null=True) Cabin object models.TextField(blank=True, null=True) Embarked object models.TextField(blank=True, null=True) [1] - PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked [2] - 1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S [3] - 2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C [4] - 3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S [5] - 4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S [6] - 5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S [7] - 6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q [8] - 7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S [9] - 8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S [10] - 9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S ... (truncated output)
Copy after login

Here are the relevant parts of the tool:

Loads the informationand prior checks the source if is local or remote

print( '> Processing ' + ARG_JSON ) print( ' |-- file: ' + JSON_DATA['source'] ) print( ' |-- type: ' + JSON_DATA['type' ] ) print( '\n') tmp_file_path = None if 'http' in JSON_DATA['source']: url = JSON_DATA['source'] r = requests.get(url) tmp_file = h_random_ascii( 8 ) + '.csv' tmp_file_path = os.path.join( DIR_TMP, tmp_file ) if not file_write(tmp_file_path, r.text ): return JSON_DATA['source'] = tmp_file_path else: if not file_exists( JSON_DATA['source'] ): print( ' > Err loading SOURCE: ' + JSON_DATA['source'] ) return csv_types = parse_csv( JSON_DATA['source'] )
Copy after login

Analyze the headersand map the detected types to Django Types.

For the tabular view, Tabulate Library is used:

csv_types = parse_csv( JSON_DATA['source'] ) #pprint.pp ( csv_types ) table_headers = ['Field', 'CSV Type', 'Django Types'] table_rows = [] for t in csv_types: t_type = csv_types[t]['type'] t_type_django = django_fields[ t_type ] table_rows.append( [t, t_type, t_type_django] ) print(tabulate(table_rows, table_headers))
Copy after login

The last step is toPrint the CSV data:

csv_data = load_csv_data( JSON_DATA['source'] ) idx = 0 for l in csv_data: idx += 1 print( '['+str(idx)+'] - ' + str(l) ) # Truncate output .. if idx == 10: print( ' ... (truncated output) ' ) break
Copy after login

At this point, the code provides us access to the CSV information, data types, and the correspondent data types for Django. The mapping can be easily extended for any framework like Flask, Express, or NextJS.

The type mapping for Django is this one:

# Pandas Type django_fields = { 'int' : 'models.IntegerField(blank=True, null=True)', 'integer' : 'models.IntegerField(blank=True, null=True)', 'string' : "models.TextField(blank=True, null=True)", 'string_unique' : "models.TextField(blank=True, null=False, unique=True)", 'object' : "models.TextField(blank=True, null=True)", 'object_unique' : "models.TextField(blank=True, null=False, unique=True)", 'int64' : 'models.IntegerField(blank=True, null=True)', 'float64' : 'models.FloatField(blank=True, null=True)', 'bool' : 'models.BooleanField(null=True)', }
Copy after login

This tool is under active development and here are the next steps:

  • connect the tool to more data sources like remote/local databases (SQLite, MySql, PgSQL), JSON
  • Generate Models for any framework: FastAPI, Flask, Express, NextJS
  • Generate secure APIs on top
  • Generate server-side paginated DataTables using Tailwind/Bootstrap for styling

Thanks for reading!

For those interested in contributing, feel free to join the new AppSeed platform and connect with the community on Discord:

  • AppSeed - open-source platform for developers
  • AppSeed Community - 3k+ Discord members

The above is the detailed content of CSV - Process Local & Remote Files in Python. For more information, please follow other related articles on the PHP Chinese website!

source:dev.to
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template
About us Disclaimer Sitemap
php.cn:Public welfare online PHP training,Help PHP learners grow quickly!