Since LLMs hit the scene, data analysis has been one of the very first use cases and demos. At this stage, most of us have used ChatGPT, Claude or some other AI to generate a chart, but it feels like the jury is still out on the role AI will play in data visualization. Will we continue to default to point-and-click charting? Will AI generate 100% of charts? Or is the future hybrid, intermixing some AI generation and some point and click?
As a founder in the AI and data visualization space, I find this topic almost existential. Because our company was founded post-2022 (i.e., after LLMs hit the scene in a real way), we have to make a decision about how we want to handle charting. Do we invest hours and hours of dev work (and funds) to develop charting functionality, or is that functionality going away, leaving it a sunk cost for every tool built pre-LLM? Or is the future hybrid? I recently came across Data Formulator, a research project that explores some really interesting interactions between AI and traditional charting, and it revived this question for me.
In this post I’m going to take a look at where we are today for text-to-chart (or text-to-visualization) and where we might be headed in the future.
Like all things AI, this post likely won’t age very well. Some new piece of information or model will come out in the next 6 months and completely change how we think about this topic. Nonetheless, let’s take a look at the various states of data visualization and AI.
I won’t linger on point-and-click charting too much since most readers know it well. Open up Excel, Google Sheets or any other data tool built pre-2023 and you’ll have some form of it. Sometimes you click to add data to an axis, sometimes you drag and drop a field, but the concept is the same: you structure the data appropriately, then you press a few buttons to generate a chart.
In this paradigm, the vast majority of data cleaning and transformation happens prior to the charting. You can generally apply aggregation metrics like average, median, count, min, max etc. but all transformations are fairly rudimentary.
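For reference, the aggregations a point-and-click tool typically exposes map to roughly one line of pandas. Here is a minimal sketch, assuming a made-up sales table (the column names and values are invented purely for illustration):

```python
import pandas as pd

# Hypothetical data, invented purely for illustration
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West"],
        "amount": [100.0, 300.0, 50.0, 150.0, 250.0],
    }
)

# The same aggregations a chart builder usually offers in a dropdown
summary = sales.groupby("region")["amount"].agg(
    ["mean", "median", "count", "min", "max"]
)
print(summary)
```

Anything beyond this, such as deriving a new column or reshaping the table, usually has to happen before the data reaches the chart builder.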
AI-generated charting, or text-to-visualization, has only really existed since the advent of modern LLMs (if we dig around, there were experiments going on before then, but for all practical purposes we can focus on post-2022 LLMs).
OpenAI’s ChatGPT can generate non-interactive charts using Python, or a limited set of interactive charts using front-end libraries (see OpenAI Canvas for some examples). As with all things OpenAI, Anthropic has its own analogous concept: Artifacts.
It’s worth noting here that AI-generated charts can be subdivided into two families: charts generated purely in Python on the back end, or charts generated with a mix of back end and front end.
ChatGPT and Claude alternate between the two. Training an AI to generate front-end code and integrating that code to create visualizations can be a lot more work than just relying on Python with a library such as Plotly, Matplotlib or seaborn. On the other hand, front-end libraries give providers and users more control over the chart's look and feel and its interactivity. This is why LLM providers have their AI generate basic charts like bar charts, line charts or scatter plots with front-end code, while anything more sophisticated, like a Sankey diagram or waterfall chart, falls back to Python.
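One way to picture the "mix of back end and front end" family: the back end returns a declarative chart description as JSON, and a front-end library renders it. The sketch below is my own assumption about what such a handoff could look like, not how ChatGPT or Claude actually implement it. The `build_bar_spec` helper and the spec's field names are hypothetical.

```python
import json


def build_bar_spec(title, categories, values):
    """Return a hypothetical, library-agnostic bar chart description."""
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    return {
        "type": "bar",
        "title": title,
        "x": list(categories),
        "y": list(values),
    }


# Made-up numbers, for illustration only
spec = build_bar_spec("Signups by plan", ["Free", "Pro", "Team"], [120, 45, 12])

# The serialized spec is what the back end would hand to the browser
payload = json.dumps(spec)
print(payload)
```

The trade-off shows up quickly: every chart type the front end supports needs its own spec shape and renderer, whereas a pure Python approach gets whatever the plotting library already supports.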
A brief sidebar on Fabi.ai: Seeing as we’re a data analysis platform, we obviously offer charting, and although we offer some point-and-click charting, the vast majority of charts created by our users are AI-generated. We’ve found that AI is remarkably good at generating charts, and by leveraging pure Python for charting, we’ve been able to train the AI to generate nearly any chart the user can dream up. So far, we’ve prioritized that accuracy and flexibility over point-and-click functionality and custom UI designs.
Hybrid: AI generation in a point-and-click paradigm
This is where things start to get interesting in the debate over where AI text-to-visualization is headed. Fast forward 3 years: when someone doing an analysis uses AI, will they let the AI take 100% control, or will the AI be used in a mixed environment where it can only edit charts within the confines of certain point-and-click functionality?
To help make this picture more concrete, check out Data Formulator. This is a recent research project that attempts to offer a true mixed environment where AI can make certain edits, but the user can take over and use the point-and-click functionality as needed.
If we ask the question using a car analogy: Do you believe that in the future cars will not have a steering wheel, or do you believe that there will be a driver who will have to sit there and pay attention and occasionally take over, similar to how the Tesla self-driving functionality currently works?
The question of where things are headed is really important to us at Fabi.ai seeing as this could greatly influence certain decisions we make: Do we invest in integrating a charting library in the front end? Do we even bother with point-and-click functionality at all? As a growing, innovative company leading in the AI data analysis space, we need to be thinking about where the puck is going, not where it currently is.
So to answer this question, I’m going to use some first-principles thinking.
Ever since I first used AI, and the first complaints arose about its speed and cost, I’ve believed that AI was going to continue getting better, faster and cheaper. Roughly speaking, the cost per token has fallen by 87% per year in the past few years. Not only has the cost gone down, but accuracy and speed have both gone up drastically as well.
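To get a feel for how quickly that compounds, here is a small back-of-the-envelope sketch. It takes the rough 87% figure above at face value and assumes the rate stays constant from year to year, which is a simplification.

```python
def relative_cost(years, annual_drop=0.87):
    """Fraction of the original cost remaining after `years` of compounding."""
    return (1.0 - annual_drop) ** years


for years in (1, 2, 3):
    # Print the remaining cost as a percentage of the starting price
    print(years, f"{relative_cost(years):.4%}")
```

At that rate, three years of compounding leaves the price at roughly 0.2% of where it started.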
In the next 10 years, we’re going to look back on 2024 LLMs the same way we look back on “supercomputers” from the 80s and 90s now that we all have supercomputers in our pockets everywhere we go.
All that to say, no argument for or against any of the charting approaches mentioned above can rest on AI being too slow, expensive or inaccurate to generate charts. In other words, to believe that point-and-click charting will still exist in any way, shape or form, you have to believe that there is something about the user experience or the use case that merits that functionality.
In my experience, when doing any form of data analysis that involves visualization, the hard part is not the charting. The hard part is getting the data cleaned and ready in the right format for the chart I’m trying to create.
Say I have some user event data that has the following fields:
Now say I want to plot the average event duration by hour to measure latency. Before I can do any sort of charting in a spreadsheet or legacy charting tool, I have to:
But when I ask AI to do this, it takes care of all of that and the charting in just a second or two:

    import plotly.express as px

    # Assumes df is a pandas DataFrame whose two datetime columns
    # are already parsed as datetimes

    # Calculate the event duration in hours
    df['Event duration (hours)'] = (
        df['Event end datetime'] - df['Event start datetime']
    ).dt.total_seconds() / 3600

    # Extract the start hour from the start datetime
    df['Start hour'] = df['Event start datetime'].dt.hour

    # Group by start hour and calculate the average duration
    average_duration_by_hour = (
        df.groupby('Start hour')['Event duration (hours)'].mean().reset_index()
    )

    # Plot using Plotly
    fig = px.bar(
        average_duration_by_hour,
        x='Start hour',
        y='Event duration (hours)',
        title='Average Event Duration by Hour',
        labels={
            'Event duration (hours)': 'Average Duration (hours)',
            'Start hour': 'Hour of Day',
        },
        text='Event duration (hours)',
    )

    # Show the figure
    fig.show()

And this was one of the simplest possible examples. Most of the time, real-world data is much more complicated.
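To make "more complicated" concrete, here is a minimal sketch of the extra preparation a slightly messier export might need before the same aggregation works. The sample rows are invented, and the cleaning rules (coercing bad timestamps, dropping missing or negative durations) are my assumptions about what a reasonable cleanup could look like, not a universal recipe.

```python
import pandas as pd

# Hypothetical raw export: timestamps stored as strings, one missing end
# time, and one row where the end precedes the start
raw = pd.DataFrame(
    {
        "Event start datetime": [
            "2024-01-01 09:05:00",
            "2024-01-01 09:40:00",
            "2024-01-01 10:15:00",
            "2024-01-01 10:30:00",
        ],
        "Event end datetime": [
            "2024-01-01 09:35:00",
            "2024-01-01 10:10:00",
            None,
            "2024-01-01 10:00:00",
        ],
    }
)

clean = raw.copy()
for col in ["Event start datetime", "Event end datetime"]:
    # errors="coerce" turns missing or unparseable values into NaT
    clean[col] = pd.to_datetime(clean[col], errors="coerce")

clean["Event duration (hours)"] = (
    clean["Event end datetime"] - clean["Event start datetime"]
).dt.total_seconds() / 3600

# Keep only rows with a valid, non-negative duration (NaN fails the comparison)
clean = clean[clean["Event duration (hours)"] >= 0].copy()
clean["Start hour"] = clean["Event start datetime"].dt.hour

average_duration_by_hour = (
    clean.groupby("Start hour")["Event duration (hours)"].mean().reset_index()
)
print(average_duration_by_hour)
```

None of these steps involve the chart itself, which is the point: the work that eats the time sits upstream of the visualization.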
At this point, you likely have a sense of where I’m leaning. As long as you can get your dataset roughly right with all the data needed for an analysis, AI already does a remarkably good job at manipulating it and charting it in the blink of an eye. Fast forward one, two or three years from now, it’s hard to imagine that this won’t be the standard.
That said, there are some interesting hybrid approaches cropping up, like Data Formulator. The case for this type of approach is that perhaps our hands and brains can make quick tweaks faster than we can think through what we want and explain it clearly enough for the AI to do its job. Say I ask “Show me total sales by month over the last 12 months” while assuming, without saying so, that this should be a stacked bar chart broken out by region. If the AI returns a plain bar chart, I may find it easier to just move my mouse around than to write a follow-up prompt. If that’s the case, the hybrid approach may be the most interesting: ask the AI to take a first stab at it, then make a few clicks and you have what you want.
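One way to think about the hybrid model is as a single chart description that both the AI and the UI are allowed to edit. The sketch below is purely illustrative: `ChartSpec` and its fields are hypothetical names I made up, and this is not how Data Formulator or any particular tool represents charts.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ChartSpec:
    """Hypothetical shared chart state that both the AI and the UI can edit."""

    chart_type: str
    x: str
    y: str
    color_by: Optional[str] = None
    stacked: bool = False


# Step 1: the AI's first stab at "total sales by month over the last 12 months"
ai_draft = ChartSpec(chart_type="bar", x="month", y="total_sales")

# Step 2: two point-and-click tweaks instead of a follow-up prompt
final = replace(ai_draft, color_by="region", stacked=True)
print(final)
```

Because both kinds of edit produce the same kind of object, the user can hand control back to the AI after clicking around, and the AI can pick up from the current state rather than from its original draft.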
The key to success for either a full AI approach or a hybrid approach is going to be in the user experience. Especially for the hybrid approach, the AI and human interactions have to work perfectly hand in hand and be incredibly intuitive to the user.
I’m excited to watch the space develop and to see where we head with text-to-visualization in the next 12 months.