the idea

Today we mostly interact with LLMs via chat. Chat might be cozy for asking questions, but could there be a faster way to receive and read the answers?
Can there be a GPT that answers with visuals instead of text?

initial thoughts

on text information & LLMs

Large Language Models are about language. They revolutionised how many people interact with information, notably because they “speak” our human language.

Yet, looking at presentations, articles, and other documents that convey information, one can notice that only some of it comes as text. Besides the text, we often see tables, charts, graphs, and various infographics. And photos, but that’s another topic.

  • Text: good when you need to transmit information where the concept of time is helpful (examples: explaining increasingly complex things; storytelling; manipulation; …)
  • Bullet points: good for listing, and, to a certain extent, for multi-level information
  • Tables: good for numbers, or otherwise information that you might need to visually search or compare
  • Charts: good for quickly conveying/understanding trends, comparisons, classification…
  • Schemas: good for a visual representation of things that can be harder to explain in text
  • Infographics: good for a quicker reading & understanding of simple information, often gives a high degree of freedom to the reader

We chat with ChatGPT and other LLMs in text, and we get answers in text, lists (bullet points), and sometimes tables. Lately, answers can even include charts. But never rich visuals, schemas, or infographics.


the attempt

Y is an attempt to create a GPT that answers with infographics & visuals instead of text. This is possible thanks to Dall•E.

  • techs at the time of creation: GPT-4 + Dall•E 3
  • link to the GPT: Y ↗

observation

ChatGPT and Dall•E, in their current versions, struggle to create meaningful visuals. Dall•E certainly knows how such visuals should look, i.e. the results are beautiful, but it fails to draw images that carry actual information.