The new Open AI GPT-4 Vision on ChatGPT: Bridging the Gap Between Text and Image Understanding

On September 25th, 2023, OpenAI expanded the capabilities of its advanced model, GPT-4, by introducing the ability to interpret images and speech alongside text. Dubbed GPT-4V(ision) or GPT-4V, this feature catapults GPT-4 into the realm of multimodal models, offering a richer interaction experience by allowing visual question answering (VQA). Users can upload images, inquire about them, and even have GPT-4V assess context and relationships within the visuals, marking a significant stride toward a holistic understanding of textual and visual data and broadening the horizons for real-world applications of GPT-4.

The GPT-4V feature sets a new benchmark in AI by successfully bridging the gap between text and image understanding. This leap is imperative in evolving towards more intuitive and human-like interactions with machines. Through experiments, GPT-4V has showcased its potential in accurately interpreting and answering questions about various visual inputs, from understanding humor in memes to identifying currency types and even providing a high-level analysis of a movie from a mere image. This impressive ability opens up endless possibilities and applications, making GPT-4V a groundbreaking stride in the journey of AI.

What can I do with GPT-4 Vision?

  1. Object Identification: GPT-4 Vision can accurately pinpoint objects and provide a detailed description by processing an image. Its recognition capabilities are spectacular, from everyday objects to complex machinery.
  2. Text Transcription from Images: Say goodbye to manually typing text from an image. This feature allows users to seamlessly extract written or printed text from images, a boon for those dealing with vast documents or old archives.
  3. Data Deciphering: Extract essential data from graphical content, such as charts, diagrams, and infographics, translating visual data into comprehensive text insights.
  4. Multi-condition Processing: GPT-4 Vision is adept at analyzing images under multiple conditions. Whether understanding a photograph taken under varying lighting or discerning details in a cluttered scene, its analytical prowess is unmatched.
  5. Educational Assistance: Students and educators can leverage GPT-4 Vision to analyze diagrams, illustrations, and visual aids, transforming them into detailed textual explanations, thus enhancing the learning process.
  6. Coding Enhancement: For the tech-savvy, GPT-4 Vision can interpret visual code structures or flowcharts and translate them into actual coding languages, making the development process swifter and more efficient.
  7. Design Understanding: For designers, understanding intricate patterns, layouts, or visual structures becomes easier. GPT-4 Vision can break down design elements and offer textual insights, aiding in refining and improving design concepts.

How To Get GPT-4 Vision Access on ChatGPT?

To access GPT-4 Vision, follow these steps:

  1. Visit the ChatGPT website and sign in or create an account.
  2. Click the “Upgrade to Plus” option.
  3. Click “Upgrade plan” under ChatGPT Plus in the pop-up.
  4. Enter your payment information, then click “Subscribe”.
  5. After payment, select “GPT-4” from the drop-down menu on your screen to use GPT-4 with ChatGPT.

Examples of Incredible Features You Can Do with GPT-4 Vision

Data Deciphering

It’s fantastic. I give this infographic with this prompt: “What does this tell you? Be very, very exact.

Follow the results ChatGPT gave me.

This infographic provides insights into what SEO professionals believe will be the most significant disruptors to SEO performance in 2024. Here are the exact details:

  1. Title and Context:
  • The title is “SEO Performance In 2024.”
  • SEO professionals were asked about what they thought would cause the most significant disruption to their SEO performance in the upcoming years.
  1. Most Significant Disruption In SEO Performance:
  • Generative AI is perceived to be the top disruptor, with 21.0% of professionals identifying it.
  • E-E-A-T & Trusted Sources comes second with 13.5%. There’s a note mentioning that an extra “E” was added to E-A-T in December 2022, which stands for “Experience.”
  • Automation (Tools) is the third most significant disruptor at 10.7%.
  • Other factors listed in descending order of their perceived disruption potential are:
    • Core Web Vitals (8.2%)
    • Google Discover (6.4%)
    • Social Search (6.3%)
    • Structured Data (5.9%)
    • Site Security (5.3%)
    • Google Passages (4.2%)
    • Knowledge Graph and Entities (3.9%)
    • Video Search (3.6%)
    • SERP Features (3.5%)
    • Mobile SEO (2.8%)
    • Voice Search (2.6%)
    • None/Other (2.2%).
  1. Visual Imagery:
  • There’s an image of a mobile phone with the ChatGPT logo on its screen.
  • Another visual element is a colorful wheel containing the letters E, E, A, T, and icons of various tech giants.
  1. Additional Text Information:
  • SEO professionals have many new technologies and updates to consider for the year. Specifically, tools like ChatGPT and Google’s “Generative Search Experience” are mentioned as generating considerable interest and concern among these professionals.

In summary, the infographic emphasizes the rapidly evolving landscape of SEO, with new technologies, especially Generative AI, becoming significant factors that professionals in the field need to be aware of and adapt to.

Another example of extracting information from infographics with the same prompt: “What does this tell you? Be very, very exact.

Certainly! The provided infographic conveys information about the usage patterns of social media among adults and teens. Here’s a detailed breakdown of the infographic’s contents:

  1. Title: The infographic is titled “Adults vs Teens: How We Use Social Media.”
  2. Background Information: The Pew Research Center released a study comparing the social media usage of adults and teens. The commentary mentions that teens today are inherently digital-savvy, having grown up in the digital era. However, when it comes to newer social platforms like Pinterest and Instagram, adults appear to have a slight edge in usage.
  3. Overall Social Media Usage:
  • Adults: 72% of adults use social media.
  • Teens: 81% of teens use social media.
  1. Platform-Specific Usage:
  • Facebook: Teens dominate the usage, with 94% of teens using it compared to 67% of adults.
  • Twitter: 26% of teens use Twitter, compared to 18% of adults.
  • Pinterest: 15% of adults use Pinterest, whereas only 1% of teens do.
  • Instagram: 13% of adults use Instagram, compared to 11% of teens.
  • Tumblr: 6% of adults use Tumblr, and 5% of teens use it.
  1. Age Breakdown of Social Media Users:
  • Ages 13-18: 89% use social media.
  • Ages 18-29: 81% use social media.
  • Ages 30-49: 78% use social media.
  • Ages 50-64: 60% use social media.
  • Age 65+: 43% use social media.
  1. Source: The data comes from Pew Research Center studies conducted in August 2013, May 2013, and February 2013.
  2. Sponsor/Creator: The infographic was created by “NextAdvisor,” which offers in-depth, independent research.

In summary, the infographic suggests that while teens use social media platforms like Facebook more dominantly, adults are more inclined towards newer platforms like Pinterest and Instagram. The percentage of users also declines as the age group increases, with the youngest age group (13-18) having the highest percentage of users and the oldest age group (65+) having the lowest.

Text Transcription

Text Transcription from Images is instrumental when extracting texts from images. I used this prompt: “Can you extract the text from this image?“.

Another example with text transcription:

Educational Assistance

Coding Enhancement

Create a live website from an image.

Multi-Condition Processing

This is awesome. Can you imagine with a mobile phone what can you do with this feature?

Identifying Objects

There are so many things you can do with this technology. Lock at the identifying objects.

What are the Potential Risks Of GPT-4V

OpenAI released a paper outlining potential risks associated with the use of GPT-4V, which include:

  • Privacy risks from identifying people in images or determining their location, potentially impacting companies’ data practices and compliance. The paper notes that GPT-4V has some ability to identify public figures and geolocate images.
  • Potential biases during image analysis and interpretation could negatively impact different demographic groups.
  • Safety risks from providing inaccurate or unreliable medical advice, specific directions for dangerous tasks, or hateful/violent content.
  • Cybersecurity vulnerabilities such as solving CAPTCHAs or multimodal jailbreaks.

Risks posed by the model have resulted in limitations, such as its refusal to offer analysis of images with people.

Screenshot from ChatGPT, September 2023

Screenshot from ChatGPT, September 2023

Overall, brands interested in leveraging GPT-4V for marketing must assess and mitigate these and other generative AI usage risks to use the technology responsibly and avoid negative impacts on consumers and brand reputation.


In conclusion, OpenAI’s GPT-4 Vision marks a monumental step towards harmonizing text and image understanding, paving the way for more intuitive and enriched interactions between humans and machines. As GPT-4V unfolds its potential, it not only broadens the horizon for real-world applications but also beckons a future where AI can perceive and interpret the world in a manner akin to human cognition, thereby significantly driving forward the frontier of what is achievable in the realm of artificial intelligence.

That’s it for today!


11 creative ways to use GPT-4’s vision features in ChatGPT (

First Impressions with GPT-4V(ision) (

7 Incredible Features of GPT-4 Vision (

GPT-4 With Vision: Examples, Limitations, And Potential Risks (