# Image-to-text

## Overview

Bridging the gap between visual and textual content helps unlock the full potential of digital assets. The Image-to-text ML model does this by providing general-purpose visual and language understanding.

The model combines natural language processing and computer vision techniques to interpret images together with text. When a user submits an image and an accompanying textual prompt (typically a question about the image), the model processes both inputs, identifies objects, context and relationships within the image, and generates a relevant response.

Users can pose a wide range of questions, from object recognition and content analysis to more complex queries related to the image. The output is a properly constructed natural language answer that provides insights or information pertaining to the submitted data.

Our Image-to-text functionality is a versatile tool that gives customers the ability to extract insights, enrich content and enhance the overall management of digital assets.

## Typical use cases

The Image-to-text functionality is powerful enough to be applied across a spectrum of industries and domains, such as:

* **Content tagging** - Customers can automatically generate descriptive metadata for images, simplifying the organization and retrieval of digital assets.
* **E-commerce and product catalogs** - E-commerce platforms can utilize the model to answer user queries about product images, providing detailed information and enhancing the shopping experience.
* **Media and entertainment** - Media companies can analyze and describe scenes, characters and objects in images, aiding in content categorization and analysis.
* **Educational content** - Educational institutions can enhance e-learning platforms by automatically generating explanations and descriptions for visual content in course materials.

## API endpoints

Information about the specific API endpoints is available in the continuously updated documentation, which can be accessed via the following link:

{% embed url="<https://documenter.getpostman.com/view/24652109/2s93z86NDt#88f5e9bf-7bce-42d2-aae0-9951fd8d4f5a>" %}

There, you can find detailed information about the API endpoints, together with all required request parameters, so you know how to interact with them.

## Example API responses

<table data-full-width="true"><thead><tr><th>Input image</th><th width="351.3333333333333">Input prompt</th><th>API response</th></tr></thead><tbody><tr><td><img src="https://2536208400-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fqsq2dR7Fjagf5eTifWZx%2Fuploads%2FqmRDwraJ0V49YpebjHJ5%2Fit_tesla.jpeg?alt=media&#x26;token=cf77b0b1-5640-44f2-815c-a690f0d02c8d" alt="" data-size="original"></td><td><pre class="language-json"><code class="lang-json">{
    "prompt": "Is the car moving fast?"
}
</code></pre></td><td><pre class="language-json"><code class="lang-json">{
  "status": "success",
  "version": "3.0.2",
  "image_to_text": "No, the car is not moving fast. It is parked in a grassy field.",
  "file_downloaded": "it_tesla.jpeg",
  "request_uuid": "5145e03f-78f0-4667-a867-72d955c2d4bd",
  "sha1": "cc9a4427c90ba5881e5be8d3521724cfcfc9f5b8"
}
</code></pre></td></tr><tr><td><img src="https://2536208400-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fqsq2dR7Fjagf5eTifWZx%2Fuploads%2FFD5DZrFaSr4uYPCh0aFL%2Fit_amsterdam.jpeg?alt=media&#x26;token=3dfd6348-bb31-4286-8f94-0a05fb69d5c8" alt="" data-size="original"></td><td><pre class="language-json"><code class="lang-json">{
    "prompt": "Describe the picture"
}
</code></pre></td><td><pre class="language-json"><code class="lang-json">{
  "status": "success",
  "version": "3.0.2",
  "image_to_text": "The picture is a black and white photograph of a canal in a city, featuring a bridge with a clock tower in the background. The canal is filled with water, and there are boats floating on it. The scene is set in a European city, with a large building and a clock tower visible in the background. The image captures the charm and beauty of the city's waterways and architecture.",
  "file_downloaded": "it_amsterdam.jpeg",
  "request_uuid": "cb456045-f63f-4b71-9d22-c9b8f9e6eb42",
  "sha1": "00ceb83e817872c6b2ef103478f62af7eac3dc8e"
}
</code></pre></td></tr></tbody></table>
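
The following is a minimal sketch of how a client might consume responses like the ones shown above. It assumes that every successful response carries the fields seen in the examples (`status`, `version`, `image_to_text`, `file_downloaded`, `request_uuid`, `sha1`). The helper `parse_image_to_text_response` and the `ImageToTextResult` class are hypothetical names introduced for illustration and are not part of the API. The shape of failure responses is not documented on this page, so the sketch simply treats any status other than `"success"` as an error.

```python
import json
from dataclasses import dataclass


@dataclass
class ImageToTextResult:
    """Fields taken from the example responses on this page."""
    answer: str
    version: str
    file_downloaded: str
    request_uuid: str
    sha1: str


def parse_image_to_text_response(raw: str) -> ImageToTextResult:
    """Parse a raw JSON response body into an ImageToTextResult.

    Hypothetical helper: raises ValueError when the status is not
    "success", because error payloads are not described here.
    """
    payload = json.loads(raw)
    status = payload.get("status")
    if status != "success":
        raise ValueError(f"unexpected status: {status!r}")
    return ImageToTextResult(
        answer=payload["image_to_text"],
        version=payload["version"],
        file_downloaded=payload["file_downloaded"],
        request_uuid=payload["request_uuid"],
        sha1=payload["sha1"],
    )


# Response body copied from the first example in the table above.
sample = """
{
  "status": "success",
  "version": "3.0.2",
  "image_to_text": "No, the car is not moving fast. It is parked in a grassy field.",
  "file_downloaded": "it_tesla.jpeg",
  "request_uuid": "5145e03f-78f0-4667-a867-72d955c2d4bd",
  "sha1": "cc9a4427c90ba5881e5be8d3521724cfcfc9f5b8"
}
"""

result = parse_image_to_text_response(sample)
print(result.answer)
```

Keeping `request_uuid` alongside the answer may be useful when correlating a stored answer with a specific request, for example in logs or support tickets.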
