Visual Experience Platform

©2024 Scaleflex SAS

Image-to-text

Enhance content management with general-purpose visual and language understanding

Last updated 6 months ago

Overview

Bridging the gap between visual and textual content helps unlock the full potential of digital assets. The Image-to-text ML model does this by providing general-purpose visual and language understanding.

The model combines natural language processing and computer vision techniques to interpret images together with text. When a user submits an image and an accompanying text prompt (typically a question about the image), the model processes both inputs, identifies objects, context and relationships within the image, and generates a relevant response.

Users can pose a wide range of questions, from object recognition and content analysis to more complex queries about the image. The output is a well-formed natural-language answer that addresses the submitted image and prompt.

Image-to-text is a versatile tool that lets customers extract insights, enrich content and improve the overall management of digital assets.

Typical use cases

The Image-to-text functionality can be applied across a range of industries and domains, such as:

  • Content tagging - Customers can automatically generate descriptive metadata for images, simplifying the organization and retrieval of digital assets.

  • E-commerce and product catalogs - E-commerce platforms can utilize the model to answer user queries about product images, providing detailed information and enhancing the shopping experience.

  • Media and entertainment - Media companies can analyze and describe scenes, characters and objects in images, aiding in content categorization and analysis.

  • Educational content - Educational institutions can enhance e-learning platforms by automatically generating explanations and descriptions for visual content in course materials.

API endpoints

The API endpoints are described in the Scaleflex API documentation, which is always kept up to date.

There you can find detailed information about each endpoint, together with all required request parameters, so you know how to interact with it.
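
The example requests further down this page use a JSON body with a single "prompt" field. The following is a minimal sketch of building that body in Python. The helper name build_prompt_payload is hypothetical, and the sketch deliberately leaves out the endpoint URL, authentication and the way the image itself is supplied, since those are defined in the API documentation rather than on this page.

```python
import json


def build_prompt_payload(question: str) -> str:
    """Serialize a question into the {"prompt": ...} body shown in the examples.

    Hypothetical helper: only the "prompt" field is taken from this page.
    """
    text = question.strip()
    if not text:
        # Rejecting empty prompts is a local choice, not documented API behavior.
        raise ValueError("prompt must not be empty")
    # ensure_ascii=False keeps non-ASCII questions readable in the payload.
    return json.dumps({"prompt": text}, ensure_ascii=False)


payload = build_prompt_payload("  Is the car moving fast?  ")
print(payload)
```

Trimming whitespace and rejecting empty prompts before sending avoids spending a request on input that cannot produce a useful answer.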

Example API responses

Each example lists the input image, the input prompt and the API response.

Example 1

Input image: (not shown; downloaded as it_tesla.jpeg)

Input prompt:

{
    "prompt": "Is the car moving fast?"
}

API response:

{
  "status": "success",
  "version": "3.0.2",
  "image_to_text": "No, the car is not moving fast. It is parked in a grassy field.",
  "file_downloaded": "it_tesla.jpeg",
  "request_uuid": "5145e03f-78f0-4667-a867-72d955c2d4bd",
  "sha1": "cc9a4427c90ba5881e5be8d3521724cfcfc9f5b8"
}

Example 2

Input image: (not shown; downloaded as it_amsterdam.jpeg)

Input prompt:

{
    "prompt": "Describe the picture"
}

API response:

{
  "status": "success",
  "version": "3.0.2",
  "image_to_text": "The picture is a black and white photograph of a canal in a city, featuring a bridge with a clock tower in the background. The canal is filled with water, and there are boats floating on it. The scene is set in a European city, with a large building and a clock tower visible in the background. The image captures the charm and beauty of the city's waterways and architecture.",
  "file_downloaded": "it_amsterdam.jpeg",
  "request_uuid": "cb456045-f63f-4b71-9d22-c9b8f9e6eb42",
  "sha1": "00ceb83e817872c6b2ef103478f62af7eac3dc8e"
}
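
The responses above share the same set of fields, so they can be read into a small typed structure. The following is a minimal sketch in Python, assuming a response always carries the fields shown on this page. The names ImageToTextResult and parse_image_to_text are hypothetical, and because the shape of an error response is not shown here, the sketch simply treats any status other than "success" as a failure.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageToTextResult:
    """Fields copied from the example responses on this page."""
    text: str
    file_downloaded: str
    request_uuid: str
    sha1: str
    version: str


def parse_image_to_text(raw: str) -> ImageToTextResult:
    """Parse a raw JSON response and return the generated answer with its metadata."""
    data = json.loads(raw)
    # Assumption: anything other than "success" is a failure; the error
    # format is not documented on this page.
    if data.get("status") != "success":
        raise RuntimeError(f"unexpected status: {data.get('status')!r}")
    return ImageToTextResult(
        text=data["image_to_text"],
        file_downloaded=data["file_downloaded"],
        request_uuid=data["request_uuid"],
        sha1=data["sha1"],
        version=data["version"],
    )


# Sample taken verbatim from the first example response.
SAMPLE = """
{
  "status": "success",
  "version": "3.0.2",
  "image_to_text": "No, the car is not moving fast. It is parked in a grassy field.",
  "file_downloaded": "it_tesla.jpeg",
  "request_uuid": "5145e03f-78f0-4667-a867-72d955c2d4bd",
  "sha1": "cc9a4427c90ba5881e5be8d3521724cfcfc9f5b8"
}
"""

result = parse_image_to_text(SAMPLE)
print(result.text)
```

Keeping request_uuid alongside the answer makes it easier to reference a specific call when contacting support or correlating logs.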
API documentation: Scaleflex API for Digital Asset Management (DAM), Visual AI and Media Optimization