Publication: Image-to-audio generation as a tool for stress relief
Publisher
Perception, Sage Journals
Abstract
Recent advances in technology and machine learning make it straightforward to produce images, music, and text. This research evaluates the state of relaxation and calmness induced by audio that is AI-generated from an image without human supervision. The image-to-music process consists of the following steps: (i) generate a text description of an input image using the BLIP-2 vision-language pre-training (VLP) model (Li et al., 2023); (ii) improve the generated text with more descriptive details using the OpenAI ChatGPT large language model, for better audio generation quality; (iii) synthesize the audio output from the generated text description using the AudioLDM text-to-audio model (Liu et al., 2023). The audio generated from a set of meditation images was tested on 17 participants (aged 26-43 years) as a stimulus for audio-guided relaxation. The level of relaxation and calmness (on a scale from 1 to 1000) was evaluated using a portable single-channel dry-electrode NeuroSky MindWave EEG system placed on the user's forehead. The Lucid Scribe software measures "Meditation" values corresponding to the user's level of relaxation and calmness. The participants' mean values ranged from 400 to 800 (average = 602.4), which corresponds to a slightly elevated relaxation level.
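
The three-step image-to-music process described in the abstract can be sketched as a simple chain of stages. This is a minimal illustration only, not the authors' implementation: the class name `ImageToAudioPipeline`, its fields, and the stub functions are hypothetical, and the stubs merely stand in for the captioning model, the language model, and the text-to-audio model so that the data flow can be followed without any model downloads.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageToAudioPipeline:
    """Hypothetical wrapper chaining the three stages named in the abstract."""

    caption_image: Callable[[bytes], str]     # step (i): image -> text description
    enrich_text: Callable[[str], str]         # step (ii): description -> richer prompt
    synthesize_audio: Callable[[str], bytes]  # step (iii): prompt -> audio data

    def run(self, image: bytes) -> bytes:
        caption = self.caption_image(image)
        prompt = self.enrich_text(caption)
        return self.synthesize_audio(prompt)


# Stub stages (assumptions for illustration; real stages would call the models).
def stub_caption(image: bytes) -> str:
    return "a calm lake at sunrise"


def stub_enrich(caption: str) -> str:
    return caption + ", soft ambient tones"


def stub_synthesize(prompt: str) -> bytes:
    # A real stage would return waveform data; here the prompt is echoed as bytes.
    return prompt.encode("utf-8")


pipeline = ImageToAudioPipeline(stub_caption, stub_enrich, stub_synthesize)
audio = pipeline.run(b"")
```

Keeping each stage as an injected callable means any one model can be swapped out without touching the other two, which matches the abstract's description of the process as running end to end without human supervision.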
