Publication: Image-to-audio generation as a tool for stress relief
| dc.contributor.author | Girbacia Florin | |
| dc.contributor.author | Voinea Gheorghe Daniel | |
| dc.date.accessioned | 2025-09-23T19:04:11Z | |
| dc.date.issued | 2023-08-31 | |
| dc.description.abstract | Recent advancements in technology and machine learning make it straightforward to produce images, music, and text. This research evaluates the state of relaxation and calmness induced by audio that is generated by AI from an image without human supervision. The image-to-music process consists of the following steps: (i) generate a text description of an input image using the Blip2 vision-language pre-training (VLP) model (Li et al., 2023), (ii) enrich the generated text with more descriptive details using the OpenAI ChatGPT large language model to improve the quality of the generated audio, (iii) synthesize the audio output from the enriched text description using the AudioLDM text-to-audio model (Liu et al., 2023). The audio generated from a set of meditation images was tested on 17 participants (aged 26-43 years) as a stimulus for audio-guided relaxation. The level of relaxation and calmness (scaled from 1 to 1000) was evaluated using a portable single-channel dry-electrode Neurosky Mindwave EEG system placed on the user’s forehead. The Lucid Scribe software reports “Meditation” values corresponding to the user’s level of relaxation and calmness. The measured mean values of the participants ranged from 400 to 800 (average = 602.4), which corresponds to a slightly elevated relaxation level. | |
| dc.identifier.uri | https://repository.unitbv.ro/handle/123456789/2012 | |
| dc.language.iso | en_US | |
| dc.publisher | Perception, Sage Journals | |
| dc.title | Image-to-audio generation as a tool for stress relief | |
| dc.type | Article | |
| dspace.entity.type | Publication |
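
The three-step image-to-music process described in the abstract can be sketched as a simple chain of functions. This is only an illustrative outline, not the authors' implementation: the class `ImageToAudioPipeline`, its field names, and the three stub functions are hypothetical, and the stubs stand in for real calls to Blip2, ChatGPT, and AudioLDM so that the example runs without any model downloads.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageToAudioPipeline:
    """Hypothetical wrapper chaining the three steps from the abstract."""
    caption_image: Callable[[str], str]        # step (i): image -> text description
    enrich_text: Callable[[str], str]          # step (ii): text -> more detailed text
    synthesize_audio: Callable[[str], bytes]   # step (iii): text -> audio bytes

    def run(self, image_path: str) -> bytes:
        caption = self.caption_image(image_path)
        prompt = self.enrich_text(caption)
        return self.synthesize_audio(prompt)


# Stub implementations (assumptions for illustration only). In a real setup
# these would wrap a captioning model, a language model, and a
# text-to-audio model respectively.
def stub_caption(image_path: str) -> str:
    return f"a photo from {image_path}"


def stub_enrich(caption: str) -> str:
    return caption + ", calm ambient soundscape"


def stub_synthesize(prompt: str) -> bytes:
    # Returns the prompt as bytes instead of real audio samples.
    return prompt.encode("utf-8")


pipeline = ImageToAudioPipeline(stub_caption, stub_enrich, stub_synthesize)
audio = pipeline.run("lake.jpg")
```

Passing the steps in as callables keeps the chain independent of any particular model library, so each stage can be swapped or tested separately.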
